Totally agree with the heat comment. It's why Dell and lots of other vendors are pushing for running gear hotter; the longevity you lose is more than offset by the cooling cost savings. But I'd have to disagree on compute power being the limit on density; it's still SAN storage on a raw capacity basis when SSD just isn't economical. When you need 100 TB for your data warehouse cube, SSDs just don't cut it, unfortunately.
Sorry, what I meant is that heat has in the past limited our compute power density and still does (once we redesigned the data center several years ago to eliminate power delivery per cubic foot as the bottleneck). In one major example, mechanical HDDs were what killed us: heat caused premature failures, and the excessive heat came from server designs that were too dense and didn't allow for proper cooling of the drives. This was even in a well designed DC layout (e.g. cold rows maintained at spec, where we could easily have lowered the spec given our CRACs... but cold row temp can only go so low before other concerns crop up). I am not including stupid things like channeling cooled air directly from the CRAC through narrow-diameter ducting straight to poorly designed blade server chassis intakes, which would be a crazy requirement IMO for commodity server blades.

I am referring specifically to the now extinct BL10s from HP, which crammed two 2.5" drives side by side into the front of the blade, with the rest of the server components squeezed into the slim enclosure between the drives at the front and the exhaust end, and then those blades were crammed together into the blade chassis. Basically you were staring at nothing but 2.5" drives when looking at a fully populated BL10 blade chassis... with no possibility of fans inside the blades to aid in pulling air over the drives. There just wasn't room for fans big enough to do anything, since the blade width (when looking at the blade inserted into the chassis and racked) was equal to the height of a 2.5" drive, and the height of the blade equal to the width of two 2.5" drives. Looking at a rack with a few populated BL10 chassis was like looking at 2.5" arrays in a SAN, but without the cooling architecture of a SAN. Given that 2.5" drives running at 7200 RPM (or faster) aren't known for their ability to stay cool without good ventilation, and given that the problem was made worse by the heat from the rest of the blade preventing any other kind of effective conduction or convection cooling for the HDDs (rather, heating them up more), those supposedly high-temp-safe enterprise class 2.5" drives dropped like flies.

We always configured each HDD pair in a BL10 in RAID 1 (a BL10 could only hold two 2.5" drives), and sometimes we would lose the second drive before the mirror even finished rebuilding after replacing the first failed one, probably due to the high I/O of the rebuild... it was that bad. BTW, I am talking about low I/O servers, given that is what BL10s were designed for, like very low traffic web servers, nothing that would have made a 2.5" drive overheat in normal (non-crappy blade design) scenarios. Had those 2.5" drives been SSD (or had we been SAN booting, if that was even possible with BL10s; not sure, as we didn't do much SAN booting back then)... I am thinking we wouldn't have had the same issues. But in the combination I described, ultimately it was heat (and bad server design relative to cooling) that limited our compute density until we could get rid of all of the BL10s. That is the long version of what I meant by heat limiting density.
Regarding SAN storage, we aren't running into the density issues that were forecast years ago, at least not to the same degree... not even in the same league as we anticipated. This is thanks to changes in app architecture that have supported (required, really) moving to DAS. The biggest example is what I will call our big mama DH (sort of a DH, but not really): data spread across Hadoop clusters indexed via Solr. This new stuff supports our new application architecture as well as new features in our current apps that use real-time analytics, which in turn leverage HUGE near-real-time-updated reference data sets to do some really cool stuff. All of this is only possible with Hadoop or similar technologies, which do not recommend or even really support SAN-presented storage, so we dodged the SAN density constraint again. We certainly could not do the same things via any RDBMS architecture that I am aware of, even with big data appliances facilitating; that is due to the MANY bottlenecks in a typical RDBMS architecture that you hit one right after the other. For this new distributed data processing and storage architecture we haven't really needed SSD (yet), given the massively parallel nature of processing in the Hadoop clusters; our bottlenecks are elsewhere... like the network. Even on 10GbE we hit bottlenecks in Hadoop clusters, especially when crossing subnets, so we are moving to 40GbE and 100GbE now at the appropriate layers of the network to mitigate that. Who knew THAT was coming 10 years ago? Fantastically magical and cool tech!
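To give a rough feel for what those near-real-time reference data lookups look like from the app side, here is a minimal sketch against Solr's standard select handler. The host, collection, and field names are all made up for illustration; this is not our actual code or schema.

    # Minimal sketch (hypothetical names): a near-real-time reference data
    # lookup against a Solr collection that indexes data living in the Hadoop
    # cluster.
    import requests

    SOLR_SELECT = "http://solr-gw.example.internal:8983/solr/ref_data/select"

    def lookup_reference(entity_id, limit=10):
        """Return the newest reference docs for an entity from the Solr index."""
        params = {
            "q": "entity_id:%s" % entity_id,  # assumed field name
            "sort": "updated_ts desc",        # assumed timestamp field, newest first
            "rows": limit,
            "wt": "json",
        }
        resp = requests.get(SOLR_SELECT, params=params, timeout=2)
        resp.raise_for_status()
        return resp.json()["response"]["docs"]

    # The analytics features would call something like this per request,
    # relying on Solr soft commits to keep the index near real time.
    for doc in lookup_reference("12345"):
        print(doc)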
We were certainly worried about SAN and were exploring the many options to scale SAN/NAS usage... but fundamental changes in app and data processing architectures made that worry more or less disappear overnight. We had new and different things to worry about, but SAN density, or I/O with the drives themselves, was never *really* one of them for very long, *especially* once SSD came along. That said, if SSD were cheaper than HDD, it would be a no-brainer for us to use SSD for DAS in these servers as well; the reduction in power draw and heat generation alone, across the thousands of servers in these clusters, would be realized immediately and would be very nice indeed!! The unneeded performance boost would just be a bit of future proofing for when we once again upgrade infrastructure and the bottleneck moves back to the servers.
In other cases, like LDAP directories or relatively small but very high I/O local DBs that don't warrant the cost of SAN and the associated HBAs, FC infrastructure, etc., we have migrated to DAS SSD with EXCELLENT results yet again. We have effectively removed the use case where raw I/O performance justifies the cost of SAN, assuming a LOT of space isn't *also* needed (it's rare to need more than what can be done with 14 local SSDs in an IBM x3630; most cases need far less). We run these types of apps with replicas for scaling and redundancy rather than shared DAS or SAN architectures, removing yet another SAN use case... HA access to the data. An example is the underlying architecture of MS Active Directory, which uses an LDAP directory and robust replication strategies to fulfill both HA and scaling requirements.
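As a rough illustration of the replicas-for-HA-and-scaling pattern on the client side, here is a sketch using the Python ldap3 library; the hostnames, credentials, and base DN are made up, and this is just the idea, not our deployment.

    # Sketch: reads spread across LDAP replicas, with failover handled by the
    # client. Hostnames, credentials, and the base DN are hypothetical.
    from ldap3 import Server, ServerPool, Connection, ROUND_ROBIN

    replicas = ["ldap1.example.internal", "ldap2.example.internal",
                "ldap3.example.internal"]

    # Round-robin across replicas for read scaling; a replica that is down is
    # skipped, which covers the HA requirement for reads.
    pool = ServerPool([Server(host, port=636, use_ssl=True) for host in replicas],
                      ROUND_ROBIN, active=True, exhaust=False)

    conn = Connection(pool,
                      user="cn=reader,dc=example,dc=com",
                      password="secret",  # placeholder
                      auto_bind=True)

    conn.search("dc=example,dc=com", "(uid=jdoe)", attributes=["cn", "mail"])
    for entry in conn.entries:
        print(entry)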
Our backups have also moved to DAS (we were originally on tape, then migrated to SAN, and right after finishing that migration moved to DAS), again removing what had been a huge SAN (and FC network) bottleneck.
We are mainly left with SAN supporting a small subset of the traditionally architected (aka legacy) solutions we host... basically the RDBMS stuff with mega DBs, whether traditional reporting DH architectures or application DB tiers. The biggest remaining and most common use case for SAN right now is VMware-based virtualization, where the majority of our 60k server OS instances are virtual. Given ESXi, we are pretty much entirely dependent on SAN for virtualization... no local HDD space is even really needed. All of the VMDKs boot and run straight from the SAN, most VMDKs are small, and since most VMs are typically low I/O to the SAN, RAM and CPU end up being our bottleneck, not SAN.

The tier 1 SAN / tier 1 app combo I referenced basically correlates to a small subset of our traditionally architected applications with VERY large Oracle/DB2 RDBMS DBs (many TB that must be in the same DB schema) and thousands of concurrent r/w connections. BUT even in that scenario, for the majority of our app instances, SAN speed is no longer the bottleneck given SSD; now it is the number of processing threads, since we have hit the upper limit on threads available in a single server on the x86 and RISC servers we use (mainly HP and a few IBM for x86, and IBM for RISC, which is going away in favor of RHEL on x86). Unfortunately our application architecture doesn't allow true active/active RAC, as we can only distribute certain types of processing threads. Basically our app was developed long before RAC was even conceived, roughly 20 years ago now, and wasn't designed to handle independent middleware nodes accessing the DB in r/w mode concurrently. Given that we cannot scale compute for an RDBMS linearly the way Oracle RAC is designed to (which would probably then make SAN the bottleneck), SAN density limiting compute density has never happened (or never lasted for long), and when it did it was an I/O issue, especially once SSDs started to be used.

For the apps we have that do support true active/active (like a traditional DH populated with hundreds of ETL-type feeds, with apps like BO or OBIEE providing the analytics), we have moved towards big data type "appliances" like Exadata and Vertica, so again moving away from "traditional" SAN/NAS architecture and the bottlenecks that would indirectly limit compute density.
Basically, in all of the places we *thought* we were going to hit issues due to SAN (primarily I/O) that would limit compute density, it just hasn't happened. We haven't been constrained by storage density limiting compute density; moving to SSD bought us more than enough time for RDBMS to become obsolete for our massive data processing needs. Now the limiting factors have become all kinds of other things depending on the architectures of the apps in question... or at least their design limitations. What I mean by that would be another WOT, but basically we occasionally have issues when workloads change after hosts have had storage provisioned for their DBs, so that too many hosts end up leveraging the same access gateway, director, and SAN port(s) (in the case of multipathing), causing either saturation issues or port CPU utilization issues. This is easily remedied by redistributing the hosts' paths to storage. We are almost entirely an FC shop vs. FCoE, BTW... had we needed to be, we would have been in better shape on FCoE.
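To illustrate what I mean by redistributing paths, here is a toy sketch of the placement logic only; in real life this is done through rezoning and host-side multipath changes, and every number and name below is made up.

    # Toy sketch: spread host paths across SAN ports so no single port is
    # saturated. Hypothetical hosts, ports, and throughput numbers.
    import heapq

    # Estimated sustained throughput per host in MB/s (fake numbers).
    host_load = {"dbhost01": 400, "dbhost02": 350, "dbhost03": 300,
                 "webhost01": 50, "webhost02": 50, "dbhost04": 500}

    ports = ["dirA_p1", "dirA_p2", "dirB_p1", "dirB_p2"]
    paths_per_host = 2  # multipathing: each host gets two paths

    def rebalance(host_load, ports, paths_per_host):
        """Greedy: give each host's paths to the currently least-loaded ports."""
        heap = [(0, port) for port in ports]  # (assigned MB/s, port)
        heapq.heapify(heap)
        assignment = {}
        # Place the heaviest hosts first so they spread out across ports.
        for host, load in sorted(host_load.items(), key=lambda kv: -kv[1]):
            chosen = [heapq.heappop(heap) for _ in range(paths_per_host)]
            assignment[host] = [port for _, port in chosen]
            share = load / paths_per_host  # assume load splits evenly over paths
            for used, port in chosen:
                heapq.heappush(heap, (used + share, port))
        return assignment

    for host, its_ports in rebalance(host_load, ports, paths_per_host).items():
        print(host, "->", its_ports)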