— Storage Horizons Blog —
Five Myths and Half-Truths is my second installment on the Storage Horizons blog, and I'll continue the series in increasing depth, because I'm convinced that efficiency is key to cloud computing succeeding for tier-1 applications, and to any new IT configuration where overall cost is now a business metric.
The first myth is that all drives are the same. I think this belief comes from the bifurcation of the PC/desktop market and the enterprise IT sector, as well as the margin that storage providers have placed on enterprise drives. There are key differences, and they relate not only to performance but to reliability, data integrity, and overall ruggedness for 24×7 utilization. Contrast that with a drive that's meant to be used eight hours a day, period, or a drive meant to carry a basically continuous but less strenuous workload. The key here is that since HDDs look much the same on the outside, a good number of people simply assume they ARE all the same. That's unfortunate, and it's exacerbated by the fact that, because of this assumption, drives get used incorrectly and placed in the wrong environments (packaging, which I will cover later). All of this relates to how and why drives fail, or at least seem to fail.
Whether enterprise drives are worth the premium over nearline and desktop drives is a perennial subject of debate on both sides of the conversation. The point is that each has its place, and each should be used there!
It's always interesting to see the 'new idea' of using cheaper drives in place of enterprise drives to achieve the same thing. When you size a drive type and solution to protect a data center's worth of data, considerations like sheer drive count come into play: the difference in solution size between drive types can be surprisingly small or, unfortunately, very large, driving service costs up constantly.
A half-truth is that hard drives are unreliable. Well, if you believe the first myth, this one follows from it both directly and indirectly. Depending on drive type, service load, environmental conditions, and the storage software in the servers or arrays that interface with the drives, the pull rates of drives in the field would suggest they are failing at a rate of almost 10%. This, when hard drives are typically specified at a 1% failure rate if used at their proper operating rate and in their proper environment.
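To see how big that gap really is, here's a quick bit of arithmetic on the figures above (my own illustration, using the post's approximate numbers, not field data):

```python
# Comparing the field pull rate quoted above with the specified
# failure rate. The figures are the post's round numbers, not
# measurements from any particular fleet.
pull_rate = 0.10  # ~10% of drives pulled from service per year
spec_afr  = 0.01  # ~1% specified annual failure rate when used in-spec

excess = pull_rate - spec_afr
print(f"Pulls beyond the specified rate: {excess:.0%}")          # prints 9%
print(f"Share of pulls not explained by spec'd failures: "
      f"{excess / pull_rate:.0%}")                               # prints 90%
```

In other words, if the spec is right, roughly nine out of ten pulled drives were removed for some reason other than an in-spec intrinsic failure, which is exactly where misuse and environment come in.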
To get specific (my daughter says nerdy), HDDs (and SSDs, for that matter) have a specified duty cycle (how much they are used per day and how hard they are pushed) from which the 1% failure rate specification is derived. The underlying metric is MTBF (mean time between failures), and drive manufacturers typically strive for the 1-million-hour number, beyond which there are diminishing returns on striving for more.
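To show where the roughly-1% figure comes from, here's a back-of-the-envelope sketch (my own illustration, not a vendor formula) converting an MTBF rating into an annualized failure rate:

```python
# Back-of-the-envelope: convert an MTBF rating (in hours) into an
# annualized failure rate (AFR), assuming 24x7 power-on operation.
HOURS_PER_YEAR = 24 * 365  # 8,760 power-on hours in a year

def annualized_failure_rate(mtbf_hours, power_on_hours=HOURS_PER_YEAR):
    # For large MTBF values, AFR is approximately power-on hours / MTBF:
    # the fraction of a large drive population expected to fail per year.
    return power_on_hours / mtbf_hours

# The 1-million-hour number manufacturers strive for:
print(f"{annualized_failure_rate(1_000_000):.2%}")  # prints 0.88%
```

A 1-million-hour MTBF works out to a bit under 1% per year at full 24×7 usage, which is where the round 1% figure in drive conversations comes from.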
Enterprise hard drives (also called mission critical) are rated for 24×7 usage at full-speed operation. That means using the drive all day, every day, for, say, a database that moves the heads around constantly. This generates the most heat and the most wear, depending on the enclosure environment the drive is contained within. Nearline hard drives (also called business critical) are rated at about a 30% duty cycle per day for their 1-million-hour MTBF, or 1% failure rate. That means they can be powered on 24 hours a day at about 30% workload without undue wear and a commensurate failure rate. This type of drive is not for a database; it's for backup and archive. The 30% duty cycle also assumes the drive is used in the type of environment it was designed for. If only that were true in practice, as I'll explain later.
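The duty-cycle ratings above can be roughed out as active hours per day for each drive class (illustrative figures taken from this post's descriptions, not from any vendor's spec sheet):

```python
# Rough translation of the duty-cycle ratings into active hours per
# day for the three drive classes. Figures are illustrative, based on
# the ratings described in this post, not on a vendor spec sheet.
duty_cycles = {
    "enterprise (mission critical)": 1.00,    # rated 24x7 at full workload
    "nearline (business critical)":  0.30,    # ~30% workload, powered on 24x7
    "desktop":                       8 / 24,  # built for ~8 powered-on hours/day
}

for drive_class, duty in duty_cycles.items():
    print(f"{drive_class:30s} ~{duty * 24:4.1f} active hours/day")
```

Note that nearline and desktop drives end up with similar active hours per day; the difference is that a nearline drive is rated to stay powered on around the clock at that workload, while a desktop drive assumes the machine is off for the other sixteen hours.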
The last type of HDD is the desktop drive, used in PCs. This is what people normally think of when they talk about SATA drives: the cheap drive you find at the electronics store, the one that makes you wonder why the drives you buy from your storage vendor cost so much more. The cost difference is not as large as you think, but most storage vendors do rake customers over the coals for enterprise and even nearline drives, all for the sake of 'extended testing.' Suffice it to say, it's basically a rip-off, and it explains a lot about some of the new data centers that chose to use the cheapest drives possible, deploying them in mass numbers with n-way RAID-1 while dealing with the massive fallout of failures from overused drives in bad environmental situations. Getting back to desktop drives: these are meant to be used no more than eight hours per day, period. The metrics come from long-standing design discipline within drive manufacturers; this is how they cut costs across the three basic drive types.
To recap: enterprise drives are built for performance and high utilization, nearline drives are meant for backup and archive, and the desktop drive is meant for the PC or an external backup drive at home or in a small office.
The actual environment that drives are placed in has become noteworthy recently. I was interviewed by Bloomberg, after which an article about vibration in the data center was published in BusinessWeek late last year. The reporter asked me many questions about loss of performance due to microvibration within the data center racks in which drives are housed. While this is real, and 'bad' packaging can cost a drive up to 90% of its performance, the key point goes well beyond performance. It's about the reliability of the drives themselves and the potential for early failures, as well as false failures, or 'NTFs' (no trouble found).
Half of the reasons hard drives actually fail come down to heat and vibration. An HDD is an amazing device: if treated as specified, with low external vibration and heat, it will last a very long time, most likely suffering slow degradation over time rather than total failure. I've been in the disc drive and storage engineering world for 32 years, and these facts have been buried for far too long.