Wednesday, January 21, 2015

Light Bulbs and Hard Disk Drives

Fluorescent Light Bulbs and Hard Disk Drives

Filling a Data Center

From 2011 to 2014 my job was to fill data centers with hardware. Tons of hardware, thousands of servers, hard drives by the truckload. There is a lot of concrete and steel in the cloud.

Ars Technica posted an article, "Hard disk reliability examined once more...", that really hit home.

"Across a range of models from 2 to 4 terabytes, the HGST models showed low failure rates; at worse, 2.3 percent failing a year. This includes some of the oldest disks among Backblaze's collection; 2TB Desktop 7K2000 models are on average 3.9 years old, but still have a failure rate of just 1.1 percent.

At the opposite end of the spectrum are Seagate disks. Last year, the two 1.5TB Seagate models used by Backblaze had failure rates of 25.4 percent (for the Barracuda 7200.11) and 9.9 percent (for the Barracuda LP). Those units fared a little better this time around, with failure rates of 23.8 and 9.6 percent, even though they were the oldest disks in the test (average ages of 4.7 and 4.9 years, respectively).

However, their poor performance was eclipsed by the 3TB Barracuda 7200.14 units, which had a whopping 43.1 percent failure rate, in spite of an average age of just 2.2 years."
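Failure rates like these are annualized: failures divided by the drive-years of service the fleet has logged. The sketch below assumes that common definition (drive-days divided by 365). The function name and the example fleet are hypothetical, not figures from Backblaze or Ars Technica.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Return failures per drive-year as a percentage.

    Assumes the common definition: drive_days / 365 = drive-years of service.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return 100.0 * failures / drive_years


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 drives running for a full year, 23 failures.
    print(f"{annualized_failure_rate(23, 1_000 * 365):.1f}%")  # 2.3%
```

Normalizing by drive-years is what lets a fleet of 4.9-year-old drives and a fleet of 2.2-year-old drives be compared on the same scale.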

Fluorescent Light Bulbs

The death of the 100-watt light bulb in 2011 hit my wife really hard. Thanks to government regulations, 100-watt bulbs are no longer available.

She has never been a fan of fluorescent, in any color spectrum or wattage. Sixty-watt bulbs don't throw enough light; the master bathroom fixtures hold four bulbs each, and it's still not enough. And in my experience, fluorescent bulbs don't live up to their longevity claims.

From an eHow article: "They [fluorescent bulbs] sport a much longer average lifespan, anywhere from 8 to 15 times the normal life of an incandescent bulb, which is usually estimated to last about 1,000 hours."

I'm calling BS on 8000-15000 hours. 

Twenty-four hours a day, 365 days a year is 8,760 hours, so I should be able to burn one of these all day, every day, for roughly one to two years. I've never seen that happen. I can't count the number that died within days of installation. The manufacturers posit that "if you just leave them on...". Well, I'm the customer, and I'm going to turn my lights on and off as often as I want. Why is that considered mistreatment?
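The arithmetic behind that claim, as a quick sketch: divide the rated life by the hours in a year of continuous burn. The 8,000 and 15,000 hour figures come from the eHow quote; the function name is just for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of burning all day, every day


def years_of_continuous_burn(rated_hours: float) -> float:
    """Convert a rated lifespan in hours to years of 24x7 operation."""
    return rated_hours / HOURS_PER_YEAR


for rated in (8_000, 15_000):
    print(f"{rated:,} hours -> {years_of_continuous_burn(rated):.2f} years")
# 8,000 hours -> 0.91 years
# 15,000 hours -> 1.71 years
```

Even the optimistic end of the claim is under two years of continuous use. A bulb that dies within days of installation has delivered a tiny fraction of that.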

Fluorescent spotlights are worse. They have unusual designs that don't fit all fixtures, some have slow, flickering starts, and most can't be used with a dimmer switch.

Of course, you can pay more (much more) to get better color, better fit and the ability to dim.

Fluorescent Light Bulbs and Hard Drives - The Connection

Three common problems:

Having the right bulb or drive at the right time. How many times have you needed to replace a three-way bulb, a Reveal bulb, or an indoor (versus outdoor) floodlight? Heaven forbid you try to use a floodlight in the living room when you really need a spotlight. Complex lighting requirements demand a whole assortment of bulbs.

Ditto for hard drives. Engineers and TPMs still believe their services require very specific hardware, and they dictate the design of every major component in the stack. Hard drive speed, capacity, and power requirements are all scientifically calculated to outperform that other online product (everyone is just trying to survive).

The Thailand floods (October 2011) knocked major hard drive manufacturers offline (Forbes article). Ever had to triage (ration) incoming supply during a disaster? Competing business lines, with competing VPs and P/L targets, are hard to work with even in a perfect situation. A single small bad batch of drives (we had some where the internal paint was flaking off into the spindles) triggered apocalyptic escalation. When you are consuming drives as fast as they can be manufactured, any hiccup can be devastating.

What is the lifespan of a family of hard drives? 
CPU lifespan is measured in months.

Maintenance costs are high, and inventory can be a hidden load on the budget. 

Imagine a data center the size of a Safeway or HEB store, filled with rows of racks holding thousands of servers. Now picture a 3-5% hard drive failure rate (never mind whether the devices are being mistreated; they are all being mistreated equally).

Sure, servers are designed for resiliency and redundancy, and hard drives are totally cheap. But how many spares, and of what types, do you have to keep on hand to service the failures? I have a box in the garage full of assorted bulbs, and I never have the right one. Also, I can't reallocate spare light bulb money to any other purpose.
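A back-of-the-envelope way to size that spares shelf, as a minimal sketch: expected failures per year is the drive count times the annual failure rate, and the shelf has to cover the failures expected while replacement stock is on order. The 10,000-drive fleet, the 30-day resupply lead time, and the function names are hypothetical. This covers the average only, with no safety stock for bad weeks (or bad batches).

```python
import math


def expected_annual_failures(drives: int, afr: float) -> float:
    """Average drive failures per year; afr is a fraction (0.05 = 5%)."""
    return drives * afr


def spares_for_lead_time(drives: int, afr: float, lead_days: int) -> int:
    """Spares needed to cover average failures during a resupply lead time."""
    failures_per_day = expected_annual_failures(drives, afr) / 365
    return math.ceil(failures_per_day * lead_days)


if __name__ == "__main__":
    # Hypothetical: 10,000 drives at the 3% and 5% ends of the range above.
    for afr in (0.03, 0.05):
        per_year = expected_annual_failures(10_000, afr)
        spares = spares_for_lead_time(10_000, afr, lead_days=30)
        print(f"AFR {afr:.0%}: ~{per_year:.0f} swaps/year, {spares} spares on the shelf")
```

With one drive model, that's one number to track. With a dozen "very specific" models, it's a dozen shelves, each sized the same way.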

What is the accounting term for parts sitting on a shelf? TPMs call it waste, until their stuff fails; then it's the best insurance money can buy.

The only way to "turn" the parts inventory is for something to fail. The server sends an alert, a service ticket is generated, and an employee wearing comfortable shoes walks out to swap the drive. The "cheap" hard drive is really pretty expensive.

Disposal used to be easy. 

Unscrew the bulb and toss it in the trash. Wipe the drive and destroy it. But now fluorescent bulbs must be recycled because of their mercury content. We improved a product by making it more toxic, more expensive, and more difficult to dispose of. Heck, it's practically against the law to toss one out.

Purging the data from 3TB drives is also pretty tough. Not one drive at a time, but hundreds at a time. Let's say you opened a data center in 2010, and everything in it is now end of life. How long does it take to churn through decommissioning thousands of devices?
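A rough way to put a number on that, as a minimal sketch: one overwrite pass takes capacity divided by sustained write throughput, and a wipe station with a fixed number of bays works through the fleet in batches. The 150 MB/s throughput, the 48-bay station, the 2,000-drive fleet, and the function names are all hypothetical assumptions; real throughput varies by drive model.

```python
import math

TB = 10**12  # drive makers quote decimal terabytes
MB = 10**6


def hours_per_pass(capacity_bytes: float, write_bytes_per_s: float) -> float:
    """Hours to overwrite a whole drive once at a sustained write speed."""
    return capacity_bytes / write_bytes_per_s / 3600


def decommission_hours(num_drives: int, bays: int, capacity_bytes: float,
                       write_bytes_per_s: float, passes: int = 1) -> float:
    """Wall-clock hours to wipe a fleet, one batch of `bays` drives at a time."""
    batches = math.ceil(num_drives / bays)
    return batches * passes * hours_per_pass(capacity_bytes, write_bytes_per_s)


if __name__ == "__main__":
    per_drive = hours_per_pass(3 * TB, 150 * MB)
    fleet = decommission_hours(2_000, bays=48, capacity_bytes=3 * TB,
                               write_bytes_per_s=150 * MB)
    print(f"one 3TB pass: {per_drive:.1f} h; 2,000 drives: {fleet / 24:.1f} days")
```

Under those assumptions, that's nearly ten days of round-the-clock wiping for a single pass, before counting the time to load and unload drives or verify the results.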

Drive wipe software was behind the curve in 2012 and 2013. Vendors could not support 3TB drives, the same drives that seemed so cool to the business when they were introduced. Rising standards for data destruction and corporate standards for recycling will continue to push the need for industrial-strength data removal.
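The post doesn't say why the wipe tools fell behind. One well-known boundary of that era, offered here only as a plausible factor and not as the vendors' actual problem, is 32-bit sector addressing: with 512-byte sectors, a 32-bit sector number tops out around 2.2 TB, a limit 3TB drives exceed. The arithmetic:

```python
SECTOR_BYTES = 512          # traditional logical sector size
MAX_SECTORS_32BIT = 2**32   # number of sectors addressable with a 32-bit sector number

limit_bytes = MAX_SECTORS_32BIT * SECTOR_BYTES
drive_bytes = 3 * 10**12    # a "3TB" drive, in decimal terabytes

print(f"32-bit limit: {limit_bytes:,} bytes (~{limit_bytes / 10**12:.1f} TB)")
print(f"beyond the limit on a 3TB drive: {drive_bytes - limit_bytes:,} bytes")
# 32-bit limit: 2,199,023,255,552 bytes (~2.2 TB)
# beyond the limit on a 3TB drive: 800,976,744,448 bytes
```

A tool built around that limit could fail outright or miss roughly the last 0.8 TB of every 3TB drive, which no data destruction standard would accept.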

The Bottom Line

Obvious problems, none with a perfect solution: 

Radically standardize the hardware platform. Stick with older technology that works (incandescent works!). Be prepared for support and disposal costs...

...and, every now and then, write an article that compares performance, so that other people can see how very small problems can become so very large.