There are a few fundamental things that scare the living bejeezus out of system administrators; losing data has got to be somewhere near the top of the list.[1]
Potentially losing data in this manner becomes less scary and more of a morbid curiosity …
… especially when you’re aware of exactly what a hard drive head crash is capable of doing.
For those not aware, you’re looking at a storage appliance[2] being verbally assaulted. Each one of the drives consists of spinning platters and heads that “fly” over the platters on a cushion of air produced by the platters.[3] Any movement or vibration of the drive can disrupt the cushion of air, causing the heads to either lose their place on the disk or crash into the platters.
In this video, the pressure waves generated by the sound are enough to knock (multiple) drives out of action while they stabilise, find their servo tracking signals and start servicing load again.
This originally came about while engineers were attempting to reproduce an intermittent disk latency problem caused by a missing screw that should have held a disk fast in its support tray. As the problem wasn’t easily reproduced, they introduced some extra vibration to see how close the disks were to exhibiting the same stall-stabilise-resume cycle; it turns out, no more than a scream away 🙂
More info on Brendan Gregg’s Fishworks blog at Sun.
This raises a few interesting points:
- The automated diagnostics/monitoring package on the 7400 series appliances, Analytics, is *seriously* awesome if you can use it to (indirectly) detect a loose or missing drive screw by looking for per-disk increases in I/O latency.
- We’re really pushing the boundaries of mechanical stability in disks. I believe these are 7200RPM SATA disks; if a loud noise can freak ’em out, I’d hate to think what a similar event would do to 15K FC drives.
- SSDs are immune to these problems; once the write performance problems have been sorted or worked around (async flushing or similar – don’t forget a UPS!), they’re going to become *really* interesting for non-caching/acceleration use.
- The monitoring / diagnostics software is effectively a web interface to DTrace, a non-invasive tracing framework used to gather kernel and application information on production systems safely, without (significantly) impacting performance. DTrace is a fascinating technology that I really need to learn more about.[4]
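For the curious, the "look for per-disk increases in I/O latency" idea from the first point can be roughed out in a few lines of Python. This is only a sketch of the general technique, not how Analytics or DTrace actually do it: the function name `flag_slow_disks`, the 3x-the-median threshold and the sample numbers are all made up for illustration.

```python
from statistics import median


def flag_slow_disks(latency_ms, factor=3.0):
    """Return the disks whose mean latency is more than `factor` times
    the median of all the per-disk means (hypothetical threshold)."""
    # Average the latency samples for each disk, skipping disks with no data.
    means = {
        disk: sum(samples) / len(samples)
        for disk, samples in latency_ms.items()
        if samples
    }
    if not means:
        return []
    # The median is a robust baseline: one misbehaving disk barely moves it.
    baseline = median(means.values())
    return sorted(disk for disk, mean in means.items() if mean > factor * baseline)


# Invented I/O latency samples in milliseconds; disk2 plays the loose drive.
samples = {
    "disk0": [8.1, 7.9, 8.4],
    "disk1": [8.0, 8.3, 7.7],
    "disk2": [8.2, 41.0, 55.3],
    "disk3": [7.8, 8.1, 8.0],
}

print(flag_slow_disks(samples))  # ['disk2']
```

Comparing each disk against its neighbours rather than against a fixed number means a busy array doesn't flag every disk at once; only the odd one out shows up.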
… and now for the morbid curiosity. I have a couple of 12″ subwoofers and a large collection of dance, trance and drum’n’bass music; I wonder what they could do to a JBOD array … 🙂
My hat’s off to the Fishworks guys for building an awesome storage platform; I look forward to seeing more of your stuff (and hopefully playing with your kit) soon!
<disclaimer> No drives were (permanently) hurt in the production of this blog post, although my external drive started whining in sympathy. As it says in the video, don’t do this. Especially to my SAN disks. Ever. I have a cluebat. It will be used. Oh yes. It will be. </disclaimer>
** Update! **
1. Fire, water, power problems, trojan horses, crackers, a lack of suitable sources of caffeine, aliens, zombies and the end of the internets also feature prominently. The caffeine one, in particular, keeps me up at night. *twitch*
2. A server with a whole bunch of drives attached providing disk storage to remote machines
3. Photo and a primer here.
4. I’ve put it on The List