This time last year, some storage admins were facing the prospect of an SSD meltdown. Those unlucky enough to be using certain HPE SSDs were informed by the company that, thanks to a peculiarity in the firmware, the affected drives would cease to function once they reached 32,768 hours of operation. That’s three years, 270 days, and eight hours to you and me, after which the drives would experience failure and data loss. And when they said data loss, HPE meant total data loss.
“Neither the SSD nor the data can be recovered,” the company said at the time, adding rather chillingly that “SSDs which were put into service at the same time [in RAID arrays, for example] will likely fail nearly simultaneously.” Ouch.
The good news was that the problem was caused by a bug in the SSD firmware, and upgrading to a newer version that fixed the bug removed this Sword of Damocles hanging over the disks. Upgraded drives would go on working after the magic 32,768 hours of operation had passed.
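For anyone who wants to check the sums, here is a minimal Python sketch of the arithmetic behind that magic number. It also shows why 32,768 looks suspicious: it is 2 to the power of 15, one more than the largest value a signed 16-bit integer can hold. HPE's statement as quoted here does not say that this is the cause, so the 16-bit counter is an assumption made purely for illustration, and the function names `split_hours` and `wrap_int16` are invented for this example.

```python
# 32,768 is 2**15, one more than the largest signed 16-bit value (32,767).
HOURS_LIMIT = 32_768


def split_hours(total_hours, days_per_year=365):
    """Break a number of hours into (years, days, hours), ignoring leap years."""
    days, hours = divmod(total_hours, 24)
    years, days = divmod(days, days_per_year)
    return years, days, hours


def wrap_int16(value):
    """Interpret value as a signed 16-bit two's complement integer.

    Assumption for illustration only: the power-on-hours counter is
    stored in a signed 16-bit field. The source does not confirm this.
    """
    return (value + 2**15) % 2**16 - 2**15


print(split_hours(HOURS_LIMIT))  # (3, 270, 8)
print(wrap_int16(32_767))        # 32767, the last hour that still fits
print(wrap_int16(32_768))        # -32768, the counter wraps to a negative value
```

If the counter really is a signed 16-bit value, hour 32,768 is the first one it cannot represent, which would explain why every drive fails at exactly the same age.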
Why dig up this story now? Because it shows how critical firmware can be to solid-state storage devices, and how bad firmware can bring devices with solid-state storage to their knees.
That’s bad news for storage bods who don’t pay attention to messages from their storage vendors, but it’s also very bad news for network admins. That’s because plenty of networking kit has solid-state storage built in, so bad firmware has the potential to brick this storage and bring the networking device, and perhaps the whole network, crashing down. Which is very bad.
It turns out that this risk is not theoretical. Ask Aruba Networks, which just happens to be a subsidiary of, you guessed it, HPE.
The company’s 6300-series and 6400-series switches incorporate eMMC or SSD storage to hold config files, databases, scripts, and so forth. And as solid-state storage only works for a finite number of write cycles before it wears out, Aruba makes sure that its switches have enough solid-state storage, used in an appropriate manner, so that this storage will last longer than the switches’ lifecycles.
Except that in the case of the two switch product lines above, it won’t. It turns out that the firmware writes to this memory at an “unintended and accelerated pace.” That means it will burn through the storage’s write cycles in no time and wear it out long before the switch has come to the end of its useful life.
The good news for Aruba kit owners is that a new firmware version that fixes this crazy storage-writing behavior has been released, and network admins are urged to upgrade to version 10.04.3031 to avoid what the company ominously describes as “issues with memory.” That sounds like it means the switch crashing and burning.
After the upgrade, the storage will be written to slowly enough that its life will “meet or exceed the deployment lifetime of the switch,” the company says.
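To see why the write rate matters so much, here is a rough back-of-the-envelope sketch in Python. Every number in it is hypothetical: Aruba has not published the capacity, the endurance rating, or the write rates involved, and the function name `years_until_worn_out` is invented for this example. It simply divides the total data the flash can absorb by the amount written per day.

```python
def years_until_worn_out(capacity_gb, pe_cycles, gb_written_per_day,
                         write_amplification=1.0):
    """Estimate the years until flash uses up its rated program/erase cycles.

    A simplified model: total writable data is capacity times rated
    cycles, and wear is assumed to be spread evenly across the device.
    """
    if gb_written_per_day <= 0:
        raise ValueError("gb_written_per_day must be positive")
    total_writable_gb = capacity_gb * pe_cycles
    effective_daily_gb = gb_written_per_day * write_amplification
    return total_writable_gb / effective_daily_gb / 365


# Hypothetical figures, chosen only to illustrate the effect.
CAPACITY_GB = 32   # assumed size of the switch's eMMC
PE_CYCLES = 3000   # assumed rated program/erase cycles

intended = years_until_worn_out(CAPACITY_GB, PE_CYCLES, gb_written_per_day=10)
accelerated = years_until_worn_out(CAPACITY_GB, PE_CYCLES, gb_written_per_day=200)

print(f"intended write rate:    {intended:.1f} years")
print(f"accelerated write rate: {accelerated:.1f} years")
```

With these made-up inputs, writing 20 times more data per day cuts the estimated life by the same factor of 20, which is the difference between storage that outlives the switch and storage that gives up early.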
The moral of this sorry tale? A network is only as strong as its weakest part, and that may be something rather arcane, like how your networking gear’s firmware manages its solid-state memory. Who knew?