Is fragmentation a big issue affecting storage area network (SAN) performance? Some say no,
while others say yes, but what’s interesting is that both sides argue their point
emphatically. So what’s the deal with SAN fragmentation?
The SAN guys explain that RAID, SAN algorithms, array
controllers and so on either minimize or eliminate fragmentation as a factor inside the
SAN. Some go so far as to say that defragmentation can even curtail overall performance
by interfering with operations.
“The benefits of defragmentation can clearly be seen in DAS
environments,” said Scott DesBles, director of technical solutions at Compellent
Technologies. “But in a data center with virtualized storage such as the Compellent SAN,
defragmentation provides little benefit and can actually hinder the SAN from managing
data as efficiently as it should.”
Such sentiments largely prevail throughout the SAN OEM community. Yet some analysts
and users take a different view.
“Fragmentation is one of those gnarly subjects open for debate that exists from DAS
all the way to SAN-attached storage,” said Greg Schulz, senior analyst and founder of
Storage IO Group. “I’m of the opinion that it is an issue.”
“File defragging is certainly valuable for SAN data, but its value is going to be a
variable that depends on the type of data being accessed,” said Mike Karp, an analyst at
Infrastructure Analytics Inc. “Frequently written data will get much more value from a
good periodic defrag than will static data.”
Happy Defrag Customer
Ken Bucci, a technical support specialist at Synectics Group of Allentown, Pa., is
using defragmentation software from Diskeeper Corp. in conjunction with RAID 5, RAID 0, a
2 TB HP MSA1000 SAN and two Dell EqualLogic SANs (one 4 TB and one 3.5 TB). He reports
improved SAN performance on all his arrays as a result of regular defrag.
“We always hear that defragmenting a SAN is not needed,” said Bucci. “But if there are
fragments, defrag is required. When we used the SANs for data storage for a file server,
we had people complaining all the time about performance until we used Diskeeper.”
So who is right and who is wrong? There seems to be confusion between physical disk
storage in the SAN and logical storage as viewed by the operating system, particularly
Windows. Let’s take a look at both issues.
OEMs like Compellent and HP are firm advocates of the “Don’t defrag my SAN” camp.
DesBles said the importance of fragmentation really depends on the SAN. He highlighted
Compellent’s Dynamic Block Architecture, which tracks where and how each block of data is
stored, managed and accessed in the array. This, he said, is why there is no need to
defrag: the SAN is already managing blocks more efficiently than the operating system
can. The SAN has a view of the entire data center, takes into account the data access
patterns for all the servers connected to the SAN and manages the blocks accordingly.
Further, Compellent offers a Free Space Recovery utility that reclaims free space for
other applications, thereby doing away with the need for defragmentation.
HP is another proponent of the “don’t defrag” mindset. According to Rodger Daniels,
senior EVA architect at HP, EVA virtualizes the data across the disk groups. This allows
the EVA to sprinkle the data across all the disks in the group. When data is written or
read, the EVA utilizes all the disks in the group. This improves performance by using the
speed of many actuators to access the data.
“For the EVA, data fragmentation is not an issue due to our virtualization
technology,” said Daniels. “But if a customer defrags a disk, it really doesn’t impact
the EVA in a negative way.”
He said defrag programs concentrate data at the lower logical block addresses (LBAs) in a
LUN or vdisk. But that data remains evenly distributed across the storage pool,
guaranteeing maximum performance from the disks that make up the pool. HP contends that
EVA doesn’t suffer from fragmentation due to the way it stripes its data and works with 8
MB allocation chunks that are striped across several disks. EVA is constantly executing a
process called leveling to ensure that all the disks in the system contribute their fair
share to the allocated pool and also to the overall performance of the array.
EMC makes a similar case. Symmetrix minimizes fragmentation through the way it lays out
data: wide striping spreads I/O across many disks, delivering high performance over many
spindles. In addition, metadata is held in cache, so it doesn’t get fragmented. With NAS
devices like Celerra, fragmentation is largely avoided by the way the layout of writes is
assigned.
Unlike the others, however, Bob Wambach, senior director of marketing for EMC’s
storage division, acknowledges the role of fragmentation on Windows servers associated
with the SAN; i.e., regardless of the brilliance of the SAN’s physical layout technology,
if Windows sees the data logically in 30 pieces, that might affect SAN performance.
“The SAN can’t do anything about the fact that Windows sees the file in 30 bits,” said
Wambach. “That’s really something that is happening outside of the storage realm.”
Wambach has nailed the heart of the confusion. There is a SAN running and doing what
it does. But if it is connected to Windows servers, those servers logically read and
write in a heavily fragmented way, and that is going to cause performance problems. The
SAN’s high performance and other architectural elements may mitigate this slowdown, but
it suffers the consequences nonetheless.
Mark Harrer, a sales engineer at Diskeeper, explained it as follows:
You have a SAN controller managing multiple disks. But regardless of the
sophistication of the hardware installed, the SAN appears to Windows as one logical
drive. The data may look pretty on the arrays, but to the OS, it is still fragmented. So
you have to separate this out in terms of the different layers involved.
Windows has fragmentation built into its very fabric. If you don’t believe it, do a
fresh install of Windows on a brand-new machine and nothing else, then run an analysis
with a defrag utility: the system is heavily fragmented from Day One. Similarly, open a
defrag utility on any running server or PC and see how many fragments currently exist
and which file has the most. If you haven’t been running defrag, you will find files in
thousands of pieces. So when Windows does a read, it has to logically locate all those
pieces, and that takes thousands of separate I/O operations to reassemble the file
before it is fed to the user. That exerts a heavy toll on performance, though the toll
may be masked to some degree by the capabilities of the SAN.
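Harrer’s point about logical reassembly can be illustrated with a small simulation (not from the article; the file name, sizes and access order are arbitrary): fetching the same bytes as thousands of scattered chunks costs many more read calls than one sequential pass, even before real disk seeks enter the picture. OS caching softens the gap here; on a cold cache or a genuinely fragmented volume it is far larger.

```python
import os
import time

PATH = "demo.bin"          # hypothetical scratch file
CHUNK = 4096               # treat each 4 KB cluster as one "fragment"
SIZE = CHUNK * 2000        # ~8 MB, i.e. 2,000 fragments

# Create a test file of random bytes.
with open(PATH, "wb") as f:
    f.write(os.urandom(SIZE))

# Sequential: one contiguous read of the whole file.
t0 = time.perf_counter()
with open(PATH, "rb") as f:
    data = f.read()
seq = time.perf_counter() - t0

# "Fragmented": seek to each chunk in a scattered order before reading it,
# mimicking a file whose pieces are not laid out contiguously.
order = list(range(0, SIZE, CHUNK))
order = order[1::2] + order[0::2]   # interleave offsets to defeat readahead
t0 = time.perf_counter()
pieces = {}
with open(PATH, "rb") as f:
    for off in order:
        f.seek(off)
        pieces[off] = f.read(CHUNK)
frag = time.perf_counter() - t0

# Same bytes either way -- the fragmented path just needed ~2,000 reads.
reassembled = b"".join(pieces[off] for off in sorted(pieces))
assert reassembled == data
print(f"sequential: {seq:.4f}s  scattered ({len(order)} reads): {frag:.4f}s")

os.remove(PATH)
```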
“The SAN guys are concerned that defrag is going to attempt to tell the controller
which disk to write files to, but neither Windows nor defrag tools get involved in that
at all,” said Harrer. “The controller card and device driver deal with that at a
different layer. The defrag utility is operating only at the logical software layer. It
is not a physical disk write issue.”
He contends that the hidden influence of fragmentation could well cause storage
administrators to buy more I/O bandwidth and install more hardware than their
applications really need. But if fragmentation is a software problem, that spending
targets the wrong layer.
“RAID, for example, is agnostic when it comes to fragmentation; the fragmentation
occurs at a layer or two above where the RAID is performed,” said Schulz. “However, RAID
could provide a performance boost to offset fragmentation; likewise, RAID might have to
do more work in terms of IOPS to offset fragmentation.”
His recommendation is not to take either side’s word for it, but to time some
operations, such as opening files on Windows for data stored on a SAN. Then defrag and
run the same timings again to see whether performance improves.
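Schulz’s do-it-yourself test could be sketched roughly as follows. The paths are placeholders, and this is only an informal before/after comparison, not a rigorous benchmark: run it, defrag the volume, then run it again and compare (several passes with the median reduce caching noise).

```python
import glob
import time

def time_reads(pattern):
    """Return total seconds spent opening and fully reading each matching file."""
    total = 0.0
    for path in glob.glob(pattern):
        t0 = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(1 << 20):   # read in 1 MB chunks until EOF
                pass
        total += time.perf_counter() - t0
    return total

if __name__ == "__main__":
    # Placeholder: a directory on the SAN-backed Windows volume under test.
    elapsed = time_reads("S:/data/*.dat")
    print(f"total open+read time: {elapsed:.3f}s")
```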
Article courtesy of Enterprise Storage Forum