thenightlife
asked on
Does defragging a RAID 5 array help performance?
Our office is hotly contesting whether to defrag a RAID 5 array on our Exchange Server 2003 machine. Some admins say to do it, while others say it is pointless. The disk analysis in Windows Server 2003 shows drive D: *entirely* in red, and states it should be defragmented. Can anyone clear this debate up?
No disrespect intended, but it's NOT answered in those links. I research this topic a couple of times a year and have yet to find a good answer that really stands up to analysis. In defense of those who support defragmentation on RAID, I have yet to see anyone who opposes it effectively explain why they see it as fruitless, so there's been little to discuss or refute. Most times this question is asked, a couple of people come back with unsupported opinions that it's good. Add one other person who tries to provide some technical foundation that generally falls short, and the topic ends up closing.
The concept of defragging a disk came about because it was well known that fragmented files caused a performance hit. This is due to the added time required for the hard disk heads to physically move across the platter(s) as they seek out each fragment of a file. With a standard disk controller the operating system has (mostly) direct hardware access and therefore has firsthand knowledge of the physical layout of the sectors on the disk. This is what you see represented when you call up a disk map such as the one you see in the Windows defrag tool.
When you add an array controller, the operating system loses this direct knowledge of the physical layout of the disk. The array controller becomes a point of abstraction between the operating system and the actual physical hardware. Here is where the confusion starts.
Let's take a simple RAID 5 array of three 40GB disks. Combined, the physical capacity is 120GB, but because RAID 5 reserves one disk's worth of space for parity, only 80GB is usable. The array does some sleight of hand and presents the 80GB of usable space to the operating system as a single 80GB physical device. It is essentially emulating a single 80GB physical disk even though there are actually three 40GB physical disks providing the foundation. At no time is the operating system ever aware of the true physical nature of the media.
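The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration; the function name `raid5_usable_gb` is made up for this example and is not part of any controller's tooling.

```python
def raid5_usable_gb(disk_count: int, disk_size_gb: int) -> int:
    """Usable capacity of a RAID 5 set: one disk's worth of space holds parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disk_count - 1) * disk_size_gb


raw_gb = 3 * 40                      # three 40GB member disks
usable_gb = raid5_usable_gb(3, 40)   # what the controller presents to the OS
print(raw_gb, usable_gb)             # 120 80
```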
The operating system has probably been written to deal with hard disks by identifying tracks, sectors and/or blocks. This is why the array controller presents itself the way it does. By emulating a single disk it doesn't require any changes to already existing filesystems (e.g. NTFS, FAT, EXT2). The array hardware presents an emulated physical disk layout which can be used by virtually any existing operating system using already existing storage routines.
When using an array, what you see in the Windows defrag layout ISN'T REAL. It's an abstraction that the array controller presents to the OS because it knows that this is what the OS needs. It translates all the writes to this "logical" or "virtual" layout into actual physical writes to the true physical disks on the fly. While the abstracted disk layout may appear to be fragmented, that's only because the array controller is presenting the map back to the OS the same way the OS requested it be written. There is no direct correlation between where the OS thinks the data is and where it really is on the physical disks. Most importantly, there is no longer a one-to-one correlation between the sectors or blocks that the operating system "sees" and the actual physical sectors on the disks. In fact, the block size on the actual physical disks may be wildly different from the block size that is being presented to the OS.
Fact is, if you defrag an array the actual physical layout on the array will change, but that is because of how defrag works. It's a series of file copies and subsequent file deletions. Since the OS is requesting a write as part of the copy process, the array controller is obliged to write new data at a new physical location (because the current physical location is already occupied). The subsequent delete causes the array to then free up the physical space on the array that corresponds to the locations on the abstracted map. The data will definitely have moved, but there is no evidence that the new physical layout is more efficient because, once again, there is no direct correlation between where the OS thinks the data is and where it actually is on the physical disks.
I'm sure it has become apparent that I have my doubts about defragging an array. That's because I cannot see how it is possible for the operating system to make decisions regarding the best way to physically lay out files on a disk (which is what defrag does) when it has no true knowledge of the physical nature of the disk.
The only potential benefit I can see would be in command queuing. If the OS believes that a file is fragmented it will send more commands (read 4 sectors beginning at sector 1000, then read 10 sectors beginning at sector 2000, then read 20 sectors beginning at sector 3000) than if it believes the file is in a single chunk (read 34 sectors beginning at sector 1500; yes, this is a very small file). This benefit is theoretical and likely intangible, as the additional delay to send three commands, or even hundreds, into the queue as opposed to a single command is irrelevant when compared to the delay associated with a true physical head seek.
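To illustrate the translation step, here is a minimal sketch of how a controller might turn a logical sector number into a member disk and an offset on that disk. Everything in it is an assumption made for illustration: the 128-sector stripe unit, the rotating-parity order, and the names `PhysicalLocation` and `map_logical_sector` are hypothetical, and real controllers use their own layouts.

```python
from typing import NamedTuple


class PhysicalLocation(NamedTuple):
    disk: int    # index of the member disk that holds the data
    sector: int  # sector offset on that member disk


def map_logical_sector(lba: int, disks: int = 3, stripe_unit: int = 128) -> PhysicalLocation:
    """Map a logical sector to (disk, sector) under an assumed rotating-parity layout."""
    data_disks = disks - 1                  # one unit per stripe holds parity
    unit = lba // stripe_unit               # which stripe unit the sector falls in
    offset = lba % stripe_unit              # position inside that stripe unit
    stripe = unit // data_disks             # row spanning all member disks
    index_in_stripe = unit % data_disks     # position among the row's data units
    parity_disk = (disks - 1) - (stripe % disks)  # parity moves to another disk each row
    # Data units fill the non-parity disks in ascending order.
    disk = index_in_stripe if index_in_stripe < parity_disk else index_in_stripe + 1
    return PhysicalLocation(disk, stripe * stripe_unit + offset)


# Two logically adjacent sectors, as the OS sees them:
print(map_logical_sector(127))  # PhysicalLocation(disk=0, sector=127)
print(map_logical_sector(128))  # PhysicalLocation(disk=1, sector=0)
```

Under these assumed parameters, sectors 127 and 128 sit side by side on the OS's map yet land on different member disks, which is the kind of mismatch between the logical map and the physical media described above.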
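The command-count difference can be sketched as follows, using the sector numbers from the example above. The `Extent` type and the `read_commands` helper are hypothetical names for this illustration; this is not how any particular OS or driver builds its I/O requests.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (start_sector, sector_count), in file order


def read_commands(extents: List[Extent]) -> List[Extent]:
    """Merge logically adjacent extents; each remaining extent needs its own read."""
    merged: List[Extent] = []
    for start, count in extents:
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_count = merged[-1]
            merged[-1] = (prev_start, prev_count + count)  # extend the previous read
        else:
            merged.append((start, count))                  # a new read command
    return merged


fragmented = [(1000, 4), (2000, 10), (3000, 20)]
contiguous = [(1500, 34)]

print(len(read_commands(fragmented)))  # 3
print(len(read_commands(contiguous)))  # 1
```

Both layouts move the same 34 sectors; only the number of commands placed in the queue differs.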
If anyone has any information that can truly counter (or factually support) these arguments PLEASE list it here, I beg you. If it truly stands up I will sing out loud praises to the author who can save me ever having to research this again.
ASKER
Very well written. I too am confused by this subject, and the debate within our office continues. We suffer from very poor Exchange performance on a very high-horsepower server. My question now is: knowing that most systems running Windows Server 2003 will be using an array of some sort, if the defrag tool is ineffective, why include it in the OS? Diskeeper Corp sent me some documentation arguing that you should defrag a RAID 5 volume, but I would assume this has marketing considerations for them.
So red is not really red in the server's built-in defrag tool, as you state in not so many words above. How would you optimize the drives? Is there another tool or procedure you would recommend?
https://www.experts-exchange.com/questions/21576901/OS-Defrag-of-Raid5.html
http://forums.storagereview.net/index.php?showtopic=19099
IMO, make sure you have a known good backup, because I did see a thread or two where the RAID got corrupted.
Also, it would probably be wise to do an eseutil offline defrag of the Exchange databases to reclaim disk space.