Solved

# File Allocation Unit Size for SQL

Posted on 2010-01-04
1,342 Views
Which of the foll 2 scenarios is better for SQL Server data disk? both have downsides. are they equal or one better than the other & why?

Stripe Unit Size of SAN Raid array = 32K
File Allocation Unit Size = 32K
---------------------------
Stripe Unit Size of SAN Raid array = 32K
File Allocation Unit Size = 64K

thanks
0
Question by:anushahanna

LVL 46

Accepted Solution

The answer depends on type of RAID and how many disks are in the array.  The OPTIMAL setting is to have the stripe size of all usable disks = 64KB.

That is, if RAID1, then stripe size of raid is 64KB.
If RAID5 with 3 disks, then stripe size is  64KB / (3 - 1)
RAID5 with 4 disks is (4 - 1) / 64KB

RAID10 (4 disks) stripe size = 32KB
0

LVL 35

Assisted Solution

Hi,

I agree - for data drives 64k stripe seems to be best in most cases.

Don't forget, that when formatting the resulting drive from Windows, that you can specify a file allocation unit as well. I suggest that for SQL DATA disks, that setting to 64k instead of letting Windows do its default will give better throughput

format diskletter: /A:64k

HTH
David
0

LVL 46

Assisted Solution

(TO clarify, stripe size in my equations were for physical disk drives. You simply want 64KB worth of data to be supplied to the O/S by the RAID controller in a single I/O for the disks in the RAID).  Any other number is wrong.  Furthermore, don't build a 3+1 or 5+1, you will get better performance in a 4+1 (if all disks are 16KB stripe size), because it only takes one cycle to read or write 64KB.
0

LVL 6

Author Comment

we have the scenario with all 3 situations raid 1, raid 5, and 10. let's talk about Raid 10 with stripe size = 32KB;
1)let's say we match the File Allocation Unit Size with the stripe size, but this will always involve 2 IOs for SQL.
2)let's say we double the File Allocation Unit Size for the stripe size (to make one SQL IO for each allocation), then it will be 2 IOs to get and to give to the RAID.

In both situations, there are 2 IOs involved, but differently. Which one is more costly?

for RAID 5 , did you mean 32K Stripe for 3 disks and 64/3 is not a whole number for 4 disks?

thanks
0

LVL 46

Assisted Solution

With stripe size of 32KB on RAID10, each disk only does a single IO.  Furthermore, more sophisticated RAID engines will balance read requests so the 2 disks that make up the RAID0 part of the RAID10 that are least busy will do the read, and the other 2 disks won't do anything, so you only have 1/2 of an I/O per disk.

If you double the file allocation unit size on NTFS to 128KB, then you will make windows write 128KB at a time and force the disks to do twice the work.   Reads won't have such a penalty because on all RAID levels, no reason to have all the disks read the parity information.  (BUt some RAID controllers will read all disks and check RAID parity

On RAID5, with a 4-drive RAID5, then it is not possible to get 64KB, you get 48KB of data to/from the RAID on any single I/O request ... so don't do it.  You want a 5-drive RAID5 with stripe size of 16KB per disk, or a 3-drive RAID5 with 32KB per disk, or 9-drive rAID5 with 8KB per disk.
0

LVL 55

Assisted Solution

Microsoft recommend 64k stripe unit size and 64k file allocation unit size in most instances for SQL. www.microsoft.com/technet/prodtechnol/sql/2005/physdbstor.mspx
http://msdn.microsoft.com/en-us/library/dd758814.aspx
(see Essential Correlations: Partition Offset, File Allocation Unit Size, and Stripe Unit Size)

Whilst setting a full stripe width to 64k may give faster transfer times for single I/Os and sequential I/Os it means that every logical I/O from SQL requires a (smaller) I/O from every disk which means it's not efficient for general SQL databases which will throw several different I/O profiles at the storage at the same time giving effectively a random rather than sequential I/O profile.
0

LVL 6

Author Comment

dlethe
so, when file allocation unit size is 64KB on RAID10(With stripe size of 32KB), it does 1/2 to 1 IO; how about file allocation unit size of 32KB on RAID10 with stripe size of 32KB?
0

LVL 6

Author Comment

andyalder
are you meaning stripe size of 64K is good for sequential I/Os and not for randon I/Os? If so, what will be the best balance for stripe size for both sequential & random IOs?
0

LVL 46

Assisted Solution

SQL server does 64KB I/Os natively.  So decreasing file allocation unit size to 32KB means NTFS will send 2 sequential reads or write requests to the device driver instead of one.  net result is while a large number of I/Os will be aggregated by the SCSI stack, so you will end up with I/O requests for 64KB from NTFS where either the first 32KB or the last 32KB will go into the bit bucket before SQL server ever gets them.

The most efficient thing to do is supply exactly 64KB worth of data, no more no less to SQL server in only one I/O request per disk drive.
0

LVL 6

Author Comment

dlethe
thanks for clarifying that. I have seen SQL implemented on disks with file allocation unit size to 8KB, in the past!! (that would be 8 IOs)

can you point me to some reading material on the para you wrote :
"
With stripe size of 32KB on RAID10, each disk only does a single IO.  Furthermore, more sophisticated RAID engines will balance read requests so the 2 disks that make up the RAID0 part of the RAID10 that are least busy will do the read, and the other 2 disks won't do anything, so you only have 1/2 of an I/O per disk."
I do not understand fully yet how 32K stripe still does it within 1 or less IO. Is this true for both reads and writes?

thanks
0

LVL 55

Assisted Solution

No, I don't mean a stripe size of 64K is good for sequential access; there are two things here, stripe unit size and stripe width (which is  stripe units * no of disks). If you set the stripe width to 64K (meaning a 32k or 16k or even 8k stripe unit) as initially suggested in a id 26176912 above it is good for single access but detrimental to normal operation since all disks are involved in any single I/O. Not normally a good thing.

For sequential access you can benefit from a high stripe unit size, e.g. 256K since the controller will fetch 256k block when a 64k block is requested, so the next 3 blocks are in cache. This isn't good for random access though as you are wasting valuable cache.

Generally you want 64K stripe unit size and 64k cluster size. It's only on systems that have been tweaked for special jobs where other valuse like 8K stripe unit is used.
0

LVL 46

Assisted Solution

Andy is right, but one other factor, depending on the queue depth and contents of a RAID controller, configurable settings and firmware, RAID controllers will coalesce sequential IO requests into a single I/O, so it is entirely possible and likely that 4 requests for 64KB will translate into a single 256KB request.

HOWEVER ... if you set it to 256KB, then ALL write requests will write 256KB, no matter what, and unless the RAID supports short reads, then you will also get 256KB reads and 3/4ths of what you need goes into the bit bucket.

That is why it is best to just set it to 64KB.
0

LVL 4

Assisted Solution

Here is my opinion,

All depends on cache unit/page/block size on the back end array.

My recommended standards :
Oracle 16k Sequential Mostly
SQL 8K Sequential/Random --> If sequential go for bigger block size, if random go with 8k .
Exchange 4K Random.

The above is recommended to utilize the cache blocks for best performance.
Then come your disk size. Best recommended is 64k element size per disk
so in a Raid5(4+1) u will have 4X64 => 256K Stripe size.

0

LVL 35

Expert Comment

Hi,

For SQL DATA, it reads an extent - 64k.

For the log files I'd suggest the OS default for the allocation unit.

Cheers
David
0

LVL 6

Author Comment

Thanks for your inputs. The server I have in mind has 2 HBA ports, each with 8 disks for RAID 10 configuration. so how do we calculate the stripe width? Do I need to be concerned about it or only the stripe unit size.

0

LVL 4

Assisted Solution

HBA Ports does not count against the stripe width.

You need to consider about the element size(64k), which is portion of the data that is written on disk one before it write to disk 2.

If you have two raid10(4+4) raig groups then the stripe size would be (64kbX4disks) 256 kb as stripe size. the other 4 disks for a mirror stripe size of same 256kb

1)Raid group1 raid type- raid10(4+4) (64KB X 4disks usable data = 256 kb stripe size. other 4 disks x 64kb is for mirror stripe.)
2)Raid group2 raid type- raid10(4+4) (64KB X 4disks usable data = 256 kb stripe size. other 4 disks x 64kb is for mirror stripe.)
0

LVL 46

Assisted Solution

Since you have 2HBA ports, then the correct (from both availability AND performance perspective) is to break up the RAID10 so that each port has one of the RAID0 stripes.   That way not only do you get load balancing, but if you lose either cable, HBA, or switch port, then you stay online.
0

LVL 55

Expert Comment

You can't do that in the real world, both the controllers see all the disks and all the hosts. You can't split a RAID 10 array down the middle except in very specific circumstances.
0

LVL 46

Expert Comment

Just configure the switch and/or HBA so that it doesn't see the other path. Easy.
0

LVL 4

Expert Comment

anushahanna,
I am sorry, What are you looking for, i feel the question is deviating with diff answers given......
0

LVL 6

Author Comment

thribhu, do you mean stripe unit size as element size?
in my case, it will be 32K. so my stripe size/width will be 128K for each RAID group.

so back the original question, how superior is the config
"
Stripe Unit Size of SAN Raid array = 32K
Stripe Width of SAN Raid array = 128K
File Allocation Unit Size = 64K
" and better than when the FAUZ is 32K.?

is this an optimal setup or what can be improved?
0

LVL 6

Author Comment

andyalder, you said in some specific circumstances, you can split a RAID 10 array down. can you please explain what those circumstances will be?

dlethe, I like your idea of making the pair disk from the other port. can you pl explain the part where you said 'it doesn't see the other path'

thanks much!
0

LVL 46

Assisted Solution

There are numerous techniques to block the other path.  If you are using a switched topology, you can tell the switch.  Emulex & Qlogic HBAs have a java-based HBA configurator that lets you set up lun masking.  You just tell it the WWNs that you want it to recognize on each port, and then once they are masked out, windows simply will not see the duplicate ports.

0

## Write Comment

Please enter a first name

Please enter a last name

We will never share this with anyone.

## Featured Post

Solid State Drive Performance Tips: Solid state storage technology is now a standard.  After testing and using several different brands and revisions of SSD's over the years I have put together a collection of tips,tools and suggestions that I haâ€¦
In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
This video teaches viewers how to encrypt an external drive that requires a password to read and edit the drive. All tasks are done in Disk Utility. Plug in the external drive you wish to encrypt: Make sure all previous data on the drive has been â€¦
This tutorial will walk an individual through the process of installing the necessary services and then configuring a Windows Server 2012 system as an iSCSI target. To install the necessary roles, go to Server Manager, and select Add Roles and Featuâ€¦

#### 760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

#### Need Help in Real-Time?

Connect with top rated Experts

7 Experts available now in Live!