theBlaine

asked on

SQL 2008 Optimal RAID and Virtual Disks

I've spent the last two days trying to wrap my head around this, but there are so many blogs with people saying this and that, and I'm unable to make any progress. I have some very specific questions on stripe size, NTFS allocation unit size, and disk read/write policy.

I have 8 SAS 6Gbps 146GB 15k hard drives. My plan is RAID-1 for OS/system, RAID-1 for DB logs, RAID-1 for the database, and RAID-0 for tempdb and the swap file. Our primary database is about 40GB in size.

I'm planning a 64KB stripe size on everything with a 64KB NTFS allocation. Would there be any benefit in using a 1MB stripe?

There is also a read/write policy (read ahead, adaptive read ahead, no read ahead / write through, write back, force write back). Which should I be using for the fastest performance? Do I want to enable or disable disk cache?

I've been testing different options and running a disk speed test, and I'm getting varying results; however, I'm not certain that this disk speed test exercises the drives the same way SQL Server will.

I know these questions have been asked a million times, but I can't seem to find anything that is specific enough.

What other information can I provide to assist in finding an answer?
MrAli

Hello,

It looks like you are doing your research, which is good.  Let's see if we can fill in the gaps.

RAID 0 for TempDB/swap file:  This is fine if you need the throughput, but if tempdb goes down, your SQL Server will immediately go offline; no DB will be accessible.  The data loss won't matter since it's tempdb/swap, but if server availability is a concern, go RAID 1.  You should run perfmon counters to see whether you actually need the throughput of RAID 0.  Try to keep Avg. Disk sec/Read and sec/Write under 20ms.
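
Once SQL Server is installed, you can sanity-check the same latencies from inside SQL itself. Here's a minimal sketch, assuming sysadmin access, using sys.dm_io_virtual_file_stats (available in 2005+); the thresholds in the comments are just the rules of thumb from this thread:

-- Per-file average I/O latency since the last SQL Server restart (a sketch).
-- Rough targets from this thread: well under 20ms for data, lower for the log.
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    mf.type_desc,                                           -- ROWS or LOG
    vfs.num_of_reads,
    vfs.num_of_writes,
    CASE WHEN vfs.num_of_reads  = 0 THEN 0
         ELSE vfs.io_stall_read_ms  / vfs.num_of_reads  END AS avg_read_ms,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_ms DESC;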

Since this looks like DAS (direct-attached storage) and you're not going through a SAN, go with a 64K allocation unit size.  SQL Server reads and writes data in 'extents' of eight 8KB pages, so a 64K block lines up nicely.  I could see going with 128K or higher only if your access patterns were highly tuned, such as a BI or DW environment.  For regular OLTP work, 64K is probably your best friend without doing major analysis and design work.


As far as disk caching goes, you need to make sure you are safe or you risk major data corruption.  Make sure your cache is battery-backed or redundant.  If not, the data in cache will be lost during a reboot/crash and your data will be left in an inconsistent state.  This is especially true with write-back caching.
Check out:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=173993

Also then go to this link and actually get a real baseline:
http://sqlserverpedia.com/wiki/SAN_Performance_Tuning_with_SQLIO

You can use it, but turn off any READ caching on the controller.  SQL Server is an in-memory database engine and will use RAM as much as possible, so using controller cache memory for reads is a waste for SQL Server.  If you need to read from your disk cache for SQL, there's something wrong.


Finally, I cannot stress enough the importance of getting that baseline, setting up your perfmon metrics, and running some data load.  Share your disk sec/read and sec/write with us.  Keep it well under 100ms for data and under 10ms for the transaction log.

It seems like a waste to use such high-performing disks for the OS.  Can you run the OS on two 7200RPM SAS disks instead and use these two for SQL Server?
theBlaine (ASKER)

MrAli,
Thank you for your quick reply. I forgot to mention the server's specs: Dell PowerEdge R420, Intel Xeon E5-2470 2.30GHz (20M cache, 8.0GT/s, Turbo, 8 cores), PERC H710 Mini, 64GB RAM, so it's a fairly beefy server.
Here are my options when managing the virtual disks...

DEFAULT WRITE BACK POLICY:   choices (Write Through, Write Back, Force Write Back)
READ CACHE POLICY:  choices (No Read Ahead, Read Ahead, Adaptive Read Ahead)
DISK CACHE:   choices (Default, Enable, Disable)

I've never used perfmon counters, but I'm familiar with the tool. I'll have to do some Googling on it tomorrow. Since this is a new server, I don't even have SQL installed yet. What will I be baselining, and what are we going to compare that baseline to?
The server itself is beefy, so that's good, but you might have disk I/O contention, in which case all the beef won't be very helpful.

DEFAULT WRITE BACK POLICY: Write Through is safer, Write Back is faster.  You should probably enable Write Back for the transaction log drive and nothing else.

READ CACHE POLICY: No Read Ahead

DISK CACHE: ENABLE

Only ENABLE it if you have a battery backup on your RAID controller, proper monitoring, and a strategy to replace the battery every X years.  If you do not, you're going to end up with disk corruption that won't be easily fixed once the battery dies and you have a hard boot or a crash.  In that case you can forgo the performance gains and just turn it off.


How big are your databases going to be?

How write heavy is the app going to be?

Google "perfmon baseline"; it'll give you a start.  Let it run under a full day's load, review the disk sec/read and sec/write for each drive, then come back with the results.  We'll know whether you need more performance or not.  Good luck.
Jim Millard
While all those RAID1 volumes will give you independent disk access on this system, each of them will have the performance of a single disk. And as mentioned above, the RAID0 will perform at the speed of 2 spindles, but with the hazard of downing the DB service if the volume is degraded for any reason.

You may have read that this isolation is preferred, and in practice that's true. However, your workload could benefit from a single RAID10 volume for all your I/O, with multiple volumes partitioned on top of it for administrative convenience. While I/O would be shared among all processes, you would also gain the benefit of 8 spindles of performance for any read and 4 spindles of performance for any write.

In the end, it really depends on your workload, but if you have the time to experiment, I think you'd find some interesting information about your environment and hardware if you try the single pool approach.
MrAli,
The controller has a battery backup, so it sounds like I can use disk cache. The databases are about 40GB in total. I'm not really sure how to tell how write-heavy the app is. If I go to the Performance Dashboard on our existing SQL server (the one we are replacing) and go to Historical IO, I see Reads 75.23% and Writes 29.25% (not sure why that doesn't total 100%).
millardjk,
Is your suggestion that I run the OS on a RAID1, then take the other 6 drives I have, create a RAID10, and put everything else on it (databases, transaction logs, tempdb, swap file, etc.)? I have some time to do some testing, so I'm willing to try. My problem is that I don't know the best way to stress test everything and get an accurate result that would tell me the ideal setup.
The O/S shouldn't be used that much, and the swap file essentially not at all, so I would go with a single RAID1 for O/S, swap file and tempdb.

Then another RAID1 for the log files, and a 4-drive RAID5 for the data files (RAID5 is faster for reads).  For recoverability purposes, it's not safe to have the database and log files on the same drive set, which rules out a single RAID10 for everything.

You should enable write back and caching since you have a battery to support the cache, but you must ensure 100% that the batteries are good at all times SQL is up, especially for the log files.

I would certainly enable read caching and read ahead.  SQL itself uses read-ahead, so the drive might as well be "in sync" with SQL itself.

One big hope with read caching is that indexes will get cached so that later references to them can be satisfied from the disk cache if they are no longer in RAM (typically you can't restrict caching to index data only, although that would be the best payback if you could :-) ).
Scott,
We do a lot of writing to our DB, and our current SQL server uses RAID-5 for the databases, so I think with our application a RAID-5 would give us poor write performance.

I was also told by an IT guy that the tempdb has a lot of activity and should be on a fairly fast RAID. But he's not a DBA, so I don't know how true that is.
I'm open to more suggestions. I'm going to try some of the suggestions above and then do some stress testing. It sounds like SQLIO is the best way to test. This new SQL server just has a fresh OS install (2008 R2) on a RAID1, so I can play around with the other 6 drives in various RAID configurations.

Hopefully SQLIO is fairly straightforward to use; I've never used it or tried interpreting its results.
Is this going to impact anything?

"so I would go with a single RAID1 for O/S, swap file and tempdb"
>> I was also told by an IT guy that the tempdb has a lot of activity and should be on a fairly fast RAID <<

That's true.  But the O/S won't interfere with the tempdb much at all, and that's the only difference between your original setup and mine on the tempdb RAID set.


>> We do a lot of writing to our DB ... RAID-5 would give us poor writing performance <<

Very true, RAID5 is much worse on write performance.  It's definitely a balancing act.

What is "a lot of writing".  And what % of writes vs reads.  Reads WILL  be SLOWER on RAID10.

Typically most SQL Servers do far more reads than writes, especially when properly tuned (for example, no unused/little-used indexes adding unneeded writes).

But your case could be different.  That has to be evaluated on a case-by-case basis.  But don't dismiss RAID5 out of hand just based on the assumption that RAID10 is always faster -- that's NOT true for reads.

Whatever you do, it's unsafe to put both data and logs (and backups too?) on a single RAID set.
I'll steer away from a RAID10. I'm not really sure how to tell how write-heavy it is. If I go to the Performance Dashboard on our existing SQL server (the one we are replacing) and go to Historical IO, I see Reads 75.23% and Writes 29.25% (not sure why that doesn't total 100%).

I have no idea if I'm looking at the correct stat, or whether that is considered heavy writing.
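
For what it's worth, here's a rough sketch of pulling that read/write ratio straight from the I/O DMVs on the existing server. It counts bytes read and written per database since the last restart, so it won't necessarily match the Performance Dashboard numbers, but it gives the same kind of picture:

-- Read vs. write share (by bytes) per database since the last restart.
SELECT
    DB_NAME(database_id)      AS database_name,
    SUM(num_of_bytes_read)    AS bytes_read,
    SUM(num_of_bytes_written) AS bytes_written,
    CAST(100.0 * SUM(num_of_bytes_read)
         / NULLIF(SUM(num_of_bytes_read) + SUM(num_of_bytes_written), 0)
         AS decimal(5,2))     AS read_pct,
    CAST(100.0 * SUM(num_of_bytes_written)
         / NULLIF(SUM(num_of_bytes_read) + SUM(num_of_bytes_written), 0)
         AS decimal(5,2))     AS write_pct
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GROUP BY database_id
ORDER BY SUM(num_of_bytes_read) + SUM(num_of_bytes_written) DESC;
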
So as everyone has chimed in:

TempDB can have lots of activity, but it depends on the workload.  The typical best practice is to isolate tempdb and the tempdb transaction log, but that's for a highly tuned system.  You should be fine putting it on the OS drive and then testing.  You have to get baselines, especially disk sec/read and write.  You should also create several tempdb data files, as I've seen logical file contention without them.  Check:
http://www.sqlskills.com/BLOGS/PAUL/post/A-SQL-Server-DBA-myth-a-day-(1230)-tempdb-should-always-have-one-data-file-per-processor-core.aspx
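
As a rough illustration of what adding tempdb data files looks like -- a sketch only; the logical names, paths and sizes below are placeholders and should match your own drive layout (tempdb is recreated at startup, so size/location changes take effect at the next service restart):

-- Add extra tempdb data files (names/paths/sizes are placeholders).
ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdb2.ndf',
              SIZE = 1024MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdb3.ndf',
              SIZE = 1024MB, FILEGROWTH = 256MB);
-- Keep all tempdb data files the same size so allocations stay balanced.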

SQL Server 2005+ uses a process called the 'lazy writer', which writes to the data disks when they aren't busy instead of solely during checkpoint operations.  Looking at a roughly 30% write scenario, this should be fine.  Try putting the data drives on RAID 5 and again checking disk sec/read and write.  If it's over 50ms you might want to consider changing it, or ask us for help.
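
If you want to watch how hard the lazy writer and checkpoints are actually working under load, one option is the perfmon counters SQL exposes through sys.dm_os_performance_counters -- a sketch; the per-second counters are stored as cumulative values, so sample them twice and take the difference rather than reading them once:

-- Buffer-pool write activity and memory-pressure indicators.
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN ('Lazy writes/sec', 'Checkpoint pages/sec', 'Page life expectancy');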

Transaction logs have almost no memory buffering (only 24K before it has to be flushed to disk), so that needs to be a fast RAID array.  How many databases are going to be hosted on here?  If it's only 1 user database, put that transaction log on the RAID 1 drive and put the logs for the system DBs on the RAID 5 or the OS/TempDB drive.  You will get serialized writes, which a RAID 1 array performs much better.


READ AHEAD is actually good in SQL Server, as the poster above stated, so feel free to turn that on.  I personally turn on write caching for the transaction log drives.
>> Transaction Logs have almost no memory buffering (only 24k before it has to be flushed to disk) <<

?  Logs have NO true buffering in SQL (at commit).  

SQL uses write-ahead logging -- as do almost all DBs since IBM invented it -- which means the log must physically be written before the transaction can proceed (commit).  SQL literally waits for the log-write confirmation before continuing (upon commit).

That's why it's so important for the log writes to finish ASAP.  And that's why write-back caching is so potentially dangerous -- SQL has been "told" the log has been written to disk, period, so if it actually hasn't been, SQL will NOT "know" how to recover the db properly.  The battery allows those log writes to physically take place even through a power failure, keeping the integrity of the log and thus of the SQL db.
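
One way to see whether log-write latency is actually holding transactions up is the WRITELOG wait type -- a sketch; these stats are cumulative since the last restart (or since the wait stats were last cleared):

-- Time sessions have spent waiting on transaction-log flushes.
SELECT wait_type, waiting_tasks_count, wait_time_ms,
       1.0 * wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'WRITELOG';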


>> SQL Server 2005 + uses a process called 'lazy writer' <<

Yes, for *data* only.  The data writes are not required for transactional consistency, but log writes are.


>>  If it's over 50ms you might want to consider changing it or asking us for help. <<

Agreed, for writes.  Reads should be somewhat faster than that.
SOLUTION by MrAli (members-only content not shown)
>> Historical IO, I see Reads 75.23% and Writes 29.25% (not sure why that doesn't total 100%). <<

Typically you see 80-85% reads to 15-20% writes.  So your writes are a little higher than normal.

But I don't think you'll see any write issues on RAID5 given the size of your db and the fact that data writes are queued/'lazy'.  Remember: SQL does NOT ever wait for the data to be written, only the log.

So the slower data writes of RAID5 will not impede performance unless you have so much to write that buffers can't be made available to SQL without waiting for an on-demand write (a buffer "steal").  I don't see that happening in your situation given what you've described.

If it does, you can always convert the RAID5 to RAID10 :-) .
>> There absolutely is a very small log buffer for the transaction log in SQL Server. <<

Of course.  ALL disk activity in SQL goes thru a buffer, including routine SQL modules.

I said there was no true "buffering" for log records, i.e., the process of delayed/lazy writes, as you talked about for data.
There is a lot of information here for me to digest; it's becoming a little overwhelming. I'm a Network Admin and build servers all the time, but this is the first time I've given all these things a thought, since I'm literally trying to make this as fast as I possibly can with the hardware in the server my company just bought. A few people are suggesting conflicting things.

I know I can always change read/write policies and caching at any point, but I need to figure out the best RAID setup so I can start testing.

Clearly I'm going to run the OS on a RAID-1, on which I have just installed Server 2008 R2 Enterprise (the server has 64GB RAM). I'm going to have to try different RAID configurations with the remaining 6 drives (I have a max of 8 drives already installed).

What confuses me the most is that if I set up a RAID, run SQLIO on it, and find a configuration that gives me the best numbers in SQLIO, that doesn't necessarily mean it's going to be the fastest setup for my specific DB. I don't even have SQL 2008 installed yet on that system. What tests should I run to determine the best setup for my specific database?
theBlaine,

You're right, SQLIO will just give you raw throughput numbers.  That's useful for finding the theoretical max.

You'd want to run a program like iometer:
http://technet.microsoft.com/en-us/library/cc966412.aspx

This article will explain.  It'll show you the results you'd expect with your mixed read/write environment.  

It really seems like you could put your data drives in RAID 5, since writes to them are staggered to avoid overburdening the array, with the transaction log on a RAID 1 or RAID 10.
MrAli & others,
I have a total of 8 drives. Sounds like I need a total of 3 Virtual disks.
#1 - System/swapfile/tempdb
#2 - Database
#3 - Tran logs

#1: RAID-1 (2 disks)
#2: RAID-? (? disks)
#3: RAID-? (? disks)

If you were me, setting up your own machine, how would you assign the disks and what would each virtual disk handle (system, logs, etc.)? I have the whole weekend to test any of your suggestions.

Even if several different people have different suggestions, I have time to test them all.
SOLUTION (members-only content not shown)
As millardjk stated,

I'd go with that as well.  

How many user databases will you have?
Sorry for the delay, I had a 3-day weekend. I'm going to test the suggested configuration above.

We have 32 DBs, but our primary DB gets 99% of the I/Os.
Wow. You might consider putting all the other DBs (unless you need them on the same host for linking or licensing purposes) on a completely different host/instance, possibly even leveraging Express and/or virtualization, just to get them out of the way of the primary. That gives the primary DB (and its instance) maximum dedicated access to all the hardware (CPU, memory, I/O), not just the disks.
ASKER CERTIFIED SOLUTION (members-only content not shown)
Those are exactly the two configurations I'm going to test. However, I accidentally deleted the system RAID and am re-installing the OS. Fortunately, it was just a fresh install of the OS before I deleted the RAID, so I only lost about 3 hours; installing Windows drivers as we speak.
I'm about halfway done with my testing. Here are my results so far. Do they appear to be consistent with what you'd expect?

RAID-1 (system/swap/tempdb): Write Back/Read Ahead/Disk Cache Enabled
RAID-5 (logs): Write Back/No Read Ahead/Disk Cache Enabled
RAID-5 (db): Write Back/No Read Ahead/Disk Cache Enabled

Still testing the RAID-10's performance.

These combinations seemed to give me the best performance. However, I'm not sure whether the read/write/cache policy below is ideal for a SQL Server as opposed to just raw HD speed tests. What would the disk read/write/cache policy recommendation be if we assume...

#1 - System/swapfile/tempdb: 2 disks, RAID1
#2 - Transaction Logs: 2 disks, RAID1
#3 - Database: 4 disks, RAID5 (or RAID10, still testing)
>> RAID-5 (logs): Write Back/No Read Ahead/Disk Cache Enabled <<

Whoa ... probably just a typo ... the log drive should *always* be RAID1 (or RAID10).


>> RAID-5 (db): Write Back/No Read Ahead/Disk Cache Enabled <<

Read ahead should be enabled for the dbs.


Cache enabled will give you the best performance.

I would enable read ahead for the tempdb and db drive sets.

For the log drive, it's probably not necessary -- logs are very rarely read anyway, EXCEPT when being backed up and during (hopefully only occasional) rollbacks.
That was a typo, my logs will be on a RAID1.

Okay, I just want to make sure I understand the suggestion.

RAID1 - OS/swap/tempdb - Write Back / Read Ahead / Disk Cache Enabled
RAID1 - logs - Write Back / Read Ahead / Disk Cache Enabled
RAID5 or 10 - database - Write Back / Read Ahead / Disk Cache Enabled

So basically, all three RAIDS should be - Write Back / Read Ahead / Disk Cache Enabled  ???
I've decided to go with RAID10 for the database disk group. The speed difference wasn't huge, so I'll take the extra redundancy. RAID1 for the other two disk groups.

So this is my proposed final configuration.
RAID1 - OS/swap/tempdb - Write Back / Read Ahead / Disk Cache Enabled
RAID1 - logs - Write Back / Read Ahead / Disk Cache Enabled
RAID10 - database - Write Back / Read Ahead / Disk Cache Enabled

Does anyone see any issues with this? Any suggested changes to the policies? When I tested the RAID10, it got slightly better performance with disk cache disabled, so I'm not sure what the ideal read/write policies are for this configuration.

Keep in mind, I'm just trying to squeeze out as much performance as possible. Once it's in production, I can't make any changes. So it's a one-shot deal to get it right at the beginning.
>> RAID1 - logs - Write Back / Read Ahead / Disk Cache Enabled <<

One possibility to test is to turn off Read Ahead for the log drives.  You might want to test that out.


Other than that, I don't see anything else.
Cache Enabled and Write still okay for logs?
Yep.  Read Ahead off only for the logs is probably worth testing, if this is one-time-setup only deal.



>> I tested the RAID10, it got slightly better performance with disk cache disabled. <<

That's true for a single pass of reads, because it takes some slight extra time to update the cache with what was read, and if that data is never re-read, the time is wasted; it shouldn't be a lot of time, though.

However, in a real SQL Server workload you're much more likely to encounter re-reads, particularly on tempdb and data, that can be served from the cache, in which case the savings are noticeable.
I'm putting the tempdb on the system drive. Should I put the tempdb logs on the same drive, or put that onto the RAID with the other log files?
You should be fine either way.

Personally I'd put the tempdb logs on the log drive, but it's not a major deal one way or the other.
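
If it helps, here's a sketch of pointing tempdb's files at the chosen drives. The logical names below (tempdev/templog) are the SQL Server defaults, but the paths are placeholders, and the move only takes effect after the SQL Server service is restarted:

-- Relocate the tempdb data and log files (paths are placeholders).
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'L:\SQLLogs\templog.ldf');
-- Restart the SQL Server service; tempdb is recreated in the new location.
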
Thank you for everyone's advice. SQL Server is up and running and getting nice I/Os!