Data and log devices on the separate disks ?

Posted on 2007-04-02
Medium Priority
Last Modified: 2013-11-14
I know, that it's recommended to put data and log devices on separate disks. Some say it's for security reasons, others say it's for performance.

But, is this "old truth" still valid for external redundant disk arrays ?

With proper point-to-point architecture, there should be no reason for decreased performance with all the devices placed on the same RAID group. And for security - I don't see the advantage of separate disks - if one of them fails, you have to use your backups, so there's no difference, whether you still have half of your devices available.

If I have 8 disks, is it better to create 2 separate RAID10 groups, each with 4 disks and put data on one and log on second ? Or is it better to create one big RAID10 with 8 disks and put everything on it. AFAIK RAID10 on 8 disks should be approximately 2 times quicker on both read and write operations than RAID10 on 4 disks.

So, would you still recommend to place data and log devices on separate RAID groups on external redundant disk array (like Dell/EMC CX3-20 or IBM DS4700) ?
Question by:Jan Franek
LVL 16

Assisted Solution

rboyd56 earned 400 total points
ID: 18837022
Another reason for putting the data and log on seperate devices is up to the minute recover.

If your data and log are on separate devices and the device that has you data goes bad, you can still backup the existing transaction log and get the data that is in the log so you can gt up to the minute recovery.
LVL 14

Author Comment

by:Jan Franek
ID: 18842441
Hi rboyd56,

thanks for input. I haven't experienced disk subsystem crash yet, that would left log devices functional while data devices destroyed. Is it really possible to dump transaction log in such a situation ? I would expect database server to stop working immediately.

However, in my scenario this case is covered by synchronous mirror of all the data on disk array to remote site. So in case of any disk array failure we would switch to backup remote site disk array and we should be able to continue without need for restoring data from backup. So I believe, that "security reasons" are covered.
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 18847796
It is totally possible to dump the log even when the data devices are dead. "Dump tran [database] with no_truncate".

The logs are actually more important than the data.
Build your data science skills into a career

Are you ready to take your data science career to the next step, or break into data science? With Springboard’s Data Science Career Track, you’ll master data science topics, have personalized career guidance, weekly calls with a data science expert, and a job guarantee.


Assisted Solution

VXDguy earned 600 total points
ID: 18848005
I'm a performance kind of guy.  I strongly believe in seperating database and log files for performance reasons.  There's recoverability reasons as well, but that's dependent on your company's backup strategy and disaster recoverability needs.  In corporate environments, recoverability and data integrity is far more important than speed (i.e. I'd rather be slow and recoverable than fast and destroyed).

Anyway, lets try a simple example configuration:

HDS AMS 1000 (or HP EVA, or EMC [expensive metal cabinet], or STK, or 3Ware, or ...)
  Raid Group 0 (RG0) is a RAID5 (7D+1P)
  Raid Group 1 (RG1) is a RAID10 (4D+4P)

Carve out a lun on RG0, say 500GB.
Carve out a lun on RG1, say 100GB.

Database on RG0, logs on RG1.

Logs are 100% sequential writes during normal operations.  Placing them on RAID10 parity disks allows for very fast writes.  There's no parity calculations, just duplicate writes.

Database is a mix a reads and writes, and while a RAID10 parity disk for your database would be great, it's not cost effective for the performance gain due to the read/write I/O mix and random disk I/O.  Yes there's a parity calculation penality, but this is offset by dedicated parity generation hardware in most redundant arrays.  It will never be as fast for writes as RAID10, but read performance will be better, not to mention being far cheaper than the equivalent RAID10 array.

Keep in mind that each database update transaction is a sequence of three dependent writes.  They can't be queued, they must be performed in order, and they must all succeed.  Each update writes to the logs file (identifying the records to update), updates the main database, and updates the logs file again to show a successful update.  So for each database update, you're going to hit the log files twice.  That's why having logs on RAID10 for fast writes is important.  A slow log file/lun/raid group can bring the database to a crawl.

And it gets better...

External redundant arrays are smart controllers.  They're typically cache based controllers, so your disk writes are acknowledged to the host as soon as they enter the cache on the array.  This gives <2ms response times for I/O as long as there's cache space available.  As long as the pending disk writes can be written from cache to disk faster than the write I/Os from the host to the controller's cache, you'll get absolutely fantastic disk write performance.

So you can badly misconfigure an external raid array and not know it until it gets under heavy enough log to slow down incoming host I/O to backend disk I/O speeds due to cache being flooded with disk writes.  In other words, you can stick your database and logs on the same filesystems, LUN, raid group and the raid array's cache will "hide" performance problems until it comes under a real workload.

That said, I can tell you that some people argue for placing logs on RAID5 volumes instead of RAID10.  Until you place the database under a heavy write load, the controller's cache hides the performance advantage of putting logs on RAID10 volumes.  So it may be quite acceptable to use RAID5 for logs if the database isn't heavily utilized on a regular basis.

And finally, we use the ShadowImage (in-array mirroring capability) of the HDS AMS1000 to copy the log luns.  The transaction logs are then rolled forward against the development copy of the database.  It's easier to snapshot a 100GB lun than a 500GB lun because its smaller, it's a log file so we can take a snapshot at any time, it copies faster (underlying source/target LUN is on RAID10), and we can mirror the log luns to keep a rolling backup up to date.
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 18848288

Transaction logs on RAID 5 is usually a bad idea, I think. Logs get heavy writes, usually sequential, and my experience has tended to show RAID 5 imposes a write performance penalty of anywhere from 20-50%.

For the same reasons tempdb should also not be on RAID 5. Tempdb is usually the busiest database!
LVL 14

Author Comment

by:Jan Franek
ID: 18848923
OK, in fact, I have 10 disks for my database server. And I see this possibilities:

1. 5D+5P RAID10 and I'll have 5D capacity and cca 5x the speed of single disk
2. 8D+2P RAID50 (strip over two 4D+1P RAID5) and I'll have 8D capacity and cca 4xthe speed of single disk
3. 2D+2P RAID10 for log and 5D+1P RAID5 for data and I'll have 2D capacity and 2x speed for log and 5D capacity and 2.5x speed for data.

I have done some tests on our internal RAID and it seems, that write speed for RAID5 N+1 is approximately N/2 x write speed of one disk. It's pretty in line with my understanding how RAID5 write works (read blocks from all disks, update, calucate parity, write blocks back to disks).

So I wonder, whether it's better to go for solution with higher write speed but data and log stored on the same RAID group or to separate data and log but on slower RAID groups.

BTW - I have always believed, that RAID1 has higher write speed than RAID5, so on our current production server logs are stored on RAID1 (1D+1P) and data on RAID5 (4D+1P). Lately I did some disk performance tests (using xdd) and I was quite surprised, that our RAID5 is 2x quicker for both reads and writes that our RAID1 (in cases when cache can't handle whole transfer and there are really physical disk reads and writes).
LVL 14

Author Comment

by:Jan Franek
ID: 18848929
Joe, so you are saying, that if I destroy my data disks, ASE will still keep running so I can use DUMP TRANSACTION command ? Of course, I believe you, I'm just surprised :-)
LVL 24

Accepted Solution

Joe Woodhouse earned 1000 total points
ID: 18857592

I have been in a genuine real-world situation where some raw partitions were lost, which included a number of data devices. Of course ASE wasn't happy about it (many 605 errors), but so long as master and tempdb are still up, you do still have a functional dataserver.

That was enough to do the dump tran with no_truncate for the up-to-the-minute transaction dump of all user databases. We then rebuilt all disks, rebuilt the databases, loaded all scheduled backups and then the post-crash ones. The system was down for half a day but we recovered up to the second (well, all committed transactions up to the second) of the crash.

I'm now a believer. 8-)

Since some failures can be at the LUN level, it's good to have more than one LUN. Ditto volume groups. And a few of us have now said, the performance vs availability characteristics of the RAID levels are qute different. I would never put tempdb on anything other than RAID 0, 0+1 or 10. RAID 5 is asking for trouble.

I'm also not surprised that RAID 5 outperformed RAID 1, which after all still has a single disk-head performing the write on each side of the mirror. Even with the redundant parity, RAID 5 is distributing the write. I do expect a properly configured RAID 5 to outperform everything except 0+1 or 10. 10 is unquestionably the best; 5 is a good compromise when you can't afford the number of disks to put everything on 10.

Perfect scenario: 3 x RAID 10 volume groups, one each for data, log and tempdb.
Very achieveable and still good: 1 x RAID 10 for tempdb, 1 x RAID 10 for log, 1 x RAID 5 for data.
LVL 14

Author Comment

by:Jan Franek
ID: 18858015
Hi Joe, it's very interesting. And good to know, that I should move master database out of data disk :-)

However, in my case synchronous mirror to backup disk array should deal with any primary disk array failure, so I hope, I'll be back online in few minutes without need for restoring data from backups.

So, it seems to me, that I can optimize for speed and put everything on the same RAID group.
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 18861760
From ASE's point of view, yes, in your scenario I don't think you gain anything by having multiple RAID groups or not.

I'd want someone who knows more about the O/S and the storage hardware to confirm there are no issues at those levels. I don't know enough about those areas to competently advise.

Remember, too, a backup array is less helpful than you think unless you have a fully redundant path there - a second controller and a second channel. I *have* seen both of these fail in production incidents.
LVL 14

Author Comment

by:Jan Franek
ID: 18862709
I think, I'll have fully redundant solution - two HBA in all servers, two optical switches, two controllers in disk array, each disk connected to both controllers and the same setup in backup site.
Yesterday I have read some white paper from EMC about best practices and there was one point - they say it's better not to mix sequential and random I/O. AFAIK log writes are mostly sequential and data reads would be mostly random. But with several ASE servers connected to disk array these log writes will get probably "randomized".

It's interesting that most papers compare RAID types of the same capacity. I believe, that in real world you have some budget, so you know how many drives you can afford, so comparison based on the same number of disks would probably be more useful.
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 18888486
Agree comparison on number of disks is more useful. Those pesky disk vendors! 8-)

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

In the below post we have mentioned the best hosting type for startups. Also, check out some of the superlative web hosting companies that are proposing affordable web hosting solutions to host your startup website.
Organisation is organized in a pattern to flow the day to day business, every application and system is interdepended on each other and when very important “Exchange Server downtime” happened.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.
SQL Database Recovery Software repairs the MDF & NDF Files, corrupted due to hardware related issues or software related errors. Provides preview of recovered database objects and allows saving in either MSSQL, CSV, HTML or XLS format. Ensures recov…

624 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question