asked on

Drive configuration for SAN

Hello,

I am looking at this entry level SAN
HP P2000 G3 MSA

We are probably going to be looking for two types of drives, which we feel will require 1 hot spare for each type. This leaves us with 10 drives.

We are considering 5 drives in 1TB 7.2K drives. This leaves us with 4TB of slow disks useable for storage.

We are unsure about the best configuration for the remaining dives. A RAID 5 array would work for all 5 drives. A RAID 10 array would leave one odd drive out.

Another possibility would be a 3 DISK RAID 5 array, and a mirrored set. I am not sure if RAID 1 is as fast or faster than RAID10

So the scenarios I see for drive configuration would be this:
10 disks 2 hot spares

(5Dx RAID5 x 7200)+(5D x RAID5 x 15K) - 4TB slow/3TB
(5Dx RAID5 x 7200)+ (3D x RAID5 x 15K)+(2D x RAID1 x 15K) -4TB slow/1.2TB/600GB
(4Dx RAID5 x 7200)+ (6D x RAID10 x 15K) -3TB slow/1.8TB

With a mind for virtual machines, and performance is it better to have the 3 separate RAID arrays for “segmenting” disk utilization? Am I over concerned about say 7 or 8 VMs running on the same disk subsets, if we were only to have two different RAID volumes, be it RAID 5 or 10?
The 7200 space will only be for storage and some file shares.

Just looking for some opinoins on storage configurations, from those who are actively supporting VMs in a production environment.

SOLUTION

serchlop

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

sharedit

ASKER

Since we have a hot space I guess I am not so concerned about an impending two disk failure.

I am really questioning what will give me the most options in regards to virtual machines. I will obviously not be using the 7.2K space. But I do read that raid 5 is suitable for most virtual machines, say a terminal server. but if I want to cross the bridge of bringing something like an exchange server or SQL server into the shared storage for virtualization, I may want want some faster performing RAID disks.

I am still not clear that raid 1 is as fast as RAID 10. If it is not, then there may be a reason for the RAID 1 set.

Obviously there are many variables, but I am still not sure how many VMs I would expect to get on a RAID 5 set, or RAID 10, before iops become a problem. I know that in general several smaller RAID sets are better for performance than a single RAID set accross the SAN. this is why I thought breaking the 600 GB drives into 2 arrays, would give me more options from a possible performacne standpoint, and, if RAID 1 is as fast as RAID 10, or at least better than RAID 5 from a write standpoint, then I would have a more suitable disk platform for some VMs with higher io requirements.

I am still pretty amature at SAN/virtualization, but I'm getting there.

serchlop

About RAID 1 and RAID 10, RAID 10 provides better fault tolerance and rebuild performance than RAID 01, but RAID 6 is going to be better than RAID 10 is every way: faster (less duplication needed), safer (it can guarantee survival after two drives die whereas RAID 10 can’t; it can even correct small errors on the disk) and more space-efficient (less duplication needed).

sharedit

ASKER

I have not read that RAID 6 is better than RAID 10 in write speeds. I have also read that in small disk sets, RAID 6 is no better than a small RAID 5 set with a hot spare.

SOLUTION

Paul Solovyovsky

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

David

No, RAID 10 absolutely does NOT have better fault tolerance than RAID 01. Can RAID10 survive 2 drive failures? No. RAID01 / RAID1 can.

As for performance, too many variables. Biggest one is are you transactional or throughput oriented. I/Os per second and throughput are different, as well as reads vs writes vs random vs sequential. RAID controllers vary also, as well as performance is a function of block size and # of disks you have in the array & stripe sizes. Somebody who is good and has full control over back end config and file system parameters can pretty much make any "RAID" level outperform another RAID level that was improperly configured.

Besides, in VM you are effectively doing RANDOM I/O, not sequential. Optimizing for "better write speeds" in a vmware environment means optimizing for, well, nothing. Tricks that one would use for throughput and high write speeds are counter productive. No point stuffing large caches and doing read-aheads that one might ordinarily do for a non-virtualized database storage farm will slow you down in this environment.

You want to tune it right? Don't worry about tuning it until you deploy. Nobody can possibly tell you what will ultimately be best w/o knowing a lot more info. It isn't worth the headache now. Buy as may platters as your budget allows, and even better, defer purchasing until last possible moment. Disks get faster & cheaper, so why buy more than what you need today. Get a few VMs going and actually measure their I/O characteristics. Then build an array tuned for that type of load. Move the existing VMs over to that farm (so very easy in a virtual environment), then model the new virtual machines and perhaps build a different RAID level for that and continue the process.

sharedit

ASKER

Thanks for the posts.

Let me give you a little background on how I got to be where I am at. We are in the process of purchasing an imaging software that will manage and store some high quality medical images. The company feels that because some of the devices we will be capturing images from produce such a large image, they want to see 2-4TB of space initially, with the ability to expand. When we want to expand We will probably just add a second drive enclosure.

Hense the 6 x 7.2K 1TB SAS drives. 1-hotspare, 5 - raid 5 set of 4TB.

This leaves us with 6 drive bays left. One of these will be a hot spare, so effectively 5 drives left.

Because of some old server hardware, and another software implementation project comming down the pipe, we would like to have the ability to virtualize some things. For example, terminal servers, and a few other single purpose servers that will be required. So I am interested in running VMs in the remaining storage space.

So, because I feel I am somewhat limited to how I can break up the remaining drives (RAID5, RAID10, or possibly RAID5 and a RAID1 mirrored set.) I was looking for some input from people with experience, and can speak on some fairly general terms, but knowledgable about the details, which I am not strong in, yet.

Is a mirrored set a vaible option for VMs? Is it compareable to RAID10? I though the RAID5 and RAID1(if compareable to RAID10) would give me the ability to test VMs and how well they work on either sets. with a single RAID 5 or RAID 10 out of the remaining disks as I laid out, I have sort of painted myself in a corner. I would have to temporarily move the VMs to the slow 7.2K set temporarily to destroy the RAID5 or 10,whichever I created first, inorder to create another array to test and compare with the new RAID array.

dlethe, thank you for asking me questions I do not know the answer to. I do not know if I am transactional or throughput oriented, for example. I need some unknowns to look into.

I really like the idea of testing and then building an array around that data, but because of my limited 5 drive bays open, I feel that I have to have some idea of where I will be going with it initialy, or some idea of what will most likely work. Obviously there are lots of different variables, but in general terms, there must be solutions that are common and work for a majority of VM scenarios/servers. And do any scenarios I proposed make more sense than the others. I am most curious about the raid5 raid1 raid5. combo. If I find that the RAID1 set is not necessary and they are all performing finr on RIAD5, I can destroy it and add the drives to the RAID5 array. Though I assume that I gain some performance from the disks just because the arrays opperate independently, as opposed to one large set of RAID10 or RAID 5.

We are not initially planning on putting any major servers, SQL into a virtual environment.

David

Understood, and I am not trying to beat you up. I read school, and gravitated towards elementary/high school because my wife, one sister and mother-in-law are all teachers. I apologize.
Anyway, since the "purpose" is really to get something that will satisfactorily run the imaging with enough horse power to do other things, then you need to understand the nature of the I/O. Depending on the I/O block sizes, whether the data goes in RDBMS or raw partitions, even file system could make a $2500 solution outperform an incorrect $25000 solution. Seriously ..

Ask these questions so you can choose the correct technology for this FILE SYSTEM , and the other stuff probably won't be affected that much no matter what you do.

1. OS & file system used by the app.
2. the biggie. What is the native I/O size, in blocks when the scanner saves. With large files & image capture I could see I/Os so large that you would be in TCP/IP hell that it would be next to impossible to get enough frames, that are large enough coming through uncorrupted that you would get all the performance of saving to a USB stick. (well not that bad, but certainly 5-10MB/sec is a REAL possibility)
3. does it allocate just a large amount of space to write flat files, or does it use a database. (All questions on saving scanned output are as significant as they are frequent. If you do one scan a day, or one every 10 minutes for hours at time then it matters.

(Reasons are many, but I am thinking bus saturation also)

4. If viewing will be frequent and regular, then is this a matter of opening up a file , or are images hosted in a web server? How big are the files and how often.

Bottom line, what is expectation on read/write traffic, and what is average mix you can expect, based on how the application is asking for data. Async vs synchronous I/O ?
With all the money you are spending, you seem to be more worried about makes and models of switches instead of data. Those 7200RPM SAS disks are really just SATA disks, by the way. They are enterprise class mechanisms, but the firmware is rather stupid and those disks may not be suitable.

Throwing an idea out, but you should investigate smaller faster, SAS-2 disks in RAID1 or RAID10 and automating something to migrate to a pool made out of 2TB SATA disks in RAID10 configuration also. With large files, then those slower SATA disks should be fine. Then go with the SAS drives for general use.

Nothing wrong with mixing & matching. But scanners doing write intensive large block I/O will probably just kill that RAID5. You DON'T want everybody on the network complaining about poor performance the minute somebody starts up the scanners. So find out from their engineer about the I/O. I would also get a reference and ask them what storage they used, and see if it crawls the minute people scan.

SOLUTION

Irwin W.

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Irwin W.

I hit submit too soon.

Since this is in the VMWare category, remember that VMFS stores cannot be larger than 2TB. If tou want to allocate more than 2TB to a virtualized server, make it a physical and not virtual volume after presentation.

sharedit

ASKER

At this point I am going to order the HP unit with 7 drives at 600 GB which should give me some options for fast drives, and 5 1TB drives to get a 3TB data store. which should last for about 3 years in the software engineers mind.

I am going to wrap up the point distribution, I just wanted to ask if any one has any pointers for performance tuning, or things to look out for? dlethe you seem to be intimately familiar with this subject, and you claim a misconfigured file system can ruin a RAID array, can you elaborate on that or offer any pointers.

As far as the virtual server that will host the software, I was thinking that I would have the system partition on the faster drives, but for the larger slower drive space, would the most appropriate way to present it to the server as a Virtual Disk from the SAN? this would be opposed to say something like attaching a large peice of the disk to the ESXi host, and then createing a vmdk in that space.

dlethe, I didnt think you were beating me up, I do appreciate you talking over my head, becasue I like to know what i dont know, so I can work towards filling in those areas.

Any good links to benchmarking performance would be great too.

for disk configuration, Im thinking that the 6 15k drives can be configured first as a 3 drive RAID 5 array, if performance seems to be adequate, I would then have the option to expand the RAID5 with the remaining disks, or create a second RAID 5 array with the remaining 3.

If RAID 5 is determined sub par, which I dont expect to be the case, I am envisioning creating a Mirrored set, from the remaining set, moving the vmdks to it, destroying the RAID5, and recreating a new RAID 10 out of the remaining 4 disks.

this leaves me with a second mirrored set, or I can the destroy it and add the remaining two drives to the RAID10

Please comment on any of this.
Greatly appreciated,

ASKER CERTIFIED SOLUTION

David

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

sharedit

ASKER

Thanks for the comments.