Solved

VMware ESXi 3.5 U4: slow performance on an iSCSI RAID-5 SAN of 15k rpm SAS disks

Posted on 2009-05-11
22
3,782 Views
Last Modified: 2012-05-07
Hi All,

I'm experiencing very slow performance from my VMs deployed on the iSCSI SAN VMFS datastore. The attached diagram shows the deployment, which I believe already follows the best practices found around the net by segregating the SAN network from the server network.

However, after reading the quoted article, it seems that no matter how fast the disks are, a SAN in a VMware environment will always top out at around 160 MB/s :-|

This usually means that customers find that for a single iSCSI target (and however many LUNs may be behind that target - 1 or more), they can't drive more than 120-160MBps.

Any ideas or comments, please?
http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html


Deployment.jpg
Question by:jjoz
22 Comments
 
LVL 18

Accepted Solution

by:
larstr earned 290 total points
ID: 24361891
Yes, the limit is because you're using 1Gbit Ethernet as the transport for iSCSI. 1Gbit full duplex gives you a theoretical 125MB/s in each direction, but with protocol overhead and encapsulation the maximum performance you will typically see is around 120-160MB/s. That can still be enough for many workloads, but it depends on yours. Latency is also higher on iSCSI (and NFS) than on FC/DAS. For most people, the random IO rate the array can sustain is much more important than the throughput of a single sequential stream. To measure such things you can use Iometer.

With 10GbE you will, however, be able to see transfer rates higher than 160MB/s.
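Purely as an illustration of where that ceiling comes from, here is a tiny Python sketch of the arithmetic (the 8% overhead figure is an assumption for standard frames, not a measured value):

# Rough estimate of usable iSCSI throughput on one GbE link
link_bits_per_sec = 1_000_000_000               # 1 Gbit/s in one direction
raw_mb_per_sec = link_bits_per_sec / 8 / 1e6    # 125 MB/s theoretical ceiling
overhead = 0.08                                 # assumed Ethernet/IP/TCP/iSCSI header overhead
usable_mb_per_sec = raw_mb_per_sec * (1 - overhead)
print(f"theoretical {raw_mb_per_sec:.0f} MB/s, realistic ~{usable_mb_per_sec:.0f} MB/s per link")
# -> theoretical 125 MB/s, realistic ~115 MB/s per 1Gb link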

Whether iSCSI is your bottleneck depends largely on the number of spindles in your SAN, the amount of cache on the controller(s), and your networking setup/equipment. RAID5 is also not the best RAID level to use if you have multiple VMs accessing the array at the same time (generating a mixed workload), but it may still be enough. For database workloads you might want to consider RAID10 if you can't get the required performance with RAID5.

Lars
0
 
LVL 1

Author Comment

by:jjoz
ID: 24361983
Ok, thanks for the reply Lars,

I was thinking of redesigning the network all over again from scratch: instead of having a different subnet for each cable, I would perform NIC teaming so that 2x 1Gb Ethernet cables can boost data performance for the VMs.

Well, the situation is this:
I have 15 x 300 GB SAS HDDs in total.

I created one large RAID-5 LUN from 14 x 300 GB disks, then created a 1 TB VMFS partition on it, and the SQLIO benchmark results for the VM on the SAN are really horrible as opposed to the local 7,200 rpm SATA disks.

 
Local HDD: 4x 500 GB SATA 7200 rpm RAID  5  
C:\SQLTEST>sqlio.exe  
sqlio v1.5.SG  
1 thread reading for 30 secs from file testfile.dat  
        using 2KB IOs over 128KB stripes with 64 IOs per run  
initialization done  
CUMULATIVE DATA:  
throughput metrics:  
IOs/sec:  8826.73  
MBs/sec:    17.23  

while  

SAN HDD: 14x 300 GB SAS 15000 rpm RAID  5  
C:\SQLTEST>sqlio.exe  
sqlio v1.5.SG  
1 thread reading for 30 secs from file testfile.dat  
        using 2KB IOs over 128KB stripes with 64 IOs per run  
initialization done  
CUMULATIVE DATA:  
throughput metrics:  
IOs/sec:  2314.03  
MBs/sec:     4.51  
0
 
LVL 21

Assisted Solution

by:from_exp
from_exp earned 100 total points
ID: 24362017
Please note, however, that teaming will not necessarily give you double the performance.
This is a limitation of link aggregation (LAG): when only two hosts communicate over a LAG, only one link is used.
It is due to the mechanisms used to load-balance traffic across multiple links. Load balancing is done at L2 (src-dst MAC pairs) or at L3 (src-dst IP pairs).
With only two hosts communicating, you have only one pair of MACs and IPs, so only one link can be utilized (see the sketch below).
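A tiny Python sketch of the idea (this is not any vendor's actual hashing algorithm, and the MAC addresses are made up for illustration):

def pick_link(src_mac, dst_mac, num_links=2):
    # the switch/host hashes the address pair and uses the result to pick an egress port
    return hash((src_mac, dst_mac)) % num_links

esx_mac = "00:50:56:aa:bb:cc"   # hypothetical ESX NIC
san_mac = "00:1e:4f:11:22:33"   # hypothetical MD3000i port
print(pick_link(esx_mac, san_mac))   # every frame of this conversation...
print(pick_link(esx_mac, san_mac))   # ...hashes to the same link, so one NIC carries all of it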
0
 
LVL 21

Assisted Solution

by:robocat
robocat earned 110 total points
ID: 24362026

Many admins obsess over throughput, but in most VMware environments throughput is not important at all.

In our environment, with 50 virtual machines running, we rarely see the average throughput go beyond 50 Mbit/s; that's 5% of the limit of Gbit Ethernet. This is because most disk I/O is random, not sequential, as larstr said.

In a VMware environment IO LATENCY is important, not throughput.

If you experience bad IO performance, first measure IO latency (using Iometer, for example) to see if this is actually the cause of your problem; a rough sketch of the idea follows below.

You can improve latency by using more physical disks for each LUN, creating a single LUN/VMFS datastore for each high-performance VM instead of putting multiple VMs on one datastore, using RAID10 instead of RAID5, getting more cache in your SAN, ...
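A minimal Python sketch of what "measure latency per IO, not MB/s" means. Iometer (or SQLIO's latency output) is the proper tool; the file path is an assumption, and unless the test file is much larger than RAM the OS cache will hide most of the real disk latency:

import os, random, time

PATH = "testfile.dat"        # assumed: any large pre-existing file on the datastore
BLOCK = 4096                 # 4 KB random reads, typical of VM workloads
SAMPLES = 1000

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY | getattr(os, "O_BINARY", 0))
lat_ms = []
for _ in range(SAMPLES):
    os.lseek(fd, random.randrange(0, size - BLOCK), os.SEEK_SET)
    t0 = time.perf_counter()
    os.read(fd, BLOCK)
    lat_ms.append((time.perf_counter() - t0) * 1000.0)
os.close(fd)
lat_ms.sort()
print(f"median {lat_ms[len(lat_ms)//2]:.2f} ms, 95th pct {lat_ms[int(len(lat_ms)*0.95)]:.2f} ms")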

0
 
LVL 21

Expert Comment

by:robocat
ID: 24362085

To add to my previous post: throughput is NOT your problem.

>perform LAN teaming so that 2x 1 GB Ethernet cable can boost the data performance for the VM.
>MBs/sec:    17.23  
>MBs/sec:     4.51

So you have roughly 125MB/s of capacity on your Gbit Ethernet link and you are using only 4.51MB/s, less than 4% of it!!

Focus on latency instead.
0
 
LVL 1

Author Comment

by:jjoz
ID: 24362205
Oh... so in this case shall I reformat the SAN and create multiple smaller LUNs to achieve greater performance, and deploy it like the following?

LUN 1: 7x 300 GB RAID-5 --> VMFS
LUN 2: 7x 300 GB RAID-5 --> any other purpose
0
 
LVL 18

Assisted Solution

by:larstr
larstr earned 290 total points
ID: 24362239
Disk performance is best measured in IO/sec, not MB/sec.

A single 15k SAS disk will give you roughly 180 IO/sec. Many small disks in a RAID set are better than a few large ones, as you will get more IO/sec. A larger cache will also benefit you and give higher IO, because hot parts of the disks will live in the cache on your SAN.

RAID5 is good for an IO load consisting of sequential reads. The workload you are testing, however, is a database stress tool that generates random IO. RAID10 is best for random IO; see the rough estimate below.
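As a back-of-the-envelope Python sketch of why spindle count and RAID level matter so much for random IO. The per-disk figures and write penalties are rule-of-thumb assumptions, and controller cache will change the real numbers:

def array_iops(disks, per_disk_iops, write_penalty, read_fraction=0.7):
    # raw back-end IOPS across all spindles
    raw = disks * per_disk_iops
    # each host write costs 'write_penalty' back-end IOs (RAID5 ~4, RAID10 ~2)
    return raw / (read_fraction + (1 - read_fraction) * write_penalty)

print("14x 15k SAS, RAID5  :", round(array_iops(14, 180, 4)))   # ~1300 host IOPS
print("14x 15k SAS, RAID10 :", round(array_iops(14, 180, 2)))   # ~1900 host IOPS
print(" 4x 7.2k SATA, RAID5:", round(array_iops(4, 80, 4)))     # ~170 host IOPS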

What SAN are you using?

Lars
0
 
LVL 1

Author Comment

by:jjoz
ID: 24362261
Lars,

I'm using a Dell PowerVault MD3000i iSCSI array with dual controllers.

see the following simulator result.

untitled.JPG
0
 
LVL 18

Assisted Solution

by:larstr
larstr earned 290 total points
ID: 24362483
If I were you I would run iometer and compare the results with what other MD3000i users have gotten here: http://communities.vmware.com/thread/73745?start=315&tstart=0 (Note that there are tests of both RAID5 and RAID10 on MD3000i there)

Then you will know how your config compares to others and if you have done something wrong.

Keep in mind that local storage has much lower latency than iSCSI storage because iSCSI is transferred over Ethernet.

Lars
0
 
LVL 1

Author Comment

by:jjoz
ID: 24362519
yes,

here it is the result.

Cheers.

VMWS01.jpg
0
 
LVL 18

Assisted Solution

by:larstr
larstr earned 290 total points
ID: 24362577
I'm sorry, but that screenshot doesn't really say much.

1. Use the config file described in the first posting of that thread or download this 1MB ISO file:
 ftp://ftp.eurodatasystems.com/Perftest.iso   (User: customer  Password: customer)

2. Running through the test will take ~20 minutes and it generates a .csv file.

3. Open the .csv file in your favorite spreadsheet application and locate the desired numbers.

Lars

0
 
LVL 21

Assisted Solution

by:robocat
robocat earned 110 total points
ID: 24372235


I did some research on the MD3000i; it's possible that you're hitting the limits of this system.

To optimize latency:

- check that the write-back cache in the controller is enabled
- make sure the array is running the latest firmware
- make sure your ESX has all the latest patches

- experiment with a RAID10 disk group consisting of 8x300GB or even 10x300GB. Performance will increase as the number of disks increases. RAID5 is a performance killer.

- check the vmware website for performance whitepapers on iscsi.
0
 
LVL 1

Author Comment

by:jjoz
ID: 24373413
OK, thanks for your suggestion, Robocat. However, out of deep desperation and stress, I was thinking of redesigning the network all over again from scratch: instead of having a different subnet for each cable, would it be better to perform trunking directly from the ESXi servers into the SAN, using 2x 1Gb Ethernet cables to boost data performance for the VMs, without any switch between the SAN and the servers?

And also make the disks RAID-10, consisting of many smaller LUNs?
0
 
LVL 21

Assisted Solution

by:robocat
robocat earned 110 total points
ID: 24382681

Your network design is fine, no need to change it.

I repeat: you shouldn't concentrate on the network. You're using only a few percent of its capacity, so performance gains are unlikely to be found there.

Try experimenting with the storage system as indicated above, to get the maximum out of it.
0
 
LVL 1

Author Comment

by:jjoz
ID: 24382804
For your info, I'll share my hard-to-believe experience configuring this iSCSI SAN here:

The MD3000i is just a small entry-level SAN device that can only use one single cable (path) to access an iSCSI target, so no matter how complex the configuration is, the I/O performance will not improve even by adding a managed switch to perform VLAN trunking.

http://virtualgeek.typepad.com/virtual_gee...ing-vmware.html --> the last question (#4) is the eye opener

So, using the deployment diagram that I supplied above, I have to accept that it is not possible to achieve performance greater than a single cable connection :-|

Hope that helps you in the future.

I feel bad after spending this much money without getting much better performance than my local server's RAID-5 SATA drives :-|
0
 
LVL 1

Author Comment

by:jjoz
ID: 24393727
I've used the Microsoft iSCSI initiator inside the VM to access another LUN, but it seems it is still slow.
0
 
LVL 18

Assisted Solution

by:larstr
larstr earned 290 total points
ID: 24393815
Did you try running that iometer test inside your VM to see how your performance compares to the others?
0
 
LVL 1

Author Comment

by:jjoz
ID: 24394674
I used this instead:
http://www.roadkil.net/program.php?ProgramID=13

In terms of random access the iSCSI SAN won, but linear (sequential) access is still slow.
I'm now beginning to think that the MD3000i was designed to utilize only a single cable connection when deployed without VLAN trunking (a managed switch).

vm-LocalSATA.JPG
vm-SANSAS.JPG
0
 
LVL 18

Assisted Solution

by:larstr
larstr earned 290 total points
ID: 24395364
Well... random IO is what you want. It's much more important than sequential IO.
0
 
LVL 1

Author Comment

by:jjoz
ID: 24395413
Yes, that's right.

It seems that even though I configure two cables on the same subnet and connect them to the same vSwitch, only one link is used, which still gives me low performance. (-_-)"
0
 
LVL 21

Assisted Solution

by:robocat
robocat earned 110 total points
ID: 24401979

>It seems that even though I configure two cables on the same subnet and connect them to the same vSwitch, only one link is used, which still gives me low performance.

You are still thinking that a single Gbit Ethernet link is giving you low performance. In a real-world environment, with a typical VMware deployment, you will never exceed the capacity of a single Gbit connection.

Typical VMware environments have mostly random I/O with a block size of 4k to 32k. That's the top half of your speed test, in the random column.

We run >50 virtual servers on an ESX cluster and have an average combined I/O throughput that rarely exceeds 50Mbit/s, that's 0.05Gbit/s. Only if we needed to run more than 200 VMs in our environment would we ever exceed the capacity of a single Gbit connection.

Unless you are talking about environments that big, you should NOT worry about the network design, trunking, using two cables, etc. A rough illustration of the arithmetic follows below.
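A quick Python sketch of why random IO rarely fills a GbE link (the IOPS figure and block size are illustrative assumptions, not measurements from this environment):

iops = 2000              # assumed aggregate random IOPS across all VMs
block = 8 * 1024         # 8 KB, mid-range of the 4-32 KB block sizes mentioned above
mb_per_sec = iops * block / 1e6
print(f"{mb_per_sec:.0f} MB/s = {mb_per_sec * 8:.0f} Mbit/s of a 1000 Mbit/s link")
# -> about 16 MB/s, roughly 130 Mbit/s: well under a single GbE link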
 
0
 
LVL 1

Author Closing Comment

by:jjoz
ID: 31582208
Thanks guys, your explanations really do make a difference to my understanding and future career.

Cheers.
0
