
SOFS horrendous performance

Last Modified: 2019-10-01
I believe I am having performance issues with the Scale-Out File Server (SOFS) that is running on our domain.

We are using Windows Server 2016, but we aren't using Storage Spaces Direct.

Our setup has 2 hosts, Storage1 and Storage2. Each machine has 3 networks: Domain, Cluster and Storage, each used for the purpose it is named after.

Each host is connected to all 3 JBODs twice, with MPIO installed on both machines.

Virtual machines are running without much of an issue. However, I am trying to diagnose another problem with DFSR, and I think it is related to the speed of the storage.

Each JBOD has 16 1GB HDDs and 5 SSDs (one has 6). The virtual disk has a column count of 8 and 2 data copies.

When I first set this up, I was getting great speeds using the command:

diskspd.exe -b64K -d10 -h -L -o32 -w30 -t2 -c2G io1.dat io2.dat io3.dat io4.dat io5.dat

thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      5158862848 |        78718 |     491.24 |    7859.87 |    4.070 |    26.885 | io1.dat (2048MB)
     1 |      5099421696 |        77811 |     485.58 |    7769.31 |    4.116 |    31.502 | io1.dat (2048MB)
     2 |      3890741248 |        59368 |     370.49 |    5927.81 |    5.394 |     9.878 | io2.dat (2048MB)
     3 |      7534411776 |       114966 |     717.45 |   11479.18 |    2.786 |     2.321 | io2.dat (2048MB)
     4 |      6189678592 |        94447 |     589.40 |    9430.39 |    3.391 |     7.073 | io3.dat (2048MB)
     5 |      6811746304 |       103939 |     648.63 |   10378.15 |    3.081 |     2.714 | io3.dat (2048MB)
     6 |      3873636352 |        59107 |     368.86 |    5901.75 |    5.418 |    10.154 | io4.dat (2048MB)
     7 |      5721554944 |        87304 |     544.82 |    8717.17 |    3.669 |    11.433 | io4.dat (2048MB)
     8 |      6418595840 |        97940 |     611.20 |    9779.16 |    3.271 |     4.274 | io5.dat (2048MB)
     9 |      6093799424 |        92984 |     580.27 |    9284.31 |    3.450 |     6.195 | io5.dat (2048MB)
-----------------------------------------------------------------------------------------------------
total:       56792449024 |       866584 |    5407.94 |   86527.11 |    3.697 |    13.982
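
As a quick sanity check on the baseline above, the per-thread rows can be summed and compared with the total row, and the average I/O size can be derived from bytes divided by I/Os. This is only a minimal sketch: the figures are copied from the table, while the names THREADS, TOTAL, summarize and io_size_kib are made up for illustration.

```python
# Per-thread rows copied from the DiskSpd table above: (bytes, I/Os, MB/s).
THREADS = [
    (5158862848, 78718, 491.24),
    (5099421696, 77811, 485.58),
    (3890741248, 59368, 370.49),
    (7534411776, 114966, 717.45),
    (6189678592, 94447, 589.40),
    (6811746304, 103939, 648.63),
    (3873636352, 59107, 368.86),
    (5721554944, 87304, 544.82),
    (6418595840, 97940, 611.20),
    (6093799424, 92984, 580.27),
]

# The "total:" row from the same table: (bytes, I/Os, MB/s).
TOTAL = (56792449024, 866584, 5407.94)


def summarize(rows):
    """Sum bytes, I/Os and MB/s across the per-thread rows."""
    total_bytes = sum(r[0] for r in rows)
    total_ios = sum(r[1] for r in rows)
    total_mbps = sum(r[2] for r in rows)
    return total_bytes, total_ios, total_mbps


def io_size_kib(total_bytes, total_ios):
    """Average size of one I/O in KiB."""
    return total_bytes / total_ios / 1024


if __name__ == "__main__":
    b, ios, mbps = summarize(THREADS)
    print("bytes match total:", b == TOTAL[0])
    print("I/Os match total:", ios == TOTAL[1])
    print("MB/s sum:", round(mbps, 2))
    print("KiB per I/O:", io_size_kib(b, ios))
```

The derived I/O size should line up with the -b64K switch in the command, which is a cheap way to confirm that a pasted result really belongs to the command it is quoted with.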

Now I have been looking into why one of my file servers was running slowly. To try to fix it, I pinned it directly to the SSD tier. Users are still reporting slowness, so I decided to run the tests again on both the pinned and unpinned VHDXs running on the same server.

On the pinned VHDX I got the following, which is slower than a single 7.2k HDD:

             bytes       |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev
total:         775946240 |         2960 |      12.33 |      49.33 | 3571.875 |  4524.332

On the unpinned VHDX I got the following, which is better, but still pathetically slow for the hardware in use:

             bytes       |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev
total:        5881462784 |        22436 |      93.48 |     373.93 |  476.933 |  2971.906
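
Before comparing these totals with the original baseline, it may be worth checking that the runs used the same parameters. The sketch below derives the average I/O size (bytes divided by I/Os) and the approximate run length (bytes divided by throughput) from each total row. It assumes the pasted figures are accurate and that DiskSpd's MB/s column is in units of 1024 x 1024 bytes, which is consistent with the 10-second baseline. The names RUNS and derive are hypothetical.

```python
MIB = 1024 * 1024  # assumption: DiskSpd "MB/s" means 1,048,576 bytes per second

# Total rows as posted: (bytes, I/Os, MB/s).
RUNS = {
    "baseline": (56792449024, 866584, 5407.94),
    "pinned": (775946240, 2960, 12.33),
    "unpinned": (5881462784, 22436, 93.48),
}


def derive(total_bytes, total_ios, mib_per_s):
    """Return (average KiB per I/O, approximate run length in whole seconds)."""
    io_kib = total_bytes / total_ios / 1024
    seconds = total_bytes / MIB / mib_per_s
    return io_kib, round(seconds)


if __name__ == "__main__":
    for name, row in RUNS.items():
        io_kib, seconds = derive(*row)
        print(f"{name}: {io_kib:.0f} KiB per I/O, about {seconds} s")
```

If the derived block size or duration differs between runs, the I/O-per-second figures are not directly comparable, although the MB/s figures still give a rough sense of the drop.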

Watching the storage server during this period, it only averages around 300 Mbps of network throughput, where it used to hit 2000+ Mbps.
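
DiskSpd reports throughput in megabytes per second while network counters are usually in megabits per second, so the two need converting before they can be lined up. A minimal sketch, assuming DiskSpd's MB/s is 1024 x 1024 bytes per second and that Mbps means 1,000,000 bits per second; the function name mib_s_to_mbps is made up for illustration.

```python
MIB = 1024 * 1024  # assumption: DiskSpd "MB/s" is 1,048,576 bytes per second


def mib_s_to_mbps(mib_per_s):
    """Convert MiB/s to megabits per second (1 Mbps = 1,000,000 bits/s)."""
    return mib_per_s * MIB * 8 / 1_000_000


if __name__ == "__main__":
    # Throughput figures taken from the pinned and unpinned total rows.
    for label, mib_per_s in (("pinned", 12.33), ("unpinned", 93.48)):
        print(f"{label}: {mib_s_to_mbps(mib_per_s):.1f} Mbps")
```

The result only describes the traffic generated by the test itself. The storage server's counter also includes whatever the other hosts are doing at the time, so the numbers will not match exactly.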

Is there anything I can do to investigate these issues? It is kind of grinding my domain to a halt.

CERTIFIED EXPERT
Distinguished Expert 2018

Commented:
A bit confused on your topology, or perhaps just not enough information.

You mention having "2 hosts" but don't go into too much detail. Questions off the top of my head:

1) How many compute nodes do you have?
2) How many storage nodes do you have? (I assume 2, since you named your "hosts" Storage1 and Storage2.)
3) How are your compute nodes connected to your storage nodes? (You only talked about storage to JBOD.)
4) How many CSVs have you defined?  If only 1, then you'll have some balancing issues with I/O and a storage node basically doing nothing or redirecting (performance hit.)

Author

Commented:
Hi Cliff, thanks for the fast reply!

By compute node, do you mean the hosts connecting to the storage nodes? If so, there are 20 hosts.

There are 2 storage nodes.

The compute nodes, if that means the hosts, are connected over a 10Gb network. Running speed tests shows the network is capable of much faster speeds than what I am currently getting.

I only have the one CSV. However, the storage node that hosts this CSV isn't being taxed at all: it averages 4% CPU and uses 10 of 192GB of RAM. I have tried pausing the second storage node while running speed tests to make sure there are no redirects etc., but I still get the same results.
Technical Architect - HA/Compute/Storage
CERTIFIED EXPERT
Commented:

Author

Commented:
I only have a 2-way mirror set up, not a 3-way.

I have run the command Get-MSDSMGlobalDefaultLoadBalancePolicy and it just replies with "None".
Philip Elder, Technical Architect - HA/Compute/Storage
CERTIFIED EXPERT
Commented: