CaptainGiblets

SOFS horrendous performance

I believe I am having some performance issues with SOFS that is running on our domain.

We are using Windows Server 2016, but we aren't using Storage Spaces Direct.

Our setup has 2 hosts, Storage1 and Storage2. Each machine has 3 networks: Domain, Cluster and Storage, each used for the purpose its name suggests.

Each host is connected to all 3 JBODs twice, with MPIO installed on both machines.

Virtual machines are running without much of an issue. However, I am trying to diagnose another problem with DFSR, and I think it is related to the speed of the storage.

Each JBOD has 16 1GB HDDs and 5 SSDs (one has 6). The virtual disk has a column count of 8 and a data copy count of 2.

When I first set this up I was getting great speeds using the command:

diskspd.exe -b64K -d10 -h -L -o32 -w30 -t2 -c2G io1.dat io2.dat io3.dat io4.dat io5.dat

thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      5158862848 |        78718 |     491.24 |    7859.87 |    4.070 |    26.885 | io1.dat (2048MB)
     1 |      5099421696 |        77811 |     485.58 |    7769.31 |    4.116 |    31.502 | io1.dat (2048MB)
     2 |      3890741248 |        59368 |     370.49 |    5927.81 |    5.394 |     9.878 | io2.dat (2048MB)
     3 |      7534411776 |       114966 |     717.45 |   11479.18 |    2.786 |     2.321 | io2.dat (2048MB)
     4 |      6189678592 |        94447 |     589.40 |    9430.39 |    3.391 |     7.073 | io3.dat (2048MB)
     5 |      6811746304 |       103939 |     648.63 |   10378.15 |    3.081 |     2.714 | io3.dat (2048MB)
     6 |      3873636352 |        59107 |     368.86 |    5901.75 |    5.418 |    10.154 | io4.dat (2048MB)
     7 |      5721554944 |        87304 |     544.82 |    8717.17 |    3.669 |    11.433 | io4.dat (2048MB)
     8 |      6418595840 |        97940 |     611.20 |    9779.16 |    3.271 |     4.274 | io5.dat (2048MB)
     9 |      6093799424 |        92984 |     580.27 |    9284.31 |    3.450 |     6.195 | io5.dat (2048MB)
-----------------------------------------------------------------------------------------------------
total:       56792449024 |       866584 |    5407.94 |   86527.11 |    3.697 |    13.982
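
A quick sanity check on the total row above, as a minimal Python sketch. It assumes the ten rows mean ten threads (two per file from -t2 across five files), each with 32 outstanding I/Os from -o32. The variable names are illustrative; the figures are copied straight from the output.

```python
# Figures copied from the "total" row of the diskspd output above.
total_bytes = 56_792_449_024
total_ios = 866_584
iops = 86_527.11
avg_latency_ms = 3.697

block_size = total_bytes / total_ios       # bytes per I/O
mib_per_s = iops * block_size / 2**20      # throughput implied by IOPS
in_flight = iops * avg_latency_ms / 1000   # Little's law: L = lambda * W

print(f"block size : {block_size / 1024:.0f} KiB")
print(f"throughput : {mib_per_s:.2f} MiB/s")
print(f"in flight  : {in_flight:.1f} I/Os")
```

The implied block size matches -b64K, the throughput matches the MB/s column, and the in-flight count lands close to 10 x 32 = 320, so the original run looks internally consistent.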

I have since been looking into why one of my file servers was running slowly. To try to fix it, I pinned it directly to the SSD tier. Users are still reporting slowness, so I decided to run the tests again on both the pinned and unpinned VHDXs running on the same server.

On the pinned VHDX I was getting total:         775946240 |         2960 |      12.33 |      49.33 | 3571.875 |  4524.332 (bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev), which is slower than a single 7.2k HDD.

On the unpinned VHDX I got total:        5881462784 |        22436 |      93.48 |     373.93 |  476.933 |  2971.906 (same columns), which is better, but still pathetically slow for the hardware in use.
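
One thing worth checking before comparing these totals with the original run: dividing bytes by I/Os gives the block size each test actually used. The sketch below does that for both re-tests and compares MB/s against the 5407.94 MB/s total from the first run. The dictionary layout and names are illustrative; the numbers are the ones quoted above.

```python
# (bytes, I/Os, MB/s) from the two "total" lines quoted above.
runs = {
    "pinned":   (775_946_240, 2_960, 12.33),
    "unpinned": (5_881_462_784, 22_436, 93.48),
}
baseline_mb_s = 5407.94  # total MB/s from the original run

results = {}
for name, (nbytes, ios, mb_s) in runs.items():
    block_kib = nbytes / ios / 1024      # block size implied by the totals
    slowdown = baseline_mb_s / mb_s      # how many times slower than baseline
    results[name] = (block_kib, slowdown)
    print(f"{name:9s} block = {block_kib:.0f} KiB, "
          f"{slowdown:.0f}x slower than baseline")
```

Both re-tests work out to 256 KiB per I/O rather than the 64 KiB of the original command, which suggests they were run with a different -b value (the exact command isn't shown). That makes the I/O per s columns hard to compare directly; MB/s is the fairer comparison, and it is still far below the first run.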

Watching the storage server during this period, it only averages around 300 Mbps network throughput, where it used to hit 2000+
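
Since the network figures are in megabits and the diskspd figures are in megabytes, a small conversion helps put them side by side. This is only a sketch, assuming "Mbps" means decimal megabits per second; the helper name is made up for illustration.

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8

current = mbps_to_mb_per_s(300)     # what the storage server averages now
previous = mbps_to_mb_per_s(2000)   # what it used to reach

print(f"now: {current:.1f} MB/s, before: {previous:.1f} MB/s")
```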

Is there anything I can do to investigate these issues? It is kind of grinding my domain to a halt.
Tags: Storage, Windows OS, Windows 10, Azure, Windows Server 2016

Last comment: Philip Elder

8/22/2022 - Mon
Cliff Galiher

A bit confused on your topology, or perhaps just not enough information.

You mention having "2 hosts" but don't go into too much detail. Questions off the top of my head:

1) How many compute nodes do you have?
2) How many storage nodes do you have? (I assume 2 since you named your "hosts" storage1 and storage2)
3) How are your compute nodes connected to your storage nodes (you only talked about storage to JBOD)
4) How many CSVs have you defined?  If only 1, then you'll have some balancing issues with I/O and a storage node basically doing nothing or redirecting (performance hit.)
CaptainGiblets

ASKER
Hi Cliff, thanks for the fast reply!

By compute node, do you mean the hosts connecting to the storage nodes? If so, there are 20 hosts.

There are 2 storage nodes.

The compute nodes, if you mean the hosts, are connected over a 10Gb network. Running speed tests shows the network is capable of much faster speeds than what I am currently getting.

I only have the one CSV. However, the storage node that hosts this CSV isn't being taxed at all, averaging 4% CPU and 10 of 192GB of RAM. I have tried pausing the second storage node while running speed tests to make sure there are no redirects etc., but I still get the same results.
ASKER CERTIFIED SOLUTION
Philip Elder

Log in or sign up to see answer
CaptainGiblets

ASKER
I only have a 2-way mirror set up, not a 3-way.

I have run the command Get-MSDSMGlobalDefaultLoadBalancePolicy and it just replies with "None".