I believe I am having performance issues with a Scale-Out File Server (SOFS) running on our domain.
We are on Windows Server 2016, but we are not using Storage Spaces Direct.
Our setup has 2 hosts, Storage1 and Storage2. Each machine has 3 networks: Domain, Cluster, and Storage, each used for the traffic it is named after.
Each host is connected to all 3 JBODs twice, with MPIO installed on both machines.
Virtual machines are running without much issue; however, I am trying to diagnose another problem with DFSR, and I think it is related to the speed of the storage.
Each JBOD has 16 1GB HDDs and 5 SSDs (one has 6). The virtual disk has a column count of 8 and 2 data copies (a two-way mirror).
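For context on why that layout matters, the column count and copy count determine how many physical disks every full-stripe write touches. A minimal sketch of the arithmetic, using the simple columns-times-copies model (actual Storage Spaces allocation also considers enclosure awareness and interleave, which this ignores):

```python
# Sketch of the Storage Spaces footprint math (assumption: plain
# columns-times-copies model, ignoring enclosure awareness).

def disks_per_stripe(columns: int, data_copies: int) -> int:
    """Physical disks touched by one full-stripe write on a mirrored space."""
    return columns * data_copies

# The virtual disk described above: 8 columns, 2 data copies
print(disks_per_stripe(8, 2))   # 16 disks per stripe

# SSDs available across the three JBODs: 5 + 5 + 6
ssd_count = 5 + 5 + 6
print(ssd_count)                # 16 -- the SSD tier exactly covers one
                                # stripe, with zero headroom
```

If that model holds, anything pinned to the SSD tier needs all 16 SSDs healthy and evenly reachable, since every stripe spans every SSD; a single degraded or poorly connected SSD drags down every I/O.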
When I first set this up I was getting great speeds using the command:
diskspd.exe -b64K -d10 -h -L -o32 -w30 -t2 -c2G io1.dat io2.dat io3.dat io4.dat io5.dat
(64 KiB blocks, 10-second duration, caching disabled, latency stats, 32 outstanding I/Os, 30% writes, 2 threads per file, 2 GiB test files.)
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 5158862848 | 78718 | 491.24 | 7859.87 | 4.070 | 26.885 | io1.dat (2048MB)
1 | 5099421696 | 77811 | 485.58 | 7769.31 | 4.116 | 31.502 | io1.dat (2048MB)
2 | 3890741248 | 59368 | 370.49 | 5927.81 | 5.394 | 9.878 | io2.dat (2048MB)
3 | 7534411776 | 114966 | 717.45 | 11479.18 | 2.786 | 2.321 | io2.dat (2048MB)
4 | 6189678592 | 94447 | 589.40 | 9430.39 | 3.391 | 7.073 | io3.dat (2048MB)
5 | 6811746304 | 103939 | 648.63 | 10378.15 | 3.081 | 2.714 | io3.dat (2048MB)
6 | 3873636352 | 59107 | 368.86 | 5901.75 | 5.418 | 10.154 | io4.dat (2048MB)
7 | 5721554944 | 87304 | 544.82 | 8717.17 | 3.669 | 11.433 | io4.dat (2048MB)
8 | 6418595840 | 97940 | 611.20 | 9779.16 | 3.271 | 4.274 | io5.dat (2048MB)
9 | 6093799424 | 92984 | 580.27 | 9284.31 | 3.450 | 6.195 | io5.dat (2048MB)
-----------------------------------------------------------------------------------------------------
total: 56792449024 | 866584 | 5407.94 | 86527.11 | 3.697 | 13.982
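As a sanity check, the columns in a diskspd summary are internally consistent: total bytes divided by the I/O count gives back the -b64K block size, and MB/s and IOPS agree with each other. A small parser sketch against the total row above (field order assumed from the table header):

```python
# Parse the "total" row of the diskspd summary shown above and
# cross-check the columns against each other.

total_line = "total: 56792449024 | 866584 | 5407.94 | 86527.11 | 3.697 | 13.982"

fields = [f.strip() for f in total_line.split(":", 1)[1].split("|")]
total_bytes, ios = int(fields[0]), int(fields[1])
mb_per_s, iops = float(fields[2]), float(fields[3])

# Every I/O is exactly the -b64K block size
print(total_bytes // ios)           # 65536 bytes = 64 KiB

# MB/s and IOPS agree: MB/s * (1 MiB / 64 KiB) matches the IOPS column
print(round(mb_per_s * 1024 / 64))  # 86527
```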
Now I have been looking into why one of my file servers was going slow. To try to fix it, I pinned its VHDX directly to the SSD tier. Users are still reporting slowness, so I decided to run the tests again on both the pinned and unpinned VHDXs running on the same server.
On the pinned VHDX I was receiving total: 775946240 | 2960 | 12.33 | 49.33 | 3571.875 | 4524.332, which is slower than a single 7.2k HDD.
On the unpinned one I got total: 5881462784 | 22436 | 93.48 | 373.93 | 476.933 | 2971.906, which is better, but still pathetically slow for the hardware in use.
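To put the regression in perspective, here is the same arithmetic on the total rows quoted above, using the baseline figures from the first run (MB/s and IOPS columns, same units as the table):

```python
# Rough degradation factors, taken from the "total" rows quoted above.

baseline_iops = 86527.11   # original run
pinned_iops   = 49.33      # SSD-pinned VHDX
unpinned_iops = 373.93     # unpinned VHDX

print(round(baseline_iops / pinned_iops))    # 1754 (x slower when pinned)
print(round(baseline_iops / unpinned_iops))  # 231  (x slower unpinned)

# Average latency moved the same way: 3.697 ms -> 3571.875 ms on the
# pinned disk, i.e. several seconds of queueing per 64 KiB I/O at depth 32.
print(round(3571.875 / 3.697))               # 966 (x higher latency)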
Watching the storage server during this period, it only averages around 300 Mbps of network throughput, where it used to hit 2000+.
Is there anything I can do to investigate these issues? It is grinding my domain to a halt.
You mention having "2 hosts" but don't go into much detail. Questions off the top of my head:
1) How many compute nodes do you have?
2) How many storage nodes do you have? (I assume 2 since you named your "hosts" storage1 and storage2)
3) How are your compute nodes connected to your storage nodes? (You only described the storage-to-JBOD connections.)
4) How many CSVs have you defined? If only 1, then you'll have some I/O balancing issues, with one storage node basically doing nothing or redirecting (a performance hit).