Hi, I have a weird issue with performance on Storage Spaces Direct.
I have a blade server that connects over 10Gb (not RDMA, although I am about to upgrade to RDMA switches and NICs) to my SOFS cluster, which is running 2016 S2D.
I have 2 nodes, each with 2x 1.6TB NVMe drives (Samsung PM1725) for cache, 4x SSDs for performance (Samsung MZILS1T6HEJH0D3), and some 1TB disks (Seagate ST91000640SS) for capacity.
Two VMs on this cluster get very different storage speeds, and I am at a loss as to what could cause this. They are on the same CSV, on the same host, and use the same virtual switch.
Any help with what could be causing this is greatly appreciated! Storage Spaces is the bane of my life.
Philip Elder
What is the setup for each VM? Are they identical?
Any StorageQoS policies applied to one but not the other?
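If not, it might be worth comparing what each VM's storage flow is actually doing. A rough sketch, assuming Storage QoS is reporting flows on the S2D/SOFS cluster (property names from memory, so worth verifying on your build):

Get-StorageQosFlow |
    Sort-Object InitiatorName |
    Format-Table InitiatorName, InitiatorNodeName, StorageNodeIOPs, Status, FilePath -AutoSize

That should show whether the slow VM's flow is being throttled or simply isn't pushing IOPS.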
CaptainGiblets
ASKER
Hi Philip, thanks for responding!
Each VM is identical apart from RAM and processor; the guest running slower actually has more RAM and CPU assigned. All files for the VMs are stored on the same CSV, including the config files.
Both are running at configuration version 8 on Server 2016. There are no IOPS limits set under the VHDX settings.
There are no QoS policies applied on the S2D cluster other than the default one.
PS C:\S2D Scripts> Get-StorageQosPolicy
Name    MinimumIops MaximumIops MaximumIOBandwidth Status
----    ----------- ----------- ------------------ ------
Default 0           0           0 MB/s             Ok
Philip Elder
Okay. Flip the slower VM's settings over to match the better-performing one, vCPU and vRAM wise. Then run those tests again.
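Something along these lines, with placeholder VM names (the vCPU count change needs the VM powered off):

Stop-VM -Name 'SlowVM'
Set-VMProcessor -VMName 'SlowVM' -Count 4
Set-VMMemory -VMName 'SlowVM' -StartupBytes 16GB
Start-VM -Name 'SlowVM'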
CaptainGiblets
ASKER
I have changed everything to match the VM that is performing well; however, there is still the same gulf between the speeds.
I can copy a file to the C drive of the fast server and it flies through at around 600-800MB/s; however, when I copy to the slow machine it struggles to even hit 100MB/s. It drops to 0, pauses for a few seconds, and then resumes, and it will do this several times over the course of a copy.
Philip Elder
Do the VHDX file(s) share the same location? Can the slow VM's VHDX file(s) be moved to the same location/CSV as the fast one?
When the virtual disk was created, was the -WriteCacheSize parameter used with New-VirtualDisk?
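For reference, a minimal sketch of what that looks like when carving the virtual disk out of the pool (pool/volume names and sizes here are placeholders, not your actual values):

New-VirtualDisk -StoragePoolFriendlyName 'S2D Pool' `
    -FriendlyName 'CSV01' `
    -ResiliencySettingName Mirror `
    -Size 2TB `
    -WriteCacheSize 8GB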
CaptainGiblets
ASKER
I didn't use -WriteCacheSize, as none of the guides mention it and I have NVMe cache as well. If I should be using it, I have space to move everything to one CSV, recreate the other, and then do the same with the second.
Philip Elder
The -WriteCacheSize switch sets aside physical RAM on the owner host for caching. It speeds things up depending on what is being hosted in the CSV. The catch is that it leaves less RAM available for virtual machines.
CaptainGiblets
ASKER
I am going to try this today. I don't host VMs on my S2D servers at the moment, and they have around 192GB of RAM, so assigning around 40GB to both CSVs shouldn't be an issue at all.
The only other thing I thought it could be is something to do with power-loss mode, as I had to set IsPowerProtected to true manually on the storage pool. I also had to set my disks not to use write-back cache, as they do support power-loss protection.
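For what it's worth, this is how I check the pool setting (IsPowerProtected is just a property on the pool object):

Get-StoragePool -IsPrimordial $false | Select-Object FriendlyName, IsPowerProtected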
Philip Elder
Note: I tried to set up a CSV on a newly stood-up Server 2019 shared SAS cluster using the -WriteCacheSize switch and kept getting a "There's not enough room in the pool to create the virtual disk" error. I ended up creating the CSV without that switch to move the project forward.
CaptainGiblets
ASKER
I have new 40Gb switches and NICs that are RDMA compatible (Mellanox InfiniBand) being delivered today, so I am going to put them in tonight and see if that makes any difference.
It is still very weird that machines on the same host and same CSV have different speeds, though.
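Once the new NICs are in, I'm planning to confirm SMB Direct/RDMA is actually being used with something like this (cmdlets from memory, so worth double-checking):

Get-NetAdapterRdma | Format-Table Name, Enabled
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable
# During a copy, check that the connections actually negotiated multichannel/RDMA:
Get-SmbMultichannelConnection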
Philip Elder
We planned to get IB going with SOFS and North-South fabric back in the day but settled on Mellanox based RoCE for RDMA instead. I don't remember the "why". :S
CaptainGiblets
ASKER
It's early testing yet, but speeds with the 40Gb adapters in seem to be miles better than through the 10Gb NICs that didn't support RDMA.
When running diskspd I get a constant 4GB flowing through one port.
I'm going to extend this to a few more machines to see if it works properly before migrating my file server to the new switches, and then we will see what sort of impact it has!
Something I have noticed is that under Disk Properties > Policies, write caching is disabled for my SSDs and NVMe drives despite them having PLP. Should this be enabled, or does S2D disable it and handle caching on its own? (Quick per-disk check below.) I did run:
Set-StoragePool ClusterPool -IsPowerProtected $True
as it was showing as false by default.
Along with this, my MPIO policy is currently set to Round Robin; however, tonight I will set it to Least Blocks, as from what I have read that is what's recommended.
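This is the per-disk check I mean (my understanding is that Get-StorageAdvancedProperty reports both the device cache state and PLP for each disk; worth verifying):

Get-PhysicalDisk | Get-StorageAdvancedProperty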
Philip Elder
Least Block Depth for MPIO in shared SAS Storage Spaces is the way to go.
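From memory, the global default can be set from the MPIO PowerShell module; LB is the Least Blocks policy (double-check the valid values on your build):

Get-MSDSMGlobalDefaultLoadBalancePolicy
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy LB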
CaptainGiblets
ASKER
I'm going to take back the comment about it working better on the 40Gb network. Running this command: diskspd.exe -t32 -b4k -r4k -o8 -w30 -d10 -D -L testfile.dat (switch breakdown below), I am now getting around 21MB/s and 5,499 IOPS. Yet on my desktop with a Samsung 850 EVO I get 287.33MB/s and 73,557.45 IOPS.
Unless this setup just hates using a JBOD, I don't know what else could be causing the issues.
I'm going to have to start looking at alternative solutions.
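For anyone comparing numbers, this is my reading of the diskspd switches I used (worth double-checking against the diskspd documentation):

diskspd.exe -t32 -b4k -r4k -o8 -w30 -d10 -D -L testfile.dat
# -t32 : 32 threads per target file
# -b4k : 4KiB block size
# -r4k : random I/O aligned to 4KiB
# -o8  : 8 outstanding I/Os per thread
# -w30 : 30% writes / 70% reads
# -d10 : 10 second test duration
# -D   : capture per-interval IOPS statistics (standard deviation)
# -L   : measure latency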