CIFS Performance Expectations Too High?

I have a Windows 2003 server connecting via CIFS to a Sun S7310 Unified Storage System.  I am trying to tune the server for optimal throughput over CIFS to this new storage system.  The server is also directly connected to an EMC Clariion array via 2Gb FC.  When copying 2GB files from the FC array to the S7310, I can't seem to achieve speeds better than 54MB/s.

The server has an Intel Pro 1000 MT NIC, connected to a Cisco 3750 switch using CAT6 cable. This switch comprises a segregated storage network.  None of this traffic is touching our production LAN in any way, so there's no chance of any broadcast traffic from other servers coming into play. I am using jumbo frames and have enabled all of the offload features of the NIC.  I have also made the following registry modifications to enable "high performance networking" as recommended by Microsoft:

MaxHashTableSize    65536
Tcp1323Opts         1
TcpAckFrequency     13
TcpWindowSize       131400

I have also installed the Windows 2003 Resource Kit so I could use the CREATEFIL.EXE utility to rule out possible performance problems with the FC array.  When using this tool to create a 2GB file directly on the S7310, I only get about 15MB/s.

When performing similar tests from a Solaris 10 server connected to the S7310 via NFS, I get fantastic throughput: 120MB/s over a single Gb NIC.

Can anyone think of anything else I can "tune" on this server to improve performance?  Also, the server is not being utilized in production right now, so there's nothing else running on it to create any load that would potentially reduce performance.  Perhaps my expectations of CIFS are just too high?
First of all, good job on the homework.  Most people don't try anything at all to tune things.  :-)

Your numbers are about right for CIFS.  CIFS is really old (20+ years) and was designed around 64KByte buffers.  Anywhere that is insufficient (like GigE networks), there are performance problems.

That is the primary reason for SMB2 and SMB2.1 in Vista/Windows7 and Server 2008. They also have VASTLY improved TCP/IP stacks.

You can set your window size even higher, to say 1-4 MBytes, and that is about all you can fill on a 1Gbps link.  Latency * Bandwidth is the formula.  A max latency of say 10msec is quite possible between hosts, so 125MBytes/sec x 0.010 seconds = 1.25MBytes.
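The bandwidth-delay arithmetic is easy to check. A minimal sketch (Python purely for the arithmetic; the 10 ms figure is the worst-case latency assumed above):

```python
# Bandwidth-delay product: the TCP window needed to keep a link full.
# Figures follow the comment above: 1 Gbps link, 10 ms worst-case latency.

link_bps = 1_000_000_000                       # 1 Gbps
latency_s = 0.010                              # 10 ms latency (assumed worst case)

bandwidth_bytes_per_s = link_bps / 8           # 125,000,000 bytes/s = 125 MB/s
bdp_bytes = bandwidth_bytes_per_s * latency_s  # window needed to fill the pipe

print(f"{bandwidth_bytes_per_s / 1e6:.0f} MB/s x {latency_s * 1000:.0f} ms "
      f"= {bdp_bytes / 1e6:.2f} MB window")    # 125 MB/s x 10 ms = 1.25 MB

# The 131,400-byte TcpWindowSize from the question is an order of magnitude short:
print(f"configured window covers only {131_400 / bdp_bytes:.0%} of the pipe")
```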

To prove this you can bump the TCPWindowSize up and try an FTP in each direction to the same server.  That will eliminate the CIFS protocol but stress everything else and see what is possible.

The numbers you get from this test should give you a better idea of your peak performance.
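If FTP isn't convenient, a raw TCP push gives the same CIFS-free baseline. A rough sketch (Python for illustration, run here over loopback; in practice you would point the client at the other host):

```python
import socket
import threading
import time

PAYLOAD = b"\x00" * 65536          # 64 KB chunks, the size SMB1 is stuck with
TOTAL_BYTES = 64 * 1024 * 1024     # move 64 MB for the test

def sink(server_sock, counter):
    """Accept one connection and drain it, counting bytes received."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1 << 20)
            if not data:
                break
            counter["received"] += len(data)

def throughput_test(host="127.0.0.1"):
    """Push TOTAL_BYTES over a TCP socket and report (bytes, MB/s)."""
    counter = {"received": 0}
    srv = socket.socket()
    srv.bind((host, 0))            # ephemeral port
    srv.listen(1)
    t = threading.Thread(target=sink, args=(srv, counter))
    t.start()

    start = time.perf_counter()
    with socket.create_connection(srv.getsockname()) as c:
        sent = 0
        while sent < TOTAL_BYTES:
            c.sendall(PAYLOAD)
            sent += len(PAYLOAD)
    t.join()
    srv.close()
    elapsed = time.perf_counter() - start
    return counter["received"], counter["received"] / elapsed / 1e6

received, mb_per_s = throughput_test()
print(f"moved {received} bytes at {mb_per_s:.0f} MB/s")
```

Loopback numbers will be far above what the wire can do; the point is comparing the same test against the real host with CIFS out of the path.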

10GE will help performance some, but use fiber or CX-4 not UTP.  It isn't the pipe size that helps as you are not filling it, but 10GE offers a bit lower latency.  The UTP PHY has much higher latency.

Anything you can do to lower latency will increase the rate of the 64KB chunks you are moving with SMB.  A faster CPU is better than more CPUs.  Look at enabling RSS in 2003 SP2.  It helps spread out the interrupt servicing.  Use really good server NICs.  Intel Server NICs are about the best IMO.  Low latency, good offloads.  Do not use the Chimney TCP offload in 2003.

Consider multiple threads for moving the data.  It hides some of the weakness in SMB.  SMB2 will fix it, if you can use it, along with CTCP.  You will likely need to bump the TCP window on Solaris as well and enable RFC 1323 for the same reasons.

There is a HotFix for Windows Server 2003 that implements the better TCP/IP congestion behavior (CTCP).

Sometimes turning off the Nagle algorithm on both sides helps, though not very often.  You can also adjust the number of packets sent in the first sequence on the Windows side, but CTCP is likely better.

What is Nagle?

The real answer is SMB2 on both sides...
Samba4 has SMB2 support, but I am not sure of its availability on Solaris 10.  You might check with Sun.

I think you also have the TcpWindowSize too small for the ACK frequency x jumbo frames combination.  MS says to ACK every 1/2 to 1/3 of the window, and your jumbo frames push the number of MTUs per window down to ~14 or so.  You are not ACKing until your window is exhausted (13).  Another vote for a 1MB-4MB window.  However, I doubt CIFS will really use it.  It only moves 64KB at a time.
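The mismatch is easy to see numerically. A sketch using the registry values from the question, assuming a 9000-byte jumbo MTU (the exact jumbo size isn't stated):

```python
# Registry values from the question, plus an assumed 9000-byte jumbo MTU.
tcp_window = 131_400
ack_frequency = 13
mtu = 9000                  # jumbo frame size (assumed)
mss = mtu - 40              # subtract 20-byte IP + 20-byte TCP headers

segments_per_window = tcp_window / mss
print(f"{segments_per_window:.1f} full segments fit in the window")   # ~14.7

# Microsoft's guidance: ACK every 1/2 to 1/3 of the segments in a window.
low, high = segments_per_window / 3, segments_per_window / 2
print(f"suggested TcpAckFrequency: {low:.0f}-{high:.0f}")
print(f"configured TcpAckFrequency: {ack_frequency} -> the window is nearly "
      f"exhausted before an ACK goes out")
```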
There are a lot of things this could be. But, let's consider what part of the OSI layers we will be troubleshooting.

Since the network isn't really the issue, maybe it's an application-layer, presentation-layer, or session-layer problem.

It could also be something like ECC memory. Any kind of error correction will slow down traffic.

Any kind of presentation-layer work, like zipping and unzipping the data, will slow down traffic.

Applications pointing in the wrong direction can slow down traffic.

A busy NIC will slow down traffic.

--First thing I would do is throw Wireshark on there and see how chatty your NIC is.

--Also see if you have some sort of session-layer connections, like IPsec.

--Then I would also look at your NIC bindings and see which session-layer protocols you are running on top of what you really need. Say you are running Client Services for NetWare on a domain that doesn't need it.

I think I would also run dcdiag /v to see if there are any errors the server is trying to overcome.

m_mccabeAuthor Commented:
Thank you for the responses, and sorry for the delay in my response.  I have run dcdiag /v, and all tests were passed.

I am not using IPsec, and there are no unnecessary protocols running- Only TCP/IP, File and Printer Sharing, and Client for Microsoft Networks.  The network that the NIC is connected to is a dedicated storage network, and broadcast traffic is very low.

I booted the server into diagnostic mode and ran through a full hardware test below the OS level, which came up clean.  I have also disabled NetBIOS over TCP.

I still can't seem to find a way to improve performance on this system.  I wish I had 2003 R2 installed instead of 2003 standard, so I could test NFS performance from the same server, but I don't.

m_mccabeAuthor Commented:
Perhaps a more valid question would be this:

What kind of performance have others seen when connecting Windows servers to CIFS shares on storage appliances such as NetApp, Celerra, etc.?
m_mccabeAuthor Commented:
Thanks very much for the wealth of information Corey.  I have adjusted the TCPWindowSize again, and also installed the suggested hotfix.  Performance has improved a bit, but it's not a vast difference (about 59-60MB/s).  I will say that the traffic seems to be a bit more constant though.

These speeds are already overkill for our users' requirements, but I personally have issues with settling for anything but peak performance.

I think that I've probably pushed the CIFS performance on this particular system about as far as it will go at this point.  I haven't heard of anyone else out there getting anywhere near this performance anyway.

We'll go ahead and run with it as is, and thanks again for your sage advice!
Sure thing.

What performance numbers do you get for these same two hosts using FTP?  SMB2 will do dramatically better and you might need to tune the Sun Solaris side as well.  Traffic to the Sun would use the TCP Receive Window on that side.  That parameter is used for receiving traffic only.  If you use a program besides SMB, you will want to test larger buffers for both sides as well.  Larger buffers are the only way to fill the pipes if the turn-around for the acknowledgements is already tuned.
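For non-SMB tools, the per-socket buffers are what actually widen the window on each side. A sketch of requesting larger buffers (the standard SO_SNDBUF/SO_RCVBUF socket options; the OS may round or cap whatever you ask for):

```python
import socket

DESIRED = 4 * 1024 * 1024   # ask for 4 MB buffers, matching the window advice

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, DESIRED)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, DESIRED)

# The kernel may grant a different amount than requested,
# so always read back what was actually allocated.
granted_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
granted_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"send buffer: {granted_snd} bytes, receive buffer: {granted_rcv} bytes")
s.close()
```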

There are also buffers in the NIC drivers to make sure the NDIS layer has room to handle packets from the NIC.  Each NIC is different, but maxing them out might help a bit more.

I have been able to get in excess of 4Gbps with CIFS, but it takes some tuning and more than one session stream.  All the SMB I/O tends to be multiplexed over one socket, so if you can set up multiple sessions you can get better aggregate throughput.
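The multi-session idea can be sketched as a parallel copy (Python purely for illustration; the file names here are placeholders, and each real stream would target the CIFS share):

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def copy_one(src: Path, dst_dir: Path) -> Path:
    """Copy a single file; each worker is an independent I/O stream."""
    dst = dst_dir / src.name
    shutil.copyfile(src, dst)
    return dst

def parallel_copy(sources, dst_dir, workers=4):
    """Copy many files concurrently instead of serially over one session."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: copy_one(Path(s), dst_dir), sources))

# Self-contained demo with local temp files standing in for the share.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "src"
    src_dir.mkdir()
    files = []
    for i in range(8):
        f = src_dir / f"chunk{i}.bin"
        f.write_bytes(bytes([i]) * 65536)
        files.append(f)
    copied = parallel_copy(files, Path(tmp) / "dst", workers=4)
    ok = all(d.read_bytes() == s.read_bytes() for s, d in zip(files, copied))
    print(f"copied {len(copied)} files in parallel")
```

In practice each stream would be a separate robocopy job or mapped session rather than a Python thread; the point is simply that several concurrent transfers aggregate better than one SMB socket.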

Good luck and someday SMB2 will make this problem move somewhere else!

RSS really can help spread the load around so that the latency is reduced.

Again, though, tuning only one side is problematic.  It might be that two Windows boxes in your environment could move much more and the Solaris SMB/CIFS stack is the limit.  Then again, Solaris CIFS is kernel-resident and so might be up to the task.  Hard to say, though, and testing is the way to the truth...

m_mccabeAuthor Commented:
Good call on testing with FTP.  I'm getting 120MB/s on both the Windows and Solaris servers for uploads.

This tells me that CIFS specifically is the poor performer here.  I would love to try SMB2, but we are connecting to a Sun S7310 system- which is effectively an appliance.  Even if there were SMB2 software available for Solaris 10, I doubt I could load it on the appliance without blowing the support contract.

So, I assume the 4Gb/s you achieved with CIFS was on a 10Gb interface?  If so, it sounds like we are getting similar performance if you look at it as a percentage.  You got around 40% and I have been  getting 40-50% without jumbo frames and closer to 64% with jumbo frames.

BTW, I already maxed out the buffers on the Windows server's NIC.

The pipes are definitely a case of diminishing returns.  Without CIFS I can get about 9.8Gbps in either direction or about 14Gbps when full duplex.  CIFS is less than half that unless there are many streams. (10-25+ streams)

Sounds like you should be able to do what you need and who knows what another year will bring.  

Have a great day.