Socket Performance

OD asked:

Hi Guys,

I have written a Socket class, basically a wrapper around the WSASocket functions. I used this class to replace Named Pipe communication over a network, because the named pipe did not route all too well over the network.

The sockets (obviously) do not have the same problem. However, when the client and server are on the same machine, I get a throughput rate of about 39 MB/s (megabytes) using the sockets, whereas with Named Pipes I was able to achieve about 400 MB/s on the local machine.
Furthermore, the processor hits 100% usage with the sockets at the 39 MB/s transfer rate (processor usage was about 90% with Named Pipes).

I would like to know whether this "phenomenon" is expected, or whether I should have been able to achieve either a higher throughput or a lower processor usage?

Regards
OD
ASKER CERTIFIED SOLUTION by jhance
(solution text available to Experts Exchange members only)
OD (Asker):

jhance:

When testing the socket class, I wrote a small server and client that were running on the same machine. (I may be wrong, but I believe the network card is skipped in this scenario.)
Anyway, I want to test the maximum throughput that the socket class can achieve, so I write as fast as possible on the one side and receive the sent data as fast as possible on the other side.

This eventually translates to a throughput of about 39 MB/s, with the processor usage at a constant 100% for as long as the test was conducted.

The throughput seemed a bit low for running on one machine, what do you think?

(I can post the code, but it is actually a whole lot of code, because I implemented a layer above the socket to automatically reassemble messages that were split into multiple packets by the network.)
jhance:
The network card is skipped (in the case where you are using the loopback address, 127.0.0.1), but if you are using the network card's IP you end up going through the network card's drivers.  But in EITHER CASE you use the TCPIP protocol stack.

39MB/s (is that BYTES or BITS???)

Again, so much depends on how you have coded this.  I still suspect you are doing things poorly and this is what is causing the 100% CPU utilization.

Show some critical parts of your code!
OD (Asker):

I am doing a lookup during the connect to get the IP address of the specified server, which then resolves to 127.0.0.1. So at least the network card drivers are skipped.

The throughput is 39 megabytes/second.

There are three classes applicable here:
CTcpIpSocket which is the base class from which CServerSocket and CDataSocket are derived.

Here is the code:
#define TCPIPSOCKET_EVENTMESSAGE                        WM_USER + 100

#define TCPIPSOCKET_PREAMBLE                              3142587
#define TCPIPSOCKET_POSTAMBLE                              7852413

#define TCPIPSOCKET_PACKETSIZE                        131072

#define TCPIPSOCKET_LOOPTIME                              50

#define TCPIPSOCKET_CONNECTWAIT                        3333
#define TCPIPSOCKET_DISCONNECTWAIT                  2222

#define TCPIPSOCKET_PRIORITY                              THREAD_PRIORITY_BELOW_NORMAL


bool CTcpIpSocket::Open(unsigned short nPort, int nBufferSize, bool bDatagram)
{
      Close();

      m_bDatagram = bDatagram;

      int nType = (bDatagram) ? SOCK_DGRAM : SOCK_STREAM;
      int nProtocol = (bDatagram) ? IPPROTO_UDP : IPPROTO_TCP;

      m_hSocket = WSASocket(AF_INET, nType, nProtocol, NULL, 0, WSA_FLAG_OVERLAPPED);

      if (m_hSocket != INVALID_SOCKET)
      {
            m_nPort = nPort;

            if(SelectEvents())
            {
                  SetOptions(nBufferSize);
                  OnOpen();

                  return true;
            }
      }

      TRACE("\nSOCKET ERROR : %d", WSAGetLastError());

      return false;
}

void CTcpIpSocket::SetOptions(int nBufferSize)
{
      BOOL bEnable = TRUE, bDisable = FALSE;

      setsockopt(m_hSocket, SOL_SOCKET, SO_BROADCAST, (char*)&bEnable, sizeof(BOOL));
      setsockopt(m_hSocket, SOL_SOCKET, SO_DEBUG, (char*)&bDisable, sizeof(BOOL));
      setsockopt(m_hSocket, SOL_SOCKET, SO_DONTLINGER, (char*)&bEnable, sizeof(BOOL));
      setsockopt(m_hSocket, SOL_SOCKET, SO_KEEPALIVE, (char*)&bEnable, sizeof(BOOL));

      int nLength = sizeof(int),
             nSndRcvBuffer = min(TCPIPSOCKET_PACKETSIZE, nBufferSize);

      setsockopt(m_hSocket, SOL_SOCKET, SO_RCVBUF, (char*)&nSndRcvBuffer, nLength);
      setsockopt(m_hSocket, SOL_SOCKET, SO_SNDBUF, (char*)&nSndRcvBuffer, nLength);

      getsockopt(m_hSocket, SOL_SOCKET, SO_SNDBUF, (char*)&nSndRcvBuffer, &nLength);

      m_lMaxMessageSize = nSndRcvBuffer;
}

void CTcpIpSocket::ProcessEvents()
{
      WSANETWORKEVENTS sEvents = {0};
      if (!WSAEnumNetworkEvents(m_hSocket, m_hEvents, &sEvents))
      {
            if (sEvents.iErrorCode[FD_READ_BIT])
                  ProcessErrors(sEvents.iErrorCode[FD_READ_BIT]);

            if (sEvents.iErrorCode[FD_WRITE_BIT])
                  ProcessErrors(sEvents.iErrorCode[FD_WRITE_BIT]);

            if (!sEvents.lNetworkEvents)
                  return;

            // FD_READ | FD_WRITE | FD_OOB | FD_ACCEPT | FD_CONNECT | FD_CLOSE
            int nEvents = sEvents.lNetworkEvents;

            if (nEvents & FD_READ)
            {
                  OnRead();
            }

            if (nEvents & FD_WRITE)
            {
                  m_bConnected = true;
                  OnWrite();
            }

            if (nEvents & FD_OOB)
            {
                  TRACE("\nSOCKET OOB EVENT");
                  OnOob();
            }

            if (nEvents & FD_ACCEPT)
            {
                  TRACE("\nSOCKET ACCEPT EVENT");
                  OnAccept();
            }

            if (nEvents & FD_CONNECT)
            {
                  TRACE("\nSOCKET CONNECT EVENT");
                  m_bConnected = true;
                  OnConnect();
            }

            if (nEvents & FD_CLOSE)
            {
                  TRACE("\nSOCKET CLOSE EVENT");
                  OnClose();
            }

            return;
      }
      
      ProcessErrors();
}

Now for a server Socket:
bool CServerSocket::Listen()
{
      SOCKADDR_IN sAddress = {0};
      sAddress.sin_family = AF_INET;
      sAddress.sin_port = htons(m_nPort);
      sAddress.sin_addr.S_un.S_addr = ADDR_ANY;

      if (!bind(m_hSocket, (SOCKADDR*)&sAddress, sizeof(sAddress)))
      {
            if (!listen(m_hSocket, SOMAXCONN))
                  return true;
      }

      TRACE("\nSOCKET ERROR : %d", WSAGetLastError());

      return false;
}

void CServerSocket::OnOpen()
{
      m_bCloseEventThread = true;
      WaitForEventThread();
      m_pEventThread = NULL;

      Listen();
}

void CServerSocket::OnAccept()
{
      if (m_bDatagram)
            return;

      Purge();
      
      SOCKADDR_IN sAddress = {0};
      int nLength = sizeof(sAddress);
      
      SOCKET hClient = accept(m_hSocket, (SOCKADDR*)&sAddress, &nLength);

      if (hClient != INVALID_SOCKET)
      {
            CDataSocket *pSocket = new CDataSocket;

            if (pSocket)
            {
                  pSocket->Attach(hClient);
                  pSocket->SetEventWnd(m_hEventWnd);

                  m_cConnections.Add(pSocket);

                  return;
            }

            closesocket(hClient);
      }

      TRACE("\nSOCKET ERROR : %d", WSAGetLastError());
}

UINT CServerSocket::EventThread(LPVOID pParam)
{
      CServerSocket *pSocket = (CServerSocket*)pParam;

      pSocket->m_bCloseEventThread = false;

      for (;;)
      {
            if (pSocket->m_bCloseEventThread)
                  break;

            if (!pSocket->ProcessEvents())
                  Sleep(TCPIPSOCKET_LOOPTIME);
      }

      return 0;
}

bool CServerSocket::ProcessEvents()
{
      if (m_hSocket==INVALID_SOCKET)
            return false;

      bool bProcessed = false;

      if (EventsOccured())
      {
            CTcpIpSocket::ProcessEvents();
            bProcessed = true;
      }

      int nConnectionCount = GetClientCount();

      if (nConnectionCount > 0)
            Purge();

      for (int nI=1; nI<=nConnectionCount; nI++)
      {
            CDataSocket *pSocket = GetClient(nI);
            if (pSocket && pSocket->EventsOccured())
            {      
                  pSocket->ProcessEvents();
                  bProcessed = true;
            }
      }

      return bProcessed;
}

Now for a client socket:
bool CDataSocket::Connect(CString strIpAddress, int nPort)
{
      m_bConnected = false;

      SOCKADDR_IN sAddress = {0};
      sAddress.sin_family = AF_INET;
      sAddress.sin_port = htons(nPort);

      if (strIpAddress.Find('.') == -1)
      {
            // Try to translate this address to a ip address
            HOSTENT *pHost = gethostbyname(strIpAddress);
            if (pHost)
                  strIpAddress.Format("%u.%u.%u.%u", (unsigned char)pHost->h_addr_list[0][0],
                                                                     (unsigned char)pHost->h_addr_list[0][1],
                                                                     (unsigned char)pHost->h_addr_list[0][2],
                                                                     (unsigned char)pHost->h_addr_list[0][3]);
      }
      sAddress.sin_addr.S_un.S_addr = inet_addr(strIpAddress);

      if (!connect(m_hSocket, (SOCKADDR*)&sAddress, sizeof(sAddress)))
            return true;

      int nError = WSAGetLastError();
      if (nError == WSAEWOULDBLOCK)
      {
            return WaitForConnect();
      } else {
            ProcessErrors(nError);
      }

      return false;
}

void CDataSocket::OnRead()
{
      unsigned long ulBytes = 0;

      ioctlsocket(m_hSocket, FIONREAD, &ulBytes);

      if (ulBytes > 0)
      {
            Receive(ulBytes);
      }
}

bool CDataSocket::Write(void *pData, long nSize, bool bAbortOnBlock)
{
      if (m_hSocket == INVALID_SOCKET)
            return false;

      return WriteFragmented(pData, nSize, bAbortOnBlock);
}

bool CDataSocket::WriteFragmented(void *pData, long lMessageSize, bool bAbortOnBlock)
{
      int nError = 0;
      bool bSuccess = true;

      /*
            Write the header and the first data packet
      */
      unsigned char *pWriteBuffer = new unsigned char[m_lHeaderSize + lMessageSize + 1];
      if (pWriteBuffer)
      {
            m_sHeader.lMessageSize = lMessageSize;

            memcpy(pWriteBuffer, &m_sHeader, m_lHeaderSize);
            memcpy(&pWriteBuffer[m_lHeaderSize], pData, lMessageSize);

            int nBytesWritten = 0,
                   nBytesWrittenTotal = 0,
                   nTotalMessageSize = lMessageSize + m_lHeaderSize,
                   nBytesToWrite = min(m_lMaxMessageSize, nTotalMessageSize);

            do
            {
                  nBytesWritten = send(m_hSocket, (char*)&pWriteBuffer[nBytesWrittenTotal], nBytesToWrite, 0);

                  if (nBytesWritten == SOCKET_ERROR)
                  {
                        nError = WSAGetLastError();

                        if (nError == WSAEWOULDBLOCK)
                        {
                              /*
                                    Only abort if nothing has been written to the socket. Once anything
                                    of the message has been written, complete the write process,
                                    even if it will block
                              */
                              if (bAbortOnBlock && nBytesWrittenTotal==0)
                              {
                                    TRACE("\nDATASOCKET WOULD BLOCK");
                                    break;
                              }

                              Sleep(TCPIPSOCKET_LOOPTIME);

                              continue;
                        } else {
                              /*
                                    Now we have a problem, the header and some data is sent,
                                    but we can't write to the socket, WHAT NOW?
                              */
                              ProcessErrors(nError);

                              bSuccess = false;
                              break;
                        }
                  } else {
                        nBytesWrittenTotal += nBytesWritten;
                        nBytesToWrite = min(m_lMaxMessageSize, nTotalMessageSize-nBytesWrittenTotal);
                  }
            } while (nBytesWrittenTotal < nTotalMessageSize);

            delete [] pWriteBuffer;
      }

      return bSuccess;
}

bool CDataSocket::Receive(unsigned long ulBytes)
{
      bool bSuccess = false;

      // NOTE: static locals are shared by every CDataSocket instance;
      // these should really be per-instance member variables.
      static long lMessageBytes = 0, lMessagePosition = 0;

      unsigned char *pData = new unsigned char[ulBytes+1];
      if (pData)
      {
            int nReceived = recv(m_hSocket, (char*)pData, ulBytes, 0);

            if (nReceived==SOCKET_ERROR)
            {
                  TRACE("\nSOCKET ERROR : %d", WSAGetLastError());                  
            } else {
                  if (nReceived > 0)
                  {
                        if (AppendBuffer(pData, nReceived))
                        {
                              if (m_lMessageSize < 1 || lMessagePosition < 1)
                              {
                                    lMessagePosition = 0;

                                    long lHeaderPos = FindHeader();
                                    if (lHeaderPos > 0)
                                    {
                                          sMessageHeader *pHeader = (sMessageHeader*)&m_pDataBuffer[lHeaderPos-1];
                                          m_lMessageSize = pHeader->lMessageSize;

                                          if (m_lMessageSize > 0)
                                                lMessagePosition = (lHeaderPos-1) + m_lHeaderSize;
                                    }
                              }
                        }

                        while (m_lDataSize-lMessagePosition>=m_lMessageSize && lMessagePosition>0)
                        {
                              SaveMessage(lMessagePosition);
                              TrimBuffer(lMessagePosition+m_lMessageSize);

                              if (m_hEventWnd)
                                    PostMessage(m_hEventWnd, TCPIPSOCKET_EVENTMESSAGE, NULL, (long)this);

                              long lHeaderPos = FindHeader();
                              if (lHeaderPos > 0)
                              {
                                    sMessageHeader *pHeader = (sMessageHeader*)&m_pDataBuffer[lHeaderPos-1];
                                    m_lMessageSize = pHeader->lMessageSize;

                                    if (m_lMessageSize > 0)
                                          lMessagePosition = (lHeaderPos-1) + m_lHeaderSize;
                              } else {
                                    lMessagePosition = 0;
                                    m_lMessageSize = 0;
                              }
                        }

                        bSuccess = true;
                  }
            }
      }

      if (pData)
            delete [] pData;
      
      if (!bSuccess)
            recv(m_hSocket, NULL, 0, 0);
      
      return bSuccess;
}

You will notice that the WriteFragmented function implements the layer I mentioned previously. It sends a header that contains the actual size of the data to follow, then sends the data in chunks of up to 128 KB. At the receiving end the data is then reassembled.
OD (Asker):

I have removed the whole layer to test its performance penalty, and the throughput improved to 41 MB/s. So for what I am trying to achieve with the layer, the 2 MB/s penalty is acceptable.

OD
jhance:
Well, I'm not sure what performance is theoretically possible via TCPIP sans a network connection, but 39 MB/s may be approaching it.  That's more than 300 Mbits/sec and is far faster than it needs to be to support either 10BaseT or 100BaseT network connections.  I suppose it's slow for Gigabit Ethernet networks, but there may be other factors involved here.

Have you looked into things like MTU and window sizes?  I think you are pressing the limits of what TCPIP is capable of, and in these cases tuning may be required to extract the full potential of the system.
Paul Maker:
I have always noticed that when using sockets where both client and server are on the same machine, the CPU always goes flat out during transfers. This is because both sides of the protocol are executing on the same machine. Also, I think that from an OS point of view there is probably a special case for when client and server are on the same machine, regardless of the IP address used to establish comms, this special case being bypassing the network card completely.

I agree with jhance on the performance thing, TCP is a heavy protocol. If you tried using UDP then you would probably see a dramatic speed-up, although you would have to deal with packet loss and ordering etc.

Paul
OD (Asker):

The question now is how the hell we are going to achieve 1 gigabit performance from any socket class if the limit is reached somewhere in the 40 MB/s vicinity, versus the theoretical 125 MB/s gigabit performance?
jhance:
That was not your question....

Have you tried a Gigabit LAN card?  I would hope that it would come with drivers which are capable of 1 Gb/s performance...

You gave no indication that you were using such a device.
ahmadrazakhan:
Throughput of a TCP connection is decreased by the WSAEWOULDBLOCK error, and this error is often there because of small buffers. Avoiding this error can enhance data throughput. Say you have to send 4 megabytes of data: make your send and receive buffers 10 times bigger, 40 MB. This can be set using setsockopt with SO_SNDBUF and SO_RCVBUF. See the results, then conduct the same test using the default buffers and compare the difference.
Also, if you have to send data on the same machine you can bypass the TCP stack; this is possible with Winsock Direct. For more info about it, consult Microsoft.
OD (Asker):

jhance:
Sorry about the confusion, my question was from a theoretical point of view...

ahmadrazakhan:
I have tried to gather as much information regarding this topic as possible. It actually seems that the send and receive buffer sizes should be set according to the round-trip times for packets on the network (typically the times reported by PING).

In my initial scenario I set the send and receive buffers to 25 * the expected message size. Using the round-trip calculation I actually came to the conclusion that my send/receive buffers should be in the region of about 128 KB on a 100 Mb LAN.

I implemented the new smaller buffers and it actually did make a marginal improvement in the throughput. I am now just above 40 MB/s throughput including all my overhead