Avatar of Renato Montenegro Rustici (Brazil)

asked on 

Gigabit and parallelism

I'm working on a performance issue involving backup software (HP Data Protector, for reference, but set that aside). During an analysis using the iperf tool, I noticed these values:

Using 1 thread: 760Mbits/s (iperf -P 1)
Using 3 threads: about 900Mbits/s (iperf -P 3)
Using 4 threads and beyond: 946Mbits/s (iperf -P 4)

The question is: Why can't I reach the full gig speed by using just one thread (one connection)?

Using a Fast Ethernet card, I can achieve 94 Mbit/s with 1 thread. No problem at all. The problem only shows up on a gigabit connection.
Topics: Networking Hardware (Other) · Windows Networking · Networking

Avatar of bbao (Australia)

Basically, you cannot actually reach the 1G limit, since that figure is the physical bandwidth under ideal conditions.

Besides the payload carrying your actual data, extra bytes are needed to package the raw data in its protocols: TCP rides as the payload of IP, and HTTP in turn is part of the payload of TCP.

Also be aware of the metric here, Mbps: that is bits per second, measuring the raw bit stream, not bytes per second or KB/s, which benchmark the application payload.
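To put rough numbers on that overhead, here is a back-of-the-envelope sketch in Python. It assumes standard Ethernet framing and IPv4/TCP headers with the timestamp option (generic figures, not anything measured on this setup), and shows why the mid-940s Mbit/s result is essentially the ceiling for TCP over gigabit Ethernet:

```python
# Approximate TCP goodput ceiling on gigabit Ethernet, MTU 1500.
# Per-frame cost on the wire: preamble 8 + Ethernet header 14
# + FCS 4 + inter-frame gap 12 = 38 bytes, plus IPv4 (20) and
# TCP (32, with the timestamp option) headers inside the frame.
LINE_RATE = 1_000_000_000          # bits per second
MTU = 1500
FRAMING = 8 + 14 + 4 + 12          # bytes outside the IP packet
HEADERS = 20 + 32                  # IPv4 + TCP with timestamps

payload = MTU - HEADERS            # 1448 bytes of application data
wire = MTU + FRAMING               # 1538 bytes actually on the wire
goodput = LINE_RATE * payload / wire

print(f"theoretical goodput: {goodput / 1e6:.0f} Mbit/s")  # ~941
```

So about 941 Mbit/s is the best a single TCP stream can show, which matches the 946 Mbit/s iperf reports with 4 threads.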
Avatar of Renato Montenegro Rustici


Actually, I can get near 1 Gbit/s when I start 4 simultaneous threads (iperf -c -t 60 -P 4). The network interface utilization in Windows shows 99%, and I get 960 Mbit/s. I think the remaining 40 Mbit/s are some overhead; that's OK.

What I can't do is reach anything beyond 760 Mbit/s when using just one thread (iperf -c -t 60 -P 1). In that case, network interface utilization in Windows shows about 70%. I was wondering why I can't go beyond that. Maybe it's a limitation in iperf; that's what I want to discuss with you guys. Why can a single data stream not get to a gigabit when a four-way stream can?

When using a Fast Ethernet card, at 100 Mbit/s, I can get to 94.6 Mbit/s (almost the full bandwidth) using a single stream of data.

Avatar of ravenpl (Poland)

You may try increasing the window size (iperf has an option for that) and/or increasing the interface MTU (an operating system setting, but all boxes in the same segment should use the same MTU).
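The window-size suggestion can be quantified with the bandwidth-delay product: a single TCP connection can carry at most one window of data per round trip. A small sketch (the 0.5 ms RTT is an assumed, illustrative value, not one measured on this link):

```python
# Bandwidth-delay product: one TCP stream moves at most
# window / RTT bytes per second, regardless of link speed.
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    return window_bytes * 8 / rtt_seconds

rtt = 0.0005  # assumed 0.5 ms round trip, purely illustrative
for window in (16_000, 48_000, 64_000):
    mbps = max_throughput_bps(window, rtt) / 1e6
    print(f"{window:>6} B window -> {mbps:.0f} Mbit/s")
```

With these assumed numbers, a 48 KB window caps out near the observed 760 Mbit/s, while 64 KB is enough for the full gigabit; more threads sidestep the limit because each connection gets its own window.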
Avatar of noci

10 times the transfer rate also means 10 times the OS calls ==> 10 times the overhead.
So the CPU and per-call overhead can sustain 760 Mbit/s in a linear fashion.
That is clearly more than 100 Mbit/s ==> you can saturate a 100 Mbit/s connection.
Adding more threads helps the front end of the processing, but the limit becomes the overhead on the network adapter...

You may reach 1 Gbit/s in one thread if you use jumbo frames (frames of 8K-9K, depending on hardware).
That presumes the switch and the other system can handle them (and that the switch can handle the bandwidth).
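A rough sketch of the jumbo-frame benefit (assuming the standard 38 bytes of Ethernet framing per frame): fewer, larger frames mean far fewer per-packet operations (interrupts, driver passes) for the same bandwidth.

```python
# Frames per second needed to keep a gigabit link full: jumbo
# frames cut the per-packet work roughly six-fold.
LINE_RATE_BYTES = 1e9 / 8   # gigabit link, in bytes per second
FRAMING = 8 + 14 + 4 + 12   # preamble, header, FCS, inter-frame gap

for mtu in (1500, 9000):
    fps = LINE_RATE_BYTES / (mtu + FRAMING)
    print(f"MTU {mtu}: ~{fps:,.0f} frames/s")
```

That is on the order of 81,000 frames/s at MTU 1500 versus about 14,000 frames/s at MTU 9000.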
noci is on to something... it's likely the CPU that is "limiting" your output speed. Gigabit cards require a lot of CPU on a standard machine.

Good luck,
Avatar of noci

If you have a multicore CPU, having several processes (threads) helps pump out more data.
Avatar of Renato Montenegro Rustici

This is the hardware I am using in the test (2 identical servers):

Dell PowerEdge R610
2 x Intel Xeon E5630 2.53GHz Quad Core
2 x 136GB SAS (RAID 1)
2 x Broadcom BCM5709C NetXtreme II GigE (Dual Port)
Windows 2008 R2 (fully updated)

The network interfaces are connected with a crossover cable (no switch).

When I issue the iperf command, the CPU time (on all cores) barely moves, so I don't think the CPU is an issue. I think the bus speed is quite good, since this is some of the best hardware Dell offers.

I tried increasing the frame size on the network interface; there was no improvement. With the largest frame size I noticed errors and the speed dropped, so it's back at 1500 bytes, the default. I also tried setting the maximum segment size (iperf's -M option). No difference: 760 Mbit/s with 1 data stream, 940 Mbit/s with 4 data streams.

Any ideas? Or even other tools?
You are correct, it is not the CPU. I mentioned that without knowing the type of machine. What happens when you run it through a switch?
Avatar of noci

One process does a synchronous write:

- write(xxx)
  (syscall write())
     - copy to system buffers
     - queue to driver
     - start driver
     - wait for driver

     - driver - create task on card
     - start transfer
     - wait for end of xfer

     - get xfer status
     - post to process
     - resume process

So you can see that although the process is nominally BUSY (mostly with waiting), it will not start another write until the first one completes ==> no high CPU load, but the single task waits.

With multicore some of these processes can overlap helping even further to push data.

You will also see that on architectures without DMA the CPU is busier (pushing data to the adapter) than on systems with DMA.
(A non-DMA architecture = PIO-mode IDE disks.)
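The blocking-write argument above can be modeled with a toy formula: if every synchronous write pays a fixed per-call cost before the next one can start, a single stream tops out below the wire rate. All numbers below are illustrative guesses, not measurements from this setup:

```python
# Throughput of one synchronous stream: each write spends
# (time on the wire) + (fixed syscall/driver turnaround).
def stream_bps(write_size: int, wire_bps: float, per_call_s: float) -> float:
    wire_time = write_size * 8 / wire_bps      # time to clock bits out
    return write_size * 8 / (wire_time + per_call_s)

# 64 KB writes on a 1 Gbit/s wire with an assumed 150 us of fixed
# per-call overhead land in the mid-700 Mbit/s range.
print(f"{stream_bps(64_000, 1e9, 150e-6) / 1e6:.0f} Mbit/s")
```

With more threads, those fixed waits overlap one another, which is exactly why -P 4 fills the pipe while -P 1 does not.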
Avatar of noci

Jumbo frames are not just a large MTU; they must also be enabled and supported on the switches. If you just declare a large MTU, it will produce non-communication if the don't-fragment bit is set, or heavy fragmentation otherwise.

Packet fragmentation produces large overhead on systems.
I will answer by the end of the day.
I managed to achieve the full bandwidth with only one thread by increasing the TCP window size to at least 64MB:

iperf -c <server ip address> -t 60 -w 64000
Just a correction: 64KB, not MB.
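For the record, iperf's -w flag roughly corresponds to setting the socket buffer size, which is what caps the TCP window. A minimal sketch of the same setting via the sockets API (note: Linux typically doubles the requested value to leave room for bookkeeping, so the value read back differs from the one requested):

```python
import socket

# Request a 64 KB send buffer, as iperf -w 64000 would.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 64_000)
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(f"requested 64000 bytes, kernel granted {granted}")
s.close()
```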
