  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 598

Gigabit and parallelism

I am working on a performance issue involving backup software (HP Data Protector, for context, but set that aside). During an analysis with the iperf tool I noticed these values:

Using 1 thread: 760Mbits/s (iperf -P 1)
Using 3 threads: about 900Mbits/s (iperf -P 3)
Using 4 threads and beyond: 946Mbits/s (iperf -P 4)

The question is: why can't I reach full gigabit speed using just one thread (one connection)?

Using a Fast Ethernet card, I can achieve 94Mbits/s with 1 thread. No problem at all. The problem only shows up on a gigabit connection.
Renato Montenegro Rustici
Asked:
1 Solution
 
bbao (IT Consultant) commented:
Basically, you cannot actually reach the 1G limit: that figure is the physical bandwidth, an ideal.

Besides the payload that carries the actual application data, extra bytes are needed to package the raw data at each protocol layer; TCP is carried as the payload of IP, just as HTTP is carried as the payload of TCP.

Also be aware of the unit here: Mbps is bits per second, which measures the bit stream on the wire, not bytes per second (or KB/s), which is what you would use to benchmark application payload.
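The protocol overhead described above can be quantified. A minimal sketch, using the standard Ethernet/IPv4/TCP header sizes; the 1Gbit/s link rate and 1500-byte MTU match the setup in the question:

```python
# Theoretical best-case TCP goodput on gigabit Ethernet with a 1500-byte MTU.
# Every frame on the wire carries fixed overhead around the TCP payload.
LINK_RATE = 1_000_000_000           # 1 Gbit/s physical rate, in bits per second

PREAMBLE_SFD = 8                    # preamble (7) + start-of-frame delimiter (1)
ETH_HEADER   = 14                   # dest MAC + src MAC + EtherType
FCS          = 4                    # frame check sequence (CRC)
IFG          = 12                   # inter-frame gap, counted in byte times
IP_HEADER    = 20                   # IPv4, no options
TCP_HEADER   = 20                   # TCP, no options

MTU     = 1500
payload = MTU - IP_HEADER - TCP_HEADER                  # 1460 bytes of app data
on_wire = PREAMBLE_SFD + ETH_HEADER + MTU + FCS + IFG   # 1538 bytes per frame

goodput = LINK_RATE * payload / on_wire
print(f"max TCP goodput: {goodput / 1e6:.0f} Mbit/s")   # max TCP goodput: 949 Mbit/s
```

That ceiling of roughly 949Mbit/s is strikingly close to the 946Mbits/s measured with four threads, which suggests the multi-stream runs are already saturating the wire.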
 
Renato Montenegro Rustici (IT Specialist, Author) commented:
Actually, I can get near 1Gbit/s when I start 4 simultaneous threads (iperf -c 10.1.1.3 -t 60 -P 4). The network interface utilization in Windows shows 99%, and I get 960Mbits/s. I think the remaining 40Mbits/s are down to some overhead; that's OK.

What I can't do is get beyond 760Mbits/s using just one thread (iperf -c 10.1.1.3 -t 60 -P 1). In that case, network interface utilization in Windows shows about 70%. I was wondering why I can't go beyond that. Maybe it's a limitation in iperf. That's what I want to discuss with you guys: why can a four-way stream get to a gigabit when a single data stream can't?

When using a Fast Ethernet card, at 100Mbits/s, I can get to 94.6Mbits/s (almost full bandwidth) using a single stream of data.
 
ravenpl commented:
You may try increasing the window size (iperf has an option for that) and/or increasing the interface MTU (an operating system setting; note that all boxes in the same segment should use the same MTU).
 
noci (Software Engineer) commented:
10 times the transfer rate also means 10 times the OS calls, and therefore 10 times the extra overhead.
So that CPU and system-call overhead can sustain 760Mbps in a linear fashion.
That is clearly more than 100Mbps, which is why you can saturate a 100Mbps connection.
Adding more threads helps the front end of the processing, but the limit will be the overhead on the network adapter.

You may reach 1Gbps with one thread if you use jumbo frames (frames of 8K-9K, depending on hardware).
That also presumes the other system can handle them, and, if there is a switch, that it supports jumbo frames at this bandwidth.
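A rough way to see what jumbo frames buy: the same fixed per-frame overhead is amortized over about six times as much payload, and the sender handles far fewer frames per second. A sketch assuming a 9000-byte jumbo MTU (a common value, but hardware-dependent as noted above):

```python
# Per-frame overhead model: fixed wire overhead plus IPv4/TCP headers.
WIRE_OVERHEAD = 8 + 14 + 4 + 12   # preamble/SFD + Ethernet header + FCS + inter-frame gap
HEADERS       = 20 + 20           # IPv4 + TCP headers, no options
LINK_MBPS     = 1000.0            # gigabit link

def goodput_mbps(mtu):
    """Best-case TCP goodput for a given MTU on a saturated gigabit link."""
    return LINK_MBPS * (mtu - HEADERS) / (mtu + WIRE_OVERHEAD)

def frames_per_second(mtu):
    """Frames the NIC and driver must process per second at line rate."""
    return LINK_MBPS * 1e6 / 8 / (mtu + WIRE_OVERHEAD)

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {goodput_mbps(mtu):.0f} Mbit/s, "
          f"{frames_per_second(mtu):,.0f} frames/s at line rate")
```

The goodput gain is modest (roughly 949 vs 991 Mbit/s), but the per-packet work on the host drops by a factor of about six, which is where jumbo frames help a CPU- or interrupt-bound sender.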
 
SteveJ commented:
noci is on to something; it's likely the CPU that is limiting your output speed. Gigabit cards require a lot of CPU on a standard machine.

Good luck,
Steve
 
noci (Software Engineer) commented:
If you have a multicore CPU, having several processes (threads) helps pump out more data.
 
Renato Montenegro Rustici (IT Specialist, Author) commented:
This is the hardware I am using in the test (2 identical servers):

Dell PowerEdge R610
8GB RAM
2 x Intel Xeon E5630 2.53GHz Quad Core
2 x 136GB SAS (RAID 1)
2 x Broadcom BCM5709C NetXtreme II GigE (Dual Port)
Windows 2008 R2 (fully updated)

The network interfaces are connected with a cross cable (no switch).

When I issue the iperf command, the CPU time (on all cores) barely moves, so I don't think CPU is an issue. I think the bus speed is quite good, since this is among Dell's best hardware.

I tried increasing the frame sizes on the network interface; there was no improvement. When I set the largest frame size, I noticed errors and the speed dropped. It's back to 1500 bytes, the default, now. I also tried setting the maximum MTU size (the -M option). There was no difference: 760Mbits/s with 1 data stream, 940Mbits/s with 4 data streams.

Any ideas? Or even other tools?
 
SteveJ commented:
You are correct, it is not the CPU. I mentioned that without knowing the type of machine. What happens when you run it through a switch?
 
ravenpl commented:
Have you tried the iperf "-w" option? And possibly "-N"?
 
noci (Software Engineer) commented:
One process doing a synchronous write goes through something like this:

- write(xxx)
  (inside the write() syscall:)
     - copy to system buffers
     - queue to driver
     - start driver
     - wait for driver

     - driver: create task on card
     - start transfer
     - wait for end of xfer

     - get xfer status
     - post to process
     - resume process
So you can see that although the process is busy the whole time (mostly with waiting), it will not start another write until the first one completes. Hence there is no high CPU load, but the one task is always waiting.

With multiple cores, several of these processes can overlap, helping push even more data.

You will also see that on architectures without DMA the CPU is busier (pushing data to adapters) than on systems with DMA.
(A non-DMA architecture is, for example, PIO-mode IDE disks.)
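The pattern above, where a synchronous sender mostly waits and several senders overlap their waits, can be sketched with plain sockets over loopback. This is an illustration of the blocking-write model, not of iperf's internals; the chunk size, chunk count, and stream count are arbitrary:

```python
import socket
import threading

# Each sender blocks in sendall() until the kernel accepts the data, so a
# single stream spends much of its time waiting; several streams overlap
# those waits, just like iperf -P N.
CHUNK, CHUNKS, STREAMS = 64 * 1024, 16, 4

received = []
lock = threading.Lock()

def serve(listener):
    """Accept one connection and count every byte until the peer closes."""
    conn, _ = listener.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            total += len(data)
    with lock:
        received.append(total)

listener = socket.create_server(("127.0.0.1", 0), backlog=STREAMS)
port = listener.getsockname()[1]

servers = [threading.Thread(target=serve, args=(listener,)) for _ in range(STREAMS)]
for t in servers:
    t.start()

def send_stream():
    """One synchronous sender: each sendall() blocks until buffered."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        for _ in range(CHUNKS):
            s.sendall(b"\0" * CHUNK)

clients = [threading.Thread(target=send_stream) for _ in range(STREAMS)]
for t in clients:
    t.start()
for t in clients + servers:
    t.join()

print(sum(received) == STREAMS * CHUNKS * CHUNK)  # True: every byte arrived
```

Over loopback the waits are tiny, so this only demonstrates the structure; on a real NIC each blocked sendall() is where the single-stream time goes.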
 
noci (Software Engineer) commented:
Jumbo frames are not just a large MTU; they need to be enabled and supported on the switches as well. If you simply declare a large MTU, you will get no communication at all when "don't fragment" is set, or heavy fragmentation otherwise.

Packet fragmentation produces large overhead on systems.
 
Renato Montenegro Rustici (IT Specialist, Author) commented:
I will answer by the end of the day.
 
Renato Montenegro Rustici (IT Specialist, Author) commented:
I managed to achieve the full bandwidth with only one thread by increasing the TCP window size to at least 64MB:

iperf -c <server ip address> -t 60 -w 64000
 
Renato Montenegro Rustici (IT Specialist, Author) commented:
Just a correction: 64KB, not MB.
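This outcome fits the classic single-stream TCP ceiling: at most one window of data can be unacknowledged per round trip, so throughput tops out near window / RTT. A sketch with a hypothetical 0.5 ms RTT for the cross-cable link; both the RTT and the 48KB "before" window are assumptions for illustration, not measured values:

```python
# Single-stream TCP throughput ceiling: one window in flight per RTT.
def tcp_ceiling_mbps(window_bytes, rtt_seconds):
    return window_bytes * 8 / rtt_seconds / 1e6

RTT = 0.0005                                              # 0.5 ms, assumed
print(f"{tcp_ceiling_mbps(48 * 1024, RTT):.0f} Mbit/s")   # 786: below line rate
print(f"{tcp_ceiling_mbps(64 * 1024, RTT):.0f} Mbit/s")   # 1049: enough for 1 Gbit/s
```

This also explains the original multi-thread result: four streams each capped well below line rate can still fill the pipe together, while one undersized window cannot.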
