  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1922

Slow network throughput problem

We're running into issues getting good network throughput with a MySQL 5.0.20 cluster on FreeBSD 6.0 on a gigabit network (bge NICs).

Two FreeBSD machines host the cluster (dual 3.0 GHz, 4 GB RAM, SCSI), and several others access it. During load testing, all the machines use less than 10% CPU. When a machine accesses a local server, however, it can use 100% CPU, leading us to conclude we're running into a network bottleneck somewhere.

When running ntop, network throughput is nowhere near the 1000 Mbit/s we should be getting. Coming from a RH Linux environment: is FreeBSD not ready to utilize GigE out of the box? I've done a good bit of googling but couldn't turn up any solid answers.

Here are all the network parameters I could think of and their values:
kern.ipc.somaxconn: 128
kern.ipc.nmbclusters: 65536
net.inet.tcp.delayed_ack: 1
net.inet.tcp.sack.enable: 1
kern.ipc.maxsockets: 12328
kern.ipc.shm_use_phys: 0
kern.ipc.maxsockbuf: 262144
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536
net.local.stream.recvspace: 8192
net.local.stream.sendspace: 8192

Stack size is set to 256MB
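(The values above are essentially the FreeBSD 6.0 defaults. As a point of reference, a minimal /etc/sysctl.conf sketch of knobs sometimes raised for bulk transfers on GigE — the specific numbers here are illustrative assumptions, not tested recommendations:)

```
# /etc/sysctl.conf fragment -- illustrative values only, not tuned recommendations
kern.ipc.maxsockbuf=1048576        # permit larger per-socket buffers
net.inet.tcp.sendspace=131072      # default TCP send buffer
net.inet.tcp.recvspace=131072      # default TCP receive buffer
kern.ipc.somaxconn=1024            # deeper listen queue for busy servers
```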

The TCP connection between the cluster and the accessing nodes is persistent. For instance, when an accessing node runs a query, it runs it over an already established connection to the cluster.

Any ideas would be much appreciated!
1 Solution
 
gheistCommented:
Describe your problem backed by serious observations, not unfounded claims and sysctl defaults.
 
ironcladsecureAuthor Commented:
Fake claims, eh? Why make such wild accusations before you know the facts?

ntop shows 1% network utilization while we're running a stress test
CPU is at 10%
The hard drive barely moves

Where's the bottleneck? The answer is pretty obvious, since running the same test on the local machine instead of over the network is 10 times faster.

So do you have any answers or just hot air?
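[For perspective, the 1% utilization figure translates into rough transfer times. A small shell sketch — the ~119 MiB/s of usable payload on a 1 Gbit/s link is an approximation, and gige_time is a hypothetical helper, not anything from the thread:]

```shell
# gige_time PCT -- seconds to move 1 GiB at PCT percent utilization of GigE,
# assuming roughly 119 MiB/s of usable payload on a 1 Gbit/s link
gige_time() { awk -v u="$1" 'BEGIN { printf "%.1f", 1024 / (119 * u / 100) }'; }

gige_time 100; echo   # full line rate: about 8.6 s per GiB
gige_time 1; echo     # at ~1% utilization: about 860.5 s per GiB
```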
 
ironcladsecureAuthor Commented:
We've nailed this down to a bottleneck in the MySQL cluster process, not general network performance.

 
gheistCommented:
First of all, ntop puts the NIC in promiscuous mode, which loads the system CPU more. It is invalid for observations. Use SNMP and tools like MRTG instead.

The only relevant sysctl is net.inet.ip.intr_queue_maxlen
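[To make an increase to that sysctl persistent, one might add it to /etc/sysctl.conf; the value 4096 below is an arbitrary example, not a recommendation from the thread — watch net.inet.ip.intr_queue_drops under load to see whether the queue actually overflows before raising it:]

```
# /etc/sysctl.conf fragment -- 4096 is an arbitrary example value;
# only raise this if net.inet.ip.intr_queue_drops climbs under load
net.inet.ip.intr_queue_maxlen=4096
```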
 
ironcladsecureAuthor Commented:
Thanks
 
gheistCommented:
Normally one would look at "netstat -an" for large Send-Q values in case of network slowness, or large Recv-Q values in case of application slowness.
In the latter case, check whether the problem is I/O bound, CPU bound, or both (swapping to disk). In simple cases, just configure to match the hardware.
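[That check can be scripted. A hedged sketch — flag_queues is a hypothetical helper, and the field positions assume the usual netstat -an layout where Recv-Q and Send-Q are columns 2 and 3:]

```shell
# flag_queues THRESHOLD -- feed netstat -an output on stdin; print lines whose
# Recv-Q (field 2) or Send-Q (field 3) exceeds THRESHOLD bytes. Header lines
# pass harmlessly through the numeric coercion ($2+0 evaluates to 0 for text).
flag_queues() { awk -v t="$1" '$2 + 0 > t || $3 + 0 > t'; }

# Typical use on the database host:
#   netstat -an | flag_queues 8192
```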
 
ironcladsecureAuthor Commented:
After some tuning of processes etc., we were able to significantly improve performance, but it's still not as fast as Red Hat ES4.

Both are identical machines, one running Red Hat, the other FreeBSD. When pinging the same node on the network, the average round trip for FreeBSD is 0.14 ms; for Red Hat it's 0.057 ms. Not a significant difference, but is there an explanation for it?
 
gheistCommented:
1) You have to build with gcc -mcpu=??? -O2
2) Most likely different network hardware.
 
ironcladsecureAuthor Commented:
1) gcc no longer supports -mcpu (3.4.4). Do you recommend recompiling the kernel with -mtune instead?
2) Exact same network hardware
 
gheistCommented:
1) How to do that is described in /usr/share/examples/etc/make.conf

2) Actually, the Linux result seems too good to be true: three Ethernet timeslots is far too little time for a ping to propagate through the network and return.
Maybe Linux counts the time between the packet being committed to the network and received,
while FreeBSD starts counting when it begins sending.
Externally pinged machines do not look that different (FreeBSD 5.4 and Ubuntu 5.04 measure about the same with Intel PRO/100 cards over the same switch). Far-from-realtime systems like Windows or NetWare look much worse.
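[A minimal /etc/make.conf sketch of that approach; the CPUTYPE value below is a guess for the dual 3.0 GHz machines described above, so check the examples file for the values your gcc actually accepts:]

```
# /etc/make.conf fragment -- CPUTYPE here is an assumption for this hardware;
# bsd.cpu.mk translates it into the appropriate -march/-mtune flags for gcc 3.4
CPUTYPE?=p4
CFLAGS= -O2 -pipe
```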
 
gheistCommented:
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.
I will leave the following recommendation for this question in the Cleanup topic area:

accept gheist http:#16543937

Any objections should be posted here in the next 4 days. After that time, the question will be closed.

gheist
EE Cleanup Volunteer
