ahmedfpis asked:
(55) No buffer space

We are having a very strange issue with a Squid cache server on FreeBSD: after a few hours the server hangs and logs this error:
2004/08/25 15:42:48| comm_udp_sendto: FD 6, 217.21.5.35, port 53: (55) No buffer  space available
2004/08/25 15:42:48| idnsSendQuery: FD 6: sendto: (55) No buffer space available
2004/08/25 15:42:48| comm_udp_sendto: FD 6, 217.21.5.35, port 53: (55) No buffer space available
2004/08/25 15:42:48| idnsSendQuery: FD 6: sendto: (55) No buffer space available

The machine has 2 GB RAM and 4x36 GB SCSI HDDs. We have tried every single solution we could find on the net, with no success. Here is what loader.conf looks like:

loader.conf
# -- sysinstall generated deltas -- #
userconfig_script_load="YES"
#kern.ipc.maxsockets=32768
kern.ipc.nmbclusters=262144
kern.ipc.nmbufs=131072
kern.ipc.nsfbufs=6656
kern.ipc.shm_use_phys=1
kern.maxfiles=32768
kern.maxproc=8192
kern.maxswzone=33554432
kern.nbuf=16384
kern.ncallout=32768
kern.vm.kmem.size=268435456
kern.vm.pmap.shpgperproc=2048
net.inet.tcp.tcbhashsize=16384

Here is netstat -m:

18072/22208/524288 mbufs in use (current/peak/max):
        18072 mbufs allocated to data
17932/22066/262144 mbuf clusters in use (current/peak/max)
49684 Kbytes allocated to network (1% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

What else can we try?
ramazanyich replied:

Try:
kern.ipc.nmbclusters="4096"     # Set the number of mbuf clusters
kern.ipc.nmbufs="16384"         # Set the number of mbufs = 4 * nmbclusters

See details at
http://list.cineca.it/cgi-bin/wa?A2=ind0204&L=squid&D=0&P=79396

Hope that helps
ahmedfpis (asker) replied:

We have done that, as you can see above, and still have the same problem:
"
kern.ipc.nmbclusters=262144
kern.ipc.nmbufs=131072
"
But your nmbufs is not nmbclusters * 4.
sysctl net.inet.udp.maxdgram=65535

This is not about IPC; it is about the inet protocol stack...
Are you telling me to add
sysctl net.inet.udp.maxdgram=65535
to loader.conf?
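
For what it's worth, net.inet.* values are runtime sysctls rather than boot-loader tunables, so a sketch of applying that setting would be:

# apply immediately, then persist across reboots via /etc/sysctl.conf
# (loader.conf is only for boot-time tunables such as kern.ipc.nmbclusters)
sysctl net.inet.udp.maxdgram=65535
echo 'net.inet.udp.maxdgram=65535' >> /etc/sysctl.conf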
ASKER CERTIFIED SOLUTION
gheist replied:

When you exceed certain system memory pools, you get feedback in the system logs and immediate reboots; check netstat -m and vmstat -m to keep that from happening (very rare on modern systems).
To help your Squid more:

1) use a caching DNS server on the loopback interface.
2) let squidclient help you restart Squid, and let you analyze the problem at convenient times (see the watchdog sketch below).
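
A minimal watchdog sketch of point 2, assuming a ports install with an rc script at /usr/local/etc/rc.d/squid.sh and Squid listening on port 3128 (both assumptions; adjust to your layout). Run it from cron every few minutes:

#!/bin/sh
# If the cache manager does not answer, log it and bounce Squid.
if ! squidclient -p 3128 mgr:info >/dev/null 2>&1; then
        logger "squid not answering, restarting"
        /usr/local/etc/rc.d/squid.sh stop
        sleep 5
        /usr/local/etc/rc.d/squid.sh start
fi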

Have a nice day.
Now we are having another issue: when trying to browse during peak hours we get the following error:

ERROR
The requested URL could not be retrieved
--------------------------------------------------------------------------------
While trying to retrieve the URL: http://tarjim.ajeeb.com/ 
The following error was encountered:
Socket Failure
The system returned:
    (49) Can't assign requested address
Squid is unable to create a TCP socket, presumably due to excessive load. Please retry your request.
Your cache administrator is webcache@xxxx.com.
--------------------------------------------------------------------------------
Generated Sun, 05 Sep 2004 19:46:00 GMT by cache1.xxxx.com (squid/2.5.STABLE5)
Add kern.maxusers=512 to the same loader.conf; it will scale up many system parameters.
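
After the reboot you can check which derived limits maxusers scaled up (a sketch; the exact set of derived values varies by kernel version):

sysctl kern.maxusers kern.maxfiles kern.maxfilesperproc kern.ipc.maxsockets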
Done it, but still the same problem: (49) Can't assign requested address. Squid is unable to create a TCP socket, presumably due to excessive load. Please retry your request.
sysctl kern.maxfilesperproc
kern.maxfilesperproc: ????
It must be ten times the number of users or so.

Did you reboot after changing loader.conf?

And please post the output of uname -sir, so I can give the correct way to apply the sysctl options.

Briefly, the problem is that the load on the system is a bit above average...
cache1# uname -sr
FreeBSD 4.9-RELEASE-p8


cache1# uname -a
FreeBSD cache1.xxxx.com 4.9-RELEASE-p8 FreeBSD 4.9-RELEASE-p8 #0: Sat Jun 19 15:21:13 IDT 2004     root@cache1.xxxxx.com:/usr/src/sys/compile/SQUID4  i386


cache1# uname -sir
uname: illegal option -- i

The machine has 2 GB RAM, 4x36 GB SCSI HDDs, and a single 2.8 GHz Xeon CPU; it is a Fujitsu-Siemens TX300 server.
What are the actual settings of kern.maxusers and kern.maxfilesperproc?
What does netstat -m say?
loader.conf
# -- sysinstall generated deltas -- #
userconfig_script_load="YES"

kern.ipc.nmbclusters=64000
kern.ipc.nmbufs=256000
kern.ipc.nsfbufs=6656
kern.ipc.shm_use_phys=1
kern.maxfiles=32768
kern.maxproc=8192
kern.maxswzone=33554432
kern.nbuf=16384
kern.ncallout=32768
kern.vm.kmem.size=268435456
kern.vm.pmap.shpgperproc=2048
kern.ipc.maxsockbuf=2621440
net.inet.tcp.tcbhashsize=16384

cache1# more sysctl.conf
# $FreeBSD: src/etc/sysctl.conf,v 1.1.2.3 2002/04/15 00:44:13 dougb Exp $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#
machdep.hlt_logical_cpus=0
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.ip.forwarding=1
vm.v_free_min=131072
vm.v_free_target=262144
vm.v_free_severe=65536
kern.ps_showallprocs=0
vfs.vmiodirenable=1
kern.ipc.maxsockbuf=2147483648
kern.ipc.somaxconn=16384
net.inet.tcp.rfc1323=1
net.inet.tcp.delayed_ack=0
net.inet.tcp.sendspace=65535
net.inet.tcp.recvspace=65535
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=65535
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535

netstat -m
18072/22208/524288 mbufs in use (current/peak/max):
        18072 mbufs allocated to data
17932/22066/262144 mbuf clusters in use (current/peak/max)
49684 Kbytes allocated to network (1% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
Right now the cache is down, and here is the netstat -m:
cache1# netstat -m
258/17920/256000 mbufs in use (current/peak/max):
        257 mbufs allocated to data
        1 mbufs allocated to packet headers
256/17710/64000 mbuf clusters in use (current/peak/max)
39900 Kbytes allocated to network (20% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
> what are the actual settings of kern.maxusers and kern.maxfilesperproc?
kern.maxusers: 512
kern.maxfilesperproc: 29491
net.inet.tcp.sendspace=65535
net.inet.tcp.recvspace=65535

together make 128 KB per socket. Squid holds a client-side and a server-side socket per request, so that is 128 KB * 2 = 256 KB per proxied connection, and on the order of 1 MB of network memory per user once a browser opens a few parallel connections.
Reduce that and the problem will go away.

For example:
net.inet.tcp.sendspace=8192
net.inet.tcp.recvspace=8192

Newer versions have an "inflight" facility that shrinks network buffers when they can be smaller.
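
A sketch of that advice as /etc/sysctl.conf entries (the inflight knob name is from memory of 4.7-and-later kernels; verify with sysctl -a | grep inflight):

# shrink per-socket TCP buffers so thousands of proxied connections
# fit in kernel network memory
net.inet.tcp.sendspace=8192
net.inet.tcp.recvspace=8192
# on kernels that have it, cap in-flight data from the measured
# bandwidth-delay product instead of the static buffer size
net.inet.tcp.inflight_enable=1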
Same issue :(

2004/09/09 15:33:04| Request header is too large (11680 bytes)
2004/09/09 15:33:04| Config 'request_header_max_size'= 10240 bytes.
2004/09/09 15:33:04| commBind: Cannot bind socket FD 2021 to *:0: (49) Can't assign requested address
2004/09/09 15:33:04| commBind: Cannot bind socket FD 1797 to *:0: (49) Can't assign requested address
2004/09/09 15:33:04| commBind: Cannot bind socket FD 1772 to *:0: (49) Can't assign requested address
2004/09/09 15:33:04| commBind: Cannot bind socket FD 1831 to *:0: (49) Can't assign requested address
2004/09/09 15:33:04| commBind: Cannot bind socket FD 1831 to *:0: (49) Can't assign requested address
2004/09/09 15:33:04| commBind: Cannot bind socket FD 1415 to *:0: (49) Can't assign requested address
2004/09/09 15:33:04| commBind: Cannot bind socket FD 1415 to *:0: (49) Can't assign requested address
2004/09/09 15:33:05| commBind: Cannot bind socket FD 270 to *:0: (49) Can't assign requested address
2004/09/09 15:33:05| commBind: Cannot bind socket FD 270 to *:0: (49) Can't assign requested address
2004/09/09 15:33:05| commBind: Cannot bind socket FD 527 to *:0: (49) Can't assign requested address
2004/09/09 15:33:05| commBind: Cannot bind socket FD 527 to *:0: (49) Can't assign requested address
2004/09/09 15:33:05| commBind: Cannot bind socket FD 134 to *:0: (49) Can't assign requested address
What does net.inet.ip.portrange contain?

(It looks like there are no free ports left to use for outgoing connections.)
cache1# sysctl -a|grep portrange
net.inet.ip.portrange.lowfirst: 1023
net.inet.ip.portrange.lowlast: 600
net.inet.ip.portrange.first: 1024
net.inet.ip.portrange.last: 5000
net.inet.ip.portrange.hifirst: 49152
net.inet.ip.portrange.hilast: 65535

We have switched from a direct access list to WCCPv1, and it did not help. We also notice that when the figure from

cache1# netstat -an|wc -l
    4141

goes above 10000, we get all kinds of trouble.

Set net.inet.ip.portrange.last=49151, either by sysctl or in sysctl.conf.

You currently have 5000 - 1024 = 3976 ports available; ten times more will be a lot of help (see the sketch below).
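
A sketch of making that change stick:

# apply now, then persist for the next boot
sysctl net.inet.ip.portrange.last=49151
echo 'net.inet.ip.portrange.last=49151' >> /etc/sysctl.conf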
We did that, but now we are having a new error!

We increased kern.maxfiles=42000 and kern.maxfilesperproc=32000, and still the same:

2004/09/10 15:26:27| comm_open: socket failure: (24) Too many open files
2004/09/10 15:26:27| comm_open: socket failure: (24) Too many open files
2004/09/10 15:26:27| comm_open: socket failure: (24) Too many open files
2004/09/10 15:26:28| comm_accept: FD 26: (53) Software caused connection abort
2004/09/10 15:26:28| httpAccept: FD 26: accept failure: (53) Software caused connection abort
2004/09/10 15:26:32| comm_accept: FD 26: (53) Software caused connection abort
2004/09/10 15:26:32| httpAccept: FD 26: accept failure: (53) Software caused connection abort
2004/09/10 15:26:32| comm_open: socket failure: (24) Too many open files
2004/09/10 15:26:32| comm_open: socket failure: (24) Too many open files
2004/09/10 15:26:32| comm_open: socket failure: (24) Too many open files

2004/09/10 16:20:24| WARNING! Your cache is running out of filedescriptors
2004/09/10 16:20:24| comm_open: socket failure: (24) Too many open files
2004/09/10 16:20:24| comm_open: socket failure: (24) Too many open files
2004/09/10 16:20:24| comm_open: socket failure: (24) Too many open files
2004/09/10 16:20:24| comm_open: socket failure: (24) Too many open files
2004/09/10 16:20:24| comm_open: socket failure: (24) Too many open files
2004/09/10 16:20:24| comm_open: socket failure: (24) Too many open files
2004/09/10 16:20:24| comm_open: socket failure: (24) Too many open files
2004/09/10 16:20:24| comm_open: socket failure: (24) Too many open files
2004/09/10 16:20:24| comm_open: socket failure: (24) Too many open files
1) kern.ipc.somaxconn=1024 # will make things slower - users will wait longer for connections
2) what is the file descriptor limit for the running squid? (look at /etc/login.conf; there should be nothing limiting squid)

And restart squid, not just rehash the configuration (see the sketch below).
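
The distinction, as a sketch:

# "squid -k reconfigure" only rereads squid.conf; limits inherited at
# startup (file descriptors, listen backlog) need a full stop/start
squid -k shutdown
# wait for the process to exit, then start it again
squid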
cache1# sysctl -a | grep kern.ipc.somaxconn
kern.ipc.somaxconn: 16384

/etc/login.conf

default:\
        :passwd_format=blf:\
        :passwordtime=90d:\
        :mixpasswordcase=true:\
        :minpasswordlen=10:\
        :idletime=30:\
        :copyright=/etc/COPYRIGHT:\
        :welcome=/etc/motd:\
        :setenv=MAIL=/var/mail/$,BLOCKSIZE=K,FTP_PASSIVE_MODE=YES:\
        :path=/sbin /bin /usr/sbin /usr/bin /usr/games /usr/local/sbin /usr/local/bin /usr/X11R6/bin ~/bin:\
        :nologin=/var/run/nologin:\
        :cputime=unlimited:\
        :datasize=unlimited:\
        :stacksize=unlimited:\
        :memorylocked=unlimited:\
        :memoryuse=unlimited:\
        :filesize=unlimited:\
        :coredumpsize=unlimited:\
        :openfiles=unlimited:\
        :maxproc=unlimited:\
        :sbsize=unlimited:\
        :vmemoryuse=unlimited:\
        :priority=0:\
        :ignoretime@:\
        :umask=022:

su - squid -c 'ulimit -n' ???
First, regarding
su - squid -c 'ulimit -n'
it was 8192. We recompiled Squid and increased it to 32768, according to the formula
filedescriptors = 40 + 32 * maxusers
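
The rebuild went roughly like this (a sketch: Squid 2.5's configure takes the descriptor limit from the build environment, so raise it before building; later releases added a --with-maxfd configure option instead, if memory serves):

# as root, in the Squid source tree
ulimit -HSn 32768
./configure && make && make install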
Now things are much better:
cache1# netstat -an | wc -l
   19241
cache1# netstat -m
27345/29536/256000 mbufs in use (current/peak/max):
        27342 mbufs allocated to data
        3 mbufs allocated to packet headers
27186/29374/64000 mbuf clusters in use (current/peak/max)
66132 Kbytes allocated to network (12% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

But every once in a while we get the following (not as often as it used to):
2004/09/11 14:43:23| Request header is too large (11680 bytes)
2004/09/11 14:43:23| Config 'request_header_max_size'= 10240 bytes.
2004/09/11 14:43:23| parseHttpRequest: Unsupported method 'recipientid=103&sessionid=1832'
2004/09/11 14:43:23| clientReadRequest: FD 14698 Invalid Request
2004/09/11 14:43:24| comm_accept: FD 26: (53) Software caused connection abort
2004/09/11 14:43:24| httpAccept: FD 26: accept failure: (53) Software caused connection abort
2004/09/11 14:43:24| comm_accept: FD 26: (53) Software caused connection abort
2004/09/11 14:43:24| httpAccept: FD 26: accept failure: (53) Software caused connection abort
2004/09/11 14:43:24| Request header is too large (12287 bytes)
2004/09/11 14:43:24| Config 'request_header_max_size'= 10240 bytes.
2004/09/11 14:43:24| Request header is too large (12287 bytes)
2004/09/11 14:43:24| Config 'request_header_max_size'= 10240 bytes.
2004/09/11 14:43:26| Request header is too large (11680 bytes)

Try kern.ipc.somaxconn=8192 instead of 16384.
This looks more like protocol desynchronization; this is a request tail:

> Unsupported method 'recipientid=103&sessionid=1832'

e.g. the HTTP/1.1 pipelining extension. If you could gather statistics on User-Agent headers, maybe the problem relates to some particular browsers (see the sketch below). Request headers are usually small (the maximum size is a few kilobytes).

Maybe upgrading to a more recent Squid would help, or using squidclient to automatically restart a crashed Squid.
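
A sketch of gathering those User-Agent statistics, assuming Squid was built with --enable-useragent-log and squid.conf points useragent_log at /var/log/squid/useragent.log (both assumptions, and the awk split relies on the agent string being the quoted field in that log):

# rank the browsers hitting the cache
awk -F'"' '{print $2}' /var/log/squid/useragent.log | sort | uniq -c | sort -rn | head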
squid-2.5.STABLE6 is in ports on my computer, by the way.