Solved

SCO Unix Network Hang

Posted on 2008-01-24
Last Modified: 2013-12-05
Hi All,
We have a SCO OpenServer 5.0.7 server in our office. For the past week it has been hanging roughly every 12 hours, and I have to reboot it. The reboot clears the problem and the server runs fine for a while. We have 2 remote NFS shares mounted on the local file system, and the box also serves several Samba shares. When the system begins to hang, these become unreachable (the NFS shares from SCO, the Samba shares from remote systems). We have about 50 users logged in at a time to our FoxBASE applications. Sometimes (I think when it is not rebooted quickly enough) it stops responding to any network protocol (ssh, telnet, etc.).

Here is the netstat -m from before the last reboot (this was fairly deep into the hang)
streams allocation:
                         config    alloc     free       total      max     fail
stream                     8448      202     8246        2464      202        0
queues                      908      413      495        4939      413        0
mblks                      8634     8164      470    26890848     8599        0
buffer headers             9018     8568      450      308777     8969        0
class  1,     64 bytes      396       74      322    10407791      393        0
class  2,    128 bytes       50       20       30     2273073       76        0
class  3,    256 bytes       44       28       16     5951744      585        0
class  4,    512 bytes       14       10        4       34800       58        1
class  5,   1024 bytes       31        0       31       38453       54        3
class  6,   2048 bytes     7310     7309        1     2284286     7310      804
class  7,   4096 bytes      532      532        0       15352      585 43361300
class  8,   8192 bytes        0        0        0      114823        9       47
class  9,  16384 bytes        0        0        0      224169        3       43
class 10,  32768 bytes        0        0        0      154546        3      893
class 11,  65536 bytes        0        0        0        1830        3       54
class 12, 131072 bytes        0        0        0           0        0        0
class 13, 262144 bytes        0        0        0           0        0        0
class 14, 524288 bytes        0        0        0           0        0        0
total configured streams memory: 17024.00KB
streams memory in use: 17129.64KB
maximum streams memory used: 18012.29KB


I get this error message repeated quite a bit:
WARNING: allocb failed - NSTRPAGES exceeded


I am out of ideas about where to look next. Any help is greatly appreciated. Thank you.
Question by:US-IT
 
LVL 11

Expert Comment

by:dfke
ID: 20741089
This looks like a kernel issue, as the fail column should be all zeros.
The streams memory in use (17129.64KB) has reached the total configured streams memory (17024.00KB), so you should increase NSTRPAGES. NSTRPAGES controls the number of 4K pages of memory that can be dynamically allocated for STREAMS use.

Furthermore, NSTREAM should be set to at least 256 on systems that mount NFS filesystems or invoke remote X clients.
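
To check the fail column without reading the whole table, a small filter may help. This is only a sketch: the function name `streams_fails` is made up for illustration, and it assumes `netstat -m` prints the same six numeric columns (config, alloc, free, total, max, fail) as the output posted in the question.

```shell
# List every `netstat -m` row whose last column (fail) is non-zero.
# Reads netstat -m text on stdin; assumes the six-column layout shown above.
streams_fails() {
    awk '
        # Data rows end in six numeric columns; header and summary lines do not.
        NF > 6 && $NF ~ /^[0-9]+$/ && $NF + 0 > 0 {
            label = $1
            for (i = 2; i <= NF - 6; i++) label = label " " $i
            print label ": " $NF
        }
    '
}

# Usage on the server:  netstat -m | streams_fails
```

Any row it prints is a resource that ran short at least once since boot, which is a reasonable place to start when deciding which tunables to raise.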
 
LVL 10

Expert Comment

by:Smart_Man
ID: 20741243
Right, it looks like a bottleneck issue. You need to reduce the number of users or increase the allocated resources, or check for the real physical limit of the hardware.
 

Author Comment

by:US-IT
ID: 20741792
I'm thinking it is a kernel issue as well. Last night I disconnected a mapped drive we had set up from a 2003 server, and I ran /etc/conf/cf.d/configure to change the streams values. The server has been up for almost 19 hours, with only 2 fails in class 8 and 1 in class 7. Are any fails acceptable, or do I need to get these down to 0? Also, although I ran the configure command for the streams parameters, I had only read a little about it and wasn't too comfortable with the changes. They seem to have helped, but there were a lot of parameters to change. Which ones should I focus on, or do they all matter?


 

Author Comment

by:US-IT
ID: 20742432
Spoke too early. After a little over 20 hours:

streams allocation:
                         config    alloc     free       total      max     fail
stream                    15000      332    14668        7527      333        0
queues                     1362      674      688       15066      676        0
mblks                     16996    16785      211    59642597    16939        0
buffer headers            17082    16998       84     3695278    17058   237255
class  1,     64 bytes      342      256       86    25284596      383        0
class  2,    128 bytes      213      192       21     4416542      212        0
class  3,    256 bytes      322      253       69    10898034     1052       26
class  4,    512 bytes       13       11        2       63549       44        4
class  5,   1024 bytes       33        0       33       49609       70        8
class  6,   2048 bytes    14742    14740        2     4360914    14741        1
class  7,   4096 bytes     1000     1000        0       22585     1050      640
class  8,   8192 bytes        0        0        0      191912        9      147
class  9,  16384 bytes        0        0        0      414915        4        2
class 10,  32768 bytes        0        0        0      290699        3        2
class 11,  65536 bytes        0        0        0        1993        3        0
class 12, 131072 bytes        0        0        0           0        0        0
class 13, 262144 bytes        0        0        0           0        0        0
class 14, 524288 bytes        0        0        0           0        0        0
total configured streams memory: 32000.00KB
streams memory in use: 34311.99KB
maximum streams memory used: 35239.42KB

Note: Users began logging in after the 19th hour of uptime. Also note that we have a web application that accesses a shared drive, and its traffic most likely began around the same time.
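
The three summary lines at the bottom of the output can be turned into a quick "how far over are we" figure. The sketch below is an illustration only: `streams_usage` is a made-up name, and dividing the configured KB by 4 to get a page count assumes the 4K page size quoted for NSTRPAGES earlier in the thread.

```shell
# Compare "streams memory in use" with "total configured streams memory".
# Reads netstat -m text on stdin. Assumes the KB summary lines shown above
# and the 4K page size mentioned for NSTRPAGES.
streams_usage() {
    awk -F': *' '
        /^total configured streams memory:/ { conf = $2 + 0 }
        /^streams memory in use:/           { used = $2 + 0 }
        END {
            if (conf > 0)
                printf "configured %.0f pages, in use %.1f%% of configured\n",
                       conf / 4, used * 100 / conf
        }
    '
}

# Usage on the server:  netstat -m | streams_usage
```

Fed the output above, it reports 8000 pages configured and 107.2% in use, so doubling the configured memory only postponed the exhaustion rather than removing its cause.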
 

Author Comment

by:US-IT
ID: 20742584
Don't know if this information helps at all.

Client nfs:
calls      badcalls   nclget     nclsleep
34186      0          34223      0          
null       getattr    setattr    root       lookup     readlink   read      
0  0%      2279  6%   10  0%     0  0%      4097 11%   0  0%      17128 50%  
wrcache    write      create     remove     rename     link       symlink    
0  0%      9519 27%   429  1%    39  0%     79  0%     0  0%      0  0%      
mkdir      rmdir      readdir    fsstat    
0  0%      0  0%      452  1%    154  0%    



$ ls
[Lists all files in nfs mount]
$ l
[hangs]

 
LVL 11

Expert Comment

by:dfke
ID: 20743229
OK, maybe there is a problem with the network card or its driver. Check whether there is an updated driver, or try switching cards.

If that doesn't help you can try to sniff the packets in some way and compare the timings with the netstat -m output:

Try a shell script that records `netstat -m` output:

while :; do
    date            # timestamp each sample so it can be matched to the sniffer log
    netstat -m
    sleep 1         # one sample per second
done > netstat-m.log

Meanwhile, put a packet sniffer on the LAN and tell it to capture
everything being sent to the server's IP address.  Try to make sure the
sniffer and server agree closely about the time (within a second or
better).  Then run the sniff for long enough to observe the buffers
rising significantly, according to the `netstat -m` log.

You should be able to identify specific times when buffers were
consumed.  Look at the corresponding times in the sniffer log: is there
a particular kind of incoming packet that seems to be causing this?
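
Finding those times by eye in a log with one sample per second is tedious, so a filter along these lines may help. It is a sketch under stated assumptions: `class_history` is a made-up name, the log is the `netstat-m.log` file written by the loop above, and the `date` lines are assumed to start with weekday and month abbreviations (for example "Thu Jan 24 ...").

```shell
# Print a line each time the alloc count of one buffer class changes in the
# log written by the loop above. Usage: class_history CLASS < netstat-m.log
class_history() {
    awk -v want="$1," '
        # Assumption: date(1) lines start with weekday and month names,
        # for example "Thu Jan 24 10:15:01 EST 2008".
        /^[A-Z][a-z][a-z] [A-Z][a-z][a-z] / { stamp = $0; next }
        # Class rows: $2 is the class number plus a comma, $6 is alloc.
        $1 == "class" && $2 == want {
            if (!seen || $6 != last) print stamp "  alloc=" $6
            last = $6
            seen = 1
        }
    '
}
```

For example, `class_history 6 < netstat-m.log` lists only the seconds at which the 2048-byte class changed, which gives a short list of timestamps to look up in the sniffer capture.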
 

Author Comment

by:US-IT
ID: 20746340
I believe the NIC is onboard. This may be a dumb question, but what would be the best brand/model of NIC to try (Dell PowerEdge 2500)? My knowledge is much better suited to Linux, so while I know some things, I'm almost a newcomer to SCO/Unix.

I will work on getting the packet sniffer going.

Thanks for your help.

 
LVL 10

Expert Comment

by:Smart_Man
ID: 20746539
What about having 2 NICs plus the built-in one, all on the same network providing the same service?

A packet sniffer is a good idea, but a bandwidth manager may be a better one, since it lets you both monitor and control overshoots.
 
LVL 14

Expert Comment

by:mikelfritz
ID: 20765674
You can get a free trial of SarCheck that will ID all of the kernel tunables to adjust - you may need to get to near crash status to have it give the desired result.

Sarcheck:

http://www.sarcheck.com/scosr5.htm

Go back to the home page to find the free trial.
 
LVL 14

Expert Comment

by:mikelfritz
ID: 20765725
Also - make sure you are not running out of space on the /, /var (if it's there), /usr (if it's there) filesystems.

Look at:
http://docsrv.sco.com:507/en/PERFORM/kernel_configure.html

In particular:  
STRMSGSZ
   

Although, if the problem is new, and no configuration changes were made before the problem cropped up, I'd suspect either a network issue or a Chatty Cathy client inundating the server with packets.  
 
LVL 1

Accepted Solution

by:
yotech earned 1500 total points
ID: 20837395
Check the way your connections to the Samba shares are being made. It's better to map the share on the client
computers than to have them reconnect every time they need access. The command
netstat -an | grep 139
should show you the current connections to the Samba shares. If you run the command every few seconds and the remote ports keep changing, you will eventually run out of streams resources, the way it's happening now. Some network card drivers may also cause a memory leak, but mapping the shares should take care of your problem or alleviate it to a more manageable level.
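
Watching for changing remote ports by eye is error-prone, and a plain `grep 139` also matches other ports and addresses that merely contain "139". The sketch below narrows the match and compares two snapshots. The function names and the `/tmp` file names are placeholders, and the column layout (proto, recv-q, send-q, local address, foreign address, state) is an assumption that should be checked against what `netstat -an` actually prints on the server.

```shell
# Extract the remote endpoints of established connections to local port 139
# from `netstat -an` text on stdin. Assumed columns: proto, recv-q, send-q,
# local address, foreign address, state; the port may follow "." or ":".
smb_peers() {
    awk '$1 ~ /^tcp/ && $4 ~ /[.:]139$/ && $6 == "ESTABLISHED" { print $5 }' |
        sort -u
}

# Print endpoints present in the second snapshot file but not in the first.
# Both files must be sorted, which smb_peers already guarantees.
new_smb_peers() {
    comm -13 "$1" "$2"
}

# Usage on the server (file names are placeholders):
#   netstat -an | smb_peers > /tmp/smb.1; sleep 5
#   netstat -an | smb_peers > /tmp/smb.2
#   new_smb_peers /tmp/smb.1 /tmp/smb.2
```

If the same client address keeps showing up in the second list with a different port every few seconds, that client is reconnecting for each access instead of holding a mapped share open.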