• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2488

RHEL: Tomcat is getting into a hung situation.

Hi,
I am running Tomcat 5.0.18 (using JDK 1.4.2_07) on RHEL.
-bash-3.00$ uname -a
Linux localhost.com 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686 i386 GNU/Linux

After doing certain load runs, Tomcat gets into a hung state. netstat shows that new requests are not reaching the ESTABLISHED state.
If I send a new request with wget localhost:port (from the same host where Tomcat is running), then in netstat I see the entry stay in the SYN_RECV state for quite some time. After 2-3 minutes, the wget command exits saying there was no answer from the server.

From netstat -s, I noticed the socket queue overflowing:
    7 packets pruned from receive queue because of socket buffer overrun
    421277 times the listen queue of a socket overflowed
    421277 SYNs to LISTEN sockets ignored

Should I fine-tune something in Linux?
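
For reference, one quick way to keep an eye on these counters (a minimal sketch; the 10-second interval is arbitrary):

while true; do
    date
    netstat -s | grep -i -e "listen queue" -e "SYNs to LISTEN" -e "pruned"
    sleep 10
done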

Asked by: bhaskarna
1 Solution
 
pjedmondCommented:
>After doing certain load runs
>From netstat -s, I noticed overflow of the socket queue

The whole point of load runs is to find where performance limitations and bottlenecks are. You want these to be at a point in the system that 'fails gracefully'. You need to decide what is acceptable. In particular, they may also highlight vulnerability to flood or DOS attacks.

Tuning:

You can increase the transmission queue for your ethernet interface by:

/sbin/ifconfig eth0 txqueuelen 4000

You can also improve kernel performance by tweaking sysctl:

http://datatag.web.cern.ch/datatag/howto/tcp.html
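
As a rough sketch, the sysctl knobs most relevant to the listen-queue/SYN-backlog overflows above are the following (these are standard Linux sysctl names; the values are illustrative only, not recommendations):

/sbin/sysctl -w net.core.somaxconn=1024            # cap on each listening socket's accept backlog
/sbin/sysctl -w net.ipv4.tcp_max_syn_backlog=2048  # size of the half-open (SYN_RECV) queue
/sbin/sysctl -w net.ipv4.tcp_syncookies=1          # degrade more gracefully under SYN floods
# add the same entries to /etc/sysctl.conf to make them persistent across reboots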

Unfortunately, neither of these is really the solution in this case; all they do is let the system cope with peak loads for a slightly longer period of time (whilst the 'buffers/queues' fill up).

You need to have a look at your overall application setup, and perhaps have your application monitor load in some way and reject new connections or fail gracefully once a load threshold is exceeded.

(   (()
(`-' _\
 ''  ''
0
 
pjedmondCommented:
..or buy a faster processor and more memory, until the ethernet input connection becomes the bottleneck. (You always want the input to be the bottleneck with a process when you are designing it - that way the 'system' has a little bit of leeway to cope with anything 'unusual', which helps guarantee that the 'system', or in this case your application, doesn't have to deal with anything unexpected.)

(   (()
(`-' _\
 ''  ''

0
 
bhaskarnaAuthor Commented:
Even after stopping all the load clients, why does the port remain in a frozen state? Also, I don't see too many active connections.
0
 
pjedmondCommented:
>Even after stopping all the load clients, why does the port remain in a frozen state? Also, I don't see too many active connections.

Correct - You've just managed a successful DOS attack on your system. Look at it from this point of view:

As root, you have permissions to turn off an ethernet connection. Once you've done it, it can't be used until it is reset.

As your Tomcat has permission to create sockets, exceed the number of sockets available, and fill up the queue, it does just that, and it won't work until you reset it. Ideally, the system wouldn't allow you to create the sockets in the first place, but as socket creation is such a key part of many processes in a Linux system, the code is designed to be as fast and efficient as possible. As a result, some checks and limitations that might otherwise be there are not; instead, the user is expected to make the necessary checks.

It may even be that an error is being produced somewhere, and your system is ignoring it and carrying on to do something it shouldn't, thus freezing the setup. Probably worth checking your error logs.

(   (()
(`-' _\
 ''  ''

0
 
bhaskarnaAuthor Commented:
Please excuse me if I am asking very basic questions.

On purpose, after hitting the hung situation, I left the Tomcat process alone for 10 hrs (without killing it). Even after 10 hrs, I was seeing the same problem (new requests were not getting into the ESTABLISHED state). They would stay at SYN_RECV for 2 minutes and come out with no answer.

What does this mean? If Tomcat were holding a lot of socket connections, I should be able to see a huge list of ESTABLISHED connections in netstat, but I was not seeing many established connections. Does it mean the connections were released? If yes, then why am I not getting new connections? Could there be any leaks? If yes, how do I find out?

Will RHEL block a particular port after it has been bombarded? Do we need to fine-tune message_burst, message_cost, etc.?

0
 
pjedmondCommented:
Imagine a stick. If you bend it a little bit, it will spring back (this is equivalent to using the socket queue). If you bend it a lot, the stick snaps and you need to get a new one (this is equivalent to filling the socket queue and then trying to keep on going... which is what your load testing is doing!).

>On purpose, after hitting the hung situation, I left the Tomcat process alone for 10 hrs (without killing it). Even after 10 hrs, I was seeing the same problem.
You've effectively 'broken' the system, so it won't recover without a restart.

>If Tomcat were holding a lot of socket connections, I should be able to see a huge list of ESTABLISHED connections in netstat, but I was not seeing many established connections. Does it mean the connections were released?
All connections in the queue were probably released, but the pointer for adding new connections is now pointing in the wrong place (possibly well beyond where the queue should be), and as a result new connections cannot be placed in this queue.

>Will RHEL block a particular port after it has been bombarded?
No - not unless you have a firewall rule of some sort to do this.

>Do we need to fine-tune message_burst, message_cost, etc.?
Without studying your configuration, there is no way to say. The key point is that your application is exceeding the capabilities of the setup that you have. Ideally, RHEL should prevent you from doing this, but in this case it obviously does not - probably due to optimisations that improve performance. You need to change your Tomcat code to prevent it exceeding these limitations. For example, if there are more than, say, 100 connections, then your Tomcat servlets need to reject the connection and produce a page that says "Please retry again later". A better alternative would be to configure the necessary restriction at the Tomcat level:

http://tomcat.apache.org/tomcat-4.0-doc/config/http11.html

In particular, in the server.xml file, check and adjust the 'Connector' (HttpConnector) configuration to 'tune' your setup. The exact 'tuning' required will depend on your application.
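
A quick sketch of where to look (the attribute names are from the Tomcat HTTP/1.1 connector documentation linked above; the values in the comment are purely illustrative, and $CATALINA_HOME is assumed to point at your Tomcat install):

grep -n -B1 -A4 "<Connector" $CATALINA_HOME/conf/server.xml
# the HTTP connector entry will look something like:
#   <Connector port="8080" maxThreads="150" acceptCount="100"
#              connectionTimeout="20000" ... />
# acceptCount is the backlog passed to listen(), so it is what overflows
# (the "listen queue of a socket overflowed" counter) once all maxThreads
# request-processing threads are busy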

(   (()
(`-' _\
 ''  ''

0
 
pjedmondCommented:
Sorry - I noticed that you are running Tomcat 5. Worth having a look at logging to see if that can help locate at what point the setup fails. Start here:

http://tomcat.apache.org/tomcat-5.0-doc/config/

Note that Tomcat 5.0 and 5.5 have different logging configurations.

(   (()
(`-' _\
 ''  ''
0
 
bhaskarnaAuthor Commented:
Till now I was suspecting that the socket queue overflow problem was caused by my load runs. But today, in the netstat -s output, I noticed the following entry on a box that the load runner does not hit:
"    391808 times the listen queue of a socket overflowed"

To confirm that LoadRunner didn't hit this box: in the Tomcat access logs (for the last 15 days) I see a maximum of 200-300 requests per day. Out of those 300 requests, 200+ are heartbeat checks and around 60-80 are real requests.

What are the various reasons for seeing 'listen queue of a socket overflowed'?
Is there an easy way to debug and find out who is causing this overflow?

0
 
pjedmondCommented:
Interesting, as it implies that tomcat may be the cause:

http://www.linuxquestions.org/questions/showthread.php?t=457517

>What are the various reasons to see 'listen queue of a socket overflowed'?
Only 1 reason really - too many connection attempts in too short a period of time.

>Is there a easy way to debug and find out who is causing this overflow?
Not really, unless someone is doing a 'standard' attack on your PC (is it isolated from the rest of the network?). The best bet is probably to try using "tcpdump > filename.txt" and check what's going on after the freeze. These are raw packets, so this approach is not for the faint-hearted and will be time-consuming to check/analyse!
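
A rough sketch of one way to do that - capture to a file so it can be re-read with filters afterwards (eth0 and port 8080 are assumptions; substitute your actual interface and connector port):

/usr/sbin/tcpdump -i eth0 -w /tmp/hang.pcap port 8080 &
# ...reproduce the freeze, then kill the capture and look only at connection
# attempts (SYN set, ACK clear) to see who keeps knocking:
/usr/sbin/tcpdump -nn -r /tmp/hang.pcap 'tcp[13] & 0x12 = 0x02'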

(   (()
(`-' _\
 ''  ''

0
 
bhaskarnaAuthor Commented:
I tried taking a tcpdump of 5 seconds and noticed:

listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
589 packets captured
595 packets received by filter
6 packets dropped by kernel

Is that too much, or normal? Should I check specific entries in the tcpdump output to better understand the symptom?
0
 
pjedmondCommented:
That could be normal. I notice the capture reports link-type EN10MB - if that really is a 10Mb card, it is worth upgrading for 2 reasons:

1.    It can only cope with so much traffic, and due to the ethernet protocol, 2 packets sent at the same time will 'collide', giving an invalid packet... which might cause a problem.
2.    Older drivers potentially have more errors in them and may not make full use of functionality in modern kernels to deal with errors or other problems with packets or the network.

The key here is that you really need to get a tcpdump during the failure process in order to see what is happening. The log files will be huge...!...but I suspect that once the problem occurs the packet behaviour will change.

      
0
 
bhaskarnaAuthor Commented:
I noticed that the overflow count is growing steadily (even though Tomcat didn't get into the hung situation). Yesterday when I checked, the count was:
    22288 times the listen queue of a socket overflowed
Today it grown to:
    22330 times the listen queue of a socket overflowed

The sar logs don't show any spikes.
Are there any other logs that would confirm 'too many connection attempts in too short a period of time'? Or should I explicitly turn on some specific logs to get more data on what type of requests caused this small growth?
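
One sketch for gathering that data without extra logging - snapshot the counter with a timestamp so its growth can be lined up against the access log afterwards (the 5-minute interval and file path are arbitrary):

while true; do
    ( date; netstat -s | grep "listen queue" ) >> /var/tmp/listenq.log
    sleep 300
done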

0
 
pjedmondCommented:
Well found......

Can you try leaving the server disconnected from the network and see if these overflows continue to grow? If so, then we know that the problem is part of the configuration rather than anything external. It may be time to start looking at the Tomcat logs.
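
A sketch of that isolation test (run it from the console rather than over ssh, since it takes the interface down; eth0 is assumed):

BEFORE=`netstat -s | grep "listen queue"`
/sbin/ifconfig eth0 down        # take the box off the network
sleep 600                       # wait ten minutes
AFTER=`netstat -s | grep "listen queue"`
/sbin/ifconfig eth0 up
echo "before: $BEFORE"
echo "after:  $AFTER"
# if the count still grows while eth0 is down, the connection attempts are coming
# from the local machine (heartbeat checks, tomcat itself, ...) rather than from
# anything external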

(   (()
(`-' _\
 ''  ''



0
 
bhaskarnaAuthor Commented:
The thread below was started by me for the same issue:
http://www.linuxquestions.org/questions/showthread.php?t=457517

The /proc status provided in the above thread was taken when Tomcat got into the frozen state.

Does the /proc status of jsvc (Tomcat) hint at anything that would tell you Tomcat is the cause of this problem?
0
 
bhaskarnaAuthor Commented:
I tried to bombard the Tomcat server with 100 threads again and noticed that the overflow count was growing steadily.
I still don't have a clear answer on why the port didn't hang this time, but I didn't hit the 'no answer from the port' issue. In between, we did some tuning in Tomcat, such as maxThreads and acceptCount, and JVM parameters such as -Xmx and -Xms.
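
For the record, this is the shape of that tuning (the JVM flags are spelled -Xms/-Xmx; the values are illustrative only, and exactly where JAVA_OPTS gets set depends on how tomcat/jsvc is started):

# heap settings typically passed to the JVM at startup (via catalina.sh or the jsvc wrapper script)
JAVA_OPTS="$JAVA_OPTS -Xms256m -Xmx512m"
export JAVA_OPTS
# maxThreads and acceptCount live on the <Connector> element in conf/server.xml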


0
 
rindiCommented:
Any reason not to close this Q? It seems there is nothing new...
0
 
rindiCommented:
bhaskarna, still there?
0
 
rindiCommented:
Thanks, but please don't accept a post which only reminds you to close a Question and has nothing to do with the Q at all. Select one of the other posts or follow the instructions in my link in the administrative comment above to get the Q closed. I will reopen this Q so that you can properly close it.

Thanks, rindi
PE Storage
0
 
rindiCommented:
Please close the Q according to the EE rules, thanks.
0
Question has a verified solution.
