How to troubleshoot a stuck JMS queue

We have a messaging application that we recently converted "back" to using SQL-backed JMS queues. We had this problem before and switched to non-persistent in-memory queues to eliminate it. I now need persistence, so I am back to trying to solve this problem.

Typically the messages we serve are SMS messages. A message comes into the system, parameters are retrieved from an LDAP database, and then a reply is sent back.

Eventually the replies get stuck and I have to restart the application. Thankfully the messages are persistent, so processing resumes after the restart.

When the queues get stuck there are no errors that I can detect, even though I have logging at a pretty high level. The queues also never unstick themselves, no matter how much time passes, so I have to restart.

We are using Mule 1.4.4, ActiveMQ 5.4.2 and Java 1.6.26.

I am desperate and need some advice on how to troubleshoot this. Needless to say, we HAVE tried many things, like tuning the queues and forcing them all to be VM queues.

Looking forward to some advice!  
Asked by: skione
mccarl (IT Business Systems Analyst / Software Developer) commented:
Hi skione,

Sorry for the delay; it was the weekend here and I hadn't had a chance to reply.

The configuration to disable producerFlowControl 'looks' OK, but perhaps some small part of the syntax (or the location of the attribute) is not quite right. I thought that ActiveMQ validates its configuration against its XML Schema definition, but maybe that doesn't happen when it is embedded. Validation would definitely tell you whether that config is OK, but if it isn't being validated and there is an error, perhaps the attempt to disable flow control is being silently disregarded. *shrugs* Just some thoughts!
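For reference, this is roughly where a policyEntry sits in a standalone activemq.xml. It is a minimal sketch assuming the usual ActiveMQ 5.x XML layout; the broker name "localhost" is a placeholder, and how an embedded broker picks up such a file depends on how Mule starts it, which isn't shown in this thread. An attribute placed outside this nesting is the kind of mistake that could go unnoticed if the file isn't validated.

```xml
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost">
  <destinationPolicy>
    <policyMap>
      <policyEntries>
        <!-- queue=">" is a wildcard that matches every queue -->
        <policyEntry queue=">" producerFlowControl="false" memoryLimit="1mb"/>
      </policyEntries>
    </policyMap>
  </destinationPolicy>
</broker>
```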

> We are using persistent queues and the default ActiveMQ5.0 destination policy

By cursors, I meant which one of vmCursor, fileCursor or storeCursor are you using? I noted that you mentioned vmCursor in your original post. In my experience, vmCursors can be faster but are more likely to trigger things like flow control that result in things getting stuck. We currently use fileCursors, and although things can slow down a little when we have a high number of pending messages, we no longer get to a point where things have 'stuck' or crashed, and everything just runs more smoothly.
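The cursor is chosen per destination with a pendingQueuePolicy inside the policyEntry. Below is a sketch reusing the attributes skione posted (useCache is left out for brevity); swapping fileQueueCursor for vmQueueCursor or storeCursor is one way to compare the three cursor types.

```xml
<policyEntry queue=">" producerFlowControl="false" memoryLimit="1mb">
  <pendingQueuePolicy>
    <!-- pending messages stay in memory until memoryLimit is reached,
         after which they are spooled to temporary files -->
    <fileQueueCursor/>
  </pendingQueuePolicy>
</policyEntry>
```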

> The monitor shows the stuck queue as being empty

From what I understand of your problem, you have at least 2 queues: one to receive messages on and one to send the replies out. You say that one of the queues is empty, but what is the state of the other? It may sound illogical, but I have found that the problem doesn't always show up where you would expect it to, if you get what I mean.

Here is a scenario that I think might be happening (it is similar to one we had). You are receiving messages on queue A, processing them and posting replies to queue B. This is done transactionally, so that if the reply doesn't successfully get to queue B, the message is rolled back on queue A and will eventually be retried. The messages in queue A are all being stored in memory (generally what happens when using the vmCursor, I think), and ActiveMQ has a finite amount of memory that it can use. Usually things work fine because messages are relatively short-lived in ActiveMQ. But if the number of pending messages on queue A goes up, memory usage goes up with it. Eventually there is no memory left to post the reply, so the send fails and the message is rolled back onto queue A, which *doesn't* free up any memory, and so nothing can make further progress.
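If that scenario is what is happening, one way to make it visible rather than silent is to have the broker fail sends instead of blocking when its memory is exhausted. This is a hedged sketch of the systemUsage section, which goes inside the broker element; the limit values are arbitrary assumptions to tune, not recommendations.

```xml
<systemUsage>
  <!-- sendFailIfNoSpace: throw an exception to the producer instead of blocking it -->
  <systemUsage sendFailIfNoSpace="true">
    <memoryUsage>
      <memoryUsage limit="64 mb"/>
    </memoryUsage>
    <storeUsage>
      <storeUsage limit="1 gb"/>
    </storeUsage>
    <tempUsage>
      <tempUsage limit="512 mb"/>
    </tempUsage>
  </systemUsage>
</systemUsage>
```

With this in place, a producer that hits the limit should receive a javax.jms.ResourceAllocationException in the application rather than hanging, which would at least leave a trace in the logs.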

I still would have expected you to see an error about not being able to post the reply, or about ActiveMQ filling its memory allocation, but maybe this gives you a clue about where to look. So, yeah, I would recommend trying a different cursor type (e.g. fileCursor). Running ActiveMQ as a separate server process might also highlight configuration issues or hidden exception messages.
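Running the broker standalone mostly means moving its configuration into its own activemq.xml and exposing a network connector for the application to use. A minimal sketch follows; 61616 is the conventional OpenWire port, and both the vm:// URL and the broker-host name in the comment are assumptions about this particular setup.

```xml
<transportConnectors>
  <!-- the application's broker URL would change from something like vm://localhost
       to tcp://broker-host:61616 (broker-host is a placeholder) -->
  <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
</transportConnectors>
```

A standalone broker typically writes its own log file, separate from the application's, which may make broker-side warnings easier to spot.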

Hope this helps, let us know how you go!
 
mccarl (IT Business Systems Analyst / Software Developer) commented:
> there are no errors that I can detect and I have logging at a pretty high level

Is this application logging that you are referring to? Have you looked at the ActiveMQ logs?

Also, you mention "SQL-backed" JMS queues; have you tried other persistence options, e.g. Kaha?
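Switching between the JDBC ("SQL-backed") store and KahaDB is a one-element change in the broker's persistenceAdapter section. In this sketch the directory path and the #mysql-ds DataSource bean id are hypothetical placeholders, and only one adapter should be active at a time.

```xml
<persistenceAdapter>
  <!-- file-based KahaDB store; "data/kahadb" is a placeholder path -->
  <kahaDB directory="data/kahadb"/>
  <!-- JDBC alternative; "#mysql-ds" is a hypothetical Spring DataSource bean id:
  <jdbcPersistenceAdapter dataSource="#mysql-ds"/>
  -->
</persistenceAdapter>
```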
 
skione (author) commented:
Sorry, we are actually using KahaDB. And yes, I am referring to the ActiveMQ logs (and the Mule and application logs as well); we have everything piped into one log.

And I believe we are using embedded ActiveMQ, so it's not the full installation.
 
mccarl (IT Business Systems Analyst / Software Developer) commented:
Oh OK. My initial thought is that 'Producer Flow Control' is kicking in and stopping your app from putting messages on the queue, although I would have expected that to produce some sort of logging. Can you check whether it is enabled in the ActiveMQ config (note that it may be on by default, so it may not be explicitly enabled), then try disabling it and see whether you get exceptions in your app? Also, what type of cursor do you currently have set up for the queue that the replies go to? And what other queues have you tried? Are you monitoring the queues while your app is running, using something like the web interface (I don't know whether you can do this when running it embedded) or via JMX? You might want to try firing ActiveMQ up as a standalone process; it may make monitoring things a little easier! (I have always run our ActiveMQ instances this way, not embedded, so that is where my troubleshooting ideas are coming from.)
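For JMX monitoring, the broker can start its own JMX connector. This sketch assumes the default port 1099 and a placeholder broker name; with it in place, a tool such as jconsole should be able to connect to service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi and show per-queue attributes such as QueueSize, InFlightCount and MemoryPercentUsage.

```xml
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost" useJmx="true">
  <managementContext>
    <!-- createConnector starts a dedicated JMX connector on connectorPort -->
    <managementContext createConnector="true" connectorPort="1099"/>
  </managementContext>
</broker>
```

Watching InFlightCount and MemoryPercentUsage on both the request and the reply queue, not just QueueSize, may reveal a stall that QueueSize alone would show as an empty queue.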
 
skione (author) commented:
Answers below:

> Oh ok. My initial thought would be that it is 'Producer Flow Control' that is kicking in and stopping your app from putting messages on the queue, although I would have thought that that would produce some sort of logging.
> Can you see if this is enabled or not in the ActiveMQ config (note that it may be that it is a default setting, so it may not be explicitly enabled) and try disabling it and see if you get exceptions in your app?

This is our current producer flow control configuration. We still get a stuck queue.
<policyEntry queue=">" producerFlowControl="false" memoryLimit="1mb" useCache="false">

> Also, what type of cursor do you currently have setup for the queue that the replies are going to?

We are using persistent queues and the default ActiveMQ 5.0 destination policy. We haven't configured any destination policies.

> And what other queues have you tried?

Not sure what you mean. Please explain a little further.

> Are you monitoring the queues while your app is running using something like the web interface (don't know if you can do this if you are using it as embedded) or via JMX?

We are using the JMX console, and we are able to actively monitor the queues. The strange thing is that when the queues are stuck, it never shows up in the monitor as I thought it would. The monitor shows the stuck queue as being empty.


----

I am also checking whether it is a server-specific issue, and we are testing that now. If that proves it is not a server issue, I will try running ActiveMQ standalone rather than embedded. In the meantime, please see if any of the info above illuminates something. Thank you very much for your responses :)
 
skione (author) commented:
I am awarding you the points because your suggestion of using file cursors helped illuminate more details.

Thank you!
 
mccarl (IT Business Systems Analyst / Software Developer) commented:
Not a problem! Also, if you did end up finding the real issue, or you do find it in the near future, it would be great if you could post some info here, just to help others if they ever stumble upon this question at a later date!
 
skione (author) commented:
Specifically, using file cursors allows us to "see" a stuck queue, and that in itself was the essence of my problem. There may turn out to be a further root cause or more questions, but this specifically answered the one I asked here.

Happy holidays!
Question has a verified solution.
