Outbound SIP random fast busy

This is one of those "who's on first" kinds of issues.

I have a commercial Asterisk box in an outbound call center. About three weeks ago, agents started getting a reorder tone (fast busy) on roughly 25% of their outbound calls when they are running hard. The system is 100% SIP with no cards and runs G.711. There is a dedicated T-1 for VoIP only, and the trunk is set for 18 concurrent calls. Other Internet traffic is handled entirely by an 18x2 cable Internet connection on a dual-WAN router. The problem usually happens when there are about 13 calls going, but it can happen with as few as 6.

In Wireshark, the Asterisk box is sending a 486 Busy Here response to the extension. For the same call, the SIP provider's logs show status code 404 Not Found. I can't confirm the 404 yet because the switch the Asterisk box is on doesn't support sniffing. I am having them relocate the PBX in the call center so that I can run Wireshark on its full packet stream.

I had the SIP provider run diagnostics on 565 incomplete calls from last Saturday (3500+ completed). Every 404 appeared only after the third gateway was given the INVITE, because the primary and secondary gateways were busy and had declined the call.

My thought is that I am up against a timer setting somewhere that says, in effect, "if the call hasn't connected in X milliseconds, hang up." Does anyone know whether Asterisk has a setting for time to connect and, if so, which conf file it is in?
feptias Commented:
In the version 1.6 sip.conf there is a parameter, "timerb", that looks like the one. However, it is new and I could not find any documentation in the wiki. The snippet below is taken from the sample sip.conf file installed as part of Asterisk:
;--------------------------- SIP timers ----------------------------------------------------
; These timers are used primarily in INVITE transactions.
; The default for Timer T1 is 500 ms or the measured round-trip time between
; Asterisk and the device if you have qualify=yes for the device.
;t1min=100                      ; Minimum roundtrip time for messages to monitored hosts
                                ; Defaults to 100 ms
;timert1=500                    ; Default T1 timer
                                ; Defaults to 500 ms or the measured round-trip
                                ; time to a peer (qualify=yes).
;timerb=32000                   ; Call setup timer. If a provisional response is not received
                                ; in this amount of time, the call will autocongest
                                ; Defaults to 64*timert1
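
To put numbers on that last comment, here is a minimal sketch of the default call setup timeout, going only by what the sample file says (timerb defaults to 64*timert1). The function name is made up for illustration and is not part of Asterisk.

```python
def default_timer_b_ms(timer_t1_ms=500):
    """Default call setup timeout per the sample comment: 64 * timert1."""
    return 64 * timer_t1_ms


# With the stock T1 of 500 ms the call autocongests after 32 seconds.
print(default_timer_b_ms())     # 32000
# If T1 were as low as the t1min floor of 100 ms, the window shrinks a lot.
print(default_timer_b_ms(100))  # 6400
```

Whether the measured round-trip time (qualify=yes) actually feeds into this default on a given version is an assumption to verify. Setting timerb explicitly takes that question out of the picture.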


Lexmar (Author) Commented:
Is v1.4 likely the same?
How long does it take the Asterisk box to recover trunk availability after a hangup? I have the agents set up on softphones where they click a link that dials their phone for them. If they call and get a (legitimate) busy signal or a disconnect, they click the hangup button and immediately click the link for the next number to call.
They can dial, hang up, and dial a new number pretty fast, maybe within 1 second. Is it possible they are using up the available trunk pool, i.e. dialing a new call before Asterisk has had time to release the last trunk? If that is the case, I could see them getting a fast busy, jumping to a new number, and perpetuating the condition or even making it worse. They get the fast busies in clusters, they start looking for their supervisor to report the problem, and it eventually clears up, presumably because they are inactive in the system for several moments and the trunks free up.
I will have Wireshark running Wed. and have had them move the PBX onto the switch that allows me to do packet sniffing. I should be able to see all the packets between the PBX and the extensions and also between the PBX and the SIP provider.
Lexmar (Author) Commented:
The info posted by feptias was very helpful in ruling out a significant part of the overall problem, as the issue could be a SIP trunking problem, an Asterisk problem, a problem with the X-Lite softphones, or some combination of the above.
In this instance the answer ruled out the possibility that the Asterisk box was timing out on the trunk connection while the SIP service provider was hunting for a gateway to the POTS or cell network, since his primary and secondary gateways were throttling due to load consuming too much bandwidth.
After moving the PBX to a place where I could see all the packets from the extensions, the PBX, and outbound to the SIP provider, I could see that Asterisk is sending a DNS query for [ext#]-[secret].[named LAN domain] and momentarily waiting for the query to time out. It doesn't seem to disrupt existing calls, but any other extension attempting a call during the wait period will get a fast busy, and the call that caused the bogus query gets a fast busy because Asterisk can't route back to the extension.
The freeze on a long DNS query is a known issue in Asterisk. The actual screwball DNS query should not happen under any normal circumstances, and I will start a new thread on that one.
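
For anyone wanting to measure how long such a lookup stalls, here is a rough sketch that times one blocking name lookup through the system resolver. It does not reproduce Asterisk's own resolver code path, so treat the number as an estimate of the wait rather than an exact match. The function name is made up for illustration.

```python
import socket
import sys
import time


def time_lookup(hostname):
    """Return (elapsed_seconds, resolved) for one blocking lookup of hostname."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, None)
        resolved = True
    except socket.gaierror:
        resolved = False
    return time.monotonic() - start, resolved


if __name__ == "__main__":
    # Names come from the command line; "localhost" is only a harmless default.
    for name in sys.argv[1:] or ["localhost"]:
        elapsed, resolved = time_lookup(name)
        status = "resolved" if resolved else "failed"
        print(f"{name}: {status} in {elapsed:.3f} s")
```

Run it on the PBX itself and pass a name shaped like the one in the capture, for example a hypothetical 1001-secret.example.lan. A failure that takes several seconds to come back would be consistent with the stall seen on the wire.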
feptias Commented:
Hi Lexmar, and thanks for the points. The info you posted about slow DNS queries causing problems in other Asterisk processes is very interesting and could help to explain two separate issues that I have recently been working on. Could you please post links here to relevant material? You said it is a "known issue".

I am very surprised that the DNS queries you can see are for [ext#]-[secret].[named LAN domain]. The extension number and password make no sense as part of a DNS lookup, and sending them out in a query is a security breach as well.

The only reason I can think of for Asterisk to make DNS queries would be if the SIP URI (i.e. the "host" parameter in the peer definition in sip.conf) was given as a name rather than as an IP address. It may also be possible to specify a host name in the Dial command in extensions.conf. Registrations specified in sip.conf would also need to resolve any host name to an IP address, but these would not be triggered by calls starting.
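
One quick way to check the first possibility is to scan sip.conf for host= values that are names rather than IP addresses. The sketch below is only an illustration: the parsing is deliberately simple, it ignores register lines and #include files, and the peer names and addresses in the sample are hypothetical.

```python
import ipaddress


def hostnames_in_sip_conf(text):
    """Return (section, host) pairs whose host= value is a name, not an IP."""
    found = []
    section = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ; comments
        if not line:
            continue
        if line.startswith("[") and "]" in line:
            section = line[1:line.index("]")]
            continue
        key, sep, value = line.partition("=")
        if not sep or key.strip().lower() != "host":
            continue
        value = value.lstrip(">").strip()  # also accept "host => value"
        if value.lower() == "dynamic":
            continue  # registered peers, nothing to resolve
        try:
            ipaddress.ip_address(value)
        except ValueError:
            found.append((section, value))  # not an IP literal, needs DNS
    return found


# Hypothetical sample config, not taken from the system in this thread.
SAMPLE = """
[general]
srvlookup=yes
[provider]          ; hypothetical trunk peer
type=peer
host=sip.example.net
[1001]
type=friend
host=dynamic
[gateway]
host=192.0.2.10
"""

print(hostnames_in_sip_conf(SAMPLE))  # [('provider', 'sip.example.net')]
```

Any peer this reports is a candidate for triggering a lookup when a call starts.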

There are several variables that could influence the behaviour of Asterisk when it is doing DNS lookups. The resolv.conf file can include more than one nameserver, and there could be entries in /etc/hosts. Also, Asterisk has an option to use SRV records (if present). Please post back here any interesting findings. I am wondering about adding some entries to /etc/hosts to see if this could prevent Asterisk from making "slow" DNS queries for every new call. We have a problem where various things seem to start going wrong when the system is getting very busy, and it just might be that the DNS queries are the cause.
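
As a starting point for the /etc/hosts experiment, this sketch lists which names of interest are not yet pinned in a hosts file. It assumes the lookups in question go through the system resolver and consult /etc/hosts at all, which I have not verified for Asterisk (SRV lookups in particular would not). The sample entries are hypothetical.

```python
def names_in_hosts(text):
    """Return the set of lowercase names defined in hosts-file text."""
    names = set()
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop # comments
        names.update(name.lower() for name in fields[1:])  # skip the address
    return names


def missing_from_hosts(wanted, hosts_text):
    """Return the names in wanted that the hosts text does not define."""
    known = names_in_hosts(hosts_text)
    return [name for name in wanted if name.lower() not in known]


# Hypothetical hosts content and peer names, for illustration only.
SAMPLE_HOSTS = """
127.0.0.1   localhost
192.0.2.20  sip.example.net   # hypothetical provider entry
"""

print(missing_from_hosts(["sip.example.net", "backup.example.net"], SAMPLE_HOSTS))
# ['backup.example.net']
```

On a real system, read the text from /etc/hosts and feed in the host names used in sip.conf.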