Lexmar


Outbound SIP random fast busy

This is one of those "Who's on First?" kinds of issues.

I have a commercial Asterisk box in an outbound call center. About three weeks ago, when the agents are running hard, about 25% of their outbound calls started getting a fast busy (reorder) tone. The system is 100% SIP with no telephony cards and uses G.711. There is a dedicated T-1 for VoIP only, and the SIP trunk is set for 18 concurrent calls. All other Internet traffic goes over an 18x2 cable Internet connection on a dual-WAN router. The problem typically appears when about 13 calls are in progress, but it has happened with as few as 6.
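As a rough sanity check on the dedicated T-1, here is a back-of-the-envelope estimate of G.711 bandwidth per direction. It is a sketch under stated assumptions, not a measurement from this system: 20 ms packetization, IPv4/UDP/RTP headers only (no layer-2 framing, no silence suppression), and the T-1 treated as 1,536 kbps of usable payload.

```python
# Rough G.711 bandwidth estimate per direction on a T-1.
# Assumptions (not from the original post): 20 ms packetization,
# IPv4 + UDP + RTP headers only, no layer-2 framing, no silence suppression.

T1_PAYLOAD_KBPS = 24 * 64          # 1536 kbps of usable T-1 payload


def g711_ip_kbps(ptime_ms: float = 20.0) -> float:
    """Return the IP-layer bit rate of one G.711 stream in kbps."""
    payload_bytes = 64_000 / 8 * ptime_ms / 1000   # 160 bytes at 20 ms
    header_bytes = 20 + 8 + 12                     # IPv4 + UDP + RTP
    packets_per_s = 1000 / ptime_ms                # 50 packets/s at 20 ms
    return (payload_bytes + header_bytes) * 8 * packets_per_s / 1000


for calls in (6, 13, 18):
    load = calls * g711_ip_kbps()
    print(f"{calls} calls: {load:.0f} kbps ({load / T1_PAYLOAD_KBPS:.0%} of T-1)")
# 6 calls: 480 kbps (31% of T-1)
# 13 calls: 1040 kbps (68% of T-1)
# 18 calls: 1440 kbps (94% of T-1)
```

On these assumptions a full 18-call load leaves little headroom once layer-2 overhead is added, while 6 calls use only about a third of the link, so bandwidth alone would not obviously explain failures at the low end.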

In Wireshark I can see the Asterisk box sending a 486 Busy Here response to the extension. For the same call, the SIP provider's logs show a 404 Not Found. I can't confirm the 404 yet, because the switch the Asterisk box is on doesn't support packet sniffing. I am having them relocate the PBX within the call center so that I can run Wireshark on its full packet stream.

Last Saturday I had the SIP provider run diagnostics on 565 incomplete calls (against 3,500+ completed). In every case the 404 appeared only after the INVITE had been handed to a third gateway, because the primary and secondary gateways were busy and declined the call.

My suspicion is that I am up against a timer somewhere that hangs up the call if it hasn't connected within X milliseconds. Does anyone know whether Asterisk has a time-to-connect setting and, if so, which .conf file it is in?
ASKER CERTIFIED SOLUTION
Member_2_1968385
Lexmar

ASKER

Is v1.4 likely the same?
How long does it take the Asterisk box to make a trunk available again after a hangup? The agents use softphones set up so that they click a link that dials the number for them. If they get a (legitimate) busy signal or a disconnect, they click the hangup button and immediately click the link for the next number.
They can dial, hang up and dial a new number very quickly, perhaps within a second. Is it possible they are exhausting the available trunk pool, i.e. dialing a new call before Asterisk has had time to release the last trunk? If so, I could see them getting a fast busy, jumping to a new number, and perpetuating the condition or even making it worse. They get the fast busys in clusters; they start looking for their supervisor to report the problem, and it eventually clears up, presumably because they are idle for several moments and the trunks free up.
I will have Wireshark running on Wednesday, and I have had them move the PBX onto the switch that supports packet sniffing. I should be able to see all packets between the PBX and the extensions, and between the PBX and the SIP provider.
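The trunk-exhaustion theory can be illustrated with a toy model. This is a hypothetical sketch, not how Asterisk actually tracks channels: `TrunkPool` and its `release_delay` parameter are invented for illustration, and the thread does not establish how long, if at all, a hung-up call keeps holding its slot.

```python
import heapq


class TrunkPool:
    """Toy model of a SIP trunk with a fixed concurrent-call limit.

    Hypothetical: assumes a hung-up call keeps its slot for
    `release_delay` seconds before the slot can be reused.
    """

    def __init__(self, capacity: int, release_delay: float) -> None:
        self.capacity = capacity
        self.release_delay = release_delay
        self.active = 0                      # calls currently up
        self._releasing: list[float] = []    # min-heap of times slots free up

    def _expire(self, now: float) -> None:
        # Free every slot whose release time has passed.
        while self._releasing and self._releasing[0] <= now:
            heapq.heappop(self._releasing)

    def in_use(self, now: float) -> int:
        self._expire(now)
        return self.active + len(self._releasing)

    def dial(self, now: float) -> bool:
        """Return True if the call gets a trunk, False for a fast busy."""
        if self.in_use(now) >= self.capacity:
            return False
        self.active += 1
        return True

    def hangup(self, now: float) -> None:
        self.active -= 1
        heapq.heappush(self._releasing, now + self.release_delay)


pool = TrunkPool(capacity=18, release_delay=3.0)
for _ in range(18):
    pool.dial(0.0)          # 13 long calls plus 5 quick ones fill the trunk
for _ in range(5):
    pool.hangup(1.0)        # the 5 quick calls hang up after 1 s
print([pool.dial(1.0) for _ in range(5)])   # redial at once
# [False, False, False, False, False]
print([pool.dial(4.0) for _ in range(5)])   # redial after the delay
# [True, True, True, True, True]
```

In this model, any release delay longer than the agents' one-second redial gap produces exactly the clustered fast busys described, which then clear once the agents pause.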
Lexmar

ASKER

The info posted by feptias was very helpful in ruling out a significant part of the overall problem, since the issue could have been a SIP trunking problem, an Asterisk problem, a problem with the X-Lite softphones, or some combination of these.
In this instance, the answer ruled out the possibility that the Asterisk box was timing out on the trunk connection while the SIP service provider hunted for a gateway to the POTS or cellular network (his primary and secondary gateways were throttling because load was consuming too much bandwidth).
After moving the PBX to a place where I could see all the packets from the extensions, the PBX, and outbound to the SIP provider, I could see that Asterisk sends a DNS query for [ext#]-[secret].[named LAN domain] and then briefly waits for the query to time out. It doesn't seem to disrupt existing calls, but any other extension that attempts a call during the wait gets a fast busy, and the call that triggered the bogus query also gets a fast busy because Asterisk can't route back to the extension.
The freeze during a long DNS query is a known issue in Asterisk. The screwball DNS query itself should not happen under any normal circumstances, and I will start a new thread on that one.
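To see whether name lookups on the PBX are slow, one option is to time them directly from the box. This is a minimal diagnostic sketch using Python's standard `socket.getaddrinfo`, which goes through the system resolver (so `/etc/hosts` and `resolv.conf` apply); `time_lookup` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without a network.

```python
import socket
import time
from typing import Callable


def time_lookup(host: str, resolver: Callable = socket.getaddrinfo) -> tuple[bool, float]:
    """Resolve `host` once; return (resolved?, elapsed seconds)."""
    start = time.monotonic()
    try:
        resolver(host, None)
        ok = True
    except socket.gaierror:
        ok = False          # name not found or resolver failure
    return ok, time.monotonic() - start
```

Calling `time_lookup("127.0.0.1")` should return almost immediately, because a numeric address needs no query. Pointing it at a made-up name shaped like the bogus ones above (for example a hypothetical `101-example.office.lan`) shows how long the resolver waits before giving up, which is roughly the window during which other calls were failing.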
Hi Lexmar, and thanks for the points. The info you posted about slow DNS queries causing problems in other Asterisk processes is very interesting and could help to explain two separate issues I have recently been working on. Could you please post links here to the relevant material? You said it is a "known issue".

I am very surprised that the DNS queries you can see are for [ext#]-[secret].[named LAN domain]. The extension number and password make no sense as part of a DNS lookup (and putting them there is also a security breach).

The only reason I can think of for Asterisk to make DNS queries would be if the SIP URI (i.e. the "host" parameter in the peer definition in sip.conf) was given as a name rather than as an IP address. It may also be possible to specify a host name in the Dial command in extensions.conf. Registrations specified in sip.conf would also need to resolve any host name to an IP address, but these would not be triggered by calls starting.
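One way to follow up on the name-versus-address point is to scan sip.conf for peers whose `host=` value is a name. The sketch below is hypothetical: it is a simplified reader, not Asterisk's own config parser (it ignores templates, `#include` and other sip.conf features), it treats `host=dynamic` (a registering peer) as having no name to resolve, and the sample peers and addresses are invented.

```python
import ipaddress


def needs_dns(host: str) -> bool:
    """True if `host` is a name that must be resolved, False for an IP literal."""
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        return True


def peers_needing_dns(conf_text: str) -> list[tuple[str, str]]:
    """Return (peer, host) pairs whose host= value is a DNS name.

    Simplified reader for illustration only: handles [section] headers,
    key=value lines and ';' comments, nothing else.
    """
    found = []
    section = None
    for raw in conf_text.splitlines():
        line = raw.split(";", 1)[0].strip()      # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and "]" in line:
            section = line[1:line.index("]")]    # "[101](!)" -> "101"
        elif section is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "host" and value != "dynamic" and needs_dns(value):
                found.append((section, value))
    return found


SAMPLE = """
[101]            ; an agent softphone
type=friend
host=dynamic

[provider]       ; hypothetical ITSP peer
type=peer
host=sip.example.com

[gw2]
type=peer
host=192.0.2.10
"""

print(peers_needing_dns(SAMPLE))   # [('provider', 'sip.example.com')]
```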

There are several variables that could influence how Asterisk behaves when it does DNS lookups. The resolv.conf file can list more than one nameserver, and there could be entries in /etc/hosts. Asterisk also has an option to use SRV records (if present). Please post back here any interesting findings. I am wondering about adding some entries to /etc/hosts to see whether that prevents Asterisk from making "slow" DNS queries for every new call. We have a problem where various things seem to start going wrong when the system gets very busy, and it just might be that the DNS queries are the cause.
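The /etc/hosts idea can be checked mechanically: given the names the PBX needs to resolve, which ones have no static entry and would therefore still go to the nameservers? This is a hypothetical helper that assumes the common `hosts: files dns` order in nsswitch.conf; the file contents and names in the example are invented.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each name in /etc/hosts-style text to an address (first one seen wins)."""
    table: dict[str, str] = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()   # drop comments, split on whitespace
        if len(fields) < 2:
            continue
        address, *names = fields
        for name in names:
            table.setdefault(name.lower(), address)   # host names are case-insensitive
    return table


def names_without_entry(hosts_text: str, names: list[str]) -> list[str]:
    """Return the names /etc/hosts does not cover (these fall through to DNS)."""
    table = parse_hosts(hosts_text)
    return [n for n in names if n.lower() not in table]


HOSTS = """
127.0.0.1    localhost
192.0.2.10   sip.example.com  sipgw   # hypothetical provider gateway
"""

print(names_without_entry(HOSTS, ["sip.example.com", "sipgw", "other.example.com"]))
# ['other.example.com']
```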