host name lookup failure

I've been having problems with DNS lookups that have gotten rather acute today. My mailq is usually empty, but this morning I had 40 entries, which are taking about 3 hours to get queued. They have the annotation: host name lookup failure. I also have about 500 sendmail processes running, and I normally have about 40. All this tells me sendmail is having problems connecting to servers and/or resolving hostnames.

Also, I have tried pinging various hosts, and the pings time out with no data. I've tried traceroute and I get: "traceroute: unknown host". I've tried pinging IP addresses, and those time out too.

I suspect a problem with my DNS servers, but that wouldn't explain why I can't ping an IP.

This is becoming a big problem. I'm running Linux 2.4.29. I'm not running named, but I have nameservers configured in resolv.conf. How do I go about figuring out my problem?
nociSoftware EngineerCommented:
Link saturation is more or less comparable to a huge traffic jam:
so much traffic that some packets get lost (UDP gets lost easily, TCP less easily).
Lost TCP traffic means retransmits, adding insult to injury.

Some modems have a large amount of buffer memory, meant to hold traffic
while the transmitter is busy. In a traffic jam that buffer just fills up;
newer traffic then gets lost, but the buffered data still has to be sent,
even though the receiver may already have asked for retransmits of some of it.
Also keep in mind that in TCP/IP you need to ACK the reception of (sets of) packets,
and those ACKs get stuck in that jam too.

Too many packets queued for upstream will also mean no buffers for downstream, and packet loss there.

Normally those traffic jams clear up after some time (minutes, not hours),
unless someone is also running file sharing tools etc. That will hurt if not restrained to a trickle (wondershaper does that...). You can't limit
incoming traffic as effectively as outgoing; you'll just have to handle whatever arrives.

There might be another thing...
How do you connect your system to the Linksys? Is there a managed switch in between?
If so, do all settings match up (actual, not auto)? Are both sides of each cable 100 Full Duplex, or both 100 Half Duplex? A mismatch will wreak havoc on a connection.

Also, upstream problems are traceable to a certain extent. Much like traceroute,
there is a tool called pchar
that can estimate upstream characteristics hop by hop, but it needs a fairly clean first few hops to reliably test further hops, and it takes a while.

jmarkfoleyAuthor Commented:
Hey! Is there anyone left in this topic area?! This should be an easy one for some expert!
nociSoftware EngineerCommented:
dig is the tool of the trade here

  dig   (optionally with +trace)
  will tell what it tries.

Does pinging an IP address still work?   is an ip address of google.
  If not, you may have an IP routing / filtering problem.

traceroute might help here:
  traceroute -I
  will show where it goes.

if traceroute works, try with tcptraceroute
  Source is here:
  that can attempt a traceroute for, say, HTTP, in case policy routing has been set up
  for that protocol differently from, say, SMTP.
jmarkfoleyAuthor Commented:
OK, next time I'll give dig a shot. I have more data on what's happening. As I said, I generally have about 40ish sendmail processes handling mail, nothing in the mailq, and normally ping, whois, traceroute and nslookup all work just spiffy. However, every so often I'll notice my sendmail process count climbing. Today I saw it over 100, and yesterday, when I was having noticeable problems, it was at 500. At these times ping, traceroute, whois, etc. all simply terminate with timeouts. I haven't tried dig. This condition is intermittent, but seems to be happening more frequently lately: once or twice a day, at different times of day. The condition can persist on the order of an hour (today) to several hours (yesterday).

Right now, I don't think it is a mail problem. I just think sendmail is being affected like ping et al because sendmail can't resolve domains. I'm thinking:

a) nameserver problem (but then why would it usually work? And why would pinging an actual IP not work?)

b) you mentioned ip routing / filtering problem.

How would I go about determining ip routing/filtering, via your traceroute suggestion? Problem is, when this happens traceroute times out too.

Your thoughts?
nociSoftware EngineerCommented:
Pinging a name involves a name lookup; pinging an IP doesn't. That makes it a quick check for name lookup failures (the first fails, the second works).

filter/routing problem:

1) Is your own routing OK?
netstat -rn   # show routes, does the network with netmask point
to your gateway?

2) Then use ping/traceroute -I to find out where the packets go.

Find out where things stop; the culprit is nearby (probably one hop beyond
the last answering node).

If one protocol works and another doesn't, you might have filter issues;
otherwise check the system where it all stops.
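
The checks above can be strung together in a small script. This is only a sketch: the addresses and the test name are placeholders (none of them come from this thread), the function names are made up for illustration, and the -W timeout flag assumes the Linux iputils ping.

```shell
#!/bin/sh
# Hypothetical placeholders: set these to your gateway, an outside IP, and a test name.
GATEWAY="${GATEWAY:-192.0.2.1}"
EXTERNAL_IP="${EXTERNAL_IP:-198.51.100.1}"
TEST_NAME="${TEST_NAME:-example.com}"

# classify GW IP DNS: each argument is "ok" or "fail"; prints where to look first.
classify() {
    if [ "$1" != ok ]; then
        echo "local: gateway unreachable, check cabling, interface and routes"
    elif [ "$2" != ok ]; then
        echo "upstream: gateway answers but outside IPs do not, suspect router or link"
    elif [ "$3" != ok ]; then
        echo "dns: IP traffic works but name lookups fail, check resolv.conf and nameservers"
    else
        echo "ok: gateway, outside IP and DNS all respond"
    fi
}

# status CMD...: prints "ok" if the command succeeds, "fail" otherwise.
status() {
    if "$@" >/dev/null 2>&1; then echo ok; else echo fail; fi
}

# run_checks: ping the gateway, ping an outside IP, then try one DNS query.
run_checks() {
    gw=$(status ping -c 3 -W 2 "$GATEWAY")
    ip=$(status ping -c 3 -W 2 "$EXTERNAL_IP")
    dns=$(status dig +time=2 +tries=1 "$TEST_NAME")
    classify "$gw" "$ip" "$dns"
}

# Only touch the network when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    run_checks
fi
```

Saved under a name of your choice and started with --run while the problem is happening, it prints one line saying whether the trouble looks local, upstream, or DNS-only. It does not replace traceroute for finding the exact hop.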
jmarkfoleyAuthor Commented:
I'm experiencing the problem right now. Here are my test results:

1 10:21:50 mfoley@server:~
> dig

; <<>> DiG 9.3.0 <<>>
;; global options:  printcmd
;; connection timed out; no servers could be reached
1 10:22:16 mfoley@server:~
> ping
PING ( 56(84) bytes of data.

The dig timed out. I had to Ctrl-C the ping after several minutes. A subsequent dig worked! But when I immediately ran it again, it timed out. You see how flaky this is!

1. netstat -rn #looks OK. My gateway is

Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
                                                U         0 0          0 eth0
                                                U         0 0          0 lo
                                                UG        0 0          0 eth0

2. traceroute -I #hangs. I killed it after 17 rows of asterisks.

It seems that where it all stops is right on the other side of my gateway (I can ping my gateway), but as you can see, one of the digs during this testing session made it through. What could it be? Downstream load? Bad router? Flaky service provider equipment?
jmarkfoleyAuthor Commented:
btw - things seem to come in OK. I'm still getting lots of spam. Plus users are able to get to my web site, so it looks like the outside can connect to me; I just can't get out. Also, fyi, it is now 40 minutes after running my tests and pings and traceroutes are still not working. I am truly puzzled.
nociSoftware EngineerCommented:
If people can reach you => at least routing should work.
Is there any firewall (on this system?), and what do its filters look like?
jmarkfoleyAuthor Commented:
The only firewall I believe I have set up is via the linksys router. The firewall settings in there are:

Block Anonymous Internet Request
Filter Multicast
Filter IDENT (Port 113)

The other option, Filter Internet NAT Redirection, is not selected.

It is really bad today. I still am having extreme problems. I still can't ping/traceroute.

nociSoftware EngineerCommented:
A lot of modems have a problem if the upstream traffic is too large
(it will almost kill your total downstream).
The trick is to limit the outgoing stream (to the modem) to just below
the maximum (say 20-50 kbit per second below max). That leaves the modem
buffers almost empty, freeing the space for downloads.

Look for the wondershaper script on the internet.
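
To make the "just below max" idea concrete, here is a minimal sketch that works out the shaped rate and prints a matching tc command. It is not what wondershaper itself does (that script sets up several prioritized queues); it swaps in a single token bucket filter as the simplest stand-in. The interface name, the 256 kbit uplink and the 30 kbit margin are assumptions for illustration, and the function names are made up.

```shell
#!/bin/sh
# shaped_rate UPLINK_KBIT MARGIN_KBIT: prints the rate to shape to, in kbit/s.
shaped_rate() {
    rate=$(( $1 - $2 ))
    if [ "$rate" -le 0 ]; then
        echo "margin must be smaller than the uplink rate" >&2
        return 1
    fi
    echo "$rate"
}

# shape_cmd DEV UPLINK_KBIT MARGIN_KBIT: prints (does not run) a tc command
# that caps outgoing traffic on DEV with a token bucket filter.
shape_cmd() {
    rate=$(shaped_rate "$2" "$3") || return 1
    echo "tc qdisc add dev $1 root tbf rate ${rate}kbit latency 50ms burst 1540"
}

# Example with assumed numbers: a 256 kbit uplink shaped 30 kbit below max.
shape_cmd eth0 256 30
```

The script only prints the command, so you can review it before running it as root. Measure your real uplink rate first; shaping above what the modem can actually send has no effect on its buffers.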

I am not sure it will work on linksys, although linksys used to use linux for
its OS so I would expect it to be able to adopt the ideas. (it works for me).

Even then, a massive upload should allow some pings to get through.

Even the first line is asterisks? I would expect something to return from the
default gw.
Any filters (iptables, ipchains) on your qmail box?
jmarkfoleyAuthor Commented:
I'm using sendmail. No I have no ipchains, etc.

My upstream is fairly small. In fact, I experience this problem with nothing going out. The only thing consuming bandwidth is rejected mail. I receive between 300,000 and 400,000 bogus emails per day. I haven't analyzed this activity, but right now I'm still getting about 8 bogus emails/second and my sendmail tasks are at a low of 22 and I can ping, etc. to my heart's content.

I'll check on your wondershaper link.

My building is served by a Cisco router administered by a 3rd party. Is it possible they are limiting upload based on time of day?
nociSoftware EngineerCommented:
The exact mailer is not important (i probably misread mailq)..

Please be aware that even "no traffic" means a stream of outgoing packets:
at least ICMP messages, or SYN-ACK + (optional data ACK) + FIN + FIN-ACK,
even when you are not sending anything yourself. Those TCP packets are at least
40 bytes each (20-byte IP header plus 20-byte TCP header, no payload).

If there is an upstream bottleneck, it can cause the same problems.

Also keep in mind that UDP traffic is the first to go if a link
gets saturated (UDP is a 'lossy' protocol); there is no guard against packet loss.
TCP was invented to get reliable links. TCP adds overhead, though (3 packets to
start + 3/4 packets to close + ACK packets for chunks of data).

Name resolution is a UDP (port 53) query/response protocol. If a link gets saturated,
it will lose that data. Also, traceroute uses either UDP or ICMP.

The aforementioned tcptraceroute might give a slightly different view,
as it sends only the first TCP packet (SYN) and analyzes the answers
(ICMP while the path is not complete yet, or SYN-ACK (bingo)).

Tools to investigate are:
measure link load menu driven. Look for detail interface statistics.

see the packets in transit...
a 'tcpdump -vni ethX udp port 53' should show all UDP requests & answers.

Another thing: if you have problems resolving names, maybe adding your own nameserver can help (it will miss out on the first query but retain the name for
another time, some time later); mail would just get delayed.

Also, do you send the mail out yourself, or do you use a smarthost setup?
If you have a smarthost available (the mailserver of your ISP), then that mailer has
to deal with name lookups etc. You just need to provide the name (or IP address)
of that smarthost (better chance to get mail out).

jmarkfoleyAuthor Commented:
I do have a nameserver running on my host. I didn't last week, but I configured that hoping it would help. It does not.

Yes, I am sending the mail myself so I am doing the lookup.

I will also check on your suggested tools. You're just a gold mine for this stuff!

Link saturation. Hmmm. What's that? Remember that when this problem occurs I can't even ping or traceroute using an IP address, DNS not involved!

I've been tracking incidents, and this inability to get out generally starts between 8:00 and 9:00am and tapers off around noon. I often have a spike at the end of the day between 3:00pm and 6:00pm. My system's attempts to get out (I don't know what else to call it) don't really change. That is, I'm not sending thousands of emails in the morning versus the evening. I might send 30 emails a day. Nor is sendmail handling extra spam at any point.
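
For anyone wanting to track incidents like this automatically instead of by hand, a small probe run from cron could log the state once a minute. This is a rough sketch, not what was used here: the log path, the outside IP and the test name are placeholders, the function names are made up, and the -W flag assumes the Linux iputils ping.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust the log file, outside IP and test name.
LOG="${LOG:-/tmp/netprobe.log}"
EXTERNAL_IP="${EXTERNAL_IP:-198.51.100.1}"
TEST_NAME="${TEST_NAME:-example.com}"

# format_entry TIMESTAMP IP_STATUS DNS_STATUS SENDMAIL_COUNT: one log line.
format_entry() {
    printf '%s ip=%s dns=%s sendmail=%s\n' "$1" "$2" "$3" "$4"
}

# probe_once: test an outside IP and a DNS lookup, count sendmail processes,
# and append the result to the log.
probe_once() {
    ts=$(date '+%Y-%m-%d %H:%M:%S')
    if ping -c 3 -W 2 "$EXTERNAL_IP" >/dev/null 2>&1; then ip=ok; else ip=fail; fi
    if dig +time=2 +tries=1 "$TEST_NAME" >/dev/null 2>&1; then dns=ok; else dns=fail; fi
    # The [s] keeps grep from counting its own process.
    procs=$(ps ax | grep -c '[s]endmail')
    format_entry "$ts" "$ip" "$dns" "$procs" >> "$LOG"
}

# Only probe when explicitly asked to, e.g. from a cron entry.
if [ "${1:-}" = "--run" ]; then
    probe_once
fi
```

A crontab entry such as "* * * * * sh /path/to/netprobe.sh --run" (the path is hypothetical) gives one line per minute. Lining up the ip=fail stretches against the sendmail count shows whether the outages follow mail load or the clock.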

If it is some problem downstream, will I be able to diagnose it at all?
jmarkfoleyAuthor Commented:
I had an expert come in and look at the system and run some tests. What's happening is that so much attempted spam is coming into my system that I saturate the linksys router doing black-list lookups, sending reject messages, etc. I was advised to do 4 things:

1. Get a firewall running on my Linux box. Currently I have none.
2. Get an anti-virus tool. clam was recommended.
3. Move the mail to a different server/IP.
4. Get a more robust router. Linksys is designed for home use.

I'm going to do all of these. For #3 and #4 I think I'm just going to set up a new Linux box to do routing and firewalling.

Thanks for your help.
nociSoftware EngineerCommented:
I guess if the Linksys is only used as a bare router it should be able to cope with the traffic. (I didn't expect it to be able to filter mail too... ;)

Having a system behind it doing the mail etc. handling is what I do (different modem though). Also think about using wondershaper to limit the outgoing traffic to a little below line capacity (about 10Kbps below should be sufficient).
It will prevent upstream data from killing your downstream.
Question has a verified solution.