datastarstar asked:

"stat=deferred" in sendmail log and messages not received

Most of my emails get through to the recipients, but several intended recipients recently have told me they have not received my emails.  When I check the sendmail log for each of these, I see:
"stat=deferred" (usually followed by "connection reset by...")

Looking further at the log, there are a variety of "stat=deferred" messages with various messages like "operation timed out", "greylisting in action", "please try again later", etc.  (I can't tell whether most of these were received or not).

Do I have a problem at my end, or is it an issue with the recipients?  

 
Arty K

> Looking further at the log, there are a variety of "stat=deferred" messages with various messages like "operation timed out", "greylisting in action", "please try again later", etc.  (I can't tell whether most of these were received or not).

Greylisting is an anti-spam technique. The remote server always declines the first attempt to deliver a message with a temporary error, so the message stays in your queue until the next try. The remote server 'remembers' your first attempt, and when your server comes back and sends the message again, it is accepted, as are further messages from you for some period of time.
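A toy model of the remote side makes the sequence concrete. This is a minimal Python sketch, not any real greylisting product: it assumes the common design that keys on the (client IP, sender, recipient) triplet and accepts a retry only after a minimum delay. The class name, the 300-second delay and the reply texts are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)

# Illustrative reply text; real greylisting servers word this differently.
TEMPFAIL = "451 4.7.1 Greylisting in action, please try again later"


class Greylist:
    """Toy greylisting policy: temp-fail the first attempt for an unseen
    triplet, accept retries that arrive at least min_delay seconds later."""

    def __init__(self, min_delay: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.min_delay = min_delay  # assumed delay, not a standard value
        self.clock = clock
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, rcpt: str) -> str:
        key = (ip, sender, rcpt)
        now = self.clock()
        if key not in self.first_seen:
            # First attempt: remember it and ask the client to retry later.
            self.first_seen[key] = now
            return TEMPFAIL
        if now - self.first_seen[key] < self.min_delay:
            # Retried too soon: still a temporary failure.
            return TEMPFAIL
        # Retried after the delay: accept.
        return "250 2.1.5 Ok"
```

Each 451 reply appears in your sendmail log as stat=Deferred; the retry on a later queue run is what gets the message through.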

datastarstar

ASKER

I am most concerned about the ones that say "deferred" and never seem to be received.  Any idea what would cause this?
Only your mail log can say what really happens.

Pick a message that you are sure was sent but was never delivered, and find all log entries for that mail job (they all share the same queue ID). Each time the mail is deferred, the log entry gives a reason.
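Because sendmail logs every attempt for a job under the same queue ID, pulling out the stat= field for that ID gives the job's full history. A minimal Python sketch, assuming the usual sendmail syslog layout in which stat= is the last field of each delivery line; job_history is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Syslog timestamp, e.g. "Feb 11 06:35:14".
TS_RE = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")
# Assumption: stat= is the last field of a delivery line; its value may contain commas.
STAT_RE = re.compile(r"\bstat=(.*)$")


def job_history(lines: Iterable[str], queue_id: str) -> List[Tuple[str, str]]:
    """Return (timestamp, stat) for every delivery attempt logged for queue_id."""
    marker = f" {queue_id}: "
    history = []
    for line in lines:
        if marker not in line:
            continue  # belongs to another mail job
        stat = STAT_RE.search(line)
        if stat is None:
            continue  # e.g. the from= line, which records receipt, not delivery
        ts = TS_RE.search(line)
        history.append((ts.group(0) if ts else "?", stat.group(1).strip()))
    return history
```

Reading the file with `job_history(open("/var/log/maillog"), "m1BDYjYo032697")` lists one entry per attempt, each with its reason.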

Any other approach is just guessing.

I mentioned greylisting because it was in your list, and it's confusing if you don't know what it is.

As for 'operation timed out': probably the remote server either blocks your server in its firewall or is simply not working.

If you get 'connection reset' or 'connection timeout' during the DATA phase, one possible reason is a path MTU discovery problem: either your firewall or the remote server's firewall blocks ICMP.
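Why would only the DATA phase suffer? SMTP commands fit in tiny packets, but the message body goes out in full-size TCP segments sent with the Don't Fragment bit set. The sketch below is a simplified Python model, assuming 20-byte IPv4 and TCP headers without options, a 1500-byte local MTU and a hypothetical 1492-byte link (PPPoE, for example) somewhere on the path; it counts the packets that would be dropped when the ICMP 'fragmentation needed' reply is blocked.

```python
from typing import List

IP_HDR = 20   # IPv4 header without options (assumption)
TCP_HDR = 20  # TCP header without options (assumption)


def mss_for(mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - IP_HDR - TCP_HDR


def packets(payload_len: int, local_mtu: int = 1500) -> List[int]:
    """IP packet sizes the sender would emit (DF set) for a given payload."""
    mss = mss_for(local_mtu)
    sizes = []
    while payload_len > 0:
        chunk = min(mss, payload_len)
        sizes.append(chunk + IP_HDR + TCP_HDR)
        payload_len -= chunk
    return sizes


def dropped(payload_len: int, path_mtu: int, local_mtu: int = 1500) -> int:
    """Count packets larger than the path MTU. With ICMP blocked, the sender
    never learns why these vanish."""
    return sum(1 for size in packets(payload_len, local_mtu) if size > path_mtu)
```

With these numbers a short RCPT TO command passes untouched, while 44 of the packets carrying a 64 KB body exceed 1492 bytes. In reality TCP stalls at the first lost full-size segment and keeps retransmitting it until it gives up.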

Everything can be found in the mail log if you interpret it correctly.


Nopius,

Here are two log entries for a message that was never received (I've x'd out identifying addresses). Does this help?

<XX>Feb 11 06:35:13 sendmail[32697]: m1BDYjYo032697: from=<xxxx@xxxxx.com>, size=2193, class=0, nrcpts=1, msgid=<005201c86cb2$ddd71ce0$7264a8c0@GALAXY.LOCAL>, proto=ESMTP, relay=75-147-0-217-NewEngland.hfc.comcastbusiness.net [75.147.0.217] (may be forged)
<XX>Feb 11 06:35:14 sendmail[32866]: m1BDYjYo032697: to=<xxxx@xxxxx.com>, ctladdr=<xxxx@xxxxxx.com> (26274/100), delay=00:00:01, xdelay=00:00:01, mailer=esmtp, pri=32193, relay=inbound.xxxxx.com.netsolmail.net. [205.178.149.7], dsn=4.0.0, stat=Deferred: Connection reset by inbound.xxxxx.com.netsolmail.net.
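The to= line above already carries the key facts. A minimal Python sketch that splits such a line into its fields and classifies the dsn= code by its first digit, following RFC 3463 (2.x.x success, 4.x.x persistent transient failure, 5.x.x permanent failure). parse_fields and dsn_class are hypothetical helper names, and the parser assumes stat= is the last field, as it is in these lines.

```python
from typing import Dict


def parse_fields(line: str) -> Dict[str, str]:
    """Split the 'key=value, key=value, ...' part of a sendmail log line."""
    # Drop "date [host] sendmail[pid]: QUEUEID: " (the first two ": " separators).
    body = line.split(": ", 2)[-1]
    # stat= comes last and its value may contain ", ", so split it off first.
    head, sep, stat = body.partition(", stat=")
    fields = {}
    for part in head.split(", "):
        key, _, value = part.partition("=")
        fields[key] = value
    if sep:
        fields["stat"] = stat
    return fields


def dsn_class(dsn: str) -> str:
    """Classify an RFC 3463 status code by its class digit."""
    return {
        "2": "delivered",
        "4": "temporary failure (will be retried)",
        "5": "permanent failure (bounced)",
    }.get(dsn[:1], "unknown")
```

For the line above this yields dsn=4.0.0, a transient failure, which is why sendmail keeps the message queued and retries it instead of bouncing it.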

Does the "may be forged" message mean anything? In our situation, our outgoing mail server is a company server, but it is relayed through our ISP (Comcast).

Thanks!
Two log entries are not enough. If you can, please grep for 'm1BDYjYo032697' in all maillog files, including rotated ones. Two entries alone may just be the result of greylisting, in which case the next attempt may succeed.
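Rotated logs are often compressed, which plain grep won't search. A minimal Python sketch that searches plain, .gz and .bz2 files together; the glob pattern and rotated names such as maillog.0.bz2 are assumptions that depend on how your system rotates its logs, and grep_queue_id is a hypothetical helper name.

```python
import bz2
import glob
import gzip
from typing import Iterator, Tuple


def open_log(path: str):
    """Open a plain, gzip- or bzip2-compressed log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def grep_queue_id(pattern: str, queue_id: str) -> Iterator[Tuple[str, str]]:
    """Yield (file, line) for each line mentioning queue_id, files sorted by name."""
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as fh:
            for line in fh:
                if queue_id in line:
                    yield path, line.rstrip("\n")
```

For example, `for path, line in grep_queue_id("/var/log/maillog*", "m1BDYjYo032697"): print(path, line)`.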


> Does the "may be forged" message mean anything.

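Nothing to worry about. It means sendmail's double-reverse DNS check failed for 75.147.0.217: the hostname returned by the reverse (PTR) lookup of that IP does not resolve back to the same IP. It is informational and not related to the deferral.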
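You can reproduce the DNS comparison yourself: look up the PTR name of the IP, then do a forward lookup of that name. A minimal Python sketch, assuming IPv4 and using the standard socket.gethostbyaddr and socket.gethostbyname_ex; the resolver functions are parameters only so the logic can be tested offline, and double_reverse_ok is a hypothetical name.

```python
import socket
from typing import Callable, List, Optional, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def double_reverse_ok(ip: str,
                      reverse: Resolver = socket.gethostbyaddr,
                      forward: Resolver = socket.gethostbyname_ex) -> Optional[bool]:
    """True if the PTR name of ip resolves back to ip, False if it does not,
    None if ip has no PTR record at all (IPv4 only in this sketch)."""
    try:
        name = reverse(ip)[0]       # PTR lookup
    except OSError:                 # socket.herror / gaierror subclass OSError
        return None
    try:
        addresses = forward(name)[2]  # A records of that name
    except OSError:
        return False
    return ip in addresses
```

Calling `double_reverse_ok("75.147.0.217")` queries live DNS; a False result corresponds to the mismatch sendmail flags.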

> Connection reset by inbound.xxxxx.com.netsolmail.net.

It means the remote server dropped the connection. That is not something you caused, but you can check it manually (with a telnet SMTP session):

telnet 205.178.149.7 25
ehlo yourdomain.com
mail from:<xxxx@xxxxx.com>
rcpt to:<xxxx@xxxxx.com>
data
From: xxx@xxxxx.com
To: xxx@xxxxx.com
Subject: test

Test
.
quit


After this session you will know at which step the connection is reset. If there is no reset, just try to push the mail queue again.
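If typing into telnet is awkward, the same dialogue can be scripted. A minimal sketch using Python's standard smtplib: it sends EHLO, MAIL FROM, RCPT TO, DATA and, on leaving the with block, QUIT, and set_debuglevel(1) prints each command and reply so you can see the step at which the connection is reset. Host, HELO name and addresses are the placeholders from the session above, and send_test and build_test_message are hypothetical names. Like the telnet test, this talks to the remote server directly and bypasses your local sendmail.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Same headers and body as the manual telnet test."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "test"
    msg.set_content("Test\n")
    return msg


def send_test(host: str, helo: str, sender: str, recipient: str,
              port: int = 25) -> None:
    """Run the SMTP dialogue with every command and reply echoed to stderr."""
    with smtplib.SMTP(host, port, local_hostname=helo, timeout=60) as smtp:
        smtp.set_debuglevel(1)  # print commands and replies
        smtp.ehlo()
        smtp.send_message(build_test_message(sender, recipient),
                          from_addr=sender, to_addrs=[recipient])
    # Leaving the with block sends QUIT.
```

For example, `send_test("205.178.149.7", "yourdomain.com", "xxxx@xxxxx.com", "xxxx@xxxxx.com")`. A reset typically surfaces as smtplib.SMTPServerDisconnected right after the last command printed.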

Also check that your local firewall doesn't block ICMP packets.
There are no other log entries matching the same ID -- just the two I provided.

I tried sending the test message from a telnet session. After the '.', I received the reply:
250 Mail queued for delivery.

but then, after exiting telnet, I received a Symantec Email Proxy message:
"... was unable to be sent because the connection to your mail server was interrupted.  Please open your email client and re-send the message from the sent messages folder."

Was there something else I should have done while in telnet? I'm not sure what you meant by "try to push mail queue again".

> There are no other log entries matching the same ID -- just the two I provided.

I don't believe you :-) Here's why.
Sendmail re-runs its queue periodically (every 30 minutes with the common -q30m setting) and retries deferred messages on each run until they are either delivered or returned as undeliverable (after 5 days by default, set by Timeout.queuereturn).
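To get a feel for how many retries that means, here is a minimal Python sketch, assuming a fixed 30-minute queue interval (-q30m) and sendmail's default 5-day Timeout.queuereturn, and ignoring refinements such as MinQueueAge or host status caching; retry_offsets is a hypothetical name.

```python
from datetime import timedelta
from typing import List


def retry_offsets(interval: timedelta, give_up_after: timedelta) -> List[timedelta]:
    """Offsets from the first failed attempt of the queue runs that would retry
    a deferred message, assuming one retry per queue run (a simplification)."""
    runs = give_up_after // interval  # whole queue runs within the timeout
    return [interval * k for k in range(1, runs + 1)]


schedule = retry_offsets(timedelta(minutes=30), timedelta(days=5))
```

That is 240 retries, each of which should leave another log line with the same queue ID.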
In your case the attempt is dated Feb 11 and there was a temporary error, so there _must_ be further log entries for this job ID, m1BDYjYo032697 (the ID stays the same across attempts). Probably your mail log was rotated or is incomplete (for example, if syslogd died).

> but then after exiting  telnet, I rec'd a Symantec Email Proxy message:

Where is your Symantec proxy located, and what is the actual path of your message?
Can you try to send this message directly, bypassing any SMTP proxy?

> I'm not sure what you meant by "try to push mail queue again"

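If you are using sendmail, run 'sendmail -v -qRdomain.com' as root, where domain.com is the failed recipient's domain. This forces a delivery attempt for the queued messages to that domain and shows the SMTP dialogue; 'mailq -qRdomain.com' only lists those messages. See 'man sendmail' and 'man mailq'.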

...I will be tied up for the next few hours -- will respond later today.  Thanks for your help so far!
nothing in the log except those 2 lines -- really!  I've re-downloaded the log file so I know it's current and the logs are rotated only on the 1st of the month.

'mailq -qRdomain.com' reports:
1BDYjYo032697     1748 Mon Feb 11 06:35 <xxxx@xxxx.com>
                 (reply: read error from inbound.xxxxx.com.netsolmail.net.)
                                         <xxxx@xxxxx.com>


Does this suggest an error at their end?

I also tried disabling the Symantec proxy and re-did the send using telnet, but nothing at all appeared in the log. Is this because, by sending via telnet, I am bypassing my mail server? I don't know yet if it was received -- is there something else I should be looking at?
The message I sent via telnet was received.  Does that shed any new light?
Hi, datastarstar.

I'm back.

> nothing in the log except those 2 lines -- really!

Now I believe you. The most likely reason is this: you have a number of messages for the same remote mail host sitting in the mail queue, and they are processed in order on each retry. When the first message fails with this kind of error, all the others for that host are skipped until the next run of the entire queue.
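This is a simplified Python model of that behaviour, not sendmail's actual code: jobs are processed in queue order, and once delivery to a host fails, later jobs for the same host are skipped without a connection attempt until the next run. The queue contents, host names and the deliver callback are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def queue_run(queue: List[Tuple[str, str]],
              deliver: Callable[[str, str], bool]) -> Dict[str, str]:
    """queue holds (queue_id, host) in processing order; deliver returns
    True on success. Returns the outcome for each job in this run."""
    down = set()   # hosts that already failed during this run
    result = {}
    for qid, host in queue:
        if host in down:
            result[qid] = "skipped"   # no new connection attempt this run
            continue
        if deliver(qid, host):
            result[qid] = "sent"
        else:
            down.add(host)
            result[qid] = "deferred"
    return result
```

A skipped job is simply left in the queue, so it is retried on the next queue run.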


> Does this suggest an error at their end?

No. It shows the previous error. To see how the message was processed on the next attempt, run 'mailq' again later with the same parameters.

> I also tried disabling the symantec proxy and re-did the send using telnet, but nothing at all appeared in the log.  Is this because sending via telnet, am I bypassing my mail server?

Yes. Was your attempt successful?

> I don't know yet if it was received -- is there something else I should be looking at?

If the manual test went without errors and a second 'mailq' shows no more messages, then the message was sent (there should be a log entry for that). If the manual test was OK but mailq still shows 'read error', you have a network problem.
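Comparing two mailq listings by eye gets tedious with many queued jobs. A minimal Python sketch that maps each queued job ID to its last recorded error; it assumes sendmail's usual mailq layout (ID line with size and date, error in parentheses, recipient indented below), as in the listing above, and parse_mailq is a hypothetical name.

```python
import re
from typing import Dict, Optional

# A job line starts with the queue ID, optionally followed by a one-character
# status flag such as '*' (being processed), then the size in bytes.
JOB_RE = re.compile(r"^([A-Za-z0-9]+)[*-]?\s+(\d+)\s")


def parse_mailq(output: str) -> Dict[str, Optional[str]]:
    """Map each queued job ID to its last error text, or None if none is shown."""
    jobs: Dict[str, Optional[str]] = {}
    current = None
    for line in output.splitlines():
        m = JOB_RE.match(line)
        if m:
            current = m.group(1)
            jobs[current] = None
            continue
        text = line.strip()
        if current and text.startswith("(") and text.endswith(")"):
            jobs[current] = text[1:-1]  # e.g. "reply: read error from ..."
    return jobs
```

Feeding it `subprocess.run(["mailq"], capture_output=True, text=True).stdout` before and after a queue run shows which job IDs are still waiting and why.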

You didn't mention the firewall settings between your mail server and the Internet. Is ICMP blocked? On a Linux mail host, try running 'sysctl net.ipv4.ip_no_pmtu_disc=1' and then push the queue again. If that doesn't help, we should debug your SMTP session.

To debug network problems, run 'tcpdump -s 2000 -w /tmp/smtp.dump host 205.178.149.7' in one window (or SSH session), and run 'sendmail -v -qRdomain.com' in the other. Wait up to 30 minutes, then stop tcpdump with ^C. Download /tmp/smtp.dump and open it in Wireshark to see at which step the connection is reset. You can post the dump here as a file if you can't find the source of the problem yourself.
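If Wireshark isn't available, TCP resets can also be pulled out of the capture with a few lines of Python. This is a minimal sketch, assuming the classic libpcap file format with Ethernet framing and IPv4 (it does not handle pcapng, IPv6 or VLAN tags); find_resets is a hypothetical helper name.

```python
import socket
import struct
from typing import List, Tuple


def find_resets(data: bytes) -> List[Tuple[float, str, str]]:
    """Return (timestamp, source, destination) for each TCP RST in a classic
    libpcap capture of Ethernet frames."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet captures (linktype 1) are handled")
    resets = []
    offset = 24  # size of the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14 or frame[12:14] != b"\x08\x00":
            continue  # not IPv4
        ip = frame[14:]
        if len(ip) < 20 or ip[9] != 6:
            continue  # not TCP
        tcp = ip[(ip[0] & 0x0F) * 4:]  # skip the IP header (IHL words)
        if len(tcp) < 14:
            continue
        src_port, dst_port = struct.unpack("!HH", tcp[:4])
        if tcp[13] & 0x04:  # RST flag
            src = f"{socket.inet_ntoa(ip[12:16])}:{src_port}"
            dst = f"{socket.inet_ntoa(ip[16:20])}:{dst_port}"
            resets.append((ts_sec + ts_usec / 1e6, src, dst))
    return resets
```

Usage: `find_resets(open("/tmp/smtp.dump", "rb").read())`. The addresses tell you which side sent the reset; Wireshark is still the easier way to see which SMTP step was in progress.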
Sorry for the delay in getting back to you.  

The manual test (via telnet) was successful -- the message was received by the recipient.

When I run 'mailq' with no parameters, I get a long list, most with these 2 messages:
Deferred: Connection reset
reply: read error

Given the long list, I wonder if it's a good idea to once in a while flush the queue -- if so, how to do this?

When I run 'mailq -qRdomain.com' (using the domain of the messages not received), they all show:
reply: read error
There are about 6 messages like this dating back about 3 weeks.

By the way, 'sysctl' and 'tcpdump' both resulted in a "command not found" error (the server is running FreeBSD if that helps).  Also, I think I may be in over my head with the more advanced debugging.

Based on the above info, would you say that the problem is at the receiving end, or is there any reason to suggest something needs fixing at my end?
ASKER CERTIFIED SOLUTION
Arty K

(The accepted solution is only available to Experts Exchange members.)
My queue was clogged with over 400 messages in total -- I think you're right that they are corrupted in some way, since many are rather old.  I'm in the process of flushing them out.  

Interestingly, most of these messages were addressed to a few domains, all associated with netsolmail.net ('inbound.domain.com.netsolmail.net').

I'm going to close this question now and give you the points -- Nopius, you've worked hard and guided me in what this all means -- thanks for your help!
Thank you for points, datastarstar.

But it may really be a firewall configuration problem on your side. That's why I asked you to check the PMTU setting on your mail server. The typical symptom of blocked ICMP with path MTU discovery enabled (the default) is that messages above a certain size cannot be sent and fail with a 'transfer timeout' error.
For the manual telnet test, try sending a message of about 64 KB (just paste some text into the DATA portion).
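To produce that much text, a small generator like this Python sketch can write it to a file for pasting. The line text and numbering are arbitrary choices and filler_body is a hypothetical name: numbered lines make it easy to see where a truncated transfer stopped, and no line starts with '.' (a lone '.' ends the DATA phase) or comes close to SMTP's 1000-character line limit.

```python
def filler_body(target_bytes: int = 64 * 1024) -> str:
    """Return at least target_bytes of ASCII text as short numbered lines."""
    lines = []
    size = 0
    n = 0
    while size < target_bytes:
        # Lines start with a digit, so none can be taken for the '.' terminator.
        text = f"{n:06d} The quick brown fox jumps over the lazy dog.\n"
        lines.append(text)
        size += len(text)
        n += 1
    return "".join(lines)
```

Write it out with `open("/tmp/filler.txt", "w").write(filler_body())` and paste the file's contents after the Subject header and blank line of the telnet session.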