Avatar of blohrer
blohrer

Sendmail Error 451 4.7.1 and Illegal Seek

We have been having a problem lately with our two sendmail machines.  We have had them for a long time, and they have always been reliable until recently.  Currently these two sendmail machines are our outside relays. They accept mail, then relay to our spam box, which then relays to our Exchange server.  This was all working correctly until recently.

Now, several times a day, the two sendmail queues stop.  It appears that an email comes into the queue and causes a "451 4.7.1 Please try again later" error.  Once that error occurs then all the other emails that appear to come in behind it receive the illegal seek error.  The queues then appear to stop.  

Generally rebooting will not fix the problem, and issuing a sendmail -q doesn't always fix it. It seems to me that the machines have to queue up over 100 messages before sendmail -q will do anything. If I issue it before that, it instantly comes back to the prompt. When it does work, the queue clears out, except for the original message that caused the 451 4.7.1.

I am very new to sendmail, and only can relate what I know through trying to diagnose this problem.

Both boxes are running SuSE 7.2 with sendmail 8.11.2.

The spambox is a third-party appliance running Solaris and sendmail 8.12 (I believe).

The email that causes the jam is always a piece of spam, and for some reason is always ~650 bytes.

Any help would be appreciated!!!
Avatar of jlevie
jlevie

Based on the evidence presented I'd say that the problem lies with the "spam box", not the sendmail relays. What I think is happening is that a particular spam message comes in that causes the "spam box" to go weird and start refusing SMTP connections. When that happens the sendmail servers will start seeing a "connection refused", and there have been bugs in glibc and sendmail in the past that reported that as an "illegal seek". I thought that the sendmail problem with that was fixed in 8.10.10, but there could still be a problem in the 8.11.2 version (I know it is fixed in 8.12.x).

An upgrade of SuSE (and thus sendmail) to the current version should fix the error message, but not the problem. For that you'd have to look at the "spam box" and see what happens to it when this message comes in. My suspicion would be that something in that message causes the anti-spam engine to run wild and use up a lot of system resources.
Avatar of blohrer

ASKER

Thanks for the reply!!!  The problem is that the spam box is a third party appliance.  They say there is nothing wrong with the box, they have others running with our config etc etc...

They say, looking at their logs, that they see the email prior to what registers as a 451 4.7.1 in our box come in, and then nothing following it.  So to them it looks like our box just stopped sending.

Avatar of blohrer

ASKER

The spam appliance company feels that there is something in our config or sendmail box that, when the 451 4.7.1 error occurs, causes our sendmail outbound queue to hang. I can sit and watch the queue (using mailq over and over) and see that inbound mail is queueing up, but it's like the outbound engine just stops at the time of the 451 4.7.1.

I have deleted the offending email and tried sendmail -q, but as I said, until about 100-130 emails queue up, sendmail -q returns to the prompt very quickly without doing anything. Once a large amount queues up, sendmail -q sits and processes all the mail and the queue clears out.

I don't know if I am just trying long enough and that one time it works I am over some kind of timeout limit, or if it's just luck of the draw.
jlevie

The only way that the queue would begin to back up is if sendmail can't forward the queued messages to your "spam box". I can't remember any problems with 8.11.x that would cause this, so in the absence of evidence to the contrary I'd be much more willing to say that the problem is with the "spam box".

And we should be able to tell if that's the problem the next time this happens by executing 'sendmail -v -q'. If the "spam box" is refusing the SMTP connection you'll see the queue run very quickly with every message failing. And if you simultaneously have a tcpdump running you should see the connection being refused by the "spam box". I'd recommend running the tcpdump into a file (tcpdump -s 0 -w /path-to/dump-file host spam-box-IP) so you'll have the evidence to support a trouble call to the vendor of the "spam box", should it turn out to be a problem with that box.
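The capture-while-flushing sequence suggested above could be scripted roughly as follows. This is a sketch that assumes a POSIX shell, root privileges, and tcpdump and sendmail on the PATH. The function name, the default dump path, and the DRY_RUN switch are illustrative and not part of either tool, and the "port 25" term is an addition here to narrow the capture to SMTP.

```shell
#!/bin/sh
# Capture SMTP traffic to the spam box while forcing a verbose queue run.
# Hypothetical helper: the name, default path and DRY_RUN switch are illustrative.
capture_queue_run() {
    spam_box_ip=$1                          # address of the spam box
    dump_file=${2:-/tmp/spambox-smtp.pcap}  # where the raw packets are written

    if [ -n "$DRY_RUN" ]; then
        # Only print the commands that would be run.
        echo "tcpdump -n -s 0 -w $dump_file host $spam_box_ip and port 25"
        echo "sendmail -v -q"
        return 0
    fi

    # -s 0 keeps whole packets; -w writes them in binary form for later analysis.
    tcpdump -n -s 0 -w "$dump_file" host "$spam_box_ip" and port 25 &
    tcpdump_pid=$!
    sleep 2                                 # give tcpdump time to start listening

    sendmail -v -q                          # verbose queue run

    sleep 2                                 # let the last packets arrive
    kill "$tcpdump_pid"                     # SIGTERM lets tcpdump flush and exit
    wait "$tcpdump_pid" 2>/dev/null
    echo "capture saved to $dump_file; read it with: tcpdump -n -r $dump_file"
}
```

Calling `DRY_RUN=1 capture_queue_run 192.168.250.10` prints the two commands without running them; dropping DRY_RUN performs the capture and the queue run.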
Avatar of blohrer

ASKER

I went to try what you said... I couldn't get tcpdump to work when I had the problem. The sendmail -v -q did as you said: it just blew through the listing, each item marked with an illegal seek.

On our box I did a rcsendmail restart. The first time I did this the shutdown completed, but the start failed. I issued it a second time and both succeeded. The sendmail -v -q then processed normally. There were 215 emails in the queue.

What I did notice was that 3 emails, again, were left, all with the 451 4.7.1 error.

The rest of the 215 emails processed.

To me it sounds as if the sendmail box is stopping for some reason, not the spam box.
jlevie

Nothing presented so far clearly indicates that the problem is with one system or another. The conclusive evidence would be knowing if the "spam box" is refusing the connection. And that is going to require seeing the results of a tcpdump of the network traffic. What sort of problem did you encounter when trying to capture a sniffer trace to a file?
Avatar of blohrer

ASKER

Ok I did this command.. and it seemed to start capturing

tcpdump -n ether proto \\ip and dst 192.168.250.10 > test

I saw the problem, started the above command, and executed several sendmail -v -q, all sped through the listings.  I broke the above command, 0 packets captured.  When I VIed the file, nothing in it.  It seemed the first file to cause a problem was about 12:11am.

Above was about 12:35am.

I kept trying sendmail -q until about 12:41... I noticed this because this would be, from my reading, the default 30m queue clear time (?). Then it started rolling. Below is a sample of what the above tcpdump command caught, starting with the first line:

00:41:58.770205 192.168.250.2.minipay > 192.168.250.10.smtp: S 1176338844:1176338844(0) win 32120 <mss 1460,sackOK,timestamp 10356281 0,nop,wscale 0> (DF)
00:41:58.771006 192.168.250.2.minipay > 192.168.250.10.smtp: . 1176338845:1176338845(0) ack 3859001346 win 32120 <nop,nop,timestamp 10356281 31040143> (DF)
00:41:58.890967 192.168.250.2.minipay > 192.168.250.10.smtp: . 0:0(0) ack 106 win 32015 <nop,nop,timestamp 10356293 31040155> (DF)
00:41:58.891135 192.168.250.2.minipay > 192.168.250.10.smtp: P 0:20(20) ack 106 win 32120 <nop,nop,timestamp 10356293 31040155> (DF)

Mail flowed out of the queue.

When the queue cleared out, 120 emails, I was left with one email.

----Q-ID---- --Size-- -----Q-Time----- ------------Sender/Recipient------------
i3J4Fan00719      657 Mon Apr 19 00:15 <vsywmdxf@finance.wiwi.tu-dresden.de>
                 (Deferred: 451 4.7.1 Please try again later)
                                       <cmistry@aaa.com>  
(dest domain changed)


As a way to try to solve this, I did delete one email that I suspected of causing the problem at 12:11.


So queue is stuck, do a capture, nothing shows in the capture until after about a half hour, when the queue starts flowing again, and then records are in the capture file.



Avatar of blohrer

ASKER

Also I may add, we have two of these sendmail boxes that deliver to the spam box. 98% of the time, if one stops, the other doesn't. Which again is making me lean towards a problem on the sendmail boxes and not the spam box.
ASKER CERTIFIED SOLUTION
Avatar of jlevie
jlevie

Avatar of blohrer

ASKER

Ok well I still haven't figured out what is happening, but I have another couple of questions.  Trying to get a work around.

Right now we get one of these emails that causes the 451 4.7.1 error.  The queues stop.

If I issue the sendmail -bh I see an entry for our spambox and under the result it has the 451 4.7.1 Please try again later.

sendmail -q does not clear the queue until after 30mins from the problem email.  

Ok, so here are my questions:

1) Is there a way to change the 30min interval (MinQueueAge I believe).  I would like to set it to something like 5minutes, then if a problem email comes in, it will be only 5 mins until the queue will clear.
2) Is there a way to change the status of what sendmail thinks of our spam box (e.g. change it from the 451 error to up or something?)
3) Is there a way to set our spambox, somewhere in sendmail, to basically say... this host is always up?

The box 99% of the time is inbound mail only.  I am not worried about it retrying outbound sites, so I think lowering the MinQueueAge will be fine.

A consultant set up these sendmail machines, and we are looking to see if he can do as you advised and set up new Red Hat 9 machines.

Thanks for any input

Bill
jlevie

I suspect that the 30 minute queue interval is something of a red herring. The queue interval determines how often sendmail will run the queue and attempt to deliver any queued messages. It has no effect on the behaviour of 'sendmail -q'. Something else is locking up the queue and having to time out, and I'm thinking of something in the kernel's networking stack. A clue as to that being the problem might be found by attempting to open an SMTP connection to the "spam box" when a system gets in this state.

I'd run a tcpdump in one window (tcpdump host spam-box-ip) while executing a 'telnet spam-box-ip 25' in another window. Assuming that fails, I'd do a 'killall -9 sendmail' and try the telnet connection again. If it still fails and there's no evidence of an outgoing packet on port 25, it begins to look more like a kernel problem. Of course, if the initial telnet fails but tcpdump shows the outgoing packet and no response from the "spam box", we are back to the "spam box" being the issue.
Avatar of blohrer

ASKER

Will try that...

How is this for bizarre.  If I do sendmail -q... nothing happens.

If I do sendmail -qSstring that email goes
then if I do sendmail -q the queue clears.

jlevie

That's, well, really bizarre. When that happens is the queue completely cleared, including the message that appears to trigger the problem?
Avatar of blohrer

ASKER

The email that triggers the problem doesn't get sent; it remains with the 451 4.7.1 Please try again later error. This email 99.999999999% of the time is spam, so I just delete it from /var/spool/mqueue

jlevie

What I'd like to see you try the next time this happens is to first try a 'telnet spam-box-ip 25' while the queue is still stuck, such that 'sendmail -q -v' is reporting the error, to see if you can open a connection to the spam box. Then execute 'killall -9 sendmail', followed by a 'ps -ef | grep sendmail | grep -v grep' to be sure that all sendmail processes are gone, followed by a 'service sendmail start'. Then see if the queue is being run and messages are going to the spam box.
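A non-interactive stand-in for the 'telnet spam-box-ip 25' check could look like the sketch below. It assumes bash and the coreutils timeout command are installed. The function name probe_smtp and the 15-second limit are illustrative choices, not anything provided by sendmail.

```shell
#!/bin/sh
# probe_smtp HOST [PORT]: print the server's greeting line and succeed if a TCP
# connection opens and the greeting starts with 220; fail otherwise.
# Hypothetical helper for illustration only.
probe_smtp() {
    host=$1
    port=${2:-25}
    # bash's /dev/tcp pseudo-path opens a TCP socket; timeout bounds the attempt.
    banner=$(timeout 15 bash -c '
        exec 3<>"/dev/tcp/$0/$1" || exit 1
        IFS= read -r line <&3 || exit 1
        printf "%s\n" "$line"
    ' "$host" "$port" 2>/dev/null) || return 1
    printf '%s\n' "$banner"
    case $banner in
        220*) return 0 ;;   # normal SMTP greeting
        *)    return 1 ;;   # connected, but greeted with something else
    esac
}
```

Running `probe_smtp spam-box-ip` while the queue is stuck, and again after sendmail has been killed and restarted, gives the same information as the two telnet attempts: a refused or timed-out connection returns failure with no output, while a greeting other than 220 is printed before the failure is returned.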
Avatar of blohrer

ASKER

thanks for all the help  -- we upgraded to Sendmail 8.12.11 and so far so good
Avatar of blohrer

ASKER

Ok, well, that didn't fix the problem.

What I have noticed is that it is one particular type of spam email that is causing it.  Every one of these is identical.  What I have been doing is moving these emails to a /var/spool/mqueue/badmail directory, then I run sendmail -qRourdomain  and the queue clears.  Sometimes I hit another, move it and again clear the queue.  The spam is as follows

First what is in the Q file from mqueue

V6
T1083256934
K1083256945
N1
P31553
MDeferred: 451 4.7.1 Please try again later
Fbs
$_c-24-19-194-71.client.comcast.net [24.19.194.71]
$rSMTP
$scrazyslagelse.dk
${daemon_flags}
${if_addr}192.168.250.2
S<hgdaptn@crazyslagelse.dk>
MDeferred: 451 4.7.1 Please try again later
rRFC822; gcortade@xxx.COM
RPFD:<gcortade@xxx.com>
H?P?Return-Path: <g>
H??Received: from crazyslagelse.dk (c-24-19-194-71.client.comcast.net [24.19.194.71])
      by dmz-2.xxx.com (8.12.11/8.11.2/SuSE Linux 8.11.1-0.5) with SMTP id i3TGg0KX011085
      for <gcortade@xxx.com>; Thu, 29 Apr 2004 12:42:14 -0400
H??age-Info: XwjaD792yajPTUii/fxsOdtxSJtPSIhhjGYMoumCKBbay0N
H??Received: from father-jy055.stitch.ccbeverages.com ([20.89.60.131]) by r24-r45.ccbeverages.com with Microsoft SMTPSVC(5.0.2195.6824);
H??From: inger Hershberger <hgdaptn@crazyslagelse.dk>
H??To: gcortade@xxx.com <loose>
H??Subject: Vicodin and Xanex early
H??Date:  EST
H??Message-ID: <951858064369909755520979.13.9398029@hard-ysb08.ccbeverages.com>
H??Mime-Version: 1.0
H??Content-Type: multipart/alternative;
      boundary="--ctnyjqe_1075936822"

Then the actual D file from mqueue...

----ctnyjqe_1075936822
Content-Type: text/html;
Content-Transfer-Encoding: plain

<HTML>
<br>
Buy Meds at 80% off, $99 V1codin Special
<br>
Vic0din, Hydrocod0ne, C|al1s, V1agra, lev1tra,
Lipitor,Xanax, and
so much more.
<br>


<a href="http://goldgripgirl.com/?d=house&a=boznew">V1sit Our Website </a>

<br>
 


No Prior Pres.cription needed <br>
No Appointments <br>
No Waiting Rooms <br>
No Embarassment <br>
Private &Confidential <br>
Discreet Packaging <br>
HUGE SAVINGS <br>


<br>
<br><br>
<br>
thumb send limit town open sail natural out paint plate camera see nose sail send comb dead advertisement library bath pin other bone ill dress rest sun regular view much with stiff rice laugh self even comparison road experience silver fold ink use this now ray liquid slip relation walk education salt water great carriage plough ball important finger yesterday watch rub doubt bed come wrong general shut rice bag nail knowledge little like form army observation space distance drop committee medical wire natural tax name collar </HTML>

----ctnyjqe_1075936822--

They always are the same spam email, but they are always from different IPs and domains.

If I copy the email (D and Q files) from my badmail directory back to my mqueue directory... bam 451 error and the queue stops!!! HEELPPPP
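The move-aside workaround described above could be scripted along these lines. This is a sketch, assuming the queue layout from the thread (qf and df files directly under /var/spool/mqueue, with a badmail subdirectory) and that no queue runner is working on the files at the time. The function name is hypothetical.

```shell
#!/bin/sh
# quarantine_451 [QUEUE_DIR] [BAD_DIR]: move every qf/df pair whose qf file
# records the "451 4.7.1" deferral into BAD_DIR and print how many were moved.
# Hypothetical helper; run it only while sendmail is not processing the queue.
quarantine_451() {
    queue_dir=${1:-/var/spool/mqueue}
    bad_dir=${2:-$queue_dir/badmail}
    mkdir -p "$bad_dir" || return 1
    moved=0
    for qf in "$queue_dir"/qf*; do
        [ -f "$qf" ] || continue            # no matches: the glob stays literal
        # The M line of a qf file holds the most recent deferral reason.
        if grep -q '^MDeferred: 451 4\.7\.1' "$qf"; then
            id=${qf##*/qf}                  # queue ID shared by the qf and df files
            mv "$qf" "$bad_dir/" || continue
            [ -f "$queue_dir/df$id" ] && mv "$queue_dir/df$id" "$bad_dir/"
            moved=$((moved + 1))
        fi
    done
    echo "$moved"
}
```

After `quarantine_451` reports how many pairs it moved, the queue can be flushed as before with sendmail -qRourdomain.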
jlevie

Could you send a qf/df pair as attachments for one of these (un-altered) to jlevie@experts-exchange?
Avatar of blohrer

ASKER

Sent
jlevie

I got your mail message, but can't look at the data because it is TNEF encoded. And I probably should have been more specific as to what I need done. It is essential that I have a copy of the qf/df files that haven't been altered in any way. Simply copying those to another system may change the content, so what needs to happen is that you create a tarball or zip archive on the SuSE system containing those two files. Then send that archive to me as a standard MIME/base64 encoded attachment (not a 'winmail.dat' TNEF encoded attachment).
Avatar of blohrer

ASKER

Ok I haven't been able to get you the files, but we have had discussions with our spambox vendor.  Basically they are saying that these emails are coming in to the spambox.  The spambox generates the 451 4.7.1 error because the "Content-Transfer-Encoding: plain" causes a format error.  

This is then supposed to signal the sending server that there was a problem and to retry again later to see if the problem can be corrected.  They don't feel that this should stop our queues.

I think when our sendmail dmz machines get the 451 from the spambox, they then consider the spambox down and stop forwarding the email.

So my questions are:

1) Is there a way to filter out mail that may have format errors like "Content-Transfer-Encoding: plain"?
2) Is there a way to just drop such email automatically?
3) Is there a way to force the queue to just continue sending?

The only workaround I could think of, and not being a Linux guru I don't even know if it's possible, is to create a job that fires every 5 minutes that basically does sendmail -qRourdomain
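A periodic job of that kind is possible with cron. The sketch below assumes sendmail lives at /usr/sbin/sendmail and that the local cron accepts the */5 step syntax. The function name, the SENDMAIL override, and the script path in the comment are hypothetical.

```shell
#!/bin/sh
# flush_domain_queue DOMAIN: force a queue run for recipients matching DOMAIN.
# SENDMAIL may be overridden (for example with "echo") to see the arguments
# without running anything. Hypothetical wrapper for illustration.
flush_domain_queue() {
    domain=$1
    [ -n "$domain" ] || { echo "usage: flush_domain_queue DOMAIN" >&2; return 2; }
    "${SENDMAIL:-/usr/sbin/sendmail}" "-qR$domain"
}

# Example root crontab entry, assuming the function above is saved in a script
# at the hypothetical path /usr/local/sbin/flush-queue.sh that ends with
#     flush_domain_queue "$1"
#
# */5 * * * * /usr/local/sbin/flush-queue.sh ourdomain
```

The same effect can be had without a wrapper by putting the sendmail -qRourdomain command directly in the crontab line; the wrapper only makes it easier to test.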

jlevie

While "Content-Transfer-Encoding: plain" is in violation of RFC 2045, I'd consider an application that fails to deal with that construct (like a spam filter) to be seriously flawed and in need of fixing or replacement.

I can't say why the queue processing stalls when this happens, but now that we know it is the spam box that's at fault (which was what I suspected to begin with) I'd suspect that it is what is causing the queues to stall. I'd want to see a sniffer trace of all SMTP activity between the two systems when this happens or a full debug trace from sendmail that covers a failure.

Yes, it would be possible to filter these out via a modified copy of an anti-spam filter for sendmail, like MailScanner, MIMEDefang, or AMaViS. But then with one of those in place you may not need the spam box at all. It would take something like that to deal with these because sendmail doesn't look that deep into a message as it handles it.
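To show what such a check would test for, here is a small sketch that flags Content-Transfer-Encoding values outside the set RFC 2045 (section 6.1) names: 7bit, 8bit, binary, quoted-printable, base64, or an "x-" extension token. It is not a sendmail filter. It only inspects a file such as a queued df file, and both function names are hypothetical.

```shell
#!/bin/sh
# cte_is_standard VALUE: succeed if VALUE is one of the five mechanisms named
# in RFC 2045 section 6.1 or an "x-" extension token (case-insensitive).
cte_is_standard() {
    cte=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d ' \t\r')
    case $cte in
        7bit|8bit|binary|quoted-printable|base64) return 0 ;;
        x-?*) return 0 ;;
        *)    return 1 ;;
    esac
}

# list_bad_cte FILE: print each Content-Transfer-Encoding value found in FILE
# that fails the check above, one per line.
list_bad_cte() {
    grep -i '^Content-Transfer-Encoding:' "$1" | while IFS= read -r line; do
        value=${line#*:}                    # text after the header name
        cte_is_standard "$value" || printf '%s\n' "$value" | tr -d ' \t\r'
    done
}
```

Run against the df file quoted earlier in the thread, `list_bad_cte` would print "plain", since that value is neither a defined mechanism nor an x- token.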
Avatar of blohrer

ASKER

I agree with you completely.... :)

I just can't figure out why, if we are at the same sendmail level as the spam box, we don't kick the 451 error back to the spammer. I would much prefer to kick the email at the front door than inside!!!

My gut says the 451 basically tells my dmz to stop sending to that host (the spambox) because there is a communications error, and try again later.  Since the only host that the dmz machines deliver to is the spambox, everything stops.



jlevie

I'm guessing that when the spambox sees that message it goes into some sort of lockup mode with respect to that server and keeps returning a 451 for connection attempts for a while. A debug trace for sendmail or a sniffer trace of SMTP activity when one of these failures occurs would prove or disprove that.

Well, yes, employing an anti-spam filter on the DMZ servers would allow you to deal with this sort of thing "lower on the food chain", but it shouldn't really be necessary if the spam box was behaving in a sane manner.
Avatar of blohrer

ASKER

Well, what we ended up doing came from a suggestion from the spam box company. On our DMZ sendmail boxes we commented out the following line in our sendmail.cf:

O HostStatusDirectory=.hoststat

They said that this would then only cause that email to stop, but not the whole queue.  They said that with the above line, the queue was noting that the host was unavailable and was stopping.

We are still getting the email.  The spambox company says yes they are generating the 451 because the email in question has this set "Content-Transfer-Encoding: plain" which is causing a format error.

Now, since only those emails are stopping but the rest are going, life is better, with only having to occasionally go in and clean out all the 451 error emails.
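The sendmail.cf change described above (commenting out the O HostStatusDirectory line) could be applied with a small script like the sketch below, which keeps a backup copy first. The function name is hypothetical, and sendmail would still need restarting afterwards. If sendmail.cf is generated from an m4 .mc file, the corresponding setting there (confHOST_STATUS_DIRECTORY, as far as I know) would need the same treatment, or the change will be lost on the next rebuild.

```shell
#!/bin/sh
# disable_hoststat CF_FILE: comment out the HostStatusDirectory option in a
# sendmail.cf-style file, leaving the original in CF_FILE.bak.
# Hypothetical helper for illustration.
disable_hoststat() {
    cf=$1
    cp -p "$cf" "$cf.bak" || return 1
    # Prefix an uncommented "O HostStatusDirectory=..." line with "#".
    sed 's/^O[[:space:]]\{1,\}HostStatusDirectory=/#&/' "$cf.bak" > "$cf"
}
```

For example, `disable_hoststat /etc/sendmail.cf` followed by the rcsendmail restart used earlier in the thread. The path /etc/sendmail.cf is an assumption; some installations keep the file under /etc/mail.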

 
jlevie

While I understand why the spambox folks would want to treat "Content-Transfer-Encoding: plain" as a format error, the more reasonable approach, in my opinion, would be to accept the message and scan it as if it had "Content-Transfer-Encoding: 7bit". I think that's what MailScanner and some of the other anti-spam solutions do.
My two cents on this issue.

I had a similar problem and it turned out to be the amavis back-end engine, which
had a limit on the number of attached files that it can process. When I increased
that limit the messages went through.

It might be that the spambox has a similar limit and it refuses to process a message
if it contains many attachments, or tries to expand deeply nested zip files, especially if it
performs virus checking too.