this may be something some of you guys stumbled across.
i worked with postfix for years but either did not or can't remember.
here are the symptoms :
some of the mail we receive take several minutes to transit
when i look in the logs, i see :
- a normal reception ( connect + cleanup + qmgr ),
- then NOTHING happens for about 5 minutes (no log line contain this message id),
- then a second "qmgr...queue active" line is printed (duplicate of the previous one except for the date)
- then i see normal delivery lines
Mar 31 16:24:54 vm-sm-prxmail01 postfix/screened_submission/smtpd: 69FDC1A8: client=HOSTNAME[ADDR], sasl_method=PLAIN, sasl_username=SENDER
Mar 31 16:24:54 vm-sm-prxmail01 postfix/cleanup: 69FDC1A8: message-id=<005c01d06bbe$69ed6bf0$3 dc843d0$@fr>
Mar 31 16:25:04 vm-sm-prxmail01 postfix/qmgr: 69FDC1A8: from=<SENDER>, size =640961, nrcpt=1 (queue active)
Mar 31 16:30:03 vm-sm-prxmail01 postfix/qmgr: 69FDC1A8: from=<SENDER>, size =640961, nrcpt=1 (queue active)
Mar 31 16:30:04 vm-sm-prxmail01 postfix/smtp: 69FDC1A8: to=<RECIPIENT>, relay=o RELAY[172.17.20.197]:587, delay=311, delays=310/0.02/0.1/0.66 , dsn=2.0.0, status=sent (250 2.0.0 t2VETixt017803 Message accepted for delivery)
Mar 31 16:30:04 vm-sm-prxmail01 postfix/qmgr: 69FDC1A8: removed
A BIT OF INTERPRETATION
looking at the delays (in the smtp client's log line), it would seem that the reception of the data took 5 minutes, and the emission was short
... which would be consistent if the first "queue active" line was not present at all
looking at the peer's logs, it would seem like postfix never attempted to deliver the message during those 5 minutes
which would also be consistent with a long reception duration
( i'm not knowlegeable regarding that server and cannot play with as much as i'd like to, so i'm unsure when the first log line occurs )
the mail is not very big, but it is sent using a smartphone over a mobile network (3G at best and probably 2G or edge)
which would also look consistent
unfortunately i cannot activate debugs easily or the server will just die
i saw the same symptom several times for different messages.
most of the messages were bigger than the average on this platform ( but small still : 1M or so tops ).
so my question is the following :
- should i assume that the data was still in transfer during those 5 minutes ? (AFAIK, clearly no)
- should i assume that the data transfer was short ? maybe the qmgr just crashed and picked up a few minutes later ? (this would be my bet, currently)
any other ideas regarding this issue ?
- postfix version is 2.11ubuntu ( with a bunch of hacks added by debian maintainers and/or canonical )
- the "screened submission" stuff is because i use postscreen in front of 587 ( only using tarpitting and very basic rejections based on dirty smtp implementations )
- the rest of the configuration is pretty straightforwards : only authenticated users can send email, a few basic checks are performed by a remote milter which answers very fast, and all email is sent to one of 2 smarthosts based on the recipient using a regular file based transport table