jhighwind

asked on

Outgoing e-mails randomly stop working throughout organization

Ok, I'm about to lose my mind trying to troubleshoot this. Hopefully, someone can help me.

Dealing with an organization that has about 20-30 users. We're using POP accounts for e-mail (which we're hosting with MediaTemple). User workstations are mostly Windows 7 Professional. Outlook clients are mostly Outlook 2010, with I think a couple of them running Outlook 2007. For the last few months, we've had an on/off problem where our outgoing emails will flat-out stop working. Messages sent from Outlook will just sit in the outbox forever (for all users). We can still receive email when this is happening, but we can't send. Here are a few other bits of information:

- Sending works just fine through the webmail option we have with MediaTemple.

- Mobile devices work fine as long as they're not connected to the wireless network at the office.

- I can configure a POP account with Outlook 2010 on a laptop, entering a user's email credentials. Sending doesn't work when I'm at the office. It does work if I take the laptop home with me. Those emails that never leave the outbox? Go home and do a send/receive from there, and they all clear out. So, without changing the Outlook configuration on the laptop, it works from my house but not at this office, which tells me that I have Outlook set up correctly and that it's not doing anything screwy. Something specific to that site is what's messing us up.

- Other POP accounts (I set up a couple of Yahoo/Gmail ones for testing) do the same thing, meaning I can receive but can't send from those accounts (unless I either use webmail or carry the laptop with Outlook off site).

- Multiple ports (25 and 587) do the same thing.

- mxtoolbox.com is giving me green lights on everything regarding my domain/SMTP functionality. No blacklists, no DNS problems, etc, according to it. The only thing that's not green on mxtoolbox is a timeout on Spamhaus ZEN (and the Spamhaus Project website confirms that our domain is not listed with them if I go there to check manually).

- We're running a SonicWall TZ205 with all the security services enabled. I have tried temporarily disabling all the security services, which didn't help. I also even tried temporarily installing a cheap "Wal Mart special" router that has pretty much nothing for security services, and that didn't make a difference.

- MediaTemple support says they're not doing anything to block us. Considering the fact that I can send from that laptop offsite with Outlook and through their webmail even at the office, I believe them.

- We have a block of static IPs with our ISP. If I change the WAN port configuration on our router to a different static IP, outgoing email starts working again. Then this problem reoccurs at some later point (first time it fixed the problem for about a week, this last time it fixed the problem for about 1-2 months). Stopped working today. I'm going to have to go back over there in the morning and go to the next IP up...

- No obvious signs of malware (my first thought was possibly someone on the network had a mass mailer that was causing our ISP to block us, but again, no obvious signs and nothing from any of my AV software, and again, our ISP denies any such blocking).

- ISP swears up and down they're not doing anything. They recommend changing my port number in Outlook SMTP settings to 587, which doesn't help. I've also tried even changing to port 465 (SSL) in Outlook, with no luck. But yeah, beyond that, conversations with the ISP's support are just becoming a "nope, it's not us" roadblock.

- Telnet testing doesn't work - I can't connect to anything on port 25/587 from that site.

I'm running out of ideas and would really like a permanent fix. I'm really thinking my ISP is doing something strange and that maybe I need to push them a little harder to figure something out, but it'd be nice to see if anyone else has any thoughts on something I could've possibly overlooked.
ASKER CERTIFIED SOLUTION
Carl Dula
jhighwind

ASKER

While I have tried just using the "Walmart special" router in the SonicWall's place (problem continued to occur so I switched back to the SonicWall), I have not tried just plugging a laptop straight into the WAN port on my ISP's box. Will give that a shot today.

I have checked the SonicWall's logs while the problem was going on and didn't see anything that looked bad.

Have not run the diagnostics on the SonicWall. Will do that today and post results.
Update: I stopped by this morning, and the problem apparently resolved itself overnight. Outgoing mail stopped working at around 4:30pm yesterday. The last user left the office at 6:00pm and reported that it still wasn't working when he left. Users started showing up for work at around 8:00am this morning and noticed that those emails they all had piled up in the outbox had gone out.

I guess I jumped the gun by posting something here, but the thing is that this problem has happened before (and I honestly expect it to happen again when whatever random thing triggers it again), with the exact same symptoms and the exact same test results. This is the first time it has just worked itself out overnight. All the last times this problem has happened, I had to change our WAN IP address to the next one up (we have a block of 5 with our ISP, and I'm up to about the 3rd one now).

I can tell you one other note about the telnet testing: while the problem is occurring, I can't telnet to anything on port 25. I just get the "couldn't connect to _____________ on port 25, connect failed" message. When the problem is _not_ occurring, I can connect to that same IP address on port 25 and go through the full list of SMTP test commands.
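
The telnet test described above can also be scripted so it is easy to repeat while the problem is happening. This is a minimal sketch, not a full SMTP client: the function name `smtp_banner` is made up for illustration, and it only connects and reads the first line the server sends (normally a `220` greeting, or a `4xx`/`5xx` rejection). Port 465 expects TLS from the first byte, so this plain-text probe is only meaningful for ports 25 and 587.

```python
import socket


def smtp_banner(host, port=25, timeout=10.0):
    """Return the first line the server sends, or None if the connection
    fails, times out, or closes without sending anything.

    Hypothetical helper for illustration; it sends no SMTP commands.
    """
    try:
        # create_connection applies the timeout to the connect and to recv
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = conn.recv(1024)
    except OSError:
        # refused, unreachable, DNS failure, or timeout all land here
        return None
    if not data:
        return None
    return data.decode("ascii", errors="replace").splitlines()[0]
```

Calling `smtp_banner(host, 25)` and `smtp_banner(host, 587)` with the provider's SMTP hostname, once from behind the firewall and once from a machine plugged straight into the ISP's box, would show whether the connection is dropped silently (`None`) or actively rejected with a message.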
What you describe is symptomatic of an issue with your email provider, or of a listing on an RBL (real-time blackhole list). Many RBLs are dynamic: you can be put on one for hours and then removed, without any action from you.

The only way to find this is to perform the tests I suggested, and also to check all the RBLs while it is happening.
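
For anyone wanting to check RBLs by hand while the problem is occurring, a DNS-based list is normally queried by reversing the octets of the IP address and appending the list's zone. The sketch below assumes an IPv4 WAN address; the function names are made up for illustration, and `zen.spamhaus.org` is used only as an example zone. Note that it checks the sending IP, not the domain name.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_answer(ip, zone):
    """Return the A record the list gives for this IP, or None if the lookup
    produces no answer.

    None can mean "not listed" or "the query failed"; this sketch does not
    tell the two apart.
    """
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return None
```

An answer in the `127.0.0.x` range usually means the address is listed. Some lists return special codes when they refuse queries from large public resolvers, so an unexpected answer is worth checking against that list's own documentation.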
Carl,

I'll try the tests you suggested when the problem occurs again. For now, there's no telling how long that may take, so I'm going to go ahead and accept your answer + close this.

Thanks for your advice.
OK. Problem happened again. I'm new to Experts Exchange and not sure if I should just update this or make a new topic.  Here are results of the testing Carl suggested:

Unplugged WAN cable from our SonicWall and plugged directly into a laptop.
- Can't send email.
- Telnet to smtp.secureserver.net on port 25 gives me the following:
421 p3plibsmmtp03-06.prod.phx3.secureserver.net bizsmtp temporarily rejected. Reverse DNS for (our WAN static IP here) failed. http://x.co/srbounce

That IP it's showing is our WAN IP. We have reverse DNS entries that point mail.ourdomain.com to MediaTemple's MX servers (and that passes on MXToolbox). Are we being blocked because something is checking the originating IP here?
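
The 421 reply complains about reverse DNS for the connecting IP. That check is normally a PTR lookup on the IP address itself, which is a separate record from the MX and A records for the mail domain and is usually managed by whoever owns the IP block (typically the ISP). A minimal sketch for looking at it, with hypothetical function names:

```python
import ipaddress
import socket


def ptr_query_name(ip):
    """Return the DNS name that a reverse (PTR) lookup for this IP queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_dns(ip):
    """Return the hostname the system resolver gives for this IP,
    or None if the reverse lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        return None
    return hostname
```

If `reverse_dns` returns `None` for the WAN static IP, that would be consistent with the "Reverse DNS ... failed" text in the 421 message, though the receiving server may apply stricter checks than this sketch does.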

Plugged back into SonicWall. Tried the telnet test. This time, it just sits at "Connecting to smtp.secureserver.net" and never goes beyond that (been sitting there for about 5 minutes now).

MXLookup / banner check against our domain name resolves to mail.ourdomain.com with Media Temple's MX server. Banner received shows symbols (�Ҽ )

Blacklist check against our domain on mxtoolbox still shows nothing (and problem is occurring right now). If there's another one I can check, I'm all for trying.

Users can still send and receive fine through webmail, just not through their Outlook clients here at the office (one guy who works from home and has an Outlook client set up there is working fine with both sending/receiving right now). The users at the office are all just using webmail right now until I can figure out what's going on.
Also, I changed the static IP in our SonicWall's WAN configuration to the next one up in our list, and outgoing email is working again.

So I like to think I have sort of a decent understanding of how mail flow is supposed to work, but this whole thing makes absolutely no sense to me.
Since this item is closed you should probably open a new one to continue.

Regarding the "421" mail error, according to the error list you posted this says the mail server is too busy. If this is the reason that email is being rejected in the first place, and causing the real problem, then you should contact Media Temple and tell them what is happening. One thing you could try when it is working, is to connect to the WAN (ISP router probably has a port you can use) with a laptop and try sending mail using telnet, to verify you do not get that same error.

When you say the webmail is still working, I assume the webmail site is at Media Temple's site and not yours. Given that, it explains why it would continue to work.

Reverse DNS should not be an issue since you are trying to send mail from the same place (ip) you would normally.

Do you use SPF records?
This is an old topic that's already been closed, but I thought I'd add an ending to it just in case anyone stumbles across it. I believe I was correct in my first assumption that a system on this network had some malware, which was attempting to send out mass emails and getting us blocked. That kinda lines up with what Carl said, that maybe we were temporarily getting put on RBLs as that system was sending out its mass emails.

The user mentioned that he'd been seeing some weird behavior with Outlook, like massive amounts (as in hundreds) of those "delivery status notification: failure" messages showing up in his inbox, regarding recipients he'd never attempted to send to. While I couldn't find anything suspicious through malware scans or otherwise on that machine, I ended up reformatting it shortly after, and the problem hasn't returned.

The tricky part is that Media Temple will actually block incoming/outgoing emails (from our WAN IP) if a user somewhere accidentally enters an incorrect password more than once. So once or twice, while I had users all running through webmail, there were a couple of incorrect password entries that caused our email to stop working and get us blocked. So between that + the infected machine sending out junk at random times, it made this kind of difficult to pin down.