• Status: Solved
  • Priority: Medium
  • Security: Public

Connectivity Issue - Random Customers URGENT

Hello,

I operate a web site that is hosted in a network facility.  This web site was moved to a new server in a colocation facility 2 months ago from a hosted provider.  Today and yesterday, I have been receiving reports of certain customers not being able to access the web site.  I am posting here in hopes that someone can help me diagnose and solve this issue, or at least suggest some new diagnostic tests that can narrow it down.  Details below, thanks!

The web site is: www.SERVERNAME.com, IP: 208.64.57.47, hosted by osiriscomm.com.  After spending 20-30 embarrassing minutes working with 3 different customers, I conclude the following:

1) They cannot reach www.SERVERNAME.com via web browser
2) They cannot ping the server (request timed out), but the IP address is correct, so DNS appears to be fine.
3) Tracert fails at their DSL router:
       C:\>tracert www.SERVERNAME.com
       Tracing route to www.SERVERNAME.com [208.64.57.47] over a maximum of 30 hops:

       1     <1 ms     <1 ms     <1 ms     192.168.1.1 (or other router IPs)
       2     *         *         *         Request timed out.
4) Users can access other sites, even web sites in the same network, such as the colo provider's own website (208.64.57.136), even using IE.  Pings work to common sites such as www.google.com.
5) pathping gives similar results...
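
When collecting traces from several customers, the output in item 3 can be read mechanically. The following is a minimal Python sketch, not a definitive tool: the function name `last_responding_hop` and the `SAMPLE` text are illustrative assumptions, and the parsing assumes the usual Windows tracert layout (hop number, three probe times, then the address).

```python
import re

# A hop line starts with the hop number, e.g. "  1   <1 ms   <1 ms   <1 ms  192.168.1.1"
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")
# At least one probe time such as "<1 ms" or "12 ms" means the hop answered.
PROBE_TIME = re.compile(r"\d+\s*ms\b")

# Illustrative sample modeled on the trace in the question.
SAMPLE = """
Tracing route to www.SERVERNAME.com [208.64.57.47] over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2     *        *        *     Request timed out.
"""


def last_responding_hop(tracert_output):
    """Return (hop_number, address) of the last hop that answered, or None."""
    last = None
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, blank line, or trailer
        hop, rest = int(match.group(1)), match.group(2)
        if not PROBE_TIME.search(rest):
            continue  # all three probes were lost
        # The address is the last field; tracert wraps it in [] when a name is shown.
        last = (hop, rest.split()[-1].strip("[]"))
    return last
```

For the sample above, the last hop that answers is the customer's own router, which is the pattern described in item 3.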

Under what conditions would the router not know where to go next?  An interesting test was to determine the public IP address of the customer's DSL connection and perform a tracert to it from the web server.  This worked successfully.  So there is a physical network path from one machine to the other, but the router doesn't know how to traverse it from its side.

Any suggestions?  Anyone heard of a DSL connection selectively not being able to access a certain site?  One user was AOL, two others were using SBC Yahoo! DSL.  Users not able to access website are users not able to complete orders, so you can see why this is important.  Who knows how many others are experiencing the same issue and just not calling in the problem.

Thank you,
R. Mariotti
 
zaneyfunster Commented:
Any reason why osiriscomm.com aren't providing your DNS as well as hosting?
when using free services, you kinda get what you pay for..

I did however run a few DNS tests all of which looked fine...

Can they reach the server by ip address from their web browser??
http://208.64.57.47


I would be getting them to clear their local cache, delete temporary files, clear history etc...

Also a handy tip

ipconfig /flushdns


This sort of thing does happen from time to time because of caching... You probably should have left the old server running a little while longer to catch these people and redirect them to the new server...

some light reading...

http://www.maintainaweb.com/faq.asp
http://www.photosector.com/c16-Transferring+Website+Between+Servers.html

 
rmariotti (Author) Commented:
zaney,

Thank you for your input.  At this point, I don't think it's a DNS/Nameserver issue.  When these users ping the hostname, the ping shows the correct IP address (note the IP below):

C:\>ping www.choicelunch.com
Pinging www.choicelunch.com [208.64.57.47] with 32 bytes of data:
Request timed out.
...

I've even had them execute an nslookup command to be sure that the IP answer is right.  It always is.  Even if it were not, I've tried pinging/tracing/web browsing the IP address directly, with the same result.  As for the flushdns command, I hate to say that I tried that a while back as well, before I had pretty much confirmed that the issue wasn't DNS-related.
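
The nslookup check described above can be scripted so each affected customer runs the same test. This is a minimal Python sketch under stated assumptions: the function name `resolves_to` and the injectable `resolver` parameter are illustrative, and only IPv4 answers are considered.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return (ok, addresses): ok is True when expected_ip is among the answers.

    `resolver` defaults to the system resolver; it is a parameter only so the
    check can be exercised without network access (an assumption of this sketch).
    """
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False, []
    return expected_ip in addresses, addresses


# Example call (performs a real DNS lookup when run):
# ok, addresses = resolves_to("www.SERVERNAME.com", "208.64.57.47")
```

If this returns True on a machine that still cannot reach the site, name resolution is not the cause, which matches the conclusion above.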

It comes down to the tracert output: under what circumstances would the tracert command time out after 1 hop to the router?  The tech support at osiris is also stumped and plans to check with Yahoo to see if any issues have come up with my subnet that would cause them to drop these requests...

As an aside, I continue to use zoneedit because it has been a reliable nameserver for some public sites that I host at my office location and it's just easier to maintain everything in one spot.
 
MitchV85 Commented:
Did a bad static route get configured in the DSL routers? If a router can get to that subnet, then it should be able to get to that host unless there is a specific entry in the DSL router to send anything for that host to a "black hole".
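
To illustrate how a single host entry could black-hole one address while the rest of the subnet stays reachable, here is a minimal Python sketch of longest-prefix-match route selection. The routing table contents, the next-hop labels, and the function name `lookup` are hypothetical; nothing here is taken from an actual DSL router configuration.

```python
import ipaddress


def lookup(routing_table, destination):
    """Return the next hop of the most specific route that covers destination.

    routing_table is a list of (prefix, next_hop) pairs. Returns None when no
    route matches at all.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routing_table:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            # Longest prefix wins: /32 beats /24 beats the /0 default route.
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
    return None if best is None else best[1]


# Hypothetical table: a default route plus one bad host route.
TABLE = [
    ("0.0.0.0/0", "isp-gateway"),
    ("208.64.57.47/32", "blackhole"),
]
```

With this table, `lookup(TABLE, "208.64.57.136")` follows the default route, while `lookup(TABLE, "208.64.57.47")` hits the more specific host entry.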
 
zaneyfunster Commented:
rmariotti,

You didn't answer one thing though...

From their web browser, can they see your site by ip address???
http://208.64.57.47

From command can they ping by ip address
and
From command can they tracert by ip address?

You have only said they can't ping by URL which resolves correctly and therefore DNS is fine..
 
zaneyfunster Commented:
sorry, yes you did answer that question....

hmmm thinking thinking
 
adamdrayer Commented:
The routers should be configured to send all of their external-bound traffic to their gateway, which is normally the ISP's router.  The ISP's router seems to be refusing traffic destined for that address.  It could be an incorrectly configured BGP, a poisoned route, bad cache, or possibly the IP address used to belong to a known offender and was blocked.

Anyway, there is not much you can do about that except let the individual ISPs work out what is happening.  Or you could possibly ask your provider to change your IP address and DNS entry.  See if the same site on a different address solves your problem.

If the problem is easy to replicate 100% of the time, I would contact the ISPs that are blocking you.  Try to escalate the call as quickly as possible to an engineer.  They may give you a hard time because you are not a customer, so it's possible you'll have to involve one of the people who are having trouble.  This will also help when you go through testing.
 
rmariotti (Author) Commented:
Interestingly enough, one user has reported back that after turning off/on her computer, she now has access.  The more I think about this, the more I think adam is on to something: this is likely an ISP routing issue.  If I continue to have issues with new customers, I'll have to contact Yahoo/Prodigy directly.  Any new issues, I'll have users try:

* ping directly to IP address
* ping firewall computer which is in front of web server

I'll keep this post open for a bit in case others have some ideas...
 
zaneyfunster Commented:
that is interesting because that is exactly what the first of my light reading links suggested...

I love the old 'turn it off, turn it on, take two aspirin and call me in the morning' solution..

99.999% of this type of problem sits with the user's computer. I find it extremely hard to believe that an "ipconfig /flushdns" didn't work, but a reboot did...

At the risk of sounding like an ass, I am led to believe that you did not approach this migration with a properly thought out plan, resulting in an unknown number of failed attempts to contact your site.
When migrating a site, you do need to consider the fact that there are going to be numerous computers and systems working against you by way of caching, DNS caching and other evils invented by ISPs to reduce their bandwidth costs. You need to expect a number of requests to go to your old server for an almost indefinite length of time, despite what ping and tracert say.
I personally hate caching, it is the bane of my life, but we do have to deal with it...

turn off, turn on, take 2 aspirin, call me in the morning
 
adamdrayer Commented:
I don't happen to believe that this is a DNS name resolution or end-user caching issue.  Either way, the packet would have gotten farther before failing.  It seems that the traffic was being refused at the ISP's border gateway router.  Even if the router refused to report TTL-expired info, it would have decremented the TTL and passed the traffic along to the next hop.  The display would have shown a line of asterisks for the 2nd hop, but would also have shown 3rd hop info (whether asterisks or not).  Since the results show only 2 lines of info, and the second does not display in its entirety, it is my understanding that the ISP's router is refusing to route the packet.
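
The TTL argument above can be made concrete with a toy model. This is a simplified Python sketch, not real packet handling: the function name `simulate_trace`, the three behavior labels, and the intermediate hop addresses (taken from the 203.0.113.0/24 documentation range) are all assumptions for illustration. A "silent" router forwards traffic but never reports TTL expiry, while a "drop" router discards everything sent toward the destination.

```python
def simulate_trace(path, max_hops=30):
    """Simulate traceroute probes along path and return [(ttl, answer), ...].

    path is a list of (address, behavior) pairs ending with the destination.
    behavior is "normal", "silent" (forwards, but sends no TTL-expired reply),
    or "drop" (discards the packet). answer is the replying address or "*".
    """
    results = []
    destination = path[-1][0]
    for ttl in range(1, max_hops + 1):
        answer = "*"
        for position, (address, behavior) in enumerate(path, start=1):
            if behavior == "drop":
                break  # packet discarded here, so nothing comes back
            if position == ttl:
                # TTL runs out at this hop (or the probe reached the destination).
                if behavior == "normal":
                    answer = address
                break
            # Otherwise the hop decrements the TTL and forwards the probe.
        results.append((ttl, answer))
        if answer == destination:
            break
    return results


# Hypothetical paths: same hops, but the second hop behaves differently.
SILENT_HOP = [("192.168.1.1", "normal"), ("203.0.113.1", "silent"),
              ("203.0.113.9", "normal"), ("208.64.57.47", "normal")]
DROPPING_HOP = [("192.168.1.1", "normal"), ("203.0.113.1", "drop"),
                ("203.0.113.9", "normal"), ("208.64.57.47", "normal")]
```

In this model a silent second hop still lets the third hop and the destination answer, whereas a dropping second hop produces asterisks for every hop from 2 up to the hop limit. A real tracert also keeps probing until its maximum hop count, so the telling sign is that no later hop ever answers.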

This would be based either solely on the network portion of the IP address or on the host portion.  Since the user is able to access other websites with similar addresses, I would assume it's refusing based on the host portion, which means that that particular IP address is being blocked or poisoned by the ISP.  Possibly because it was marked as a spammer or a malicious website in the past.
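
The network/host split can be checked with Python's standard `ipaddress` module. This is a minimal sketch that ASSUMES a /24 prefix for the colo network; the thread never states the real prefix length, and the function names `same_network` and `split_address` are illustrative.

```python
import ipaddress


def same_network(ip_a, ip_b, prefix_length):
    """Return True if ip_b falls inside the network containing ip_a."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix_length}", strict=False)
    return ipaddress.ip_address(ip_b) in network


def split_address(ip, prefix_length):
    """Return (network, host_part) for ip under the given prefix length."""
    interface = ipaddress.ip_interface(f"{ip}/{prefix_length}")
    # Mask off the network bits to leave only the host portion.
    host_part = int(interface.ip) & int(interface.network.hostmask)
    return interface.network, host_part


# Assumed prefix length of 24 (not confirmed anywhere in the thread).
ASSUMED_PREFIX = 24
```

Under that assumption, 208.64.57.47 (the web server) and 208.64.57.136 (the colo provider's site) share the network 208.64.57.0/24 and differ only in the host part, 47 versus 136, which is consistent with a block on the single address rather than on the whole network.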

 
zaneyfunster Commented:
this being the case adam, then why did rebooting fix the issue with one of his clients??
 
adamdrayer Commented:
That's an excellent question.  I don't know the answer.  It could possibly have been a coincidence, or some other reason. The site was actually moved 2 months ago, and client-side caching doesn't usually last that long.  And I find it hard to believe that all the users in question haven't restarted in 2-3 months.  It may be that an inquiry or service call was placed recently, and the problem was handled already by the ISP support team today.