Servers unable to UNC to each other but can to different servers

Last Modified: 2020-10-06
We have several Win2016 servers, in particular a SQL server and a web server.  After some arbitrary number of days the web server can no longer make any queries, and no SQL-backed application will run without error from the web server.  The web server is unable to \\unc to sql and sql is unable to \\unc to the web.  But both can \\unc to other servers just fine.

The message in the web event log is "A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)"

If I change the connection string in the web.config to our development sql everything works.  SQL and Web can ping each other by name and by ip address.  Both have host file entries pointing to each other.  Neither is running a firewall.  Nothing changes if we reboot the web server, but in the past rebooting the production sql server has allowed them to talk again.  This is not a good fix.  Any clues?  Again, they do talk, then they just stop.

Paul MacDonald, Director, Information Systems
CERTIFIED EXPERT

Commented:
Is it possible the DNS entry for the SQL server expires (is scavenged)?  Check DNS the next time this happens, if possible, and/or do a ipconfig /registerdns on the SQL server.
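
A quick way to see whether name resolution still round-trips while the problem is active is to resolve the name forward and then resolve each address back. The following is only a sketch: the function name `dns_round_trip` and the injectable `forward`/`reverse` parameters are illustrative, not part of any tool mentioned in this thread. It uses the operating system resolver, so hosts file entries and the local DNS cache will influence the answer; it does not query the DNS server directly the way nslookup does.

```python
import socket


def dns_round_trip(hostname, forward=socket.gethostbyname_ex,
                   reverse=socket.gethostbyaddr):
    """Resolve hostname to addresses, then resolve each address back to a name.

    The forward/reverse callables are injectable (hypothetical design choice)
    so the logic can be exercised without touching the network.
    """
    report = {"hostname": hostname, "addresses": [], "reverse": {}, "errors": []}
    try:
        _, _, addresses = forward(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        report["errors"].append(f"forward lookup failed: {exc}")
        return report
    report["addresses"] = list(addresses)
    for addr in addresses:
        try:
            name, _, _ = reverse(addr)
            report["reverse"][addr] = name
        except OSError as exc:
            report["errors"].append(f"reverse lookup for {addr} failed: {exc}")
    return report
```

Running `dns_round_trip("your-sql-server-name")` on both machines, once while things work and once while they are broken, gives two reports that can be compared side by side.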
Scott Pletcher, Senior DBA
CERTIFIED EXPERT
Most Valuable Expert 2018
Distinguished Expert 2019

Commented:
Also, verify that the "Shared Memory" and "Named Pipes" protocols are enabled for the SQL instance(s).  I can't remember for sure which one SQL might use for those types of connections.

Author

Commented:
It is happening right now and will be until I reboot our production sql.
Under "SQL Native Client 11.0 Configuration (32bit)" Shared Memory, TCP/IP and Named Pipes are enabled.
Under "SQL Server Network Configuration" Shared Memory and TCP/IP are enabled, but Named Pipes is disabled.
In DNS there is only one entry for our sql server.  The timestamp on it is 4 days ago.  (But a lot of things are older than today.)
The problem is not general.  It is very specific.  We run some 40 database off this thing and only one server is unable to connect to it.
Paul MacDonald, Director, Information Systems
CERTIFIED EXPERT

Commented:
"We run some 40 database off this thing and only one server is unable to connect to it."
That wasn't clear to me in the original post.

You say there's no firewall on the destination server, but is there any sort of intrusion detection software?  Anything that might selectively block the source server?

"SQL and Web can ping each other by name and by ip address."
Is this true while the problem is active?  Also, can you map a different drive letter to the same destination while the problem is active?
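
Ping only proves ICMP gets through; SMB (UNC paths) and SQL Server run over TCP, so it may be worth testing the TCP ports themselves while the problem is active. The sketch below is an assumption-laden illustration: `tcp_check` is a made-up helper name, port 445 is the standard SMB port, and 1433 is only the default for a default SQL Server instance (a named instance or custom configuration may listen elsewhere).

```python
import socket
import time


def tcp_check(host, port, timeout=3.0):
    """Attempt a TCP connection; return (ok, elapsed_seconds, detail)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.perf_counter() - start, "connected"
    except OSError as exc:  # covers refusals, timeouts, and unreachable hosts
        return False, time.perf_counter() - start, str(exc)


# Example use (host name is a placeholder, ports are assumptions):
# for port in (445, 1433):
#     print(port, tcp_check("your-sql-server-name", port))
```

A connect that succeeds but takes several seconds is as interesting as one that fails outright, which is why the elapsed time is returned alongside the result.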

Author

Commented:
Sorry for the confusion.  Yes, this sql server is answering other servers and desktop clients just fine.  It is only blocking a particular machine.  (That makes this real fun.)
The Windows firewall is disabled across all types, public, private, domain on both sql and web.  There is no 3rd party scanning done on either of these servers.  They both have the Windows Defender, but they all do.  SQL has lots of exceptions, but that is for performance reasons.  It is exactly like something monitors communications and eventually decides that it doesn't like the other.  I just can't put my finger on it.

From SQL, via Explorer I can't map by name or IP.  With "net use" I can map by IP address, but not by name.  The new map does show in Explorer and is browseable.
I can open, modify and close a file via the map.  Fast.

From Web, via explorer I can't map by name or ip, but can "net use" by Name AND IP Address. The new maps do show in Explorer and are browseable.
I can open, modify and close a file via the map.  Crazy slow lag. Notepad goes "Not Responding" but does eventually create or modify a text file from both maps.
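
To put numbers on "fast" versus "crazy slow lag", the create/read/delete cycle can be timed from each side. This is a minimal sketch, assuming Python is available on the servers; `time_file_round_trip` is a hypothetical helper, and the directory passed in would be a mapped drive or UNC path that the account can write to.

```python
import os
import time
import uuid


def time_file_round_trip(directory, payload=b"unc latency probe\n"):
    """Create, read back, and delete a small file; return seconds per step."""
    path = os.path.join(directory, f"probe-{uuid.uuid4().hex}.txt")
    timings = {}

    t0 = time.perf_counter()
    with open(path, "wb") as fh:
        fh.write(payload)
    timings["write"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    with open(path, "rb") as fh:
        data = fh.read()
    timings["read"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    os.remove(path)
    timings["delete"] = time.perf_counter() - t0

    # True when the bytes read back match what was written
    timings["intact"] = data == payload
    return timings
```

Comparing the timings for the same share reached by name and by IP address would show whether the delay is tied to name resolution or to the session itself.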
E C
CERTIFIED EXPERT

Commented:
Some questions:

- Do either/both of these servers have multiple network adapters? If so only one should have a gateway and they should be the same gateway.

- Are both servers on the same subnet?

- Is either server using DHCP (hopefully not) and if so, are they both using the same DHCP server?

- Is your DNS server active directory integrated?

- Are both servers on the domain?
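
The subnet and gateway questions above can be checked mechanically from the addresses shown by ipconfig. The sketch below uses Python's standard `ipaddress` module; the function names and the sample addresses in the comments are illustrative assumptions, not values from this environment.

```python
import ipaddress


def same_subnet(ip_a, ip_b, prefix_len):
    """True if both addresses fall in the same network for the given prefix."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b


def gateway_on_link(ip, prefix_len, gateway):
    """True if the gateway address lies inside the host's own subnet."""
    network = ipaddress.ip_interface(f"{ip}/{prefix_len}").network
    return ipaddress.ip_address(gateway) in network


# Example with made-up addresses:
# same_subnet("10.1.2.10", "10.1.2.200", 24)
# gateway_on_link("10.1.2.10", 24, "10.1.2.1")
```

A mismatch in prefix length between two machines on the same VLAN is easy to miss by eye, so it is worth feeding each server's own mask into the check.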

Author

Commented:
Only one NIC per server.  (These are VMs.)  They both have OS static ip addresses, same vlan, same gateway, same subnet, same domain.  

Yes, our DNS is Active Directory-Integrated.  Replication : All domain controllers in this domain.

They have been working together for a long time but something recently is blocking communication.  It is only these 2 VMs.  I restarted sql this weekend and all is working again, but it is a ticking bomb.  It will break again.  It would seem that it is something on the sql server that is amiss since restarting it helps.

Author

Commented:
More fun.  After the sql restart the web started working again, but now another application server has stopped talking to the sql server and vice versa.
E C
CERTIFIED EXPERT

Commented:
Is it possible you have IP Address conflicts?
Or, maybe your Servers have static IPs but you have a DHCP server that is handing out the same IPs?
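
One way to rule this out on paper is to test each static address against the DHCP scope and its exclusion ranges. This is only a sketch with invented names and sample ranges; the real scope boundaries would have to be read from the DHCP server.

```python
import ipaddress


def could_be_leased(static_ip, scope_start, scope_end, exclusions=()):
    """True if static_ip sits inside the DHCP scope and outside every exclusion.

    exclusions is an iterable of (start, end) address pairs, inclusive.
    """
    ip = ipaddress.ip_address(static_ip)
    start = ipaddress.ip_address(scope_start)
    end = ipaddress.ip_address(scope_end)
    if not (start <= ip <= end):
        return False  # outside the scope entirely, DHCP cannot hand it out
    for ex_start, ex_end in exclusions:
        if ipaddress.ip_address(ex_start) <= ip <= ipaddress.ip_address(ex_end):
            return False  # covered by an exclusion range
    return True
```

A result of `True` for any server address would mean the DHCP server is free to lease that address to another device, which would be consistent with an intermittent conflict.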

Author

Commented:
IP addresses for these machines are all static and excluded from our DHCP ranges.
I restarted SQL again this weekend to get it talking to a different server.  I really don't want to rebuild my sql server.
Paul Solovyovsky, Senior IT Advisor
CERTIFIED EXPERT
Top Expert 2008

Commented:
Since it's a virtual machine you may want to verify that VMware tools are installed and the proper network card is used.  I have seen many strange issues without the appropriate drivers.
Steve Knight, IT Consultancy
CERTIFIED EXPERT

Commented:
Some quick thoughts; I don't think these have been mentioned above yet.  Is this the same for any user?  If you take a different admin user account and log on to the server, is it able to UNC to the other box, either after logging out the original one or both?

Have you tried killing explorer.exe in task manager and then run explorer.exe again?

Is the SQL server instance running as any user or as Local System?

Author

Commented:
The network drivers have been the same for a long time now and I'm not sure they would cause an intermittent problem like we are seeing.  But I'll give it a look.

The servers log in as the same user.  Sql is run as Network Service or Local System.
Right now the systems are talking just fine.  Which is weird and just supports the erratic behaviour.  Restarting Explorer is a new idea.  When it happens I try to find things to do that do not power down the network or the server.  Restarting Explorer is worth a try.

Michael B. Smith, Managing Consultant
CERTIFIED EXPERT

Commented:
Is port exhaustion a possibility?
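
Port exhaustion here means a machine running out of free ephemeral (dynamic) TCP ports for new outbound connections, which often shows up as a very large number of connections in states such as TIME_WAIT. A rough way to look for it is to count connections by state from `netstat -an` output. The parser below is a sketch: `summarize_netstat` is a hypothetical helper, and it assumes the usual column layout of Proto, Local Address, Foreign Address, State.

```python
from collections import Counter


def summarize_netstat(text):
    """Count TCP connections by state and by (remote host, state).

    Expects text in the column layout printed by `netstat -an`.
    """
    by_state = Counter()
    by_remote = Counter()
    for line in text.splitlines():
        parts = line.split()
        # Skip headers, blank lines, and UDP rows (which carry no state column)
        if len(parts) < 4 or parts[0].upper() != "TCP":
            continue
        remote_host = parts[2].rsplit(":", 1)[0]  # strip the port, keep the host
        state = parts[3]
        by_state[state] += 1
        by_remote[(remote_host, state)] += 1
    return by_state, by_remote


# Example use on a live machine (assumes netstat is on the PATH):
# import subprocess
# out = subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout
# states, remotes = summarize_netstat(out)
# print(states.most_common())
```

Thousands of entries toward a single remote host, taken on the SQL server while one client is blocked, would make this theory worth pursuing; a small count would argue against it.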

Author

Commented:
Port exhaustion as in too many packets too quickly?  Might occur momentarily, but once this event occurs it stays broken until the sql server is rebooted.  It is also specific in its denial.  Other machines are unaffected while one is blocked.  While we are working fine now, there are two servers, plus the sql server, that this issue rotates between.  Each of the VMs is a clone of a standard installation.  So NIC and drivers are the same across the board.

Author

Commented:
It has done it again with yet a different server.  I have logged out of the affected server and logged in as a different admin level user.  I have a test app that does a simple query.  It still fails to connect.  All network drivers are the same.  Host file is empty.  Each can ping the other.  UNC is either extremely slow, minutes, or just fails to connect.  The blocked server can nslookup and get a response.  The SQL server fails an nslookup with a timeout.  DNS has entries for both servers.
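
Since the lookup succeeds on one side and times out on the other, it may help to measure how long resolution takes on each machine rather than only noting pass or fail. The sketch below runs a lookup in a worker thread so a hung resolver cannot block the caller. `timed_lookup` and its `resolver` parameter are illustrative names. It goes through the operating system resolver (hosts file and DNS client cache included), so its result can differ from nslookup, which queries the DNS server directly.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def timed_lookup(hostname, timeout=5.0, resolver=socket.gethostbyname):
    """Resolve hostname with a deadline; return (status, value, seconds).

    status is "ok", "timeout", or "error". The worker thread is not killed
    on timeout; it is simply left to finish in the background.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.perf_counter()
    future = pool.submit(resolver, hostname)
    try:
        status, value = "ok", future.result(timeout=timeout)
    except FutureTimeout:
        status, value = "timeout", None
    except OSError as exc:  # resolver failures such as socket.gaierror
        status, value = "error", str(exc)
    finally:
        pool.shutdown(wait=False)
    return status, value, time.perf_counter() - start
```

Calling it for both the short name and the fully qualified name, on both servers, would show whether the SQL server's resolver is slow in general or only for particular names.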
Paul Solovyovsky, Senior IT Advisor
CERTIFIED EXPERT
Top Expert 2008

Commented:
It sounds like you have a DNS issue or something esoteric such as legacy WINS impacting communication.  Just a hunch, but also check DHCP to make sure that the IPs aren't being given out to other systems.  Check dropped frames on the switch as well.
Steve Knight, IT Consultancy
CERTIFIED EXPERT

Commented:
What about \\server.domain.local\share ?

Author

Commented:
All UNC is timing out.  I checked DNS again.  Both servers have records, forward and reverse.  SQL was up to date.  The other was 4 days old.  I deleted it and added it back.  No effect.  Both have static IP addresses with exclusions in DHCP.
We are putting a sniffer on each in hopes of getting more information.
This problem has been solved!
Paul Solovyovsky, Senior IT Advisor
CERTIFIED EXPERT
Top Expert 2008

Commented:
I've seen this type of issue with the E1000 or flex controllers, as mentioned before.  The VMXNET3 controllers are usually a good option to try.  In a 3000 VM environment we had the E1000/flex NICs go down from time to time, and changing them out to VMXNET3 would usually resolve the issue.

Author

Commented:
We have always used the VMXNET3.  Found out they were more compatible with our hardware right from the beginning.  I'm just glad I didn't have to stop production, but was about to.  What I don't know is what abstraction layer was refreshed to make it work again.  Maybe there is a way to refresh it on some schedule.
Seth Simmons, Lead Systems Administrator
CERTIFIED EXPERT

Commented:
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I have recommended this question be closed as follows:

Accept: 'tmaususer' (https:#a43153826)

If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they feel fit. If no one objects, this question will be closed automatically the way described above.

seth2740
Experts-Exchange Cleanup Volunteer