hypercube

the specified network name is no longer available

I am running redundant backups on 3 Windows 7 Pro computers located at 3 different sites over MPLS.  Call them Sites 1, 2 and 3.  Each computer backs up the same files from the originating computers (located at all 3 sites) in sequence; i.e. there is generally no overlap between backup jobs on any one computer and/or on the MPLS.

The backup computer at Site 2 was recently upgraded from XP to Windows 7.
The backup computer at this ONE SITE, before Windows 7 and now with it, occasionally generates errors:
"the specified network name is no longer available"
That is, it only happens at Site 2 and I've only seen it happen on backup jobs from Site 1 (Site 3 has but one small backup job so the opportunity to see this is much more limited).

This happens in the middle of a backup and the rest of the message makes it obvious that the individual filename is known.
None of the other backup computers has this kind of error.
Another site, which has to back up the exact same files from the same source computers, has none of these errors.
The site with errors is geographically further away but the MPLS is all on fiber.

All of the computers are accessed using UNC paths with IP addresses, so no name service is involved at all; i.e. \\[ipaddress]\....

I very much doubt that there are any settings that could be affecting this.  I'm rather left with wondering about hardware - thus the recent swap-out of the computer.  Yet these errors continue to occur on occasion.  

I'm looking for good ways to narrow down the possibilities.
The hardware chain is:
Site 2
- backup computer
- local switch
- main switch
- MPLS router
MPLS fiber system
- at least one switch
Site 1
- MPLS router (which also serves the Site 3 backups without errors)
So, everything is common with Site 3 at this point.....

Not that I suspect the hardware necessarily but where else to look.
And, how to look?
I can run Wireshark on the Site 2 LAN switch ports ........
Aaron Tomosky

I'll take some shots in the dark here:
All layer 2 switches? No routing/VLANs/different subnets between sites? Maybe the ARP tables got full.

Are you using CrashPlan or something else for this sort of thing?
hypercube

ASKER

There are no VLANs as such.

There are separate subnets at each site with routing to connect them.  No problems there that I can see.  All of the connectivity is working except for these transient occurrences that likely have nothing at all to do with routing.

Not using CrashPlan.  Using Carbonite *separately* so this is not part of the solution.
This is a versioning backup system based on Second Copy where each site maintains separate backups of everything for geographical dispersion of backups.

My interest is in the networking / file sharing failure itself.
All of the switches are Layer 2 switches unless there's anything different buried in the MPLS links.
I've had to do this for Win 7 before. It's like the cache fills up and new connections get denied for no good reason.  A reboot would always fix it, but this change made it work consistently:
http://dbastas.blogspot.com/2012/05/optimize-windows-xp-and-windows-7-for.html?m=1
It may be worth checking into the known issues with SMB2 since the new version came out.

XP uses SMB (version 1) and Windows 7 uses SMB2. The two are known to cause conflicts, so it's worth considering disabling SMB2 across the board to see if it helps.

http://support.microsoft.com/kb/2696547


If that doesn't work I'd check the event log on sending and receiving PCs, and also leave some pings running so you can rule out connection issues/drops.
aarontomosky: those regedits had already been made on all the computers - sending and receiving.

totallytonto: I found a hotfix for Windows 7 at http://support.microsoft.com/kb/2792026/en-us to get around freeze-ups, which I can't confirm happen, but they could.  But the same thing was happening under Windows XP on the receiving end.  So I wonder....

Disabling SMB2 is not recommended except for debugging...   ???
Kinda scary thought, but this whole thing may be resolved when the other clients are also running win 7... or it could just make it worse. Any way you can think of testing this out while keeping a way to revert?
Well, at this stage *most* of the computers are running Windows 7... but I can't guarantee the correlation at the moment.

"While keeping a way to revert..."?  Do you mean turning off SMB2?
I mentioned XP because it doesn't run SMB2 as far as I know....
I meant maybe setting up a win7 box to replace an xp box while keeping the xp box around in case you have to switch back, but I don't think that sounds like an option since these are upgrades.
Well, actually the XP boxes remain on hand.  The one backup "receiver" was changed out from an XP box to a new Windows 7 box with no improvement in this situation.  So, I don't think that going back is likely to help.
what are you using to do the backups? robocopy?
This is a versioning backup system based on Second Copy
It's the XP machines that suffer when you have a mixture of XP and 7 using the same shares, due to SMB2 and how it accesses the files.

Once you get rid of all the XP machines it should stop being an issue, but you may have to disable SMB2 until then.
This has boiled down now to the following:

- all of the machines/sites doing backup with Second Copy are Windows 7.
- only ONE folder on an XP machine fails to back up with the reported error and this is at another site.
So, out of around 20 backup jobs repeated at 3 sites for a total of 60, there is but one job out of the 60 from one source at one remote site backup that is doing this now.
And, now it seems consistent - although I somewhat doubt that it will always fail based on past experience with it.
The same source going to local and another remote backup works fine.
The Event Viewers show nothing.
Pings work consistently.

Here is the topology:

Source Computer<>switch<>switch<>switch<>MPLS router<>MPLS router<>switch<>switch<>Backup computer.

The topology is the same for the other remote backup that works except the
MPLS router<>switch<>switch<>Backup computer
at the end are different devices.

These observations suggest that all is fine at the source and all is fine at the backup device otherwise.

And, to repeat, 19 of 20 backups at the "failing" site still work consistently.  It's just this one.
And, to repeat, the same "failing" source/folder works on the other backups.

I might add that the failing source folder is relatively large but not the largest.
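
The elimination reasoning above can be sketched in a few lines. This is only an illustration, assuming each backup path is written as a list of hop labels; the labels used here (`src`, `sw-a`, `mpls-2` and so on) and the path of the second working job are hypothetical stand-ins, not the real device names or layout.

```python
def suspects(failing_path, working_paths):
    """Return hops on the failing path that no working path uses, in order."""
    cleared = set()
    for path in working_paths:
        cleared.update(path)
    return [hop for hop in failing_path if hop not in cleared]


# Hypothetical labels for the topology described above.
failing = ["src", "sw-a", "sw-b", "sw-c", "mpls-1",
           "mpls-2", "sw-d", "sw-e", "backup-2"]
# Same source folder going to the other remote backup, which works.
other_remote = ["src", "sw-a", "sw-b", "sw-c", "mpls-1",
                "mpls-3", "sw-f", "sw-g", "backup-3"]
# A different source that reaches the "failing" backup computer without errors.
other_job = ["src-x", "sw-h", "sw-c", "mpls-1",
             "mpls-2", "sw-d", "sw-e", "backup-2"]

print(suspects(failing, [other_remote]))
print(suspects(failing, [other_remote, other_job]))
```

With only the other remote backup as evidence, the far-end devices remain suspect. Once a working job through those same devices is added, the list comes back empty, which points away from any single device and toward something specific to this source folder on this path.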
have you left a ping running during the backup to see if a brief network drop causes failure?
otherwise I'm not sure what else to suggest other than checking the event viewer for any errors around the time of the failure.
Do these run on a schedule? So it may not be the folder specifically, but the time it runs? Look at what else is happening at that time.
Yes, it runs on a schedule.  In this case it is launched at 4:30 a.m.  I don't see anything else running at the same time.  The backups are staggered in time to not overlap.
so my question is: is something wrong with that folder, or is something happening around 4:30 that will mess up any backup happening at that time...
If there were something wrong with the folder then why might the other 2 backups of it work fine?

There is nothing I know of that's happening at 4:30 that would mess it up.
And, it fails if I start it manually at any time.
Here's a thought: move the schedule around and see if whatever runs in that timeslot fails or if the pattern changes
Isn't that what the manually-launched backups do?
I've seen something similar to this and it was caused by network issues (packet loss/delay)

Did you try leaving the constant ping running to see if any blips occurred around the time it failed?
Is the VPN expiring and reconnecting at that time?
There is no VPN.  Simple, bare MPLS.
Can you run pings overnight (at least around 4:30) from each site to both other sites, and something else on the Internet somewhere? Let's see exactly which links go out.
Yes, I have a "probe" program that will do that (ping continuously) and log "n" misses with a trace route.  This way a contiguous set of misses can be seen and the results of a trace route at that time can also be seen.
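
A minimal sketch of the logging idea described above, assuming a hypothetical list of `(timestamp, replied)` samples collected by whatever probe is in use. It does not send pings or run a trace route itself; it only shows how contiguous misses could be grouped so they can be lined up against the time a backup failed. The function name and the `min_misses` parameter are illustrative.

```python
def miss_runs(samples, min_misses=3):
    """Group consecutive failed probes into (first_ts, last_ts, count) runs.

    samples: iterable of (timestamp, replied) pairs in time order.
    Runs shorter than min_misses are ignored as isolated drops.
    """
    runs = []
    start = last = None
    count = 0
    for ts, replied in samples:
        if not replied:
            if count == 0:
                start = ts
            last = ts
            count += 1
        else:
            if count >= min_misses:
                runs.append((start, last, count))
            count = 0
    if count >= min_misses:  # a run still open at the end of the log
        runs.append((start, last, count))
    return runs


# Hypothetical one-second samples: seconds since the probe started.
log = [(0, True), (1, False), (2, True),
       (3, False), (4, False), (5, False), (6, True)]
print(miss_runs(log))
```

If no run of misses overlaps the failure time, a plain connectivity outage becomes a less likely explanation for the error.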

But, again, this failure is not connected with 4:30 because it occurs *whenever* I run the backup between these two machines and on this one source folder.  Otherwise the source folder works fine with the same backup setup on 2 other backup machines.

It seems strange that there would be an outage on this one site while doing one particular backup out of 20 backups.

I have mapped the source folder so the path length is shorter.  But path length issues are normally reported as such and aren't mysterious.  And, that's not what I'm seeing here.
Sorry, I thought you said the manual backups for this folder succeed; I see I read that wrong.
I think I may still be unclear on where the files originate and where they go:
site1 -> site2 all 20 backups work
site2 -> site3 all 20 backups work
site3 -> site1 19 backups work (the one fails no matter when it's run either on a schedule or manually)
is there a site1->site3?

so this problem folder, where does it originate?
SOLUTION
hypercube

This solution is only available to members.
I don't think I've ever seen that before... I wonder if it has to do with that "i've been downloaded from the internet" flag on files, not specifically the extension.
SOLUTION
This solution is only available to members.
SOLUTION
This solution is only available to members.
Well, the error that's generated is likely different than what I'd been seeing here.

However, I *have* seen cases where files could not be backed up and this may be the answer to that issue.  In that case the error was different.  Usually the "owner" just deletes the files altogether to get past it.  I try to have a backup setup where no errors occur to avoid follow-ups.
I have now compared packet captures between successful backups and ones that don't work.

In the ones that fail, I see references to NetBIOS name service.
In the ones that succeed, I see NO such references.

Well, there is no name service between subnets.  All of the addressing here is with UNC.
So, that seems a key difference and must be a clue. But I have no idea why that would happen.
Do you use NetBIOS names in your UNC paths, i.e. \\server1\share2, or do you use DNS names, like \\server1.company.com\share2?
No NetBIOS names, as there is no site-to-site or subnet-to-subnet name service.
By UNC I meant:
\\[ipaddress]\.......
e.g.
\\10.111.1.213\sharedfolder
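
To make the three addressing styles in this exchange concrete, here is a small sketch that classifies the host part of a UNC path. The function name and the category labels are assumptions made for illustration; a single-label name is only reported as such, since how Windows resolves it (NetBIOS, a DNS suffix, or something else) depends on the machine's configuration.

```python
import ipaddress


def unc_host_kind(unc_path):
    """Classify the host part of a UNC path as 'ip', 'dns', or 'single-label'."""
    if not unc_path.startswith("\\\\"):
        raise ValueError("not a UNC path: %r" % unc_path)
    host = unc_path[2:].split("\\", 1)[0]
    if not host:
        raise ValueError("missing host in UNC path: %r" % unc_path)
    try:
        ipaddress.ip_address(host)
        return "ip"
    except ValueError:
        pass
    # A dotted name is treated as a DNS name; anything else is a single label.
    return "dns" if "." in host else "single-label"


print(unc_host_kind(r"\\10.111.1.213\sharedfolder"))
print(unc_host_kind(r"\\server1\share2"))
print(unc_host_kind(r"\\server1.company.com\share2"))
```

A path in the `ip` category needs no name lookup to reach the server, which is why NetBIOS name service traffic in the failing captures stands out.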
are all the backups still working since you manually copied that "bad" file a while back?
Mostly.  I have a case of the ScanSnap .pdf files that doesn't work and I'm keeping the situation just that way to be able to work on it.

The user just told me that *all* .pdf ScanSnap files do this.  That at least makes more sense than "some of them".
Could be they are written using a service that runs as the network service user or something funny. Back to my thought that it's some type of weird permissions-related thing.
ASKER CERTIFIED SOLUTION
This solution is only available to members.
I don't think we know beyond what I was able to do to change the behavior.