bjblackmore

Can't Connect to UNC Path on Windows SBS 2003

I have a Windows 2003 Small Business Server SP2 that has been running without issue for the past 2 years. Last week we started using the Exchange 2003 functionality (previously we had been using hosted Exchange). Since then we've been suffering from a very slow network. I initially thought this was due to the mailbox data, which we exported from the online hosted Exchange and then imported into the users' Outlook profiles. However, all data has now synchronised between the users' Outlook and the Exchange server, but we're still suffering from a slow network.

We only have 25 users, running Dell OptiPlex 755 desktops. The server is a Dell PowerEdge SC440 with 2GB RAM and a 500GB hard disk. It has 2 network cards, one for the internal LAN and one for external internet access. We use the SBS server as a firewall, which only allows Outlook Web Access & VPN in.

We have 3x 24-port switches. 2 of them are 10/100Mb with 2 gigabit uplink ports, and the 3rd is a full gigabit switch. The 2x 10/100 switches are connected to the gigabit switch via their gigabit uplinks.

Everything seems to run fine for an hour, then suddenly mapped drives won't open, files that are open on mapped drives won't save, and we can't browse the server. However, we can ping the server IP address, and it returns the ping. We can ping via IP & name. We can use the internet, which runs through the SBS server, without any problems.

I've tried switching cables over and updating NIC drivers. The LAN cards are configured with DNS correctly: the server uses its own IP & loopback, and the clients use DHCP, which sets the server IP address as the primary DNS server.

What would cause UNC paths to fail, while PING & internet access still work?
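
A successful ping only shows that the server answers ICMP; file sharing runs over separate TCP ports, so the two can fail independently. As a rough way to tell the cases apart from a client when the problem occurs, here is a minimal Python sketch. It assumes SMB over TCP on port 445 and the NetBIOS session service on port 139, and `tcp_port_open` / `smb_port_report` are hypothetical helper names, not anything used in this thread.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def smb_port_report(host: str) -> dict:
    """Probe the two ports Windows file sharing normally listens on."""
    # 445 = SMB directly over TCP, 139 = NetBIOS session service (assumed defaults)
    return {port: tcp_port_open(host, port) for port in (445, 139)}
```

Calling `smb_port_report("server1")` (host name is a placeholder) while the shares are unavailable would show whether the ports still accept connections. An open port only proves the listener is up, not that the Server service is answering file requests, so treat the result as one data point.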
Ady Foot

Hi there,

From my own personal knowledge I believe the Dell SC440 is an entry-level server and, most importantly, has only desktop level disk i/o on the motherboard.  I think the problem could be that your server isn't up to the job I'm afraid.  Please can you confirm that your server doesn't have any add-in RAID card etc and please could you tell me the disk makeup?

Regards,

Ady
To add some more meat to what I just said; adding Exchange would be the catalyst for your problems because Exchange is known to be a very disk intensive application.  So if the disks are being hammered by the introduction of Exchange you might experience problems browsing to shares etc due to disk timeouts.  This would also explain why the internet works (internet forwarding doesn't require disk access).

Regards,

Ady
bjblackmore

ASKER

Hi,

Thanks for the reply.

Yes, the SC440 is entry level, but we are only 25 users, and looking at the performance monitor, disk I/O reads & writes aren't moving very fast, and the disk lights barely flicker. The 2 disks are both SATA, with one mirroring the other. Also, when UNC paths die, Outlook remains connected to Exchange, with emails sending/receiving without issue.

Ben
Oh OK - it's not that then :-)

It might be that you're suffering from spammers using your server as a relay for their spam.  This could be using up your system resources.  It should be pretty evident if this is happening.  Please check the following kb article to check that this isn't the case:  http://support.microsoft.com/kb/821746

Regards,

Ady
One of the first things I did after setting up Exchange was to test for open relays using http://www.mailradar.com/openrelay and http://www.spamhelp.org/shopenrelay/shopenrelaytest.php, which both report that the server is not acting as an open relay. The only IP addresses allowed to relay through the Default Virtual SMTP server are the SBS server's internal & external IP addresses.
OK - good stuff.  So that, too, is not the problem.  Hmmmm....

It could be that a service is failing somewhere along the line.  Have you checked your server application logs when you have restarted after the problem has occurred?  This may explain the problem.
It doesn't look like any services have failed, and there are no serious application/system errors in the event logs, just a few about Terminal Services (remote console session) not being able to redirect to a printer port.
OK - this is slightly baffling.  Let's try and narrow this down somewhere.  

From a fresh reboot of your server you notice that everything works perfectly well for about an hour and then UNC path access fails.  You are able to ping the server and internet access works.  A couple of questions spring to mind at this point.  Firstly is the timing always the same at around the 1 hour mark?  Secondly when you ping the server can you ping it by name and by IP address?  Thirdly are you able to fully access the internet or just ping out to internet-side IP addresses?

At the stage that this problem occurs you check the event logs and performance counters and find that all seems well?  You reboot the server and so the cycle continues.  Have you ever left the server running to see if the problem appears to clear itself up?  Does the problem occur from every workstation or just a few?  If it's just a few is there anything that might explain why?  I.e. network infrastructure?  Once the problem has occurred are you able to log on to the server locally and perform tasks or is it sluggish and unable to perform some administrative tasks?

I'm determined to resolve this for you!

Regards,

Ady
Ady, thanks for taking the time to keep replying!

I've not timed each event when it fails exactly, but it seems to be around the hour mark. When it does fail I can ping both IP address & hostname. Internet access seems to be fully functional. I've been on the server when users reported it failed, and performance seems fine, opening applications, or MMC consoles are fairly quick, nothing seems to be spiraling out of control on the performance counters. Task manager shows network utilization is about 1%, and the disk I/O is slow and steady.

I've not tried leaving the server to see if it fixes itself. Trouble is, it's a production server, and when it goes down, it stops 25 people from working, including the company MD. As I said, when it fails, ALL 25 users are affected, even though they are spread out over 3 24-port switches.

As far as I can see, this narrows it down to:
a) The server NIC - which I'll be replacing first thing in the morning.
b) Some kind of DNS issue, although if it were, I shouldn't be able to ping by hostname, and there are no DNS entries in the event log. The server is a domain controller, running DNS, and the clients are using the server's IP address as the primary DNS server - all fairly standard.
c) One of the switches - but I'd have thought only the users on that switch would be affected. Plus, the full gigabit switch is a brand new switch, which I bought today after this issue started. So even if the other 2 are faulty, the users connected to this switch, which the server is also connected to, should be able to access the server without issue.
ASKER CERTIFIED SOLUTION
Ady Foot
This solution is only available to members.
Yes, SBS 2003 does install WINS by default. It was installed up until about 5:30 this afternoon. I wanted to strip out any unnecessary protocols that might be causing issues - I tend to see WINS as an unnecessary evil in modern networks, especially when all clients are Windows XP and above - so I removed WINS, then rebooted again after everyone had left for the day. As of yet, I can't tell if this has fixed the issue or not, as no one was around to test, so I won't know until the morning.

The question is, do I replace the network card anyway, before everyone gets in, just in case, and risk not knowing what fixed the issue if it doesn't re-occur, or do I leave the existing NIC in place, and see what effect removing WINS has, but risk disrupting everyone again an hour into their work, if the issue does re-occur!?

One thing I did notice before I removed WINS, was in the WINS mmc console, the tree structure showed:
WINS
 - Server1 (which was blank, and had nothing under it)
 - server1.domain.local (which had a Blue/White Exclamation point against it, not a yellow exclamation, but a blue/white one.)

I didn't think much of it at the time, as I was about to uninstall it, and was just making a note of the settings should it need reinstalling. There was nothing in the event log to indicate what the error/problem might be, no WINS events at all.
I think the problem could very well have been caused by WINS but, like you said, we'll never know if you change out the NIC before your users arrive in the morning.  Entirely down to you and the element of risk you're willing to accept if WINS wasn't causing the problem.

Regards,

Ady
I've just noticed a few errors in the event log:

Event Type:      Error
Event Source:      MSExchangeIS
Event Category:      General
Event ID:      9646
Date:            28/10/2009
Time:            17:41:19
User:            N/A
Computer:      SERVER1
Description:
Mapi session "/o=First Organization/ou=first administrative group/cn=Recipients/cn=user..one" exceeded the maximum of 500 objects of type "objtFolderView".

I initially thought these were just Exchange errors because one or 2 users have massive mailboxes (1.5GB) with a large number of folders. However, after searching eventid.net, I found 1 post which caught my eye: "In my case, this event occured when the network became unstable due to an error in one of the cisco switches. After replacing the swicth, the error went away." A faulty switch? Coincidence?
Are you using Cisco switches?  If so then does the time that this problem started correlate with the time that you introduced the switch to the network?

Regards,

Ady
Ady,

Not using Cisco switches. The 2 10/100Mb switches with gigabit uplinks are Dell PowerConnects, and the full gigabit switch is a Netgear. However, I thought it might point to a switch issue. Having said that, all 3 switches are blinking away green, and there don't seem to be any obvious errors.

I'm in this morning, and the server has been up for 14 hours, users are able to connect to their mapped drives, browse the net, and send/receive email in Outlook. However, there are only 10 people in at the moment, the real test will come when all 25 are in, opening and saving multiple documents, and have been doing so for an hour.

I haven't replaced the NIC yet, I want to see if it was a WINS issue first! I need to be able to report to management what definitively caused the issue!

I will report back in an hour or two what the situation is!
It's now been up and running for 3 hours, the issue 'seems' resolved - touch-wood/fingers & toes crossed etc!

We did have a slight issue, with the MD running Sage Line 50 and searching for transactions - the search was taking 3-4 minutes, where previously it was taking 30 seconds. We got everyone to log out of their desktops and tried again, but it didn't help. I then thought it might have been an issue with the network card, as yesterday I'd set the speed to 1000Mb/full duplex. I've known this to cause issues in the past, when the speed is set manually, so I set the speed to auto detect, and it detected 1Gb. At the same time the MD tried to run a backup of Sage, and found 2 users were locked in a session, even though they were logged off their machines. So she forced the sessions to close in Sage, ran the backup, then tried the transaction search again, and it seemed to be a lot quicker, taking around 30 seconds again. I don't know if this was due to the network card change, or simply not having users locked in session, but it's a lot quicker.

Users are now logged back on, working in Outlook, using their mapped drives, saving Word & Excel documents, and browsing the net, AND the MD can run her transaction searches in Sage, as long as no one else has Sage open.

As I said, the issue seems fixed, I'm going to keep this question open until the end of today, but if the issue doesn't re-appear, I'll close the question, and issue points!
Hi Ben,

Great news that the issue seems to be resolved - looks like WINS was the culprit then.  Fingers crossed that the system remains stable and the problem doesn't show its ugly head again.

Regards,

Ady
Groan!!

13:55, and the issue has reappeared! 25 users can't access their mapped drives, or save documents that are already open.

Performance monitor shows Avg. Disk Queue Length steadily bouncing between 0 and around 60 (this morning it was up to 600), % Processor Time is under 10%, and Pages/sec is running flat along the bottom, but sometimes spikes up to 500.

I'm not in the office, but on a VPN at the moment, and I can browse the mapped drives, but if I try to open an Excel file, even a small 12k one, it just hangs while downloading, then eventually says 'spreadsheet in use' even though it's not.

There is an error I can see in the system log, which coincides with this, but I can't find any info on it:

Event Type:      Error
Event Source:      System Error
Event Category:      (102)
Event ID:      1003
Date:            29/10/2009
Time:            13:48:41
User:            N/A
Computer:      SERVER1
Description:
Error code 1000007e, parameter1 c0000005, parameter2 f7b9b93c, parameter3 f78b6c30, parameter4 f78b692c.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 53 79 73 74 65 6d 20 45   System E
0008: 72 72 6f 72 20 20 45 72   rror  Er
0010: 72 6f 72 20 63 6f 64 65   ror code
0018: 20 31 30 30 30 30 30 37    1000007
0020: 65 20 20 50 61 72 61 6d   e  Param
0028: 65 74 65 72 73 20 63 30   eters c0
0030: 30 30 30 30 30 35 2c 20   000005,
0038: 66 37 62 39 62 39 33 63   f7b9b93c
0040: 2c 20 66 37 38 62 36 63   , f78b6c
0048: 33 30 2c 20 66 37 38 62   30, f78b
0050: 36 39 32 63               692c    
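
For readers trying to interpret an entry like this: a "System Error" event with ID 1003 is written after the machine has bug-checked (blue-screened) and restarted, and the description carries the stop code and its four parameters in hex. The sketch below is a minimal Python example of pulling those values out. The function name `parse_bugcheck` and the two small lookup tables are illustrative assumptions that cover only the codes seen in this log; they are not a complete reference.

```python
import re

# Only the codes that appear in the event above; not a complete table.
BUGCHECK_NAMES = {0x1000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M"}
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def parse_bugcheck(description: str) -> dict:
    """Extract the stop code and parameters from an Event ID 1003 description."""
    code_match = re.search(r"Error code ([0-9a-fA-F]+)", description)
    if code_match is None:
        raise ValueError("no 'Error code' field found")
    code = int(code_match.group(1), 16)
    # Each parameter is written as 'parameterN <hex>'.
    params = [int(p, 16)
              for p in re.findall(r"parameter\d+ ([0-9a-fA-F]+)", description)]
    return {
        "code": code,
        "name": BUGCHECK_NAMES.get(code, "unknown"),
        "params": params,
        # For stop 0x7E the first parameter is the unhandled exception code.
        "param1_name": NTSTATUS_NAMES.get(params[0], "unknown") if params else None,
    }


# The 'Data' bytes are just the same text in ASCII, e.g. the first row:
first_row = bytes.fromhex("53 79 73 74 65 6d 20 45").decode("ascii")  # "System E"
```

Read this way, the entry points to an access violation in a kernel-mode thread shortly before 13:48, which would normally mean a driver fault rather than a service or name-resolution problem. The second parameter is the address of the faulting instruction; mapping it to a specific driver needs the memory dump and a debugger, which this sketch does not attempt.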
After talking to everyone, it appears that they can see their mapped drives, and browse them, just not save or open documents, which is the same issue I'm having across the VPN.

However, this sounds like a different issue to yesterday, where they couldn't even browse the mapped drives. I can edit and rename the document on the server, which indicates that they are not in use, but when anyone tries to open the document, they get a message saying the document is already open.
OK - I'm sorry that this is still happening.  Try the hotfix in the following article.  Funnily enough it does talk about disk I/O so might just work!

http://support.microsoft.com/kb/837432

Regards,

Ady
Hi Ady,
I haven't applied that patch yet. After a reboot the server seemed to start working again, and I didn't want to fiddle with anything else while it seemed to be stable.
Also, the existing file version & date seem to be newer than the hotfix version. The hotfix is version 5.2.3790.132 dated 25-Feb-2004, whereas the existing file on the server is version 5.2.3790.3959 dated 17-Feb-2007.
I will see how it goes today, if no more issues arise by 5pm GMT today, I will close the question & assign the points.
Thanks
Ben
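
The version check described above can be done mechanically by comparing the dotted version numbers field by field as integers. A minimal Python sketch follows; `parse_version` and `is_newer` are hypothetical helper names. Comparing the raw strings would be unreliable, since "99" sorts after "132" as text.

```python
def parse_version(text: str) -> tuple:
    """Turn '5.2.3790.3959' into (5, 2, 3790, 3959) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def is_newer(installed: str, hotfix: str) -> bool:
    """Return True if the installed file version is higher than the hotfix's."""
    # Tuples compare element by element, left to right.
    return parse_version(installed) > parse_version(hotfix)


# Versions quoted in the post above: installed file vs. hotfix file.
installed_is_newer = is_newer("5.2.3790.3959", "5.2.3790.132")
```

With the two versions quoted above, the installed file compares as newer, which is consistent with the hotfix already being superseded on this server.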
Hi Ben - thanks for getting back to me.  Fingers crossed yet again :-)

Ady
Issue appears to have been caused by WINS. After removing the service and rebooting, the issue hasn't reoccurred. It's possible Backup Exec Continuous Protection Server (CPS) caused the last issue, after WINS was removed, as users could browse network shares but files were locked, whereas the WINS issue caused shares & UNC paths to become unavailable. CPS had been installed recently, and once it was removed this issue also resolved itself.