ipendlebury

Connectivity problems since sp1

I had a perfectly healthy SBS 2003 premium server before I installed 2K3 sp1. Now I have intermittent connectivity problems which are:

Often when I connect from outside via Terminal Services, the server stops responding at the instant that I login. Internal clients are often also disconnected at this time. It can be up to 5 minutes before it starts responding again. When it does, my TS session is still there waiting for me.

I have an MS Access application which runs on the server. This often fails now with an 'ODBC call - failed' error when it tries to access the SQL Server.

My Website sometimes dies and I have to restart the WebProxy service in order to make it work again. IISRESET has no effect.

I am getting several types of Events logged which I did not previously:

Event 14: There were password errors using the Credential Manager. Launch the Stored User Names and Passwords control panel applet.

Event 40960: The security system detected an authentication error for the server ldap/bfgcserver.BFGC.local@BFGC.local. The failure code from authentication protocol Kerberos was "The specified user does not exist".

Event 1030: Windows cannot query for the list of group policy objects

There are also several W32time events being logged. These state that a time server cannot be found, or that it is receiving invalid time data. My server clock is correctly set though.

If I reboot the server, I get Event 7022: the Kerberos Key Distribution Center service hung on starting.

I read another article which stated that the Event 14 errors were related to an XP SP2 workstation logging in. Yes, I have SP2 workstations, but I am getting Event 14 logged even during the night when no workstations are switched on.

I have spoken to Microsoft Support about all this. They got me to install Security Update MS05-019, and hotfix 899148. The hotfix helped a lot, but I've still got problems.
colin_harford

If you have spoken with Microsoft Support, tell them the issues have not been resolved...

ASKER

The chap at Microsoft said he had done all he could, and that I should wait for sp2. That is why I signed up here to see if I could get someone else's perspective.

Today the Server disconnected itself from the internal network. It was still possible to access it remotely. When this happens, the only way I know to correct it is to reboot.
Ask to get it escalated... Microsoft is usually really good at working with you on an issue to identify the cause and fix it. Otherwise, ask for your money back...

One thing to try when it happens is restarting the Net Logon service. When the machine is unresponsive on the network, can you log in locally?


Can you go over what happened with Microsoft support, so I don't duplicate it...
I originally rang Microsoft asking for a hotfix I had found relating to connectivity problems. They told me that this hotfix was included in SP1 and therefore wasn't relevant.

Because my problems had only appeared when I installed sp1, they kindly agreed to help me free of charge. This is why I haven't been too pushy with them.

I was sent the MPSReports package to run on my server. I sent the results back and was sent Hotfix 895573. This didn't work. I was then sent Hotfix 899148. This seemed to improve matters vastly, but not completely. I asked if there was anything that could be done to cure the last bit of the problem. It was at this point that I was told that there was nothing further that could be done at this stage and that I should wait for SP2.

Since then the problem has gradually got worse. Almost every time I login via TS now, the server vanishes for up to 5 minutes.
Interesting... Did they say when SP2 is coming out?  

Okay, let's start with one thing... the TS issue and the server becoming unresponsive...

Does it happen whenever anyone logs in via TS, or only for specific users?

When the server vanishes, can you ping it? Can you log in locally? Do any login processes, etc. start when someone logs in?

I asked when SP2 is coming out. I didn't get a specific answer. Only that it's not too far away.

I can't be sure that TS is the only trigger for this problem. Like I said the internal clients were disconnected this afternoon. I didn't log in via TS, but someone else might have done.

I logged in several times from another site yesterday and had very little trouble with it. So I am wondering if the problem is specific to my machine. I have two pc's here at home. They both have the same trouble logging in.

I experimented with the ping response. This is strange. I logged in from one pc here, whilst pinging from the other. As usual the server froze whilst I logged in and the ping responses also stopped. The server came back to life after a couple of minutes. But the ping response has not come back at all. When I logged out, the server became unresponsive for 20 minutes.
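The ping experiment above is easier to compare across runs if the replies are logged with timestamps and turned into outage windows. The following is only a rough sketch of that bookkeeping: the function name and the sample data are made up for illustration, and the samples would really come from whatever ping log is being kept on the second PC.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Return (start, end) pairs for each run of missed replies.

    `samples` is a time-ordered list of (timestamp, replied) tuples.
    `end` is None when the log finishes while replies are still missing.
    """
    windows = []
    start = None
    for ts, replied in samples:
        if not replied and start is None:
            start = ts                    # first missed reply opens a window
        elif replied and start is not None:
            windows.append((start, ts))   # first reply after a gap closes it
            start = None
    if start is not None:
        windows.append((start, None))     # still unreachable at end of log
    return windows


# Hypothetical samples taken every 10 seconds (not real measurements).
t0 = datetime(2000, 1, 1, 12, 0, 0)
replies = [True, True, False, False, False, True, False]
samples = [(t0 + timedelta(seconds=10 * i), ok) for i, ok in enumerate(replies)]
windows = outage_windows(samples)
for begin, end in windows:
    print(begin, "->", end if end is not None else "still down")
```

A window whose end is `None` corresponds to the case described above, where the ping responses never came back.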

I have seen this before. If I log out of a terminal services session, then try to log back in immediately, there is no response from the server for a period of time. So it would appear that logging out has the same effect as logging in.

This server is on a site that is unattended for 4 days a week, so it is sometimes difficult to find out what is happening internally. However, I have one pc that is permanently running a program. This pc is normally visible to the server when I login via TS. When the server stops receiving data from this pc, I then look in the network neighborhood and find that the pc is not visible. Rebooting the server cures this. Next time it happens, I will restart the Net Logon service on the server.

Thanks for your help by the way

Interesting...

Any events in your event log?
I've listed all the Events that I thought might be contributory in my original question. There are others that I thought were symptomatic:

Event 14122: A packet filter could not be bound.

Event 14223: ISA Could not restore a packet filter interface after the interface changed address or was re-enabled.

Event 14178: The Web Proxy service identified that the address 195.112.48.42 was removed from the interface table and stopped listening on port 80

Event 14179: The Web Proxy service identified that the address 195.112.48.42 was added to the interface table and started listening on port 80

I might be leading you astray here, but I have been logging in and out multiple times over the past few hours. Each time I was using an Account with administrative privileges and caused the server to become unresponsive. A little while ago I logged in and out several times using an account with lesser privileges. There was no disruption to the server. But then I subsequently found that I could use the administrative accounts without any disruption. A red herring perhaps.
Just a quick line to say that since I logged in yesterday with non administrative credentials, the server has behaved impeccably, even when I log in as administrator. Still might be a red herring though.
Weird, let me know if things keep happening...
I should have posted earlier. It all went belly up again today when I logged in as administrator. When I managed to get back in,  I made a point of logging in as a non administrative user. This worked ok, so I then logged back in as administrator. I got in without a problem again. So I don't know whether to read anything into this.
That is very interesting...

What programs run when a user logs in automatically?

What about as admin?


When you login as admin, are there any failed events?  What logging do you have set up?
When I log in via TS and the server loses connectivity, the only events that are logged are the ones I have already posted about the external interface being disconnected etc.

When I log in as admin, an MS Access database application opens, but there are two things to say about this:

1: When the server loses connectivity, it happens the instant I hit return after entering my password. Typically, the Access application doesn't run for 5 seconds after this.

2: I also see this behaviour when logging in under my own user account. This account doesn't launch the Access application, but does also have administrative privileges.

There aren't any programs that run when any other user logs in.

The server has never lost connectivity during any session that has managed to become established, but like I said previously, it often loses connectivity for a few minutes when I log out.
Another clue...

In my conversations with Microsoft, RPC has been mentioned a number of times. At least one of the hotfixes I installed was related to RPC.

Anyway.... I always leave the server logged in running the Access application under the administrator account. When I log in via TS, I usually use my own account which also has administrative privileges.

Today there was an unrelated problem on a workstation. My way of administering the workstations is to firstly log onto the server remotely via TS, then from the server, TS into the client.

Apart from the problem I was trying to solve, I encountered some strange behaviour on the client. Further research suggested that this was a Group Policy setting that was to blame. Puzzled by this, on the server session I tried to look at Group Policy Results for the offending client. I got an error stating that Group Policy Results were unavailable for the client because the RPC server was unavailable.

I then logged out of my own server session and TS'd into the administrator session with the /console switch. I then found that I could view Group Policy Results for the client, and then when I connected to the client, the strange behaviour had gone.

I'm just putting two and two together here, but the RPC server seems to have been cropping up rather a lot in my life recently.



I've been doing some more experimenting this morning.

The first observation I would make is that I always seem to be able to log into a TS session effortlessly if the MS Access application is not running. The Access application is not intensive though. Once every five minutes it picks up and imports a file deposited on the server from a weather station. This takes less than 10 seconds. The rest of the time it stands idle.

I mentioned that I have problems logging in via TS under my own credentials. What I realised today is that the administrator account would normally also have an open session and would be running the Access application at this time. So perhaps it's no surprise that I'm having problems under my own account. What I haven't mentioned is that normally I don't log off the administrator session. Instead I run:

tscon.exe 0 /dest:console

This hands the session back to the console so the Access Application can continue to run.

There definitely seems to be a cumulative effect in all of this. If I repeatedly connect while the Access application is running, then the problems get worse. Logging on while the Access application is not running seems to clear things. After doing so I can usually connect if the Access application is running without a problem for a while.

Don't know if this is related to RPC, but I had a problem this week whereby the Access application could not insert records into the SQL Server. I got an 'ODBC call failed' error which I could not get beyond. My code was written so as to use DAO to insert the records. I rewrote the code so it used ADO instead. It works fine now. Prior to installing sp1 it had worked fine using DAO for 18 months.
Try keeping Access open, but only with a blank DB, nothing more... and then try your experiment again... Then we know if it is Office, or if it is the Access db that is causing an issue.
OK, bearing in mind that there have been times when everything has worked OK, it may be difficult to determine if I have made a change that has had an effect. Perhaps it is possible that there is something going on in my application that upsets the system with SP1 installed.

However, there are still Events being logged that require attention. Perhaps you could comment on these....

Event ID:      7022
The Kerberos Key Distribution Center service hung on starting.

Event ID:      40960
The Security System detected an authentication error for the server ldap/bfgcserver.BFGC.local/BFGC.local@BFGC.local.  The failure code from authentication protocol Kerberos was "The specified user does not exist."

Event ID:      14
There were password errors using the Credential Manager. To remedy, launch the Stored User Names and Passwords control panel applet, and reenter the password for the credential BFGC\administrator.
I found something significant tonight. Because I need the Access database application running 24x7, the server is set up to autologon into the Administrator account, and I have a shortcut to the MS Access application in the startup group.

Whenever I need to get in there to see what is going on, I use mstsc.exe /console to join the session. It is whenever I do this that the problems occur. The problems occur even if I don't have MS Access running. As I mentioned previously, I use tscon.exe 0 /dest:console to push the session back to the console. The server also locks up for a while whenever I do this. I hadn't appreciated that I was getting this behaviour because I was using a different method of logging on with TS.

Perhaps I should get back to MS support with this new found information.
Just when I thought I'd understood this problem, something else happens!

With the Access application running under the Administrator console session, I logged in via TS with my own credentials to do some work. After 15 minutes the server became unresponsive for 5 minutes. My session eventually recovered and everything was as I had left it.
So after 15 minutes of being logged in, it went kaput?
Yes, I was just working away, and my session froze. Then 30 seconds later I got the dialog on my client saying that the connection had been lost and it was trying to re-connect. When eventually it came back, my session was there just as I had left it.
Would something have started running at the 15 minute mark? E.g. your Access db running its import?

Event ID:     14
There were password errors using the Credential Manager. To remedy, launch the Stored User Names and Passwords control panel applet, and reenter the password for the credential BFGC\administrator.

For that, go in and put in the current password.  If you changed the admin password, etc, this would explain this issue.




For:

Event ID:     7022
The Kerberos Key Distribution Center service hung on starting.

Event ID:     40960
The Security System detected an authentication error for the server ldap/bfgcserver.BFGC.local/BFGC.local@BFGC.local.  The failure code from authentication protocol Kerberos was "The specified user does not exist."


Those two are probably related...

Can you run netdiag and dcdiag on this machine.
Hello Colin,

I've put the administrator password in there several times. It doesn't make any difference. The administrator password has never been changed since the server was installed, although I have renamed the administrator account in Group Policy.

Could you tell me how to run netdiag and dcdiag please. I have never done this.
Netdiag and dcdiag are tools in the Resource Kit.


That is probably what it is complaining about: it still has the account name of Administrator, and not the updated name.

It's probably worth mentioning that I did a Google search on the 7022 error. There are other people out there having this problem after installing sp1
Yes, when I looked again, one of the entries in the Stored User Names and Passwords was configured with the old administrator account name. So I updated that. The 14 and 40960 errors have gone. Still got the 7022 though.

I downloaded and installed the Resource Kit. It contains a multitude of utilities, but neither Netdiag nor Dcdiag. When I searched my hard disk, I found them contained within the MPSReports that MS support got me to run a couple of weeks ago.

When I typed DCDiag, everything passed except the services test. It said that the following services were stopped: IsmServ, RPCLOCATOR, TrkWks and TrkSvr. Are these relevant?

When I typed Netdiag, a popup dialog appeared stating that "The procedure entrypoint DnsIsDynamicRegistrationEnabled could not be located in the dynamic link library DNSAPI.DLL"

Presumably you are under the impression that these utilities should have been installed as part of the resource kit?
Do you not have the dnsapi.dll file?  Oh yeah, this is SBS, not 2K3; it is in the resource kit for 2K3... You may need to install it, or have it in the same dir as the

http://www.liutilities.com/products/wintaskspro/processlibrary/ismserv/


http://www.microsoft.com/resources/documentation/Windows/2000/server/reskit/en-us/Default.asp?url=/resources/documentation/Windows/2000/server/reskit/en-us/distrib/dsbi_add_oywa.asp

TrkWks and TrkSvr are for Distributed Link Tracking (Client and Server)...

Once you get netdiag working, can you try: netdiag /test:kerberos /v
I have DNSAPI.DLL in the c:\windows\system32 folder, so presumably it is the wrong version. It is version 5.2.3790.1830. I also have access to another SBS 2k3 SP1 server. This machine has the same version of DNSAPI.DLL.

I'm confused by you pointing me at Wintasks. After telling me to download the resource kit, are you suggesting that I purchase this package?




No, that was just a link about some of the services that you mentioned were not running, just so you knew what they were.  

Oh, I just realized, I believe they are part of the support tools, not the resource kit..

Did a quick check on the dnsapi error, saw this:

 "The procedure entry point DnsGetMaxNumberOfAddressToRegister could not be located in the dynamic link library DNSAPI.dll." error when running netdiag on XP

Make sure you are running the version from the XP CD. There is a possibility that you have a W2K version of netdiag installed explicitly - there was a download from KB. Or you may have just upgraded the OS from W2K to XP. To fix this problem, you may want to re-install it. To re-install, run \Support\Tools\SUPTOOLS.MSI from the XP CD and select "Remove all". After it is done, launch it again to install the appropriate version of the support tools.


From:  http://www.chicagotech.net/netdiag.htm

https://www.experts-exchange.com/questions/20915086/Error-message-using-Netdiag-exe-NETDIAG-EXE-ENTRY-POINT-NOT-FOUND.html


I'm not going to be around a comp  a lot the next two days... So going to leave a few things for you...

Try the netdiag, see what it finds about the kerb thing...




As for the TS issue, something isn't quite right when it locks up.

Is your server on newer hardware, and if so, is DEP (Data Execution Prevention) enabled?  You will see it under My Computer, Properties, Advanced, Performance Settings; there may be a tab called Data Execution Prevention.  If it is on, can you disable it and reboot, and see if the issues continue?

 Just to be thorough, is this on a managed switch?  Does it see the connection active when the error happens, does the switch have any errors, do clients on that switch have any errors?  How is the quality of the cable?   (I've come across some of the weirdest issues crashing a machine related to switch and cable in the past)  I highly doubt this is it, but best to check....  

I know you need to keep that access DB running, is it possible to do something about it? What about imaging that system to another similar hardware with ghost, or something else, just to see if the problem occurs on it as well, which it should, unless software is upsetting hardware...  If it does keep happening on the test system, we have a system to test on that will not affect your production...  Then try doing my suggestion earlier about access.  

I also saw these two articles that may be of a good read for you:

http://support.microsoft.com/?kbid=832971
http://www.windowsitpro.com/Windows/Article/ArticleID/42469/42469.html

Although, these hotfixes should have been part of SP1...  I wonder if there are some other issues that they want you to wait for SP2 for, which will include fixes.  In the past I've gotten a few fixes from them, and they aren't documented on their website...

(remember SBS is really just Windows server with some mods)


Do you have AV protection on the system, if so, can you scan your comp... what about disabling it (just to rule it out)?  What about some form of anti-spyware check?

When things start becoming unresponsive, do you have high CPU usage on the box, from any particular process?

The other option, if you are in a big hurry, is to call into Microsoft Support.. This is definitely one that will have them thinking...




Hello Colin,

I downloaded and installed the support tools. Thank you. Here are the results that don't look quite right to my inexperienced eye:

Netcard queries test . . . . . . . : Passed
    [WARNING] The net card 'RAS Async Adapter' may not be working because it has not received any packets.

A while back I set up a VPN into this server, but abandoned it because it was too slow for my purposes. I thought I had uninstalled everything, but perhaps I haven't?

  Adapter : Network Connection

        Netcard queries test . . . : Passed

        Host Name. . . . . . . . . : bfgcserver
        IP Address . . . . . . . . : 195.112.48.42
        Subnet Mask. . . . . . . . : 255.255.255.0
        Default Gateway. . . . . . : 195.112.48.43
        NetBIOS over Tcpip . . . . : Disabled
        Dns Servers. . . . . . . . : 192.168.16.2

        IpConfig results . . . . . : Failed
            Pinging DHCP server  - not reachable
            WARNING: DHCP server may be down.

Default gateway test . . . . . . . : Failed

    [FATAL] NO GATEWAYS ARE REACHABLE.
    You have no connectivity to other network segments.
    If you configured the IP protocol manually then
    you need to add at least one valid gateway.
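When the same netdiag run is repeated on several servers, it can help to pull out just the failed tests for a side-by-side comparison. This is a minimal sketch, assuming the report has been saved as text and that result lines follow the dotted "name . . . : Passed/Failed" layout quoted above; the regular expression and the function name are illustrative, not part of netdiag itself.

```python
import re

# Matches lines such as "Default gateway test . . . . . . . : Failed".
# Assumption: test names are followed by a run of dots, a colon and a status.
RESULT_LINE = re.compile(
    r"^\s*(?P<name>.*?)(?:\s*\.)+\s*:\s*(?P<status>Passed|Failed)\s*$"
)


def failed_tests(report):
    """Return the names of all tests marked Failed in a netdiag-style report."""
    failures = []
    for line in report.splitlines():
        match = RESULT_LINE.match(line)
        if match and match.group("status") == "Failed":
            failures.append(match.group("name"))
    return failures


# Lines copied from the output quoted above.
sample = """\
Netcard queries test . . . . . . . : Passed
        IpConfig results . . . . . : Failed
            Pinging DHCP server  - not reachable
Default gateway test . . . . . . . : Failed
"""
print(failed_tests(sample))
```

Running the same function over the report from the other customer's SP1 box would make the difference in the default gateway test easy to spot.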

I don't know if the above error is really a problem. I use a Dlink adsl modem connected directly to my external network card. This works by establishing the connection to my ISP and then handing over the static IP address to my network card by DHCP. The modem then is effectively transparent. It has worked on this server for over a year, and I use it also at several other customers quite successfully.

I installed support tools on an sp1 box at another customer's. The same adsl modem is in use there. I ran Netdiag and the default gateway test passed. So there is a difference here.

The Kerberos test completed successfully.

I don't have antivirus software on this server. There is no email in use at this site, so I decided to take the risk. The server is quite old: a 1GHz cpu with 1Gb ram. It worked well until recently. DEP is not enabled. I don't have a managed switch, just a plain dlink hub. There are only 5 workstations, and these are only used intermittently.

Because this system worked so well previously, I don't believe I have any cabling issues.
Just noticed something else...

In amongst all of this, I've been keeping an eye on the Processes tab in Task Manager. System Idle Process is usually in the 90s. Following some changes this morning I rebooted. An hour later I tried connecting via TS and had the usual problems. I had a look in Task Manager, but this time I also looked at the graph. It was between 90 and 97%. In the Processes tab, I can see things are busy, but the System Idle Process is between 50% and 90%. So I don't understand how the graph can be permanently showing above 90% utilization.

One reason for the current high utilization is that my mirrored disks are resynching. But here's a puzzle... It's been resynching for 7 hours now and it's only 40% done. Pre sp1 the whole process would take no more than an hour.

If it's of any interest, the processes that keep popping up to the top of the utilization list are as follows. The figure at the right of them is the maximum utilization I am seeing:

explorer.exe 43%
taskmgr.exe 17%
system 30%
dmadmin.exe 30%
crss.exe 15%
sqlserver.exe 17%

Like I say, I'm not seeing System Idle Process drop below 50%. The above processes are mainly at zero, but keep springing into action. So the graph permanently above 90% is weird. I notice also that there are 11 copies of svchost.exe running. Is this correct?

If the graph is correct, it's no wonder that I'm seeing TS performance problems. Also, when I've been on site at the server recently, I've noticed the mouse pointer not moving correctly, jumping in fits and starts when I move it.

When the disks have resynched, I'm going to break the mirror and see what happens to performance. Although I'm puzzled as to why they're currently resynching following a graceful reboot.


I thought it was worth posting these images to illustrate what I am seeing in Task Manager.

The first Image http://81.140.18.153/taskmgr1.jpg shows the consistently high cpu utilization trace.

The second image http://81.140.18.153/taskmgr2.jpg shows the System Idle process at 91%, but also at the bottom of the window it can be seen that cpu utilization is at 85%

The graph shows that the utilization never drops below 80% but my processes window shows the System Idle process mainly above 50%
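To put a number on the mismatch: one would normally expect the overall CPU usage figure and the System Idle Process figure to add up to roughly 100%. The sketch below simply works out how far the second screenshot is from that, assuming both figures were sampled at about the same moment (which a screenshot only approximates). The function name is made up for illustration.

```python
def unexplained_cpu(total_usage_pct, idle_pct):
    """Usage shown in the status bar beyond what 100 - idle would predict."""
    expected_usage = 100 - idle_pct
    return total_usage_pct - expected_usage


# Figures from the second image: System Idle Process 91%, CPU usage 85%.
gap = unexplained_cpu(total_usage_pct=85, idle_pct=91)
print(f"{gap} percentage points of CPU usage are not reflected in the idle figure")
```

With the quoted figures the idle process would predict about 9% usage, so roughly 76 percentage points are unexplained by the Processes tab.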

I find this very strange.
My disks have resynched now and things have quietened down. But it's far from OK. The resynching process took 14 hours. Prior to SP1 it would have taken no more than 2 hours.
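For a sense of scale, the resync times can be turned into an approximate copy rate. This is back-of-the-envelope arithmetic only: it assumes the whole of a 40 GB disk is copied (the disk size is given later in the thread) and takes 1 GB as 1024 MB.

```python
def resync_rate_mb_per_s(disk_gb, hours):
    """Average copy rate if the whole disk is resynchronised in `hours`."""
    return disk_gb * 1024 / (hours * 3600)


after = resync_rate_mb_per_s(40, 14)   # 14 hours, as observed after SP1
before = resync_rate_mb_per_s(40, 2)   # "no more than 2 hours" before SP1
slowdown = before / after

print(f"after SP1:  {after:.2f} MB/s")
print(f"before SP1: {before:.2f} MB/s")
print(f"slowdown:   {slowdown:.1f}x")
```

Under those assumptions the mirror was rebuilding at well under 1 MB/s, about a seventh of the earlier rate.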

It's plain to me now that there's nothing especially wrong with TS. The problem is that the server is a whole lot busier than it used to be. Exchange isn't installed. So this should be a pretty quiet server. It certainly used to be. My MS Access application sits there idle for most of the time, but every 5 minutes it spends 20 seconds doing some chores. I notice now that cpu utilization jumps to 100% at this time. It never used to.

To illustrate what I am seeing, this morning I established a TS session, then with the performance graph running I logged in a second time with TS. Both sessions cut off for 30 seconds, but when the first session came back, the graph was there for me to look at. I've uploaded it to http://81.140.18.153/tslogin.jpg

With the cpu utilization nailed to 100%, it's no wonder that my TS sessions are being kicked off.
The only thing with this is that the network stack does not respond to ICMP ping requests.  Even a busy machine should respond to ping requests....  CPU utilization always goes quite high when someone logs in, and does take a few minutes to quiet down once logged in.... this does depend on what else the machine is doing, what programs start when the user logs in, etc...


It would be interesting to watch the traffic going on at this time, to see what is going on at that point when it stops responding...


I do agree that there is something, very odd going on with your computer.  
Hello Colin,

Glad you're back. So what do I do next?
Just as a comparison, on another SBS 2k3 server that I administer I repeated the test whereby I established a TS session, then monitored the performance trace whilst I logged in a second time.

Admittedly this pc is a 2Gb box (2x faster), but it is also running Exchange whereas the troublesome server isn't.

I observed that as I logged into the second session, the cpu utilization hit 100%, but only for a second. When I logged off the second session, the cpu utilization only hit 40% for a second. Whereas the troublesome server stays pretty well nailed to 100% for 30 seconds.

The biggest problem to me is that the internal network connection ceases to work about once every two days. The only way I've found to get it working again is to reboot the server. I've just had to reboot it again now.

I tried stopping various other services to see if this allowed me to login via TS without any problems. Certainly with IIS stopped, I can log in with TS perfectly.
Yes, I am back...

So, when IIS is off, you don't have any TS issues?


Is it possible to do the ghost step, so we don't play on production?


Are you familiar with ethereal?  We can use this to monitor network traffic...

Never used it. I'll give it a go if you tell me what to do.
I just rebooted to get my local area connection working again. And as usual it's resynching my disks. So utilization will be at 100% for the next 14 hours.
Which, ghost or ethereal?


Dynamic disks, should not have to resynch your disks on reboot....  bad server, no cookie for you...


This machine keeps getting weirder, and weirder....

I'm really wondering about just rebuilding things. I know it's a lot of work on your part and easy for me to say as I don't have to rebuild, but I believe this is a software issue, and not hardware...  

If you can ghost it onto another machine, we can still keep working on it...


Let me know what you want to do...
>> Which, ghost or ethereal?

I don't understand your question. I have a pair of 40Gb IDE disks in a mirrored pair.

When my disks are back in synch I can break the mirror. That means we can do whatever we like. I would prefer to prove that both halves are bootable first though.

A major problem for me is that I only go on site about once a fortnight. I administer several other networks the rest of the time. The earliest I can get back there is next Monday or Tuesday.

It's probably worth mentioning that I have an ASR backup taken about six weeks ago. I wonder if it's worth reinstating that, then re-applying SP1?

Perhaps it's worth re-applying sp1 on the existing setup anyway ?

Maybe we do that on one of the two drives in the mirrored pair... although I'm honestly not 100% sure I trust what is there on the mirrored pair...
Because this is only a 1G box, I'm reluctant to spend time rebuilding it. If it comes to a rebuild, I think I would look into trying to get a new box.

I'll break the mirror. Hopefully that will reduce cpu utilization anyway. Once I've proved that both halves are bootable, I'll re-apply the service packs.
Alright, let me know how things go...
I broke the mirror. It's made a huge difference!

I would say that it's still a bit slow logging on, but the performance trace says it all. It only touches 100% for a couple of seconds now when someone logs on via TS. Previously it went to 100% for 30 seconds when someone logged off. Now I just see a brief spike to 60% when someone logs off.

Previously it was jumping to 50% every few seconds. Now it's down at 3% for the most part and only jumps to 50% when a genuine workload is applied. For the most part it's not going above 10%.

I'll monitor it over the weekend, but it would appear that I have a working system again.

Any Comments?
This is probably one of the weirdest issues I've ever seen... maybe schedule up a thorough check on those drives...
IDE or SCSI drives?  

If IDE, do you have S.M.A.R.T enabled?
The drives are both 40Gb Seagate IDE's. Smart is certainly available in the bios. I can't remember if it is enabled. Why do you ask?

(sigh) Even though the server is responding nicely now, the internal network connection has died again. The only way I know to cure this, is to reboot. Do you have any suggestions as to why this is happening? And can you suggest how I might get it back without rebooting?
Well, the auto rebuild on reboot has me wondering whether there is an issue with a drive... I've seen it in the past: when a drive is going to fail, it starts working really, really slowly... The other time I've seen it is when you don't have the correct controller drivers running.  Some Intel boards are bad that way. I have an old original P4 with Rambus, and on XP things run really slowly unless I install updated drivers from the Intel site...


Try Restarting the Net Logon Service...

Failing that, try restarting the Network Connections service (Netman)...


Before you restart the services, let me know if you can ping 127.0.0.1 on the afflicted server... Then we know if the TCP/IP stack is still up and running...
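The loopback ping is the first rung of a ladder: probe from the inside outwards and note the first address that fails. A rough sketch of that idea follows. The function names are illustrative; the external address and gateway are the ones from the netdiag output earlier in the thread, and the server's internal LAN address is not stated in the thread, so it would have to be added to the list by hand. The ping exit code is only a coarse indicator and may not be reliable on every Windows version.

```python
import platform
import subprocess


def ping_once(address):
    """Send a single ping; True if the ping command reports success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", address], capture_output=True)
    return result.returncode == 0


def first_unreachable(targets, probe):
    """Return the label of the first target that fails `probe`, or None."""
    for label, address in targets:
        if not probe(address):
            return label
    return None


# Ordered from the inside out. Add the internal LAN address where appropriate.
TARGETS = [
    ("loopback (TCP/IP stack)", "127.0.0.1"),
    ("external interface", "195.112.48.42"),
    ("default gateway", "195.112.48.43"),
]

# Example call on the afflicted server:
#   print(first_unreachable(TARGETS, ping_once))
```

Passing the probe in as a parameter keeps the ordering logic separate from the actual ping call, so a different check (for example a TCP connect) could be swapped in.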
Yes I could ping 127.0.0.1

Restarting those services didn't fix it. I haven't rebooted yet. Is there anything else to try?
Let me go look...

Interesting... so the network stack is up...



Is there anything using an awful lot of CPU.... I'd say force a memory dump, but I don't have time to examine that....


Try restarting the "Server" service, and the RPC services.  You can restart the RPC Locator, but not the RPC service...
CPU utilization is very low. Between 3 and 5% mainly. The RPC locator service wasn't started. It's not set for automatic startup. Should it be? Anyway, starting it didn't fix the problem.

I tried repairing the network connection. Still no joy.
I'm drawing a blank... I did see this regarding R2 of SBS:

http://bink.nu/Article5506.bink
I had a look at the info for SBS R2. If I'm reading it correctly, it will only run on Server 2003 R2. This is a separate package rather than a free upgrade. I won't be buying a new operating system to solve my problem I'm afraid.

I was on site with the troublesome server today. I had been aware that it has always been the Local Area Connection that dropped rather than the Network connection. So I had a look at what network cards were installed in the box. The Network Connection uses a 3Com card, whilst the Local Area Connection used an unbranded card. So I replaced the unbranded card with a 3Com card.

I could instantly see a difference in the Performance Trace. It seems to spend most of its time at 2% now. The occasional spikes are further apart and don't often go above 30%. So I'll monitor things now to see if the local area connection stays on line.

When I got home, I logged in via TS and was disappointed to see the server become unresponsive for 5 minutes. When it came back though, everything seemed to work somewhat more briskly. So I'm guessing that perhaps the server previously had been spending a lot of resources trying to deal with a duff network card. We'll see.

When I say that the server died for 5 minutes, this certainly happens a lot when I log in with the /console switch to take over the console session. I have logged into a new session several times this evening without any problem. So what is it about taking over a console session that is so problematic?
I lost the local area connection again today. So replacing the network card hasn't changed anything in that regard.

So I think I'm looking at a rebuild now. I think I'll try and get funding for better hardware though.

Your suggestions and comments have all been helpful and constructive. I'd like to thank you very much for all your replies. I'm new to Experts Exchange; how do you go about getting awarded the 500 points?

ASKER CERTIFIED SOLUTION
colin_harford
