Solved

Exchange crashing with errors 9056, 9057, 9188, 8260.

Posted on 2003-11-06
Last Modified: 2007-12-19
Hello All

Once the Exchange server comes back up after a reboot, it stays stable for about 30 minutes and then it starts crashing.  The event log reports the following:

Event ID 9188
System Attendant failed to read the membership of group...
Event ID 9144
Clients will not be directed to this GC
Event ID 8260
Cannot access Address List
Event ID 9057
NSPI Proxy cannot contact any GC
Event ID 9056
NSPI Proxy listener thread
Event ID 7031
The SA terminated unexpectedly.

The DC is actually on another server.  The FSMO roles are also on that server, and another Exchange server can successfully authenticate with it.

Server is currently down.
0
Question by:jcfrietman
32 Comments
 
LVL 8

Expert Comment

by:JasonBigham
ID: 9695756
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9695979
When did this start? Sounds like it's trying to contact a GC that isn't available. Has anything changed in your environment? Added or removed a GC? Then of course, if the first error is right, all you'd need to do is add that computer object to the Exchange Domain Servers group, per Jason's suggestion.

could you post the entire 9188 and 9144 events?

D
0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9696073
Thanks for this.  I did actually look at this article, but I think the problem goes a bit deeper.

The server is fully operational for about 20 minutes after the services have been restarted.  Then they start failing with the event IDs above.

At the moment SA has crashed but the Info Store is still running (somehow).  When we try to get into AD Users and Computers, we get "Naming information cannot be located because:
The server is not operational

The FSMO/DC is running fine on another server.

Many Thanks

Juan Carlos
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9696089
You still haven't posted the events.....

d
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9696100
How many DCs and GCs do you have? When did this start?

David
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9696125
One more...is something taking up a lot of processor, or does this machine seem very sluggish and slow?

d
0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9696137
I know sorry just posting them now:

9144
NSPI Proxy failed to connect to Global Catalog server.domain.com over Tcp/Ip. This server is down or unreachable. Clients will not be directed to this GC until it is available again.
Solution
The global catalog server that the Exchange server is trying to contact is offline. Bring the server back online and Exchange will then service the clients.

9188
Microsoft Exchange System Attendant failed to read the membership of group 'cn=Exchange Domain Servers,cn=Users,dc=your domain'. Error code '80072030'.

Please check whether the local computer is a member of the group. If it is not, stop all the Microsoft Exchange services, add the local computer into the group manually and restart all the services.
Solution
From a newsgroup post: "I had this same problem and resolved it two different ways. The first way was when I made the E2K server a DC the error would go away. However, once I demoted it back to a member the error came back. Fortunately, I found help from someone in the newsgroups. They suggested I should go to AD Users and Comps and remove the server in question from the EDS group and restart the system. Once I removed it, the server added it back, by realizing it was an exchange server. This then solved my problem. I have not seen the error for nearly a month since doing this".


________________________________________________

Nothing has changed in the environment.  We have two GCs at this site and two Exchange servers.  The first E2K server is working fine and connecting with no problem to the FSMO holder and GC.  The second E2K server decides to die after 30 minutes.

Thanks for all your help
0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9696188
Hello Again

Yes we do have them and we were wondering about that.  The process is part of the storage manager and is needed in order to access the info store on the SAN.

Yes, the system is currently running at 76% CPU, even though System Attendant is not running.

I know the CPU is being abused but will this actually kill all links to the GC and kill the SA ?

thanks (again)

0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9696318
It's possible, especially since the reboot clears all processes, and the Exchange server comes up normally. If it couldn't contact a GC on startup, nothing would start. You seem to be failing later on as the CPU usage begins to increase dramatically. Does the other Exchange server experience this kind of usage by the storage manager? If it doesn't, then there may be something wrong with that app, or even your SAN (God forbid!)
Let me know...
D
0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9696548
Hello

Our plan as per your suggestion:

Stop all Exchange services and reboot the server, but do not restart Exchange once it is back up.
Monitor the server and monitor GC access.
We will monitor the processes and see if access to the GC dies.  If it does not die but CPU is still very high, then it is a problem with that process.
The problem with that process is that it is essential to the path to the Exchange databases.  We do not want to upgrade to Volume Manager 3.0 because, if all else fails, we will not be able to find the Exchange databases.
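
The "monitor GC access" step above could be scripted roughly as follows. This is only a sketch under assumptions: it checks plain TCP reachability of the standard Global Catalog LDAP port (3268), which is not the same as a full LDAP or NSPI query, and the host names in the usage note are hypothetical placeholders, not servers from this thread.

```python
import socket

GC_LDAP_PORT = 3268  # standard Global Catalog LDAP port


def can_connect(host, port=GC_LDAP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_all(hosts, port=GC_LDAP_PORT, timeout=3.0):
    """Map each host name to its current reachability (True/False)."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling something like `check_all(["gc1.example.com", "gc2.example.com"])` once a minute and logging the result with a timestamp would show whether the GCs become unreachable at the same moment the CPU climbs, or independently of it.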

Another thing we are doing is building another Exchange box with a different name.  I would like to move the info store to this box and try to bring it online.  Will this have any implications for AD?  For example, Joe Bloggs belongs to Server A in AD but now he is on Server B, and AD has of course not been updated.

Many thanks

0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9697125
Hello

Restarted the server, waited about 25 minutes, and then the server failed once again (Exchange was not started).

The only error I have so far in the Application Log is as follows:

Event ID   2104
Topology
Process WINMGMT.EXE (PID 1576). All the DS servers in the domain are not responding

any ideas please?
0
 
LVL 8

Expert Comment

by:JasonBigham
ID: 9697133
0
 
LVL 8

Expert Comment

by:JasonBigham
ID: 9697152
Might be time to bust this out as well;

http://support.microsoft.com/?id=321708
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697154
"Another thing we are doing, is building another exchange boxed with a different name.  I would like to move the info store to this box and try and bring it up on line.  Will this have any implications with the AD, since for example Joe Bloggs belongs to Server A on the AD but now he is on Server B, but AD has of course not updated."

Bad idea. No one will be able to contact that server; user objects are looking at server A. Second, you'll then have a server orphaned in your AD that owns all the mailboxes. If you delete it from AD, your AD will go haywire. Sit tight, this obviously is not an Exchange issue right now, don't compound your misery.

D

0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697159
BTW, did you pinpoint the name of the service and the exe file that's eating your processor? If so, post here please.

D
0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9697242
Hello Guys

Thanks for all your help so far.

The system log reported the following :
Event ID 5783
The session setup to the Windows NT or Windows 2000 Domain Controller  <server name 2> for the domain  <domain name 2> is not responsive. The current RPC call from Netlogon on <server name 1> to <server name 2> has been cancelled.

One of the solutions was to apply SP3.  (We are currently running SP2 for Windows and Exchange.)  I agree with David, this is not an Exchange issue but rather a W2K issue.  Can you guys still help?

The actual file that is taking most resources is VxSvc.exe, but I think this might be a dead end.  I am just about to say my prayers and install SP3.  Do we need to install any additional patches?

thanks
0
 
LVL 8

Expert Comment

by:JasonBigham
ID: 9697266
Are you not allowed to go to later SP's? If not, sounds like it would be a good option to consider... considering your situation.

I use all the latest, no probs at all... nothing to fear.
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697388
I'm starting to think that this machine is not patched against the RPC bug. Would you please check to see that the hotfixes 823980 and 824124 have been applied to this machine? Something on your network is pounding this machine flat....

D
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697404
Well, you didn't want to read this, but...

http://forums.veritas.com/discussions/thread.jspa?threadID=3890&tstart=60

Your SAN is giving you problems, or at least the service is. I've seen several threads like this one, but no solution yet. you may want to give Dell a call SOON!!

D
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697415
0
 
LVL 8

Expert Comment

by:JasonBigham
ID: 9697427
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697482
Yea, I LOVE how none of these guys got a solid answer, don't you? there's a reason that Dell is at the bottom of the food chain when it comes to enterprise class servers.

D
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697516
BINGO!!!

"We had a big problem with that here. Traced it down to the Volume Capacity Monitoring "feature" of Array Manager. It was slowing down machines, and causing crashes. There is a Dell utility with "Array manager Utilities" that will allow you to toggle the feature. What we found was to load the utility, connect to the box in question, check the feature on, apply, check the feature off, apply.

That should help greatly, it's fixed most of our woes, we're working on mass disabling it."

Stole that from the Dell site...

D
0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9697538
Hello

We have loaded the following Hot Fixes:

KB823980 and KB824146.  So it is something on the machine itself that is killing it.

Are these fixes OK?  

If SP3 does not work what do you guys think?  go home and cry?

cheers
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697574
those are the mandatory RPC hotfixes, to block the blaster virus and its variants. you're fine on those, every server and desktop in your company should have those 2 fixes.

did you read my other posts?

d
0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9697782
I have been reading them and even though I agree that we might have a problem with the actual exe itself, I do not think we have a problem with the SAN.

I have been monitoring the server and it is currently running at around 66% to 75 %, so no memory leaks here.  (I know we cannot see the DC either!).

I have forwarded this to my colleague Cliff, who will investigate as well and pass on his comments.  (He might respond using my alias.)

Yes, after spending endless nights in front of monitors, we managed to patch all servers and desktops, so I think we are clear of the virus! (I hope)

thanks
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697827
You're right, it's the file itself. The 66-75% is usable, I suppose, unless there is more than one processor, in which case it may actually be using up 100% of one of them. Fact is, it's keeping you from seeing the GC/DC, and since all of that operates over LDAP, this exe must be taking a very high priority.
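
To make the arithmetic behind the multi-processor remark concrete, here is a small sketch. It assumes the overall CPU figure is the average of the per-processor busy percentages; the per-processor numbers are made up for illustration and were not measured on this server.

```python
def overall_cpu_percent(per_cpu_busy):
    """Average the per-processor busy percentages into one overall figure."""
    return sum(per_cpu_busy) / len(per_cpu_busy)


def saturated_cpus(per_cpu_busy, threshold=95.0):
    """Return the indexes of processors at or above the threshold."""
    return [i for i, busy in enumerate(per_cpu_busy) if busy >= threshold]


# Hypothetical two-processor box: CPU 0 pinned by one process, CPU 1 partly busy.
sample = [100.0, 40.0]
print(overall_cpu_percent(sample))  # 70.0
print(saturated_cpus(sample))       # [0]
```

So an overall reading of around 70% on a two-processor machine is consistent with one processor being completely saturated.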

D
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9697851
What model server is this?

D
0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9697969
Cheers for your comments.

The server is a Stratus server.  I have also noticed that the NMS service (nmssvc.exe) is not running on the faulty server even though it is running on the working server.

http://www.answersthatwork.com/Tasklist_pages/tasklist_n.htm

I have looked it up and it is an Intel driver service that liaises with the Simple Network Management Protocol.  I am not sure if it is essential to the overall running of the system, but after the reboot I will check this.  It is currently set to Automatic, but it is currently stopped.  I will only try this after I have installed SP3 (currently backing up files).


You mentioned two hotfixes beforehand, we have installed these two:  

KB823980 and KB 824146   (you mentioned 824124).  Are we still running the correct ones?
Thanks

JC
0
 
LVL 24

Accepted Solution

by:
David Wilhoit earned 500 total points
ID: 9698431
yup, those are the ones, I had the number wrong. that's what I get for trying to do it from memory :)

D
0
 
LVL 1

Author Comment

by:jcfrietman
ID: 9700235
Hello  again and good morning!

Well, one of the NICs failed on the server after the reboot with SP3.  We corrected the error and rebooted once again.

The server has been up and running ever since, but of course we are monitoring it.  Many thanks for all your help, much appreciated.

We are still going to move the mailboxes to a more stable server, but we will use the wizard instead.

thanks

Juan Carlos
0
 
LVL 24

Expert Comment

by:David Wilhoit
ID: 9700255
just build another server in the AG, and move mailboxes....you'll be better off....

D
0
