Solved

exchange does not find global catalog - LSASS might be the reason

Posted on 2006-03-22
Last Modified: 2010-05-18
Dear experts

I have a problem on our Exchange server which is probably caused by a problem on our DC (JUPITER).

It started with Outlook hanging. Upon inspecting the Exchange server (called SATURN), I found the following entry in the application event log:

[Event Type - Source - Category - ID - Date Time - User - Computer
Description]

***  Error - MSExchangeDSAccess - Topology - 2103 - 15.03.06 13:40:15 - N/A - SATURN
Process MAD.EXE (PID=2384). All Global Catalog Servers in use are not responding:jupiter.xxx.com

The DC console was locked under the domain admin account. I was unable to unlock the server; it always said the password was invalid, although I'm convinced I typed the correct one.
On another occasion, when I was logged in as domain admin and tried to shut the server down, it said that I had no permission to do so.

I also noticed that LSASS on the DC was using 60% to 99% CPU. Even after rebooting the Exchange server, the information store would not start while LSASS on the DC was running wild. After a while LSASS dropped to almost no CPU (I didn't actually DO anything, I was just watching with Task Manager and Process Explorer), and then a reboot of the Exchange server got it back to work.

*** Information - MSExchangeDSAccess - Topology - 2081 - 15.03.06 16:06:02 - N/A - SATURN
Process INETINFO.EXE (PID=728). DSAccess will use the servers from the following list:

Domain Controllers:
jupiter.xxx.com
venus.xxx.com

Global Catalogs:
jupiter.xxx.com

The Configuration Domain Controller is set to jupiter.xxx.com


I also found the following entry in the evt log of the DC:

*** Error - Userenv - None - 1000 - 15.03.06 13:39:20 - NT AUTHORITY\SYSTEM - JUPITER
Windows cannot obtain the domain controller name for your computer network. Return value (2146).Userenv.log: USERENV(e8.39c) 13:33:39:662 ProcessGPOs: DSGetDCName failed with 2146.

*** Warning - w32time - None - 63 - 15.03.06 14:42:53 - N/A - JUPITER
The time service cannot provide secure (signed) time to client 192.168.1.140
because the attempt to validate its computer account failed with error 1723.
Falling back to insecure (unsigned) time for this client.
Data:
0000: 00 00 00 00               ....  
[Note: 192.168.1.140 is a W2k client]

*** Error - Userenv - None - 1000 - 15.03.06 15:04:25 - NT AUTHORITY\SYSTEM - JUPITER
Windows cannot obtain the domain controller name for your computer network. Return value (2146).
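
For anyone who wants to line up entries from both servers by time, here is a minimal sketch that parses the one-line summaries in the format used in this post (Type - Source - Category - ID - Date Time - User - Computer). The names `EventSummary` and `parse_event_summary` are made up for illustration, and the date format (day.month.two-digit-year) is inferred from the entries above.

```python
from datetime import datetime
from typing import NamedTuple


class EventSummary(NamedTuple):
    """One summary line as quoted in the post (hypothetical helper type)."""
    event_type: str
    source: str
    category: str
    event_id: int
    when: datetime
    user: str
    computer: str


def parse_event_summary(line: str) -> EventSummary:
    """Parse '*** Type - Source - Category - ID - Date Time - User - Computer'."""
    # Drop the leading '***' marker and any whitespace around it.
    body = line.strip().lstrip("*").strip()
    fields = [field.strip() for field in body.split(" - ")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    event_type, source, category, event_id, when, user, computer = fields
    return EventSummary(
        event_type,
        source,
        category,
        int(event_id),
        # Assumption: dates are day.month.year with a 24-hour clock.
        datetime.strptime(when, "%d.%m.%y %H:%M:%S"),
        user,
        computer,
    )
```

Sorting the parsed entries by `when` puts the Userenv 1000 error on JUPITER (13:39:20) just ahead of the DSAccess 2103 error on SATURN (13:40:15), assuming the clocks on both machines agree.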

The problem has happened 3 times, with two to three days in between.

We have a small W2k domain with about 100 users/mailboxes and 30 desktops/notebooks.

I have collected some more evidence, but not knowing what is relevant, I'll stop here so as not to overwhelm you with too many details.

Thanks

Roger
Question by:TuliTaivas
 
LVL 48

Expert Comment

by:Jay_Jay70
ID: 16264142
How often do you reboot your DC?

Sounds and looks like the system went into a pretty crazy state of mind... :)

LSASS is a process I have seen cause grief many times. Are you completely up to date with service packs and updates, and are you certain there is no malware on your DC or Exchange server playing around with processes?

Lots of different little issues in there that all seem to be linked, hmmmm.
 
LVL 1

Author Comment

by:TuliTaivas
ID: 16268711
Hi Jay

We reboot as little as possible, e.g. after installing patches if it is needed; otherwise it runs for weeks. We have used it that way for 5 years without any major problems so far.

We are up to date with Windows patches on both the DC and the Exchange server. Not sure about the latest Exchange patches, though.
It's Exchange 2000 with SP3, if I recall correctly (can't check right now).

We have Symantec AV installed, which scans the server once a week. But I haven't checked with a rootkit revealer such as BlackLight.

I have observed the available memory on JUPITER with Task Manager. After a cold boot LSASS uses about 40 MB, and about 270 MB of the 500 MB installed are available. Within, say, 3 hours LSASS goes up to 53 MB and available memory starts to go down slowly but steadily to 60 MB, and even to 5 MB. What I'm wondering, and what's worrying me a bit, is where the RAM goes when all other numbers stay more or less the same (system cache goes up a bit but does not account for the whole difference). Committed charge is always around 220 MB.
I also checked with Sysinternals Process Explorer, but this didn't reveal any memory leak either.

Once, when I logged off and on again, the available memory went up from 60 MB to about 180 MB, so I thought that something in the user session was using the memory. But the next time I tried this, when available memory was again down to 60 MB, it stayed at 60 MB after logging in again.

Maybe LSASS running wild is not the primary cause but rather a reaction to something else that goes bad?

As an aside:
We have a second DC which is NOT configured as a global catalog server. It's a rather old Compaq Deskpro: 800 MHz, 392 MB RAM, and storage of 4 GB (770 MB free), 14 GB (7 GB free) and 72 GB (28 GB free). Not sure whether it could take the additional load. Setting it up as a second GC would help keep the Exchange server up, I assume. What do you think?

Roger
 
LVL 48

Accepted Solution

by:
Jay_Jay70 earned 900 total points
ID: 16276482
The problem with trying to troubleshoot LSASS issues is that there are so many different ones out there.

I don't see the benefit of setting the machine up as an additional GC, but it may be worth making it the only GC temporarily to see if it makes any difference. It will handle the load.

If it makes no difference, you can always make the original DC a GC again.
 
LVL 1

Author Comment

by:TuliTaivas
ID: 16278710
Hi Jay

I tried to find out what the return value 2146 means in the event log message
"Windows cannot obtain the domain controller name for your computer network. Return value (2146)."
which seems to be the first in the chain of events. No luck so far. Any ideas where I should look?
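
One place to look, offered as a pointer rather than a diagnosis: codes from 2100 to 2999 fall in the network-management (NERR) range defined in lmerr.h, and `net helpmsg <code>` on the server prints the message text for them. The sketch below only classifies a code and suggests that command; `describe_code` and `KNOWN_CODES` are hypothetical names, and the table deliberately lists just two well-documented RPC codes, including the 1723 from the w32time warning.

```python
NERR_BASE = 2100  # first network-management code in lmerr.h
MAX_NERR = 2999   # last code reserved for that range

# Only codes whose meaning is well documented are listed; extend as needed.
KNOWN_CODES = {
    1722: "RPC_S_SERVER_UNAVAILABLE: the RPC server is unavailable",
    1723: "RPC_S_SERVER_TOO_BUSY: the RPC server is too busy to complete this operation",
}


def describe_code(code: int) -> str:
    """Return a hint on how to interpret a Win32 return value."""
    if code in KNOWN_CODES:
        return KNOWN_CODES[code]
    if NERR_BASE <= code <= MAX_NERR:
        offset = code - NERR_BASE
        return (f"network-management (NERR) code, NERR_BASE+{offset}; "
                f"try: net helpmsg {code}")
    return f"not in this table; try: net helpmsg {code}"
```

Calling `describe_code(2146)` points at NERR_BASE+46, and `describe_code(1723)` reports an RPC server that is too busy, which would at least be consistent with LSASS saturating the CPU on the DC at the time.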

I thought the benefit of having 2 GCs is that if one fails, the other can provide the information to Exchange (or whatever is querying the GC). Logins in the morning could also be quicker. And I remember having read such recommendations (on EE).

[However, there is a gotcha in that the infrastructure FSMO role should not be placed on a GC under certain circumstances. But apparently this is not an issue in a single domain, since the infrastructure master has nothing to do. (http://support.microsoft.com/default.aspx?scid=KB;en-us;q223346)]

Roger
 
LVL 1

Author Comment

by:TuliTaivas
ID: 16278720
Hi Jay

Since you didn't mention it in your reply, do you think I don't have to worry about the available memory thing?

Roger
 
LVL 48

Expert Comment

by:Jay_Jay70
ID: 16279196
Heya mate, the rules for the infrastructure master only come into play with multiple domains... you are correct :)


I still think we need to look at that available memory; I'm just trying to think what it could be...
 

Assisted Solution

by:computertsu
computertsu earned 600 total points
ID: 16356017
In Exchange System Manager, go to (your names may vary):
Administrative Groups, First Administrative Group, Servers, SATURN.
Right-click SATURN, go to Properties, and click the tab named Directory Access.
Change the Show drop-down list to Global Catalog Servers, clear the automatic check box, add a different DC, then remove JUPITER.
The change should take effect immediately. See if you still have DSAccess errors.
I believe AD and/or the File Replication Service (NTFRS) may be damaged on your JUPITER DC.
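
Before or after changing the list, a quick reachability check from the Exchange server can show which DCs accept connections on the Global Catalog port at all. This is a rough sketch only: TCP 3268 is the standard GC LDAP port and 389 the ordinary LDAP port, but the function names `port_open` and `check_servers` are invented here, and an open port does not prove that LSASS is actually answering queries.

```python
import socket

GC_PORT = 3268    # Global Catalog LDAP
LDAP_PORT = 389   # ordinary domain controller LDAP


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_servers(hosts, port: int = GC_PORT, timeout: float = 3.0) -> dict:
    """Map each host name to whether the given port accepted a connection."""
    return {host: port_open(host, port, timeout) for host in hosts}
```

For the servers in this thread, `check_servers(["jupiter.xxx.com", "venus.xxx.com"])` would be expected to show VENUS as False on 3268 until it has been promoted to a GC. If JUPITER shows True while DSAccess still logs 2103, the DC is listening but too busy to respond in time.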
 
LVL 1

Author Comment

by:TuliTaivas
ID: 16356162
Hi
The systems have behaved well for more than a week now, whereas when the problems began, the servers acted up every 2 to 4 days. A few days ago I set our second DC (called VENUS) to be a GC as well. Maybe this has helped.

On the properties page computertsu pointed out, I noticed that the configuration domain controller is now VENUS; before that it was JUPITER. I don't know if this is of any significance.

I'm going to split the points among you, if this is OK.

If something comes to mind that explains what has happened, please let me know. For the moment I don't have the time to investigate further. I'm glad that it seems OK now. Hoping for the best.

Take care

Roger

Question has a verified solution.
