PeteJH asked:

Windows 2003 Server SBS hangs intermittently during Exchange backup

Hi all,

I have had this issue for several months now, and it has become more frequent over time. Initially it occurred once a week, if that. Now it seems to occur almost every day.

First, please review this link, describing the exact issue I am experiencing:

http://forums.microsoft.com/technet/ShowPost.aspx?postid=2551837&siteid=17

(transcript)
I have an SBS2003 server running Exchange.
Backing up the Exchange server will intermittently (every 3 to 7 days) hang the server (by hang I mean: no video, no keyboard, no mouse, no network, no disk I/O, no BSOD).

The event log sequence is:
    * ESE 210 start backup
    * ESE 220 start priv1.edb
    * ESE BACKUP 907 shared memory
    *(server stops responding here)
    * ESE 221 end priv1.edb (normal execution)
    * ESE 220 start priv1.stm
    * ESE 221 end priv1.stm
    * etc

Once the server hangs, there are NO event log entries to indicate why it is hung.
(/transcript)

As above, the server completely stops responding after the ESE backup begins. It does not seem to stop immediately, though: some other, unrelated application entries have been logged in Event Viewer within 2 or 3 minutes of the ESE backup event, but within about 10 minutes of the event the server appears to have stopped. (Symantec Mail Security checks for updates and logs it every 10 minutes on this server; that is something else I need to look into, but it is irrelevant to this scenario.)

Sample of events occurring prior to the server "hang":

14/12/07 3:46 AM, ESE 220: Information Store (604) First Storage Group: Beginning the backup of the file N:\Exchsrvr\mdbdata\priv1.edb (size 184 Mb).

14/12/07 3:46 AM, Information Store (604) Backup data transfer method is shared memory (64kb).

(And then nothing else, until customer hard resets server when they come in to work in the morning)
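In both logs the tell-tale pattern is an ESE 220 ("beginning the backup of the file") that is never followed by its matching ESE 221. A minimal Python sketch of that check, assuming the events have already been pulled out of Event Viewer into hypothetical (event_id, filename) tuples; a real export would need its own parsing step:

```python
from typing import Iterable, Optional

def open_backups(events: Iterable[tuple[int, Optional[str]]]) -> list[str]:
    """Return files whose ESE 220 (begin backup) has no matching ESE 221 (end)."""
    pending: list[str] = []
    for event_id, filename in events:
        if event_id == 220 and filename is not None:
            pending.append(filename)      # backup of this file has started
        elif event_id == 221 and filename in pending:
            pending.remove(filename)      # backup of this file has finished
    return pending

# Hypothetical sequences modelled on the transcript above.
healthy = [(210, None), (220, "priv1.edb"), (907, None),
           (221, "priv1.edb"), (220, "priv1.stm"), (221, "priv1.stm")]
hung = [(210, None), (220, "priv1.edb"), (907, None)]

print(open_backups(healthy))  # []
print(open_backups(hung))     # ['priv1.edb']
```

A non-empty result just names the file the store was working on when logging stopped; it says nothing about why the server froze.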




Server hosts Symantec Antivirus 10.1.0.1000 and Mail Security 5.0.2.221.  Both of these have had complete uninstalls and reinstalls performed to ensure they are not the culprit.

I also disabled the SBS Backup for 3 days several weeks back to determine whether the backup itself was the issue, but the problem seemed to continue (along with the ESE backup events that seem to be the source of the issue). This confuses me, and I am thinking of repeating the test to double-check. Thankfully, the site is a relatively small one with very little data change daily.

All basic updates except for Exchange Service Pack 2 have been performed (although this does not seem to alleviate the issue according to the link from Technet).



In most cases, the failures occur between about 3:40 and 4:00 am, when the Exchange backup is running. The sample below is from a night when the problem DID NOT occur; when the problem does occur, no log is generated and the backup does not complete:

Backup of "SERVERNAME\Microsoft Information Store\First Storage Group"
Backup set #3 on media #1
Backup description: "SBS Backup created on 7/12/2007 at 11:00 PM"
Media name: "Small Business Server Backup (01).bkf created 7/12/2007 at 11:00 PM"

Backup Type: Normal

Backup started on 8/12/2007 at 3:46 AM.
Backup completed on 8/12/2007 at 3:53 AM.
Directories: 4
Files: 5
Bytes: 327,197,234
Time:  6 minutes and  30 seconds
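A completed run like this gives a rough baseline for the Exchange portion of the job. A trivial Python sketch of the arithmetic, using only the figures from this one log (a single sample, so treat it as indicative only):

```python
def throughput_mib_per_s(total_bytes: int, seconds: float) -> float:
    """Average transfer rate in MiB/s (1 MiB = 1,048,576 bytes)."""
    return total_bytes / seconds / (1024 * 1024)

# Figures from the successful 8/12/2007 run: 327,197,234 bytes in 6 min 30 s.
rate = throughput_mib_per_s(327_197_234, 6 * 60 + 30)
print(f"{rate:.2f} MiB/s")  # 0.80 MiB/s
```

A later run that logs ESE 220 for priv1.edb but runs far longer than this, or never logs ESE 221, would stand out against that baseline.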


Anyone know this issue and have a solution? Searches online for other people experiencing the same problem have come up with very little, bar the TechNet link I provided above. Unfortunately, there is no solution there, but at least confirmation I am not alone with this problem.

The biggest obstacle to finding information on this problem is that there is no error message and no failure event. The only symptoms are a complete server freeze requiring a hard reboot, and the fact that it almost ALWAYS occurs after the ESE backup event for the priv1.edb database, called by the SBS backup routine at 4am.
ASKER CERTIFIED SOLUTION
Sembee
PeteJH

ASKER

This is something I had previously suspected (a fair while back), up until I turned off the SBS backup completely and seemingly still had the problem.

I will move the SBS backup forward a couple of hours to move the Exchange backup out of the maintenance window and report if successful.
Jeffrey Kane - TechSoEasy
You say that everything is up-to-date except Exchange SP2?  Can you please post the output from running SYSTEMINFO at a command prompt?

Thanks.

Jeff
TechSoEasy
PeteJH

ASKER

Host Name:                 SERVERNAME
OS Name:                   Microsoft(R) Windows(R) Server 2003 for Small Business Server
OS Version:                5.2.3790 Service Pack 2 Build 3790
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Primary Domain Controller
OS Build Type:             Multiprocessor Free
Registered Owner:          OWNER
Registered Organization:   COMPANY
Product ID:                74995-OEM-4213074-88596
Original Install Date:     28/12/2006, 9:30:17 AM
System Up Time:            0 Days, 3 Hours, 29 Minutes, 16 Seconds
System Manufacturer:       AnabelleB
System Model:              Clarendon SP350
System Type:               X86-based PC
Processor(s):              2 Processor(s) Installed.
                           [01]: x86 Family 15 Model 6 Stepping 4 GenuineIntel ~2999 Mhz
                           [02]: x86 Family 15 Model 6 Stepping 4 GenuineIntel ~2999 Mhz
BIOS Version:              INTEL  - 0
Windows Directory:         C:\WINDOWS
System Directory:          C:\WINDOWS\system32
Boot Device:               \Device\HarddiskDmVolumes\SERVERNAMEDg0\Volume1
System Locale:             en-au;English (Australia)
Input Locale:              N/A
Time Zone:                 (GMT+10:00) Canberra, Melbourne, Sydney
Total Physical Memory:     1,021 MB
Available Physical Memory: 172 MB
Page File: Max Size:       2,455 MB
Page File: Available:      1,205 MB
Page File: In Use:         1,250 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    DOMAIN.local
Logon Server:              \\SERVERNAME
Hotfix(s):                 74 Hotfix(s) Installed.
                           [01]: File 1
                           [02]: File 1
                           [03]: File 1
                           [04]: File 1
                           [05]: File 1
                           [06]: File 1
                           [07]: File 1
                           [08]: File 1
                           [09]: File 1
                           [10]: File 1
                           [11]: File 1
                           [12]: File 1
                           [13]: File 1
                           [14]: File 1
                           [15]: File 1
                           [16]: File 1
                           [17]: File 1
                           [18]: File 1
                           [19]: File 1
                           [20]: File 1
                           [21]: File 1
                           [22]: File 1
                           [23]: File 1
                           [24]: File 1
                           [25]: File 1
                           [26]: File 1
                           [27]: File 1
                           [28]: File 1
                           [29]: File 1
                           [30]: File 1
                           [31]: File 1
                           [32]: File 1
                           [33]: Q147222
                           [34]: KB933854 - QFE
                           [35]: SP1 - SP
                           [36]: KB912442 - Update
                           [37]: KB916803 - Update
                           [38]: KB931832 - Update
                           [39]: KB931978 - Update
                           [40]: Q927978
                           [41]: Q936181
                           [42]: IDNMitigationAPIs - Update
                           [43]: NLSDownlevelMapping - Update
                           [44]: KB925398_WMP64
                           [45]: KB938127-IE7 - Update
                           [46]: KB939653-IE7 - Update
                           [47]: KB942615-IE7 - Update
                           [48]: KB914961 - Service Pack
                           [49]: KB929969 - Update
                           [50]: KB921503 - Update
                           [51]: KB925902 - Update
                           [52]: KB926122 - Update
                           [53]: KB927891 - Update
                           [54]: KB929123 - Update
                           [55]: KB930178 - Update
                           [56]: KB931784 - Update
                           [57]: KB931836 - Update
                           [58]: KB932168 - Update
                           [59]: KB933360 - Update
                           [60]: KB933729 - Update
                           [61]: KB933854 - Update
                           [62]: KB935839 - Update
                           [63]: KB935840 - Update
                           [64]: KB935966 - Update
                           [65]: KB936021 - Update
                           [66]: KB936357 - Update
                           [67]: KB936782 - Update
                           [68]: KB941202 - Update
                           [69]: KB941568 - Update
                           [70]: KB941569 - Update
                           [71]: KB941672 - Update
                           [72]: KB942763 - Update
                           [73]: KB943460 - Update
                           [74]: KB944653 - Update
Network Card(s):           1 NIC(s) Installed.
                           [01]: Intel(R) PRO/1000 PM Network Connection
                                 Connection Name: Server Local Area Connection
                                 DHCP Enabled:    No
                                 IP address(es)
                                 [01]: 192.x.x.x





Exchange is: Version 6.5, Build 7226.6 Service Pack 1
SOLUTION
PeteJH

ASKER

This server was installed using Windows Server 2003 SBS with SP1 media. To be honest, it has been a long time since I have seen Win2k3 SBS media without SP1 slipstreamed in. R2 maybe, but not SP1.

Will see how the VSS Update goes, cheers.

As an update: I have moved the backup to 6pm (from 11pm), and moved the mail store maintenance periods to 5am-8am (from 12am-4am) to see whether this either fixes the problem or changes the time of the hang.
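The point of the schedule change is that the Exchange portion of the backup should no longer fall inside the store maintenance window. A small Python sketch of that check, treating each as a same-day time window. The post-change times are an estimate, assuming the Exchange portion still begins about 4 h 46 min after the job starts (as it did when the job ran at 11 PM and the store backup began at 3:46 AM):

```python
from datetime import time

def to_minutes(t: time) -> int:
    """Minutes since midnight."""
    return t.hour * 60 + t.minute

def overlaps(a_start: time, a_end: time, b_start: time, b_end: time) -> bool:
    """True if [a_start, a_end) and [b_start, b_end) overlap.

    Assumes neither window crosses midnight, which holds for the times below.
    """
    return to_minutes(a_start) < to_minutes(b_end) and to_minutes(b_start) < to_minutes(a_end)

# Old schedule: Exchange portion logged 3:46-3:53 AM, maintenance 12 AM-4 AM.
print(overlaps(time(3, 46), time(3, 53), time(0, 0), time(4, 0)))    # True

# New schedule (estimate): job at 6 PM, Exchange portion roughly 10:46-10:53 PM,
# maintenance now 5 AM-8 AM.
print(overlaps(time(22, 46), time(22, 53), time(5, 0), time(8, 0)))  # False
```

If the job ever slips past midnight, the windows would need to be split at midnight before comparing.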
It's been about a year since SP1 was slipstreamed... just wanted to make sure.

Jeff
TechSoEasy
Hi,

The issue seems to be with the shared memory. I see that you have the following memory:

Total Physical Memory:     1,021 MB
Available Physical Memory: 172 MB
Page File: Max Size:       2,455 MB
Page File: Available:      1,205 MB
Page File: In Use:         1,250 MB
Page File Location(s):     C:\pagefile.sys
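These figures can be read programmatically. A minimal Python sketch that parses the quoted systeminfo lines and relates the "In Use" figure to physical RAM. It assumes, without confirming it here, that on Windows Server 2003 these "Page File" rows describe system-wide commit (RAM plus page file) rather than pagefile.sys alone; "In Use" plus "Available" equalling "Max Size" is consistent with that reading:

```python
import re

# "Label: 1,234 MB" lines as printed by systeminfo; the label may itself contain ':'.
FIELD = re.compile(r"^(?P<label>.*?):\s+(?P<mb>[\d,]+) MB$")

def parse_memory(text: str) -> dict[str, int]:
    """Map each '... MB' label in systeminfo output to its value in MB."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        m = FIELD.match(line.strip())
        if m:
            values[m.group("label")] = int(m.group("mb").replace(",", ""))
    return values

SAMPLE = """\
Total Physical Memory:     1,021 MB
Available Physical Memory: 172 MB
Page File: Max Size:       2,455 MB
Page File: Available:      1,205 MB
Page File: In Use:         1,250 MB
Page File Location(s):     C:\\pagefile.sys
"""

mem = parse_memory(SAMPLE)
ram = mem["Total Physical Memory"]
in_use = mem["Page File: In Use"]
# In Use + Available == Max Size, consistent with the commit-charge reading (assumption).
assert in_use + mem["Page File: Available"] == mem["Page File: Max Size"]
print(f"{in_use} MB in use = {in_use / ram:.0%} of {ram} MB physical RAM")
```

Under that assumption, more memory has been committed than the machine physically has, so some of it must be backed by the page file.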

A few things I would recommend are increasing the physical RAM and moving pagefile.sys to D:\.

Above all, update the VSS component, as VSS is the main module used for all backups.
The other thing you can do is change the location where the backups are placed. By default they would go to C:\.

Please change the location to D:\.

If you don't have an additional drive (D:\),

I would recommend adding one, as you would otherwise run out of space with Exchange running.

bhanu

PeteJH

ASKER

Hi Bhanu,

I am skeptical that the issue is a lack of physical RAM, or that the page file is the cause simply because it is on the system volume. What is the reasoning behind your recommendation? I will likely move the page file, but as far as I am aware this only helps performance; it does not specifically alleviate issues like this. I will also set the page file to a large static size for testing.


This is the setup for this server:

C:\ System, contains dynamic pagefile
N:\ Shared Data, Business Apps, and Exchange

All volumes hosted on a single RAID1 array.

SBS Backup is backing up to a Rev Drive set as drive D:\. I have previously moved the backup onto the local drives as a test to rule out the backup hardware as the cause of the freeze.

This server is the least intensive SBS server we have ever needed to implement, and it is specced similarly to some older machines running 5x the users and much more data processing. It has 3 users in total, only 2 of whom run major applications hosted by the server itself, and no one uses the server's resources from about 5:30pm until around 8am the next morning. The Exchange databases are tiny, around 330MB total. If there is any specific literature or reasoning as to why a RAM upgrade may fix this issue, I'll gladly give in and plonk some more in (RAM is cheap nowadays, so I'll most likely end up doing this anyway), but I'd rather not muddy the troubleshooting waters until other propositions have been tried and have failed.
PeteJH

ASKER

Progress report:

After changing maintenance window and backup times, and updating VSS I have yet to see a freeze occur. Unfortunately, the backups have not succeeded since either; they have not managed to get to the Exchange portion of the backup in order to test if the problem has been rectified.

I am looking into why the backups are now failing, will report tomorrow should they succeed.
Hi,

Well, my recommendation was based on the fact that Exchange is a very memory-intensive application and uses a lot of shared memory. On top of that, VSS also uses shared memory, and much of the time VSS backups fail because of problems allocating shared memory and releasing it after the application no longer needs it.

So I wanted to know whether you can allocate more physical RAM and move the page file to another disk.

As for the Exchange backups not working: check whether there are any errors in Exchange itself. Run the Exchange tools to check database consistency, and also run memory diagnostic tools to check for any memory-related issues.

bhanu


PeteJH

ASKER

Not sure exactly which failures you are speaking of, but:

The recent failures I have found to be caused by PEBKAC. I accidentally changed the path the backups were being sent to and as such prior backup files were not being overwritten/erased, thus filling up the REV disks and therefore failing.

Regarding the Exchange backup failures (the cause of the server hangs), I will perform some more checks on the database files. When you state "Exchange tools" and "the memory diags", is there anything specific you're referring to there... or simply stating I should check with eseutil and any old ram testing software to determine the db's are OK and that the server doesn't have a faulty RAM module?
Hi,

When I said Exchange tools, I meant eseutil to check the Exchange database consistency,
and the memory tests were for checking whether we have faulty RAM.

Pete, there may be different reasons why the server might hang: one is that it is not able to allocate memory, another is that an application is causing a buffer overflow.
In addition, we have seen the following problems on servers with Windows 2003 SP2 scalable networking:

http://support.microsoft.com/kb/945977

When we don't see any errors, we need to look at different things. If we saw an error referring to VSS or something else, it could point to a specific issue. In your case we are not able to determine the exact cause, so I am asking you to check all the areas where something might go wrong.

bhanu
PeteJH

ASKER

Update:

No hang or error has occurred since the initial changes stated in the first update. Backups have successfully completed over two nights now.

Should the problem recur, I will need to try some of bhanukir7's other suggestions in this thread, though I am loath to perform the RAM testing given the timing (Christmas trading, and good RAM tests require Windows to be down for an extended period).

I performed an eseutil /g and /k and nothing was reported out of the ordinary. I performed them on an offline copy of the dbs though, is there any reason I should be performing these tests on an online copy (if that's even possible)?
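The two checks described here could be scripted so they are repeated identically on each new offline copy. A minimal Python sketch, assuming eseutil is reachable on the PATH and using a hypothetical path for the offline copy; it issues only the same /g (integrity) and /k (checksum) switches mentioned above, with no other options:

```python
import subprocess
from pathlib import PureWindowsPath

def eseutil_checks(db_path: str) -> list[list[str]]:
    """Build the /g (integrity) and /k (checksum) commands for one database file."""
    db = str(PureWindowsPath(db_path))
    return [["eseutil", "/g", db], ["eseutil", "/k", db]]

def run_checks(db_path: str, dry_run: bool = True) -> None:
    """Print the commands, or run them; raises if eseutil exits with a non-zero code."""
    for cmd in eseutil_checks(db_path):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

# Hypothetical location of an offline copy of the private store.
run_checks(r"D:\ExchCopy\priv1.edb")
```

The dry-run default only prints the commands, so nothing touches a database until the path has been checked.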

I will continue monitoring the server for a week or so before stating whether the issue has been fixed, as in the past it has at times taken that long between failures. I'm quietly optimistic though, as recently failures were getting more and more frequent, not less.

PeteJH

ASKER

13 days of uptime later, the first server reset occurred on 31 Dec.

Having looked at the logs, this does not seem to be related to the issues above (event logs were being written right up until the server restarted). I think the client is now trigger-happy on the reset button after so many months of constant server hangs.

Basically, I think the problem has been rectified. I cannot be sure if it was some problem caused by conflicting Exchange maintenance times and the Exchange backup portion of the SBS Backup, or if it was simply an update to the VSS writers. As such, I will split the points between Sembee and TechSoEasy.

Thank you all for your input; this issue has been niggling me for a long time.
I think this is starting to happen to one of my SBS 2003 servers. The link to the VSS update is dead. Does anyone know what the update number is or have a link to Microsoft for it?
Will Microsoft Update site apply this update?

Eric
I've fixed the link.

Jeff
TechSoEasy
Part of the problem we were having was that a third-party backup provider had been given access to the server and was running a backup with their own software. Even though I had configured NTBACKUP via the SBS Console, the client had engaged this offsite backup provider without my knowledge.