UncleVirus
asked on
NTBackup problems backing up to a Windows NAS Server
Receiving some weird errors when trying to back up one of our DCs to a Windows Server-based NAS box. (Every other server on the network backs up to this device, and currently they're all fine.)
Here's a snippet from the logs:
Backup Status
Operation: Backup
Active backup destination: File
Media name: "Full-Friday.bkf created 26/11/2010 at 18:00"
Volume shadow copy creation: Attempt 1.
Backup (via shadow copy) of "C: "
Backup set #1 on media #1
Backup description: "Set created 11/11/2010 at 16:11"
Media name: "Full-Friday.bkf created 26/11/2010 at 18:00"
Backup Type: Normal
Backup started on 26/11/2010 at 18:08.
Error: The device reported an error on a request to write data to media.
Error reported: Unknown error.
There may be a hardware or media problem.
Please check the system event log for relevant failures.
The operation was ended.
Backup completed on 26/11/2010 at 18:09.
Directories: 66
Files: 110
Bytes: 12,082,390
Time: 22 seconds
Error: C: is not a valid drive, or you do not have access.
Error: An inconsistency was encountered in the requested backup file.
Error: D: is not a valid drive, or you do not have access.
Error: An inconsistency was encountered in the requested backup file.
----------------------
The operation did not successfully complete.
----------------------
Thanks in advance.
Can you post the job's details? It looks like a problem with the job setup, especially since you mention that the other servers are doing just fine.
ASKER
Yeah sure, here's the shortcut content:
C:\WINDOWS\system32\ntbackup.exe backup "@C:\Documents and Settings\Administrator.MYDOMAIN\Local Settings\Application Data\Microsoft\Windows NT\NTBackup\data\Incr-Thurs.bks" /n "Incr-Thurs.bkf created 11/11/2010 at 16:18" /d "Set created 11/11/2010 at 16:18" /v:yes /r:no /rs:no /hc:off /m incremental /j "Incr-Thurs" /l:s /f "\\Nas.mydomain.local\current$\DC2\DC2\Incr-Thurs.bkf"
It fails every day. Here are the contents of the .bks (same every day):
C:\
D:\
SystemState
While waiting for an answer, I'm also going to attempt a manual, local backup to see if that goes off without a hitch.
Thanks in advance.
Okay, the command line doesn't look crazy, so that's a start. Basically I can think of three error scenarios here:
1. The user you're running the backup as has no privileges to back up C: and D:. I think that's pretty unlikely though, since I think you're running this as yourself... correct me if I'm wrong.
2. The backup user might not have the correct privileges on the NAS box - but if my assumption from 1 is correct, then that's in doubt, too.
3. The .bkf file you're trying to access might be corrupt. At first glance that might be the best bet.
So could you try just doing a SystemState backup to a new .bkf file on the NAS box (not an append)?
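For anyone following along, a test like this could be scripted. The sketch below only builds the ntbackup argument list for a System State backup to a fresh, timestamped .bkf; it does not run anything. The switches are the ones already used in the job above, while the job name `test-sys-state`, the timestamp format, and the helper name `build_systemstate_command` are assumptions made for illustration.

```python
from datetime import datetime


def build_systemstate_command(unc_dir, now):
    """Return an ntbackup argument list that writes System State to a new .bkf."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # A timestamped file name (hypothetical naming scheme) guarantees a brand
    # new backup file, so a possibly damaged existing .bkf is never reused.
    target = unc_dir.rstrip("\\") + "\\test-sys-state-" + stamp + ".bkf"
    return [
        r"C:\WINDOWS\system32\ntbackup.exe", "backup", "systemstate",
        "/j", "test-sys-state",          # job name (assumed)
        "/v:yes", "/r:no", "/rs:no", "/hc:off",
        "/m", "copy",                    # copy type, does not reset archive bits
        "/l:s",
        "/f", target,
    ]


cmd = build_systemstate_command(
    r"\\Nas.mydomain.local\current$\DC2\DC2",
    datetime(2010, 11, 29, 12, 31, 0),
)
print(" ".join(cmd))
```

On the DC itself the list could be handed to `subprocess.run(cmd, check=True)`; that step is left out here so the sketch stays safe to run anywhere.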
ASKER
I attempted to dump the system state on the NAS and it was having none of it...
Backup Status
Operation: Backup
Active backup destination: File
Media name: "Incr-Thurs.bkf created 29/11/2010 at 12:31"
Volume shadow copy creation: Attempt 1.
Backup (via shadow copy) of "System State"
Backup set #1 on media #1
Backup description: "Set created 29/11/2010 at 12:31"
Media name: "Incr-Thurs.bkf created 29/11/2010 at 12:31"
Backup Type: Copy
Backup started on 29/11/2010 at 12:31.
Error: The device reported an error on a request to write data to media.
Error reported: Unknown error.
There may be a hardware or media problem.
Please check the system event log for relevant failures.
The operation was ended.
Backup completed on 29/11/2010 at 12:33.
Directories: 40
Files: 44
Bytes: 11,214,668
Time: 1 minute and 8 seconds
----------------------
The operation did not successfully complete.
----------------------
I'll re-assess the permissions and, if necessary, create a new user purely for the backup.
ASKER
Nope - I've checked all the permissions and they're fine. Running the job as a network administrator throws up a warning box halfway through with the above message...
Since it's halfway through, I think we can safely conclude that the issue is not with reading the data... You wrote earlier that you wanted to try writing to a local .bkf file. Did that work?
Btw, I assume you did try copying some sort of large file from the DC to the NAS box to see that it's accessible from the DC and copes OK with large files at all - right? Just making sure :-)
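A plain copy only proves that the write call returned; it says nothing about whether the bytes landed intact. A minimal sketch of a copy-and-verify test, assuming Python is available on the DC (the function names are made up for this example). Note that reading the file back straight away may be served from a cache rather than from the NAS disks, so a matching hash is a useful hint, not proof.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backup files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination_dir):
    """Copy source into destination_dir and report whether the hashes match."""
    destination = Path(destination_dir) / Path(source).name
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)
```

Usage would be something like `copy_and_verify(r"D:\big-test-file.bin", r"\\Nas.mydomain.local\current$\DC2")`, where the source file name is a placeholder.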
ASKER
Yes I tried that, I have also successfully performed a local backup of the 'system state' - I can even copy that file to the NAS if I want to. I'm rapidly approaching the 'stumped' stage here.. :-(
Hey, "stumped" is fun :-) (after figuring out the cause)
Is the Windows event log giving any clues?
Also, what happens if you map the share as a network drive first and then try backing up to that drive (running ntbackup under the same user that created the drive, of course)?
Last question that pops into my head at the moment: are there any communication signing policies set up on the DC or on the NAS box (via GPO, most likely)?
ASKER
I take that back - It has failed with a new error... Please see the screencast for more information.
(Side-note.... This screencast thing is pretty damn awesome......!) UncleVirus-374382.flv
ASKER
Sorry, should have added: the path on the 'NAS' isn't via a mapped drive - it's via the full UNC path, which is:
\\NAS.Mydomain.local\current$\DC2\ (which is where the .bkf files reside).
I agree, that screen cast thingy is cool. What software is it?
The pathname is shorter than 256 characters and contains fewer than 10 spaces, so you're okay there. "The path is too deep" is one of those catch-all error messages - basically it says "something went wrong and I don't know what"...
I think we can conclude at this point that the issue is not NTBackup but the connection between your DC and the NAS box.
Is there a managed router or switch in between (so you can see whether there are any errors on the switch ports) to rule out network errors?
3 points.
1. You say the box is a Windows Server-based NAS. Does that mean at the SMB level, or is the OS actually Windows? I am asking because some of these boxes use FAT32, with its obvious limitations.
2. Is the box in any way a part of the DC machine's domain? If not, you are likely to have trouble unless you establish a trust relationship between the domains. This is what I find most likely to be the problem. Likewise, if the NAS is defined to be only in a workgroup, or is based on an old version of Windows, you might have authentication problems that are not obvious.
3. Are the disks in the NAS box in perfect order?
ASKER
1. The NAS is a Windows Server 2003 machine with 2TB of RAID'ed storage. The OS is Windows and the storage is formatted as NTFS.
2. Yes, it's a member of the DC's domain.
3. That's a good point - I'll check over eventvwr for disk entries. I don't have any RAID monitor software on there currently so a visual inspection may be in order. I can't take the machine offline as 6 other machines back up to it every night! :-(
Still, that's not til 6AM - So I could kick off a chkdsk throughout the day at a push if it's required.
Sounds like a plan. Did you get around to checking the switch for errors? You might also want to check the Communication Signing policies. This could be something stupid like a bad network card or cabling, too...
ASKER
The machine only has 1x Gigabit NIC fed into a Cisco 2950 switch. There aren't any errors on the port.
I can't physically check the RAID/HDD status LEDs, and there's no software to check the status of the RAID, so I'd need to reboot and use the BIOS-level utility I'm afraid :-( That'll have to wait til tomorrow, but the System event logs don't show any 'disk' errors, or in fact any errors with regards to the backup.
I'm tempted to power cycle the NAS machine - but can't help feeling it shouldn't need it. What's your suggestion? I think I'll probably hold off until the morning anyway.
Power cycling the NAS device is not a bad idea - probably the first thing MS support would tell you to do anyhow.
For the 2950 you can use the Cisco Network Assistant to have a closer look.
Did you look at the security options of the applicable GPOs when it comes to securing the communications? Btw, what also might be an issue is time: make sure that the DC and the NAS box have pretty much exactly the same time...
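On the time point: Kerberos in a Windows domain tolerates a clock difference of 5 minutes by default (the "Maximum tolerance for computer clock synchronization" policy), so "pretty much the same time" has a concrete threshold. A small sketch of that check; the two timestamps are hypothetical readings taken from each machine, and `skew_ok` is a name invented here.

```python
from datetime import datetime, timedelta

# Default Kerberos clock tolerance in a Windows domain; a GPO can change it.
MAX_SKEW = timedelta(minutes=5)


def skew_ok(dc_time, nas_time, max_skew=MAX_SKEW):
    """Return True when the two clocks are within the allowed tolerance."""
    return abs(dc_time - nas_time) <= max_skew


# Hypothetical readings, e.g. noted down from each box at the same moment.
dc_clock = datetime(2010, 12, 2, 11, 30, 13)
nas_clock = datetime(2010, 12, 2, 11, 33, 40)
print("skew:", abs(dc_clock - nas_clock), "ok:", skew_ok(dc_clock, nas_clock))
```

If the policy has been changed in the domain, pass the configured value as `max_skew` instead of relying on the default.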
Heya,
Can you let me know if this resolved it - or if you need more help? Just curious.
Sven
ASKER
Apologies Sven, I've been in hospital since Monday! :-( I'm going to power cycle the NAS this morning and also run net time /querysntp to check that they've all got the same NTP source, then I'll sync the clocks.
As for the screencast software, it's a new feature on ExpertsExchange. It uses Java... pretty neat imo!
ASKER
Hasn't been rebooted in a while... :-)
Capture.PNG
It's not often that I see Windows boxes up for that long. That might be your issue right there.
Also, since it hasn't been rebooted in that long, I assume that it didn't get any Windows Updates in all that time. You might want to get and apply those as well, just to be sure that a possible mismatch isn't causing your error either...
No worries about not updating, health is more important than IT :-)
ASKER
Failed again after rebooting 'NAS' box. Two entries in 'System' log on the DC2 machine:
Event Type: Information
Event Source: Application Popup
Event Category: None
Event ID: 26
Date: 02/12/2010
Time: 11:30:13
User: N/A
Computer: DC2
Description:
Application popup: Windows - Delayed Write Failed : Windows was unable to save all the data for the file \\nas.mydomain.local\DC2\test-sys-state.bkf. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Event Type: Warning
Event Source: MRxSmb
Event Category: None
Event ID: 50
Date: 02/12/2010
Time: 11:30:13
User: N/A
Computer: SPADC2
Description:
{Delayed Write Failed} Windows was unable to save all the data for the file \Device\LanmanRedirector. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 04 00 04 00 02 00 56 00 ......V.
0008: 00 00 00 00 32 00 04 80 ....2..¿
0010: 00 00 00 00 0c 02 00 c0 .......À
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 0c 02 00 c0 ...À
I'm kind of swaying towards a bad cable somewhere, or possibly a bad NIC driver or switch port. I'll check for Windows Updates and updated drivers on the DC, as the NAS is talking to 5 other boxes okay. I'm also going to power cycle DC2 to see if that'll help matters.
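The Data section of the MRxSmb event above can be read as a run of little-endian 32-bit values, which exposes the status codes the redirector recorded. A minimal sketch, assuming that layout; the names in `KNOWN_CODES` are quoted from memory of the Windows headers and should be checked against the NTSTATUS reference before anyone relies on them.

```python
import struct

# Bytes copied from the "Data:" section of the MRxSmb event 50 above.
EVENT_DATA_HEX = """
04 00 04 00 02 00 56 00
00 00 00 00 32 00 04 80
00 00 00 00 0c 02 00 c0
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
0c 02 00 c0
"""

# Assumed names for the two non-trivial values; verify before relying on them.
KNOWN_CODES = {
    0x80040032: "IO_LOST_DELAYED_WRITE",
    0xC000020C: "STATUS_CONNECTION_DISCONNECTED",
}

raw = bytes.fromhex("".join(EVENT_DATA_HEX.split()))
dwords = struct.unpack("<%dI" % (len(raw) // 4), raw)

for index, value in enumerate(dwords):
    if value:
        label = KNOWN_CODES.get(value, "")
        print("offset 0x%02X: 0x%08X %s" % (index * 4, value, label))
```

The value at offset 0x0C carries the event number in its low word (0x32, i.e. 50), and the value 0xC000020C appears twice, at offsets 0x14 and 0x28. If the name above is right, the redirector saw the SMB connection drop mid-write, which fits a network or server-side interruption but does not by itself say which end caused it.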
Network communication is indeed what this looks like to me - could be hardware, drivers, IP signing or Kerberos... Have a look at the NAS box's event log; it's likely to report something. When you power cycled it, did you have a look at the RAID config - and was that clean?
ASKER
I'm working remotely here Sven, and I'm not awarded luxuries like KVM-over-IP (doubt the Accounts dept. would allow me the privilege :P), so I can't check the RAID situation I'm afraid. I'll check the event log on the NAS, yeah. Bear with me.. thanks for being so patient :-)
ASKER
Sorry for lack of an update. I've since installed Intel's RAID utility and the array is fine, and no errors on any of the disks.
As I've upgraded the network card drivers on all servers including the NAS, I'm going to put this down to either a bad cable or a bad port on the switch (doubtful). If it's neither, I will have to look at IP signing or Kerberos.
Weirdly, every now and then a successful backup will actually go through AND verify...! This only seems to have started happening since the server was moved to a different location in the building. The next step is to replace the cable and see how it goes. I will also connect it to a different port on the switch.
ASKER CERTIFIED SOLUTION
This solution is only available to members.
ASKER
Interestingly, I managed to get a 'consistency check' running yesterday due to the business shutting down for a few days. One of the virtual disks found bad sectors so I think we've found the cause of the problem!
ASKER
Hit the nail on the head! The virtual disk was failing a consistency check; repairing the problems in it allowed a full backup to go through fine.
Disk was also heavily fragmented. Whether this was a side effect of the disk failure I'm not sure, but NOTHING was shown in event viewer OR the RAID controller logs.
ASKER
Moderator: Points were intended to be split between sven_jambor (150 points) and frragnarsson (350 points). Could this be rectified, please?