briansikes

Last year's data replaced current data after the holidays

Well, our school system was out for two weeks over the holidays, and upon coming back one school contacted me to let me know that home directories belonging to last year's students have reappeared alongside those of the current students. Any directories that existed both last year and this year have been replaced by last year's version.

The only thing I know that happened was that the server lost power at least once due to snow storms.

Windows Server 2008 Service Pack 2
Dell PowerEdge 2900
PERC 5/i Integrated

wwakefield
-Check the System and Application logs on the file server
-Did you run any sort of scheduled task / mass copy / batch copy last year around this time?  Any chance it was set to run on the 1st of every year by accident?
-What do you use for backups?  Could a restore have been performed over the break?
David
First, check it out for yourself.

If this really did happen, then one possible scenario is that your RAID controller or a disk had a problem, so the mirror was broken long ago, back when the old students were attending, and since the server was never rebooted, nobody knew or paid attention. Now they fire up the system, and the broken RAID boots from the old disk drive set. This is actually not that uncommon.

To better explain: say disk 0, which is the primary boot disk for the RAID 1/10, had a glitch but did not actually fail; it was just dropped from the mirror. So for 6 months all I/O was going to disk 1. The system then gets power cycled, and your cheapo, slightly dumb RAID controller continues to boot from disk 0. Presto, last year's data.

Now the bad thing is that this controller will probably re-sync the data from disk 0 to disk 1, so the "current" data gets overwritten.
You'd better get out there.
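
To make the scenario concrete, here is a toy model of it in Python. It is only a sketch of the reasoning: the two "disks" are plain dictionaries, the file names are made up, and no real controller behaves exactly like this.

```python
def simulate_stale_mirror():
    """Toy model of a mirror member dropping out and later being booted from."""
    # Healthy mirror: both members hold identical data (hypothetical files).
    disk0 = {"alice/essay.doc": "last year", "old_student/notes.txt": "last year"}
    disk1 = dict(disk0)

    # disk0 silently drops out of the mirror; later writes land only on disk1.
    disk1.pop("old_student/notes.txt")          # end-of-year cleanup
    disk1["alice/essay.doc"] = "this year"      # same path, newer content
    disk1["new_student/report.doc"] = "this year"

    # After the power cycle the controller boots from the stale member...
    visible = dict(disk0)
    # ...and a re-sync from disk0 to disk1 overwrites the current copy.
    disk1 = dict(disk0)
    return visible, disk1


visible, resynced = simulate_stale_mirror()
for path, version in sorted(visible.items()):
    print(path, "->", version)
```

Note that in this model, files that exist only in the current data (here `new_student/report.doc`) are gone after the re-sync, so whether such files survived on the real server is one way to test the theory.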
Unless I am mistaken, I read this as saying that they did not lose any of the current data UNLESS it had the same file name as the old data, in which case it was overwritten. That sounds like a restore or a file copy of some type from a backup of some sort rather than an offline drive. An offline drive would have stopped having data written to it. I am in agreement with the first post. This does not seem to be something that could have happened "by itself" unless it was a scheduled task as Brian mentioned.
sifuedition - yes, it could be that too. I was working from the premise that there was more than one logical drive and only one of them had the broken RAID, hence my insistence that the admin go on site to get to the bottom of things. One just cannot trust end users to give 100% accurate and complete info.
briansikes


I didn't take the user's word blindly; I remoted into the server to look for myself and verified that I was seeing the same thing she was. Now, I can't actually verify exactly which files the users were working on, but I can see that the creation dates on the files are from the previous year, and the IT Dept deletes all student files at the end of the school year. I can also see that there are home directories for user accounts that no longer exist in Active Directory.
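
For anyone who wants to repeat that check, it could be scripted roughly as follows. This is a minimal sketch, not what was actually run: `find_suspect_dirs`, the cutoff date, and the list of current account names are all hypothetical, and the account list would have to be exported from Active Directory separately.

```python
import os
from datetime import datetime


def find_suspect_dirs(root, cutoff, current_accounts):
    """Return (stale, orphaned) home directory names found directly under root.

    stale:    directories created before `cutoff`
    orphaned: directories whose name matches no current account name
    """
    current = {name.lower() for name in current_accounts}
    stale, orphaned = [], []
    for entry in os.scandir(root):
        if not entry.is_dir():
            continue
        st = entry.stat()
        # Prefer st_birthtime where Python exposes it; otherwise fall back to
        # st_ctime (creation time on Windows, inode change time on Unix).
        created = datetime.fromtimestamp(getattr(st, "st_birthtime", st.st_ctime))
        if created < cutoff:
            stale.append(entry.name)
        if entry.name.lower() not in current:
            orphaned.append(entry.name)
    return sorted(stale), sorted(orphaned)
```

Called with the student share as `root` and the first day of the current school year as `cutoff`, it returns the directories that predate this year and the ones with no matching account.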

The broken mirror theory makes the most sense to me, but I checked the RAID controller logs and couldn't see a reference to a rebuild. I'll attach them to see if anyone else can see one, maybe I'm looking in the wrong place.  lsi-0103.log

A rogue restore would also make sense, but the logical drive where students store their files on that server isn't backed up, just the system drive, and it's backed up with Symantec Backup Exec System Restore 8.5. I checked its logs and found no record of a restore.

I also checked the task scheduler and found nothing.
On the theory of the mirror being broken and then restored, wouldn't a HUGE break in the logs show up? You could rule that out by validating that there are consistent log entries throughout the year, including the Seagate logs etc.
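
A rough way to look for such a break, assuming the log timestamps have already been parsed into Python `datetime` values (the parsing depends on the log format and is not shown), is to sort them and report any interval longer than a chosen threshold. The function name and the threshold are illustrative only.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap):
    """Return (before, after) pairs of consecutive entries more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Example with made-up timestamps: entries stop in January and resume in June.
entries = [
    datetime(2009, 1, 1), datetime(2009, 1, 2),
    datetime(2009, 6, 1), datetime(2009, 6, 2),
]
for before, after in find_gaps(entries, timedelta(days=7)):
    print("gap from", before, "to", after)
```

A two-week gap over the holidays is expected if the server was down; a gap of months in the middle of the school year would point toward the stale-mirror theory.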
Well, perhaps the RAID controller's log files are volatile (i.e., kept in battery-backed memory), so with the system off for a few weeks the battery discharged and the logs were lost. Some controllers save logs in reserved metadata areas on the disks, so they are never lost. Maybe no rebuild is logged because the stupid RAID controller doesn't know it needs to rebuild, which means you risk data destruction (assuming it is smart enough to do load balancing). If it has a battery, and it is dead, that would explain the above.

Check raid diagnostics, make sure battery is good, and see if there is a non-volatile clock that shows dates going back to before power was turned off for the holiday.  If the clock starts today, then you have further indication of a RAID controller issue, and your current data could very well be on other disks.   If things are inconclusive, and controller allows a data consistency CHECK (not restore, check only), see if the data matches.   If you don't have that option ... power off, use a non-RAID controller and a binary editor to do some comparisons.
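
If it comes to that last step, the comparison does not have to be done by eye in a binary editor. Below is a minimal sketch that compares two images chunk by chunk and reports where they differ. It assumes each mirror member has first been copied, read-only, to an image file through a non-RAID controller; the function name and paths are placeholders, and any controller metadata region on the disks may differ even on a healthy mirror.

```python
def first_mismatches(path_a, path_b, chunk_size=1024 * 1024, limit=10):
    """Return the byte offsets of the first `limit` chunks that differ."""
    mismatches = []
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while len(mismatches) < limit:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if not block_a and not block_b:
                break  # both images fully read
            if block_a != block_b:
                mismatches.append(offset)
            offset += chunk_size
    return mismatches
```

An empty result means the two images are identical over their whole length; a long run of mismatches starting early in the data area suggests the two members really do hold different generations of data.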

If you have an inexpensive controller, i.e., not a premium $500+ one, then this is more of a possibility than if you had, say, an HP Smart Array or an LSI MPT-class controller. (Promise, Intel Matrix, NVIDIA, etc. are all 'cheap' controllers that can go a little brain-dead, and this does happen on them.)
I will review the raid controller log, but if the backup software is not scheduled / enabled on these folders, what about VSS? If the MS volume shadow copy is running, a user with sufficient privileges could have done this within the properties of the folder. Just a thought.
I considered shadow copy as a source of the issue as well, but shadow copies are disabled on both volumes. Thanks for looking over the controller log, sifuedition, I was afraid I was missing something. I'm at a loss at this point.
We never really tracked this down, but I would still like to award points to the people who helped.
