  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 222

How to easily identify a corrupted text file, or any file for that matter?

Hi,

We have had a hard disk corruption and some of the text files are unreadable, i.e. if you open one in Notepad, you will see strange characters, boxes, and sometimes text from another file.
Previously, I wrote a VBScript that reads the first line of each file; if it is what I expect for that file, I assume the file is fine.
However, there is a loophole: I have seen a file that is fine for the first half and corrupted further down.
I may have to rewrite my script to parse each file completely and check that it is OK syntax-wise, but that is going to be one heck of a job.

Is there any easier way to do this, to identify if a text file has been corrupted?

We also have video files on the drive, and the only way to know if one is OK is to try to play it in the player. I did try extracting the metadata from the video files, assuming that if the metadata cannot be extracted, the file is corrupt. That worked well until we found videos that are unplayable but whose metadata is good.
Asked by: FujiMed
1 Solution
 
Bill Prew Commented:
I doubt there's really a way to be 100% sure that a file is correct, but you may be able to tell that it isn't.  In the case of a TXT file you could read the file character by character and make sure only printable characters, line feeds, tabs and carriage returns are found.  That still doesn't guarantee the file is intact though.
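
A minimal sketch of that character scan, in Python rather than VBScript. It assumes the files are plain ASCII text, so anything outside printable ASCII, tab, line feed and carriage return counts as suspect; files with accented or non-Latin text would need a wider allowed set. The names `count_suspect_bytes` and `find_suspect_text_files` and the `*.txt` pattern are made up for illustration.

```python
from pathlib import Path

# Bytes treated as acceptable in a plain-text file: printable ASCII plus
# tab, line feed, and carriage return. This is an assumption about the data.
ALLOWED = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
_ALLOWED_BYTES = bytes(sorted(ALLOWED))


def count_suspect_bytes(path, chunk_size=1 << 20):
    """Return how many bytes in the file fall outside ALLOWED."""
    suspect = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # translate(None, delete) strips the allowed bytes;
            # whatever is left over is suspect.
            suspect += len(chunk.translate(None, _ALLOWED_BYTES))
    return suspect


def find_suspect_text_files(root, pattern="*.txt"):
    """Yield (path, suspect_count) for files under root with any suspect byte."""
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            count = count_suspect_bytes(path)
            if count:
                yield path, count
```

Reading in binary chunks keeps memory flat across millions of files. A count of zero only means no obviously foreign bytes were found; text pasted in from another file would still pass.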

For other types I think you have to open each and inspect it with the native program to determine if it's correct.

You may find a third party program that can validate certain file types, but I haven't looked.

~bp
 
FujiMed (Author) Commented:
Thanks, one (major, I should say) problem we are having is that there are millions of files dating back to 1995.
 
Bill Prew Commented:
I assume none of the data involved was backed up?

~bp
 
FujiMed (Author) Commented:
The server is actually maintained by another company.
From what I heard, some of the backup tapes melted during a heat wave :(
With the backup being incomplete, they are reluctant to overwrite the disk completely with the backup. They want to salvage whatever they can, identify what is corrupt and try to find it in the backup if it is there, and report back on whatever they can't recover.
 
Bill Prew Commented:
Well, if they have a backup of some of the data, I'd suggest you have them compare the current files to the backup versions, identifying which ones are the same.  This could be done by calculating an MD5 hash of each file and comparing.  Or their backup software may have a way to do this built in.

This would probably find a very high percentage of the files as unchanged, and those you would then know are not corrupted.  This would allow you to focus on a much smaller number of files to review for corruption.
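
As a rough sketch of that comparison, assuming the surviving backup has been restored to a separate folder with the same directory layout as the live disk (an assumption, the thread doesn't say how the backup is stored), something like this Python would sort files into three buckets. The function names and the result keys are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(live_root, backup_root):
    """Classify every file under live_root against the same relative
    path under backup_root (assumed to be a restored copy)."""
    live_root, backup_root = Path(live_root), Path(backup_root)
    result = {"same": [], "different": [], "not_in_backup": []}
    for live in sorted(live_root.rglob("*")):
        if not live.is_file():
            continue
        rel = live.relative_to(live_root)
        backup = backup_root / rel
        if not backup.is_file():
            result["not_in_backup"].append(rel)
        elif md5_of_file(live) == md5_of_file(backup):
            result["same"].append(rel)
        else:
            result["different"].append(rel)
    return result
```

Files in `same` match the backup copy and can be set aside; `different` and `not_in_backup` are the smaller set left to review.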

~bp
 
FujiMed (Author) Commented:
Thanks! MD5 calculation may be a good idea.
 
SelfGovern Commented:
Just be careful about how you interpret the checksum -- a file that changed after it was backed
up will also fail an md5 check (the newer file on disk will legitimately be different from the older
file on tape).
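
One possible way to triage files that fail the checksum is to look at the last-modified time on disk. This is only a heuristic sketch: it assumes the backup time is known (the `backup_time` parameter is a hypothetical input) and that the file system timestamps themselves survived the corruption. The function name and the two labels are made up for illustration.

```python
import os


def triage_mismatch(live_path, backup_time):
    """Label a file whose checksum differs from its backup copy.

    backup_time is a POSIX timestamp for when the backup was taken
    (hypothetical input; it has to come from the backup records).
    """
    modified = os.path.getmtime(live_path)
    if modified > backup_time:
        # Written after the backup: the difference may be a legitimate
        # edit, so the checksum alone says nothing about corruption.
        return "modified-after-backup"
    # Not written since the backup, yet the content differs:
    # a stronger candidate for corruption.
    return "suspect"
```

Neither label is proof. A file edited after the backup can still be corrupt, so the first group needs some other check.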

This is a time to stop and think before doing anything that can't be undone.
 
FujiMed (Author) Commented:
Thanks