rump0054

How can I monitor file integrity of thousands of digital library resources?

I work at UC Berkeley in the Library Systems Office.  Our main purpose is to host digital collections and to provide end users with the tools to create them and the storage to hold the files.  After 20 years of hosting these types of projects, several server changes, and many file moves, we have realized that we need to monitor all of these files to make sure they don't become corrupt. If they do, we need to know so that we can replace the corrupted file with a previous version.

Does anyone know of an open-source solution to this problem?  Any and all help is appreciated!
Dave Baldwin

'md5sum' (http://en.wikipedia.org/wiki/Md5sum) is frequently used and, as the article notes, it is part of many operating systems.  MD5 sums are often published on web sites alongside files to be downloaded so you can verify your downloads.  There is also sha1sum (http://en.wikipedia.org/wiki/Sha1sum).
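
For anyone who would rather script the check than call the command-line tools, here is a minimal sketch in Python using the standard `hashlib` module. The function name `checksum` and the chunk size are my own choices for illustration, not part of md5sum or sha1sum. For the same file contents the hex digest should match what those tools print.

```python
import hashlib


def checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`.

    The file is read in chunks so that large scans or video
    files do not have to fit in memory at once.
    """
    digest = hashlib.new(algorithm)  # e.g. "md5" or "sha1"
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `checksum("some/file.tif")` gives the MD5 digest, and `checksum("some/file.tif", "sha1")` gives the SHA-1 digest; the path here is only a placeholder.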
rump0054

ASKER

I did go through an application planning phase for this project, and I was planning on using MD5, so it's good to hear that others agree that MD5 is appropriate for this purpose.

My main question is whether anyone knows of an open-source solution, something like Tripwire for example, so that I wouldn't have to create the application myself.

BTW: The reason I mention Tripwire but am not planning on using it is that it seems to be overkill for my simple needs. If someone has experience with it that shows otherwise, I could give it another try as well...
ASKER CERTIFIED SOLUTION
Hapexamendios

Hi rump0054,

I sense Tripwire comes up because what you're essentially looking at is File Integrity Monitoring (FIM) for a support/service reason, whereas it's more commonly undertaken for security reasons.

Tripwire would do, as would OSSEC (a Host-based Intrusion Detection System), in that they contain the existing logic for performing checks against MD5 or SHA-1 checksums. Whilst they might be OTT for your initial needs, consider that you can disable the functionality you don't need, leaving you with just FIM - and you (hopefully) know where all your content is, which is the task most people find so difficult when setting up FIM for security reasons.
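
To make the "checks against checksums" part concrete, here is a rough sketch in Python of the basic idea: record a baseline of checksums once, then re-scan later and report differences. This is not how Tripwire or OSSEC are implemented; the function names (`checksum`, `build_manifest`, `compare`) and the choice of SHA-1 as the default are assumptions made purely for illustration.

```python
import hashlib
import os


def checksum(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hex digest of one file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, algorithm="sha1"):
    """Map each file's path (relative to `root`) to its checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, root)
            manifest[relative] = checksum(full_path, algorithm)
    return manifest


def compare(root, baseline, algorithm="sha1"):
    """Re-scan `root` and report differences against a stored baseline."""
    current = build_manifest(root, algorithm)
    return {
        # present in both, but the contents no longer hash the same
        "changed": sorted(p for p in baseline
                          if p in current and current[p] != baseline[p]),
        # recorded in the baseline but no longer on disk
        "missing": sorted(p for p in baseline if p not in current),
        # on disk now but never recorded in the baseline
        "added": sorted(p for p in current if p not in baseline),
    }
```

In practice the baseline would be saved somewhere safe (for example as JSON via `json.dump`) and `compare` run on a schedule. Note that a changed checksum only says the bytes differ; it cannot tell corruption apart from a legitimate edit, which is part of what the full tools help you manage.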

We elected to go for a commercial product for our needs, called LogRhythm - our need was security, and LR ticked a lot of the logging and other requirements we had - but for your case I'd say one of these might be a good bet.

Best of luck whichever way you go.
rump0054

ASKER

I actually had stumbled across OSSEC in my research and it sounded like it would work.  Good to hear it from another source.  I'll take a look at it again.