Solved

Hashing images and other files

Posted on 2011-03-02
Last Modified: 2012-06-27
What does hashing a file mean in terms of digital forensics? I would much rather have some comments than links. The general gist I got was that examiners hash (with what tools?) a set of images and then check whether they are present on other machines. Or do they get a hash set of known bad/inappropriate images and scan a suspect's machine against it?

So, how does hashing work in layman's terms? Doesn't the way an image has been saved, tampered with or uploaded to a given site affect that unique hash? Is there anything that can cause the exact same image to have one hash when you hash it and another hash when you find it on a different workstation? Is hashing only used on certain file types, or can it be used on anything and everything? I also hear about MD5sum; is that the only type of hash used in forensics? Can you run a hash set over the Internet, or only over a bit image of a device?
Question by:pma111
 
LVL 10

Accepted Solution

by:
abbright earned 125 total points
ID: 35020326
Hashing a file computes something like a digital fingerprint of the file. Although this fingerprint is not guaranteed to be unique across all possible files, the likelihood of finding another file with the same hash value is very small, and it is very difficult to deliberately create a file that matches a given hash.
There are different hash algorithms available: MD5 is a prominent one and SHA-1 a newer one. See http://en.wikipedia.org/wiki/Cryptographic_hash_function for a more detailed explanation and more examples.
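To make the "fingerprint" idea concrete, here is a minimal sketch using Python's standard hashlib module (an assumption on my part; it is not one of the forensic tools discussed in this thread). It hashes the same bytes with MD5, SHA-1 and, as an extra example not mentioned above, SHA-256, showing that each algorithm always produces a fixed-length digest regardless of input size.

```python
import hashlib


def fingerprints(data: bytes) -> dict:
    """Return hex digests of the same bytes under several hash algorithms."""
    return {
        "md5": hashlib.md5(data).hexdigest(),        # 128-bit digest, 32 hex characters
        "sha1": hashlib.sha1(data).hexdigest(),      # 160-bit digest, 40 hex characters
        "sha256": hashlib.sha256(data).hexdigest(),  # 256-bit digest, 64 hex characters
    }


if __name__ == "__main__":
    # Any bytes work: an image, a document, an executable...
    for name, digest in fingerprints(b"hello world\n").items():
        print(f"{name:7} {digest}")
```

Because hashing operates on raw bytes, it works on any file type, not just images.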

In order to find certain images / files / ... on a storage device, you can compute the hash value of the files you're looking for and then compare the hash value of every file on the device against the originals.
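The compare-by-hash search just described can be sketched in a few lines of Python (standard library only). This is an illustrative sketch, not how any particular forensic product implements it: `find_copies` and `md5_of_file` are hypothetical names, and reading each file fully into memory is a simplification.

```python
import hashlib
import os
from pathlib import Path


def md5_of_file(path):
    # Reads the whole file at once; fine for a sketch, but a streaming
    # version is preferable for very large files.
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def find_copies(target, search_root):
    """Return every file under search_root whose MD5 equals that of target."""
    wanted = md5_of_file(target)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            candidate = Path(dirpath) / name
            try:
                if md5_of_file(candidate) == wanted:
                    matches.append(candidate)
            except OSError:
                # Unreadable files (permissions, locks) are skipped in this sketch.
                continue
    return matches
```

Note that the match is on content only, so a copy stored under a completely different name is still found.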
Simply uploading, transferring, or storing a file in different locations doesn't change the hash, while changing the file's content does.
In order to compute a hash, you need access to the file. Whether you get it from the Internet or somewhere else doesn't matter.
 
LVL 3

Author Comment

by:pma111
ID: 35020374
Thank you. So what tool can be used to get the hash of an image?
 
LVL 10

Expert Comment

by:abbright
ID: 35020424
This is a tool which lets you compute MD5 and SHA-1 hashes: http://www.nirsoft.net/utils/hash_my_files.html
On Linux machines there is usually md5sum: http://en.wikipedia.org/wiki/Md5sum
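For anyone curious what a tool like md5sum does internally, here is a rough Python equivalent (a sketch, not md5sum's actual source). It reads the file in fixed-size chunks so that even multi-gigabyte files can be hashed without loading them into memory, and prints output in md5sum's usual "digest, two spaces, filename" layout. The 64 KiB chunk size is an arbitrary choice.

```python
import hashlib
import sys


def md5sum_line(path, chunk_size=65536):
    """Hash a file in chunks and format the result like md5sum's text output."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)  # feeding chunks gives the same digest as hashing all at once
    return f"{h.hexdigest()}  {path}"


if __name__ == "__main__":
    # Usage: python md5sum_sketch.py file1 file2 ...
    for p in sys.argv[1:]:
        print(md5sum_line(p))
```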
 
LVL 3

Author Comment

by:pma111
ID: 35020592
Thanks. Are there any scenarios where, say, an image found on a user's machine and an image found on a public website would have different hash values even though they are the same picture?
 
LVL 10

Expert Comment

by:abbright
ID: 35020654
If the binary file is identical, the hash is the same. If even one pixel of the image has been changed (which you probably won't notice with your eyes), the hash will be different.
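A small Python demonstration of this, using made-up bytes as a stand-in for an image file (hypothetical data, not a real picture). Flipping a single bit is roughly what changing one pixel value does in an uncompressed format; in a compressed format such as JPEG, an edit and re-save usually changes many more bytes, which only makes the difference larger.

```python
import hashlib

# Hypothetical stand-in for the raw bytes of an image file.
original = bytes(range(256)) * 4

# Copy the data and flip one bit at an arbitrary offset.
modified = bytearray(original)
modified[500] ^= 0x01
modified = bytes(modified)

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(modified).hexdigest()

# Count how many of the 32 hex positions differ between the two digests.
differing = sum(a != b for a, b in zip(h1, h2))

print(h1)
print(h2)
print(f"{differing} of {len(h1)} hex digits differ")
```

Typically most of the hex digits change; the digests carry no hint that the two inputs were nearly identical.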
 
LVL 3

Author Comment

by:pma111
ID: 35020688
Is that the common way (apart from manual visual inspection) that forensics people find a file on another device, i.e. hash it, hash the target machine's files and check for a match? Or is that an older technique that has been replaced by newer ones?
 
LVL 10

Expert Comment

by:abbright
ID: 35020728
I'm not sure what forensics people actually do, but it is definitely an easy way to figure out whether certain files can be found on a computer. If someone wants to hide these files, I guess they would rather encrypt them than modify them in order to change the hash.
 
LVL 3

Author Comment

by:pma111
ID: 35020750
I was going to ask about adding them to compressed zip files. I assume all you'd get then would be the hash of the zip file, not a hash per file in the zip archive.
 
LVL 10

Expert Comment

by:abbright
ID: 35020804
If the zip file is encrypted you don't have a chance to get the hashes of the files contained within. If it isn't encrypted you can hash each file in the zip file separately.
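As a sketch of hashing each file inside an unencrypted zip, Python's standard zipfile module can decompress every member in memory and hash it. Because decompression restores the original bytes, a member's hash should equal the hash of the same file stored outside the archive. The encrypted-entry check relies on bit 0 of the ZIP general-purpose flag; `hash_zip_members` is a hypothetical helper name, and nested archives are not handled here.

```python
import hashlib
import zipfile


def hash_zip_members(zip_path):
    """Map each member of a ZIP archive to the MD5 of its decompressed content.

    Encrypted members are mapped to None, since their content cannot be read
    without the password.
    """
    results = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries have no content to hash
            if info.flag_bits & 0x1:
                # Bit 0 of the general-purpose flag marks an encrypted entry.
                results[info.filename] = None
                continue
            with zf.open(info) as member:
                results[info.filename] = hashlib.md5(member.read()).hexdigest()
    return results
```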
 
LVL 3

Author Comment

by:pma111
ID: 35020842
I assume you can't Google a hash value and see where the image appears on the web... You'd have to have access to the server, hash each file on it and then look for a match.

 
LVL 10

Expert Comment

by:abbright
ID: 35020952
I think you cannot Google a hash value, as Google caches website contents and hashes usually are not part of websites.
 
LVL 3

Author Comment

by:pma111
ID: 35020975
There's one site, tineye.com; I wonder how that works...
 
LVL 5

Assisted Solution

by:ChopOMatic
ChopOMatic earned 125 total points
ID: 35021682
You've already gotten some good answers, but I thought I'd add a little bit since I'm a full time digital forensic examiner who deals with this stuff all the time.

The hash value is calculated against the content of the file. The file can have a totally different filename, certain metadata (modified/accessed/created dates, etc.), but if the content of the file is identical it will yield the same hash. As someone else noted, though, ANY change to the content of the file will result in a completely different hash value. One pixel in a picture. One period in a document. ANY change. And the technology of hash computation is not such that making only a "tiny" change to a file will result in a "similar" hash.
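The point that the hash depends only on content (not on the filename or on timestamps) can be checked with a short Python sketch. The filenames and bytes here are hypothetical; `os.utime` is used to change the accessed/modified times of the copy, and a single extra byte is then appended to show that a content change does alter the hash.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def md5_of(path):
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "holiday.jpg"
    # Hypothetical bytes standing in for a real JPEG.
    original.write_bytes(b"\xff\xd8\xff\xe0 fake jpeg payload \xff\xd9")

    # Same content, different name and different timestamps.
    renamed = Path(tmp) / "invoice_2011.dat"
    shutil.copyfile(original, renamed)
    os.utime(renamed, (0, 0))  # set accessed/modified times to the 1970 epoch
    same = md5_of(original) == md5_of(renamed)

    # Now change the content by a single byte.
    renamed.write_bytes(renamed.read_bytes() + b".")
    changed = md5_of(original) != md5_of(renamed)

print(same, changed)  # prints: True True
```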

All three of the major forensic software platforms I use (EnCase, FTK, X-Ways) of course incorporate various hashing features. One of the first steps in setting up a case is usually to create a hash of every file in the case. You can select any or all files and create a HASH SET, which is a list of hash values that you then use to figure out whether any of those files are present on other computers or hard drives or thumbdrives or whatever. Another software tool that is almost purely hash-based is Gargoyle from Wetstone. It contains hash sets for the files contained in a plethora of known malware, anti-forensic software, etc., and provides functionality for you to relatively quickly and easily scan devices for the presence of any of these "bad" files.
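The hash-set workflow described above (hash a set of files of interest, then look for those hashes on other media) might look roughly like this in Python. It is a simplified sketch with hypothetical helper names; EnCase, FTK, X-Ways and Gargoyle each use their own hash-set formats, and this example just stores one MD5 per line in a text file.

```python
import hashlib
from pathlib import Path


def build_hash_set(folder):
    """Hash every file under folder (recursively) into a set of MD5 hex digests."""
    return {
        hashlib.md5(p.read_bytes()).hexdigest()
        for p in Path(folder).rglob("*")
        if p.is_file()
    }


def scan_for_known(folder, hash_set):
    """Return the paths under folder whose MD5 appears in hash_set."""
    hits = []
    for p in Path(folder).rglob("*"):
        if p.is_file() and hashlib.md5(p.read_bytes()).hexdigest() in hash_set:
            hits.append(p)
    return hits


def save_hash_set(hash_set, out_file):
    # Simple illustrative format: one lowercase hex digest per line.
    Path(out_file).write_text("\n".join(sorted(hash_set)) + "\n")


def load_hash_set(in_file):
    return {
        line.strip().lower()
        for line in Path(in_file).read_text().splitlines()
        if line.strip()
    }
```

Using a Python set makes each lookup effectively constant-time, so the cost of a scan is dominated by reading and hashing the files themselves.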

Another way in which hashing is frequently used in the forensic world is to eliminate certain files from consideration. There are extensive hash sets available online (Google NIST hash sets) that contain hash values for voluminous numbers of files that are known to likely be of little forensic interest in many cases. For example, if I'm looking for the presence of certain types of user-created files on a computer, I usually have zero interest in wading through the huge number of files that are created during a standard Windows installation. So I can tell EnCase to hide "known" files, narrowing that case's data universe. Make sense?
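The "hide known files" idea is essentially a set difference: drop every case file whose hash is on a known-good list. Below is a minimal Python sketch under the assumption that the known-good list is a plain text file with one hex digest per line. The real NIST hash sets are distributed in their own format, so a loader for them would need adapting. The paths and file contents are hypothetical.

```python
import hashlib


def load_known_hashes(lines):
    """Parse a plain-text hash list: one hex digest per line, '#' lines are comments."""
    return {line.strip().lower() for line in lines if line.strip() and not line.startswith("#")}


def filter_known(file_hashes, known_good):
    """Keep only files whose hash is NOT in the known-good set.

    file_hashes: dict mapping file path -> hex digest
    known_good: set of lowercase hex digests
    """
    return {path: h for path, h in file_hashes.items() if h.lower() not in known_good}


# Hypothetical case data: one stock OS file and one user-created file.
case_files = {
    r"C:\Windows\notepad.exe": hashlib.md5(b"stock OS file").hexdigest(),
    r"C:\Users\jdoe\plans.doc": hashlib.md5(b"user document").hexdigest(),
}
known = load_known_hashes([hashlib.md5(b"stock OS file").hexdigest()])
remaining = filter_known(case_files, known)

for path in remaining:
    print(path)  # only the user document survives the filter
```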

As for ZIP and other container-file formats, all the major forensic software platforms allow you to specify parameters on how to deal with them. For example, I can set up EnCase so that when it is computing hashes for all the files in a case, if it encounters a ZIP file, it knows to automatically open that container and hash each individual file that it contains. Encryption presents its own challenges and would warrant another lengthy discussion.

Finally, with regard specifically to pictures, there are also ways to narrow populations and facilitate faster comparative scans, etc. For example, X-Ways can be configured to automatically calculate the percentage of flesh tones present in each picture it processes. You can then run ops like, "show me only pictures that contain at least 30% flesh tones." Other tools are available that will scan picture populations for visual similarities, etc.
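X-Ways' actual skin-tone algorithm isn't described here, so as a stand-in this Python sketch uses a widely cited explicit RGB skin-colour rule (Kovac, Peer and Solina, 2003) to compute the percentage of skin-tone pixels and filter images at a threshold such as 30%. It works on lists of (R, G, B) tuples to stay dependency-free; with real files you would first decode the pixels with an image library (for example Pillow). Simple rules like this produce plenty of false positives and negatives, so treat the result as a triage aid only.

```python
def is_skin_rgb(r, g, b):
    """Explicit RGB skin-colour rule (uniform-daylight variant, Kovac et al. 2003)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_tone_percentage(pixels):
    """Percentage (0-100) of (r, g, b) pixels classified as skin tone."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    skin = sum(1 for r, g, b in pixels if is_skin_rgb(r, g, b))
    return 100.0 * skin / len(pixels)


def filter_by_skin(images, threshold=30.0):
    """images: dict name -> iterable of (r, g, b). Keep names at or above threshold %."""
    return [name for name, px in images.items() if skin_tone_percentage(px) >= threshold]
```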

Hope this helps.
 
LVL 3

Author Comment

by:pma111
ID: 35021859
Thanks so much for the great answers. I may have to tap into your knowledge further. Sometimes we find a suspect image, such as a JPG, on a departmental file share, and we always want to find out "who put it there". As far as I know, images don't have the same MAC information as Word documents, although I am not even sure whether a Word document has a "created by" or "saved by" field that would narrow down who put the file there.
 
LVL 3

Author Comment

by:pma111
ID: 35021973
ChopOMatic,

I'd also (being nosey) be interested in what the more common types of investigations are that you get asked to review/prove/disprove, etc. How long does a typical case take you? Are you under urgent deadlines, or do you typically get a couple of weeks to make sure you have located everything and not missed anything?
 
LVL 10

Expert Comment

by:abbright
ID: 35021998
In order to figure out "who put it there", you may want to check the ownership of the file. If users don't have administrative rights on the file server and are logged on with domain credentials, the file ownership should help you identify the source. In addition, you can enable auditing to track file system changes.
 
LVL 5

Expert Comment

by:ChopOMatic
ID: 35022126
~75% of my casework is a scenario similar to this:  John Doe leaves the employ of Acme Widgets and goes to work for Good Widgets down the street. Acme Widgets starts losing business to Good Widgets and says, "Hey, ole John Doe stole our proprietary data and now he's using it for our competitor!" A lawsuit gets filed and an attorney for one side or the other calls me, explains the allegations, and asks me to figure out exactly what happened with regard to the data of interest. After a bit of wrangling in court, I'm granted access to the various computers and devices involved. I create forensic images of those devices and analyze those images in order to reconstruct what happened. I generate a detailed report of my findings. If the case makes it to trial, I testify to those findings.

Urgent, compressed timeframes are very common. I rarely hear, "No hurry, take your time!" ;-)

As for how long, there's no good answer to that. Sometimes the central question is whether one particular file is on one particular device. Those are pretty quick. Sometimes it's much broader, looking for data and clues across dozens of computers/thumbdrives/smartphones/you name it. Cases can literally go for days or years.
 
LVL 3

Author Comment

by:pma111
ID: 35025603
Interesting stuff.

Last questions:

@abbright:

>>you may want to check the file-ownership of the file

How do you check this for, say, a .doc, a .jpg or a .mpg?

@ChopOMatic:

I watched a couple of YouTube videos on forensics last night, and one technique they said is used is keyword searching. However, I wondered: do files on a PC sometimes store keywords in something other than a plain-text format? Could you potentially miss key files containing keywords because they weren't in plain text?

Also, aside from hash analysis and keyword search analysis, which I assume you run on every (or most) cases, are there other types of analysis used to find the files you are after? Can you list them so I can read up further, as this stuff interests me...
 
LVL 10

Expert Comment

by:abbright
ID: 35025680
If you right-click the file and select Properties => Security => Advanced => Owner, you will see the current owner of the file. Chances are that this is the person who uploaded the file, though this is not guaranteed, as users with full access to a server can change a file's ownership.

To find the files you are after, you can search by filename, filename extension, size, modification date, or any other property a file has. For example, you could search for all JPG files of a certain size that were changed within a certain period of time and are stored in a certain folder. All of this can be done with the standard Windows search. If you want to analyze the contents of the pictures themselves, I guess you need specialized tools like the ones ChopOMatic mentioned.
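The same kind of property-based search (extension, size range, modification-date window, folder) can be scripted. This Python sketch is only an illustration of the filtering logic, with a hypothetical `find_files` helper; it looks at the file system's modification time, which, as discussed above, is metadata that can be changed and does not affect the file's hash.

```python
from datetime import datetime
from pathlib import Path


def find_files(root, suffix=".jpg", min_size=0, max_size=None,
               modified_after=None, modified_before=None):
    """Return files under root matching an extension, size range and mtime window."""
    hits = []
    for p in Path(root).rglob("*"):
        # Extension check is case-insensitive, so ".JPG" matches ".jpg".
        if not p.is_file() or p.suffix.lower() != suffix.lower():
            continue
        st = p.stat()
        if st.st_size < min_size or (max_size is not None and st.st_size > max_size):
            continue
        mtime = datetime.fromtimestamp(st.st_mtime)  # local time
        if modified_after is not None and mtime < modified_after:
            continue
        if modified_before is not None and mtime > modified_before:
            continue
        hits.append(p)
    return hits
```

For example, `find_files(r"\\server\share\dept", ".jpg", 10_000, 500_000, datetime(2011, 2, 1), datetime(2011, 3, 1))` would list JPGs between roughly 10 KB and 500 KB modified in February 2011 (the share path is hypothetical).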
 
LVL 3

Author Comment

by:pma111
ID: 35025781
Thanks abbright, much appreciated.
 
LVL 5

Expert Comment

by:ChopOMatic
ID: 35033613
Yes, keyword searches are an uber-common element of what we do. And yes, keywords can indeed turn up in formats other than plaintext and commonly do. (A prime example of a common non-plaintext format is Unicode and there are certainly others. If you want to dig into the guts of this stuff, do a little research on codepages, Unicode, etc. That's only scratching the skin of the surface, but it's a start.) The short answer is that the software packages we use are designed specifically to find the keywords in their plaintext and non-plaintext flavors. Another important feature of these apps is that they have the ability to search EVERYWHERE for keyword occurrences, not just in active files as would be the case in a typical Windows search. We routinely find keyword occurrences in unallocated space, file slack, temp files, dump files, etc., all locations that wouldn't be examined in a "typical" user-level search. Make sense?
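To illustrate why a plain-text search can miss hits, here is a rough Python sketch that searches raw bytes (for example, a chunk read from a disk image, including unallocated space) for a keyword in UTF-8 and in UTF-16 little- and big-endian, the byte layout Windows commonly uses for Unicode text. It is case-sensitive and deliberately simple. It will also miss text stored inside compressed containers (a .docx file, for instance, is a ZIP archive), which is one reason the forensic suites decode such formats before searching.

```python
def keyword_hits(data: bytes, keyword: str,
                 encodings=("utf-8", "utf-16-le", "utf-16-be")):
    """Return (encoding, byte_offset) for every occurrence of keyword in raw bytes."""
    hits = []
    for encoding in encodings:
        needle = keyword.encode(encoding)  # e.g. "ab" -> b"a\x00b\x00" in UTF-16-LE
        pos = data.find(needle)
        while pos != -1:
            hits.append((encoding, pos))
            pos = data.find(needle, pos + 1)
    return hits
```

A byte-level search like this in UTF-8 alone would never match the UTF-16 copy, because every character is interleaved with zero bytes.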

As for other elements of forensic analysis, the list would be long, but some common ones are:  Registry analysis to uncover certain user histories, removable device analysis to track the use of thumbdrives and other removable media, and carving files from unallocated space.
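Carving, at its simplest, means scanning raw bytes for a file type's known start and end signatures. For JPEG, the data starts with the bytes FF D8 FF and ends with FF D9. This Python sketch is a naive header/footer carver, not how any of the commercial tools implement it. A real JPEG can contain an embedded thumbnail with its own FF D9 marker, so naive carving like this can truncate files, and fragmented files are not handled at all. The size cap is an arbitrary assumption.

```python
def carve_jpegs(raw: bytes, max_size=20 * 1024 * 1024):
    """Naive header/footer carving: return byte runs that look like JPEG files."""
    header = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the next marker's first byte
    footer = b"\xff\xd9"      # JPEG end-of-image marker
    carved = []
    pos = raw.find(header)
    while pos != -1:
        end = raw.find(footer, pos + len(header))
        if end == -1:
            break  # no footer after this header: nothing more to carve
        end += len(footer)
        if end - pos <= max_size:
            carved.append(raw[pos:end])
            pos = raw.find(header, end)      # continue after the carved file
        else:
            pos = raw.find(header, pos + 1)  # implausibly large: try the next header
    return carved
```

Carved files can then be hashed and compared against a hash set, tying carving back to the hash analysis discussed earlier in the thread.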

If you want to do some more digging, check out the site Forensic Focus. I'm pretty sure anyone can join there and you'll find a wealth of information and a host of super-helpful people there who love to talk about this stuff. :-)