How to determine the type of file from its contents

Joe Winograd, Fellow&MVE
50+ years in the computer industry. Everything from development to sales. CIO. Document imaging. EE MVE 2015, EE MVE 2016, EE FELLOW 2017.

I. Introduction


There's an interesting discussion going on now in an Experts Exchange group, "Attachments with no extension". It reminded me of questions that come up here at EE along the lines of "How can I tell the type of file from its contents?" and "What kind of file has the XXX extension?" Writing an article to address this has been on my to-do list for a long time, and the group discussion has inspired me to do it.


II. Determine the type of file from its XXX extension


Here are four links that can help in determining what an XXX file is:


http://extension.nirsoft.net/XXX

http://www.fileinfo.com/extension/XXX

http://filext.com/file-extension/XXX

http://www.solvusoft.com/en/file-extensions/file-extension-XXX


Simply replace XXX with the file extension of interest. For example:


http://extension.nirsoft.net/TIFF

http://www.fileinfo.com/extension/AHK

http://filext.com/file-extension/xhtml

http://www.solvusoft.com/en/file-extensions/file-extension-opd
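Because all four sites share a simple URL pattern, building the lookup links is easy to script. Here is a minimal Python sketch; the names URL_TEMPLATES and lookup_urls are hypothetical, and the code only substitutes the extension into the four URL templates listed above (the sites may change their URL layout at any time).

```python
from typing import List

# URL templates from the list above; "{ext}" stands in for XXX.
URL_TEMPLATES = [
    "http://extension.nirsoft.net/{ext}",
    "http://www.fileinfo.com/extension/{ext}",
    "http://filext.com/file-extension/{ext}",
    "http://www.solvusoft.com/en/file-extensions/file-extension-{ext}",
]


def lookup_urls(ext: str) -> List[str]:
    """Return one lookup URL per site for the given file extension.

    A leading dot is stripped, so ".tiff" and "tiff" give the same links.
    The case is left as given, as in the examples above.
    """
    ext = ext.lstrip(".")
    return [template.format(ext=ext) for template in URL_TEMPLATES]


if __name__ == "__main__":
    for url in lookup_urls(".AHK"):
        print(url)
```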


III. Determine the type of file from its contents


Now to the trickier question! An excellent file identifier application called TrID analyzes the contents of a file in an attempt to figure out what type of file it is. It comes in both a command line interface (CLI) version (for Windows and Linux) and a graphical user interface (GUI) version (Windows only) called TrIDNet. Both are available for download from the author's website.


Both the CLI and GUI versions require a database/library of file definitions. This is a key feature of TrID and TrIDNet: the ever-growing list of file types they recognize. As of this article's submission date, the database contains 6,019 definitions (dated 13-August-2015). Note that there are separate downloads for the CLI definitions and the GUI definitions.
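To make the idea of content-based identification concrete, here is a greatly simplified Python sketch. It is not TrID's algorithm or definitions format: TrID scores a file against thousands of definitions, whereas this sketch only checks a handful of well-known leading byte signatures ("magic numbers") that I've picked as examples. SIGNATURES, guess_type and guess_file_type are hypothetical names for illustration.

```python
from typing import Optional, Tuple

# A tiny, hand-picked "definitions" table: (leading bytes, extension, description).
# Real identifiers such as TrID use far larger databases and look beyond the first bytes.
SIGNATURES = [
    (b"%PDF-", ".pdf", "Adobe Portable Document Format"),
    (b"PK\x03\x04", ".zip", "ZIP archive (also the container for XLSX/XLSM)"),
    (b"\x89PNG\r\n\x1a\n", ".png", "Portable Network Graphics"),
    (b"\xff\xd8\xff", ".jpg", "JPEG image"),
    (b"GIF87a", ".gif", "GIF image"),
    (b"GIF89a", ".gif", "GIF image"),
    (b"II*\x00", ".tif", "TIFF image (little-endian)"),
    (b"MM\x00*", ".tif", "TIFF image (big-endian)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", ".doc/.xls", "OLE2 compound document (legacy Office)"),
]


def guess_type(data: bytes) -> Optional[Tuple[str, str]]:
    """Return (extension, description) for the first matching signature, else None."""
    for magic, ext, description in SIGNATURES:
        if data.startswith(magic):
            return ext, description
    return None


def guess_file_type(path: str) -> Optional[Tuple[str, str]]:
    """Read only as many leading bytes as the longest signature needs, then look them up."""
    longest = max(len(magic) for magic, _, _ in SIGNATURES)
    with open(path, "rb") as f:
        return guess_type(f.read(longest))
```

A leading ZIP signature alone cannot tell an .xlsx from an .xlsm, which is exactly the kind of ambiguity TrID reports in the second example in the Conclusion.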


IV. More about TrID — the CLI version


After downloading the CLI version and its definitions, simply unpack the ZIP file containing the program (trid.exe) and copy the definitions file (triddefs.trd) into the same folder as the program file. As mentioned above, using a database of file type definitions is a really nice feature of TrID. Because new file types are added frequently, the author makes the definitions database available as a separate download, so it's worth returning to the website occasionally to get the latest definitions file.


Here's the syntax of the CLI version (v2.20):

 

Usage: TrID <[path]filespec(s)...> [-ae|-ce] [-d:file] [-ns] [-n:nn]
                                   [-@] [-v] [-w] [-?]

Where: <filespec> Files to identify/analyze
       -ae        Add guessed extension to filename
       -ce        Change filename extension
       -d:file    Use the specified defs package
       -ns        Disable unique strings check
       -n:nn      Number of matches to show (default: 5)
       -@         Read file list from stdin
       -v         Verbose mode - display def name, author, etc.
       -w         Wait for a key before exiting
       -?         This help!
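If you want to call the CLI version from a script, a thin wrapper around the documented options is enough. This Python sketch assumes that trid (trid.exe on Windows) is on the PATH with triddefs.trd next to it; build_trid_command and run_trid are hypothetical helper names, and the only options used are -d:file and -n:nn from the usage text above.

```python
import subprocess
from typing import List, Optional


def build_trid_command(files: List[str], matches: int = 5,
                       defs: Optional[str] = None, trid: str = "trid") -> List[str]:
    """Build an argument list: filespecs first, then options, as in the usage text."""
    if matches < 1:
        raise ValueError("matches must be at least 1")
    cmd = [trid, *files]
    if defs is not None:
        cmd.append(f"-d:{defs}")  # use a specific definitions package
    cmd.append(f"-n:{matches}")   # number of matches to show per file
    return cmd


def run_trid(files: List[str], matches: int = 5,
             defs: Optional[str] = None, trid: str = "trid") -> str:
    """Run TrID and return its standard output as text."""
    # Passing a list (no shell) means file names containing spaces need no manual quoting.
    result = subprocess.run(build_trid_command(files, matches, defs, trid),
                            capture_output=True, text=True)
    return result.stdout
```

For example, run_trid(["d:\\0tempd\\40 character file name without extension"], matches=3) would ask for the top three matches for the sample file used later in this article.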


The program is free for personal and other non-commercial use. Here's what the license says (I took the liberty of correcting typos in it):


The program can be freely distributed and is freeware for non-commercial, personal, research and educational use. Contact the author for commercial use or commercialization of TrID or TrID's definitions and contained information.

I don't want to put the author's email address in this article, but you may find it in the Readme file that is part of the download.


V. More about TrIDNet — the GUI version


As stated earlier, the definitions for the GUI version are in a different format from the definitions for the command line version. The GUI definitions are in a large number of XML files, one for each file type — currently, 6,019 of them!


As with the CLI version, there's no installation needed — just unpack the ZIP file with the program (TrIDNet.exe) and copy the definitions (all of the XML files) into the same folder as the program file.


When running TrIDNet, here's the opening screen:


[Screenshot: TrIDNet opening screen]

VI. Conclusion


To come full circle to the group discussion that prompted this article, I fed both TrID and TrIDNet a file with a 40-character file name and no file extension. Here's the TrID command line with its result (copied and pasted from the command prompt window):


trid "d:\0tempd\40 character file name without extension"


TrID/32 - File Identifier v2.20 - (C) 2003-15 By M.Pontello

Definitions found:  6019

Analyzing...


Collecting data from file: d:\0tempd\40 character file name without extension

100.0% (.PDF) Adobe Portable Document Format (5000/1)
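If you capture this output in a script, the result lines are easy to pick apart. The Python sketch below assumes every match line has the shape seen in this article's sample output: a percentage, an extension in parentheses, a description, and a parenthesized group of numbers separated by slashes. That format is inferred from these samples, not from TrID's documentation, and TridMatch and parse_trid_matches are hypothetical names.

```python
import re
from typing import List, NamedTuple


class TridMatch(NamedTuple):
    percent: float
    extension: str
    description: str
    score: str  # the raw parenthesized figures, e.g. "5000/1"


# Assumed line shape, inferred from the sample output in this article:
#   "<percent>% (<.EXT>) <description> (<numbers separated by />)"
MATCH_LINE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s+\((\.[^)]+)\)\s+(.*?)\s+\(([\d/]+)\)\s*$"
)


def parse_trid_matches(output: str) -> List[TridMatch]:
    """Return the match lines of TrID's output in the order TrID printed them."""
    matches = []
    for line in output.splitlines():
        m = MATCH_LINE.match(line)
        if m:  # header, "Analyzing..." and blank lines simply don't match
            percent, ext, description, score = m.groups()
            matches.append(TridMatch(float(percent), ext, description, score))
    return matches
```

In both samples in this article the matches are listed from highest to lowest percentage, so the first element of the list would be TrID's best guess.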


Here's the TrIDNet (GUI) result:



[Screenshot: TrIDNet analysis result for the sample file]

Both TrID and TrIDNet easily determined that it is a PDF file, with 100% certainty. Of course, 100% certainty is not always the case, as this real-life example shows: a file uploaded in a recent EE question bumped into the 40-character file name limit and wound up with a .x file extension. Here are the TrID results on it:


TrID/32 - File Identifier v2.20 - (C) 2003-15 By M.Pontello

Definitions found:  6019

Analyzing...


Collecting data from file: d:\0tempD\Time-Interval-Frequency-calculationv51.x

 51.3% (.XLSM) Excel Microsoft Office Open XML Format document (with Macro) (57500/1/12)

 45.0% (.XLSX) Excel Microsoft Office Open XML Format document (50500/1/11)

  3.5% (.ZIP) ZIP compressed archive (4000/1)


It is, in fact, a .XLSM file, as predicted by TrID, although with only 51.3% certainty. After I changed the file extension from .x to .xlsm, it loaded perfectly into Excel.
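The XLSM/XLSX split is close because both formats are ZIP archives (hence the ZIP match as well). One way to break the tie, sketched here in Python with the standard zipfile module, is to look inside the archive: an Office Open XML package contains a [Content_Types].xml part that declares the workbook's content type, and macro-enabled workbooks use application/vnd.ms-excel.sheet.macroEnabled.main+xml. This is a different technique from TrID's, it recognizes only regular and macro-enabled Excel workbooks, and office_zip_kind and office_bytes_kind are hypothetical names.

```python
import io
import zipfile
from typing import BinaryIO, Optional, Union

# Main-part content types declared in [Content_Types].xml of Excel workbooks.
XLSX_MAIN = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"
XLSM_MAIN = "application/vnd.ms-excel.sheet.macroEnabled.main+xml"


def office_zip_kind(source: Union[str, BinaryIO]) -> Optional[str]:
    """Classify a file (path or binary file object) as ".xlsm", ".xlsx" or ".zip".

    Returns None when the data is not a ZIP archive at all.
    """
    try:
        with zipfile.ZipFile(source) as archive:
            try:
                raw = archive.read("[Content_Types].xml")
            except KeyError:
                return ".zip"  # a ZIP archive, but not an Office Open XML package
    except zipfile.BadZipFile:
        return None

    # MIME types are case-insensitive, so compare in lower case.
    content_types = raw.decode("utf-8", "replace").lower()
    if XLSM_MAIN.lower() in content_types:
        return ".xlsm"
    if XLSX_MAIN.lower() in content_types:
        return ".xlsx"
    return ".zip"  # ZIP-based, but not an Excel workbook this check recognizes


def office_bytes_kind(data: bytes) -> Optional[str]:
    """Same check for data already in memory, such as a downloaded attachment."""
    return office_zip_kind(io.BytesIO(data))
```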


If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe

5 Comments

Expert Comment

by:Jim Horn
Nicely done.  Voted Yes.

Author Comment

by:Joe Winograd, Fellow&MVE
Jim,
Thanks for the compliment and the upvote — I appreciate both! Regards, Joe

Expert Comment

by:BillDL
Excellent article Joe.
I started a similar article several years ago that was to be entitled "Dealing With Unknown Files and File Extensions". I began to outline many of the different methods I have used over the years to identify and extract content from unknown files, but various iterations of Windows in the interim precluded the use of some of my methods. In addition, my knowledge of Linux and Mac wasn't complete enough to even broach that side. The result is that TrID and TrIDNet have outlasted most of my previous methods of identifying files and, for reasons made obvious in your article, should be the first port of call for most people.

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Bill,
Thanks for the kind words and the upvote — much appreciated! I like your "Dealing With Unknown Files and File Extensions" title — it's better than the one I chose. Regards, Joe

Expert Comment

by:Duncan Roe
The Linux file command gets the file type right every time (file extensions are a Windows thing, I guess).