
How to determine the type of file from its contents

Joe Winograd, Developer
CERTIFIED EXPERT
50+ years in computers
EE FELLOW 2017 — first ever recipient of Fellow award
MVE 2015,2016,2018
CERTIFIED GOLD EXPERT
DISTINGUISHED EXPERT
Edited by: Andrew Leniart
There have been many questions here at Experts Exchange along the lines of, "How can I tell the type of file from its contents?", as well as, "What kind of file has the XXX extension?" Sometimes the situation is that a file does not even have an extension. This article addresses those issues.

I. Introduction


Years ago, there was an interesting discussion in an Experts Exchange group titled "Attachments with no extension." It reminded me of questions that come up here at EE along the lines of, "How can I tell the type of file from its contents?" and "What kind of file has the XXX extension?" This article addresses those questions.


II. Determine the type of file from its XXX extension


Here are five links that can help in determining what an XXX file is:


http://extension.nirsoft.net/XXX

https://www.file-extensions.org/XXX-file-extension

http://www.fileinfo.com/extension/XXX

http://filext.com/file-extension/XXX

http://www.solvusoft.com/en/file-extensions/file-extension-XXX


Simply replace XXX with the file extension of interest. For example,


http://extension.nirsoft.net/TIFF

https://www.file-extensions.org/docx-file-extension

http://www.fileinfo.com/extension/AHK

http://filext.com/file-extension/xhtml

http://www.solvusoft.com/en/file-extensions/file-extension-opd
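Because each site uses a fixed URL pattern, the substitution is easy to automate. The following is a minimal Python sketch: the function name lookup_urls is hypothetical, and the templates are simply the five URLs above with XXX turned into a placeholder. It does not check whether each site still serves that URL pattern.

```python
from typing import List

# The five lookup-site URL patterns listed above, with XXX as a {ext} placeholder.
TEMPLATES = [
    "http://extension.nirsoft.net/{ext}",
    "https://www.file-extensions.org/{ext}-file-extension",
    "http://www.fileinfo.com/extension/{ext}",
    "http://filext.com/file-extension/{ext}",
    "http://www.solvusoft.com/en/file-extensions/file-extension-{ext}",
]


def lookup_urls(ext: str) -> List[str]:
    """Return one lookup URL per site; a leading dot on the extension is dropped."""
    ext = ext.lstrip(".")
    return [template.format(ext=ext) for template in TEMPLATES]


if __name__ == "__main__":
    for url in lookup_urls(".docx"):
        print(url)
```

The case of the extension is passed through unchanged, matching the mixed-case examples above.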


III. Determine the type of file from its contents


Now for the trickier question! An excellent file identifier called TrID analyzes the contents of a file to work out what type of file it is. It comes in a command line interface (CLI) version (for Windows and Linux) and a graphical user interface (GUI) version (Windows only) called TrIDNet. Both are available as downloads from the TrID author's website.
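To give a feel for what "analyzing the contents" means, here is a deliberately simplified Python sketch of the underlying idea: many formats begin with a fixed byte sequence (a "magic number"), such as %PDF- for PDF files or PK followed by the bytes 03 04 for ZIP archives. This is not how TrID is implemented, since its definitions are far more detailed. The signature table and the function names sniff and sniff_file are illustrative assumptions.

```python
from typing import Optional, Tuple

# Hypothetical, minimal signature table: (leading bytes, extension, description).
# Real TrID definitions are much richer (several patterns, offsets, strings and
# weighted scores); this table only checks how a file begins.
SIGNATURES = [
    (b"%PDF-", ".pdf", "Adobe Portable Document Format"),
    (b"PK\x03\x04", ".zip", "ZIP archive (also the container for .docx/.xlsx/.xlsm)"),
    (b"\x89PNG\r\n\x1a\n", ".png", "Portable Network Graphics"),
    (b"II*\x00", ".tif", "TIFF image (little-endian)"),
    (b"MM\x00*", ".tif", "TIFF image (big-endian)"),
]


def sniff(data: bytes) -> Optional[Tuple[str, str]]:
    """Return (extension, description) for the first matching signature, else None."""
    for magic, ext, desc in SIGNATURES:
        if data.startswith(magic):
            return ext, desc
    return None


def sniff_file(path: str, header_size: int = 16) -> Optional[Tuple[str, str]]:
    """Read only the first few bytes of a file and look them up in SIGNATURES."""
    with open(path, "rb") as f:
        return sniff(f.read(header_size))
```

Because .docx, .xlsx and .xlsm files are ZIP containers, a check this shallow can only report ZIP for them; TrID goes further and tells them apart, as its results in the Conclusion show.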


Both the CLI and GUI versions require a database of file definitions. This is a key feature of TrID and TrIDNet: the ever-growing list of file types they recognize. As of this article's most recent update, the database contains 13,136 definitions (dated 3 October 2020). Note that the CLI definitions and the GUI definitions are separate downloads, both available from the same website.


IV. More about TrID — the CLI version


After downloading the CLI version and its definitions, unpack the ZIP file containing the program (trid.exe) and copy the definitions file (triddefs.trd) into the same folder as the program. As mentioned above, keeping the file-type definitions in a separate database is a really nice feature of TrID: because new file types are added frequently, the author publishes the definitions as a separate download, so it is worth returning to the website occasionally to get the latest definitions file.
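If TrID does not seem to be finding its definitions, a quick check that both files really ended up in the same folder can help. A minimal Python sketch; the helper name missing_trid_files is hypothetical, and the file names are the ones given above.

```python
from pathlib import Path
from typing import List


def missing_trid_files(program_dir: str, exe_name: str = "trid.exe") -> List[str]:
    """List which of the program file and triddefs.trd are absent from program_dir.

    exe_name defaults to the Windows file name; passing "trid" for the Linux
    build is an assumption about how that binary is named after unpacking.
    """
    folder = Path(program_dir)
    return [name for name in (exe_name, "triddefs.trd")
            if not (folder / name).is_file()]


if __name__ == "__main__":
    import sys
    missing = missing_trid_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("All set" if not missing else "Missing: " + ", ".join(missing))
```

If the definitions must live elsewhere, the -d:file option in the usage text below points TrID at them instead.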


Here's the syntax of the CLI version (v2.24):

 

Usage: TrID <[path]filespec(s)...> [-ae|-ce] [-d:file] [-ns] [-n:nn]
                                   [-@] [-v] [-w] [-?]

Where: <filespec> Files to identify/analyze
       -ae        Add guessed extension to filename
       -ce        Change filename extension
       -d:file    Use the specified defs package
       -ns        Disable unique strings check
       -n:nn      Number of matches to show (default: 5)
       -@         Read file list from stdin
       -v         Verbose mode - display def name, author, etc.
       -w         Wait for a key before exiting
       -?         This help!
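For scripting, the CLI can be called from another program and its output parsed. The sketch below assumes trid is on the PATH and that result lines keep the format shown in the sample runs in the Conclusion (percentage, extension in parentheses, description, score in parentheses); that format may differ between versions. Only the -d:file and -n:nn options from the usage text are used, and the function names are hypothetical.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# Matches result lines such as:
#  51.3% (.XLSM) Excel Microsoft Office Open XML Format document (with Macro) (57500/1/12)
RESULT_LINE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s+\((\.[^)]*)\)\s+(.*?)\s+\(([\d/]+)\)\s*$")


def build_command(target: str, trid_exe: str = "trid",
                  defs: Optional[str] = None, matches: int = 5) -> List[str]:
    """Argument list in the order shown in the usage text: filespec, -d:file, -n:nn."""
    cmd = [trid_exe, target]
    if defs:
        cmd.append(f"-d:{defs}")
    cmd.append(f"-n:{matches}")
    return cmd


def parse_trid_output(text: str) -> List[Tuple[float, str, str]]:
    """Extract (percent, extension, description) from each result line."""
    results = []
    for line in text.splitlines():
        m = RESULT_LINE.match(line)
        if m:
            results.append((float(m.group(1)), m.group(2), m.group(3)))
    return results


def identify(target: str, **kwargs) -> List[Tuple[float, str, str]]:
    """Run TrID on one file and return its parsed candidates."""
    out = subprocess.run(build_command(target, **kwargs),
                         capture_output=True, text=True, check=False).stdout
    return parse_trid_output(out)
```

Header lines such as "Definitions found:" do not match the pattern, so only the candidate lines are returned.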


The program is freeware for non-commercial use. Here's exactly what the license says (I took the liberty of correcting typos in it):


The program can be freely distributed and is freeware for non-commercial, personal, research and educational use. Contact the author for commercial use or commercialization of TrID or TrID's definitions and contained information.

I'd rather not publish the author's email address in this article, but you can find it in the Readme file included in the download.


V. More about TrIDNet — the GUI version


As stated earlier, the definitions for the GUI version are in a different format from the definitions for the command line version. The GUI definitions are in a large number of XML files, one for each file type — currently, 13,136 of them!


As with the CLI version, no installation is needed: just unpack the ZIP file containing the program (TrIDNet.exe) and copy the definitions (the entire defs folder, with all of its subfolders of XML files) into the same folder as the program.
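Since each TrIDNet definition is a separate XML file, counting those files is one way to confirm the definitions were copied completely, subfolders included. A small Python sketch; the function name is hypothetical, and the folder name defs is taken from the paragraph above.

```python
from pathlib import Path


def count_trid_xml_defs(defs_dir: str) -> int:
    """Count .xml definition files anywhere under defs_dir (subfolders included)."""
    root = Path(defs_dir)
    if not root.is_dir():
        return 0
    # Compare the suffix case-insensitively so .XML and .xml both count.
    return sum(1 for p in root.rglob("*")
               if p.is_file() and p.suffix.lower() == ".xml")


if __name__ == "__main__":
    # Adjust the path to wherever the defs folder was copied.
    print(count_trid_xml_defs("defs"))
```

The result can be compared with the definition count published alongside the download.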


The screenshot at the top of this article is the opening dialog when running TrIDNet.


VI. Conclusion


To come full circle to the group discussion that prompted this article, I fed both TrID and TrIDNet a file that has a 40-character file name but no file extension. Here is the TrID command line with its result, copied and pasted from the command prompt window. Note that the runs below were made with an earlier version of TrID, when I first published this article in 2015:


trid "d:\0tempd\40 character file name without extension"


TrID/32 - File Identifier v2.20 - (C) 2003-15 By M.Pontello

Definitions found:  6019

Analyzing...


Collecting data from file: d:\0tempd\40 character file name without extension

100.0% (.PDF) Adobe Portable Document Format (5000/1)


Here's the TrID GUI result:



[Screenshot: TrIDNet-sample-analyze.jpg]


Both TrID and TrIDNet easily determined that it is a PDF file, and with 100% certainty. Of course, certainty is not always 100%, as this real-life example of a file uploaded to a recent EE question shows. The file name ran into the 40-character limit and was truncated, leaving a .x file extension. Here are the TrID results for it:


TrID/32 - File Identifier v2.20 - (C) 2003-15 By M.Pontello

Definitions found:  6019

Analyzing...


Collecting data from file: d:\0tempD\Time-Interval-Frequency-calculationv51.x

 51.3% (.XLSM) Excel Microsoft Office Open XML Format document (with Macro) (57500/1/12)

 45.0% (.XLSX) Excel Microsoft Office Open XML Format document (50500/1/11)

  3.5% (.ZIP) ZIP compressed archive (4000/1)


It is, in fact, a .XLSM file, as TrID predicted, although with only 51.3% certainty. After I changed the file extension from .x to .xlsm, it loaded perfectly into Excel.
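The percentages appear to be each candidate's share of the total score. Assuming the first number in parentheses is a definition's score (an inference from the output, not something documented here), the XLSM case works out as 57500 / (57500 + 50500 + 4000) = 57500 / 112000 ≈ 51.34%, and truncating (not rounding) to one decimal place reproduces all three figures TrID printed, as well as the 100.0% for the lone PDF match. A Python sketch of that arithmetic, with a hypothetical function name:

```python
from typing import Dict


def share_percentages(points: Dict[str, int]) -> Dict[str, float]:
    """Each candidate's share of the total points, truncated to one decimal place.

    Assumes the scores are the first numbers in TrID's parentheses; integer
    floor division performs the truncation without floating-point surprises.
    """
    total = sum(points.values())
    return {ext: (p * 1000 // total) / 10 for ext, p in points.items()}


if __name__ == "__main__":
    print(share_percentages({".XLSM": 57500, ".XLSX": 50500, ".ZIP": 4000}))
    # → {'.XLSM': 51.3, '.XLSX': 45.0, '.ZIP': 3.5}
```

Rounding instead of truncating would give 45.1% and 3.6%, which do not match the output, hence the floor division.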


Those runs above in 2015 were on Windows 7. As part of an article update in October 2020, I ran the latest version of the CLI and GUI on Windows 10 V1909, where they worked perfectly.


If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much!


Regards, Joe


Comments (5)

Jim Horn, SQL Server Data Dude
CERTIFIED EXPERT
Most Valuable Expert 2013
Author of the Year 2015

Commented:
Nicely done.  Voted Yes.
Joe Winograd, Developer
CERTIFIED EXPERT
Fellow
Most Valuable Expert 2018

Author

Commented:
Jim,
Thanks for the compliment and the upvote — I appreciate both! Regards, Joe
BillDL, General Factotum
CERTIFIED EXPERT

Commented:
Excellent article Joe.
I started a similar article several years ago that was to be entitled "Dealing With Unknown Files and File Extensions". I began to outline many of the different methods I have used over the years to identify and extract content from unknown files, but various iterations of Windows in the interim precluded the use of some of my methods. In addition, my knowledge of Linux and Mac wasn't complete enough to even broach that side. The result is that TrID and TrIDNet have outlasted most of my previous methods of identifying files and, for reasons made obvious in your article, should be the first port of call for most people.
Joe Winograd, Developer
CERTIFIED EXPERT
Fellow
Most Valuable Expert 2018

Author

Commented:
Hi Bill,
Thanks for the kind words and the upvote — much appreciated! I like your "Dealing With Unknown Files and File Extensions" title — it's better than the one I chose. Regards, Joe
Duncan Roe, Software Developer
CERTIFIED EXPERT

Commented:
The Linux file command gets the file type right every time (file extensions are a Windows thing I guess)
