How to determine the type of file from its contents

Joe Winograd - EE Fellow & MVE
50+ years in computer industry. Everything from development to sales. CIO. Document imaging. EE MVE 2015, EE MVE 2016, EE FELLOW 2017.

I. Introduction

There's an interesting discussion going on now in an Experts Exchange Group — Attachments with no extension. This reminded me of questions that come up here at EE along the lines of, "How can I tell the type of file from its contents?", as well as, "What kind of file has the XXX extension?" Writing an article to address this has been on my to-do list for a long time — the group discussion has inspired me to do it.

II. Determine the type of file from its XXX extension

Here are four links that can help in determining what an XXX file is:





Simply replace XXX with the file extension of interest. For example,





III. Determine the type of file from its contents

Now to the trickier question! An excellent file identifier application called TrID analyzes the contents of a file in an attempt to figure out what type of file it is. It comes in both a command line interface (CLI) version (for Windows and Linux) and a Graphical User Interface (GUI) version (Windows only) called TrIDNet. The downloads are at the links in the preceding sentence.
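To give a feel for what identifying a file from its contents involves, here is a minimal Python sketch that compares a file's leading bytes against a handful of well-known signatures. This is not how TrID works internally (TrID relies on its own definitions database); the SIGNATURES table and the sniff_bytes and sniff functions are hypothetical names used only for illustration.

```python
# A few well-known leading-byte signatures ("magic numbers").
# Illustrative only: this is NOT TrID's definitions database.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive"),
]


def sniff_bytes(head: bytes) -> str:
    """Return a label for the first signature that the bytes start with."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def sniff(path) -> str:
    """Read the first few bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))
```

A check this simple cannot tell an Excel workbook from a plain ZIP archive, because Office Open XML files are themselves ZIP containers. That is one reason a tool with a large definitions database is more useful in practice.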

Both the CLI and GUI versions require a database/library of file definitions. This is a key feature of TrID and TrIDNet: the ever-growing list of file types that they recognize. As of this article's submission date, the database contains 6,019 definitions (dated 13-August-2015). Note that there are separate downloads for the CLI definitions and the GUI definitions.

IV. More about TrID — the CLI version

After downloading the CLI version and its definitions, simply unpack the ZIP file with the program (trid.exe) and copy the definitions file (triddefs.trd) into the same folder as the program file. As mentioned above, the database of file type definitions is a really nice feature of TrID. Since file types are frequently added, the program author makes the definitions database available as a separate download, so it is worth going back to the website occasionally to get the latest definitions file.
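As a quick sanity check on the setup just described, the following Python sketch reports which of the two files named above (trid.exe and triddefs.trd) are absent from a folder. The function name missing_trid_files is hypothetical, and the check assumes the file names are exactly as given in this article.

```python
from pathlib import Path

# File names as given in this article; both must sit in the same folder.
REQUIRED = ("trid.exe", "triddefs.trd")


def missing_trid_files(folder):
    """Return the required file names that are not present in the folder."""
    folder = Path(folder)
    return [name for name in REQUIRED if not (folder / name).is_file()]
```

An empty list means both files are in place.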

Here's the syntax of the CLI version (v2.20):


Usage: TrID <[path]filespec(s)...> [-ae|-ce] [-d:file] [-ns] [-n:nn]
                                   [-@] [-v] [-w] [-?]

Where: <filespec> Files to identify/analyze
       -ae        Add guessed extension to filename
       -ce        Change filename extension
       -d:file    Use the specified defs package
       -ns        Disable unique strings check
       -n:nn      Number of matches to show (default: 5)
       -@         Read file list from stdin
       -v         Verbose mode - display def name, author, etc.
       -w         Wait for a key before exiting
       -?         This help!
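For anyone who wants to call the CLI version from a script, here is a minimal Python sketch that assembles a command line from the options in the usage text above and runs it. The function names build_trid_command and run_trid are hypothetical, and the sketch assumes that the trid executable can be found on the PATH (otherwise pass its full path as exe).

```python
import subprocess


def build_trid_command(target, exe="trid", max_matches=None, defs=None, verbose=False):
    """Assemble an argument list using switches from the usage text."""
    cmd = [exe, str(target)]
    if max_matches is not None:
        cmd.append(f"-n:{max_matches}")  # -n:nn  number of matches to show
    if defs is not None:
        cmd.append(f"-d:{defs}")         # -d:file  use the specified defs package
    if verbose:
        cmd.append("-v")                 # -v  verbose mode
    return cmd


def run_trid(target, **options):
    """Run TrID on one file and return whatever it wrote to standard output."""
    result = subprocess.run(
        build_trid_command(target, **options),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Passing the arguments as a list lets a file name with spaces go through without extra quoting.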

The program is free for personal use. Here's exactly what the license says (I took the liberty of correcting typos in it):

The program can be freely distributed and is freeware for non-commercial, personal, research and educational use. Contact the author for commercial use or commercialization of TrID or TrID's definitions and contained information.

I don't want to put the author's email address in this article, but you may find it in the Readme file that is part of the download.

V. More about TrIDNet — the GUI version

As stated earlier, the definitions for the GUI version are in a different format from the definitions for the command line version. The GUI definitions are in a large number of XML files, one for each file type — currently, 6,019 of them!

As with the CLI version, there's no installation needed — just unpack the ZIP file with the program (TrIDNet.exe) and copy the definitions (all of the XML files) into the same folder as the program file.

When running TrIDNet, here's the opening screen:


VI. Conclusion

To come full circle to the group discussion that prompted this article, I fed to both TrID and TrIDNet a file that has 40 characters in the file name but no file extension. Here's the TrID command line with its result (via copy/paste from the command prompt window):

trid "d:\0tempd\40 character file name without extension"

TrID/32 - File Identifier v2.20 - (C) 2003-15 By M.Pontello

Definitions found:  6019


Collecting data from file: d:\0tempd\40 character file name without extension

100.0% (.PDF) Adobe Portable Document Format (5000/1)
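If the results need to be processed by a script, output like the above can be parsed with a regular expression. The following Python sketch is based only on the result lines shown in this article, so the pattern may need adjusting for other TrID versions. The names Match and parse_trid_output are hypothetical, and treating the first number in the trailing parentheses as the match's point score is an assumption.

```python
import re
from typing import List, NamedTuple

# Matches result lines such as:
#   100.0% (.PDF) Adobe Portable Document Format (5000/1)
MATCH_LINE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s+\((\.[^)]+)\)\s+(.*?)\s+\((\d+)(?:/\d+)*\)\s*$"
)


class Match(NamedTuple):
    percent: float
    extension: str
    description: str
    points: int  # assumption: first number in the trailing parentheses


def parse_trid_output(text: str) -> List[Match]:
    """Extract the result lines from TrID's console output, skipping the rest."""
    matches = []
    for line in text.splitlines():
        m = MATCH_LINE.match(line)
        if m:
            matches.append(
                Match(float(m.group(1)), m.group(2), m.group(3), int(m.group(4)))
            )
    return matches
```

Banner lines, the "Definitions found" line, and the "Collecting data" line do not match the pattern and are ignored.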

Here's the TrID GUI result:


Both TrID and TrIDNet easily determined that it is a PDF file — and with 100% certainty. Of course, 100% certainty is not always the case, as shown in this real-life example of a file uploaded in a recent EE question. The file bumped into the 40-character file name limit and wound up with a .x file extension. Here are the TrID results on it:

TrID/32 - File Identifier v2.20 - (C) 2003-15 By M.Pontello

Definitions found:  6019


Collecting data from file: d:\0tempD\Time-Interval-Frequency-calculationv51.x

 51.3% (.XLSM) Excel Microsoft Office Open XML Format document (with Macro) (57500/1/12)

 45.0% (.XLSX) Excel Microsoft Office Open XML Format document (50500/1/11)

  3.5% (.ZIP) ZIP compressed archive (4000/1)

It is, in fact, a .XLSM file, as predicted by TrID, although with only 51.3% certainty. After changing the file extension from .x to .xlsm, it loaded perfectly into Excel.
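A note on where those percentages may come from: the numbers in the output above are consistent with each match's share of the total points across all matches (57500 + 50500 + 4000 = 112000), truncated to one decimal place. This is an observation from the figures in this article, not a statement from TrID's documentation. The Python sketch below reproduces the arithmetic; the function name relative_shares is hypothetical.

```python
def relative_shares(points_by_ext):
    """Return each entry's share of the total points, as a percentage
    truncated (not rounded) to one decimal place."""
    total = sum(points_by_ext.values())
    # Integer division by the total keeps tenths of a percent, then /10 scales back.
    return {ext: (1000 * pts) // total / 10 for ext, pts in points_by_ext.items()}


# Point values copied from the TrID output shown above.
points = {".XLSM": 57500, ".XLSX": 50500, ".ZIP": 4000}
print(relative_shares(points))
# {'.XLSM': 51.3, '.XLSX': 45.0, '.ZIP': 3.5}
```

Read this way, 51.3% means that the .XLSM definition narrowly outscored the .XLSX one. A single match, as in the PDF example, gets 100% because no other definition scored any points.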

If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe


Expert Comment

by:Jim Horn
Nicely done.  Voted Yes.

Author Comment

by:Joe Winograd - EE Fellow & MVE
Thanks for the compliment and the upvote — I appreciate both! Regards, Joe

Expert Comment

Excellent article Joe.
I started a similar article several years ago that was to be entitled "Dealing With Unknown Files and File Extensions".  I began to outline many of the different methods I have used over the years to identify and extract content from unknown files, but various iterations of Windows in the interim precluded the use of some of my methods.  In addition, my knowledge of Linux and Mac wasn't complete enough to even broach that side.  The result is that TrID and TrIDNet have outlasted most of my previous methods of identifying files and, for reasons made obvious in your article, should be the first port of call for most people.

Author Comment

by:Joe Winograd - EE Fellow & MVE
Hi Bill,
Thanks for the kind words and the upvote — much appreciated! I like your "Dealing With Unknown Files and File Extensions" title — it's better than the one I chose. Regards, Joe

Expert Comment

by:Duncan Roe
The Linux file command gets the file type right every time (file extensions are a Windows thing I guess)
