How to determine the type of file from its contents

Joe Winograd, EE MVE 2015&2016
50+ yrs in computer industry. Everything from programming to sales. OS kernel dev on mainframes. CIO. Document imaging. EE MVE 2015 & 2016.
I. Introduction

There's an interesting discussion going on now in an Experts Exchange Group — Attachments with no extension. This reminded me of questions that come up here at EE along the lines of, "How can I tell the type of file from its contents?", as well as, "What kind of file has the XXX extension?" Writing an article to address this has been on my to-do list for a long time — the group discussion has inspired me to do it.

II. Determine the type of file from its XXX extension

Here are four links that can help in determining what an XXX file is:

http://extension.nirsoft.net/XXX
http://www.fileinfo.com/extension/XXX
http://filext.com/file-extension/XXX
http://www.solvusoft.com/en/file-extensions/file-extension-XXX

Simply replace XXX with the file extension of interest. For example,

http://extension.nirsoft.net/TIFF
http://www.fileinfo.com/extension/AHK
http://filext.com/file-extension/xhtml
http://www.solvusoft.com/en/file-extensions/file-extension-opd
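
If you look up extensions often, the substitution can be automated. The following is a minimal Python sketch that fills an extension into the four URL patterns listed above; the function name lookup_urls is made up for this illustration, and it assumes the sites keep their current URL layout.

```python
# URL patterns taken from the four links above; {ext} stands in for XXX.
LOOKUP_TEMPLATES = [
    "http://extension.nirsoft.net/{ext}",
    "http://www.fileinfo.com/extension/{ext}",
    "http://filext.com/file-extension/{ext}",
    "http://www.solvusoft.com/en/file-extensions/file-extension-{ext}",
]


def lookup_urls(extension):
    """Return one lookup URL per site for the given extension.

    Accepts the extension with or without a leading dot (".tiff" or "tiff").
    """
    ext = extension.strip().lstrip(".")
    return [template.format(ext=ext) for template in LOOKUP_TEMPLATES]


if __name__ == "__main__":
    for url in lookup_urls(".tiff"):
        print(url)
```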

III. Determine the type of file from its contents

Now to the trickier question! An excellent file identifier application called TrID analyzes the contents of a file in an attempt to figure out what type of file it is. It comes in both a command line interface (CLI) version (for Windows and Linux) and a Graphical User Interface (GUI) version (Windows only) called TrIDNet. The downloads are at the links in the preceding sentence.

Both the CLI and GUI versions require a database/library of file definitions. This is a key feature of TrID and TrIDNet: the ever-growing list of file types that they recognize. As of this article's submission date, the database contains 6,019 definitions (dated 13-August-2015). Note that there are separate downloads for the CLI definitions and the GUI definitions.
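
To give a feel for what "identifying a file from its contents" means, here is a deliberately tiny Python sketch that checks the first few bytes of a file against a handful of well-known signatures. This is not how TrID works internally (its definitions are far richer and number in the thousands); the SIGNATURES table and the function names are assumptions made up for this illustration.

```python
# A toy "definitions database": (leading bytes, description).
# These signatures are well known, but the table is illustrative only.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also the container for .xlsx, .xlsm, .docx)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
]


def sniff(data):
    """Return a description if the bytes start with a known signature, else None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def sniff_file(path):
    """Read the first 16 bytes of a file and try to identify it."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))
```

Calling sniff_file on a file with no extension returns a description when the leading bytes are in the table and None otherwise. A real identifier needs many more patterns than this, which is why the size of the definitions database matters.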

IV. More about TrID — the CLI version

After downloading the CLI version and its definitions, simply unpack the ZIP file with the program (trid.exe) and copy the definitions file (triddefs.trd) into the same folder as the program file. Since new file types are added frequently, the program author makes the definitions database available as a separate download, so it is worth returning to the website occasionally to get the latest definitions file.

Here's the syntax of the CLI version (v2.20):
 
Usage: TrID <[path]filespec(s)...> [-ae|-ce] [-d:file] [-ns] [-n:nn]
                                   [-@] [-v] [-w] [-?]

Where: <filespec> Files to identify/analyze
       -ae        Add guessed extension to filename
       -ce        Change filename extension
       -d:file    Use the specified defs package
       -ns        Disable unique strings check
       -n:nn      Number of matches to show (default: 5)
       -@         Read file list from stdin
       -v         Verbose mode - display def name, author, etc.
       -w         Wait for a key before exiting
       -?         This help!
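
For scripting, TrID can be called from another program. The following Python sketch builds a command line using only options shown in the usage text above (-n:nn and -d:file) and runs it. It assumes the executable can be started as "trid" (that is, it is on the PATH or in the current folder); the function names are hypothetical.

```python
import subprocess


def build_trid_command(path, matches=5, defs=None, exe="trid"):
    """Build the argument list for a TrID run.

    matches -> -n:nn   (number of matches to show)
    defs    -> -d:file (alternative definitions package), optional
    """
    command = [exe, path, f"-n:{matches}"]
    if defs is not None:
        command.append(f"-d:{defs}")
    return command


def run_trid(path, **options):
    """Run TrID on one file and return whatever it prints to stdout."""
    completed = subprocess.run(
        build_trid_command(path, **options),
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Passing the arguments as a list means a file name containing spaces needs no extra quoting.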



The program is free for personal use. Here's exactly what the license says (I took the liberty of correcting typos in it):

The program can be freely distributed and is freeware for non-commercial, personal, research and educational use. Contact the author for commercial use or commercialization of TrID or TrID's definitions and contained information.
I don't want to put the author's email address in this article, but you may find it in the Readme file that is part of the download.

V. More about TrIDNet — the GUI version

As stated earlier, the definitions for the GUI version are in a different format from the definitions for the command line version. The GUI definitions are in a large number of XML files, one for each file type — currently, 6,019 of them!

As with the CLI version, there's no installation needed — just unpack the ZIP file with the program (TrIDNet.exe) and copy the definitions (all of the XML files) into the same folder as the program file.

When running TrIDNet, here's the opening screen:

[Screenshot: TrIDNet opening screen (TrIDNet-opening-screen.jpg)]

VI. Conclusion

To come full circle to the group discussion that prompted this article, I fed to both TrID and TrIDNet a file that has 40 characters in the file name but no file extension. Here's the TrID command line with its result (via copy/paste from the command prompt window):

trid "d:\0tempd\40 character file name without extension"

TrID/32 - File Identifier v2.20 - (C) 2003-15 By M.Pontello
Definitions found:  6019
Analyzing...

Collecting data from file: d:\0tempd\40 character file name without extension
100.0% (.PDF) Adobe Portable Document Format (5000/1)

Here's the TrIDNet (GUI) result:

[Screenshot: TrIDNet analysis of the sample file (TrIDNet-sample-analyze.jpg)]

Both TrID and TrIDNet easily determined that it is a PDF file, and with 100% certainty. Of course, 100% certainty is not always the case, as shown in this real-life example of a file uploaded in a recent EE question. The file bumped into the 40-character file name limit and wound up with a .x file extension. Here are the TrID results on it:

TrID/32 - File Identifier v2.20 - (C) 2003-15 By M.Pontello
Definitions found:  6019
Analyzing...

Collecting data from file: d:\0tempD\Time-Interval-Frequency-calculationv51.x
 51.3% (.XLSM) Excel Microsoft Office Open XML Format document (with Macro) (57500/1/12)
 45.0% (.XLSX) Excel Microsoft Office Open XML Format document (50500/1/11)
  3.5% (.ZIP) ZIP compressed archive (4000/1)

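It is, in fact, a .XLSM file, as predicted by TrID, although with only 51.3% certainty. The close scores are not surprising: .xlsm and .xlsx files are both ZIP archives with very similar internal structure. After changing the file extension from .x to .xlsm, it loaded perfectly into Excel.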
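
If you capture TrID's output in a script, the result lines can be parsed to pick the top candidate. The Python sketch below assumes the output layout shown in the two examples above (percentage, extension in parentheses, description, then a score group in parentheses); other TrID versions may format their output differently, and the function names are made up for this illustration.

```python
import re

# Matches lines such as:
#  51.3% (.XLSM) Excel Microsoft Office Open XML Format document (with Macro) (57500/1/12)
RESULT_LINE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s+\((\.[^)]+)\)\s+(.*?)\s+\(([\d/]+)\)\s*$"
)

SAMPLE = """\
Definitions found:  6019
Analyzing...

Collecting data from file: d:\\0tempD\\Time-Interval-Frequency-calculationv51.x
 51.3% (.XLSM) Excel Microsoft Office Open XML Format document (with Macro) (57500/1/12)
 45.0% (.XLSX) Excel Microsoft Office Open XML Format document (50500/1/11)
  3.5% (.ZIP) ZIP compressed archive (4000/1)
"""


def parse_trid_output(text):
    """Return a list of (percent, extension, description) tuples."""
    rows = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            rows.append((float(match.group(1)), match.group(2), match.group(3)))
    return rows


def best_match(text):
    """Return the highest-scoring row, or None if no result lines were found."""
    rows = parse_trid_output(text)
    return max(rows, key=lambda row: row[0]) if rows else None


if __name__ == "__main__":
    print(best_match(SAMPLE))
```

Note that the usage text above also lists -ae and -ce options for adding or changing the extension directly, so parsing is only needed when a script has to make its own decision about the candidates.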

If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe
5 Comments
 
LVL 66

Expert Comment

by:Jim Horn
Nicely done.  Voted Yes.
 
LVL 55

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Jim,
Thanks for the compliment and the upvote — I appreciate both! Regards, Joe
 
LVL 38

Expert Comment

by:BillDL
Excellent article Joe.
I started a similar article several years ago that was to be entitled "Dealing With Unknown Files and File Extensions".  I began to outline many of the different methods I have used over the years to identify and extract content from unknown files, but various iterations of Windows in the interim precluded the use of some of my methods.  In addition, my knowledge of Linux and Mac wasn't complete enough to even broach that side.  The result is that TrID and TrIDNet have outlasted most of my previous methods of identifying files and, for reasons made obvious in your article, should be the first port of call for most people.
 
LVL 55

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Bill,
Thanks for the kind words and the upvote — much appreciated! I like your "Dealing With Unknown Files and File Extensions" title — it's better than the one I chose. Regards, Joe
 
LVL 35

Expert Comment

by:Duncan Roe
The Linux file command gets the file type right every time (file extensions are a Windows thing I guess)