Simple Mime Type Detection based on file contents, not filename

Posted on 2003-03-18
Medium Priority
Last Modified: 2010-08-05
I'm trying to get perl to check the contents of a file and return the actual mime type to help verify the file extension associated with the file.  I can do this on my own server by using:
$type = `/bin/file $filename`;
but it isn't a very universal way of doing this.  
Is there a better way, preferably without specialized modules? If modules are the way to go, let me know that too.

Question by:brianviehland

Accepted Solution

biglug earned 225 total points
ID: 8164512
There's a pure-prel version of file at http://search.cpan.org/author/SDAGUE/ppt-0.12/bin/file

It's probably your best bet.

Expert Comment

ID: 8164514
prel is either:
1) A strange nickname I use for perl
2) A typo

Choice is yours :)

Expert Comment

ID: 9699550
Nothing has happened on this question in over 7 months. It's time for cleanup!

My recommendation, which I will post in the Cleanup topic area, is to
accept answer by biglug.

Please post any comments here within the next seven days.


EE Cleanup Volunteer


Author Comment

ID: 9703673
I haven't checked this since I posted it, and I only got an email about there being a response on 11/07/03.  Sorry.

The only problem I have with 'file' is that it doesn't work universally.

Thanks though.


Expert Comment

ID: 9704435
The module that biglug's answer was pointing you to is an implementation of the 'file' utility's recognition algorithm in pure perl -- it will run anywhere you can run perl! I suppose that's not quite "universal" but it is universal enough to merit an A grade.

Author Comment

ID: 9704626
Because my initial question referred to the use of `file`, the response to use `file` was not much help.  I appreciate the answer, so I awarded the points, but pointing me to the answer I already had doesn't really make me feel I was "expertly" helped.  A C grade basically says the answers "lack finality or do not completely address the issue presented".  Since the answer provided was neither innovative nor an alternative to the solution I already had in the question, I didn't feel it called for a backflip and a "good on you."  If you want, you can remove the C grade and leave the question open for another answer, and I will keep my points for the next person who has a more definitive answer.


Expert Comment

ID: 9705014
The 'file' command you mentioned in your question is an OS utility, written in C and compiled into an executable for use on that OS.

The perl source code given at the link location is not the same thing at all. It is a reimplementation of the OS utility. It is pure perl code. It can run anywhere perl can run. Barring further clarification, it seems to me to be exactly what you were asking for. You rejected it without looking at it, it seems, and, perhaps, are unfamiliar with what the strong claim "pure perl" means when uttered by perl aficionados.

Author Comment

ID: 9705788
Look, the problem with using file, either from the command line or as a perl module, is that on different servers it provides different responses.  When using it on the same gif file on different servers, I have received different responses.  Each seems to be valid, but since they do not match, the code would have to account for a large number of unknown results.  Take the 3 different results I got for the same file on 3 servers and multiply that by the number of file types to be parsed: there's no way to address that!

I'm not an idiot, and I do understand the difference (and the similarity) of the two implementations of file described above.  Perhaps I could have worded the question better, but since I received no emails regarding responses, frankly I forgot this site even existed.  So, here I am reminded of the issue months later, and the question, although responded to, didn't have its intent addressed.  The original intent of the post was to find a way to retrieve mime types from files without using file.

So now I look at it months later and find that the only post refers to the method I've already mentioned, and used.  But because the person took a minute and a half to respond at all, I thought it would be really crappy not to award the points.  As mentioned before, if you don't like the grade given, change it.  But without letting the users rate the answers in a way they feel is justified, what's the point?  Every answer is not an A and couldn't expect to be.  If the respondent thought that the "expert" answer was for me to do what I was doing, because I was so dumb I wouldn't have tried file as both an OS utility and as a perl module, then they deserve the C.

I'm not trying to be a jerk, but it was like asking "what can I have for Thanksgiving other than turkey" and someone saying "how about turkey".  It wasn't an expert answer.  It seemed to be an answer written specifically to imply I was an idiot.  My opinion is that an expert answer that would have gotten an A would have presented an option other than the one referred to in the question.  So change the grade if you wish.

Expert Comment

ID: 9706223
I'm sorry your notification emails fell into a black hole. There have been occasional outages of the Experts-Exchange notification system; it's an essential part of making things work smoothly here, and when it fails, things don't work smoothly at all.

Satisfying users is a goal at Experts-Exchange. We also want things to be fair for the experts who contribute answers, but most experts I've interacted with would very much prefer to hand you your points back rather than have you feel you had to accept a substandard answer.

The problem you originally set out to solve is actually kind of deep: how to determine what interpretation should be placed on the contents of a file. Mac operating systems have the Resource Fork, much of the modern Internet uses MIME types, many operating systems have adopted a convention of interpreting file "suffixes" as content type indicators, and many application designers have included internal flags and type indicators in the content of the file, either on purpose or by happenstance.

The 'file' command employs a "magic" file that contains the distilled intelligence from many programmers about how to characterize a file. Your experience with using the 'file' command shows that different systems have different stuff in their "magic" file -- that is why you get differing results from applying the same command to identical files. To get consistent results, you would want to employ the same "magic" file on each system. Some OS implementations and most definitely the Perl implementation of 'file' allow you to specify the "magic" file to be used.
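
To make the "same data on every system" idea concrete, here is a minimal pure-perl sketch. It does not read a real "magic" file; it assumes a tiny hand-written signature table that ships with the script, so every server consults identical rules. The names sniff_bytes and sniff_file and the choice of signatures are mine, purely for illustration, and a real magic file covers far more types.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical hand-written table (NOT the format of a real "magic" file).
# Each entry is [ leading bytes, MIME type ]. Because the table travels
# with the script, every server applies exactly the same rules.
my @SIGNATURES = (
    [ 'GIF87a',            'image/gif'       ],
    [ 'GIF89a',            'image/gif'       ],
    [ "\x89PNG\r\n\x1a\n", 'image/png'       ],
    [ "\xFF\xD8\xFF",      'image/jpeg'      ],
    [ '%PDF-',             'application/pdf' ],
    [ "PK\x03\x04",        'application/zip' ],
);

# Return the MIME type whose signature matches the start of $bytes,
# or undef when nothing in the table matches.
sub sniff_bytes {
    my ($bytes) = @_;
    for my $sig (@SIGNATURES) {
        my ($magic, $mime) = @$sig;
        return $mime if substr($bytes, 0, length $magic) eq $magic;
    }
    return;
}

# Read the first 16 bytes of a file in binary mode and sniff them.
sub sniff_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $n = read($fh, my $head, 16);
    close $fh;
    die "Cannot read $path: $!" unless defined $n;
    return sniff_bytes($head);
}

# Usage: perl sniff.pl somefile
if (@ARGV) {
    my $mime = sniff_file($ARGV[0]);
    print defined $mime ? $mime : 'unknown', "\n";
}
```

The trade-off is coverage: a table like this only recognizes what you put in it, whereas pinning one full magic file everywhere gets you consistency and breadth at once.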

I'm not sure where to get the "best" magic file for your purpose. It might be called /usr/share/magic, /usr/share/misc/magic or /etc/magic on a recent FreeBSD or Linux installation.
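
As a rough sketch of how that could look from perl: the snippet below checks the candidate locations just mentioned and hands the first one it finds to the OS 'file' utility with -m (use this magic file) and -b (omit the filename from the output). The helper names first_readable and file_command are hypothetical, and I'm assuming a 'file' on the PATH that accepts those two options. For consistent results you would copy one magic file next to your script on every server and pass that path instead of whatever each system happens to have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Candidate locations from the comment above; which one exists (if any)
# varies by system.
my @CANDIDATES = qw(/usr/share/magic /usr/share/misc/magic /etc/magic);

# Return the first path that is readable, or undef if none is.
sub first_readable {
    my @paths = @_;
    for my $p (@paths) {
        return $p if -r $p;
    }
    return;
}

# Build the argument list for 'file': -m pins the magic file,
# -b suppresses the leading "filename:" in the output.
sub file_command {
    my ($magic, $target) = @_;
    return ('file', '-m', $magic, '-b', $target);
}

# Usage: perl pinned_file.pl somefile
if (@ARGV) {
    my $magic = first_readable(@CANDIDATES)
        or die "No magic file found in the candidate locations\n";
    # List-form pipe open keeps the filename away from the shell.
    open my $out, '-|', file_command($magic, $ARGV[0])
        or die "Cannot run file: $!";
    print while <$out>;
    close $out;
}
```

Passing the arguments as a list rather than interpolating $filename into backticks also avoids trouble with filenames containing spaces or shell metacharacters.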

I'll close with Biglug's sentiment "It's probably your best bet". Whatever its inadequacies, this approach really is the best known way to solve the general problem you laid out.

Happy Thanksgiving!

