Solved

Best way to check if an uploaded file is a PDF?

Posted on 2012-04-12
Last Modified: 2012-06-27
Hi,

Is the PHP 5.3 function "finfo_file" the best way to check whether an uploaded PDF file is really a PDF file? (http://php.net/manual/en/function.finfo-file.php)

If yes, how should I use it to check whether a file is a PDF or DOCX file? I haven't used it before and I learn best by seeing examples.

If no, what other method should I use?

Thanks a lot
Question by:peps03
18 Comments
 
LVL 17

Expert Comment

by:Anuroopsundd
ID: 37837247
For how to check file types, see the example at the link below:

http://answers.yahoo.com/question/index?qid=20070208013050AAxzMEU
0
 
LVL 17

Expert Comment

by:xtermie
ID: 37837256
0
 
LVL 36

Expert Comment

by:Loganathan Natarajan
ID: 37837258
0
 

Author Comment

by:peps03
ID: 37837265
Thanks, but this doesn't answer my question. This is absolutely not the best way to check an uploaded file. Change a .jpg to .pdf and it will upload.
0
 

Author Comment

by:peps03
ID: 37837273
OK, but is "finfo_file" the best way?

If yes, can anyone give me an example? I don't really get the php.net explanation...
0
 
LVL 20

Expert Comment

by:Mark Brady
ID: 37838160
You should be able to check file type using $_FILES["file"]["type"]
0
 

Author Comment

by:peps03
ID: 37838339
@elvin66 Yes, you can, but it's not foolproof.
Rename a .jpg to .pdf and you can trick it when only PDF is allowed.

Does anyone have experience with the PHP function finfo_file?
0
 
LVL 33

Expert Comment

by:Slick812
ID: 37838669
Greetings peps03. There is a difference between checking an uploaded file's name extension (as in .pdf) and checking the file's "file header" for the bytes that indicate what type of file it is. The file header of a .pdf file always starts with the four ASCII characters "%PDF", so if you take the first 4 characters of the uploaded file's contents and they equal "%PDF", then it is not a JPEG image and is likely a real PDF. Other things in the file structure could still keep it from being a "real pdf" (as you say), but I would think that checking for "%PDF" is enough for your test.
Ask questions if you need more info.
0
 

Author Comment

by:peps03
ID: 37838844
Thanks for your reaction Slick812!

So how do i check for "%PDF"?

Because $_FILES["file"]["type"] == "application/pdf" can be tricked by renaming an image.jpg to image.pdf.
0

 
LVL 33

Expert Comment

by:Slick812
ID: 37839071
Kinda not much time now, sorry, but the $_FILES array has the disk location for your upload in -
$_FILES['file']['tmp_name']

so get this into a String -
$file1 = file_get_contents($_FILES['file']['tmp_name']);

then use substr to get first four char

$file1 = substr($file1,0,4);

then test for whatever file header you need,  %PDF   in this case

if ($file1 == '%PDF') { $valid = true;   } else {  $valid = false;  }

this is untested, but should give you the method
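
A minimal sketch of the same header test that reads only the bytes it compares instead of loading the whole upload into a string. The helper name fileStartsWith is hypothetical, and the form field name "file" is assumed from the comments above; treat it as an illustration, not a complete upload validator.

```php
<?php
// Hypothetical helper (not from the thread): returns true when the file
// at $path begins with the byte string $magic.
function fileStartsWith($path, $magic)
{
    if ($magic === '') {
        return false; // nothing to compare against
    }
    $handle = @fopen($path, 'rb'); // binary mode, warnings suppressed
    if ($handle === false) {
        return false; // an unreadable file is treated as invalid
    }
    // Read only as many bytes as the signature is long.
    $head = fread($handle, strlen($magic));
    fclose($handle);
    return $head === $magic;
}

// Assumes the upload field is named "file", as in the comments above.
if (isset($_FILES['file']) && is_uploaded_file($_FILES['file']['tmp_name'])) {
    $valid = fileStartsWith($_FILES['file']['tmp_name'], '%PDF');
}
```

Reading a handful of bytes keeps memory use flat even for large uploads, and is_uploaded_file() guards against being pointed at a path that did not come from the upload.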
0
 

Author Comment

by:peps03
ID: 37840351
Thanks Slick812.

This works well for PDF and RTF,
but not really for .doc and .docx.

Do you know how I should validate those?
0
 
LVL 33

Accepted Solution

by:
Slick812 earned 500 total points
ID: 37840990
OK, as a developer you will learn that most programming info is available from web searches, and file header specs for common file types are widely used and easy to find. I do not really remember this kind of thing, so all I am going to do is a web search for "doc file header", or whatever the extension may be.


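I found the DOC header as nine characters (bytes) =
Hex: D0 CF 11 E0 A1 B1 1A E1 00
(the first eight bytes are the OLE compound file signature, which other legacy Office formats such as .xls also start with)

so that would be, with $file1 holding the full file string from file_get_contents() -
$head = substr($file1,0,9);
if ($head === chr(208).chr(207).chr(17).chr(224).chr(161).chr(177).chr(26).chr(225).chr(0)) { $doc = true;   } else {  $doc = false;  }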


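I found the DOCX header as four characters (bytes) =
Hex: 50 4B 03 04
(this is the ZIP signature, so any ZIP-based file, such as .xlsx or a plain .zip, starts the same way)

so that would be, with $file1 holding the full file string from file_get_contents() -
$head = substr($file1,0,4);
if ($head === 'PK'.chr(3).chr(4)) { $docx = true;   } else {  $docx = false;  }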



I found the RTF header as five characters (bytes) =
{\rtf

so that would be, with $file1 holding the full file string from file_get_contents() -
$head = substr($file1,0,5);
if ($head === '{\\rtf') { $rtf = true;   } else {  $rtf = false;  }

You can also use a hex editor, a standard developer's tool, and look at the first few bytes of several files to find the set they have in common.
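
The header checks above can be gathered into one lookup table. This is a sketch under stated assumptions: the function name detectTypeBySignature is hypothetical, the byte strings are the ones quoted in this comment (with the DOC entry trimmed to its eight signature bytes), and a match only means the file starts like that format, not that it is a valid document.

```php
<?php
// Hypothetical helper (not from the thread): maps the leading bytes of a
// file to a short type label, or null when no known signature matches.
function detectTypeBySignature($path)
{
    // Signatures quoted in the comment above. 'doc' is the 8-byte OLE
    // compound file signature; 'docx' is the generic ZIP signature.
    $signatures = array(
        'pdf'  => '%PDF',
        'doc'  => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",
        'docx' => "PK\x03\x04",
        'rtf'  => '{\\rtf', // single-quoted: yields the 5 bytes {\rtf
    );

    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $head = fread($handle, 8); // the longest signature is 8 bytes
    fclose($handle);
    if ($head === false) {
        return null;
    }

    foreach ($signatures as $type => $magic) {
        // strncmp is binary safe; a header shorter than $magic never matches.
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return null;
}
```

Because 'docx' matches any ZIP archive and 'doc' matches any OLE compound file, the label is best read as "could be this type"; a stricter check would have to look inside the container.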
0
 
LVL 33

Expert Comment

by:Slick812
ID: 37841014
Oh, I guess I should say that some file header IDs, like "%PDF", are standard across many file and program versions, while others are not so standard and may differ between program versions. I know that Microsoft changed the Word .DOC file headers a lot between some versions. There are also several kinds of .DOC file that are not Microsoft Word: MS WordPad used to write a DOC, and several non-MS programs had a DOC format, and these do not use the MS .DOC file headers for Word.
So my info above is untested and may vary with versions and different programs. I do not believe that you can patent a file extension, so any program can use any extension it likes. You may want to do some research if you find files that do not seem to match a header spec you come across.
0
 
LVL 7

Expert Comment

by:Atique Ansari
ID: 37841377
0
 

Author Comment

by:peps03
ID: 37841505
@Slick812 thanks again.
You are right about searching the internet. I did, and found what you found, but didn't know how to use it.
With the .doc format I got 4 squares as output and couldn't read it. I didn't know I had to use chr(17) and so on.


@atique_ansari: thanks but:
php.net says:This function has been deprecated as the PECL extension Fileinfo provides the same functionality (and more) in a much cleaner way.

They suggest using Fileinfo, but I don't know how to use it correctly. I tried though!

Can somebody show me how to validate a file using Fileinfo? I have PHP 5.3.
0
 
LVL 20

Expert Comment

by:Mark Brady
ID: 37841968
You should accept Slick812's comment as the correct answer and open another question. I think this one is solved.
0
 
LVL 33

Expert Comment

by:Slick812
ID: 37843629
I really should not post any more, as this is getting out of context for this question.
@peps03 - in your post ID: 37837273, you say "i don't really get the php.net explanation". That's because the whole finfo_file thing seems to be about unspecific "categories" that have something to do with the (to me) rather flimsy and flexible mime_type for a file, as used in DHTML browser display. I also "don't really get the php.net explanation" of how to use this thing, especially the constants on http://www.php.net/manual/en/fileinfo.constants.php , and I just do not get what use these could be in the context of hard drive file info. However, as a web page tester with this function -

function getUrlMimeType($url) {
    $buffer = file_get_contents($url);
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->buffer($buffer);
}

there may be some use for it.


   I do not know and do not care much about file MIME types, but I do know about testing files against file header specs from experience. You might consider that mime_type will not be a narrow enough test for your purposes.
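
For the original question about Fileinfo, a minimal sketch of checking the uploaded temp file with the finfo class (the same class used in getUrlMimeType above) might look like the following. The helper name hasAllowedMimeType and the form field name "file" are assumptions, and the MIME strings listed for .doc and .docx are the registered types; what Fileinfo actually reports for those formats may depend on the installed magic database, so test with real files before relying on it.

```php
<?php
// Hypothetical helper (not from the thread): true when Fileinfo's guess
// for the file at $path is one of $allowedMimeTypes.
function hasAllowedMimeType($path, array $allowedMimeTypes)
{
    $finfo = new finfo(FILEINFO_MIME_TYPE); // report only the MIME type
    $mime = $finfo->file($path);            // inspects content, not the name
    return $mime !== false && in_array($mime, $allowedMimeTypes, true);
}

// Registered MIME types for PDF, DOC and DOCX. Assumption: the local
// magic database reports these; some may report a more generic type.
$allowed = array(
    'application/pdf',
    'application/msword',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
);

// Assumes the upload field is named "file", as elsewhere in this thread.
if (isset($_FILES['file']) && is_uploaded_file($_FILES['file']['tmp_name'])) {
    $valid = hasAllowedMimeType($_FILES['file']['tmp_name'], $allowed);
}
```

Unlike $_FILES["file"]["type"], which is supplied by the browser, this result comes from the file's contents, so renaming a .jpg to .pdf does not change it.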
0
 

Author Closing Comment

by:peps03
ID: 37850563
Slick812, thanks for the help!

Works great now!
0
