Solved

Best way to check if an uploaded file is a PDF?

Posted on 2012-04-12
Last Modified: 2012-06-27
Hi,

Is the PHP 5.3 function "finfo_file" the best way to check whether an uploaded PDF file is really a PDF file? (http://php.net/manual/en/function.finfo-file.php)

If yes, how should I use it to check whether a file is a PDF or DOCX file? I haven't used it before, and I learn best from examples.

If no, what other method should I use?

Thanks a lot
Question by:peps03
18 Comments
 
LVL 17

Expert Comment

by:Anuroopsundd
ID: 37837247
For how to check file types, see the example at the link below:

http://answers.yahoo.com/question/index?qid=20070208013050AAxzMEU
 
LVL 18

Expert Comment

by:xtermie
ID: 37837256
 
LVL 36

Expert Comment

by:Loganathan Natarajan
ID: 37837258

 

Author Comment

by:peps03
ID: 37837265
Thanks, but this doesn't answer my question. This is absolutely not the best way to check an uploaded file. Change a .jpg to .pdf and it will upload.
 

Author Comment

by:peps03
ID: 37837273
OK, but is "finfo_file" the best way?

If yes, can anyone give me an example? I don't really get the php.net explanation...
 
LVL 20

Expert Comment

by:Mark Brady
ID: 37838160
You should be able to check the file type using $_FILES["file"]["type"].
 

Author Comment

by:peps03
ID: 37838339
@elvin66 Yes, you can, but it's not foolproof.
Rename a .jpg to .pdf and you can trick it when only PDF is allowed.

Does anyone have experience with the PHP function finfo_file?
 
LVL 34

Expert Comment

by:Slick812
ID: 37838669
Greetings peps03. There is a difference between checking an uploaded file's name extension (as in .pdf) and checking the file's "file header" for the bytes that indicate what type of file it is. If you look it up, the file header of a .pdf file always starts with the four ASCII characters "%PDF". So if you take the first 4 characters of the uploaded file's contents and they equal "%PDF", then it is not a JPEG image file and is likely a real PDF file. Other things in the file structure can still keep it from being a "real pdf" (as you say), but I would think that checking for "%PDF" is enough for your test.
Ask questions if you need more info.
 

Author Comment

by:peps03
ID: 37838844
Thanks for your reaction Slick812!

So how do I check for "%PDF"?

Because $_FILES["file"]["type"] == "application/pdf" can be tricked by renaming image.jpg to image.pdf.
 
LVL 34

Expert Comment

by:Slick812
ID: 37839071
I don't have much time right now, sorry, but the $_FILES array holds the disk location of your upload in -
$_FILES['file']['tmp_name']

so read that file into a string -
$file1 = file_get_contents($_FILES['file']['tmp_name']);

then use substr to get the first four characters -

$header = substr($file1, 0, 4);

then test for whatever file header you need, %PDF in this case -

$valid = ($header === '%PDF');

This is untested, but it should show you the method.
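
The same check can be done without loading the whole upload into memory. The sketch below is one possible way to do it, not a tested production routine: the helper name fileStartsWith and the form field name 'file' are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns true when the file at $path begins with $signature.
function fileStartsWith($path, $signature)
{
    if ($signature === '') {
        return false; // nothing to compare against
    }

    $handle = @fopen($path, 'rb'); // binary mode, so bytes are read unchanged
    if ($handle === false) {
        return false; // unreadable file: treat as "no match"
    }

    // Read only as many bytes as the signature is long.
    $header = fread($handle, strlen($signature));
    fclose($handle);

    return $header === $signature;
}

// Usage sketch: 'file' is the assumed name of the upload form field.
if (isset($_FILES['file']) && is_uploaded_file($_FILES['file']['tmp_name'])) {
    $valid = fileStartsWith($_FILES['file']['tmp_name'], '%PDF');
}
```

Reading a handful of bytes with fread keeps memory use constant even for large uploads, which file_get_contents does not.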
 

Author Comment

by:peps03
ID: 37840351
Thanks Slick812.

This works well for PDF and RTF,
but not really for .doc and .docx.

Do you know how I should validate them?
 
LVL 34

Accepted Solution

by:
Slick812 earned 2000 total points
ID: 37840990
OK. As a developer you will learn that most programming info is available through web searches, and file header specs for file types are widely used and easy to find. Since you asked, all I'm going to do is a web search for "doc file header" (or whatever the extension may be); I do not really remember this kind of thing.

In each test below, $file1 holds the full file contents from file_get_contents, not the four characters cut out earlier.


I found the DOC file header as nine characters (bytes) =
Hex: D0 CF 11 E0 A1 B1 1A E1 00

so that would be -
$header = substr($file1, 0, 9);
$doc = ($header === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1\x00");


I found the DOCX file header as four characters (bytes) =
Hex: 50 4B 03 04

so that would be -
$header = substr($file1, 0, 4);
$docx = ($header === "PK\x03\x04");



I found the RTF file header as five characters (bytes) =
{\rtf

so that would be -
$header = substr($file1, 0, 5);
$rtf = ($header === '{\\rtf');

You can also use a developer's tool, a hex editor, and just look at the first few bytes of several files to find the set they have in common.
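
The separate tests above can be folded into one lookup table. This is only a sketch: the function name detectBySignature and the type labels are made up for illustration. Two caveats apply. 50 4B 03 04 is the generic ZIP signature, so it matches any ZIP-based file (.xlsx, .zip), not only DOCX. D0 CF 11 E0 A1 B1 1A E1 is the generic OLE2 container signature shared with older .xls and .ppt files; the sketch compares only those eight bytes and leaves out the trailing 00.

```php
<?php
// Hypothetical helper: maps the first bytes of a file to a coarse type label,
// or returns null when no known signature matches.
function detectBySignature($path)
{
    $signatures = array(
        'doc'  => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", // OLE2 container (also old .xls, .ppt)
        'rtf'  => '{\\rtf',                           // the five characters {\rtf
        'pdf'  => '%PDF',
        'docx' => "PK\x03\x04",                       // ZIP container (also .xlsx, .zip)
    );

    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null; // unreadable file
    }
    $header = fread($handle, 8); // the longest signature above is 8 bytes
    fclose($handle);
    if ($header === false) {
        return null;
    }

    foreach ($signatures as $type => $signature) {
        // Compare only the leading bytes of the header with each signature.
        if (strncmp($header, $signature, strlen($signature)) === 0) {
            return $type;
        }
    }

    return null;
}
```

A result of 'docx' or 'doc' therefore means "ZIP container" or "OLE2 container"; telling a Word file from a spreadsheet would need a look inside the container.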
 
LVL 34

Expert Comment

by:Slick812
ID: 37841014
Oh, I guess I should say that some file header IDs, like "%PDF", are standard across many versions of a file format, while others are NOT so standard and may differ between program versions. I know that Microsoft changed the Word .DOC file headers a lot between some versions. (Oh yeah, several .DOC files are NOT Microsoft Word files: MS WordPad used to write a DOC, and several other non-MS programs had a DOC format. These do NOT use the MS .DOC file headers for Word.)
So my info above is untested and may vary with versions and different programs. I do not believe that you can patent a file extension, so any program can use any extension it likes. You may want to do some research if you find files that do not seem to match a header spec you come across.
 

Author Comment

by:peps03
ID: 37841505
@Slick812 Thanks again.
You are right about searching the internet. I did, and I found what you found, but I didn't know how to use it.
With the .doc format I got four squares as output and couldn't read them. I didn't know I had to use chr(17) and so on.


@atique_ansari: Thanks, but
php.net says: "This function has been deprecated as the PECL extension Fileinfo provides the same functionality (and more) in a much cleaner way."

They suggest using Fileinfo, but I don't know how to use it correctly. I tried, though!

Can somebody show me how to validate a file using Fileinfo? I have PHP 5.3.
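
For reference, a minimal Fileinfo sketch might look like the following. It assumes the Fileinfo extension is enabled (it ships with PHP from 5.3 on). The helper names detectMimeType and isAllowedUpload, the form field name 'file', and the list of allowed MIME types are illustrative assumptions. In particular, the string reported for .docx depends on the server's magic database and may be a generic type such as application/zip, so check the values on the target server.

```php
<?php
// Hypothetical helper: returns the MIME type Fileinfo detects from the file's
// contents (not its name), or null when detection fails.
function detectMimeType($path)
{
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    if ($finfo === false) {
        return null; // magic database could not be loaded
    }

    $mime = finfo_file($finfo, $path);
    // No explicit finfo_close(): the handle is released when it goes out of scope.

    return $mime === false ? null : $mime;
}

// Hypothetical helper: true when the detected MIME type is in $allowed.
function isAllowedUpload($path, array $allowed)
{
    return in_array(detectMimeType($path), $allowed, true);
}

// Assumed whitelist; verify the exact strings your server reports.
$allowed = array(
    'application/pdf',
    'application/msword',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
);

// Usage sketch: 'file' is the assumed name of the upload form field.
if (isset($_FILES['file']) && is_uploaded_file($_FILES['file']['tmp_name'])) {
    $valid = isAllowedUpload($_FILES['file']['tmp_name'], $allowed);
}
```

Unlike $_FILES['file']['type'], which is sent by the browser, this value is derived from the bytes of the uploaded file itself.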
 
LVL 20

Expert Comment

by:Mark Brady
ID: 37841968
You should accept Slick812's answer as the correct one and open another question. I think this one is solved.
 
LVL 34

Expert Comment

by:Slick812
ID: 37843629
I really should not post any more; this is getting out of context for this question.
@peps03 - in your post ID: 37837273 you say "i don't really get the php.net explanation". That's because the whole finfo_file thing seems to be about unspecific "categories" that have something to do with the (to me) rather flimsy and flexible MIME type of a file, as used for display in browsers. I also "don't really get the php.net explanation" of how to use this thing, especially the constants on http://www.php.net/manual/en/fileinfo.constants.php. I just do not get what use these could be in the context of file info on a hard drive. However, as a web page tester, with this function -

function getUrlMimeType($url) {
    $buffer = file_get_contents($url);
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->buffer($buffer);
}

there may be some use for it.


I do not know or care much about file MIME types, but I do know about testing files against file header specs, from experience. You might consider that a MIME type will not be a narrow enough test for your purposes.
 

Author Closing Comment

by:peps03
ID: 37850563
Slick812, thanks for the help!

Works great now!


Question has a verified solution.
