peps03


Best way to check if an uploaded file is a PDF?

Hi,

Is the PHP 5.3 function "finfo_file" the best way to check whether an uploaded PDF file is really a PDF file? (http://php.net/manual/en/function.finfo-file.php)

If yes, how should I use it to check whether a file is a PDF or DOCX file? I haven't used it before, and I learn best by seeing examples.

If no, what other method should I use?

Thanks a lot
Anuroopsundd

For how to check file types, see the example at the link below:

http://answers.yahoo.com/question/index?qid=20070208013050AAxzMEU
peps03

ASKER

Thanks, but this doesn't answer my question. This is absolutely not the best way to check an uploaded file. Change a .jpg to .pdf and it will upload.
peps03

ASKER

Ok, but is "finfo_file" the best way?

If yes, can anyone give me an example? I don't really get the php.net explanation...
You should be able to check file type using $_FILES["file"]["type"]
peps03

ASKER

@elvin66 yes, you can, but it's not foolproof.
Rename a .jpg to .pdf and you can trick it, even if only PDF is allowed.

Does anyone have experience with the PHP function finfo_file?
Greetings peps03. There is a difference between checking an uploaded file's name extension (as in .pdf) and checking the file's "file header" for the bytes that indicate what type of file it may or may not be. If you look it up, the file header of a .pdf file is always the four ASCII characters "%PDF". So if you take the first 4 characters of the uploaded file's contents and they equal "%PDF", then it is not a JPEG image file and is likely a real PDF file. Other things in the file structure could still keep it from being a "real pdf" (as you say), but I would think that checking for "%PDF" is enough for your test.
Ask questions if you need more info.
peps03

ASKER

Thanks for your reaction Slick812!

So how do I check for "%PDF"?

Because $_FILES["file"]["type"] == "application/pdf" can be tricked by renaming image.jpg to image.pdf.
Not much time right now, sorry, but the $_FILES array has the disk location of your upload in
$_FILES['file']['tmp_name']

so read the start of that file into a string (only the first four bytes are needed, so there is no reason to load the whole upload):
$header = file_get_contents($_FILES['file']['tmp_name'], false, null, 0, 4);

then test for whatever file header you need, %PDF in this case:

$valid = ($header === '%PDF');

This is untested, but should give you the method.
peps03

ASKER

Thanks Slick812.

This works well for PDF and RTF,
but not really for .doc and .docx.

Do you know how I should validate those?
ASKER CERTIFIED SOLUTION
Member_2_248744
Oh, I guess I should say that some file header IDs like "%PDF" are standard across many versions of a program, while others are NOT so standard and may differ between program versions. I know that Microsoft changed the Word .DOC file headers a lot between some versions. (Oh yeah, there are also several .DOC files that are NOT Microsoft Word: MS WordPad used to write a DOC, and several other non-MS programs had DOC; these do NOT use the MS .DOC file headers for Word.)
So my info above is untested and may vary with versions and different programs. I do not believe that you can patent a file extension, so any program can use any extension it likes. You may want to do some research if you find files that do not seem to match a header spec you come across.
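To make the header idea concrete, here is a minimal sketch (not the accepted solution, which is not shown in this thread) that compares the first bytes of a file against a few widely documented signatures. The function name detectBySignature and the type labels are made up for illustration. It is written without type declarations or short array syntax so it should also run on PHP 5.3. Note the limits: the OLE2 signature only says "some legacy Office-style container" and the ZIP signature only says "some ZIP archive", so neither proves the file is really a Word document.

```php
<?php
// Hypothetical helper: returns a coarse type label based on the leading
// bytes of the file, or null when no known signature matches.
function detectBySignature($path)
{
    // Well-known leading byte sequences ("magic numbers").
    $signatures = array(
        'pdf'  => '%PDF',                              // PDF
        'rtf'  => '{\\rtf',                            // Rich Text Format
        'ole2' => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",  // container used by legacy .doc
        'zip'  => "PK\x03\x04",                        // container used by .docx
    );

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $head = fread($handle, 8); // the longest signature above is 8 bytes
    fclose($handle);
    if ($head === false) {
        return null;
    }

    foreach ($signatures as $type => $magic) {
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return null;
}

// Usage with an upload (only runs when a file field named "file" was posted).
if (isset($_FILES['file']) && is_uploaded_file($_FILES['file']['tmp_name'])) {
    $type = detectBySignature($_FILES['file']['tmp_name']);
    $valid = in_array($type, array('pdf', 'ole2', 'zip'), true);
}
```

The non-printable bytes in the OLE2 signature are also why echoing a .doc header shows unreadable characters: they have to be compared as raw bytes (or built with chr()) rather than typed as text.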
peps03

ASKER

@Slick812 thanks again.
You are right about searching the internet. I did, and found what you found, but didn't know how to use it.
With the .doc format I got 4 squares as output and couldn't read it. I didn't know I had to use chr(17)...


@atique_ansari: thanks, but php.net says: "This function has been deprecated as the PECL extension Fileinfo provides the same functionality (and more) in a much cleaner way."

They suggest using Fileinfo, but I don't know how to use it correctly. I tried, though!

Can somebody show me how to validate a file using Fileinfo? I have PHP 5.3.
You should accept Slick812's answer as the correct one and open another question. I think this one is solved.
I really should not post any more; this is getting out of context for this question.
@peps03, in your post ID: 37837273 you say "i don't really get the php.net explanation". That's because the whole finfo_file thing seems to be about unspecific "categories" that have something to do with the (to me) rather flimsy and flexible mime_type for a file, as used in DHTML browser display. I also "don't really get the php.net explanation" of how to use this thing, especially the constants on http://www.php.net/manual/en/fileinfo.constants.php ; what use these could be in the context of hard drive file info I just do not get. However, as a web page tester, with this function:

function getUrlMimeType($url) {
    $buffer = file_get_contents($url);
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->buffer($buffer);
}

there may be some use for it.


   I do not know and do not care much about file mime types, but I do know about testing files for the file header specs from experience with that. You might consider that mime_type will not be a narrow enough test for your purposes.
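For the Fileinfo question asked earlier in the thread, a minimal sketch of using it on an upload could look like the following. The function name uploadHasAllowedMime and the form field name "file" are assumptions for illustration; finfo, FILEINFO_MIME_TYPE and finfo::file() are the real Fileinfo API. It inspects the file contents on disk, not the browser-supplied $_FILES['file']['type'].

```php
<?php
// Hypothetical helper: true when Fileinfo's content-based MIME type for
// $path is in the $allowed list.
function uploadHasAllowedMime($path, array $allowed)
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path); // e.g. "application/pdf", or false on failure
    return $mime !== false && in_array($mime, $allowed, true);
}

// Usage with an upload (only runs when a file field named "file" was posted).
if (isset($_FILES['file']) && is_uploaded_file($_FILES['file']['tmp_name'])) {
    // Only PDF is listed here on purpose: what Fileinfo reports for
    // .doc/.docx varies with the installed magic database, so check what
    // your own server returns before adding those types.
    $valid = uploadHasAllowedMime(
        $_FILES['file']['tmp_name'],
        array('application/pdf')
    );
}
```

As noted above, a MIME type can be a coarser answer than a header check, so the two can be combined: accept the file only when both agree.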
peps03

ASKER

Slick812, thanks for the help!

Works great now!