Best way to check if an uploaded file is a PDF?

Hi,

Is the PHP 5.3 function finfo_file the best way to check whether an uploaded PDF file is really a PDF file? (http://php.net/manual/en/function.finfo-file.php)

If yes, how should I use it to check whether a file is a PDF or DOCX file? I haven't used it before and learn best by seeing examples.

If no, what other method should I use?

Thanks a lot
Asked by: peps03
1 Solution
 
Anuroopsundd Commented:
How to check file types: see the example at the link below.

http://answers.yahoo.com/question/index?qid=20070208013050AAxzMEU
 
Loganathan Natarajan (LAMP Developer) Commented:
 
peps03 (Author) Commented:
Thanks, but this doesn't answer my question. This is absolutely not the best way to check an uploaded file: rename a .jpg to .pdf and it will upload.
 
peps03 (Author) Commented:
OK, but is finfo_file the best way?

If yes, can anyone give me an example? I don't really get the php.net explanation...
 
Mark Brady Commented:
You should be able to check the file type using $_FILES["file"]["type"].
 
peps03 (Author) Commented:
@elvin66 Yes, you can, but it's not foolproof.
Rename a .jpg to .pdf and you can trick it, even if only PDF is allowed.

Does anyone have experience with the PHP function finfo_file?
 
Slick812 Commented:
Greetings peps03. There is a difference between checking an uploaded file's name extension (as in .pdf) and checking the file's "file header" for the bytes that indicate what type of file it is. If you look it up, the file header of a .pdf file is always the four ASCII characters "%PDF". So if you take the first 4 characters of the uploaded file's contents and they are equal to "%PDF", then it is not a JPEG image file and is likely a real PDF file. Other things in the file structure could still keep it from being a "real pdf" (as you say), but I would think that checking for "%PDF" is enough for your test.
Ask questions if you need more info.
 
peps03 (Author) Commented:
Thanks for your reply, Slick812!

So how do I check for "%PDF"?

Because $_FILES["file"]["type"] == "application/pdf" can be tricked by renaming image.jpg to image.pdf.
 
Slick812 Commented:
Not much time right now, sorry, but the $_FILES array has the disk location of your upload in -
$_FILES['file']['tmp_name']

so read this into a string -
$file1 = file_get_contents($_FILES['file']['tmp_name']);

then use substr to get the first four characters -

$header = substr($file1, 0, 4);

then test for whatever file header you need, %PDF in this case -

$valid = ($header === '%PDF');

This is untested, but should give you the method.
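The same check can be done without loading the whole upload into memory. The following is a minimal sketch, not code from this thread: the function name startsWithPdfHeader is hypothetical, and it assumes that matching the first four bytes is an acceptable test, as described above.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the thread):
// returns true when the file at $path begins with the ASCII bytes "%PDF".
// Only the first four bytes are read, not the whole file.
function startsWithPdfHeader($path)
{
    $handle = @fopen($path, 'rb');   // binary-safe read mode
    if ($handle === false) {
        return false;                // missing or unreadable file
    }
    $header = fread($handle, 4);     // may be shorter than 4 bytes, or empty
    fclose($handle);

    return $header === '%PDF';
}
```

For an upload, the path would be $_FILES['file']['tmp_name'], and calling is_uploaded_file() on it first confirms that the path really came from an HTTP upload.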
 
peps03 (Author) Commented:
Thanks Slick812.

This works well for PDF and RTF,
but not really for .doc and .docx.

Do you know how I should validate them?
 
Slick812 Commented:
OK, as a developer you might learn that most programming info is available from web searches, and file type header specs are widely used and easy to find. Although you ask the question, all I'm going to do is a web search for "doc file header" or whatever the extension may be; I do not really remember this kind of thing.

In the tests below, $file1 is the full file contents from file_get_contents(), before any substr() call.

I found the DOC header as nine characters (bytes) =
Hex: D0 CF 11 E0 A1 B1 1A E1 00

so that would be -
$header = substr($file1, 0, 9);
$doc = ($header === chr(208).chr(207).chr(17).chr(224).chr(161).chr(177).chr(26).chr(225).chr(0));


I found the DOCX header as four characters (bytes) =
Hex: 50 4B 03 04

so that would be -
$header = substr($file1, 0, 4);
$docx = ($header === 'PK'.chr(3).chr(4));



I found the RTF header as five characters (bytes) =
{\rtf

so that would be -
$header = substr($file1, 0, 5);
$rtf = ($header === '{\\rtf');

You can also use a developer's tool, a hex editor, and just look at the first few bytes of several files to find a common set.
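The separate tests above can be folded into one lookup table. This is only a sketch under stated assumptions: the function name detectDocumentType and the type labels are hypothetical, and the DOC entry uses the eight-byte OLE compound file signature, that is, the bytes listed above without the trailing 00. Note the limits of the approach: "PK" 03 04 is the header of any ZIP archive (DOCX, XLSX and plain .zip files all match), and the OLE signature is shared by old .xls and .ppt files, so a match means "could be", not "is".

```php
<?php
// Hypothetical helper: map the leading bytes of a file to a coarse type
// label, or null when no known signature matches.
function detectDocumentType($bytes)
{
    $signatures = array(
        'pdf'  => '%PDF',
        'rtf'  => '{\\rtf',                            // the five bytes {\rtf
        'doc'  => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",  // OLE compound file
        'docx' => "PK\x03\x04",                        // any ZIP container
    );

    foreach ($signatures as $type => $magic) {
        // strncmp is binary safe; a too-short $bytes simply fails to match.
        if (strncmp($bytes, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }

    return null;
}
```

A caller would pass the first bytes of the upload, for example the result of file_get_contents($path, false, null, 0, 8), and reject the file when the result is null or not one of the accepted labels.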
 
Slick812 Commented:
Oh, I guess I should say that some file header IDs, like "%PDF", are standard across many program versions, while others are not so standard and may differ between versions. I know that Microsoft changed the Word .DOC file headers a lot between some versions. (Oh yeah, there are several .DOC files that are NOT Microsoft Word: MS WordPad used to write a DOC, and several other non-MS programs had DOC; these do NOT use the MS .DOC file headers for Word.)
So my info above is untested and may vary with versions and different programs. I do not believe that you can patent a file extension, so any program can use any extension it likes. You may want to do some research if you find files that do not seem to match a header spec you come across.
 
peps03 (Author) Commented:
@Slick812 Thanks again.
You are right about searching the internet. I did, and found what you found, but didn't know how to use it.
With the .doc format I got 4 squares as output and couldn't read it. I didn't know I had to use chr(17) and so on.


@atique_ansari: Thanks, but
php.net says: "This function has been deprecated as the PECL extension Fileinfo provides the same functionality (and more) in a much cleaner way."

They suggest using Fileinfo, but I don't know how to use it correctly. I tried, though!

Can somebody show me how to validate a file using Fileinfo? I have PHP 5.3.
 
Mark Brady Commented:
You should accept Slick812's answer as the correct one and open another question. I think this one is solved.
 
Slick812 Commented:
I really should not post any more; this is getting out of context for this question.
@peps03 - in your post ID: 37837273, you say "i don't really get the php.net explanation". That's because the whole finfo_file thing seems to be about unspecific "categories" that have something to do with the (to me) rather flimsy and flexible MIME type for a file, as used in DHTML browser display. I also "don't really get the php.net explanation" of how to use this thing, especially the constants on http://www.php.net/manual/en/fileinfo.constants.php ; what use these could be in the context of hard drive file info I just do not get. However, as a web page tester, with this function -

function getUrlMimeType($url) {
    $buffer = file_get_contents($url);
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->buffer($buffer);
}

there may be some use for it.


I do not know and do not care much about file MIME types, but I do know about testing files against file header specs from experience. You might consider that the MIME type will not be a narrow enough test for your purposes.
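For the Fileinfo question asked earlier in the thread, a minimal sketch might look like the following. It is an illustration, not code from this thread: the function name hasAllowedMimeType is hypothetical, and the exact MIME string reported for a given file depends on the magic database installed on the server, so the allow-list values are assumptions to verify on your own system.

```php
<?php
// Hypothetical helper: true when Fileinfo reports a MIME type that is on
// the caller's allow-list. The type is derived from the file contents,
// not from the file name or from $_FILES['file']['type'].
function hasAllowedMimeType($path, array $allowed)
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = @$finfo->file($path);    // e.g. "application/pdf"; false on failure

    return $mime !== false && in_array($mime, $allowed, true);
}
```

A call for PDF uploads would be hasAllowedMimeType($_FILES['file']['tmp_name'], array('application/pdf')). Because a DOCX file is a ZIP container, some Fileinfo versions may report it as application/zip, which is one reason to combine this with a file header test.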
 
peps03 (Author) Commented:
Slick812, thanks for the help!

Works great now!