Best way to check if an uploaded file is a PDF?


Is the PHP 5.3 function "finfo_file" the best way to check whether an uploaded PDF file is really a PDF file?

If yes, how should I use it to check whether a file is a PDF or DOCX file? I haven't used it before and learn best by seeing examples.

If no, what other method should I use?

Thanks a lot
how to check file types.. see example in below link
Loganathan NatarajanLAMP DeveloperCommented:
peps03Author Commented:
Thanks, but this doesn't answer my question. This is absolutely not the best way to check an uploaded file. Change a .jpg to .pdf and it will upload.
peps03Author Commented:
Ok, but is "finfo_file" the best way?

If yes, can anyone give me an example? i don't really get the explanation...
Mark BradyPrincipal Data EngineerCommented:
You should be able to check file type using $_FILES["file"]["type"]
peps03Author Commented:
@elvin66 yes, you can, but it's not foolproof.
Change a .jpg to .pdf and you can trick it if only PDF is allowed.

Does anyone have experience with the PHP function finfo_file?
Greetings peps03. There is a difference between checking an uploaded file's name extension (as in .pdf) and checking the file's "file header" for the bytes that indicate what type of file it may or may not be. If you look it up, the file header of a .pdf file always starts with the four ASCII characters "%PDF". So if you take the first 4 characters of the uploaded file's contents and they are equal to "%PDF", then it is not a JPEG image file and is likely a real PDF file. There can be other things in the file structure that keep it from being a "real pdf" (as you say), but I would think that checking for "%PDF" is enough for your test.
Ask questions if you need more info.
peps03Author Commented:
Thanks for your reaction Slick812!

So how do i check for "%PDF"?

Because $_FILES["file"]["type"] == "application/pdf" can be tricked by renaming an image.jpg to image.pdf.
Kinda short on time now, sorry, but the $_FILES array has the disk location of your upload in $_FILES['file']['tmp_name'],

so get the file into a string -
$file1 = file_get_contents($_FILES['file']['tmp_name']);

then use substr to get the first four characters

$header = substr($file1, 0, 4);

then test for whatever file header you need, %PDF in this case

if ($header === '%PDF') { $valid = true; } else { $valid = false; }

this is untested, but should give you the method
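
The same check can be done without loading the whole upload into memory. Below is a minimal sketch, assuming the form field is named "file" as in the posts above; the helper name startsWithPdfHeader is made up for illustration and is not part of PHP.

```php
<?php
// Hypothetical helper: returns true when the file at $path begins with "%PDF".
function startsWithPdfHeader($path)
{
    // Open in binary mode so no byte translation happens on any platform.
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    // Read only the first 4 bytes instead of the whole file.
    $header = fread($handle, 4);
    fclose($handle);

    // Strict comparison: an empty or short file can never match.
    return $header === '%PDF';
}
```

In an upload handler this would typically be called as startsWithPdfHeader($_FILES['file']['tmp_name']), after is_uploaded_file() has confirmed the path really is an upload. It only proves the file starts like a PDF, not that the rest of the file is well formed.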
peps03Author Commented:
Thanks Slick812.

This works well for PDF and RTF,
but not really for .doc and .docx.

Do you know how I should validate those?
OK, as a developer you will learn that nearly all programming info is available from web searches, and file header specs for each file type are widely used and easy to find. Since you ask, all I'm going to do is a web search for "doc file header" (or whatever the extension may be); I do not really remember this kind of thing.

I found the DOC header as nine characters (bytes) =
Hex: D0 CF 11 E0 A1 B1 1A E1 00

so that would be (with $file1 holding the full file contents) -
$header = substr($file1, 0, 9);
if ($header === chr(208).chr(207).chr(17).chr(224).chr(161).chr(177).chr(26).chr(225).chr(0)) { $doc = true; } else { $doc = false; }

I found the DOCX header as four characters (bytes) =
Hex: 50 4B 03 04

so that would be -
$header = substr($file1, 0, 4);
if ($header === 'PK'.chr(3).chr(4)) { $docx = true; } else { $docx = false; }

I found the RTF header as five characters (bytes) =
{\rtf (Hex: 7B 5C 72 74 66)

so that would be -
$header = substr($file1, 0, 5);
if ($header === '{\\rtf') { $rtf = true; } else { $rtf = false; }

You can also use a developer's tool, a hex editor, and just look at the first few bytes of several files to find the bytes they have in common.
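
The per-type checks above can be folded into one lookup table. This is only a sketch under a few assumptions: the helper name detectTypeByHeader is invented for illustration, the DOC entry uses the first eight bytes listed above (the OLE2 compound file signature, which other legacy Office formats such as .xls share), and the DOCX entry is the generic ZIP signature, so any ZIP archive will be reported as "docx" too.

```php
<?php
// Hypothetical helper: maps the leading bytes of $path to a short type label,
// or returns null when no known signature matches.
function detectTypeByHeader($path)
{
    // Signatures from the posts above, written as binary-safe strings.
    $signatures = array(
        'pdf'  => '%PDF',
        'doc'  => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", // OLE2 container
        'docx' => "PK\x03\x04",                       // any ZIP archive
        'rtf'  => '{\\rtf',
    );

    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $header = fread($handle, 8); // the longest signature is 8 bytes
    fclose($handle);
    if ($header === false) {
        return null;
    }

    // Compare only as many bytes as each signature has.
    foreach ($signatures as $type => $magic) {
        if (strncmp($header, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return null;
}
```

A caller would then compare the returned label against the types it accepts, for example in_array(detectTypeByHeader($tmpName), array('pdf', 'docx'), true), where $tmpName stands for the upload's tmp_name.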

Oh, I guess I should say that some file header IDs, like "%PDF", are standard across many program versions, while others are NOT so standard and may differ between versions. I know that Microsoft changed the Word .DOC file headers a lot between some versions. (Oh yeah, there are also several .DOC file types that are NOT Microsoft Word: MS WordPad used to write a DOC, and several non-MS programs had a DOC format. These do NOT use the MS Word .DOC file headers.)
So my info above is untested and may vary with versions and different programs. I do not believe that you can patent a file extension, so any program can use any extension it likes, and you may want to do some research if you find files that do not seem to match a header spec you come across.
peps03Author Commented:
@Slick812 thanks again.
you are right about searching the internet. i did. and found what you found but didn't know how to use it.
with the .doc format i got 4 squares as outcome. couldn't read it. didn't know i had to use: chr(17).....

@atique_ansari: thanks, but the linked page says: "This function has been deprecated as the PECL extension Fileinfo provides the same functionality (and more) in a much cleaner way."

They suggest to use: Fileinfo. but i don't know how to correctly use it. i tried though!

Can somebody show me how to validate a file using Fileinfo? i have php5.3
Mark BradyPrincipal Data EngineerCommented:
You should accept Slick812's post as the correct answer and open another question. I think this one is solved.
I really should not post any more, as this is getting out of context for this question.
@peps03 - in your post ID: 37837273 you say "i don't really get the explanation". That's because the whole finfo_file thing seems to be about unspecific "categories" that have something to do with the (to me) rather flimsy and flexible mime_type for a file, as used in DHTML browser display. I also "don't really get the explanation" of how this thing is meant to be used, especially its constants; what use these could be in the context of hard drive file info I just do not get. However, as a web page tester, with this function -

function getUrlMimeType($url) {
    // Fetch the content, then let Fileinfo guess the MIME type from the bytes.
    $buffer = file_get_contents($url);
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->buffer($buffer);
}

there may be some use for it.

I do not know or care much about file MIME types, but I do know about testing files against file header specs, from experience with that. You might find that mime_type is not a narrow enough test for your purposes.
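
For anyone who still wants to try Fileinfo on an uploaded file rather than a URL, here is a minimal sketch. It assumes the fileinfo extension is enabled; the helper names detectMimeType and isAllowedMimeType and the allow-list are invented for illustration. The MIME string returned for a given file depends on the magic database PHP was built with, so a .docx may come back as a generic ZIP type on some installs.

```php
<?php
// Hypothetical helper: returns the MIME type Fileinfo reports for $path,
// or null when the extension is missing or detection fails.
function detectMimeType($path)
{
    if (!class_exists('finfo')) {
        return null; // fileinfo extension not loaded
    }
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path); // inspects the file's content, not its name
    return $mime === false ? null : $mime;
}

// Hypothetical allow-list check; which types to accept is up to the caller.
function isAllowedMimeType($mime, array $allowed)
{
    return $mime !== null && in_array($mime, $allowed, true);
}
```

Usage would look like isAllowedMimeType(detectMimeType($_FILES['file']['tmp_name']), array('application/pdf')). Because this reads the file's bytes and ignores its name, a renamed .jpg is reported as an image rather than a PDF.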
peps03Author Commented:
Slick812, thanks for the help!

Works great now!