
Parsing binary data file in PHP

Posted on 2008-10-24
Last Modified: 2013-11-18
How come this doesn't work?

I am trying to extract the data from a TrueType font file. I am sure of the structure, but I don't know why it is not displaying the number. It should be in hex form, but obviously not a string. Am I doing something wrong?

This prints out a number just fine:

print 0x0000F8;

So what is the difference?

Thank you,

Shawn W
// Constant names must be quoted strings in define().
define('BYTE', 1);
define('CHAR', 1);
define('USHORT', 2);
define('SHORT', 2);
define('UINT24', 3);
define('ULONG', 4);
define('LONG', 4);
define('FIXED', 4);
define('FWORD', 2);
define('UFWORD', 2);
define('F2DOT14', 2);
 
//Offset Table
//Type      Name           Description
//--------  -------------  ---------------------------------------
//Fixed     sfnt version   0x00010000 for version 1.0.
//USHORT    numTables      Number of tables.
//USHORT    searchRange    (Maximum power of 2 <= numTables) x 16.
//USHORT    entrySelector  Log2(maximum power of 2 <= numTables).
//USHORT    rangeShift     NumTables x 16 - searchRange.
 
$numTables = 10;
 
// Get Offset Table
$filename = "fonts/arial.ttf";
if (is_readable($filename)) {
	print("Opened");
	$handle = fopen($filename, "rb");
	// Each fread() call returns a binary string holding the raw bytes.
	$sfnt_version = fread($handle, FIXED);
	$numTables = fread($handle, USHORT);
	$searchRange = fread($handle, USHORT);
	$entrySelector = fread($handle, USHORT);
	$rangeShift = fread($handle, USHORT);
 
	print $numTables;
 
	fclose($handle);
}
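
For comparison, here is a minimal sketch of how the five Offset Table reads above could be decoded in one unpack() call once the raw bytes are in hand ('N' is PHP's unsigned 32-bit big-endian code, 'n' the 16-bit one). The sample header is built with pack() so the snippet runs without a font file; its values (10 tables and the derived search fields) are illustrative, not read from arial.ttf.

```php
<?php
// Sketch: decode the 12-byte Offset Table with a single unpack() call.
// In real use $header would come from fread($handle, 12); here it is
// built with pack() from illustrative values.

// 'N' = unsigned 32-bit big-endian, 'n' = unsigned 16-bit big-endian.
// Each code is followed by the key name it gets in the result array.
$format = 'Nsfnt_version/nnumTables/nsearchRange/nentrySelector/nrangeShift';

// 10 tables: max power of 2 <= 10 is 8, so searchRange = 8 * 16 = 128,
// entrySelector = log2(8) = 3, rangeShift = 10 * 16 - 128 = 32.
$header = pack('Nnnnn', 0x00010000, 10, 128, 3, 32);

$offsetTable = unpack($format, $header);

print $offsetTable['numTables']; // 10
```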

Question by:SignsUS
 
LVL 13

Expert Comment

by:AielloJ
ID: 22796016
If you have "magic quotes" enabled, PHP will convert characters such as null bytes to their escaped representation, "\0". You can check your setting by calling the get_magic_quotes_runtime() function and change it with the set_magic_quotes_runtime() function.
 
LVL 1

Author Comment

by:SignsUS
ID: 22796081
I have revised the code a bit.

$sfnt_version contains 0x00010000

If I do a 'print 0x00010000', it displays 65536.

But 'print $sfnt_version' prints out some gobbledygook.
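
The difference can be seen by inspecting the string instead of printing it. This is a small sketch in which the four bytes for sfnt version 1.0 are written by hand rather than read from the file: `print 0x00010000` prints an integer literal, while fread() hands back a 4-byte string whose bytes are mostly non-printable.

```php
<?php
// Sketch: the four bytes fread($handle, 4) would return for version 1.0,
// hard-coded here instead of being read from a font file.
$sfnt_version = "\x00\x01\x00\x00";

// Printing the string sends the raw bytes to the browser, so inspect them:
echo strlen($sfnt_version), "\n";  // 4 (a 4-byte string, not an integer)
echo bin2hex($sfnt_version), "\n"; // 00010000
echo ord($sfnt_version[1]), "\n";  // 1 (numeric value of the second byte)
```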
// set_magic_quotes_runtime() was deprecated in PHP 5.3 and removed in later
// versions, so only call it where it still exists.
if (function_exists('set_magic_quotes_runtime')) {
	@set_magic_quotes_runtime(0);
}
 
// http://www.microsoft.com/typography/otspec/otff.htm
 
/**************************************************************************/
// Motorola-style byte ordering (Big Endian)
/**************************************************************************/
//Data          Type    Description
//------------  ------  ----------------------------------------------------
//BYTE          8-bit   unsigned integer.
//CHAR          8-bit   signed integer.
//USHORT        16-bit  unsigned integer.
//SHORT         16-bit  signed integer.
//UINT24        24-bit  unsigned integer.
//ULONG         32-bit  unsigned integer.
//LONG          32-bit  signed integer.
//Fixed         32-bit  signed fixed-point number (16.16).
//FUNIT                 Smallest measurable distance in the em space.
//FWORD         16-bit  signed integer (SHORT) that describes a quantity in FUnits.
//UFWORD        16-bit  unsigned integer (USHORT) that describes a quantity in FUnits.
//F2DOT14       16-bit  signed fixed number with the low 14 bits of fraction (2.14).
//LONGDATETIME          Date represented in number of seconds since 12:00 midnight,
//                      January 1, 1904, as a signed 64-bit integer.
//Tag                   Array of four uint8s (length = 32 bits) used to identify a
//                      script, language system, feature, or baseline.
//GlyphID               Glyph index number, same as uint16 (length = 16 bits).
//Offset                Offset to a table, same as uint16 (length = 16 bits),
//                      NULL offset = 0x0000.
/**************************************************************************/
 
// Constant names must be quoted strings in define().
define('BYTE', 1);
define('CHAR', 1);
define('USHORT', 2);
define('SHORT', 2);
define('UINT24', 3);
define('ULONG', 4);
define('LONG', 4);
define('FIXED', 4);
define('FWORD', 2);
define('UFWORD', 2);
define('F2DOT14', 2);
 
//Offset Table
//Type      Name           Description
//--------  -------------  ---------------------------------------
//Fixed     sfnt version   0x00010000 for version 1.0.
//USHORT    numTables      Number of tables.
//USHORT    searchRange    (Maximum power of 2 <= numTables) x 16.
//USHORT    entrySelector  Log2(maximum power of 2 <= numTables).
//USHORT    rangeShift     NumTables x 16 - searchRange.
 
// Get Offset Table
$filename = "fonts/arial.ttf";
if (is_readable($filename)) {
	$handle = fopen($filename, "rb");
	$sfnt_version = fread($handle, 4);     // 4 raw bytes (a string)
	
	$numTables = fread($handle, 2);        // 2 raw bytes (a string)
	$searchRange = fread($handle, USHORT);
	$entrySelector = fread($handle, USHORT);
	$rangeShift = fread($handle, USHORT);
	
	print $sfnt_version;                   // prints the raw bytes
 
	fclose($handle);
}
 
print '<br><br>';
print 0x00010000;                          // integer literal: prints 65536
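
The data-type table above maps onto unpack() format codes, but PHP has no big-endian code for the signed 16- and 32-bit types, so SHORT, LONG, Fixed and F2DOT14 need a small conversion step after reading. This is a minimal sketch assuming 64-bit PHP 7.1+ (for unpack()'s offset argument); the read* function names are hypothetical helpers, not part of PHP. FWORD and UFWORD would reuse readShort() and readUShort().

```php
<?php
// Sketch: decoders for the big-endian TrueType types, reading from a
// binary string at a byte offset. Function names are hypothetical.

function readUShort(string $data, int $offset): int
{
    return unpack('n', $data, $offset)[1];           // 16-bit unsigned
}

function readShort(string $data, int $offset): int
{
    $v = readUShort($data, $offset);
    return $v >= 0x8000 ? $v - 0x10000 : $v;         // two's complement
}

function readUInt24(string $data, int $offset): int
{
    // unpack() has no 24-bit code: prepend a zero byte and read as 'N'.
    return unpack('N', "\x00" . substr($data, $offset, 3))[1];
}

function readULong(string $data, int $offset): int
{
    return unpack('N', $data, $offset)[1];           // 32-bit unsigned
}

function readLong(string $data, int $offset): int
{
    $v = readULong($data, $offset);
    return $v >= 0x80000000 ? $v - 0x100000000 : $v; // two's complement
}

function readFixed(string $data, int $offset): float
{
    return readLong($data, $offset) / 65536;         // 16.16 fixed point
}

function readF2Dot14(string $data, int $offset): float
{
    return readShort($data, $offset) / 16384;        // 2.14 fixed point
}

// Usage on hand-built bytes: a SHORT, an F2DOT14 and a Fixed value.
$sample = "\xFF\xFE" . "\x40\x00" . "\x00\x01\x00\x00";
echo readShort($sample, 0), "\n";   // -2
echo readF2Dot14($sample, 2), "\n"; // 1
echo readFixed($sample, 4), "\n";   // 1
```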

 
LVL 13

Expert Comment

by:AielloJ
ID: 22796342
Based on the comments in your code, there's something you should try. Your comment states: Motorola-style byte ordering (Big Endian). Windows uses little endian. Depending on the type of machine that wrote the file, you may have to convert between endian types. The PHP manual at URL:

  http://www.php.net/manual/en/function.pack.php

states: "Also note that PHP internally stores integer values as signed values of a machine-dependent size."  Most minicomputers and mainframes had a minimum 16-bit storage scheme.  In your prior post you stated:

  $sfnt_version contains 0x00010000
  If I do a 'print 0x00010000' it displays 65536

A version number of 0x00010000 (65536) would probably never occur, but a version of 0x00000001 (1), obtained by switching the endianness, would be realistic.
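
To put numbers on the byte-order discussion, this sketch decodes the same four version bytes three ways: as an unsigned big-endian integer ('N'), as an unsigned little-endian integer ('V'), and as the 16.16 Fixed value that the Offset Table comment in the question describes ("0x00010000 for version 1.0"). The bytes are hard-coded for illustration.

```php
<?php
// Sketch: the first four bytes of a TrueType file, decoded three ways.
$bytes = "\x00\x01\x00\x00";

$bigEndian    = unpack('N', $bytes)[1]; // 65536 (0x00010000)
$littleEndian = unpack('V', $bytes)[1]; // 256   (0x00000100)

// Fixed (16.16): the high 16 bits are the integer part and the low
// 16 bits the fraction, so divide the big-endian value by 2^16.
$fixed = $bigEndian / 65536;

printf("%d %d %.1f\n", $bigEndian, $littleEndian, $fixed); // 65536 256 1.0
```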

JRA

 
LVL 1

Author Comment

by:SignsUS
ID: 22798145
Thanks for the advice, but I am not concerned with the version number. I am just trying to work with the byte information.

This seems to work:

hexdec(bin2hex($sfnt_version))

That spits out 65536.

That is the idea, but I don't want to have to do this conversion for EVERY single piece of data.
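
One way to avoid repeating the hexdec(bin2hex(...)) conversion for every field is to wrap the read and the conversion once. A minimal sketch, assuming PHP 7.1+; readField() and the FORMATS map are hypothetical names, and the demo reads from an in-memory php://memory stream instead of fonts/arial.ttf.

```php
<?php
// Sketch: one helper that reads a field of a given type and converts it.
// 'n' and 'N' are PHP's unsigned big-endian codes (16- and 32-bit).
const FORMATS = [
    'USHORT' => ['n', 2],
    'ULONG'  => ['N', 4],
    'FIXED'  => ['N', 4], // raw 16.16 value; divide by 65536 for the number
];

function readField($handle, string $type): int
{
    [$code, $size] = FORMATS[$type];
    $raw = fread($handle, $size);
    if ($raw === false || strlen($raw) !== $size) {
        throw new RuntimeException("Unexpected end of data reading $type");
    }
    return unpack($code, $raw)[1];
}

// Demo on an in-memory stream holding a version and a table count.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, pack('Nn', 0x00010000, 10));
rewind($handle);

$sfnt_version = readField($handle, 'FIXED');  // 65536
$numTables    = readField($handle, 'USHORT'); // 10
fclose($handle);
```

Because 'n' and 'N' always read big-endian order regardless of the host machine, this matches the Motorola-style byte order noted in the code comments.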

For example, I cannot seem to get something like this to work:

for ($i = 0; $i ^ $sfnt_version; $i++) {
 ...
}

In theory, this should loop 65536 times because of the XOR: once the binary digits match up, the condition becomes false. That is correct in theory, but it is not happening. I don't think it is even comparing the two properly.

Any advice would be great.

Thank you.

Shawn W
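
In the loop above, $sfnt_version is still the 4-byte string from fread(), so the XOR does not see it as the number 65536: PHP applies its numeric-string rules rather than reading the bytes as a big-endian integer (older versions treat a non-numeric string like this as 0, and newer ones warn or throw an error). A minimal sketch, with the bytes hard-coded for illustration, that converts first and then runs the same loop:

```php
<?php
// Sketch: turn the raw bytes into an integer, then run the XOR loop.
$sfnt_version = "\x00\x01\x00\x00";       // what fread($handle, 4) returns
$version = unpack('N', $sfnt_version)[1]; // 65536 as an int

$count = 0;
// $i ^ $version is non-zero (truthy) until $i equals $version.
for ($i = 0; $i ^ $version; $i++) {
    $count++;
}

echo $count, "\n"; // 65536
```

`$i != $version` expresses the same stopping condition more readably.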
 
LVL 1

Author Comment

by:SignsUS
ID: 22798154
I mean $i = 0; ...
 
LVL 1

Author Comment

by:SignsUS
ID: 22798157
..and $i++
 
LVL 13

Expert Comment

by:AielloJ
ID: 22798573
SignsUS:

You totally missed my point. A version number of 65536 is extremely unlikely. What is very likely is a version number of 1, which is what you would get if the byte order in your example were swapped. Your example of what seems to work still returns 65536 for the version number, which is where you started, so essentially nothing has changed. You're still getting a version number that doesn't make sense. You don't seriously think there have been 65536 versions of this font file, do you? I used the version number simply as an example of a single data element that makes the concept easy to explain.

The comment says: // Motorola-style byte ordering (Big Endian)

Most Windows-based (and other) systems are little endian, which produces exactly what you are seeing when they read big-endian files. There's no magic conversion when the file is read. Big-endian/little-endian issues, and word-size issues, will occur on every element of data you read from such files.
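
The byte-order effect can be seen directly with pack() and unpack() from the manual page linked earlier in the thread: 'n'/'N' always use big-endian order and 'v'/'V' always use little-endian order, independent of the machine. A small sketch with an arbitrary 16-bit value:

```php
<?php
// Sketch: one integer packed big-endian ('n') and little-endian ('v').
$big    = pack('n', 0x0102); // bytes 01 02 (Motorola / big-endian order)
$little = pack('v', 0x0102); // bytes 02 01 (x86 / little-endian order)

echo bin2hex($big), ' ', bin2hex($little), "\n"; // 0102 0201

// Reading big-endian bytes with a little-endian code gives the
// byte-swapped value; reversing the bytes with strrev() undoes it.
$wrong = unpack('v', $big)[1];         // 0x0201 = 513
$right = unpack('v', strrev($big))[1]; // 0x0102 = 258
```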
 
LVL 1

Author Comment

by:SignsUS
ID: 22798631
Hey, thank you so much for your help. Maybe I wasn't wording my question right. But this is basically the answer I was looking for:

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_21651537.html
 
LVL 13

Accepted Solution

by:AielloJ (earned 1500 total points)
ID: 22798701
That's why I included the reference to the web page below in my prior post. It refers to exactly the same thing as my prior posts, except that it doesn't use the common industry term, endian.
  http://www.php.net/manual/en/function.pack.php

Glad it's working for you.

JRA
 
LVL 1

Author Closing Comment

by:SignsUS
ID: 31509631
Thank you