Parsing binary data file in PHP

Posted on 2008-10-24
Last Modified: 2013-11-18
How come this doesn't work?

I am trying to extract data from a TrueType font file. I am sure of the structure, but I don't know why it is not displaying the number. It should be in hex form, but obviously not a string. Am I doing something wrong?

This prints out a number just fine:

print 0x0000F8;

So what is the difference?

Thank you,

Shawn W

define('BYTE', 1);
define('CHAR', 1);
define('USHORT', 2);
define('SHORT', 2);
define('UINT24', 3);
define('ULONG', 4);
define('LONG', 4);
define('FIXED', 4);
define('FWORD', 2);
define('UFWORD', 2);
define('F2DOT14', 2);

// Offset Table
// Type     Name           Description
// -------  -------------  ----------------------------------------
// Fixed    sfnt version   0x00010000 for version 1.0.
// USHORT   numTables      Number of tables.
// USHORT   searchRange    (Maximum power of 2 <= numTables) x 16.
// USHORT   entrySelector  Log2(maximum power of 2 <= numTables).
// USHORT   rangeShift     NumTables x 16 - searchRange.

$numTables = 10;

// Get Offset Table
$filename = "fonts/arial.ttf";

if (is_readable($filename)) {
	print("Opened");
	$handle = fopen($filename, "rb");
	$sfnt_version = fread($handle, FIXED);
	$numTables = fread($handle, USHORT);
	$searchRange = fread($handle, USHORT);
	$entrySelector = fread($handle, USHORT);
	$rangeShift = fread($handle, USHORT);

	print $numTables;

	fclose($handle);
}
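On the difference asked about: `print 0x0000F8;` prints a number because `0x0000F8` is an integer literal in the source code. `fread()`, by contrast, returns a string holding the raw bytes from the file, so `print $numTables` sends those bytes to the output as characters, most of them unprintable. A minimal sketch of the difference, using a literal string in place of the font file (the four bytes are assumed to be a version 1.0 `sfnt version` field):

```php
<?php
// Four raw bytes, as fread($handle, 4) would return them for sfnt version 1.0.
$raw = "\x00\x01\x00\x00";

var_dump(strlen($raw));   // int(4): a 4-byte string, not a number
var_dump(0x00010000);     // int(65536): an integer literal

// unpack('N', ...) decodes an unsigned 32-bit big-endian integer.
$value = unpack('N', $raw)[1];
var_dump($value);         // int(65536)
```

In the script itself, each USHORT field could be decoded the same way with `unpack('n', fread($handle, USHORT))[1]`; `n` and `N` are PHP's unsigned big-endian 16-bit and 32-bit format codes.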

Question by:SignsUS

Expert Comment

by:AielloJ
ID: 22796016
If you have "magic quotes" enabled, PHP will convert characters like NUL to their escaped representation, "\0". You can check your setting by calling get_magic_quotes_runtime() and change it using set_magic_quotes_runtime(). (Magic quotes were removed in PHP 5.4, so this applies only to older versions.)

Author Comment

by:SignsUS
ID: 22796081
I have revised the code a bit.

$sfnt_version contains 0x00010000

If I do a 'print 0x00010000' it displays 65536

But 'print $sfnt_version' prints out gobbledygook.

// set_magic_quotes_runtime() no longer exists in PHP 7+, so only call it where available.
if (function_exists('set_magic_quotes_runtime')) {
	@set_magic_quotes_runtime(0);
}

// http://www.microsoft.com/typography/otspec/otff.htm

/**************************************************************************/
// Motorola-style byte ordering (Big Endian)
/**************************************************************************/
// Data          Type     Description
// ------------  -------  --------------------------------------------------
// BYTE          8-bit    unsigned integer.
// CHAR          8-bit    signed integer.
// USHORT        16-bit   unsigned integer.
// SHORT         16-bit   signed integer.
// UINT24        24-bit   unsigned integer.
// ULONG         32-bit   unsigned integer.
// LONG          32-bit   signed integer.
// Fixed         32-bit   signed fixed-point number (16.16).
// FUNIT                  Smallest measurable distance in the em space.
// FWORD         16-bit   signed integer (SHORT) that describes a quantity in FUnits.
// UFWORD        16-bit   unsigned integer (USHORT) that describes a quantity in FUnits.
// F2DOT14       16-bit   signed fixed number with the low 14 bits of fraction (2.14).
// LONGDATETIME           Date represented in number of seconds since 12:00 midnight,
//                        January 1, 1904, as a signed 64-bit integer.
// Tag                    Array of four uint8s (length = 32 bits) used to identify a
//                        script, language system, feature, or baseline.
// GlyphID                Glyph index number, same as uint16 (length = 16 bits).
// Offset                 Offset to a table, same as uint16 (length = 16 bits);
//                        NULL offset = 0x0000.
/**************************************************************************/

define('BYTE', 1);
define('CHAR', 1);
define('USHORT', 2);
define('SHORT', 2);
define('UINT24', 3);
define('ULONG', 4);
define('LONG', 4);
define('FIXED', 4);
define('FWORD', 2);
define('UFWORD', 2);
define('F2DOT14', 2);

// Offset Table
// Type     Name           Description
// -------  -------------  ----------------------------------------
// Fixed    sfnt version   0x00010000 for version 1.0.
// USHORT   numTables      Number of tables.
// USHORT   searchRange    (Maximum power of 2 <= numTables) x 16.
// USHORT   entrySelector  Log2(maximum power of 2 <= numTables).
// USHORT   rangeShift     NumTables x 16 - searchRange.

// Get Offset Table
$filename = "fonts/arial.ttf";

if (is_readable($filename)) {
	$handle = fopen($filename, "rb");
	$sfnt_version = fread($handle, 4);
	$numTables = fread($handle, 2);
	$searchRange = fread($handle, USHORT);
	$entrySelector = fread($handle, USHORT);
	$rangeShift = fread($handle, USHORT);

	print $sfnt_version;

	fclose($handle);
}

print '<br><br>';
print 0x00010000;
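The revised script could read the whole Offset Table in one step: per the comment table it is 12 bytes long (one Fixed plus four USHORTs), and `unpack()` accepts a named, slash-separated format. A minimal sketch, assuming a helper called `readOffsetTable()` (a hypothetical name) and an in-memory stream in place of `fonts/arial.ttf`; the header values written to the stream are made up for the demo:

```php
<?php
// Read the 12-byte Offset Table from an open binary handle.
// 'N' = unsigned 32-bit big-endian, 'n' = unsigned 16-bit big-endian.
function readOffsetTable($handle)
{
    $bytes = fread($handle, 12);
    if ($bytes === false || strlen($bytes) !== 12) {
        return false; // file too short or read error
    }
    return unpack(
        'Nsfnt_version/nnumTables/nsearchRange/nentrySelector/nrangeShift',
        $bytes
    );
}

// Demo: invented header for a font with 16 tables, written big-endian.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, pack('Nnnnn', 0x00010000, 16, 256, 4, 0));
rewind($handle);

$table = readOffsetTable($handle);
fclose($handle);

printf("version=0x%08X numTables=%d\n", $table['sfnt_version'], $table['numTables']);
// prints "version=0x00010000 numTables=16"
```

Against the real file, `readOffsetTable(fopen($filename, "rb"))` would return the same array keys. `sfnt_version` comes back as the raw 32-bit integer; it is not converted from 16.16 fixed point.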

Expert Comment

by:AielloJ
ID: 22796342
Based on the comments in your code, there's something you should try. Your comment states: Motorola-style byte ordering (Big Endian). Windows uses little-endian byte order. Depending on the machine type that wrote the file, you may have to convert between byte orders. The PHP manual at

  http://www.php.net/manual/en/function.pack.php

states: "Also note that PHP internally stores integer values as signed values of a machine-dependent size."  Most minicomputers and mainframes had a minimum 16-bit storage scheme.  In your prior post you stated:

  $sfnt_version contains 0x00010000
  If I do a 'print 0x00010000' it displays 65536

A version number of 0x00010000 (65536) would probably never occur, but a version of 0x00000001 (1), obtained by switching the byte order, would be realistic.

JRA
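The byte-order question can be checked directly. PHP's `unpack()` has explicit big-endian (`N`, `n`) and little-endian (`V`, `v`) format codes that do not depend on the host machine, so the same four bytes can be decoded both ways. A minimal sketch, assuming the bytes `00 01 00 00` from the start of the font file:

```php
<?php
$raw = "\x00\x01\x00\x00"; // sfnt version bytes as stored in the file

$bigEndian    = unpack('N', $raw)[1]; // 0x00010000
$littleEndian = unpack('V', $raw)[1]; // 0x00000100

printf("N: %d, V: %d\n", $bigEndian, $littleEndian);
// prints "N: 65536, V: 256"
```

Since the TrueType format is big-endian, as the comment in the code says, `N` is the matching code on any host. Reading the same bytes little-endian gives 256, not 1.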

Author Comment

by:SignsUS
ID: 22798145
Thanks for the advice, but I am not concerned with the version number. I am just trying to work with the byte information.

This seems to work:

hexdec(bin2hex($sfnt_version))

Which spits out 65536.

That is the idea, but I don't want to have to do this conversion for EVERY single piece of data.

For example, I cannot seem to get something like this to work:

for ($i = 0; $i ^ $sfnt_version; $i++) {
 ...
}

In theory, this should loop 65536 times because of the XOR: once the binary digits match up, the result is 0 and the condition goes false. That is correct in theory, but it is not happening. I don't think it is even comparing the two properly.

Any advice would be great.

Thank you.

Shawn W
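On not wanting to repeat `hexdec(bin2hex(...))` for every field: one option is a small set of reader functions, one per data type from the table in the revised code, each wrapping `fread()` and `unpack()`. A sketch only; `readBytes()`, `readUShort()`, `readShort()` and `readULong()` are hypothetical names, and the demo bytes are invented:

```php
<?php
// Read exactly $length bytes or fail loudly.
function readBytes($handle, $length)
{
    $bytes = fread($handle, $length);
    if ($bytes === false || strlen($bytes) !== $length) {
        throw new RuntimeException("Unexpected end of file");
    }
    return $bytes;
}

function readUShort($handle) // USHORT / UFWORD: 16-bit unsigned, big-endian
{
    return unpack('n', readBytes($handle, 2))[1];
}

function readShort($handle) // SHORT / FWORD: 16-bit signed, big-endian
{
    $v = readUShort($handle);
    return $v >= 0x8000 ? $v - 0x10000 : $v; // two's-complement sign
}

function readULong($handle) // ULONG: 32-bit unsigned (may be negative on 32-bit PHP builds)
{
    return unpack('N', readBytes($handle, 4))[1];
}

// Demo on an in-memory stream.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, "\x00\x01\x00\x00" . "\x00\x10" . "\xFF\xFE");
rewind($handle);

$version   = readULong($handle);  // 65536
$numTables = readUShort($handle); // 16
$delta     = readShort($handle);  // -2
fclose($handle);
```

Each field then costs one call, and the byte order is handled in one place instead of at every read.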

Author Comment

by:SignsUS
ID: 22798154
I mean $i = 0; ...

Author Comment

by:SignsUS
ID: 22798157
... and $i++
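The loop misbehaves because `$sfnt_version` is still the raw 4-byte string. When `^` combines an integer with a string, PHP converts the string to a number from its text, not its bytes; on the PHP 5 versions of the time these bytes convert to 0, the condition is `0 ^ 0` (false), and the body never runs (newer PHP versions warn about, or reject, the non-numeric string instead). Converting the field to an integer first makes the loop behave as intended. A minimal sketch, with the raw bytes hard-coded:

```php
<?php
$sfnt_version = "\x00\x01\x00\x00";            // raw bytes, as returned by fread()
$version      = unpack('N', $sfnt_version)[1]; // 65536 as an integer

// $i ^ $version is non-zero (truthy) until $i equals $version.
$count = 0;
for ($i = 0; $i ^ $version; $i++) {
    $count++;
}
echo $count, "\n"; // 65536
```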

Expert Comment

by:AielloJ
ID: 22798573
SignsUS:

You totally missed my point. A version number of 65536 is extremely unlikely. What is very likely is a version number of 1, which is what you would get if the byte order in your example were swapped. Your example of what seems to work still returns 65536 for the version number, which is where you started, so essentially nothing has changed. You're still getting a version number that doesn't make sense. You don't seriously think there have been 65536 versions of this font file, do you? The version number was used as a simple example of a single data element to explain the concept.

The comment says: // Motorola-style byte ordering (Big Endian)

Most Windows-based (and other) systems are little-endian, which produces exactly what is happening when they read big-endian files. There's no automatic conversion when the file is read. Byte-order (endianness) and word-size issues will occur on every element of data you read from such files.
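For the version field itself: the type table in the revised code defines Fixed as a 32-bit signed 16.16 fixed-point number, and the Offset Table comment gives 0x00010000 for version 1.0. Read big-endian, 65536 is therefore the expected raw value, and dividing by 65536 (2^16) recovers 1.0. A sketch of that conversion; `fixedToFloat()` is a hypothetical helper name:

```php
<?php
// Fixed: 32-bit signed 16.16 fixed-point, stored big-endian.
function fixedToFloat($raw)
{
    $v = unpack('N', $raw)[1];
    if ($v >= 0x80000000) {   // reinterpret as a signed 32-bit value
        $v -= 0x100000000;
    }
    return $v / 65536;        // shift the 16 fraction bits out
}

echo fixedToFloat("\x00\x01\x00\x00"), "\n"; // 1
echo fixedToFloat("\x00\x01\x80\x00"), "\n"; // 1.5
```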

Author Comment

by:SignsUS
ID: 22798631
Hey, thank you so much for your help. Maybe I wasn't wording my question right. But this is basically the answer I was looking for:

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_21651537.html

Accepted Solution

by:AielloJ
ID: 22798701
That's why I included the reference to the webpage below in my prior post. It and my prior posts referred to exactly the same thing, except that it didn't use the common industry term for it, endianness.
  http://www.php.net/manual/en/function.pack.php

Glad it's working for you.

JRA

Author Closing Comment

by:SignsUS
ID: 31509631
Thank you