SignsUS

Parsing binary data file in PHP

How come this doesn't work?

I am trying to extract the data from the TrueType font file.  I am sure of the structure but I don't know why it is not displaying the number.  It should be in hex form but obviously not a string.  Am I doing something wrong?

This prints out a number just fine:

print 0x0000F8;

So what is the difference?

Thank you,

Shawn W
// Size in bytes of each TrueType data type (constant names must be quoted).
define('BYTE', 1);
define('CHAR', 1);
define('USHORT', 2);
define('SHORT', 2);
define('UINT24', 3);
define('ULONG', 4);
define('LONG', 4);
define('FIXED', 4);
define('FWORD', 2);
define('UFWORD', 2);
define('F2DOT14', 2);
 
//Offset Table
//Type     Name           Description
//-------  -------------  ---------------------------------------
//Fixed    sfnt version   0x00010000 for version 1.0.
//USHORT   numTables      Number of tables.
//USHORT   searchRange    (Maximum power of 2 <= numTables) x 16.
//USHORT   entrySelector  Log2(maximum power of 2 <= numTables).
//USHORT   rangeShift     NumTables x 16 - searchRange.
 
$numTables = 10;
 
// Get Offset Table
$filename = "fonts/arial.ttf";
if(is_readable($filename)) {
	print("Opened");
	$handle = fopen($filename, "rb");
	$sfnt_version = fread($handle, FIXED);
	$numTables = fread($handle, USHORT);
	$searchRange = fread($handle, USHORT);
	$entrySelector = fread($handle, USHORT);
	$rangeShift = fread($handle, USHORT);
 
	print $numTables;
 
	fclose($handle);
}


AielloJ

If you have "magic quotes" enabled PHP will convert characters like null to their "escaped" representation of "\0".  You can check your setting by executing the get_magic_quotes_runtime() function and change them using the set_magic_quotes_runtime() function.
SignsUS

ASKER

I have revised the code a bit.

$sfnt_version contains 0x00010000

If I do a 'print 0x00010000' it displays 65536

But 'print $sfnt_version' prints out some gobbledygook
set_magic_quotes_runtime(0);
 
// http://www.microsoft.com/typography/otspec/otff.htm
 
/**************************************************************************/
// Motorola-style byte ordering (Big Endian)
/**************************************************************************/
//Data type     Size     Description
//------------  -------  --------------------------------------------------------------------------
//BYTE          8-bit    unsigned integer.
//CHAR          8-bit    signed integer.
//USHORT        16-bit   unsigned integer.
//SHORT         16-bit   signed integer.
//UINT24        24-bit   unsigned integer.
//ULONG         32-bit   unsigned integer.
//LONG          32-bit   signed integer.
//Fixed         32-bit   signed fixed-point number (16.16)
//FUNIT                  Smallest measurable distance in the em space.
//FWORD         16-bit   signed integer (SHORT) that describes a quantity in FUnits.
//UFWORD        16-bit   unsigned integer (USHORT) that describes a quantity in FUnits.
//F2DOT14       16-bit   signed fixed number with the low 14 bits of fraction (2.14).
//LONGDATETIME           Date represented in number of seconds since 12:00 midnight, January 1, 1904. The value is represented as a signed 64-bit integer.
//Tag                    Array of four uint8s (length = 32 bits) used to identify a script, language system, feature, or baseline
//GlyphID                Glyph index number, same as uint16 (length = 16 bits)
//Offset                 Offset to a table, same as uint16 (length = 16 bits), NULL offset = 0x0000
/**************************************************************************/
 
// Size in bytes of each TrueType data type (constant names must be quoted).
define('BYTE', 1);
define('CHAR', 1);
define('USHORT', 2);
define('SHORT', 2);
define('UINT24', 3);
define('ULONG', 4);
define('LONG', 4);
define('FIXED', 4);
define('FWORD', 2);
define('UFWORD', 2);
define('F2DOT14', 2);
 
//Offset Table
//Type     Name           Description
//-------  -------------  ---------------------------------------
//Fixed    sfnt version   0x00010000 for version 1.0.
//USHORT   numTables      Number of tables.
//USHORT   searchRange    (Maximum power of 2 <= numTables) x 16.
//USHORT   entrySelector  Log2(maximum power of 2 <= numTables).
//USHORT   rangeShift     NumTables x 16 - searchRange.
 
// Get Offset Table
$filename = "fonts/arial.ttf";
if(is_readable($filename)) {
	$handle = fopen($filename, "rb");
	$sfnt_version = fread($handle, 4);
	
	$numTables = fread($handle, 2);
	$searchRange = fread($handle, USHORT);
	$entrySelector = fread($handle, USHORT);
	$rangeShift = fread($handle, USHORT);
	
	print $sfnt_version;
 
	fclose($handle);
}
 
print '<br><br>';
print 0x00010000;
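
The difference between the two print calls can be sketched as follows. fread() returns a string of raw bytes, while 0x00010000 is an integer literal, so the bytes have to be decoded before they print as a number. This is only a minimal sketch: the four bytes are hard-coded here as an assumption, standing in for what fread($handle, 4) would return for a version 1.0 font.

```php
<?php
// Four raw bytes, hard-coded as a stand-in for fread($handle, 4).
$raw = "\x00\x01\x00\x00";

var_dump(is_string($raw));   // fread() hands back a string, not an integer
echo bin2hex($raw), "\n";    // hex view of the bytes: 00010000

// 'N' = unsigned 32-bit, big-endian. unpack() returns a 1-indexed array.
$value = unpack('N', $raw)[1];
echo $value, "\n";           // 65536, the same number as print 0x00010000
```

Printing $raw directly sends the four bytes themselves to the output, which is why the browser shows unreadable characters.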


Based on the comments in your code, there's something you should try.  Your comment states: Motorola-style byte ordering (Big Endian).  Windows on x86 hardware is little endian.  Depending on the type of machine that wrote the file, you may have to convert between byte orders.  The PHP manual at URL:

  http://www.php.net/manual/en/function.pack.php

states: "Also note that PHP internally stores integer values as signed values of a machine-dependent size."  Most minicomputers and mainframes had a minimum 16-bit storage scheme.  In your prior post you stated:

  $sfnt_version contains 0x00010000
  If I do a 'print 0x00010000' it displays 65536

A version number of 0x00010000 (65536) would probably never occur, but a version of 0x00000001 (1), obtained by switching the endianness, would be realistic.

JRA
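
Decoding the same four bytes in both byte orders may make the point concrete. The sketch below assumes the bytes 00 01 00 00 that the asker reports, hard-coded rather than read from the font. Note that the little-endian reading of these bytes comes out as 256, and that the code comment in the question describes 0x00010000 as version 1.0, which matches reading the field as a 16.16 fixed-point number. The sign of the Fixed type is ignored here for simplicity.

```php
<?php
// Assumed input: the four sfnt version bytes reported in the thread.
$raw = "\x00\x01\x00\x00";

$bigEndian    = unpack('N', $raw)[1];  // 'N' = 32-bit big-endian    -> 65536
$littleEndian = unpack('V', $raw)[1];  // 'V' = 32-bit little-endian -> 256

// Fixed is 16.16: the high 16 bits are the integer part, the low 16 bits
// are the fraction, so dividing by 2^16 yields the version number.
$version = $bigEndian / 65536.0;

printf("%d %d %.1f\n", $bigEndian, $littleEndian, $version);  // 65536 256 1.0
```

The format codes carry the byte order explicitly, so the result does not depend on the machine running the script.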
SignsUS

ASKER

Thanks for the advice but I am not concerned with the version number.  I am just trying to work with the byte information.

This seems to work:

hexdec(bin2hex($sfnt_version))

Which spits out 65536.

That is the idea but I don't want to have to do this conversion for EVERY single piece of data.

Like I cannot seem to get something like this to work.

for($i, $i ^ $sfnt_version, $++) {
 ...
}

In theory, this should loop 65536 times because of the XOR.  Once the binary digits match up, it should all go false.  Which is correct in theory but this is not happening.  I don't think it is even comparing the two properly.

Any advice would be great.

Thank you.

Shawn W
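
One way to avoid converting every field by hand is to read the whole Offset Table once and decode it with a single unpack() call using named fields. This is a sketch under stated assumptions: an in-memory stream with synthetic values stands in for fonts/arial.ttf, and the array name $offsetTable is made up for illustration.

```php
<?php
// Stand-in for fopen("fonts/arial.ttf", "rb"): an in-memory stream holding a
// synthetic 12-byte Offset Table (illustrative values, numTables = 10).
$handle = fopen('php://memory', 'w+b');
fwrite($handle, pack('Nnnnn', 0x00010000, 10, 128, 3, 32));
rewind($handle);

// Read the table once, then decode every field in one unpack() call.
// N = unsigned 32-bit big-endian, n = unsigned 16-bit big-endian.
$data = fread($handle, 12);
$offsetTable = unpack(
    'Nsfnt_version/nnumTables/nsearchRange/nentrySelector/nrangeShift',
    $data
);
fclose($handle);

print $offsetTable['numTables'];  // 10
```

Each name in the format string becomes a key in the returned array, so the values arrive as ordinary integers that can be used in arithmetic and comparisons.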
SignsUS

ASKER

I mean $i = 0; ...
SignsUS

ASKER

..and $i++
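
With both corrections applied, the loop behaves as described once the raw bytes have been decoded to an integer. A minimal sketch, assuming the same four hard-coded bytes as a stand-in for fread(); the variable names $limit and $count are illustrative only.

```php
<?php
// Assumed input: the four sfnt version bytes, hard-coded instead of fread().
$raw = "\x00\x01\x00\x00";

// Decode first: XOR needs an integer on both sides, not a byte string.
$limit = unpack('N', $raw)[1];  // 65536

$count = 0;
for ($i = 0; $i ^ $limit; $i++) {
    $count++;                   // runs while $i and $limit differ in any bit
}
echo $count, "\n";              // 65536
```

The XOR is zero only when $i equals $limit, so the condition is equivalent to the more conventional $i != $limit.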
SignsUS:

You totally missed my point.  A version number of 65536 is extremely unlikely.  What is very likely is a version number of 1 which is what you would get if the endian values in your example were swapped.  Your example of what seems to work is still returning a value of 65536 for the version number, which is where you started, so essentially nothing has changed.  You're still getting a version number that doesn't make sense.  You don't seriously think there have been 65536 versions of this font file do you?  The version number was used as a simple example of a single data element that makes sense to explain the concept.

The comment says: // Motorola-style byte ordering (Big Endian)

Most Windows-based systems, and many others, are little endian, which produces exactly what you are seeing when big-endian files are read as-is.  There's no magic conversion when the file is read.  Big endian / little endian issues, and word size issues, will occur on every element of data you read from such files.
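
Word size and sign can be handled per type in small helper functions. The sketch below assumes big-endian input as in the TrueType data types listed earlier; the helper names readShort and readF2Dot14 are hypothetical and not from the thread. unpack()'s 'n' code is unsigned, so the sign of a 16-bit value is restored by hand.

```php
<?php
// Hypothetical helper: decode a big-endian signed 16-bit SHORT / FWORD.
function readShort(string $bytes): int
{
    $unsigned = unpack('n', $bytes)[1];  // 0..65535
    // Values with the top bit set are negative in two's complement.
    return $unsigned >= 0x8000 ? $unsigned - 0x10000 : $unsigned;
}

// Hypothetical helper: decode an F2DOT14 (2.14 signed fixed-point) value.
function readF2Dot14(string $bytes): float
{
    return readShort($bytes) / 16384.0;  // low 14 bits are the fraction
}

echo readShort("\xFF\xFE"), "\n";    // -2
echo readF2Dot14("\x70\x00"), "\n";  // 1.75
```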
SignsUS

ASKER

Hey, thank you so much for your help. Maybe I wasn't wording my question right. But this is basically the answer I was looking for:

https://www.experts-exchange.com/questions/21651537/Problem-reading-in-data-from-file.html
ASKER CERTIFIED SOLUTION
AielloJ

SignsUS

ASKER

Thank you