SignsUS
asked on
Parsing binary data file in PHP
How come this doesn't work?
I am trying to extract data from a TrueType font file. I am sure of the structure, but I don't know why it is not displaying the number. It should be in hex form, but obviously not a string. Am I doing something wrong?
This prints out a number just fine:
print 0x0000F8;
So what is the difference?
Thank you,
Shawn W
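// Field sizes in bytes (define() expects the constant name as a quoted string)
define('BYTE', 1);
define('CHAR', 1);
define('USHORT', 2);
define('SHORT', 2);
define('UINT24', 3);
define('ULONG', 4);
define('LONG', 4);
define('FIXED', 4);
define('FWORD', 2);
define('UFWORD', 2);
define('F2DOT14', 2);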
//Offset Table
//Type Name Description
//---------- -------- ------------------------------------
//Fixed sfnt version 0x00010000 for version 1.0.
//USHORT numTables Number of tables.
//USHORT searchRange (Maximum power of 2 <= numTables) x 16.
//USHORT entrySelector Log2(maximum power of 2 <= numTables).
//USHORT rangeShift NumTables x 16-searchRange.
$numTables = 10;
// Get Offset Table
$filename = "fonts/arial.ttf";
if(is_readable($filename)) {
print("Opened");
$handle = fopen($filename, "rb");
$sfnt_version = fread($handle, FIXED);
$numTables = fread($handle, USHORT);
$searchRange = fread($handle, USHORT);
$entrySelector = fread($handle, USHORT);
$rangeShift = fread($handle, USHORT);
print $numTables;
fclose($handle);
}
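A minimal sketch of reading the Offset Table described in the comments above as integers, using PHP's unpack() with the big-endian codes 'N' (unsigned 32-bit) and 'n' (unsigned 16-bit) instead of printing the raw strings that fread() returns. The function name parseOffsetTable() and the sample bytes built with pack() are hypothetical stand-ins for the first 12 bytes of fonts/arial.ttf; it assumes PHP 7 or later on a 64-bit build.

```php
<?php
// Hypothetical helper: decode the 12-byte TrueType Offset Table.
// 'N' = unsigned 32-bit big-endian, 'n' = unsigned 16-bit big-endian.
function parseOffsetTable(string $bytes): array
{
    if (strlen($bytes) < 12) {
        throw new InvalidArgumentException('Offset Table needs 12 bytes');
    }
    // Named codes: "Nversion" stores the first 4 bytes under key 'version', and so on.
    return unpack(
        'Nversion/nnumTables/nsearchRange/nentrySelector/nrangeShift',
        $bytes
    );
}

// Sample bytes for a font with 10 tables:
// searchRange = 8 * 16 = 128, entrySelector = log2(8) = 3, rangeShift = 10 * 16 - 128 = 32.
$sample = pack('Nnnnn', 0x00010000, 10, 128, 3, 32);
$table = parseOffsetTable($sample);

print $table['numTables'] . "\n"; // 10
```

Against the real file, the call would be parseOffsetTable(fread($handle, 12)) right after opening it with "rb". Note that 'version' is the raw 16.16 Fixed value, not yet scaled.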
If you have "magic quotes" enabled, PHP will convert characters such as the NUL byte to their escaped representation, "\0". You can check your setting by calling get_magic_quotes_runtime() and change it with set_magic_quotes_runtime().
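One way to check this, sketched under the assumption that you just want to eyeball the bytes: print a hex dump of what fread() returned. The hexDump() helper is hypothetical, and addslashes() is used only to approximate the backslash escaping that magic quotes applied, so you can see what a mangled NUL byte would look like (5c 30 instead of 00).

```php
<?php
// Hypothetical helper: show each byte as two hex digits, separated by spaces.
function hexDump(string $bytes): string
{
    return implode(' ', str_split(bin2hex($bytes), 2));
}

$raw = "\x00\x01\x00\x00";   // the four sfnt version bytes as stored in the file
// addslashes() only approximates magic-quotes escaping: each NUL becomes "\0".
$escaped = addslashes($raw);

print hexDump($raw) . "\n";      // 00 01 00 00
print hexDump($escaped) . "\n";  // 5c 30 01 5c 30 5c 30
```

In the question's script you could call print hexDump(fread($handle, 4)); right after fopen(); seeing 00 01 00 00 would mean the bytes arrived intact.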
ASKER
I have revised the code a bit.
$sfnt_version contains 0x00010000.
If I do a 'print 0x00010000', it displays 65536.
But 'print $sfnt_version' prints out some gobbledygook.
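A short sketch of why the two prints differ: print 0x00010000 prints an integer literal, while fread() returns a 4-byte binary string that print writes out unchanged. The literal string below is assumed to be what fread($handle, 4) returns for a version 1.0 font; unpack('N', ...) turns it into the integer.

```php
<?php
// Assumed result of fread($handle, 4) on the sfnt version field.
$sfnt_version = "\x00\x01\x00\x00";

var_dump(strlen($sfnt_version));       // int(4): four raw bytes, not a number
print bin2hex($sfnt_version) . "\n";   // 00010000: the same bytes shown as hex text

// Interpret the 4 bytes as an unsigned 32-bit big-endian integer.
$value = unpack('N', $sfnt_version)[1];
print $value . "\n";                   // 65536, the same as print 0x00010000
```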
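// set_magic_quotes_runtime() was removed in PHP 7, so only call it where it still exists
if (function_exists('set_magic_quotes_runtime')) {
    set_magic_quotes_runtime(0);
}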
// http://www.microsoft.com/typography/otspec/otff.htm
/**************************************************************************/
// Motorola-style byte ordering (Big Endian)
/**************************************************************************/
//Data Type Description
//--------- ------ --------------------------------------------------------------------------
//BYTE 8-bit unsigned integer.
//CHAR 8-bit signed integer.
//USHORT 16-bit unsigned integer.
//SHORT 16-bit signed integer.
//UINT24 24-bit unsigned integer.
//ULONG 32-bit unsigned integer.
//LONG 32-bit signed integer.
//Fixed 32-bit signed fixed-point number (16.16)
//FUNIT Smallest measurable distance in the em space.
//FWORD 16-bit signed integer (SHORT) that describes a quantity in FUnits.
//UFWORD 16-bit unsigned integer (USHORT) that describes a quantity in FUnits.
//F2DOT14 16-bit signed fixed number with the low 14 bits of fraction (2.14).
//LONGDATETIME Date represented in number of seconds since 12:00 midnight, January 1, 1904. The value is represented as a signed 64-bit integer.
//Tag Array of four uint8s (length = 32 bits) used to identify a script, language system, feature, or baseline
//GlyphID Glyph index number, same as uint16(length = 16 bits)
//Offset Offset to a table, same as uint16 (length = 16 bits), NULL offset = 0x0000
/**************************************************************************/
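// Field sizes in bytes (define() expects the constant name as a quoted string)
define('BYTE', 1);
define('CHAR', 1);
define('USHORT', 2);
define('SHORT', 2);
define('UINT24', 3);
define('ULONG', 4);
define('LONG', 4);
define('FIXED', 4);
define('FWORD', 2);
define('UFWORD', 2);
define('F2DOT14', 2);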
//Offset Table
//Type Name Description
//---------- -------- ------------------------------------
//Fixed sfnt version 0x00010000 for version 1.0.
//USHORT numTables Number of tables.
//USHORT searchRange (Maximum power of 2 <= numTables) x 16.
//USHORT entrySelector Log2(maximum power of 2 <= numTables).
//USHORT rangeShift NumTables x 16-searchRange.
// Get Offset Table
$filename = "fonts/arial.ttf";
if(is_readable($filename)) {
$handle = fopen($filename, "rb");
$sfnt_version = fread($handle, 4);
$numTables = fread($handle, 2);
$searchRange = fread($handle, USHORT);
$entrySelector = fread($handle, USHORT);
$rangeShift = fread($handle, USHORT);
print $sfnt_version;
fclose($handle);
}
print '<br><br>';
print 0x00010000;
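The Fixed and F2DOT14 types in the comment table store fractions, so the raw integer still needs scaling after unpacking. A hedged sketch, assuming a 64-bit PHP 7+ build; toSigned(), decodeFixed() and decodeF2Dot14() are hypothetical helper names. Following the Offset Table comment, the sfnt version bytes 00 01 00 00 decode as the 16.16 value 1.0.

```php
<?php
// Reinterpret an unsigned N-bit value as a two's-complement signed value.
function toSigned(int $value, int $bits): int
{
    $signBit = 1 << ($bits - 1);
    return ($value & $signBit) ? $value - (1 << $bits) : $value;
}

// Fixed: 32-bit signed, 16 integer bits and 16 fraction bits.
function decodeFixed(string $bytes): float
{
    $raw = toSigned(unpack('N', $bytes)[1], 32);
    return $raw / 65536;
}

// F2DOT14: 16-bit signed, 2 integer bits and 14 fraction bits.
function decodeF2Dot14(string $bytes): float
{
    $raw = toSigned(unpack('n', $bytes)[1], 16);
    return $raw / 16384;
}

print decodeFixed("\x00\x01\x00\x00") . "\n";   // 1
print decodeF2Dot14("\xC0\x00") . "\n";         // -1
```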
Based on the comments in your code, there's something you should try. Your comment states: Motorola-style byte ordering (Big Endian). Windows uses little endian. Depending on the type of machine that wrote the file, you may have to convert between endian types. The PHP manual at this URL:
http://www.php.net/manual/en/function.pack.php
states: "Also note that PHP internally stores integer values as signed values of a machine-dependent size." Most minicomputers and mainframes had a minimum 16-bit storage scheme. In your prior post you stated:
$sfnt_version contains 0x00010000
If I do a 'print 0x00010000' it displays 65536
A version number of 0x00010000 (65536) would probably never occur, but a version of 0x00000001 (1), obtained by switching the endianness, would be realistic.
JRA
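To make the byte-order point concrete, here is a sketch that reads the same four bytes with unpack()'s explicit big-endian ('N') and little-endian ('V') codes. Because these codes fix the byte order, the result does not depend on the CPU running PHP. The sample bytes are assumed to be the sfnt version field; note that fully reversing 00 01 00 00 gives 0x00000100 (256).

```php
<?php
// The sfnt version field as it sits in the file.
$bytes = "\x00\x01\x00\x00";

// 'N' = big-endian ("Motorola") 32-bit, 'V' = little-endian ("Intel") 32-bit.
$bigEndian    = unpack('N', $bytes)[1];
$littleEndian = unpack('V', $bytes)[1];

printf("N: 0x%08X (%d)\n", $bigEndian, $bigEndian);        // N: 0x00010000 (65536)
printf("V: 0x%08X (%d)\n", $littleEndian, $littleEndian);  // V: 0x00000100 (256)
```

For a big-endian format such as TrueType, 'n' and 'N' are the codes that match the file layout.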
ASKER
Thanks for the advice, but I am not concerned with the version number. I am just trying to work with the byte information.
This seems to work:
hexdec(bin2hex($sfnt_version))
Which spits out 65536.
That is the idea, but I don't want to have to do this conversion for EVERY single piece of data.
For example, I cannot seem to get something like this to work:
for($i = 0; $i ^ $sfnt_version; $i++) {
...
}
In theory, this should loop 65536 times because of the XOR: once the bits of $i match $sfnt_version, the XOR is 0 and the condition becomes false. That is correct in theory, but it is not happening. I don't think it is even comparing the two properly.
Any advice would be great.
Thank you.
Shawn W
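To avoid writing the conversion out for every field, one option is a small reader that wraps the stream and returns integers directly. The BigEndianReader class and its method names are hypothetical, not part of PHP; the demo writes sample bytes to php://memory instead of opening fonts/arial.ttf, and assumes 64-bit PHP 7+ so that ULONG values fit in a positive int.

```php
<?php
// Hypothetical helper class: one call per field, big-endian throughout.
final class BigEndianReader
{
    /** @var resource */
    private $handle;

    /** @param resource $handle stream opened in binary mode ("rb") */
    public function __construct($handle)
    {
        $this->handle = $handle;
    }

    // Read exactly $length bytes or fail loudly on a short read.
    private function readBytes(int $length): string
    {
        $bytes = fread($this->handle, $length);
        if ($bytes === false || strlen($bytes) !== $length) {
            throw new RuntimeException('Unexpected end of data');
        }
        return $bytes;
    }

    // USHORT: unsigned 16-bit, big-endian.
    public function readUShort(): int
    {
        return unpack('n', $this->readBytes(2))[1];
    }

    // ULONG: unsigned 32-bit, big-endian.
    public function readULong(): int
    {
        return unpack('N', $this->readBytes(4))[1];
    }
}

// Demo on an in-memory stream standing in for fonts/arial.ttf.
$stream = fopen('php://memory', 'r+');
fwrite($stream, pack('Nn', 0x00010000, 10));
rewind($stream);

$reader = new BigEndianReader($stream);
$sfnt_version = $reader->readULong();   // 65536
$numTables = $reader->readUShort();     // 10
fclose($stream);

print "version $sfnt_version, $numTables tables\n";
```

The design choice here is to keep the byte-order decision in one place, so every field read elsewhere in the parser is already an integer.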
ASKER
I mean $i = 0; ...
ASKER
..and $i++
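A sketch of the loop once the version is an integer. With the raw 4-byte string, $i ^ $sfnt_version does not XOR against 65536: PHP does not read the non-numeric byte string as that number (older versions treat it as 0, newer ones may raise an error), so the loop cannot behave as intended. Converting with unpack('N', ...) first makes the XOR condition work as described; the sample bytes are assumed.

```php
<?php
// Convert the raw bytes to an integer before using them in arithmetic.
$sfnt_version = unpack('N', "\x00\x01\x00\x00")[1];  // 65536

$count = 0;
// $i ^ $sfnt_version is 0 (falsy) only when every bit of $i equals $sfnt_version.
for ($i = 0; $i ^ $sfnt_version; $i++) {
    $count++;
}
print $count . "\n";   // 65536
```

The condition $i < $sfnt_version expresses the same bound more directly.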
SignsUS:
You totally missed my point. A version number of 65536 is extremely unlikely. What is very likely is a version number of 1, which is what you would get if the endian values in your example were swapped. Your example of what seems to work still returns a value of 65536 for the version number, which is where you started, so essentially nothing has changed. You're still getting a version number that doesn't make sense. You don't seriously think there have been 65536 versions of this font file, do you? I used the version number only as a simple example of a single data element to explain the concept.
The comment says: // Motorola-style byte ordering (Big Endian)
Most Windows-based systems, and many others, are little-endian, which produces exactly what is happening when they read big-endian files. There's no automatic conversion when the file is read. Big-endian/little-endian issues, and word-size issues, will occur on every element of data you read from such files.
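On the word-size point: unpack() has big-endian codes only for unsigned 16- and 32-bit values ('n' and 'N'), so signed fields such as SHORT and FWORD need a manual two's-complement step after reading. A sketch with a hypothetical helper, readShortBE(), on assumed sample bytes:

```php
<?php
// Hypothetical helper: read a signed 16-bit big-endian value (SHORT / FWORD).
function readShortBE(string $bytes): int
{
    $value = unpack('n', $bytes)[1];                  // unsigned: 0 .. 65535
    return $value >= 0x8000 ? $value - 0x10000 : $value;
}

print readShortBE("\xFF\xFE") . "\n";   // -2
print readShortBE("\x01\x00") . "\n";   // 256
```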
ASKER
Hey, thank you so much for your help. Maybe I wasn't wording my question right. But this is basically the answer I was looking for:
https://www.experts-exchange.com/questions/21651537/Problem-reading-in-data-from-file.html
ASKER CERTIFIED SOLUTION
ASKER
Thank you