wjm013
asked on
What is the logic for representing Unicode hex character 0419 as 0xD0 0x99 in Word 2007?
If I insert Unicode character 0419 (hex) in a Word 2007 document and then look at its byte representation in the document.xml file, it appears as <0xD0> <0x99>.
What is the logic behind this conversion?
ASKER
>>>> I insert Unicode character 0419 (hex) in a Word 2007
>>>> How exactly did you do that?
1 - Opened a Word 2007 file, chose Insert Symbol and Arial Unicode MS, chose "from Unicode (hex)", navigated to 0419 and inserted it.
2 - Saved the Word 2007 file.
3 - Added ".ZIP" to the end of the filename.
4 - Extracted the Zip files to a separate directory.
5 - Opened "document.xml" in a text editor.
6 - Navigated to the 0419 character, Й, and chose to view the binary, which is "D0 99".
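Steps 3 to 6 above can also be approximated in a short script instead of renaming the file and using a text editor. This is only a sketch: the function name `non_ascii_runs` is made up for illustration, and it assumes the main document part sits at `word/document.xml` inside the archive, which is where Word normally puts it.

```python
import re
import zipfile


def non_ascii_runs(docx_file, member="word/document.xml"):
    """Return each run of non-ASCII bytes in the XML part as a hex string.

    `docx_file` may be a path or a file-like object. The default `member`
    path is an assumption about where the main document part is stored.
    """
    # A .docx file is a ZIP archive, so no renaming to .zip is needed here.
    with zipfile.ZipFile(docx_file) as archive:
        data = archive.read(member)  # raw, undecoded bytes

    # Bytes 0x80..0xFF never occur in plain ASCII markup, so each run
    # found here belongs to some non-ASCII character in the text.
    runs = re.findall(rb"[\x80-\xff]+", data)
    return [" ".join(f"{b:02X}" for b in run) for run in runs]
```

Calling `non_ascii_runs("test.docx")` (a hypothetical file name) on a document whose only non-ASCII character is 0419 would be expected to list the same two bytes seen in the editor.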
ASKER CERTIFIED SOLUTION
This solution is only available to members.
How exactly did you do that?
>>>> it appears as <0xD0> <0x99>.
How do you know it is the character you entered as 0419?
>>>> What is the logic behind this conversion?
I don't know. As far as I can see, 0x0419 is quite a different character from 0xD099 (or 0x99D0, if the XML prints the bytes from left to right).
I did something similar recently, but I checked the representation in the binary .doc file using the hex editor of VS. There, the hex representations of the Unicode characters from the Lucida Unicode font matched those documented in the charts pretty well.
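For what it is worth, the D0 99 pair from the question is exactly what UTF-8 produces for U+0419, so one plausible reading is that document.xml is stored as UTF-8 rather than as raw 16-bit code units (the XML declaration at the top of the file would confirm the encoding). A minimal sketch of the two-byte UTF-8 rule follows; the helper name `utf8_two_byte` is hypothetical and covers only the range U+0080 to U+07FF.

```python
def utf8_two_byte(code_point):
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range U+0080..U+07FF is handled")
    # Lead byte: marker bits 110 followed by the top 5 bits of the code point.
    lead = 0xC0 | (code_point >> 6)
    # Trail byte: marker bits 10 followed by the low 6 bits of the code point.
    trail = 0x80 | (code_point & 0x3F)
    return bytes([lead, trail])


encoded = utf8_two_byte(0x0419)
print(" ".join(f"{b:02X}" for b in encoded))   # D0 99
print(encoded == "\u0419".encode("utf-8"))     # True
```

Worked by hand: 0x0419 is 100 0001 1001 in binary. The top five bits (10000) go after 110 to give 1101 0000 = 0xD0, and the low six bits (011001) go after 10 to give 1001 1001 = 0x99.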