wjm013

What is the logic for representing Unicode hex character 0419 as 0xD0 0x99 in Word 2007?

If I insert Unicode character 0419 (hex) in a Word 2007 document and then look at its byte representation in the document.xml file, it appears as <0xD0> <0x99>.

What is the logic behind this conversion?

itsmeandnobodyelse

>>>> I insert Unicode character 0419 (hex) in a Word 2007
How exactly were you doing that?

>>>> it appears as <0xD0> <0x99>.
How do you know it is the character you entered as 0419?

>>>> What is the logic behind this conversion?
Don't know. As far as I can see, 0x0419 is quite a different character than 0xD099 (or 0x99D0 if the XML prints the bytes from left to right).


I did something similar recently, but I checked the representation in the binary .doc file using the hex editor of VS. There, the hex representations of the Unicode characters from the Lucida Unicode font matched those documented in the charts pretty well.
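
For anyone comparing the two observations: the bytes D0 99 are consistent with U+0419 encoded as UTF-8, whereas a 16-bit encoding such as UTF-16 stores the code point value directly, which may be why the bytes seen in the .doc file matched the charts. A minimal Python sketch, assuming document.xml is UTF-8 encoded (its XML declaration should confirm this); the helper name utf8_two_byte is made up for illustration:

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this sketch only handles the two-byte UTF-8 range")
    lead = 0xC0 | (code_point >> 6)     # 110xxxxx: top 5 of the 11 payload bits
    trail = 0x80 | (code_point & 0x3F)  # 10xxxxxx: low 6 payload bits
    return bytes([lead, trail])

cp = 0x0419  # CYRILLIC CAPITAL LETTER SHORT I
manual = utf8_two_byte(cp)            # bit packing done by hand
builtin = chr(cp).encode("utf-8")     # Python's own UTF-8 encoder
utf16 = chr(cp).encode("utf-16-le")   # 16-bit form, little-endian byte order

print(manual.hex(" "))   # d0 99
print(builtin.hex(" "))  # d0 99
print(utf16.hex(" "))    # 19 04
```

In binary, 0x0419 is 10000 011001. The first five bits go into the lead byte (110 10000 = 0xD0) and the last six into the trail byte (10 011001 = 0x99), so both byte sequences describe the same character.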
wjm013

ASKER

>>>> I insert Unicode character 0419 (hex) in a Word 2007
>>>> How exactly were you doing that?
1 - Opened a Word 2007 file, chose Insert Symbol and Arial Unicode MS, chose "from Unicode (hex)", navigated to 0419 and inserted it.

2 - Saved the Word 2007 file.

3 - Added ".ZIP" to the end of the filename.

4 - Extracted the Zip files to a separate directory.

5 - Opened "document.xml" in a text editor.

6 - Navigated to the 0419 character, Й, and chose to view the binary, which is "D0 99".
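
Steps 3 to 6 can also be reproduced without renaming the file, since a Word 2007 document is a ZIP archive. A rough Python sketch, assuming the main text lives in the archive member word/document.xml and is UTF-8 encoded; the function name find_char_bytes and the file name in the usage note are hypothetical:

```python
import io
import zipfile

def find_char_bytes(docx_file, char, member="word/document.xml"):
    """Return (utf8_bytes, offset) for the first occurrence of char in member.

    docx_file may be a path or a binary file-like object.
    offset is -1 if the encoded character does not occur.
    """
    with zipfile.ZipFile(docx_file) as archive:
        raw = archive.read(member)  # raw bytes, deliberately not decoded
    needle = char.encode("utf-8")
    return needle, raw.find(needle)

# Self-contained demo: build a tiny stand-in archive in memory
# (a real Word file contains many more parts than this).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    xml = '<?xml version="1.0" encoding="UTF-8"?><w:t>\u0419</w:t>'
    archive.writestr("word/document.xml", xml.encode("utf-8"))
buffer.seek(0)

needle, offset = find_char_bytes(buffer, "\u0419")
print(needle.hex(" "), offset)
```

With a real document, the call would look something like find_char_bytes("example.docx", "\u0419"), where example.docx stands in for the saved Word file.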
ASKER CERTIFIED SOLUTION
wjm013
