ntasker02


How do I convert an ANSI Number to a Unicode Number in C#, for output in XSL:FO

This is my first question on the site

An application that I am writing reads an RTF document I supply, but I have no control over the RTF's content.  The RTF data contains special characters such as the Euro symbol (€ - ANSI 128).

My application parses the RTF and outputs it as XSL:FO.  A later step renders the XSL:FO into a PDF document.  The problem is that, from what I've found, the special characters need to be converted to their Unicode equivalents in order to appear correctly in the PDF.  Based on the table at http://www.alanwood.net/demos/ansi.html, I can see that the Unicode number equivalent for the Euro (ANSI 128) is 8364.

In C#, how do I convert an ANSI number to a Unicode number?  I've tried the attached bit of code, but it just changes the encoding of the numbers and does not actually convert them.

Failing that, is there another way to output characters in FO such that I can use their ANSI number instead of their unicode number?  Here is how I output a ® in FO (ANSI and Unicode number 174)

<fo:inline font-size="0.50em" baseline-shift="super" font-family="Times New Roman" color="#000000">&#174;</fo:inline>

Attached is a .fo file with multiple special characters in it to show what I am trying to do, saved as a .txt file due to file upload restrictions on this site.  I am using RenderX's XEP rendering engine to eventually render from XSL:FO to PDF, which is where I can tell the characters aren't translating correctly.

Thanks!
System.Text.Encoding ansi = System.Text.Encoding.Default;
System.Text.Encoding unicode = System.Text.Encoding.UTF8;

byte[] ansibytes = ansi.GetBytes(MyAnsiCharacterCode);

byte[] unicodebytes = System.Text.Encoding.Convert(ansi, unicode, ansibytes);
char[] unicodechars = new char[unicode.GetCharCount(unicodebytes, 0, unicodebytes.Length)];
unicode.GetChars(unicodebytes, 0, unicodebytes.Length, unicodechars, 0);

string unicodestring = new string(unicodechars);
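
A note on why the attempt above may appear to change nothing. This assumes `MyAnsiCharacterCode` is a string holding the digits of the code (for example "128"); the post does not say what type it is. In that case `GetBytes` encodes the three characters '1', '2' and '8' rather than producing the single byte 0x80, so converting between encodings simply hands back the same digits. A minimal sketch of that behaviour, where `DigitsVersusByte` and `RoundTrip` are hypothetical names used only for illustration:

```csharp
using System;
using System.Text;

public static class DigitsVersusByte
{
    // Hypothetical helper: repeats the ANSI -> UTF-8 round trip from the
    // snippet above on a string of digits.
    public static string RoundTrip(string text)
    {
        Encoding ansi = Encoding.Default;
        Encoding unicode = Encoding.UTF8;

        // For "128" this yields three bytes (0x31 0x32 0x38), one per digit,
        // not the single byte 0x80.
        byte[] ansiBytes = ansi.GetBytes(text);

        // Re-encoding digits as UTF-8 leaves them unchanged.
        byte[] unicodeBytes = Encoding.Convert(ansi, unicode, ansiBytes);
        return unicode.GetString(unicodeBytes);
    }
}
```

Under that assumption, `DigitsVersusByte.RoundTrip("128")` returns the string "128" again, which matches the symptom described in the question.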


character-example.txt
dkloeck

ntasker02

ASKER

The example code I posted is a slight variation of the code found on that page

The full link is http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemtextencodingclasstopic.asp

Unfortunately that won't work for what I am trying to do.  I either need a way to get the Unicode number equivalent of an ANSI code, or a way for XSL:FO to accept ANSI codes as special characters to output.

As a follow-up to my above comment, I would like to know if there is a way to get the Unicode number for an ANSI number without using a manually created character map table/class.  I can easily say "if I see ANSI 128, output Unicode character 8364" using a map.  I'd like an automated lookup without the need for hardcoding characters.

I have solved the problem myself.  The code is attached for anyone who might come across this same problem.

I create the string "s" by taking my input ANSI number, such as 128 for the Euro symbol (using code page 1252), and converting it to hex, so that '128' becomes '80'.  I then parse that hex value (0x80) into a byte and decode it with cp1252enc, a System.Text.Encoding object I made that represents code page 1252, which gives the string equivalent of the byte.  My string now contains the € (euro) symbol.

From there I can use a different encoding object, in this case utf8enc, to convert my string to a byte array in UTF-8 encoding and back to characters.  Casting the first character to an int then gives the Unicode number (8364 for the euro).
System.Text.Encoding cp1252enc = System.Text.Encoding.GetEncoding(1252);
System.Text.Encoding utf8enc = System.Text.Encoding.UTF8;

// tok.Parameter holds the ANSI code, e.g. 128 for the euro sign.
string parameterAsHex = "0x" + tok.Parameter.ToString("X");

// Decode the single byte as code page 1252 to get the actual character.
string s = cp1252enc.GetString(new byte[] { Convert.ToByte(parameterAsHex, 16) });

// Round-trip through UTF-8, then read the character's Unicode number.
byte[] utf8bytes = utf8enc.GetBytes(s);
char[] utf8chars = utf8enc.GetChars(utf8bytes);

int utf8int = (int)utf8chars[0];
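
For later readers, the same lookup can probably be done more directly. Decoding the single byte with code page 1252 already yields a .NET `char`, and a `char` converted to `int` is its Unicode number, so the hex string and the UTF-8 round trip do not appear to be strictly necessary. The sketch below assumes the input is a code in the range 0 to 255 and that code page 1252 can be loaded (on .NET Core and .NET 5+ the code pages provider has to be registered first). `AnsiToUnicode`, `ToCodePoint` and `ToCharacterReference` are hypothetical names, not part of the code above.

```csharp
using System;
using System.Text;

public static class AnsiToUnicode
{
    static AnsiToUnicode()
    {
        // Assumption: needed on .NET Core / .NET 5+, where code page 1252
        // is not registered by default. Harmless on .NET Framework 4.6+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns the Unicode number for a code page 1252 ("ANSI") code, 0..255.
    public static int ToCodePoint(int ansiCode)
    {
        if (ansiCode < 0 || ansiCode > 255)
            throw new ArgumentOutOfRangeException(nameof(ansiCode));

        // Decode the one-byte value as code page 1252 into a UTF-16 char.
        char[] chars = Encoding.GetEncoding(1252)
                               .GetChars(new[] { (byte)ansiCode });
        return chars[0];
    }

    // Formats the result as a numeric character reference for the FO output.
    public static string ToCharacterReference(int ansiCode)
    {
        return "&#" + ToCodePoint(ansiCode) + ";";
    }
}
```

With this sketch, `AnsiToUnicode.ToCodePoint(128)` gives 8364 and `AnsiToUnicode.ToCharacterReference(128)` gives `&#8364;`, while `AnsiToUnicode.ToCodePoint(174)` stays 174, in line with the table linked in the question.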


ASKER CERTIFIED SOLUTION
ntasker02

Closed, 200 points refunded.
Computer101
EE Admin