Solved

How to use Extended ASCII set (128 - 255) in C#

Posted on 2016-09-09
Last Modified: 2016-09-11
Hi, I'm using Visual Studio 2010 C# to create a Dictionary, where some of the "Keys" are Extended ASCII characters -- (chars 128-255).

So, if my Dictionary contained the Key / Value pair {(char)143, 'A'}, {(char)144, 'E'} - I would need to have the Dictionary return an A with a dot on top of it (i.e. (char)143) for the 'A' and an E with a hyphen on top of it (i.e. (char)144) for the E. -- (I'm using A and E here to make this simple to explain...I know A and E are different chars).

      Dictionary<char, char> ExtendedSet = new Dictionary<char, char> { {(char)143, 'A'}, {(char)144, 'E'}, ... };

My Dictionary works if I use the ASCII set below 127, but I cannot get the Extended ASCII set above 127 to print to the screen or to the console. How can I get the "Encoding" to recognize the proper ASCII set?

Please excuse me if I'm not phrasing this correctly... but at its simplest, I need the Extended ASCII characters to show up in a Textbox, on the screen and in the console and they are not.  

Thanks,
Fulano
Question by:Mr_Fulano
10 Comments
 
LVL 23

Assisted Solution

by:Dr. Klahn
Dr. Klahn earned 100 total points
ID: 41792018
This is apparently a headache no matter what, but it is discussed a bit here.  If you want to stick with one byte per character encoding, it may be necessary to manually set the desired character code page.
 

Author Comment

by:Mr_Fulano
ID: 41792059
Hi Dr. Klahn, I had found several articles like the one you linked to, but they didn't answer my question. They all discuss linking to a different character set, but how...?

Perhaps this is trickier than I am anticipating. I thought it was simply a matter of telling the application to use a different Encoding scheme.
 
LVL 16

Accepted Solution

by:
DansDadUK earned 400 total points
ID: 41792605
Perhaps you are looking for something like execution-charset?

As regards which character set to use:

If you want to use single-byte encoding in the Western world, use the ISO-8859-1 Latin-1 set, or perhaps the Windows ANSI superset of this (Windows-1252, which uses character codes in the 'reserved C1 control codes' range (0x80 -> 0x9f) for additional graphic glyphs, such as the Euro sign).
For international applications, make use of the UTF-8 encoding of the Unicode character set; this uses between one and four bytes for each character; characters in the basic ASCII set are still the same single-byte code-point values.
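As a rough illustration of the two options above, the sketch below decodes a byte with ISO-8859-1 and counts the bytes UTF-8 needs for a few characters. The class and method names (CharsetDemo, DecodeByte, Utf8ByteCount) are made up for this example. It sticks to ISO-8859-1 and UTF-8 because those are available on every .NET runtime; on .NET Core and later, Windows-1252 additionally needs the code pages provider to be registered.

```csharp
using System;
using System.Text;

public static class CharsetDemo
{
    // Decode a single byte using the named 8-bit character set.
    public static char DecodeByte(string encodingName, byte value)
    {
        Encoding encoding = Encoding.GetEncoding(encodingName);
        return encoding.GetString(new byte[] { value })[0];
    }

    // Count how many bytes UTF-8 needs to encode the given text.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        // In ISO-8859-1 every byte value equals the Unicode code point.
        char ring = DecodeByte("iso-8859-1", 0xC5);    // U+00C5, A with ring above
        Console.WriteLine((int)ring == 0xC5);          // True

        // UTF-8 is variable length: ASCII stays at one byte per character.
        Console.WriteLine(Utf8ByteCount("A"));         // 1
        Console.WriteLine(Utf8ByteCount("\u00C5"));    // 2
        Console.WriteLine(Utf8ByteCount("\u20AC"));    // 3 (Euro sign)
    }
}
```

The characters are written as \u escapes so the result does not depend on how the source file itself is saved.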
 
LVL 16

Expert Comment

by:DansDadUK
ID: 41792611
... and I've no idea which (if any) character sets include either of the characters:

A with a dot on top of it
E with a hyphen on top of it

Can you clarify just what character shapes you are trying to use?
What are their Unicode code-point values?
 

Author Comment

by:Mr_Fulano
ID: 41792760
They are 143 and 144 as I mentioned in my original post.
 

Author Comment

by:Mr_Fulano
ID: 41792765
They are characters like this:

 -- Å œ∑´®†¥¨ˆøπ“‘åß∂ƒ©˙∆˚¬…Ω≈ç√∫˜µ≤≥ Œ„´‰ˇÁ¨ˆØ∏ÅÍÎÏ˝ÓÔÒÚƸ˛Ç◊ı˜Â¯˘¿

I'm not sure how to do the E on the Mac.
 

Author Comment

by:Mr_Fulano
ID: 41793127
Hi DansDad, I think I figured out my problem... I wrote a simple FOR loop to print to the console every ASCII character from 0 to 255. That was an eye opener!

for (int x = 0; x <= 255; x++)
{
   Console.WriteLine(x + " ----- " + (char)x);
}

I found that (char)143 and (char)144, which are what I was testing with, have no output. Most of the other characters do print, but those two are blank... not sure why, but I did find that the two characters I wanted to use are at 193 and 201. So, that solves my issue.
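A likely explanation for the blanks, sketched below: a cast such as (char)143 treats the number as a Unicode code point, and the code points 128 to 159 (U+0080 to U+009F) are the C1 control characters, which have no glyph. The values 193 and 201 are U+00C1 (Á) and U+00C9 (É). The class and method names here (ControlRangeDemo, Describe) are invented for the example.

```csharp
using System;

public static class ControlRangeDemo
{
    // Describe what (char)code is when the int is treated as a Unicode code point.
    public static string Describe(int code)
    {
        char c = (char)code;
        return char.IsControl(c) ? "(control character, no glyph)" : c.ToString();
    }

    public static void Main()
    {
        // Same loop as above, limited to the upper half and labelling control codes.
        for (int x = 128; x <= 255; x++)
        {
            Console.WriteLine("{0} (U+{0:X4}) ----- {1}", x, Describe(x));
        }
    }
}
```

Whether the printable characters actually show up correctly still depends on the console's output encoding and font.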

I am going to award you and Dr. Klahn the points, because you did a lot of research for me and I am grateful for your help!

Thanks,
Fulano
 

Author Closing Comment

by:Mr_Fulano
ID: 41793129
Thank you both for your help. I found my solution, which I posted on the question for others in case they need the help.
 
LVL 29

Expert Comment

by:Olaf Doschke
ID: 41793154
You get a different output, whether you have  Console.OutputEncoding = Encoding.Unicode; or  Console.OutputEncoding = Encoding.ASCII; The console is NOT using ASCII by default.

When I use Unicode I also get a gap of characters in the range 127 to 159. A char is then actually not in the range of 0-255 but 0-65535 (2 bytes), of which not all codes make sense. That's actually UCS-2, not today's Unicode, but Unicode as it was initially defined. Converting 0-255 only covers most valid codes, which ASCII would also give. The gap you see in UCS-2 code values, which in ASCII would be valid printable codes, is somewhere else in UCS-2; don't ask me where, you may look it up. To prove the fact that char is not single byte, simply run Console.Write((char)0x20AC); That will give you the € symbol. So char and byte are not interchangeable and don't cover the same ranges.

The simple solution is to switch to ASCII output encoding, if you want ASCII. Then 0-255 simply are the ASCII codes.
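One caveat worth checking before relying on this: in .NET, Encoding.ASCII is strictly 7-bit, so characters above 127 are replaced with '?' rather than mapped to an 8-bit code page. The sketch below (the names AsciiLimitDemo and AsciiRoundTrip are invented for illustration) shows that round trip, the two-byte size of char, and the Euro sign test from above using UTF-8 output, which is an assumption about what the console can display.

```csharp
using System;
using System.Text;

public static class AsciiLimitDemo
{
    // Encode one char with Encoding.ASCII and decode it again.
    public static char AsciiRoundTrip(char c)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(new char[] { c });
        return Encoding.ASCII.GetString(bytes)[0];
    }

    public static void Main()
    {
        Console.WriteLine(sizeof(char));               // 2
        Console.WriteLine(AsciiRoundTrip('A'));        // A
        Console.WriteLine(AsciiRoundTrip('\u00C1'));   // ? (not representable in 7-bit ASCII)

        // Assumption: the console font can display the Euro sign.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine((char)0x20AC);
    }
}
```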

Bye, Olaf.
 
LVL 16

Expert Comment

by:DansDadUK
ID: 41793320
@Mr_Fulano - glad that you have a resolution, despite me (unintentionally) leading you astray with reference to "execution-charset" - this is a C++ compiler directive, nothing to do with C#.

As regards the characters you showed in an earlier post - the two highlighted characters are:

Å - Unicode U+00C5 "Latin Capital A With Ring Above"
Á - Unicode U+00C1 "Latin Capital A With Acute"

but I can't find a character "A with a dot on top" (or "E with a dot on top" either).

If you are using an 8-bit coded character set, the code-point you require to map to these Unicode characters depends on the selected character set.
I'm not familiar with Mac systems, but it seems that the old (8-bit) "Mac OS Roman" set is not (unlike ISO-8859-1) a strict subset of the Unicode character set (i.e. where the 256 values map directly to the 256 code-points in group 0, plane 0, row 0 of Unicode).

As @Olaf Doschke points out, characters in C# are always 16-bit Unicode UCS-2 values (and strings are effectively arrays of such characters), and you can 'convert' to different Encodings (but you'll only get a glyph displayed if a mapping exists for the chosen code-point).
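Since C# characters are Unicode values, one way to approach the original Dictionary is to store the target characters by their Unicode code points rather than by an 8-bit code. This is only a sketch: the mapping direction (plain letter to accented letter), the class name AccentMapDemo, and the helper Map are assumptions for illustration, and U+00C1 and U+00C9 are simply the characters at 193 and 201 mentioned earlier in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AccentMapDemo
{
    // Hypothetical mapping from plain letters to accented ones.
    public static readonly Dictionary<char, char> ExtendedSet = new Dictionary<char, char>
    {
        { 'A', '\u00C1' },  // 193, Latin Capital A With Acute
        { 'E', '\u00C9' },  // 201, Latin Capital E With Acute
    };

    // Return the mapped character, or the input unchanged if there is no entry.
    public static char Map(char c)
    {
        char mapped;
        return ExtendedSet.TryGetValue(c, out mapped) ? mapped : c;
    }

    public static void Main()
    {
        // Assumption: UTF-8 output and a font that contains these glyphs.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Map('A'));
        Console.WriteLine(Map('E'));
        Console.WriteLine(Map('Z'));  // no entry, printed unchanged
    }
}
```

In a TextBox the same char values can be assigned directly to the Text property as a string; only the console needs the output encoding set.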