Solved

How to use Extended ASCII set (128 - 255) in C#

Posted on 2016-09-09
Last Modified: 2016-09-11
Hi, I'm using Visual Studio 2010 C# to create a Dictionary, where some of the "Keys" are Extended ASCII characters -- (chars 128-255).

So, if my Dictionary contained the Key / Value pairs {(char)143, 'A'}, {(char)144, 'E'}, I would need the Dictionary to return an A with a dot on top of it (i.e. (char)143) for the 'A' and an E with a hyphen on top of it (i.e. (char)144) for the 'E'. (I'm using A and E here to keep the explanation simple; I know A and E are different chars.)

      Dictionary<char, char> ExtendedSet = new Dictionary<char, char> { {(char)143, 'A'}, {(char)144, 'E'}, ... };

My Dictionary works if I use the ASCII set below 127, but I cannot get the Extended ASCII set above 127 to print to the screen or to the console. How can I get the "Encoding" to recognize the proper ASCII set?

Please excuse me if I'm not phrasing this correctly... but at its simplest, I need the Extended ASCII characters to show up in a Textbox, on the screen and in the console and they are not.  
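A minimal sketch of the lookup described above, assuming the goal is to map a plain letter to an accented one. The class name AccentMapSketch, the Lookup helper and the choice of U+00C5 and U+00C9 as targets are illustrative assumptions, not something stated in the question. Writing the values as Unicode escapes avoids depending on any particular 8-bit code page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps a plain letter to an accented counterpart.
static class AccentMapSketch
{
    // Values are Unicode escapes, so they do not depend on a code page.
    public static readonly Dictionary<char, char> Accented = new Dictionary<char, char>
    {
        { 'A', '\u00C5' }, // Latin capital A with ring above
        { 'E', '\u00C9' }  // Latin capital E with acute
    };

    // Returns the accented character, or the input unchanged if unmapped.
    public static char Lookup(char plain)
    {
        char result;
        return Accented.TryGetValue(plain, out result) ? result : plain;
    }
}
```

Calling AccentMapSketch.Lookup('A') returns the char with code point U+00C5; whether it is drawn correctly still depends on the output encoding and font of the textbox or console.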

Thanks,
Fulano
Question by:Mr_Fulano
 
LVL 26

Assisted Solution

by:Dr. Klahn
Dr. Klahn earned 100 total points
ID: 41792018
This is apparently a headache no matter what, but it is discussed a bit here.  If you want to stick with one byte per character encoding, it may be necessary to manually set the desired character code page.
 

Author Comment

by:Mr_Fulano
ID: 41792059
Hi Dr. Klahn, I had found several articles like the one you linked to, but they didn't answer my question. They all discuss linking to a different character set, but how?

Perhaps this is trickier than I am anticipating. I thought it was simply a matter of telling the application to use a different Encoding scheme.
 
LVL 16

Accepted Solution

by:
DansDadUK earned 400 total points
ID: 41792605
Perhaps you are looking for something like execution-charset?

As regards which character set to use:

If you want to use single-byte encoding in the Western world, use the ISO-8859-1 Latin-1 set, or perhaps the Windows ANSI superset of this (which uses character codes in the 'reserved C1 control codes' range (0x80 -> 0x9f) for additional graphic glyphs, such as the Euro sign).
For international applications, make use of the UTF-8 encoding of the Unicode character set; this uses between one and four bytes for each character; characters in the basic ASCII set are still the same single-byte code-point values.
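As a rough illustration of the byte counts mentioned above, the sketch below encodes a few strings with the standard System.Text.Encoding classes. The class and method names (EncodingSizeSketch, Utf8Length, Latin1Bytes) are made up for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper comparing UTF-8 and ISO-8859-1 encodings of a string.
static class EncodingSizeSketch
{
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int Utf8Length(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Bytes of the string in ISO-8859-1 (one byte per character).
    public static byte[] Latin1Bytes(string text)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(text);
    }
}
```

With these helpers, "A" takes 1 byte in UTF-8, "\u00C5" (Å) takes 2, "\u20AC" (€) takes 3 and "\U0001F600" takes 4, while Latin1Bytes("\u00C5") is the single byte 0xC5.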
 
LVL 16

Expert Comment

by:DansDadUK
ID: 41792611
... and I've no idea which (if any) character sets include either of the characters:

A with a dot on top of it
E with a hyphen on top of it

Can you clarify just what character shapes you are trying to use?
What are their Unicode code-point values?
 

Author Comment

by:Mr_Fulano
ID: 41792760
They are 143 and 144 as I mentioned in my original post.
 

Author Comment

by:Mr_Fulano
ID: 41792765
They are characters like this:

 -- Å œ∑´®†¥¨ˆøπ“‘åß∂ƒ©˙∆˚¬…Ω≈ç√∫˜µ≤≥ Œ„´‰ˇÁ¨ˆØ∏ÅÍÎÏ˝ÓÔÒÚƸ˛Ç◊ı˜Â¯˘¿

I'm not sure how to do the E on the Mac.
 

Author Comment

by:Mr_Fulano
ID: 41793127
Hi DansDad, I think I figured out my problem. I wrote a simple for loop to print every ASCII character from 0 to 255 to the console. That was an eye opener!

for (int x = 0; x <= 255; x++)
{
   Console.WriteLine(x + " ----- " + (char)x);
}

I found that (char)143 and (char)144, which are what I was testing with, produce no output. Most of the other characters do print, but those two are blank. I'm not sure why, but I did find that the two characters I wanted to use are at 193 and 201. So, that solves my issue.
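A likely explanation for the blanks, offered as a sketch and not confirmed by the author: casting an int to char yields the Unicode code point with that number, and U+0080 to U+009F (128 to 159) are control characters with no glyph. The hypothetical helper below (CodePointSketch.Describe is a name invented here) labels each value using char.IsControl.

```csharp
using System;

// Hypothetical helper that labels a 0-255 value as a Unicode code point.
static class CodePointSketch
{
    // Returns e.g. "193 U+00C1 Á", or a note when the value is a control code.
    public static string Describe(int code)
    {
        char c = (char)code;
        return char.IsControl(c)
            ? string.Format("{0,3} U+{0:X4} (control character, no glyph)", code)
            : string.Format("{0,3} U+{0:X4} {1}", code, c);
    }
}
```

Looping over 0 to 255 and printing CodePointSketch.Describe(x) marks 143 and 144 as control characters, while 193 and 201 come out as Á (U+00C1) and É (U+00C9).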

I am going to award you and Dr. Klahn the points, because you did a lot of research for me and I am grateful for your help!

Thanks,
Fulano
 

Author Closing Comment

by:Mr_Fulano
ID: 41793129
Thank you both for your help. I found my solution, which I posted on the question for others in case they need the help.
 
LVL 29

Expert Comment

by:Olaf Doschke
ID: 41793154
You get different output depending on whether you set Console.OutputEncoding = Encoding.Unicode; or Console.OutputEncoding = Encoding.ASCII; The console is NOT using ASCII by default.

When I use Unicode I also get a gap of characters in the range 127 to 159; in Unicode those code points are control characters, so nothing is displayed. A char is then actually not in the range 0-255 but 0-65535 (2 bytes), and not all of those codes make sense. Strictly, that is a 16-bit code unit (UCS-2, as Unicode was initially defined; UTF-16 today), not a full Unicode code point. Casting 0-255 to char therefore gives the first 256 Unicode code points, not the characters of a Windows or DOS code page. The printable characters that an 8-bit code page places in the range 128-159 do exist in Unicode, just at other code points; don't ask me where, you may look it up. To prove that char is not a single byte, simply run Console.Write((char)0x20AC); That will give you the € symbol, provided the console encoding and font can display it. So char and byte are not interchangeable and don't cover the same ranges.

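The simple solution is to switch to a single-byte output encoding, if single-byte output is what you want. Note that Encoding.ASCII itself covers only 0-127 and replaces anything above that with '?'; for 128-255 you need an 8-bit code page such as ISO-8859-1 or Windows-1252.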
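To make the effect of the chosen encoding concrete, here is a small sketch using the standard System.Text.Encoding classes. The names OutputEncodingSketch, Encode and PrintAccents are invented for the example, and setting the console to UTF-8 is one possible choice, not something the thread settled on.

```csharp
using System;
using System.Text;

// Hypothetical helper showing what different encodings do to one char.
static class OutputEncodingSketch
{
    // Encodes a single char with the given encoding.
    public static byte[] Encode(char c, Encoding encoding)
    {
        return encoding.GetBytes(new[] { c });
    }

    // Assumption: a UTF-8 console with a font that contains these glyphs.
    public static void PrintAccents()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("\u00C1 \u00C9"); // Á É
    }
}
```

Encoding '\u00C1' with Encoding.ASCII gives the single byte 63 ('?'), because ASCII is a 7-bit encoding; ISO-8859-1 gives the single byte 193, and UTF-8 gives the two bytes 0xC3 0x81.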

Bye, Olaf.
 
LVL 16

Expert Comment

by:DansDadUK
ID: 41793320
@Mr_Fulano - glad that you have a resolution, despite me (unintentionally) leading you astray with reference to "execution-charset" - this is a C++ compiler directive, nothing to do with C#.

As regards the characters you showed in an earlier post, the two highlighted characters are:

Å - Unicode U+00C5 "Latin Capital A With Ring Above"
Á - Unicode U+00C1 "Latin Capital A With Acute"

but I can't find a character "A with a dot on top" (or "E with a dot on top" either).

If you are using an 8-bit coded character set, the code-point you require to map to these Unicode characters depends on the selected character set.
I'm not familiar with Mac systems, but it seems that the old (8-bit) "Mac OS Roman" set is not (unlike ISO-8859-1) a strict subset of the Unicode character set (i.e. where the 256 values map directly to the 256 code-points in group 0, plane 0, row 0 of Unicode).

As @Olaf Doschke points out, characters in C# are always 16-bit Unicode UCS-2 values (and strings are effectively arrays of such characters), and you can 'convert' to different Encodings (but you'll only get a glyph displayed if a mapping exists for the chosen code-point).
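The direct mapping described above for ISO-8859-1 can be checked with a short sketch. Latin1RoundTripSketch, Decode and MapsDirectly are names invented for this example; other 8-bit sets (for instance DOS or Mac code pages) would need their own Encoding and would not pass the same check.

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes bytes as ISO-8859-1 and checks the 1:1 mapping.
static class Latin1RoundTripSketch
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Converts one byte in ISO-8859-1 to the corresponding C# char.
    public static char Decode(byte b)
    {
        return Latin1.GetString(new[] { b })[0];
    }

    // True if every byte value 0-255 decodes to the code point of the same number.
    public static bool MapsDirectly()
    {
        for (int b = 0; b <= 255; b++)
        {
            if (Decode((byte)b) != (char)b) return false;
        }
        return true;
    }
}
```

Here Decode(0xC5) yields Å (U+00C5), and MapsDirectly() returns true for ISO-8859-1.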