
Reading German special characters using a StreamReader with CurrentEncoding = System.Text.UTF8Encoding

I'm using a StreamReader to read a text file.

The StreamReader.CurrentEncoding is System.Text.UTF8Encoding.

The first line of my test text file is the following string of characters:
Uppercase A with an umlaut (Ä)
Lowercase a with an umlaut (ä)
Uppercase O with an umlaut (Ö)
Lowercase o with an umlaut (ö)
Uppercase U with an umlaut (Ü)
Lowercase u with an umlaut (ü)
The German lowercase "SZ" symbol (ß)

This is interpreted as a series of unknown characters.

What can I do to make the StreamReader interpret the text correctly?

Thanks very much in advance,

JaimeHy
1 Solution
 
Göran Andersson commented:
Then the file is obviously not encoded using UTF-8. You have to specify the encoding that was used to create the text file.
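
To illustrate why the umlauts come out as unknown characters, here is a minimal sketch. It assumes the file was saved in a single-byte Western European encoding; ISO-8859-1 stands in for the ANSI code page here because the seven characters from the question have the same byte values in both. The module name is made up for the example.

```vb
Imports System
Imports System.Text

Module WrongEncodingSketch
    Sub Main()
        ' Bytes for Ä ä Ö ö Ü ü ß as stored in a single-byte Western European file.
        Dim ansiBytes As Byte() = {&HC4, &HE4, &HD6, &HF6, &HDC, &HFC, &HDF}

        ' ISO-8859-1 is used as a stand-in for the ANSI code page (assumption).
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")

        ' Decoding as UTF-8 fails: these bytes are not valid UTF-8 sequences,
        ' so the decoder substitutes the replacement character U+FFFD.
        Dim decodedAsUtf8 As String = Encoding.UTF8.GetString(ansiBytes)

        ' Decoding with the matching single-byte encoding recovers the text.
        Dim decodedAsLatin1 As String = latin1.GetString(ansiBytes)

        Console.WriteLine(decodedAsUtf8.Contains(ChrW(&HFFFD)))  ' True
        Console.WriteLine(decodedAsLatin1 = "ÄäÖöÜüß")
    End Sub
End Module
```

The second comparison depends on the source file itself being saved in an encoding the compiler reads correctly, which is the same problem one level up.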

If it's an ANSI (Windows pre-Unicode) text file, the Encoding.Default property will give you the current ANSI encoding in the system, which will likely match the encoding used to create the file.

If the file is Unicode, but not UTF-8, it usually has a BOM (byte order mark) at the beginning of the file. The StreamReader has a constructor that takes a Boolean argument named detectEncodingFromByteOrderMarks that you can use to let the StreamReader select the encoding based on the BOM.
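
A minimal sketch of that constructor in use. The function name and the choice of Encoding.Default as the fallback are assumptions for the example; the fallback is only used when no BOM is found. Note that on .NET Core and later, Encoding.Default returns UTF-8 rather than the ANSI code page, so this fallback matches the description above only on .NET Framework.

```vb
Imports System
Imports System.IO
Imports System.Text

Module BomDetectionSketch
    ' Hypothetical helper: reads a whole file, letting a BOM override the fallback encoding.
    Function ReadWithBomDetection(path As String) As String
        ' Third argument is detectEncodingFromByteOrderMarks.
        Using reader As New StreamReader(path, Encoding.Default, True)
            Dim text As String = reader.ReadToEnd()
            ' CurrentEncoding only reflects the detected encoding after the first read.
            Console.WriteLine(reader.CurrentEncoding.EncodingName)
            Return text
        End Using
    End Function
End Module
```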

If that fails, you can read the file using a FileStream, and examine the actual byte values in the file to see if there is any BOM, and what character codes you get for the special characters.
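
A rough sketch of that inspection, with a made-up helper name. It prints the first bytes of the file as hex so you can compare them against the well-known BOMs: EF-BB-BF for UTF-8, FF-FE for UTF-16 little endian, FE-FF for UTF-16 big endian. As a reference point, Ä is the single byte C4 in Windows-1252 but the two bytes C3-84 in UTF-8.

```vb
Imports System
Imports System.IO

Module ByteDumpSketch
    ' Hypothetical helper: prints up to 'count' bytes from the start of the file as hex.
    Sub DumpFirstBytes(path As String, count As Integer)
        Dim buffer(count - 1) As Byte
        Dim bytesRead As Integer

        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            ' Read may return fewer bytes than requested, e.g. for a short file.
            bytesRead = fs.Read(buffer, 0, buffer.Length)
        End Using

        ' BitConverter.ToString formats the bytes as hex pairs separated by hyphens.
        Console.WriteLine(BitConverter.ToString(buffer, 0, bytesRead))
    End Sub
End Module
```

Calling DumpFirstBytes with a count of 16 or so is usually enough to see both a BOM, if present, and the byte values of the first few special characters.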
 
Göran Andersson commented:
As an alternative to using a FileStream to examine the file, just add a .bin file extension to the file and open it in Visual Studio.
 
jaimehy (author) commented:
It gave me the pointers I needed to resolve the problem I had.  More than that I couldn't ask for.
 
jaimehy (author) commented:
Thanks for the tips, GreenGhost,

Yes, I did forget to mention that the txt file is encoded in ANSI. Sorry about that.

The question that I suppose I wanted to ask is how to force the StreamReader to read the file as ANSI. With your help, I've managed to answer it.

The StreamReader.CurrentEncoding property is read-only, so no luck there, and I was stuck. What I eventually did was declare a default encoding and use the StreamReader constructor overload that takes an encoding, as follows:

        ' Requires Imports System.IO and Imports System.Text
        Dim DefaultEncoding As Encoding = Encoding.Default
        Dim sr As New StreamReader(FileToBeRead, DefaultEncoding)

My umlauts now appear perfectly in my output xml file!

Much obliged


JaimeHY
