Link to home
Start Free TrialLog in
Avatar of TicketMan
TicketMan

asked on

How to get Excel recognise the German 'Estzett' symbol: ß ?

Hello,

I am importing regularly a file into Excel which has German addresses in it. These addresses contain symbols that are not available in the English language, such as the 'ß' (also called the German Estzett character.
In 'MS-Office 2003 Language Preferences' I have included 'German' yet it still doesn't recognise the ß.
Instead Excel shows a little black square for this ß-character.
Any suggestions? Thank you.

Avatar of Ingeborg Hawighorst (Microsoft MVP / EE MVE)
Ingeborg Hawighorst (Microsoft MVP / EE MVE)
Flag of New Zealand image

Hello TicketMan,

What are you importing from? The German ß character is Ansi code 223 or in Excel =char(223).

Single out a character that displays as a square and use =code() to find what the underlying code is.

For example:

cell A1 contains: Straße

use

=code(mid(a1,5,1))

to see what Ansi code is actually sitting there.

Or post a data sample as an Excel sheet.

cheers, teylyn
Avatar of TicketMan
TicketMan

ASKER

Hi Teylyn,

I have used your formula (clever!) to check the Ansi code. It returns 63.
I have done it on several cells to make sure. It always returns 63.
I am importing the data from a .csv file into Excel. In this .csv file the German characters are fine.
63 is the question mark, and it is returned when you use code() on Unicode characters that have no ANSI equivalent, such as the box.

I'm guessing this is an encoding issue. Normally, when importing text data, you should be able to specify which encoding to use. Instead, Excel tries to guess, and apparently gets it wrong. Perhaps the ß is treated as a Greek beta for example (DOS encoding).

I would like to see a few lines of your CSV file. teylyn wants an Excel file, I think I'd rather try the import myself...

(°v°)
Here is how a typical word looks in the .csv file that I use to import in Excel (this .csv file has tens of thousand of records by the way with addresses from Germany...):
Bleichstraße

Here is how that word looks in Excel after I imported the .csv file:
Bleichstra¯e

The 'ß' has been replaced by a black square.
Woops, sorry all. I can't seem to show the black square on this forum. It's been replaced by again something else...how strange.
The character that made it up here is 175 Ascii code, but the EE software may have changed it from what it originally was. I agree with harfang. It would be best to see a small sample of the CSV.

cheers, teylyn
» Here is how a typical word looks…

We understand this, but it doesn't help solving the issue. The important item of information is how the text is encoded, not how it looks. Do not post your entire file, but just a three-line file which exhibits the same symptom when imported in Excel.

For example, in your comment, you included a black square. This was probably some Unicode graphical character. When received by EE, it was converted to ANSI (the encoding used to store comments in the site's database). Characters outside of the ANSI range are converted either to a general fall-back character, like `?´, or to a vaguely similar character. Something like this happened to the `ß´s in your CSV file.

(°v°)
Here is a tiny extract of the file after data has been imported into Excel. As you can see the German estzett character has been replaced by a little black squar'ish character.

(Because of confidentiality I have removed other data)
Participant-Contact-Data-afv.xls
can you post the same data as it is in the CSV?
What we can already deduce from the Excel file is that the import seems to assume a DOS code page while the text file is probably in ANSI. Normally, a CSV file is either in Unicode or in ANSI (more specifically Windows Western, or 1252): http://en.wikipedia.org/wiki/Windows-1252

Notice that ß is encoded as hex DF, dec 223. But it is imported as a upper half black square, which is correct in the DOS code page 437: http://en.wikipedia.org/wiki/Code_page_437 (notice that the ß exists also, but as hex E1, dec 225). When posted here, the upper-half black square, which doesn't exist in ANSI, is converted to something vaguely similar, the “overline” character.

This doesn't bring us closer to the solution, but we can assume it's not a Unicode conversion problem.

(°v°)
ASKER CERTIFIED SOLUTION
Avatar of Markus Fischer
Markus Fischer
Flag of Switzerland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Brilliant harfang. Thanks, that worked.

I imported the data into Excel, which opens a (delimiter) Text Import Wizard. In this wizard there is indeed a combo box with name 'File Origin'. This File Origin was showing MS-Dos (PC-8). I changed that to combo-box value to 1252 Western Euopean (Windows) and continued with the import of the data. Looked at the data and all characters are fine now!

Many thanks for your help, harfang!

Ticketman
Nice one, harfang. As a native German, I will hang on to this thread! :-)) Might come in useful at some time.

cheers, teylyn
TicketMan: I'm glad it worked! Thanks for the feedback and the kind words.

teylyn: I'm also originally German, although I have always lived in Geneva. Perhaps this is why this question title caught my attention.

(^v°)