TicketMan
asked on
How to get Excel recognise the German 'Estzett' symbol: ß ?
Hello,
I am importing regularly a file into Excel which has German addresses in it. These addresses contain symbols that are not available in the English language, such as the 'ß' (also called the German Estzett character.
In 'MS-Office 2003 Language Preferences' I have included 'German' yet it still doesn't recognise the ß.
Instead Excel shows a little black square for this ß-character.
Any suggestions? Thank you.
I am importing regularly a file into Excel which has German addresses in it. These addresses contain symbols that are not available in the English language, such as the 'ß' (also called the German Estzett character.
In 'MS-Office 2003 Language Preferences' I have included 'German' yet it still doesn't recognise the ß.
Instead Excel shows a little black square for this ß-character.
Any suggestions? Thank you.
ASKER
Hi Teylyn,
I have used your formula (clever!) to check the Ansi code. It returns 63.
I have done it on several cells to make sure. It always returns 63.
I have used your formula (clever!) to check the Ansi code. It returns 63.
I have done it on several cells to make sure. It always returns 63.
ASKER
I am importing the data from a .csv file into Excel. In this .csv file the German characters are fine.
63 is the question mark, and it is returned when you use code() on Unicode characters that have no ANSI equivalent, such as the box.
I'm guessing this is an encoding issue. Normally, when importing text data, you should be able to specify which encoding to use. Instead, Excel tries to guess, and apparently gets it wrong. Perhaps the ß is treated as a Greek beta for example (DOS encoding).
I would like to see a few lines of your CSV file. teylyn wants an Excel file, I think I'd rather try the import myself...
(°v°)
I'm guessing this is an encoding issue. Normally, when importing text data, you should be able to specify which encoding to use. Instead, Excel tries to guess, and apparently gets it wrong. Perhaps the ß is treated as a Greek beta for example (DOS encoding).
I would like to see a few lines of your CSV file. teylyn wants an Excel file, I think I'd rather try the import myself...
(°v°)
ASKER
Here is how a typical word looks in the .csv file that I use to import in Excel (this .csv file has tens of thousand of records by the way with addresses from Germany...):
Bleichstraße
Here is how that word looks in Excel after I imported the .csv file:
Bleichstra¯e
The 'ß' has been replaced by a black square.
Bleichstraße
Here is how that word looks in Excel after I imported the .csv file:
Bleichstra¯e
The 'ß' has been replaced by a black square.
ASKER
Woops, sorry all. I can't seem to show the black square on this forum. It's been replaced by again something else...how strange.
The character that made it up here is 175 Ascii code, but the EE software may have changed it from what it originally was. I agree with harfang. It would be best to see a small sample of the CSV.
cheers, teylyn
cheers, teylyn
» Here is how a typical word looks…
We understand this, but it doesn't help solving the issue. The important item of information is how the text is encoded, not how it looks. Do not post your entire file, but just a three-line file which exhibits the same symptom when imported in Excel.
For example, in your comment, you included a black square. This was probably some Unicode graphical character. When received by EE, it was converted to ANSI (the encoding used to store comments in the site's database). Characters outside of the ANSI range are converted either to a general fall-back character, like `?´, or to a vaguely similar character. Something like this happened to the `ß´s in your CSV file.
(°v°)
We understand this, but it doesn't help solving the issue. The important item of information is how the text is encoded, not how it looks. Do not post your entire file, but just a three-line file which exhibits the same symptom when imported in Excel.
For example, in your comment, you included a black square. This was probably some Unicode graphical character. When received by EE, it was converted to ANSI (the encoding used to store comments in the site's database). Characters outside of the ANSI range are converted either to a general fall-back character, like `?´, or to a vaguely similar character. Something like this happened to the `ß´s in your CSV file.
(°v°)
ASKER
Here is a tiny extract of the file after data has been imported into Excel. As you can see the German estzett character has been replaced by a little black squar'ish character.
(Because of confidentiality I have removed other data)
Participant-Contact-Data-afv.xls
(Because of confidentiality I have removed other data)
Participant-Contact-Data-afv.xls
can you post the same data as it is in the CSV?
What we can already deduce from the Excel file is that the import seems to assume a DOS code page while the text file is probably in ANSI. Normally, a CSV file is either in Unicode or in ANSI (more specifically Windows Western, or 1252): http://en.wikipedia.org/wiki/Windows-1252
Notice that ß is encoded as hex DF, dec 223. But it is imported as a upper half black square, which is correct in the DOS code page 437: http://en.wikipedia.org/wiki/Code_page_437 (notice that the ß exists also, but as hex E1, dec 225). When posted here, the upper-half black square, which doesn't exist in ANSI, is converted to something vaguely similar, the “overline” character.
This doesn't bring us closer to the solution, but we can assume it's not a Unicode conversion problem.
(°v°)
Notice that ß is encoded as hex DF, dec 223. But it is imported as a upper half black square, which is correct in the DOS code page 437: http://en.wikipedia.org/wiki/Code_page_437 (notice that the ß exists also, but as hex E1, dec 225). When posted here, the upper-half black square, which doesn't exist in ANSI, is converted to something vaguely similar, the “overline” character.
This doesn't bring us closer to the solution, but we can assume it's not a Unicode conversion problem.
(°v°)
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Brilliant harfang. Thanks, that worked.
I imported the data into Excel, which opens a (delimiter) Text Import Wizard. In this wizard there is indeed a combo box with name 'File Origin'. This File Origin was showing MS-Dos (PC-8). I changed that to combo-box value to 1252 Western Euopean (Windows) and continued with the import of the data. Looked at the data and all characters are fine now!
Many thanks for your help, harfang!
Ticketman
I imported the data into Excel, which opens a (delimiter) Text Import Wizard. In this wizard there is indeed a combo box with name 'File Origin'. This File Origin was showing MS-Dos (PC-8). I changed that to combo-box value to 1252 Western Euopean (Windows) and continued with the import of the data. Looked at the data and all characters are fine now!
Many thanks for your help, harfang!
Ticketman
Nice one, harfang. As a native German, I will hang on to this thread! :-)) Might come in useful at some time.
cheers, teylyn
cheers, teylyn
TicketMan: I'm glad it worked! Thanks for the feedback and the kind words.
teylyn: I'm also originally German, although I have always lived in Geneva. Perhaps this is why this question title caught my attention.
(^v°)
teylyn: I'm also originally German, although I have always lived in Geneva. Perhaps this is why this question title caught my attention.
(^v°)
What are you importing from? The German ß character is Ansi code 223 or in Excel =char(223).
Single out a character that displays as a square and use =code() to find what the underlying code is.
For example:
cell A1 contains: Straße
use
=code(mid(a1,5,1))
to see what Ansi code is actually sitting there.
Or post a data sample as an Excel sheet.
cheers, teylyn