Problem with LoadFromFile & Unicode

I have a problem when I try to read a Unicode text file.
The data is written in Cyrillic.

  var
    StringList: TStringList;
  ...
  StringList := TStringList.Create;
  if FileExists('c:\unicode.txt') then
    StringList.LoadFromFile('c:\unicode.txt');

After loading the file, the strings in StringList no longer contain Cyrillic letters; they look as if I had loaded a non-Unicode file with Delphi 2006.

I want to read : ¿¿¿¿¿¿
But I get: Îòìåíà

What am I doing wrong?

ebob42 Commented:
From my Unicode chapter:

>> Unicode Files and BOM
A Unicode file starts with 2 or 3 bytes to specify the format of the file and the byte-order. As a consequence, this sequence of bytes is also called the Byte-Order-Mark or BOM.
For UTF-8, the BOM consists of three bytes:

  239 = $EF = ï
  187 = $BB = »
  191 = $BF = ¿

For UTF-16, the BOM consists of two bytes, usually as follows (little endian):

  255 = $FF = ÿ
  254 = $FE = þ

For UTF-16 big endian, the order of the BOM is reversed. When we create and write data to files in the Unicode format (either in UTF-8 or UTF-16), we need to write the BOM first, before writing any Unicode data – in the right format as well, of course.
Fortunately, we do not have to do this manually, but we can use some support classes added to Delphi 2009, like the TEncoding class.
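To see these byte sequences for yourself, here is a minimal console sketch, assuming Delphi 2009 or later. It asks each standard TEncoding instance for its preamble (the BOM) and prints the bytes in hex. The program name and the BytesToHex helper are made up for this illustration.

```delphi
program ShowBoms;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: formats a byte array as hex values such as '$EF $BB $BF'.
function BytesToHex(const Bytes: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Length(Bytes) - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + '$' + IntToHex(Bytes[I], 2);
  end;
end;

begin
  // GetPreamble returns the BOM that the encoding writes at the start of a file.
  Writeln('UTF-8:     ', BytesToHex(TEncoding.UTF8.GetPreamble));
  Writeln('UTF-16 LE: ', BytesToHex(TEncoding.Unicode.GetPreamble));
  Writeln('UTF-16 BE: ', BytesToHex(TEncoding.BigEndianUnicode.GetPreamble));
end.
```

The printed values should match the byte sequences listed above, with the two UTF-16 bytes swapped for big endian.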

>> TEncoding
In order to determine the encoding of Unicode data, Delphi 2009 defines a TEncoding class with members for the different encodings: ASCII, UTF7, UTF8, Unicode (= UTF16), BigEndianUnicode and Default. The Default encoding, or TEncoding.Default, is equal to the current codepage that the application is running on. You cannot change the default encoding while the application is running.
TEncoding is used as encoding specifier when saving a TStrings or TStringList to disk, or when loading these from disk. The SaveToFile method has been extended with a second argument, specifying the encoding.

  Memo1.Lines.SaveToFile('Memo1.txt', TEncoding.UTF8);

By default, the second argument is TEncoding.Default, which is the default ANSI code page of the machine. This means that by default SaveToFile produces ANSI output rather than Unicode output. That matches the previous behaviour of the application, but any explicit Unicode characters or data will be lost unless SaveToFile is given a second argument with a TEncoding field other than Default, ASCII or UTF7.

Note that the corresponding LoadFromFile does not need a second argument of type TEncoding, since the encoding can normally be determined from the BOM in the first few bytes of the file. An overload that takes a TEncoding is available for files that have no BOM.
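As a rough illustration of the save and load behaviour described here, the following sketch (assuming Delphi 2009 or later) writes a string list as UTF-8 and reads it back without naming an encoding, relying on the BOM. The file name is hypothetical, and the Cyrillic text is written as character codes so that the example does not depend on how the source file itself is saved.

```delphi
program SaveLoadUtf8;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical file name, used only for this illustration.
  DemoFile = 'unicode-demo.txt';
  // Six Cyrillic letters, given as UTF-16 character codes.
  Original = #$041E#$0442#$043C#$0435#$043D#$0430;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add(Original);
    // The second argument selects UTF-8; the BOM is written first.
    Lines.SaveToFile(DemoFile, TEncoding.UTF8);

    Lines.Clear;
    // No encoding given: the BOM at the start of the file identifies UTF-8.
    Lines.LoadFromFile(DemoFile);

    if (Lines.Count = 1) and (Lines[0] = Original) then
      Writeln('Round trip preserved the Cyrillic text')
    else
      Writeln('Text was changed by the round trip');
  finally
    Lines.Free;
  end;
end.
```

Replacing TEncoding.UTF8 with TEncoding.Default in the SaveToFile call is a quick way to watch the characters get lost on a machine whose ANSI code page has no Cyrillic letters.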

So, to answer your question: if the characters do not show up as Cyrillic letters, then I wonder what format the file is in. Is it a UTF-8 file? Do the Cyrillic letters appear when you view the file with Notepad, for example?

Otherwise, make sure the file has a BOM, or just save the file using Notepad (which writes the BOM for you).
Treppenmeister (Author) Commented:
In the editor I could see the Cyrillic letters; after posting, not anymore :-(
Quote: "Unicode support in Win32 Delphi is a sore issue." The method you're using does not support Unicode. I bet TStringList doesn't either.
Do I read correctly that you are using Delphi 2006?
If yes, then TStringList DOES NOT support Unicode, and neither does any other VCL class or component. You must have Delphi 2009 or 2010 to support Unicode natively.

Even displaying Unicode text in a memo with older Delphi versions requires some third-party components.
Treppenmeister (Author) Commented:
I am using Delphi 2010. I only mentioned it in the tags.
Sorry for the misunderstanding. I only meant that I get the same results as I got before in Delphi 2006.
Treppenmeister (Author) Commented:

  StringList.LoadFromFile('c:\unicode.txt', TEncoding.Unicode);

also doesn't work. Then StringList has only one item and everything suddenly looks like a Chinese novel.
ebob42: very neat explanation. Thanks a lot.
Then what is the format of the file c:\unicode.txt - how did you make it?

Try to open it with notepad and save it as a *real* UTF8 file, with BOM. Then load it with TEncoding.UTF8 as argument.
Treppenmeister (Author) Commented:
@ ebob42:
When I open the text file with Notepad, I can see all the letters in Cyrillic, so I assume the file is correct. (I didn't create the file; it is a translation of the component captions. Judging by its size, it has to be Unicode.) It also starts with ÿþ
But what Unicode format is it? UTF-8 or UTF-16?

Use Notepad to SAVE THE FILE and then specify the encoding. It can be UTF-8, in which case you must use TEncoding.UTF8, or it can be Unicode, in which case you must use TEncoding.Unicode.
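If it is unclear which of the two applies, the BOM can also be checked in code. The following is only a sketch, assuming Delphi 2009 or later and the file path from the question. DetectFileEncoding is a hypothetical helper that reads the first few bytes and lets TEncoding.GetBufferEncoding match them against the known BOMs.

```delphi
program DetectBom;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the standard TEncoding whose BOM starts the
// file, or TEncoding.Default when no known BOM is found.
function DetectFileEncoding(const FileName: string): TEncoding;
var
  Stream: TFileStream;
  Buffer: TBytes;
  BytesRead: Integer;
begin
  Result := nil; // nil asks GetBufferEncoding to detect the encoding
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Buffer, 4);
    BytesRead := Stream.Read(Buffer[0], Length(Buffer));
    SetLength(Buffer, BytesRead); // the file may be shorter than 4 bytes
    TEncoding.GetBufferEncoding(Buffer, Result);
  finally
    Stream.Free;
  end;
end;

var
  Detected: TEncoding;
begin
  if not FileExists('c:\unicode.txt') then
  begin
    Writeln('File not found');
    Exit;
  end;

  Detected := DetectFileEncoding('c:\unicode.txt');
  if Detected = TEncoding.Unicode then
    Writeln('UTF-16 little endian: load with TEncoding.Unicode')
  else if Detected = TEncoding.BigEndianUnicode then
    Writeln('UTF-16 big endian: load with TEncoding.BigEndianUnicode')
  else if Detected = TEncoding.UTF8 then
    Writeln('UTF-8: load with TEncoding.UTF8')
  else
    Writeln('No BOM found: the file is treated as ANSI');
end.
```

The standard encodings are shared instances, so comparing the returned reference with TEncoding.Unicode and friends is enough, and the result must not be freed.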

If it starts with ÿþ, then it looks like it's a Unicode (UTF-16) file already. I have no idea why it doesn't show right.

Which font are you using? Is the font capable of showing Cyrillic characters?
What if you paste the text (from Notepad) into a TEdit and then save it to disk as a Unicode file?

There are several things to try here, I'm not sure what you are doing wrong, sorry. Please re-read my text above, and try to follow it to handle unicode files in different formats (UTF-8 vs UTF-16).
Treppenmeister (Author) Commented:
It works now. I should stop working and take a break.
I edited the wrong file and was accessing the ANSI-encoded file the whole time. :-(

This happens when you have two parallel folders (for Delphi 2006 & 2010) :-(

Sorry !
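For anyone who does need to read such an ANSI file in Delphi 2009 or later, here is a hedged sketch. It assumes the file uses Windows code page 1251 (Cyrillic), which is what the garbled sample in the question suggests but is not confirmed anywhere in this thread. The file name is hypothetical.

```delphi
program LoadAnsiCyrillic;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical file name for an ANSI file without a BOM.
  AnsiFile = 'c:\ansi.txt';
  // Assumption: the text is stored in Windows code page 1251 (Cyrillic).
  CyrillicCodePage = 1251;

var
  Lines: TStringList;
  Cyrillic: TEncoding;
begin
  Lines := TStringList.Create;
  // GetEncoding creates a new instance that the caller must free,
  // unlike the shared TEncoding.UTF8 or TEncoding.Unicode instances.
  Cyrillic := TEncoding.GetEncoding(CyrillicCodePage);
  try
    if FileExists(AnsiFile) then
    begin
      // Naming the code page explicitly avoids depending on the machine's
      // default ANSI code page.
      Lines.LoadFromFile(AnsiFile, Cyrillic);
      Writeln('Loaded ', Lines.Count, ' line(s)');
    end
    else
      Writeln('File not found');
  finally
    Cyrillic.Free;
    Lines.Free;
  end;
end.
```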
OK, then at least feel free to accept my detailed explanation as solution - also for potential future problems ;-)
rightly deserved
Treppenmeister (Author) Commented:
It wasn't my intention not to give you the points, but I had to rush off for the Easter holiday.