Problem with LoadFromFile & Unicode

I have a problem when I try to read a Unicode text file.
The data is written in Cyrillic.

  var
    StringList: TStringList;
  begin
    StringList := TStringList.Create;
    if FileExists('c:\unicode.txt') then
      StringList.LoadFromFile('c:\unicode.txt');
    // ... use StringList, then call StringList.Free
  end;

After loading the file, the strings in StringList no longer contain Cyrillic letters; they look as if I had loaded a non-Unicode file with Delphi 2006.

I want to read: Отмена
But I get: Îòìåíà

What am I doing wrong?

Treppenmeister (Author) commented:
In the editor I could see the Cyrillic letters; after posting, not anymore :-(
Quote: "Unicode support in Win32 Delphi is a sore issue." The method you're using does not support Unicode. I bet neither does TStringList.
Emmanuel PASQUIER (Freelance Project Manager) commented:
Do I read correctly that you are using Delphi 2006?
If yes, then TStringList DOES NOT support Unicode, nor does any other VCL class or component. You must have Delphi 2009 or 2010 to support Unicode natively.

With older Delphi versions, even displaying Unicode text in a memo requires third-party components.
Treppenmeister (Author) commented:
I am using Delphi 2010; I only mentioned it in the tags.
Sorry for the misunderstanding. I only meant that I get the same results as I got before in Delphi 2006.
From my Unicode chapter:

>> Unicode Files and BOM
A Unicode text file usually starts with 2 or 3 bytes that specify the format and the byte order of the file. For that reason, this byte sequence is called the Byte-Order Mark, or BOM.
For UTF-8, the BOM consists of three bytes:

  239 = $EF = ï
  187 = $BB = »
  191 = $BF = ¿

For UTF-16, the BOM consists of two bytes, usually as follows (little endian):

  255 = $FF = ÿ
  254 = $FE = þ

For UTF-16 big endian, the order of the BOM is reversed. When we create and write data to files in the Unicode format (either in UTF-8 or UTF-16), we need to write the BOM first, before writing any Unicode data – in the right format as well, of course.
Fortunately, we do not have to do this manually: we can use support classes added in Delphi 2009, such as the TEncoding class.

>> TEncoding
In order to determine the encoding of Unicode data, Delphi 2009 defines a TEncoding class with members for the different encodings: ASCII, UTF7, UTF8, Unicode (= UTF-16), BigEndianUnicode and Default. The Default encoding, TEncoding.Default, corresponds to the ANSI code page that the application is running under. You cannot change the default encoding while the application is running.
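To connect TEncoding with the BOM bytes listed above, here is a minimal console sketch, assuming Delphi 2009 or later (the program name and the PreambleHex helper are hypothetical). It prints the preamble, i.e. the BOM, that each encoding writes, using TEncoding.GetPreamble.

```delphi
program ShowPreambles;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: formats the BOM (preamble) an encoding writes
// as space-separated hex bytes, e.g. 'EF BB BF'.
function PreambleHex(Encoding: TEncoding): string;
var
  Bytes: TBytes;
  I: Integer;
begin
  Result := '';
  Bytes := Encoding.GetPreamble;
  for I := 0 to High(Bytes) do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(Bytes[I], 2);
  end;
end;

begin
  Writeln('UTF8:             ', PreambleHex(TEncoding.UTF8));             // EF BB BF
  Writeln('Unicode:          ', PreambleHex(TEncoding.Unicode));          // FF FE
  Writeln('BigEndianUnicode: ', PreambleHex(TEncoding.BigEndianUnicode)); // FE FF
end.
```

Seen through a Western ANSI code page, these are exactly the ï»¿ and ÿþ sequences mentioned above.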
TEncoding is used as the encoding specifier when saving a TStrings or TStringList to disk, or when loading one from disk. The SaveToFile method has been extended with a second argument that specifies the encoding.

  Memo1.Lines.SaveToFile('Memo1.txt', TEncoding.UTF8);

By default, the second argument is TEncoding.Default, which is the default ANSI code page of the machine. This means that, by default, SaveToFile will not produce Unicode output but ANSI output instead (in other words, the previous behaviour of the application). Any Unicode characters or data that the ANSI code page cannot represent will be lost, unless SaveToFile gets a second argument with a TEncoding field other than Default, ASCII or UTF7.
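A small sketch of that difference, assuming Delphi 2009 or later. The hypothetical SavedBytes helper saves a one-line list to a memory stream (SaveToStream takes the same encoding argument as SaveToFile) and returns the bytes that were written.

```delphi
program SaveEncodings;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the bytes TStrings.SaveToStream writes for a
// one-line list containing Text, using the given encoding. SaveToFile writes
// the same bytes, just to a file instead of a memory stream.
function SavedBytes(const Text: string; Encoding: TEncoding): TBytes;
var
  Lines: TStringList;
  Stream: TMemoryStream;
begin
  Lines := TStringList.Create;
  Stream := TMemoryStream.Create;
  try
    Lines.Add(Text);
    Lines.SaveToStream(Stream, Encoding);
    SetLength(Result, Stream.Size);
    if Stream.Size > 0 then
      Move(Stream.Memory^, Result[0], Stream.Size);
  finally
    Stream.Free;
    Lines.Free;
  end;
end;

const
  // 'Отмена' written as character codes, so the source file encoding does not matter
  Otmena = #$041E#$0442#$043C#$0435#$043D#$0430;

var
  Utf8: TBytes;
begin
  // ANSI: no BOM; the Cyrillic letters survive only if the ANSI code page can represent them
  Writeln('Default: ', Length(SavedBytes(Otmena, TEncoding.Default)), ' bytes');
  // UTF-8: 3-byte BOM + 6 letters x 2 bytes + CR LF = 17 bytes
  Utf8 := SavedBytes(Otmena, TEncoding.UTF8);
  Writeln('UTF8:    ', Length(Utf8), ' bytes, BOM starts with ', IntToHex(Utf8[0], 2));
end.
```

The byte count for the Default line depends on the machine's ANSI code page, so no fixed value is claimed for it.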

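Note that the corresponding LoadFromFile does not need a second argument of type TEncoding (although an overload that accepts one exists), since the encoding should be determinable from the BOM in the first few bytes of the file. If there is no BOM, the file is read with TEncoding.Default, i.e. as ANSI.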
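A round-trip sketch of this BOM-based detection, assuming Delphi 2009 or later and a writable current directory (roundtrip.txt and the RoundTrip helper are hypothetical). The text is saved with an explicit encoding and reloaded with the one-argument LoadFromFile, which has to rely on the BOM.

```delphi
program LoadDetectsBom;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: saves Text with Encoding, then reloads the file
// WITHOUT an encoding argument, so LoadFromFile must use the BOM.
function RoundTrip(const Text: string; Encoding: TEncoding;
  const FileName: string): string;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add(Text);
    Lines.SaveToFile(FileName, Encoding);
    Lines.LoadFromFile(FileName); // replaces the content; encoding taken from the BOM
    Result := Lines[0];
  finally
    Lines.Free;
  end;
end;

const
  Otmena = #$041E#$0442#$043C#$0435#$043D#$0430; // 'Отмена'
  ScratchFile = 'roundtrip.txt';                  // hypothetical scratch file

begin
  Writeln(RoundTrip(Otmena, TEncoding.UTF8, ScratchFile) = Otmena);    // TRUE
  Writeln(RoundTrip(Otmena, TEncoding.Unicode, ScratchFile) = Otmena); // TRUE
end.
```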

So, to answer your question: if the characters are not Cyrillic letters, then I wonder what format the file is in. Is it a UTF-8 file? Do the Cyrillic letters appear when you view the file with Notepad, for example?

Otherwise, make sure the file has a BOM, or just save the file using Notepad (which writes the BOM for you).

Treppenmeister (Author) commented:

  StringList.LoadFromFile('c:\unicode.txt', TEncoding.Unicode);

also doesn't work. Then the string list has only one item and everything suddenly looks like a Chinese novel.
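One plausible explanation for both symptoms, given that the file later turned out to be an ANSI file: an explicit TEncoding.Unicode makes LoadFromFile treat every pair of bytes as one UTF-16 character. CR and LF bytes then merge with their neighbours instead of appearing as line breaks (so the list gets a single item), and pairs of Latin letters land in the CJK ideograph range. A minimal sketch, assuming Delphi 2009 or later; ReadAnsiAsUtf16 is a hypothetical helper.

```delphi
program WrongDecoder;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: takes single-byte (ASCII) text and decodes its bytes
// as UTF-16LE, which is what an explicit TEncoding.Unicode does to an ANSI file.
function ReadAnsiAsUtf16(const AnsiText: string): string;
begin
  Result := TEncoding.Unicode.GetString(TEncoding.ASCII.GetBytes(AnsiText));
end;

var
  S: string;
begin
  // 8 bytes: 'a' 'b' CR LF 'c' 'd' CR LF
  S := ReadAnsiAsUtf16('ab'#13#10'cd'#13#10);
  Writeln(Length(S));                     // 4: one character per byte pair
  Writeln(Pos(#10, S) = 0);               // TRUE: no line feed survives
  Writeln(IntToHex(Ord(S[1]), 4));        // 6261: a CJK ideograph built from 'a' and 'b'
end.
```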
Emmanuel PASQUIER (Freelance Project Manager) commented:
ebob42: very neat explanation, thanks a lot.
Then what is the format of the file c:\unicode.txt? How did you create it?

Try to open it with Notepad and save it as a *real* UTF-8 file, with BOM. Then load it with TEncoding.UTF8 as the argument.
Treppenmeister (Author) commented:
@ebob42:
When I open the text file with Notepad, I can see all the letters in Cyrillic. Therefore I assume that the file is correct (I didn't create the file; it is a translation for the captions of the components, but judging from its size it has to be Unicode). It also starts with ÿþ.
But which Unicode format is it? UTF-8 or UTF-16?

Use Notepad to SAVE THE FILE and then specify the encoding. It can be UTF-8, in which case you must use TEncoding.UTF8, or it can be Unicode, in which case you must use TEncoding.Unicode.

If it starts with ÿþ ($FF $FE), then it looks like it is already a Unicode (UTF-16 little-endian) file. I have no idea why it doesn't show correctly.
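Rather than judging the first characters by eye, the RTL can classify the BOM. A minimal sketch, assuming the Delphi 2009/2010 signature TEncoding.GetBufferEncoding(const Buffer: TBytes; var AEncoding: TEncoding): Integer, which (when AEncoding is nil) picks UTF8, Unicode or BigEndianUnicode from the preamble and falls back to TEncoding.Default when there is none. DetectFileEncoding and EncodingName are hypothetical helpers; c:\unicode.txt is the path from the question.

```delphi
program WhichEncoding;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: reads up to the first 4 bytes of FileName and lets
// TEncoding.GetBufferEncoding choose the encoding from the BOM.
function DetectFileEncoding(const FileName: string): TEncoding;
var
  Stream: TFileStream;
  Head: TBytes;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Head, 4);
    SetLength(Head, Stream.Read(Head[0], Length(Head))); // shorter files are fine
  finally
    Stream.Free;
  end;
  Result := nil; // nil asks GetBufferEncoding to detect the encoding
  TEncoding.GetBufferEncoding(Head, Result);
end;

// Hypothetical helper: names the standard TEncoding singletons.
function EncodingName(Encoding: TEncoding): string;
begin
  if Encoding = TEncoding.UTF8 then
    Result := 'TEncoding.UTF8'
  else if Encoding = TEncoding.Unicode then
    Result := 'TEncoding.Unicode'
  else if Encoding = TEncoding.BigEndianUnicode then
    Result := 'TEncoding.BigEndianUnicode'
  else
    Result := 'TEncoding.Default (no BOM found)';
end;

begin
  if FileExists('c:\unicode.txt') then
    Writeln(EncodingName(DetectFileEncoding('c:\unicode.txt')))
  else
    Writeln('File not found');
end.
```

A file starting with ÿþ should report TEncoding.Unicode, which is then the value to pass to LoadFromFile (or simply omit the second argument and let the BOM decide).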

Which font are you using? Is the font capable of showing Cyrillic characters?
What if you paste the text (from Notepad) into a TEdit and then save it to disk as a Unicode file?

There are several things to try here; I'm not sure what you are doing wrong, sorry. Please re-read my text above and try to follow it to handle Unicode files in different formats (UTF-8 vs UTF-16).
Treppenmeister (Author) commented:
It works now. I should stop working and take a break.
I edited the wrong file and was accessing the ANSI-encoded file all along. :-(

This happens when you have two parallel folders (for Delphi 2006 and 2010) :-(

Sorry!
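For the record, the garbled text in the question fits this explanation exactly: Îòìåíà is what the Windows-1251 (Cyrillic ANSI) bytes of "Отмена" look like when decoded as Windows-1252 (Western ANSI). A minimal sketch to check this, assuming Delphi 2009 or later and that both code pages are available (they are standard on Windows); MisreadAnsi is a hypothetical helper.

```delphi
program CodePageMixup;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: encodes S with code page FromCP, then decodes the
// resulting bytes with code page ToCP, simulating an ANSI file that is
// read with the wrong code page.
function MisreadAnsi(const S: string; FromCP, ToCP: Integer): string;
var
  Writer, Reader: TEncoding;
begin
  Writer := TEncoding.GetEncoding(FromCP); // caller owns encodings from GetEncoding
  try
    Reader := TEncoding.GetEncoding(ToCP);
    try
      Result := Reader.GetString(Writer.GetBytes(S));
    finally
      Reader.Free;
    end;
  finally
    Writer.Free;
  end;
end;

const
  // 'Отмена', written as character codes so the source file encoding does not matter
  Otmena = #$041E#$0442#$043C#$0435#$043D#$0430;
  // 'Îòìåíà', the text reported in the question
  Garbled = #$00CE#$00F2#$00EC#$00E5#$00ED#$00E0;

begin
  Writeln(MisreadAnsi(Otmena, 1251, 1252) = Garbled); // TRUE
end.
```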
OK, then at least feel free to accept my detailed explanation as the solution, also for potential future problems ;-)
Emmanuel PASQUIER (Freelance Project Manager) commented:
Rightly deserved.
Treppenmeister (Author) commented:
It wasn't my intention to withhold the points, but I had to rush off for the Easter holiday.