
Read a Unicode TextFile with ReadLn


When I assign a TextFile to a Unicode-formatted .txt file that contains only 'normal' characters, each of which fits in one byte, I get strange results because every other byte is #0. So when I try to use the result of a ReadLn, I only get the first byte.
How should I read a Unicode text file? I would prefer to use ReadLn; is that possible?

3 Solutions
#0 is the null character, if I am right, and I think it also marks the end of a string, so ReadLn might only be reading what comes before the first #0 it hits.
Would it not be easier to read the whole file into a string list first and read it from there?


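You could then loop through the lines and remove all the #0 characters first:

For iLoop := 0 to Pred(StringList1.Count) do
  // Pass the character #0 itself; the quoted literal '#0' would search for the two characters '#' and '0'
  StringList1[iLoop] := StringReplace(StringList1[iLoop], #0, '', [rfReplaceAll]);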
I'm afraid you're going to have to write your own read-line routine to read Unicode text files, unless you have a specific need and can use a Unicode-enabled control. Perhaps you can enlighten us here.

Unicode (UTF-16 little-endian) files normally start with the two bytes FF FE (hex), and you can use this fact to determine the file type. Thereafter you have to read TWO bytes at a time, looking for the combination 0D 00 0A 00.
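
The check and the two-byte scan described above could look roughly like the sketch below. It is only an illustration, not a tested replacement for ReadLn: the names SkipUtf16LeBom and ReadWideLine are made up for this example, the sample bytes stand in for a real file, and it assumes a little-endian machine (as on Windows/x86) and characters that each fit in a single 16-bit unit.

```pascal
program ReadUtf16Lines;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical sample: "Hi" CR LF "ok" as UTF-16 LE, preceded by FF FE
  Sample: array[0..13] of Byte =
    ($FF, $FE, $48, $00, $69, $00, $0D, $00, $0A, $00, $6F, $00, $6B, $00);

// Returns True if the stream starts with FF FE and leaves the position
// just past the mark; otherwise rewinds to the start of the stream.
function SkipUtf16LeBom(Stream: TStream): Boolean;
var
  Bom: Word;
begin
  // The bytes FF FE read as a little-endian Word give $FEFF
  Result := (Stream.Read(Bom, SizeOf(Bom)) = SizeOf(Bom)) and (Bom = $FEFF);
  if not Result then
    Stream.Position := 0;
end;

// Reads 16-bit characters up to the next LF (0A 00) and drops any CR (0D 00).
// Returns False once no more data could be read.
function ReadWideLine(Stream: TStream; out Line: WideString): Boolean;
var
  Ch: WideChar;
begin
  Line := '';
  Result := False;
  while Stream.Read(Ch, SizeOf(Ch)) = SizeOf(Ch) do
  begin
    Result := True;
    if Ch = #10 then
      Break;
    if Ch <> #13 then
      Line := Line + Ch;
  end;
end;

var
  MS: TMemoryStream;
  Line: WideString;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(Sample, SizeOf(Sample));
    MS.Position := 0;
    if not SkipUtf16LeBom(MS) then
      Writeln('No FF FE mark found; probably not a UTF-16 LE file.')
    else
      while ReadWideLine(MS, Line) do
        Writeln(AnsiString(Line)); // lossy conversion for console output
  finally
    MS.Free;
  end;
end.
```

For a real log file, a TFileStream opened with fmOpenRead would take the place of the TMemoryStream. Reading two bytes per call is slow on large files, so a buffered read would probably be the next refinement.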

The question is, when you get a line in, what are you going to do with it? Do you want to convert it to ANSI or what?
Nick_72 (Author) Commented:
Although you might already have realized it, I feel I should clarify a bit further:

I managed to read a whole line into a string, but when I try to display it (for debugging, with ShowMessage()) I get just the first byte in the message box, because the second byte of the first Unicode character acts as a null terminator. So ReadLn does work for the whole line.

StringReplace seems to be an option, and it seems easier to read the whole file first, although it should work on the result of a call to ReadLn too.


>>Unicode files contain in the first two bytes in hex FF FE and you can use this fact to determine the file type
Great! I should implement this check.

>>There after you have to read TWO bytes at a time looking for the combination 0D 00 0A 00.
Hmm... I'm with you on the 'two bytes' issue, but what is the combination 0D 00 0A 00, and what should I do with it?

>>Do you want to convert it to ANSI or what?
That was my initial thought, since I scan log files and check the lines for specific values; if one is found, appropriate action is taken.
I have assumed that there are only ANSI characters in these files, but when I think about it, I can't be 100% sure.

What about the WideString type? Its purpose is to handle two-byte characters, isn't it?

Nick_72 (Author) Commented:
Mokule, I didn't see your post; I'll check it out, thanks.
>>What about the WideString type.

Follow the link; it reads the entire file into a string. That *might* cause you problems.

In any event, after scanning up to the Unicode CR/LF sequence and packing that into a WideString, the next step might be to use the Windows API WideCharToMultiByte to convert the 16-bit characters to 8 bits. I suppose you are searching only for ASCII sequences?
Nick_72 (Author) Commented:
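
A rough sketch of that conversion step, in case it helps. The wrapper name WideToAnsi and the sample text are invented for this example; the API call itself is the standard Windows WideCharToMultiByte, used here with the system ANSI code page (CP_ACP), which is an assumption about what the log scanner needs. Characters that do not exist in that code page are replaced by a default character, so this only suits searches for plain ASCII/ANSI values.

```pascal
program WideToAnsiDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils; // Winapi.Windows, System.SysUtils in recent Delphi versions

// Converts a WideString to the system ANSI code page (lossy for other characters).
function WideToAnsi(const S: WideString): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call: ask how many bytes the converted text needs
  Len := WideCharToMultiByte(CP_ACP, 0, PWideChar(S), Length(S), nil, 0, nil, nil);
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  // Second call: perform the conversion into the allocated buffer
  WideCharToMultiByte(CP_ACP, 0, PWideChar(S), Length(S),
    PAnsiChar(Result), Len, nil, nil);
end;

var
  W: WideString;
begin
  W := 'status=ERROR code=5'; // hypothetical log line
  if Pos('ERROR', WideToAnsi(W)) > 0 then
    Writeln('match found');
end.
```

Simply assigning a WideString to an AnsiString variable makes Delphi perform a similar conversion implicitly; calling the API directly is mainly useful when a specific code page or flags are needed.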
Alright, it works OK, but the entire file is read into the variable. Now I need the lines one by one. I tried to use TStringList to split it with the Delimiter and DelimitedText properties, but I can't get that part to work. Even if I convert the WideString to an AnsiString it won't work. I have tried both #10 and #$A as the delimiter, but it just uses space as the delimiter.
I would have thought of creating a TStringStream and loading a StringList from it.
I'd create the string stream by passing the WideString as a parameter to the Create constructor and let Delphi do the conversion.
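
Something along these lines, as a minimal sketch: the sample text is made up, and the explicit string(...) cast is only there so the same code compiles in both older and newer Delphi versions. LoadFromStream splits on CR, LF and CR/LF line breaks, so Delimiter and DelimitedText are not needed at all.

```pascal
program SplitWideLines;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  W: WideString;
  SS: TStringStream;
  SL: TStringList;
  I: Integer;
begin
  // Hypothetical content, standing in for the whole file read into a WideString
  W := 'first line'#13#10'second line'#13#10'third line';

  SL := TStringList.Create;
  try
    // The cast converts the WideString to the native string type
    SS := TStringStream.Create(string(W));
    try
      SL.LoadFromStream(SS); // splits the text at its line breaks
    finally
      SS.Free;
    end;

    for I := 0 to SL.Count - 1 do
      Writeln(I, ': ', SL[I]);
  finally
    SL.Free;
  end;
end.
```

Assigning the text directly with SL.Text := W should give the same split without the intermediate stream.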
