Creating and writing into a Unicode file

redsg
I'm using Delphi 7 and would like to find out how to programmatically create a Unicode text file and write lines of Unicode strings (of WideString type) into it. Will using FileCreate automatically encode the text file as non-Unicode?

Commented:
The easiest way is just to load a TStringList with the text you want to save, and then call SaveToFile, passing the TEncoding value you want your file to be saved with.

A quote from my Delphi XE Development Essentials courseware manual:

>> Console or Text File I/O

First the bad news: neither console nor Text file I/O support reading Unicode strings. And writing also only supports AnsiStrings. This means that as soon as you call write or writeln, the contents of a (Unicode) string will be converted to AnsiString when needed, and written to the output.
This means that any Text file I/O needs to be rewritten using streams or other techniques. However, since a UTF8String is also an AnsiString (with the 65001 code page specified), there is a good workaround for writing to console output provided you set the console codepage to UTF8 and use a font that can display the Unicode characters (that’s Lucida Console for example):

program ConsoleUTF8;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils;

begin
  SetConsoleOutputCP(65001);
  write(AnsiChar(239), AnsiChar(187), AnsiChar(191)); // UTF-8 BOM
  Writeln(Output, UTF8String('[¿¿¿¿¿¿¿¿¿¿¿¿ ¿¿¿¿¿¿¿]'));
end.

This will produce Cyrillic characters on the standard output. Note that Lucida Console cannot display all Unicode characters – Chinese and the Clef are not shown, but at least Cyrillic characters display without problems.

Note that I’m also writing the BOM to the output in case you want to save the console output to a text file and read it afterwards. That way, you can set the font afterwards and also see the Chinese or Clef characters without problems. Provided they were written as UTF8.
This is also the basis for writing UTF8 data to Text files: printing UTF8Strings on a file which starts with the UTF-8 BOM:

program UnicodeTextFile;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils;

var
  F: Text;
begin
  Assign(F, 'output.txt');
  Rewrite(f);
  write(f, AnsiChar(239), AnsiChar(187), AnsiChar(191)); // UTF-8 BOM
  writeln(f, UTF8String('[¿¿¿¿¿¿¿¿¿¿¿¿ ¿¿¿¿¿¿¿]'));
  Close(f);
end.

Since UTF8String is an AnsiString, we can combine the code above with writeln of normal strings, which will be converted to AnsiStrings, as long as we keep away from high-ascii characters (since these would indicate the start of a UTF8 special character byte sequence).

program UnicodeTextFile;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils;
var
  F: Text;
begin
  Assign(F, 'output.txt');
  Rewrite(f);
  write(f, AnsiChar(239), AnsiChar(187), AnsiChar(191)); // UTF-8 BOM
  writeln(f, UTF8String('[¿¿¿¿¿¿¿¿¿¿¿¿ ¿¿¿¿¿¿¿]'));
  writeln(f, 'This is a UTF-16 String which will be written as AnsiString');
  Close(f);
end.

As long as we convert UTF-16 Unicode Strings to UTF8 before writing to Text files, and don’t forget to use the UTF-8 BOM as prefix, this will work fine for writing files with Unicode UTF-8 output.
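For the Delphi 7 case in the question, where the text is held in WideStrings, the same idea can be sketched with a stream instead of Text file I/O. This is only a minimal sketch under assumptions: SaveWideAsUtf8 and the file name are hypothetical, and it relies on UTF8Encode from the System unit to convert the WideString to UTF-8 bytes before writing.

```pascal
program WideToUtf8File;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

// Hypothetical helper (not from the thread): writes Content to FileName
// as UTF-8, prefixed with the UTF-8 BOM.
procedure SaveWideAsUtf8(const FileName: string; const Content: WideString);
const
  Utf8Bom: array[0..2] of Byte = ($EF, $BB, $BF);
var
  Stream: TFileStream;
  Encoded: UTF8String;
begin
  Encoded := UTF8Encode(Content);          // UTF-16 -> UTF-8 bytes
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    Stream.WriteBuffer(Utf8Bom, SizeOf(Utf8Bom));
    if Length(Encoded) > 0 then            // Encoded[1] is invalid when empty
      Stream.WriteBuffer(Encoded[1], Length(Encoded));
  finally
    Stream.Free;
  end;
end;

var
  W: WideString;
begin
  W := 'Line one';
  W := W + #13#10;
  W := W + WideChar($0416) + #13#10;       // U+0416, a Cyrillic letter
  SaveWideAsUtf8('output.txt', W);         // file name is an assumption
end.
```

Because the conversion happens explicitly, no character passes through the ANSI code page on its way to the file.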

>> TStrings / TStringList

Apart from the UTF-8 text file trick just covered, the easiest way to produce text output that supports the TEncoding formats is to use the SaveToFile method of a TStrings or TStringList. The SaveToFile method has been extended with a second argument, specifying the encoding.

begin
  Memo1.Lines.SaveToFile('Memo1.txt', TEncoding.UTF8);

By default, the second argument uses TEncoding.Default, which is the default ANSI Code Page of the machine. This means that by default, the SaveToFile will not produce Unicode output, but ANSI output instead (in other words: the previous behavior of the application, but any explicit Unicode characters or data will be lost, unless the SaveToFile gets a second argument value using a TEncoding field other than Default, ASCII or UTF7).

Note that the corresponding LoadFromFile does not take a second argument of type TEncoding, since the encoding should be determinable from the BOM in the first few characters of the file:

  Memo1.Lines.LoadFromFile('Memo1.txt');
end;
"
Sorry, but Delphi 7 doesn't handle Unicode strings, only ANSI! You would have to write the file byte by byte, and that would be very awkward to handle...

Commented:
When using Delphi 7, you can use the TNT controls. They used to be free, but are now part of the TMS Software offerings.

Author

Commented:
I'm already using the TNT controls, i.e. the Unicode strings/lines are currently in the TNTMemo component. The challenge for me is to create a Unicode-encoded text file and to write the Unicode lines into it.

Commented:
TNT controls should support TTntInifile as far as I remember. Didn't they also support text files or streams?
you should have told all the information in the beginning ... :-P
Simple procedure to do the work.
see code attached.

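procedure WriteUnicodeFileString(AFilename: string; AString: AnsiString);
var
  str: TFileStream;
  buf: array of Byte;   // dynamic byte array; avoids depending on TBytes
  ws: WideString;       // owns the decoded text (no pointer to a temporary)
begin
  str := TFileStream.Create(AFilename, fmCreate);
  try
    ws := UTF8Decode(AString);            // expects AString to hold UTF-8 text
    SetLength(buf, Length(ws) * 2 + 2);   // text plus 2 bytes for the preamble
    buf[0] := $FF;    buf[1] := $FE;      // Unicode (UTF-16 LE) preamble
    if Length(ws) > 0 then
      Move(ws[1], buf[2], Length(ws) * 2); // WideString is 1-based
    str.Write(buf[0], Length(buf));
  finally
    FreeAndNil(str);                      // needs SysUtils in the uses clause
  end;
end;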


Author

Commented:
I've tried the following and it seems to be able to write the Unicode characters into the text file. However, I've noticed that each line is not written on a new line when the files are viewed in Notepad, although they are displayed line by line when viewed in MS Word.

How can I ensure that each string being "fed" into the method is printed on a new line? Seems like adding the '#13#10' to the end of the WideString variable isn't working.

In addition, the FreeAndNil() function in briangochnauer's code is returning an error. Is there an alternative to closing the TFileStream instance?
procedure WriteToTextFile(newFile: TFileStream; StringToWrite: WideString);
var
  ws:   PWideChar;
  buf:  array of byte;
begin
  ws := PWideChar(StringToWrite + #13#10);
  SetLength(buf, Length(ws)*2);
  buf[0] := $FF;
  buf[1] := $FE;
  Move(ws[0], buf[2], Length(buf)-2);
  newFile.Write(buf[0], Length(buf));
end;


Author

Commented:
I think I've got it:
// create TFileStream instance
str := TFileStream.Create(fileLocation, fmCreate);

// for each line (of WideString type) in TNTMemo..
for i:=0 to memo.Lines.Count-1 do
begin
  // .. insert line
  WriteToTextFile(str, memo.Lines[i]);
  // .. insert line break
  WriteToTextFile(str, #13#10);
end;


Author

Commented:
then to include this line at the end of it all:

str.Destroy;

Commented:
str.Free would be better (compared to calling the destructor directly). Or FreeAndNil(str);
@redsg,

WriteUnicodeFileString is intended to write the WHOLE file at once not as an 'Append String' to a file.
  buf[0] := $FF;
  buf[1] := $FE;

These need only appear ONCE, at the beginning of the file, to signal a Unicode file structure.
Instead of appending, you could loop and concatenate the strings together with #13#10 (CRLF):
MyString := Mystring+' abc'+#13#10;
WriteUnicodeFileString ('c:\temp\test.txt',MyString);

AString does not contain a CRLF just because it is written to a file; you must put them in yourself.
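The two points above (write the preamble once, and add the line breaks yourself) can also be sketched as an append-style writer, in case building one large string is inconvenient. This is a minimal sketch rather than a drop-in replacement: WriteBom and WriteWideLine are hypothetical names, the file name is an assumption, and the byte order shown assumes a little-endian machine such as x86 Windows.

```pascal
program WriteUtf16Lines;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

// Hypothetical helper: writes the UTF-16 LE preamble. Call it once,
// right after the file has been created.
procedure WriteBom(Stream: TStream);
const
  Bom: Word = $FEFF;   // stored on disk as the bytes FF FE (little-endian)
begin
  Stream.WriteBuffer(Bom, SizeOf(Bom));
end;

// Hypothetical helper: appends one line followed by CRLF, without a preamble.
procedure WriteWideLine(Stream: TStream; const Line: WideString);
var
  S: WideString;
begin
  S := Line + #13#10;  // never empty, so S[1] is always valid
  Stream.WriteBuffer(S[1], Length(S) * SizeOf(WideChar));
end;

var
  F: TFileStream;
begin
  F := TFileStream.Create('output.txt', fmCreate);  // file name is an assumption
  try
    WriteBom(F);
    WriteWideLine(F, 'first line');
    WriteWideLine(F, 'second line');
  finally
    F.Free;
  end;
end.
```

Each call writes every byte of the line including both characters of the CRLF, which is what Notepad needs in order to show the lines separately.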

Author

Commented:
@briangochnauer,

understood what you meant. i shall build up the widestring with the necessary #13#10's before passing it through the write function. thanks much!
>> i shall build up the widestring with the necessary #13#10's before passing it through the write function.
This is not needed with my procedure (shown below); it takes a plain string and works correctly under Delphi 7 through Delphi XE.
Building the string with CRLF (#13#10) is needed, but not as a wide string.
I suggest using a TStringList to concatenate the strings:
var SL: TStringList;
  SL := TStringList.Create;
then you can add a string with SL.Add('My next string');
and finally
WriteUnicodeFileString('c:\temp\filename.txt', SL.Text);


procedure WriteUnicodeFileString(AFilename: string; AString: string);
var
  str: TFileStream;
  buf: array of Byte;   // dynamic byte array; avoids depending on TBytes
  ws: WideString;
begin
  str := TFileStream.Create(AFilename, fmCreate);
  try
    {$IFDEF UNICODE}
    ws := AString;               // in Unicode Delphi, string is already UTF-16
    {$ELSE}
    ws := UTF8Decode(AString);   // ANSI compilers: expects UTF-8 text
    {$ENDIF}
    SetLength(buf, Length(ws) * 2 + 2);   // text plus 2 bytes for the preamble
    buf[0] := $FF;    buf[1] := $FE;      // Unicode (UTF-16 LE) preamble
    if Length(ws) > 0 then
      Move(ws[1], buf[2], Length(ws) * 2); // WideString is 1-based
    str.Write(buf[0], Length(buf));
  finally
    FreeAndNil(str);
  end;
end;

