Delphi 2010: How do I convert a UTF-16 string to a UTF-8 string?

wantime
Hi all,

As far as I know, in Delphi 2010 String is an alias for the UTF-16 string type (UnicodeString).

Can anyone tell me how to convert a UTF-16 string to a UTF-8 string?

I know the function UTF8Encode returns a RawByteString. Is it also possible to convert this return value to a UTF-8 string?

function UTF8Encode(const US: UnicodeString): RawByteString;

Please show me some code. Thanks!

Best Regards,

wantime

Commented:
You can directly assign a UnicodeString to a UTF8String.


var
  S: String;
  U: UTF8String;
begin
  S := '¿¿¿¿¿¿¿';
  U := S;
  ShowMessage(U);
end;
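
To also cover the UTF8Encode part of the question, here is a minimal console sketch, assuming Delphi 2009 or later. The helper names ToUtf8 and FromUtf8 are made up for illustration. It shows that the RawByteString returned by UTF8Encode can be assigned straight to a UTF8String, and that UTF8ToString converts it back.

```pascal
program Utf8EncodeDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: UTF-16 string -> UTF8String via UTF8Encode.
function ToUtf8(const S: string): UTF8String;
begin
  // UTF8Encode returns a RawByteString that already holds UTF-8 bytes,
  // so assigning it to a UTF8String keeps the bytes as they are.
  Result := UTF8Encode(S);
end;

// Hypothetical helper: UTF8String -> UTF-16 string.
function FromUtf8(const U: UTF8String): string;
begin
  Result := UTF8ToString(U);
end;

var
  S: string;
  U: UTF8String;
begin
  // 'G', 'r', U+00FC, U+00DF, 'e' written with 4-digit character codes
  // so the source file encoding does not matter.
  S := 'Gr'#$00FC#$00DF'e';
  U := ToUtf8(S);
  Writeln(Length(S));       // 5 UTF-16 code units
  Writeln(Length(U));       // 7 bytes: 1 + 1 + 2 + 2 + 1
  Writeln(FromUtf8(U) = S); // the round trip gives back the original
end.
```

The direct assignment U := S does the same conversion; UTF8Encode just makes the step explicit.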


Geert GOracle dba
Top Expert 2009

Commented:
you can use the TEncoding class to convert strings

TEncoding.Convert(Source, Destination: TEncoding; Bytes: TBytes): TBytes;

var
  s, x: string;
begin
  with TEncoding do
    s := StringOf(Convert(TUnicodeEncoding, TUTF8Encoding, BytesOf('Test unicode string')));
  ShowMessage(s);
Geert GOracle dba
Top Expert 2009

Commented:
ungh ... typo

var
  s, x: string;
begin
  x := 'Test unicode string';
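  // WideBytesOf returns the UTF-16 bytes that TEncoding.Unicode expects;
  // BytesOf(x) would encode x with the default ANSI code page instead.
  // StringOf decodes with the default ANSI encoding, so only ASCII text survives intact.
  s := StringOf(TEncoding.Convert(TEncoding.Unicode, TEncoding.UTF8, WideBytesOf(x)));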
  ShowMessage(s);
end;
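
If the goal is just the UTF-8 bytes, a shorter route with TEncoding might look like the sketch below (Delphi 2009 or later assumed). StringToUtf8Bytes and Utf8BytesToString are hypothetical helper names, not part of the RTL; they only wrap TEncoding.UTF8.GetBytes and TEncoding.UTF8.GetString.

```pascal
program EncodingBytesDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: encode a UTF-16 string as UTF-8 bytes (no BOM is added).
function StringToUtf8Bytes(const S: string): TBytes;
begin
  Result := TEncoding.UTF8.GetBytes(S);
end;

// Hypothetical helper: decode UTF-8 bytes back into a UTF-16 string.
function Utf8BytesToString(const B: TBytes): string;
begin
  Result := TEncoding.UTF8.GetString(B);
end;

var
  X: string;
  B: TBytes;
begin
  X := 'Test unicode string';
  B := StringToUtf8Bytes(X);
  Writeln(Length(B));             // 19, one byte per ASCII character
  Writeln(Utf8BytesToString(B));  // decodes back to the original text
end.
```

Keeping the result in a TBytes (or a UTF8String) avoids storing UTF-8 data in a UTF-16 string variable.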

Commented:
Code page 65001 is the UTF-8 code page. An AnsiString declared with that code page holds UTF-8 data, and Delphi predefines this type as UTF8String.

type
  UTF8String = type AnsiString(65001); // UTF-8

This is a very powerful string type. You can assign a UnicodeString to a UTF8String, and the assignment will do the conversion for you.
var
  S: String;
  U: UTF8String;
begin
  S := '¿¿¿¿¿¿¿'; // 7 chinese characters
  U := S;
  ShowMessage(U);
  ShowMessage(IntToStr(Length(S))); // 7
  ShowMessage(IntToStr(Length(U))); // 21
end;
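
Here is a small console variant of the same idea, as a sketch only (Delphi 2009 or later assumed). It writes the characters as 4-digit character codes so the result does not depend on how the source file is saved. Utf8Length is a hypothetical helper name.

```pascal
program Utf8LengthDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: number of bytes S occupies once converted to UTF-8.
function Utf8Length(const S: string): Integer;
begin
  // The explicit cast performs the UTF-16 -> UTF-8 conversion.
  Result := Length(UTF8String(S));
end;

var
  S: string;
  U: UTF8String;
begin
  S := #$4E2D#$6587;           // two Chinese characters, U+4E2D and U+6587
  U := UTF8String(S);
  Writeln(Length(S));          // 2 UTF-16 code units
  Writeln(Length(U));          // 6 bytes, 3 per character
  Writeln(StringCodePage(U));  // 65001, the UTF-8 code page
  Writeln(Utf8Length(S));      // 6 again, via the helper
end.
```

Length on a UTF8String counts bytes, not characters, which is why the two lengths differ.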

Commented:
IMHO, there is no need for the TEncoding.Convert call wrapped in StringOf and BytesOf; assigning the string to a UTF8String already does the conversion.

Author

Commented:
Thanks!
For the following code (from ebob42), can anyone tell me why Length(U) is 21?

var
  S: String;
  U: UTF8String;
begin
  S := '¿¿¿¿¿¿¿'; // 7 chinese characters
  U := S;
  ShowMessage(U);
  ShowMessage(IntToStr(Length(S))); // 7
  ShowMessage(IntToStr(Length(U))); // 21
end;
Geert GOracle dba
Top Expert 2009

Commented:
UTF8 stores characters with 1, 2 or 4 bytes
the length of each character can be any of those numbers
sample:
4+2+4+4+4+2+1

Commented:
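No, UTF-8 stores each code point (the number Unicode assigns to a character) in 1, 2, 3 or 4 bytes, not just 1, 2 or 4. Code points from U+0800 to U+FFFF take 3 bytes, and most common Chinese characters fall in that range, so 7 of them give 7 x 3 = 21 bytes. Length(U) counts bytes, which is why it returns 21.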

From my Delphi 2009 (or 2010) Development Essentials manual:

"Using UTF-8 we get between 1 and 4 bytes for each Unicode character. This is an encoding where we never know in advance how much (storage) bytes are needed to contain a string. Although we can predict the minimum number of bytes: which is the same as the number of characters for a 7-bit ASCII data stream.
The standard 7-bits ASCII characters are the same in UTF-8, which means there is a great level of compatibility between ‘normal’ characters. Apart from these standard ASCII characters, UTF-8 supports all 1 million Unicode characters using a UTF-8 specific coding. UTF-8 is mainly used on the internet for web pages for example (since it produces smaller files compared to the UTF-16 and UTF-32 formats)."

See also http://www.bobswart.nl/Weblog/Blog.aspx?RootId=5:2975 for more UTF-8 usages (writing to standard output!).
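
To make the 1 to 4 byte rule concrete, here is a rough sketch of the size ranges as a function. Utf8BytesForCodePoint is a hypothetical name, and the sketch does not treat the surrogate range (U+D800 to U+DFFF) specially.

```pascal
program Utf8SizeDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: how many bytes UTF-8 needs for one code point.
// Returns 0 for values above U+10FFFF, which are not Unicode code points.
function Utf8BytesForCodePoint(CodePoint: Cardinal): Integer;
begin
  if CodePoint <= $7F then
    Result := 1            // 7-bit ASCII
  else if CodePoint <= $7FF then
    Result := 2
  else if CodePoint <= $FFFF then
    Result := 3            // rest of the Basic Multilingual Plane
  else if CodePoint <= $10FFFF then
    Result := 4            // supplementary planes
  else
    Result := 0;
end;

begin
  Writeln(Utf8BytesForCodePoint($41));        // 1
  Writeln(Utf8BytesForCodePoint($E9));        // 2
  Writeln(Utf8BytesForCodePoint($4E2D));      // 3
  Writeln(Utf8BytesForCodePoint($1F600));     // 4
  Writeln(7 * Utf8BytesForCodePoint($4E2D));  // 21, seven 3-byte characters
end.
```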
