Strange behavior with WideString conversion

procedure TForm1.TisButton1Click(Sender: TObject);
  procedure test(const Param1: WideString);
  var
    s,d:string;
    i,len:integer;
  begin
    s:=param1;
    len:=length(s);
    d:=Inttostr(len)+': ';
    for i:=1 to len do
      d:=d+' '+inttostr(ord(s[i]));
    ShowMessage(d);
  end;
var
  s:string;
begin
  s:=#253#253#253#02#25;
  test(s);
end;

After calling test, the result is #253#253#63#25. Why?

Thanks.
Asked by: HBZhang
 
Alone commented:
If the string comes from RS232, try to receive it into a WideString variable and NEVER convert it to a single-byte string. Always use WideString.
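
A minimal sketch of that idea, assuming the RS232 data arrives as raw bytes in an AnsiString (Delphi 7 era, where string = AnsiString). The helper names BytesToWide and WideToBytes are hypothetical and not part of the RTL. Each byte is widened to one WideChar by value, so no code page conversion takes place and the round trip should not depend on the system locale:

```pascal
// Hypothetical helpers (not RTL functions): carry raw bytes through a
// WideString without any locale-dependent code page conversion.

// Widens every byte to one WideChar with the same numeric value (0..255).
function BytesToWide(const S: AnsiString): WideString;
var
  I: Integer;
begin
  SetLength(Result, Length(S));
  for I := 1 to Length(S) do
    Result[I] := WideChar(Ord(S[I]));
end;

// Narrows every WideChar back to one byte; only the low 8 bits are kept,
// so this is the exact inverse only for strings made by BytesToWide.
function WideToBytes(const W: WideString): AnsiString;
var
  I: Integer;
begin
  SetLength(Result, Length(W));
  for I := 1 to Length(W) do
    Result[I] := AnsiChar(Ord(W[I]) and $FF);
end;
```

With these, the five bytes from the question stay five WideChars when passed as BytesToWide(s), and WideToBytes returns the original bytes. The COM side then has to treat the WideString as a carrier of byte values rather than as text, and a receiver that reads it as a null-terminated string would stop at an embedded #0.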

MBo commented:
I've tried.
Result: 5: 253 253 253 2 25
It looks like a localization problem.
Is your Windows English? (Mine is Russian.)

HBZhang (Author) commented:
No, mine is Chinese.
 
Alone commented:
Hi!

Your error is in this line: s:=param1;

Because Param1 is a WideString and S is a string, that line is equivalent to a WideCharToString (or similar) function call. This conversion depends on your system locale settings, so the behavior differs between Russian and Chinese locales :-)) The reverse conversion, from string to WideString, happens implicitly at the call test(s) and is locale-dependent too.

Try this:

procedure TMainForm.Button1Click(Sender: TObject);

// Shows the raw bytes of a WideString, without any ANSI conversion.
procedure Test(const Param: WideString);
var
  S: string;
  I, Len: Integer;
begin
  Len := Length(Param) * SizeOf(Param[1]); // size in bytes
  S := IntToStr(Len) + ',';
  // PAnsiChar indexing is zero-based, so I runs from 0 to Len - 1
  for I := 0 to Len - 1 do
    S := S + '#' + IntToStr(Ord(PAnsiChar(Pointer(Param))[I]));
  ShowMessage(S);
end;

begin
  Test(#204#224#236#224);
// on a Russian system shows 8,#28#4#48#4#60#4#48#4
end;

My system is Russian too :-))))



HBZhang (Author) commented:
I'm calling a COM object function with a string as a parameter. Because COM only supports WideString, the problem occurs.

Is there a way to exchange data between String and WideString safely?

Thanks.

Alone commented:
For Chinese? For what purpose?

var
  S1, S2: string;
  W: WideString;
begin
  S1 := 'blabla';  // ANSI 'blabla'
  W := S1;         // Unicode (UTF-16), bytes: 'b'#0'l'#0'a'#0'b'#0'l'#0'a'#0
  S2 := W;         // ANSI 'blabla'
end;

Don't confuse ANSI characters (which may be multibyte) with Unicode characters. They MUST be treated as different things! My previous example shows how a Unicode string with Russian characters looks at the low level (as a chain of bytes).

For European languages (and Russian), ANSI strings are always single-byte. When we assign them to a WideString, they expand to double-byte Unicode characters; in my example, 4 bytes (4 single-byte characters) become 8 bytes (4 double-byte characters).

How many Chinese characters are in your sample: #253#253#253#02#25?

Please try the current example ('blabla'). If the strings are not corrupted, your system is still working OK. :-))
 
HBZhang (Author) commented:
Creating a variant array of type varByte can avoid this problem, but it's not simple.

I'd like to know whether there is something that can control this conversion. Anyway, I consider it strange behavior.

  WideString := String;
  String := WideString;

Changed? Why?

Alone commented:
For Russian:

 WideString := String;
 String := WideString;

works fine.

But not for Azeri:

W := 'az'#609'ri';
S := W; // looks like 'az?ri'
W := S; // still looks like 'az?ri', but now in Unicode :-((

This behavior comes from the restriction of the ANSI character set (single-byte in my case). Some Unicode characters have no single-byte equivalent, and the system replaces them with '?'.

Am I right?

Alone commented:
Yep! When you place strings directly in your program source, Delphi ALWAYS creates ANSI strings.

procedure TMainForm.Button2Click(Sender: TObject);

function ComFunction(const Param: OleVariant): Integer;
begin
// Works with NT/2k/XP only ;-)
  Result := MessageBoxW(Handle, Pointer(WideString(Param)), '', MB_ICONINFORMATION);
end;

var
  S: string;
  W: WideString;
  C: WideChar;

begin
  ComFunction('Direct: az'#$018F'ri'); // Delphi creates an ANSI (?) string 'az'#$8F'ri'
// It may be a Unicode string, BUT that depends on the system locale
// (my locale is Russian, but the string is in Azeri (Azerbaijani))
  C := #$018F;
  ComFunction('WideChar: Az'+C+'ri'); // works fine
end;

HBZhang (Author) commented:
Thanks. But it's not good enough. I'm waiting...

BTW: in my real work, I do not "place strings directly in your program source". The data comes from an RS232 port.

Alone commented:
Here is something like your sample:

var
  W: WideString;

begin
  W := 'az'#$018F'ri'; // Delphi creates #$0061#$007A + #$040F + #$0072#$0069
// instead of #$0061#$007A + #$018F + #$0072#$0069:
// the third character is replaced with a Russian one (using the system locale)
end;

HBZhang (Author) commented:
Yeah, always using WideString may be a good idea.

Lee_Nover commented:
Why not simply use StringToOLEStr? :)

geobul commented:
Hi,

procedure test(const Param1: WideString);
var
  s: string;
  wc: PWideChar;
begin
  wc := PWideChar(Param1);
  s := WideCharToString(wc);
  s := IntToStr(Length(s)) + ': ' + s;
  ShowMessage(s);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  ws: WideString;
begin
  ws := #253#253#253#02#25;
  test(ws);
end;

Regards, Geo

geobul commented:
Or:

procedure TForm1.TisButton1Click(Sender: TObject);
  procedure test(const Param1: WideString);
  var
    s,d: string;
    wc: PWideChar;
    len,i: integer;
  begin
    wc := PWideChar(Param1);
    s := WideCharToString(wc);
    len := length(s);
    d := IntToStr(len) + ': ';
    for i := 1 to len do
      d := d + '#' + inttostr(ord(s[i]));
    ShowMessage(d);
  end;
var
 s:string;
begin
 s:=#253#253#253#02#25;
 test(s);
end;

Alone commented:
It all depends on the original format of the strings you receive: ANSI or Unicode. When they are ANSI, it may be possible to use AnsiString and StringToOleStr, or a direct string-to-OleVariant assignment.
But when the string is Unicode, converting it to an ANSI representation may corrupt the data, replacing some characters with '?' or something else. Using the WideString representation is more flexible because it is locale-independent.

2geobul: Have you tested your examples? What results do they produce? And what is your default system locale?

geobul commented:
Well, what is supposed to be produced? The second one shows:
5: #253#253#253#2#25

English(US)

Regards, Geo
 
Alone commented:
When you're using Unicode on a locale that has a FULL ANSI representation, there is no problem. But when your locale has no single-byte equivalent, you'll receive question marks instead of some characters and the data will be corrupted.

In my examples, the Russian locale has a full single-byte ANSI representation and everything works OK. But Azerbaijani is Unicode-only, and now we have a big headache with one (!) letter :-(((
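
One way to notice this kind of loss before the data is used is to do the WideString-to-ANSI conversion explicitly and ask Windows whether it had to substitute a character. The following is only a sketch under assumptions: TryWideToAnsi is a hypothetical helper name, the code expects the Windows unit in the uses clause, and it checks only the Unicode-to-ANSI direction with the system code page (CP_ACP):

```pascal
uses
  Windows;

// Hypothetical helper (not an RTL function).
// Converts W to the system ANSI code page and returns False when Windows
// reports that it replaced at least one character with the default
// character (usually '?'), or when the API call itself fails.
function TryWideToAnsi(const W: WideString; out S: AnsiString): Boolean;
var
  Len: Integer;
  UsedDefault: BOOL;
begin
  S := '';
  Result := True;
  if W = '' then
    Exit;
  // First call: ask how many bytes the ANSI form needs.
  Len := WideCharToMultiByte(CP_ACP, 0, PWideChar(W), Length(W),
    nil, 0, nil, nil);
  if Len = 0 then
  begin
    Result := False; // conversion failed
    Exit;
  end;
  SetLength(S, Len);
  UsedDefault := False;
  // Second call: convert and let Windows flag any substitution.
  WideCharToMultiByte(CP_ACP, 0, PWideChar(W), Length(W),
    PAnsiChar(S), Len, nil, @UsedDefault);
  Result := not UsedDefault;
end;
```

If the function returns False, the text cannot be represented in the current ANSI code page and is better kept as a WideString. The check is not exhaustive: Windows may also pick a similar-looking "best fit" character without setting the flag, and it says nothing about bytes that were already damaged when an ANSI string was widened.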
Question has a verified solution.
