• Status: Solved

Strange behavior with WideString conversion

procedure TForm1.TisButton1Click(Sender: TObject);
  procedure test(const Param1: WideString);
  var
    s,d:string;
    i,len:integer;
  begin
    s:=param1;
    len:=length(s);
    d:=Inttostr(len)+': ';
    for i:=1 to len do
      d:=d+' '+inttostr(ord(s[i]));
    ShowMessage(d);
  end;
var
  s:string;
begin
  s:=#253#253#253#02#25;
  test(s);
end;

After calling test, the result is #253#253#63#25. Why?

Thanks.
Asked by: HBZhang
 
MBoCommented:
I've tried it.
Result: 5:253 253 253 2 25
It looks like a localization problem.
Is your Windows English? (Mine is Russian.)
 
HBZhangAuthor Commented:
No, mine is Chinese.
 
AloneCommented:
Hi!

Your error is in this line: s:=param1;

Because Param1 is a WideString and S is a string, that line is equivalent to a WideCharToString (or similar) function call. This conversion depends on your system locale settings, and its behavior differs between Russian and Chinese locales :-))

Try this:

procedure TMainForm.Button1Click(Sender: TObject);

procedure Test(const Param: WideString);
var
  S: string;
  I, Len: Integer;
begin
  Len := Length(Param) * SizeOf(WideChar); // size in bytes
  S := IntToStr(Len) + ',';
  // PAnsiChar indexing is 0-based, so I = 0 is the first byte of the data
  for I := 0 to Len - 1 do
    S := S + '#' + IntToStr(Ord(PAnsiChar(Pointer(Param))[I]));
  ShowMessage(S);
end;

begin
  Test(#204#224#236#224);
// on a Russian (CP1251) system shows 8,#28#4#48#4#60#4#48#4
end;

My system is Russian too :-))))
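
The locale dependence described above can be made visible by naming the code page explicitly instead of relying on the implicit assignment, which uses the system's active ANSI code page. The following is only a sketch: AnsiToWideCP, WideToAnsiCP and Dump are hypothetical helper names, and it assumes code pages 1251 (Russian) and 936 (Simplified Chinese) are installed. A likely reading of the asker's result is that on a double-byte Chinese code page 253 is a lead byte, so #253#253 pairs up into one character and survives, while #253#02 is not a valid pair and comes back as '?' (#63).

```pascal
program ExplicitCodePage;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper: ANSI bytes -> UTF-16 using an explicit code page.
function AnsiToWideCP(const S: AnsiString; CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // The first call only asks for the required length in WideChars.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    Exit; // conversion failed, e.g. the code page is not installed
  SetLength(Result, Len);
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(Result), Len);
end;

// Hypothetical helper: UTF-16 -> ANSI bytes using an explicit code page.
function WideToAnsiCP(const W: WideString; CodePage: UINT): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if W = '' then
    Exit;
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    nil, 0, nil, nil);
  if Len = 0 then
    Exit;
  SetLength(Result, Len);
  WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    PAnsiChar(Result), Len, nil, nil);
end;

// Prints the length and the decimal value of every byte.
procedure Dump(const Title: string; const S: AnsiString);
var
  I: Integer;
begin
  Write(Title, ' ', Length(S), ':');
  for I := 1 to Length(S) do
    Write(' ', Ord(S[I]));
  WriteLn;
end;

const
  Bytes: array[0..4] of Byte = (253, 253, 253, 2, 25);
var
  Raw: AnsiString;
begin
  // Build the test bytes without going through a string literal.
  SetLength(Raw, Length(Bytes));
  Move(Bytes[0], Raw[1], Length(Bytes));
  Dump('original', Raw);
  Dump('via 1251', WideToAnsiCP(AnsiToWideCP(Raw, 1251), 1251));
  Dump('via 936 ', WideToAnsiCP(AnsiToWideCP(Raw, 936), 936));
end.
```

Comparing the three lines on one machine shows whether a given code page preserves the bytes, independent of the Windows language that happens to be installed.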


 
HBZhangAuthor Commented:
I'm calling a COM object function with a string as a parameter. Because COM only supports WideString, the problem occurs.

Is there a way to exchange data between String and WideString safely?

Thanks.
 
AloneCommented:
For Chinese? For what purpose?

var
  S1, S2: string;
  W: WideString;
begin
  S1 := 'blabla';  // ANSI 'blabla'
  W := S1;         // UTF-16, as bytes: 'b'#0'l'#0'a'#0'b'#0'l'#0'a'#0
  S2 := W;         // ANSI 'blabla'
end;

Don't confuse ANSI characters (which may be multibyte) with Unicode characters. They MUST be treated differently! My previous example shows how a Unicode string with Russian characters looks at the low level (as a chain of bytes).

For European languages (and Russian), ANSI strings are always single-byte. When we assign them to a WideString, they expand to double-byte Unicode characters; in my example, 4 bytes (4 single-byte characters) become 8 bytes (4 double-byte characters).

How many Chinese characters are in your sample: #253#253#253#02#25?

Please try the current example ('blabla'). If the strings are not corrupted, your system is still working OK. :-))
 
HBZhangAuthor Commented:
Creating a variant array of type varByte can avoid this problem, but it's not simple.

I'd like to know whether there is something that can control this conversion. Anyway, I think this is strange behavior.

  WideString := String;
  String := WideString;

Changed? Why?
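
The varByte variant array mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: BytesToVarArray and VarArrayToBytes are hypothetical names, the Variants unit assumes Delphi 6 or later, and the COM object would have to accept a byte array instead of a string.

```pascal
program VarByteArray;
{$APPTYPE CONSOLE}

uses
  Variants; // assumption: Delphi 6 or later

// Hypothetical helper: copy raw bytes into a one-dimensional varByte array.
function BytesToVarArray(const S: AnsiString): Variant;
var
  P: Pointer;
begin
  if S = '' then
  begin
    Result := Null; // nothing to copy
    Exit;
  end;
  Result := VarArrayCreate([0, Length(S) - 1], varByte);
  P := VarArrayLock(Result);
  try
    Move(S[1], P^, Length(S));
  finally
    VarArrayUnlock(Result);
  end;
end;

// Hypothetical helper: copy the bytes back out. V is passed by value so
// that the local copy can be locked regardless of how VarArrayLock
// declares its parameter.
function VarArrayToBytes(V: Variant): AnsiString;
var
  P: Pointer;
  Len: Integer;
begin
  Result := '';
  if not VarIsArray(V) then
    Exit;
  Len := VarArrayHighBound(V, 1) - VarArrayLowBound(V, 1) + 1;
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  P := VarArrayLock(V);
  try
    Move(P^, Result[1], Len);
  finally
    VarArrayUnlock(V);
  end;
end;

const
  Bytes: array[0..4] of Byte = (253, 253, 253, 2, 25);
var
  Raw, Back: AnsiString;
  I: Integer;
begin
  SetLength(Raw, Length(Bytes));
  Move(Bytes[0], Raw[1], Length(Bytes));
  Back := VarArrayToBytes(BytesToVarArray(Raw));
  for I := 1 to Length(Back) do
    Write(Ord(Back[I]), ' ');
  WriteLn;
end.
```

No character-set conversion takes place here, because the data never passes through a string type on the way into or out of the variant.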
 
AloneCommented:
For Russian:

 WideString := String;
 String := WideString;

works fine.

But for Azeri it does not:

W := 'az'#609'ri';
S := W; // looks like 'az?ri'
W := S; // still looks like 'az?ri', but now in Unicode :-((

This behavior comes from the limits of the ANSI (single-byte in my case) character set. Some Unicode characters have no equivalent in the single-byte set, and the system replaces them with '?'.

Am I right?
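
One way to check for this kind of loss before it happens is to ask the Win32 conversion routine whether it had to substitute a default character. This is a minimal sketch, not a definitive implementation: LosesDataInAnsi is a hypothetical name, the flag constant is declared locally because older Delphi versions may not define WC_NO_BEST_FIT_CHARS, and it assumes the active ANSI code page is a legacy single- or double-byte one (not UTF-8).

```pascal
program LossyCheck;
{$APPTYPE CONSOLE}

uses
  Windows;

const
  // Value of WC_NO_BEST_FIT_CHARS from the Windows SDK headers.
  NO_BEST_FIT = $00000400;

// Hypothetical helper: True if W cannot be stored in the active ANSI
// code page without some character being replaced.
function LosesDataInAnsi(const W: WideString): Boolean;
var
  Buf: AnsiString;
  UsedDefault: BOOL;
  Len: Integer;
begin
  Result := False;
  if W = '' then
    Exit;
  // Two bytes per WideChar is enough for single- and double-byte code pages.
  SetLength(Buf, Length(W) * 2);
  UsedDefault := False;
  Len := WideCharToMultiByte(CP_ACP, NO_BEST_FIT, PWideChar(W), Length(W),
    PAnsiChar(Buf), Length(Buf), nil, @UsedDefault);
  // A failed call is treated as "not safe to convert".
  Result := (Len = 0) or UsedDefault;
end;

var
  W: WideString;
begin
  W := 'az';
  W := W + WideChar($018F) + 'ri'; // the Azeri letter from this thread
  WriteLn('azeri sample loses data: ', LosesDataInAnsi(W));
  WriteLn('blabla loses data: ', LosesDataInAnsi('blabla'));
end.
```

The answer for the Azeri sample depends on the machine's locale, which is exactly the point: the same WideString may be safe to narrow on one system and not on another.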
 
AloneCommented:
Yep! When you place strings directly in your program source, Delphi ALWAYS creates ANSI strings.

procedure TMainForm.Button2Click(Sender: TObject);

function ComFunction(const Param: OleVariant): Integer;
begin
// Works with NT/2k/XP only ;-)
  Result := MessageBoxW(Handle, Pointer(WideString(Param)), '', MB_ICONINFORMATION);
end;

var
  S: string;
  W: WideString;
  C: WideChar;

begin
  ComFunction('Direct: az'#$018F'ri'); // Delphi creates an ANSI (?) string 'az'#$8F'ri'
// It may be a Unicode string, BUT the result depends on the system locale (my locale is Russian, but the string is in Azeri (Azerbaijani))
  C := #$018F;
  ComFunction('WideChar: Az'+C+'ri'); // works fine
end;
 
HBZhangAuthor Commented:
Thanks, but it's not good enough. I'm waiting...

BTW: In my real work, I do not "place strings directly in the program source". The data comes from an RS232 port.
 
AloneCommented:
Here is something like your sample:

var
  W: WideString;

begin
  W := 'az'#$018F'ri'; // Delphi creates #$0061#$007A + #$040F + #$0072#$0069
// instead of #$0061#$007A + #$018F + #$0072#$0069
// the third character is replaced with a Cyrillic one (using the system locale)
end;
 
AloneCommented:
If the string "comes from RS232", try to receive it into a WideString variable and NEVER convert it to single-byte. Always use WideString.
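
If the RS232 data is really raw bytes rather than text, one option not spelled out in this thread is to widen every byte to one WideChar with the same numeric value and narrow it the same way on the other side. This is a sketch under assumptions: BytesToWide and WideToBytes are hypothetical names, and it only helps if the receiving side treats the WideString as a byte container and narrows it the same way.

```pascal
program ByteRoundTrip;
{$APPTYPE CONSOLE}

// Hypothetical helper: each byte becomes one WideChar in the range 0..255.
// No code page is involved, so the result does not depend on the locale.
function BytesToWide(const S: AnsiString): WideString;
var
  I: Integer;
begin
  SetLength(Result, Length(S));
  for I := 1 to Length(S) do
    Result[I] := WideChar(Ord(S[I]));
end;

// Hypothetical helper: the inverse, keeping only the low byte of each
// WideChar.
function WideToBytes(const W: WideString): AnsiString;
var
  I: Integer;
begin
  SetLength(Result, Length(W));
  for I := 1 to Length(W) do
    Result[I] := AnsiChar(Ord(W[I]) and $FF);
end;

const
  Bytes: array[0..4] of Byte = (253, 253, 253, 2, 25);
var
  Raw, Back: AnsiString;
  I: Integer;
begin
  // Build the test bytes without going through a string literal.
  SetLength(Raw, Length(Bytes));
  Move(Bytes[0], Raw[1], Length(Bytes));
  Back := WideToBytes(BytesToWide(Raw));
  for I := 1 to Length(Back) do
    Write(Ord(Back[I]), ' ');
  WriteLn; // prints: 253 253 253 2 25
end.
```

The trade-off is that the WideString no longer holds meaningful text, so this fits binary payloads only; real text should be converted with a known code page instead.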
 
HBZhangAuthor Commented:
Yeah, always using WideString may be a good idea.
 
Lee_NoverCommented:
Why not simply use StringToOleStr? :)
 
geobulCommented:
Hi,

procedure test(const Param1: WideString);
var
  s: string;
  wc: PWideChar;
begin
  wc := PWideChar(Param1);
  s := WideCharToString(wc);
  s := IntToStr(Length(s)) + ': ' + s;
  ShowMessage(s);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  ws: WideString;
begin
  ws := #253#253#253#02#25;
  test(ws);
end;

Regards, Geo
 
geobulCommented:
Or:

procedure TForm1.TisButton1Click(Sender: TObject);
  procedure test(const Param1: WideString);
  var
    s,d: string;
    wc: PWideChar;
    len,i: integer;
  begin
    wc := PWideChar(Param1);
    s := WideCharToString(wc);
    len := length(s);
    d := IntToStr(len) + ': ';
    for i := 1 to len do
      d := d + '#' + inttostr(ord(s[i]));
    ShowMessage(d);
  end;
var
 s:string;
begin
 s:=#253#253#253#02#25;
 test(s);
end;
 
AloneCommented:
It all depends on the original format of the received strings: ANSI or Unicode. When they are ANSI, it may be possible to use AnsiString and StringToOleStr, or a direct string-to-OleVariant assignment.
But when the string is Unicode, converting it to an ANSI representation may corrupt the data, replacing some characters with '?' or something else. The WideString representation is more flexible because it is locale-independent.

2geobul: Have you tested your examples? What results do they produce? And what is your default system locale?
 
geobulCommented:
Well, what is supposed to be produced? The second one shows:
5: #253#253#253#2#25

English(US)

Regards, Geo
 
AloneCommented:
When you use Unicode on a locale that has a FULL ANSI representation, there is no problem. But when your locale has no single-byte equivalent, you'll get question marks instead of some characters and the data will be corrupted.

In my examples, the Russian locale has a full single-byte ANSI representation and everything works OK. But Azerbaijani is Unicode-only, and now we have a big headache with one (!) letter :-(((