Solved

Strange behavior with WideString conversion

Posted on 2002-04-15
Last Modified: 2010-04-04
procedure TForm1.TisButton1Click(Sender: TObject);
  procedure test(const Param1: WideString);
  var
    s,d:string;
    i,len:integer;
  begin
    s:=param1;
    len:=length(s);
    d:=Inttostr(len)+': ';
    for i:=1 to len do
      d:=d+' '+inttostr(ord(s[i]));
    ShowMessage(d);
  end;
var
  s:string;
begin
  s:=#253#253#253#02#25;
  test(s);
end;

After calling test, the result is #253#253#63#25 (four characters instead of five). Why?

Thanks.
Question by:HBZhang
21 Comments
 
LVL 1

Expert Comment

by:MBo
ID: 6943694
I've tried it.
Result: 5: 253 253 253 2 25
Could it be a localization problem?
Is your Windows English? (Mine is Russian.)
 

Author Comment

by:HBZhang
ID: 6943710
No, mine is Chinese.
 
LVL 1

Expert Comment

by:Alone
ID: 6943941
Hi!

Your error is in this line: s:=param1;

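Because Param1 is a WideString and s is a string, that line is equivalent to a WideCharToString (or similar) function call. This conversion depends on your system locale settings, and the behavior differs between Russian and Chinese locales :-)) The call test(s) performs the opposite conversion (string to WideString) under the same rules, and in the Chinese code page (936) byte 253 ($FD) is the lead byte of a double-byte character, so your five bytes are not read as five separate characters.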

Try this:

procedure TMainForm.Button1Click(Sender: TObject);

procedure Test(const Param: WideString);
var
  S: string;
  I, Len: Integer;
begin
  Len := Length(Param) * SizeOf(WideChar); // size in bytes
  S := IntToStr(Len) + ',';
  // PAnsiChar indexing is zero-based, so bytes 0..Len-1 are the string data
  for I := 0 to Len - 1 do
    S := S + '#' + IntToStr(Byte(PAnsiChar(Pointer(Param))[I]));
  ShowMessage(S);
end;

begin
  Test(#204#224#236#224);
// on a Russian (code page 1251) system shows 8,#28#4#48#4#60#4#48#4
end;

My system is Russian too :-))))



Author Comment

by:HBZhang
ID: 6943996
I'm calling a COM object function with a string as a parameter. Because COM only supports WideString, the problem occurs.

Is there a way to exchange data between String and WideString safely?

Thanks.
 
LVL 1

Expert Comment

by:Alone
ID: 6944037
For Chinese? For what purpose?

var
  S1, S2: string;
  W: WideString;
begin
  S1 := 'blabla';  // ANSI 'blabla'
  W := S1;         // Unicode (UTF-16LE) bytes: 'b'#0'l'#0'a'#0'b'#0'l'#0'a'#0
  S2 := W;         // ANSI 'blabla'
end;

Don't confuse ANSI characters (which may be multibyte) with Unicode characters. They are ALWAYS different! My previous example shows how a Unicode string with Russian characters looks at the low level (as a chain of bytes).

For European languages (and Russian), ANSI strings are always single-byte. When we assign them to a WideString, they expand to double-byte Unicode characters; in my example, 4 bytes (4 single-byte characters) become 8 bytes (4 double-byte characters).

How many Chinese characters are in your sample: #253#253#253#02#25?

Please try the current example ('blabla'). If the strings are not corrupted, your system is still working OK. :-))
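Alone's 'blabla' comment describes the byte layout of a WideString. As a sketch (not code from the thread), the following console program dumps the raw memory of a WideString so the 6-byte to 12-byte expansion can be seen directly. It assumes a little-endian x86 machine, where each WideChar is stored low byte first, so 'b' appears as #98#0; WideBytes is a hypothetical helper name.

```pascal
program DumpWide;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

type
  PRawByte = ^Byte;

// Returns the bytes of W in memory order, formatted as '#nn' tokens.
function WideBytes(const W: WideString): string;
var
  I, ByteLen: Integer;
  P: PRawByte;
begin
  Result := '';
  ByteLen := Length(W) * SizeOf(WideChar); // 2 bytes per character
  P := PRawByte(PWideChar(W));
  for I := 0 to ByteLen - 1 do
  begin
    Result := Result + '#' + IntToStr(P^);
    Inc(P); // advance one byte
  end;
end;

var
  W: WideString;
begin
  W := 'blabla'; // 6 ANSI bytes in the source
  WriteLn(Length(W) * SizeOf(WideChar), ': ', WideBytes(W));
  // on little-endian x86: 12: #98#0#108#0#97#0#98#0#108#0#97#0
end.
```

Because only plain ASCII letters are used, the ANSI-to-Unicode step gives the same result under any system locale; the dump then shows each character followed by a zero high byte.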
 
LVL 1

Expert Comment

by:Alone
ID: 6944050
Sorry, my message was sent twice :-((
 

Author Comment

by:HBZhang
ID: 6944068
Creating a variant array of type varByte avoids this problem, but it's not simple.

I wonder whether there is something that can control this conversion. Anyway, I consider it strange behavior:

  WideString := String;
  String := WideString;

Changed? Why?
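The varByte variant-array route mentioned above can be wrapped in a small helper. This is a sketch under assumptions, not code from the thread: it assumes Delphi 6 or later (where VarArrayCreate and VarArrayLock live in the Variants unit) and a COM method that accepts a byte array inside a VARIANT; BytesToVarArray is a hypothetical name. A varByte array travels over COM as a SAFEARRAY of bytes, so no code page is involved.

```pascal
program ByteVariant;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Variants;

// Copies raw bytes into a zero-based variant array of varByte.
// Over COM this is passed as a SAFEARRAY of bytes, so no
// ANSI/Unicode conversion is applied to the data.
function BytesToVarArray(const Raw: AnsiString): Variant;
var
  P: Pointer;
begin
  if Raw = '' then
  begin
    Result := Unassigned; // nothing to send
    Exit;
  end;
  Result := VarArrayCreate([0, Length(Raw) - 1], varByte);
  P := VarArrayLock(Result);
  try
    Move(Pointer(Raw)^, P^, Length(Raw));
  finally
    VarArrayUnlock(Result);
  end;
end;

const
  Sample: array[0..4] of Byte = (253, 253, 253, 2, 25); // bytes from the question

var
  Raw: AnsiString;
  V: Variant;
  D: string;
  I, B: Integer;
begin
  // Build the byte string explicitly so no literal is run through a code page.
  SetLength(Raw, Length(Sample));
  for I := 0 to High(Sample) do
    Raw[I + 1] := AnsiChar(Sample[I]);

  V := BytesToVarArray(Raw);
  D := IntToStr(VarArrayHighBound(V, 1) - VarArrayLowBound(V, 1) + 1) + ':';
  for I := VarArrayLowBound(V, 1) to VarArrayHighBound(V, 1) do
  begin
    B := V[I];
    D := D + ' ' + IntToStr(B);
  end;
  WriteLn(D); // 5: 253 253 253 2 25
end.
```

The resulting Variant can be assigned to an OleVariant parameter; whether the COM object actually accepts a byte array is something to confirm against its type library.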
 
LVL 1

Expert Comment

by:Alone
ID: 6944089
For Russian:

 WideString := String;
 String := WideString;

works fine.

But for Azeri it does not:

W := 'az'#601'ri'; // #601 = U+0259, the Azeri letter schwa
S := W; // looks like 'az?ri'
W := S; // looks like 'az?ri', but in Unicode :-((

This behavior comes from the limits of the ANSI character set (single-byte in my case). Some Unicode characters have no single-byte equivalent, and the system replaces them with '?'.

Am I right?
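The silent replacement described above can be detected at run time. The following Windows-only sketch (an illustration, not code from the thread) calls WideCharToMultiByte with the system ANSI code page and inspects its lpUsedDefaultChar output. It assumes a Windows version that supports the WC_NO_BEST_FIT_CHARS flag; without that flag Windows may substitute a similar-looking character instead of '?' and not report it. The flag's numeric value is declared locally in case an older Windows unit lacks it, and WideToAnsiChecked is a hypothetical helper name.

```pascal
program CheckLossy;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Windows;

const
  // Numeric value of WC_NO_BEST_FIT_CHARS from the Windows SDK, declared
  // here in case an older Windows unit does not define it.
  NoBestFitChars = $00000400;

// Converts W to the system ANSI code page (CP_ACP). Returns False when
// Windows had to substitute the default character ('?') for something.
function WideToAnsiChecked(const W: WideString; out S: AnsiString): Boolean;
var
  Len: Integer;
  UsedDefault: BOOL;
begin
  S := '';
  Result := True;
  if W = '' then
    Exit;
  // First call: ask how many bytes the ANSI result needs.
  Len := WideCharToMultiByte(CP_ACP, NoBestFitChars, PWideChar(W), Length(W),
    nil, 0, nil, nil);
  if Len = 0 then
  begin
    Result := False; // conversion failed outright
    Exit;
  end;
  SetLength(S, Len);
  UsedDefault := False;
  // Second call: convert, and let Windows report any substitution.
  WideCharToMultiByte(CP_ACP, NoBestFitChars, PWideChar(W), Length(W),
    PAnsiChar(S), Len, nil, @UsedDefault);
  Result := not UsedDefault;
end;

var
  W: WideString;
  S: AnsiString;
begin
  W := 'az';
  W := W + WideChar($0259) + 'ri'; // U+0259 is the Azeri letter schwa
  if WideToAnsiChecked(W, S) then
    WriteLn('No characters lost: ', S)
  else
    WriteLn('Some characters were replaced: ', S);
end.
```

Which branch runs depends on the system code page, so no fixed output is claimed here.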
 
LVL 1

Expert Comment

by:Alone
ID: 6944125
Yep! When you place strings directly in your program source, Delphi ALWAYS creates ANSI strings.

procedure TMainForm.Button2Click(Sender: TObject);

function ComFunction(const Param: OleVariant): Integer;
begin
// Works with NT/2k/XP only ;-)
  Result := MessageBoxW(Handle, Pointer(WideString(Param)), '', MB_ICONINFORMATION);
end;

var
  S: string;
  W: WideString;
  C: WideChar;

begin
  ComFunction('Direct: az'#$018F'ri'); // Delphi creates an ANSI (?) string 'az'#$8F'ri'
// it may become a Unicode string, BUT the result depends on the system locale (my locale is Russian, but the string is in Azeri (Azerbaijani))
  C := #$018F;
  ComFunction('WideChar: Az'+C+'ri'); // works fine
end;
 

Author Comment

by:HBZhang
ID: 6944165
Thanks, but it's not good enough. I'm waiting...

BTW: in my real work, I don't "place strings directly in the program source". The data comes from an RS232 port.
 
LVL 1

Expert Comment

by:Alone
ID: 6944181
Here is something similar to your sample:

var
  W: WideString;

begin
  W := 'az'#$018F'ri'; // Delphi creates #$0061#$007A + #$040F + #$0072#$0069
// instead of #$0061#$007A + #$018F + #$0072#$0069
// the third character is replaced with a Russian one (using the system locale)
end;
 
LVL 1

Accepted Solution

by:
Alone earned 300 total points
ID: 6944189
If the string "comes from RS232", try to receive it into a WideString variable and NEVER convert it to single-byte. Always use WideString.
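If the received bytes have to travel inside a WideString, one locale-independent convention is to widen each byte to the WideChar with the same value and narrow it back on the other side. This is a minimal sketch, not the thread's code: RawToWide and WideToRaw are hypothetical helpers, and it assumes the COM server reads only the low byte of each character, which you would have to confirm for your object.

```pascal
program RawRoundTrip;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Widens each byte to the WideChar with the same value (0..255).
// No code page is consulted, so the result does not depend on the locale.
function RawToWide(const Raw: AnsiString): WideString;
var
  I: Integer;
begin
  SetLength(Result, Length(Raw));
  for I := 1 to Length(Raw) do
    Result[I] := WideChar(Ord(Raw[I]));
end;

// Reverses RawToWide by keeping only the low byte of each WideChar.
function WideToRaw(const W: WideString): AnsiString;
var
  I: Integer;
begin
  SetLength(Result, Length(W));
  for I := 1 to Length(W) do
    Result[I] := AnsiChar(Ord(W[I]) and $FF);
end;

const
  Sample: array[0..4] of Byte = (253, 253, 253, 2, 25); // bytes from the question

var
  Raw, Back: AnsiString;
  D: string;
  I: Integer;
begin
  // Build the byte string explicitly so no literal is run through a code page.
  SetLength(Raw, Length(Sample));
  for I := 0 to High(Sample) do
    Raw[I + 1] := AnsiChar(Sample[I]);

  Back := WideToRaw(RawToWide(Raw));
  D := IntToStr(Length(Back)) + ':';
  for I := 1 to Length(Back) do
    D := D + ' ' + IntToStr(Ord(Back[I]));
  WriteLn(D); // 5: 253 253 253 2 25
end.
```

Unlike the implicit string/WideString assignment, neither function calls into the system code page, so the five bytes from the question survive the round trip on a Chinese or Russian system alike.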
 

Author Comment

by:HBZhang
ID: 6944205
Yeah, always using WideString may be a good idea.
 
LVL 12

Expert Comment

by:Lee_Nover
ID: 6944714
Why not simply use StringToOleStr? :)
 
LVL 17

Expert Comment

by:geobul
ID: 6944757
Hi,

procedure test(const Param1: WideString);
var
  s: string;
  wc: PWideChar;
begin
  wc := PWideChar(Param1);
  s := WideCharToString(wc);
  s := IntToStr(Length(s)) + ': ' + s;
  ShowMessage(s);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  ws: WideString;
begin
  ws := #253#253#253#02#25;
  test(ws);
end;

Regards, Geo
 
LVL 17

Expert Comment

by:geobul
ID: 6944820
Or:

procedure TForm1.TisButton1Click(Sender: TObject);
  procedure test(const Param1: WideString);
  var
    s,d: string;
    wc: PWideChar;
    len,i: integer;
  begin
    wc := PWideChar(Param1);
    s := WideCharToString(wc);
    len := length(s);
    d := IntToStr(len) + ': ';
    for i := 1 to len do
      d := d + '#' + inttostr(ord(s[i]));
    ShowMessage(d);
  end;
var
 s:string;
begin
 s:=#253#253#253#02#25;
 test(s);
end;
 
LVL 1

Expert Comment

by:Alone
ID: 6944895
It all depends on the original format of the received strings: ANSI or Unicode. When they're ANSI, it may be possible to use AnsiString and StringToOleStr, or a direct string-to-OleVariant assignment.
But when the string is Unicode, converting it to an ANSI representation may corrupt the data, replacing some characters with '?' or something else. Using the WideString representation is more flexible because it is locale-independent.

To geobul: Have you tested your examples? What results do they produce? And what is your default system locale?
 
LVL 17

Expert Comment

by:geobul
ID: 6944996
Well, what is supposed to be produced? The second one shows:
5: #253#253#253#2#25

English (US)

Regards, Geo
 
LVL 1

Expert Comment

by:Alone
ID: 6945055
Damn! My browser automatically resends messages!

When you use Unicode with a locale that has a FULL ANSI representation, there is no problem. But when your locale has no single-byte equivalent for a character, you'll get question marks instead of characters, and the data will be corrupted.

In my examples, the Russian locale has a full single-byte ANSI representation and everything works OK. But Azerbaijani is Unicode-only, and now we have a big headache with one (!) letter :-(((