Solved

Multibyte String byte difference

Posted on 2006-11-15
Last Modified: 2010-04-15
Hi,
I am using utf-8 encoding for my application.

int is_valid_name (char *name){
    char *cp;

    if (name == NULL || *name == '\0')
        return 0;

    for (cp = name; *cp != '\0'; cp++) {
        printf("%i", *cp);
    }
    return 1;
}

In the function above, name consists of one Chinese character, which is encoded as 3 bytes,
so strlen(name) is 3.

On Windows, *cp prints 237, 138, 184 respectively.
On Unix, *cp prints -19, -118, -72 respectively for the same string.

I know that each negative value is the corresponding Windows value minus 256.
Could you explain the byte difference?
Why are the values different on Unix and Windows?
Do negative byte values only occur for multibyte characters?

Thanks
Jamie




Question by:jamie_lynn

Expert Comment

by:PaulCaswell
ID: 17951309
Hi jamie_lynn,

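This can be down to compiler settings. With some compilers the default for plain char is unsigned; with others it's signed. The bytes in memory are the same, only the way they are interpreted when printed differs.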

Paul
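
To make the signedness point concrete, here is a minimal sketch in C. The helper names plain_char_is_signed, byte_value, and dump_bytes are made up for illustration and are not from the thread. It assumes 8-bit bytes, and what the "plain" column shows depends on the compiler and its settings.

```c
#include <limits.h>
#include <stdio.h>

/* Nonzero when plain char is a signed type on this compiler. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Value of a byte in the range 0..255, whatever the signedness of char. */
int byte_value(char c) {
    return (unsigned char)c;
}

/* Print each byte of s twice: as plain char, then as unsigned char. */
void dump_bytes(const char *s) {
    for (; *s != '\0'; s++) {
        printf("plain: %4d   unsigned: %3d\n", (int)*s, byte_value(*s));
    }
}
```

Calling dump_bytes("\xED\x8A\xB8") should reproduce the bytes from the question: the unsigned column reads 237, 138, 184 regardless of platform, while the plain column goes negative only where char is signed. Bytes below 128 (plain ASCII) print the same either way, so the difference only shows up for bytes with the high bit set, which is what every byte of a multibyte UTF-8 sequence has.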
 

Author Comment

by:jamie_lynn
ID: 17951556
Hi Paul,

What is a better way to validate the string?
Using unsigned char for the parameter or checking for negative?

Thanks
Jamie

i.e.
int is_valid_name (unsigned char *name){
...
}

or

int is_valid_name (char *name){
    char *cp;

    if (name == NULL || *name == '\0')
        return 0;

    for (cp = name; *cp != '\0'; cp++) {
        if (*cp < 0)
            continue;
        printf("%i", *cp);
    }
    return 1;
}


Accepted Solution

by:PaulCaswell (earned 500 total points)
ID: 17951574
I'd leave the parameter as char so the caller doesn't have to cast.

int is_valid_name (char *name){
    unsigned char *cp;

...

    for (cp = (unsigned char *) name; *cp != '\0'; cp++) {

That way compiler settings won't change how your code works.

Paul
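
Filled out into a complete function, the approach above might look like the sketch below. The int return type, the const qualifiers, the %u format, and the count_non_ascii helper are assumptions added for illustration, not part of the original answer. The helper shows one way the `*cp < 0` test from the earlier comment could be written so that it does not depend on whether char is signed.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 0 for a NULL or empty name; otherwise prints every byte
 * as a value in 0..255 and returns 1. */
int is_valid_name(const char *name) {
    const unsigned char *cp;

    if (name == NULL || *name == '\0')
        return 0;

    /* Read the bytes through an unsigned char pointer so the printed
     * values do not depend on the signedness of plain char. */
    for (cp = (const unsigned char *)name; *cp != '\0'; cp++) {
        printf("%u ", (unsigned)*cp);
    }
    printf("\n");
    return 1;
}

/* Hypothetical helper: counts bytes with the high bit set, i.e. bytes
 * that belong to a multibyte UTF-8 sequence. */
size_t count_non_ascii(const char *s) {
    const unsigned char *cp;
    size_t n = 0;

    if (s == NULL)
        return 0;

    for (cp = (const unsigned char *)s; *cp != '\0'; cp++) {
        if (*cp >= 0x80)
            n++;
    }
    return n;
}
```

Note that both the cast and the declaration of cp matter: if cp were still declared as char *, the bytes would again be read as plain char.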

Author Comment

by:jamie_lynn
ID: 17951627
Thanks Paul!
Jamie
 

Author Comment

by:jamie_lynn
ID: 17951781
Paul,

I cast name to unsigned char *, but I still get negative byte values on Unix.
What should I try next?

for (cp = (unsigned char *) name; *cp != '\0'; cp++) {
...
}

Thanks
Jamie
 

Author Comment

by:jamie_lynn
ID: 17951802
Oops, my bad.
This works. I had forgotten to declare cp as unsigned char.
Thanks!
Jamie