• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 276
  • Last Modified:

Multibyte String byte difference

Hi,
I am using utf-8 encoding for my application.

is_valid_name (char *name){
    char *cp;

    if (name == NULL || *name == '\0')
     return 0;

    for (cp = name; *cp != '\0'; cp++) {
          printf("%i", *cp);
    }
    return 1;
}

In the function above, name is composed of 1 Chinese character but 3 bytes.
So the strlen(name) = 3.

On windows, the char pointer *cp prints out 237, 138, 184 respectively.
On unix, *cp prints out -19, -118, -72 repectively for the same string.

I know that the negative values are the difference from 256.
Could you explain the byte difference ?
Why the different values for unix and windows?
Is the negative byte value only apply for multi byte characters?

Thanks
Jamie




0
jamie_lynn
Asked:
jamie_lynn
  • 4
  • 2
1 Solution
 
PaulCaswellCommented:
Hi jamie_lynn,

This can be compiler settings. With some compilers the default for char is unsigned, in others its signed.

Paul
0
 
jamie_lynnAuthor Commented:
Hi Paul,

What is a better way to validate the string?
Using unsigned char for the parameter or checking for negative?

Thanks
Jamie

i.e.
is_valid_name (unsigned char *name){
...
}

or

is_valid_name (char *name){
    char *cp;

    if (name == NULL || *name == '\0')
     return 0;

    for (cp = name; *cp != '\0'; cp++) {
          if (*cp < 0)  
                 continue;
          printf("%i", *cp);
    }
    return 1;
}

0
 
PaulCaswellCommented:
I'd leave the parameter as char so caller doesnt have to cast.

is_valid_name (char *name){
    unsigned char *cp;

...

    for (cp = (unsigned char *) name; *cp != '\0'; cp++) {
 
That way compiler settings wont change how your code works.

Paul
0
Making Bulk Changes to Active Directory

Watch this video to see how easy it is to make mass changes to Active Directory from an external text file without using complicated scripts.

 
jamie_lynnAuthor Commented:
Thanks Paul!
Jamie
0
 
jamie_lynnAuthor Commented:
Paul,

I casted name with unsigned char * but I still get negative byte value on unix....
What should I try next?

for (cp = (unsigned char *) name; *cp != '\0'; cp++) {
...
}

Thanks
Jamie
0
 
jamie_lynnAuthor Commented:
Ooops. This is my bad.
This works. I forgot to declare cp as unsigned char.
Thanks!
Jamie
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

  • 4
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now