Solved

Multibyte String byte difference

Posted on 2006-11-15
Last Modified: 2010-04-15
Hi,
I am using UTF-8 encoding for my application.

int is_valid_name (char *name){
    char *cp;

    if (name == NULL || *name == '\0')
        return 0;

    /* Print each byte of name as a decimal integer. */
    for (cp = name; *cp != '\0'; cp++) {
        printf("%i", *cp);
    }
    return 1;
}

In the function above, name holds one Chinese character, which is encoded as 3 bytes,
so strlen(name) returns 3.

On Windows, *cp prints 237, 138, 184 respectively.
On Unix, *cp prints -19, -118, -72 respectively for the same string.

I know that each negative value differs from the corresponding positive one by 256.
Could you explain the byte difference?
Why are the values different on Unix and Windows?
Do negative byte values only occur for multibyte characters?

Thanks
Jamie

Question by:jamie_lynn
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 17951309
Hi jamie_lynn,

This can come down to compiler settings. With some compilers the default for char is unsigned; with others it's signed.
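
To see which default a given compiler uses, here is a minimal sketch. The function names char_is_signed, byte_value and show_byte are made up for illustration. It relies on CHAR_MIN from <limits.h>, which is negative exactly when plain char is signed, and on the fact that converting a char to unsigned char yields the 0..255 value either way.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if plain char is signed on this compiler/target, 0 otherwise. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns the value of one byte as 0..255, whatever the signedness of char. */
int byte_value(char c)
{
    return (unsigned char)c;
}

/* Prints the same byte as seen through plain char and through unsigned char. */
void show_byte(char c)
{
    printf("plain char: %d, unsigned char: %d (char is %s)\n",
           c, byte_value(c), char_is_signed() ? "signed" : "unsigned");
}
```

Calling show_byte on the first byte of the string from the question should show -19 versus 237 where char is signed, and 237 for both where it is unsigned.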

Paul

Author Comment

by:jamie_lynn
ID: 17951556
Hi Paul,

What is a better way to validate the string?
Using unsigned char for the parameter or checking for negative?

Thanks
Jamie

i.e.
is_valid_name (unsigned char *name){
...
}

or

int is_valid_name (char *name){
    char *cp;

    if (name == NULL || *name == '\0')
        return 0;

    for (cp = name; *cp != '\0'; cp++) {
        /* Skip bytes that read as negative. */
        if (*cp < 0)
            continue;
        printf("%i", *cp);
    }
    return 1;
}

 
LVL 16

Accepted Solution

by:
PaulCaswell earned 500 total points
ID: 17951574
I'd leave the parameter as char * so the caller doesn't have to cast.

is_valid_name (char *name){
    unsigned char *cp;

...

    for (cp = (unsigned char *) name; *cp != '\0'; cp++) {
 
That way compiler settings won't change how your code works.
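
Put together, the whole function might look like the sketch below. It assumes the original function's behavior of returning 0 for a NULL or empty name and 1 otherwise. The spaces between printed bytes and the count_high_bytes helper are additions for illustration only. In UTF-8, every byte of a multibyte sequence is 0x80 or higher and plain ASCII bytes are below 0x80, so the negative values only show up for non-ASCII characters, and only where char is signed.

```c
#include <stdio.h>

/* Returns 0 for a NULL or empty name, 1 otherwise.
   Prints each byte as a value in 0..255, regardless of char signedness. */
int is_valid_name(char *name)
{
    unsigned char *cp;

    if (name == NULL || *name == '\0')
        return 0;

    for (cp = (unsigned char *)name; *cp != '\0'; cp++)
        printf("%u ", (unsigned)*cp);
    printf("\n");
    return 1;
}

/* Hypothetical helper: counts bytes >= 0x80, i.e. the bytes that belong
   to multibyte UTF-8 sequences and read as negative through signed char. */
int count_high_bytes(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int n = 0;

    for (; *p != '\0'; p++)
        if (*p >= 0x80)
            n++;
    return n;
}
```

For the three bytes in the question, is_valid_name("\xED\x8A\xB8") prints 237 138 184 on both platforms, and count_high_bytes returns 3 for that string and 0 for a pure ASCII name.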

Paul
 

Author Comment

by:jamie_lynn
ID: 17951627
Thanks Paul!
Jamie
 

Author Comment

by:jamie_lynn
ID: 17951781
Paul,

I cast name to unsigned char * but I still get negative byte values on Unix.
What should I try next?

for (cp = (unsigned char *) name; *cp != '\0'; cp++) {
...
}

Thanks
Jamie
 

Author Comment

by:jamie_lynn
ID: 17951802
Oops, my bad.
This works. I forgot to declare cp as unsigned char *.
Thanks!
Jamie