Solved

Validating multibyte strings

Posted on 2006-11-14
Last Modified: 2010-04-15
Hi,
I'm trying to create a function that validates a string.
The string cannot contain '<', '>', and '/'.
This function works great for English.
For Chinese strings, the function works on Windows, but on Unix it returns false even when the string does not contain any of the three characters.

Here is the function:
int is_valid_name (char *name) {
    char *cp;

    if (name == NULL || *name == '\0')
      return 0;

    for (cp = name; *cp != '\0'; cp++) {
      if (*cp == '/' || *cp == '<' || *cp == '>')
          return 0;
    }
    return 1;
}


Is it because Chinese uses multibyte characters and the part of the string can contain one of the three characters?
What can I do so that I can validate multibyte strings?

Thanks
Jamie
Question by:jamie_lynn
 
Accepted Solution

by:
bpmurray earned 500 total points
ID: 17942334
For multibyte characters, you have to know the encoding. In fact, you should know it for ALL text, including Western text. Since the byte patterns differ from one encoding to the next, you have to be very careful. For example, even a simple scanning loop for the Windows code pages changes depending on the locale.

Japanese (CP 932):
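for (cp=name; *cp; cp++) {
   unsigned char c = (unsigned char)*cp; /* plain char may be signed, so compare as unsigned */
   if (c == '/' || c == '<' || c == '>')
      return 0;
   if ((c > 0x80 && c < 0xA0) || (c > 0xDF && c < 0xFD)) { /* lead byte of a double-byte char */
      if (cp[1] == '\0')
         return 0; /* truncated character: do not step past the terminator */
      cp++; /* skip the trail byte */
   }
}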

Korean (CP 949):
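for (cp=name; *cp; cp++) {
   unsigned char c = (unsigned char)*cp; /* plain char may be signed, so compare as unsigned */
   if (c == '/' || c == '<' || c == '>')
      return 0;
   if (c > 0x80 && c < 0xFF) { /* lead byte of a double-byte char */
      if (cp[1] == '\0')
         return 0; /* truncated character: do not step past the terminator */
      cp++; /* skip the trail byte */
   }
}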
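
The two loops above differ only in the lead-byte test, so one way to organise them is to pass that test in as a function pointer. This is only a sketch: the names is_valid_name_dbcs, lead_byte_fn, is_lead_cp932 and is_lead_cp949 are made up for illustration, the byte ranges are the ones from the loops above, and the caller is assumed to already know which code page the string uses.

```c
#include <stddef.h>

/* Hypothetical callback type: nonzero if byte b starts a two-byte character. */
typedef int (*lead_byte_fn)(unsigned char b);

/* Lead-byte ranges taken from the CP 932 loop above. */
static int is_lead_cp932(unsigned char b) {
    return (b > 0x80 && b < 0xA0) || (b > 0xDF && b < 0xFD);
}

/* Lead-byte range taken from the CP 949 loop above. */
static int is_lead_cp949(unsigned char b) {
    return b > 0x80 && b < 0xFF;
}

/* Returns 1 if name is non-empty and contains none of '/', '<', '>'
 * as a character of its own; trail bytes are skipped, never inspected. */
static int is_valid_name_dbcs(const char *name, lead_byte_fn is_lead) {
    const unsigned char *cp = (const unsigned char *)name;

    if (name == NULL || *name == '\0')
        return 0;

    for (; *cp != '\0'; cp++) {
        if (*cp == '/' || *cp == '<' || *cp == '>')
            return 0;
        if (is_lead(*cp)) {
            if (cp[1] == '\0')
                return 0;   /* lead byte with no trail byte: reject */
            cp++;           /* skip the trail byte */
        }
    }
    return 1;
}
```

A call would look like is_valid_name_dbcs(name, is_lead_cp932). Working through unsigned char matters here: where plain char is signed, a byte such as 0x83 is negative and never satisfies a test like > 0x80.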

As you can see, there isn't a one-size-fits-all solution. For example, there are about six common encodings in use in Japan, all different. The reality is that you MUST know the encoding before you do this kind of thing. A good solution is to always use Unicode UTF-16 internally, converting when you read data and when you write data. That way the only place you have to manage different encodings is at I/O, while internally everything is the same for all languages.
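
To illustrate why UTF-16 makes the internal check simple, here is a minimal sketch that assumes the name has already been converted to a NUL-terminated array of 16-bit code units (the conversion itself is not shown, and the function name is_valid_name_utf16 is made up). In UTF-16 the two halves of a surrogate pair always lie in the range 0xD800 to 0xDFFF, so a code unit equal to 0x002F, 0x003C or 0x003E can only be the character '/', '<' or '>' itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if name is a non-empty UTF-16 string (16-bit code units,
 * terminated by a zero unit) that contains none of '/', '<', '>'. */
static int is_valid_name_utf16(const uint16_t *name) {
    const uint16_t *cp;

    if (name == NULL || *name == 0)
        return 0;

    for (cp = name; *cp != 0; cp++) {
        /* Compare whole code units, never individual bytes. */
        if (*cp == 0x002F || *cp == 0x003C || *cp == 0x003E)
            return 0;
    }
    return 1;
}
```

Note that the comparison is on whole code units. A character such as U+4E2F has 0x2F as one of its bytes, but as a 16-bit unit it never compares equal to '/'.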

Since you're working cross-platform, I'd suggest your best solution is to use the functionality in a reasonably standard library. Have you looked at ICU? See http://icu.sourceforge.net/ - it's backed by IBM and many other big companies, so it's pretty much the de facto cross-platform standard.
   
Expert Comment

by:_iskywalker_
ID: 17948833
You may want to use Unicode.
There are libraries for Unicode.

Author Comment

by:jamie_lynn
ID: 17951080
Thanks!
Jamie