Solved

Validating multibyte strings

Posted on 2006-11-14
3
190 Views
Last Modified: 2010-04-15
Hi,
I'm trying to create a function that validates a string.
The string cannot contain '<', '>', and '/'.
This function works great for English.
For Chinese string, the function works on Windows, but on unix, it returns false when the string does not contain the three characters.  

i.e.
is_valid_name (char *name){
    char *cp;

    if (name == NULL || *name == '\0')
      return 0;

    for (cp = name; *cp != '\0'; cp++) {
      if (*cp == '/' || *cp == '<' || *cp == '>')
          return 0;
    }
    return 1;
}


Is it because Chinese uses multibyte characters and the part of the string can contain one of the three characters?
What can I do so that I can validate multibyte strings?

Thanks
Jamie
0
Comment
Question by:jamie_lynn
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
3 Comments
 
LVL 15

Accepted Solution

by:
bpmurray earned 500 total points
ID: 17942334
For multibyte characters, you have to know the encoding. In fact, you should be doing this for ALL text, including western. Since the encoding patterns vary for each encoding-type, you have to be very careful. For example, a simple algorithm for Windows encodings varies depending on locale.

Japanese (CP 932):
for (cp=name; *cp; cp++) {
   if (*cp == '/' || *cp == '<' || *cp == '>')
      return 0;
   if ((*cp > 0x80 && *cp < 0xA0) || (*cp > 0xDF && *cp < 0xFD)) /* if it's a double-byte char, increment once more */
      cp++;
}

Korean (CP 949):
for (cp=name; *cp; cp++) {
   if (*cp == '/' || *cp == '<' || *cp == '>')
      return 0;
   if (*cp > 0x80 && *cp < 0xFF) /* if it's a double-byte char, increment once more */
      cp++;
}

As you can see, there isn't a one-size-fits-all solution. For example, there are about 6 common encodings in use in Japan, all different. The reality is that you MUST know the encoding before you do this kind of thing. A good solution is to always use Unicode UTF16 internally, converting when you read data and when you write data. That way the only time you have to manage different encodings is at IO, while internally everything is the same for all languages.

Since you're doing stuff cross-platform, I'd suggest your best solution is to use the functionality in a responably standard lib. Have you looked at ICU? See http://icu.sourceforge.net/ - it's backed by IBM and many other big companies, so it's pretty much the de facto cross-platform standard.
   
0
 
LVL 6

Expert Comment

by:_iskywalker_
ID: 17948833
you may want to use unicode.
there are libs for unicode.
0
 

Author Comment

by:jamie_lynn
ID: 17951080
Thanks!
Jamie
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Have you thought about creating an iPhone application (app), but didn't even know where to get started? Here's how: ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Important pre-programming comments: I’ve never tri…
This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see so…
The goal of this video is to provide viewers with basic examples to understand opening and reading files in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use switch statements in the C programming language.

737 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question