happy_emily
asked on
Recognize Chinese Multibyte Character
Hi
I'd like to ask how to recognize Chinese characters (multibyte characters) in a passage that contains both Chinese and single-byte characters, such as English letters and numbers.
When I use a pointer, it steps through the passage byte by byte, so it cannot tell whether a byte belongs to a multibyte character or not.
Is there a way to:
1. Extract these Chinese characters from the passage OR
2. Intelligently step through the passage character by character (no matter whether the character is multibyte or single-byte) OR
3. Convert all of them to multibyte characters?
Your suggestions will be much appreciated! Thanks!
ASKER
Can you show me some example programs demonstrating the use of these functions? (I am new to C++ programming.)
For example, say the passage is "abcdefXXXX23", where XXXX are the Chinese characters.
Thanks!
Sure.
What exactly are you trying to do? Just read these characters from a file or something and output them?
Or do you just want to separate the Chinese characters from the English ones?
In fact, what I am trying to do is count the number of occurrences of every character (Chinese characters must be counted too) that appears in the passage, which consists of different types of characters (i.e. English + Chinese + numbers).
What I can think of is using pointers to do so. However, I have run into the problem mentioned above. So I am wondering whether I should first convert all the characters in the passage to double-byte and then increment the pointer by 2 every time I read a character, or whether I should separate the multibyte characters (Chinese) from the single-byte ones (English + numbers) and then count them separately.
Do you have any ideas?
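A possible sketch of the pointer-stepping idea described above, advancing by 1 or 2 bytes depending on the character. It assumes the passage is stored in a double-byte encoding such as GBK or Big5, where a byte in the range 0x81 to 0xFE starts a two-byte character. That byte range is an assumption and does not hold for UTF-8 text, and the names is_lead_byte and count_characters are made up for illustration:

```cpp
#include <cstddef>
#include <map>
#include <string>

// ASSUMPTION: double-byte encoding (e.g. GBK or Big5) in which a byte in
// 0x81..0xFE is the first byte of a two-byte character. Not valid for UTF-8.
inline bool is_lead_byte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Count how often each character occurs in the passage.
// Each key holds the raw bytes of one character: one byte for English
// letters and digits, two bytes for a Chinese character.
std::map<std::string, int> count_characters(const std::string& text) {
    std::map<std::string, int> counts;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t len = 1;
        // Step over two bytes only if a trail byte is actually present;
        // a dangling lead byte at the end is counted as a single byte.
        if (is_lead_byte(static_cast<unsigned char>(text[i])) &&
            i + 1 < text.size()) {
            len = 2;
        }
        ++counts[text.substr(i, len)];
        i += len;
    }
    return counts;
}
```

With a passage like "abcdefXXXX23", the index advances by 1 for each of the letters and digits and by 2 for each Chinese character, so no conversion to a fixed width is needed before counting.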
ASKER
PS whoops! hellohelloworld is my second account
ASKER CERTIFIED SOLUTION
1. wcstombs(char*, const wchar_t*, size_t); //wide to multibyte (mbstowcs converts the other way)
2. _mbbtombc //Microsoft-specific: convert a 1-byte multibyte character to the corresponding 2-byte multibyte character
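As a rough illustration of the conversion route, the sketch below goes in the multibyte-to-wide direction with the standard std::mbstowcs, so that every character, Chinese or English, becomes exactly one wchar_t that can be counted directly. The helper names to_wide and count_wide are hypothetical, and the code assumes the program's C locale matches the encoding of the passage:

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Convert a multibyte string to wide characters using the current C locale.
// Returns an empty string if the bytes are not valid in that locale.
std::wstring to_wide(const std::string& text) {
    // The wide string never has more elements than the input has bytes,
    // so size() + 1 (for the terminator) is always large enough.
    std::vector<wchar_t> buf(text.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), text.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence
    }
    return std::wstring(buf.data(), n);
}

// Count occurrences of each wide character.
std::map<wchar_t, int> count_wide(const std::wstring& w) {
    std::map<wchar_t, int> counts;
    for (wchar_t ch : w) {
        ++counts[ch];
    }
    return counts;
}
```

Before calling to_wide on Chinese text, the program would need to select a suitable locale, for example with std::setlocale(LC_ALL, "") from <clocale>. Which locale names are available depends on the platform, so that step is left out of the sketch.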