  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1216

Recognize Chinese Multibyte Character

Hi

I'd like to ask how to recognize Chinese characters (which are multibyte characters) in a passage that contains both Chinese and single-byte characters, such as English letters and numbers.

When I use a pointer, it steps through the passage byte by byte and cannot tell whether a given byte belongs to a multibyte character or not.

Is there a way to:
1. Extract the Chinese characters from the passage, OR
2. Intelligently step through the passage character by character (no matter whether a character is multibyte or single byte), OR
3. Convert all of them to multibyte characters?

Your suggestions will be much appreciated! Thanks!
Asked:
happy_emily
1 Solution
 
pb_india Commented:
You can use, depending on your need:
1. wcsrtombs(char* dst, const wchar_t** src, size_t len, mbstate_t* ps); // wide to multibyte

2. _mbbtombc // Microsoft-specific: converts a single-byte multibyte character to the corresponding double-byte multibyte character
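
Besides the conversion functions above, the standard `mbrlen` function reports how many bytes the next character occupies, which is one way to step through a passage character by character. The following is only a minimal sketch: `split_characters` is a hypothetical helper name, and the result depends on the current C locale, which has to match the encoding of the passage (for example via `std::setlocale(LC_ALL, "")` on a system configured for Chinese).

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: splits a multibyte string into one std::string per
// character, using the encoding of the current C locale.
// Stops at the first invalid or truncated byte sequence.
std::vector<std::string> split_characters(const std::string& text) {
    std::vector<std::string> chars;
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const char* p = text.data();
    const char* end = p + text.size();

    while (p < end) {
        // Number of bytes that make up the next character.
        std::size_t n = std::mbrlen(p, static_cast<std::size_t>(end - p), &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2)) {
            break;  // invalid (-1) or incomplete (-2) sequence
        }
        if (n == 0) {
            n = 1;  // an embedded null character occupies one byte
        }
        chars.push_back(std::string(p, n));
        p += n;
    }
    return chars;
}
```

Each element of the returned vector is one whole character, so an element with `size() > 1` is a multibyte character (a Chinese character, in the passage described above).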
 
happy_emily (Author) Commented:
Can you show me some example programs demonstrating the use of these functions? (I am a newbie in C++ programming.)
Say, for example, the passage is "abcdefXXXX23", where XXXX are the Chinese characters.

Thanks!
 
pb_india Commented:
Sure.

What exactly are you trying to do? Just read these characters from a file or something and output them?
Or do you just want to separate the Chinese characters from the English ones?

 
hellohelloworld Commented:
In fact, what I am trying to do is count the number of occurrences of every character (Chinese characters must be counted) that appears in the passage, which consists of different types of characters (i.e. English + Chinese + numbers).

What I can think of is using pointers to do so. However, I have encountered the problem mentioned above. So I am wondering whether I should convert all the characters in the passage to double-byte first and then increment the pointer by 2 every time I read a character, or separate the multibyte characters (Chinese) from the single-byte ones (English + numbers) and then count them separately.

Do you have any idea?
 
happy_emily (Author) Commented:
PS whoops! hellohelloworld is my second account
 
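pb_india Commented:
Hi,

I think what you can do is:
Convert all the characters from narrow (multibyte) to wide. Every character then occupies exactly one wchar_t, so you can compare the wide characters one by one to count them.

Use:
mbstate_t ps;  // must be zero-initialized before the first call, e.g. memset(&ps, 0, sizeof ps)
size_t mbsrtowcs(wchar_t* wide, const char** narrow, size_t len, mbstate_t* ps);

narrow points to a char pointer that holds your string from the passage.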
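
As a rough illustration of the call above, here is a minimal sketch that wraps `mbsrtowcs` in a hypothetical helper named `to_wide`. It assumes the current C locale matches the encoding of the passage; for Chinese text that usually means calling `std::setlocale(LC_ALL, "")` (or naming a suitable locale) before converting, since the default "C" locale is only guaranteed to handle the basic single-byte characters.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: converts a multibyte string to a wide string using
// the encoding of the current C locale.
// Returns an empty string if the input contains an invalid sequence.
std::wstring to_wide(const std::string& narrow) {
    std::mbstate_t state = std::mbstate_t();  // initial conversion state

    // First pass: a null destination only reports the required length.
    const char* src = narrow.c_str();
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }

    // Second pass: convert into a buffer with room for the terminator.
    std::vector<wchar_t> buf(len + 1);
    src = narrow.c_str();
    state = std::mbstate_t();
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);
    return std::wstring(buf.data(), len);
}
```

After the conversion, `result.size()` is the number of characters rather than the number of bytes, and `result[i]` is the i-th character of the passage.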

Then use code with logic like the following. (You will need to modify it for your own use.)
[I can develop the program for you, but 125 points is too few for that much work.]


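#include <fstream>
#include <iostream>

int main() {
  // Open the input file; "test" is the file name used in this example.
  std::ifstream f1("test");
  if (!f1) {
    std::cerr << "Cannot open file \"test\"" << std::endl;
    return 1;
  }

  char c;
  int numchars = 0;  // bytes other than '\n'; a multibyte character adds more than 1
  int numlines = 0;

  f1.get(c);
  while (f1) {
    // Count every byte up to the end of the current line.
    while (f1 && c != '\n') {
      ++numchars;
      f1.get(c);
    }
    ++numlines;
    f1.get(c);
  }
  std::cout << "The file has " << numlines << " lines and "
            << numchars << " characters" << std::endl;
  return 0;
}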
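
For the per-character count asked about in this thread, one possible adaptation is sketched below. It assumes the passage has already been converted to a wide string (for example with `mbsrtowcs`, as suggested above), so that each element is one character. `count_characters` is a hypothetical helper name, and `std::map` is just one convenient container for the tally.

```cpp
#include <map>
#include <string>

// Hypothetical helper: counts how often each wide character occurs.
// Assumes every element of `text` is one whole character.
std::map<wchar_t, int> count_characters(const std::wstring& text) {
    std::map<wchar_t, int> counts;
    for (wchar_t ch : text) {
        ++counts[ch];  // operator[] inserts a zero count for a new character
    }
    return counts;
}
```

With this approach English letters, digits, and Chinese characters are all counted the same way, so there is no need to separate them first. One caveat: on platforms where `wchar_t` is 16 bits wide (such as Windows), characters outside the Basic Multilingual Plane take two elements, although commonly used Chinese characters fit in one.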
