Solved

Recognize Chinese Multibyte Character

Posted on 2004-11-01
Last Modified: 2013-12-03
Hi

I'd like to ask how to recognize Chinese characters (multibyte characters) in a passage that contains both Chinese and single-byte characters, such as English letters and digits.

When I use a pointer, it steps through the passage byte by byte and cannot tell whether a given byte belongs to a multibyte character.

Is there a way to:
1. Extract the Chinese characters from the passage, OR
2. Step through the passage character by character (no matter whether a character is multibyte or single-byte), OR
3. Convert all of them to multibyte characters?

Your suggestions will be much appreciated! Thanks!
Question by:happy_emily
    6 Comments
     

    Expert Comment

    by:pb_india
    You can use, depending on your need:
    1. wcsrtombs(char* dst, const wchar_t** src, size_t len, mbstate_t* ps); // wide to multibyte

    2. _mbbtombc // Microsoft-specific: converts a single-byte multibyte character to the corresponding double-byte multibyte character

    Author Comment

    by:happy_emily
    Can you show me some example programs demonstrating the use of these functions? (I am a newbie at C++ programming.)
    Say, for example, the passage is "abcdefXXXX23", where XXXX are the Chinese characters.

    Thanks!
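
As a rough illustration of stepping through a passage character by character, the sketch below uses the standard `std::mbrlen` function to ask how many bytes the next character occupies. The helper name `split_characters` is hypothetical, and the result depends on the current C locale, which is assumed to match the encoding of the passage (for example a Big5 or GBK locale; locale names are platform-specific).

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: split a multibyte string into individual characters.
// ASSUMPTION: the current C locale (set with std::setlocale) matches the
// encoding of `text`; in the default "C" locale only ASCII is reliable.
std::vector<std::string> split_characters(const char* text) {
    std::vector<std::string> chars;
    std::mbstate_t state = std::mbstate_t();   // initial conversion state
    const char* p = text;
    std::size_t remaining = std::strlen(text);

    while (remaining > 0) {
        // Number of bytes that make up the next character.
        std::size_t n = std::mbrlen(p, remaining, &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2) || n == 0) {
            // Invalid or truncated sequence: fall back to a single byte
            // and reset the state so scanning can continue.
            n = 1;
            state = std::mbstate_t();
        }
        chars.push_back(std::string(p, n));
        p += n;
        remaining -= n;
    }
    return chars;
}
```

Calling `std::setlocale(LC_ALL, "")` from `<clocale>` at program start picks up the user's locale; with a Chinese locale in effect, each Chinese character in "abcdefXXXX23" should come back as one multi-byte element of the vector.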

    Expert Comment

    by:pb_india
    Sure.

    What exactly are you trying to do? Just read these characters from a file or some other source and output them?
    Or do you just want to separate the Chinese characters from the English ones?

    Expert Comment

    by:hellohelloworld
    In fact, what I am trying to do is count the number of occurrences of every character (Chinese characters must be counted) that appears in the passage, which consists of different types of characters (i.e. English + Chinese + numbers).

    What I can think of is using pointers to do so. However, I have run into the problem mentioned above. So I am wondering whether I should first convert all the characters in the passage to double-byte and then increment the pointer by 2 each time I read a character, or whether I should separate the multibyte characters (Chinese) from the single-byte ones (English + numbers) and then count them separately.

    Do you have any idea?
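
One way to sketch the counting idea without converting anything first is to look at each byte and decide whether it starts a two-byte character. This is only a sketch under a strong assumption: the passage is in a double-byte Chinese encoding such as Big5 or GBK, where a byte in the range 0x81 to 0xFE is the lead byte of a two-byte character. It does not apply to UTF-8. The function name `count_characters` is made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: count how often each character occurs.
// ASSUMPTION: `text` is in a double-byte encoding such as Big5 or GBK,
// where a byte in 0x81..0xFE starts a two-byte character and every
// other byte is a single-byte character. Not valid for UTF-8.
std::map<std::string, int> count_characters(const std::string& text) {
    std::map<std::string, int> counts;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char byte = static_cast<unsigned char>(text[i]);
        bool is_lead = (byte >= 0x81 && byte <= 0xFE);
        // A lead byte with nothing after it is counted as a single byte.
        std::size_t width = (is_lead && i + 1 < text.size()) ? 2 : 1;
        ++counts[text.substr(i, width)];   // key is the character's bytes
        i += width;
    }
    return counts;
}
```

Keying the map on the raw bytes of each character means English letters, digits, and Chinese characters all end up in the same table, so there is no need to separate them before counting.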

    Author Comment

    by:happy_emily
    PS whoops! hellohelloworld is my second account

    Accepted Solution

    by:
    Hi,

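    I think what you can do is:
    Convert all the characters from narrow to wide, then count the wide characters element by element. After the conversion each character, Chinese or English, normally becomes a single wchar_t, so you no longer have to work out where a multibyte character starts.

    Use:
    mbstate_t ps;
    size_t mbsrtowcs(wchar_t* wide, const char** narrow, size_t len, mbstate_t* ps);

    *narrow will point to your string from the passage. Zero-initialize ps before the first call, and set the locale (setlocale) to match the encoding of the passage.

    Then use code with logic like the following. (You will need to modify it for your own use; as written it counts bytes and lines in a file, not multibyte characters.)
    [I can develop the program for you, but 125 points is too few for that much work.]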


    #include <fstream>
    #include <iostream>

    int main() {
      std::ifstream f1("test");  // "test" is the input file name
      if (!f1) {
        std::cerr << "Cannot open file" << std::endl;
        return 1;
      }

      char c;
      int numchars = 0;  // bytes other than '\n' (not multibyte characters)
      int numlines = 0;

      f1.get(c);
      while (f1) {
        // Count the bytes up to the end of the current line.
        while (f1 && c != '\n') {
          ++numchars;
          f1.get(c);
        }
        ++numlines;
        f1.get(c);
      }
      std::cout << "The file has " << numlines << " lines and "
                << numchars << " characters" << std::endl;
      return 0;
    }
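
The sample above counts bytes, so a two-byte Chinese character would be counted twice. A minimal sketch of the narrow-to-wide conversion suggested in this answer might look like the following. The helper names `to_wide` and `count_wide` are hypothetical, and the conversion relies on the current C locale matching the encoding of the passage; in the default "C" locale only ASCII text is guaranteed to convert.

```cpp
#include <cstddef>
#include <cwchar>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: convert a multibyte string to a wide string.
// ASSUMPTION: the current C locale matches the encoding of `narrow`.
// Returns false if the input is not valid in that locale.
bool to_wide(const char* narrow, std::wstring& out) {
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow;
    // With a null destination, mbsrtowcs only reports the length needed.
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        return false;
    }
    std::vector<wchar_t> buf(len + 1);   // +1 for the terminating L'\0'
    src = narrow;
    state = std::mbstate_t();
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);
    out.assign(buf.data(), len);
    return true;
}

// Hypothetical helper: count occurrences of each wide character.
std::map<wchar_t, int> count_wide(const std::wstring& text) {
    std::map<wchar_t, int> counts;
    for (wchar_t ch : text) {
        ++counts[ch];
    }
    return counts;
}
```

In a real program, `std::setlocale(LC_ALL, "")` from `<clocale>` (or an explicit Chinese locale name, which varies by platform) would be called once before `to_wide`, and the file contents would be read into a string first.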

