michaelh77

asked on

Parsing a Unicode text file with wcstok?

Hello,

I have an Excel spreadsheet containing data in multiple languages (English, Korean, Japanese, Chinese, and Arabic).  I'm looking for a way to get this data into the wchar_t portion of my C structure.

My first thought was to save the Excel spreadsheet as a Unicode text file and then parse the file using wcstok.
However, when I do this, my second call to wcstok returns NULL.  If I replace the Unicode text file with a plain text file containing only English characters, it seems to parse properly.

Am I barking up the wrong tree?  Is it possible to parse a Unicode text file with wcstok?  Or is there something more involved to this process?

Thanks,
Mike
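
A possible explanation for the NULL, sketched under assumptions: a "Unicode text" export is typically UTF-16LE, so every ASCII character is followed by a zero byte. If the file is read in such a way that each byte lands in its own wchar_t, the buffer contains a terminator right after the first character, and wcstok runs out of string after one token. The helper widen_bytes below is hypothetical and only imitates that byte-per-character widening; it is not a function from the C runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical illustration: copy each byte of `bytes` into its own
   wchar_t, the way a byte-oriented read would widen UTF-16LE data.
   Returns the number of wchar_t written, not counting the terminator. */
size_t widen_bytes(const unsigned char *bytes, size_t len,
                   wchar_t *out, size_t cap)
{
    size_t i, n = 0;

    assert(cap > 0);
    for (i = 0; i < len && n + 1 < cap; i++)
        out[n++] = (wchar_t)bytes[i];   /* zero bytes become L'\0' */
    out[n] = L'\0';
    return n;
}
```

Feeding it the UTF-16LE bytes of "a,b" (61 00 2C 00 62 00) writes six wide characters, but wcslen on the result is 1, so the first wcstok call yields "a" and the next one returns NULL. Note that standard C wcstok takes a third state argument, while the VC++ 6.0 version takes only two.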
Kryp

Can you paste an example call?
Is it returning the whole string as the first token for example (when it shouldn't be)?

I assume that you're using VC++ as your compiler, since you mention Excel.

michaelh77

ASKER

Yes, I am developing in VC++ 6.0.

I am using fgetws to read in each line, and I also noticed that when I use fgetws to count the number of records in my file, it does not return the correct number.  There are over 100 entries in my file, but the count gives me 25.

Is it possible I could be using fgetws wrong?

Thanks.

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main( int argc, char ** argv )
{
  wchar_t line[256];
  int i,entries;
  wchar_t *token;
  FILE *fp;

  fp = fopen("unitest.txt","r");
  if (fp == NULL)
  {
    printf("could not open i18n file.\n");
    exit(1);
  }

  // first pass: count the lines in the file
  i = 0;
  while (fgetws (line, 256, fp) != NULL)
  {
    i++;
  }
  rewind(fp);
  entries = i;

  // malloc space for my names_from_file structure...

  // second pass: split each line into its three fields
  i = 0;
  while (fgetws (line, 256, fp) != NULL)
  {
    token = wcstok (line, L",");
    //wcscpy(names_from_file_english[i].name,token);

    token = wcstok (NULL, L",");
    //wcscpy(names_from_file_chinese[i].name,token);

    token = wcstok (NULL, L",");
    //wcscpy(names_from_file_korean[i].name,token);
    i++;
  }
  fclose(fp);
  //free names_from_file structure...
  exit(0);
}
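
One thing that may be worth trying, offered as a sketch rather than a confirmed fix: open the file in binary mode ("rb") and decode the UTF-16LE code units by hand, so the result does not depend on how the runtime treats wide reads on a text-mode stream. With the Microsoft runtime, text mode also treats a 0x1A byte as end-of-file, which is one possible reason for a short line count. The function name read_utf16le_line is hypothetical, and the code assumes little-endian UTF-16 with characters in the Basic Multilingual Plane.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: read one line of UTF-16LE text from a stream
   opened with "rb". Stores at most cap-1 characters plus a terminator;
   characters beyond that are dropped. Returns the number of characters
   stored, or -1 if the stream was already at end-of-file. */
int read_utf16le_line(FILE *fp, wchar_t *buf, size_t cap)
{
    size_t n = 0;
    int lo, hi, got_any = 0;

    assert(cap > 0);
    while ((lo = fgetc(fp)) != EOF && (hi = fgetc(fp)) != EOF) {
        /* Combine two bytes, low byte first, into one code unit. */
        unsigned int unit = (unsigned int)lo | ((unsigned int)hi << 8);
        got_any = 1;
        if (unit == 0xFEFFu || unit == 0x000Du)
            continue;               /* skip byte-order mark and '\r' */
        if (unit == 0x000Au)
            break;                  /* '\n' ends the line */
        if (n + 1 < cap)
            buf[n++] = (wchar_t)unit;
    }
    buf[n] = L'\0';
    return got_any ? (int)n : -1;
}
```

Each filled buffer can then be handed to wcstok as before. Two things to check first: Excel's "Unicode Text" format is, as far as I know, tab-delimited, so the delimiter may need to be L"\t" rather than L","; and wcstok skips over runs of delimiters, so empty cells will shift the remaining fields.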
 
ASKER CERTIFIED SOLUTION
Kryp
Nothing has happened on this question in more than 13 months. It's time for cleanup!

My recommendation, which I will post in the Cleanup topic area, is to
accept answer by Kryp [grade B] (only a hint towards a solution).

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer