Solved

Read Unicode string from text file into CString

Posted on 2013-01-04
Last Modified: 2013-01-23
I have a text file which was created with a C# program like this:
using (StreamWriter sw = new StreamWriter(defaultFile, false))
{
    sw.WriteLine(profile_name);
}

Where profile_name = 10.67µm steps to 277.42µm

This file created has a single line that reads as follows when I open it in Notepad:
10.67µm steps to 277.42µm

Note the µ character.

I have a C++ (Visual Studio 2010) project that needs to read this line into a CString, but I always get garbage using CStdioFile.

I even tried using CStdioFileEx (http://www.codeproject.com/Articles/4119/CStdioFile-derived-class-for-multibyte-and-Unicode), but I still get only junk.

What is the proper way to read this value into a C++ CString?

Thanks
Question by:PMH4514
23 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 38745166
Are you sure you are writing the file as UNICODE? The 'µ' character is not specific to that; it can also be used in ASCII. Can you try to explicitly open the file as either UNICODE or ASCII using Notepad (you can choose the encoding in the 'File Open' dialog)?
 

Author Comment

by:PMH4514
ID: 38745270
Hi jkr -

I had assumed the 'µ' character was Unicode; perhaps that is my first problem.
I tried your suggestion and chose UNICODE in Notepad's File Open dialog. Opened as Unicode, the text appears completely wrong (as I will attempt to paste below):

¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

So given the text file clearly isn't actually Unicode, why then does my attempt to read the line fail?

When I try the following:

CString strLine = _T("");
CStdioFile file;
if (file.Open(szFilePath, CFile::modeRead))
{
    file.ReadString(strLine);
}

I end up with:

strLine = "10.67Âµm steps to 277.42Âµm"
 
LVL 86

Expert Comment

by:jkr
ID: 38745330
Is your project set to UNICODE (which I assume, since it is the default)? Try setting it to ASCII by switching "Project Properties|C/C++|General|Use Character Set" from UNICODE to "Multi-Byte".

 

Author Comment

by:PMH4514
ID: 38745392
My project is UNICODE. I couldn't set it to MBCS and recompile/run because other libraries in use by it (unrelated to this query) require UNICODE.
 
LVL 86

Expert Comment

by:jkr
ID: 38745399
OK, so why not use STL's file I/O for that purpose, since it allows you to explicitly open and read ANSI/UNICODE files? E.g.:

#include <fstream>
#include <string>

std::string sLine;

std::ifstream is("file.txt"); // hard-code that for testing purposes, we might have to do a conversion from UNICODE here

getline(is,sLine);

 
LVL 86

Expert Comment

by:jkr
ID: 38745411
BTW, another - maybe easier - option would be to ensure that the C# project writes UNICODE ;o)
 

Author Comment

by:PMH4514
ID: 38745546
The ifstream version produces exactly the same result

sLine = "10.67Âµm steps to 277.42Âµm"

The C# side has other dependencies, so I'd rather not open that box.
 

Author Comment

by:PMH4514
ID: 38745563
I can "hack" a fix:

strLine.Replace(_T("Â"), _T(""));

and get the line I need, but it's a hack and I don't understand it, so I don't like it :)
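
One possible explanation for why this hack works, offered as a hypothesis only: the StreamWriter(path, append) constructor used by the C# writer defaults to UTF-8 without a byte-order mark, and in UTF-8 the micro sign 'µ' (U+00B5) is stored as the two bytes C2 B5. A reader that treats every byte as one ANSI character (Windows-1252 and Latin-1 agree for these two bytes) turns C2 into 'Â' and B5 into 'µ'. The thread never confirms the file's actual bytes, so the UTF-8 assumption may not hold here. The helper name widen_as_latin1 below is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: interprets each byte as one Latin-1 character.
// Latin-1 code points are numerically equal to the byte values, so this
// mimics what happens when UTF-8 bytes are read as single-byte ANSI text.
std::wstring widen_as_latin1(const std::string& bytes) {
    std::wstring out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<wchar_t>(b));
    }
    return out;
}
```

Under that assumption, the UTF-8 bytes C2 B5 6D ("µm") come out as the three characters 'Â', 'µ', 'm', which is why removing 'Â' appears to repair the string.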
 
LVL 86

Expert Comment

by:jkr
ID: 38745572
Hmm, OK - what is 'Â' in hexadecimal? And, do you find the same value in the file in question when you open that with a hex editor?
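
If no hex editor is at hand, the raw bytes can also be dumped from code. This is a minimal sketch that is not part of the original thread; read_all_bytes and to_hex are hypothetical helper names. A UTF-8 encoded 'µ' (U+00B5) would appear as C2 B5, while a single-byte ANSI (Windows-1252) 'µ' would appear as B5 alone.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file in binary mode so that no newline or
// code page translation is applied to the bytes.
std::string read_all_bytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Formats every byte as two uppercase hex digits, separated by spaces.
std::string to_hex(const std::string& data) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char b = static_cast<unsigned char>(data[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

For example, to_hex(read_all_bytes("1542.txt")) would return the file's contents as a hex string that can be logged or inspected in the debugger.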
 

Author Comment

by:PMH4514
ID: 38745627
If I type Â into a new file and view it in hex mode, it reads C2
(also verified C2 using the little converter about halfway down this page: http://www.thehistoryprofessor.us/bin/header/ascii.html)

When I view the text file in question in hex mode, I do not see C2 anywhere.
 
LVL 86

Expert Comment

by:jkr
ID: 38745668
Well, that's right, but what does it evaluate to in your case when you see it in the debugger?
 
LVL 86

Expert Comment

by:jkr
ID: 38745671
Or, even better: could you attach the file to this thread?
 

Author Comment

by:PMH4514
ID: 38745688
I'm attaching the file.

In the debugger when I roll over the line, and right click and choose Hexadecimal display, nothing changes.
 

Author Comment

by:PMH4514
ID: 38745691
woops, didn't attach to previous...
1542.txt
 

Author Comment

by:PMH4514
ID: 38745696
I tried making a new text file with notepad, and typed exactly the same values in, copying and pasting 'µ' from character map, and then saving (just to take the C# project out of the equation.)

the C++ attempt to read it still produces the same result.
 
LVL 86

Expert Comment

by:jkr
ID: 38745741
OK, *now* it is getting weird. Just using the following code:

#include <fstream>
#include <string>
#include <iostream>
using namespace std;

int main () {

  string sLine;

  ifstream is("1542.txt"); // hard-code that for testing purposes, we might have to do a conversion from UNICODE here

  getline(is,sLine);

  cout << sLine << endl;
}


With VC++, I get the same result that you get:

10.67Ám steps to 277.42Ám


Using g++, that is

10.67µm steps to 277.42µm


The VC++ debugger correctly shows that as

           [5]      0xb5 'µ'      char

And when I try the same using

#include <windows.h>

#include <fstream>
#include <string>
#include <iostream>
using namespace std;

int main () {

  string sLine;

  ifstream is("1542.txt"); // hard-code that for testing purposes, we might have to do a conversion from UNICODE here

  getline(is,sLine);

  //cout << sLine << endl;
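  // MessageBoxA is the explicit ANSI version; plain MessageBox expects wide strings in a UNICODE build
  MessageBoxA(NULL, sLine.c_str(), "Test", MB_OK);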
}
                                            



I get the attached message box with the result you can see: the correct one. It seems that all we have here is a console codepage issue, probably not even worth bothering about ;o)
1542.png
 

Author Comment

by:PMH4514
ID: 38745929
Weird indeed!
Unfortunately, my applied problem seems to go beyond a console/debugger codepage issue because I need to use the string to format a path to an actual datafile to load.

For example, I may have several "profile" files (which are just CSV text given an extension of .profile rather than .txt)

c:\profiles\1µm steps to 10µm.profile
c:\profiles\2µm steps to 11µm.profile
c:\profiles\3µm steps to 12µm.profile
c:\profiles\4µm steps to 13µm.profile
c:\profiles\10.67µm steps to 277.42µm.profile  
etc..


The file we are reading (the one attached earlier) holds as its first and only line, the name of the default datafile to load. So I'm using the string I read in to form a fully qualified path after reading sLine:

CString sPath = _T("");
sPath.Format(_T("c:\\profiles\\%s.profile"), sLine);

The problem then is I end up with this path:
sPath = _T("c:\\profiles\\10.67Âµm steps to 277.42Âµm.profile")

Which is not a file that exists.

I check for existence like:

BOOL FileExists(CString path)
{
   CFileStatus status;
   return CFile::GetStatus( path, status );  
}

If I use my earlier described "hack" to strip out that 'Â' character, in order to check for and open the file at:

sPath = _T("c:\\profiles\\10.67µm steps to 277.42µm.profile")

everything works as expected (that is, yes they are weird filenames, but there is no inherent issue opening and reading from them)

Thanks for all your attention!
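
If the stored name really is UTF-8 (an assumption the thread does not confirm), an alternative to stripping 'Â' afterwards is to decode the bytes into wide characters before building the path. On Windows this is usually done with MultiByteToWideChar and CP_UTF8; the portable sketch below decodes by hand instead. The name utf8_to_wide is hypothetical, and the sketch only covers 1- to 3-byte sequences without rejecting overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into wide characters.
// Sketch only: handles 1- to 3-byte sequences (Basic Multilingual Plane).
// 4-byte sequences and malformed input are replaced with U+FFFD.
std::wstring utf8_to_wide(const std::string& in) {
    const wchar_t kReplacement = static_cast<wchar_t>(0xFFFD);
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t extra = 0;  // continuation bytes expected after the lead byte
        unsigned long cp = 0;   // code point being assembled
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else { out += kReplacement; ++i; continue; }

        // All continuation bytes must be present and look like 10xxxxxx.
        bool ok = i + extra < in.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out += static_cast<wchar_t>(cp); i += extra + 1; }
        else    { out += kReplacement; ++i; }
    }
    return out;
}
```

In a UNICODE MFC build, the decoded text could then be handed to CString (for example, CString sLine(utf8_to_wide(bytes).c_str())) before calling Format, so that the path contains a single 'µ' and no stray 'Â'.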
 
LVL 86

Expert Comment

by:jkr
ID: 38747385
That's even more odd, since when checking with the debugger, there was no such character at all :-/
 

Author Comment

by:PMH4514
ID: 38747402
I guess sometimes hacks have their place :)
 
LVL 33

Expert Comment

by:sarabande
ID: 38752119
the Ám is a typical output for an utf-8 character that was shown by a program that could not handle UTF-8 (such as notepad or windows command interpreter).

the vs editor (and debugger) can handle utf-8 and would silently show the appropriate ansi character (if any). you could verify that i was right by opening the file in visual studio with the hex editor (use the drop-down box at the open button in the open file dialog).

You can recognize utf8 characters by their prefix code which is not printable in ascii.

the common utf-8 characters have 2 bytes and would begin with hex c2, c3, ...

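the µ (micro sign, U+00B5) has utf-8 code sequence "C2B5" which you should find in the hex table if it is utf-8. (the sequence "CEBC" would be the Greek letter μ, U+03BC, which looks the same but is a different character.)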

Sara
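
The two-byte pattern described above can be checked with a small encoder. This is a sketch added for illustration, and utf8_encode is a hypothetical name. For the micro sign U+00B5 it yields C2 B5, which lines up with the 'Â' (C2) discussed earlier in the thread; CE BC is what the Greek small letter mu, U+03BC, encodes to.

```cpp
#include <string>

// Encodes one Unicode code point (assumed <= U+10FFFF) as UTF-8.
// Surrogate code points are not rejected in this sketch.
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx (lead bytes C2..DF)
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```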
 

Author Comment

by:PMH4514
ID: 38758805
Thank you Sara.
Your comments imply, as JKR had suggested, that the issue is merely the display of the character within the IDE. But this doesn't seem to be the case, given that:

1. I can strip/replace the 'Á' character and be left with the IDE properly displaying 'µm'

2. When I format a string representing a path to a file, if it contains  (er, "displays as") 'Á'  the file is not found (implying the character is real, not just a display thing) whereas if I strip that character, the file can then be found.

this is an odd one for sure.
 
LVL 86

Accepted Solution

by:jkr
ID: 38759976
I'd still check with the debugger what the actual strings are. Also, are you enclosing the resulting path in quotes? Since they'll contain spaces, these are required.
 

Author Comment

by:PMH4514
ID: 38810086
sorry for the delay.
I wasn't enclosing the resulting path in quotes!
my mistake, plus debugger code page weirdness is all it was I guess.