Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Read Unicode string from text file into CString

Posted on 2013-01-04
23
Medium Priority
?
681 Views
Last Modified: 2013-01-23
I have a text file which was created with a C# program like this:
             using (StreamWriter sw = new StreamWriter(defaultFile, false))
                {
                    sw.WriteLine(profile_name);
                }      

Where profile_name = 10.67µm steps to 277.42µm

This file created has a single line that reads as follows when I open it in Notepad:
10.67µm steps to 277.42µm

Note the µ character.

I have a C++ (Visual Studio 2010) project that needs to read this line into a CString, but I always get garbage using CStdioFile.

I even tried using that CStdioFileEx  (http://www.codeproject.com/Articles/4119/CStdioFile-derived-class-for-multibyte-and-Unicode) but still I get only junk.

What is the proper way to read this value into a C++ CString?

Thanks
0
Comment
Question by:PMH4514
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 12
  • 10
23 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 38745166
Are you sure you are writing teh file as UNICODE?  The 'µ' character is not specific for that, it can also be used in ASCII. Can you try to explicitly open the file as either UNICODE or ASCII using Notepad (you can choose the encoding in the 'File Open' dialog)?
0
 

Author Comment

by:PMH4514
ID: 38745270
Hi jkr -

I have assumed the  'µ' character was unicode, perhaps that is my first problem.
I tried your suggestion, choosing UNICODE in Notepad during File Open and as Unicode, it appears completely wrong (as I will attempt to paste below)

¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

So given the text file clearly isn't actually Unicode, why then does my attempt to read the line fail?

When I try the following:

CString sLine = _T("");
CStdioFile file;
if (file.Open(szFilePath, CFile::modeRead))
{
      file.ReadString(strLine);
}

I end up with:

strLine = "10.67µm steps to 277.42µm"
0
 
LVL 86

Expert Comment

by:jkr
ID: 38745330
Is your project set toUNICODE (which I assume, since it is the default)? Try setting it to ASCII switching "Project Properties|C/C++|General|Use Character Set" from UNICODE to "Multi-Byte".
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:PMH4514
ID: 38745392
My project is UNICODE. I couldn't set it to MBCS and recompile/run because other libraries in use by it (unrelated to this query) require UNICODE.
0
 
LVL 86

Expert Comment

by:jkr
ID: 38745399
OK, so why not using STL's file I/O for that purpose, since it allows you to explicitly open and read ANSI/UNICODE files? E.g.

#include <fstream>
#include <string>

std::string sLine;

std::ifstream is("file.txt"); // hard-code that for testing purposes, we might have to do a conversion from UNICODE here

getline(is,sLine);

Open in new window

0
 
LVL 86

Expert Comment

by:jkr
ID: 38745411
BTW, another - maybe easier - option would be to ensure that the C# project writes UNICODE ;o)
0
 

Author Comment

by:PMH4514
ID: 38745546
The ifstream version produces exactly the same result

sLine = "10.67µm steps to 277.42µm"

The C# side has other dependencies I'd rather not open that box.
0
 

Author Comment

by:PMH4514
ID: 38745563
I can "hack" a fix:

strLine.Replace(_T("Â"), _T(""));

and get the line I need, but it's a hack and I don't understand it, so I don't like it :)
0
 
LVL 86

Expert Comment

by:jkr
ID: 38745572
Hmm, OK - what is 'Â' in hexadecimal? And, do you find the same value in the file in question qhen you open that with a hex editor?
0
 

Author Comment

by:PMH4514
ID: 38745627
If I type  into a new file, and view it in Hex mode,  it reads C2
(also verified C2 using the little converter about halfway down this page: http://www.thehistoryprofessor.us/bin/header/ascii.html)

When I view the text file in question in hex mode, I do not see C2 anywhere.
0
 
LVL 86

Expert Comment

by:jkr
ID: 38745668
Well, that's right, but what does it evaluate to in your case when you see it in the debugger?
0
 
LVL 86

Expert Comment

by:jkr
ID: 38745671
Of, even better: Could you attach the file to this thread?
0
 

Author Comment

by:PMH4514
ID: 38745688
I'm attaching the file.

In the debugger when I roll over the line, and right click and choose Hexadecimal display, nothing changes.
0
 

Author Comment

by:PMH4514
ID: 38745691
woops, didn't attach to previous...
1542.txt
0
 

Author Comment

by:PMH4514
ID: 38745696
I tried making a new text file with notepad, and typed exactly the same values in, copying and pasting 'µ' from character map, and then saving (just to take the C# project out of the equation.)

the C++ attempt to read it still produces the same result.
0
 
LVL 86

Expert Comment

by:jkr
ID: 38745741
OK, *now* it is getting weird. Just using the following code:

#include <fstream>
#include <string>
#include <iostream>
using namespace std;

int main () {

  string sLine;

  ifstream is("1542.txt"); // hard-code that for testing purposes, we might have to do a conversion from UNICODE here

  getline(is,sLine);

  cout << sLine << endl;
}

Open in new window


With VC++, I get the same result that you get:

10.67Ám steps to 277.42Ám

Open in new window


Using g++, that is

10.67µm steps to 277.42µm

Open in new window


The VC++ debugger correctly shows that as

           [5]      0xb5 'µ'      char

And when I try the same using

#include <windows.h>

#include <fstream>
#include <string>
#include <iostream>
using namespace std;

int main () {

  string sLine;

  ifstream is("1542.txt"); // hard-code that for testing purposes, we might have to do a conversion from UNICODE here

  getline(is,sLine);

  //cout << sLine << endl;
  MessageBox(NULL,sLine.c_str(),"Test",MB_OK);
}
                                            

Open in new window


I get the attached message box with the result you can see - the correct one. Seems that all we have here is a console codepage issue, probably not even worth to bother ;o)
1542.png
0
 

Author Comment

by:PMH4514
ID: 38745929
Weird indeed!
Unfortunately, my applied problem seems to go beyond a console/debugger codepage issue because I need to use the string to format a path to an actual datafile to load.

For example, I may have several "profile" files (which are just CSV text given an extension of .profile rather than .txt)

c:\profiles\1µm steps to 10µm.profile
c:\profiles\2µm steps to 11µm.profile
c:\profiles\3µm steps to 12µm.profile
c:\profiles\4µm steps to 13µm.profile
c:\profiles\10.67µm steps to 277.42µm.profile  
etc..


The file we are reading (the one attached earlier) holds as its first and only line, the name of the default datafile to load. So I'm using the string I read in to form a fully qualified path after reading sLine:

CString sPath = _T("");
sPath.Format(_T("c:\\profiles\\%s.profile"), sLine);

The problem then is I end up with this path:
sPath = _T("c:\\profiles\\10.67µm steps to 277.42µm.profile")

Which is not a file that exists.

I check for existence like:

BOOL FileExists(CString path)
{
   CFileStatus status;
   return CFile::GetStatus( path, status );  
}

If I use my earlier described "hack" to strip out that 'Â' character, in order to check for and open the file at:

sPath = _T("c:\\profiles\\10.67µm steps to 277.42µm.profile")

everything works as expected (that is, yes they are weird filenames, but there is no inherent issue opening and reading from them)

Thanks for all your attention!
0
 
LVL 86

Expert Comment

by:jkr
ID: 38747385
That's even more odd, since when checking with the debugger, there was no such character at all :-/
0
 

Author Comment

by:PMH4514
ID: 38747402
I guess sometimes hacks have their place :)
0
 
LVL 35

Expert Comment

by:sarabande
ID: 38752119
the Ám is a typical output for an utf-8 character that was shown by a program that could not handle UTF-8 (such as notepad or windows command interpreter).

the vs editor (and debugger) can handle utf-8 and would silently show the appropriate ansi character (if any). you could verify that i was right by opening the file in visual studio with the hex editor (use the drop-down box at the open button in the open file dialog).

You can recognize utf8 characters by their prefix code which is not printable in ascii.

the common utf-8 characters have 2 bytes and would begin with hex c2, c3, ...

the µ has utf-8 code sequence "CEBC" which you should find in the hex table if it is utf-8.

Sara
0
 

Author Comment

by:PMH4514
ID: 38758805
Thank you Sara.
Your comments imply, as JKR had suggested, that the issue is merely the display of the character within the IDE. But this doesn't seem to be the case, given that:

1. I can strip/replace the 'Á' character and be left with the IDE properly displaying 'µm'

2. When I format a string representing a path to a file, if it contains  (er, "displays as") 'Á'  the file is not found (implying the character is real, not just a display thing) whereas if I strip that character, the file can then be found.

this is an odd one for sure.
0
 
LVL 86

Accepted Solution

by:
jkr earned 2000 total points
ID: 38759976
I'd still check with the debugger what the actual strings are. Also, are you enclosing the resulting path in quotes? Since they'll contain spaces, these are required.
0
 

Author Comment

by:PMH4514
ID: 38810086
sorry for the delay.
I wasn't enclosing the resulting path in quotes!
my mistake, plus debugger code page weirdness is all it was I guess.
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

What my article will show is if you ever had to do processing to a listbox without being able to just select all the items in it. My software Visual Studio 2008 crystal report v11 My issue was I wanted to add crystal report to a form and show…
For most people, the WrapPanel seems like a magic when they switch from WinForms to WPF. Most of us will think that the code that is used to write a control like that would be difficult. However, most of the work is done by the WPF engine, and the W…
The viewer will learn additional member functions of the vector class. Specifically, the capacity and swap member functions will be introduced.
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question