Solved

Read Unicode string from text file into CString

Posted on 2013-01-04
635 Views
Last Modified: 2013-01-23
I have a text file which was created with a C# program like this:
using (StreamWriter sw = new StreamWriter(defaultFile, false))
{
    sw.WriteLine(profile_name);
}

Where profile_name = 10.67µm steps to 277.42µm

The resulting file has a single line that reads as follows when I open it in Notepad:
10.67µm steps to 277.42µm

Note the µ character.

I have a C++ (Visual Studio 2010) project that needs to read this line into a CString, but I always get garbage using CStdioFile.

I even tried the CStdioFileEx class (http://www.codeproject.com/Articles/4119/CStdioFile-derived-class-for-multibyte-and-Unicode), but I still get only junk.

What is the proper way to read this value into a C++ CString?

Thanks
Question by:PMH4514
23 Comments
 
LVL 86

Expert Comment

by:jkr
Are you sure you are writing the file as Unicode? The 'µ' character is not specific to Unicode; it also exists in single-byte ANSI code pages such as Windows-1252 (though not in 7-bit ASCII). Can you try to explicitly open the file as either Unicode or ANSI using Notepad (you can choose the encoding in the 'File Open' dialog)?
 

Author Comment

by:PMH4514
Hi jkr -

I had assumed the 'µ' character was Unicode; perhaps that is my first problem.
I tried your suggestion and chose Unicode in Notepad's File Open dialog. Opened as Unicode, the text appears completely wrong (I will attempt to paste it below):

¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

So given the text file clearly isn't actually Unicode, why then does my attempt to read the line fail?

When I try the following:

CString strLine = _T("");
CStdioFile file;
if (file.Open(szFilePath, CFile::modeRead))
{
    file.ReadString(strLine);
}

I end up with:

strLine = "10.67Âµm steps to 277.42Âµm"
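
One way to see why Notepad's "Unicode" option produces nonsense is to look at the first bytes of the file. The sketch below is a hypothetical helper (`detectBom` is not part of MFC or of the code in this thread) that only recognises byte-order marks. It assumes the usual conventions: Notepad's "Unicode" means UTF-16 LE, normally marked with `FF FE`, while .NET's `StreamWriter(path, append)` constructor writes UTF-8 without any mark, so a file produced that way would be reported as unknown rather than as Unicode.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: guess a text file's encoding from its first bytes.
// Only byte-order marks are checked; a file without one is reported as
// unknown, because it could be ANSI or BOM-less UTF-8.
std::string detectBom(const std::string& bytes)
{
    auto startsWith = [&bytes](const char* prefix, std::size_t n) {
        return bytes.size() >= n && bytes.compare(0, n, prefix, n) == 0;
    };
    if (startsWith("\xEF\xBB\xBF", 3)) return "UTF-8 with BOM";
    if (startsWith("\xFF\xFE", 2))     return "UTF-16 LE";
    if (startsWith("\xFE\xFF", 2))     return "UTF-16 BE";
    return "unknown (no BOM)";
}
```

A result of "unknown (no BOM)" does not say which single-byte or multi-byte encoding is in use; it only rules out the marked encodings.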
 
LVL 86

Expert Comment

by:jkr
Is your project set to UNICODE (which I assume, since it is the default)? Try switching "Project Properties|C/C++|General|Use Character Set" from UNICODE to "Multi-Byte".
 

Author Comment

by:PMH4514
My project is UNICODE. I couldn't set it to MBCS and recompile/run because other libraries in use by it (unrelated to this query) require UNICODE.
 
LVL 86

Expert Comment

by:jkr
OK, so why not use the C++ standard library's file streams for that purpose, since they let you explicitly open and read ANSI or Unicode files? E.g.

#include <fstream>
#include <string>

std::string sLine;

std::ifstream is("file.txt"); // hard-coded for testing purposes; we might have to do a conversion from Unicode here

std::getline(is, sLine);
 
LVL 86

Expert Comment

by:jkr
BTW, another - maybe easier - option would be to ensure that the C# project writes UNICODE ;o)
 

Author Comment

by:PMH4514
The ifstream version produces exactly the same result

sLine = "10.67Âµm steps to 277.42Âµm"

The C# side has other dependencies, so I'd rather not open that box.
 

Author Comment

by:PMH4514
I can "hack" a fix:

strLine.Replace(_T("Â"), _T(""));

and get the line I need, but it's a hack and I don't understand it, so I don't like it :)
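
Assuming the file really is UTF-8, the `Replace` hack works because of how the bytes are interpreted: 'µ' (U+00B5) is stored as the two bytes C2 B5, and a reader that treats every byte as one character turns C2 into 'Â' and B5 into 'µ'. Rather than deleting the stray 'Â', the bytes can be decoded. The following is a minimal, portable sketch and not the fix accepted in this thread; `appendCodePoint`, `decodeUtf8` and `decodeByteWise` are illustrative names. On Windows, `MultiByteToWideChar` with `CP_UTF8` performs the same conversion.

```cpp
#include <cstddef>
#include <string>

// Append one code point to a wide string. Where wchar_t is 16 bits (Windows),
// code points above U+FFFF are written as a UTF-16 surrogate pair.
void appendCodePoint(std::wstring& out, unsigned long cp)
{
    if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
        cp -= 0x10000;
        out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
    } else {
        out.push_back(static_cast<wchar_t>(cp));
    }
}

// Decode UTF-8 bytes into a wide string. Sketch only: malformed input becomes
// U+FFFD, and overlong encodings are not rejected.
std::wstring decodeUtf8(const std::string& bytes)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;   // number of continuation bytes expected
        bool ok = true;
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { ok = false; }  // stray continuation or invalid byte

        if (ok && i + extra >= bytes.size()) ok = false; // truncated sequence
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { appendCodePoint(out, cp); i += extra + 1; }
        else    { out.push_back(static_cast<wchar_t>(0xFFFD)); ++i; }
    }
    return out;
}

// What a one-byte-per-character reader produces: each byte becomes the code
// point with the same value (Latin-1; Windows-1252 agrees for C2 and B5).
std::wstring decodeByteWise(const std::string& bytes)
{
    std::wstring out;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(bytes[i])));
    return out;
}
```

In a UNICODE build, the decoded `std::wstring` can initialise a `CString` directly, for example `CString s(decodeUtf8(bytes).c_str());`, where `bytes` holds the raw line read from the file.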
 
LVL 86

Expert Comment

by:jkr
Hmm, OK - what is 'Â' in hexadecimal? And, do you find the same value in the file in question when you open that with a hex editor?
 

Author Comment

by:PMH4514
If I type Â into a new file, and view it in Hex mode, it reads C2
(also verified C2 using the little converter about halfway down this page: http://www.thehistoryprofessor.us/bin/header/ascii.html)

When I view the text file in question in hex mode, I do not see C2 anywhere.
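
To settle what is actually stored, the bytes can be dumped directly instead of relying on how an editor renders them. This is a small sketch with made-up helper names (`readFileBytes`, `toHex`): a UTF-8 file would show `C2 B5` for each 'µ', while an ANSI (Windows-1252) file would show a single `B5`.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Read a whole file as raw bytes. Binary mode prevents any newline or
// code page translation. Returns an empty string if the file cannot be opened.
std::string readFileBytes(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Format every byte as two uppercase hex digits, separated by spaces.
std::string toHex(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out.push_back(' ');
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```

Calling `toHex(readFileBytes("1542.txt"))` on the attached file and looking at the two or three bytes before each `6D` ('m') would show which of the two encodings is in play.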
 
LVL 86

Expert Comment

by:jkr
Well, that's right, but what does it evaluate to in your case when you see it in the debugger?
 
LVL 86

Expert Comment

by:jkr
Or, even better: Could you attach the file to this thread?
 

Author Comment

by:PMH4514
I'm attaching the file.

In the debugger when I roll over the line, and right click and choose Hexadecimal display, nothing changes.
 

Author Comment

by:PMH4514
Whoops, the file didn't attach to my previous comment...
1542.txt
 

Author Comment

by:PMH4514
I tried making a new text file with Notepad, typing in exactly the same values, copying and pasting 'µ' from Character Map, and then saving (just to take the C# project out of the equation).

The C++ attempt to read it still produces the same result.
 
LVL 86

Expert Comment

by:jkr
OK, *now* it is getting weird. Just using the following code:

#include <fstream>
#include <string>
#include <iostream>
using namespace std;

int main () {

  string sLine;

  ifstream is("1542.txt"); // hard-code that for testing purposes, we might have to do a conversion from UNICODE here

  getline(is,sLine);

  cout << sLine << endl;
}

With VC++, I get the same result that you get:

10.67Ám steps to 277.42Ám

Using g++, that is

10.67µm steps to 277.42µm

The VC++ debugger correctly shows that as

           [5]      0xb5 'µ'      char

And when I try the same using

#include <windows.h>

#include <fstream>
#include <string>
#include <iostream>
using namespace std;

int main () {

  string sLine;

  ifstream is("1542.txt"); // hard-code that for testing purposes, we might have to do a conversion from UNICODE here

  getline(is,sLine);

  //cout << sLine << endl;
  MessageBoxA(NULL, sLine.c_str(), "Test", MB_OK); // explicit ANSI version, so this also compiles in a UNICODE build
}

I get the attached message box with the result you can see - the correct one. It seems that all we have here is a console code page issue, probably not even worth bothering about ;o)
1542.png
 

Author Comment

by:PMH4514
Weird indeed!
Unfortunately, my applied problem seems to go beyond a console/debugger codepage issue because I need to use the string to format a path to an actual datafile to load.

For example, I may have several "profile" files (which are just CSV text given an extension of .profile rather than .txt)

c:\profiles\1µm steps to 10µm.profile
c:\profiles\2µm steps to 11µm.profile
c:\profiles\3µm steps to 12µm.profile
c:\profiles\4µm steps to 13µm.profile
c:\profiles\10.67µm steps to 277.42µm.profile  
etc..


The file we are reading (the one attached earlier) holds as its first and only line, the name of the default datafile to load. So I'm using the string I read in to form a fully qualified path after reading sLine:

CString sPath = _T("");
sPath.Format(_T("c:\\profiles\\%s.profile"), sLine);

The problem then is I end up with this path:
sPath = _T("c:\\profiles\\10.67Âµm steps to 277.42Âµm.profile")

Which is not a file that exists.

I check for existence like:

BOOL FileExists(CString path)
{
   CFileStatus status;
   return CFile::GetStatus( path, status );  
}

If I use my earlier described "hack" to strip out that 'Â' character, in order to check for and open the file at:

sPath = _T("c:\\profiles\\10.67µm steps to 277.42µm.profile")

everything works as expected (that is, yes they are weird filenames, but there is no inherent issue opening and reading from them)

Thanks for all your attention!
 
LVL 86

Expert Comment

by:jkr
That's even more odd, since when checking with the debugger, there was no such character at all :-/
 

Author Comment

by:PMH4514
I guess sometimes hacks have their place :)
 
LVL 32

Expert Comment

by:sarabande
The Ám is typical output for a UTF-8 character shown by a program that does not decode UTF-8 (such as Notepad or the Windows command interpreter).

The VS editor (and debugger) can handle UTF-8 and would silently show the appropriate ANSI character (if any). You can verify this by opening the file in Visual Studio with the hex editor (use the drop-down box at the Open button in the Open File dialog).

You can recognize UTF-8 characters by their prefix byte, which is not printable in ASCII.

The common UTF-8 characters have 2 bytes and begin with hex C2, C3, ...

The micro sign µ (U+00B5) has the UTF-8 byte sequence "C2B5" (the look-alike Greek letter μ, U+03BC, is "CEBC"), which you should find in the hex view if the file is UTF-8.

Sara
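
The prefix rule can be written down as a short check. This is a sketch under the standard UTF-8 lead-byte ranges (RFC 3629); `utf8SequenceLength` and `looksLikeUtf8` are illustrative names, and the check skips finer points such as overlong three-byte forms and surrogates. An ANSI file containing a bare B5 byte fails the check, while a UTF-8 file containing C2 B5 passes.

```cpp
#include <cstddef>
#include <string>

// Return how many bytes a UTF-8 sequence starting with `lead` occupies,
// or 0 if `lead` can never be the first byte of a sequence.
std::size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;  // 0xxxxxxx: plain ASCII
    if (lead < 0xC2) return 0;  // 10xxxxxx continuation byte, or overlong C0/C1
    if (lead < 0xE0) return 2;  // 110xxxxx
    if (lead < 0xF0) return 3;  // 1110xxxx
    if (lead < 0xF5) return 4;  // 11110xxx, up to U+10FFFF
    return 0;                   // F5..FF never appear in UTF-8
}

// Heuristic: true if every byte fits the lead/continuation pattern of UTF-8.
bool looksLikeUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::size_t len = utf8SequenceLength(static_cast<unsigned char>(bytes[i]));
        if (len == 0 || i + len > bytes.size()) return false;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a 10xxxxxx continuation byte
        }
        i += len;
    }
    return true;
}
```

Pure ASCII text passes as well, so a positive result only means the bytes are consistent with UTF-8, not that the writer intended it.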
 

Author Comment

by:PMH4514
Thank you Sara.
Your comments imply, as JKR had suggested, that the issue is merely the display of the character within the IDE. But this doesn't seem to be the case, given that:

1. I can strip/replace the 'Á' character and be left with the IDE properly displaying 'µm'

2. When I format a string representing a path to a file, if it contains  (er, "displays as") 'Á'  the file is not found (implying the character is real, not just a display thing) whereas if I strip that character, the file can then be found.

this is an odd one for sure.
 
LVL 86

Accepted Solution

by:jkr (earned 500 total points)
I'd still check with the debugger what the actual strings are. Also, are you enclosing the resulting path in quotes? Since they'll contain spaces, these are required.
 

Author Comment

by:PMH4514
Sorry for the delay.
I wasn't enclosing the resulting path in quotes!
My mistake; that, plus debugger code page weirdness, is all it was, I guess.