UdiRaz asked on

How to read the Unicode values of the text in a Unicode txt file?

Hi,
I am probably missing something here. I have a Unicode txt file with the following text: " a b 1 Ð -  0 " and I would like to know the Unicode value of each character. I tried two ways; what am I doing wrong?

The result buffer shows the correct Unicode values for the characters 'a', 'b', '1' and '-', but not for the other characters.
My project is a Unicode project.

Please advise.

Thanks,

Udi Raz
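1.
// MFC CFile: Read() copies the file's raw bytes into the buffer, with no conversion.
TCHAR buffer[150];
CFile file;
file.Open( _T("c:\\aaa.txt"), CFile::modeRead );
file.Read( buffer, sizeof(buffer) );  // count is in bytes: 150 TCHARs = 300 bytes in a Unicode build

2.
// Wide stream: each wchar_t is produced by converting the file's bytes
// through the codecvt facet of the stream's locale.
TCHAR buffer[150];
std::wifstream f( _T("c:\\aaa.txt") );
f.getline( buffer, sizeof(buffer) / sizeof(buffer[0]) );  // count is in characters, not bytes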

System Programming, C++

Last comment: UdiRaz, 8/22/2022 (Mon)

LordOfPorts

If you check the size of TCHAR, i.e. sizeof(TCHAR), what is the output on your end?
HawyLem

Using streams will create more problems for you than using plain (and more portable) fopen/fread.

Open the file in binary mode ("rb") to read Unicode strings (be careful: the size of wchar_t is platform-dependent). I think that should solve your problem.
jkr

A "std::wifstream" constructor (http://msdn.microsoft.com/en-us/library/zek0beca(VS.80).aspx) still takes a 'char*' as the file name argument, not a 'wchar_t*'; use:
TCHAR buffer[150];
std::wifstream f("c:\\aaa.txt");
// getline's count is in characters; sizeof(buffer) is the size in bytes.
f.getline(buffer, sizeof(buffer) / sizeof(buffer[0]));

ASKER
UdiRaz

jkr: Calling the "std::wifstream" constructor with a wide string didn't make any difference; the txt file was opened and I could read the English text. Other than that, my code is exactly like the code you offer, and it still doesn't read Hindi and Hebrew characters.

Oh, I just saw that the Unicode characters I pasted into my original question don't display properly; I guess I can't type Unicode letters here. After the letters a, b, 1 I typed letters in Hebrew and in Hindi...
ASKER
UdiRaz

OK, there is no bug in the code; there is a bug in my understanding.

I made a test: I created a txt file and tried to write a Hindi character. The Unicode value of the Hindi character is 0x94D (U+094D). It didn't work. Then I changed the character to 'b', and that worked.

I took the original txt file, the one that contains English, Hebrew and Hindi characters, and changed its suffix to .bin so I could view its bytes in Visual C++. It appears that my method reads it correctly, but why aren't these the Unicode values I am looking for?

Maybe it is saved differently; could a code page be involved here?

Any suggestions?

std::wofstream ff( "c:\\b-out.txt", std::ios::out | std::ios::app );
//TCHAR tmp = (TCHAR)0x94d;   // U+094D (Hindi character): this write did not work
TCHAR tmp = (TCHAR)0x62;      // 'b': this write worked
ff << tmp;
ff.flush();
ff.close();


ASKER CERTIFIED SOLUTION
LordOfPorts

ASKER
UdiRaz

Saving to a file was only a test to try to understand what is going on. Since I changed the txt file to a bin file, dragged it into Visual C++ and saw the same codes as I read using wifstream, my original question is no longer relevant. The problem is not with the way I tried to read the text but with the way the text is represented in the file.

I will open a new question.

Thanks

ASKER
UdiRaz

OK, found the site: http://www.microsoft.com/globaldev/reference/keyboards.mspx

1. There is no Hindi Remington layout.
2. The map you sent me matches neither Remington nor InScript (it is something like "shift InScript").