Find if a text file is UNICODE or ASCII

I have to scan a text file, but I don't know in advance whether the file is UNICODE or ASCII.
Is there a way to find out (from code)?
I mean, maybe UNICODE files have a header or something similar...
If Notepad for Win NT can detect it, there must be a way...
Thanks
gpbaldazzi Asked:

jkr Commented:
Load the file into memory and use the Win32 API 'IsTextUnicode()' (from the docs):

DWORD IsTextUnicode(
    CONST LPVOID lpBuffer,  // pointer to an input buffer to be examined
    int cb,                 // the size in bytes of the input buffer
    LPINT lpi               // pointer to flags that condition text examination and receive results
);
 
The IsTextUnicode function determines whether a buffer probably contains a form of Unicode text. The function uses various statistical and deterministic methods to make its determination, under the control of flags passed via lpi. When the function returns, the results of such tests are reported via lpi. If all specified tests are passed, the function returns TRUE; otherwise, it returns FALSE.

If you don't want to load the whole file, pass a reasonable number of bytes from its start; the count must be divisible by 2.
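
To make the call above concrete, here is a minimal sketch of how it might be wrapped. The helper name buffer_is_probably_unicode and the 4096-byte cap are assumptions for illustration, not part of the Win32 API. It passes NULL for the flags pointer, which asks IsTextUnicode to run all of its tests. On Windows the program has to link against Advapi32; on any other platform the helper simply reports that the check is unavailable.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical helper (not part of the Win32 API).
 * Returns 1 if the buffer is probably Unicode (UTF-16) text, 0 if not,
 * and -1 if the check is unavailable on this platform.
 */
int buffer_is_probably_unicode(const void *buf, size_t len)
{
#ifdef _WIN32
    int cb;

    /* Examine only a "reasonable" prefix; 4096 is an arbitrary choice. */
    if (len > 4096)
        len = 4096;

    /* The size is in bytes; keep it even so no 2-byte unit is cut in half. */
    cb = (int)(len & ~(size_t)1);
    if (cb == 0)
        return 0;

    /* NULL instead of a flags pointer requests all available tests. */
    return IsTextUnicode(buf, cb, NULL) ? 1 : 0;
#else
    (void)buf;
    (void)len;
    return -1; /* IsTextUnicode exists only on Windows */
#endif
}
```

A caller would read the first bytes of the file with fread() into a buffer and pass the buffer and the number of bytes read. The result is a statistical guess, so short or unusual files can be misclassified.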
jkr Commented:
BTW, just as an addition: UNICODE files don't have special headers, they're just 2 bytes per character...
gpbaldazzi Author Commented:
The IsTextUnicode API seems to be what I need.
I saved a text file as Unicode (with Notepad for NT) and the first two bytes of the file are FF and FE: maybe all Unicode files have this sort of header? Or is it just a Notepad feature? If you know something about this, please tell me!
Anyway, thanks for your answer.
bye
GP
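
On the FF FE question: those two bytes are the Unicode byte order mark (U+FEFF) stored in little-endian order. The mark is optional, so a file that starts with it is almost certainly UTF-16, but a file without it is not necessarily ASCII. A minimal sketch of a check for the common marks follows; the enum and the function name detect_bom are made up for this example, and the UTF-32 marks are deliberately ignored.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum bom_kind { BOM_NONE, BOM_UTF16_LE, BOM_UTF16_BE, BOM_UTF8 };

/* Inspects the first bytes of a file for a byte order mark. */
enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    /* EF BB BF: U+FEFF encoded as UTF-8 */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;

    /* FF FE: U+FEFF in little-endian UTF-16 (what Notepad on NT writes) */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;

    /* FE FF: U+FEFF in big-endian UTF-16 */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;

    /* No mark found: the file may still be Unicode, just unmarked. */
    return BOM_NONE;
}
```

Because it only compares bytes, this check does not depend on any Windows API. BOM_NONE should be read as "unknown", not as "ASCII".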
gpbaldazzi Author Commented:
Bad news: IsTextUnicode works only under WinNT or Win 2000 (does this really exist?); it doesn't work on Win 95/98...
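
Where IsTextUnicode is not available, a rough portable guess may be enough. The sketch below is not a reimplementation of IsTextUnicode, only a crude heuristic. Both function names and the 50% threshold are assumptions chosen for illustration. It relies on the fact that mostly Latin text stored as 2-byte little-endian characters has a zero byte at nearly every odd offset, while plain ASCII text contains no zero bytes and no bytes above 0x7F.

```c
#include <stddef.h>

/* Returns 1 if every byte is 7-bit ASCII and none is NUL, else 0. */
int is_plain_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0x00 || buf[i] > 0x7F)
            return 0;
    }
    return 1;
}

/*
 * Crude guess for 2-byte little-endian text: counts byte pairs whose
 * low byte is non-zero and whose high byte (odd offset) is zero.
 * Returns 1 if more than half of the pairs look like that.
 * The 50% threshold is an arbitrary assumption.
 */
int looks_like_utf16_le(const unsigned char *buf, size_t len)
{
    size_t pairs = len / 2;
    size_t zero_high = 0;
    size_t i;

    if (pairs == 0)
        return 0;

    for (i = 0; i < pairs; i++) {
        if (buf[2 * i] != 0x00 && buf[2 * i + 1] == 0x00)
            zero_high++;
    }
    return zero_high * 2 > pairs;
}
```

Text in non-Latin scripts will defeat the second function, since its high bytes are rarely zero, so a byte order mark, when present, is the more reliable signal.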
jkr Commented:
Sorry, I assumed you were talking about NT. Using UNICODE on Win9x doesn't make much sense either, as most of the APIs aren't supported...