sjm

Checking type of File Data

I am trying to find a way to check whether the data in a file is binary or text.  I tried to use CFile Open with CFile::typeText as a flag, but I get an exception and am informed that this flag is not supported.  Can someone please tell me how I can go into a file, pick a portion of the data, and verify whether that data is text or binary?  I need to be able to verify that the portion picked is not text data. When I go into MSVC and open the file, it tells me whether or not the data is text, so there must be a way.  Please supply the code needed to do this.
Thanks
Srw

All files are just byte streams.  There really is no Text vs. Binary.  Those file opening modes just refer to how whitespace and CR/LF pairs are handled.

A file's type (text / binary) is a more or less arbitrary distinction.  Your requirements determine whether the file is "binary".  So, why must you have a "binary" file?  What is it about a "text" file that is bad for you?
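To make the point about opening modes concrete, here is a minimal sketch in standard C++ (not MFC). The helper name roundTripBinary and the scratch file name are made up for illustration. It assumes the working directory is writable, and it only shows that a file opened in binary mode hands back exactly the bytes that were written, CR/LF pair included.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes `bytes` to `path` and reads them back,
// opening the file in binary mode both times so nothing is translated.
std::string roundTripBinary(const std::string& path, const std::string& bytes) {
    {
        std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
        assert(out && "could not open the scratch file for writing");
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    }   // the output stream closes here, flushing the data to disk

    std::ifstream in(path.c_str(), std::ios::binary);
    assert(in && "could not open the scratch file for reading");
    std::string result((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
    in.close();
    std::remove(path.c_str());   // clean up the scratch file
    return result;
}
```

Calling roundTripBinary("scratch.bin", "a\r\nb") should return the same four bytes. If the file were opened in text mode on Windows instead, the CR/LF pair would be translated, which is the only thing the mode flag changes.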
ASKER CERTIFIED SOLUTION
chensu
This solution is only available to members.
Hi sjm:  My thoughts on this topic are as follows:

Each byte of plain (7-bit ASCII) text data has a value less
than 128. Text uses only the low seven bits of an eight-bit
byte, while binary data uses all eight bits, so binary bytes
can have values as high as 255. Thus you can determine that
a file contains binary data and not text if it has byte
values in excess of 127 in it. You cannot determine that a
file is text and not binary, because text is basically a
subset of binary: just because a file contains no byte values
greater than 127 does not mean it is not a binary file,
although it could certainly be considered a text file.

The code to check would be as follows:

CString strFileName = "c:\\TestFile";

//CREATE A CFile OBJECT TO OPEN THE FILE
CFile* pFile;
TRY
{
      pFile = new CFile(
            strFileName,
            CFile::modeRead | CFile::shareDenyWrite );
}
CATCH( CFileException, e )
{
      //REPORT THE FAILURE, THEN STOP: NO CFile OBJECT WAS CREATED
      if( e->m_cause != CFileException::none )
      {
            MessageBox( "Cannot open this file." , "FILE ACCESS ERROR" , MB_OK );
      }
      return;
}
END_CATCH

//ALLOCATE MEMORY TO READ THE FILE
DWORD dwReadSize = (DWORD)pFile->GetLength();
HANDLE hFile = GlobalAlloc(GMEM_FIXED, dwReadSize);
if( !hFile )
{
      MessageBox("Unable to allocate RAM to read the file.", "MEMORY ALLOCATION ERROR", MB_OK);
      delete pFile;
      return;
}
BYTE* pbyFile = (BYTE*)hFile;

//READ THE FILE
UINT nBytesRead = pFile->Read(pbyFile, dwReadSize);
if( nBytesRead != dwReadSize )
{
      MessageBox("Cannot read this file.", "FILE READ ERROR", MB_OK);
      delete pFile;
      GlobalFree(hFile);
      return;
}

//TEST FOR A BINARY FILE
BOOL bFileIsText = TRUE;
for(DWORD i = 0; i<dwReadSize; i++)
{
      if(pbyFile[i] > 127)
      {
            bFileIsText = FALSE;
            break;
      }
}

//AT THIS POINT bFileIsText WILL TELL TYPE OF FILE

//CLEAN UP
delete pFile;
GlobalFree(hFile);

Hope that helps out.
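
Since the question asks about checking only a portion of the file, the same greater-than-127 test can be sketched with the standard C++ library instead of MFC, reading just the first maxBytes bytes rather than the whole file. The function names prefixHasHighBit, bytesHaveHighBit and filePrefixHasHighBit are made up for this sketch, and the choice to inspect the start of the file is an assumption, not something stated in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns true if any of the first `maxBytes` bytes of `in` is above 127.
// `in` should have been opened in binary mode.
bool prefixHasHighBit(std::istream& in, std::size_t maxBytes) {
    if (maxBytes == 0) return false;
    std::vector<unsigned char> buf(maxBytes);
    in.read(reinterpret_cast<char*>(buf.data()),
            static_cast<std::streamsize>(maxBytes));
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    assert(got <= buf.size());
    return std::any_of(buf.begin(),
                       buf.begin() + static_cast<std::ptrdiff_t>(got),
                       [](unsigned char b) { return b > 127; });
}

// Convenience wrapper for data that is already in memory.
bool bytesHaveHighBit(const std::string& bytes, std::size_t maxBytes) {
    std::istringstream in(bytes, std::ios::in | std::ios::binary);
    return prefixHasHighBit(in, maxBytes);
}

// Convenience wrapper for a file on disk; `opened` reports whether the
// file could be opened at all, so a missing file is not mistaken for text.
bool filePrefixHasHighBit(const std::string& path, std::size_t maxBytes,
                          bool& opened) {
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    opened = file.is_open();
    return opened && prefixHasHighBit(file, maxBytes);
}
```

As with the MFC version, a true result means the portion is definitely not 7-bit ASCII text, while a false result only means nothing above 127 was seen in the bytes examined.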

The text files WBerthin is referring to are pure ASCII English text files. The test does not apply to text files in other languages, which can legitimately contain byte values above 127.
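
For text that is not pure ASCII, one commonly used alternative heuristic (not the method given earlier in this thread) is to allow bytes above 127 and instead reject data that contains NUL bytes or other control characters that rarely appear in text. A minimal sketch follows. The name looksLikeText and the exact set of allowed control characters are assumptions, and the check will misclassify UTF-16 text, which contains NUL bytes.

```cpp
#include <cassert>
#include <cstddef>

// Heuristic: treat the buffer as text unless it contains a control
// character (value below 0x20) other than tab, line feed, carriage
// return or form feed. Bytes above 127 are accepted, so text in 8-bit
// code pages or UTF-8 is not rejected.
bool looksLikeText(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned char b = data[i];
        const bool allowedControl =
            (b == '\t' || b == '\n' || b == '\r' || b == '\f');
        if (b < 0x20 && !allowedControl) {
            return false;   // NUL or another unusual control byte
        }
    }
    return true;
}
```

Like the greater-than-127 test, this is only a guess about the content: binary data that happens to avoid control bytes will still pass as text.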
sjm

ASKER

Thanks for all the input but the points go to WBerthin.  He gave me the code to find the answer.  WBerthin please send another comment in so I can give you the points.

I am pleased if my code helped ...
BUT I think the points belong to chensu!
I just sent a comment on his locked question,
to help out if I could.
Maybe sometime chensu will help out one of my answers.
sjm:
If you would like to give the points to WBerthin, you should reopen the question and ask WBerthin to answer it. Otherwise, it is still locked.