  • Status: Solved
  • Priority: Medium
  • Security: Public

Checking binary file for a string

Hi guys, I am trying to read a binary file into a variable called buffer.
See the code below.
buffer is defined as char *,
but when I debug the code, buffer only seems to hold the first character from the file.
What I am trying to achieve is to read in the file, search it for a particular string, and when I find that string, read in the entry that is x characters after it.
I know 'seekg' can move to a given position in the file so I can read from there, but first I have to find the position of a particular string in the file.
The file is an SQL Server Profiler trace file.
Is there some other way I can do this? Is there any way I can search the file without reading it all in, or do I need to declare buffer in a different way?

Any ideas would be appreciated

Cheers


=====================================================
// readbinaryfile.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <iostream.h>
#include <fstream.h>
#include <string.h>

const char * filename = "C:\TenforeFeed.trc";
char * buffer;
long size;
char * pch;

int main(int argc, char* argv[])
{

ifstream file (filename, ios::in|ios::binary|ios::ate);
if(file.is_open())
{
size = file.tellg();
file.seekg (0, ios::beg);
buffer = new char [size];
file.read (buffer, size);


cout << buffer[3];

//delete[] buffer;
}

return 0;
}
========================================================
Asked by Barry Cunney

Axter commented:
You're missing a '\' character in your code.
It should be:
const char * filename = "C:\\TenforeFeed.trc";

Axter commented:
Whenever you have a backslash in a directory path inside a string literal, you need to add an extra backslash.
If you don't, the compiler will treat it as the start of an escape sequence (a special character).
For example, "\t" happens to be the escape sequence for a tab.
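To make Axter's point concrete, here is a small sketch (the variable names are just for illustration). In the original literal "C:\TenforeFeed.trc", \T is not a recognized escape sequence at all, so compilers typically warn and the result is implementation-specific. The raw string literal on the last line assumes a C++11 or later compiler; Visual C++ 6 does not support it.

```cpp
#include <cstring>
#include <string>

// Doubled backslash: the literal contains ONE '\' character (18 characters total).
const char* escapedPath = "C:\\TenforeFeed.trc";

// Single backslash before 't': "\t" becomes a TAB, so this is NOT the path C:\temp.
// The characters are 'C', ':', TAB, 'e', 'm', 'p'.
const char* tabPath = "C:\temp";

// C++11 raw string literal: backslashes are kept exactly as typed.
const char* rawPath = R"(C:\TenforeFeed.trc)";
```

Both escapedPath and rawPath hold the same 18 characters; tabPath is only 6 characters long because \t collapsed into a single tab.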

Barry Cunney (author) commented:
Hi Axter,
I have corrected the file path as you pointed out, but the buffer variable still does not seem to contain all the contents of the file:
buffer[3] = 0 '' (in the Watch window)

Am I taking the correct approach by first reading the binary file into a char* variable and then searching that variable, possibly using the 'substr' function, to find an instance of the string in buffer?
If this is the right approach, I need to figure out why buffer does not contain the total content of the binary file.

Or is there another way to approach searching a binary file for a string?


Cheers
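One alternative to a raw char* buffer, offered as a sketch rather than what the thread settled on: std::string stores its length separately, so it can hold the whole binary file including zero bytes, and std::string::find with an explicit length can search for byte patterns that contain '\0'. ReadWholeFile and FindBytes are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Read the whole file into a std::string. A std::string stores its own length,
// so zero bytes inside the data are kept (a char* viewed as text stops at the first one).
std::string ReadWholeFile(const char* path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();   // copies every byte; contents stays empty if the file didn't open
    return contents.str();
}

// Byte offset of the first match, or std::string::npos.
// The explicit length lets the pattern contain '\0' bytes.
std::size_t FindBytes(const std::string& data, const char* pattern, std::size_t patternLen)
{
    return data.find(pattern, 0, patternLen);
}
```

This still loads the whole file into memory, so the size concerns Axter raises in this thread apply to large trace files.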

Axter commented:
You could try the following example:

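#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
#include <cstdlib>
using namespace std;

const std::string filename = "C:\\TenforeFeed.trc";

int main()
{
    // Binary mode, and istreambuf_iterator rather than istream_iterator,
    // so that no bytes (whitespace or otherwise) are skipped or translated.
    ifstream     is (filename.c_str(), ios::in | ios::binary);
    vector<char> MyVecBuf((istreambuf_iterator<char>(is)),
                          istreambuf_iterator<char>());

    std::string SearchStr = "main";
    vector<char>::iterator pos_i = std::search(MyVecBuf.begin(), MyVecBuf.end(),
                                               SearchStr.begin(), SearchStr.end());

    if (pos_i != MyVecBuf.end())
    {
        // an iterator can't be printed directly; print its offset instead
        cout << "Found data at offset " << (pos_i - MyVecBuf.begin()) << endl;
    }

    system("pause");
    return 0;
}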


Axter commented:
If the file is small, I recommend you use the above approach.
If it's a big file, then you should use a non-STL method: read the file one byte at a time and compare what you've read to what you're looking for.
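A middle ground between loading everything and reading one byte per call, sketched here as an assumption rather than Axter's method: read fixed-size chunks and carry the last (keyword length - 1) bytes over to the next chunk, so a match that straddles a chunk boundary is not missed. FindInStream and the 64 KB default chunk size are hypothetical choices.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying it on in-memory data
#include <string>
#include <vector>

// Returns the byte offset of the first occurrence of keyword in the stream,
// or -1 if it is not found. Open files with std::ios::binary before calling.
// chunkSize must be at least 1.
long long FindInStream(std::istream& in, const std::string& keyword,
                       std::size_t chunkSize = 64 * 1024)
{
    if (keyword.empty()) return 0;
    std::vector<char> chunk(chunkSize);
    std::vector<char> window;    // bytes carried over + the current chunk
    long long windowStart = 0;   // stream offset of window[0]

    // read() fails on a short final chunk, so also check gcount()
    while (in.read(&chunk[0], static_cast<std::streamsize>(chunk.size())) || in.gcount() > 0)
    {
        window.insert(window.end(), chunk.begin(), chunk.begin() + in.gcount());
        std::vector<char>::iterator hit =
            std::search(window.begin(), window.end(), keyword.begin(), keyword.end());
        if (hit != window.end())
            return windowStart + (hit - window.begin());

        // Keep only the tail that could still be the start of a match.
        std::size_t keep = std::min(window.size(), keyword.size() - 1);
        windowStart += static_cast<long long>(window.size() - keep);
        window.erase(window.begin(), window.end() - static_cast<std::ptrdiff_t>(keep));
    }
    return -1;
}
```

Memory use stays around one chunk plus the keyword length, and each read() call moves many bytes at once instead of one.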

Barry Cunney (author) commented:
Yes, the file will vary in size depending on the trace output; it will certainly be 20MB or more.
I will try the idea of reading in a byte at a time.
Basically, this binary file is an SQL Server Profiler trace file (.trc), produced by a Profiler trace that monitors the work of a stored procedure 'I n i t V a l u e s _ P u t '.
I want to find the first instance of 'I n i t V a l u e s _ P u t' in the file and then read in the string that follows it, for example the 16 characters/bytes after it.

=======================================================
Snapshot from file
========================================================
d b o . I n i t V a l u e s _ P u t ; 1   N ' F O R E X ' ,   N ' U S D E S P ' ,   N ' S P O T ' ,   ' U S D E S P } 1 8 7 . 5 6 } 1 8 7 . 4 6 } '
 
   H        °                K   Pr                    
                   óD                              5                         Ò  
 - 7 €                            d b o . I n i t V a l u e s _ P u t ; 1   N ' F O R E X ' ,   N ' U S D G R D ' ,   N ' S P O T ' ,   ' U S D G R D } 3 8 4 . 1 2 } 3 8 3 . 9 } '
 
   H        °     
           K   Ø~                    
                   óD                              5                         Ò  
 - 7 €Ò  
 - 7 €           d b o . I n i t V a l u e s _ P u t ; 1   N
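For the end goal described above (find the first InitValues_Put and read what follows it), once you have the marker's byte offset from any of the searches in this thread, a minimal sketch is to seekg() past the marker and read a fixed number of bytes. ReadBytesAfter is a hypothetical name. Note that if the text is stored two bytes per character, as the gaps in the snapshot suggest, 16 characters means 32 bytes.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying it on in-memory data
#include <string>

// Seek to just past a marker whose byte offset is already known and read the
// next `count` bytes. Returns fewer bytes if the stream ends first.
std::string ReadBytesAfter(std::istream& in, std::streamoff markerOffset,
                           std::size_t markerLen, std::size_t count)
{
    in.clear();   // a previous search may have left failbit/eofbit set; seekg does nothing while failbit is set
    in.seekg(markerOffset + static_cast<std::streamoff>(markerLen), std::ios::beg);
    std::string result(count, '\0');
    in.read(&result[0], static_cast<std::streamsize>(count));
    result.resize(static_cast<std::size_t>(in.gcount()));   // keep only the bytes actually read
    return result;
}
```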

Axter commented:
Here's some example code for the one byte read method:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
#include <cstdlib>
using namespace std;

const std::string filename = "C:\\TenforeFeed.trc";

bool ReadUntilKeyIsFound(std::ifstream &OpenedFile, const std::string &KeyWord)
{
     // window of KeyWord.size() bytes, plus one zero byte so it reads as a C string
     vector<char> Buff(KeyWord.size()+1, 0);
     OpenedFile.read(&Buff[0], KeyWord.size());
     while(!OpenedFile.eof())
     {
          if (KeyWord == &Buff[0]) return true;
          // slide the window left by one byte, then read the next byte into its last slot
          std::copy(Buff.begin()+1, Buff.end(), Buff.begin());
          OpenedFile.read(&Buff[KeyWord.size()-1],1);
     }
     return false;
}

int main()
{
     ifstream is (filename.c_str(), ios::in | ios::binary);
     std::string SearchStr = "main";
     if (ReadUntilKeyIsFound(is,SearchStr))
     {
          cout << "Found Keyword " << SearchStr << endl;
     }
     else
     {
          cout << "Didn't find " << SearchStr << endl;
     }
     system("pause");
     return 0;
}

Barry Cunney (author) commented:
Axter,
Thanks for the example code for the read method.
I will give this a shot and see how I get on.
I'm coming to the end of my day here in Dublin, Ireland, so it may be tomorrow before I can fully try out the example code.
I will respond then.


Cheers


cookre commented:
Uh, UNICODE, perchance?



DanRollins commented:
BCUNNY,
The reason you see the string as being short is that there is a binary 0 in it. The debugger shows a char* only up to the first zero byte (the standard string terminator). If you look at the variable via the Memory window, you will see all of the data.
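Dan's point is easy to reproduce. In this small sketch (the array is made up, modeled on the "d b o" bytes in the snapshot above), anything that treats the buffer as a C string, like strlen() or the debugger's char* display, stops at the first zero byte, while the buffer itself still holds every byte.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Made-up sample: "dbo" stored two bytes per character, as in the trace snapshot.
const char sample[] = { 'd', 0, 'b', 0, 'o', 0 };

// Treated as a C string, the data "ends" at the first zero byte (length 1).
std::size_t CStringLength() { return std::strlen(sample); }

// Keeping the length explicitly preserves all 6 bytes.
std::string AllBytes() { return std::string(sample, sizeof(sample)); }
```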

Axter's code seems a little complicated.  I'd go with a simple memcmp... something like this:

// #include "stdafx.h"   // needed only in a VC++ project that uses precompiled headers
#include <fstream>
#include <iostream>
#include <cstring>
using namespace std;

const char*    pszFilename= "C:\\temp\\binaryFile.bin"; // put your filename here
unsigned char* pBuffer= 0;
long           nFileLen= 0;
unsigned char  abDataToFind[]= { 0, 0x2e, 't', 'e', 'x' }; // put your search bytes here
int            nLenDataToFind= sizeof(abDataToFind);

int main()
{
     ifstream file ( pszFilename, ios::in|ios::binary|ios::ate);
     if( !file.is_open() ) {
          cout << "Could not open " << pszFilename << endl;
          return 1;
     }
     nFileLen= (long)file.tellg();
     file.seekg (0, ios::beg);
     pBuffer= new unsigned char [nFileLen];
     file.read ( (char*)pBuffer, nFileLen );   // read() takes a char*, so cast
     //----------- file is in memory now

     unsigned char* p= pBuffer;
     // <= so that a match ending on the very last byte is also checked
     for (long j=0; j <= nFileLen-nLenDataToFind; j++ ) {
          if ( memcmp( p, abDataToFind, nLenDataToFind ) == 0 ) {
               cout << "Found it at offset: " << j << endl;
               break;
          }
          p++;
     }

     delete[] pBuffer;
     return 0;
}

-- Dan

Barry Cunney (author) commented:
Hi Dan
Thanks for the code. It is simpler to follow than Axter's (for someone just learning C++); no offence, Axter.

However, Dan, I still have a problem.
It works fine if I just put one search byte in the
'abDataToFind[]' array like below
unsigned char abDataToFind[]=  {'V'};

but if I do something like
unsigned char abDataToFind[]=  {'V','a','l','u'};

It doesn't return an offset

I know these bytes exist in the file(see snapshot above)

I wonder whether I need to set up abDataToFind[] in a different way.

As a test I ran the program for each character above, just putting one byte in abDataToFind[] each time
and the following offsets were returned
V 3916
a 3918
l 3920
u 3922

This is exactly as one would expect, so I can't understand why, when I put these 4 bytes in abDataToFind[] together, memcmp never returns 0 and so no offset is returned.

I am fully happy that the whole file is being read into the buffer; I set up a test loop to send the contents of pBuffer to the console.

Cheers





Barry Cunney (author) commented:
Dan
Is p only ever pointing to one byte at a time?
Is this the problem?

Axter commented:
BCUNNEY,
>>I am fully happy that the whole file is being read into
>>the buffer

I don't think this is a smart move if the file can be as big as you stated previously.

If the file is 20MB or more, you're going to be wasting a lot of system resources, and your code will be much slower.

Axter commented:
>>Is p only ever pointing to one byte at a time?
In effect, p is pointing to more than one byte: memcmp compares nLenDataToFind bytes starting at p, so each call compares a whole string, not a single byte.
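To see why the one-byte search works and the four-byte one doesn't, here is a small sketch with made-up data laid out the way the snapshot suggests (each character followed by a zero byte). memcmp compares n consecutive bytes starting at p, so {'V','a','l','u'} never lines up with 'V',0,'a',0,... even though each letter on its own is found. MatchesAt is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstring>

// Made-up bytes for "Valu" stored two bytes per character.
const unsigned char fileBytes[] = { 'V', 0, 'a', 0, 'l', 0, 'u', 0 };

// True if the n bytes starting at p equal the n bytes of pattern.
bool MatchesAt(const unsigned char* p, const unsigned char* pattern, std::size_t n)
{
    return std::memcmp(p, pattern, n) == 0;
}
```

The same layout also explains why the single-letter offsets reported above (3916, 3918, 3920, 3922) are two bytes apart.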

Axter commented:
BCUNNEY,
You should avoid hitting the refresh button when using EE.
That's what is causing your double posting.
If you click on the RELOAD link in the top right corner of this page, you can avoid double posting.

Barry Cunney (author) commented:
Axter
Re: Hitting Refresh
Will avoid hitting Refresh in future - did not realise this had effect of double posting

Cheers

BCunney

Axter commented:
>>Will avoid hitting Refresh in future - did not realise
>>this had effect of double posting
This is a common problem.
I don't understand why EE hasn't changed their web site code so it can detect double posting.
If you use CodeGuru, it can detect double posting.

Barry Cunney (author) commented:
Axter,
I didn't mean to totally ignore your example code above.
I am just learning C++ and Dan Rollins code looked more familiar to what I have been learning.
Vectors mean nothing to me, not yet.
Also, when I try to compile your sample code, I get the following errors, which relate to the line

const std::string filename = "C:\\TenforeFeed.trc";

What include files do I need, or are there any other settings I need?
I am using MS Visual C++ Standard Edition.

D:\C++\testAxeterscode\testAxeterscode.cpp(7) : error C2653: 'std' : is not a class or namespace name
D:\C++\testAxeterscode\testAxeterscode.cpp(7) : error C2146: syntax error : missing ';' before identifier 'filename'
D:\C++\testAxeterscode\testAxeterscode.cpp(7) : error C2734: 'string' : const object must be initialized if not extern
D:\C++\testAxeterscode\testAxeterscode.cpp(7) : fatal error C1004: unexpected end of file found

Cheers

BCunney

Axter commented:
>>I am just learning C++ and Dan Rollins code looked more
>>familiar to what I have been learning.
Don't worry, I'm not taking offense at you using Dan's method.
I just think you should use a method that is not going to read the entire contents into memory.
I figured that since you were using fstream, you were familiar with STL and preferred the STL method.

Here's a complete example with all the required header files.
// #include "stdafx.h"   // needed only in a VC++ project that uses precompiled headers

#pragma warning (disable:4786)   // VC6: silences "identifier was truncated" warnings from STL templates
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>   // std::copy
#include <cstdlib>     // system

using namespace std;

const std::string filename = "C:\\TenforeFeed.trc";

bool ReadUntilKeyIsFound(std::ifstream &OpenedFile, const std::string &KeyWord, int &PositionFound)
{
     int Position = (int)KeyWord.size();          // number of bytes read so far
     vector<char> Buff(KeyWord.size()+1, 0);      // sliding window + null terminator
     OpenedFile.read(&Buff[0], KeyWord.size());
     while(!OpenedFile.eof())
     {
          if (KeyWord == &Buff[0])
          {
               PositionFound = Position - (int)KeyWord.size();   // offset of the first matching byte
               return true;
          }
          std::copy(Buff.begin()+1, Buff.end(), Buff.begin());
          OpenedFile.read(&Buff[KeyWord.size()-1],1);
          ++Position;
     }
     return false;
}

int main()
{
     ifstream is (filename.c_str(), ios::in | ios::binary);
     std::string SearchStr = "main";
     int PosFound = 0;
     if (ReadUntilKeyIsFound(is,SearchStr,PosFound))
     {
          cout << "Found Keyword at position " << PosFound << endl;
     }
     else
     {
          cout << "Didn't find " << SearchStr << endl;
     }
     system("pause");
     return 0;
}

cookre commented:
The sample data you posted shows blanks between the characters. Those may very well have been binary zeros, i.e., the data you're scanning looks to be in UNICODE: two bytes per character, with the second byte (the most significant in the little-endian Intel world) being zero for plain ASCII characters.

E.g., "dbo.init" would appear in memory as 'd',0,'b',0,'o',0,'.',0,'i',0,...

That's also why your individual character offsets were two bytes apart.

If you change your abDataToFind initialization to:
'V','\0','a','\0','l','\0','u','\0'

the single-byte-oriented memcmp will find it.
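Rather than typing the zero bytes by hand, the pattern can be generated. A minimal sketch, assuming the file is UTF-16 little-endian (what Windows calls Unicode) and the keyword is plain ASCII; AsciiToUtf16LE is a hypothetical helper name.

```cpp
#include <string>
#include <vector>

// Turn an ASCII keyword into the byte pattern it has in a UTF-16LE file:
// low byte first, then a zero high byte. Only valid for characters 0-127;
// anything else needs a real encoding conversion.
std::vector<unsigned char> AsciiToUtf16LE(const std::string& ascii)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(ascii.size() * 2);
    for (std::string::size_type i = 0; i < ascii.size(); ++i)
    {
        bytes.push_back(static_cast<unsigned char>(ascii[i]));   // low byte
        bytes.push_back(0);                                       // high byte
    }
    return bytes;
}
```

The result can be dropped into Dan's loop as memcmp(p, &pattern[0], pattern.size()), for example with pattern = AsciiToUtf16LE("InitValues_Put").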


Axter commented:
Here's a non STL version of the code.

// #include "stdafx.h"   // needed only in a VC++ project that uses precompiled headers

#include <fstream>  //Still using ifstream
#include <cstring>  // memcmp, memmove, strlen
#include <cstdio>   // printf
#include <cstdlib>  // system
using namespace std;


bool ReadUntilKeyIsFound(std::ifstream &OpenedFile, const unsigned char*  KeyWord, int SizeOfKeyWord, int &PositionFound)
{
     int Position = SizeOfKeyWord;                      // number of bytes read so far
     unsigned char *Buff = new unsigned char[SizeOfKeyWord];
     OpenedFile.read((char*)Buff, SizeOfKeyWord);
     while(!OpenedFile.eof())
     {
          if (!memcmp(KeyWord, Buff, SizeOfKeyWord))
          {
               delete []Buff;
               PositionFound = Position - SizeOfKeyWord;
               return true;
          }
          memmove(Buff, Buff + 1, SizeOfKeyWord-1);     // slide the window left by one byte
          OpenedFile.read((char*)Buff+SizeOfKeyWord-1,1);
          ++Position;
     }
     delete []Buff;
     return false;
}

const char* filename = "C:\\TenforeFeed.trc";

int main()
{
     ifstream  is (filename, ios::in | ios::binary);
     const char* SearchStr = "include";
     int PosFound = 0;
     if (ReadUntilKeyIsFound(is,(const unsigned char *)SearchStr, (int)strlen(SearchStr),PosFound))
     {
          printf("Found Keyword at position %i\n", PosFound);
     }
     else
     {
          printf("Didn't find %s\n", SearchStr);
     }
     
     system("pause");
     return 0;
}

Barry Cunney (author) commented:
Well done, Cookre.
I had just started to delve into the possibility of UNICODE; I was not ignoring your first comment.

It's just that I had previously read in a similar file (i.e. an SQL Server Profiler trace file, .trc) as binary in Visual Basic and was able to use the VB binary string compare functions to achieve what I wanted.
I just presumed I should be able to do something similar in C++.
I had just proved that the file was the variable factor.
Just before you posted your second comment, I had tested Dan Rollins's and Axter's suggested code with a text file that I created manually, and both examples worked perfectly.
I changed the abDataToFind initialization to:
'V','\0','a','\0','l','\0','u','\0' as you suggested, and it works perfectly.
I possibly should have copped on to this myself.

Cheers

BCunney

Barry Cunney (author) commented:
DanRollins, Axter, Cookre,
I think all of you made very good contributions, so here is the deal.

I am going to accept DanRollins's solution, but I am subsequently going to post dummy questions
Points for Axter
Points for Cookre

(50 points each)
to award points to you, Axter and Cookre, as well as Dan Rollins.
Let me know if you are happy with this deal

Axter commented:
>>Let me know if you are happy with this deal
Sounds good to me.

Was your file a UNICODE file?

Barry Cunney (author) commented:
Axter,
Yes, it was a UNICODE file,
but I had read in a similar file (Profiler trace .trc) before in VB as binary (Open as Binary)
and was stuck in that tunnel of thought.
I should have paid more attention to the type of file and its content. I wasn't fully clued in to the fact that it was UNICODE.



Barry Cunney (author) commented:
Thanks a million Dan