Solved

Checking binary file for a string

Posted on 2002-04-23
Last Modified: 2013-11-15
Hi guys, I am trying to read a binary file into a variable called buffer.
See code below.
buffer is defined as char *
but when I debug the code, buffer only seems to hold the first character from the file.
What I am trying to achieve is to read in the file, search it for a particular string and, when I find this string, read in the entry in the file which is x characters after the first found string.
I know 'seekg' can position the read at a given offset, but first I have to find the position of a particular string in the file.
The file is an SQL Server Profiler trace file.
Is there some other way I can do this? Is there any way I can search the file without reading it in, or do I need to declare buffer in a different way?

Any ideas would be appreciated

Cheers


=====================================================
// readbinaryfile.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <iostream.h>
#include <fstream.h>
#include <string.h>

const char * filename = "C:\TenforeFeed.trc";
char * buffer;
long size;
char * pch;

int main(int argc, char* argv[])
{

ifstream file (filename, ios::in|ios::binary|ios::ate);
if(file.is_open())
{
size = file.tellg();
file.seekg (0, ios::beg);
buffer = new char [size];
file.read (buffer, size);


cout << buffer[3];

//delete[] buffer;
}

return 0;
}
========================================================
Question by:Barry Cunney
28 Comments
 
LVL 30

Expert Comment

by:Axter
ID: 6963435
You're missing a '\' character in your code.
It should be the following:
const char * filename = "C:\\TenforeFeed.trc";
0
 
LVL 30

Expert Comment

by:Axter
ID: 6963440
Whenever you have a backslash for a directory path in a string literal, you need to add an extra backslash.
If you don't, the compiler will think you're adding a special character (an escape sequence).
For example "\t" happens to be the special character for tab.
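A small sketch of the escaping rule above. It assumes a C++11 or later compiler for the raw string literal (newer than the compilers discussed in this thread), and the names escapedPath, rawPath, tabbed and checkPathLiterals are made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Every backslash in an ordinary string literal has to be doubled.
const char* escapedPath = "C:\\TenforeFeed.trc";

// C++11 raw string literal: backslashes are taken literally.
const char* rawPath = R"(C:\TenforeFeed.trc)";

// An unescaped "\t" is accepted silently and becomes a single tab character.
const char* tabbed = "C:\temp";

// Self-check: both spellings give the same path, and the tab version
// really is one character shorter than it looks in the source.
void checkPathLiterals()
{
    assert(std::strcmp(escapedPath, rawPath) == 0);
    assert(std::string(tabbed).size() == 6);   // 'C', ':', TAB, 'e', 'm', 'p'
}
```

Calling checkPathLiterals() should pass silently; the point is only that the doubled backslash and the raw literal describe the same file name.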
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6963473
Hi Axter,
I have corrected the file path as you pointed out, but the buffer variable still does not seem to contain the full contents of the file.
buffer[3] = 0 '' (in Watch window)

Am I taking the correct approach by first reading the binary file into a char* variable and then searching this variable, possibly using the 'substr' function, to find an instance of the string in buffer?
If this is the right approach I need to figure out why buffer does not contain the total content of the binary file.

Or is there another way to approach searching a binary file for a string?


Cheers
0
 
LVL 30

Expert Comment

by:Axter
ID: 6963671
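You could try the following example:

#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

using namespace std;

const std::string filename = "C:\\TenforeFeed.trc";

int main()
{
    // Open in binary mode and use istreambuf_iterator so that no bytes
    // (whitespace, zeros) are skipped or translated while reading.
    ifstream                  is (filename.c_str(), ios::in | ios::binary);
    istreambuf_iterator<char> ii (is);
    vector<char>              MyVecBuf;
    copy (ii, istreambuf_iterator<char> (), back_inserter (MyVecBuf));

    std::string SearchStr = "main";
    vector<char>::iterator pos_i = std::search(MyVecBuf.begin(), MyVecBuf.end(), SearchStr.begin(), SearchStr.end());

    if (pos_i != MyVecBuf.end())
    {
        // An iterator cannot be streamed directly; print its offset instead.
        cout << "Found data at offset " << (pos_i - MyVecBuf.begin()) << endl;
    }

    system("pause");
    return 0;
}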

0
 
LVL 30

Expert Comment

by:Axter
ID: 6963684
If the file is small, I recommend you use the above approach.
If it's a big file, then you should avoid loading it all into memory: read the file one byte at a time and compare what you've read to what you're looking for.
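A middle ground between the two suggestions above is to read the file in fixed-size chunks and carry a small overlap between them. The following is only a sketch under assumptions: C++11, and the names findInStream, findInBytes and the 64 KB default chunk size are invented for this example, not taken from the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns the byte offset of the first occurrence of needle, or -1 if absent.
long long findInStream(std::istream& in, const std::string& needle,
                       std::size_t chunkSize = 64 * 1024)
{
    assert(!needle.empty() && chunkSize > 0);
    std::vector<char> window;           // carried-over tail + current chunk
    std::vector<char> chunk(chunkSize);
    long long windowStart = 0;          // file offset of window[0]

    for (;;) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;            // nothing more to read
        window.insert(window.end(), chunk.begin(), chunk.begin() + got);

        std::vector<char>::iterator hit =
            std::search(window.begin(), window.end(), needle.begin(), needle.end());
        if (hit != window.end())
            return windowStart + (hit - window.begin());

        // Keep the last needle.size()-1 bytes so that a match straddling
        // two chunks is still seen on the next pass.
        const std::size_t keep = std::min<std::size_t>(window.size(), needle.size() - 1);
        windowStart += static_cast<long long>(window.size() - keep);
        window.erase(window.begin(), window.end() - static_cast<std::ptrdiff_t>(keep));
    }
    return -1;
}

// Convenience wrapper for trying the search on in-memory bytes.
long long findInBytes(const std::string& bytes, const std::string& needle,
                      std::size_t chunkSize)
{
    std::istringstream in(bytes);
    return findInStream(in, needle, chunkSize);
}
```

With a real file you would pass a std::ifstream opened with std::ios::binary. Memory use stays around one chunk plus the length of the search string, whatever the file size.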
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6963713
Yes, the file will vary in size depending on trace output; it will certainly be 20MB or more.
I will try the idea of reading in a byte at a time.
Basically this binary file is an SQL Server Profiler trace file (.trc) which is produced from a Profiler trace that monitors the work of a Stored Procedure 'I n i t V a l u e s _ P u t '
I want to find the first instance of 'I n i t V a l u e s _ P u t' in the file and then read in the string that is, for example, 16 characters/bytes after this.
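Once the offset of the match is known, the second half of that goal (reading the bytes that follow it) could look roughly like this. This is a hedged sketch: readAfterMatch, readAfterMatchInBytes and their parameters are hypothetical names, and the exact meaning of "16 characters/bytes after" is left as the gap parameter.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads up to `count` bytes that start `gap` bytes after the end of a match.
// matchOffset is the offset of the first byte of the match, keyLen its length.
std::string readAfterMatch(std::istream& in, std::streamoff matchOffset,
                           std::size_t keyLen, std::size_t gap, std::size_t count)
{
    assert(matchOffset >= 0);
    if (count == 0) return std::string();
    in.clear();   // a previous search may have left eofbit/failbit set
    in.seekg(matchOffset + static_cast<std::streamoff>(keyLen + gap), std::ios::beg);
    std::string out(count, '\0');
    in.read(&out[0], static_cast<std::streamsize>(count));
    // gcount() is the number of bytes actually read (fewer near end of file).
    out.resize(static_cast<std::size_t>(in.gcount()));
    return out;
}

// Convenience wrapper for trying the function on in-memory bytes.
std::string readAfterMatchInBytes(const std::string& bytes, std::streamoff matchOffset,
                                  std::size_t keyLen, std::size_t gap, std::size_t count)
{
    std::istringstream in(bytes);
    return readAfterMatch(in, matchOffset, keyLen, gap, count);
}
```

The returned string can hold zero bytes, so its size() rather than strlen should be used to measure it.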

=======================================================
Snapshot from file
========================================================
d b o . I n i t V a l u e s _ P u t ; 1   N ' F O R E X ' ,   N ' U S D E S P ' ,   N ' S P O T ' ,   ' U S D E S P } 1 8 7 . 5 6 } 1 8 7 . 4 6 } '
 
   H        °                K   Pr                    
                   óD                              5                         Ò  
 - 7 €                            d b o . I n i t V a l u e s _ P u t ; 1   N ' F O R E X ' ,   N ' U S D G R D ' ,   N ' S P O T ' ,   ' U S D G R D } 3 8 4 . 1 2 } 3 8 3 . 9 } '
 
   H        °     
           K   Ø~                    
                   óD                              5                         Ò  
 - 7 €Ò  
 - 7 €           d b o . I n i t V a l u e s _ P u t ; 1   N
0
 
LVL 30

Expert Comment

by:Axter
ID: 6963748
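Here's some example code for the one byte read method:

#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

const std::string filename = "C:\\TenforeFeed.trc";

// Slides a KeyWord-sized window over the file, one byte at a time.
bool ReadUntilKeyIsFound(std::ifstream &OpenedFile, const std::string &KeyWord)
{
     if (KeyWord.empty()) return false;
     vector<char> Buff(KeyWord.size(), 0);
     // Testing read() itself (rather than eof()) also stops on a short or unopened file
     if (!OpenedFile.read(&Buff[0], KeyWord.size())) return false;
     for (;;)
     {
          // Compare every byte of the window; a C-string compare would stop at the first zero byte
          if (std::equal(KeyWord.begin(), KeyWord.end(), Buff.begin())) return true;
          std::copy(Buff.begin()+1, Buff.end(), Buff.begin());
          if (!OpenedFile.read(&Buff[KeyWord.size()-1],1)) return false;   // end of file
     }
}

int main()
{
     ifstream is (filename.c_str(), ios::in | ios::binary);
     std::string SearchStr = "main";
     if (ReadUntilKeyIsFound(is,SearchStr))
     {
          cout << "Found Keyword " << SearchStr << endl;
     }
     else
     {
          cout << "Didn't find " << SearchStr << endl;
     }
     system("pause");
     return 0;
}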
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6963771
Axter,
Thanks for the example code for the read method.
I will give this a shot and see how I get on.
Coming to the end of my day here in Dublin Ireland so it may be tomorrow before I can fully try out example code.
Will respond then


Cheers

0
 
LVL 22

Expert Comment

by:cookre
ID: 6964483
Uh, UNICODE, perchance?


0
 
LVL 49

Accepted Solution

by:
DanRollins earned 50 total points
ID: 6965106
BCUNNEY,
The reason you see the string as being short is because there is a binary 0 in it.  The debugger shows only the text up to a standard string terminator.  If you look at the variable via the Memory window, you will see all of the data.
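To make the point about the binary 0 concrete, here is a tiny sketch (the array and function names are invented for illustration). Anything that treats the buffer as a C string, including a debugger's char* display, stops at the first zero byte even though more data follows it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Five bytes as they might come from a binary file: zeros between the letters.
const char sample[] = { 'd', 0, 'b', 0, 'o' };

// Length as a C-string view would see it: everything up to the first zero byte.
std::size_t lengthAsCString()
{
    assert(sample[1] == 0);   // guarantees strlen stops inside the array
    return std::strlen(sample);
}

// Length actually held in the buffer.
std::size_t lengthInBytes()
{
    return sizeof(sample);
}
```

Here lengthAsCString() gives 1 while lengthInBytes() gives 5, which mirrors a Watch window showing only the first character of a buffer that is in fact fully populated.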

Axter's code seems a little complicated.  I'd go with a simple memcmp... something like this:

#include "stdafx.h"   // Visual C++ precompiled header; remove on other compilers
#include <cstring>
#include <fstream>
#include <iostream>

using namespace std;

const char*    pszFilename= "C:\\temp\\binaryFile.bin"; // put your filename here
unsigned char* pBuffer= 0;
long           nFileLen= 0;
unsigned char  abDataToFind[]= { 0, 0x2e, 't', 'e', 'x' }; // put your search bytes here
int            nLenDataToFind= sizeof(abDataToFind);

int main()
{
     ifstream file ( pszFilename, ios::in|ios::binary|ios::ate);
     if( !file.is_open() ) {
          cout << "Could not open " << pszFilename << endl;
          return 1;
     }
     nFileLen= (long)file.tellg();   // opened with ios::ate, so this is the file size
     file.seekg (0, ios::beg);
     pBuffer= new unsigned char [nFileLen];
     file.read ( (char*)pBuffer, nFileLen );
     //----------- file is in memory now

     // memcmp compares nLenDataToFind bytes starting at p, so p steps through
     // every offset where the whole pattern still fits (hence <=, not <).
     unsigned char* p= pBuffer;
     for (long j=0; j<= nFileLen-nLenDataToFind; j++ ) {
          if ( memcmp( p, abDataToFind, nLenDataToFind ) == 0 ) {
               cout << "Found it at offset: " << j << endl;
               break;
          }
          p++;
     }

     delete[] pBuffer;
     return 0;
}

-- Dan
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6965463
Hi Dan
Thanks for the code. It is simpler to follow than Axter's (for someone just learning C++), no offence Axter.

However Dan I still have a problem.
It works fine if I just put one search byte in the
'abDataToFind[]' array like below
unsigned char abDataToFind[]=  {'V'};

but if I do something like
unsigned char abDataToFind[]=  {'V','a','l','u'};

It doesn't return an offset

I know these bytes exist in the file(see snapshot above)

I wonder do I need to set up abDataToFind[]  in a different way.

As a test I ran the program for each character above, just putting one byte in abDataToFind[] each time
and the following offsets were returned
V 3916
a 3918
l 3920
u 3922

This is exactly as one would expect, so I can't understand why, when I put these 4 bytes in abDataToFind[] together, memcmp does not return 0 at some stage and in turn give an offset.

I am fully happy that the whole file is being read into the buffer, I set up a test loop to send contents of pBuffer to console.

Cheers




0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6965592
Dan
Is p only ever pointing to one byte at a time?
Is this the problem?
0
 
LVL 30

Expert Comment

by:Axter
ID: 6966048
BCUNNEY,
>>I am fully happy that the whole file is being read into
>>the buffer

I don't think this is a smart move if the file can be as big as you stated previously.

If a file is 20MB or more, you're going to be wasting a lot of system resources, and your code will be much slower.
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6966056
Dan
Is p only ever pointing to one byte at a time?
Is this the problem?
0

 
LVL 30

Expert Comment

by:Axter
ID: 6966073
>>Is p only ever pointing to one byte at a time?
p is pointing to more than one byte.  It's pointing to the start of a run of bytes in the buffer, and memcmp compares nLenDataToFind of them from there.
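A short sketch of what that means in practice. The helper names matchesAt and firstMatch are made up for this illustration; only memcmp itself is the standard library call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// memcmp inspects n consecutive bytes starting at p, not just the byte *p.
bool matchesAt(const unsigned char* p, const unsigned char* pattern, std::size_t n)
{
    assert(p != 0 && pattern != 0);
    return std::memcmp(p, pattern, n) == 0;
}

// Index of the first position in buf[0..len) where pattern matches, or -1.
long firstMatch(const unsigned char* buf, std::size_t len,
                const unsigned char* pattern, std::size_t n)
{
    if (n == 0 || n > len) return -1;
    const unsigned char* p = buf;
    // j + n <= len keeps the last comparison inside the buffer
    // while still testing a match at the very end.
    for (std::size_t j = 0; j + n <= len; ++j, ++p)
        if (matchesAt(p, pattern, n)) return static_cast<long>(j);
    return -1;
}
```

So a four-byte pattern only matches where all four bytes sit next to each other in the buffer; if anything lies between them, each byte can still be found on its own while the combined pattern is not.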
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6966076
Dan
Is p only ever pointing to one byte at a time?
Is this the problem?
0
 
LVL 30

Expert Comment

by:Axter
ID: 6966089
BCUNNEY,
You should avoid hitting the refresh button when using EE.
That's what is causing your double posting.
If you click on the RELOAD link in the top right corner of this page, you can avoid double posting.
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6966109
Axter
Re: Hitting Refresh
Will avoid hitting Refresh in future. I did not realise this had the effect of double posting.

Cheers

BCunney
0
 
LVL 30

Expert Comment

by:Axter
ID: 6966125
>>Will avoid hitting Refresh in future - did not realise >>this had effect of double posting
This is a common problem.
I don't understand why EE hasn't changed their web site code so it can detect double posting.
If you use codeguru, it can detect double posting.
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6966201
Axter,
I didn't mean to totally ignore your example code above.
I am just learning C++ and Dan Rollins' code looked more familiar to what I have been learning.
Vector means nothing to me, not yet.
Also, when I try to compile your sample code I get the following errors, which relate to the line

const std::string filename = "C:\\TenforeFeed.trc";

What include files do I need, or are there any other settings I need?
I am using MS Visual C++ Standard Edition.

D:\C++\testAxeterscode\testAxeterscode.cpp(7) : error C2653: 'std' : is not a class or namespace name
D:\C++\testAxeterscode\testAxeterscode.cpp(7) : error C2146: syntax error : missing ';' before identifier 'filename'
D:\C++\testAxeterscode\testAxeterscode.cpp(7) : error C2734: 'string' : const object must be initialized if not extern
D:\C++\testAxeterscode\testAxeterscode.cpp(7) : fatal error C1004: unexpected end of file found

Cheers


BCunney






0
 
LVL 30

Expert Comment

by:Axter
ID: 6966488
>>I am just learning C++ and Dan Rollins code looked more
>>familiar to what I have been learning.
Don't worry, I'm not taking offense at you using Dan's method.
I just think you should use a method that is not going to read the entire contents into memory.
I figured since you were using fstream, that you were familiar with STL, and preferred the STL method.

Here's a complete example with all the required header files.
#include "stdafx.h"

#pragma warning (disable:4786)
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

using namespace std;

const std::string filename = "C:\\TenforeFeed.trc";

bool ReadUntilKeyIsFound(std::ifstream &OpenedFile, const std::string &KeyWord, int &PositionFound)
{
     if (KeyWord.empty()) return false;
     int Position = KeyWord.size();
     vector<char> Buff(KeyWord.size(), 0);
     // Testing read() itself (rather than eof()) also stops on a short or unopened file
     if (!OpenedFile.read(&Buff[0], KeyWord.size())) return false;
     for (;;)
     {
          // Compare every byte of the window; a C-string compare would stop at the first zero byte
          if (std::equal(KeyWord.begin(), KeyWord.end(), Buff.begin()))
          {
               PositionFound = Position - KeyWord.size();
               return true;
          }
          std::copy(Buff.begin()+1, Buff.end(), Buff.begin());
          if (!OpenedFile.read(&Buff[KeyWord.size()-1],1)) return false;   // end of file
          ++Position;
     }
}

int main()
{
     ifstream is (filename.c_str(), ios::in | ios::binary);
     std::string SearchStr = "main";
     int PosFound = 0;
     if (ReadUntilKeyIsFound(is,SearchStr,PosFound))
     {
          cout << "Found Keyword at position " << PosFound << endl;
     }
     else
     {
          cout << "Didn't find " << SearchStr << endl;
     }
     system("pause");
     return 0;
}
0
 
LVL 22

Expert Comment

by:cookre
ID: 6966549
The sample data you posted shows blanks between the characters.  Those may very well have been binary zeros, i.e., the data you're scanning looks to be in UNICODE: two bytes per character with the second byte (the most significant in the little-endian Intel world) being zero.

E.g., "dbo.init" would appear in memory as 'd',0,'b',0,'o',0,'.',0,'i',0,...

That's also why your individual character offsets were two bytes apart.

If you change your abDataToFind initialization to:
'V','\0','a','\0','l','\0','u','\0'

the single byte oriented memcmp will find it.
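Rather than typing the zero bytes by hand, the pattern can be generated from an ordinary string. This is a sketch that assumes the text in the file is plain ASCII stored as little-endian two-byte characters, as described above; toUtf16LeBytes and findBytes are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Expands an ASCII string to two-byte little-endian characters:
// each character is followed by a zero byte.
std::vector<unsigned char> toUtf16LeBytes(const std::string& ascii)
{
    std::vector<unsigned char> out;
    out.reserve(ascii.size() * 2);
    for (std::size_t i = 0; i < ascii.size(); ++i) {
        assert(static_cast<unsigned char>(ascii[i]) < 0x80);   // ASCII only
        out.push_back(static_cast<unsigned char>(ascii[i]));
        out.push_back(0);
    }
    return out;
}

// Byte offset of pattern inside data, or -1 if it does not occur.
long findBytes(const std::vector<unsigned char>& data,
               const std::vector<unsigned char>& pattern)
{
    if (pattern.empty()) return -1;
    std::vector<unsigned char>::const_iterator hit =
        std::search(data.begin(), data.end(), pattern.begin(), pattern.end());
    return hit == data.end() ? -1 : static_cast<long>(hit - data.begin());
}
```

Searching a buffer for toUtf16LeBytes("Valu") looks for the same eight bytes as the hand-written initializer, and the offset that comes back is a byte offset, so it is twice the character position.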

0
 
LVL 30

Expert Comment

by:Axter
ID: 6966550
Here's a non STL version of the code.

#include "stdafx.h"

#include <cstdio>   // printf
#include <cstdlib>  // system
#include <cstring>  // memcmp, memmove, strlen
#include <fstream>  //Still using ifstream
using namespace std;


bool ReadUntilKeyIsFound(std::ifstream &OpenedFile, const unsigned char*  KeyWord, int SizeOfKeyWord, int &PositionFound)
{
     if (SizeOfKeyWord <= 0) return false;
     int Position = SizeOfKeyWord;
     unsigned char *Buff = new unsigned char[SizeOfKeyWord];
     bool Found = false;
     // Check each read() directly instead of polling eof()
     if (OpenedFile.read((char*)Buff, SizeOfKeyWord))
     {
          for (;;)
          {
               if (!memcmp(KeyWord, Buff, SizeOfKeyWord))
               {
                    PositionFound = Position - SizeOfKeyWord;
                    Found = true;
                    break;
               }
               memmove(Buff, Buff + 1, SizeOfKeyWord-1);
               if (!OpenedFile.read((char*)Buff+SizeOfKeyWord-1,1)) break;   // end of file
               ++Position;
          }
     }
     delete []Buff;
     return Found;
}

const char* filename = "C:\\TenforeFeed.trc";

int main()
{
     ifstream  is (filename, ios::in | ios::binary);
     const char* SearchStr = "include";
     int PosFound = 0;
     if (ReadUntilKeyIsFound(is,(const unsigned char *)SearchStr, (int)strlen(SearchStr),PosFound))
     {
          printf("Found Keyword at position %i\n", PosFound);
     }
     else
     {
          printf("Didn't find %s\n", SearchStr);
     }

     system("pause");
     return 0;
}
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6966600
Well Done Cookre
I had just started to delve into possibilities of UNICODE, so I was not ignoring your first comment.

It's just that I had previously read in a similar file (i.e. SQL Server Profiler Trace file, .trc) as Binary in Visual Basic and was able to use the VB binary string compare functions to achieve what I wanted.
I just presumed I should be able to do something similar in C++.
I had just proved that the file was the variable factor.
Just before you posted your second comment I had tested DanRollins' and Axter's suggested code with a text file that I manually created myself, and both examples worked perfectly.
I changed the abDataToFind initialization to:
'V','\0','a','\0','l','\0','u','\0' as you suggested and it works perfectly.
I possibly should have copped on to this myself.

Cheers

BCunney






 
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6966609
DanRollins, Axter, Cookre
I think all of you made very good contributions so here is the deal.

I am going to accept DanRollins' solution but I am subsequently going to post dummy questions
Points for Axter
Points for Cookre

(50 points each)
to award points to you, Axter and Cookre, as well as Dan Rollins.
Let me know if you are happy with this deal.
0
 
LVL 30

Expert Comment

by:Axter
ID: 6966696
>>Let me know if you are happy with this deal
Sounds good to me.

Was your file a UNICODE file?
0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6966717
Axter,
Yes it was a UNICODE file.
But I had read in a similar file (Profiler Trace .trc) before in VB as binary (Open as Binary)
and was stuck in this tunnel of thought.
I should have paid more attention to the type of file and its content. I wasn't fully clued in to the fact that it was UNICODE.


0
 
LVL 17

Author Comment

by:Barry Cunney
ID: 6966723
Thanks a million Dan
0
