Solved

How can I compare 2 files and output the differences to a new file

Posted on 2004-10-13
Last Modified: 2010-04-01
Hi experts,

I have 2 files (file_old and file_new) with the same format and the same columns, but different rows and contents.
Is there a function I can use to compare the 2 files and write the differences to a new output file (file_diff)?

Thanks in advance
Question by:justinY
20 Comments
 
LVL 12

Expert Comment

by:stefan73
ID: 12306112
Hi justinY,
Why don't you use diff?

FILE* f=popen("diff file_old file_new","r");

...then you can parse the output.


Cheers!

Stefan
0
 

Author Comment

by:justinY
ID: 12309459
diff is a Unix command. I am running on Windows.
So what is the equivalent for Windows?

thanks
0
 
LVL 1

Expert Comment

by:gugario
ID: 12323088
You could write your own little version of diff... something like this:

#include <fstream>
#include <string>
using namespace std;

int main() {

ifstream file1 ("file_old");
ifstream file2 ("file_new");
ofstream out ("file_diff");

string line1, line2;

// compare line by line; stops at the end of the shorter file
while ( getline (file1, line1) && getline (file2, line2) )
{
   if (line1 != line2)
        out << "OLD: " << line1 << " NEW: " << line2 << endl;
}

return 0;
}



0
 

Expert Comment

by:Chiffa
ID: 12323845
Please specify what you mean by "differences". If, for example, the first line of the first file is present in the second file, but as its fifth line, should it be marked as different? Or do you want only the lines that are present in just one of the files? In the first case, the answer from gugario is what you need. In the second, you'll need more complicated code, for example a set of strings, something like this:

#include <fstream>
#include <iostream>
#include <string>
#include <set>

using namespace std;

int main() {

  string file1_name = "file_old";
  string file2_name = "file_new";

  ifstream file1 ( file1_name.c_str() );
  ifstream file2 ( file2_name.c_str() );
  ofstream out ( "file_diff" );

  string line1, line2;
  set<string> set1;

  int cnt = 0;
  // Load contents of first file into set
  while( getline( file1, line1 ) ) {
    cnt++;
    if ( ! set1.insert(line1).second ) {
        cerr << "Insert failed of line no." << cnt << ": \"" << line1 << "\"  - duplicate" << endl ;
    }
  }

  // Read lines from second file and compare them to set1, outputting lines that are present in file2 only
  while( getline( file2, line2 ) ) {
    if ( set1.find( line2 ) == set1.end() ) {
      out << "only in file " << file2_name << ": " << line2 << endl;
    } else {
      set1.erase( line2 );   // remove from set1 the lines found in both files
    }
  }

  // Print the remaining lines from set1 - only lines not found in file2 are left
  set<string>::iterator setItr;
  for( setItr = set1.begin(); setItr != set1.end(); setItr++ ) {
    out << "only in file " << file1_name << ": " << *setItr << endl;
  }

  return 0;
}

This is just an example; it will not work correctly if lines repeat. For that case you should at least use a multiset instead of a set.
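
A minimal sketch of the multiset variant, assuming the goal is "lines of the new file that have no matching copy left in the old file". The helper names LoadLines and OnlyInNew are made up for this illustration and take streams rather than file names so they are easy to try out:

```cpp
#include <istream>
#include <set>
#include <sstream>   // std::istringstream, handy for trying this without files
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): reads every line of a stream
// into a multiset, so a line that occurs three times is stored three times.
std::multiset<std::string> LoadLines(std::istream& in)
{
    std::multiset<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.insert(line);
    return lines;
}

// Returns the lines of newIn that have no unmatched copy left in oldIn.
// Each old line can cancel at most one new line.
std::vector<std::string> OnlyInNew(std::istream& oldIn, std::istream& newIn)
{
    std::multiset<std::string> remaining = LoadLines(oldIn);
    std::vector<std::string> result;
    std::string line;
    while (std::getline(newIn, line)) {
        std::multiset<std::string>::iterator it = remaining.find(line);
        if (it == remaining.end())
            result.push_back(line);      // no copy left in the old file
        else
            remaining.erase(it);         // erase one copy, not all equal lines
    }
    return result;
}
```

Note the erase by iterator: calling erase with the string itself would remove every equal line from the multiset at once, which brings back the problem with repeated lines.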


0
 
LVL 12

Expert Comment

by:stefan73
ID: 12336884
Why don't you use cygwin's diff?

http://www.cygwin.com

Then you have the best of both worlds: Windows' nice GUI and Unix's power. Or you can have a look at diff's source. It should be written in such a generic way that it probably compiles fine on Windows.
0
 

Author Comment

by:justinY
ID: 12351920
> first string from first file present in second file, but as fifth string - it should be marked as different?

No, that counts as the same.

Thanks guys,
Let me make myself clear here.
I have an old file ( oldfile ) and a new file (newfile). I want to compare oldfile and newfile, delete the same records ( not the same lines ) from newfile, and then output the newfile to a file called oldfile.

How can I start this ?
0
 
LVL 1

Expert Comment

by:gugario
ID: 12352140
What I would do is go through the old file once, and use a <set> to store all the unique records in the first file.  Afterwards, go through the second file, and compare each read in record to the ones you already read.

Open the old file again in append  mode (so that you can add the new information), and for each record you find which is not already in the old file, add this in.

You could have multiple sets, or some other technique for parsing the line, if you don't have only one column of records in your file.  I really hope this helps, let me know if you have any questions.

Gustavo

here's the code:

#include <set>
#include <fstream>
#include <string>
using namespace std;

int main()
{
        set <string> oldRecords;  //declare the set
        oldRecords.clear();       //empty it out

        ifstream oldFile ("oldfile.txt");       //open old file for reading
        ifstream newFile ("newfile.txt");       //open new file for reading

        string nextRecord;
        while (oldFile >> nextRecord)   //read records until end of old file
        {
                //add record to set
                oldRecords.insert( nextRecord );
        }

        //close old file and open it again in append write mode
        oldFile.close();
        ofstream writeFile ("oldfile.txt", fstream::app);

        while (newFile >> nextRecord)   //read records until end of new file
        {
                if (oldRecords.find(nextRecord) == oldRecords.end()) //if not found, add
                {
                        writeFile << nextRecord << endl;
                        oldRecords.insert(nextRecord);
                }
        }

        newFile.close();
        writeFile.close();
        oldRecords.clear();

        return 0;
}
0
 

Author Comment

by:justinY
ID: 12352598
Thanks gugario, here is my code, but it has compile errors. Can you check what's wrong? Thanks

#include <fstream>
#include <sstream>
#include <iostream>
#include <string>
#include <iomanip>

using namespace std;

///////////////////////////////////
// this function can get any fields
//////////////////////////////////
std::string GetField(std::string &aStr, int aFieldNum, char aDelim)
{
    std::istringstream ss(aStr);
    std::string field;
    while (std::getline(ss, field, ',') && aFieldNum > 0 )
    {
        --aFieldNum;
    }
    return field;
}

int main(int argc, char *argv[])
{
    std::ifstream oldfin;
    std::ifstream newfin;
    std::ofstream fout;
    oldfin.open("oldfile.csv");
    newfin.open("newfile.csv");
    fout.open("diff.csv");
    std::string line;

    while ( std::getline(oldfin, line) && std::getline(newfin, line))
    {
            // comparing oldfile and newfile in both the 10th column and the 30th column fields
                                // if they both not same, then write the newfile to output file  ( my code id for this one )
                                // or if they both same, then delete the same fileds from newfile, and write the rest of newfile to output file ( can you give me code to delete
                                // the same fileds)
 
            std::string oldfn9 = GetField(line, 9, ',');
            std::string oldfn29 = GetField(line, 29,',');
            std::string newfn9 = GetField(line, 9,',');
            std::string newfn29 = GetField(line, 29,',');
            if ( (::oldfn9.c_str() != ::newfn9.c_strt()) && (::oldfn29.c_str() != ::newfn29.c_strt()) )
        {
            fout << line.c_str() << std::endl;
        }
    }
    fout.close();
    oldfin.close();
    newfin.close();
}
0
 

Author Comment

by:justinY
ID: 12352671
here is the compiling errors:
 if ( (oldfn9.c_str() != newfn9.c_str()) && (oldfn29.c_str() != newfn29.c_str()) )

But this part, 'fout << line.c_str() << std::endl;', doesn't make sense. It produces no results.
What I want to do here is write the lines of newfile to the output file, leaving out the records that also appear in oldfile.
0
 
LVL 1

Expert Comment

by:gugario
ID: 12354273
Hey, I almost understand what your program is supposed to do completely.. I'm gonna try to make some simple test files and make sure it runs ok, and then I'll repost the code... A couple of quick question and comments, though:

1.  When you use the "using namespace std;" line in the top of your program, you don't need all the "std::" in the middle of the code, since you already told the compiler you are using std for your namespace.  (that would make it a lot cleaner)

2. comparing oldfn9.c_str() != newfn9.c_str() compares the two char pointers, not the text they point to, so it is not the test you want. Just write oldfn9 != newfn9... the string library has a comparison operator, so you don't have to change them to char strings before comparing...

I don't mean to be too picky, I'm just saying it because I think it would save you a lot of trouble and make the code cleaner.
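
To make point 2 concrete, here is a small sketch of the three comparisons side by side. The function names are invented for this illustration:

```cpp
#include <cstring>
#include <string>

// Compares the text of two strings; this is what the if statement needs.
bool SameContents(const std::string& a, const std::string& b)
{
    return a == b;                       // std::string overloads == and !=
}

// Compares only the addresses of the two internal character buffers.
// For two separately built strings this is false even when the text matches.
bool SameBuffer(const std::string& a, const std::string& b)
{
    return a.c_str() == b.c_str();
}

// The C-style way to compare contents when raw pointers are all you have.
bool SameCString(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}
```

With two separately constructed strings that both hold "abc", SameContents and SameCString report a match while SameBuffer does not, which is why a != test on c_str() results tends to be true for every pair of lines.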

Now for my question:
> // or if they both same, then delete the same fileds from newfile, and write the rest of newfile to output file
   Does that mean that when you find a line where fields 10 and 30 are equal you go on to the next one and keep processing the file?  Are you supposed to remove those lines from the newfile?  Are you supposed to write the line to the output file without the 2 fields?  or are you supposed to skip that line and copy the rest of oldfile into the output file?  Please clarify cause I don't understand...

I'll post the code with some fixes (except for that part) as soon as possible

Gustavo.

0

 
LVL 1

Expert Comment

by:gugario
ID: 12354524
Here you go... apart from some general cleaning up, I made sure that the GetField function now
returned the appropriate field (so you can call it on 10 and 30 instead of 9 and 29), and your big
problem was that you were trying to store both the line from the old file and the one from the new
file in the same string, that way you ended up comparing one line to the same line, and that's
why you never found them to be different.

Hope it helps,

Gustavo.:

#include <fstream>
#include <sstream>
#include <iostream>
#include <string>
#include <iomanip>

using namespace std;

///////////////////////////////////
// this function can get any fields
////////////////////////////////////
string GetField(string &aStr, int aFieldNum, char aDelim)
{
        istringstream ss(aStr);
        string field;
        while ( aFieldNum > 0)
        {
                getline ( ss, field, aDelim);
                --aFieldNum;
        }
        return field;
}

int main(int argc, char *argv[])
{
        ifstream oldfin;
        ifstream newfin;
        ofstream fout;
        oldfin.open("oldfile.csv");
        newfin.open("newfile.csv");
        fout.open("diff.csv");

        string lineOld;
        string lineNew;

        while (getline(oldfin, lineOld) && getline(newfin, lineNew))
        {
                string oldfn10 = GetField(lineOld, 10,',');
                string oldfn30 = GetField(lineOld, 30, ',');
                string newfn10 = GetField(lineNew, 10, ',');
                string newfn30 = GetField(lineNew, 30, ',');

               if ( (oldfn10 != newfn10) && (oldfn30 != newfn30) )
                {
                        fout << lineNew << endl;
                }
               else if ((oldfn10 == newfn10) && (oldfn30 == newfn30))
              {
                     //not sure what you wanted to do here.. let me know
               }
        }
        oldfin.close();
        newfin.close();
        fout.close();

        return 0;

}
0
 

Author Comment

by:justinY
ID: 12358010
Now for my question:
> // or if they both same, then delete the same fileds from newfile, and write the rest of newfile to output file
   Does that mean that when you find a line where fields 10 and 30 are equal you go on to the next one and keep processing the file?  Are you supposed to remove those lines from the newfile?  Are you supposed to write the line to the output file without the 2 fields?  or are you supposed to skip that line and copy the rest of oldfile into the output file?  Please clarify cause I don't understand...

>>>>> Yes to all questions, but skip that line and copy the rest of newfile into the output file.

Now back to the code: after running it, I have nothing in my diff.csv file. I don't know why.
0
 
LVL 1

Expert Comment

by:gugario
ID: 12358402
We're getting there... The reason the code gave no output must be that I'm thinking that your input file looks some other way.

hmmmm... if you can, post the newfile.csv, oldfile.csv you are using, so that I can see what your input really looks like.. also, it would be good if you could tell me what kind of diff.csv you are expecting... (i don't need a 100 line file, just something that covers the cases where a line is diff in 2 places, equal in 2 places or diff in only one place.)

Gustavo.
0
 

Author Comment

by:justinY
ID: 12359623
Hi,

Finally, I got the code working, but it seems the code is comparing line to line. That means the 1st line of oldfile is compared with the 1st line of newfile, the 2nd line of oldfile with the 2nd line of newfile, and so on. That's not what I want. What I want is: as long as a line of newfile has the same fields as some line of oldfile, no matter what the line number is, ignore it and go on comparing until the end of newfile is reached, and write the lines whose fields do not match to the diff file.
0
 

Author Comment

by:justinY
ID: 12359893
Hi, Gustavo

this might be a good approach.

1. Get fields 10 and 30 of line 1 of newfile and compare them with every line of oldfile, from line 1 to the end. If both fields match some line, delete that whole line from newfile and go on; otherwise just go on.

2. Get fields 10 and 30 of line 2 of newfile and compare them with every line of oldfile, from line 1 to the end. If both fields match some line, delete that whole line from newfile and go on; otherwise just go on.

3. Keep doing this until the end of newfile is reached.

4. Write the remaining lines of newfile to the output file.

I think this gives us a clear, logical way to do it. Do you agree? If so, how can I do it?
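
The four steps above can be sketched roughly as follows, assuming both files fit in memory and that fields 10 and 30 of every line have already been extracted into pairs. Key and UnmatchedRecords are placeholder names for this illustration, and instead of deleting lines the sketch returns the positions of the new records to keep:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One record reduced to the two fields being compared.
typedef std::pair<std::string, std::string> Key;

// Steps 1-3: for each new key, scan the old keys from first to last.
// Step 4: return the indices of the new records that found no match.
std::vector<std::size_t> UnmatchedRecords(const std::vector<Key>& oldKeys,
                                          const std::vector<Key>& newKeys)
{
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < newKeys.size(); ++i) {
        bool found = false;
        for (std::size_t j = 0; j < oldKeys.size() && !found; ++j)
            found = (oldKeys[j] == newKeys[i]);   // both fields must match
        if (!found)
            keep.push_back(i);
    }
    return keep;
}
```

The work grows with the number of old records times the number of new records, so this is only comfortable for small files.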
0
 
LVL 1

Accepted Solution

by:
gugario earned 125 total points
ID: 12360556
Gotcha! The problem of doing it the way you suggested is that you would have to open the newfile a whole bunch of times.  And deleting a line from a file is harder than it looks!  But what you suggested can be easily done with sets.

What I did is keep a set of all pairs in file 1.. so, for instance, if column 10 is "aaa" and column 30 is "bbb" a pair is a string in the form "aaa,bbb"...

now, first you load all the pairs of old file into the set...

then, for each pair in newfile, if it's already in the set you do nothing; if it's not in the set, then you output the line to the diff file...

the code now looks like this:

#include <fstream>
#include <sstream>
#include <iostream>
#include <string>
#include <iomanip>
#include <set>
using namespace std;

///////////////////////////////////
// this function can get any fields
////////////////////////////////////
string GetField(string &aStr, int aFieldNum, char aDelim)
{
        istringstream ss(aStr);
        string field;
        while ( aFieldNum > 0)
        {
                getline ( ss, field, aDelim);
                --aFieldNum;
        }
        return field;

int main(int argc, char *argv[])
{
        ifstream oldfin;
        ifstream newfin;
        ofstream fout;
        oldfin.open("oldfile.csv");
        newfin.open("newfile.csv");
        fout.open("diff.csv");

        string lineOld;
        string lineNew;

        set<string> oldPairs;   //all the pairs of values in old file
        oldPairs.clear();
        string pair;    //a pair to add/check
        while (getline(oldfin, lineOld))
        {
                string oldfn10 = GetField(lineOld, 10,',');
                string oldfn30 = GetField(lineOld, 30, ',');
                //insert the pair in the set
                pair = oldfn10 + "," + oldfn30;
                oldPairs.insert( pair );
        }
        while (getline(newfin, lineNew))
        {
                string newfn10 = GetField(lineNew, 10, ',');
                string newfn30 = GetField(lineNew, 30, ',');
                pair = newfn10 + ","  + newfn30;

                if (oldPairs.find(pair) == oldPairs.end() )
                        fout << lineNew << endl;
        }
        oldfin.close();
        newfin.close();
        fout.close();

        return 0;
}
0
 

Author Comment

by:justinY
ID: 12361187
I have an empty output file, why ?
0
 
LVL 1

Expert Comment

by:gugario
ID: 12361339
is there anyway you can give me sample input/output?  It seemed to work on my pc....
0
 
LVL 1

Expert Comment

by:gugario
ID: 12361360
hey, never mind.. i saw an obvious error in it.. after the return field; line in the first function, i forgot to close the bracket of the function (this is definitely a copy/paste error).. did you catch that?  Maybe it'll make a difference.. if not, i'd still ask for sample in/out
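
For reference, a sketch of the same approach with the closing brace in place. It is a variation on the posted code rather than a copy: the field numbers and delimiter are parameters, the work is done on streams so it can be tried without files, and WriteUnmatched is a name made up for this illustration.

```cpp
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Returns field number fieldNum (counting from 1) of a delimited line,
// or an empty string if the line has fewer fields.
std::string GetField(const std::string& line, int fieldNum, char delim)
{
    std::istringstream ss(line);
    std::string field;
    while (fieldNum > 0) {
        if (!std::getline(ss, field, delim))
            return std::string();        // ran out of fields
        --fieldNum;
    }
    return field;
}

// Copies to out every line of newIn whose (fieldA, fieldB) pair does not
// occur anywhere in oldIn. Returns the number of lines written.
int WriteUnmatched(std::istream& oldIn, std::istream& newIn, std::ostream& out,
                   int fieldA, int fieldB, char delim)
{
    std::set<std::string> oldPairs;
    std::string line;
    while (std::getline(oldIn, line))
        oldPairs.insert(GetField(line, fieldA, delim) + delim +
                        GetField(line, fieldB, delim));

    int written = 0;
    while (std::getline(newIn, line)) {
        std::string key = GetField(line, fieldA, delim) + delim +
                          GetField(line, fieldB, delim);
        if (oldPairs.find(key) == oldPairs.end()) {
            out << line << '\n';
            ++written;
        }
    }
    return written;
}
```

For the files in this thread the call would pass the two ifstream objects, the ofstream for diff.csv, the field numbers 10 and 30, and ',' as the delimiter.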
0
 

Author Comment

by:justinY
ID: 12362711
OK, It works great. ---- Yes, I did catch that error since I had compiling error.

 Thanks, I am going to close this ticket and credit the points to you. I will open another ticket regarding getline(). If you have time, please take a look.  
0
