Solved

Automated Script to compare Two Text Files and Save a copy of Only Differences

Posted on 2013-11-08
Last Modified: 2016-07-10
My dearest Experts,

I want to compare two plain text files: original.txt and new.txt.
original.txt is yesterday's full customer list from a client, and new.txt is today's full customer list from the same client.  I want a script that compares the two on a daily basis and saves only the lines in new.txt that do not appear exactly in original.txt to a file named diff.txt.

Example:

original.txt
One
Two
Three
Five
Six
Seven
Eight


new.txt
One
Two
Three
Four
Five
Six
Seven
Eight
Nine
Ten
Eleven


diff.txt
Four
Nine
Ten
Eleven


Is this at all possible?  I see plenty of options for comparing text with other applications, but I want to do this automatically on a scheduled basis every day.

Also, please keep in mind that my sample is tiny compared to the real data.  The data files are full customer demographics, and each file is 60,000+ lines of text (made up of "~"-delimited data).

Thank you in advance.

-Nick
Question by:NCollinsBBP
10 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 39633301
sort original.txt > original.sort
sort new.txt > new.sort
comm -13 original.sort new.sort > diff.txt
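
For readers without sort and comm available, the same idea can be sketched in Python. This is only an illustration, not part of the original answer: the function name lines_only_in_new is hypothetical, and it uses sets, so repeated lines collapse to one entry, whereas comm pairs duplicates off one by one. The result is sorted by code point, which may differ from the order sort produces under some locales.

```python
def lines_only_in_new(original_lines, new_lines):
    """Return the distinct lines found in new_lines but not in original_lines, sorted."""
    # Set difference drops every line that also occurs in the original list.
    return sorted(set(new_lines) - set(original_lines))


# Sample data taken from the question.
original = ["One", "Two", "Three", "Five", "Six", "Seven", "Eight"]
new = ["One", "Two", "Three", "Four", "Five", "Six", "Seven",
       "Eight", "Nine", "Ten", "Eleven"]

print(lines_only_in_new(original, new))
# prints ['Eleven', 'Four', 'Nine', 'Ten']
```

As with the sorted comm output, the order of new.txt is lost here.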
 
LVL 84

Expert Comment

by:ozo
ID: 39633315
Or, if diff.txt needs to keep the data in the same order as it appeared in new.txt:

perl -ne 'print if !$s{$_}++ && !@ARGV' original.txt new.txt > diff.txt
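
Where Perl is not available, the order-preserving approach could look roughly like this in Python. It is a sketch under assumptions: the function name write_new_lines and the UTF-8 default encoding are not from the thread, and line endings are stripped before comparing so that a missing final newline does not cause a false difference. Like the Perl one-liner, it also skips lines repeated within new.txt.

```python
def write_new_lines(original_path, new_path, diff_path, encoding="utf-8"):
    """Write lines of new_path that are absent from original_path to diff_path.

    Lines keep the order they have in new_path. Returns the number written.
    The encoding default is an assumption; adjust it to the real files.
    """
    # Load every original line into a set for constant-time lookups.
    with open(original_path, encoding=encoding) as f:
        seen = {line.rstrip("\r\n") for line in f}

    written = 0
    with open(new_path, encoding=encoding) as src, \
         open(diff_path, "w", encoding=encoding) as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if line not in seen:
                seen.add(line)  # also suppresses repeats inside new_path
                dst.write(line + "\n")
                written += 1
    return written


# Example call (paths are placeholders):
# write_new_lines("original.txt", "new.txt", "diff.txt")
```

Only the original file is held in memory as a set, which should be comfortable for files of 60,000+ lines.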
 
LVL 2

Expert Comment

by:burnocrash
ID: 39633353
If you want to do it in PowerShell, here is the script:

compare-object -ReferenceObject $(Get-Content .\original.txt) -DifferenceObject $(Get-Content new.txt) > diff.txt
 

Author Comment

by:NCollinsBBP
ID: 39633475
@ozo, I do not have the liberty to use Perl in my current environment.

@burnocrash, I have run the following script on my end in PowerShell...

compare-object -ReferenceObject $(Get-Content C:\test\old.txt) -DifferenceObject $(Get-Content new.txt) > C:\test\diff.txt  

Now, I get what I believe is the correct number of lines, but I do not see what I think I should see...

In diff.txt I get a blank line at the top, then two headers, "InputObject" and "SideIndicator", then my results.  But I only get the first 56 characters of each line, followed by "...   =>"

Is it possible to get diff.txt to show JUST the difference results in full?

-Nick
 
LVL 2

Accepted Solution

by:
burnocrash earned 500 total points
ID: 39638197
compare-object -ReferenceObject $(Get-Content .\original.txt) -DifferenceObject $(Get-Content new.txt) | select Inputobject | format-table -Wrap
 

Author Comment

by:NCollinsBBP
ID: 39638562
@burnocrash
Success!  (At least for the output on the PowerShell screen.)  Can this be written out to the "diff.txt" file?

My reason for doing this is that I receive a full customer file every day from a client, which has 60,000+ rows, where only 75 to 100 of the lines are either updated or brand new.  Importing all of them daily is killing my processing with the duplicates.  I can save hours of processing if I can get just the differences / new items written out.  (And the client will not give the resources to change the customer extract... which is why I'm in this boat.)

-Nick
 
LVL 2

Assisted Solution

by:burnocrash
burnocrash earned 500 total points
ID: 39640926
Just redirect the output to diff.txt.

Here is the code:

compare-object -ReferenceObject $(Get-Content .\original.txt) -DifferenceObject $(Get-Content new.txt) | select Inputobject | format-table -Wrap > diff.txt

Enjoy :-)
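
One caveat for anyone reusing the pipeline above: Compare-Object reports lines from both inputs, so lines that exist only in original.txt are listed too, and Format-Table writes an "InputObject" header into diff.txt. For a daily job that should emit only the new or changed lines, a rough Python sketch follows. Everything in it is an assumption rather than something from the thread: the function name daily_diff, the fixed file names inside one folder, the UTF-8 encoding, the treatment of a missing original.txt as "everything is new", and the final step that copies new.txt over original.txt so the next run compares against today's list.

```python
import shutil
from pathlib import Path


def daily_diff(folder, encoding="utf-8"):
    """Write lines unique to new.txt into diff.txt, then roll new.txt over.

    Hypothetical helper: file names, encoding and the rollover step are
    assumptions about the daily workflow. Returns the number of lines written.
    """
    folder = Path(folder)
    original = folder / "original.txt"
    new = folder / "new.txt"
    diff = folder / "diff.txt"

    # On a first run there may be no original yet; then every line counts as new.
    seen = set()
    if original.exists():
        with open(original, encoding=encoding) as f:
            seen = {line.rstrip("\r\n") for line in f}

    count = 0
    with open(new, encoding=encoding) as src, \
         open(diff, "w", encoding=encoding) as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if line not in seen:
                seen.add(line)
                dst.write(line + "\n")  # full line, no header, no truncation
                count += 1

    # Today's list becomes the baseline for tomorrow's comparison.
    shutil.copyfile(new, original)
    return count
```

A scheduler could then call daily_diff on the folder once per day, after the client's file has been saved as new.txt.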
 
LVL 12

Expert Comment

by:tel2
ID: 41702140
I suggest #a39640926 be accepted as the answer, as I see no reason to believe it didn't finish off the job.

Too bad the asker didn't specify the OS in the first place.  Would have saved ozo from wasting his time on it.