How do I merge the data between two files if the contents are different?

I would like to compare the contents of two files, i.e.:

Folder E is the 'goto' folder (the destination for the result).

Scenarios:

E:\FOLDER\text1.txt
F:\FOLDER\text1.txt

If the data is exactly the same in both files, there should be no merging.

If both files contain the same data but one of them has extra data, the files should be merged without duplication: data found in both files should show up only once.
The merged file should go to the E FOLDER.

Or, this other scenario:

E:\FOLDER\
F:\FOLDER\text1.txt
(In this scenario, text1.txt should be copied to the E FOLDER path, since there is no data in E FOLDER.)
Asked by: 100questions

Montoya (Process Improvement Mgr) commented:
I don't know if you're interested in learning the programming method, or if you want a wheel that's already been invented. If it's the latter: http://www.scootersoftware.com/moreinfo.php
100questions (Author) commented:
Thanks.  Can this run in a script?
It needs to run regularly in a script.
Bill Prew (IT / Software Engineering Consultant) commented:
I would need a little more information on the data, and a few examples to comment on a good approach.  Is the data relatively "simple" and you will be doing this comparison and merge based on the whole contents of the line?  Or is there a "key" field of some sort that needs to be matched together, and then changes applied somehow (maybe latest dated file)?  Is there an ordering to the file that is important, or a "structure" to the file (like an INI or XML, etc) that needs to be honored?

If you are just looking at the whole line, and order is not important, then one approach would be to concatenate the two files and then sort with an option to remove duplicates. The built-in Windows (DOS) sort.exe command doesn't have this ability, but several freeware sorts can delete duplicate lines, such as SORT in http://www.gnu.org/software/coreutils/

~bp
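For illustration only, here is a rough Python sketch of that concatenate-then-sort-unique idea. Python and the name `merge_sorted_unique` are assumptions for this example, not part of the thread; like `sort -u`, it throws away the original line order.

```python
from pathlib import Path


def merge_sorted_unique(path_a: str, path_b: str, dest: str) -> list[str]:
    """Concatenate two text files, then sort and drop repeated lines.

    Same idea as "cat a b | sort -u": every distinct line appears once,
    but the original line order is NOT preserved.
    """
    lines: list[str] = []
    for path in (path_a, path_b):
        # splitlines() strips line endings, so a last line without a
        # trailing newline still matches the same line elsewhere
        lines.extend(Path(path).read_text().splitlines())
    merged = sorted(set(lines))  # set() removes duplicates, sorted() orders them
    Path(dest).write_text("".join(line + "\n" for line in merged))
    return merged
```

Because both inputs are read before anything is written, `dest` can be one of the inputs, e.g. `merge_sorted_unique(r"E:\FOLDER\text1.txt", r"F:\FOLDER\text1.txt", r"E:\FOLDER\text1.txt")`.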

100questions (Author) commented:
Hi Bill.
Yes, the data is relatively simple in the sense that order is not important, and dates are not important either.
Basically, the contents of the two text files need to be compared. If the same data is found in both, it should not be duplicated: one copy of it should be kept, and any additional data should definitely be added to the file.

Hope this helps.

Also, a simple DOS or batch command that compares the contents of two files would be useful, so that if the data is found to be the same, the files are not concatenated or merged.
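A minimal Python sketch of the two kinds of "same contents" check being asked about; the helper names are hypothetical. `filecmp.cmp(..., shallow=False)` answers "are the files byte-for-byte identical?" (roughly the question the batch `fc` check answers), while the set comparison treats two files as equal when they contain the same distinct lines in any order, which fits the "order is not important" description above.

```python
import filecmp
from pathlib import Path


def identical_bytes(path_a: str, path_b: str) -> bool:
    # shallow=False compares the actual contents instead of trusting
    # matching size and modification time alone
    return filecmp.cmp(path_a, path_b, shallow=False)


def same_line_set(path_a: str, path_b: str) -> bool:
    # Ignores line order and repeated lines: True when every distinct
    # line of one file also appears in the other, and vice versa
    lines_a = set(Path(path_a).read_text().splitlines())
    lines_b = set(Path(path_b).read_text().splitlines())
    return lines_a == lines_b
```

With either check, "no merge needed" is simply the case where the function returns True.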
Bill Prew (IT / Software Engineering Consultant) commented:
If the files are different, do you want them both to be replaced by the merged result?

~bp
Montoya (Process Improvement Mgr) commented:
You can use AutoIt to write a script that runs the program, or you can write the whole script in AutoIt itself.
Just about any language (PHP, C#, etc.) could do the same.

If you're comparing content on both files, what is considered a difference?

For example, if a date is different, do you want to merge the file? Is it that kind of data?

Even if you were comparing on file size alone (which I would not recommend), a difference in size would still mean analyzing the contents to determine the merge point, right? Or does the second document typically contain whatever was in the first document plus some changes?

All this matters in determining the right solution.
100questions (Author) commented:
Hi Bill. If the contents of the files are different, then you can either create a new file with the merged results, if that's easier, or replace the file in the E FOLDER.
Bill Prew (IT / Software Engineering Consultant) commented:
Okay, grab a copy of this free little sort utility: http://www.chmaas.handshake.de/delphi/freeware/cmsort/cmsort.htm
With it, you can easily get a de-duplicated, merged version of the two files like this.

Here's a working script that you can adjust as needed.  It looks for the two files and if they differ will replace BOTH of them with the merged result.

@echo off
setlocal

REM Define folder and file locations to compare and merge
set Dir1=B:\EE\EE28507172\E
set Dir2=B:\EE\EE28507172\F
set File=text1.txt

REM Error if neither file exists
if not exist "%Dir1%\%File%" (
  if not exist "%Dir2%\%File%" (
    echo *ERROR* No files to compare
    exit /b
  )
)

REM If only the second file exists, just copy it to the first folder
if not exist "%Dir1%\%File%" (
  echo *INFO* Copying "%Dir2%\%File%" to "%Dir1%"
  copy "%Dir2%\%File%" "%Dir1%\%File%" >NUL
  exit /b
)

REM If only the first file exists, just copy it to the second folder
if not exist "%Dir2%\%File%" (
  echo *INFO* Copying "%Dir1%\%File%" to "%Dir2%"
  copy "%Dir1%\%File%" "%Dir2%\%File%" >NUL
  exit /b
)

REM Both files exist, check if different and if so merge them
fc "%Dir1%\%File%" "%Dir2%\%File%" >NUL && (
  ECHO *INFO* Files match
) || (
  ECHO *INFO* Merging files
  REM /b avoids the Ctrl+Z end-of-file marker that ASCII-mode concatenation can append
  copy /b "%Dir1%\%File%"+"%Dir2%\%File%" "%Dir1%\_workfile_.txt" >NUL
  cmsort /d "%Dir1%\_workfile_.txt" "%Dir1%\%File%" >NUL
  copy /y "%Dir1%\%File%" "%Dir2%\%File%" >NUL
  if exist "%Dir1%\_workfile_.txt" del "%Dir1%\_workfile_.txt"
)


~bp
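The decision flow of this script (error, copy one way, copy the other way, leave identical files alone, otherwise merge and write the result to both folders) can be sketched in Python as below. This is an illustration under assumptions: the function names are hypothetical, a plain byte comparison stands in for `fc`, and Python's `sorted()` order may not match cmsort's exactly.

```python
import shutil
from pathlib import Path


def sorted_unique_merge(dest: Path, other: Path) -> None:
    # Roughly what "copy a+b" followed by "cmsort /d" produces:
    # every distinct line once, in sorted order, written to dest
    lines = set(dest.read_text().splitlines()) | set(other.read_text().splitlines())
    dest.write_text("".join(line + "\n" for line in sorted(lines)))


def sync_pair(e_file: str, f_file: str) -> str:
    """Same decision steps as the batch script; E is the 'goto' file."""
    e, f = Path(e_file), Path(f_file)
    if not e.exists() and not f.exists():
        return "error: no files to compare"
    if not e.exists():                    # only the F copy exists
        shutil.copyfile(f, e)
        return "copied F -> E"
    if not f.exists():                    # only the E copy exists
        shutil.copyfile(e, f)
        return "copied E -> F"
    if e.read_bytes() == f.read_bytes():  # byte check standing in for fc
        return "files match"
    sorted_unique_merge(e, f)             # merged result lands in E ...
    shutil.copyfile(e, f)                 # ... then overwrites F, like copy /y
    return "merged"
```

Returning a short status string keeps the sketch easy to log from a scheduled job, in the same spirit as the script's *INFO* lines.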
100questions (Author) commented:
Thanks, Bill. The script you wrote doesn't depend on downloading cmsort, does it?
Bill Prew (IT / Software Engineering Consultant) commented:
Yes, it still requires the cmsort utility that I referred to.  Merging files and eliminating dupes in pure DOS can be done (with the help of FINDSTR typically), but is a bit more ugly and can trip over special characters in the files...

~bp
100questions (Author) commented:
Thanks, Bill. I don't see any reference to the cmsort utility, though. Is it referred to in the script?
100questions (Author) commented:
I found the reference to cmsort, my apologies.
100questions (Author) commented:
Bill, can I also see what merging and eliminating dupes in DOS with FINDSTR would look like? Perhaps it's enough for what I need.
Bill Prew (IT / Software Engineering Consultant) commented:
Here's an example of that technique from a prior question.  We could adapt it to your specifics, but it will give you the idea.

http:#a37907672
100questions (Author) commented:
Bill, I tried the script; however, it reorders the data instead of respecting the order the lines had in the files.
It looks like it merged, but it then lists each line in alphabetical order.
Is there a way to correct this?

Otherwise it would be easy to merge the data without the cmsort utility; however, I would still need it to detect duplicated data and remove it.

To be clearer, there is always data that starts with specific characters, e.g. ABC. Then there are lines of data after ABC, and eventually another line starting with ABC begins again. If the set of data after the second ABC is the same as the set of data after a previous ABC, it must not be duplicated. Also, the lines must not be rearranged into alphabetical order.
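If the "ABC" lines really do mark the start of each record, duplicates can be detected per block instead of per line. A Python sketch under those assumptions: every line that begins with the marker starts a new block, any lines before the first marker form a block of their own, and a block is dropped only if an identical block appeared earlier. `dedupe_blocks` and the default marker `"ABC"` are placeholders taken from the description above.

```python
def dedupe_blocks(lines: list[str], marker: str = "ABC") -> list[str]:
    """Drop repeated blocks, where a block runs from one marker line to the next.

    Block order and the line order inside each block are preserved.
    """
    blocks: list[list[str]] = []
    for line in lines:
        if line.startswith(marker) or not blocks:
            blocks.append([line])      # a marker (or the very first line) opens a block
        else:
            blocks[-1].append(line)    # other lines belong to the current block

    seen: set[tuple[str, ...]] = set()
    result: list[str] = []
    for block in blocks:
        key = tuple(block)             # whole block must match to count as a duplicate
        if key not in seen:
            seen.add(key)
            result.extend(block)
    return result
```

Comparing whole blocks means a repeated "ABC" line followed by different data is kept, which a line-by-line de-duplication would mangle.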
100questions (Author) commented:
Bill, further to my message above: perhaps we could merge the two files by concatenation and then use cmsort simply to remove duplicate lines, without rearranging the order of the data within the file. Would that work? Of course, it would only have to remove the second instance of a duplicated line, and not the first, to keep it consistent.

Here is information I pulled from the PDF that came with cmsort.

6.2 Example for ignoring records with duplicate keys
Duplicate records are recognized by the defined key(s), not by the whole line. If you want to exclude identical lines, you must perform an additional sort beforehand by using the whole line as sort key. The following log file is containing user ID, user name, and last access time:

055 Maas 2001-02-05 07:31:55
087 Mechenbier 2001-02-05 08:01:23
024 Hesselbein 2001-02-05 08:15:16
055 Maas 2001-02-05 08:44:24
089 Kruft 2001-02-05 09:05:07
087 Mechenbier 2001-02-05 09:31:13

Command line:
cmsort /S=1,3 /D log.txt log.sor
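As I read the quoted example, cmsort's /D finds duplicates among records it has already sorted by the key, which would explain why its output comes back in key order rather than file order. A Python sketch of that behavior, assuming /S=1,3 means a key that starts in column 1 and is three characters long, and assuming the first record of each key is the one kept (the excerpt doesn't say which survives). `dedupe_by_key` is a hypothetical name.

```python
def dedupe_by_key(lines: list[str], start: int = 1, length: int = 3) -> list[str]:
    """Sort on a fixed-column key and keep one record per key.

    start is 1-based; (start=1, length=3) is this sketch's reading of /S=1,3.
    """
    def key(line: str) -> str:
        return line[start - 1:start - 1 + length]

    result: list[str] = []
    previous = None
    # sorted() is stable, so records with equal keys keep their input order
    # and the earliest one is the first seen for its key
    for line in sorted(lines, key=key):
        if key(line) != previous:
            result.append(line)
            previous = key(line)
    return result
```

Applied to the six log lines above, this keeps one line each for users 024, 055, 087, and 089, in that (sorted) order.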
100questions (Author) commented:
Bill, regarding your suggestion to try http:#a37907672:
I modified it to work with my files; however, when it sorts, it still rearranges everything in alphabetical order.

It removed the duplicates, but it sorted everything alphabetically, and I did not want that.
It has to keep the lines in the order they were in and only eliminate the duplicates.
Bill Prew (IT / Software Engineering Consultant) commented:
Okay, that's why I asked about order earlier; I thought you had said it didn't matter. Tell me this: if these were our two files, what would the desired merge be?

file1
-------
AAA
DDD
XXX
YYY
BBB
MMM

file2
-------
MMM
AAA
XXX
YYY
ZZZ
100questions (Author) commented:
Thanks for your help with this.
The merge, using file1 as the "go to" file, would be as follows:

file1
-------
AAA
DDD
XXX
YYY
BBB
MMM
ZZZ
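The desired result above amounts to "all of file1's lines, then any lines from file2 not seen yet, with no reordering". A short Python sketch that reproduces it; `merge_keep_order` is a hypothetical name, and it relies on Python dicts preserving insertion order (3.7 and later).

```python
from itertools import chain


def merge_keep_order(*files: list[str]) -> list[str]:
    """Concatenate the files' lines, keeping only the first occurrence of
    each line, in the order it was first seen."""
    # dict.fromkeys ignores keys it already has, so later duplicates are dropped
    return list(dict.fromkeys(chain.from_iterable(files)))
```

Feeding it file1's lines first is what makes file1 the "go to" order: anything already present there keeps its position.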
Bill Prew (IT / Software Engineering Consultant) commented:
Okay, give this version a try. It does NOT use the cmsort utility any more; it does it all in the BAT script. It may have a little more logging display than you want, but that's better for testing, and you can remove or comment out some of the ECHOs later once you get it dialed in.

@echo off
setlocal

REM Define folder and file locations to compare and merge
set Dir1=E:\Folder
set Dir2=F:\Folder
set File1=%Dir1%\text1.txt
set File2=%Dir2%\text1.txt
set MergeList="%File1%","%File2%"
set Workfile=%Dir1%\_workfile_.txt

REM Error if neither file exists
if not exist "%File1%" (
  if not exist "%File2%" (
    echo *ERROR* No files to compare
    exit /b
  )
)

REM If only the second file exists, just copy it to the first folder
if not exist "%File1%" (
  echo *INFO* Copying "%File2%" to "%File1%"
  copy "%File2%" "%File1%" >NUL
  exit /b
)

REM If only the first file exists, just copy it to the second folder
if not exist "%File2%" (
  echo *INFO* Copying "%File1%" to "%File2%"
  copy "%File1%" "%File2%" >NUL
  exit /b
)

REM Both files exist, check if different and if so merge them
fc "%File1%" "%File2%" >NUL && (
  ECHO *INFO* Files match
) || (
  ECHO *INFO* Merging files
  copy NUL "%Workfile%" >NUL
  for %%A in (%MergeList%) do (
    ECHO *INFO* Processing file:[%%~A]
    for /f "tokens=* usebackq" %%B in ("%%~A") do (
      findstr /b /e /c:"%%B" /i "%Workfile%">NUL || echo.%%B>>"%Workfile%"
    )
  )
  copy /y "%Workfile%" "%File1%" >NUL
  copy /y "%Workfile%" "%File2%" >NUL
  if exist "%Workfile%" del "%Workfile%"
)


~bp
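For readers adapting this, here is a Python approximation of what the inner FOR /F plus FINDSTR loop does to each line. It is a sketch based on documented defaults, not an exact emulation: `tokens=*` drops leading spaces and tabs, FOR /F skips empty lines, and `findstr /b /e /i` matches a whole line case-insensitively, so "Apple" and "apple" count as duplicates and the first spelling wins. Treating whitespace-only lines as empty is an assumption, and FOR /F's default `eol=;` (which ignores lines beginning with a semicolon) is not modeled. `merge_like_batch` is a hypothetical name.

```python
def merge_like_batch(*files: list[str]) -> list[str]:
    """Approximate the batch merge: first occurrence wins, order kept,
    duplicates detected case-insensitively."""
    seen: set[str] = set()
    result: list[str] = []
    for lines in files:
        for raw in lines:
            line = raw.lstrip(" \t")   # tokens=* drops leading spaces/tabs
            if not line:               # empty (or, assumed, whitespace-only) lines are skipped
                continue
            key = line.lower()         # findstr /i compares case-insensitively
            if key not in seen:
                seen.add(key)
                result.append(line)
    return result
```

If case differences matter in your data, removing the `/i` from the FINDSTR call (and the `.lower()` here) makes the comparison exact.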

100questions (Author) commented:
Hi Bill. This works very well, thank you. One last modification: if I simply want the script to look at a single file in a specific folder, e.g. Dir1=E:\Folder, assuming it's already a combined file called text1.txt, how can the script be modified to just eliminate any duplication it finds?
Bill Prew (IT / Software Engineering Consultant) commented:
I think this would handle that.

@echo off
setlocal

REM Define file locations
set File1=E:\Folder\text1.txt
set Workfile=E:\Folder\_workfile_.txt

REM Exit with an error if the file does not exist
if not exist "%File1%" (
  echo *ERROR* "%File1%" not found
  exit /b
)

REM Remove duplicate lines from the file, keeping the first occurrence of each
copy NUL "%Workfile%" >NUL
for /f "tokens=* usebackq" %%B in ("%File1%") do (
  findstr /b /e /c:"%%B" /i "%Workfile%">NUL || echo.%%B>>"%Workfile%"
)
copy /y "%Workfile%" "%File1%" >NUL
if exist "%Workfile%" del "%Workfile%"


~bp
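The same single-file clean-up in Python, as a hedged sketch: it keeps the first occurrence of each line in order, writes the result to a temporary work file in the same folder, and swaps it in with `os.replace`. Unlike the batch version, it compares lines case-sensitively and treats blank lines like any other line (the batch loop skips them). `dedupe_file_in_place` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def dedupe_file_in_place(path: str) -> int:
    """Remove repeated lines from one file, keeping first occurrences in order.

    Returns the number of lines removed.
    """
    p = Path(path)
    lines = p.read_text().splitlines()
    unique = list(dict.fromkeys(lines))  # first occurrence wins, order preserved
    # Write to a work file in the same folder, then swap it in, so a failure
    # part-way through never leaves a half-written text1.txt behind
    fd, tmp = tempfile.mkstemp(dir=str(p.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write("".join(line + "\n" for line in unique))
    os.replace(tmp, p)
    return len(lines) - len(unique)
```

Keeping the work file in the same folder matters: `os.replace` is only a simple rename when source and target are on the same drive.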
100questions (Author) commented:
Excellent solutions on both counts!  Thank you for your patient assistance with this.
Bill Prew (IT / Software Engineering Consultant) commented:
Welcome, glad that was helpful.

~bp
100questions (Author) commented:
Bill, one last request, please.
I inserted the script into another script, and it seems to stop my script at a certain point.
Perhaps the exit command?
After running the script, I want it to continue with the rest of my script.
How can I get it to do this?
Bill Prew (IT / Software Engineering Consultant) commented:
The exits are fine as long as they have the /B on them.  It sounds like you may not be calling the second script with the CALL command, like:

CALL second.bat

That should continue the first script when the second ends.

~bp