Solved

How can I compare several text files and keep only 1 copy of a file without duplication?

Posted on 2014-09-24
Last Modified: 2014-10-01
I would like to compare several text files (e.g. new1.txt, new2.txt, new3.txt, ...) by content, regardless of their names.
If a file's contents are identical to those of another file, one of the two duplicates should be deleted.
Question by:100questions
9 Comments
 
LVL 17

Expert Comment

by:Emmanuel Adebayo
ID: 40342088
Hi,

You can use Multi-File Compare.

To download files and see other information about the project, go to http://sourceforge.net/projects/multi-fcompare.

Rgds
 

Author Comment

by:100questions
ID: 40342173
Will this work in an existing batch script?
 
LVL 17

Expert Comment

by:Emmanuel Adebayo
ID: 40342192
No, this is an executable.

Author Comment

by:100questions
ID: 40342243
Then I need something that I can insert into an existing Windows batch file, or a new PowerShell script or VBScript, that performs this function.
 
LVL 9

Accepted Solution

by:dlb6597 (earned 500 total points)
ID: 40342313
This is bare-bones and inefficient, so definitely test it with a subset of your data first.
For every text file, it runs an inner for loop that compares that file with every other .txt file and deletes it if there is a match. The script starts over after each deletion because the (*.txt) set has changed.

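@echo off
rem Compare every .txt file with every other one; delete one file of each matching pair.
:start
for %%i in (*.txt) do (
	for %%j in (*.txt) do (
		rem Quotes handle names with spaces; ">nul" hides fc's output. Restart after each deletion.
		if not "%%i" == "%%j" fc "%%i" "%%j" >nul && del "%%i" && goto start
	)
)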
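
For larger folders, the pairwise comparison above can be replaced by hashing each file once and grouping files by digest. The following is only a minimal sketch of that alternative, written in Python rather than batch, so it assumes Python 3 is available on the machine. The names find_duplicates and remove_duplicates are hypothetical helpers, not part of the accepted script. Unlike the batch loop, it compares raw bytes, keeps the first file in name order, and defaults to a dry run that deletes nothing.

```python
import hashlib
from pathlib import Path


def find_duplicates(folder, pattern="*.txt"):
    """Return paths whose bytes exactly match an earlier file (in name order)."""
    seen = {}        # content digest -> first file seen with that content
    duplicates = []
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)   # same content as seen[digest]
        else:
            seen[digest] = path
    return duplicates


def remove_duplicates(folder, pattern="*.txt", dry_run=True):
    """Report duplicates, and delete them only when dry_run is False."""
    duplicates = find_duplicates(folder, pattern)
    for path in duplicates:
        print(("would delete " if dry_run else "deleting ") + path.name)
        if not dry_run:
            path.unlink()
    return duplicates
```

Calling remove_duplicates(".") lists what would be removed from the current folder; passing dry_run=False performs the deletion. If saved as a script, it could be started from an existing batch file with a line such as "python dedupe.py", where the file name is again only an example.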

 

Author Comment

by:100questions
ID: 40342336
Does this script look into the contents of the txt file?
 
LVL 9

Expert Comment

by:dlb6597
ID: 40342351
Yes, it compares the file contents using the fc command.
 

Author Comment

by:100questions
ID: 40344239
This seems to work; however, one of the files being compared has a small right arrow at the end of its data (an ASCII EOF marker), and when that marker is present the script does not deduplicate properly.

Is there a way to modify your script so that it ignores the ASCII EOF marker?
 
LVL 9

Expert Comment

by:dlb6597
ID: 40344258
Then the files aren't identical, are they? There is a /L parameter for FC, but I doubt it will make any difference, since the files really are different.
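
If files that differ only by that trailing marker should still count as duplicates, one option is to strip the marker before comparing. The sketch below is in Python, not batch, and rests on an assumption: that the "small right arrow" is the Ctrl-Z byte (0x1A, ASCII SUB), which is drawn as a right arrow in the classic DOS code page. The helper names normalized_bytes and same_ignoring_eof are hypothetical and are not part of FC or the script above.

```python
from pathlib import Path

# Assumption: the EOF marker is Ctrl-Z (0x1A, ASCII SUB).
EOF_MARKER = b"\x1a"


def normalized_bytes(path):
    """Read a file and drop any Ctrl-Z bytes at the very end."""
    return Path(path).read_bytes().rstrip(EOF_MARKER)


def same_ignoring_eof(path_a, path_b):
    """True when two files match once trailing EOF markers are ignored."""
    return normalized_bytes(path_a) == normalized_bytes(path_b)
```

Only markers at the very end of a file are ignored; any other difference, including a missing final line break, still makes the files count as different.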
