How to remove duplication within a combined text file?
Posted on 2014-09-24
The following script is what I am starting with:
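REM Define file location
REM (E:\Folder and newfile.txt are the values described below; the work file name is a placeholder)
set "File1=E:\Folder\newfile.txt"
set "Workfile=%TEMP%\newfile.tmp"
REM Error if file does not exist
if not exist "%File1%" (
    echo File not found: "%File1%"
    exit /b 1
)
REM Remove duplicate lines from the file
copy NUL "%Workfile%" >NUL
for /f "tokens=* usebackq" %%B in ("%File1%") do (
    findstr /b /e /c:"%%B" /i "%Workfile%">NUL || echo.%%B>>"%Workfile%"
)
copy /y "%Workfile%" "%File1%" >NUL
if exist "%Workfile%" del "%Workfile%"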
My goal is to process a single file in a specific folder, i.e. Dir1=E:\Folder. Assuming it is already a combined file called newfile.txt, how can the script be modified to eliminate duplication based on the following condition:
1. Each set of data within the file starts with a line beginning with the characters ISA*00* .... and ends with a line beginning with IEA*1* .... There can be several sets of data that start and end this way, one after another. The script needs to compare whole sets of data and remove any set that duplicates an earlier one, leaving a single copy of each. The script above is good at eliminating duplicate lines; however, when an identical line legitimately appears in several different sets of data, it removes the later occurrences of that line, which is not the goal.
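To make the condition above concrete, here is a minimal sketch of block-level de-duplication. It is written in Python rather than batch, because comparing whole multi-line sets is awkward with findstr; treat it as an illustration of the logic, not as a drop-in replacement for the script. The function names, the default prefixes ("ISA*00*" and "IEA*1*", taken from the description above), and the handling of lines outside a set or of an unterminated final set are all assumptions.

```python
from pathlib import Path


def dedupe_blocks(lines, start="ISA*00*", end="IEA*1*"):
    """Keep the first copy of each start..end block and drop later exact repeats."""
    seen = set()   # blocks already kept, stored as tuples of lines
    kept = []      # output lines, in original order
    block = None   # lines of the block being collected, or None between blocks
    for line in lines:
        if block is None and line.startswith(start):
            block = [line]                 # a new set of data begins
        elif block is not None:
            block.append(line)
            if line.startswith(end):       # the set is complete: compare it as a whole
                key = tuple(block)
                if key not in seen:
                    seen.add(key)
                    kept.extend(block)
                block = None
        else:
            kept.append(line)              # assumption: lines outside any set are kept
    if block is not None:
        kept.extend(block)                 # assumption: an unterminated last set is kept
    return kept


def dedupe_file(path):
    """Rewrite the file in place with duplicate blocks removed (hypothetical helper)."""
    p = Path(path)
    kept = dedupe_blocks(p.read_text().splitlines())
    p.write_text("".join(line + "\n" for line in kept))
```

With this approach a line that appears in two different sets survives in both, because only complete sets are compared. Calling dedupe_file(r"E:\Folder\newfile.txt") would apply it to the file described above.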
A summarized, short form of what a file could contain: