rortiz77

Scan files for strings of text and log into another file

Problem: I have archives of files stored in directories by date. For example, the directory "z:\journal\fhodict\200702\25" holds 800 files/reports. They are all in HL7 format, viewable and editable as plain text. I need a way to scan all of the reports in a directory such as "z:\journal\fhodict\200702\25". The scan should look at the top 4 lines of each file, which look like this:

MSH|^~\`|DICT|020-01||FHIS|20070206235659||ORU^R01|
PID|1||001248568||BURKS^LENTON^E||
PV1|| .........................(NOT IMPORTANT)
OBR||RA070370369600|5459057|RA020006^CHEST,PORT SINGLE VW^01

What I need is a way to scan these files, grab specific parts of this header info, and log it into another file so it looks like the below:

20070224CARL BETH: PRE OP|RA070540600
20070224MILDRED NEOL: LINE PLACEMENT|RA07059100
20070224LAUTERIA GAURIDO J: CVA|RA070540400
20070224RODRIGUEZ YENELISA: SIZE & DATES|RA07056700
20070224KOBOW BELLA: COPD|RA0705400700

Each directory will have its own log file, saved as a .dat file. This will be done to many directories, but I don't mind doing it one by one. Also, this has to get rid of duplicate entries. Is this possible?

thanks!!!
alexcohn

There is no Windows built-in utility that will do the work for you. Install Perl, or Python, or maybe even cygwin with awk.
rortiz77

ASKER

Well, I was thinking more along the lines of a VB script or Batch that might be able to do this.  
There are no suitable batch commands; VBScript may be used, but it's not well suited for the task.
Are the first elements in each header line (MSH, PID, PV1, OBR) always the same?  Are these items unique to the header lines?
Yes, they are always in the same order and positioning.  They also have the same number of "bars" in between.  The bars are the delimiter.  
Given the following input:

MSH|^~\`|DICT|020-01||FHIS|20070206235659||ORU^R01|
PID|1||001248568||BURKS^LENTON^E||
PV1|| .........................(NOT IMPORTANT)
OBR||RA070370369600|5459057|RA020006^CHEST,PORT SINGLE VW^01

What would be the output?
Output would be...

20070206Burks Lenton E: CHEST, PORT SINGLE | RA070370369600

The 20070206 can come from the file system date somehow, or from the 1st line (MSH), which has that long string 20070206235659.
Or the date, 20070206, can be manually coded as a prefix on the output, and I'll modify it for each directory it scans. It would make that part easier :-)
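For comparison, here is a minimal Python sketch of the field extraction described above (Python being one of the tools suggested earlier in the thread). The function name `format_entry` is hypothetical, and the field positions are assumptions read off the sample header lines in this thread, not taken from the HL7 specification. It takes the date from the MSH line rather than from the file system.

```python
def format_entry(header_lines):
    """Build 'YYYYMMDDNAME: REPORT|ACCESSION' from the MSH, PID and OBR lines."""
    fields = {}
    for line in header_lines:
        parts = line.rstrip("\r\n").split("|")
        fields[parts[0]] = parts  # key each line by its first field (MSH, PID, ...)

    # Assumed positions, based on the sample header in this thread.
    date = fields["MSH"][6][:8]                                   # first 8 digits of the timestamp
    name = " ".join(p for p in fields["PID"][5].split("^") if p)  # LAST^FIRST^MI -> "LAST FIRST MI"
    accession = fields["OBR"][2]                                  # the RA... number
    report = fields["OBR"][4].split("^")[1]                       # text between the carets
    return f"{date}{name}: {report}|{accession}"


sample = [
    r"MSH|^~\`|DICT|020-01||FHIS|20070206235659||ORU^R01|",
    "PID|1||001248568||BURKS^LENTON^E||",
    "PV1|| .........................(NOT IMPORTANT)",
    "OBR||RA070370369600|5459057|RA020006^CHEST,PORT SINGLE VW^01",
]
print(format_entry(sample))
```

Unlike `for /F`, `str.split` keeps empty fields, so the index of each field stays the same even when a field between two bars is empty.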
@echo off
setlocal

set targetdir=z:\journal\fhodict\200702\25

set outputfile=output.txt

for /F "tokens=* usebackq" %%G in (`dir "%targetdir%\*.hl7" /B`) do (
 for /F "tokens=1,2,3,4,5,6 delims=|^" %%H in (%%G) do call :_process "%%H" "%%I" "%%J" "%%K" "%%L" "%%M"
)

goto :_end

:_process
if /I [%~1] EQU [MSH] set element1=%~6

if /I [%~1] EQU [PID] (
 set element2=%~4
 set element3=%~5
 if [%~6] NEQ [] set element4=%~6
)

if /I [%~1] EQU [OBR] (
 set element5=%~5
 set element6=%~2
)

if defined element6 (
 if defined element4 (
  echo %element1:~0,8%%element2% %element3% %element4%: %element5% ^| %element6% >> %outputfile%
 ) else (
  echo %element1:~0,8%%element2% %element3%:%element5% ^| %element6% >> %outputfile%
 )
)

goto :eof


:_end
endlocal
Here's the best I could come up with. The batch file accepts two optional parameters. The first is the directory where the files exist. The second is the output file name:

@echo off

setlocal enabledelayedexpansion

set fileMask=*.txt
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

set /a lineCnt=0

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

:PROCDONE

(echo %fld1%%fld2% %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

set fld1=%~1
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

for /f "tokens=2 delims=^" %%a in ('echo %~2') do set fld3=%%a

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
SteveGTR,

ok, so is this correct?

set fileMask= 'output.txt'
set workDir= 'C:\Test2'

Is that what you meant?
Ah, my script creates a bunch of duplicate entries.  I'll see if I can correct it.
No, if you named the batch file PROCTEXT.BAT you'd say:

PROCTEXT c:\test2 output.dat

I'd avoid naming the output file with the .txt extension because the file is created in the same directory where the *.txt files are located.
This is a little better.

@echo off
setlocal

set targetdir=z:\journal\fhodict\200702\25

set outputfile=output.dat

for /F "tokens=* usebackq" %%G in (`dir "%targetdir%\*.hl7" /B`) do (
 for /F "tokens=1,2,3,4,5,6 delims=|^" %%H in (%%G) do call :_process "%%H" "%%I" "%%J" "%%K" "%%L" "%%M"
)

goto :_end

:_process
if /I [%~1] EQU [MSH] set element1=%~6

if /I [%~1] EQU [PID] (
 set element2=%~4
 set element3=%~5
 if [%~6] NEQ [] set element4=%~6
)

if /I [%~1] EQU [OBR] (
 set element5=%~5
 set element6=%~2
)

if defined element6 (
 if defined element4 (
  echo %element1:~0,8%%element2% %element3% %element4%: %element5% ^| %element6% >> "%outputfile%"
 ) else (
  echo %element1:~0,8%%element2% %element3%:%element5% ^| %element6% >> "%outputfile%"
 )
 set element6=
)

goto :eof

:_end
endlocal
Shift-3,

When I tried your batch it formatted things correctly, but it created duplicates and only output 4 entries for the same person.

20060801BLANKENSHIP HOLLY R: 200608011019 | 4714683
20060801BLANKENSHIP HOLLY R: 200608011019 | 4714683
20060801BLANKENSHIP HOLLY R: 200608011019 | 4714683
20060801BLANKENSHIP HOLLY R: 200608011019 | 4714683
Shift-3,

This next one created only two duplicates but it's only looking at the same person.  It's not reading down the list of files.

20060801BLANKENSHIP HOLLY R: 200608011019 | 4714683
20060801BLANKENSHIP HOLLY R: 200608011019 | 4714683
SteveGTR,

It still makes no sense to me. Can you use your example and fill in the source directory as an example? What file format is it looking for? The destination already looks like output.dat, so that part is fine.
Ok, one more try.  This should process all the files and should remove all duplicates.  For the last part I borrowed a command from here:
http://www.jsifaq.com/SF/Tips/Tip.aspx?id=3530


@echo off
setlocal

set targetdir=z:\journal\fhodict\200702\25

set outputfile=output.dat

for /F "tokens=* usebackq" %%G in (`dir "%targetdir%\*.hl7" /B`) do (
 for /F "tokens=1,2,3,4,5,6 delims=|^ usebackq" %%H in ("%%G") do call :_process "%%H" "%%I" "%%J" "%%K" "%%L" "%%M"
)

for /F "tokens=* usebackq" %%S in (`Sort^<"%outputfile%"`) do @Find "%%S" tempoutput.txt||@echo %%S>>tempoutput.txt
move /Y tempoutput.txt "%outputfile%"

goto :_end

:_process
if /I [%~1] EQU [MSH] set element1=%~6

if /I [%~1] EQU [PID] (
 set element2=%~4
 set element3=%~5
 if [%~6] NEQ [] set element4=%~6
)

if /I [%~1] EQU [OBR] (
 set element5=%~5
 set element6=%~2
)

if defined element6 (
 if defined element4 (
  echo %element1:~0,8%%element2% %element3% %element4%: %element5% ^| %element6% >> "%outputfile%"
 ) else (
  echo %element1:~0,8%%element2% %element3%:%element5% ^| %element6% >> "%outputfile%"
 )
 set element1=
 set element2=
 set element3=
 set element4=
 set element5=
 set element6=
)

goto :eof

:_end
endlocal
Steve,

Ok, I get it. At the DOS prompt, type in "PROCTEXT c:\test2 output.dat". I ran it, but it has 3 problems:

1.  It only managed to extract 3 unique reports
2.  It didn't put in a ":" right after the person's name
3. It has a "2-3 V" right after the report name.
It appears my mask is incorrect. I was using *.txt; it should be *.hl7. Just run this batch file from the directory where the files exist (z:\journal\fhodict\200702\25). The processing doesn't produce exactly what you want, but it's close.

@echo off

setlocal enabledelayedexpansion

set fileMask=*.hl7
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

set /a lineCnt=0

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

:PROCDONE

(echo %fld1%%fld2% %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

set fld1=%~1
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

for /f "tokens=2 delims=^" %%a in ('echo %~2') do set fld3=%%a

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
Here are some adjustments with your latest comments taken into consideration:

@echo off

setlocal enabledelayedexpansion

set fileMask=*.hl7
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

set /a lineCnt=0

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

:PROCDONE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

set fld1=%~1
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

for /f "tokens=2 delims=^" %%a in ('echo %~2') do set fld3=%%a

set fld3=%fld3:~0,-3%

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
Steve,

Well, looks good...it's just reading one file :-)  MAN we're close hahaha!!!!!

Shift-3,

Yours has the perfect format, but it's also just reading one file and not the entire directory...not sure why, but both are soooo close!!!
Not sure if it's related, but I've been renaming the source files to .txt. Yes, they are HL7-formatted data, but the file extensions are all random: .234, .342, .830, and so on. That's why I renamed them to .txt.
My file looks for files with a specific mask (*.hl7). Is there a specific mask or should it just process all files?
All files in the directory would be best.
@echo off

setlocal enabledelayedexpansion

set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

set /a lineCnt=0

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

:PROCDONE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

set fld1=%~1
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

for /f "tokens=2 delims=^" %%a in ('echo %~2') do set fld3=%%a

set fld3=%fld3:~0,-3%

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
We may want to place a sanity check in place in case other files are in the directory. Is it true that the first field on the first line of each file will be equal to 'MSH'?

If so, then this should do the trick:

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

goto :EOF

:PROCDONE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if not "%~1"=="%checkFld%" set abort=Y&goto :EOF

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

for /f "tokens=2 delims=^" %%a in ('echo %~2') do set fld3=%%a

set fld3=%fld3:~0,-3%

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
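The same sanity check (process every file, but skip any whose first field is not MSH or that has fewer than four lines) can be sketched in Python as follows. This is only an illustration under stated assumptions: `read_headers` is a hypothetical name, and the default of four header lines mirrors the batch script above.

```python
import itertools
from pathlib import Path


def read_headers(directory, check_field="MSH", header_lines=4):
    """Yield (file name, first header lines) for each file that starts with MSH."""
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        with path.open("r", errors="replace") as handle:
            # Read only the top of the file; the rest of the report is ignored.
            lines = [line.rstrip("\r\n") for line in itertools.islice(handle, header_lines)]
        if len(lines) < header_lines:
            continue  # file must have at least four lines
        if lines[0].split("|")[0] != check_field:
            continue  # not an HL7 report (for example, the output file itself)
        yield path.name, lines
```

Because the check looks at file content rather than the extension, the randomly numbered extensions do not need to be renamed, and an output file written into the same directory is skipped as long as its first line does not begin with `MSH|`.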
@echo off
setlocal

set targetdir=z:\journal\fhodict\200702\25

set outputfile=output.dat

for /F "tokens=* usebackq" %%G in (`dir "%targetdir%\*.*" /B`) do (
 for /F "tokens=1,2,3,4,5,6 delims=|^ usebackq" %%H in ("%%G") do call :_process "%%H" "%%I" "%%J" "%%K" "%%L" "%%M"
)

for /F "tokens=* usebackq" %%S in (`Sort^<"%outputfile%"`) do @Find "%%S" tempoutput.txt||@echo %%S>>tempoutput.txt
move /Y tempoutput.txt "%outputfile%"

goto :_end

:_process
if /I [%~1] EQU [MSH] set element1=%~6

if /I [%~1] EQU [PID] (
 set element2=%~4
 set element3=%~5
 if [%~6] NEQ [] set element4=%~6
)

if /I [%~1] EQU [OBR] (
 set element5=%~5
 set element6=%~2
)

if defined element6 (
 if defined element4 (
  echo %element1:~0,8%%element2% %element3% %element4%: %element5% ^| %element6% >> "%outputfile%"
 ) else (
  echo %element1:~0,8%%element2% %element3%:%element5% ^| %element6% >> "%outputfile%"
 )
 set element1=
 set element2=
 set element3=
 set element4=
 set element5=
 set element6=
)

goto :eof

:_end
endlocal
Steve,

I keep getting:

C:\Test2>PROCTEXT c:\test2 output.dat
> was unexpected at this time.
Could be something in one of the files. Try this; it echoes each file name as it is processed:

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

goto :EOF

:PROCDONE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if not "%~1"=="%checkFld%" set abort=Y&goto :EOF

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

for /f "tokens=2 delims=^" %%a in ('echo %~2') do set fld3=%%a

set fld3=%fld3:~0,-3%

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
Steve,

Got this:

C:\Test2>PROCTEXT c:\test2 output.dat
Processing dt102050.479...
> was unexpected at this time.
Can you post the first 5 lines of dt102050.479?
MSH|^~\`|DICT|020-06||FHIS|20060801102049||ORU^R01|20060801102049000|T|2.2|1|L||||||
PID|1||002000049||RONO^JENERS||1002000||||||(407)517-7377|(407)540-4919||||16678092||||||||||
PV1||E|^+||||^^^^^^FRID, VICKI KY DO DR^Y9||||||||||^^^^^^FRID, VICKI KY DO DR^Y9|O|
OBR||RA062120313100|4700009|RA420137^KNEE,=>4 VWS-RT*^06|||
OBX|1|TX|xxxxxxxR^^^913||||||||F|||||||
It's that > sign in the OBR line 4. What do you expect this output to look like?
From that line all that is needed is

RA062120313100    and
KNEE,=>4 VWS-RT*

If special characters are messing it up then I'd just not include it as part of the export.  
Ok, when removed manually it begins to process until it hits another one with that character....at least now we know the problem!
Well, give this a try and cross your fingers :)

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

goto :EOF

:PROCDONE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if not "%~1"=="%checkFld%" set abort=Y&goto :EOF

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

set _esc="%~2"

set _esc=%_esc:^=@%
set _esc=%_esc:&=^^^&%
set _esc=%_esc:,=^^^,%
set _esc=%_esc:\=^^^\%
set _esc=%_esc:|=^^^|%
set _esc=%_esc:<=^^^<%
set _esc=%_esc:>=^^^>%

for /f "tokens=2-3 delims=@" %%a in ('echo %_esc%') do set fld3=%%a

set fld3=%fld3:~0,-4%

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
Steve,

This one got really far before it bombed out.  Not sure what caused it this time but here's the top 4 lines again and the output message.

MSH|^~\`|DICT|020-01||FHIS|20060801103218||
PID|1||000170002||NEWRI^TARIT^J||100000080000
PV1||E|7S^7329^01^732901||||
OBR||RA062120300000|4715572|RA490046^HUMERUS (R) 1-2 VWS^01|||

Processing dt103043.216...
Processing dt103053.154...
Processing dt103055.873...
Processing dt103116.780...
Processing dt103119.296...
Processing dt103121.702...
Processing dt103129.452...
Processing dt103153.78...
Processing dt103204.672...
Processing dt103219.657...
1-2|RA062120300000) was unexpected at this time.

Not sure if this is due to another special character but it almost looks like this issue came in the output process.
Actually, it seems to have been the "()" in the OBR line that threw it off. When removed it worked fine. Is there a way to add a line to the batch that automatically removes certain types of characters before it processes?
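As a point of comparison, characters such as `>`, `(` and `)` only cause trouble because the command interpreter treats them as syntax. In a language that handles the line as plain data they need no escaping. A minimal Python sketch, using the two OBR lines posted above (the names `obr_fields` and `strip_chars` are hypothetical, and the default list of unwanted characters is just an example):

```python
def obr_fields(obr_line):
    """Return (accession number, report description) from an OBR line."""
    parts = obr_line.rstrip("\r\n").split("|")
    components = parts[4].split("^")
    return parts[2], components[1]


def strip_chars(text, unwanted="<>()&"):
    """Remove every character listed in `unwanted`, if that is preferred."""
    return text.translate(str.maketrans("", "", unwanted))


knee = "OBR||RA062120313100|4700009|RA420137^KNEE,=>4 VWS-RT*^06|||"
humerus = "OBR||RA062120300000|4715572|RA490046^HUMERUS (R) 1-2 VWS^01|||"

print(obr_fields(knee))
print(obr_fields(humerus))
print(strip_chars(obr_fields(humerus)[1]))
```

The stripping step is optional here; it is shown only because removing the characters before processing was the question asked.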
We could handle it like the other special characters:

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

goto :EOF

:PROCDONE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if not "%~1"=="%checkFld%" set abort=Y&goto :EOF

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

set _esc="%~2"

set _esc=%_esc:^=@%
set _esc=%_esc:&=^^^&%
set _esc=%_esc:,=^^^,%
set _esc=%_esc:\=^^^\%
set _esc=%_esc:|=^^^|%
set _esc=%_esc:<=^^^<%
set _esc=%_esc:>=^^^>%
set _esc=%_esc:(=^^^(%
set _esc=%_esc:)=^^^)%

for /f "tokens=2-3 delims=@" %%a in ('echo %_esc%') do set fld3=%%a

set fld3=%fld3:~0,-4%

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
Another weird issue...not sure on this one.  

Processing fd150433.820...
Processing fd151501.540...
Processing fd151858.962...
Processing fd151952.712...
'RA070570180000' is not recognized as an internal or external command,
operable program or batch file.

MSH|^~\`|DICT|020-84||FHIS|20070226141951||
PID|1||000580016||MAULI^CHARLES^E||1
PV1||O|^+||||0000014008
OBR||RA070570180000|5540269|RA140026^ABD PAIN,DIARRHEA, R^84|||



Looks like it's taking too much off the end of fld3. I just changed the code to use the whole part.

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

goto :EOF

:PROCDONE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if not "%~1"=="%checkFld%" set abort=Y&goto :EOF

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

set _esc="%~2"

set _esc=%_esc:^=@%
set _esc=%_esc:&=^^^&%
set _esc=%_esc:,=^^^,%
set _esc=%_esc:\=^^^\%
set _esc=%_esc:|=^^^|%
set _esc=%_esc:<=^^^<%
set _esc=%_esc:>=^^^>%
set _esc=%_esc:(=^^^(%
set _esc=%_esc:)=^^^)%

for /f "tokens=2-3 delims=@" %%a in ('echo %_esc%') do set fld3=%%a

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
Nice!!!!  It ran with no problems!!!  The only thing I noticed was "duplicate" entries.  It's not a literal duplicate because the RA# is different but the Report Type is the same. For example on the below, "MAL NEOPL BREAST-CEN"  and “TEMILL DOKIS” together would be looked at by the system as a duplicate.  

20070226TEMILL DOKIS: MAL NEOPL BREAST-CEN|RA070000083400
20070226TEMILL DOKIS: MAL NEOPL BREAST-CEN|RA070000083500

Is there a way to have it not log another entry for the same person and report type if one exists?
If the lines are the same we could do a comparison prior to saving.

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

del _temp.dat 2>NUL

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

goto :EOF

:PROCDONE

if not exist "%outFile%" goto WRITELINE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>_temp.dat

findstr /G:_temp.dat "%outFile%" >NUL
if ERRORLEVEL 1 goto WRITELINE

goto :EOF

:WRITELINE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if not "%~1"=="%checkFld%" set abort=Y&goto :EOF

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

set _esc="%~2"

set _esc=%_esc:^=@%
set _esc=%_esc:&=^^^&%
set _esc=%_esc:,=^^^,%
set _esc=%_esc:\=^^^\%
set _esc=%_esc:|=^^^|%
set _esc=%_esc:<=^^^<%
set _esc=%_esc:>=^^^>%
set _esc=%_esc:(=^^^(%
set _esc=%_esc:)=^^^)%

for /f "tokens=2-3 delims=@" %%a in ('echo %_esc%') do set fld3=%%a

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
It ran but still got some duplicates…

20070226BIKER MARK A: HX CANCER|RA000550069700
20070226BIKER MARK A: HX CANCER|RA000550069800
That's because the lines are not the same. What is the unique portion that I can check?
Basically if the name and report type are the same it's a duplicate even though the RA# is different.  So it has to ignore the RA part.  

So if this part is the same its a duplicate:
BIKER MARK A: HX CANCER
BIKER MARK A: HX CANCER
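The rule stated above (same name and same report type means duplicate, whatever the RA number) can be sketched in Python like this. It is a hedged illustration: `dedupe` is a hypothetical name, and it assumes each logged line has the form `YYYYMMDDNAME: REPORT|RA...`, so the key is whatever sits between the 8-digit date and the last bar.

```python
def dedupe(entries):
    """Keep the first entry for each (name, report type) pair, ignoring the RA number."""
    seen = set()
    kept = []
    for entry in entries:
        # Drop the trailing "|RA..." part, then the leading 8-digit date.
        key = entry.rsplit("|", 1)[0][8:]
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept


logged = [
    "20070226BIKER MARK A: HX CANCER|RA000550069700",
    "20070226BIKER MARK A: HX CANCER|RA000550069800",
    "20070226TEMILL DOKIS: MAL NEOPL BREAST-CEN|RA070000083400",
]
print(dedupe(logged))
```

Note that the key combines the name and the report type. Comparing on the report type alone would also discard different patients who happen to have the same report type.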
@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

del _temp.dat 2>NUL

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

goto :EOF

:PROCDONE

if not exist "%outFile%" goto WRITELINE

(echo %fld3%)>_temp.dat

findstr /G:_temp.dat "%outFile%" >NUL
if ERRORLEVEL 1 goto WRITELINE

goto :EOF

:WRITELINE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if not "%~1"=="%checkFld%" set abort=Y&goto :EOF

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

set _esc="%~2"

set _esc=%_esc:^=@%
set _esc=%_esc:&=^^^&%
set _esc=%_esc:,=^^^,%
set _esc=%_esc:\=^^^\%
set _esc=%_esc:|=^^^|%
set _esc=%_esc:<=^^^<%
set _esc=%_esc:>=^^^>%
set _esc=%_esc:(=^^^(%
set _esc=%_esc:)=^^^)%

for /f "tokens=2-3 delims=@" %%a in ('echo %_esc%') do set fld3=%%a

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
Hmm...ok.  It got rid of nearly half the logs.  It went from 266 to 164.  Trying to verify if its accurate on a spreadsheet.  
Ok, after taking a look, it should only be removing about 6% of the total logs instead of 90%. When this was done in the past there were 2600 files, and the job produced an output of about 2500. Now if I run this job on 1600 files I get only 162 in the output file when really I should expect maybe 1500.

What's causing this?
I'd say there are more rows matching than you expect. Maybe if we dumped out the duplicates you could get a better idea.

One possibility is that a person has the same name as another person with the same symptom. Another is that a person has multiple entries on different dates for the same symptom.

What do you think?
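One thing worth checking in the script above: the comparison file `_temp.dat` holds only the report description (`%fld3%`), so `findstr` appears to treat any earlier line containing that description as a match, regardless of the patient name. Dumping the groups of duplicates, as suggested, would make this visible. A minimal Python sketch of such a dump, assuming the undeduplicated log lines have the form `YYYYMMDDNAME: REPORT|RA...` (the name `duplicate_groups` is hypothetical):

```python
from collections import defaultdict


def duplicate_groups(entries):
    """Group entries by 'NAME: REPORT' and return only groups with more than one line."""
    groups = defaultdict(list)
    for entry in entries:
        key = entry.rsplit("|", 1)[0][8:]  # strip the RA number and the 8-digit date
        groups[key].append(entry)
    return {key: lines for key, lines in groups.items() if len(lines) > 1}


logged = [
    "20070226BIKER MARK A: HX CANCER|RA000550069700",
    "20070226BIKER MARK A: HX CANCER|RA000550069800",
    "20070226TEMILL DOKIS: HX CANCER|RA070000083400",
]
for key, lines in duplicate_groups(logged).items():
    print(key, len(lines))
```

In this made-up sample, only the two BIKER lines form a duplicate group; the TEMILL line shares the report type but not the name, so it is kept.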
Maybe if you can take off the compare before writing and then I'll see how many it logs after that.  
Sure we can bypass that processing:

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

del _temp.dat 2>NUL

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

goto :EOF

:PROCDONE

REM ** Bypass duplicate processing
goto WRITELINE

if not exist "%outFile%" goto WRITELINE

(echo %fld3%)>_temp.dat

findstr /G:_temp.dat "%outFile%" >NUL
if ERRORLEVEL 1 goto WRITELINE

goto :EOF

:WRITELINE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if not "%~1"=="%checkFld%" set abort=Y&goto :EOF

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

set _esc="%~2"

set _esc=%_esc:^=@%
set _esc=%_esc:&=^^^&%
set _esc=%_esc:,=^^^,%
set _esc=%_esc:\=^^^\%
set _esc=%_esc:|=^^^|%
set _esc=%_esc:<=^^^<%
set _esc=%_esc:>=^^^>%
set _esc=%_esc:(=^^^(%
set _esc=%_esc:)=^^^)%

for /f "tokens=2-3 delims=@" %%a in ('echo %_esc%') do set fld3=%%a

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
Hmm...ok. When I run it without checking for duplicates it gives me the correct number of 1690 logs. Also, I looked at the data on a spreadsheet and it's definitely not a case of different people with the same names and reports, or a person with multiple entries on different dates for the same symptom.

There's something else we're missing....
Ok, I just did a sort for unique records in Excel and it shows me the correct amount: 1570 unique entries from 1690 files.
It could be the switches I'm using for findstr. I didn't include the case-insensitive (/I) or literal (/L) switches, and I should have. Try this:

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

del _temp.dat 2>NUL

popd

if exist "%outFile%" (echo Output in %outFile%)&goto :EOF

echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

goto :EOF

:PROCDONE

if not exist "%outFile%" goto WRITELINE

(echo %fld3%)>_temp.dat

findstr /I /L /G:_temp.dat "%outFile%" >NUL
if ERRORLEVEL 1 goto WRITELINE

goto :EOF

:WRITELINE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if not "%~1"=="%checkFld%" set abort=Y&goto :EOF

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

set _esc="%~2"

set _esc=%_esc:^=@%
set _esc=%_esc:&=^^^&%
set _esc=%_esc:,=^^^,%
set _esc=%_esc:\=^^^\%
set _esc=%_esc:|=^^^|%
set _esc=%_esc:<=^^^<%
set _esc=%_esc:>=^^^>%
set _esc=%_esc:(=^^^(%
set _esc=%_esc:)=^^^)%

for /f "tokens=2-3 delims=@" %%a in ('echo %_esc%') do set fld3=%%a

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
Nope, back down to 250 logs.
Could be that lines are being excluded because they don't start with the initial tag of MSH.
All the files there start with MSH.  
Time to debug...

Check out skiprec.log after processing completes:

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat
set foundFiles=

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL
del skiprec.log 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

del _temp.dat 2>NUL
del _temp2.dat 2>NUL

if exist "%outFile%" (echo Output in %outFile%)&set foundFiles=Y
if exist skiprec.log (echo Skipped record log in skiprec.log)

popd

if "%foundFiles%"=="" echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set fileName=%~1

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

(echo %~1: There were not at least 4 lines in the file)>>skiprec.log

goto :EOF

:PROCDONE

if not exist "%outFile%" goto WRITELINE

(echo %fld3%)>_temp.dat

findstr /I /L /G:_temp.dat "%outFile%" >_temp2.dat
if ERRORLEVEL 1 goto WRITELINE

(echo %~1: Matched found in output file on %fld3%)>>skiprec.log

goto :EOF

:WRITELINE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if "%~1"=="%checkFld%" goto PROC0_CNT

set abort=Y
(echo %~1: %~1 does not equal %checkFld% on 1st field in 1st line of file)>>skiprec.log
goto :EOF

:PROC0_CNT

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld4=%~1

set _esc="%~2"

set _esc=%_esc:^=@%
set _esc=%_esc:&=^^^&%
set _esc=%_esc:,=^^^,%
set _esc=%_esc:\=^^^\%
set _esc=%_esc:|=^^^|%
set _esc=%_esc:<=^^^<%
set _esc=%_esc:>=^^^>%
set _esc=%_esc:(=^^^(%
set _esc=%_esc:)=^^^)%

for /f "tokens=2-3 delims=@" %%a in ('echo %_esc%') do set fld3=%%a

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
90% of the files were listed in skiprec.log:
dt001821.652: Matched found in output file on CT PELVIS W
dt002316.857: Matched found in output file on CHEST^,PORT SINGLE VW
dt002441.61: Matched found in output file on CTA CHEST
dt002553.577: Matched found in output file on CT BRAIN WO
dt002723.390: Matched found in output file on CT BRAIN WO
dt002845.625: Matched found in output file on CT BRAIN WO

What is the logic being used for the comparison?
Can you post back the first 4 lines of:

dt002553.577
dt002723.390
dt002845.625

dt002553.577
MSH|^~\`|DICT|020-02||FHIS|20070105232551||
PID|1||000071463||SWER^MION^K||190008130000||
PV1||E|^+||||^^^^^^CHAN, HAANG L. MD DR^Y9|
OBR||RA070000304900|5329834|RA420070^CT BRAIN WO^02||

dt002723.390
MSH|^~\`|DICT|020-01||FHIS|20070105232722||
PID|1||000050753||LEMON^DORTHY^R||
PV1||E|5ETW^5209^01^520901|||
OBR||RA070050002200|5329835|RA140009^CT BRAIN WO^01||

dt002845.625
MSH|^~\`|DICT|020-03||FHIS|20070105232844||
PID|1||000500426||CREY^JAMES^B||
PV1||E|^+||||^^^^^^BALER, STEVEN J. MD DR^Y9||
OBR||RA070050307000|5329837|RA420070^CT BRAIN WO^03||
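As a reference for how the four header lines map onto one log line, here is a rough sketch in Python (not the batch solution from this thread). The function name summarize is hypothetical. The field positions are read off the samples above: date in MSH field 6, name in PID field 5, accession number in OBR field 2, and report name as the second ^ component of OBR field 4, all counted from 0 after splitting on |.

```python
def summarize(header_lines):
    """Build one log line from the first four lines of an HL7 report."""
    msh = header_lines[0].split("|")
    pid = header_lines[1].split("|")
    obr = header_lines[3].split("|")
    if msh[0] != "MSH":
        raise ValueError("first line is not an MSH segment")
    date = msh[6][:8]                     # keep YYYYMMDD, drop the time
    person = pid[5].replace("^", " ")     # SWER^MION^K -> SWER MION K
    report = obr[4].split("^")[1]         # middle component is the report name
    accession = obr[2]
    return f"{date}{person}: {report}|{accession}"

# First four lines of dt002553.577 as posted above.
sample = [
    "MSH|^~\\`|DICT|020-02||FHIS|20070105232551||",
    "PID|1||000071463||SWER^MION^K||190008130000||",
    "PV1||E|^+||||^^^^^^CHAN, HAANG L. MD DR^Y9|",
    "OBR||RA070000304900|5329834|RA420070^CT BRAIN WO^02||",
]
print(summarize(sample))  # 20070105SWER MION K: CT BRAIN WO|RA070000304900
```

Note that Python's split keeps empty fields, so the indexes differ from the token numbers in the batch script, where for /f skips empty fields between consecutive | delimiters.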
My comparison is being done on "CT BRAIN WO". What should it be done on? What makes each of these unique?
These are all different people.  Only when you have:

John Doe: Head injury
John Doe: Head injury

should it be considered a duplicate and only log one version to say:

John Doe: Head injury

So if John Doe with Head injury is on the next line it would skip over it and continue to the next file.
Yes, I see. So we should match on the person and the line 4 item?
Correct, on person and report type.
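To make the agreed rule concrete, here is a small sketch in Python (an illustration only, not the batch solution). It keeps the first record for each person plus report type pair and ignores case, similar in spirit to findstr /I. The function name dedupe is made up. The first three records come from the sample files above; the fourth is a repeat added for illustration.

```python
def dedupe(records):
    """Keep the first record for each (person, report) pair, ignoring case."""
    seen = set()
    kept = []
    for date, person, report, accession in records:
        key = (person.upper(), report.upper())
        if key in seen:
            continue  # same person and same report type: skip it
        seen.add(key)
        kept.append(f"{date}{person}: {report}|{accession}")
    return kept

records = [
    ("20070105", "SWER MION K", "CT BRAIN WO", "RA070000304900"),
    ("20070105", "LEMON DORTHY R", "CT BRAIN WO", "RA070050002200"),
    ("20070105", "CREY JAMES B", "CT BRAIN WO", "RA070050307000"),
    ("20070105", "SWER MION K", "CT BRAIN WO", "RA070000304900"),  # illustrative repeat
]
for line in dedupe(records):
    print(line)
```

All three patients survive even though they share the report type CT BRAIN WO, and only the repeated SWER MION K entry is dropped. Matching on the report type alone would have kept just the first of the four.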
ASKER CERTIFIED SOLUTION
SteveGTR
You got it!!!!!!!!!  
Wow, this was a tough one... I'm no programmer, so this works great!  Thanks for your patience... you rock!!!
Glad it works :)
Steve,

Just noticed another problem :-(  
It crashes when looking for the Report Name and finds ^^ (nothing) in its place.  

OBR||RA070000075100|5500040|RA400107^^01|

where a normal one looks like:
OBR||RA070000181000|5007126|RA000001^INFILTRATE^01|

Can it be set to ignore ones where it is left blank?
Give this a try:

@echo off

setlocal enabledelayedexpansion

set checkFld=MSH
set fileMask=*.*
set workDir=.

if not "%~1"=="" set workDir=%~1

set outFile=output.dat
set foundFiles=

if not "%~2"=="" set outFile=%~2

pushd "%workDir%"

del "%outFile%" 2>NUL
del skiprec.log 2>NUL

for /f "tokens=*" %%a in ('dir /b /a-d "%fileMask%" 2^>NUL') do call :PROCESS "%%a"

del _temp.dat 2>NUL
del _temp2.dat 2>NUL

if exist "%outFile%" (echo Output in %outFile%)&set foundFiles=Y
if exist skiprec.log (echo Skipped record log in skiprec.log)

popd

if "%foundFiles%"=="" echo No files found

goto :EOF

:PROCESS

echo Processing %~1...

set fileName=%~1

set /a lineCnt=0
set abort=

for /f "tokens=1-9 delims=|" %%a in ('type "%~1"') do (
  call :PROCLINE "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i"
  if "!abort!"=="Y" goto :EOF
  set /a lineCnt+=1
  if /i !lineCnt! EQU 4 goto :PROCDONE
)

REM ** File must have at least 4 lines

(echo %~1: There were not at least 4 lines in the file)>>skiprec.log

goto :EOF

:PROCDONE

if not exist "%outFile%" goto WRITELINE

(echo %fld2%: %fld3%)>_temp.dat

findstr /I /L /G:_temp.dat "%outFile%" >_temp2.dat
if ERRORLEVEL 1 goto WRITELINE

(echo %~1: Matched found in output file on %fld2%: %fld3%)>>skiprec.log

goto :EOF

:WRITELINE

(echo %fld1%%fld2%: %fld3%^|%fld4%)>>"%outFile%"

goto :EOF

:PROCLINE

if /i %lineCnt% EQU 0 call :PROC0 "%~1" "%~6"
if /i %lineCnt% EQU 1 call :PROC1 "%~4"
if /i %lineCnt% EQU 3 call :PROC3 "%~2" "%~4"

goto :EOF

:PROC0

if "%~1"=="%checkFld%" goto PROC0_CNT

set abort=Y
(echo %~1: %~1 does not equal %checkFld% on 1st field in 1st line of file)>>skiprec.log
goto :EOF

:PROC0_CNT

set fld1=%~2
set fld1=%fld1:~0,8%

goto :EOF

:PROC1

call :FIXCARET fld2 "%~1"

goto :EOF

:PROC3

set fld3=
set fld4=%~1

set _esc="%~2"

set _esc=%_esc:^=@%
set _esc=%_esc:&=^^^&%
set _esc=%_esc:,=^^^,%
set _esc=%_esc:\=^^^\%
set _esc=%_esc:|=^^^|%
set _esc=%_esc:<=^^^<%
set _esc=%_esc:>=^^^>%
set _esc=%_esc:(=^^^(%
set _esc=%_esc:)=^^^)%

set _t=%_esc:@@@@@@@@=%

REM Test for a blank field
if not %_t%==%_esc% goto :EOF

for /f "tokens=2 delims=@" %%a in ('echo %_esc%') do set fld3=%%a

goto :EOF

:FIXCARET

set _fc=%~2
set _fc=%_fc:^^^^= %

set %~1=%_fc%

goto :EOF
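For comparison, the blank report name case is easier to see outside of batch. In cmd, for /f treats a run of consecutive delimiters as a single delimiter, so an empty component such as the one in RA400107^^01 never shows up as its own token. Below is a minimal sketch in Python (an illustration only, with a made-up function name report_name) that keeps the empty component and reports it as missing so the caller can skip the file.

```python
def report_name(obr_line):
    """Return the report name from an OBR line, or None if it is blank."""
    fields = obr_line.split("|")
    if len(fields) < 5:
        return None                      # line too short to hold the field
    parts = fields[4].split("^")         # e.g. ['RA400107', '', '01']
    name = parts[1].strip() if len(parts) > 1 else ""
    return name or None

blank = "OBR||RA070000075100|5500040|RA400107^^01|"
normal = "OBR||RA070000181000|5007126|RA000001^INFILTRATE^01|"

print(report_name(blank))   # None
print(report_name(normal))  # INFILTRATE
```

A caller would simply skip any file for which report_name returns None, which is the "ignore ones where it is left blank" behaviour asked for above.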