Link to home
Start Free TrialLog in
Avatar of ptoomen
ptoomen

asked on

windows batch script

I have a requirement where i need to read a text file, process it and create an output csv text file.

=> Input file is below:
                        LEDGER REPORT        Date:28-SEP-15 09:15:33
                ABC XY REPORTING                                 Page: 1
                           Reporting period: SEP-2015
Currency: USD
No additional requested
              OPENING
ENTITY    RESULTS          ACCOUNT
---------   ----------     ----------
2345          72,345.29     100140
2345X          7,500.23     050677
2345             500.23     050677
                        LEDGER REPORT        Date:28-SEP-15 09:15:33
                ABC XY REPORTING                                 Page: 2
                           Reporting period: SEP-2015
Currency: USD
No additional requested
              OPENING
ENTITY    RESULTS           ACCOUNT
---------   ----------      ----------
2345         (77,345.29)    100140
2345X          3,500.23     050677


=> Output File is below:
ABC Company
H1,M1,F1,G1,07/31/2015,08/31/2015
ABC,100140,72345.29,0,TB,D,XYZ
ABC,050677,7500.23,0,XY,D,XYZ
ABC,050677,500.23,0,TB,D,XYZ
ABC,100140,-77345.29,0,TB,C,XYZ
ABC,050677,3500.23,0,XY,D,XYZ
L1,M,DDD,07/31/2015,08/31/2015,5

=> Few additional Notes:
1. The fields in the input file are to be processed based on the position of the field as each field is fixed width
2. The total number of records in the output file are 2 headers + 1 footer + number of records in the input file (excluding the input file headers)
3. The 1st header is hardcoded as ABC Company
4. The 2nd header in the output file has 2 dates - please assume these dates are present in the file name and to be pulled from the file name. Remaining fields in the 2nd header of the output file are hardcoded values
5. The output data has 7 fields - details below:
     Field1 - ABC
     Field2 - from column 29 of input field (length 6)
     Field3 - from column 6 of input field length (17)
     Field4 - 0
     Field5 - TB (if 1st field of input file is 2345)
            - XY (if 1st field of input file is 2345X)
     Field6 - D (if 2nd field of input file is >= 0)
            - C (if 2nd field of input file is < 0)
     Field7 - XYZ
6. Trailer record of output file has hardcoded values L1, M, DDD and same dates as in the header record of output file and also number of records written to the output file (in this case 5)
Avatar of NVIT
NVIT
Flag of United States of America image

This works. Make file test.bat of this code:
@echo off
setlocal enabledelayedexpansion
echo ABC Company
for %%a in (input.txt) do (set dt=%%~ta)
echo H1,M1,F1,G1,%dt%,%dt%

for /f "tokens=1-3 skip=8" %%a in (input.txt) do (call :doit %%a %%b %%c)
echo L1,M,DDD,%dt%,%dt%,%count%
goto :eof

:doit
if %1 equ LEDGER goto :end
if %1 equ ABC goto :end
if %1 equ Reporting goto :end
if %1 equ Currency: goto :end
if %1 equ No goto :end
if %1 equ OPENING goto :end
if %1 equ ENTITY goto :end
if %1 equ --------- goto :end

set plusmin=%2

if /i 2345 equ %1 set f5=TB
if /i 2345X equ %1 set f5=XY
if /i %plusmin% gtr 0 set f6=D
if /i %plusmin% lss 0 set f6=C
if /i %plusmin:~0,1% equ ( set f6=C
echo %1,%2,%3,0,!f5!,!f6!,XYZ
set /a count=%count%+1
)
goto :eof

:end

Open in new window

Avatar of ptoomen
ptoomen

ASKER

Thanks a lot NVIT for your quick help! This worked perfectly fine for this example.

I had few more questions:
1. For Field2 and Field 3, I would like to have it processed based on the position. This is because there may be some records where some columns have spaces in between thus throwing off the records.
2. Currently data gets echoed on the command prompt. How to write the output to a csv file?
3. There are 2 dates that needs to be written into the file. Either it could be present in the file name. eg: input_20150920_20150926.txt or input file can be input.txt and the dates could be in another file called dates.txt where it could have 2 records for the 2 dates (whichever is easier).
4. Instead of ignoring certain lines eg: if it starts with LEDGER, AB C etc is it possible to have the logic so that it processes the line only if the line starts with 2345 or 2345X
> Field2 and Field 3, I would like to have it processed based on the position. This is because there may be some records where some columns have spaces in between thus throwing off the records.

Can you give an example of how the columns may differ?
Try this; it expects the input file name as first argument (or you can just drag the input file on the script in Explorer), and will read the dates from the file name.
The output file name will be the same as the input file name, with the extension '.csv'.
You can define your settings in lines 21-41; there should be no need to change anything below that.
Check the fixed field definitions; since you didn't post the input file as an attachment or inside a code box, spaces might have been lost. Especially your definition for Field 3 (one-based columns 6-17) didn't match with the input file, it would for example have extracted '        (77,345.2' from page 2,
@echo off
setlocal enabledelayedexpansion
if "%~1"=="" (
	echo Syntax:
	echo %~nx0 ^<Input File^>
	pause
	exit /b 1
)
if not exist "%~1" (
	echo Input file '%~1' not found.
	pause
	exit /b 1
)
for /f "tokens=2,3 delims=_" %%a in ("%~n1") do (
	set FileDate1=%%a
	set FileDate2=%%b
)
set InputFile=%~f1
set OutputFile=%~dpn1.csv

set Header1=ABC Company
set Header2=H1,M1,F1,G1,%FileDate1%,%FileDate2%
set Footer=L1,M,DDD,%FileDate1%,%FileDate2%
REM *** Field 1; static definition:
set Field[1]=ABC
REM *** Field 2 fixed width extraction: <ZERO-based index of the first character>,<character count>
set Field2_FixedWidth=28,6
REM *** Field 3 fixed width extraction: <ZERO-based index of the first character>,<character count>
set Field3_FixedWidth=7,17
REM *** Field 4; static definition:
set Field[4]=0
REM *** Field 5 fixed width extraction: <ZERO-based index of the first character>,<character count>
set Field5_FixedWidth=0,6
REM *** Hashtable for Field 5; format: Entity[<Entity name>]=<Value for Field 5>
set Entity[2345]=TB
set Entity[2345X]=XY
REM *** Hashtable for Field 6
set Results[positive]=D
set Results[negative]=C
REM *** Field 7; static definition:
set Field[7]=XYZ

set /a RecordCount = 0
set Search=
for /f "tokens=2 delims=[]" %%a in ('set Entity[') do set Search=!Search! /c:"^^%%a "
 >"%OutputFile%" echo %Header1%
>>"%OutputFile%" echo %Header2%
echo Processing '%InputFile%' ...
for /f "delims=" %%a in ('type "%InputFile%" ^| findstr.exe /i /r %Search%') do call :Process "%%a"
>>"%OutputFile%" echo %Footer%,%RecordCount%
echo ... done; output file is '%OutputFile%':
echo.
type "%OutputFile%"
echo.
pause
goto :eof

:Process
set Line=%~1
set /a RecordCount += 1
call :Trim Field[2] "!Line:~%Field2_FixedWidth%!"
call :Trim Results "!Line:~%Field3_FixedWidth%!"
set Results=%Results:,=%
if "%Results:~0,1%"=="(" goto Negative
:Positive
set Field[3]=%Results%
set Field[6]=%Results[positive]%
goto Entity
:Negative
set Field[3]=-%Results:~1,-1%
set Field[6]=%Results[negative]%
:Entity
call :Trim Entity "!Line:~%Field5_FixedWidth%!"
set Field[5]=!Entity[%Entity%]!
>>"%OutputFile%" echo %Field[1]%,%Field[2]%,%Field[3]%,%Field[4]%,%Field[5]%,%Field[6]%,%Field[7]%
goto :eof

:Trim <ByRef Var> <String>
set __String=%~2
:TrimStart
if "!__String:~0,1!"==" " (set __String=!__String:~1!&goto TrimStart)
:TrimEnd
if "!__String:~-1!"==" " (set __String=!__String:~0,-1!&goto TrimEnd)
set %1=%__String%
goto :eof

Open in new window

Output generated from your input file, saved as 'input_20150920_20150926.txt':
ABC Company
H1,M1,F1,G1,20150920,20150926
ABC,100140,72345.29,0,TB,D,XYZ
ABC,050677,7500.23,0,XY,D,XYZ
ABC,050677,500.23,0,TB,D,XYZ
ABC,100140,-77345.29,0,TB,C,XYZ
ABC,050677,3500.23,0,XY,D,XYZ
L1,M,DDD,20150920,20150926,5

Open in new window

Avatar of ptoomen

ASKER

Great - thank oBdA! This also works great.

Needed some help still. The sample input file I had placed had only 3 columns and output has 7 coulmns. In reality, the input file has many columns (maybe around 25) and the output file file is actually 14 columns. If I want to add more fields to the output file, which part of the code should I change?

I added more columns in the below line to add more columns
>>"%OutputFile%" echo %Field[1]%,%Field[2]%,%Field[3]%,%Field[4]%,%Field[5]%,%Field[6]%,%Field[7]%

Also, I modified the below section of the code to suit my needs

REM *** Field 2 fixed width extraction: <ZERO-based index of the first character>,<character count>
set Field2_FixedWidth=28,6

Also, in one of the columns in the input file it has the ampersand symbol in some records  in the data and it is throwing an error saying inoperable command or so when it processes those records.

For some reason, the TB and XY values are not getting written in my output file in my modified code
set Entity[2345]=TB
set Entity[2345X]=XY
ASKER CERTIFIED SOLUTION
Avatar of oBdA
oBdA

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of ptoomen

ASKER

Great - thanks oBdA!!! Few more items I need help on....

1. The dates will not be input as parameters as earlier thought

If you look at the 3rd line in input file it says Reporting period : SEP-2015
So, Date1 in the output file will be 9/30/2015 (last date of month)
Date2 will be the run date of the program, If program is run on 1st Oct, it would be 10/1/2015

2. In the trailer record, I need the sum of 2nd column of input file just after the dates in the output trailer record. So, the output trailer record would like this
L1,M,DDD,09/30/2015,10/1/2015,6500.69,5

3. Is it possible for the output file to be names as OutputFile_MMYYYY.csv
where MMYYY would be the date mentioned in the Reporting period
eg: if its Reporting period : SEP-2015
The output file name would be OutputFile_092015.csv

4. Instead of echoeing the displays, is it possible to log it into an output log file in the same folder? say OutputFile_MMYYYY.log
Avatar of ptoomen

ASKER

hi oBdA - your solution worked perfectly. i have few more questions - kindly get back on those when you get some time
That should take care of the rest, with a word of caution:
It expects the amounts to always have two digit for the cents (so no "123.5"), and it can only handle amounts and sums that never exceed ~21,000,000. Batch only has integer arithmetic, and is limited from 2^31-1 to -2^31.
And the leap year test is rather crude, but I really don't expect this script to still be running in 2100.
@echo off
setlocal enabledelayedexpansion
if "%~1"=="" (
	echo Syntax:
	echo %~nx0 ^<Input File^>
	pause
	exit /b 1
)
if not exist "%~1" (
	echo Input file '%~1' not found.
	pause
	exit /b 1
)
set InputFile=%~f1
REM Report period timestamp will be inserted before the extension
set OutputFile=%~dp1OutputFile_.csv

set Header1=ABC Company
set Header2=H1,M1,F1,G1
set Footer=L1,M,DDD
REM *** Field 1; static definition:
set Field[1]=ABC
REM *** Field 2 fixed width extraction: <ZERO-based index of the first character>,<character count>
set Field2_FixedWidth=28,9
REM *** Field 3 fixed width extraction: <ZERO-based index of the first character>,<character count>
set Field3_FixedWidth=7,17
REM *** Field 4; static definition:
set Field[4]=0
REM *** Field 5 fixed width extraction: <ZERO-based index of the first character>,<character count>
set Field5_FixedWidth=0,6
REM *** Hashtable for Field 5; format: Entity[<Entity name>]=<Value for Field 5>
set Entity[2345]=TB
set Entity[2345X]=XY
REM *** Hashtable for Field 6
set Results[positive]=D
set Results[negative]=C
REM *** Field 7; static definition:
set Field[7]=XYZ

call :GetTimeStamp TimeStampMDY
set ReportingPeriod[JAN]=01/31
if %Leap%==FALSE set (ReportingPeriod[FEB]=02/28) else (ReportingPeriod[FEB]=02/29)
set ReportingPeriod[MAR]=03/31
set ReportingPeriod[APR]=04/30
set ReportingPeriod[MAY]=05/31
set ReportingPeriod[JUN]=06/30
set ReportingPeriod[JUL]=07/31
set ReportingPeriod[AUG]=08/31
set ReportingPeriod[SEP]=09/30
set ReportingPeriod[OCT]=10/31
set ReportingPeriod[NOV]=11/30
set ReportingPeriod[DEC]=12/31
for /f "tokens=3,4 delims=- " %%a in ('type "%InputFile%" ^| find.exe /i "Reporting period: "') do (set DateReport=!ReportingPeriod[%%a]!/%%b)
for /f "tokens=1,2 delims=/" %%a in ("%DateReport%") do (set DateFile=%%a%%b)
for %%a in ("%OutputFile%") do (
	set OutputFile=%%~dpna%DateFile%%%~xa
	set OutputLog=%%~dpna%DateFile%.log
)
REM set OutputLog=CON:

set /a RecordCount = 0
set Search=
for /f "tokens=2 delims=[]" %%a in ('set Entity[') do set Search=!Search! /c:"^^%%a "
 >"%OutputFile%" echo %Header1%
>>"%OutputFile%" echo %Header2%,%DateReport%,%TimeStampMDY%
>>"%OutputLog%" echo Processing '%InputFile%' ...
set /a Sum = 0
for /f "delims=" %%a in ('type "%InputFile%" ^| findstr.exe /i /r %Search%') do call :Process "%%a"
>>"%OutputFile%" echo %Footer%,%DateReport%,%Sum:~0,-2%.%Sum:~-2%,%TimeStampMDY%,%RecordCount%
>>"%OutputLog%" echo ... done; output file is '%OutputFile%':
>>"%OutputLog%" echo.
>>"%OutputLog%" type "%OutputFile%"
>>"%OutputLog%" echo.
pause
goto :eof

:Process
set "Line=%~1"
set /a RecordCount += 1
call :Trim Field[2] "!Line:~%Field2_FixedWidth%!"
call :Trim Results "!Line:~%Field3_FixedWidth%!"
set Results=%Results:,=%
if "%Results:~0,1%"=="(" goto Negative
:Positive
set Field[3]=%Results%
set Field[6]=%Results[positive]%
goto Entity
:Negative
set Field[3]=-%Results:~1,-1%
set Field[6]=%Results[negative]%
:Entity
call :Trim Entity "!Line:~%Field5_FixedWidth%!"
>>"%OutputLog%" ECHO Extracted entity: '!Entity!'; value from hashtable: '!Entity[%Entity%]!'
set Field[5]=!Entity[%Entity%]!
set Output=
for /l %%i in (1, 1, 100) do (
	if defined Field[%%i] (set Output=!Output!,!Field[%%i]!) else (goto WriteLine)
)
:WriteLine
set /a Sum += %Field[3]:.=%
>>"%OutputFile%" echo !Output:~1!
goto :eof

:Trim <ByRef Var> <String>
set "__String=%~2"
:TrimStart
if "!__String:~0,1!"==" " (set __String=!__String:~1!&goto TrimStart)
:TrimEnd
if "!__String:~-1!"==" " (set __String=!__String:~0,-1!&goto TrimEnd)
set "%1=!__String!"
goto :eof

:GetTimeStamp
for /f "delims=" %%a in ('wmic.exe Path Win32_LocalTime Get /format:list ^| find.exe "="') do (for /f "tokens=1* delims==" %%b in ("%%a") do set %%b=%%c)
for %%a in (Month Day Hour Minute Second) do (if !%%a! LSS 10 set %%a=0!%%a!)
set /a Leap = Year %% 4
if %Leap%==0 (set Leap=TRUE) else (set Leap=FALSE)
set __TimeStamp=%Month%/%Day%/%Year%
set %1=%__TimeStamp%

Open in new window

Avatar of ptoomen

ASKER

1. In the input file I used, one of the values was larger than the admissible value allowed (less than -2^31). Is it possible to divide it by say 1000 or 1000000 while processing each record and then multiply the final sum by 1000 or 1000000 while writing the trailer record. This is because the sum would ideally be 0.

2. When I ran the program, I got the following messages on the command prompt

Invalid number. Numbers are limited to 32-bits of precision.

and few occurences of the below
Invalid number. Numeric constant are either decimal (17), hexadecimal (0x11) or octal (021).

I'm not sure if the above caused any issues as the final sum was not correct either due to above errors or maybe due to the record having the larger than admissible value.

3. The cents are always 2 digits - so, we're ok. Also, I believe leap year is ok.
Avatar of ptoomen

ASKER

Oh - sorry - you mentioned it supports only integer arithmetic...hmm - thats a problem.
Well, the integer arithmetic by itself isn't a problem, as long as the numbers stay below (2^31-1) / 100, so you can calculate with cents (hence the ~21,000,000 I mentioned above).
Since the sums involved seem to be larger, and since it's about money, you should probably post a new question in the Powershell TA, which is better prepared to handle these kinds of sums (and strings with strange characters in it) than Batch.
Feel free to post a link to the new question here.