Link to home
Start Free TrialLog in
Avatar of pkromer
pkromer

asked on

batch file to read csv files, invert fields

I have a batch file which currently reads all the csv files in a folder, strips certain characters within the files, and saves them with their original names. Now, I also need to look in the file and see if field 7 has no numbers in it, but field 8 does. If so, I need to clear the contents of field 7 and replace it with the contents of field 8. Field 8 can remain unchanged. This is on a W2008 Server 64 bit. Thank you.
Avatar of Steve Knight
Steve Knight
Flag of United Kingdom of Great Britain and Northern Ireland image

OK.  Before you gave an example of the data as:

085856,,170545,30537,F/C,1,0000000            
085674,,232455,4356,F/C,1,0000000    

So that makes 7 fields in total that I can see, not 8, or is this different data?

Also can you explain about what you mean by being no numbers in field 7, does that mean if it is 0000000 ?  Is there always a field 8?

Steve

We can read a file and look at those entries using another for command:

@echo off
for /f "tokens=1-8* delims=," %%A in ('type afile.csv') do (
  echo Fields are %%a %%b %%c %%d %%e %%f
  echo Field 7 is %%g
  echo Field 8 is %%h
  echo and the rest of the line is %%i
  echo We can put the rest of the line back togather with
  echo %%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i
)

Avatar of pkromer
pkromer

ASKER

That data i gave before was not complete lines, i shortened it for ease of viewing. there are about 30 fields in each line. This would be an example of what I am talking about...

085856,,170545,30537,F/C,1,0000000,12344,billy,etc            
085674,,232455,4356,F/C,1,WordsThatShouldntBeHere,65463,fred,etc  
Avatar of pkromer

ASKER

So field 7 may either have words OR blank, but either way, if there are numbers in field 8, erase what's in 7 and put what's in 8 there. IF 7 has words AND numbers, erase all words leaving only the number.

This seems compliacted, I don't even know if a batch file can do this. But I'll bet you do :-)
I can do but it is 11:30pm and currently have a baby on my lap who decided to wake up so will look tomorrow if no-one else gets there first!

Is a vbscript ok to go with the batch?
Avatar of pkromer

ASKER

Yes, i can just run it as a .vbs file instead of .bat, right? And will that run on a W2008 Server 64 bit?
it seems like this could be easily done with an awk script.  You could download gawk.exe for windows from
http://gnuwin32.sourceforge.net/packages/gawk.htm

Then create a file called awkfile.awk as:
BEGIN { FS = "," }
{if ($7 ~ /[0-9]/) {
      $7=$8
      }
for(i=1;i<=NF;i++) {
      if (i < NF)
            printf $i FS
      else
            printf $i RS
      }
}


and then run
\path\to\gawk -f \path\to\awkfile.awk inputfilename > outputfilename

The only part this doesn't satisfy is that the outputfile name will be different form the input file name, but thats easily renamed.  I tested with the below input:

1111,111,111,111,111,111,555,666
1111,111,111,111,111,111,nonumbers,888

and output is:
1111,111,111,111,111,111,666,666
1111,111,111,111,111,111,nonumbers,888
Whoops. just realized thats the opposite of what you asked for :)
heres the corrected awkfile (also a slightly faster version)

BEGIN { FS = "," }
{if ($7 !~ /[0-9]/) {
      $7=$8
      for(i=1;i<=NF;i++) {
            if (i < NF)
                  printf $i FS
            else
                  printf $i RS
      }
}
else
      print $0
}

I expanded my input test this time also:
1111,111,111,111,111,111,555,666
1111,111,111,111,111,111,nonumbershere,888
1111,111,111,111,111,111,555a,777
1111,111,111,111,111,111,stillnonumbers,randomstring

output:
1111,111,111,111,111,111,555,666
1111,111,111,111,111,111,888,888
1111,111,111,111,111,111,555a,777
1111,111,111,111,111,111,randomstring,randomstring
Avatar of pkromer

ASKER

I may try that, but i really don't want to add any programs to the server to do it. Is awk even 64 bit compatible?
I could understand that, I just figured I would offer it as a possibility.  Gawk is an unobtrusive, relatively small program (its a 350kb executable) with no setup or installation, you just drop the executable somewhere.  And when it comes to text processing, it's generally very good at that.  

The 64 bit part, well, I didn't really think about that part.  Another plan foiled by a simple fact. :)
Avatar of pkromer

ASKER

All good, like I said I may try it if nothing else comes up. Thanks very much for your efforts.
Interesting question. Listening...
I'm going to go with, this is not the prettiest code ive ever written, but it gets the job done.. it might need some slight modifications to work in your script, but this is basically it.
I really don't like having to use the for loops to write the output, but the ado CSV text driver is read-only, so its about the only option.

The wscript.Arguments(0) would grab the file name if it was passed as a command line parameter.  I dont know exactly how your script works, so I dont know if you need that or not.
On Error Resume Next
Const adOpenDynamic =  2
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")
Set objFSO = CreateObject("Scripting.FileSystemObject")

strFileName = Wscript.Arguments(0)
strFileName = "testdata.csv"
strPathtoTextFile = "C:\cygwin\home\pbolin\"
strFileName2 = strFileName & ".output"

Set objFile = objFSO.CreateTextFile(strPathtoTextFile & strFileName2)

objConnetion.Mode = 3
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
          "Data Source=" & strPathtoTextFile & ";" & _
          "Extended Properties=""text;HDR=YES;FMT=Delimited"""

objRecordset.Open "SELECT * FROM " & strFileName, _
          objConnection, adOpenDynamic, adLockOptimistic, adCmdText

Do Until objRecordset.EOF
    found = 0
    For i = 0 To 9
          if instr(objRecordset.Fields.Item("f7"),cStr(i)) then
                found = 1
                For j = 0 to objRecordSet.Fields.count
                      if j = 6 then
                            objFile.Write(objRecordset.Fields(7).value & ",")
                      else
                            if j < objRecordSet.Fields.count - 1 then
                                  objFile.Write(objRecordset.Fields(j).value & ",")
                            else
                                  objFile.Write(objRecordset.Fields(j).value)
                            end if
                      End If
                Next
            objFile.Writeline("")
            break
          End If
    Next
    if found <> 1 then
          For j = 0 to objRecordSet.Fields.count
              if j < (objRecordSet.Fields.count - 1) then
                         objFile.Write(objRecordset.Fields(j).value & ",")
              else
                      objFile.Write(objRecordset.Fields(j).value)
              End if
          Next
    objFile.Writeline("")
    end if
   
    objRecordset.MoveNext
Loop
ok.. that doesnt work either.. i wont post again until i have something that is guaranteed to work.. sorry
allright, this one is well tested.  important things of note:

1.  it handles null values.
2.  It handles a quoted string with a comma in it.
3. It really, really wants a schema.ini defined.  I dont see any reason why they cant all be defined as text fields for our purposes, so I did not test with other data  types.  I do not anticipate any problems if there were though.
4.  The files must have the first row as header values.  I tried setting the connection string to HDR=NO, but it still treated the first row as headers.

schema.ini:
[testdata.csv]
Col1=A Text Width 100
Col2=A Text Width 100
Col3=A Text Width 100
Col4=A Text Width 100
Col5=A Text Width 100
Col6=A Text Width 100
Col7=A Text Width 100
Col8=A Text Width 100
Col9=A Text Width 100

test input:
f1,f2,f3,f4,f5,f6,f7,f8,f9
1111,111,111,111,111,111,"fdfh555",666
1111,111,111,111,111,111,"stillnonumbers","5452","blah"
1111,111,111,111,111,111,"nonumbershere",888
222,222,222,222,222,222,"noth,ing","nsddd,saff"
222,222,222,222,222,222,"noth,ing4","nsddd,saff"
1111,111,111,111,111,111,555a,777
aaaa,bbbb,cccc,dddd,eeee,fffff,ggggg,hhhhh

test output:
1111,111,111,111,111,111,666,666,""
1111,111,111,111,111,111,stillnonumbers,5452,blah
1111,111,111,111,111,111,nonumbershere,888,""
222,222,222,222,222,222,"noth,ing","nsddd,saff",""
222,222,222,222,222,222,"nsddd,saff","nsddd,saff",""
1111,111,111,111,111,111,777,777,""
aaaa,bbbb,cccc,dddd,eeee,fffff,ggggg,hhhhh,""
tester.vbs
Avatar of pkromer

ASKER

I altered it to this...

strFileName = "*.csv"
strPathtoTextFile = ""

because it needs to go through all the csv files and deal with them, but it didn't work.

And, a process called wscript.exe runs continuously after i run the .vbs file, using 100% of my resources.

Any ideas?
pkromer - have been busy on other things since looking at this initially.  If you don't get anywhere with the methods here I might have some time tomorrow to look.

Steve
Avatar of pkromer

ASKER

Still no luck with that script, pbolintx. dragon-it, do you think this could be done along the lines of that last thing you did, which deals with the same files?

https://www.experts-exchange.com/questions/25025469/batch-file-to-read-csv-files-replace-text.html
Hadn't forgot about this, have it bookmarked but have been busy sorry.  Think in pure batch it would be unnecessarily complex but can do a hybrid batch / vbs or maybe just vbs.  Will take a look tomorrow if I can

Steve
Avatar of pkromer

ASKER

Cool, thanks much.
pkromer - sorry this one seemed to have got lost along the way and missed that it needed some feedback.  Did you get anywhere, and if not let us know if you want some more help with it.
Steve
Please allow me to make a late contribution to this question.

I notice this question has not been answered. I do not want to tread on peoples' toes however, such an interesting problem merited a full resolve.

Some of the complexities of this problem required a thorough work-around to enable DOS to successfully process CSV files containing blank fields (especially at the end of lines) as well as spaces.

Having spent a whole day writing, testing and debugging this code, I wonder if it's possible to bag the points with this late offering.

I have verified this code works 100% with the supplied data as well as test data which includes spaces.



@echo off
setlocal enabledelayedexpansion

if "%~1"=="?" (
   call :help
   exit /b 1
)

if "%~1"=="/?" (
   call :help
   exit /b 1
)

if not "%~1"=="" (
   set csvfile=%~1
) else (
   set csvfile=
   set /p csvfile=Enter name of CSV file:

   if "!csvfile!"=="" (
      echo Name of CSV file required. Aborted.
      exit /b 1
   )
)

if not exist "!csvfile!" (
   if exist "!csvfile!.csv" (
      set csvfile=!csvfile!.csv
   ) else (
      echo.
      echo Cannot find CSV file !csvfile!. Aborted.
      exit /b 1
   )
)



echo Processing file: !csvfile!

del "%~0.tmp" 2>nul

for /f "tokens=* usebackq" %%a in ("!csvfile!") do (
   set string=%%a
   set string="[!string:,=]","[!]"
   call :parse !string!
   echo !string!>>"%~0.tmp"
)
move "%~0.tmp" "!csvfile!"
exit /b 0



:parse
   set count=0

   :loop
      set /a count+=1
      set token=%~1
      set token[!count!]=!token:~1,-1!
      shift
   if not "%~1"=="" goto :loop

   echo.!token[7]! | findstr "[0-9]" >nul
   if !errorlevel!==0 (
      call :RemoveNonNumerics
   ) else (
      set token[7]=
      echo.!token[8]! | findstr "[0-9]" >nul

      if !errorlevel!==0 set token[7]=!token[8]!
   )

   set string=!token[1]!
   for /l %%a in (2,1,!count!) do set string=!string!,!token[%%a]!
goto :eof



:RemoveNonNumerics
   set numeric=

   echo !token[7]!>len
   for %%a in (len) do set /a len=%%~za - 3

   for /l %%a in (0,1,!len!) do (
      if "!token[7]:~%%a,1!" geq "0" (
         if "!token[7]:~%%a,1!" leq "9" set numeric=!numeric!!token[7]:~%%a,1!
      )
   )

   set token[7]=!numeric!
goto :eof



:help
   echo.
   echo Call using: %~n0 "filename"
   echo.
   echo Where filename is the name of your CSV file.
   echo.
   echo Can be CALLed from another batch feil as a function
   echo when processing multiple CSV files.
   echo.
goto :eof

Oops! - Just noticed an editing error. This has been corrected below:


@echo off
setlocal enabledelayedexpansion

if "%~1"=="?" (
   call :help
   exit /b 1
)

if "%~1"=="/?" (
   call :help
   exit /b 1
)

if not "%~1"=="" (
   set csvfile=%~1
) else (
   set csvfile=
   set /p csvfile=Enter name of CSV file:

   if "!csvfile!"=="" (
      echo Name of CSV file required. Aborted.
      exit /b 1
   )
)

if not exist "!csvfile!" (
   if exist "!csvfile!.csv" (
      set csvfile=!csvfile!.csv
   ) else (
      echo.
      echo Cannot find CSV file !csvfile!. Aborted.
      exit /b 1
   )
)



echo Processing file: !csvfile!

del "%~0.tmp" 2>nul

for /f "tokens=* usebackq" %%a in ("!csvfile!") do (
   set string=%%a
   set string="[!string:,=]","[!]"
   call :parse !string!
   echo !string!>>"%~0.tmp"
)
move "%~0.tmp" "!csvfile!"
exit /b 0



:parse
   set count=0

   :loop
      set /a count+=1
      set token=%~1
      set token[!count!]=!token:~1,-1!
      shift
   if not "%~1"=="" goto :loop

   echo.!token[7]! | findstr "[0-9]" >nul
   if !errorlevel!==0 (
      call :RemoveNonNumerics
   ) else (
      set token[7]=
      echo.!token[8]! | findstr "[0-9]" >nul
      if !errorlevel!==0 (
         set token[7]=!token[8]!
         call :RemoveNonNumerics
      )
   )

   set string=!token[1]!
   for /l %%a in (2,1,!count!) do (
      set string=!string!,!token[%%a]!
   )
goto :eof



:RemoveNonNumerics
   set numeric=

   echo !token[7]!>len
   for %%a in (len) do set /a len=%%~za - 3

   for /l %%a in (0,1,!len!) do (
      if "!token[7]:~%%a,1!" geq "0" (
         if "!token[7]:~%%a,1!" leq "9" (
            set numeric=!numeric!!token[7]:~%%a,1!
         )
      )
   )

   set token[7]=!numeric!
goto :eof



:help
   echo.
   echo Call using: %~n0 "filename"
   echo.
   echo Where filename is the name of your CSV file.
   echo.
   echo Can be CALLed from another batch feil as a function
   echo when processing multiple CSV files.
   echo.
goto :eof
ASKER CERTIFIED SOLUTION
Avatar of t0t0
t0t0
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial