Link to home
Start Free TrialLog in
Avatar of vpnsol123
vpnsol123

asked on

VBScipt to remove lines with zero value

I have a .csv file.  In column 14 of the file many of the lines have a value of "0".  I need to script to look at each line and if the value in column 14 on that line is 0, delete it then write the remaining lines to a new file.
Avatar of Mike Tomlinson
Mike Tomlinson
Flag of United States of America image

"....if the value in column 14 on that line is 0, delete it then write the remaining lines to a new file."

Could please clarify...

Do you want the original file to remain UNTOUCHED or not?

The new file should only have lines where Column 14 is NOT zero?

Also, do any of your values have a comma INSIDE the quotation marks?...or can we safely Split() on comma?
Avatar of vpnsol123
vpnsol123

ASKER

It is preferable that the originial file remain in tact and a new file be created for the non-zero value lines.

Also, yes we can safely split on comma.

Thanks
Dim fso, tsIn, tsOut, TheLine, LineArr

Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    LineArr = Split(TheLine, ",")
    If LineArr(13) <> 0 Then tsOut.WriteLine TheLine
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing

MsgBox "Done"
If we had to be wary of fields that included commas, assuming doublequote is the text qualifier...



Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX

Set RegX = New RegExp
RegX.Pattern = """[^""]*"""
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = RegX.Replace(TheLine, "")
    LineArr = Split(Test, ",")
    If LineArr(13) <> 0 Then tsOut.WriteLine TheLine
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set RegExp = Nothing

MsgBox "Done"
Looks like matthewspatrick has already given you a couple of great answers...  =)
Need a small tweak to the "have to be wary of internal commas" version, as the Global property for RegExp
is False by default:




Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX

Set RegX = New RegExp
With RegX
    .Pattern = """[^""]*"""
    .Global = True
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = RegX.Replace(TheLine, "")
    LineArr = Split(Test, ",")
    If LineArr(13) <> 0 Then tsOut.WriteLine TheLine
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set RegExp = Nothing

MsgBox "Done"
This is perfect.  Now to complicate it.  I am finding that the program that I am importing this into is a bit complicated.  The above logic works perfectly.  Is there an ability to have the VBScript output only fields pecified in the script.  In other words, if I only want to output columns 5,7,9,11,13 and 15 into the output file, how would I do that?

Thank you again.
Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX, OutputCols, OutCol, NewLine

OutputCols = Array(5, 7, 9, 11, 13, 15)

Set RegX = New RegExp
With RegX
    .Pattern = """[^""]*"""
    .Global = True
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = RegX.Replace(TheLine, "")
    LineArr = Split(Test, ",")
    If LineArr(13) <> 0 Then
        NewLine = ""
        For Each OutputCol In OutputCols
            NewLine = NewLine & "," & LineArr(OutputCol)
        Next
        tsOut.WriteLine Mid(NewLine, 2)
    End If
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set RegExp = Nothing

MsgBox "Done"
This works very well.  The only problem that I am having now is that column 31 contains a comma in it.  It appeared as though the logic was there to split that cell but that column is actually messing up my output.  Any thoughts?
Ok...I see the problem.  The script in its current form is searching for the comma within a column and then blanking out that column.   What I need it to do it to split the column into 2 and then to output those values or to just replace the comma with a space.
Try this revised code.  It tries to replace internal commas with spaces.




Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX, OutputCols, OutCol, NewLine, Mats

OutputCols = Array(5, 7, 9, 11, 13, 15)

Set RegX = New RegExp
With RegX
    .Pattern = """[^,]*,.*"""
    .Global = False
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = TheLine
    Do
        Test = RegX.Replace(Test, " ")
        Set Mats = RegX.Execute(Test)
        If Mats.Count = 0 Then Exit Do
    Loop
    LineArr = Split(Test, ",")
    If LineArr(13) <> 0 Then
        NewLine = ""
        For Each OutputCol In OutputCols
            NewLine = NewLine & "," & LineArr(OutputCol)
        Next
        tsOut.WriteLine Mid(NewLine, 2)
    End If
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set Mats = Nothing
Set RegExp = Nothing

MsgBox "Done"
That seems to do the trick with the comma's but it also seems to have broken the part of the code that deletes the lines with a value of 0 in column 14.

Thanks
I am getting subscript out of rage:'OutputCol' when I run this.
My output columns look like this:Array(7, 8, 12, 13, 15, 25, 31, 33)
vpnsol123,

It would be useful to see some sample data and/or a sample file.  EE now allows you to directly upload files
to your question.

Please be advised that once you upload a file, it can be publicly accessed, and that it may not be possible
to fully and permanently delete it.  Therefore, be very careful about posting proprietary, confidential, or
other sensitive information.  If necessary, use "fake" and/or obfuscated data in your sample.

Please note that at present EE restricts uploads to certain file types.  If your file type does not match
those in the list, you can use http://www.ee-stuff.com instead, which is not officially an EE site, but is run
by people connected to EE.

Patrick
Below is a single sample record.  I can not upload a file.  It is patient sensitive so I creasted a sample..  You will see that the data is separated by commas.  The comma between the words Release and final is the one that I am trying to remove.

Outpatient Technical,N,Y,3127,11/06/2009 17:41:00,9,DG09-75588,Smith John,,,mim4681961,11/05/2009 04:24:00,00000,ANP31100068,1,,,N,N,11/06/2009 17:42:07:246,11/06/2009 17:42:15,Sent,1,Outpatient,3019681166,,352603735,Medical Center,CYTO THIN PREP SCREEN 88142/31100068,,"Release,Final",mim55071,"Jones, John. (021675)"
vpnsol123,

OK, there were a few syntax errors there, but this seems to work.  Some notes:

1) The lower bound for all arrays in VBScript is zero.  Thus, the 33rd column is referred to as 32
2) On that basis, I changed the reference to the 13th column to 12 for testing the <>0 condition



Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX, OutputCols, OutCol, NewLine, Mats

OutputCols = Array(7, 8, 12, 13, 15, 25, 31, 32)

Set RegX = New RegExp
With RegX
    .Pattern = """[^,]*,.*"""
    .Global = False
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\test.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = TheLine
    Do
        Test = RegX.Replace(Test, " ")
        Set Mats = RegX.Execute(Test)
        If Mats.Count = 0 Then Exit Do
    Loop
    LineArr = Split(Test, ",")
    If CDbl(LineArr(12)) <> 0 Then
        NewLine = ""
        For OutputCol = LBound(OutputCols) To UBound(OutputCols)
            NewLine = NewLine & "," & LineArr(OutputCol)
        Next
        tsOut.WriteLine Mid(NewLine, 2)
    End If
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set Mats = Nothing
Set RegExp = Nothing

MsgBox "Done"


Patrick
I see what you changed here.  

For OutputCol = LBound(OutputCols) To UBound(OutputCols)

But in this it is taking columns 0 through 6

Is that because 0 is the LBound and 6 is the UBound?
Sorry, dumb error on my part.




Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX, OutputCols, OutCol, NewLine, Mats

OutputCols = Array(7, 8, 12, 13, 15, 25, 31, 32)

Set RegX = New RegExp
With RegX
    .Pattern = """[^,]*,.*"""
    .Global = False
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\test.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = TheLine
    Do
        Test = RegX.Replace(Test, " ")
        Set Mats = RegX.Execute(Test)
        If Mats.Count = 0 Then Exit Do
    Loop
    LineArr = Split(Test, ",")
    If CDbl(LineArr(12)) <> 0 Then
        NewLine = ""
        For OutputCol = LBound(OutputCols) To UBound(OutputCols)
            NewLine = NewLine & "," & LineArr(OutputCols(OutputCol))
        Next
        tsOut.WriteLine Mid(NewLine, 2)
    End If
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set Mats = Nothing
Set RegExp = Nothing

MsgBox "Done"
Sorry to be a pain.  Same Subscript  Out Of Range Error.
ASKER CERTIFIED SOLUTION
Avatar of Patrick Matthews
Patrick Matthews
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
vpnsol123,

Any feedback?

Patrick