?
Solved

VBScipt to remove lines with zero value

Posted on 2010-01-05
22
Medium Priority
?
475 Views
Last Modified: 2012-06-21
I have a .csv file.  In column 14 of the file many of the lines have a value of "0".  I need to script to look at each line and if the value in column 14 on that line is 0, delete it then write the remaining lines to a new file.
0
Comment
Question by:vpnsol123
  • 10
  • 9
  • 2
21 Comments
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 26182743
"....if the value in column 14 on that line is 0, delete it then write the remaining lines to a new file."

Could please clarify...

Do you want the original file to remain UNTOUCHED or not?

The new file should only have lines where Column 14 is NOT zero?

Also, do any of your values have a comma INSIDE the quotation marks?...or can we safely Split() on comma?
0
 

Author Comment

by:vpnsol123
ID: 26182796
It is preferable that the originial file remain in tact and a new file be created for the non-zero value lines.

Also, yes we can safely split on comma.

Thanks
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 26182872
Dim fso, tsIn, tsOut, TheLine, LineArr

Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    LineArr = Split(TheLine, ",")
    If LineArr(13) <> 0 Then tsOut.WriteLine TheLine
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing

MsgBox "Done"
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 26182915
If we had to be wary of fields that included commas, assuming doublequote is the text qualifier...



Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX

Set RegX = New RegExp
RegX.Pattern = """[^""]*"""
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = RegX.Replace(TheLine, "")
    LineArr = Split(Test, ",")
    If LineArr(13) <> 0 Then tsOut.WriteLine TheLine
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set RegExp = Nothing

MsgBox "Done"
0
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 26182949
Looks like matthewspatrick has already given you a couple of great answers...  =)
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 26183133
Need a small tweak to the "have to be wary of internal commas" version, as the Global property for RegExp
is False by default:




Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX

Set RegX = New RegExp
With RegX
    .Pattern = """[^""]*"""
    .Global = True
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = RegX.Replace(TheLine, "")
    LineArr = Split(Test, ",")
    If LineArr(13) <> 0 Then tsOut.WriteLine TheLine
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set RegExp = Nothing

MsgBox "Done"
0
 

Author Comment

by:vpnsol123
ID: 26184309
This is perfect.  Now to complicate it.  I am finding that the program that I am importing this into is a bit complicated.  The above logic works perfectly.  Is there an ability to have the VBScript output only fields pecified in the script.  In other words, if I only want to output columns 5,7,9,11,13 and 15 into the output file, how would I do that?

Thank you again.
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 26191717
Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX, OutputCols, OutCol, NewLine

OutputCols = Array(5, 7, 9, 11, 13, 15)

Set RegX = New RegExp
With RegX
    .Pattern = """[^""]*"""
    .Global = True
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = RegX.Replace(TheLine, "")
    LineArr = Split(Test, ",")
    If LineArr(13) <> 0 Then
        NewLine = ""
        For Each OutputCol In OutputCols
            NewLine = NewLine & "," & LineArr(OutputCol)
        Next
        tsOut.WriteLine Mid(NewLine, 2)
    End If
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set RegExp = Nothing

MsgBox "Done"
0
 

Author Comment

by:vpnsol123
ID: 26213428
This works very well.  The only problem that I am having now is that column 31 contains a comma in it.  It appeared as though the logic was there to split that cell but that column is actually messing up my output.  Any thoughts?
0
 

Author Comment

by:vpnsol123
ID: 26214256
Ok...I see the problem.  The script in its current form is searching for the comma within a column and then blanking out that column.   What I need it to do it to split the column into 2 and then to output those values or to just replace the comma with a space.
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 26283885
Try this revised code.  It tries to replace internal commas with spaces.




Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX, OutputCols, OutCol, NewLine, Mats

OutputCols = Array(5, 7, 9, 11, 13, 15)

Set RegX = New RegExp
With RegX
    .Pattern = """[^,]*,.*"""
    .Global = False
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\the file.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = TheLine
    Do
        Test = RegX.Replace(Test, " ")
        Set Mats = RegX.Execute(Test)
        If Mats.Count = 0 Then Exit Do
    Loop
    LineArr = Split(Test, ",")
    If LineArr(13) <> 0 Then
        NewLine = ""
        For Each OutputCol In OutputCols
            NewLine = NewLine & "," & LineArr(OutputCol)
        Next
        tsOut.WriteLine Mid(NewLine, 2)
    End If
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set Mats = Nothing
Set RegExp = Nothing

MsgBox "Done"
0
 

Author Comment

by:vpnsol123
ID: 26284683
That seems to do the trick with the comma's but it also seems to have broken the part of the code that deletes the lines with a value of 0 in column 14.

Thanks
0
 

Author Comment

by:vpnsol123
ID: 26285125
I am getting subscript out of rage:'OutputCol' when I run this.
My output columns look like this:Array(7, 8, 12, 13, 15, 25, 31, 33)
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 26286811
vpnsol123,

It would be useful to see some sample data and/or a sample file.  EE now allows you to directly upload files
to your question.

Please be advised that once you upload a file, it can be publicly accessed, and that it may not be possible
to fully and permanently delete it.  Therefore, be very careful about posting proprietary, confidential, or
other sensitive information.  If necessary, use "fake" and/or obfuscated data in your sample.

Please note that at present EE restricts uploads to certain file types.  If your file type does not match
those in the list, you can use http://www.ee-stuff.com instead, which is not officially an EE site, but is run
by people connected to EE.

Patrick
0
 

Author Comment

by:vpnsol123
ID: 26287183
Below is a single sample record.  I can not upload a file.  It is patient sensitive so I creasted a sample..  You will see that the data is separated by commas.  The comma between the words Release and final is the one that I am trying to remove.

Outpatient Technical,N,Y,3127,11/06/2009 17:41:00,9,DG09-75588,Smith John,,,mim4681961,11/05/2009 04:24:00,00000,ANP31100068,1,,,N,N,11/06/2009 17:42:07:246,11/06/2009 17:42:15,Sent,1,Outpatient,3019681166,,352603735,Medical Center,CYTO THIN PREP SCREEN 88142/31100068,,"Release,Final",mim55071,"Jones, John. (021675)"
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 26287480
vpnsol123,

OK, there were a few syntax errors there, but this seems to work.  Some notes:

1) The lower bound for all arrays in VBScript is zero.  Thus, the 33rd column is referred to as 32
2) On that basis, I changed the reference to the 13th column to 12 for testing the <>0 condition



Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX, OutputCols, OutCol, NewLine, Mats

OutputCols = Array(7, 8, 12, 13, 15, 25, 31, 32)

Set RegX = New RegExp
With RegX
    .Pattern = """[^,]*,.*"""
    .Global = False
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\test.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = TheLine
    Do
        Test = RegX.Replace(Test, " ")
        Set Mats = RegX.Execute(Test)
        If Mats.Count = 0 Then Exit Do
    Loop
    LineArr = Split(Test, ",")
    If CDbl(LineArr(12)) <> 0 Then
        NewLine = ""
        For OutputCol = LBound(OutputCols) To UBound(OutputCols)
            NewLine = NewLine & "," & LineArr(OutputCol)
        Next
        tsOut.WriteLine Mid(NewLine, 2)
    End If
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set Mats = Nothing
Set RegExp = Nothing

MsgBox "Done"


Patrick
0
 

Author Comment

by:vpnsol123
ID: 26287638
I see what you changed here.  

For OutputCol = LBound(OutputCols) To UBound(OutputCols)

But in this it is taking columns 0 through 6

Is that because 0 is the LBound and 6 is the UBound?
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 26288556
Sorry, dumb error on my part.




Dim fso, tsIn, tsOut, TheLine, Test, LineArr, RegX, OutputCols, OutCol, NewLine, Mats

OutputCols = Array(7, 8, 12, 13, 15, 25, 31, 32)

Set RegX = New RegExp
With RegX
    .Pattern = """[^,]*,.*"""
    .Global = False
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\test.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    Test = TheLine
    Do
        Test = RegX.Replace(Test, " ")
        Set Mats = RegX.Execute(Test)
        If Mats.Count = 0 Then Exit Do
    Loop
    LineArr = Split(Test, ",")
    If CDbl(LineArr(12)) <> 0 Then
        NewLine = ""
        For OutputCol = LBound(OutputCols) To UBound(OutputCols)
            NewLine = NewLine & "," & LineArr(OutputCols(OutputCol))
        Next
        tsOut.WriteLine Mid(NewLine, 2)
    End If
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set Mats = Nothing
Set RegExp = Nothing

MsgBox "Done"
0
 

Author Comment

by:vpnsol123
ID: 26289100
Sorry to be a pain.  Same Subscript  Out Of Range Error.
0
 
LVL 93

Accepted Solution

by:
Patrick Matthews earned 2000 total points
ID: 26289577
vpnsol123,

This one was trickier than I thought :)

This seems to be working now--the RegExp was not quite right before.




Dim fso, tsIn, tsOut, TheLine, test, LineArr, RegX, OutputCols, OutputCol, NewLine, Mats

OutputCols = Array(7, 8, 12, 13, 15, 25, 31, 32)

Set RegX = New RegExp
With RegX
    .Pattern = "(,|^)(""[^,]+)(,)([^""]+"")(,|$)"
    .Global = False
End With
Set fso = CreateObject("Scripting.FileSystemObject")
Set tsIn = fso.OpenTextFile("c:\test.csv")
Set tsOut = fso.CreateTextFile("c:\new file.csv", True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    test = TheLine
    Do
        test = RegX.Replace(test, "$1$2 $4$5")
        Set Mats = RegX.Execute(test)
        If Mats.Count = 0 Then Exit Do
    Loop
    LineArr = Split(test, ",")
    If CDbl(LineArr(12)) <> 0 Then
        NewLine = ""
        For OutputCol = LBound(OutputCols) To UBound(OutputCols)
            NewLine = NewLine & "," & LineArr(OutputCols(OutputCol))
        Next
        tsOut.WriteLine Mid(NewLine, 2)
    End If
Loop

tsIn.Close
Set tsIn = Nothing
tsOut.Close
Set tsOut = Nothing
Set fso = Nothing
Set Mats = Nothing
Set RegX = Nothing

MsgBox "Done"





Patrick
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 26313325
vpnsol123,

Any feedback?

Patrick
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article is the result of a quest to better understand Task Scheduler 2.0 and all the newer objects available in vbscript in this version over  the limited options we had scripting in Task Scheduler 1.0.  As I started my journey of knowledge I f…
Originally, this post was published on Monitis Blog, you can check it here . In business circles, we sometimes hear that today is the “age of the customer.” And so it is. Thanks to the enormous advances over the past few years in consumer techno…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
Suggested Courses
Course of the Month15 days, 9 hours left to enroll

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question