Solved

Remove lines with Vb script

Posted on 2015-02-16
28
159 Views
Last Modified: 2015-02-17
Hello,

I have two csv files:
file1.csv with the following information:

1111;A8;5555
3455;B7;6778
5767;C3;6770
2222;A8;6666

file2.csv with the following information

C4;aaaa
A8;cccc
C3;dddd

I need a vbscript which reads both files and create a file1_revised.csv without lines which are not in column A of file2.

Final result of file1_revised.csv:
1111;A8;5555
5767;C3;6770
2222;A8;6666
0
Comment
Question by:LD16
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 11
  • 6
  • 5
  • +1
28 Comments
 
LVL 46

Expert Comment

by:aikimark
ID: 40613086
Please test this:
Sub Q_28618048()
    Const ForReading = 1, ForWriting = 2, ForAppending = 3
    Dim oFS, oTS
    Dim strFile1Data, strFile2Data
    Dim vParsedData, vParsedLine, vItem
    Dim dicUnique
    Set oFS = CreateObject("Scripting.FileSystemObject")
    Set oTS = oFS.OpenTextFile("c:\users\aikimark\downloads\file1.csv", ForReading, False)
    strFile1Data = oTS.readall
    oTS.Close
    Set oTS = oFS.OpenTextFile("c:\users\aikimark\downloads\file2.csv", ForReading, False)
    strFile2Data = oTS.readall
    oTS.Close
    
    Set dicUnique = CreateObject("Scripting.dictionary")
    
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM As Object
    Dim oSM As Object
    Dim lngSM As Long

    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.Pattern = "([^;]*);.*(?:\r\n|$)"

    If oRE.test(strFile2Data) Then
        Set oMatches = oRE.Execute(strFile2Data)
        For Each oM In oMatches
            With oM
                If dicUnique.exists(.submatches(0)) Then
                Else
                    dicUnique.Add .submatches(0), 1
                End If
            End With
        Next
    End If
    
    Set oTS = oFS.OpenTextFile("c:\users\aikimark\downloads\file1_revised.csv ", ForWriting, True)
    vParsedData = Split(strFile1Data, vbCrLf)
    For Each vItem In vParsedData
        vParsedLine = Split(vItem, ";")
        If dicUnique.exists(vParsedLine(1)) Then
            oTS.writeline vItem
        End If
    Next
    oTS.Close
    
End Sub

Open in new window

0
 
LVL 65

Expert Comment

by:RobSampson
ID: 40613219
Hi, here's a Powershell version, if you're interested:
$file1 = Import-CSV ".\test1.txt" -Header "Col1","Col2","Col3" -Delimiter ";"
$file2 = Import-CSV ".\test2.txt" -Header "Col2","Col3" -Delimiter ";"
$file1 | Where Col2 -In $file2.Col2 | ConvertTo-CSV -Delimiter ";" -NoTypeInformation | Select -Skip 1 | ForEach {$_ -replace '"', ''} | Out-File ".\test1_revised.txt"

Open in new window


Regards,

Rob.
0
 
LVL 56

Expert Comment

by:Bill Prew
ID: 40613392
Got an idea for a slightly different approach, hope I can get it to work...

~bp
0
 Database Backup and Recovery Best Practices

Join Percona’s, Architect, Manjot Singh as he presents Database Backup and Recovery Best Practices (with a Focus on MySQL) on Thursday, July 27, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7). In the case of a failure, do you know how long it will take to restore your database?

 
LVL 56

Expert Comment

by:Bill Prew
ID: 40613444
Okay, here's a different VBS approach.  I used ADO to allow treating the CSV files as "databases tables", and then can do a JOIN of the column you wanted to match on.  Seems to work when I ran it here.

Save as VBS in the folder with the files, adjust file names and path as needed. Also store the INI file below in the same folder and it must be named "schema.ini", it defines the format and columns in each CSV file.

Let me know what questions you have...

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
 
' Define files to work with
strFileIn1 = "file1.csv"
strFileIn2 = "file2.csv"
strFileOut = "out.csv"
strFolder="B:\ee\EE28618048"
 
' Define needed objects 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")

' Set up ADO to access the CSV files and join on reference column
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & strFolder & ";Extended Properties='text;HDR=NO;FMT=Delimited'"
objRecordset.Open "SELECT * FROM " & strFileIn1 & ", " & strFileIn2 & " WHERE C21=C12", objConnection

' Open output file
Set objFile = objFSO.OpenTextFile(strFileOut, ForWriting, True)

' Read all records and reformat before writing to output file
Do Until objRecordset.EOF
   objFile.WriteLine objRecordset.Fields("C11").Value & ";" & _
   objRecordset.Fields("C12").Value & ";" & _
   objRecordset.Fields("C13").Value
   objRecordset.MoveNext
Loop

' Wrapup
objFile.Close
objRecordset.Close
objConnection.Close

Open in new window


[file1.csv]
Format=Delimited(;)
ColNameHeader=False
Col1=C11 Text
Col2=C12 Text
Col3=C13 Text

[file2.csv]
Format=Delimited(;)
ColNameHeader=False
Col1=C21 Text
Col2=C22 Text

Open in new window

~bp
0
 

Author Comment

by:LD16
ID: 40613769
Hello Bill,

I tried, your script however I have duplicate values on  the out.csv.
Please find bellow the result of out.csv:
1111;A8;5555
1111;A8;5555
2222;A8;6666
2222;A8;6666
5767;C3;6770

And the result should be the following:
1111;A8;5555
2222;A8;6666
5767;C3;6770

Thank you again for your help.
0
 

Author Comment

by:LD16
ID: 40613776
Hello Aikimar

I have a  Compilation error: Expected end of statement on line 16 caract 13. Thank you again for your help.
0
 
LVL 65

Assisted Solution

by:RobSampson
RobSampson earned 166 total points
ID: 40613935
Hi, if the Powershell script isn't going to suit your needs, here is my version of a VBS approach.

Regards,

Rob.

strFile1 = "C:\Temp\Scripts\EE\File1.txt"
strFile2 = "C:\Temp\Scripts\EE\File2.txt"
strOutput = "C:\Temp\Scripts\EE\File1_Revised.txt"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile2Dct = CreateObject("Scripting.Dictionary")
objFile2Dct.CompareMode = vbTextCompare

' Create the output file
Set objOutput = objFSO.CreateTextFile(strOutput, True)

' First read file 2 to obtain the "key" values
Set objFile2 = objFSO.OpenTextFile(strFile2, 1, False)
While Not objFile2.AtEndOfStream
	strLine = objFile2.ReadLine
	strKey = Split(strLine, ";")(0)
	If objFile2Dct.Exists(strKey) = False Then objFile2Dct.Add strKey, 0
Wend
objFile2.Close

' Then read through the first file and see if the "key" matches those found from the second
Set objFile1 = objFSO.OpenTextFile(strFile1, 1, False)
While Not objFile1.AtEndOfStream
	strLine = objFile1.ReadLine
	strKey = Split(strLine, ";")(1)
	If objFile2Dct.Exists(strKey) = True Then objOutput.WriteLine strLine
Wend
objFile1.Close

' Close the output file
objOutput.Close

WScript.Echo "Done"

Open in new window

0
 
LVL 65

Expert Comment

by:RobSampson
ID: 40613937
For aikimark's to work, you need to remove the "As [Type]" parts of lines 17 through 21, and remove the first and last lines if you are using a pure text VBScript file.

Rob.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40613977
Oops.  I forgot to remove the data types in the definitions.  Please test this.
Sub Q_28618048()
    Const ForReading = 1, ForWriting = 2, ForAppending = 3
    Dim oFS, oTS
    Dim strFile1Data, strFile2Data
    Dim vParsedData, vParsedLine, vItem
    Dim dicUnique
    Set oFS = CreateObject("Scripting.FileSystemObject")
    Set oTS = oFS.OpenTextFile("c:\users\aikimark\downloads\file1.csv", ForReading, False)
    strFile1Data = oTS.readall
    oTS.Close
    Set oTS = oFS.OpenTextFile("c:\users\aikimark\downloads\file2.csv", ForReading, False)
    strFile2Data = oTS.readall
    oTS.Close
    
    Set dicUnique = CreateObject("Scripting.dictionary")
    
    Dim oRE 
    Dim oMatches 
    Dim oM 

    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.Pattern = "([^;]*);.*(?:\r\n|$)"

    If oRE.test(strFile2Data) Then
        Set oMatches = oRE.Execute(strFile2Data)
        For Each oM In oMatches
            With oM
                If dicUnique.exists(.submatches(0)) Then
                Else
                    dicUnique.Add .submatches(0), 1
                End If
            End With
        Next
    End If
    
    Set oTS = oFS.OpenTextFile("c:\users\aikimark\downloads\file1_revised.csv ", ForWriting, True)
    vParsedData = Split(strFile1Data, vbCrLf)
    For Each vItem In vParsedData
        vParsedLine = Split(vItem, ";")
        If dicUnique.exists(vParsedLine(1)) Then
            oTS.writeline vItem
        End If
    Next
    oTS.Close
    
End Sub

Open in new window

0
 

Author Comment

by:LD16
ID: 40614023
Hello,
Your vbs work prefectly. One question came across. How can include the headers of file 1
0
 

Author Comment

by:LD16
ID: 40614028
Hello Rob,
Your vbs work prefectly. One question came across. How can include the headers of file 1 in the File1_Revised?

Thank you again for your help.

Regards,
0
 
LVL 46

Accepted Solution

by:
aikimark earned 167 total points
ID: 40614046
This seems to work with headers.  Since your posted data did not include headers, I had to create some to test this revised code.
Sub Q_28618048()
    Const ForReading = 1, ForWriting = 2, ForAppending = 3
    Dim oFS, oTS
    Dim strFile1Data, strFile2Data
    Dim vParsedData, vParsedLine, vItem
    Dim dicUnique
    Set oFS = CreateObject("Scripting.FileSystemObject")
    Set oTS = oFS.OpenTextFile("c:\users\aikimark\downloads\file1.csv", ForReading, False)
    strFile1Data = oTS.readall
    oTS.Close
    Set oTS = oFS.OpenTextFile("c:\users\aikimark\downloads\file2.csv", ForReading, False)
    strFile2Data = oTS.readall
    oTS.Close
    
    Set dicUnique = CreateObject("Scripting.dictionary")
    
    Dim oRE
    Dim oMatches
    Dim oM

    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.Pattern = "([^;]*);.*(?:\r\n|$)"

    If oRE.test(strFile2Data) Then
        Set oMatches = oRE.Execute(strFile2Data)
        For Each oM In oMatches
            With oM
                If dicUnique.exists(.submatches(0)) Then
                Else
                    dicUnique.Add .submatches(0), 1
                End If
            End With
        Next
    End If
    
    Set oTS = oFS.OpenTextFile("c:\users\aikimark\downloads\file1_revised.csv ", ForWriting, True)
    vParsedData = Split(strFile1Data, vbCrLf)
    oTS.writeline vParsedData(0)
    For Each vItem In vParsedData
        vParsedLine = Split(vItem, ";")
        If dicUnique.exists(vParsedLine(1)) Then
            oTS.writeline vItem
        End If
    Next
    oTS.Close
    
End Sub

Open in new window

0
 

Author Comment

by:LD16
ID: 40614080
Hello Aikimark,

I was not able to run your revised vbscript any error message however I putted a wcript.Echo and I was unable to display the echo.  I modified the path and create the csv files as mention in my last comment.
Additionnally I tried to launch the script without file1_revised.csv and with a blank version of file1_revised.csv.

Thank you again for your help.

Regards,
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40614238
Since my script is a routine, did you invoke the routine or just try to run the VBS as submitted?

Did you change the path in the string literals?
0
 

Author Comment

by:LD16
ID: 40614361
Hello,

I've just tried, to run the vbs.

I've changed just the path of oTS variable.

Could you please let me know how to invoke the routine?

Regards,
0
 

Author Comment

by:LD16
ID: 40614406
Hello Rob,

I have included a objOutput.WriteLine for the headers of the revised file.

Thank you very much for your help.

Regards,
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40614436
you just call the routine by name somewhere in your VBS script.

Optionally, you could remove the Sub and End Sub lines.
0
 
LVL 56

Assisted Solution

by:Bill Prew
Bill Prew earned 167 total points
ID: 40615227
Here's a small change to remove the duplicates in the approach I shared.

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
 
' Define files to work with
strFileIn1 = "file1.csv"
strFileIn2 = "file2.csv"
strFileOut = "out.csv"
strFolder="B:\ee\EE28618048"
 
' Define needed objects 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")

' Set up ADO to access the CSV files and join on reference column
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & strFolder & ";Extended Properties='text;HDR=NO;FMT=Delimited'"
objRecordset.Open "SELECT DISTINCT * FROM " & strFileIn1 & ", " & strFileIn2 & " WHERE C21=C12", objConnection

' Open output file
Set objFile = objFSO.OpenTextFile(strFileOut, ForWriting, True)

' Read all records and reformat before writing to output file
Do Until objRecordset.EOF
   objFile.WriteLine objRecordset.Fields("C11").Value & ";" & _
   objRecordset.Fields("C12").Value & ";" & _
   objRecordset.Fields("C13").Value
   objRecordset.MoveNext
Loop

' Wrapup
objFile.Close
objRecordset.Close
objConnection.Close

Open in new window

~bp
0
 

Author Comment

by:LD16
ID: 40615321
Hello,

Both scripts work properly. Thank you very much for help.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40615347
Both scripts work properly
If both scripts worked, why did you choose only one as the solution?
0
 

Author Comment

by:LD16
ID: 40615360
Good question, I tried also to choose your solution but I don't know if there is a way to select many accepted solutions, do you know?
0
 
LVL 65

Expert Comment

by:RobSampson
ID: 40615371
You can select multiple solutions.  I will re-open the question for you to re-close.

Rob.
0
 

Author Comment

by:LD16
ID: 40615376
Aikimar, if you are interested, I have opened another question:
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_28618822.html
0
 
LVL 65

Expert Comment

by:RobSampson
ID: 40615392
The selected answers don't necessarily reflect the correct scripts.  Would you like me to re-open it for you to double-check?
0
 

Author Comment

by:LD16
ID: 40615408
Yes, sorry for this. Thank you.
0

Featured Post

Percona Live Europe 2017 | Sep 25 - 27, 2017

The Percona Live Open Source Database Conference Europe 2017 is the premier event for the diverse and active European open source database community, as well as businesses that develop and use open source database software.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Prime numbers are natural numbers greater than 1 that have only two divisors (the number itself and 1). By “divisible” we mean dividend % divisor = 0 (% indicates MODULAR. It gives the reminder of a division operation). We’ll follow multiple approac…
Deploying a Microsoft Access application in a Citrix environment is not difficult but takes a few steps. However, Citrix system people are often of little help, as they typically know next to nothing about Access. The script provided here will take …
This tutorial covers a step-by-step guide to install VisualVM launcher in eclipse.
This tutorial will introduce the viewer to VisualVM for the Java platform application. This video explains an example program and covers the Overview, Monitor, and Heap Dump tabs.

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question