Solved

Remove string in VB Script

Posted on 2015-02-17
15
131 Views
Last Modified: 2015-02-18
Hello,

I have a test.csv file with the following information:

A1;B1;C1
1111;ID345;475758
4566;Z_ID4758;5768595
7568;z1_ID45;4657576
45676;zxxx_ID48;4657576

I need a Vb Script to remove extra characters of values of column B if its begin with z or Z. All values of column B need to start with the char "ID".
The len of values of column B can be different.

Here the final result of test2.csv
A1;B1;C1
1111;ID345;475758
4566;ID4758;5768595
7568;ID45;4657576
45676;ID48;4657576

Thank you in advance for your help.
0
Comment
Question by:LD16
  • 5
  • 3
  • 3
  • +2
15 Comments
 
LVL 65

Expert Comment

by:RobSampson
Comment Utility
Hi there,

Give this VBS a try.

Regards,

Rob.

strFile1 = "C:\Temp\Scripts\EE\File1_Revised.txt"
strOutput = "C:\Temp\Scripts\EE\File1_Revised_Filtered.txt"

' Create the output file
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutput = objFSO.CreateTextFile(strOutput, True)

' Read through the file to omit invalid rows
Set objFile1 = objFSO.OpenTextFile(strFile1, 1, False)
' Read the header
strHeader = objFile1.ReadLine
objOutput.WriteLine strHeader
While Not objFile1.AtEndOfStream
	strLine = objFile1.ReadLine
	strKey = Split(strLine, ";")(1)
	If Left(strKey, 2) = "ID" Then objOutput.WriteLine strLine
Wend
objFile1.Close

' Close the output file
objOutput.Close

WScript.Echo "Done"

Open in new window

0
 
LVL 12

Expert Comment

by:Jeff Darling
Comment Utility
here is a way using a RegEx.  this may enable you to change the rules slightly if needed.

Option Explicit

Dim objRE
Dim strInput
Dim colMatches
Dim objMatch
Dim arrInput
Dim strOutput
Dim i
Dim objFSO
Dim objTextFileOut
Dim objTextFileIn
Dim strNextLine

Set objRE = New RegExp
objRE.Global     = True
objRE.IgnoreCase = False
objRE.Pattern    = "(.*)(ID\d{1,})"

'A1;B1;C1
'1111;ID345;475758
'4566;Z_ID4758;5768595
'7568;z1_ID45;4657576
'45676;zxxx_ID48;4657576

Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFileOut = objFSO.OpenTextFile ("c:\data\out.txt", ForWriting, True)

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFileIn = objFSO.OpenTextFile  ("c:\data\in.txt", ForReading)

Do Until objTextFileIn.AtEndOfStream
    strNextLine = objTextFileIn.Readline

arrInput = Split(strNextLine,";")

Set colMatches = objRE.Execute(arrInput(1))

If colMatches.Count>0 then
For Each objMatch In colMatches
    arrInput(1)= objmatch.Submatches(1)
Next
End If
 strOutput = ""
For i = 0 To Ubound(arrInput)
 strOutput = strOutput & arrInput(i) & ";"
Next

strOutput = Mid(strOutput,1,Len(strOutput)-1)

WScript.Echo strOutput
objTextFileOut.WriteLine(strOutput)

Loop

objTextFileIn.Close
objTextFileOut.Close

Open in new window

0
 

Author Comment

by:LD16
Comment Utility
Hello Rob,

I will test this tomorrow, however I see in your code

If Left(strKey, 2) = "ID" Then objOutput.WriteLine strLine

Are you totally dropping the lines which do not begin with ID?
This is ok, however I need to clean the values, which do not begin with ID and also WriteLine with the revised value which begin with ID.
0
 
LVL 12

Expert Comment

by:Jeff Darling
Comment Utility
Your specs state if column B starts with z or Z.

I'm guessing you have rows, that have an ID that do not start with Z?

example In

A1;B1;C1
1111;ID345;475758
4566;Z_ID4758;5768595
568;x1_ID45;46576
7568;z1_ID45;4657576
45676;zxxx_ID48;4657576

Open in new window


example out

A1;B1;C1
1111;ID345;475758
4566;ID4758;5768595
568;x1_ID45;46576
7568;ID45;4657576
45676;ID48;4657576

Open in new window


revised code to also follow the rule for z or Z

Option Explicit

Dim objRE
Dim strInput
Dim colMatches
Dim objMatch
Dim arrInput
Dim strOutput
Dim i
Dim objFSO
Dim objTextFileOut
Dim objTextFileIn
Dim strNextLine

Set objRE = New RegExp
objRE.Global     = True
objRE.IgnoreCase = False
objRE.Pattern    = "[z|Z](.*)(ID\d{1,})"


Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFileOut = objFSO.OpenTextFile ("c:\data\out.txt", ForWriting, True)

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFileIn = objFSO.OpenTextFile  ("c:\data\in.txt", ForReading)

Do Until objTextFileIn.AtEndOfStream
    strNextLine = objTextFileIn.Readline

arrInput = Split(strNextLine,";")

Set colMatches = objRE.Execute(arrInput(1))

If colMatches.Count>0 then
For Each objMatch In colMatches
    arrInput(1)= objmatch.Submatches(1)
Next
End If
 strOutput = ""
For i = 0 To Ubound(arrInput)
 strOutput = strOutput & arrInput(i) & ";"
Next

strOutput = Mid(strOutput,1,Len(strOutput)-1)

WScript.Echo strOutput
objTextFileOut.WriteLine(strOutput)

Loop

objTextFileIn.Close
objTextFileOut.Close

Open in new window

0
 
LVL 65

Accepted Solution

by:
RobSampson earned 167 total points
Comment Utility
Oh, I see what you're saying.  Assuming that all of the keys have ID in them somewhere, this would strip anything before ID, and also add ID to those that do not have it:
strFile1 = "C:\Temp\Scripts\EE\File1_Revised.txt"
strOutput = "C:\Temp\Scripts\EE\File1_Revised_Filtered.txt"

' Create the output file
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutput = objFSO.CreateTextFile(strOutput, True)

' Read through the file to omit invalid rows
Set objFile1 = objFSO.OpenTextFile(strFile1, 1, False)
' Read the header
strHeader = objFile1.ReadLine
objOutput.WriteLine strHeader
While Not objFile1.AtEndOfStream
	strLine = objFile1.ReadLine
	arrValues = Split(strLine, ";")
	If InStr(arrValues(1), "ID") > 0 Then
		arrValues(1) = Mid(arrValues(1), InStr(arrValues(1), "ID"))
	Else
		arrValues(1) = "ID" & arrValues(1)
	End If
	objOutput.WriteLine Join(arrValues, ";")
Wend
objFile1.Close

' Close the output file
objOutput.Close

WScript.Echo "Done"

Open in new window


Rob.
0
 
LVL 51

Assisted Solution

by:Bill Prew
Bill Prew earned 333 total points
Comment Utility
Here's a relatively strait forward REGEX replacement approach.

As always, run with one or two file parks:

cscript EE28618822.vbs in.csv out.csv

or

cscript EE28618822.vbs in.csv

the second overwrites the file with the changes.

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2

' Get input file name from command line parm, if 2 parms entered
' use second as new output file, else rewrite to input file
If (WScript.Arguments.Count > 0) Then
  sInfile = WScript.Arguments(0)
Else
  WScript.Echo "No filename specified."
  WScript.Quit
End If
If (WScript.Arguments.Count > 1) Then
  sOutfile = WScript.Arguments(1)
Else
  sOutfile = sInfile
End If

' Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")

' Read entire input file into a variable and close it
Set oInfile = oFSO.OpenTextFile(sInfile, ForReading, False, TriStateUseDefault)
sData = oInfile.ReadAll
oInfile.Close
Set oInfile = Nothing

' Replace desired strings
Set oRegExp = New RegExp
oRegExp.Global = True
oRegExp.IgnoreCase = False
oRegExp.Pattern = "(;)([zZ].*)(ID)"
sData = oRegExp.Replace(sData, "$1$3")

' Write file with any changes made
Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing

' Cleanup and end
Set oFSO = Nothing
Wscript.Quit

Open in new window

~bp
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
The simpler regex replacement pattern is this
;(.*?)(ID[^;]*);

Open in new window


Replace with
;$2;

Open in new window

0
How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

 
LVL 51

Expert Comment

by:Bill Prew
Comment Utility
@aikimark,

I always enjoy seeing alternate REGEX expressions, thanks for adding.

I thought the OP only wanted to edit that column if it started with z or Z though, wouldn't you need to add that?

~bp
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
The subsequent comments led me to think that anything between the ; and the ID should be removed.  If I've misread/misinterpreted the requirement then ignore my comment.

Also, if you set the regexp object to ignore case, your patterns might be simpler.
0
 
LVL 51

Assisted Solution

by:Bill Prew
Bill Prew earned 333 total points
Comment Utility
I guess another alternate could be this, although I lean toward soft coding the result string rather than recoding the search string surrounds, easier to maintain in the future, only one change.  A personal preference...

oRegExp.Pattern = ";([zZ].*)ID"
sData = oRegExp.Replace(sData, ";ID")

~bp
0
 
LVL 51

Expert Comment

by:Bill Prew
Comment Utility
@LD16,

Need some further clarification from you.  Do you only want to do this for the fields with a z or Z at the beginning, or any field that has extra characters before the "ID".

Also, if a field does not have either a z/Z or an ID at the beginning, are you saying those need an ID inserted?

~bp
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
And this pattern would probably be more efficient, since it only replaces if there are one or more characters before the ID
;(.+?)(ID[^;]*);

Open in new window


Maybe I should test these two methods like I did in my most recent article.  However, performance probably isn't a big concern in this instance.
0
 
LVL 65

Expert Comment

by:RobSampson
Comment Utility
The subsequent comments led me to think that anything between the ; and the ID should be removed.

That was the assumption I made as well, since the author stated that each field should start with ID.

Rob.
0
 

Author Comment

by:LD16
Comment Utility
@Bill,

I made a mistake in my spec all the values begins with z or Z so  "ID" insertion is not required.

Thank you
0
 
LVL 51

Expert Comment

by:Bill Prew
Comment Utility
Okay, great, let me know how you make out with my solution or others, hopefully we have some things that canget the job done.

~bp
0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

A little background as to how I came to I design this code: Around 5 years ago I designed an add-in that formatted Excel files to a corporate standard, applying different cell colours and font type depending on whether the cells contained inputs,…
This tutorial explains how to create a series of drop-down lists that are dependent upon prior selections to guide (“force”) the user to make the correct selection and reduce data errors within Microsoft Excel. Excel 2010 was used for this tutorial;…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now