Improve company productivity with a Business Account.Sign Up

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 189
  • Last Modified:

Remove string in VB Script

Hello,

I have a test.csv file with the following information:

A1;B1;C1
1111;ID345;475758
4566;Z_ID4758;5768595
7568;z1_ID45;4657576
45676;zxxx_ID48;4657576

I need a Vb Script to remove extra characters of values of column B if its begin with z or Z. All values of column B need to start with the char "ID".
The len of values of column B can be different.

Here the final result of test2.csv
A1;B1;C1
1111;ID345;475758
4566;ID4758;5768595
7568;ID45;4657576
45676;ID48;4657576

Thank you in advance for your help.
0
LD16
Asked:
LD16
  • 5
  • 3
  • 3
  • +2
3 Solutions
 
RobSampsonCommented:
Hi there,

Give this VBS a try.

Regards,

Rob.

strFile1 = "C:\Temp\Scripts\EE\File1_Revised.txt"
strOutput = "C:\Temp\Scripts\EE\File1_Revised_Filtered.txt"

' Create the output file
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutput = objFSO.CreateTextFile(strOutput, True)

' Read through the file to omit invalid rows
Set objFile1 = objFSO.OpenTextFile(strFile1, 1, False)
' Read the header
strHeader = objFile1.ReadLine
objOutput.WriteLine strHeader
While Not objFile1.AtEndOfStream
	strLine = objFile1.ReadLine
	strKey = Split(strLine, ";")(1)
	If Left(strKey, 2) = "ID" Then objOutput.WriteLine strLine
Wend
objFile1.Close

' Close the output file
objOutput.Close

WScript.Echo "Done"

Open in new window

0
 
Jeff DarlingDeveloper AnalystCommented:
here is a way using a RegEx.  this may enable you to change the rules slightly if needed.

Option Explicit

Dim objRE
Dim strInput
Dim colMatches
Dim objMatch
Dim arrInput
Dim strOutput
Dim i
Dim objFSO
Dim objTextFileOut
Dim objTextFileIn
Dim strNextLine

Set objRE = New RegExp
objRE.Global     = True
objRE.IgnoreCase = False
objRE.Pattern    = "(.*)(ID\d{1,})"

'A1;B1;C1
'1111;ID345;475758
'4566;Z_ID4758;5768595
'7568;z1_ID45;4657576
'45676;zxxx_ID48;4657576

Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFileOut = objFSO.OpenTextFile ("c:\data\out.txt", ForWriting, True)

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFileIn = objFSO.OpenTextFile  ("c:\data\in.txt", ForReading)

Do Until objTextFileIn.AtEndOfStream
    strNextLine = objTextFileIn.Readline

arrInput = Split(strNextLine,";")

Set colMatches = objRE.Execute(arrInput(1))

If colMatches.Count>0 then
For Each objMatch In colMatches
    arrInput(1)= objmatch.Submatches(1)
Next
End If
 strOutput = ""
For i = 0 To Ubound(arrInput)
 strOutput = strOutput & arrInput(i) & ";"
Next

strOutput = Mid(strOutput,1,Len(strOutput)-1)

WScript.Echo strOutput
objTextFileOut.WriteLine(strOutput)

Loop

objTextFileIn.Close
objTextFileOut.Close

Open in new window

0
 
LD16Author Commented:
Hello Rob,

I will test this tomorrow, however I see in your code

If Left(strKey, 2) = "ID" Then objOutput.WriteLine strLine

Are you totally dropping the lines which do not begin with ID?
This is ok, however I need to clean the values, which do not begin with ID and also WriteLine with the revised value which begin with ID.
0
Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

 
Jeff DarlingDeveloper AnalystCommented:
Your specs state if column B starts with z or Z.

I'm guessing you have rows, that have an ID that do not start with Z?

example In

A1;B1;C1
1111;ID345;475758
4566;Z_ID4758;5768595
568;x1_ID45;46576
7568;z1_ID45;4657576
45676;zxxx_ID48;4657576

Open in new window


example out

A1;B1;C1
1111;ID345;475758
4566;ID4758;5768595
568;x1_ID45;46576
7568;ID45;4657576
45676;ID48;4657576

Open in new window


revised code to also follow the rule for z or Z

Option Explicit

Dim objRE
Dim strInput
Dim colMatches
Dim objMatch
Dim arrInput
Dim strOutput
Dim i
Dim objFSO
Dim objTextFileOut
Dim objTextFileIn
Dim strNextLine

Set objRE = New RegExp
objRE.Global     = True
objRE.IgnoreCase = False
objRE.Pattern    = "[z|Z](.*)(ID\d{1,})"


Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFileOut = objFSO.OpenTextFile ("c:\data\out.txt", ForWriting, True)

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFileIn = objFSO.OpenTextFile  ("c:\data\in.txt", ForReading)

Do Until objTextFileIn.AtEndOfStream
    strNextLine = objTextFileIn.Readline

arrInput = Split(strNextLine,";")

Set colMatches = objRE.Execute(arrInput(1))

If colMatches.Count>0 then
For Each objMatch In colMatches
    arrInput(1)= objmatch.Submatches(1)
Next
End If
 strOutput = ""
For i = 0 To Ubound(arrInput)
 strOutput = strOutput & arrInput(i) & ";"
Next

strOutput = Mid(strOutput,1,Len(strOutput)-1)

WScript.Echo strOutput
objTextFileOut.WriteLine(strOutput)

Loop

objTextFileIn.Close
objTextFileOut.Close

Open in new window

0
 
RobSampsonCommented:
Oh, I see what you're saying.  Assuming that all of the keys have ID in them somewhere, this would strip anything before ID, and also add ID to those that do not have it:
strFile1 = "C:\Temp\Scripts\EE\File1_Revised.txt"
strOutput = "C:\Temp\Scripts\EE\File1_Revised_Filtered.txt"

' Create the output file
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutput = objFSO.CreateTextFile(strOutput, True)

' Read through the file to omit invalid rows
Set objFile1 = objFSO.OpenTextFile(strFile1, 1, False)
' Read the header
strHeader = objFile1.ReadLine
objOutput.WriteLine strHeader
While Not objFile1.AtEndOfStream
	strLine = objFile1.ReadLine
	arrValues = Split(strLine, ";")
	If InStr(arrValues(1), "ID") > 0 Then
		arrValues(1) = Mid(arrValues(1), InStr(arrValues(1), "ID"))
	Else
		arrValues(1) = "ID" & arrValues(1)
	End If
	objOutput.WriteLine Join(arrValues, ";")
Wend
objFile1.Close

' Close the output file
objOutput.Close

WScript.Echo "Done"

Open in new window


Rob.
0
 
Bill PrewCommented:
Here's a relatively strait forward REGEX replacement approach.

As always, run with one or two file parks:

cscript EE28618822.vbs in.csv out.csv

or

cscript EE28618822.vbs in.csv

the second overwrites the file with the changes.

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2

' Get input file name from command line parm, if 2 parms entered
' use second as new output file, else rewrite to input file
If (WScript.Arguments.Count > 0) Then
  sInfile = WScript.Arguments(0)
Else
  WScript.Echo "No filename specified."
  WScript.Quit
End If
If (WScript.Arguments.Count > 1) Then
  sOutfile = WScript.Arguments(1)
Else
  sOutfile = sInfile
End If

' Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")

' Read entire input file into a variable and close it
Set oInfile = oFSO.OpenTextFile(sInfile, ForReading, False, TriStateUseDefault)
sData = oInfile.ReadAll
oInfile.Close
Set oInfile = Nothing

' Replace desired strings
Set oRegExp = New RegExp
oRegExp.Global = True
oRegExp.IgnoreCase = False
oRegExp.Pattern = "(;)([zZ].*)(ID)"
sData = oRegExp.Replace(sData, "$1$3")

' Write file with any changes made
Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing

' Cleanup and end
Set oFSO = Nothing
Wscript.Quit

Open in new window

~bp
0
 
aikimarkCommented:
The simpler regex replacement pattern is this
;(.*?)(ID[^;]*);

Open in new window


Replace with
;$2;

Open in new window

0
 
Bill PrewCommented:
@aikimark,

I always enjoy seeing alternate REGEX expressions, thanks for adding.

I thought the OP only wanted to edit that column if it started with z or Z though, wouldn't you need to add that?

~bp
0
 
aikimarkCommented:
The subsequent comments led me to think that anything between the ; and the ID should be removed.  If I've misread/misinterpreted the requirement then ignore my comment.

Also, if you set the regexp object to ignore case, your patterns might be simpler.
0
 
Bill PrewCommented:
I guess another alternate could be this, although I lean toward soft coding the result string rather than recoding the search string surrounds, easier to maintain in the future, only one change.  A personal preference...

oRegExp.Pattern = ";([zZ].*)ID"
sData = oRegExp.Replace(sData, ";ID")

~bp
0
 
Bill PrewCommented:
@LD16,

Need some further clarification from you.  Do you only want to do this for the fields with a z or Z at the beginning, or any field that has extra characters before the "ID".

Also, if a field does not have either a z/Z or an ID at the beginning, are you saying those need an ID inserted?

~bp
0
 
aikimarkCommented:
And this pattern would probably be more efficient, since it only replaces if there are one or more characters before the ID
;(.+?)(ID[^;]*);

Open in new window


Maybe I should test these two methods like I did in my most recent article.  However, performance probably isn't a big concern in this instance.
0
 
RobSampsonCommented:
The subsequent comments led me to think that anything between the ; and the ID should be removed.

That was the assumption I made as well, since the author stated that each field should start with ID.

Rob.
0
 
LD16Author Commented:
@Bill,

I made a mistake in my spec all the values begins with z or Z so  "ID" insertion is not required.

Thank you
0
 
Bill PrewCommented:
Okay, great, let me know how you make out with my solution or others, hopefully we have some things that canget the job done.

~bp
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 5
  • 3
  • 3
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now