MadIce

Search and replace text in documents using wildcard

I have hundreds of HTML pages containing tags that I need to get rid of. I could go through each page and remove them by hand, but that would take a very long time. Is there a way to use a wildcard to search for and remove these tags? I've used the following when I know exactly what the text is:

fileReader = My.Computer.FileSystem.ReadAllText("C:\Documents\Working\Content\" & strTag & "\" & strFile).Replace("<img src=""""*.jpg", strTag & "-" & strPhotoNum & ".jpg")

My.Computer.FileSystem.WriteAllText("C:\Documents\Working\Content\" & strTag & "\" & strFile, fileReader, False)
 

Can Regex be used to do this? If so, can you point me to an example?
Using VB.NET 2010
ASKER CERTIFIED SOLUTION
Brian Pringle
This solution is only available to members.
MadIce

ASKER

I'll have to try that at home. I'll get back tomorrow.
Ark
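        ' Group 1: tag up to the opening quote of src; group 2: the .jpg path; group 3: rest of the tag
        Dim jpgPattern = "(<img .+?src\s*=\s*[""'])([^""']*?\.jpg)(['""].+?/>)"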
        Dim testString = "some html text <img class=""someclass"" id='1' src='test.jpg' /> rest of html <img class=""someclass"" id='2' src = 'subfolder/test2.jpg' />"
        Dim regex = New System.Text.RegularExpressions.Regex(jpgPattern)
        MessageBox.Show(regex.Replace(testString, "$1New.jpg$3"))


Notepad++ lets you find and replace by regex too. I second the idea that it's an appropriate tool for the job.
BTW, Total Commander and some other console-like apps (including the famous FAR) support regex too :)
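
For anyone who wants to stay in VB.NET rather than use an editor, here is a rough sketch of how the regex approach could be applied to a whole folder of pages. Note that String.Replace in the original question matches its argument literally, so a "*" in it is not treated as a wildcard; Regex.Replace is what provides pattern matching. The thread never says which tags need removing, so the "font" tag, the RemoveTag helper and the StripTags module name are all made up for illustration, and the folder is just the base path from the question. Try it on a copy of the files first.

```vbnet
Imports System.IO
Imports System.Text.RegularExpressions

Module StripTags
    ' Hypothetical helper (not from the thread): removes the opening and
    ' closing forms of one tag but keeps the text between them.
    Function RemoveTag(html As String, tagName As String) As String
        ' </?name  : "<name" or "</name"
        ' \b[^>]*> : end of the tag name, any attributes, then the closing ">"
        Dim pattern As String = "</?" & Regex.Escape(tagName) & "\b[^>]*>"
        Return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Assumed folder; substitute the real content directory.
        Dim folder As String = "C:\Documents\Working\Content"

        For Each filePath As String In Directory.GetFiles(folder, "*.html", SearchOption.AllDirectories)
            Dim original As String = File.ReadAllText(filePath)
            ' "font" is only an example tag name.
            Dim cleaned As String = RemoveTag(original, "font")

            ' Rewrite the file only when something was actually removed.
            If cleaned <> original Then
                File.WriteAllText(filePath, cleaned)
            End If
        Next
    End Sub
End Module
```

With this pattern, RemoveTag("<font color=""red"">Hi</font> there", "font") returns "Hi there". A regex like this is only reasonable for simple, well-formed tags; for nested or messy markup an HTML parser is probably the safer choice.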
MadIce

ASKER

btpringle,

Tried Notepad++ at home and was able to do what I needed using regex. I wasn't familiar with Notepad++ or regex. It turns out I already have software with a regex feature. Thanks for the info, and thanks to everyone else as well.