Regular Expression

Hi guys,

I have a string which contains a series of characters, including HTML tags. I would like to preserve the text but remove all tags, replacing the paragraph end </p> tag with <br>. I have found the following script, but not understanding regular expressions makes it hard to see exactly what it does. What exactly does this function do, and how can I modify it to preserve the text as I said earlier?

Function stripTags(HTMLstring)
	Set RegularExpressionObject = New RegExp
	With RegularExpressionObject
		.Pattern = "<[^>]+>"
		.IgnoreCase = True
		.Global = True
	End With
	stripTags = RegularExpressionObject.Replace(HTMLstring, "")
	Set RegularExpressionObject = nothing
End Function

MTIA

DWE
dwe0608 Asked:

matija_ Commented:
Here:
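Function stripTags(ByVal HTMLstring)
	' ByVal so the caller's variable is not modified (VBScript passes ByRef by default)
	Dim RegularExpressionObject
	Set RegularExpressionObject = New RegExp
	With RegularExpressionObject
		.Pattern = "<[^>]+>"	' any tag: "<", one or more non-">" characters, ">"
		.IgnoreCase = True
		.Global = True		' replace every match, not just the first
	End With

	' Swap </p> for a placeholder the pattern will not match (vbTextCompare also catches </P>)
	HTMLstring = Replace(HTMLstring, "</p>", "#br /#" & vbCrLf, 1, -1, vbTextCompare)
	' Strip all remaining tags
	HTMLstring = RegularExpressionObject.Replace(HTMLstring, "")
	' Turn the placeholder into a real line break tag
	stripTags = Replace(HTMLstring, "#br /#", "<br />")
	Set RegularExpressionObject = Nothing
End Function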
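
To see the placeholder trick in action, here is a small trace of the same three steps. It is only a sketch: the sample string is made up, and it assumes you run it as a .vbs file under cscript.exe (in an ASP page you would use Response.Write instead of WScript.Echo).

```vbscript
Option Explicit
' Trace of the three steps used in stripTags.
' The sample string is invented for illustration.
Dim re, s
Set re = New RegExp
re.Pattern = "<[^>]+>"
re.Global = True

s = "<p>Hello <b>world</b></p><p>Second</p>"

' Step 1: protect paragraph ends with a placeholder that contains no < or >
s = Replace(s, "</p>", "#br /#" & vbCrLf)
WScript.Echo s

' Step 2: remove every remaining tag; the placeholder is left alone
s = re.Replace(s, "")
WScript.Echo s

' Step 3: turn the placeholder into a real line break tag
s = Replace(s, "#br /#", "<br />")
WScript.Echo s
```

After the last step the string holds Hello world<br /> and Second<br />, each followed by a line break. The placeholder only works as long as the text "#br /#" never appears in your real content, so pick something else if it might.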

dwe0608 (Author) Commented:
Hi matija_, OK, I can see what you've done: you take the incoming HTMLstring, replace </p> with #br /#, strip the tags, and then replace #br /# with <br />. But what does the pattern do? That character sequence means nothing to me, so what do the characters mean?
matija_ Commented:
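The pattern finds and removes every occurrence of "<anything inside brackets>" in your text. Reading it piece by piece: < matches a literal opening bracket, [^>]+ matches one or more characters that are not >, and the final > matches the closing bracket.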
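If it helps, you can list what the pattern matches before anything gets replaced. This is just a sketch with a made-up sample string, assuming it is run under cscript.exe:

```vbscript
Option Explicit
' List every substring that "<[^>]+>" matches in a sample string.
Dim re, m
Set re = New RegExp
re.Pattern = "<[^>]+>"
re.Global = True   ' without this, Execute returns only the first match

For Each m In re.Execute("<p>Some <b>bold</b> text</p>")
    WScript.Echo m.Value
Next
```

This echoes <p>, <b>, </b> and </p> in turn. The words between the tags are never part of a match, which is why they survive the Replace call.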
dwe0608 (Author) Commented:
So will that delete something like <input type="text" value="test value" />, including the value "test value"? I suppose it would, wouldn't it ...
dwe0608 (Author) Commented:
Thanks for the help ...
matija_ Commented:
Yes, it would. Glad I could help...
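A quick way to convince yourself is the sketch below. The sample string is invented, and it again assumes cscript.exe:

```vbscript
Option Explicit
' Show that attribute values disappear together with the tag they belong to.
Dim re, s
Set re = New RegExp
re.Pattern = "<[^>]+>"
re.Global = True

s = "Name: <input type=""text"" value=""test value"" /> <b>required</b>"
WScript.Echo re.Replace(s, "")
```

The whole input tag goes, "test value" included, because the value sits between < and >. The word "required" stays because it sits between two tags rather than inside one. One thing to watch: if an attribute value itself contains a > character, the match ends at that character and the rest of the tag is left behind in the output.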