Replacing multiple values in a string

higgsy asked
Last Modified: 2008-06-05
Hi,

I'm using the Microsoft XML component to retrieve the source of certain pages on our website. We have email templates set up on the site, so we can take content from anywhere on the site and send it as an email.

When the mail is sent in HTML format this is fine; it looks perfect. However, when it's sent as plain text, I run a bit of code that removes all tags such as <br>, <html>, etc. The one thing this doesn't remove is the special characters such as vbCrLf, vbTab, vbCr and vbLf.

I could do this using four Replace statements, but there must be some sort of function I can call once that will replace each of them with a predefined value.

Any ideas?

Thanks in advance

Al Higgs

Commented:
Try:
Function ReplaceChars(YourString)
    ReplaceChars = Replace(Replace(Replace(Replace(YourString, vbCrLf, ""), vbTab, ""), vbCr, ""), vbLf, "")
End Function
I haven't tested that, by the way.
You could also extend it so you can define what to replace each character with:

Function ReplaceChars(YourString, crlfrep, tabrep, crrep, lfrep)
    ReplaceChars = Replace(Replace(Replace(Replace(YourString, vbCrLf, crlfrep), vbTab, tabrep), vbCr, crrep), vbLf, lfrep)
End Function

' Call it as a function and keep the result (calling it with parentheses
' as a bare statement raises "Cannot use parentheses when calling a Sub")
strResult = ReplaceChars("Some String", "<br>", "&nbsp;", "<br>", "")

Commented:
Use RegExp; \s matches any whitespace character:


Function RemoveWhiteSpace(ByVal sIn)
    Dim oRegExp
    Set oRegExp = New RegExp
    oRegExp.Pattern = "\s"
    oRegExp.Global = True   ' replace every match, not just the first
    RemoveWhiteSpace = oRegExp.Replace(sIn, "")  
    Set oRegExp = Nothing
End Function
This document also has information on using XSL to convert the whitespace automatically:
http://www.devasp.com/SampleChapters/3579/default13.asp
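One thing to watch with a RegExp-based function like this: VBScript's RegExp object defaults to Global = False, so Replace removes only the first match unless Global is switched on. The comparison below is a small sketch; StripWhiteSpace is a hypothetical name used only for illustration.

```vbscript
' Hypothetical helper: runs the same whitespace pattern with Global
' switched off (the RegExp default) or on, for comparison.
Function StripWhiteSpace(ByVal sIn, ByVal bGlobal)
    Dim oRe
    Set oRe = New RegExp
    oRe.Pattern = "\s"
    oRe.Global = bGlobal   ' False = replace only the first match
    StripWhiteSpace = oRe.Replace(sIn, "")
    Set oRe = Nothing
End Function
```

StripWhiteSpace("a b" & vbTab & "c", False) returns "ab" followed by a tab and "c", while passing True returns "abc".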

Author

Commented:
Hi Sybe,

That sort of works, except that when I'm searching for vbCrLf, vbCr, vbLf and vbTab I need to be able to replace each one with a different value. For instance, I want to replace vbTab with "", but vbCrLf with "<br>".

Is there a way to do this?

Thanks

Al
Commented:
The function I made, crude as it is, does just that. I have no idea how to do it with a regex, though.
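Without a regex, one way to get a different replacement per character from a single call is a table-driven loop over plain Replace. This is a minimal sketch; ReplaceMany and its parallel-array arguments are hypothetical, and it assumes both arrays have the same length.

```vbscript
' Hypothetical helper: replaces aFind(i) with aReplace(i) for each i, in
' array order. List vbCrLf before vbCr and vbLf so the pair is replaced as
' one unit rather than being split into two separate characters.
Function ReplaceMany(ByVal sIn, aFind, aReplace)
    Dim i
    For i = 0 To UBound(aFind)
        sIn = Replace(sIn, aFind(i), aReplace(i))
    Next
    ReplaceMany = sIn
End Function

' Example: line breaks become <br>, tabs are dropped
Dim sText
sText = ReplaceMany("Line one" & vbCrLf & vbTab & "Line two", _
    Array(vbCrLf, vbCr, vbLf, vbTab), Array("<br>", "<br>", "<br>", ""))
' sText is now "Line one<br>Line two"
```

Adding another character is then a matter of extending both arrays rather than nesting another Replace call.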

Author

Commented:
Hi Sybe,

I'll tell you why I'm trying to do this; it might give you a clearer idea.

When I send out plain text emails, which are basically the parsed source of an HTML document with its HTML tags removed, the indentation from the HTML source comes along too, because HTML pages are formatted like so:

<tr>
     <td>Hi this is my sentence</td>
</tr>

When that HTML was parsed, the email I sent still included the indentation, so the member would get:

Dear member,

         Hi this is my sentence

Which looks a bit daft.

Any ideas???

Thanks again

al
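If the line structure of the text should survive and only the indentation should go, one option is to work line by line: normalise the line endings, trim spaces and tabs from both ends of each line, and skip lines that end up empty. A minimal sketch; StripIndents is a hypothetical name, and dropping blank lines is an assumption about the desired output.

```vbscript
' Hypothetical helper: trims spaces and tabs from both ends of every line,
' skips lines that end up empty, and rejoins the rest with vbCrLf.
Function StripIndents(ByVal sIn)
    Dim oRe, aLines, i, sLine, sOut
    Set oRe = New RegExp
    oRe.Global = True
    ' Leading or trailing spaces/tabs; without Multiline, ^ and $ anchor
    ' to the start and end of the single line being processed
    oRe.Pattern = "^[ \t]+|[ \t]+$"

    ' Normalise CRLF and lone CR to LF so there is one separator to split on
    sIn = Replace(sIn, vbCrLf, vbLf)
    sIn = Replace(sIn, vbCr, vbLf)
    aLines = Split(sIn, vbLf)

    sOut = ""
    For i = 0 To UBound(aLines)
        sLine = oRe.Replace(aLines(i), "")
        If Len(sLine) > 0 Then
            If Len(sOut) > 0 Then sOut = sOut & vbCrLf
            sOut = sOut & sLine
        End If
    Next

    Set oRe = Nothing
    StripIndents = sOut
End Function
```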

Commented:
The question is how you want it to look, then. I understand that you are not satisfied with what you have.
Who wants an HTML page sent as plain text anyway?

What about opening the HTML in Word and copying/pasting it into Notepad? (It is possible to script that, although using Word from ASP is not recommended.)

Author

Commented:
Hi Sybe,

What I really want is to end up with a string of plain text (I will have already removed all HTML tags) that is stripped of all indents.

Any ideas???

Al

Commented:
No hard returns either? Just a String where the only white space is the space between words?
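If so, a single pattern can do it: replace every run of whitespace with one space, then trim the ends. Replacing with a space rather than an empty string is what keeps words on adjacent source lines apart. A sketch; CollapseWhiteSpace is a hypothetical name.

```vbscript
' Hypothetical helper: every run of whitespace (\s covers space, tab,
' CR, LF, form feed and vertical tab) becomes a single space.
Function CollapseWhiteSpace(ByVal sIn)
    Dim oRe
    Set oRe = New RegExp
    oRe.Global = True
    oRe.Pattern = "\s+"
    ' Trim removes the single space that may remain at either end
    CollapseWhiteSpace = Trim(oRe.Replace(sIn, " "))
    Set oRe = Nothing
End Function
```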
Commented:

Author

Commented:
Hi Sybe,

This gives me the following error:

Microsoft VBScript runtime error '800a139a'

Unexpected quantifier

Any ideas?

It only gives the error when "*" is in the pattern.

Thanks

Al
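"Unexpected quantifier" means a quantifier such as * has nothing to repeat, which is what happens when the space in front of it goes missing and the pattern ends in "|*". A small check like the following can catch this before the pattern is used. IsValidPattern is hypothetical, and since the exact pattern is not shown in the thread, "[\f\n\r\t\v]|*" is only an assumed example of the broken form.

```vbscript
' Hypothetical helper: returns False if VBScript rejects the pattern.
Function IsValidPattern(ByVal sPattern)
    Dim oRe, bOk
    Set oRe = New RegExp
    On Error Resume Next
    oRe.Pattern = sPattern
    oRe.Test ""            ' runs the pattern; an invalid one raises an error
    bOk = (Err.Number = 0)
    Err.Clear
    On Error GoTo 0
    IsValidPattern = bOk
End Function
```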

Author

Commented:
Sorry, it doesn't return an error; I had missed the single space before the *.

However, it does seem to remove all spacing, even between words.

Commented:
The * should be a +, sorry.

Commented:
" +" = one or more spaces
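To see why both versions also swallow the single spaces between words, it helps to compare the quantifiers directly: " *" and " +" both match a lone space, while " {2,}" only matches runs of two or more. RemovePattern is a hypothetical helper used only for this comparison.

```vbscript
' Hypothetical helper: removes every match of sPattern from sIn.
Function RemovePattern(ByVal sIn, ByVal sPattern)
    Dim oRe
    Set oRe = New RegExp
    oRe.Global = True
    oRe.Pattern = sPattern
    RemovePattern = oRe.Replace(sIn, "")
    Set oRe = Nothing
End Function

' With sSample = "Hi  this is   text":
'   RemovePattern(sSample, " *")    returns "Hithisistext"
'   RemovePattern(sSample, " +")    returns "Hithisistext"
'   RemovePattern(sSample, " {2,}") returns "Hithis istext"
```

Even " {2,}" joins the words on either side of a removed run; replacing with a single space instead of an empty string avoids that.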

Author

Commented:
Hi Sybe,

That didn't work for me either, but this did: "[\n\r\t]| *\B"

The points are yours anyway, as you did all the work.

Thanks mate, much appreciated...

Author

Commented:
Actually, that's not quite working.

Wherever a new line starts in the original HTML page, it seems to leave a single space, so I end up with something like this:

" Dear member,<br><br> Welcome to MusicSubway.Com- Your stop for a network of gifted and unsigned artists!<br><br> MusicSubway.Com has been developed as a tool for you."

You can see where the single spaces are, after the <br> tags.

Also, can't I set one pattern, execute it, then redefine the pattern and execute it again, like so:

      'first replace all <br> tags with vbCrLf
      objRegExpr.Pattern = "<br>"      
      strString = objRegExpr.Replace(strBody,vbCrLf)

      'set pattern to remove special characters
      objRegExpr.Pattern = "[\n\r\t\v\f]| *\B"
      strString = objRegExpr.Replace(strBody,"")

???

Al

Commented:
That is probably because it does not replace "\n " (<newline><space>) with nothing; it only replaces <newline> with nothing, and some lines start with a single <space>.

Yes, you can of course set different patterns, but not the way you do it. After you have run Replace with the first pattern, you should run the second Replace on the resulting string, not on the starting string.

     'first replace all <br> tags with vbCrLf
     objRegExpr.Pattern = "<br>"    
     strString = objRegExpr.Replace(strBody,vbCrLf)

     'set pattern to remove special characters
     objRegExpr.Pattern = "[\n\r\t\v\f]| *\B"
     strString = objRegExpr.Replace(strString,"") '<==== use strString (the result of the previous)


<%
Function RemoveWhiteSpace(ByVal sIn)
    Dim oRegExp
    Set oRegExp = New RegExp
    oRegExp.Pattern = "\n |[\f\n\r\t\v]| +"  ' <== added "\n " (= <newline><space>) to the search pattern
    oRegExp.Global = True  ' replace every match, not just the first
    RemoveWhiteSpace = oRegExp.Replace(sIn, "")  
    Set oRegExp = Nothing
End Function
%>
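The chaining idea above can be wrapped in a helper that applies a list of pattern/replacement pairs in turn, always feeding the previous result into the next Replace. Order matters: in the snippet above, <br> is turned into vbCrLf first and the second pattern then removes every \r and \n, including those new line breaks, so this sketch strips the whitespace before converting <br>. RegexPipeline and its parallel-array arguments are hypothetical.

```vbscript
' Hypothetical helper: applies aPatterns(i) -> aReplacements(i) in order,
' each Replace working on the output of the previous one.
Function RegexPipeline(ByVal sIn, aPatterns, aReplacements)
    Dim oRe, i
    Set oRe = New RegExp
    oRe.Global = True
    oRe.IgnoreCase = True   ' so "<br>" also matches "<BR>"
    For i = 0 To UBound(aPatterns)
        oRe.Pattern = aPatterns(i)
        sIn = oRe.Replace(sIn, aReplacements(i))
    Next
    Set oRe = Nothing
    RegexPipeline = sIn
End Function
```

For example, RegexPipeline(strBody, Array("[\n\r\t\v\f]", "<br>"), Array("", vbCrLf)) removes the source line breaks and tabs first, then turns each <br> into a real line break.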


Commented:
Though Sybe and others gave you what you wanted, if I were you I would have taken a different approach. How many characters are you going to replace?

I would rather use some XSL to convert it to a form such as HTML or a database table and read it from there. That way you will always get a uniform structure, and you can format it accordingly.

Do you have any control over the HTML pages you are reading? What if they contain something else that your replace code does not recognize?

Anyway, just my 2 cents. It is an altogether different approach, and you may not have the time and resources for it.

Commented:
I like the XSL approach, but the problem is that it requires valid XML. Most web pages are not valid XML, and it is a pain to convert HTML to XHTML. There are some tools, but from what I have seen they need a component installed, and they don't convert HTML special characters to XML special characters.

Commented:
>>And most webpages are not valid XML
You are right, and that's why I would write a solid XSL so that those vbCrLf characters are eliminated.
Or, he/she could grab the content from a database table, but I suppose he/she cannot, and that's why XML is being used.

Commented:
Teach me how XSL can process a string that is not valid XML. I was under the assumption that XSL can only process (valid) XML.

Commented:
No, no, I am not saying that. I would rather correct the XML before it goes for parsing and then have a solid XSL to process it, so that my input is clean and I don't need to replace any characters.

Author

Commented:
Hi guys,

After promising Sybe the points I feel bad coming back, but I still haven't been able to get the code working 100%. I have an HTML page that is used for email content all over the site. Depending on a parameter passed to it, the page returns different sections of HTML and text; you can also pass it a parameter that tells the page whether you want HTML or plain text. You can see the page at: http://www.musicsubway.com/templates/default.asp?strReference=registration&strVersion=    If you add plaintext to the end of this URL, you will see it only shows the text and none of the images or tables.

To get the content for plain text emails i am now using this code:

[code]
Function GetEmailContent()

      'declare some variables
      Dim objXMLHTTP, URL, strHTMLBody, strPlainBody, objRegExpr, strTmp, strString

      'create an instance of the XML component
      Set objXMLHTTP = Server.CreateObject("Microsoft.XMLHTTP")
      URL = "http://" & Request.ServerVariables("HTTP_HOST") & _
      "/templates/default.asp?strReference=registration&strVersion=plaintext"
      objXMLHTTP.Open "GET", URL, False
      'send the request and receive the source code back
      objXMLHTTP.Send
      strTmp = objXMLHTTP.responseText
      
      'create an instance of the regexp object
      Set objRegExpr = New regexp
      
      'set properties for regular expression
      objRegExpr.Global = True
      objRegExpr.IgnoreCase = True
      
      'first replace all <br> tags with a placeholder (turned back into line breaks at the end)
      objRegExpr.Pattern = "<br>"      
      strString = objRegExpr.Replace(strTmp,"VbCrTag")
      strTmp = strString
      
      'set pattern to remove special characters
      objRegExpr.Pattern = "\n +\b|[\r\n\t\v\f]| *\B"
      strString = objRegExpr.Replace(strTmp,"")
      strTmp = strString      
      
      'set pattern to remove all HTML tags
      objRegExpr.Pattern = "<[^>]*>"
      strString = objRegExpr.Replace(strTmp,"")
      strTmp = strString      
      
      'set pattern to replace custom tags with line breaks
      objRegExpr.Pattern = "VbCrTag"
      strString = objRegExpr.Replace(strTmp,vbCr)
      strTmp = strString
      
      Response.Write(strTmp)

End Function
[/code]

The result of this code can be seen at http://www.musicsubway.com/shared/asp/functions.asp. It all works fine except in a couple of places. On the template HTML page I referred to earlier, the first line of the source ends at roughly the word 'and' or 'unsigned', and my code above joins that last word to the first word of the next line.

Could you see where my code is going wrong, please?

Thanks again, Sybe

Al
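The words most likely run together because "[\r\n\t\v\f]" removes source line breaks outright: where the HTML source wraps between 'and' and the next word, nothing is left to separate them. One alternative, sketched below under the assumption that the fetched page is already in a string, is to turn every whitespace run into a single space first, then convert <br> tags, strip the remaining tags, and tidy the spaces around each line break. HtmlToPlainText is a hypothetical name, and HTML entities such as &nbsp; are not decoded.

```vbscript
' Hypothetical helper: HTML source in, plain text with line breaks out.
Function HtmlToPlainText(ByVal sHtml)
    Dim oRe
    Set oRe = New RegExp
    oRe.Global = True
    oRe.IgnoreCase = True

    ' 1. Every run of whitespace (source line breaks, indentation) becomes
    '    one space, so words on adjacent source lines stay apart
    oRe.Pattern = "\s+"
    sHtml = oRe.Replace(sHtml, " ")

    ' 2. <br>, <br/> and <br /> become real line breaks
    oRe.Pattern = "<br\s*/?>"
    sHtml = oRe.Replace(sHtml, vbCrLf)

    ' 3. Remove all remaining tags
    oRe.Pattern = "<[^>]*>"
    sHtml = oRe.Replace(sHtml, "")

    ' 4. Drop the spaces left on either side of each line break
    oRe.Pattern = " *\r\n *"
    sHtml = oRe.Replace(sHtml, vbCrLf)

    Set oRe = Nothing
    HtmlToPlainText = Trim(sHtml)
End Function
```

In GetEmailContent, strTmp = HtmlToPlainText(objXMLHTTP.responseText) could take the place of the four RegExp steps.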
Commented:
Sybe et al,

Changing the pattern to "[\f\n\r\t\v]| {2,}" seems to work better for me. It removes runs of two or more spaces and leaves single spaces alone!

Regards.
<%
Function RemoveWhiteSpace(ByVal sIn)
    Dim oRegExp
    Set oRegExp = New RegExp
    oRegExp.Pattern = "[\f\n\r\t\v]| {2,}"
    oRegExp.Global = True   ' replace every match, not just the first
    RemoveWhiteSpace = oRegExp.Replace(sIn, "")  
    Set oRegExp = Nothing
End Function
%>
