
Replacing multiple values in a string

Posted on 2004-03-30
Last Modified: 2008-06-05
Hi,

I'm using the Microsoft XML component to retrieve the source code of certain pages on our website. We have email templates set up on our site, so we can take content from anywhere on the site and send it as an email.

When the mail is sent in HTML format this is fine; it looks perfect. However, when it's in plain text I run a bit of code that removes all tags such as <br>, <html>, etc. The one thing this doesn't remove is the special characters such as vbCrLf, vbTab, vbCr and vbLf.

I could do this using four Replace statements, but there must be some sort of function that I can call once that will replace them with predefined values...

Any ideas, anyone?

Thanks in advance

Al Higgs
Question by:higgsy
 
LVL 11

Expert Comment

by:Slimshaneey
Try:
Function ReplaceChars(YourString)
    ' Strip CR/LF pairs first, then tabs, then any lone CR or LF
    ReplaceChars = Replace(Replace(Replace(Replace(YourString, vbCrLf, ""), vbTab, ""), vbCr, ""), vbLf, "")
End Function
 
LVL 11

Expert Comment

by:Slimshaneey
I haven't tested that, by the way...
 
LVL 11

Expert Comment

by:Slimshaneey
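You could also extend it so you can define what to replace each character with:

Function ReplaceChars(YourString, crlfRep, tabRep, crRep, lfRep)
    ' Same nesting as above; vbCrLf must be handled before vbCr and vbLf
    ReplaceChars = Replace(Replace(Replace(Replace(YourString, vbCrLf, crlfRep), vbTab, tabRep), vbCr, crRep), vbLf, lfRep)
End Function

' Assign the result: calling a function with several arguments in parentheses as a bare statement is a VBScript error
strResult = ReplaceChars("Some String", "<br>", "&nbsp;", "<br>", "")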
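For comparison, here is a minimal Python sketch of the same chained-replace idea (Python is used only for illustration, and the name `replace_chars` with its empty-string defaults is hypothetical). The detail it makes explicit is ordering: the CRLF pair has to be replaced before lone CR and LF, otherwise each half of the pair is replaced separately.

```python
def replace_chars(text, crlf_rep="", tab_rep="", cr_rep="", lf_rep=""):
    """Chained replacements, mirroring the nested VBScript Replace() calls.

    CRLF is handled first; if CR or LF were handled first, a CRLF pair
    would be split and replaced twice.
    """
    text = text.replace("\r\n", crlf_rep)
    text = text.replace("\t", tab_rep)
    text = text.replace("\r", cr_rep)
    text = text.replace("\n", lf_rep)
    return text


# Same replacements as the VBScript call: CRLF -> <br>, tab -> &nbsp;
print(replace_chars("Line one\r\n\tLine two", "<br>", "&nbsp;", "<br>", ""))
# prints: Line one<br>&nbsp;Line two
```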
 
LVL 28

Expert Comment

by:sybe
Use RegExp; the whitespace class is \s:


Function RemoveWhiteSpace(ByVal sIn)
    Dim oRegExp
    Set oRegExp = New RegExp
    oRegExp.Pattern = "\s"
    oRegExp.Global = True   ' without this, only the first match is replaced
    RemoveWhiteSpace = oRegExp.Replace(sIn, "")
    Set oRegExp = Nothing
End Function
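A quick Python illustration of what the `\s` pattern does (demonstration only; `remove_white_space` is a hypothetical name). `\s` matches ordinary spaces as well as tabs, CR, LF, form feed and vertical tab, so the words end up run together. Python's `re.sub` replaces every match by default; VBScript's RegExp only does that when `Global` is set to True.

```python
import re


def remove_white_space(s):
    # \s covers space, \t, \r, \n, \f and \v; re.sub replaces all matches
    return re.sub(r"\s", "", s)


print(remove_white_space("Dear member,\r\n\tHi this is my sentence"))
# prints: Dearmember,Hithisismysentence
```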
 
LVL 11

Expert Comment

by:Slimshaneey
This document also has info on using XSL to convert the whitespace characters automatically:
http://www.devasp.com/SampleChapters/3579/default13.asp
 

Author Comment

by:higgsy
Hi Sybe,

That sort of works, except that when I'm searching for vbCrLf, vbCr, vbLf and vbTab I need to be able to replace each one with a different value. For instance, I want to replace vbTab with "", but vbCrLf with "<br>"...

Is there a way to do this?

Thanks

Al
 
LVL 11

Expert Comment

by:Slimshaneey
The function I made, crude as it is, does just that. I have no idea how to do it with a regex, though.
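For the "different value for each character" case, a single regex pass can work if the replacement is looked up per match. The sketch below uses Python's `re.sub` with a function as the replacement (the `REPLACEMENTS` table is a hypothetical example); in VBScript the nested Replace() calls above remain the simplest route.

```python
import re

# Hypothetical mapping: each control sequence gets its own replacement.
REPLACEMENTS = {
    "\r\n": "<br>",
    "\r": "<br>",
    "\n": "<br>",
    "\t": "",
}

# Alternatives are tried left to right, so "\r\n" (listed first) wins
# over the single-character alternatives.
PATTERN = re.compile("|".join(re.escape(k) for k in REPLACEMENTS))


def replace_chars(text):
    # The callback returns the replacement for whichever sequence matched.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)


print(replace_chars("one\r\ntwo\tthree\nfour"))
# prints: one<br>twothree<br>four
```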
 

Author Comment

by:higgsy
Hi Sybe,

I'll tell you why I'm trying to do this; it might give you a clearer idea.

I send out plain-text emails, which are basically the parsed source code of an HTML document with its HTML tags removed. HTML pages have formatting like this:

<tr>
     <td>Hi this is my sentence</td>
</tr>

When that HTML is parsed, I end up sending an email that includes the indentation, so the member gets:

Dear member,

         Hi this is my sentence

Which tends to look a bit daft...

Any ideas?

Thanks again

Al
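If the goal is only to drop the indentation while keeping the line breaks, one option is to remove the spaces and tabs at the start of each line rather than every newline. A minimal Python sketch (the name `strip_indents` is hypothetical); `(?m)` makes `^` match at the start of every line, and VBScript's RegExp has a `Multiline` property that plays the same role.

```python
import re


def strip_indents(text):
    # (?m): ^ matches after every \n; [ \t]+ is the leading indentation.
    # The line breaks themselves are kept.
    return re.sub(r"(?m)^[ \t]+", "", text)


sample = "Dear member,\r\n\r\n         Hi this is my sentence"
print(strip_indents(sample))
```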
 
LVL 28

Expert Comment

by:sybe
The question, then, is how you want it to look. I understand that you are not satisfied with what you have.
Who wants to have an HTML page sent as plain text?

What about opening the HTML in Word and copy/pasting it into Notepad? (It is possible to script that, although using Word from ASP is not advisable.)
 

Author Comment

by:higgsy
Hi Sybe,

What I really want is to end up with a string of the plain text (I will have already removed all HTML tags) that is stripped of all indents...

Any ideas?

Al
 
LVL 28

Expert Comment

by:sybe
No hard returns either? Just a String where the only white space is the space between words?
 
LVL 28

Accepted Solution

by:
sybe earned 500 total points
The function below will remove all of these:

\f = form feed
\n = new line
\r = carriage return
\t = tab
\v = vertical tab
 * = runs of spaces (including single spaces)

<%
Function RemoveWhiteSpace(ByVal sIn)
    Dim oRegExp
    Set oRegExp = New RegExp
    oRegExp.Pattern = "[\f\n\r\t\v]| *"
    oRegExp.Global = True   ' replace every match, not just the first
    RemoveWhiteSpace = oRegExp.Replace(sIn, "")
    Set oRegExp = Nothing
End Function
%>
 

Author Comment

by:higgsy
Hi Sybe,

This gives me the following error:

Microsoft VBScript runtime error '800a139a'

Unexpected quantifier

Any ideas?

It only gives the error when "*" is in the pattern.

Thanks

Al
 

Author Comment

by:higgsy
Sorry, it doesn't return an error; I had missed the single space before the *.

However, this does seem to remove all spacing, even between words...
 
LVL 28

Expert Comment

by:sybe
* should be +, sorry
 
LVL 28

Expert Comment

by:sybe
" +" = one or more spaces
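A small Python comparison of the space quantifiers used in this thread (the sample text is made up for illustration). " *" and " +" both match a single space, so both remove the spaces between words. " {2,}", suggested later in the thread, leaves single spaces alone; but because a run is deleted rather than collapsed, words separated by several spaces or by a line break still get glued together.

```python
import re

text = "Hi  this is\r\n   my sentence"


def strip_with(space_pattern):
    # Same shape as the thread's pattern: control characters OR spaces, removed.
    return re.sub(r"[\f\n\r\t\v]|" + space_pattern, "", text)


star = strip_with(" *")    # zero or more spaces: single spaces vanish too
plus = strip_with(" +")    # one or more spaces: single spaces still vanish
two = strip_with(" {2,}")  # only runs of two or more spaces are removed
print(star)  # Hithisismysentence
print(plus)  # Hithisismysentence
print(two)   # Hithis ismy sentence
```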
 

Author Comment

by:higgsy
Hi Sybe,

That didn't work for me either, but this did: "[\n\r\t]| *\B"

The points are yours anyway, as you did all the work...

Thanks mate, much appreciated...
 

Author Comment

by:higgsy
Actually, that's not quite working...

Wherever a new line starts in the original HTML page, it seems to put in a single space, so I have something like this:

" Dear member,<br><br> Welcome to MusicSubway.Com- Your stop for a network of gifted and unsigned artists!<br><br> MusicSubway.Com has been developed as a tool for you."

You can see where the single spaces are: after the <br> tags.

Also, can't I set one Pattern, execute it, then redefine the pattern and execute again, like so:

      'first replace all <br> tags with vbCrLf
      objRegExpr.Pattern = "<br>"      
      strString = objRegExpr.Replace(strBody,vbCrLf)

      'set pattern to remove special characters
      objRegExpr.Pattern = "[\n\r\t\v\f]| *\B"
      strString = objRegExpr.Replace(strBody,"")

???

Al
 
LVL 28

Expert Comment

by:sybe
That is probably because it replaces only the <newline> with nothing, not "\n " (<newline><space>), and some lines start with a single <space>.

Yes, you can of course set different patterns, but not the way you do it. After you have run Replace with the first pattern, you should run the second Replace on the resulting string, not on the starting string.

     'first replace all <br> tags with vbCrLf
     objRegExpr.Pattern = "<br>"    
     strString = objRegExpr.Replace(strBody,vbCrLf)

     'set pattern to remove special characters
     objRegExpr.Pattern = "[\n\r\t\v\f]| *\B"
     strString = objRegExpr.Replace(strString,"") '<==== use strString (the result of the previous)
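The same two-pass idea in Python (the name `to_plain_text` is hypothetical). Each pass works on the previous pass's result. Note one more catch in the snippet above: if the `<br>` to CRLF pass runs first, the second pattern's `\n` and `\r` then delete the line breaks that were just inserted. Running the cleanup first, as sketched here, avoids that (a placeholder, as in Al's later code, is another way around it).

```python
import re


def to_plain_text(body):
    # Pass 1: drop tabs/CR/LF etc. that only exist as source formatting.
    text = re.sub(r"[\f\n\r\t\v]", "", body)
    # Pass 2 works on the result of pass 1 (not on `body`) and turns
    # <br> tags, in any case, into real CRLF line breaks.
    return re.sub(r"<br>", "\r\n", text, flags=re.IGNORECASE)


print(repr(to_plain_text("Dear member,<BR>\r\n\tWelcome")))
```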


<%
Function RemoveWhiteSpace(ByVal sIn)
    Dim oRegExp
    Set oRegExp = New RegExp
    oRegExp.Pattern = "\n |[\f\n\r\t\v]| +"  ' <== added "\n " (= <newline><space>) to the search pattern
    oRegExp.Global = True   ' replace every match, not just the first
    RemoveWhiteSpace = oRegExp.Replace(sIn, "")
    Set oRegExp = Nothing
End Function
%>


 
LVL 20

Expert Comment

by:jitganguly
Though sybe and others have given you what you wanted, if I were you I would take a different approach. How many characters will you end up replacing?

I would rather use some XSL to convert it into another form, such as HTML or database records, and read it from there. That way you will always get a uniform structure and can format it accordingly.

Do you have any control over the HTML pages you are reading? What if they contain something else that your replace code does not recognize?

Anyway, just my 2 cents. It is an altogether different approach, and you may not have enough time and resources to do it.
 
LVL 28

Expert Comment

by:sybe
I like the XSL approach, but the problem is that it requires valid XML. Most web pages are not valid XML, and converting HTML to XHTML is a pain. There are some tools, but from what I have seen they need a component installed, and they don't convert HTML special characters to XML special characters.
 
LVL 20

Expert Comment

by:jitganguly
>>And most webpages are not valid xml
You are right, and that's why I would write a solid XSL so that those vbCrLf characters are eliminated.
Or,
if he/she can grab the content from a DB table. But I suppose he/she cannot, and that's why he/she is using XML.
 
LVL 28

Expert Comment

by:sybe
Teach me how XSL can process a string that is not valid XML. I was under the assumption that XSL can only process (valid) XML.
 
LVL 20

Expert Comment

by:jitganguly
No, no, I am not saying that. I would rather correct the XML before it goes for parsing, and then have a solid XSL to process it. That way my input is clean and I don't need to replace any characters.
 

Author Comment

by:higgsy
Comment Utility
Hi guys,

After promising Sybe the points I feel bad coming back, but I still haven't been able to get the code working 100%. I have an HTML page that is used for email content all over the site. Passing a parameter to the page makes it return different sections of HTML and text; you can also pass it a parameter that tells the page whether you want HTML or plain text. You can see the page at: http://www.musicsubway.com/templates/default.asp?strReference=registration&strVersion= If you add plaintext to the end of this URL, you will see that it shows only the text, without the images or tables.

To get the content for plain-text emails I am now using this code:

[code]
Function GetEmailContent()

      'declare some variables
      Dim objXMLHTTP, URL, strHTMLBody, strPlainBody, objRegExpr, strTmp, strString

      'create an instance of the XML component
      Set objXMLHTTP = Server.CreateObject("Microsoft.XMLHTTP")
      URL = "http://" & Request.ServerVariables("HTTP_HOST") & _
      "/templates/default.asp?strReference=registration&strVersion=plaintext"
      objXMLHTTP.Open "GET", URL, False
      'send the request and receive the source code back
      objXMLHTTP.Send
      strTmp = objXMLHTTP.responseText
      
      'create an instance of the regexp object
      Set objRegExpr = New regexp
      
      'set properties for regular expression
      objRegExpr.Global = True
      objRegExpr.IgnoreCase = True
      
      'first replace all <br> tags with a placeholder (turned into vbCr at the end)
      objRegExpr.Pattern = "<br>"      
      strString = objRegExpr.Replace(strTmp,"VbCrTag")
      strTmp = strString
      
      'set pattern to remove special characters
      objRegExpr.Pattern = "\n +\b|[\r\n\t\v\f]| *\B"
      strString = objRegExpr.Replace(strTmp,"")
      strTmp = strString      
      
      'set pattern to remove all HTML tags
      objRegExpr.Pattern = "<[^>]*>"
      strString = objRegExpr.Replace(strTmp,"")
      strTmp = strString      
      
      'set pattern to replace custom tags with line breaks
      objRegExpr.Pattern = "VbCrTag"
      strString = objRegExpr.Replace(strTmp,vbCr)
      strTmp = strString
      
      Response.Write(strTmp)

End Function
[/code]

The result of this code can be seen at http://www.musicsubway.com/shared/asp/functions.asp. It all works fine apart from a couple of places. On the template HTML page I referred to earlier, the first line of the source ends at approximately the word 'and' or 'unsigned'. My code above tends to join that last word and the first word of the next line together...

Could you just see where my code is going wrong, please?

Thanks again Sybe

Al
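A likely cause: the pattern "\n +\b|[\r\n\t\v\f]| *\B" deletes the source line break and the indentation after it outright, so "and" at the end of one source line and "unsigned" at the start of the next become "andunsigned". One way around it, sketched below in Python under stated assumptions (the function name is hypothetical, and matching `<br/>` and `<br />` as well as `<br>` is an extension of the original pattern), is to collapse each whitespace run to a single space instead of deleting it, and only then turn the `VbCrTag` placeholder used above into a real line break. In VBScript the collapse step would be a RegExp with Pattern "\s+", Global = True, replaced with " ".

```python
import re

BR_TOKEN = "VbCrTag"  # placeholder, as in the code above


def html_to_plain_text(html):
    # 1. Mark intentional line breaks (<br>, <br/>, <br /> in any case).
    text = re.sub(r"<br\s*/?>", BR_TOKEN, html, flags=re.IGNORECASE)
    # 2. Strip all remaining tags.
    text = re.sub(r"<[^>]*>", "", text)
    # 3. Collapse every whitespace run (source line breaks, indentation)
    #    to ONE space, so words on adjacent source lines stay apart.
    text = re.sub(r"\s+", " ", text)
    # 4. Drop spaces around each placeholder and make it a CRLF break.
    text = re.sub(r" *" + re.escape(BR_TOKEN) + r" *", "\r\n", text)
    return text.strip()


html = ("<tr>\r\n     <td>Dear member,<br><br>\r\n"
        "     Welcome to MusicSubway.Com and\r\n"
        "     unsigned artists!</td>\r\n</tr>")
print(html_to_plain_text(html))
```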
 

Expert Comment

by:longbloke69
Sybe et al,

Changing the pattern to "[\f\n\r\t\v]| {2,}" seems to work better for me. It replaces runs of two or more spaces, leaving single spaces alone!

Regards.
<%
Function RemoveWhiteSpace(ByVal sIn)
    Dim oRegExp
    Set oRegExp = New RegExp
    oRegExp.Pattern = "[\f\n\r\t\v]| {2,}"
    oRegExp.Global = True   ' replace every match, not just the first
    RemoveWhiteSpace = oRegExp.Replace(sIn, "")
    Set oRegExp = Nothing
End Function
%>

