Solved

Not Validated RSS feed, uses relative src for images/script... how to inject the FQDN?

Posted on 2007-12-04
Last Modified: 2012-05-05
The RSS feed is out of my hands, so please do not ask me to get them to change it ...

I am running into an issue with the feed.  The images and scripts in it all use a relative source, rather than FQDN/folder/file.ext.

Is there any way I can inject the FQDN into the src="" for these?
Question by:kevp75

Author Comment

by:kevp75
Update: the following function gets me the required FQDN. I just need to figure out how to inject it into the image src and script src.
    Private Function stripFQDN(strURL)
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True
        objRegExp.Multiline = True
        objRegExp.Global = True
        objRegExp.Pattern = "[a-zA-Z0-9]+([a-zA-Z0-9\-\.]+)?\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)"
        Set myMatches = objRegExp.Execute(strURL)
        For Each myMatch In myMatches
            stripFQDN = stripFQDN & myMatch.Value & vbcrlf
        Next
    End Function
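
Two things about stripFQDN as posted are worth noting: it only recognises five top-level domains (so a host such as www.bbc.co.uk would not match), and it appends vbCrLf to every match, which would end up inside any src it is later spliced into. A minimal sketch of an alternative follows, assuming the input is a plain scheme://host/... URL; the name getHostFromURL is hypothetical and not from the thread.

```vbscript
' Hypothetical helper (not from the thread): return the host part of an absolute URL.
' Assumes a plain scheme://host[:port]/path URL with no user:password@ prefix.
Function getHostFromURL(strURL)
    Dim objRegExp, colMatches
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = False
    ' scheme://host  (the host ends at the first /, :, ? or #)
    objRegExp.Pattern = "^[a-z][a-z0-9+.\-]*://([^/:?#]+)"
    Set colMatches = objRegExp.Execute(strURL)
    If colMatches.Count > 0 Then
        ' SubMatches(0) is the text captured by the first pair of parentheses
        getHostFromURL = colMatches(0).SubMatches(0)
    Else
        ' Not an absolute URL: return an empty string rather than guessing
        getHostFromURL = ""
    End If
    Set objRegExp = Nothing
End Function
```

Because it returns a single string with no trailing line break, the result can be concatenated straight into a replacement such as "src=""http://" & host & "/$1""".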

Expert Comment

by:Martin-Smith
So assuming you have a fully qualified domain of

http://www.bbc.co.uk


You want to manipulate all things like

src="xyz/page.asp"


so they end up like


src="http://www.bbc.co.uk/xyz/page.asp"?

If so you can use another RegEx as below


Dim ResultString, myRegExp
Set myRegExp = New RegExp
myRegExp.Global = True
myRegExp.Pattern = "src=""([^"":]*)"""
ResultString = myRegExp.Replace(SubjectString, "src=""http://www.bbc.co.uk/$1""")

Author Comment

by:kevp75
precisely.

I'll give that a shot and let you know in a few

Author Comment

by:kevp75
what is the $1 for?

Author Comment

by:kevp75
ok.  I think I'm off on something.  When I try what I'm doing with the code you posted, I get a blank page.

Here's what I have for code, and the way I am using it
Include.asp:

<%
Response.Expires=-1
Response.ExpiresAbsolute = Now() - 1
Response.CacheControl="private"
Response.CacheControl="no-cache"
Response.CacheControl="no-store"

Class clsFeedPuller
    'Get The files path
    Public Function strGetFilePath()
        Dim lsPath, arPath
        lsPath = Request.ServerVariables("SCRIPT_NAME")
        arPath = Split(lsPath, "/")
        arPath(UBound(arPath,1)) = ""
        strGetFilePath = Join(arPath, "/")
    End Function

    'Regular Expression Parser
    Public Function strParseContent(strContent, strPattern)
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True
        objRegExp.Multiline = True
        objRegExp.Global = True
        objRegExp.Pattern = strPattern
        Set myMatches = objRegExp.Execute(strContent)
        For Each myMatch In myMatches
            strParseContent = strParseContent & myMatch.Value & vbcrlf
        Next
    End Function

    'Content Puller
    Public Function strGetContent(strURL)
        'create an instance of the MS XMLhttp component.
        Set xmlObj = Server.CreateObject("MSXML2.ServerXMLHTTP")
        'Open the connection and send the request (the optional Async parameter is False, so the call is synchronous)
        xmlObj.Open "GET", strURL, False
        Call xmlObj.Send()
        'Suppress runtime errors from this point on
        On Error Resume Next
        'Wait for up to 3 seconds if we've not gotten the data yet
        If xmlObj.readyState <> 4 Then xmlObj.waitForResponse 3
        'Did an error occur?  If so, use a default value for our data
        If Err.Number <> 0 Then
            strGetContent = "There was an error retrieving the remote page"
        Else
            'If we reach here, we know the server responded
            'now check for a 200 status and a ready state 4
            If (xmlObj.readyState <> 4) Or (xmlObj.Status <> 200) Then
                'Abort the request
                xmlObj.Abort
                strGetContent = "Problem communicating with remote server..."
            Else
                strGetContent = injectFQDN(xmlObj.ResponseText, strURL)
                'response.Write(injectFQDN(strGetContent, strURL))
            End If
        End If
    End Function

    'Feed Puller
    Public Function strGetRSS(strURL, strFeedsToShow)
        'Let's set our object
        dim xmlDom, nodeCol, oNode, oChildNode
        set xmlDom = Server.CreateObject("MSXML2.Domdocument")
            xmlDOM.async = False
            'Set our HTTP Request
            call xmlDom.setProperty("ServerHTTPRequest", true)
            xmlDom.async = False
            'Now we load the document
            call xmlDom.load(strURL)
            'Check for elements
            if not xmlDom.documentElement is nothing then
                set nodeCol = xmlDom.documentElement.selectNodes("channel/item")
                    'Start a count of the articles to display
                    i = 0
                    'Start to loop through each article
                    for each oNode in nodeCol
                        'This number sets the number of articles to display
                        if i < strFeedsToShow then
                            Response.Write("<div>" & vbCrLf)
                            'The Link
                            set oChildNode = oNode.selectSingleNode("link")
                                if not oChildNode is nothing then
                                    strRSSLink = oChildNode.text
                                end if
                            set oChildNode = nothing
                            'The Title
                            set oChildNode = oNode.selectSingleNode("title")
                                if not oChildNode is nothing then
                                    strRSSTitle = Server.HTMLEncode(oChildNode.text)
                                    strGetRSS = strGetRSS & "<div class='rssTitle'><a href=""#"" onclick=""loadurl('/golfTipsMagModule/content.asp?url=" & server.URLEncode(strRSSLink) & "&pt="&Request.QueryString("c")&"&title="&server.URLEncode(strRSSTitle)&"','rssFull');return false;"">" & strRSSTitle & "</a></div>"
                                end if
                            set oChildNode = nothing
                            'Published Date
                            set oChildNode = oNode.selectSingleNode("pubDate")
                                if not oChildNode is nothing then
                                    strRSSPubDate = Server.HTMLEncode(oChildNode.text)
                                    strGetRSS = strGetRSS & "<div class='rssDate'>" & strRSSPubDate & "</div>" & vbCrLf
                                end if
                            set oChildNode = nothing
                            'The Description
                            set oChildNode = oNode.selectSingleNode("description")
                                if not oChildNode is nothing then
                                    strRSSDesc = oChildNode.text
                                    strGetRSS = strGetRSS & "<div class='rssDesc'>" & strRSSDesc & "</div>"
                                end if
                            set oChildNode = nothing
                            'Add 1 to the article count number
                            i = i + 1
                            strGetRSS = strGetRSS & "</div>" & vbCrLf
                        end if
                    next
                set nodeCol = nothing
            else
                strGetRSS = strGetRSS & strPANError & vbCrLf
            end if
        set xmlDom = nothing
    End Function

    'Strip the FQDN for image and script injection
    Private Function stripFQDN(strURL)
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True
        objRegExp.Multiline = True
        objRegExp.Global = True
        objRegExp.Pattern = "[a-zA-Z0-9]+([a-zA-Z0-9\-\.]+)?\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)"
        Set myMatches = objRegExp.Execute(strURL)
        For Each myMatch In myMatches
            stripFQDN = stripFQDN & myMatch.Value & vbcrlf
        Next
    End Function

    'Inject the domain into src
    Private Sub injectFQDN(strString, strURL)
        Set objRegExp = New RegExp
            objRegExp.IgnoreCase = True
            objRegExp.Multiline = True
            objRegExp.Global = True
            objRegExp.Pattern = "src=""([^"":]*)"""
            ResultString = objRegExp.Replace(strString, "src=""http://" & stripFQDN(strURL) & "/$1""")
        set objRegExp = nothing
    End Sub
End Class
%>
 

page.asp:

<%
set objContent = new clsFeedPuller

    strPattern = "<!-- PRODUCER NOTE -->([\s\S]*?)<!-- RIGHT COLUMN -->"
    strContent = objContent.strGetContent("http://www.vinfolio.com/do/store/detail?vid=93869&utm_source=RSS&utm_medium=RSS")
    response.Write(objContent.strParseContent(strContent,strPattern))

set objContent = nothing
%>

Expert Comment

by:Martin-Smith
injectFQDN  should surely be a function not a sub?
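
To spell that out: as a Sub, injectFQDN stores its result in ResultString and never returns anything, so strGetContent has nothing to hand back. With On Error Resume Next in effect, any error from the call is also silently swallowed, which would be consistent with the blank page. A minimal sketch of the Function form follows; it assumes the host is passed in already cleaned (no trailing vbCrLf), which differs from what stripFQDN returns in the posted code.

```vbscript
' Sketch only: same regex as in the thread, but written as a Function so that
' the rewritten markup is actually returned to the caller.
' strHost is assumed to be a bare host name such as "www.example.com".
Function injectFQDN(strString, strHost)
    Dim objRegExp
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    ' Double-quoted src values containing no colon (i.e. not already absolute)
    objRegExp.Pattern = "src=""([^"":]*)"""
    ' Assigning to the function name is how VBScript returns a value
    injectFQDN = objRegExp.Replace(strString, "src=""http://" & strHost & "/$1""")
    Set objRegExp = Nothing
End Function
```

Inside the class it would stay Private; the call site strGetContent = injectFQDN(xmlObj.ResponseText, ...) then receives the modified text.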

Expert Comment

by:Martin-Smith
Did the above work?

Also, sorry I missed your earlier question about what the $1 is for.

The regular expression matches everything like

src="xyz"

where xyz is a string of any length that contains neither a " (as this is the end delimiter) nor a : (as this would indicate an absolute URL that shouldn't be adjusted).

The xyz part is captured into a group by enclosing it in parentheses. It is the first and only capturing group in the expression.

The $1 in the replacement string means "substitute the text captured by group 1".

If you want to learn more about regular expressions, I strongly recommend RegexBuddy.
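
One way to see what $1 will expand to is to run Execute instead of Replace and look at the SubMatches collection of the match. The sketch below does that for the first src in a string; the function name firstSrcCapture is made up for illustration.

```vbscript
' Illustration only: return the text that group 1 (and therefore $1) holds
' for the first double-quoted, colon-free src attribute in strHtml.
Function firstSrcCapture(strHtml)
    Dim objRegExp, colMatches
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = False
    objRegExp.Pattern = "src=""([^"":]*)"""
    Set colMatches = objRegExp.Execute(strHtml)
    If colMatches.Count > 0 Then
        ' SubMatches is zero-based: index 0 is the first capturing group, i.e. $1
        firstSrcCapture = colMatches(0).SubMatches(0)
    Else
        firstSrcCapture = ""
    End If
    Set objRegExp = Nothing
End Function
```

For the input src="xyz/page.asp" the captured text is xyz/page.asp, which is exactly what gets appended after the host in the replacement string.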

Author Comment

by:kevp75
ok.  looks like it works for src="something", but what about src='something' and src=something?

Expert Comment

by:Martin-Smith
Change the pattern to "src=(""|')?([^"":]*)\1"

Change $1 to $2
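
A caveat with the optional-quote pattern, as far as I can tell: [^":]* excludes neither ' nor whitespace, so for an unquoted value it keeps running until the next " or :, and the capture can swallow the attributes that follow. One possible alternative, offered as a sketch rather than a drop-in fix, is to give each quoting style its own branch. The function name injectFQDNAnyQuote and the choice to normalise the output to double quotes are assumptions of this example.

```vbscript
' Sketch: handle src="x", src='x' and src=x with one branch per quoting style.
' Only one of the three groups takes part in any given match; the other two
' expand to an empty string, so "$1$2$3" is whichever value was found.
Function injectFQDNAnyQuote(strString, strHost)
    Dim objRegExp
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    ' Branch 1: double-quoted, branch 2: single-quoted, branch 3: unquoted.
    ' The lookahead on branch 3 requires the bare value to end at whitespace,
    ' ">" or end of input, so an unquoted absolute URL (http:...) is skipped.
    objRegExp.Pattern = "src\s*=\s*(?:""([^"":]*)""|'([^':]*)'|([^\s""'>:]+)(?=[\s>]|$))"
    injectFQDNAnyQuote = objRegExp.Replace(strString, "src=""http://" & strHost & "/$1$2$3""")
    Set objRegExp = Nothing
End Function
```

Values that start with "/" end up with a double slash after the host, and paths are treated as relative to the site root rather than to the page's folder, so this is only suitable where those simplifications are acceptable.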

Author Comment

by:kevp75
got it.  thanks

Author Comment

by:kevp75
I stand corrected.  This still does not work for src='something.ext' and src=something.ext

nor does it seem to be working with anything other than images...

any thoughts?  or should I re-open the question?

Author Comment

by:kevp75
bueller?

Author Comment

by:kevp75
bueller, bueller.....anyone?

Expert Comment

by:Martin-Smith
It should work for single quotes.

The (""|') portion of the Regex means match either a single or double quote.

You may need to tweak the regex to allow for spaces next to the "=" character or something along those lines.

Your question only asked about src.

If you want to match, e.g., href as well, use the alternation character, grouped so that it applies only to the attribute name.

(?:src|href)=(""|')?([^"":]*)\1

Accepted Solution

by:Martin-Smith (earned 500 total points)
Try the following pattern

"(?:src|href)[\s]*=[\s]*(""|')([^"":]*)\1"
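
One thing to watch when using this pattern in Replace: (?:src|href) is non-capturing, so a replacement string that hard-codes src= would turn every matched href into src. A sketch of a variant that captures the attribute name and the quote and puts both back is shown below; the function name injectFQDNAttrs and the lazy *? quantifier are additions of this example, not part of the accepted answer.

```vbscript
' Sketch: rewrite relative src/href values while keeping the original
' attribute name ($1) and quote character ($2). $3 is the relative path.
Function injectFQDNAttrs(strString, strHost)
    Dim objRegExp
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    ' \2 requires the closing quote to be the same character as the opening one;
    ' the lazy *? stops at the first such quote instead of the last.
    objRegExp.Pattern = "(src|href)\s*=\s*(""|')([^"":]*?)\2"
    injectFQDNAttrs = objRegExp.Replace(strString, "$1=$2http://" & strHost & "/$3$2")
    Set objRegExp = Nothing
End Function
```

Any whitespace around the = sign is dropped in the output. Fragment-only links such as href="#" contain no colon and would also be prefixed, so they may need to be excluded separately.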

Author Closing Comment

by:kevp75
sorry bout that...