Solved

Non-validated RSS feed uses relative src for images/scripts: how to inject the FQDN?

Posted on 2007-12-04
Last Modified: 2012-05-05
The RSS feed is out of my hands, so please do not ask me to get them to change it ...

I am running into an issue with the feed. All the images and scripts in it use a relative source, rather than FQDN/folder/file.ext.

Is there any way I can inject the FQDN into the src="" for these?
Question by:kevp75

Author Comment

by:kevp75
ID: 20405383
Update: the following function will get me the required FQDN. I just need to figure out how to inject it into the image src, script src, etc.
    Private Function stripFQDN(strURL)
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True
        objRegExp.Multiline = True
        objRegExp.Global = True
        objRegExp.Pattern = "[a-zA-Z0-9]+([a-zA-Z0-9\-\.]+)?\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)"
        Set myMatches = objRegExp.Execute(strURL)
        For Each myMatch In myMatches
            stripFQDN = stripFQDN & myMatch.Value & vbcrlf
        Next
    End Function
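
Note that the function above appends vbCrLf after every match, so the string it returns ends in a line break, which is awkward once it is spliced into a URL. A minimal sketch of a variant that returns only the first host name, with nothing appended, might look like this. The name getFirstHost is hypothetical, and the TLD list is simply the one from the posted pattern.

```vbscript
' Hypothetical variant of stripFQDN: returns only the first host name found,
' with no trailing line break, so it can be placed directly inside a URL.
Function getFirstHost(strURL)
    Dim objRegExp, myMatches
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True   ' makes the upper-case TLD alternatives unnecessary
    objRegExp.Global = False      ' stop after the first match
    objRegExp.Pattern = "[a-z0-9]+([a-z0-9\-\.]+)?\.(com|org|net|mil|edu)"
    Set myMatches = objRegExp.Execute(strURL)
    getFirstHost = ""             ' returned when no host name is found
    If myMatches.Count > 0 Then getFirstHost = myMatches(0).Value
    Set objRegExp = Nothing
End Function
```

Called with the vinfolio.com URL used later in this thread, this should return www.vinfolio.com. Hosts outside the listed TLDs (for example .co.uk) are still not matched.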


Expert Comment

by:Martin-Smith
ID: 20405765
So assuming you have a fully qualified domain of

http://www.bbc.co.uk


You want to manipulate all things like

src="xyz/page.asp"


so they end up like


src="http://www.bbc.co.uk/xyz/page.asp"?

If so you can use another RegEx as below


' SubjectString is assumed to hold the feed markup to rewrite.
' VBScript variables are untyped, so Dim takes no "As" clause.
Dim ResultString, myRegExp
Set myRegExp = New RegExp
myRegExp.Global = True
myRegExp.Pattern = "src=""([^"":]*)"""
ResultString = myRegExp.Replace(SubjectString, "src=""http://www.bbc.co.uk/$1""")

Author Comment

by:kevp75
ID: 20410914
precisely.

I'll give that a shot and let you know in a few

Author Comment

by:kevp75
ID: 20410986
what is the $1 for?

Author Comment

by:kevp75
ID: 20411042
OK. I think I'm off on something. When I use the code you posted, I get a blank page.

Here's my code and how I am using it.
Include.asp:

<%
Response.Expires=-1
Response.ExpiresAbsolute = Now() - 1
Response.CacheControl="private"
Response.CacheControl="no-cache"
Response.CacheControl="no-store"

Class clsFeedPuller
    'Get the file's path
    Public Function strGetFilePath()
        Dim lsPath, arPath
        lsPath = Request.ServerVariables("SCRIPT_NAME")
        arPath = Split(lsPath, "/")
        arPath(UBound(arPath,1)) = ""
        strGetFilePath = Join(arPath, "/")
    End Function

    'Regular Expression Parser
    Public Function strParseContent(strContent, strPattern)
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True
        objRegExp.Multiline = True
        objRegExp.Global = True
        objRegExp.Pattern = strPattern
        Set myMatches = objRegExp.Execute(strContent)
        For Each myMatch In myMatches
            strParseContent = strParseContent & myMatch.Value & vbcrlf
        Next
    End Function

    'Content Puller
    Public Function strGetContent(strURL)
        'Create an instance of the MS XMLHTTP component
        Set xmlObj = Server.CreateObject("MSXML2.ServerXMLHTTP")
        'Open the connection synchronously (async = False) and send the request
        xmlObj.Open "GET", strURL, False
        Call xmlObj.Send()
        'Suppress runtime errors from here on; they are checked via Err.Number
        On Error Resume Next
        'Wait for up to 3 seconds if we've not gotten the data yet
        If xmlObj.readyState <> 4 Then xmlObj.waitForResponse 3
        'Did an error occur?  If so, use a default value for our data
        If Err.Number <> 0 Then
            strGetContent = "There was an error retrieving the remote page"
        Else
            'If we reach here, we know the server responded
            'now check for a 200 status and a ready state 4
            If (xmlObj.readyState <> 4) Or (xmlObj.Status <> 200) Then
                'Abort the request
                xmlObj.Abort
                strGetContent = "Problem communicating with remote server..."
            Else
                strGetContent = injectFQDN(xmlObj.ResponseText, strURL)
                'response.Write(injectFQDN(strGetContent, strURL))
            End If
        End If
    End Function

    'Feed Puller
    Public Function strGetRSS(strURL, strFeedsToShow)
        'Let's set our object
        dim xmlDom, nodeCol, oNode, oChildNode
        set xmlDom = Server.CreateObject("MSXML2.Domdocument")
            xmlDOM.async = False
            'Set our HTTP Request
            call xmlDom.setProperty("ServerHTTPRequest", true)
            xmlDom.async = False
            'Now we load the document
            call xmlDom.load(strURL)
            'Check for elements
            if not xmlDom.documentElement is nothing then
                set nodeCol = xmlDom.documentElement.selectNodes("channel/item")
                    'Start a count of the articles to display
                    i = 0
                    'Start to loop through each article
                    for each oNode in nodeCol
                        'This number sets the number of articles to display
                        if i < strFeedsToShow then
                            Response.Write("<div>" & vbCrLf)
                            'The Link
                            set oChildNode = oNode.selectSingleNode("link")
                                if not oChildNode is nothing then
                                    strRSSLink = oChildNode.text
                                end if
                            set oChildNode = nothing
                            'The Title
                            set oChildNode = oNode.selectSingleNode("title")
                                if not oChildNode is nothing then
                                    strRSSTitle = Server.HTMLEncode(oChildNode.text)
                                    strGetRSS = strGetRSS & "<div class='rssTitle'><a href=""#"" onclick=""loadurl('/golfTipsMagModule/content.asp?url=" & server.URLEncode(strRSSLink) & "&pt="&Request.QueryString("c")&"&title="&server.URLEncode(strRSSTitle)&"','rssFull');return false;"">" & strRSSTitle & "</a></div>"
                                end if
                            set oChildNode = nothing
                            'Published Date
                            set oChildNode = oNode.selectSingleNode("pubDate")
                                if not oChildNode is nothing then
                                    strRSSPubDate = Server.HTMLEncode(oChildNode.text)
                                    strGetRSS = strGetRSS & "<div class='rssDate'>" & strRSSPubDate & "</div>" & vbCrLf
                                end if
                            set oChildNode = nothing
                            'The Description
                            set oChildNode = oNode.selectSingleNode("description")
                                if not oChildNode is nothing then
                                    strRSSDesc = oChildNode.text
                                    strGetRSS = strGetRSS & "<div class='rssDesc'>" & strRSSDesc & "</div>"
                                end if
                            set oChildNode = nothing
                            'Add 1 to the article count number
                            i = i + 1
                            strGetRSS = strGetRSS & "</div>" & vbCrLf
                        end if
                    next
                set nodeCol = nothing
            else
                strGetRSS = strGetRSS & strPANError & vbCrLf
            end if
        set xmlDom = nothing
    End Function

    'Strip the FQDN for image and script injection
    Private Function stripFQDN(strURL)
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True
        objRegExp.Multiline = True
        objRegExp.Global = True
        objRegExp.Pattern = "[a-zA-Z0-9]+([a-zA-Z0-9\-\.]+)?\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)"
        Set myMatches = objRegExp.Execute(strURL)
        For Each myMatch In myMatches
            stripFQDN = stripFQDN & myMatch.Value & vbcrlf
        Next
    End Function

    'Inject the domain into src
    Private Sub injectFQDN(strString, strURL)
        Set objRegExp = New RegExp
            objRegExp.IgnoreCase = True
            objRegExp.Multiline = True
            objRegExp.Global = True
            objRegExp.Pattern = "src=""([^"":]*)"""
            ResultString = objRegExp.Replace(strString, "src=""http://" & stripFQDN(strURL) & "/$1""")
        set objRegExp = nothing
    End Sub
End Class
%>
 

page.asp:

<%
set objContent = new clsFeedPuller

    strPattern = "<!-- PRODUCER NOTE -->([\s\S]*?)<!-- RIGHT COLUMN -->"
    strContent = objContent.strGetContent("http://www.vinfolio.com/do/store/detail?vid=93869&utm_source=RSS&utm_medium=RSS")
    response.Write(objContent.strParseContent(strContent,strPattern))

set objContent = nothing
%>


Expert Comment

by:Martin-Smith
ID: 20411088
injectFQDN should surely be a Function, not a Sub? A Sub cannot return a value to strGetContent.
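
A minimal sketch of what the Function form could look like, shown as a standalone function rather than a class member. It assumes the host is passed in already cleaned up: strHost is a hypothetical parameter holding a bare host name with no scheme, trailing slash or line break.

```vbscript
' Sketch: injectFQDN written as a Function.
' strHost is assumed to be a bare host name such as "www.example.com".
Function injectFQDN(strString, strHost)
    Dim objRegExp
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.Pattern = "src=""([^"":]*)"""
    ' Assigning to the function's own name is what returns the value;
    ' assigning to a separate variable such as ResultString returns nothing.
    injectFQDN = objRegExp.Replace(strString, "src=""http://" & strHost & "/$1""")
    Set objRegExp = Nothing
End Function
```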

Expert Comment

by:Martin-Smith
ID: 20416510
Did the above work?

Also, sorry I missed your earlier question as to what the $1 is for.

The Regular Expression matches everything like

src="xyz"

where xyz is any length string of characters not including either a " (as this is the end delimiter) or a : (as this would indicate an absolute URL that shouldn't be adjusted)

The xyz part is captured by enclosing it in parentheses, which makes it available as a backreference. It is the first and only capturing group in the expression.

The $1 in the replace expression basically means substitute the back reference value.
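
To see the substitution in action, here is a small sketch. The sample markup and the other.example host are made up for illustration.

```vbscript
Dim re, strIn, strOut
Set re = New RegExp
re.Global = True
re.Pattern = "src=""([^"":]*)"""

' One relative src and one absolute src (sample data, not from the feed)
strIn = "<img src=""xyz/page.gif""> <img src=""http://other.example/a.gif"">"

' $1 is replaced by whatever the parentheses captured in each match
strOut = re.Replace(strIn, "src=""http://www.bbc.co.uk/$1""")

' The first src becomes http://www.bbc.co.uk/xyz/page.gif.
' The second contains a colon, so the pattern does not match it and it is left alone.
```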

If you want to learn more about regular expressions, I strongly recommend RegexBuddy.

Author Comment

by:kevp75
ID: 20419472
ok.  looks like it works for src="something", but what about src='something' and src=something?

Expert Comment

by:Martin-Smith
ID: 20419533
Change the pattern to "src=(""|')?([^"":]*)\1"

Change $1 to $2
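
One thing to watch with the optional quote: when the value is unquoted there is no closing delimiter, so [^":]* keeps going until the next double quote or colon and can swallow the rest of the tag. A possible alternative is to give each quoting style its own branch. This is only a sketch: injectBase and strBase are hypothetical names, it assumes the lookahead support in VBScript RegExp 5.5, and it relies on groups that did not take part in a match expanding to an empty string in the replacement.

```vbscript
' Hypothetical helper: rewrites relative src values in all three quoting styles.
' strBase is assumed to look like "http://www.example.com" (no trailing slash).
Function injectBase(strHtml, strBase)
    Dim re
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    ' Branch 1: double-quoted, branch 2: single-quoted, branch 3: unquoted.
    ' Excluding ":" from every branch skips absolute URLs such as http://...
    ' The lookahead makes the unquoted branch end at whitespace or ">".
    re.Pattern = "src\s*=\s*(?:""([^"":]*)""|'([^':]*)'|([^\s""'>:]+)(?=[\s>]))"
    ' Only one of $1, $2, $3 takes part in any single match.
    injectBase = re.Replace(strHtml, "src=""" & strBase & "/$1$2$3""")
    Set re = Nothing
End Function
```

The output is always double-quoted. Values that start with a slash would end up with a double slash after the host, and unquoted values in self-closing tags (src=a.gif/>) are not handled.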

Author Comment

by:kevp75
ID: 20432096
got it.  thanks

Author Comment

by:kevp75
ID: 20449201
I stand corrected.  This still does not work for src='something.ext' and src=something.ext

nor does it seem to be working with anything other than images...

any thoughts?  or should I re-open the question?

Author Comment

by:kevp75
ID: 20546933
bueller?

Author Comment

by:kevp75
ID: 20581587
bueller, bueller.....anyone?

Expert Comment

by:Martin-Smith
ID: 20645350
It should work for single quotes.

The (""|') portion of the Regex means match either a single or double quote.

You may need to tweak the regex to allow for spaces next to the "=" character or something along those lines.

Your question only asked about src.

If you want to match href as well, for example, use alternation, grouped so that the "=" applies to both names:

(?:src|href)=(""|')?([^"":]*)\1

Accepted Solution

by:
Martin-Smith earned 500 total points
ID: 20645358
Try the following pattern

"(?:src|href)[\s]*=[\s]*(""|')([^"":]*)\1"
0
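
With the accepted pattern, (?:src|href) is non-capturing, so the quote character is $1 and the value is $2. A replacement string that hard-codes src= would also turn every matched href into a src. A sketch of one way around that is to capture the attribute name too. The names injectBaseAttr and strBase are hypothetical, and the lazy *? is a change from the accepted pattern, not part of it.

```vbscript
' Hypothetical helper: prefixes relative src and href values with strBase,
' keeping the original attribute name and quote character.
' strBase is assumed to look like "http://www.example.com" (no trailing slash).
Function injectBaseAttr(strHtml, strBase)
    Dim re
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    ' $1 = attribute name, $2 = quote character, $3 = relative value.
    ' The lazy *? stops at the first closing quote that matches the opening one.
    re.Pattern = "(src|href)\s*=\s*(""|')([^"":]*?)\2"
    injectBaseAttr = re.Replace(strHtml, "$1=$2" & strBase & "/$3$2")
    Set re = Nothing
End Function
```

For example, injectBaseAttr("<a href = 'page.asp'>", "http://www.example.com") should give <a href='http://www.example.com/page.asp'>. Fragment-only links such as href="#" would be prefixed as well, so they may need to be excluded separately.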
 

Author Closing Comment

by:kevp75
ID: 31412645
sorry bout that...