Solved

Non-validated RSS feed uses relative src for images/scripts: how do I inject the FQDN?

Posted on 2007-12-04
Last Modified: 2012-05-05
The RSS feed is out of my hands, so please don't suggest getting its publisher to change it.

I'm running into an issue with the feed: the images and scripts in it all use a relative source rather than FQDN/folder/file.ext.

Is there any way I can inject the FQDN into the src="" for these?
Question by:kevp75

Author Comment

by:kevp75
ID: 20405383
Update: the following function gets me the required FQDN. I just need to figure out how to inject it into the image src and script src attributes.
    'Return the host name (e.g. www.vinfolio.com) found in strURL.
    'Only the first match is returned, without a trailing vbCrLf,
    'so the result can be concatenated straight into a URL.
    Private Function stripFQDN(strURL)
        Dim objRegExp, myMatches
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True   'covers COM, ORG, etc. without listing them twice
        objRegExp.Global = False      'one host name is all we need
        objRegExp.Pattern = "[a-z0-9]+([a-z0-9\-\.]+)?\.(com|org|net|mil|edu)"
        Set myMatches = objRegExp.Execute(strURL)
        If myMatches.Count > 0 Then stripFQDN = myMatches(0).Value
        Set objRegExp = Nothing
    End Function
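The regex above only recognises hosts ending in .com, .org, .net, .mil or .edu, so a feed served from a host such as www.bbc.co.uk would not match at all. As an alternative technique (not the one used in this thread), here is a minimal Python sketch that takes the scheme and host from the feed URL with the standard library's `urllib.parse.urlsplit`; the helper name `site_root` is hypothetical.

```python
from urllib.parse import urlsplit

def site_root(url: str) -> str:
    """Return 'scheme://host[:port]' for an absolute URL (hypothetical helper)."""
    parts = urlsplit(url)
    # netloc is the host plus any port (and user info, if the URL has any)
    return f"{parts.scheme}://{parts.netloc}"

print(site_root("http://www.vinfolio.com/do/store/detail?vid=93869&utm_source=RSS&utm_medium=RSS"))
print(site_root("http://www.bbc.co.uk/news/index.html"))
```

Unlike stripFQDN, the result already includes the scheme, so it can be prefixed to a relative path directly and works for any top-level domain.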


Expert Comment

by:Martin-Smith
ID: 20405765
So, assuming you have a fully qualified domain of

http://www.bbc.co.uk

and you want to rewrite every attribute like

src="xyz/page.asp"

so that it ends up as

src="http://www.bbc.co.uk/xyz/page.asp"?

If so, you can use another RegEx, as below:


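'VBScript has no typed Dim ("As String"), so declare plain variants
Dim ResultString, myRegExp
Set myRegExp = New RegExp
'Replace every match, not just the first
myRegExp.Global = True
'Match src="..." whose value contains no quote and no colon (i.e. a relative path)
myRegExp.Pattern = "src=""([^"":]*)"""
'SubjectString holds the HTML to fix; $1 is the captured relative path
ResultString = myRegExp.Replace(SubjectString, "src=""http://www.bbc.co.uk/$1""")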
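To try the same replacement outside ASP, here is a minimal Python sketch of the pattern above using the standard `re` module; the function name `absolutize_src` and the sample HTML are illustrative only.

```python
import re

# Mirrors the VBScript pattern src="([^"":]*)": capture a double-quoted src
# value containing no " and no ':' (a ':' suggests an absolute URL like http://...).
RELATIVE_SRC = re.compile(r'src="([^":]*)"')

def absolutize_src(html: str, site_root: str) -> str:
    # m.group(1) is the captured relative path, the same text $1 refers to in
    # the VBScript replacement string. A function replacement inserts site_root
    # literally, so backslashes in it are not treated as escapes.
    return RELATIVE_SRC.sub(lambda m: f'src="{site_root}/{m.group(1)}"', html)

html = '<img src="xyz/page.asp"><img src="http://other.example/a.gif">'
print(absolutize_src(html, "http://www.bbc.co.uk"))
# <img src="http://www.bbc.co.uk/xyz/page.asp"><img src="http://other.example/a.gif">
```

As with the VBScript version, a root-relative path such as /img/a.gif comes out with a doubled slash, and a path like ../a.gif is glued to the site root rather than resolved; Python's `urllib.parse.urljoin` handles both cases if that matters.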

Author Comment

by:kevp75
ID: 20410914
Precisely.

I'll give that a shot and let you know shortly.

Author Comment

by:kevp75
ID: 20410986
What is the $1 for?

Author Comment

by:kevp75
ID: 20411042
OK, I think I'm off on something. When I try the code you posted, I get a blank page.

Here's my code and how I'm using it:
Include.asp:
<%
Response.Expires=-1
Response.ExpiresAbsolute = Now() - 1
Response.CacheControl="private"
Response.CacheControl="no-cache"
Response.CacheControl="no-store"
 
Class clsFeedPuller
    'Get The files path
    Public Function strGetFilePath()
        Dim lsPath, arPath
        lsPath = Request.ServerVariables("SCRIPT_NAME")
        arPath = Split(lsPath, "/")
        arPath(UBound(arPath,1)) = ""
        strGetFilePath = Join(arPath, "/")
    End Function
    'Regular Expression Parser
    Public Function strParseContent(strContent, strPattern)
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True
        objRegExp.Multiline = True
        objRegExp.Global = True
        objRegExp.Pattern = strPattern
        Set myMatches = objRegExp.Execute(strContent)
        For Each myMatch In myMatches
            strParseContent = strParseContent & myMatch.Value & vbcrlf
        Next
    End Function
    'Content Puller
    Public Function strGetContent(strURL)
        Dim xmlObj
        'Turn on inline error handling before any network calls, so a failed
        'Open/Send is caught by the Err.Number check below
        On Error Resume Next
        'Create an instance of the MS XMLHTTP component
        Set xmlObj = Server.CreateObject("MSXML2.ServerXMLHTTP")
        'Open the connection synchronously (Async = False) and send the request
        xmlObj.Open "GET", strURL, False
        Call xmlObj.Send()
        'Wait for up to 3 seconds if we've not gotten the data yet
        If xmlObj.readyState <> 4 Then xmlObj.waitForResponse 3
        'Did an error occur?  If so, use a default value for our data
        If Err.Number <> 0 Then
            strGetContent = "There was an error retrieving the remote page"
        Else
            'If we reach here, we know the server responded;
            'now check for a 200 status and a ready state 4
            If (xmlObj.readyState <> 4) Or (xmlObj.Status <> 200) Then
                'Abort the request
                xmlObj.Abort
                strGetContent = "Problem communicating with remote server..."
            Else
                strGetContent = injectFQDN(xmlObj.ResponseText, strURL)
                'response.Write(injectFQDN(strGetContent, strURL))
            End If
        End If
    End Function
    'Feed Puller
    Public Function strGetRSS(strURL, strFeedsToShow) 
	    'Let's set our object
	    dim xmlDom, nodeCol, oNode, oChildNode
	    set xmlDom = Server.CreateObject("MSXML2.Domdocument")
		    xmlDOM.async = False
		    'Set our HTTP Request
		    call xmlDom.setProperty("ServerHTTPRequest", true)
		    'Now we load the document
		    call xmlDom.load(strURL)
		    'Check for elements
		    if not xmlDom.documentElement is nothing then
			    set nodeCol = xmlDom.documentElement.selectNodes("channel/item")
				    'Start a count of the articles to display
				    i = 0			  
				    'Start to loop through each article
				    for each oNode in nodeCol
					    'This number sets the number of articles to display
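					    'CInt ensures a numeric comparison in case the count arrives as a string
					    if i < CInt(strFeedsToShow) then
						    'Append the wrapper to the returned string (not Response.Write) so it stays paired with the closing </div>
						    strGetRSS = strGetRSS & "<div>" & vbCrLf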
						    'The Link
						    set oChildNode = oNode.selectSingleNode("link")
							    if not oChildNode is nothing then
								    strRSSLink = oChildNode.text
							    end if
						    set oChildNode = nothing
						    'The Title
						    set oChildNode = oNode.selectSingleNode("title")
							    if not oChildNode is nothing then
								    strRSSTitle = Server.HTMLEncode(oChildNode.text)
								    strGetRSS = strGetRSS & "<div class='rssTitle'><a href=""#"" onclick=""loadurl('/golfTipsMagModule/content.asp?url=" & server.URLEncode(strRSSLink) & "&pt="&Request.QueryString("c")&"&title="&server.URLEncode(strRSSTitle)&"','rssFull');return false;"">" & strRSSTitle & "</a></div>"
							    end if
						    set oChildNode = nothing
						    'Published Date
						    set oChildNode = oNode.selectSingleNode("pubDate")
							    if not oChildNode is nothing then
								    strRSSPubDate = Server.HTMLEncode(oChildNode.text)
								    strGetRSS = strGetRSS & "<div class='rssDate'>" & strRSSPubDate & "</div>" & vbCrLf
							    end if
						    set oChildNode = nothing
						    'The Description
						    set oChildNode = oNode.selectSingleNode("description")
							    if not oChildNode is nothing then
								    strRSSDesc = oChildNode.text
								    strGetRSS = strGetRSS & "<div class='rssDesc'>" & strRSSDesc & "</div>"
							    end if
						    set oChildNode = nothing
						    'Add 1 to the article count number
						    i = i + 1
						    strGetRSS = strGetRSS & "</div>" & vbCrLf
					    end if
				    next
			    set nodeCol = nothing
		    else
			    strGetRSS = strGetRSS & strPANError & vbCrLf
		    end if
	    set xmlDom = nothing
    End Function
    
    'Return the host name (e.g. www.vinfolio.com) from strURL for image and script injection.
    'Only the first match is returned, without a trailing vbCrLf, so it can be
    'concatenated straight into a URL.
    Private Function stripFQDN(strURL)
        Dim objRegExp, myMatches
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True   'covers COM, ORG, etc. without listing them twice
        objRegExp.Global = False      'one host name is all we need
        objRegExp.Pattern = "[a-z0-9]+([a-z0-9\-\.]+)?\.(com|org|net|mil|edu)"
        Set myMatches = objRegExp.Execute(strURL)
        If myMatches.Count > 0 Then stripFQDN = myMatches(0).Value
        Set objRegExp = Nothing
    End Function
    'Inject the domain into src
    Private Sub injectFQDN(strString, strURL)
        Set objRegExp = New RegExp
            objRegExp.IgnoreCase = True
            objRegExp.Multiline = True
            objRegExp.Global = True
            objRegExp.Pattern = "src=""([^"":]*)"""
            ResultString = objRegExp.Replace(strString, "src=""http://" & stripFQDN(strURL) & "/$1""")
        set objRegExp = nothing
    End Sub
End Class
%>
 
page.asp:
<%
set objContent = new clsFeedPuller
 
    strPattern = "<!-- PRODUCER NOTE -->([\s\S]*?)<!-- RIGHT COLUMN -->"
    strContent = objContent.strGetContent("http://www.vinfolio.com/do/store/detail?vid=93869&utm_source=RSS&utm_medium=RSS")
    response.Write(objContent.strParseContent(strContent,strPattern))
    
    
set objContent = nothing
 
%>


Expert Comment

by:Martin-Smith
ID: 20411088
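injectFQDN should surely be a Function, not a Sub? A Sub can't return a value, so strGetContent never receives the rewritten HTML (and because On Error Resume Next is active, the failure is silent). Make it a Function and assign the result to injectFQDN rather than to ResultString.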

Expert Comment

by:Martin-Smith
ID: 20416510
Did the above work?

Also, sorry I missed your earlier question about what the $1 is for.

The regular expression matches every attribute of the form

src="xyz"

where xyz is a string of any length that contains neither a " (the closing delimiter) nor a : (which would indicate an absolute URL, such as http://..., that shouldn't be adjusted).

Enclosing xyz in parentheses captures it as a "backreference" (a capturing group). It is the first and only backreference in the expression.

In the replacement string, $1 means "substitute the text captured by backreference 1".
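To see exactly what the backreference holds, here is a small Python sketch using the standard `re` module rather than VBScript's RegExp, so the replacement syntax differs as noted in the comments:

```python
import re

pattern = re.compile(r'src="([^":]*)"')
m = pattern.search('<img src="xyz/page.asp" alt="">')

print(m.group(0))  # the whole match: src="xyz/page.asp"
print(m.group(1))  # backreference 1, the text inside the parentheses: xyz/page.asp

# Python writes "group 1" in a replacement template as \1 (or \g<1>);
# VBScript's RegExp.Replace writes the same thing as $1.
print(pattern.sub(r'src="http://www.bbc.co.uk/\g<1>"', '<img src="xyz/page.asp">'))
```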

If you want to learn more about regular expressions, I strongly recommend RegexBuddy.

Author Comment

by:kevp75
ID: 20419472
OK, it looks like it works for src="something", but what about src='something' and src=something?

Expert Comment

by:Martin-Smith
ID: 20419533
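Change the pattern to "src=(""|')?([^"":]*)\1"

Change $1 to $2 in the replacement string: the first group now captures the quote character (and \1 requires the closing quote to match it), so the path is the second group.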
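A Python sketch of the quoted-value case, with two caveats. First, Python's `re` fails a backreference to a group that did not take part in the match, so the sketch makes the quote mandatory instead of optional. Second, because [^"":] excludes only the double quote and the colon, a single-quoted value can run on to a later ' in the same tag. The `TIGHT` pattern, which also excludes ', is a hypothetical adjustment, not the pattern from this thread.

```python
import re

# Group 1 captures the opening quote; \1 then requires the same quote to close.
# Pitfall: with [^":]*, a single-quoted value can swallow text up to a later '.
LOOSE = re.compile(r"""src=("|')([^":]*)\1""")
# Hypothetical adjustment: exclude both quote characters from the value.
TIGHT = re.compile(r"""src=("|')([^"':]*)\1""")

html = "<img src='a.gif' alt='x'>"
print(LOOSE.search(html).group(2))  # a.gif' alt='x
print(TIGHT.search(html).group(2))  # a.gif

# Either quote style works with the tightened pattern; the path is group 2 ($2 in VBScript).
print(TIGHT.sub(r'src=\1http://www.bbc.co.uk/\2\1', '<img src="b.gif"><img src=\'c.gif\'>'))
```

Reusing \1 in the replacement keeps whichever quote the original attribute used.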

Author Comment

by:kevp75
ID: 20432096
Got it, thanks.

Author Comment

by:kevp75
ID: 20449201
I stand corrected. This still does not work for src='something.ext' and src=something.ext.

Nor does it seem to work with anything other than images.

Any thoughts? Or should I reopen the question?

Author Comment

by:kevp75
ID: 20546933
Bueller?

Author Comment

by:kevp75
ID: 20581587
Bueller? Bueller?... Anyone?

Expert Comment

by:Martin-Smith
ID: 20645350
It should work for single quotes.

The (""|') portion of the regex matches either a double or a single quote.

You may need to tweak the regex to allow for spaces around the "=" character, or something along those lines.

Your question only asked about src.

If you also want to match href, for example, use the alternation character |:

src|href=(""|')?([^":]*)\1
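A caveat on the pattern above: | has the lowest precedence of any regex operator, so src|href=... is read as "src" or "href=...", not "src= or href=". The quick Python check below (standard `re` only; `ungrouped`, `grouped` and `s` are illustrative names) shows the difference. Wrapping the alternatives in a non-capturing group, (?:src|href), makes the rest of the pattern apply to both names.

```python
import re

# Without a group, | splits the entire pattern into two alternatives.
ungrouped = re.compile(r"src|href=")
# With a non-capturing group, | applies only to the attribute name.
grouped = re.compile(r"(?:src|href)=")

s = '<img src="a.gif"><a href="b.html">'
print(ungrouped.findall(s))  # ['src', 'href=']
print(grouped.findall(s))    # ['src=', 'href=']
```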

Accepted Solution

by:Martin-Smith (earned 500 total points)
ID: 20645358
Try the following pattern

"(?:src|href)[\s]*=[\s]*(""|')([^"":]*)\1"
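This pattern still requires a quote, so an unquoted src=something.ext won't match, and it shares the single-quote issue of [^"":]. Also, because (?:src|href) does not capture the attribute name, reusing the earlier replacement string (src=""http://.../$2"") would rewrite every href as src. Below is a minimal Python sketch of one way to cover double-quoted, single-quoted and unquoted values while keeping the attribute name. The `ATTR` pattern and the `absolutize` helper are hypothetical, not from this thread, and the sketch swaps the "prefix the host name" approach for `urllib.parse.urljoin`, resolving each path against the URL of the fetched page.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: group 1 is the attribute name; the value is in
# group 2 (double-quoted), group 3 (single-quoted) or group 4 (unquoted).
ATTR = re.compile(
    r"""\b(src|href)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""",
    re.IGNORECASE,
)

def absolutize(html: str, base_url: str) -> str:
    def fix(m):
        name = m.group(1)
        if m.group(2) is not None:
            quote, value = '"', m.group(2)
        elif m.group(3) is not None:
            quote, value = "'", m.group(3)
        else:
            quote, value = '"', m.group(4)  # unquoted values get double quotes
        # urljoin leaves absolute URLs alone and resolves relative ones
        # (including "/root" and "../up" forms) against base_url.
        return f"{name}={quote}{urljoin(base_url, value)}{quote}"
    return ATTR.sub(fix, html)

base = "http://www.example.com/feeds/rss.xml"
print(absolutize("<img src='b.gif'><script src=/js/c.js></script>", base))
```

Like any regex approach to HTML, this can also rewrite src= or href= text that appears outside a tag (for example inside a script), so it is best limited to markup you have inspected.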

Author Closing Comment

by:kevp75
ID: 31412645
Sorry about that...
Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I would like to start this tip/trick by saying Thank You, to all who said that this could not be done, as it forced me to make sure that it could be accomplished. :) To start, I want to make sure everyone understands the importance of utilizing p…
I was asked about the differences between classic ASP and ASP.NET, so let me put them down here, for reference: Let's make the introductions... Classic ASP was launched by Microsoft in 1998 and dynamically generate web pages upon user interact…
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…

828 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question