CalDev
asked on
ColdFusion white space removal from head of XML
I'm trying to parse an XML feed with ColdFusion and reformat it to the Yahoo! Media RSS format. The big problem I'm having is white space at the head of the file (before the <?xml declaration), and I can't get rid of those ASCII whitespace characters. I've tried all the solutions I could find on the web and nothing has worked. I imagine this should be a fairly simple fix.
Here is the list of whitespace characters that are in the head of the document:
CR = Carriage return. ASCII value = 13
LF = Line feed. ASCII value = 10
SPC = Space. ASCII value = 32
TAB = tab. ASCII value = 9
Here is the list of the whitespace characters, in the order in which they appear in the head of the document:
CR|LF
SPC|CR|LF
SPC|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
CR|LF
TAB|CR|LF
SPC|CR|LF
CR|LF
CR|LF
CR|LF
CR|LF
CR|LF
CR|LF
<?xml version="1.0" encoding="utf-8"?> ...
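A listing like the one above can be generated rather than worked out by hand. The following is a minimal sketch, assuming the raw feed text is already in a variable named xmlContent (for example, copied from cfhttp.fileContent); the names firstTag, codes and i are made up for this example. It prints the ASCII code of every character that precedes the first "<", separated by pipes.

```cfml
<!--- Hypothetical diagnostic: list the character codes that precede the first "<". --->
<!--- xmlContent is assumed to hold the raw feed text, e.g. cfhttp.fileContent. --->
<cfset firstTag = find("<", xmlContent)>
<cfset codes = "">
<!--- If "<" is the first character (or is missing), the loop body never runs. --->
<cfloop from="1" to="#firstTag - 1#" index="i">
    <cfset codes = listAppend(codes, asc(mid(xmlContent, i, 1)), "|")>
</cfloop>
<cfoutput>#codes#</cfoutput>
```

Any code other than 9, 10, 13 or 32 in the output would suggest that something besides plain white space sits in front of the declaration.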
ASKER
I guess I should have added that I'm working with CF7.
Try it anyway; it *shouldn't* make a difference.
If that doesn't work, try a regex
...
<cfset content = reReplace(content, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(content))>
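If the regex still leaves something in front of the declaration, one alternative is to drop everything before the first "<" regardless of what it is. This is only a sketch, not something tested against the feed in question: it assumes content holds the feed text as in the snippet above, and startPos is a name invented for the example.

```cfml
<!--- Sketch only: discard whatever precedes the first "<" in the feed text. --->
<!--- "content" is the variable from the snippet above; "startPos" is a made-up name. --->
<cfset startPos = find("<", content)>
<cfif startPos GT 1>
    <!--- removeChars(string, start, count) deletes the leading startPos - 1 characters. --->
    <cfset content = removeChars(content, 1, startPos - 1)>
</cfif>
<cfset doc = xmlParse(content)>
```

Unlike the character-class approach, this does not depend on knowing in advance which characters appear before the declaration.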
ASKER
I feel like what you have posted might work, but I think the reason I'm not getting the correct result is that I'm not implementing it correctly. I am grabbing the original RSS feed via CFHTTP, assigning it to a variable #XMLContent#, and then parsing. Can you show what the syntax should look like for stripping the white space from the beginning of the content stored in the #XMLContent# variable?
It should be exactly what I posted already. Just change the variable name, i.e. from #content# to #xmlContent#.
<cfhttp ...>
<cfset xmlContent = reReplace(xmlContent, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(xmlContent))>
Could you post the feed's url, so I can test it on the live data?
So did the original code work under MX7? If it *does* work then the problem must be your cfhttp code. We'd need to see that part (or test a live feed).
For example:
<!--- this just simulates a cfhttp call --->
<cfsavecontent variable="xmlContent"><cfoutput>
#chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
<?xml version="1.0" encoding="utf-8"?>
<order>
<customer firstname="Philip" lastname="Cramer" accountNum="21"/>
</order></cfoutput></cfsavecontent>
<cfset xmlContent = reReplace(XMLContent, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(XMLContent))>
<cfdump var="#doc#">
ASKER
Sure, the original feed is here: http://www.redding.com/feeds/photo-galleries/sports/college-sports/
I'm still getting the same results, so my guess is that this is caused by something I'm introducing in my process.
So this is how I'm grabbing it:
<cfif URL.page EQ "sports">
<cfset URLToPull = "http://www.redding.com/feeds/photo-galleries/sports/college-sports/">
</cfif>
<cftry>
<cfhttp url="#URLToPull#"
method="GET"
timeout="15">
</cfhttp>
<cfcatch>
cfhttp failure
</cfcatch>
</cftry>
And this is how I'm cleaning up the content:
<cfset XMLContent = reReplace(cfhttp.fileContent, "&lt;", "<", "ALL")>
<cfset XMLContent = reReplace(XMLContent, "&gt;", ">", "ALL")>
That's all I've done so far. I have yet to begin reformatting into the Media RSS format.
Outputting #XMLContent# produces all the white space described above.
ASKER
That's strange.
Here is my entire code:
<cfif URL.page EQ "sports">
<cfset URLToPull = "http://www.redding.com/feeds/photo-galleries/sports/college-sports/">
</cfif>
<cftry>
<cfhttp url="#URLToPull#"
method="GET"
timeout="15">
</cfhttp>
<cfcatch>
cfhttp failure
</cfcatch>
</cftry>
<cfset XMLContent = reReplace(cfhttp.filecontent, "&lt;", "<", "ALL")>
<cfset XMLContent = reReplace(XMLContent, "&gt;", ">", "ALL")>
<cfset content = reReplace(XMLContent, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(XMLContent))>
<cfoutput>#content#</cfoutput>
and then here is the white space (don't know if this will show).
ASKER
When I run the most recent code you provided, I don't see white space in the browser view. However, when I look at the source code, this is what I see:
<pre>
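White space that shows up only in the page source is often emitted by the CFML template itself (the line breaks around the tags that run before the output), though the thread does not confirm that this is the cause here. A minimal sketch of one way to suppress it when serving XML, assuming content holds the cleaned XML string from the earlier steps:

```cfml
<!--- Sketch, assuming "content" holds the cleaned XML string. --->
<!--- reset="yes" discards any output generated before this tag, including stray line breaks. --->
<cfcontent type="text/xml; charset=utf-8" reset="yes"><cfoutput>#content#</cfoutput>
```

Keeping cfcontent and cfoutput on the same line avoids emitting a new line break between the reset and the XML declaration.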
ASKER
OK, it's working now. Must have been me. Thanks for your help!