CalDev asked:

ColdFusion white space removal from head of XML

I'm trying to parse an XML feed in ColdFusion and reformat it to the Yahoo! Media RSS format. The big problem I'm having is white space at the head of the file (before the <?xml declaration): I can't get rid of the ASCII whitespace characters. I've tried all the solutions I could find on the web and nothing has worked. I imagine this should be a fairly simple fix.

Here is the list of whitespace characters that are in the head of the document:
CR = Carriage return. ASCII value = 13
LF = Line feed. ASCII value = 10
SPC = Space. ASCII value = 32
TAB = Tab. ASCII value = 9

Here is the order in which those whitespace characters appear in the head of the document:

CR|LF
SPC|CR|LF
SPC|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
CR|LF
TAB|CR|LF
SPC|CR|LF
CR|LF
CR|LF
CR|LF
CR|LF
CR|LF
CR|LF
<?xml version="1.0" encoding="utf-8"?> ...
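
A listing like the one above can be produced by dumping the character codes at the start of the string. The following is only a minimal sketch: it assumes the raw feed text is already in a variable named xmlContent, and the variable maxChars and the limit of 60 characters are arbitrary choices made for illustration.

```cfm
<!--- Hypothetical diagnostic: xmlContent is assumed to hold the raw feed text --->
<!--- Inspect at most the first 60 characters (arbitrary limit) --->
<cfset maxChars = min(len(xmlContent), 60)>
<cfoutput>
<cfloop index="i" from="1" to="#maxChars#">
    <!--- asc() returns the character code, e.g. 13 for CR, 10 for LF, 9 for TAB, 32 for space --->
    #i#: #asc(mid(xmlContent, i, 1))#<br>
</cfloop>
</cfoutput>
```

If the codes printed before the `<` of the XML declaration are anything other than 9, 10, 13, or 32, a plain trim() may not remove them, and that would be worth knowing before trying other fixes.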
_agx_

A simple trim() seems to work with CF9:

<cfsavecontent variable="content"><cfoutput>
#chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
<?xml version="1.0" encoding="utf-8"?>
<order> 
    <customer firstname="Philip" lastname="Cramer" accountNum="21"/> 
</order></cfoutput></cfsavecontent>

<cfset doc = xmlParse(trim(content))>
<cfdump var="#doc#">


CalDev

ASKER

I guess I should have added that I'm working with CF7.

_agx_

Try it anyway; it *shouldn't* make a difference.

If that doesn't work, try a regex:
...
<cfset content = reReplace(content, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(content))>
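
Before calling xmlParse(), it can help to check whether the cleaned string is well-formed at all, so that a parse failure is not mistaken for a whitespace problem. This is a rough sketch rather than part of the suggested fix; the variable name cleaned and the 80-character preview length are assumptions made for illustration.

```cfm
<!--- Hypothetical guard: "content" is the feed text from the example above --->
<cfset cleaned = trim(content)>
<cfif isXML(cleaned)>
    <!--- Well-formed, so it is safe to parse --->
    <cfset doc = xmlParse(cleaned)>
<cfelse>
    <!--- Show the first characters (HTML-escaped) to see what precedes the XML declaration --->
    <cfoutput>Not well-formed XML. Starts with: #htmlEditFormat(left(cleaned, 80))#</cfoutput>
</cfif>
```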

CalDev

ASKER

I feel like what you have posted might work, but I think the reason I'm not getting the correct result is that I'm not implementing it correctly. I am grabbing the original RSS feed via CFHTTP, assigning it to a variable #XMLContent#, and then parsing. Can you show what the syntax should look like for stripping the white space from the beginning of the content stored in the #XMLContent# variable?

_agx_

It should be exactly what I posted already. Just change the variable name, i.e. from #content# to #xmlContent#:

<cfhttp ...>
<cfset XMLContent = reReplace(XMLContent, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(XMLContent))>

Could you post the feed's URL so I can test it on the live data?
So did the original code work under MX7? If it *does* work, then the problem must be in your cfhttp code. We'd need to see that part (or test a live feed).

i.e.
<!--- this just simulates a cfhttp call --->
<cfsavecontent variable="xmlContent"><cfoutput>
#chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
<?xml version="1.0" encoding="utf-8"?>
<order> 
    <customer firstname="Philip" lastname="Cramer" accountNum="21"/> 
</order></cfoutput></cfsavecontent>

<cfset xmlContent = reReplace(XMLContent, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(XMLContent))>
<cfdump var="#doc#">





CalDev

ASKER

Sure, the original feed is here: http://www.redding.com/feeds/photo-galleries/sports/college-sports/

I'm still getting the same results, so my guess is that this is caused by something I'm introducing in my process.

So this is how I'm grabbing it:
<cfif URL.page EQ "sports">
      <cfset URLToPull = "http://www.redding.com/feeds/photo-galleries/sports/college-sports/">
</cfif>

<cftry>
<cfhttp url="#URLToPull#"
           method="GET"
           timeout="15">
</cfhttp>
      
 <cfcatch>
  cfhttp failure
 </cfcatch>
</cftry>

and this is how I'm cleaning up the content:

<cfset XMLContent = reReplace(cfhttp.filecontent , "&lt;", "<", "ALL")>
<cfset XMLContent = reReplace(XMLContent , "&gt;", ">", "ALL")>

That's all I've done so far. I have yet to begin reformatting into the Media RSS format.

Outputting #XMLContent# produces all the white space described above.
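
One possibility consistent with the pattern described (runs of tabs and spaces, each followed by CR/LF) is that the white space comes from the .cfm template itself rather than from the feed. By default, ColdFusion writes the line breaks and indentation surrounding tags such as cfif and cfset to the response. The thread does not confirm this, so the following is only a sketch under that assumption. It reuses URLToPull from the code above.

```cfm
<!--- Sketch, assuming the white space is generated by the template, not the feed --->
<!--- Only text inside cfoutput is sent to the response from here on --->
<cfsetting enablecfoutputonly="true">

<cfhttp url="#URLToPull#" method="GET" timeout="15"></cfhttp>
<cfset xmlContent = trim(cfhttp.fileContent)>

<!--- reset="true" discards anything already written to the output buffer --->
<cfcontent type="text/xml; charset=utf-8" reset="true"><cfoutput>#xmlContent#</cfoutput>
```

Keeping cfcontent and cfoutput on the same line avoids introducing a new line break between the buffer reset and the XML declaration.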
ASKER CERTIFIED SOLUTION
_agx_
CalDev

ASKER

That's strange.

Here is my entire code:
<cfif URL.page EQ "sports">
	<cfset URLToPull = "http://www.redding.com/feeds/photo-galleries/sports/college-sports/">
</cfif>

<cftry>
<cfhttp url="#URLToPull#"
           method="GET"
           timeout="15">
</cfhttp>
	
 <cfcatch>
  cfhttp failure
 </cfcatch>
</cftry>

<cfset XMLContent = reReplace(cfhttp.filecontent , "&lt;", "<", "ALL")>
<cfset XMLContent = reReplace(XMLContent , "&gt;", ">", "ALL")>


<cfset content = reReplace(XMLContent, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(XMLContent))>


<cfoutput>#content#</cfoutput>



And here is the white space (I don't know if this will show):


 

 

      

      

      

      

      

      



      

 












CalDev

ASKER

When I run the most recent code you provided, I don't see white space in the browser view. However, when I look at the page source, this is what I see:

 
 
      
      
      
      
      
      

      
 






CalDev

ASKER

OK, it's working now. Must have been me. Thanks for your help!