Solved

Coldfusion white space removal from head of XML

Posted on 2011-05-04
Last Modified: 2012-05-11
I'm trying to parse an XML feed in ColdFusion and reformat it to the Yahoo! Media RSS format. The main problem is white space at the head of the file (before the <?xml declaration): I need to remove those ASCII whitespace characters. I've tried every solution I could find on the web and nothing has worked. I imagine this should be a fairly simple fix.

Here is the list of whitespace characters that appear at the head of the document:
CR =  Carriage return.  ASCII value = 13
LF =  Line feed.  ASCII value = 10
SPC =  Space.  ASCII value = 32
TAB =  Tab.  ASCII value = 9

Here are the whitespace characters in the order in which they appear at the head of the document, one line per row:

CR|LF
SPC|CR|LF
SPC|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
TAB|CR|LF
CR|LF
TAB|CR|LF
SPC|CR|LF
CR|LF
CR|LF
CR|LF
CR|LF
CR|LF
CR|LF
<?xml version="1.0" encoding="utf-8"?> ...
Question by:CalDev
12 Comments
 
LVL 52

Expert Comment

by:_agx_
ID: 35694083
A simple trim() seems to work w/CF9

<cfsavecontent variable="content"><cfoutput>
#chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
<?xml version="1.0" encoding="utf-8"?>
<order> 
    <customer firstname="Philip" lastname="Cramer" accountNum="21"/> 
</order></cfoutput></cfsavecontent>

<cfset doc = xmlParse(trim(content))>
<cfdump var="#doc#">

 

Author Comment

by:CalDev
ID: 35694096
I guess I should have added that I'm working with CF7.
 
LVL 52

Expert Comment

by:_agx_
ID: 35694112
Try it anyway, it *shouldn't* make a difference.  

If that doesn't work, try a regex
...
<cfset content = reReplace(content, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(content))>


 

Author Comment

by:CalDev
ID: 35694150
I feel like what you have posted might work, but I think the reason I'm not getting the correct result is that I'm not implementing it correctly. I am grabbing the original RSS feed via CFHTTP, assigning it to a variable #XMLContent#, and then parsing. Can you show what the syntax should look like for stripping the white space from the beginning of the content stored in the #XMLContent# variable?
 
LVL 52

Expert Comment

by:_agx_
ID: 35694267
It should be exactly what I posted already. Just change the variable name, i.e. from #content# to #xmlContent#

<cfhttp ...>
<cfset xmlContent = reReplace(XMLContent, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(XMLContent))>

Could you post the feed's url, so I can test it on the live data?
 
LVL 52

Expert Comment

by:_agx_
ID: 35694315
So did the original code work under MX7?  If it *does* work then the problem must be your cfhttp code. We'd need to see that part (or test a live feed).

ie
<!--- this just simulates a cfhttp call --->
<cfsavecontent variable="xmlContent"><cfoutput>
#chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(9)##chr(13)##chr(10)#
#chr(32)##chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
#chr(13)##chr(10)#
<?xml version="1.0" encoding="utf-8"?>
<order> 
    <customer firstname="Philip" lastname="Cramer" accountNum="21"/> 
</order></cfoutput></cfsavecontent>

<cfset xmlContent = reReplace(XMLContent, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(XMLContent))>
<cfdump var="#doc#">

 

Author Comment

by:CalDev
ID: 35694321
Sure the original  feed is here: http://www.redding.com/feeds/photo-galleries/sports/college-sports/

I'm still getting the same results, so my guess is that this is being caused by something I'm introducing in my process.

So this is how I'm grabbing it:
<cfif URL.page EQ "sports">
      <cfset URLToPull = "http://www.redding.com/feeds/photo-galleries/sports/college-sports/">
</cfif>

<cftry>
<cfhttp url="#URLToPull#"
           method="GET"
           timeout="15">
</cfhttp>
      
 <cfcatch>
  cfhttp failure
 </cfcatch>
</cftry>

and this is how I'm cleaning up the content:

<cfset XMLContent = reReplace(cfhttp.filecontent , "&lt;", "<", "ALL")>
<cfset XMLContent = reReplace(XMLContent , "&gt;", ">", "ALL")>

That's all I've done so far. I have yet to begin reformatting into the media RSS format.

and outputting the #XMLContent# produces all the white space content described above.
 
LVL 52

Accepted Solution

by:
_agx_ earned 2000 total points
ID: 35694406
I don't get it ;-) Where is the leading white space? If I run this:

<cfset URLToPull = "http://www.redding.com/feeds/photo-galleries/sports/college-sports/">
<cfhttp url="#URLToPull#"
           method="GET"
           timeout="15">
<!--- see where white space begins and ends ....--->
<cfoutput><pre>|start|#htmlEditFormat(cfhttp.filecontent)#|end|</pre></cfoutput>



I don't see any leading white space in the result

|start|<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">....
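
To compare a response against the list of character codes in the question, the leading characters can also be dumped as ASCII values. This is a minimal sketch, not code from the thread: the variable names raw and firstChar and the sample string are made up for illustration, and in practice raw would be cfhttp.filecontent.

```cfml
<!--- Hypothetical sample input; replace with cfhttp.filecontent in practice --->
<cfset raw = chr(13) & chr(10) & chr(9) & chr(13) & chr(10) & '<?xml version="1.0" encoding="utf-8"?><order/>'>

<!--- Position of the first non-whitespace character (0 if there is none) --->
<cfset firstChar = reFind("[^[:space:]]", raw)>

<!--- Print the ASCII code of every character that precedes it --->
<cfoutput>
<cfloop index="i" from="1" to="#firstChar - 1#">
    #asc(mid(raw, i, 1))#<cfif i LT firstChar - 1>|</cfif>
</cfloop>
</cfoutput>
```

If nothing is printed, the string already starts with a non-whitespace character, which points to the white space being added somewhere other than the feed.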

 

Author Comment

by:CalDev
ID: 35694507
That's strange.

Here is my entire code:
<cfif URL.page EQ "sports">
	<cfset URLToPull = "http://www.redding.com/feeds/photo-galleries/sports/college-sports/">
</cfif>

<cftry>
<cfhttp url="#URLToPull#"
           method="GET"
           timeout="15">
</cfhttp>
	
 <cfcatch>
  cfhttp failure
 </cfcatch>
</cftry>

<cfset XMLContent = reReplace(cfhttp.filecontent , "&lt;", "<", "ALL")>
<cfset XMLContent = reReplace(XMLContent , "&gt;", ">", "ALL")>


<cfset content = reReplace(XMLContent, "^[\n\r\t\s]+", "")>
<cfset doc = xmlParse(trim(XMLContent))>


<cfoutput>#content#</cfoutput>



and then here is the white space (don't know if this will show).


 

 

      

      

      

      

      

      



      

 












 

Author Comment

by:CalDev
ID: 35694526
So when I run the most recent code you provided I don't see white space in the browser view; however, when I look at the source code this is what I see:

 
 
      
      
      
      
      
      

      
 






<pre>
 

Author Comment

by:CalDev
ID: 35694543
0
 

Author Closing Comment

by:CalDev
ID: 35694570
OK, it's working now. Must have been me. Thanks for your help!
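
The thread does not record what the fix was. One explanation consistent with the accepted answer (the feed itself has no leading white space) is that the blank lines were emitted by the template's own tags: each <cfif>, <cfset>, <cftry> line and its indentation ends up in the page output ahead of the XML. A minimal sketch under that assumption, not the asker's confirmed solution, is to suppress that output with <cfsilent> and discard anything already buffered with <cfcontent reset="true"> before writing the XML:

```cfml
<!--- Sketch only: cfsilent suppresses output generated by the tags inside it --->
<cfsilent>
    <cfset URLToPull = "http://www.redding.com/feeds/photo-galleries/sports/college-sports/">
    <cfhttp url="#URLToPull#" method="GET" timeout="15"></cfhttp>
    <!--- trim() removes leading and trailing white space from the response body --->
    <cfset XMLContent = trim(cfhttp.filecontent)>
</cfsilent><cfcontent type="text/xml" reset="true"><cfoutput>#XMLContent#</cfoutput>
```

Keeping </cfsilent>, <cfcontent> and <cfoutput> on one line avoids introducing new line breaks between them, so the <?xml declaration is the first thing in the response.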