Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 246
  • Last Modified:

XML Question: Ideas for formatting or inserting into database

I work for a company that makes journals in Quark.  I am trying to figure a way to repurpose the material.  The XML output in the link below is from a program called Atomik XML by Easypress.com.   What I need to know is if it is possilbe to take this exported XML document and have it be inserted into a database (SQL 2000).  Yes  I know I can format it wth a stylesheet and just use the XML document, but I would also like to know if it can be properly inserted into a database with some of the fields provided (name, byline, abstract, the main article, references).

Here is the link, let me know if it is possilbe to work with it.

http://www.jhasim.net/xml/xmltest1.xml

thanks,
Jeff
0
jeffmace
Asked:
jeffmace
  • 13
  • 10
1 Solution
 
anandkpCommented:
Yes it can be done easily with a CF tag called CFX_XMLParser

few example u can take a look at with this tag :

<CFSET xml="<?xml version='1.0'?>
<root>
      <person id='007'>
            <firstname>James</firstname>
            <lastname>Bond</lastname>
      </person>
         <person id='008'>
            <firstname>James</firstname>
            <lastname>Bond</lastname>
      </person>
</root>
">
<CF_XMLPARSER xml="#xml#" output="obj"><!---
<CF_DUMP v="root">--->
<CFOUTPUT>
      Agent #obj.root.person.id# is #obj.root.person.firstname.INNER_TEXT# #obj.root.person.lastname.INNER_TEXT#
</CFOUTPUT>

<!--- my own example with a xml file--->
<CF_XMLPARSER FILE="C:\Clients\Current.xml" OUTPUT="Root">
<CF_XMLPARSER FILE="C:\Clients\#Root.ynarticles.ynarticle.key.INNER_TEXT#.xml" OUTPUT="Main">
<CFQUERY NAME="Del_Date" DATASOURCE="#IWD_DSN#" DBTYPE="ODBC" USERNAME="#IWD_USERNAME#" PASSWORD="#IWD_PASSWORD#">
      Update Tbl_HoroscopeText_M Set
      HoroscopeText_1 = 'Main.ynarticle.body.web.INNER_TEXT#'
</CFQUERY>

HTH

K'Rgds
Anand
0
 
proceptCommented:
Hi,

if you are using CF MX, it can be done without a custom tag. What do you call "the main article"? Everything inside the body tag? How do you want to format headings, subheadings and paragraphs. Also, so you want to insert all references into one field in the main DB table or do you want to set up a related table??

It's quite easy, but, have to know the details. ;-))

Chris
0
 
jeffmaceAuthor Commented:
Yes, I am using CFMX 6.1.   The main article is everything inside of <body>,  <abstract> would go into its own database field, <references> would go into its own field, <byline> would be its own field.  The images/figures/tables need to be worked out yet, so i am not sure what i am going to do with them.  Also, you will see a tag in their that will look like this: <superior>43</superior> , this is for superscripts.  If i am inserting this information into a database, is it possilbe to make sure that they go in as superscripts.  Basically change that tag to <sup>43</sup>.   I hope you understand what I am asking.

Thanks
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
proceptCommented:
what about the different bodytext tags and how would you format the headings?
0
 
jeffmaceAuthor Commented:
Well I would like all the body text with the headings to be in field called something like articletext.   All of this is one article, so its only one title, byline, articletext, references..   I don't need it broken up to be inserted anywhere else.   If you are asking about actual format such as color or something, headings can just be bolded.
0
 
proceptCommented:
OK, here you go....

the following code will
- read the file and parse the XML
- extract title, byline, abstract, articletext, and references
- format all <headings> as <h2>, all <subheadings> as <h3>, all <superior> as <sup>, and all <paragraphs> as <p>

It seems very long, because I put in a lot of empty lines to improve readability. ;-))

If you need something changed, just let me know.

HTH,

Chris


<!--- read the file and parse the xml; could be HTTP request or something else instead of file read --->
<cffile action="read" file="E:\wwwroot\test\xml\xmltest1.xml" variable="xmlFile">
<cfset myXML = xmlParse(xmlFile)>

<!--- find all articles ad store in array --->
<cfset aryArticles = xmlSearch(myXML, "//article")>

<!--- loop over all articles --->
<cfloop from="1" to="#arrayLen(aryArticles)#" index="idx">

   <!--- get title and byline as text --->
   <cfset title = aryArticles[idx].head.articletitle.xmlText>
   <cfset byline = aryArticles[idx].head.byline.xmlText>
   
   <!--- get abstract as text --->
   <cfset abstract = toString(aryArticles[idx].head.abstract)>

   <!--- remove XML definition and abstract tag --->
   <cfset abstract = reReplaceNoCase(abstract, "<\?xml[^>]+>", "")>
   <cfset abstract = reReplaceNoCase(abstract, "</?abstract>", "", "ALL")>  

   <!--- replace paragraph tags with <p></p> --->
   <cfset abstract = reReplaceNoCase(abstract, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- get body as XML element --->
   <cfset body = aryArticles[idx].body>

   <!--- get references as XML element --->
   <cfset references = toString(aryArticles[idx].body.references)>

   <!---  go though all sub elements of body --->
   <cfloop from="#ArrayLen(body.xmlChildren)#" to="1" step="-1" index="childPos">

      <!---  if subelement is not bodytext, heading, or subheading: delete element --->
      <cfif NOT ListFIndNoCase("bodytext,heading,subheading", body.xmlChildren[childPos].xmlName)>
         <cfset ArrayDeleteAt(body.xmlChildren, childPos)>
      </cfif>

   </cfloop>
   
   <!--- transform body element to text --->
   <cfset bodyText = toString(body)>

   <!--- remove XML definition and body tag --->
   <cfset bodytext = reReplaceNoCase(bodyText, "<\?xml[^>]+>", "")>
   <cfset bodyText = reReplaceNoCase(bodyText, "</?body[^>]*>", "", "ALL")>  
   
   <!---
      Replaces:
      <paragraph> with <p>
      <heading> with <h2>
      <subheading> with <h3>
      <superior> with <sup>
    --->
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?paragraph>", "<\1p>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?heading>", "<\1h2>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?subheading>", "<\1h3>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?superior>", "<\1sup>", "ALL")>

   <!--- replace <paragraph> with <p> in references --->
   <cfset references = reReplaceNoCase(references, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- insert into database --->  
   <CFQUERY name="insertStrings" datasource="#request.dsn#">
          INSERT INTO myTable (title, byline, abstract, body, references)
          VALUES ('#title#', '#byline#', '#abstract#', '#body#', '#references#')
          </CFQUERY>
     
</cfloop>
0
 
jeffmaceAuthor Commented:
I got an error:
line 16


14 :    
15 :    <!--- get abstract as text --->
16 :    <cfset abstract = toString(aryArticles[idx].head.abstract)>
17 :
18 :    <!--- remove XML definition and abstract tag --->

0
 
proceptCommented:
what does the error msg say????
0
 
jeffmaceAuthor Commented:
Just what I pasted:

Error Occurred While Processing Request  
 
 
 
The error occurred in E:\Inetpub\ASiM\web\xml\output6.cfm: line 16
 
14 :    
15 :    <!--- get abstract as text --->
16 :    <cfset abstract = toString(aryArticles[idx].head.abstract)>
17 :
18 :    <!--- remove XML definition and abstract tag --->

 

--------------------------------------------------------------------------------
 
Please try the following:
Check the ColdFusion documentation to verify that you are using the correct syntax.
Search the Knowledge Base to find a solution to your problem.

 
Browser   Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
Remote Address   138.89.24.246
Referrer    
Date/Time   29-Oct-03 10:58 AM
 
Stack Trace (click to expand)  
 
0
 
proceptCommented:
I just copied the code from my post with copy and paste back to my machine and executed it ... works without an error.

Does the error say something else. like "ELement XYZ is undefined" or "ColdFusion was looking at the following text"???
0
 
jeffmaceAuthor Commented:
The only thing i changed so far was the path to the XML file.

here is a image of the error:
http://www.jhasim.net/xml/error.jpg


Here is the code that i am using.  I also commented out the code to insert into database right now.


<!--- read the file and parse the xml; could be HTTP request or something else instead of file read --->
<cffile action="read" file="E:\Inetpub\ASiM\web\xml\xmltest1.xml" variable="xmlFile">
<cfset myXML = xmlParse(xmlFile)>

<!--- find all articles ad store in array --->
<cfset aryArticles = xmlSearch(myXML, "//article")>

<!--- loop over all articles --->
<cfloop from="1" to="#arrayLen(aryArticles)#" index="idx">

   <!--- get title and byline as text --->
   <cfset title = aryArticles[idx].head.articletitle.xmlText>
   <cfset byline = aryArticles[idx].head.byline.xmlText>
   
   <!--- get abstract as text --->
   <cfset abstract = toString(aryArticles[idx].head.abstract)>

   <!--- remove XML definition and abstract tag --->
   <cfset abstract = reReplaceNoCase(abstract, "<\?xml[^>]+>", "")>
   <cfset abstract = reReplaceNoCase(abstract, "</?abstract>", "", "ALL")>  

   <!--- replace paragraph tags with <p></p> --->
   <cfset abstract = reReplaceNoCase(abstract, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- get body as XML element --->
   <cfset body = aryArticles[idx].body>

   <!--- get references as XML element --->
   <cfset references = toString(aryArticles[idx].body.references)>

   <!---  go though all sub elements of body --->
   <cfloop from="#ArrayLen(body.xmlChildren)#" to="1" step="-1" index="childPos">

      <!---  if subelement is not bodytext, heading, or subheading: delete element --->
      <cfif NOT ListFIndNoCase("bodytext,heading,subheading", body.xmlChildren[childPos].xmlName)>
         <cfset ArrayDeleteAt(body.xmlChildren, childPos)>
      </cfif>

   </cfloop>
   
   <!--- transform body element to text --->
   <cfset bodyText = toString(body)>

   <!--- remove XML definition and body tag --->
   <cfset bodytext = reReplaceNoCase(bodyText, "<\?xml[^>]+>", "")>
   <cfset bodyText = reReplaceNoCase(bodyText, "</?body[^>]*>", "", "ALL")>  
   
   <!---
      Replaces:
      <paragraph> with <p>
      <heading> with <h2>
      <subheading> with <h3>
      <superior> with <sup>
    --->
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?paragraph>", "<\1p>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?heading>", "<\1h2>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?subheading>", "<\1h3>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?superior>", "<\1sup>", "ALL")>

   <!--- replace <paragraph> with <p> in references --->
   <cfset references = reReplaceNoCase(references, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- insert into database --->  
<!---    <CFQUERY name="insertStrings" datasource="#request.dsn#">
          INSERT INTO myTable (title, byline, abstract, body, references)
          VALUES ('#title#', '#byline#', '#abstract#', '#body#', '#references#')
          </CFQUERY> --->
     
</cfloop>
0
 
jeffmaceAuthor Commented:
Do you know of any issues with toString and CFMX, i tried something a while ago with toString and it returned an error and i never resolved it.  
0
 
jeffmaceAuthor Commented:
i even tried commenting out all the parts dealing with abstract and then when i ran it, i got the next error that had the converting to string in it.


this line below errored when commenting out the abstract section
<cfset references = toString(aryArticles[idx].body.references)>
0
 
jeffmaceAuthor Commented:
Ok, I got it to work, there seems to be an issue with toString.  I had to turn off CF Security and it worked just fine.

Ok, here is my question for you and maybe you can show me what i did wrong.  For right now I just wanted to output the text to see what happened.  Everything came out great except for the Body text.  It didn't come out at all.

Also, do you know of how to prepare the html to allow for special characters such as quotes.  I know in a XML document you can have the (encoding="UTF-8") which allows for quotes,etc..

Here is a live link to my output.
http://www.jhasim.net/xml/output6.cfm

My CODE:

<!--- read the file and parse the xml; could be HTTP request or something else instead of file read --->
<cffile action="read" file="E:\InetPub\ASiM\web\xml\xmltest1.xml" variable="xmlFile">
<cfset myXML = xmlParse(xmlFile)>

<!--- find all articles ad store in array --->
<cfset aryArticles = xmlSearch(myXML, "//article")>

<!--- loop over all articles --->
<cfloop from="1" to="#arrayLen(aryArticles)#" index="idx">

   <!--- get title and byline as text --->
   <cfset title = aryArticles[idx].head.articletitle.xmlText>
   <cfset byline = aryArticles[idx].head.byline.xmlText>
   
   <!--- get abstract as text --->
   <cfset abstract = toString(aryArticles[idx].head.abstract)>

   <!--- remove XML definition and abstract tag --->
   <cfset abstract = reReplaceNoCase(abstract, "<\?xml[^>]+>", "")>
   <cfset abstract = reReplaceNoCase(abstract, "</?abstract>", "", "ALL")>  

   <!--- replace paragraph tags with <p></p> --->
   <cfset abstract = reReplaceNoCase(abstract, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- get body as XML element --->
   <cfset body = aryArticles[idx].body>

   <!--- get references as XML element --->
   <cfset references = toString(aryArticles[idx].body.references)>

   <!---  go though all sub elements of body --->
   <cfloop from="#ArrayLen(body.xmlChildren)#" to="1" step="-1" index="childPos">

      <!---  if subelement is not bodytext, heading, or subheading: delete element --->
      <cfif NOT ListFIndNoCase("bodytext,heading,subheading", body.xmlChildren[childPos].xmlName)>
         <cfset ArrayDeleteAt(body.xmlChildren, childPos)>
      </cfif>

   </cfloop>
   
   <!--- transform body element to text --->
   <cfset bodyText = toString(body)>

   <!--- remove XML definition and body tag --->
   <cfset bodytext = reReplaceNoCase(bodyText, "<\?xml[^>]+>", "")>
   <cfset bodyText = reReplaceNoCase(bodyText, "</?body[^>]*>", "", "ALL")>  
   
   <!---
      Replaces:
      <paragraph> with <p>
      <heading> with <h2>
      <subheading> with <h3>
      <superior> with <sup>
    --->
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?paragraph>", "<\1p>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?heading>", "<\1h2>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?subheading>", "<\1h3>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?superior>", "<\1sup>", "ALL")>

   <!--- replace <paragraph> with <p> in references --->
   <cfset references = reReplaceNoCase(references, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- insert into database --->  
<!---    <CFQUERY name="insertStrings" datasource="#request.dsn#">
          INSERT INTO myTable (title, byline, abstract, body, references)
          VALUES ('#title#', '#byline#', '#abstract#', '#body#', '#references#')
          </CFQUERY> --->
             
<CFOUTPUT>
#title#<br>
#byline#<br>
#abstract#<br>
#body#<br>
#references#<br>
</CFOUTPUT>
     
</cfloop>

0
 
jeffmaceAuthor Commented:
Well, i changed body to bodytext and that is outputting all the content just fine.  So my last problem is the crazy character encoding, so if you can help me fix that, that would be wonderful.

http://www.jhasim.net/xml/output6.cfm

0
 
proceptCommented:
Hi,

that issue with security is weird... I'm gonna do some tests when I find the time!

Missing body-output: you make the same mistake as me. ;-)) You are displaying the var body instead of bodyText. (So did I in the query, change that when you need to do a real query!!)

UTF-8: you might try putting a <cfprocessingdirective> at the top of the page:

<cfprocessingdirective suppresswhitespace="Yes" pageencoding="UTF-8">

Or you might use a meta tag (can be done with <cfheader>, too):
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

What database are you going to use?? Should be unicode-enabled!

HTH,

Chris

0
 
jeffmaceAuthor Commented:
Well i am using SQL 2000.   I inserted your code into the database and the same output came out.

http://www.jhasim.net/xml/test.cfm

It seems to be inserting into the database with those characters.

I tried the different encodings, but its still coming out the same.
0
 
proceptCommented:
With SQL2000 you need to use nvarchar fields instead of varchar, and ntext instead of text.

With nvarchar-fields you need to keep in mind that UTF-8 is a 2-byte encoding. That means your fields need to be twice as long as the maximum character length. If you want to store a max of 50 chars in the field, it has to be nvarchar(100).

Also, with inserting or updating, you need to tell SQL Server that you are inserting nvarchar by adding a capital N to the value (no kidding!!):
INSERT INTO myTable (myNvarcharField)
VALUES (N'#myCFVariable#')



0
 
jeffmaceAuthor Commented:
I am sorry about this, but still does not work

here is a shot of my SQL Server.

http://www.jhasim.net/xml/sql.jpg

No matter how I am seeing the data, it seems to be grabbing it and sending it out as  this type of stuff:  Ã¢&#128;&#156;somatoform disordersâ&#128;&#157.  

0
 
proceptCommented:
I just did some test and it turned out that you need to add the charset attribute to the cffile tag:

<cffile action="read" file="E:\InetPub\ASiM\web\xml\xmltest1.xml" variable="xmlFile" charset="utf-8">

Also, I forgot to remove the xml definition and the references tag from the references string. You need to add those lines:

   <cfset references = reReplaceNoCase(references, "<\?xml[^>]+>", "")>
   <cfset references = reReplaceNoCase(references, "</?references[^>]*>", "", "ALL")>  

Hope it works now. ;-))

Chris

0
 
jeffmaceAuthor Commented:
Oh my God, you are the best.  I wish I could give you 1,000,000 points.   This works great and I should be able to do what I need with it.

Great JOB!!!!
0
 
proceptCommented:
you're welcome... let me know if you need more help. :-))

Chris
0
 
jeffmaceAuthor Commented:
One more question regarding the formatting of the headings.  If i wanted to change the h2 and h3 to a font size, how would that be done if it's possilbe.
0
 
proceptCommented:
You need to change these lines:

<cfset bodytext = reReplaceNoCase(bodytext, "<(/)?heading>", "<\1h2>", "ALL")>
<cfset bodytext = reReplaceNoCase(bodytext, "<(/)?subheading>", "<\1h3>", "ALL")>
   
But, as <font> requires attributes, you have to so it slightly different:

<cfset bodytext = replaceNoCase(bodytext, "<heading>", "<font size="4">", "ALL")>
<cfset bodytext = replaceNoCase(bodytext, "<subheading>", "<font size="3">", "ALL")>
<cfset bodytext = reReplaceNoCase(bodytext, "</(sub)?heading>", "</font>", "ALL")>

That should do, haven't tested it though!

Chris

0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

  • 13
  • 10
Tackle projects and never again get stuck behind a technical roadblock.
Join Now