Solved

XML Question:  Ideas for formatting or inserting into database

Posted on 2003-10-28
24
232 Views
Last Modified: 2013-12-24
I work for a company that makes journals in Quark.  I am trying to figure a way to repurpose the material.  The XML output in the link below is from a program called Atomik XML by Easypress.com.   What I need to know is if it is possilbe to take this exported XML document and have it be inserted into a database (SQL 2000).  Yes  I know I can format it wth a stylesheet and just use the XML document, but I would also like to know if it can be properly inserted into a database with some of the fields provided (name, byline, abstract, the main article, references).

Here is the link, let me know if it is possilbe to work with it.

http://www.jhasim.net/xml/xmltest1.xml

thanks,
Jeff
0
Comment
Question by:jeffmace
  • 13
  • 10
24 Comments
 
LVL 17

Expert Comment

by:anandkp
ID: 9639349
Yes it can be done easily with a CF tag called CFX_XMLParser

few example u can take a look at with this tag :

<CFSET xml="<?xml version='1.0'?>
<root>
      <person id='007'>
            <firstname>James</firstname>
            <lastname>Bond</lastname>
      </person>
         <person id='008'>
            <firstname>James</firstname>
            <lastname>Bond</lastname>
      </person>
</root>
">
<CF_XMLPARSER xml="#xml#" output="obj"><!---
<CF_DUMP v="root">--->
<CFOUTPUT>
      Agent #obj.root.person.id# is #obj.root.person.firstname.INNER_TEXT# #obj.root.person.lastname.INNER_TEXT#
</CFOUTPUT>

<!--- my own example with a xml file--->
<CF_XMLPARSER FILE="C:\Clients\Current.xml" OUTPUT="Root">
<CF_XMLPARSER FILE="C:\Clients\#Root.ynarticles.ynarticle.key.INNER_TEXT#.xml" OUTPUT="Main">
<CFQUERY NAME="Del_Date" DATASOURCE="#IWD_DSN#" DBTYPE="ODBC" USERNAME="#IWD_USERNAME#" PASSWORD="#IWD_PASSWORD#">
      Update Tbl_HoroscopeText_M Set
      HoroscopeText_1 = 'Main.ynarticle.body.web.INNER_TEXT#'
</CFQUERY>

HTH

K'Rgds
Anand
0
 
LVL 4

Expert Comment

by:procept
ID: 9640038
Hi,

if you are using CF MX, it can be done without a custom tag. What do you call "the main article"? Everything inside the body tag? How do you want to format headings, subheadings and paragraphs. Also, so you want to insert all references into one field in the main DB table or do you want to set up a related table??

It's quite easy, but, have to know the details. ;-))

Chris
0
 

Author Comment

by:jeffmace
ID: 9642019
Yes, I am using CFMX 6.1.   The main article is everything inside of <body>,  <abstract> would go into its own database field, <references> would go into its own field, <byline> would be its own field.  The images/figures/tables need to be worked out yet, so i am not sure what i am going to do with them.  Also, you will see a tag in their that will look like this: <superior>43</superior> , this is for superscripts.  If i am inserting this information into a database, is it possilbe to make sure that they go in as superscripts.  Basically change that tag to <sup>43</sup>.   I hope you understand what I am asking.

Thanks
0
 
LVL 4

Expert Comment

by:procept
ID: 9642200
what about the different bodytext tags and how would you format the headings?
0
 

Author Comment

by:jeffmace
ID: 9642232
Well I would like all the body text with the headings to be in field called something like articletext.   All of this is one article, so its only one title, byline, articletext, references..   I don't need it broken up to be inserted anywhere else.   If you are asking about actual format such as color or something, headings can just be bolded.
0
 
LVL 4

Expert Comment

by:procept
ID: 9642568
OK, here you go....

the following code will
- read the file and parse the XML
- extract title, byline, abstract, articletext, and references
- format all <headings> as <h2>, all <subheadings> as <h3>, all <superior> as <sup>, and all <paragraphs> as <p>

It seems very long, because I put in a lot of empty lines to improve readability. ;-))

If you need something changed, just let me know.

HTH,

Chris


<!--- read the file and parse the xml; could be HTTP request or something else instead of file read --->
<cffile action="read" file="E:\wwwroot\test\xml\xmltest1.xml" variable="xmlFile">
<cfset myXML = xmlParse(xmlFile)>

<!--- find all articles ad store in array --->
<cfset aryArticles = xmlSearch(myXML, "//article")>

<!--- loop over all articles --->
<cfloop from="1" to="#arrayLen(aryArticles)#" index="idx">

   <!--- get title and byline as text --->
   <cfset title = aryArticles[idx].head.articletitle.xmlText>
   <cfset byline = aryArticles[idx].head.byline.xmlText>
   
   <!--- get abstract as text --->
   <cfset abstract = toString(aryArticles[idx].head.abstract)>

   <!--- remove XML definition and abstract tag --->
   <cfset abstract = reReplaceNoCase(abstract, "<\?xml[^>]+>", "")>
   <cfset abstract = reReplaceNoCase(abstract, "</?abstract>", "", "ALL")>  

   <!--- replace paragraph tags with <p></p> --->
   <cfset abstract = reReplaceNoCase(abstract, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- get body as XML element --->
   <cfset body = aryArticles[idx].body>

   <!--- get references as XML element --->
   <cfset references = toString(aryArticles[idx].body.references)>

   <!---  go though all sub elements of body --->
   <cfloop from="#ArrayLen(body.xmlChildren)#" to="1" step="-1" index="childPos">

      <!---  if subelement is not bodytext, heading, or subheading: delete element --->
      <cfif NOT ListFIndNoCase("bodytext,heading,subheading", body.xmlChildren[childPos].xmlName)>
         <cfset ArrayDeleteAt(body.xmlChildren, childPos)>
      </cfif>

   </cfloop>
   
   <!--- transform body element to text --->
   <cfset bodyText = toString(body)>

   <!--- remove XML definition and body tag --->
   <cfset bodytext = reReplaceNoCase(bodyText, "<\?xml[^>]+>", "")>
   <cfset bodyText = reReplaceNoCase(bodyText, "</?body[^>]*>", "", "ALL")>  
   
   <!---
      Replaces:
      <paragraph> with <p>
      <heading> with <h2>
      <subheading> with <h3>
      <superior> with <sup>
    --->
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?paragraph>", "<\1p>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?heading>", "<\1h2>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?subheading>", "<\1h3>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?superior>", "<\1sup>", "ALL")>

   <!--- replace <paragraph> with <p> in references --->
   <cfset references = reReplaceNoCase(references, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- insert into database --->  
   <CFQUERY name="insertStrings" datasource="#request.dsn#">
          INSERT INTO myTable (title, byline, abstract, body, references)
          VALUES ('#title#', '#byline#', '#abstract#', '#body#', '#references#')
          </CFQUERY>
     
</cfloop>
0
 

Author Comment

by:jeffmace
ID: 9642599
I got an error:
line 16


14 :    
15 :    <!--- get abstract as text --->
16 :    <cfset abstract = toString(aryArticles[idx].head.abstract)>
17 :
18 :    <!--- remove XML definition and abstract tag --->

0
 
LVL 4

Expert Comment

by:procept
ID: 9642852
what does the error msg say????
0
 

Author Comment

by:jeffmace
ID: 9642873
Just what I pasted:

Error Occurred While Processing Request  
 
 
 
The error occurred in E:\Inetpub\ASiM\web\xml\output6.cfm: line 16
 
14 :    
15 :    <!--- get abstract as text --->
16 :    <cfset abstract = toString(aryArticles[idx].head.abstract)>
17 :
18 :    <!--- remove XML definition and abstract tag --->

 

--------------------------------------------------------------------------------
 
Please try the following:
Check the ColdFusion documentation to verify that you are using the correct syntax.
Search the Knowledge Base to find a solution to your problem.

 
Browser   Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
Remote Address   138.89.24.246
Referrer    
Date/Time   29-Oct-03 10:58 AM
 
Stack Trace (click to expand)  
 
0
 
LVL 4

Expert Comment

by:procept
ID: 9643043
I just copied the code from my post with copy and paste back to my machine and executed it ... works without an error.

Does the error say something else. like "ELement XYZ is undefined" or "ColdFusion was looking at the following text"???
0
 

Author Comment

by:jeffmace
ID: 9643121
The only thing i changed so far was the path to the XML file.

here is a image of the error:
http://www.jhasim.net/xml/error.jpg


Here is the code that i am using.  I also commented out the code to insert into database right now.


<!--- read the file and parse the xml; could be HTTP request or something else instead of file read --->
<cffile action="read" file="E:\Inetpub\ASiM\web\xml\xmltest1.xml" variable="xmlFile">
<cfset myXML = xmlParse(xmlFile)>

<!--- find all articles ad store in array --->
<cfset aryArticles = xmlSearch(myXML, "//article")>

<!--- loop over all articles --->
<cfloop from="1" to="#arrayLen(aryArticles)#" index="idx">

   <!--- get title and byline as text --->
   <cfset title = aryArticles[idx].head.articletitle.xmlText>
   <cfset byline = aryArticles[idx].head.byline.xmlText>
   
   <!--- get abstract as text --->
   <cfset abstract = toString(aryArticles[idx].head.abstract)>

   <!--- remove XML definition and abstract tag --->
   <cfset abstract = reReplaceNoCase(abstract, "<\?xml[^>]+>", "")>
   <cfset abstract = reReplaceNoCase(abstract, "</?abstract>", "", "ALL")>  

   <!--- replace paragraph tags with <p></p> --->
   <cfset abstract = reReplaceNoCase(abstract, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- get body as XML element --->
   <cfset body = aryArticles[idx].body>

   <!--- get references as XML element --->
   <cfset references = toString(aryArticles[idx].body.references)>

   <!---  go though all sub elements of body --->
   <cfloop from="#ArrayLen(body.xmlChildren)#" to="1" step="-1" index="childPos">

      <!---  if subelement is not bodytext, heading, or subheading: delete element --->
      <cfif NOT ListFIndNoCase("bodytext,heading,subheading", body.xmlChildren[childPos].xmlName)>
         <cfset ArrayDeleteAt(body.xmlChildren, childPos)>
      </cfif>

   </cfloop>
   
   <!--- transform body element to text --->
   <cfset bodyText = toString(body)>

   <!--- remove XML definition and body tag --->
   <cfset bodytext = reReplaceNoCase(bodyText, "<\?xml[^>]+>", "")>
   <cfset bodyText = reReplaceNoCase(bodyText, "</?body[^>]*>", "", "ALL")>  
   
   <!---
      Replaces:
      <paragraph> with <p>
      <heading> with <h2>
      <subheading> with <h3>
      <superior> with <sup>
    --->
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?paragraph>", "<\1p>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?heading>", "<\1h2>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?subheading>", "<\1h3>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?superior>", "<\1sup>", "ALL")>

   <!--- replace <paragraph> with <p> in references --->
   <cfset references = reReplaceNoCase(references, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- insert into database --->  
<!---    <CFQUERY name="insertStrings" datasource="#request.dsn#">
          INSERT INTO myTable (title, byline, abstract, body, references)
          VALUES ('#title#', '#byline#', '#abstract#', '#body#', '#references#')
          </CFQUERY> --->
     
</cfloop>
0
 

Author Comment

by:jeffmace
ID: 9643180
Do you know of any issues with toString and CFMX, i tried something a while ago with toString and it returned an error and i never resolved it.  
0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 

Author Comment

by:jeffmace
ID: 9643267
i even tried commenting out all the parts dealing with abstract and then when i ran it, i got the next error that had the converting to string in it.


this line below errored when commenting out the abstract section
<cfset references = toString(aryArticles[idx].body.references)>
0
 

Author Comment

by:jeffmace
ID: 9643678
Ok, I got it to work, there seems to be an issue with toString.  I had to turn off CF Security and it worked just fine.

Ok, here is my question for you and maybe you can show me what i did wrong.  For right now I just wanted to output the text to see what happened.  Everything came out great except for the Body text.  It didn't come out at all.

Also, do you know of how to prepare the html to allow for special characters such as quotes.  I know in a XML document you can have the (encoding="UTF-8") which allows for quotes,etc..

Here is a live link to my output.
http://www.jhasim.net/xml/output6.cfm

My CODE:

<!--- read the file and parse the xml; could be HTTP request or something else instead of file read --->
<cffile action="read" file="E:\InetPub\ASiM\web\xml\xmltest1.xml" variable="xmlFile">
<cfset myXML = xmlParse(xmlFile)>

<!--- find all articles ad store in array --->
<cfset aryArticles = xmlSearch(myXML, "//article")>

<!--- loop over all articles --->
<cfloop from="1" to="#arrayLen(aryArticles)#" index="idx">

   <!--- get title and byline as text --->
   <cfset title = aryArticles[idx].head.articletitle.xmlText>
   <cfset byline = aryArticles[idx].head.byline.xmlText>
   
   <!--- get abstract as text --->
   <cfset abstract = toString(aryArticles[idx].head.abstract)>

   <!--- remove XML definition and abstract tag --->
   <cfset abstract = reReplaceNoCase(abstract, "<\?xml[^>]+>", "")>
   <cfset abstract = reReplaceNoCase(abstract, "</?abstract>", "", "ALL")>  

   <!--- replace paragraph tags with <p></p> --->
   <cfset abstract = reReplaceNoCase(abstract, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- get body as XML element --->
   <cfset body = aryArticles[idx].body>

   <!--- get references as XML element --->
   <cfset references = toString(aryArticles[idx].body.references)>

   <!---  go though all sub elements of body --->
   <cfloop from="#ArrayLen(body.xmlChildren)#" to="1" step="-1" index="childPos">

      <!---  if subelement is not bodytext, heading, or subheading: delete element --->
      <cfif NOT ListFIndNoCase("bodytext,heading,subheading", body.xmlChildren[childPos].xmlName)>
         <cfset ArrayDeleteAt(body.xmlChildren, childPos)>
      </cfif>

   </cfloop>
   
   <!--- transform body element to text --->
   <cfset bodyText = toString(body)>

   <!--- remove XML definition and body tag --->
   <cfset bodytext = reReplaceNoCase(bodyText, "<\?xml[^>]+>", "")>
   <cfset bodyText = reReplaceNoCase(bodyText, "</?body[^>]*>", "", "ALL")>  
   
   <!---
      Replaces:
      <paragraph> with <p>
      <heading> with <h2>
      <subheading> with <h3>
      <superior> with <sup>
    --->
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?paragraph>", "<\1p>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?heading>", "<\1h2>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?subheading>", "<\1h3>", "ALL")>
   <cfset bodytext = reReplaceNoCase(bodytext, "<(/)?superior>", "<\1sup>", "ALL")>

   <!--- replace <paragraph> with <p> in references --->
   <cfset references = reReplaceNoCase(references, "<(/)?paragraph>", "<\1p>", "ALL")>
   
   <!--- insert into database --->  
<!---    <CFQUERY name="insertStrings" datasource="#request.dsn#">
          INSERT INTO myTable (title, byline, abstract, body, references)
          VALUES ('#title#', '#byline#', '#abstract#', '#body#', '#references#')
          </CFQUERY> --->
             
<CFOUTPUT>
#title#<br>
#byline#<br>
#abstract#<br>
#body#<br>
#references#<br>
</CFOUTPUT>
     
</cfloop>

0
 

Author Comment

by:jeffmace
ID: 9643965
Well, i changed body to bodytext and that is outputting all the content just fine.  So my last problem is the crazy character encoding, so if you can help me fix that, that would be wonderful.

http://www.jhasim.net/xml/output6.cfm

0
 
LVL 4

Expert Comment

by:procept
ID: 9644002
Hi,

that issue with security is weird... I'm gonna do some tests when I find the time!

Missing body-output: you make the same mistake as me. ;-)) You are displaying the var body instead of bodyText. (So did I in the query, change that when you need to do a real query!!)

UTF-8: you might try putting a <cfprocessingdirective> at the top of the page:

<cfprocessingdirective suppresswhitespace="Yes" pageencoding="UTF-8">

Or you might use a meta tag (can be done with <cfheader>, too):
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

What database are you going to use?? Should be unicode-enabled!

HTH,

Chris

0
 

Author Comment

by:jeffmace
ID: 9644189
Well i am using SQL 2000.   I inserted your code into the database and the same output came out.

http://www.jhasim.net/xml/test.cfm

It seems to be inserting into the database with those characters.

I tried the different encodings, but its still coming out the same.
0
 
LVL 4

Expert Comment

by:procept
ID: 9644261
With SQL2000 you need to use nvarchar fields instead of varchar, and ntext instead of text.

With nvarchar-fields you need to keep in mind that UTF-8 is a 2-byte encoding. That means your fields need to be twice as long as the maximum character length. If you want to store a max of 50 chars in the field, it has to be nvarchar(100).

Also, with inserting or updating, you need to tell SQL Server that you are inserting nvarchar by adding a capital N to the value (no kidding!!):
INSERT INTO myTable (myNvarcharField)
VALUES (N'#myCFVariable#')



0
 

Author Comment

by:jeffmace
ID: 9644501
I am sorry about this, but still does not work

here is a shot of my SQL Server.

http://www.jhasim.net/xml/sql.jpg

No matter how I am seeing the data, it seems to be grabbing it and sending it out as  this type of stuff:  â&#128;&#156;somatoform disordersâ&#128;&#157.  

0
 
LVL 4

Accepted Solution

by:
procept earned 500 total points
ID: 9644695
I just did some test and it turned out that you need to add the charset attribute to the cffile tag:

<cffile action="read" file="E:\InetPub\ASiM\web\xml\xmltest1.xml" variable="xmlFile" charset="utf-8">

Also, I forgot to remove the xml definition and the references tag from the references string. You need to add those lines:

   <cfset references = reReplaceNoCase(references, "<\?xml[^>]+>", "")>
   <cfset references = reReplaceNoCase(references, "</?references[^>]*>", "", "ALL")>  

Hope it works now. ;-))

Chris

0
 

Author Comment

by:jeffmace
ID: 9644741
Oh my God, you are the best.  I wish I could give you 1,000,000 points.   This works great and I should be able to do what I need with it.

Great JOB!!!!
0
 
LVL 4

Expert Comment

by:procept
ID: 9644791
you're welcome... let me know if you need more help. :-))

Chris
0
 

Author Comment

by:jeffmace
ID: 9645777
One more question regarding the formatting of the headings.  If i wanted to change the h2 and h3 to a font size, how would that be done if it's possilbe.
0
 
LVL 4

Expert Comment

by:procept
ID: 9648300
You need to change these lines:

<cfset bodytext = reReplaceNoCase(bodytext, "<(/)?heading>", "<\1h2>", "ALL")>
<cfset bodytext = reReplaceNoCase(bodytext, "<(/)?subheading>", "<\1h3>", "ALL")>
   
But, as <font> requires attributes, you have to so it slightly different:

<cfset bodytext = replaceNoCase(bodytext, "<heading>", "<font size="4">", "ALL")>
<cfset bodytext = replaceNoCase(bodytext, "<subheading>", "<font size="3">", "ALL")>
<cfset bodytext = reReplaceNoCase(bodytext, "</(sub)?heading>", "</font>", "ALL")>

That should do, haven't tested it though!

Chris

0

Featured Post

Get up to 2TB FREE CLOUD per backup license!

An exclusive Black Friday offer just for Expert Exchange audience! Buy any of our top-rated backup solutions & get up to 2TB free cloud per system! Perform local & cloud backup in the same step, and restore instantly—anytime, anywhere. Grab this deal now before it disappears!

Join & Write a Comment

Introduction Knockoutjs (Knockout) is a JavaScript framework (Model View ViewModel or MVVM framework).   The main ideology behind Knockout is to control from JavaScript how a page looks whilst creating an engaging user experience in the least …
SASS allows you to treat your CSS code in a more OOP way. Let's have a look on how you can structure your code in order for it to be easily maintained and reused.
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…
The viewer will the learn the benefit of plain text editors and code an HTML5 based template for use in further tutorials.

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now