Solved

Editing XML file

Posted on 2004-03-23
Last Modified: 2013-11-19
Hi,

I have an XML file whose second tag is:

<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st November 2003//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_031101.dtd">

i.e. the tag appears between <?xml version="1.0"?>  and  the root tag.

I would like to delete this tag with ASP. Any suggestions?

Question by:anorgeorge

Expert Comment

by:deighc
ID: 10656348
This is not a direct answer to your question, but why do you want to do this?

This line is a DTD declaration. It's not part of the document hierarchy so you can't remove it by loading the XML into a DOM and programmatically removing it. But you can instruct a parser to simply ignore it.

So I don't understand why you want to remove it.
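
If stripping the declaration from the file really were necessary, one option would be to treat the file as plain text rather than XML. What follows is only a minimal sketch, assuming the file is small enough to read into memory, that the DOCTYPE has no internal subset (no [ ... ] block), and that "query.xml" is a placeholder file name:

```vbscript
<%
' Sketch only: remove a simple DOCTYPE declaration from a file as text.
' Assumptions: the DOCTYPE has no internal subset, and the file can be
' read and written back in the system's default (ANSI) text mode.
Const ForReading = 1
Const ForWriting = 2

Dim fso, ts, strPath, strXml, re
strPath = Server.MapPath("query.xml")   ' placeholder file name

' Read the whole file into a string
Set fso = Server.CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile(strPath, ForReading)
strXml = ts.ReadAll
ts.Close

' Remove everything from "<!DOCTYPE" up to the first ">"
Set re = New RegExp
re.Pattern = "<!DOCTYPE[^>]*>"
re.IgnoreCase = False
re.Global = False            ' a document has at most one DOCTYPE
strXml = re.Replace(strXml, "")

' Write the result back over the original file
Set ts = fso.OpenTextFile(strPath, ForWriting)
ts.Write strXml
ts.Close

Set ts = Nothing
Set fso = Nothing
%>
```

The pattern stops at the first ">", so it would not be safe for a DOCTYPE that carries an internal subset.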

Author Comment

by:anorgeorge
ID: 10656377
I still haven't figured out why, but for some reason my ASP program won't work when it's there. The program works perfectly when it's removed! The contents of this XML file are periodically downloaded from the Internet, so each time I have to manually remove the tag. I was hoping there was a way to do this programmatically.

Expert Comment

by:deighc
ID: 10656461
If you're using MSXML you can ignore the DTD by setting the resolveExternals property to false. Be sure to set this before calling the load (or loadXML) method.

i.e. (say your DOM is xmlObj)

xmlObj.async = false
xmlObj.resolveExternals = false
xmlObj.load "<your URL or file path here>"

Author Comment

by:anorgeorge
ID: 10656514
I tried what you suggested, but it still won't work. This is the error message I get:

Microsoft VBScript runtime error '800a01a8'

Object required: '[object]'

/xml/datafeed.asp, line 15

However, once I remove the DTD, it works!

Expert Comment

by:deighc
ID: 10656595
Paste your code here so I can have a look. And make a note of the actual line (line 15) that's causing the error.

Author Comment

by:anorgeorge
ID: 10656644
This is the code:

<!--#include virtual="/adovbs.inc"-->

<HTML><BODY>

<%



 Set objXML = Server.CreateObject("Microsoft.XMLDOM")
 objXML.async = False
 objXML.resolveExternals=False
 objXML.Load (Server.MapPath("query.xml"))


 Set objMeshHeadingList=objXML.documentElement.getElementsByTagName("MeshHeadingList")

 noOfArticles=objMeshHeadingList.length


' for each article with MeSH terms, retrieve the PMID and MeSH terms

 Dim objConn
 Set objConn= Server.CreateObject("ADODB.Connection")
 objConn.ConnectionString= "Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\Inetpub\wwwroot\xml\pubmed.mdb"
 objConn.Open  

 for j=0 to (noOfArticles-1)
   
    Set objParent=objMeshHeadingList.item(j).parentNode
    strPMID=objParent.firstchild.text  

    Set objMeshGroup=objMeshHeadingList.item(j)
 
   

   objMeshGroupNumber=objMeshGroup.childNodes.length

   for k=0 to (objMeshGroupNumber-1)

     MeshTermNumber= objMeshGroup.childNodes(k).childNodes.length

   
      for i=0 to (MeshTermNumber-1)

        strMajor=objMeshGroup.childNodes(k).childNodes(i).getAttribute("MajorTopicYN")

       
        if (strMajor="Y") then
 
                strMeshTerm= objMeshGroup.childNodes(k).firstChild.text
            strMeshTerm= Replace(strMeshTerm, "'", "''")
   


                'Entering the retrieved values into a table  
   
   
                mySQL="INSERT INTO articles(PMID,MeshTerms) VALUES (" & strPMID & ",'" & strMeshTerm & "')"
                objConn.Execute(mySQL)
               
                exit for

       end if
   
      Next  
   
   Next

   
Next

objConn.Close
Set objConn=nothing
   

response.write("<br><br><br> Yes, Articles and MeSH terms successfully stored")


%>
</BODY></HTML>




This is the XML file:

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st November 2003//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_031101.dtd">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="Completed">
<PMID>12428570</PMID>
<DateCreated>
<Year>2002</Year>
<Month>11</Month>
<Day>13</Day>
</DateCreated>
<DateCompleted>
<Year>2003</Year>
<Month>02</Month>
<Day>07</Day>
</DateCompleted>
<DateRevised>
<Year>2003</Year>
<Month>11</Month>
<Day>14</Day>
</DateRevised>
<Article>
<Journal>
<ISSN>0043-5147</ISSN>
<JournalIssue PrintYN="Y">
<Volume>55</Volume>
<Issue>7-8</Issue>
<PubDate>
<Year>2002</Year>
</PubDate>
</JournalIssue>
</Journal>
<ArticleTitle>[Clinical symptoms and signs in Kimmerle anomaly]</ArticleTitle>
<Pagination>
<MedlinePgn>416-22</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText>The aim of the study was to consider Kimmerle anomaly (ponticulus posterior of the atlas) as an anatomic variant, which can cause a set of clinical symptoms and signs. A hundred and eight patients, 58 females and 50 males at the age of 18-59 years (M. 36.9 years, SD = 9.6) with radiologically verified Kimmerle anomaly were examined. A control group comprised 40 healthy subjects at the similar age range. The diagnosis of headaches was based on the criteria proposed by the IHS. A character of headaches, their localization, frequency, duration, number of days with headaches per year, circumstances associated with their onset and concomitant symptoms were evaluated. All the patients were subjected to electrophysiological studies (ENG, EEG and VEP). The results were statistically analyzed using a SPSS/PC+ computer system. It was revealed that clinical symptoms and signs in Kimmerle anomaly occurred most frequently in the third and fourth decade of life (65% of cases). These were most often tension-type headaches (50% of cases with headaches), vascular headaches (26% of cases) and neuralgia (24% of cases). Intensity of headaches was high. Headaches were accompanied by other complaints like vertigo (59% of cases) and in one third of cases--nausea. About 10% of patients also suffered from vomiting, paresthesia, dizziness, short periods of loss of consciousness. Sporadically--tinitus, drop attack, and vegetative symptoms. In cases without pain the most frequent signs were short periods of loss of consciousness, dizziness, and also nausea and dizziness. The EEG examination revealed pathology in 40% of patients with Kimmerle anomaly. The ENG examination in more than 33% of anomaly cases showed injury in the central part of vestibular system. Improper answers were reported in about 75% of the patients during the VEP examination.</AbstractText>
</Abstract>
<Affiliation>Zak&#322;adu Neurologii i Zaburze&#324; Czynno&#347;ciowych Narzadu Zucia Instytutu Stomatologii Akademii Medycznej w &#321;odzi.</Affiliation>
<AuthorList CompleteYN="Y">
<Author>
<LastName>Split</LastName>
<ForeName>Wojciech</ForeName>
<Initials>W</Initials>
</Author>
<Author>
<LastName>Sawrasewicz-Rybak</LastName>
<ForeName>Ma&#322;gorzata</ForeName>
<Initials>M</Initials>
</Author>
</AuthorList>
<Language>pol</Language>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
</PublicationTypeList>
<VernacularTitle>Zespó&#322; objawów klinicznych w anomalii Kimmerlego.</VernacularTitle>
</Article>
<MedlineJournalInfo>
<Country>Poland</Country>
<MedlineTA>Wiad Lek</MedlineTA>
<NlmUniqueID>9705467</NlmUniqueID>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Adult</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Age Factors</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Atlas</DescriptorName>
<QualifierName MajorTopicYN="Y">abnormalities</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Case-Control Studies</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">English Abstract</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Female</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Headache</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Hearing Disorders</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Human</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Joint Diseases</DescriptorName>
<QualifierName MajorTopicYN="N">complications</QualifierName>
<QualifierName MajorTopicYN="N">pathology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Male</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Middle Aged</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Nerve Compression Syndromes</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Neuralgia</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Severity of Illness Index</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Support, Non-U.S. Gov't</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Tinnitus</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Vertebrobasilar Insufficiency</DescriptorName>
<QualifierName MajorTopicYN="Y">complications</QualifierName>
<QualifierName MajorTopicYN="Y">diagnosis</QualifierName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Vertigo</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
      <History>
            <PubMedPubDate PubStatus="pubmed">
                  <Year>2002</Year>
                  <Month>11</Month>
                  <Day>14</Day>
                  <Hour>4</Hour>
                  <Minute>0</Minute>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="medline">
                  <Year>2003</Year>
                  <Month>2</Month>
                  <Day>8</Day>
                  <Hour>4</Hour>
                  <Minute>0</Minute>
            </PubMedPubDate>
      </History>
      <PublicationStatus>ppublish</PublicationStatus>
      <ArticleIdList>
            <ArticleId IdType="pubmed">12428570</ArticleId>
            <ArticleId IdType="medline">22316280</ArticleId>
      </ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>

Expert Comment

by:deighc
ID: 10656674
...and which line is giving you the error?

Author Comment

by:anorgeorge
ID: 10656733
This is the line giving the error:
Set objMeshHeadingList=objXML.documentElement.getElementsByTagName("MeshHeadingList")

The lines preceding it are:

<!--#include virtual="/adovbs.inc"-->

<HTML><BODY>

<%



 Set objXML = Server.CreateObject("Microsoft.XMLDOM")
 objXML.async = False
 objXML.resolveExternals=False
 objXML.Load (Server.MapPath("query.xml"))



Accepted Solution

by:
deighc earned 55 total points
ID: 10656814
OK, I wonder if the problem is related to the DTD declaration at all. I copied your XML and your ASP page and tested it. I found that the XML doc wouldn't load because it had a parse error. If you look at the XML there are some un-encoded characters in this node:

<VernacularTitle>Zespó&#322; objawów klinicznych w anomalii Kimmerlego.</VernacularTitle>

("ó" is an illegal character in XML)

Once I removed these, the doc loaded OK.

The error you're getting suggests that the document hasn't loaded properly because you're trying to create a node list from the documentElement object (which is the correct way to do it). But this object doesn't exist if the document isn't loaded.

So, a very easy way to test my theory is to use the true/false return value of the load method (in fact you should ALWAYS use this):

<%
Set objXML = Server.CreateObject("Microsoft.XMLDOM")
objXML.async = False
objXML.resolveExternals=False
if objXML.Load (Server.MapPath("query.xml")) then
 ' all your other ASP code here
else
 ' Document didn't load. Perhaps display a message or something
end if
Set objXML = Nothing
%>

Assisted Solution

by:Anthony Perkins
Anthony Perkins earned 70 total points
ID: 10664207
deighc,

I am going to have to differ with you on this one.  On several grounds:

1. >>If you're using MSXML you can ignore the DTD by setting the resolveExternals property to false.<<
You also need to set validateOnParse = False.  If all you set is resolveExternals and it cannot locate the DTD you get:
The element 'PubmedArticleSet' is used but not declared in the DTD/Schema.
I duplicated this by changing the DTD URL

2. >>("ó" is an illegal character in XML<<
Actually "ó" is very legal (otherwise Spanish-speaking countries would be up in arms!).  Why it did not work for you, I am not sure, but it may have had to do with the encoding.

Absolutely agree with you that the Load method needs to be tested.

So the code should look something like this (plagiarizing your version):
<%
Set objXML = Server.CreateObject("Microsoft.XMLDOM")
objXML.async = False
objXML.resolveExternals = False
objXML.validateOnParse = False
if objXML.Load (Server.MapPath("query.xml")) then
 ' all your other ASP code here
else
 ' Document didn't load. Perhaps display a message or something
   Response.Write "Error Code: " & objXML.parseError.errorCode & "<br/>"
   Response.Write "Reason: " & objXML.parseError.reason & "<br/>"
   Response.Write "Line: " & objXML.parseError.line & "<br/>"
   Response.Write "Position: " & objXML.parseError.linepos & "<br/>"
   Response.Write "Source: " & objXML.parseError.srcText & "<br/>"
end if
Set objXML = Nothing
%>

By the way, I got the expected result of 1 for noOfArticles.  But I suspect the answer that is desired is 19 in which case change:
Set objMeshHeadingList=objXML.documentElement.getElementsByTagName("MeshHeadingList")

To:
Set objMeshHeadingList=objXML.documentElement.getElementsByTagName("MeshHeading")

Expert Comment

by:Anthony Perkins
ID: 10664216
>>On several grounds:<<
Actually, that was a gross exaggeration.  Only one and a half :)

Expert Comment

by:deighc
ID: 10665459
@acperkins,

Thanks for the extra info.

I must say I'm kind-of confused by the "ó" issue. I was always under the impression that so-called 'international' characters had to be encoded. I work in Germany so often deal with "ü", "Ä" etc and encode everything as a matter of course. I know that having these characters in an XSL document is a definite no-no so I assumed that that also applied to other flavours of XML.

I copied @anorgeorge's XML doc and ASP code and everything worked fine once I removed the "ó" characters. I left validateOnParse as the default value (true) and set resolveExternals to false.

And...

One thing I would add to this:

> Set objMeshHeadingList=objXML.documentElement.getElementsByTagName("MeshHeading")

is that, yes I agree - it looked likely that @anorgeorge really wanted a node list of <MeshHeading> nodes.

My preferred method for selecting nodes is to use an XPath query. I suspect that this provides better performance because you can specify the full path to the nodes and prevent a search thru the entire XML document. So I would use:

Set objMeshHeadingList=objXML.documentElement.selectNodes("/PubmedArticleSet/PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading")

Using getElementsByTagName("MeshHeading") would be the XPath equivalent of selectNodes("//MeshHeading") which, particularly for larger documents, would slow things down by forcing a search thru the entire document.
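
To make the comparison concrete, here is a rough sketch of both selection styles side by side. The inline XML is a made-up, cut-down stand-in for the real file (only two headings), and the variable names byTag and byPath are purely illustrative:

```vbscript
<%
' Sketch only: select <MeshHeading> nodes by tag name and by full path.
' The XML string below is a shortened, made-up sample, not the real feed.
Dim objXML, strXml, byTag, byPath
strXml = "<PubmedArticleSet><PubmedArticle><MedlineCitation>" & _
         "<MeshHeadingList>" & _
         "<MeshHeading><DescriptorName MajorTopicYN=""N"">Adult</DescriptorName></MeshHeading>" & _
         "<MeshHeading><DescriptorName MajorTopicYN=""N"">Atlas</DescriptorName></MeshHeading>" & _
         "</MeshHeadingList>" & _
         "</MedlineCitation></PubmedArticle></PubmedArticleSet>"

Set objXML = Server.CreateObject("Microsoft.XMLDOM")
objXML.async = False

If objXML.loadXML(strXml) Then
    ' Searches every descendant of the root element
    Set byTag = objXML.documentElement.getElementsByTagName("MeshHeading")
    ' Follows one explicit path from the root
    Set byPath = objXML.selectNodes("/PubmedArticleSet/PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading")
    Response.Write "getElementsByTagName: " & byTag.length & "<br/>"
    Response.Write "selectNodes: " & byPath.length & "<br/>"
Else
    Response.Write "Load failed: " & objXML.parseError.reason
End If
Set objXML = Nothing
%>
```

For a sample like this the two counts should agree; they would differ only if a <MeshHeading> element turned up somewhere outside the explicit path.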

Anyway, it seems that @anorgeorge has found a satisfactory answer to his/her problem. So three cheers for us...

Author Comment

by:anorgeorge
ID: 10667084
Thanks to both acperkins and deighc for all their help. I did finally get it to work. This is the line that did the trick:
objXML.validateOnParse = False

On a minor note, the xml file I gave you is not complete- just a portion of it. I did indeed want to access MeshHeadingLists. But thanks for the info on XPath.

Expert Comment

by:Anthony Perkins
ID: 10667962
deighc,

I don't know what to say, other than I tested it :)  Not only in an XML editor we use here, but also in code using MSXML v4.0.

However, I just tried loading it in IE and I get:

An invalid character was found in text content. Error processing resource 'file:///C:/Temp/Temp.xml'. Line 56, Position 22
<VernacularTitle>Zesp

So it looks like IE is using a different encoding or (far fetched) an older version of MSXML.  Either way I am somewhat confused, as I am not sure why it is doing that.  If I get a chance I will look into it more.
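
One possible explanation, offered as a guess rather than a diagnosis: a file with no encoding declaration is read as UTF-8, so a raw "ó" saved as a single byte in a Latin-1 style code page would be an invalid byte sequence. A minimal sketch of one workaround, assuming the file really is ISO-8859-1 (which nothing in this thread confirms), decodes the bytes explicitly and hands the parser a string instead of a file:

```vbscript
<%
' Sketch only. Assumption: query.xml is ISO-8859-1 text with no
' encoding declaration. Adjust the Charset if the real encoding differs.
Dim stm, strXml, objXML
Set stm = Server.CreateObject("ADODB.Stream")
stm.Type = 2                   ' 2 = adTypeText
stm.Charset = "iso-8859-1"     ' assumed source encoding
stm.Open
stm.LoadFromFile Server.MapPath("query.xml")
strXml = stm.ReadText          ' whole file as a Unicode string
stm.Close
Set stm = Nothing

Set objXML = Server.CreateObject("Microsoft.XMLDOM")
objXML.async = False
objXML.resolveExternals = False
objXML.validateOnParse = False
If Not objXML.loadXML(strXml) Then
    Response.Write "Reason: " & objXML.parseError.reason
End If
Set objXML = Nothing
%>
```

Adding an explicit encoding declaration to the file itself would be the other obvious route, if whoever produces the file can be persuaded to do so.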

As to the validateOnParse I found that the only way I could load the XML document when I did not have access to the DTD (I purposely butchered the URL) was to set this property.  On re-reading the definition for each property it made sense to me. But to be quite honest, I am less certain than I was yesterday :)

>>My preferred method for selecting nodes is to use a XPath query. <<
Could not agree with you more.

>>Using getElementsByTagName("MeshHeading") would be the XPath equivalent of selectNodes("//MeshHeading") which, particularly for larger documents, would slow things down by forcing a search thru the entire document.<<
The only thing I would add to that is worse than bad performance is the number of subtle bugs introduced using global paths such as ("//MeshHeading").  Definitely use:

/PubmedArticleSet/PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading

Anthony

Expert Comment

by:deighc
ID: 10668084
> So it looks like IE is using a different encoding or (far fetched) an older version of MSXML

That's a good point. I've always been curious to know which parser version IE uses by default if more than one version is installed. That could explain things.

And you're right about the effects of the validateOnParse property. I originally thought that this meant, when true, "validate the XML for well formed-ness after parsing". And of course this is wrong. This is the behaviour when validateOnParse is set to FALSE. When true the behaviour is "validate the XML for well formed-ness AND validate the document against any DTD's". So in a nutshell this would solve @anorgeorge's problem (as we already know because (s)he told us so...)

> I am less certain than I was yesterday.

I know the feeling!!

As always Anthony I appreciate your input. Thanks.

Expert Comment

by:Anthony Perkins
ID: 10669621
>>This is the behaviour when validateOnParse is set to FALSE. When true the behaviour is "validate the XML for well formed-ness AND validate the document against any DTD's"<<
And if I can quibble with you again. I don't believe validateOnParse has any effect on the "well-formed-ness" only on the validity (which as you know has to do with DTD's and XML Schemas). You can detect that an XML document is considered well formed when you load it.

>>As always Anthony I appreciate your input.<<
Just let me know when to quit, I can be quite pedantic !

Expert Comment

by:deighc
ID: 10669745
>> Just let me know when to quit, I can be quite pedantic

Quibble away - this is good stuff to know.

The documentation for MSXML 4 has this to say about the validateOnParse property:

"The property is read/write. If True, it validates during parsing. If False, it parses only for well-formed XML. The default is True."

So actually, in reality, you have no control over whether you want to allow the document to be well-formed or not. It MUST be. validateOnParse cannot override this behaviour.

What you're getting at is that validateOnParse is only of practical use in determining whether or not a document is validated against a DTD or schema. And this of course is absolutely correct.
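
A quick way to see the distinction for yourself would be something like the sketch below. The helper name TryLoad and the two tiny documents are made up for illustration; one is well-formed but breaks its inline DTD, the other is not well-formed at all:

```vbscript
<%
' Sketch only: well-formedness is always enforced; validateOnParse only
' switches DTD validation on or off. TryLoad is a hypothetical helper.
Function TryLoad(strXml, blnValidate)
    Dim doc
    Set doc = Server.CreateObject("Microsoft.XMLDOM")
    doc.async = False
    doc.resolveExternals = False
    doc.validateOnParse = blnValidate
    TryLoad = doc.loadXML(strXml)   ' True if the document loaded
    Set doc = Nothing
End Function

Dim strInvalid, strMalformed
' Well-formed, but <b> is not allowed by the inline DTD
strInvalid = "<!DOCTYPE root [<!ELEMENT root (a)><!ELEMENT a EMPTY>]><root><b/></root>"
' Not well-formed: <b> is never closed
strMalformed = "<root><b></root>"

Response.Write "invalid, validation on: " & TryLoad(strInvalid, True) & "<br/>"
Response.Write "invalid, validation off: " & TryLoad(strInvalid, False) & "<br/>"
Response.Write "malformed, validation off: " & TryLoad(strMalformed, False) & "<br/>"
%>
```

If the documentation quoted above is accurate, only the middle case should load: the invalid document gets through once validation is off, while the malformed one is rejected regardless.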

I got a bit confused by this and have duly wasted both your time and mine!!

And of course you're more than welcome to tell me to shutup too...

Expert Comment

by:Anthony Perkins
ID: 10670520
>>So actually, in reality, you have no control over whether you want to allow the document to be well-formed or not. It MUST be. <<
Exactly.

>>What you're getting at is that validateOnParse is only of practical use in determining whether or not a document is validated against a DTD or schema.<<
True, again.

Damn I hate it when that happens, now I have nothing to argue about :)

But I guess it is time to shut this thread down, as I think we have hijacked this question enough.