anorgeorge

Editing XML file

Hi,

I have an XML file whose second tag is:

<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st November 2003//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_031101.dtd">

i.e. the tag appears between <?xml version="1.0"?> and the root tag.

I would like to delete this tag with ASP. Any suggestions?

deighc

This is not a direct answer to your question, but why do you want to do this?

This line is a DTD declaration. It's not part of the document hierarchy so you can't remove it by loading the XML into a DOM and programmatically removing it. But you can instruct a parser to simply ignore it.

So I don't understand why you want to remove it.
anorgeorge

ASKER

I still haven't figured out why, but for some reason my ASP program won't work when it's there. The program works perfectly when it's removed! The contents of this XML file are periodically downloaded from the Internet, so each time I have to manually remove the tag. I was hoping there was a way to do this programmatically.
If you're using MSXML you can ignore the DTD by setting the resolveExternals property to false. Be sure to set this before calling the load (or loadXML) method.

i.e. (say your DOM is xmlObj)

xmlObj.async = false
xmlObj.resolveExternals = false
xmlObj.load "<your URL or file path here>"
I tried what you suggested, but it still won't work. This is the error message I get:

Microsoft VBScript runtime error '800a01a8'

Object required: '[object]'

/xml/datafeed.asp, line 15

However, once I remove the DTD, it works!
Paste your code here so I can have a look. And make a note of the actual line (line 15) that's causing the error.
This is the code:

<!--#include virtual="/adovbs.inc"-->

<HTML><BODY>

<%



 Set objXML = Server.CreateObject("Microsoft.XMLDOM")
 objXML.async = False
 objXML.resolveExternals=False
 objXML.Load (Server.MapPath("query.xml"))


 Set objMeshHeadingList=objXML.documentElement.getElementsByTagName("MeshHeadingList")

 noOfArticles=objMeshHeadingList.length


' for each article with MeSH terms, retrieve the PMID and MeSH terms

 Dim objConn
 Set objConn= Server.CreateObject("ADODB.Connection")
 objConn.ConnectionString= "Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\Inetpub\wwwroot\xml\pubmed.mdb"
 objConn.Open  

 for j=0 to (noOfArticles-1)
   
    Set objParent=objMeshHeadingList.item(j).parentNode
    strPMID=objParent.firstchild.text  

    Set objMeshGroup=objMeshHeadingList.item(j)
 
   

   objMeshGroupNumber=objMeshGroup.childNodes.length

   for k=0 to (objMeshGroupNumber-1)

     MeshTermNumber= objMeshGroup.childNodes(k).childNodes.length

   
      for i=0 to (MeshTermNumber-1)

        strMajor=objMeshGroup.childNodes(k).childNodes(i).getAttribute("MajorTopicYN")

       
        if (strMajor="Y") then
 
                strMeshTerm= objMeshGroup.childNodes(k).firstChild.text
            strMeshTerm= Replace(strMeshTerm, "'", "''")
   


                'Entering the retrieved values into a table  
   
   
                mySQL="INSERT INTO articles(PMID,MeshTerms) VALUES (" & strPMID & ",'" & strMeshTerm & "')"
                objConn.Execute(mySQL)
               
                exit for

       end if
   
      Next  
   
   Next

   
Next

objConn.Close
Set objConn=nothing
   

response.write("<br><br><br> Yes, Articles and MeSH terms successfully stored")


%>
</BODY></HTML>




This is the XML file:

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st November 2003//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_031101.dtd">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="Completed">
<PMID>12428570</PMID>
<DateCreated>
<Year>2002</Year>
<Month>11</Month>
<Day>13</Day>
</DateCreated>
<DateCompleted>
<Year>2003</Year>
<Month>02</Month>
<Day>07</Day>
</DateCompleted>
<DateRevised>
<Year>2003</Year>
<Month>11</Month>
<Day>14</Day>
</DateRevised>
<Article>
<Journal>
<ISSN>0043-5147</ISSN>
<JournalIssue PrintYN="Y">
<Volume>55</Volume>
<Issue>7-8</Issue>
<PubDate>
<Year>2002</Year>
</PubDate>
</JournalIssue>
</Journal>
<ArticleTitle>[Clinical symptoms and signs in Kimmerle anomaly]</ArticleTitle>
<Pagination>
<MedlinePgn>416-22</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText>The aim of the study was to consider Kimmerle anomaly (ponticulus posterior of the atlas) as an anatomic variant, which can cause a set of clinical symptoms and signs. A hundred and eight patients, 58 females and 50 males at the age of 18-59 years (M. 36.9 years, SD = 9.6) with radiologically verified Kimmerle anomaly were examined. A control group comprised 40 healthy subjects at the similar age range. The diagnosis of headaches was based on the criteria proposed by the IHS. A character of headaches, their localization, frequency, duration, number of days with headaches per year, circumstances associated with their onset and concomitant symptoms were evaluated. All the patients were subjected to electrophysiological studies (ENG, EEG and VEP). The results were statistically analyzed using a SPSS/PC+ computer system. It was revealed that clinical symptoms and signs in Kimmerle anomaly occurred most frequently in the third and fourth decade of life (65% of cases). These were most often tension-type headaches (50% of cases with headaches), vascular headaches (26% of cases) and neuralgia (24% of cases). Intensity of headaches was high. Headaches were accompanied by other complaints like vertigo (59% of cases) and in one third of cases--nausea. About 10% of patients also suffered from vomiting, paresthesia, dizziness, short periods of loss of consciousness. Sporadically--tinitus, drop attack, and vegetative symptoms. In cases without pain the most frequent signs were short periods of loss of consciousness, dizziness, and also nausea and dizziness. The EEG examination revealed pathology in 40% of patients with Kimmerle anomaly. The ENG examination in more than 33% of anomaly cases showed injury in the central part of vestibular system. Improper answers were reported in about 75% of the patients during the VEP examination.</AbstractText>
</Abstract>
<Affiliation>Zak&#322;adu Neurologii i Zaburze&#324; Czynno&#347;ciowych Narzadu Zucia Instytutu Stomatologii Akademii Medycznej w &#321;odzi.</Affiliation>
<AuthorList CompleteYN="Y">
<Author>
<LastName>Split</LastName>
<ForeName>Wojciech</ForeName>
<Initials>W</Initials>
</Author>
<Author>
<LastName>Sawrasewicz-Rybak</LastName>
<ForeName>Ma&#322;gorzata</ForeName>
<Initials>M</Initials>
</Author>
</AuthorList>
<Language>pol</Language>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
</PublicationTypeList>
<VernacularTitle>Zespó&#322; objawów klinicznych w anomalii Kimmerlego.</VernacularTitle>
</Article>
<MedlineJournalInfo>
<Country>Poland</Country>
<MedlineTA>Wiad Lek</MedlineTA>
<NlmUniqueID>9705467</NlmUniqueID>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Adult</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Age Factors</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Atlas</DescriptorName>
<QualifierName MajorTopicYN="Y">abnormalities</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Case-Control Studies</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">English Abstract</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Female</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Headache</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Hearing Disorders</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Human</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Joint Diseases</DescriptorName>
<QualifierName MajorTopicYN="N">complications</QualifierName>
<QualifierName MajorTopicYN="N">pathology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Male</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Middle Aged</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Nerve Compression Syndromes</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Neuralgia</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Severity of Illness Index</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Support, Non-U.S. Gov't</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Tinnitus</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Vertebrobasilar Insufficiency</DescriptorName>
<QualifierName MajorTopicYN="Y">complications</QualifierName>
<QualifierName MajorTopicYN="Y">diagnosis</QualifierName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Vertigo</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
      <History>
            <PubMedPubDate PubStatus="pubmed">
                  <Year>2002</Year>
                  <Month>11</Month>
                  <Day>14</Day>
                  <Hour>4</Hour>
                  <Minute>0</Minute>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="medline">
                  <Year>2003</Year>
                  <Month>2</Month>
                  <Day>8</Day>
                  <Hour>4</Hour>
                  <Minute>0</Minute>
            </PubMedPubDate>
      </History>
      <PublicationStatus>ppublish</PublicationStatus>
      <ArticleIdList>
            <ArticleId IdType="pubmed">12428570</ArticleId>
            <ArticleId IdType="medline">22316280</ArticleId>
      </ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>
...and which line is giving you the error?
This is the line giving the error:
Set objMeshHeadingList=objXML.documentElement.getElementsByTagName("MeshHeadingList")

The lines preceding it are:

<!--#include virtual="/adovbs.inc"-->

<HTML><BODY>

<%



 Set objXML = Server.CreateObject("Microsoft.XMLDOM")
 objXML.async = False
 objXML.resolveExternals=False
 objXML.Load (Server.MapPath("query.xml"))


ASKER CERTIFIED SOLUTION
deighc

This solution is only available to members.

SOLUTION
Anthony Perkins

This solution is only available to members.
>>On several grounds:<<
Actually, that was a gross exaggeration.  Only one and a half :)
@acperkins,

Thanks for the extra info.

I must say I'm kind-of confused by the "ó" issue. I was always under the impression that so-called 'international' characters had to be encoded. I work in Germany so often deal with "ü", "Ä" etc and encode everything as a matter of course. I know that having these characters in an XSL document is a definite no-no so I assumed that that also applied to other flavours of XML.

I copied @anorgeorge's XML doc and ASP code and everything worked fine once I removed the "ó" characters. I left validateOnParse as the default value (true) and set resolveExternals to false.

And...

One thing I would add to this:

> Set objMeshHeadingList=objXML.documentElement.getElementsByTagName("MeshHeading")

is that, yes I agree - it looked likely that @anorgeorge really wanted a node list of <MeshHeading> nodes.

My preferred method for selecting nodes is to use an XPath query. I suspect that this provides better performance because you can specify the full path to the nodes and prevent a search thru the entire XML document. So I would use:

Set objMeshHeadingList=objXML.documentElement.selectNodes("/PubmedArticleSet/PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading")

Using getElementsByTagName("MeshHeading") would be the XPath equivalent of selectNodes("//MeshHeading") which, particularly for larger documents, would slow things down by forcing a search thru the entire document.
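
A minimal sketch of the full-path approach, for anyone who wants to try it against the sample file above. It assumes MSXML 3.0 is installed (the "Msxml2.DOMDocument.3.0" ProgID is my assumption; the code in this thread uses "Microsoft.XMLDOM"). MSXML 3.0 defaults to the older XSLPattern selection syntax, so the sketch asks for XPath explicitly. The loop and the output are purely illustrative.

```vbscript
<%
Dim objXML, objHeadings, objHeading

' Assumption: MSXML 3.0 is available on the server.
Set objXML = Server.CreateObject("Msxml2.DOMDocument.3.0")
objXML.async = False
objXML.validateOnParse = False
objXML.resolveExternals = False
' MSXML 3.0 defaults to XSLPattern; request XPath explicitly.
objXML.setProperty "SelectionLanguage", "XPath"

If objXML.Load(Server.MapPath("query.xml")) Then
    ' Full path from the root, so only the expected branch is matched.
    Set objHeadings = objXML.selectNodes( _
        "/PubmedArticleSet/PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading")
    For Each objHeading In objHeadings
        ' In the sample file the first child of each MeshHeading is DescriptorName.
        Response.Write Server.HTMLEncode(objHeading.firstChild.text) & "<br>"
    Next
End If
%>
```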

Anyway, it seems that @anorgeorge has found a satisfactory answer to his/her problem. So three cheers for us...
Thanks to both acperkins and deighc for all their help. I did finally get it to work. This is the line that did the trick:
objXML.validateOnParse = False

On a minor note, the xml file I gave you is not complete- just a portion of it. I did indeed want to access MeshHeadingLists. But thanks for the info on XPath.
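
For anyone landing here later, a minimal sketch of where that line fits, assuming the same query.xml as above. Load returns False when parsing fails, and documentElement is then Nothing, which is most likely what produced the "Object required" error on line 15. The error-reporting branch is an illustrative addition, not part of the original page.

```vbscript
<%
Dim objXML
Set objXML = Server.CreateObject("Microsoft.XMLDOM")
objXML.async = False
' Set both properties before calling Load.
objXML.validateOnParse = False   ' check well-formedness only, skip DTD validation
objXML.resolveExternals = False  ' do not fetch the external DTD

If objXML.Load(Server.MapPath("query.xml")) Then
    Response.Write "Root element: " & Server.HTMLEncode(objXML.documentElement.nodeName)
Else
    ' Load failed, so documentElement is Nothing; report the reason instead of using it.
    With objXML.parseError
        Response.Write "Load failed: " & Server.HTMLEncode(.reason) & _
            " (line " & .line & ", position " & .linepos & ")"
    End With
End If
%>
```

Checking the return value of Load before touching documentElement turns the generic VBScript runtime error into the parser's own message.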
deighc,

I don't know what to say, other than I tested it :)  Not only in an XML editor we use here, but also in code using MSXML v4.0.

However, I just tried loading it in IE and I get:

An invalid character was found in text content. Error processing resource 'file:///C:/Temp/Temp.xml'. Line 56, Position 22
<VernacularTitle>Zesp

So it looks like IE is using a different encoding or (far fetched) an older version of MSXML.  Either way I am somewhat confused, as I am not sure why it is doing that.  If I get a chance I will look into it more.

As to the validateOnParse I found that the only way I could load the XML document when I did not have access to the DTD (I purposely butchered the URL) was to set this property.  On re-reading the definition for each property it made sense to me. But to be quite honest, I am less certain than I was yesterday :)

>>My preferred method for selecting nodes is to use an XPath query. <<
Could not agree with you more.

>>Using getElementsByTagName("MeshHeading") would be the XPath equivalent of selectNodes("//MeshHeading") which, particularly for larger documents, would slow things down by forcing a search thru the entire document.<<
The only thing I would add is that, worse than the bad performance, is the number of subtle bugs introduced by using global paths such as ("//MeshHeading").  Definitely use:

/PubmedArticleSet/PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading

Anthony
> So it looks like IE is using a different encoding or (far fetched) an older version of MSXML

That's a good point. I've always been curious to know which parser version IE uses by default if more than one version is installed. That could explain things.

And you're right about the effects of the validateOnParse property. I originally thought that this meant, when true, "validate the XML for well formed-ness after parsing". And of course this is wrong. This is the behaviour when validateOnParse is set to FALSE. When true the behaviour is "validate the XML for well formed-ness AND validate the document against any DTD's". So in a nutshell this would solve @anorgeorge's problem (as we already know because (s)he told us so...)

> I am less certain than I was yesterday.

I know the feeling!!

As always Anthony I appreciate your input. Thanks.
>>This is the behaviour when validateOnParse is set to FALSE. When true the behaviour is "validate the XML for well formed-ness AND validate the document against any DTD's"<<
And if I can quibble with you again. I don't believe validateOnParse has any effect on the "well-formed-ness" only on the validity (which as you know has to do with DTD's and XML Schemas). You can detect that an XML document is considered well formed when you load it.

>>As always Anthony I appreciate your input.<<
Just let me know when to quit, I can be quite pedantic!
>> Just let me know when to quit, I can be quite pedantic

Quibble away - this is good stuff to know.

The documentation for MSXML 4 has this to say about the validateOnParse property:

"The property is read/write. If True, it validates during parsing. If False, it parses only for well-formed XML. The default is True."

So actually, in reality, you have no control over whether you want to allow the document to be well-formed or not. It MUST be. validateOnParse cannot override this behaviour.

What you're getting at is that validateOnParse is only of practical use in determining whether or not a document is validated against a DTD or schema. And this of course is absolutely correct.

I got a bit confused by this and have duly wasted both your time and mine!!

And of course you're more than welcome to tell me to shut up too...
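
To illustrate the point about well-formedness, here is a small sketch using a deliberately malformed string (the string and the variable name are made up for the illustration). Even with validateOnParse set to False, loadXML should return False here, because the well-formedness check cannot be switched off.

```vbscript
<%
Dim objDoc
Set objDoc = Server.CreateObject("Microsoft.XMLDOM")
objDoc.async = False
objDoc.validateOnParse = False   ' turns off DTD/schema validation only

' The <b> element is never closed, so this is not well-formed XML.
If objDoc.loadXML("<a><b></a>") Then
    Response.Write "Loaded"
Else
    Response.Write "Rejected: " & Server.HTMLEncode(objDoc.parseError.reason)
End If
%>
```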
>>So actually, in reality, you have no control over whether you want to allow the document to be well-formed or not. It MUST be. <<
Exactly.

>>What you're getting at is that validateOnParse is only of practical use in determining whether or not a document is validated against a DTD or schema.<<
True, again.

Damn I hate it when that happens, now I have nothing to argue about :)

But I guess it is time to shut this thread down, as I think we have hijacked this question enough.