Solved

Using XMLHTTP to retrieve XML document and parsing the result

Posted on 2004-04-17
Last Modified: 2008-02-01
I am trying to remotely retrieve an XML document using XMLHTTP (in ASP), and then parse out a particular tag. The problem I am running into is that Internet Explorer seems to want to parse the XML document on its own.

For example, the following code:

Set HttpReq = Server.CreateObject("Microsoft.XMLHTTP")
url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=PubMed&report=medline&mode=xml&id=15088295"
HttpReq.Open "GET", url, False, username, password
HttpReq.Send
buffer = HttpReq.responseText
Set HttpReq = Nothing

I then simply want to print out what is in the buffer:

Response.Write buffer

However, Internet Explorer is interpreting it as an XML document and processing it on its own. Is there any way to have the browser just treat it as a regular HTML document?

The reason I want to do it this way is that I get an error message back from Internet Explorer: "An invalid character was found in text content. Error processing resource "

This is interesting because if you type the URL above directly into your browser, it displays fine. But the code above generates this error message. (It is being tripped specifically on the Affiliation tag, when it comes to the letter 'e' with the accent over it.)

Any idea what is going on? Or how to instruct Internet Explorer to handle the data as pure text, not XML??

Thanks very much.
Question by:nealg2
 

Author Comment

by:nealg2
ID: 10849594
I think I have found out what to do...

When doing the parse with XMLDOM, I set the validateOnParse parameter to false, and then it doesn't complain about invalid characters.

This seems to be a work-around for my original problem, so I would still appreciate any suggestions on a better way to do this.
 

Author Comment

by:nealg2
ID: 10849626
The question that still stands is how to get Internet Explorer, or any other browser, to treat the data I print out as an HTML document and not an XML document. I tried setting the ContentType but that did not seem to work.
 

Accepted Solution

by:
rdcpro earned 125 total points
ID: 10851759
Don't use:

Set HttpReq = Server.CreateObject("Microsoft.XMLHTTP")

This uses MSXML 2.0, and it's *not* server-safe.  Use MSXML 3 or 4 instead, with one of the following:

' MSXML 3 (version-independent ProgID)
Set HttpReq = Server.CreateObject("Msxml2.ServerXMLHTTP")
' or MSXML 4
Set HttpReq = Server.CreateObject("Msxml2.ServerXMLHTTP.4.0")

Then, to fix your other problem: you're probably getting invalid characters because of the encoding switch that happens when you use responseText to convert the response to a string. Do it this way:

HttpReq.Send
Response.ContentType = "text/xml"
HttpReq.responseXML.save Response

If you want it sent as plain text, use:

Response.ContentType = "text/plain"

But I prefer to see the XML, so that I know it's getting parsed correctly.  Using
HttpReq.responseXML.save Response
should get the encoding problem fixed.  Unfortunately, the NIH site doesn't specify the encoding in their XML.  Shame on them! It appears that the encoding is actually UTF-16, but using responseText probably strips the Byte Order Mark from the text, so that IE can't figure out what the encoding is.  Use the responseXML property, and call the save method on it to save it to the Response object.
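
Putting the pieces above together, a minimal sketch of the whole relay might look like the following. It assumes the URL from the question and drops the username and password arguments, which the question passed but which may not be needed. The status and parseError checks are additions for illustration and are not part of the original advice.

```vbscript
' Sketch only: relay the remote XML to the browser without going through a string.
Dim HttpReq, url
url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=PubMed&report=medline&mode=xml&id=15088295"

' Server-safe component (MSXML 3, version-independent ProgID)
Set HttpReq = Server.CreateObject("Msxml2.ServerXMLHTTP")
HttpReq.Open "GET", url, False   ' synchronous request, no credentials (assumption)
HttpReq.Send

If HttpReq.status <> 200 Then
    ' Added check: report a failed request instead of saving an empty document
    Response.Write "Request failed with HTTP status " & HttpReq.status
ElseIf HttpReq.responseXML.parseError.errorCode <> 0 Then
    ' Added check: the response could not be parsed as XML
    Response.Write Server.HTMLEncode(HttpReq.responseXML.parseError.reason)
Else
    Response.ContentType = "text/xml"
    ' Save the parsed document straight into the Response stream
    HttpReq.responseXML.save Response
End If

Set HttpReq = Nothing
```

If responseXML comes back empty, the parseError branch should at least say why, which is easier to diagnose than the browser's own error page.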

Regards,
Mike Sharp





 

Author Comment

by:nealg2
ID: 10854137
Thanks Mike.

Some followup questions:

After I issue the command "HttpReq.responseXML.save Response" then what happens to the XML data I retrieved? Because the next step in the code that I did not show was:

      'Microsoft XML parser
      dim objXML
      Set objXML = Server.CreateObject("Microsoft.XMLDOM")
      objXML.async = False
      objXML.validateOnParse = False
      objXML.LoadXML(****what goes here now??****)

And from there I am going to parse out just the fields I want using the command "objXML.documentElement.selectSingleNode(tag).text" , and save that into a string var.

I also tried using the command "Response.ContentType = "text/xml"", but that seems to mess up the HTML document that is printed out. Internet Explorer thinks that I want to output an XML document when, in fact, I am outputting an HTML document from the ASP (with the data stored in the selected XML tags as part of the HTML).

I hope this makes sense.
 

Author Comment

by:nealg2
ID: 10854402
Once again I posted too hastily. Turns out the problem I was running into was the MSXML component did not work on my localhost, but worked once uploaded to my host provider.
 

Expert Comment

by:rdcpro
ID: 10854572
Use

objXML.Load  HttpReq.responseBody

Avoid using strings.

But to re-iterate, do not use Microsoft.XMLDOM for your progID.  This is not server safe.  Use MSXML 3 or 4.
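
For the case described in the follow-up question (pulling one field out and printing it inside an HTML page), a rough sketch along these lines may help. The "//Affiliation" path is a hypothetical example based on the tag mentioned in the question, the "Msxml2.DOMDocument" ProgID is assumed to be the MSXML 3 parser on your server, and the error handling is added for illustration.

```vbscript
' Sketch only: parse the response bytes and emit one field as part of an HTML page.
Dim HttpReq, objXML, node, url, tag
url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=PubMed&report=medline&mode=xml&id=15088295"
tag = "//Affiliation"   ' hypothetical path; substitute the tag you actually need

Set HttpReq = Server.CreateObject("Msxml2.ServerXMLHTTP")
HttpReq.Open "GET", url, False
HttpReq.Send

' MSXML 3 parser instead of Microsoft.XMLDOM (assumed ProgID)
Set objXML = Server.CreateObject("Msxml2.DOMDocument")
objXML.async = False
objXML.validateOnParse = False

' Load from the raw bytes so the parser detects the encoding itself
If objXML.load(HttpReq.responseBody) Then
    Set node = objXML.documentElement.selectSingleNode(tag)
    If Not node Is Nothing Then
        ' The page stays HTML; only the extracted text is written out
        Response.Write "<p>" & Server.HTMLEncode(node.text) & "</p>"
    End If
Else
    Response.Write Server.HTMLEncode(objXML.parseError.reason)
End If

Set objXML = Nothing
Set HttpReq = Nothing
```

Because the XML is only parsed on the server here, the page's ContentType can stay at its HTML default, so the browser never tries to treat the output as XML.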

Regards,
Mike Sharp