How to get the charset through MSXML2.ServerXMLHTTP

Posted on 2010-09-17
Last Modified: 2012-05-10
How do I return the appropriate charset if one is not defined in the Content-Type value returned by getResponseHeader? In the example below, the page headers do not include a charset.

<% 
url = "http://www.embalgeria.nl/Contact.htm" 
set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP") 
xmlhttp.open "GET", url, false   ' synchronous GET
xmlhttp.send "" 

' Content-Type may or may not carry a charset parameter
mycharset = xmlhttp.getResponseHeader("Content-Type")
' vbTextCompare makes the search case-insensitive ("Charset=" also matches)
If InStr(1, mycharset, "charset", vbTextCompare) = 0 Then 
 mycharset = ""
 ' Find the correct charset
End If 

response.Write mycharset 
%>
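The placeholder in the code above first needs the charset value itself, not just whether the word "charset" appears. A minimal sketch, assuming the header follows the usual `type/subtype; charset=NAME` form; the function name `GetHeaderCharset` is hypothetical and uses only the built-in VBScript `RegExp` object.

```vbscript
' Hypothetical helper: returns the charset parameter of a Content-Type
' header value, or "" when the header does not carry one.
Function GetHeaderCharset(contentType)
    Dim re, matches
    Set re = New RegExp
    ' Accepts charset=UTF-8, charset="UTF-8" or charset='UTF-8', any case
    re.Pattern = "charset\s*=\s*[""']?([^""'\s;]+)"
    re.IgnoreCase = True
    Set matches = re.Execute(contentType)
    If matches.Count > 0 Then
        GetHeaderCharset = matches(0).SubMatches(0)
    Else
        GetHeaderCharset = ""
    End If
End Function

' Usage in the page above:
' mycharset = GetHeaderCharset(xmlhttp.getResponseHeader("Content-Type"))
```

For the Contact.htm example the header is just `text/html`, so this returns an empty string and a fallback rule is still needed.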

Question by:Nebukad

Expert Comment

by:BigRat
ID: 33752511
>>How do I return the appropriate charset if one is not defined in the getResponseHeader

Then it depends on the Content-Type: for example, with text/html it is ISO-8859-1, and with application/xml (and sometimes text/xml) it is UTF-8. If it is image/*, there isn't one.
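These per-type defaults can be written as a small lookup. A minimal sketch with a hypothetical function name `DefaultCharsetFor`; text/xml is mapped to UTF-8 to match the answer, other `text/*` types fall back to ISO-8859-1 (the HTTP/1.1 default for text subtypes), and every other type returns an empty string, meaning "no charset".

```vbscript
' Hypothetical helper: fallback charset for a Content-Type header that
' has no charset parameter. "" means the content has no charset.
Function DefaultCharsetFor(contentType)
    Dim mimeType
    ' Keep only the media type, e.g. "text/html" from "text/html; ..."
    mimeType = LCase(Trim(Split(contentType & ";", ";")(0)))
    Select Case mimeType
        Case "application/xml", "text/xml"
            DefaultCharsetFor = "UTF-8"
        Case Else
            If Left(mimeType, 5) = "text/" Then
                ' text/html and any other text/* type
                DefaultCharsetFor = "ISO-8859-1"
            Else
                ' image/* and any other type not covered above
                DefaultCharsetFor = ""
            End If
    End Select
End Function
```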

What content does your URL return?

Author Comment

by:Nebukad
ID: 33753456
In my example, the page headers return:

HTTP Status Code: HTTP/1.1 200 OK
....................
Content-Type: text/html
....................

By default the charset is set to ISO-8859-1 because no charset is set in the response header, correct?

NB: Since my application only crawls web pages, I only need the cases you already mentioned in your response (text/html, application/xml and text/xml).

Accepted Solution

by:BigRat (earned 500 total points)
ID: 33754210
>>By default the charset is set to ISO-8859-1 because no charset is set in the response header, correct?

Correct, that is the default. It is also what the HTTP/1.1 standard specifies for text/* types.

BUT

You'll find some web servers that don't conform to the rules, mostly in Russia, Greece, the Arab world and China/Japan/Korea, where the charset is omitted from the header and instead declared in an HTML <META> tag with a value other than ISO-8859-1 (normally, of course, a Russian, Greek, Arabic or Big5 character set).

I'd use this rule:
   1) charset in header -> extract and set that initially as the set to use
   2) charset not in header -> set ISO-8859-1 initially.
   3) META tag with charset (equivalent to the Content-Type header) -> override the setting with that charset
   4) decode the page with that charset.
  Note that numeric character references (e.g. &#1046;) are to be interpreted as Unicode code points, regardless of the page's charset. This is also a problem, since Netscape used to interpret them in the selected charset, and you'll find Russian sites still doing the same.
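The rule above could be sketched in VBScript for the same ASP page roughly as follows. Assumptions: all function names are hypothetical; `headerCharset` is whatever was extracted from the Content-Type header ("" if none); for the META scan in step 3, the raw `xmlhttp.responseBody` bytes are first decoded as ISO-8859-1, which keeps the ASCII markup readable, and are then decoded again with the chosen charset via `ADODB.Stream`. The META search is a regular-expression heuristic, not an HTML parser. Note also that HTML 4.01 gives an HTTP header charset priority over a META declaration; this sketch follows the rule as stated and lets META win.

```vbscript
' Step 3 helper: charset declared in a <meta> tag, or "" if none found.
Function FindMetaCharset(html)
    Dim re, matches
    Set re = New RegExp
    ' First "charset=" inside a <meta ...> tag (heuristic, not a parser)
    re.Pattern = "<meta[^>]*charset\s*=\s*[""']?([^""'\s;/>]+)"
    re.IgnoreCase = True
    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        FindMetaCharset = matches(0).SubMatches(0)
    Else
        FindMetaCharset = ""
    End If
End Function

' Steps 1-3: header charset, else ISO-8859-1; a <meta> charset overrides.
Function ChooseCharset(headerCharset, html)
    Dim charset, metaCharset
    If headerCharset <> "" Then
        charset = headerCharset
    Else
        charset = "ISO-8859-1"
    End If
    metaCharset = FindMetaCharset(html)
    If metaCharset <> "" Then charset = metaCharset
    ChooseCharset = charset
End Function

' Step 4: decode the raw response bytes with the chosen charset.
Function BytesToText(bytes, charset)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1                ' adTypeBinary
    stream.Open
    stream.Write bytes
    stream.Position = 0            ' Type and Charset can only change at 0
    stream.Type = 2                ' adTypeText
    stream.Charset = charset
    BytesToText = stream.ReadText
    stream.Close
End Function

' Decimal references (&#1046;) are Unicode code points whatever the
' page charset. Hexadecimal references (&#x416;) are left as-is here.
Function DecodeNumericEntities(text)
    Dim re, m, result, lastPos, code
    Set re = New RegExp
    re.Pattern = "&#([0-9]+);"
    re.Global = True
    result = ""
    lastPos = 1
    For Each m In re.Execute(text)
        ' Copy the text between the previous match and this one
        result = result & Mid(text, lastPos, m.FirstIndex + 1 - lastPos)
        code = CDbl(m.SubMatches(0))
        If code <= 65535 Then
            result = result & ChrW(code)   ' ChrW covers the BMP only
        Else
            result = result & m.Value
        End If
        lastPos = m.FirstIndex + m.Length + 1
    Next
    DecodeNumericEntities = result & Mid(text, lastPos)
End Function

' Usage with the xmlhttp object from the question, where headerCharset
' is the charset taken from the Content-Type header ("" if none):
' bytes = xmlhttp.responseBody
' charset = ChooseCharset(headerCharset, BytesToText(bytes, "ISO-8859-1"))
' text = DecodeNumericEntities(BytesToText(bytes, charset))
```

Decoding references after the charset step reflects the note above: they are mapped through Unicode rather than through the page's charset. A crawler that later parses the HTML may prefer to leave `&#60;`-style references for its parser instead.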

HTH

Author Closing Comment

by:Nebukad
ID: 33767645
Thanks for the response and the explanation of how web servers deal with charsets.