Solved

How to get the charset through MSXML2.ServerXMLHTTP

Posted on 2010-09-17
Last Modified: 2012-05-10
How do I return the appropriate charset if one is not defined in the Content-Type header returned by getResponseHeader? In the example below, the page headers do not include a charset.

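<%
Dim url, xmlhttp, mycharset

url = "http://www.embalgeria.nl/Contact.htm"
Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
xmlhttp.open "GET", url, False
xmlhttp.send

' The Content-Type header may or may not carry a charset parameter
mycharset = xmlhttp.getResponseHeader("Content-Type")
' Compare case-insensitively, since the parameter name may be written "Charset"
If InStr(1, mycharset, "charset", vbTextCompare) = 0 Then
    mycharset = ""
    ' Find the correct charset
End If

Response.Write mycharset
%>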


Question by:Nebukad

Expert Comment

by:BigRat
ID: 33752511
>>How do I return the appropriate charset if one is not defined in the getResponseHeader

Then it depends on the Content-Type: for example, with text/html it is ISO-8859-1, and with application/xml (and sometimes text/xml) it is UTF-8. If it is image/*, then there isn't one.
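
The defaults described above can be sketched as a small lookup. This is an illustration in Python rather than ASP, since the logic is language-neutral; the function name `default_charset` is hypothetical, and the UTF-8 default for text/xml simply follows the comment above (in practice it varies).

```python
def default_charset(content_type):
    """Return the charset named in a Content-Type value,
    else a default for the media type, else None."""
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()

    # An explicit charset parameter always wins.
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip("\"'")

    # No charset parameter: fall back to the per-type defaults from the comment.
    if media_type == "text/html":
        return "ISO-8859-1"
    if media_type in ("application/xml", "text/xml"):
        return "UTF-8"  # assumption taken from the comment; text/xml varies
    return None  # e.g. image/*: no charset applies
```

For the header in the question, `default_charset("text/html")` falls through to the text/html default.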

What content does your URL return?
 

Author Comment

by:Nebukad
ID: 33753456
In my example the page headers return:

HTTP Status Code: HTTP/1.1 200 OK
....................
Content-Type: text/html
....................

By default the charset is set to ISO-8859-1 because no charset is set in the response header, correct?

NB: Since my application only crawls web pages, I only need the cases you already mentioned in your response (text/html, application/xml and text/xml).
 

Accepted Solution

by:BigRat
ID: 33754210
>>By default the charset is set to ISO-8859-1 because no charset is set in the response header, correct?

Correct, that is the default. It is also the standard.

BUT

You'll find some web servers that don't conform to the rules, mostly in Russia, Greece, the Arab world and China/Japan/Korea. There the charset is omitted from the header and declared only in a <META> HTML tag, with a value different from ISO-8859-1 (normally, of course, a Russian, Greek, Arabic or Big-5 character set).

I'd use this rule:
   1) charset in header -> extract it and use it as the initial setting
   2) charset not in header -> set ISO-8859-1 initially
   3) META tag with charset (the http-equiv equivalent of the Content-Type header) -> override the setting with that
   4) decode the page with that charset
  Note that any numeric entities are to be interpreted as Unicode code points. This is also a problem, since Netscape used to interpret them in the selected charset and you'll find Russian sites still doing the same.
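
The four steps above could look roughly like this. It is a minimal sketch in Python rather than ASP, working on the raw response bytes (what `responseBody` would give you in ServerXMLHTTP). The names `resolve_charset` and `decode_page`, the 4096-byte scan window and the fallback on unknown charset names are all assumptions made for illustration, not part of the rule itself.

```python
import re

# Finds "charset=<name>" in a header value or inside a <meta> tag. Covers both
# <meta http-equiv="Content-Type" content="text/html; charset=x"> and <meta charset="x">.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)
META_RE = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)


def resolve_charset(content_type, body):
    """content_type: value of the Content-Type header (str); body: raw bytes."""
    # Step 2: start from ISO-8859-1 ...
    charset = "ISO-8859-1"
    # Step 1: ... unless the header names a charset.
    match = CHARSET_RE.search(content_type.encode("ascii", "ignore"))
    if match:
        charset = match.group(1).decode("ascii")
    # Step 3: a META tag with a charset overrides the setting.
    # Assumption: only the first 4096 bytes are scanned.
    for tag in META_RE.findall(body[:4096]):
        match = CHARSET_RE.search(tag)
        if match:
            charset = match.group(1).decode("ascii")
            break
    return charset


def decode_page(content_type, body):
    # Step 4: decode the page with the resolved charset.
    charset = resolve_charset(content_type, body)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Assumption: unknown charset names fall back to ISO-8859-1.
        return body.decode("iso-8859-1")
```

With the header from the question (`Content-Type: text/html`) and a page that declares `charset=windows-1251` in a META tag, `resolve_charset` returns the META value and `decode_page` decodes the bytes as Cyrillic.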

HTH
 

Author Closing Comment

by:Nebukad
ID: 33767645
Thanks for the response and the explanation on how webservers deal with charsets.
