Solved

How to get the charset through MSXML2.ServerXMLHTTP

Posted on 2010-09-17
Last Modified: 2012-05-10
How do I determine the appropriate charset if the Content-Type value returned by getResponseHeader does not define one? In the example below, the page headers do not return a charset.

<%
' Fetch the page synchronously
url = "http://www.embalgeria.nl/Contact.htm"
Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
xmlhttp.open "GET", url, False
xmlhttp.send

' Read the full Content-Type header, e.g. "text/html; charset=utf-8"
mycharset = xmlhttp.getResponseHeader("Content-Type")
If InStr(1, mycharset, "charset", vbTextCompare) = 0 Then
    ' The header carries no charset parameter
    mycharset = ""
    ' Find the correct charset
End If

Response.Write mycharset
%>

Question by:Nebukad
Expert Comment

by:BigRat
ID: 33752511
>>How do I return the appropriate charset if one is not defined in the getResponseHeader

Then it depends on the Content-Type. For example, with text/html the default is ISO-8859-1, and with application/xml (and sometimes text/xml) it is UTF-8. If it is image/*, there is no charset at all.
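
The defaults above could be sketched as a small lookup in the same VBScript as the question. This is only an illustration: the function name DefaultCharsetFor is hypothetical, and the mapping mirrors the values given in this comment, not a complete table of media types.

```vbscript
' Hypothetical helper: choose a fallback charset from the media type
' when the Content-Type header carries no charset parameter.
Function DefaultCharsetFor(contentType)
    Dim mediaType, semi
    ' "" & ... turns a Null header value into an empty string
    mediaType = LCase(Trim("" & contentType))

    ' Drop any parameters after the first semicolon
    semi = InStr(mediaType, ";")
    If semi > 0 Then mediaType = Trim(Left(mediaType, semi - 1))

    Select Case mediaType
        Case "text/html"
            DefaultCharsetFor = "ISO-8859-1"
        Case "application/xml", "text/xml"
            ' Assumption taken from this thread; text/xml is only "sometimes" UTF-8
            DefaultCharsetFor = "UTF-8"
        Case Else
            ' image/* and anything unknown: no charset applies
            DefaultCharsetFor = ""
    End Select
End Function
```

Inside an ASP page the function would sit in a <% %> block and could be called as DefaultCharsetFor(xmlhttp.getResponseHeader("Content-Type")).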

What content does your URL return?

Author Comment

by:Nebukad
ID: 33753456
In my example the page headers return:

HTTP Status Code: HTTP/1.1 200 OK
....................
Content-Type: text/html
....................

By default the charset is set to ISO-8859-1 because no charset is set in the response header, correct?

NB: Since my application only crawls web pages, I only need the cases you already mentioned in your response (text/html, application/xml and text/xml).

Accepted Solution

by:BigRat
ID: 33754210
>>By default the charset is set to ISO-8859-1 because no charset is set in the response header, correct?

Correct, that is the default. It is also the standard.

BUT

You'll find some web servers that don't conform to the rules, mostly in Russia, Greece, Arabic-speaking countries and China/Japan/Korea. Their pages carry a <META> HTML tag with a charset that is omitted from the header and differs from ISO-8859-1 (normally, of course, a Russian, Greek, Arabic or Big-5 character set).

I'd use this rule:
   1) charset in header -> extract it and use it as the initial charset
   2) charset not in header -> use ISO-8859-1 as the initial charset
   3) META tag with a charset (the equivalent of the Content-Type header) -> override the initial setting with that charset
   4) decode the page with the resulting charset.
  Note that any numeric entities are to be interpreted as Unicode code points, whatever charset the page uses. This is also a problem, since Netscape used to interpret them in the selected charset and you'll find Russian sites still doing the same.
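
The four steps above could be sketched in VBScript roughly as follows. Treat this as a sketch under assumptions, not a finished crawler: the function names (FirstSubMatch, HeaderCharset, MetaCharset, BytesToText, FetchDecoded) are hypothetical, the regular expressions only cover the common META forms, and decoding is done by feeding responseBody through an ADODB.Stream, which is assumed to be available on the server. Numeric entities are not handled here.

```vbscript
' Return the first capture group of pattern in text, or "" if there is no match.
Function FirstSubMatch(text, pattern)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    re.Pattern = pattern
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        FirstSubMatch = matches(0).SubMatches(0)
    Else
        FirstSubMatch = ""
    End If
End Function

' Charset parameter of a Content-Type value such as "text/html; charset=utf-8".
Function HeaderCharset(contentType)
    HeaderCharset = FirstSubMatch("" & contentType, _
        "charset\s*=\s*[""']?([A-Za-z0-9_.:\-]+)")
End Function

' Charset named in the first <META> tag that mentions one.
Function MetaCharset(html)
    MetaCharset = FirstSubMatch("" & html, _
        "<meta[^>]*charset\s*=\s*[""']?([A-Za-z0-9_.:\-]+)")
End Function

' Decode a byte array (e.g. xmlhttp.responseBody) with the given charset.
Function BytesToText(bytes, charset)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1            ' adTypeBinary
    stream.Open
    stream.Write bytes
    stream.Position = 0
    stream.Type = 2            ' adTypeText
    stream.Charset = charset   ' an unknown charset name may raise an error here
    BytesToText = stream.ReadText
    stream.Close
End Function

' Fetch url and decode it following steps 1 to 4.
Function FetchDecoded(url)
    Dim http, charset, fromMeta, rawHtml
    Set http = CreateObject("MSXML2.ServerXMLHTTP")
    http.open "GET", url, False
    http.send

    ' Steps 1 and 2: charset from the header, else ISO-8859-1
    charset = HeaderCharset(http.getResponseHeader("Content-Type"))
    If charset = "" Then charset = "ISO-8859-1"

    ' Step 3: scan for a META charset; ISO-8859-1 maps every byte to a
    ' character, so the ASCII markup of the tag stays readable
    rawHtml = BytesToText(http.responseBody, "ISO-8859-1")
    fromMeta = MetaCharset(rawHtml)
    If fromMeta <> "" Then charset = fromMeta

    ' Step 4: decode the page with the chosen charset
    FetchDecoded = BytesToText(http.responseBody, charset)
End Function
```

In an ASP page this might be used as Response.Write Server.HTMLEncode(FetchDecoded(url)). Working from responseBody instead of responseText keeps the raw bytes, so the page can be decoded again once the META charset is known.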

HTH

Author Closing Comment

by:Nebukad
ID: 33767645
Thanks for the response and the explanation on how webservers deal with charsets.