How to get the charset through MSXML2.ServerXMLHTTP

Posted on 2010-09-17
Last Modified: 2012-05-10
How do I determine the appropriate charset when none is given in the Content-Type value returned by getResponseHeader? In the example below, the page's response headers do not include a charset.

<% 
url = "http://www.embalgeria.nl/Contact.htm" 
set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP") 
xmlhttp.open "GET", url, false 
xmlhttp.send "" 

mycharset = ""
mycharset= xmlhttp.getResponseHeader("Content-Type")
If(inStr(mycharset,"charset")=0) Then 
 mycharset = ""
 ' Find the correct charset
End If 

response.Write mycharset 
%>

Question by:Nebukad

Expert Comment

by:BigRat
ID: 33752511
>>How do I return the appropriate charset if one is not defined in the getResponseHeader

Then it depends on the media type in the Content-Type header: for text/html the default is ISO-8859-1, and for application/xml (and sometimes text/xml) it is UTF-8. For image/* there is no charset at all.
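As a sketch of that rule in the question's own language, the hypothetical helper `CharsetFromContentType` below returns an explicit charset parameter when the header carries one and otherwise falls back to the defaults just described. Treating text/xml like application/xml is an assumption, since servers only sometimes do so. It relies only on VBScript's built-in `RegExp` object, so it should run in ASP as well as under Windows Script Host.

```vbscript
' Hypothetical helper: charset to start from for a Content-Type header value.
' Assumption: text/xml is treated like application/xml (UTF-8).
Function CharsetFromContentType(contentType)
    Dim re, matches, mediaType
    ' An explicit charset parameter wins, e.g. "text/html; charset=UTF-8"
    ' or charset="windows-1251" (optional quotes are skipped).
    Set re = New RegExp
    re.IgnoreCase = True
    re.Pattern = "charset\s*=\s*[""']?([^""';\s]+)"
    Set matches = re.Execute(contentType)
    If matches.Count > 0 Then
        CharsetFromContentType = matches(0).SubMatches(0)
        Exit Function
    End If
    ' Otherwise derive a default from the media type (the part before any ";").
    mediaType = LCase(Trim(contentType))
    If InStr(mediaType, ";") > 0 Then mediaType = Trim(Left(mediaType, InStr(mediaType, ";") - 1))
    If mediaType = "application/xml" Or mediaType = "text/xml" Then
        CharsetFromContentType = "UTF-8"
    ElseIf Left(mediaType, 5) = "text/" Then
        CharsetFromContentType = "ISO-8859-1"
    Else
        CharsetFromContentType = ""   ' e.g. image/*: no character set applies
    End If
End Function
```

In the question's code it could replace the `If ... End If` block: `mycharset = CharsetFromContentType(xmlhttp.getResponseHeader("Content-Type"))`.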

What content does your URL return?

Author Comment

by:Nebukad
ID: 33753456
In my example, the response headers are:

HTTP Status Code: HTTP/1.1 200 OK
...
Content-Type: text/html
...

So the charset defaults to ISO-8859-1, because no charset is set in the response header, correct?

NB: Since my application only crawls web pages, I only need the cases you already mentioned in your response (text/html, application/xml and text/xml).

Accepted Solution

by:BigRat (earned 500 total points)
ID: 33754210
>>By default the charset is set to ISO-8859-1 because no charset is set in the response header, correct?

Correct, that is the default. It is also the standard.

BUT

You'll find some web servers that don't conform to the rules, mostly in Russia, Greece, the Arabic-speaking world and China/Japan/Korea: the charset is omitted from the header, and the page instead carries an HTML <META> tag declaring a charset other than ISO-8859-1 (normally, of course, a Russian, Greek, Arabic or Big5 character set).
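Detecting such a <META> declaration can be sketched with VBScript's `RegExp` object. `FindMetaCharset` is a hypothetical name, and the pattern is a heuristic rather than a full HTML parser. It assumes the markup has already been read as text; a provisional ISO-8859-1 decode is normally enough for this, because the tag itself is plain ASCII.

```vbscript
' Hypothetical helper: charset declared in an HTML <META> tag, or "" if none.
' Recognises both
'   <meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
' and the shorter <meta charset="windows-1251"> form.
Function FindMetaCharset(html)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    ' Inside a single <meta ...> tag: "charset=", an optional quote, then the
    ' name up to the next quote, whitespace, ';', '/' or '>'.
    re.Pattern = "<meta[^>]*charset\s*=\s*[""']?([^""'\s;/>]+)"
    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        FindMetaCharset = matches(0).SubMatches(0)
    Else
        FindMetaCharset = ""
    End If
End Function
```

Because `[^>]*` cannot cross a `>`, a "charset=" that appears in ordinary page text outside a META tag is not picked up.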

I'd use these rules:
   1) charset in header -> extract it and use it as the initial setting.
   2) no charset in header -> use ISO-8859-1 as the initial setting.
   3) META tag with a charset (the equivalent of the Content-Type header) -> override the setting with that.
   4) decode the page with the resulting charset.
  Note that numeric character entities are always to be interpreted as Unicode code points. This is also a problem, since Netscape used to interpret them in the selected charset, and you'll find Russian sites still doing the same.
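Step 4 and the note on numeric entities can be sketched as follows. `DecodeBytes` uses `ADODB.Stream` to turn raw bytes (such as `xmlhttp.responseBody`) into text in the chosen charset, and `DecodeNumericEntities` resolves `&#NNNN;` and `&#xHHHH;` as Unicode code points, independent of that charset. All three function names are hypothetical, and the entity decoder deliberately leaves code points above U+FFFF untouched because `ChrW` does not cover them.

```vbscript
' Step 4 (hypothetical helper): decode raw response bytes, e.g.
' xmlhttp.responseBody, as text in the given charset via ADODB.Stream.
Function DecodeBytes(bytes, charset)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1              ' adTypeBinary
    stream.Open
    stream.Write bytes
    stream.Position = 0          ' Type and Charset may only change at position 0
    stream.Type = 2              ' adTypeText
    stream.Charset = charset     ' e.g. "iso-8859-1", "utf-8", "windows-1251"
    DecodeBytes = stream.ReadText
    stream.Close
End Function

' Hypothetical helper: value of a string of hex digits, e.g. "440" -> 1088.
Function HexToLong(hexDigits)
    Dim i, v
    v = 0
    For i = 1 To Len(hexDigits)
        v = v * 16 + InStr("0123456789ABCDEF", UCase(Mid(hexDigits, i, 1))) - 1
    Next
    HexToLong = v
End Function

' Replaces &#NNNN; and &#xHHHH; with the Unicode character they name.
' Code points above U+FFFF (and entities with more digits than the
' pattern allows) are left as they are.
Function DecodeNumericEntities(text)
    Dim re, matches, m, result, lastPos, cp
    Set re = New RegExp
    re.Global = True
    re.Pattern = "&#(?:([0-9]{1,5})|[xX]([0-9a-fA-F]{1,4}));"
    Set matches = re.Execute(text)
    result = ""
    lastPos = 1                  ' 1-based position of the next uncopied character
    For Each m In matches
        ' Copy the text between the previous entity and this one.
        result = result & Mid(text, lastPos, m.FirstIndex + 1 - lastPos)
        If m.SubMatches(0) <> "" Then
            cp = CLng(m.SubMatches(0))       ' decimal form
        Else
            cp = HexToLong(m.SubMatches(1))  ' hexadecimal form
        End If
        If cp <= 65535 Then
            result = result & ChrW(cp)
        Else
            result = result & m.Value
        End If
        lastPos = m.FirstIndex + m.Length + 1
    Next
    DecodeNumericEntities = result & Mid(text, lastPos)
End Function
```

A call could look like `html = DecodeBytes(xmlhttp.responseBody, mycharset)`, with `DecodeNumericEntities` applied afterwards to text extracted from `html` rather than to the raw markup, so that `&#60;` does not become a literal `<` inside the HTML.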

HTH

Author Closing Comment

by:Nebukad
ID: 33767645
Thanks for the response and the explanation of how web servers deal with charsets.