Solved

Get remote page (url) with foreign characters

Posted on 2004-08-09
Last Modified: 2012-06-27
Hi,
I want to get the contents of a remote page (a URL like http://www.google.com) into a string variable using ASP.
I know I can use the MSXML2.ServerXMLHTTP object to "get" a remote file, using the .ResponseText property to read the returned text into a string.
However, this method causes severe problems when the returned text contains foreign characters like ë or é.
Does anyone know a solution for this?
(I've read something about reading the return value as binary and converting it to ASCII ... how does this work?)

Thanks in advance!
Steffest
Question by:Steffest
5 Comments
 
LVL 4

Expert Comment

by:Tasneem
ID: 11752285
<%@ CodePage=65001 %>
<%
Response.CharSet = "utf-8"
%>
Put the above code in the calling page, i.e. the page where you are doing the XMLHTTP post. It should ideally work. If not, we can think of alternatives.
 
LVL 1

Author Comment

by:Steffest
ID: 11761573
Hi Tasneem,

Nope, I tried setting it both in the calling page and in the page that is called.
No results.

In the meantime I've found a solution that works (more or less):

reading the response as binary data using the .ResponseBody property and converting it to ASCII using the function at http://www.motobit.com/tips/detpg_binarytostring.htm

But it's ridiculously slow.
There's got to be a better solution for this.
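
A commonly used alternative to a byte-by-byte conversion loop is to let ADODB.Stream decode the binary data in one call. The following is a minimal sketch, not a tested drop-in: the helper name BytesToText is hypothetical, and the "iso-8859-1" charset is an assumption about the remote page that has to be adjusted to whatever encoding the page really uses.

```vbscript
<%
' Hypothetical helper: decode a byte array (e.g. .ResponseBody) into a string
' using the given character set, via ADODB.Stream instead of a per-byte loop.
Function BytesToText(bytes, charsetName)
    Const adTypeBinary = 1
    Const adTypeText = 2
    Dim stream
    Set stream = Server.CreateObject("ADODB.Stream")
    stream.Type = adTypeBinary
    stream.Open
    stream.Write bytes
    stream.Position = 0            ' Type and Charset can only be changed at position 0
    stream.Type = adTypeText
    stream.Charset = charsetName
    BytesToText = stream.ReadText  ' reads the whole stream as text
    stream.Close
    Set stream = Nothing
End Function

Dim http, html
Set http = Server.CreateObject("MSXML2.ServerXMLHTTP")
http.open "GET", "http://www.google.com", False
http.send
' Assumption: the remote page is encoded as ISO-8859-1.
html = BytesToText(http.responseBody, "iso-8859-1")
Set http = Nothing
%>
```

Because the decoding happens inside the stream object rather than in script code, this typically avoids the string concatenation cost of a per-byte loop. An empty response body or a failed request is not handled in this sketch.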
 
LVL 4

Expert Comment

by:Tasneem
ID: 11781804
 
LVL 4

Accepted Solution

by:
Tasneem earned 250 total points
ID: 11781813
The link posted above is for PHP, but you can perhaps use that solution for your page.
For general reading:
http://www.mezzoblue.com/archives/2003/07/29/html_and_for/
 
LVL 1

Author Comment

by:Steffest
ID: 11782036
Thanks Tasneem

I found some clarifying reading there.
The problem was described very well:

quote
"Oh, I can just pretend this is UTF-8. This sometimes works, but unfortunately there's not that much pure ASCII left in the world
If there's even one é or smart quotation mark (“ instead of ") in your text, it's probably encoded in ISO-8859 or some Microsoft code page, and will seriously confuse software that thinks it's reading UTF-8, including most XML software."
/quote

It seems that most of the requested URLs are not UTF-8 at all, which messes up the MSXML2 text parser.
Problem solved.
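
Since the remote pages are often not UTF-8, one way to avoid guessing is to look at the charset the server declares in its Content-Type header before decoding .ResponseBody. The sketch below illustrates that idea only: the helper name CharsetFromContentType and the "iso-8859-1" fallback are assumptions, and many pages declare their charset only in a meta tag (or not at all), in which case the fallback is used.

```vbscript
<%
' Hypothetical helper: extract the charset from a Content-Type header value
' such as "text/html; charset=ISO-8859-1". Returns fallbackCharset if none is declared.
Function CharsetFromContentType(contentType, fallbackCharset)
    Dim pos, value, endPos
    CharsetFromContentType = fallbackCharset
    pos = InStr(1, contentType, "charset=", vbTextCompare)
    If pos = 0 Then Exit Function
    value = Mid(contentType, pos + Len("charset="))
    endPos = InStr(value, ";")                 ' cut off any following parameter
    If endPos > 0 Then value = Left(value, endPos - 1)
    value = Trim(Replace(value, """", ""))     ' drop optional quotes and spaces
    If Len(value) > 0 Then CharsetFromContentType = value
End Function

Dim req, charsetName
Set req = Server.CreateObject("MSXML2.ServerXMLHTTP")
req.open "GET", "http://www.google.com", False
req.send
' Assumption: fall back to ISO-8859-1 when the server declares no charset.
charsetName = CharsetFromContentType(req.getResponseHeader("Content-Type"), "iso-8859-1")
' charsetName can then be used when decoding req.responseBody,
' for example as the Charset of an ADODB.Stream in text mode.
Set req = Nothing
%>
```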