Link to home
Start Free TrialLog in
Avatar of Vampir
Vampir

asked on

Insert HTML code with special characters into Access DB or Add as a Text file

I have an application that reads the html code from a web page and adds the whole html code into Access Database.
The problem is that some special characters like ÛÛÛ are converted to ??? when inserted into DB.
But when I copy/paste a webpage with special characters directly into DB there is no problem.

Is there another way to solve this? Perhaps by insterting the .txt or .htm file into DB, if Access supports it?

Almost complete code:
Dim Httpobj As MSXML2.XMLHTTP
Set Httpobj = New MSXML2.XMLHTTP
Dim oConn As ADODB.Connection
Dim oRs As ADODB.Recordset
Dim sSql As String
Dim sSql2 As String
Set oConn = New ADODB.Connection
oConn.Open ("DRIVER={Microsoft Access Driver (*.mdb)}; DBQ=" & "d:/documents.mdb")
sSql = "SELECT * FROM doc"
Set oRs = oConn.Execute(sSql)

Do While Not oRs.EOF
link = oRs("link")
id = oRs("id")

With Httpobj
.Open "GET", link, False
.send
niz = .responseText
End With

sSql2 = "UPDATE doc SET nfo = '" & niz & "' WHERE ID = " & id & ""
Set oRs2 = oConn.Execute(sSql2)
oRs.MoveNext
Loop
Avatar of ASPGuru
ASPGuru

try to open the recordset and then do this:
rs("nfo") = Httpobj.responseText

ASPGuru
Have you tried with server.htmlencode ?

Like

sSql2 = "UPDATE doc SET nfo = '" & server.htmlencode(niz) & "' WHERE ID = " & id & ""

Avatar of Vampir

ASKER

Neither works, the problem is with niz = .responseText.
The problem is that .responseText converts the text and I need to keep it "binary".
But since I dont have a clue about encoding.

I found some info on http://dbforums.com/arch/195/2003/2/676676

And there is another link on expert-excange, but cant seem to find it again. Its listed in the XML section, I should probably ask my question there.
well... have a look at the xmlhttp object... there's not only responseText but also a binary property... don't know the exact name right now...

ASPGuru
It's got to be an encoding issue? You're viewing html pages (or html output), so it MUST be ASCII. So you need to know what the source encoding is, and use this when you right into your database. Trouble is, I have no idea how you'd do this?? :-)
no... it doesn't need to be ascii...

ASPGuru
really? Shows what I know. html is text though? yes.
The characters are mapped against the wrong character map.

before calling Httpobj.open set the code page to UTF-8 with SetRequestHeader. like: Httpobj.SetRequestHeader "Content-Type", "text/html;Charset=UTF-8"



Also set the codepage to utf-8 in the asp page you are executing this code using Session.CodePage=65001 and Session.Charset = "UTF-8"

Httpobj.SetRequestHeader "Content-Type", "text/html;Charset=utf-8"
Avatar of Vampir

ASKER

First of I must admit that I currently testing the application as Visual Basic application.

niz = .responseStream produces an error

ASPGuru : i think that the binary method is .responseBody, see the solution i came with below, but it doesnt work with niz = .responseBody

Breedj(your solution seems to be the closest, but still an error):
Unspecified error in:
Httpobj.SetRequestHeader "Content-Type", "text/html;Charset=UTF-8"


Now here is what I got from the link above (my post) that works, but it first writes to txt file then reads from txt. There is also a link on what site I'm testing. Writing directy to Access DB as below doesnt work either.

Sub test()
sFile = "c:\test.txt"
sURL = "http://www.nforce.nl/nfos/clear_txt.php?id=27651"

Set objXMLHTTP = CreateObject("MSXML2.serverXMLHTTP.4.0")

objXMLHTTP.Open "GET", sURL, False
objXMLHTTP.send

Set strm1 = CreateObject("adodb.stream")
With strm1
.Type = 1
.Open
.Write objXMLHTTP.responseBody
.SaveToFile sFile, 2 ' adSaveCreateOverWrite
.Close
End With

Set strm2 = CreateObject("adodb.stream")
With strm2
.Type = 2
.Charset = "euc-kr" 'Use any proper charset
.Open
.LoadFromFile "c:\test.txt"
MsgBox .ReadText
Text1.Text = .ReadText
.Close
End With
End Sub
Avatar of Vampir

ASKER

Sorry for bad spelling above. I need a better solution without the use of saving to Text file.
requestheaders have nothing to do with the response...

you don't need to write to a file... you can load the text into the stream directly... then go to the beginnig of the stream,  set the right charset and read it out again...


ASPGuru
Avatar of Vampir

ASKER

Sorry for bad spelling above. I need a better solution without the use of saving to Text file.
Avatar of Vampir

ASKER

Could you please post a working code?
try something like this:


Set strm2 = CreateObject("adodb.stream")
With strm2
.Type = 2
.Charset = "euc-kr" 'Use any proper charset
.Open
objXMLHTTP.responseStream.copyTo(strm2)
MsgBox .ReadText
Text1.Text = .ReadText
.Close
End With
End Sub


ASPGuru
Avatar of Vampir

ASKER

Could you please post a working code?
Avatar of Vampir

ASKER

Refresh...

Sorry ASPGuru but your code doesnt seem to work:
Error is Method Required at: objXMLHTTP.responseStream.CopyTo (strm2)
there doesnt seem to be sub method .CopyTo.

Here is the code is used (I changed MSXML a little, so it gives better error reporting)

Sub test()
sFile = "c:\test3.txt"
sURL = "http://www.nforce.nl/nfos/clear_txt.php?id=27651"

Dim objXMLHTTP As MSXML2.ServerXMLHTTP40
Set objXMLHTTP = New MSXML2.ServerXMLHTTP40

objXMLHTTP.Open "GET", sURL, False
objXMLHTTP.send

Set strm2 = CreateObject("adodb.stream")
With strm2
.Type = 2
.Charset = "euc-kr" 'Use any proper charset
.Open
objXMLHTTP.responseStream.CopyTo (strm2)
MsgBox .ReadText
Text1.Text = .ReadText
.Close
End With
End Sub
Avatar of Vampir

ASKER

Btw I've also decided to just use text files instead inserting text in Access DB (too much trouble anyway).
Not to mention that still not every character is recognized, is there a Charset table for all characters? I've tried iso-8859-1.
I'll still give points for a working example for my previous question.
ok... then i won't bother to find the db solution..

you need to use the encoding of the html file...
this can be different for any file...

ASPGuru
Avatar of Vampir

ASKER

All files are plain text similar to http://www.nforce.nl/nfos/clear_txt.php?id=27651.
What encoding should I use?
Avatar of Vampir

ASKER

Remove . at the end of the link
well... it depends... i can't tell you...
the characters with code 0 - 127 are equal for all ascci encodings...
just for characters above it depends...

oh... i just had a look at the text...
with which characters do you have problems?
NFOs normally use the DOS-charset, which also includes this frame graphics...

ASPGuru
Avatar of Vampir

ASKER

When creating a text file using the following method:
.Type = 1 'means binary and doesnt allow charset'
.Open
.Write objXMLHTTP.responseBody 'also binary'
.SaveToFile sFile, 2 ' adSaveCreateOverWrite

character (AND) & is converted to &
There is also >, <, "
Possible others too.

Problem can be solved by editing all files and searching for those strings and replacing them with characters (with an application of course).
Dear expert(s),

A request has been made to close this Q in CS:
https://www.experts-exchange.com/questions/20562615/Please-refund-the-points-and-Lock-the-question.html

Without a response in 72 hrs, a moderator will finalize this question by:

 - Saving this Q as a PAQ and refunding the points to the questionner

When you agree or disagree, please add a comment here.

Thank you.

modulo

Community Support Moderator
Experts Exchange
ASKER CERTIFIED SOLUTION
Avatar of modulo
modulo

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial