kankan


UTF8 to ASCII

Dear experts,

I have a problem converting a UTF-8 file to ASCII.
My text file contains some Chinese words, so I am trying to use ADO to read it as ASCII characters:

Dim rs As ADODB.Recordset
Dim conn As ADODB.Connection

Set conn = New ADODB.Connection
Set rs = New ADODB.Recordset

' Open the folder C:\ as a text "database"; temp.txt is treated as a table.
conn.Open "DRIVER={Microsoft Text Driver (*.txt; *.csv)};" _
    & "DBQ=C:\;" _
    & "HDR=NO;FMT=Delimited;"
rs.Open "select * from temp.txt", conn, adOpenStatic, _
    adLockReadOnly, adCmdText

' Print the first field of every row.
Do While Not rs.EOF
    Debug.Print rs.Fields.Item(0)
    rs.MoveNext
Loop

' Close before releasing the objects.
rs.Close
conn.Close
Set rs = Nothing
Set conn = Nothing

But the output is still UTF-8. How can I solve this problem in VB?
Thanks
anthonywjones66

Is it really ASCII you are after?  ASCII can't represent Chinese characters, so they would have to be replaced by some arbitrary ASCII characters.

Do you not in fact need the UTF-8 encoded text converted into a Unicode string?

If you open the source file in Notepad and select File | Save As..., does the Encoding dropdown show UTF-8?
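
To make the first point concrete: ASCII only defines code points 0 to 127, so anything above that range has no ASCII form. A minimal VB6 sketch that tests whether a string fits in ASCII (IsPureAscii is a made-up helper name, not something from VB or ADO):

```vb
' Returns True when every character of s is in the 7-bit ASCII range (0..127).
' IsPureAscii is a hypothetical helper name used only for illustration.
Public Function IsPureAscii(ByVal s As String) As Boolean
    Dim i As Long
    Dim code As Long
    For i = 1 To Len(s)
        ' AscW returns a signed Integer; mask it to get 0..65535.
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        If code > 127 Then
            IsPureAscii = False
            Exit Function
        End If
    Next i
    IsPureAscii = True
End Function
```

For example, IsPureAscii("abc") is True, while IsPureAscii("abc" & ChrW$(&H4E2D)) is False because U+4E2D is a Chinese character.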

kankan

ASKER

Hi anthonywjones66,

Actually, it is an HTML file. In Windows XP, when I open it in Notepad, the Chinese words display correctly, because Notepad can show Unicode.
But when my VB program reads the source, such as "<tr><td>askdjka</td></tr>", some Chinese words turn into strange characters such as &#21958;??
hi

Place this meta tag in your HTML file and try again:
<html>
<head>
          <meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>

;-)
Shiju
hi

>>Debug.Print rs.Fields.Item(0)

The Immediate window won't print Chinese characters from your Debug.Print statement.
In fact, the VB IDE doesn't support these characters.

;-)
Shiju
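
Since the Immediate window can't show the characters themselves, one way to check what a string really contains is to print its code units as hex instead. A rough VB6 sketch (DumpCodeUnits is a hypothetical helper name):

```vb
' Prints each UTF-16 code unit of s as four hex digits separated by spaces.
' DumpCodeUnits is a hypothetical helper name used only for illustration.
Public Sub DumpCodeUnits(ByVal s As String)
    Dim i As Long
    Dim code As Long
    Dim result As String
    For i = 1 To Len(s)
        ' AscW returns a signed Integer; mask it to get 0..65535.
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        ' Left-pad the hex value to four digits.
        result = result & Right$("000" & Hex$(code), 4) & " "
    Next i
    Debug.Print RTrim$(result)
End Sub
```

DumpCodeUnits "A" & ChrW$(&H4E2D) prints 0041 4E2D. If a Chinese character that should be a single code unit shows up as two or three unrelated units, the UTF-8 bytes were probably read without being decoded.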
Indeed, but what kankan will be seeing is the UTF-8 encoding of these characters rather than the rectangular boxes that VB would normally substitute.

Anthony.
hi

In the case of HTML, it works fine and I have used it many times,
just by including meta tags that specify UTF-8.
Have a look at this:

http://www.i18nguy.com/markup/metatags.html

;-)
Shiju
ASKER CERTIFIED SOLUTION
anthonywjones66

This solution is only available to members.
For UTF-8 string processing, try MultiByteToWideChar: http://bhashaindia.com/ForumV2/shwmessage.aspx?ForumID=7&MessageID=212
For Unicode file handling, try the FileSystemObject: http://www.w3schools.com/asp/asp_ref_filesystem.asp
For Unicode controls, try the MS Forms 2.0 library: http://www.wheller.ic24.net/whellersprojects/ares%20chat%20filter/

HTH

J.
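
As a rough idea of what the MultiByteToWideChar route can look like in VB6 (this is a sketch, not the code from the linked page; Utf8BytesToString and ReadUtf8File are made-up names, and the Declare assumes 32-bit VB6):

```vb
Option Explicit

Private Const CP_UTF8 As Long = 65001

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

' Converts UTF-8 bytes into a VB string. Assumes bytes() is already
' dimensioned and non-empty. Returns "" if the API reports failure.
Public Function Utf8BytesToString(ByRef bytes() As Byte) As String
    Dim cb As Long, cch As Long, s As String
    cb = UBound(bytes) - LBound(bytes) + 1
    ' First call with no output buffer: returns the required length
    ' in UTF-16 code units, or 0 on failure.
    cch = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bytes(LBound(bytes))), cb, 0, 0)
    If cch = 0 Then Exit Function
    s = String$(cch, vbNullChar)
    ' Second call: convert into the preallocated buffer.
    cch = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bytes(LBound(bytes))), cb, StrPtr(s), cch)
    If cch = 0 Then Exit Function
    Utf8BytesToString = s
End Function

' Reads a whole file as raw bytes and decodes it as UTF-8.
' An empty file yields "".
Public Function ReadUtf8File(ByVal path As String) As String
    Dim f As Integer, bytes() As Byte
    f = FreeFile
    Open path For Binary Access Read As #f
    If LOF(f) > 0 Then
        ReDim bytes(0 To LOF(f) - 1)
        Get #f, , bytes
        ReadUtf8File = Utf8BytesToString(bytes)
    End If
    Close #f
End Function
```

The API returns the number of characters written, and 0 when it fails, which is why the return value is checked after each call. If the file starts with a UTF-8 byte order mark, it will come through as a leading U+FEFF character that you may want to strip.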
J.,

FileSystemObject still can't read UTF-8 encoded files, which is a shame because it would otherwise be a good solution :(

The bhashaindia solution should work, albeit it is a little more complex than my code.   Although it uses the return value as an error status. ;) :P

Anthony.
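
One alternative that can decode UTF-8 without API declarations is the ADODB.Stream object, which is already available since the project references ADO. A minimal sketch, assuming a reference to Microsoft ActiveX Data Objects 2.5 or later (ReadUtf8Text is a made-up name, and this is not necessarily the code from the accepted solution):

```vb
' Reads a UTF-8 encoded file into a VB string using ADODB.Stream.
' ReadUtf8Text is a hypothetical helper name used only for illustration.
Public Function ReadUtf8Text(ByVal path As String) As String
    Dim stm As ADODB.Stream
    Set stm = New ADODB.Stream
    stm.Type = adTypeText       ' text mode, so Charset is applied
    stm.Charset = "utf-8"       ' decode the file bytes as UTF-8
    stm.Open
    stm.LoadFromFile path
    ReadUtf8Text = stm.ReadText(adReadAll)
    stm.Close
    Set stm = Nothing
End Function
```

The result is an ordinary VB string, so it still won't display in the Immediate window, but it can be assigned to a Unicode-aware control or written back out in another encoding.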
kankan

ASKER

OK, thanks to all the experts first.
I'm off this coming Saturday, so let me try on Monday
and reply to you then.
kankan

ASKER

Hi Anthony,

After using the function you provided, the garbled characters still appear.
I'm saving the result to a text file; however, it is still in UTF format, not ASCII.
Sigh... I don't know how to solve this, it's too complex!
My code results in a Unicode string.

How are you saving this back into a file?

Anthony.
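
How the string is written back decides what encoding ends up in the file. As one possible approach, ADODB.Stream lets you name the output charset explicitly. This is only a sketch: WriteTextAs is a made-up name, and which charset is right ("unicode", "big5", "us-ascii" and so on) depends on what the consuming program expects, which the thread doesn't say.

```vb
' Writes a VB string to a file in the named character set.
' WriteTextAs is a hypothetical helper name used only for illustration.
' Requires a reference to Microsoft ActiveX Data Objects 2.5 or later.
Public Sub WriteTextAs(ByVal path As String, ByVal contents As String, _
                       ByVal charsetName As String)
    Dim stm As ADODB.Stream
    Set stm = New ADODB.Stream
    stm.Type = adTypeText
    stm.Charset = charsetName   ' e.g. "unicode", "utf-8", "big5"
    stm.Open
    stm.WriteText contents
    ' Overwrite the target file if it already exists.
    stm.SaveToFile path, adSaveCreateOverWrite
    stm.Close
    Set stm = Nothing
End Sub
```

Keep in mind that a charset which can't represent Chinese characters, such as plain ASCII, will lose them on output, so a Chinese code page or a Unicode encoding is the likelier target here.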
SOLUTION
This solution is only available to members.