Solved

Converting non-ASCII characters to ASCII plain text in ASP/VBScript

Posted on 2009-05-19
Last Modified: 2012-05-07
I have an ASP web app that takes a user-submitted CSV file and populates a SQL database table using the data. However, some users are exporting Excel files into CSV format, and it's not always plain text. Examples would be typographer's quotes instead of plain-text ", fancy apostrophes instead of ', etc. So I end up with garbage characters in the table.

Is there some way I can reformat the data "on the fly" so the garbage characters will be converted properly? I don't want to simply delete them; they need to be intelligently converted into their corresponding ASCII characters whenever possible. I'm sure there are some characters that can't be converted. In BBEdit (my text editor), if I do a convert-to-ASCII it is smart enough to convert a © to (C), ® to (R), etc. That would be really nice, though I'm not sure if it's practical in this case.

I also don't know if this needs to be done on a character-by-character basis (which may slow the app down considerably), or whether there is some global way to do the conversion.

I found this script, but can't seem to get it to work:

http://www.robvanderwoude.com/vbstech_files_utf8.php

Would appreciate any advice. Thanks!
Question by:bbdesign
2 Comments
 

Accepted Solution

by:
_Stilgar_ earned 500 total points
ID: 24429844
The script you linked to does the opposite of what you're after. You are trying to take UTF-8 or non-standard ANSI characters and reduce them to plain 7-bit ASCII (128 codes), the subset that is common to all ANSI code pages. This is more than a simple encoding conversion: a character such as © is perfectly valid in those code pages, yet you want to convert it to (C), and so on.

As far as I know, this is only possible by building a conversion table and replacing the characters one by one. Use the VBScript ChrW and AscW functions to work with Unicode characters correctly (and make sure non-Unicode input is converted to Unicode first, so the table matches), then replace each character in turn, correcting and adding entries as you go. There are probably published tables listing the most commonly used characters, which you could use as a starting point.
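A minimal sketch of such a conversion table in VBScript follows. The function name ToPlainAscii and the particular entries in the table are my own illustrative choices, not a complete or standard list. It assumes the CSV text has already been read into a VBScript string with the correct character set, so that a typographer's quote really is U+201C in the string and not several garbage bytes. Each Replace call processes the whole string at once, so the script itself does not loop character by character.

```vbscript
' Hypothetical helper (not a built-in): maps common "fancy" characters
' to ASCII stand-ins, then flags whatever non-ASCII characters remain.
Function ToPlainAscii(ByVal s)
    Dim map, key, re

    ' Conversion table: Unicode character -> ASCII replacement.
    ' Extend this as new characters show up in the data.
    Set map = CreateObject("Scripting.Dictionary")
    map.Add ChrW(&H2018), "'"     ' left single quote
    map.Add ChrW(&H2019), "'"     ' right single quote / apostrophe
    map.Add ChrW(&H201C), """"    ' left double quote
    map.Add ChrW(&H201D), """"    ' right double quote
    map.Add ChrW(&H2013), "-"     ' en dash
    map.Add ChrW(&H2014), "--"    ' em dash
    map.Add ChrW(&H2026), "..."   ' ellipsis
    map.Add ChrW(&HA9), "(C)"     ' copyright sign
    map.Add ChrW(&HAE), "(R)"     ' registered sign

    ' One whole-string Replace per table entry.
    For Each key In map.Keys
        s = Replace(s, key, map(key))
    Next

    ' Anything still outside 7-bit ASCII has no table entry;
    ' here it becomes "?" (an assumption; you may prefer to log it).
    Set re = New RegExp
    re.Pattern = "[^\x00-\x7F]"
    re.Global = True
    ToPlainAscii = re.Replace(s, "?")
End Function
```

In the import loop this would be called once per field, for example `cleanValue = ToPlainAscii(rawValue)`, where rawValue is a placeholder for whatever variable holds the CSV field. If the data is already garbled before it reaches this function, the table will not match, so the file encoding has to be handled first.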

Since Unicode/ANSI handling in ASP/VBScript is limited at best, I'd even consider building a small COM object to do this job, to make sure no code page is left behind.

Author Comment

by:bbdesign
ID: 24430895
Thanks for the advice!