Converting non-ASCII characters to ASCII plain text in ASP/VBScript

I have an ASP web app that takes a user-submitted CSV file and populates a SQL database table using the data. However, some users are exporting Excel files into CSV format, and the result is not always plain text. Examples would be typographer's quotes instead of a plain ", fancy apostrophes instead of ', etc. So I end up with garbage characters in the table.

Is there some way I can reformat the data "on the fly" so the garbage characters will be converted properly? I don't want to simply delete them; they need to be intelligently converted into their corresponding ASCII characters whenever possible. I'm sure there are some characters that can't be converted. In BBEdit (my text editor), if I do a convert-to-ASCII it is smart enough to convert a © to (C), ® to (R), etc. That would be really nice, though I'm not sure if it's practical in this case.

I also don't know whether this needs to be done on a character-by-character basis (which may slow the app down considerably), or whether there is some global way to do the conversion.

I found this script, but can't seem to get it to work:

http://www.robvanderwoude.com/vbstech_files_utf8.php

Would appreciate any advice. Thanks!
Asked by: bbdesign
 
_Stilgar_ commented:
The code you linked to does the opposite of what you're after. You are trying to take UTF-8 or non-standard ANSI (code page) text and reduce it to plain ASCII, the limited 128-character set that is common to all ANSI code pages. This is not a simple encoding conversion: © is not part of ASCII at all, so turning it into (C) means substituting a replacement string of your own choosing, and so on for the other characters.

As far as I know, the only way to do this is to build a conversion table and replace the characters one by one. Use the VBScript ChrW function (and its counterpart AscW) to handle Unicode text correctly, converting any non-Unicode input to Unicode first so that the mapping is reliable, and then replace the characters one by one, correcting and extending the table as you go. There are probably published tables of the commonly used characters that you could start from.
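
For what it's worth, here is a minimal sketch of that table-driven approach. It assumes the CSV text has already been read into a VBScript string as Unicode (for example through an ADODB.Stream opened with the correct Charset). The function name ToPlainAscii and the particular mappings are illustrative assumptions, not a complete table.

```vbscript
' Hypothetical helper: replaces common typographic characters with
' plain ASCII stand-ins. The table is a starting point, not exhaustive.
Function ToPlainAscii(ByVal text)
    Dim map, key
    Set map = CreateObject("Scripting.Dictionary")

    map.Add ChrW(&H2018), "'"      ' left single quotation mark
    map.Add ChrW(&H2019), "'"      ' right single quotation mark / apostrophe
    map.Add ChrW(&H201C), """"     ' left double quotation mark
    map.Add ChrW(&H201D), """"     ' right double quotation mark
    map.Add ChrW(&H2013), "-"      ' en dash
    map.Add ChrW(&H2014), "--"     ' em dash
    map.Add ChrW(&H2026), "..."    ' horizontal ellipsis
    map.Add ChrW(&HA9), "(C)"      ' copyright sign
    map.Add ChrW(&HAE), "(R)"      ' registered sign
    map.Add ChrW(&H2122), "(TM)"   ' trade mark sign

    ' One Replace call per table entry; characters that are not in the
    ' table are left untouched rather than deleted.
    For Each key In map.Keys
        text = Replace(text, key, map(key))
    Next

    ToPlainAscii = text
End Function
```

Each Replace call scans the whole string, so the cost grows with the number of table entries rather than requiring a hand-written character-by-character loop. In an ASP page you would call it on each field (or on the whole file contents) before building the SQL insert, e.g. `cleanValue = ToPlainAscii(rawValue)`, where both variable names are placeholders.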

Since Unicode/ANSI handling in ASP/VBScript is limited at best, I'd even consider building a small COM object to do this job, to make sure no code page is left behind.
 
bbdesign (author) commented:
Thanks for the advice!