Solved

Converting non-ASCII characters to ASCII plain text in ASP/VBScript

Posted on 2009-05-19
Last Modified: 2012-05-07
I have an ASP web app that takes a user-submitted CSV file and populates a SQL database table with the data. However, some users export Excel files to CSV format, and the result is not always plain text. Examples are typographer's quotes instead of a plain ", fancy apostrophes instead of ', etc. So I end up with garbage characters in the table.

Is there some way I can reformat the data "on the fly" so the garbage characters are converted properly? I don't want to simply delete them; they need to be intelligently converted into their corresponding ASCII characters whenever possible. I'm sure there are some characters that can't be converted. In BBEdit (my text editor), a convert-to-ASCII is smart enough to turn © into (C), ® into (R), etc. That would be really nice, though I'm not sure if it's practical in this case.

I also don't know if this needs to be done on a character-by-character basis (which may slow the app down considerably), or if there is some global way to do the conversion?

I found this script, but can't seem to get it to work:

http://www.robvanderwoude.com/vbstech_files_utf8.php

Would appreciate any advice. Thanks!
Question by:bbdesign
2 Comments
 

Accepted Solution

by:
_Stilgar_ earned 500 total points
ID: 24429844
The code you linked to does the opposite of what you're after. You are trying to take UTF-8 or non-standard ANSI characters and reduce them to plain ASCII, the 128-character set that every ANSI code page shares. This is not a simple encoding conversion: © exists in many of those code pages but has no ASCII equivalent, so you have to decide yourself that it becomes (C), and so on.

As far as I know, the only way to do this is to build a conversion table and replace the characters one by one. Use the VBScript ChrW and AscW functions to handle Unicode characters correctly (and make sure non-Unicode input is converted to Unicode first, so the replacements match), then replace the characters one by one, correcting and adding to the table as you go. There are probably published tables of the commonly used characters that you could start from.

Since Unicode/ANSI handling in ASP/VBScript is limited at best, I'd even consider building a small COM object to do this job, to make sure no code page is left behind.
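
The conversion-table approach described above could be sketched in VBScript roughly as follows. This is only an illustration under some assumptions: the function names (ReadTextFile, ToPlainAscii, HasNonAscii) are made up for this example, the table covers just a handful of common characters, and the uploaded file is assumed to be in a known encoding such as windows-1252 or utf-8 that ADODB.Stream can decode.

```vbscript
' Read a text file into a Unicode string, decoding it with the given charset.
' The charset (e.g. "windows-1252" or "utf-8") is an assumption about the upload.
Function ReadTextFile(path, charset)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Open
    stream.Type = 2              ' 2 = adTypeText
    stream.Charset = charset
    stream.LoadFromFile path
    ReadTextFile = stream.ReadText
    stream.Close
    Set stream = Nothing
End Function

' Replace a small, hand-built table of non-ASCII characters with ASCII stand-ins.
' Extend the table as new characters show up in real data.
Function ToPlainAscii(text)
    Dim map, key, result
    Set map = CreateObject("Scripting.Dictionary")
    map.Add ChrW(&H2018), "'"      ' left single quote
    map.Add ChrW(&H2019), "'"      ' right single quote / fancy apostrophe
    map.Add ChrW(&H201C), """"     ' left double quote
    map.Add ChrW(&H201D), """"     ' right double quote
    map.Add ChrW(&H2013), "-"      ' en dash
    map.Add ChrW(&H2014), "--"     ' em dash
    map.Add ChrW(&H2026), "..."    ' ellipsis
    map.Add ChrW(&HA0), " "        ' non-breaking space
    map.Add ChrW(&HA9), "(C)"      ' copyright sign
    map.Add ChrW(&HAE), "(R)"      ' registered sign
    map.Add ChrW(&H2122), "(TM)"   ' trademark sign

    result = text
    For Each key In map.Keys
        ' One Replace call per table entry, applied to the whole string
        result = Replace(result, key, map(key))
    Next
    ToPlainAscii = result
End Function

' Report whether any character outside 0-127 is still present after conversion.
' AscW can return negative values for code units above 32767, hence the < 0 check.
Function HasNonAscii(text)
    Dim i, code
    HasNonAscii = False
    For i = 1 To Len(text)
        code = AscW(Mid(text, i, 1))
        If code < 0 Or code > 127 Then
            HasNonAscii = True
            Exit Function
        End If
    Next
End Function

' Usage (file name and charset are assumptions):
' Dim csvText
' csvText = ToPlainAscii(ReadTextFile(Server.MapPath("upload.csv"), "windows-1252"))
' If HasNonAscii(csvText) Then ... log it and extend the table
```

Because Replace works on the whole string for each table entry, the cost grows with the size of the table rather than requiring a hand-written loop over every character; characters not in the table are left untouched so they can be spotted and added later.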
 

Author Comment

by:bbdesign
ID: 24430895
Thanks for the advice!