Converting non-ASCII characters to plain ASCII text in ASP/VBScript

Posted on 2009-05-19
Last Modified: 2012-05-07
I have an ASP web app that takes a user-submitted CSV file and populates a SQL database table with its data. However, some users export Excel files to CSV, and the result isn't always plain text: for example, typographer's quotes instead of a plain ", curly apostrophes instead of ', etc. So I end up with garbage characters in the table.
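One common way such "garbage" appears (an assumption; the question doesn't say which encodings are involved) is that the file is written in one encoding and read back in another. The Python sketch below, used only for illustration since the app itself is classic ASP, shows UTF-8 bytes for a curly apostrophe and an opening curly quote being misread as Windows-1252:

```python
# Hypothetical illustration: text stored as UTF-8 but decoded as Windows-1252.
original = "it\u2019s \u201cfine"          # curly apostrophe and opening curly quote
garbled = original.encode("utf-8").decode("cp1252")
# garbled now displays as: itâ€™s â€œfine

# Because every byte involved is defined in cp1252, the damage is reversible
# here, but only if you know which two encodings were mixed up.
repaired = garbled.encode("cp1252").decode("utf-8")
```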

Is there some way I can reformat the data "on the fly" so the garbage characters are converted properly? I don't want to simply delete them; they need to be intelligently converted to their corresponding ASCII characters whenever possible. I'm sure there are some characters that can't be converted. In BBEdit (my text editor), a convert-to-ASCII command is smart enough to turn © into (C), ® into (R), etc. That would be really nice, though I'm not sure if it's practical in this case.

I also don't know whether this needs to be done character by character (which might slow the app down considerably) or whether there is some global way to do the conversion.
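For comparison, one "global" approach some languages offer is Unicode compatibility normalization (NFKD) followed by dropping whatever is still non-ASCII. The Python sketch below uses the standard `unicodedata` module; classic VBScript is assumed to have no built-in equivalent. It shows why this alone does not meet the requirement above: accented letters survive as their base letter, but ©, ® and curly quotes have no decomposition and are silently deleted.

```python
import unicodedata

def strip_to_ascii(text: str) -> str:
    # NFKD splits "é" into "e" + a combining accent and expands some
    # compatibility characters (e.g. the trademark sign becomes "TM").
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything still outside ASCII (combining marks, ©, curly quotes) is dropped.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_to_ascii("caf\u00e9"))        # cafe
print(strip_to_ascii("\u00a9 2009"))      # " 2009" (the © is lost, not converted)
```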

I found this script, but can't seem to get it to work:

http://www.robvanderwoude.com/vbstech_files_utf8.php

Would appreciate any advice. Thanks!
Question by:bbdesign

Accepted Solution

by: _Stilgar_
The code you linked to does the opposite of what you're after. You are trying to take UTF-8 or non-standard ANSI codes and transform them into the limited 128-character ASCII set, which is common to all ANSI code pages. That is not a simple encoding conversion: © exists in Windows-1252 and the other common Western code pages, as well as in Unicode, yet you want to convert it to (C), and so on.
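To make the distinction concrete, the Python sketch below (an illustration only; the codec names are Python's) checks which encodings can represent a character at all. © encodes fine in Windows-1252 and Latin-1 but not in ASCII, so no encoding conversion will ever produce "(C)"; that step has to be a deliberate substitution.

```python
def encodings_containing(ch, codecs=("cp1252", "latin-1", "ascii")):
    """Return the names of the codecs in which `ch` can be encoded."""
    found = []
    for name in codecs:
        try:
            ch.encode(name)
            found.append(name)
        except UnicodeEncodeError:
            pass  # character has no code point in this encoding
    return found

print(encodings_containing("\u00a9"))  # ['cp1252', 'latin-1']
print(encodings_containing("\u201c"))  # ['cp1252'] (curly quote is not in Latin-1)
```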

As far as I know, this is only possible by building a conversion table and replacing the characters one by one. Use VBScript's AscW and ChrW functions, which work directly with Unicode code points, so that the text is handled as Unicode (and non-Unicode input is compared correctly), then replace the characters one by one, correcting the table and adding to it as you go. There are probably published tables listing the commonly used characters, which you could start from.
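A minimal sketch of the conversion-table approach, written in Python for illustration. The table entries and the `?` placeholder for unmapped characters are assumptions to be extended as new characters turn up. It also bears on the performance worry in the question: the work is a single pass over the string, so its cost grows linearly with the file size.

```python
# Hypothetical conversion table; add entries as new characters appear in uploads.
ASCII_REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",    # single typographic quotes
    "\u201c": '"', "\u201d": '"',    # double typographic quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u2026": "...",                 # horizontal ellipsis
    "\u00a9": "(C)", "\u00ae": "(R)", "\u2122": "(TM)",
    "\u00a0": " ",                   # non-breaking space
}
TABLE = str.maketrans(ASCII_REPLACEMENTS)

def to_ascii(text, placeholder="?"):
    # First apply the table, then replace anything still non-ASCII
    # with a visible placeholder instead of silently deleting it.
    converted = text.translate(TABLE)
    return "".join(ch if ord(ch) < 128 else placeholder for ch in converted)

print(to_ascii("\u201cHello\u201d \u00a9 2009 \u2014 it\u2019s caf\u00e9"))
# "Hello" (C) 2009 -- it's caf?
```

In classic ASP the same idea would presumably be a Scripting.Dictionary (or a pair of arrays) iterated with Replace and ChrW, for example Replace(text, ChrW(&H201C), """"); that port is an assumption, not something the thread confirms.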

Since Unicode/ANSI handling in ASP/VBScript is limited at best, I'd even consider building a small COM object to do this job, to make sure no code page is left behind.
 

Author Comment

by:bbdesign
Thanks for the advice!