I'm building a ColdFusion application that allows a user to upload a CSV file, which is then imported into a MySQL database.
The properties of the uploaded CSV file are as follows:
1) is comma-separated
2) uses quotation marks as text qualifiers
3) contains some empty columns
4) has no column headers
5) contains only a single row of data
6) has some columns that contain commas
7) is plain text and can easily be viewed in a text editor
I'm reading the file using CFFILE and would like to check it for illegal characters. More specifically, I only want to allow letters (both upper and lower case), numbers, punctuation, carriage returns and line feeds; in other words, the kinds of characters you would expect to find in a comma-separated CSV file.
There is one offending character that I constantly encounter in these uploaded CSV files. I'm not exactly sure what it is, but you can clearly see an example of it if you open the CSV file that I've attached to this post: it shows up as "VT" enclosed in a black circle. I need to alert the user to the presence of this unwanted character BEFORE they attempt to import the file into the database, because it always results in a ColdFusion error.
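Assuming the glyph is a text editor's rendering of the ASCII vertical tab control character (code 11; editors such as Notepad++ display control characters as small black badges with labels like "VT"), a minimal sketch for detecting that specific character is to search for `Chr(11)` directly. The error message wording below is only illustrative.

```cfml
<cffile action="read" file="C:\wwwroot\InspectionRequest_1.csv" variable="csvData">

<!--- Chr(11) is the ASCII vertical tab (VT) control character.
      Find() returns its 1-based position, or 0 if it is not present. --->
<cfset vtPos = Find(Chr(11), csvData)>

<cfif vtPos GT 0>
  <cfoutput>
    <p class="errortext">The file contains a vertical tab character at position #vtPos#. Please remove it and upload the file again.</p>
  </cfoutput>
</cfif>
```

Reporting the position makes it easier for the user to locate the character in the file before re-uploading.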
Currently, I am using the following code:
<cffile action="read" file="C:\wwwroot\InspectionRequest_1.csv" variable="csvData">
<!--- Flag the file if it contains any character outside this whitelist --->
<cfif refind("[^a-zA-Z0-9 ,.&'$()\-+*=/]",csvData)>
  <p class="errortext">AN ERROR HAS OCCURRED!!!</p>
</cfif>
How can I extend this regex so that it accepts all the characters you'd expect to find in a comma-separated CSV file, yet still rejects every other (illegal) character?
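One possible whitelist, sketched under the assumption that a valid file contains only printable ASCII plus tab, carriage return and line feed: the range ` -~` (space through tilde, codes 32 to 126) covers letters, digits, the space and all ASCII punctuation, including the double quotes used as qualifiers and characters such as `#`, `:`, `;`, `?` and `%` that the current pattern rejects. Tab, LF and CR are appended with `Chr()`, so no escape syntax is needed. Passing `true` as the fourth argument of `REFind` returns a structure whose `pos` array gives the location of the first offending character; the message text is illustrative.

```cfml
<cffile action="read" file="C:\wwwroot\InspectionRequest_1.csv" variable="csvData">

<!--- Allowed: printable ASCII (space through tilde) plus tab, LF and CR.
      The leading ^ negates the class, so the pattern matches any character NOT in that set. --->
<cfset illegalPattern = "[^ -~" & Chr(9) & Chr(10) & Chr(13) & "]">

<!--- With returnSubExpressions=true, REFind returns a struct with "pos" and "len" arrays;
      pos[1] is 0 when nothing matched. --->
<cfset result = REFind(illegalPattern, csvData, 1, true)>

<cfif result.pos[1] GT 0>
  <!--- Report the numeric code of the first illegal character (e.g. 11 for VT) --->
  <cfset badCode = Asc(Mid(csvData, result.pos[1], 1))>
  <cfoutput>
    <p class="errortext">The file contains an illegal character (code #badCode#) at position #result.pos[1]#. Please remove it and upload the file again.</p>
  </cfoutput>
</cfif>
```

Note that this range also rejects accented letters and "smart quotes" pasted from word processors, since they fall outside codes 32 to 126; if those should be accepted, the character class would need to be widened.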