Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Detecting illegal characters in CSV file using a ColdFusion Regular Expression

Posted on 2014-03-12
4
Medium Priority
?
1,117 Views
Last Modified: 2014-03-12
I'm building a ColdFusion Application that allows a user to upload a CSV file, which then gets imported into a MySQL database.  

The properties of the uploaded CSV file are as follows:

1) is comma separated
2) has quotation mark qualifiers
3) contains some empty columns
4) has no column headers
5) contains only a single row of data
6) contains columns that sometimes have commas in them
7) is plain text and can be viewed easily using a text editor

I'm reading the file using CFFILE, and would like to check it for illegal characters.  More specifically, I only want to allow for letters (both upper and lower case), numbers, punctuation, carriage returns and line feeds -- you know -- the kinds of characters you would expect to find in a comma separated CSV file.  

There is one offending character that I constantly encounter in these uploaded CSV files.  I'm not exactly sure what it is, but you can clearly see an example of it you refer to the CSV file that I've attached to this post.  It's a "VT" that's enclosed in a black circle, .. and I need to alert the user regarding the presence of this unwanted character in the CSV file BEFORE they attempt to import it into the database ... since it always results in a ColdFusion error.  

Currently, I am using the following code:

<cffile action="read" file="C:\wwwroot\InspectionRequest_1.csv" variable="csvData">
					
<cfif refind("[^a-zA-Z0-9 ,.&'$()\-+*=/]",csvData)>
					
     <p class="errortext">AN ERROR HAS OCCURRED!!!</p>
					
</cfif>

Open in new window


How can I extend this regex so that it's forgiving of all the types of characters that you'd expect to find in a comma separated CSV file -- yet unforgiving of all other illegal characters?  

Thanks,
- Yvan
InspectionRequest-1.csv
0
Comment
Question by:egoselfaxis
  • 2
  • 2
4 Comments
 
LVL 35

Accepted Solution

by:
Dan Craciun earned 2000 total points
ID: 39924129
0x0B is the vertical tab.

With that out of the way, I would do it this way:
\w - all word characters and numbers. Will accept characters from other languages
\s - all space characters, like space, tab, etc
"' - all text fields will be enclosed in "" or ''
,.?!:; - punctuation signs
@~#$%^&*(){}-+=\[\] - accepted symbols, add to the list as needed. you need to escape []

For a test expression like this:
[\w\s"',.?!:;@~#$%^&*(){}-+=\[\]]

Open in new window


HTH,
Dan
0
 

Author Closing Comment

by:egoselfaxis
ID: 39924410
Thank you!  

Since you identified the mysterious character for me, ..  I was actually able to simplify my conditional statement as follows:

<cfif csvData CONTAINS chr(11)>

<p>Error!</p>

</cfif>

Cheers!
- Yvan
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39924426
Glad I could help!

Just as a curiosity: you're sure that is the only illegal character?
0
 

Author Comment

by:egoselfaxis
ID: 39924449
Now, I am not 100% sure just yet .. but this particular vertical tab character has been known to be particularly problematic for us for the last couple of years, so we've decided to handle each offending character on a case-by-case basis until we've identified all of them.

Thanks again!
- yg
0

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When you see single cell contains number and text, and you have to get any date out of it seems like cracking our heads.
You need to know the location of the Office templates folder, so that when you create new templates, they are saved to that location, and thus are available for selection when creating new documents.  The steps to find the Templates folder path are …
The viewer will learn how to create two correlated normally distributed random variables in Excel, use a normal distribution to simulate the return on different levels of investment in each of the two funds over a period of ten years, and, create a …
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.

824 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question