Solved

Detecting illegal characters in CSV file using a ColdFusion Regular Expression

Posted on 2014-03-12
4
972 Views
Last Modified: 2014-03-12
I'm building a ColdFusion Application that allows a user to upload a CSV file, which then gets imported into a MySQL database.  

The properties of the uploaded CSV file are as follows:

1) is comma separated
2) has quotation mark qualifiers
3) contains some empty columns
4) has no column headers
5) contains only a single row of data
6) contains columns that sometimes have commas in them
7) is plain text and can be viewed easily using a text editor

I'm reading the file using CFFILE, and would like to check it for illegal characters.  More specifically, I only want to allow for letters (both upper and lower case), numbers, punctuation, carriage returns and line feeds -- you know -- the kinds of characters you would expect to find in a comma separated CSV file.  

There is one offending character that I constantly encounter in these uploaded CSV files.  I'm not exactly sure what it is, but you can clearly see an example of it you refer to the CSV file that I've attached to this post.  It's a "VT" that's enclosed in a black circle, .. and I need to alert the user regarding the presence of this unwanted character in the CSV file BEFORE they attempt to import it into the database ... since it always results in a ColdFusion error.  

Currently, I am using the following code:

<cffile action="read" file="C:\wwwroot\InspectionRequest_1.csv" variable="csvData">
					
<cfif refind("[^a-zA-Z0-9 ,.&'$()\-+*=/]",csvData)>
					
     <p class="errortext">AN ERROR HAS OCCURRED!!!</p>
					
</cfif>

Open in new window


How can I extend this regex so that it's forgiving of all the types of characters that you'd expect to find in a comma separated CSV file -- yet unforgiving of all other illegal characters?  

Thanks,
- Yvan
InspectionRequest-1.csv
0
Comment
Question by:egoselfaxis
  • 2
  • 2
4 Comments
 
LVL 34

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 39924129
0x0B is the vertical tab.

With that out of the way, I would do it this way:
\w - all word characters and numbers. Will accept characters from other languages
\s - all space characters, like space, tab, etc
"' - all text fields will be enclosed in "" or ''
,.?!:; - punctuation signs
@~#$%^&*(){}-+=\[\] - accepted symbols, add to the list as needed. you need to escape []

For a test expression like this:
[\w\s"',.?!:;@~#$%^&*(){}-+=\[\]]

Open in new window


HTH,
Dan
0
 

Author Closing Comment

by:egoselfaxis
ID: 39924410
Thank you!  

Since you identified the mysterious character for me, ..  I was actually able to simplify my conditional statement as follows:

<cfif csvData CONTAINS chr(11)>

<p>Error!</p>

</cfif>

Cheers!
- Yvan
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39924426
Glad I could help!

Just as a curiosity: you're sure that is the only illegal character?
0
 

Author Comment

by:egoselfaxis
ID: 39924449
Now, I am not 100% sure just yet .. but this particular vertical tab character has been known to be particularly problematic for us for the last couple of years, so we've decided to handle each offending character on a case-by-case basis until we've identified all of them.

Thanks again!
- yg
0

Featured Post

What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

Join & Write a Comment

Sometimes databases have MILLIONS of records and we need a way to quickly query that table to return the results me need. Sure you could use CFQUERY but it takes too long when there are millions of records. That is why SOLR was invented. Please …
This article descibes how to create a connection between Excel and SAP and how to move data from Excel to SAP or the other way around.
This Micro Tutorial demonstrates how to create Excel charts: column, area, line, bar, and scatter charts. Formatting tips are provided as well.
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now