Problem parsing tab delimted csv/txt file - some columns contains carriage returns and some columns are empty

Posted on 2006-11-03
Last Modified: 2013-12-24
I've developed an app that allows a user to upload a TAB DELIMTED CSV file (of job listings), and which is then turned into a query object on the front end of his website (against which searches are conducted).  

I instructed the client to export his data as a TAB DELIMTED TEXT file, and to then re-name the file so that it has a CSV extension.  This is the file that gets uploaded.

When the file gets uploaded, I am using the ColdFusion Replace function to strip out quotation marks wherever they exist in the file, and then the modified data gets re-written to another, final CSV file.  What we end up with is a TAB DELIMTED CSV file that doesn't contain any quotation marks (which are sometimes automatically inserted as text qualifiers when there is a comma in the column data).  

Everything works fine when all of the following criteria are met:

1) All of the columns contain data
2) There are no carriage returns in any of the column data
3) There are no TAB characters in any of the column data

However, .. I've recently learned that the column data in the uploaded files will often contain carriage returns, and that there will often be columns that contain no data. Obviously, I need to re-think my approach and devise some kind of alternate solution for this.

So -- how can I work around this?  Is there some other, more compatible text-based file format that I should have the client export his data as? I suspect that using quotation marks as text quailfiers might actually be necessary in this case, though I don't want to be to hasty about making the switch just yet.  

Also, -- is there any way that I can replace empty columns with, .. say, ... 'N/A' prior to re-saving the data as a TAB-delimited CSV file?  

Lastly, -- how on earth should I handle those additional carriage returns in the column data?  If I use the replace function to strip them out -- then I would effectively corrupt the import file, since there would be no more expected carriage returns at the end of each row of data.  

Note that the file upload routines I already have in place could easily be adapted so that they "clean up" the file in whatever way that's necessary.  I just need some tips on how to handle the unwanted characters.

Any tips or suggestions would be greatly appreciated.  

- Yvan

Question by:IDEASDesign
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
LVL 25

Expert Comment

ID: 17873015
You've gotten yourself into a bad situation.
Your problems need to be taken care of at the source (where the csv is created)
1) I think you should have your client export data (from where?) to an excel sheet instead of csv or txt.
2) You can then read the excel sheet and replace whatever is needed.
3) Inserting the data should then be fairly easy.

Author Comment

ID: 17873412
Ok, .. that's what I've concluded as well - that I need to implement some kind of custom macro in Excel that would prepare the data for importing and then export it to the needed format.  The ideal Excel macro would do all of the following:

1) replace all tab characters in any of the columns with spaces
2) remove all quotes from any of the columns that contain them
3) export the data as a TAB DELIMTED CSV file, -- using quote symbols as text qualifiers, -- and that exports all of the empty columns as "" along with all the other columns that contains data

You wouldn't happen to know how to program excel macros, would you?  Because I think that's what I'll be needing at this point.  The above specifications would most certainly do the trick, I think.

- Yvan

P.S. I don't believe it's possible to read an XLS file via CHTTP like you are able to do with a CSV or TXT file.  Or am I misken about that?  If so, how would I extract the data from the XLS file (without having to set up a new datasource)?

LVL 25

Expert Comment

ID: 17873504
When they create the xls I don't believe you need to do any processing on it - just make sure the columns are in place.
Then have them upload it via a form you create.
Then once you have the xls on the server you can:
<cfquery datasource="#Request.Datasource#" name="XLSData">
SELECT desired columns
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=#xlsfile#;HDR=YES', 'SELECT * FROM [Sheet1$]')
#xlsfile# is the absolute path to xls sheet "D:\web\text.xls"
Where it says Sheet1 - this needs to be your sheet name in your file - Sheet1 is the default, but it can be renamed
This is using sql server
Your query XLSData is now available to do whatever with the data
In your output query you can then strip whatever or add whatever to get desired results - then insert to table.

If you're using access - I believe you can simply open access and click the "Open" icon selecting files of type "*.xls"
this will import your sheet into an access table where you can do whatever with the data
This would be a manual approach though.

I hope you're using sql server

Simple, centralized multimedia control

Watch and learn to see how ATEN provided an easy and effective way for three jointly-owned pubs to control the 60 televisions located across their three venues utilizing the ATEN Control System, Modular Matrix Switch and HDBaseT extenders.


Author Comment

ID: 17873540
No, I am not using any kind of database at all.  I merely read the contents of the uploaded CSV file using CHTTP and then turn it into a query object using <cfquery dbtype="query">.  

The method you've described, it seems, would require that I first set up a datasource in CFAdmin.  Am I correct?  

If the setting up of a ColdFusion datasource is NOT needed, .. then will this approach that you've outlined work using ColdFusion MX 6.1?  Or does it only work using ColdFusion MX  7 ?

Also, why would interfacing with an XLS file be any more efficient than the method I'm using now?  Would it be more forgiving of the missing column data and of the illegal characters in the individual columns?  I am open to using this approach, .. but of course, I would prefer not to bother with it if it'll just put me right back where I started.

If it's necessary that I first set up a ColdFusion datasource for the Excel file, ... what's the process for doing that in CFAdmin?  In other words, .. which driver do I need to choose, and are there any special settings that I need to apply?

- Yvan

LVL 25

Accepted Solution

dgrafx earned 500 total points
ID: 17873657
The reason you can't read a xls file as simply as a text file is because of it's internal formatting - it's a win32 app not just text.
And you can't read doing <cfquery dbtype="query"> either.

>>would require that I first set up a datasource in CFAdmin.  Am I correct?  

should work with any recent version of cf
doing in the fashion I described would yes be more efficient, for one thing - the method you're using now doesn't work.

the output will give you columns that may be null or have incorrect data, where you can then "fix"
the method you are using will not even give you a column if data there is null - as I'm sure you realized - and is one of your problems.

the clincher is the carriage returns within a column's value - when you are using a cr as delimiter - absolutely will not fly.

The concept though is not to setup a datasource for the excel file, but to read the excel file with sql server.
You do need to setup the sql server datasource and it doesn't matter what it's pointed to as long as it's a valid sql server db.
sql server then actually uses access drivers (as you see in the code) to read the file.

but since you don't use a datasource then why not use a custom tag.
Here is one that I have used in a case where sql server didn't have correct permissions to read an excel file and the host wouldn't make the changes.
after thinking about it - this is really the way to go and you won't need to setup a datasource.
You will need to install this tag and if you are hosting where you don't have full control of the server - you'll need your host to set it up for you.

good luck

Author Comment

ID: 17881276
Ok, .. I've tried the method you outlined where you create an MSSQL datasource, set it up in CFadmin, and then use the sql drivers to query the uploaded xls file:


<cfset xlsfile = '#application.server_path#job_listings\joblistings.xls'>

<cfquery datasource="#application.datasource#" name="XLSData">
      SELECT JobIdentifier
      FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=#xlsfile#;HDR=YES', 'SELECT * FROM [Sheet1$]')

<cfoutput query="XLSData">#JobIdentifier#<br></cfoutput>


However, .. I am getting this error, and I'm not sure why:

[Macromedia][SQLServer JDBC Driver][SQLServer]OLE DB provider 'Microsoft.Jet.OLEDB.4.0' reported an error. The provider did not give any information about the error.

SQLSTATE         HY000

I am open to experimenting with the custom tag yu highlighted (thank you for that), .. but want to explore the cost-free option first.  Any idea why I'm getting these errors? The mssql database that I've set up as a coldfusion datasource DOES exist, and I set it up in CFAdmin using the MSSQL driver.  

- Yvan


LVL 25

Expert Comment

ID: 17881416
First - do you control this server?
i.e. can you modify the registry?
What account is sql server logged on as? - look in services / mssql$servername

Author Comment

ID: 17881477
Yes, I have full admin access to the server , ... and yes I can modify the registry.

I don't know what you mean when you ask me to look in services / mssql$servername though.  

I am looking in services, and all I see listed is MSSQLSERVER and MSSQLServerADHelper

I believe, however, that sql server is logged on as 'sa', as that is what I used to configure my DSN in CFAdmin.

- Yvan


Author Comment

ID: 17881701
Actually -- nevermind.  I just got approval from my employer for us to purchase the excel2qery custom tag.  Thanks so much for your help.  I am awarding you the 500 pts.

 - Yvan
LVL 25

Expert Comment

ID: 17881747
Ok - when you first setup sql server - you can name it - you must not have so it appears as just MSSQLSERVER - no prob

sa should be good to go, but I did take a look and on our setup it's logged on as local system account.
if you need to look at that ever it would be in Services / MSSQLSERVER / log on tab

look for this registry key:

or search for DisallowAdhocAccess - set it's value to 0 wherever you find it.
try that
LVL 25

Expert Comment

ID: 17881776
ah - I see you posted again and awarded me the points - thank you!
yes the tag works great and considering it's cost it's probably cheaper than the time it takes to get the "free" method setup.

good luck ...

Featured Post

Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Article by: kevp75
Hey folks, 'bout time for me to come around with a little tip. Thanks to IIS 7.5 Extensions and Microsoft (well... really Windows 8, and IIS 8 I guess...), we can now prime our Application Pools, when IIS starts. Now, though it would be nice t…
Lease-to-own eliminates the expenditure of hardware replacement and allows you to pay off the server over time. Usually, this is much cheaper than leasing servers. Think of lease-to-own as credit without interest.
If you’ve ever visited a web page and noticed a cool font that you really liked the look of, but couldn’t figure out which font it was so that you could use it for your own work, then this video is for you! In this Micro Tutorial, you'll learn yo…
Do you want to know how to make a graph with Microsoft Access? First, create a query with the data for the chart. Then make a blank form and add a chart control. This video also shows how to change what data is displayed on the graph as well as form…

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question