Solved

Problem parsing tab delimted csv/txt file - some columns contains carriage returns and some columns are empty

Posted on 2006-11-03
11
3,499 Views
Last Modified: 2013-12-24
I've developed an app that allows a user to upload a TAB DELIMTED CSV file (of job listings), and which is then turned into a query object on the front end of his website (against which searches are conducted).  

I instructed the client to export his data as a TAB DELIMTED TEXT file, and to then re-name the file so that it has a CSV extension.  This is the file that gets uploaded.

When the file gets uploaded, I am using the ColdFusion Replace function to strip out quotation marks wherever they exist in the file, and then the modified data gets re-written to another, final CSV file.  What we end up with is a TAB DELIMTED CSV file that doesn't contain any quotation marks (which are sometimes automatically inserted as text qualifiers when there is a comma in the column data).  

Everything works fine when all of the following criteria are met:

1) All of the columns contain data
2) There are no carriage returns in any of the column data
3) There are no TAB characters in any of the column data

However, .. I've recently learned that the column data in the uploaded files will often contain carriage returns, and that there will often be columns that contain no data. Obviously, I need to re-think my approach and devise some kind of alternate solution for this.

So -- how can I work around this?  Is there some other, more compatible text-based file format that I should have the client export his data as? I suspect that using quotation marks as text quailfiers might actually be necessary in this case, though I don't want to be to hasty about making the switch just yet.  

Also, -- is there any way that I can replace empty columns with, .. say, ... 'N/A' prior to re-saving the data as a TAB-delimited CSV file?  

Lastly, -- how on earth should I handle those additional carriage returns in the column data?  If I use the replace function to strip them out -- then I would effectively corrupt the import file, since there would be no more expected carriage returns at the end of each row of data.  

Note that the file upload routines I already have in place could easily be adapted so that they "clean up" the file in whatever way that's necessary.  I just need some tips on how to handle the unwanted characters.

Any tips or suggestions would be greatly appreciated.  

Thanks!
- Yvan









0
Comment
Question by:IDEASDesign
  • 6
  • 5
11 Comments
 
LVL 24

Expert Comment

by:dgrafx
Comment Utility
You've gotten yourself into a bad situation.
Your problems need to be taken care of at the source (where the csv is created)
1) I think you should have your client export data (from where?) to an excel sheet instead of csv or txt.
2) You can then read the excel sheet and replace whatever is needed.
3) Inserting the data should then be fairly easy.
0
 

Author Comment

by:IDEASDesign
Comment Utility
Ok, .. that's what I've concluded as well - that I need to implement some kind of custom macro in Excel that would prepare the data for importing and then export it to the needed format.  The ideal Excel macro would do all of the following:

1) replace all tab characters in any of the columns with spaces
2) remove all quotes from any of the columns that contain them
3) export the data as a TAB DELIMTED CSV file, -- using quote symbols as text qualifiers, -- and that exports all of the empty columns as "" along with all the other columns that contains data

You wouldn't happen to know how to program excel macros, would you?  Because I think that's what I'll be needing at this point.  The above specifications would most certainly do the trick, I think.

Thanks!
- Yvan

P.S. I don't believe it's possible to read an XLS file via CHTTP like you are able to do with a CSV or TXT file.  Or am I misken about that?  If so, how would I extract the data from the XLS file (without having to set up a new datasource)?

0
 
LVL 24

Expert Comment

by:dgrafx
Comment Utility
When they create the xls I don't believe you need to do any processing on it - just make sure the columns are in place.
Then have them upload it via a form you create.
Then once you have the xls on the server you can:
<cfquery datasource="#Request.Datasource#" name="XLSData">
SELECT desired columns
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=#xlsfile#;HDR=YES', 'SELECT * FROM [Sheet1$]')
</cfquery>
#xlsfile# is the absolute path to xls sheet "D:\web\text.xls"
Where it says Sheet1 - this needs to be your sheet name in your file - Sheet1 is the default, but it can be renamed
This is using sql server
Your query XLSData is now available to do whatever with the data
In your output query you can then strip whatever or add whatever to get desired results - then insert to table.

If you're using access - I believe you can simply open access and click the "Open" icon selecting files of type "*.xls"
this will import your sheet into an access table where you can do whatever with the data
This would be a manual approach though.

I hope you're using sql server

0
 

Author Comment

by:IDEASDesign
Comment Utility
No, I am not using any kind of database at all.  I merely read the contents of the uploaded CSV file using CHTTP and then turn it into a query object using <cfquery dbtype="query">.  

The method you've described, it seems, would require that I first set up a datasource in CFAdmin.  Am I correct?  

If the setting up of a ColdFusion datasource is NOT needed, .. then will this approach that you've outlined work using ColdFusion MX 6.1?  Or does it only work using ColdFusion MX  7 ?

Also, why would interfacing with an XLS file be any more efficient than the method I'm using now?  Would it be more forgiving of the missing column data and of the illegal characters in the individual columns?  I am open to using this approach, .. but of course, I would prefer not to bother with it if it'll just put me right back where I started.

If it's necessary that I first set up a ColdFusion datasource for the Excel file, ... what's the process for doing that in CFAdmin?  In other words, .. which driver do I need to choose, and are there any special settings that I need to apply?

Thanks!
- Yvan

0
 
LVL 24

Accepted Solution

by:
dgrafx earned 500 total points
Comment Utility
Hi,
The reason you can't read a xls file as simply as a text file is because of it's internal formatting - it's a win32 app not just text.
And you can't read doing <cfquery dbtype="query"> either.

>>would require that I first set up a datasource in CFAdmin.  Am I correct?  
Yes

should work with any recent version of cf
doing in the fashion I described would yes be more efficient, for one thing - the method you're using now doesn't work.

the output will give you columns that may be null or have incorrect data, where you can then "fix"
the method you are using will not even give you a column if data there is null - as I'm sure you realized - and is one of your problems.

the clincher is the carriage returns within a column's value - when you are using a cr as delimiter - absolutely will not fly.

The concept though is not to setup a datasource for the excel file, but to read the excel file with sql server.
You do need to setup the sql server datasource and it doesn't matter what it's pointed to as long as it's a valid sql server db.
sql server then actually uses access drivers (as you see in the code) to read the file.

but since you don't use a datasource then why not use a custom tag.
Here is one that I have used in a case where sql server didn't have correct permissions to read an excel file and the host wouldn't make the changes.
CFX_Excel2Query
http://www.emerle.net/programming/display.cfm/t/cfx_excel2query
after thinking about it - this is really the way to go and you won't need to setup a datasource.
You will need to install this tag and if you are hosting where you don't have full control of the server - you'll need your host to set it up for you.

good luck
0
Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

 

Author Comment

by:IDEASDesign
Comment Utility
Ok, .. I've tried the method you outlined where you create an MSSQL datasource, set it up in CFadmin, and then use the sql drivers to query the uploaded xls file:

-----------------------------------------------------------

<cfset xlsfile = '#application.server_path#job_listings\joblistings.xls'>

<cfquery datasource="#application.datasource#" name="XLSData">
      SELECT JobIdentifier
      FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=#xlsfile#;HDR=YES', 'SELECT * FROM [Sheet1$]')
</cfquery>

<cfoutput query="XLSData">#JobIdentifier#<br></cfoutput>

-----------------------------------------------------------

However, .. I am getting this error, and I'm not sure why:

[Macromedia][SQLServer JDBC Driver][SQLServer]OLE DB provider 'Microsoft.Jet.OLEDB.4.0' reported an error. The provider did not give any information about the error.

VENDORERRORCODE          7399
SQLSTATE         HY000

I am open to experimenting with the custom tag yu highlighted (thank you for that), .. but want to explore the cost-free option first.  Any idea why I'm getting these errors? The mssql database that I've set up as a coldfusion datasource DOES exist, and I set it up in CFAdmin using the MSSQL driver.  

Thanks,
- Yvan


 



0
 
LVL 24

Expert Comment

by:dgrafx
Comment Utility
First - do you control this server?
i.e. can you modify the registry?
What account is sql server logged on as? - look in services / mssql$servername
0
 

Author Comment

by:IDEASDesign
Comment Utility
Yes, I have full admin access to the server , ... and yes I can modify the registry.

I don't know what you mean when you ask me to look in services / mssql$servername though.  

I am looking in services, and all I see listed is MSSQLSERVER and MSSQLServerADHelper

I believe, however, that sql server is logged on as 'sa', as that is what I used to configure my DSN in CFAdmin.

- Yvan


0
 

Author Comment

by:IDEASDesign
Comment Utility
Actually -- nevermind.  I just got approval from my employer for us to purchase the excel2qery custom tag.  Thanks so much for your help.  I am awarding you the 500 pts.

Cheers,
 - Yvan
0
 
LVL 24

Expert Comment

by:dgrafx
Comment Utility
Ok - when you first setup sql server - you can name it - you must not have so it appears as just MSSQLSERVER - no prob

sa should be good to go, but I did take a look and on our setup it's logged on as local system account.
if you need to look at that ever it would be in Services / MSSQLSERVER / log on tab

look for this registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Providers\SQLOLEDB\DisallowAdhocAccess

or search for DisallowAdhocAccess - set it's value to 0 wherever you find it.
try that
0
 
LVL 24

Expert Comment

by:dgrafx
Comment Utility
ah - I see you posted again and awarded me the points - thank you!
yes the tag works great and considering it's cost it's probably cheaper than the time it takes to get the "free" method setup.

good luck ...
0

Featured Post

Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

Join & Write a Comment

In our day to day coding, how many times have we come across a necessity to check whether a URL is a broken link or not? For those of you that answered countless and are using ColdFusion like myself, then this article is for you.  It will show yo…
When it comes to showing a 404 error page to your visitors, you do not want that generic page to show, and you especially do not want your hosting provider’s ad error page to show either. In this article, I will show you how to enable the custom 40…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
This video explains how to create simple products associated to Magento configurable product and offers fast way of their generation with Store Manager for Magento tool.

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now