Solved

# Problem parsing tab delimted csv/txt file - some columns contains carriage returns and some columns are empty

Posted on 2006-11-03
3,506 Views
I've developed an app that allows a user to upload a TAB DELIMTED CSV file (of job listings), and which is then turned into a query object on the front end of his website (against which searches are conducted).

I instructed the client to export his data as a TAB DELIMTED TEXT file, and to then re-name the file so that it has a CSV extension.  This is the file that gets uploaded.

When the file gets uploaded, I am using the ColdFusion Replace function to strip out quotation marks wherever they exist in the file, and then the modified data gets re-written to another, final CSV file.  What we end up with is a TAB DELIMTED CSV file that doesn't contain any quotation marks (which are sometimes automatically inserted as text qualifiers when there is a comma in the column data).

Everything works fine when all of the following criteria are met:

1) All of the columns contain data
2) There are no carriage returns in any of the column data
3) There are no TAB characters in any of the column data

However, .. I've recently learned that the column data in the uploaded files will often contain carriage returns, and that there will often be columns that contain no data. Obviously, I need to re-think my approach and devise some kind of alternate solution for this.

So -- how can I work around this?  Is there some other, more compatible text-based file format that I should have the client export his data as? I suspect that using quotation marks as text quailfiers might actually be necessary in this case, though I don't want to be to hasty about making the switch just yet.

Also, -- is there any way that I can replace empty columns with, .. say, ... 'N/A' prior to re-saving the data as a TAB-delimited CSV file?

Lastly, -- how on earth should I handle those additional carriage returns in the column data?  If I use the replace function to strip them out -- then I would effectively corrupt the import file, since there would be no more expected carriage returns at the end of each row of data.

Note that the file upload routines I already have in place could easily be adapted so that they "clean up" the file in whatever way that's necessary.  I just need some tips on how to handle the unwanted characters.

Any tips or suggestions would be greatly appreciated.

Thanks!
- Yvan

0
Question by:IDEASDesign
• 6
• 5

LVL 25

Expert Comment

ID: 17873015
You've gotten yourself into a bad situation.
Your problems need to be taken care of at the source (where the csv is created)
1) I think you should have your client export data (from where?) to an excel sheet instead of csv or txt.
2) You can then read the excel sheet and replace whatever is needed.
3) Inserting the data should then be fairly easy.
0

Author Comment

ID: 17873412
Ok, .. that's what I've concluded as well - that I need to implement some kind of custom macro in Excel that would prepare the data for importing and then export it to the needed format.  The ideal Excel macro would do all of the following:

1) replace all tab characters in any of the columns with spaces
2) remove all quotes from any of the columns that contain them
3) export the data as a TAB DELIMTED CSV file, -- using quote symbols as text qualifiers, -- and that exports all of the empty columns as "" along with all the other columns that contains data

You wouldn't happen to know how to program excel macros, would you?  Because I think that's what I'll be needing at this point.  The above specifications would most certainly do the trick, I think.

Thanks!
- Yvan

P.S. I don't believe it's possible to read an XLS file via CHTTP like you are able to do with a CSV or TXT file.  Or am I misken about that?  If so, how would I extract the data from the XLS file (without having to set up a new datasource)?

0

LVL 25

Expert Comment

ID: 17873504
When they create the xls I don't believe you need to do any processing on it - just make sure the columns are in place.
Then have them upload it via a form you create.
Then once you have the xls on the server you can:
<cfquery datasource="#Request.Datasource#" name="XLSData">
SELECT desired columns
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=#xlsfile#;HDR=YES', 'SELECT * FROM [Sheet1$]') </cfquery> #xlsfile# is the absolute path to xls sheet "D:\web\text.xls" Where it says Sheet1 - this needs to be your sheet name in your file - Sheet1 is the default, but it can be renamed This is using sql server Your query XLSData is now available to do whatever with the data In your output query you can then strip whatever or add whatever to get desired results - then insert to table. If you're using access - I believe you can simply open access and click the "Open" icon selecting files of type "*.xls" this will import your sheet into an access table where you can do whatever with the data This would be a manual approach though. I hope you're using sql server 0 Author Comment ID: 17873540 No, I am not using any kind of database at all. I merely read the contents of the uploaded CSV file using CHTTP and then turn it into a query object using <cfquery dbtype="query">. The method you've described, it seems, would require that I first set up a datasource in CFAdmin. Am I correct? If the setting up of a ColdFusion datasource is NOT needed, .. then will this approach that you've outlined work using ColdFusion MX 6.1? Or does it only work using ColdFusion MX 7 ? Also, why would interfacing with an XLS file be any more efficient than the method I'm using now? Would it be more forgiving of the missing column data and of the illegal characters in the individual columns? I am open to using this approach, .. but of course, I would prefer not to bother with it if it'll just put me right back where I started. If it's necessary that I first set up a ColdFusion datasource for the Excel file, ... what's the process for doing that in CFAdmin? In other words, .. which driver do I need to choose, and are there any special settings that I need to apply? Thanks! - Yvan 0 LVL 25 Accepted Solution dgrafx earned 500 total points ID: 17873657 Hi, The reason you can't read a xls file as simply as a text file is because of it's internal formatting - it's a win32 app not just text. And you can't read doing <cfquery dbtype="query"> either. >>would require that I first set up a datasource in CFAdmin. Am I correct? Yes should work with any recent version of cf doing in the fashion I described would yes be more efficient, for one thing - the method you're using now doesn't work. the output will give you columns that may be null or have incorrect data, where you can then "fix" the method you are using will not even give you a column if data there is null - as I'm sure you realized - and is one of your problems. the clincher is the carriage returns within a column's value - when you are using a cr as delimiter - absolutely will not fly. The concept though is not to setup a datasource for the excel file, but to read the excel file with sql server. You do need to setup the sql server datasource and it doesn't matter what it's pointed to as long as it's a valid sql server db. sql server then actually uses access drivers (as you see in the code) to read the file. but since you don't use a datasource then why not use a custom tag. Here is one that I have used in a case where sql server didn't have correct permissions to read an excel file and the host wouldn't make the changes. CFX_Excel2Query http://www.emerle.net/programming/display.cfm/t/cfx_excel2query after thinking about it - this is really the way to go and you won't need to setup a datasource. You will need to install this tag and if you are hosting where you don't have full control of the server - you'll need your host to set it up for you. good luck 0 Author Comment ID: 17881276 Ok, .. I've tried the method you outlined where you create an MSSQL datasource, set it up in CFadmin, and then use the sql drivers to query the uploaded xls file: ----------------------------------------------------------- <cfset xlsfile = '#application.server_path#job_listings\joblistings.xls'> <cfquery datasource="#application.datasource#" name="XLSData"> SELECT JobIdentifier FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=#xlsfile#;HDR=YES', 'SELECT * FROM [Sheet1$]')
</cfquery>

<cfoutput query="XLSData">#JobIdentifier#<br></cfoutput>

-----------------------------------------------------------

However, .. I am getting this error, and I'm not sure why:

[Macromedia][SQLServer JDBC Driver][SQLServer]OLE DB provider 'Microsoft.Jet.OLEDB.4.0' reported an error. The provider did not give any information about the error.

VENDORERRORCODE          7399
SQLSTATE         HY000

I am open to experimenting with the custom tag yu highlighted (thank you for that), .. but want to explore the cost-free option first.  Any idea why I'm getting these errors? The mssql database that I've set up as a coldfusion datasource DOES exist, and I set it up in CFAdmin using the MSSQL driver.

Thanks,
- Yvan

0

LVL 25

Expert Comment

ID: 17881416
First - do you control this server?
i.e. can you modify the registry?
What account is sql server logged on as? - look in services / mssql$servername 0 Author Comment ID: 17881477 Yes, I have full admin access to the server , ... and yes I can modify the registry. I don't know what you mean when you ask me to look in services / mssql$servername though.

I am looking in services, and all I see listed is MSSQLSERVER and MSSQLServerADHelper

I believe, however, that sql server is logged on as 'sa', as that is what I used to configure my DSN in CFAdmin.

- Yvan

0

Author Comment

ID: 17881701
Actually -- nevermind.  I just got approval from my employer for us to purchase the excel2qery custom tag.  Thanks so much for your help.  I am awarding you the 500 pts.

Cheers,
- Yvan
0

LVL 25

Expert Comment

ID: 17881747
Ok - when you first setup sql server - you can name it - you must not have so it appears as just MSSQLSERVER - no prob

sa should be good to go, but I did take a look and on our setup it's logged on as local system account.
if you need to look at that ever it would be in Services / MSSQLSERVER / log on tab

look for this registry key:

or search for DisallowAdhocAccess - set it's value to 0 wherever you find it.
try that
0

LVL 25

Expert Comment

ID: 17881776
ah - I see you posted again and awarded me the points - thank you!
yes the tag works great and considering it's cost it's probably cheaper than the time it takes to get the "free" method setup.

good luck ...
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This article explores the design of a cache system that can improve the performance of a web site or web application.  The assumption is that the web site has many more “read” operations than “write” operations (this is commonly the ca…
Meet the world's only “Transparent Cloud™” from Superb Internet Corporation. Now, you can experience firsthand a cloud platform that consistently outperforms Amazon Web Services (AWS), IBM’s Softlayer, and Microsoft’s Azure when it comes to CPU and …
Microsoft Active Directory, the widely used IT infrastructure, is known for its high risk of credential theft. The best way to test your Active Directory’s vulnerabilities to pass-the-ticket, pass-the-hash, privilege escalation, and malware attacks …
In an interesting question (https://www.experts-exchange.com/questions/29008360/) here at Experts Exchange, a member asked how to split a single image into multiple images. The primary usage for this is to place many photographs on a flatbed scanner…