Problem parsing tab-delimited csv/txt file - some columns contain carriage returns and some columns are empty
Posted on 2006-11-03
I've developed an app that allows a user to upload a TAB DELIMITED CSV file (of job listings), which is then turned into a query object on the front end of his website (against which searches are conducted).
I instructed the client to export his data as a TAB DELIMITED TEXT file, and to then rename the file so that it has a CSV extension. This is the file that gets uploaded.
When the file gets uploaded, I use the ColdFusion Replace function to strip out quotation marks wherever they exist in the file, and the modified data then gets re-written to another, final CSV file. What we end up with is a TAB DELIMITED CSV file that doesn't contain any quotation marks (which are sometimes automatically inserted as text qualifiers when there is a comma in the column data).
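To illustrate, the current cleanup step amounts to nothing more than a global quote-removal pass. Here's a minimal sketch of that logic in Python (the function name and sample data are just for illustration; my actual ColdFusion code is a single Replace call with the 'ALL' scope):

```python
# Sketch of the current cleanup: strip every quotation mark from the
# uploaded tab-delimited data before re-writing it to the final CSV.
def strip_quotes(file_data: str) -> str:
    return file_data.replace('"', '')

# Example row where the export wrapped a comma-containing column in quotes:
raw = 'Job Title\t"Sales, Senior"\tNew York\n'
print(strip_quotes(raw))  # Job Title	Sales, Senior	New York
```

This works fine as long as no column legitimately needs those quotes -- which is exactly the assumption that's now breaking down.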
Everything works fine when all of the following criteria are met:
1) All of the columns contain data
2) There are no carriage returns in any of the column data
3) There are no TAB characters in any of the column data
However, I've recently learned that the column data in the uploaded files will often contain carriage returns, and that some columns will often contain no data at all. Obviously, I need to re-think my approach and devise some kind of alternate solution for this.
So -- how can I work around this? Is there some other, more compatible text-based file format that I should have the client export his data as? I suspect that using quotation marks as text qualifiers might actually be necessary in this case, though I don't want to be too hasty about making the switch just yet.
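For what it's worth, here's a sketch of what keeping the quotes would buy me, using Python's standard csv module purely to demonstrate the idea (ColdFusion would need an equivalent quote-aware parser rather than a simple list loop): with tab as the delimiter and quotes as text qualifiers, embedded commas, tabs, empty columns, and even carriage returns inside a column survive a round trip.

```python
import csv
import io

# Hypothetical job-listing rows: one column contains an embedded
# newline, another is empty.
rows = [['Web Developer', 'Duties include:\nHTML, CSS', 'Boston'],
        ['Manager', '', 'New York']]

# Write tab-delimited with quotes as text qualifiers...
buf = io.StringIO()
writer = csv.writer(buf, delimiter='\t', quoting=csv.QUOTE_ALL)
writer.writerows(rows)

# ...then parse it back; the quoted newline is preserved intact.
parsed = list(csv.reader(io.StringIO(buf.getvalue()), delimiter='\t'))
print(parsed == rows)  # True
```

That's the trade-off I'm weighing: a quote-aware parser is more work up front, but it makes the carriage-return problem disappear entirely.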
Also -- is there any way that I can replace empty columns with, say, 'N/A' prior to re-saving the data as a TAB-delimited CSV file?
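The kind of substitution I have in mind would look something like this (a Python sketch of the logic only -- function and placeholder names are mine, and it assumes each row is already one logical record per line):

```python
# Split a tab-delimited row into columns and substitute a placeholder
# for any column that is empty or whitespace-only.
def fill_empty_columns(row: str, placeholder: str = 'N/A') -> str:
    fields = row.split('\t')
    return '\t'.join(f if f.strip() else placeholder for f in fields)

print(fill_empty_columns('Manager\t\tBoston'))  # Manager	N/A	Boston
```

One wrinkle I'm aware of: ColdFusion's list functions skip empty list elements by default, which is presumably why the empty columns are tripping up my current import loop in the first place.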
Lastly -- how on earth should I handle those additional carriage returns in the column data? If I use the Replace function to strip them out, I would effectively corrupt the import file, since there would no longer be the expected carriage returns at the end of each row of data.
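One repair strategy I've been toying with (just a sketch, not tested against the real export files): since every complete record should contain a fixed number of TAB characters, I could accumulate physical lines until that count is reached -- a carriage return inside a column leaves a line "short" on tabs, so it must be a continuation of the previous line.

```python
# columns - 1; would need to match the real file layout
EXPECTED_TABS = 2

# Re-join physical lines into logical records by counting tabs:
# a line with too few tabs is a continuation of the record in progress.
def rejoin_rows(text: str, expected_tabs: int = EXPECTED_TABS):
    rows, buffer = [], ''
    for line in text.splitlines():
        buffer = line if not buffer else buffer + ' ' + line
        if buffer.count('\t') >= expected_tabs:
            rows.append(buffer)
            buffer = ''
    if buffer:
        rows.append(buffer)  # trailing partial record, if any
    return rows

# A record whose second column was split across two lines:
broken = 'Title\tDescription line 1\nline 2\tCity\nNext\tDesc\tTown\n'
for row in rejoin_rows(broken):
    print(row)
```

This only holds if the column data never contains stray TAB characters of its own, which is criterion 3 above -- so it may be fragile.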
Note that the file upload routines I already have in place could easily be adapted to "clean up" the file in whatever way is necessary. I just need some tips on how to handle the unwanted characters.
Any tips or suggestions would be greatly appreciated.