Solved

Problem parsing tab delimted csv/txt file - some columns contains carriage returns and some columns are empty

Posted on 2006-11-03
11
3,504 Views
Last Modified: 2013-12-24
I've developed an app that allows a user to upload a TAB DELIMTED CSV file (of job listings), and which is then turned into a query object on the front end of his website (against which searches are conducted).  

I instructed the client to export his data as a TAB DELIMTED TEXT file, and to then re-name the file so that it has a CSV extension.  This is the file that gets uploaded.

When the file gets uploaded, I am using the ColdFusion Replace function to strip out quotation marks wherever they exist in the file, and then the modified data gets re-written to another, final CSV file.  What we end up with is a TAB DELIMTED CSV file that doesn't contain any quotation marks (which are sometimes automatically inserted as text qualifiers when there is a comma in the column data).  

Everything works fine when all of the following criteria are met:

1) All of the columns contain data
2) There are no carriage returns in any of the column data
3) There are no TAB characters in any of the column data

However, .. I've recently learned that the column data in the uploaded files will often contain carriage returns, and that there will often be columns that contain no data. Obviously, I need to re-think my approach and devise some kind of alternate solution for this.

So -- how can I work around this?  Is there some other, more compatible text-based file format that I should have the client export his data as? I suspect that using quotation marks as text quailfiers might actually be necessary in this case, though I don't want to be to hasty about making the switch just yet.  

Also, -- is there any way that I can replace empty columns with, .. say, ... 'N/A' prior to re-saving the data as a TAB-delimited CSV file?  

Lastly, -- how on earth should I handle those additional carriage returns in the column data?  If I use the replace function to strip them out -- then I would effectively corrupt the import file, since there would be no more expected carriage returns at the end of each row of data.  

Note that the file upload routines I already have in place could easily be adapted so that they "clean up" the file in whatever way that's necessary.  I just need some tips on how to handle the unwanted characters.

Any tips or suggestions would be greatly appreciated.  

Thanks!
- Yvan









0
Comment
Question by:IDEASDesign
  • 6
  • 5
11 Comments
 
LVL 25

Expert Comment

by:dgrafx
ID: 17873015
You've gotten yourself into a bad situation.
Your problems need to be taken care of at the source (where the csv is created)
1) I think you should have your client export data (from where?) to an excel sheet instead of csv or txt.
2) You can then read the excel sheet and replace whatever is needed.
3) Inserting the data should then be fairly easy.
0
 

Author Comment

by:IDEASDesign
ID: 17873412
Ok, .. that's what I've concluded as well - that I need to implement some kind of custom macro in Excel that would prepare the data for importing and then export it to the needed format.  The ideal Excel macro would do all of the following:

1) replace all tab characters in any of the columns with spaces
2) remove all quotes from any of the columns that contain them
3) export the data as a TAB DELIMTED CSV file, -- using quote symbols as text qualifiers, -- and that exports all of the empty columns as "" along with all the other columns that contains data

You wouldn't happen to know how to program excel macros, would you?  Because I think that's what I'll be needing at this point.  The above specifications would most certainly do the trick, I think.

Thanks!
- Yvan

P.S. I don't believe it's possible to read an XLS file via CHTTP like you are able to do with a CSV or TXT file.  Or am I misken about that?  If so, how would I extract the data from the XLS file (without having to set up a new datasource)?

0
 
LVL 25

Expert Comment

by:dgrafx
ID: 17873504
When they create the xls I don't believe you need to do any processing on it - just make sure the columns are in place.
Then have them upload it via a form you create.
Then once you have the xls on the server you can:
<cfquery datasource="#Request.Datasource#" name="XLSData">
SELECT desired columns
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=#xlsfile#;HDR=YES', 'SELECT * FROM [Sheet1$]')
</cfquery>
#xlsfile# is the absolute path to xls sheet "D:\web\text.xls"
Where it says Sheet1 - this needs to be your sheet name in your file - Sheet1 is the default, but it can be renamed
This is using sql server
Your query XLSData is now available to do whatever with the data
In your output query you can then strip whatever or add whatever to get desired results - then insert to table.

If you're using access - I believe you can simply open access and click the "Open" icon selecting files of type "*.xls"
this will import your sheet into an access table where you can do whatever with the data
This would be a manual approach though.

I hope you're using sql server

0
The Eight Noble Truths of Backup and Recovery

How can IT departments tackle the challenges of a Big Data world? This white paper provides a roadmap to success and helps companies ensure that all their data is safe and secure, no matter if it resides on-premise with physical or virtual machines or in the cloud.

 

Author Comment

by:IDEASDesign
ID: 17873540
No, I am not using any kind of database at all.  I merely read the contents of the uploaded CSV file using CHTTP and then turn it into a query object using <cfquery dbtype="query">.  

The method you've described, it seems, would require that I first set up a datasource in CFAdmin.  Am I correct?  

If the setting up of a ColdFusion datasource is NOT needed, .. then will this approach that you've outlined work using ColdFusion MX 6.1?  Or does it only work using ColdFusion MX  7 ?

Also, why would interfacing with an XLS file be any more efficient than the method I'm using now?  Would it be more forgiving of the missing column data and of the illegal characters in the individual columns?  I am open to using this approach, .. but of course, I would prefer not to bother with it if it'll just put me right back where I started.

If it's necessary that I first set up a ColdFusion datasource for the Excel file, ... what's the process for doing that in CFAdmin?  In other words, .. which driver do I need to choose, and are there any special settings that I need to apply?

Thanks!
- Yvan

0
 
LVL 25

Accepted Solution

by:
dgrafx earned 500 total points
ID: 17873657
Hi,
The reason you can't read a xls file as simply as a text file is because of it's internal formatting - it's a win32 app not just text.
And you can't read doing <cfquery dbtype="query"> either.

>>would require that I first set up a datasource in CFAdmin.  Am I correct?  
Yes

should work with any recent version of cf
doing in the fashion I described would yes be more efficient, for one thing - the method you're using now doesn't work.

the output will give you columns that may be null or have incorrect data, where you can then "fix"
the method you are using will not even give you a column if data there is null - as I'm sure you realized - and is one of your problems.

the clincher is the carriage returns within a column's value - when you are using a cr as delimiter - absolutely will not fly.

The concept though is not to setup a datasource for the excel file, but to read the excel file with sql server.
You do need to setup the sql server datasource and it doesn't matter what it's pointed to as long as it's a valid sql server db.
sql server then actually uses access drivers (as you see in the code) to read the file.

but since you don't use a datasource then why not use a custom tag.
Here is one that I have used in a case where sql server didn't have correct permissions to read an excel file and the host wouldn't make the changes.
CFX_Excel2Query
http://www.emerle.net/programming/display.cfm/t/cfx_excel2query
after thinking about it - this is really the way to go and you won't need to setup a datasource.
You will need to install this tag and if you are hosting where you don't have full control of the server - you'll need your host to set it up for you.

good luck
0
 

Author Comment

by:IDEASDesign
ID: 17881276
Ok, .. I've tried the method you outlined where you create an MSSQL datasource, set it up in CFadmin, and then use the sql drivers to query the uploaded xls file:

-----------------------------------------------------------

<cfset xlsfile = '#application.server_path#job_listings\joblistings.xls'>

<cfquery datasource="#application.datasource#" name="XLSData">
      SELECT JobIdentifier
      FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=#xlsfile#;HDR=YES', 'SELECT * FROM [Sheet1$]')
</cfquery>

<cfoutput query="XLSData">#JobIdentifier#<br></cfoutput>

-----------------------------------------------------------

However, .. I am getting this error, and I'm not sure why:

[Macromedia][SQLServer JDBC Driver][SQLServer]OLE DB provider 'Microsoft.Jet.OLEDB.4.0' reported an error. The provider did not give any information about the error.

VENDORERRORCODE          7399
SQLSTATE         HY000

I am open to experimenting with the custom tag yu highlighted (thank you for that), .. but want to explore the cost-free option first.  Any idea why I'm getting these errors? The mssql database that I've set up as a coldfusion datasource DOES exist, and I set it up in CFAdmin using the MSSQL driver.  

Thanks,
- Yvan


 



0
 
LVL 25

Expert Comment

by:dgrafx
ID: 17881416
First - do you control this server?
i.e. can you modify the registry?
What account is sql server logged on as? - look in services / mssql$servername
0
 

Author Comment

by:IDEASDesign
ID: 17881477
Yes, I have full admin access to the server , ... and yes I can modify the registry.

I don't know what you mean when you ask me to look in services / mssql$servername though.  

I am looking in services, and all I see listed is MSSQLSERVER and MSSQLServerADHelper

I believe, however, that sql server is logged on as 'sa', as that is what I used to configure my DSN in CFAdmin.

- Yvan


0
 

Author Comment

by:IDEASDesign
ID: 17881701
Actually -- nevermind.  I just got approval from my employer for us to purchase the excel2qery custom tag.  Thanks so much for your help.  I am awarding you the 500 pts.

Cheers,
 - Yvan
0
 
LVL 25

Expert Comment

by:dgrafx
ID: 17881747
Ok - when you first setup sql server - you can name it - you must not have so it appears as just MSSQLSERVER - no prob

sa should be good to go, but I did take a look and on our setup it's logged on as local system account.
if you need to look at that ever it would be in Services / MSSQLSERVER / log on tab

look for this registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Providers\SQLOLEDB\DisallowAdhocAccess

or search for DisallowAdhocAccess - set it's value to 0 wherever you find it.
try that
0
 
LVL 25

Expert Comment

by:dgrafx
ID: 17881776
ah - I see you posted again and awarded me the points - thank you!
yes the tag works great and considering it's cost it's probably cheaper than the time it takes to get the "free" method setup.

good luck ...
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This is a guide to setting up a new WHM/cPanel Server to be used for web hosting accounts. It is intended for web hosting company administrators and dedicated server owners. For under $99 per month (considering normal rate of Big Data Cetnters like …
Most ColdFusion developers get confused between the CFSet, Duplicate, and Structcopy methods of copying a Structure, especially which one to use when. This Article will explain the differences in the approaches with examples; therefore, after readin…
This Micro Tutorial will give you a basic overview how to record your screen with Microsoft Expression Encoder. This program is still free and open for the public to download. This will be demonstrated using Microsoft Expression Encoder 4.
The Email Laundry PDF encryption service allows companies to send confidential encrypted  emails to anybody. The PDF document can also contain attachments that are embedded in the encrypted PDF. The password is randomly generated by The Email Laundr…

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question