Solved

Read CSV file with newlines in fields?

Posted on 2004-08-22
4
500 Views
Last Modified: 2012-05-05
Hi experts,

I'm trying to read a CSV file which is the result of saving a folder of Outlook emails.  (Excel reads it fine - one row per email.)  Field 2 is the body of the message, and it contains CR/LF chars.  To get around this non-standard CSV format, I'm using the Text::CSV_XS module, with the "binary => 1" switch.  That seems to work when I call it with a 1 record 'here document', but if I open a file and read it with a 'while' loop, then of course, it only reads 1 record (delimitten by CR) at a time.  So, the record ends part way through field 2!

How can I read this CSV file so that one email is read at a time (I guess) and field 2 contains the entire body of the message?

Thanks.
0
Comment
Question by:tel2
  • 2
  • 2
4 Comments
 
LVL 19

Accepted Solution

by:
Kim Ryan earned 80 total points
ID: 11866205
There is a module to handles just this problem: http://search.cpan.org/~tilly/Text-xSV-0.11/lib/Text/xSV.pm
The documentation is quite complete. You can fefine how many elements you expect per row, and it can be set to ignore any embedded new line characters.
0
 
LVL 12

Author Comment

by:tel2
ID: 11867511
That's great, teraplane!

A few questions about that though:

1. Would you agree that the CSV_XS's "binary" option (which is stated to work with CRs & LFs) is pretty useless for most purposes, because of the difficulty of identifying, reading and feeding it one CSV record at a time?


2. Here's the code I'm now testing with xSV:

use Text::xSV;
my $csv = new Text::xSV;
$csv->open_file("Undeliv1.csv");
$csv->bind_header();
while ($csv->get_row()) {
  my ($subject,$body) = $csv->extract(qw(Subject Body));
  print "Subject = $subject\n";
}

There are actually upto 19 fields in the data, but often only 8 or so are used, so I'm getting warnings like:
  Line 16, file Undeliv1.csv had 8 fields, expected 19 at C:\Temp\vcf2csv12.pl line 13
  Line 888, file Undeliv1.csv had 17 fields, expected 8 at C:\Temp\vcf2csv12.pl line 15
The documentation says you can turn off warnings with

What exact syntax should I use to turn off the warnings?  I've tried various things with "set_row_size_warnings => 0", but I don't really understand how & where to use it.


3.  If I want to set the row size manually (using set_row_size), what syntax should I use?  I've tried a few things, but I just get errors.


Thanks.
0
 
LVL 19

Expert Comment

by:Kim Ryan
ID: 11867548
1) yes, agree. The doco for Text::xSV highlights this limitation.

2) all the methods need to be applied to your csv object. So you can say at the top of your code:
my $csv = new Text::xSV;
$csv->set_row_size_warning = 0; # suppress warning for truncated rows

3)
$csv->set_row_size = 8; # only expecting 8 columns on the next read, can change this for each row if you want

0
 
LVL 12

Author Comment

by:tel2
ID: 11875888
teraplane,

1. OK - thanks.

2. Thanks.  It seems the reason I couldn't get set_row_size_warning to work, was, I'm using ActiveState Perl, and their repository has version 0.05, which didn't have that option, while your CPAN site has version 0.11, which does.  I don't know how to easily install from the CPAN site, but I worked around it by clicking on Source and replacing the old xSV.pm file with that.  The syntax I had to use was with "... => 0" instead of "... = 0", but close enough for me.

3. Thanks.  Similar to 2.

Good to have you on the EE team!
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
There are many situations when we need to display the data in sorted order. For example: Student details by name or by rank or by total marks etc. If you are working on data driven based projects then you will use sorting techniques very frequently.…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
This Micro Tutorial will give you a basic overview how to record your screen with Microsoft Expression Encoder. This program is still free and open for the public to download. This will be demonstrated using Microsoft Expression Encoder 4.

910 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

23 Experts available now in Live!

Get 1:1 Help Now