
Read CSV file with newlines in fields?

Hi experts,

I'm trying to read a CSV file which is the result of saving a folder of Outlook emails.  (Excel reads it fine - one row per email.)  Field 2 is the body of the message, and it contains CR/LF chars.  To get around this non-standard CSV format, I'm using the Text::CSV_XS module with the "binary => 1" switch.  That seems to work when I call it with a 1-record 'here document', but if I open a file and read it with a 'while' loop, then of course it only reads 1 record (delimited by CR) at a time.  So the record ends partway through field 2!

How can I read this CSV file so that one email is read at a time (I guess) and field 2 contains the entire body of the message?

Thanks.
tel2

1 Solution

Kim Ryan (IT Consultant) commented:
There is a module that handles just this problem: http://search.cpan.org/~tilly/Text-xSV-0.11/lib/Text/xSV.pm
The documentation is quite complete. You can define how many elements you expect per row, and it can be set to ignore any embedded newline characters.

tel2 (Author) commented:
That's great, teraplane!

A few questions about that though:

1. Would you agree that the CSV_XS's "binary" option (which is stated to work with CRs & LFs) is pretty useless for most purposes, because of the difficulty of identifying, reading and feeding it one CSV record at a time?


2. Here's the code I'm now testing with xSV:

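use strict;
use warnings;
use Text::xSV;

my $csv = Text::xSV->new;             # direct method call instead of "new Text::xSV"
$csv->open_file("Undeliv1.csv");
$csv->bind_header();                  # first row supplies the field names
while ($csv->get_row()) {
  # pull two named fields out of the current row
  my ($subject,$body) = $csv->extract(qw(Subject Body));
  print "Subject = $subject\n";
}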

There are actually up to 19 fields in the data, but often only 8 or so are used, so I'm getting warnings like:
  Line 16, file Undeliv1.csv had 8 fields, expected 19 at C:\Temp\vcf2csv12.pl line 13
  Line 888, file Undeliv1.csv had 17 fields, expected 8 at C:\Temp\vcf2csv12.pl line 15
The documentation says you can turn off warnings with

What exact syntax should I use to turn off the warnings?  I've tried various things with "set_row_size_warnings => 0", but I don't really understand how & where to use it.


3.  If I want to set the row size manually (using set_row_size), what syntax should I use?  I've tried a few things, but I just get errors.


Thanks.

Kim Ryan (IT Consultant) commented:
1) Yes, agreed. The documentation for Text::xSV highlights this limitation.

2) All the methods need to be called on your csv object, so you can say at the top of your code:
my $csv = new Text::xSV;
$csv->set_row_size_warning = 0; # suppress warning for truncated rows

3)
$csv->set_row_size = 8; # only expecting 8 columns on the next read, can change this for each row if you want
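
A note on the syntax in 2) and 3): Perl does not allow assigning to a method call with "=", so those two lines will not compile as written. The setters take the value as an argument instead. Here is a minimal sketch, assuming a Text::xSV release recent enough to include set_row_size_warning; the column count of 19 comes from the question, and $quiet is only an illustrative name. The constructor form at the end relies on new() accepting setter names without the "set_" prefix, which is worth checking against the documentation of your installed version.

```perl
use strict;
use warnings;
use Text::xSV;

# Setter style: the value is passed as an argument, not assigned.
my $csv = Text::xSV->new;
$csv->set_row_size_warning(0);   # no warnings for rows that are too short or too long
$csv->set_row_size(19);          # expect 19 columns (figure taken from the question)

# Constructor style (assumption: new() maps each key to the matching set_ method).
my $quiet = Text::xSV->new(row_size_warning => 0);
```

The open_file / bind_header / get_row loop from the question can then be used unchanged on either object.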


tel2 (Author) commented:
teraplane,

1. OK - thanks.

2. Thanks.  It seems the reason I couldn't get set_row_size_warning to work was that I'm using ActiveState Perl, and their repository has version 0.05, which doesn't have that option, while your CPAN site has version 0.11, which does.  I don't know how to easily install from the CPAN site, but I worked around it by clicking on Source and replacing the old xSV.pm file with that.  The syntax I had to use was "... => 0" instead of "... = 0", but close enough for me.

3. Thanks.  Similar to 2.

Good to have you on the EE team!
Question has a verified solution.
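
For later readers, one addition on question 1: Text::CSV_XS does not have to be fed one record at a time. Its getline method reads from a filehandle itself and, with "binary => 1", should keep reading across line breaks that fall inside a quoted field. This is a minimal sketch rather than the approach accepted in this thread; the sample data, the temporary file and the @emails layout are all made up for illustration, and the two columns stand in for the Subject and Body fields of the real export.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::CSV_XS;

# Hypothetical sample data: two emails, the first with a CR/LF inside Body.
my ($out, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);
binmode $out;
print {$out} qq{"Subject","Body"\r\n};
print {$out} qq{"Undeliverable: hello","Line one\r\nLine two"\r\n};
print {$out} qq{"Undeliverable: bye","Single line"\r\n};
close $out or die "close $path: $!";

# binary => 1 allows CR/LF inside quoted fields; auto_diag reports parse errors.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

open my $in, '<', $path or die "open $path: $!";
binmode $in;

my $header = $csv->getline($in);          # first record holds the column names
my @emails;
while (my $row = $csv->getline($in)) {    # one complete record per call
    push @emails, { Subject => $row->[0], Body => $row->[1] };
}
close $in;

printf "%d emails read\n", scalar @emails;
```

The key difference from a plain "while (<FH>)" loop is that the module, not the loop, decides where a record ends, so a body containing line breaks stays in one field.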
