• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 529
  • Last Modified:

Read CSV file with newlines in fields?

Hi experts,

I'm trying to read a CSV file which is the result of saving a folder of Outlook emails.  (Excel reads it fine - one row per email.)  Field 2 is the body of the message, and it contains CR/LF chars.  To get around this non-standard CSV format, I'm using the Text::CSV_XS module, with the "binary => 1" switch.  That seems to work when I call it with a 1 record 'here document', but if I open a file and read it with a 'while' loop, then of course, it only reads 1 record (delimitten by CR) at a time.  So, the record ends part way through field 2!

How can I read this CSV file so that one email is read at a time (I guess) and field 2 contains the entire body of the message?

  • 2
  • 2
1 Solution
Kim RyanIT ConsultantCommented:
There is a module to handles just this problem: http://search.cpan.org/~tilly/Text-xSV-0.11/lib/Text/xSV.pm
The documentation is quite complete. You can fefine how many elements you expect per row, and it can be set to ignore any embedded new line characters.
tel2Author Commented:
That's great, teraplane!

A few questions about that though:

1. Would you agree that the CSV_XS's "binary" option (which is stated to work with CRs & LFs) is pretty useless for most purposes, because of the difficulty of identifying, reading and feeding it one CSV record at a time?

2. Here's the code I'm now testing with xSV:

use Text::xSV;
my $csv = new Text::xSV;
while ($csv->get_row()) {
  my ($subject,$body) = $csv->extract(qw(Subject Body));
  print "Subject = $subject\n";

There are actually upto 19 fields in the data, but often only 8 or so are used, so I'm getting warnings like:
  Line 16, file Undeliv1.csv had 8 fields, expected 19 at C:\Temp\vcf2csv12.pl line 13
  Line 888, file Undeliv1.csv had 17 fields, expected 8 at C:\Temp\vcf2csv12.pl line 15
The documentation says you can turn off warnings with

What exact syntax should I use to turn off the warnings?  I've tried various things with "set_row_size_warnings => 0", but I don't really understand how & where to use it.

3.  If I want to set the row size manually (using set_row_size), what syntax should I use?  I've tried a few things, but I just get errors.

Kim RyanIT ConsultantCommented:
1) yes, agree. The doco for Text::xSV highlights this limitation.

2) all the methods need to be applied to your csv object. So you can say at the top of your code:
my $csv = new Text::xSV;
$csv->set_row_size_warning = 0; # suppress warning for truncated rows

$csv->set_row_size = 8; # only expecting 8 columns on the next read, can change this for each row if you want

tel2Author Commented:

1. OK - thanks.

2. Thanks.  It seems the reason I couldn't get set_row_size_warning to work, was, I'm using ActiveState Perl, and their repository has version 0.05, which didn't have that option, while your CPAN site has version 0.11, which does.  I don't know how to easily install from the CPAN site, but I worked around it by clicking on Source and replacing the old xSV.pm file with that.  The syntax I had to use was with "... => 0" instead of "... = 0", but close enough for me.

3. Thanks.  Similar to 2.

Good to have you on the EE team!
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now