Solved

Read CSV file with newlines in fields?

Posted on 2004-08-22
4
508 Views
Last Modified: 2012-05-05
Hi experts,

I'm trying to read a CSV file which is the result of saving a folder of Outlook emails.  (Excel reads it fine - one row per email.)  Field 2 is the body of the message, and it contains CR/LF chars.  To get around this non-standard CSV format, I'm using the Text::CSV_XS module, with the "binary => 1" switch.  That seems to work when I call it with a 1 record 'here document', but if I open a file and read it with a 'while' loop, then of course, it only reads 1 record (delimitten by CR) at a time.  So, the record ends part way through field 2!

How can I read this CSV file so that one email is read at a time (I guess) and field 2 contains the entire body of the message?

Thanks.
0
Comment
Question by:tel2
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
4 Comments
 
LVL 19

Accepted Solution

by:
Kim Ryan earned 80 total points
ID: 11866205
There is a module to handles just this problem: http://search.cpan.org/~tilly/Text-xSV-0.11/lib/Text/xSV.pm
The documentation is quite complete. You can fefine how many elements you expect per row, and it can be set to ignore any embedded new line characters.
0
 
LVL 12

Author Comment

by:tel2
ID: 11867511
That's great, teraplane!

A few questions about that though:

1. Would you agree that the CSV_XS's "binary" option (which is stated to work with CRs & LFs) is pretty useless for most purposes, because of the difficulty of identifying, reading and feeding it one CSV record at a time?


2. Here's the code I'm now testing with xSV:

use Text::xSV;
my $csv = new Text::xSV;
$csv->open_file("Undeliv1.csv");
$csv->bind_header();
while ($csv->get_row()) {
  my ($subject,$body) = $csv->extract(qw(Subject Body));
  print "Subject = $subject\n";
}

There are actually upto 19 fields in the data, but often only 8 or so are used, so I'm getting warnings like:
  Line 16, file Undeliv1.csv had 8 fields, expected 19 at C:\Temp\vcf2csv12.pl line 13
  Line 888, file Undeliv1.csv had 17 fields, expected 8 at C:\Temp\vcf2csv12.pl line 15
The documentation says you can turn off warnings with

What exact syntax should I use to turn off the warnings?  I've tried various things with "set_row_size_warnings => 0", but I don't really understand how & where to use it.


3.  If I want to set the row size manually (using set_row_size), what syntax should I use?  I've tried a few things, but I just get errors.


Thanks.
0
 
LVL 19

Expert Comment

by:Kim Ryan
ID: 11867548
1) yes, agree. The doco for Text::xSV highlights this limitation.

2) all the methods need to be applied to your csv object. So you can say at the top of your code:
my $csv = new Text::xSV;
$csv->set_row_size_warning = 0; # suppress warning for truncated rows

3)
$csv->set_row_size = 8; # only expecting 8 columns on the next read, can change this for each row if you want

0
 
LVL 12

Author Comment

by:tel2
ID: 11875888
teraplane,

1. OK - thanks.

2. Thanks.  It seems the reason I couldn't get set_row_size_warning to work, was, I'm using ActiveState Perl, and their repository has version 0.05, which didn't have that option, while your CPAN site has version 0.11, which does.  I don't know how to easily install from the CPAN site, but I worked around it by clicking on Source and replacing the old xSV.pm file with that.  The syntax I had to use was with "... => 0" instead of "... = 0", but close enough for me.

3. Thanks.  Similar to 2.

Good to have you on the EE team!
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

I have been pestered over the years to produce and distribute regular data extracts, and often the request have explicitly requested the data be emailed as an Excel attachement; specifically Excel, as it appears: CSV files confuse (no Red or Green h…
A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

749 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question