Solved

Read CSV file with newlines in fields?

Posted on 2004-08-22
4
506 Views
Last Modified: 2012-05-05
Hi experts,

I'm trying to read a CSV file which is the result of saving a folder of Outlook emails.  (Excel reads it fine - one row per email.)  Field 2 is the body of the message, and it contains CR/LF chars.  To get around this non-standard CSV format, I'm using the Text::CSV_XS module, with the "binary => 1" switch.  That seems to work when I call it with a 1 record 'here document', but if I open a file and read it with a 'while' loop, then of course, it only reads 1 record (delimitten by CR) at a time.  So, the record ends part way through field 2!

How can I read this CSV file so that one email is read at a time (I guess) and field 2 contains the entire body of the message?

Thanks.
0
Comment
Question by:tel2
  • 2
  • 2
4 Comments
 
LVL 19

Accepted Solution

by:
Kim Ryan earned 80 total points
ID: 11866205
There is a module to handles just this problem: http://search.cpan.org/~tilly/Text-xSV-0.11/lib/Text/xSV.pm
The documentation is quite complete. You can fefine how many elements you expect per row, and it can be set to ignore any embedded new line characters.
0
 
LVL 12

Author Comment

by:tel2
ID: 11867511
That's great, teraplane!

A few questions about that though:

1. Would you agree that the CSV_XS's "binary" option (which is stated to work with CRs & LFs) is pretty useless for most purposes, because of the difficulty of identifying, reading and feeding it one CSV record at a time?


2. Here's the code I'm now testing with xSV:

use Text::xSV;
my $csv = new Text::xSV;
$csv->open_file("Undeliv1.csv");
$csv->bind_header();
while ($csv->get_row()) {
  my ($subject,$body) = $csv->extract(qw(Subject Body));
  print "Subject = $subject\n";
}

There are actually upto 19 fields in the data, but often only 8 or so are used, so I'm getting warnings like:
  Line 16, file Undeliv1.csv had 8 fields, expected 19 at C:\Temp\vcf2csv12.pl line 13
  Line 888, file Undeliv1.csv had 17 fields, expected 8 at C:\Temp\vcf2csv12.pl line 15
The documentation says you can turn off warnings with

What exact syntax should I use to turn off the warnings?  I've tried various things with "set_row_size_warnings => 0", but I don't really understand how & where to use it.


3.  If I want to set the row size manually (using set_row_size), what syntax should I use?  I've tried a few things, but I just get errors.


Thanks.
0
 
LVL 19

Expert Comment

by:Kim Ryan
ID: 11867548
1) yes, agree. The doco for Text::xSV highlights this limitation.

2) all the methods need to be applied to your csv object. So you can say at the top of your code:
my $csv = new Text::xSV;
$csv->set_row_size_warning = 0; # suppress warning for truncated rows

3)
$csv->set_row_size = 8; # only expecting 8 columns on the next read, can change this for each row if you want

0
 
LVL 12

Author Comment

by:tel2
ID: 11875888
teraplane,

1. OK - thanks.

2. Thanks.  It seems the reason I couldn't get set_row_size_warning to work, was, I'm using ActiveState Perl, and their repository has version 0.05, which didn't have that option, while your CPAN site has version 0.11, which does.  I don't know how to easily install from the CPAN site, but I worked around it by clicking on Source and replacing the old xSV.pm file with that.  The syntax I had to use was with "... => 0" instead of "... = 0", but close enough for me.

3. Thanks.  Similar to 2.

Good to have you on the EE team!
0

Featured Post

Announcing the Most Valuable Experts of 2016

MVEs are more concerned with the satisfaction of those they help than with the considerable points they can earn. They are the types of people you feel privileged to call colleagues. Join us in honoring this amazing group of Experts.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Many time we need to work with multiple files all together. If its windows system then we can use some GUI based editor to accomplish our task. But what if you are on putty or have only CLI(Command Line Interface) as an option to  edit your files. I…
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

829 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question