• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1929
  • Last Modified:

Simple csv file conversion

I need to convert: http://basskozz.com/Convert.txt
to the following format: http://basskozz.com/Converted.txt

Can someone help me with a quick little Perl Script to do this conversion?

Things to note:
1.  Although these are "txt" files they are really just "csv" files.
2.  These aren't exact conversions, these two files are just to show you the format the data should be converted from/to.
3.  Note the dates are in different formats: "d-mmm-yy" needs to be converted to "d/m/yy"
4. Here is a list of the fields that need to be converted from/to for each record:
Docket Num = Docket_Num
Rec Date = Rec_Date
City = City
Street = Street
Plaintiff = Plaintiff
Defendant = Defendant
Print Date = Print_Date

5. The "Docket Num" field needs to have "Misc. " appended to the front of the number and placed in the "Docket_Num" field of the converted document.

Is Perl the best way to go for this, or might it be easier to use MS Access or Excel ?
Thanks in advance,
-BassKozz
0
basskozz
Asked:
basskozz
3 Solutions
 
CtzenCommented:
Perl is the easiest way to do this. But I think the code, need to be developed by you.

I can give you some parts that you can put together for your code...

#!/usr/bin/perl -w

$in_file="Convert.txt";

open(INFILE, $in_file) || die("Could not open Convert.txt!");  # read file
open(OUTFILE, ">>Converted.txt");                     # to append
@in_data=<INFILE>;

foreach $in_data (@in_data)
{
            @infile_split_line = split /, / , $in_data ;
            if ($infile_split_line[1] eq 'FAMATE')            #index of your column in []
            {
                            Do your manipulation on the formatting or changing of name for each index split.
            }

Concat each index together and write to OUTFILE.
0
 
mish33Commented:
If you are willing to take python solution, that save that to a file cvt.py
and invoke it:
  python cvt.py convert.txt converted.txt

--------------------------------------------------------------------cut here----------
def format_date(old):
    from time import strptime, strftime
    date = strptime(old, '%d-%b-%y')
    return strftime('%m/%d/%Y', date).lstrip('0')

def convert_lines(inp):
    title = [s.replace(' ', '_') for s in inp.next()]
    yield title
    for dn,rd,st,ci,pl,df,pd in inp:
        rd,pd = [format_date(d) for d in (rd,pd)]
        yield 'Misc. '+dn,rd,st,ci,pl,df,pd

def reorder(f):
    return f[0], f[1], f[3], f[2], f[4], f[5], f[6]

def csv_convertor(inp, out):
    import csv
    w = csv.writer(out)
    for line in convert_lines(csv.reader(inp)):
        w.writerow(reorder(line))

import sys
csv_convertor(open(sys.argv[1]), open(sys.argv[2], 'wb'))


0
 
jeveristCommented:
Hi BassKozz,

>  might it be easier to use MS Access or Excel ?

Well, it's pretty easy with Excel.  Try this:

Sub ReformatDocket()
Dim wb As Workbook, ws As Worksheet

Set wb = Workbooks.Open("C:\Convert.txt")
Set ws = ActiveSheet

ws.Columns("B").Insert Shift:=xlToRight

With Intersect(ws.Columns("B"), ws.UsedRange)
    .FormulaR1C1 = "=""Misc. "" &RC[-1]"
    .Value = .Value
End With

ws.Columns("A").Delete Shift:=xlToLeft

ws.Range("G:G,B:B").NumberFormat = "mm/dd/yy;@"

ws.Columns("D").Cut
ws.Columns("C").Insert Shift:=xlToRight

ws.Range("A1:G1").Value = Array("Docket_Num", "Rec_Date", "City", "Street", "Plaintiff", "Defendant", "Print_Date")

ws.Columns("A").Sort Key1:=ws.Range("A2"), Order1:=xlAscending, Header:=xlGuess

wb.SaveAs Filename:=Left(wb.FullName, Len(wb.FullName) - 4) & ".csv", FileFormat:=xlCSV

End Sub

Jim
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
mjcoyneCommented:
This seems to fit all requirements (including dealing with nasty things like commas and single quotes embedded in fields).  I left line 16 in but commented out, because I wasn't sure if you wanted to re-quote the Plaintiff field.

#!/usr/bin/perl -w
use strict;
use Text::ParseWords;
use Date::Manip qw(ParseDate UnixDate);
$ENV{TZ} = 'US/Eastern';

open (IN, "Convert.txt") or die;
open (OUT, ">Converted.txt") or die;
print OUT "Docket_Num,Rec_Date,City,Street,Plaintiff,Defendant,Print_Date\n";

while (<IN>) {
    next if (/^Docket Number/ or /^\s*$/);
    my @fields = parse_csv($_);
    $fields[1] = reformat_date($fields[1]);
    $fields[6] = reformat_date($fields[6]);
    #$fields[4] = "\"" . $fields[4] . "\"";
    $fields[0] = "Misc. " . $fields[0];
    ($fields[2], $fields[3]) = ($fields[3], $fields[2]);
    print OUT join (",", @fields), "\n";
}


sub parse_csv {
    my $line = shift;
    $line =~ s/'/\\'/;
    return quotewords(",", 0, $line);
}

sub reformat_date {
    my $old_date = shift;
    my $new_date = UnixDate(ParseDate($old_date), "%f/%e/%Y");
    $new_date =~ s/ //g;
    return $new_date;
}

0
 
mjcoyneCommented:
BTW, the Text::ParseWords module ships with Perl, but you'll have to install the Date::Manip module (see http://search.cpan.org/dist/Date-Manip/Manip.pod).  If you don't know how to do this (install a module), let us know, and we'll show you.  We'll need to know what operating system you're using (hint: basically, it's the Perl Package Manager -- PPM -- if under Windows, and the CPAN module if under *NIX).
0
 
basskozzAuthor Commented:
Thanks for the script mjcoyne,

"PPM install Date::Manip" worked flawlessly. :-)

One problem with the script thou... The "Street" field sometimes contains comma's, so for example here is what your script outputs: http://basskozz.com/mjcoyne.txt

Notice line 5:
Misc. 355078,8/14/2007,Acton,5 Hawthorne Street, Acton,"Washington Mutual Bank formerly known as Washington Mutual Bank, FA et al",Martin C. Gross et al,9/19/2007

Should read:
Misc. 355078,8/14/2007,Acton,"5 Hawthorne Street, Acton","Washington Mutual Bank formerly known as Washington Mutual Bank, FA et al",Martin C. Gross et al,9/19/2007

p.s. yah, works better with line 16 un-commented :-)
0
 
basskozzAuthor Commented:
Nevermind, I fixed it... p.s. the Defendant needed txt qualifier also...

#!/usr/bin/perl -w
use strict;
use Text::ParseWords;
use Date::Manip qw(ParseDate UnixDate);
$ENV{TZ} = 'US/Eastern';

open (IN, "Convert.txt") or die;
open (OUT, ">Converted.txt") or die;
print OUT "Docket_Num,Rec_Date,City,Street,Plaintiff,Defendant,Print_Date\n";

while (<IN>) {
    next if (/^Docket Number/ or /^\s*$/);
    my @fields = parse_csv($_);
    $fields[1] = reformat_date($fields[1]);
    $fields[6] = reformat_date($fields[6]);
    $fields[4] = "\"" . $fields[4] . "\"";
    $fields[2] = "\"" . $fields[2] . "\"";
    $fields[5] = "\"" . $fields[5] . "\"";
    $fields[0] = "Misc. " . $fields[0];
    ($fields[2], $fields[3]) = ($fields[3], $fields[2]);
    print OUT join (",", @fields), "\n";
}


sub parse_csv {
    my $line = shift;
    $line =~ s/'/\\'/;
    return quotewords(",", 0, $line);
}

sub reformat_date {
    my $old_date = shift;
    my $new_date = UnixDate(ParseDate($old_date), "%f/%e/%Y");
    $new_date =~ s/ //g;
    return $new_date;
}
0
 
basskozzAuthor Commented:
Thanks mish33 & jeverist for your solutions also ;-)
0

Featured Post

Become an Android App Developer

Ready to kick start your career in 2018? Learn how to build an Android app in January’s Course of the Month and open the door to new opportunities.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now