asked on

take a CSV file and convert contents

Hello,

I am seeking help with some advance way either via perl or another method to convert a csv file that looks like this:

Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD


and turn it into this:

HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
Network,45.3.12.0,255.255.255.0,45.14.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD

Open in new window

Thanks for your advanced help

wilcoxon

You need to better explain the output. Is Network a constant string? Should 255.255.255.0 be hard-coded or interpreted from the /24 (in other words, does your data contain anything other than /24 (assuming that's the right mask for a /24))? Where does the IP address (45.2.12.1, 45.14.12.1, or 45.3.11.1) come from? I don't see that anywhere in the data.

richsark

ASKER

Hello Wilcoxon,.

Yes, Network is a constant string. the /24 needs to be interpreted, but I could have /28 /30 etc...

I made a copy/paste mistake: I have corrected it

S

Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD


and turn it into this:

HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
Network,45.3.12.0,255.255.255.0,45.3.11.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
Network,45.3.11.0,255.255.255.0,45.3.12.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD

Open in new window

45.2.12.1, 45.14.12.1, or 45.3.11.1) comes from the gateway address in field "EA-Gateway" or 4 column.

I corrected it as I noted above. I made a copy/paste mistake

richsark

ASKER

Hi wilcoxon,
Did my reply answer your questions ?

wilcoxon

This should do what you want...

There are still some mismatches (and a typo) between what is generated and what you listed as desired so I've included my full debugging code and the output (showing mismatched lines).

The first line has some significant formatting differences with "Dr. Parikh / Children's Heart Tele-Echo". The second line has a double-space in the input but only a single space in the expected output. Should the script modify the input strings in some way or is this just a copy-paste error?

#!/usr/local/bin/perl

use strict;
use warnings;

my @correct = (
'Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD',
'Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD',
'Network,45.3.12.0,255.255.255.0,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD',
);

my %masks = (
    24 => '255.255.255.0',
    # replace below with correct mask and add any others you need
    28 => '28-mask',
    30 => '30-mask',
);

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<DATA>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]+),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    my $str = "Network,$addr,$mask,$gate,$rest";
    my $corr = shift @correct;
    if ($str ne $corr) {
        my $loc = find_err($str, $corr);
        print "lines do not match:\n",
              "expected: $corr\n",
              "got:      $str\n",
              "          $loc\n\n";
    } else {
        print $str, "\n";
    }
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    warn "don't have mask $mask mapped\n";
    return 'unknown-mask';
}

sub find_err {
    my ($str1, $str2) = @_;
    my @arr1 = split //, $str1;
    my @arr2 = split //, $str2;
    my $loc = '';
    for my $i (0..@arr1-1) {
        return "$loc^" unless (defined $arr1[$i] and defined $arr2[$i] and $arr1[$i] eq $arr2[$i]);
        $loc .= '.';
    }
    return 'could not locate mismatch';
}

__DATA__
Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD

Open in new window

### output from script
HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
lines do not match:
expected: Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
got:      Network,45.2.12.0,255.255.255.0,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
          ............................................^

lines do not match:
expected: Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD
got:      Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
          ...............................................^

Network,45.3.12.0,255.255.255.0,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD

Open in new window

richsark

ASKER

Hi Wilcoxon,

Yes, can we normalize the output so if there are too many spaces or crazy characters like, to many spaces, /, $, (,),! or anything like that, change to make the final extract exclude them. For to many spaces, just make it one space please.

Is there a way, ( if not to much to ask) include any possible netmasks /24, /32, /30, /28 etc....to calculate it for me.

I have actually attached the master csv file so you can actually see.

Thank you very much for all your help !

ININD-IP-Data.csv

wilcoxon

Here's modified code that reads the provided CSV file and outputs the modified rows. It removes the special characters you specified (if you run across more, you can just add them into the char class regex to remove them). It also removes apostrophe (as in your example) and removes period at the end of a string.

The one remaining difference in your sample is "Dr. Parikh" ends up as "DrParikh" in your expected output. Is this a mistake or what rule should be followed to make that transform?

#!/usr/local/bin/perl

use strict;
use warnings;

#my @correct = (
#'Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD',
#'Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD',
#'Network,45.3.12.0,255.255.255.0,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD',
#);

my %masks = (
    21 => '255.255.248.0',
    22 => '255.255.252.0',
    23 => '255.255.254.0',
    24 => '255.255.255.0',
    25 => '255.255.255.128',
    26 => '255.255.255.192',
    27 => '255.255.255.224',
    28 => '255.255.255.240',
    29 => '255.255.255.248',
    30 => '255.255.255.252',
);

open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<IN>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]+),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    $rest =~ s{[/\$!']}{}g; # remove special characters
    $rest =~ s{\s\s+}{ }g; # remove extra spaces
    $rest =~ s{\.,}{,}g; # remove periods just before comma
    my $str = "Network,$addr,$mask,$gate,$rest";
#    my $corr = shift @correct;
#    if ($str ne $corr) {
#        my $loc = find_err($str, $corr);
#        print "lines do not match:\n",
#              "expected: $corr\n",
#              "got:      $str\n",
#              "          $loc\n\n";
#    } else {
        print $str, "\n";
#    }
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    die "don't have mask $mask mapped\n";
}

sub find_err {
    my ($str1, $str2) = @_;
    my @arr1 = split //, $str1;
    my @arr2 = split //, $str2;
    my $loc = '';
    for my $i (0..@arr1-1) {
        return "$loc^" unless (defined $arr1[$i] and defined $arr2[$i] and $arr1[$i] eq $arr2[$i]);
        $loc .= '.';
    }
    return 'could not locate mismatch';
}

__DATA__
Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD

Open in new window

richsark

ASKER

Hi,
For the doctor name. It's ok to have the dot in the names. But not like Dr. Parkikh... but Dr.Parikh is ok.

If I elect to run it as perl transform.pl filename.csv > inid-convert.csv would that work rather then putting in the file in the script?

Lastly, if for some reason we mised a net mask, I just need to add it on your script right?

Thank you very much!

wilcoxon

As the code is right now, it will have "Dr. Parikh" (period and single space).

Sure, just comment out or remove line 25 and change line 29 to "while (<>) {" in order to call it that way.

Correct, you just need to add a line to the masks hash with numeric to quad conversion.

richsark

ASKER

Awesome , will test this puppy out and get back to you.

Thank you wilcoxon.

richsark

ASKER

Hi Wilcoxon,

OK, seems good, but I see some entries ( on the master csv) that dont have nothing filled in for gateway. So for anything passed the word EA* for example on the header:
EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode

if there is no entry, replace the blank entry with "Not Provided"

Can that be done?

wilcoxon

Sure. This should do it...

#!/usr/local/bin/perl

use strict;
use warnings;

my %masks = (
    21 => '255.255.248.0',
    22 => '255.255.252.0',
    23 => '255.255.254.0',
    24 => '255.255.255.0',
    25 => '255.255.255.128',
    26 => '255.255.255.192',
    27 => '255.255.255.224',
    28 => '255.255.255.240',
    29 => '255.255.255.248',
    30 => '255.255.255.252',
);

open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<IN>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]*),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    $gate = 'Not Provided' unless $gate;
    $rest =~ s{[/\$!']}{}g; # remove special characters
    $rest =~ s{\s\s+}{ }g; # remove extra spaces
    $rest =~ s{\.,}{,}g; # remove periods just before comma
    my $str = "Network,$addr,$mask,$gate,$rest";
    print $str, "\n";
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    die "don't have mask $mask mapped\n";
}

Open in new window

richsark

ASKER

Hi,

Ok. Looking good Wilcoxon!

Can we also remove quotes, I am seeing its causing an miss alignment

Network,10.171.12.96,255.255.255.224,10.171.12.97,Fishers YMCA Physical Therapy site STV-DHCPA users,443,"9012 E. 126th St, Fishers, IN 46038",,,,ININD

I assume that this is done at line 33 right?

Would this be right to remove the quotes?

$rest =~ s{[/\$!\"']}{}g; # remove special characters

I have attached the code I am using so we can keep in sync

#!/usr/bin/perl

use strict;
use warnings;

my %masks = (
    21 => '255.255.248.0',
    22 => '255.255.252.0',
    23 => '255.255.254.0',
    24 => '255.255.255.0',
    25 => '255.255.255.128',
    26 => '255.255.255.192',
    27 => '255.255.255.224',
    28 => '255.255.255.240',
    29 => '255.255.255.248',
    30 => '255.255.255.252',
    32 => '255.255.255.255',
);

open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<IN>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]*),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    $gate = 'Not Provided' unless $gate;
    $rest =~ s{[/\$!']}{}g; # remove special characters
    $rest =~ s{\s\s+}{ }g; # remove extra spaces
    $rest =~ s{\.,}{,}g; # remove periods just before comma
    my $str = "Network,$addr,$mask,$gate,$rest";
    print $str, "\n";
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    die "don't have mask $mask mapped\n";
}

Open in new window

wilcoxon

It's not the quotes that are throwing it off - it's the commas that are included inside the quotes. Here is a modified version to change any embedded commas in strings into spaces (prior to removing extra spaces so you'll still only end up with at most one space).

#!/usr/bin/perl

use strict;
use warnings;

my %masks = (
    21 => '255.255.248.0',
    22 => '255.255.252.0',
    23 => '255.255.254.0',
    24 => '255.255.255.0',
    25 => '255.255.255.128',
    26 => '255.255.255.192',
    27 => '255.255.255.224',
    28 => '255.255.255.240',
    29 => '255.255.255.248',
    30 => '255.255.255.252',
    32 => '255.255.255.255',
);

open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<IN>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]*),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    $gate = 'Not Provided' unless $gate;
    $rest =~ s{[/\$!']}{}g; # remove special characters
    while ($rest =~ m{^([^"]*)"([^"]+)"}) {
        my ($beg, $qstr) = ($1, $2);
        my $ori = "$beg\"$qstr\"";
        $qstr =~ s{,}{ }g;
        $rest =~ s{^$ori}{$beg$qstr};
    }
    $rest =~ s{\s\s+}{ }g; # remove extra spaces
    $rest =~ s{\.,}{,}g; # remove periods just before comma
    my $str = "Network,$addr,$mask,$gate,$rest";
    print $str, "\n";
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    die "don't have mask $mask mapped\n";
}

Open in new window

richsark

ASKER

Hello Wilcoxon, I wanted to note that lines 598 to 594 are erroring out:

Wrong

HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,10.171.2.0,255.255.255.257,10.171.2.1,, Wired Data Users",300,136 Dixon Rd. Kokomo IN,,,,ININD

Open in new window

Correct, Notice the 1 less comma near SITECODE "ININD"

HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,10.171.2.0,255.255.255.257,10.171.2.1,, Wired Data Users",300,136 Dixon Rd. Kokomo IN,,,ININD

Open in new window

Perhaps because there is no entry under description. In this case, could we compensate for these types not to error out and simply proceed with a blank entry. If there are no entries for anything passed Description, its ok. I can still bring it in.

Importantly, The commas need to stay matched up.

I hope that makes sense.

Thanks again!

richsark

ASKER

HI,

I ran your new code, THANKS!

But now its hanging here
45.10.17.0

Can't source UNKNOWN to string in subst at C:\bin\CSV-convert.pl line 35, <> line 39.
Use of uninitialized value $rest in substitution (s///) at C:\bin\CSV-convert.pl line 38, <> line 39.
Use of uninitialized value $rest in pattern match (m//) at C:\bin\CSV-convert.pl line 38, <> line 39.

ASKER CERTIFIED SOLUTION

wilcoxon

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

richsark

ASKER

Ok got it. So as long as the csv files I have contain Same headers this script will work right?

Thanks

wilcoxon

Yes. It should work as long as the format of the files don't change.

wilcoxon

To be more specific, it really doesn't matter what the columns are provided that the first 2 columns remain the same (address/mask and gateway). If the columns after that change, the only issue will be with the header row that is output not matching (since that is a static string).

richsark

ASKER

Excellent Response and Knowledge!