take a CSV file and convert contents

Hello,

I am seeking help with some advance way either via perl or another method to convert a csv file that looks like this:

Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD


and turn it into this:

HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
Network,45.3.12.0,255.255.255.0,45.14.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD

Open in new window


Thanks for your advanced help
LVL 1
richsarkAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

wilcoxonCommented:
You need to better explain the output.  Is Network a constant string?  Should 255.255.255.0 be hard-coded or interpreted from the /24 (in other words, does your data contain anything other than /24 (assuming that's the right mask for a /24))?  Where does the IP address (45.2.12.1, 45.14.12.1, or 45.3.11.1) come from?  I don't see that anywhere in the data.
richsarkAuthor Commented:
Hello Wilcoxon,.

Yes, Network is a constant string. the /24 needs to be interpreted, but I could have /28 /30 etc...

I made a copy/paste mistake: I have corrected it

S
Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD


and turn it into this:

HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
Network,45.3.12.0,255.255.255.0,45.3.11.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
Network,45.3.11.0,255.255.255.0,45.3.12.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD

Open in new window


45.2.12.1, 45.14.12.1, or 45.3.11.1) comes from the gateway address in field "EA-Gateway" or 4 column.

I corrected it as I noted above. I made a copy/paste mistake
richsarkAuthor Commented:
Hi wilcoxon,
Did my reply answer your questions ?
OWASP Proactive Controls

Learn the most important control and control categories that every architect and developer should include in their projects.

wilcoxonCommented:
This should do what you want...

There are still some mismatches (and a typo) between what is generated and what you listed as desired so I've included my full debugging code and the output (showing mismatched lines).

The first line has some significant formatting differences with "Dr. Parikh / Children's Heart Tele-Echo".  The second line has a double-space in the input but only a single space in the expected output.  Should the script modify the input strings in some way or is this just a copy-paste error?
#!/usr/local/bin/perl

use strict;
use warnings;

my @correct = (
'Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD',
'Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD',
'Network,45.3.12.0,255.255.255.0,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD',
);

my %masks = (
    24 => '255.255.255.0',
    # replace below with correct mask and add any others you need
    28 => '28-mask',
    30 => '30-mask',
);

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<DATA>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]+),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    my $str = "Network,$addr,$mask,$gate,$rest";
    my $corr = shift @correct;
    if ($str ne $corr) {
        my $loc = find_err($str, $corr);
        print "lines do not match:\n",
              "expected: $corr\n",
              "got:      $str\n",
              "          $loc\n\n";
    } else {
        print $str, "\n";
    }
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    warn "don't have mask $mask mapped\n";
    return 'unknown-mask';
}

sub find_err {
    my ($str1, $str2) = @_;
    my @arr1 = split //, $str1;
    my @arr2 = split //, $str2;
    my $loc = '';
    for my $i (0..@arr1-1) {
        return "$loc^" unless (defined $arr1[$i] and defined $arr2[$i] and $arr1[$i] eq $arr2[$i]);
        $loc .= '.';
    }
    return 'could not locate mismatch';
}

__DATA__
Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD

Open in new window

### output from script
HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
lines do not match:
expected: Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
got:      Network,45.2.12.0,255.255.255.0,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
          ............................................^

lines do not match:
expected: Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD
got:      Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
          ...............................................^

Network,45.3.12.0,255.255.255.0,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD

Open in new window

richsarkAuthor Commented:
Hi Wilcoxon,

Yes, can we normalize the output so if there are too many spaces or crazy characters like, to many spaces, /, $, (,),! or anything like that, change to make the final extract exclude them. For to many spaces, just make it one space please.

Is there a way, ( if not to much to ask) include any possible netmasks /24, /32, /30, /28 etc....to calculate it for me.

I have actually attached  the master csv file so you can actually see.

Thank you very much for all your help !


ININD-IP-Data.csv
wilcoxonCommented:
Here's modified code that reads the provided CSV file and outputs the modified rows.  It removes the special characters you specified (if you run across more, you can just add them into the char class regex to remove them).  It also removes apostrophe (as in your example) and removes period at the end of a string.

The one remaining difference in your sample is "Dr. Parikh" ends up as "DrParikh" in your expected output.  Is this a mistake or what rule should be followed to make that transform?
#!/usr/local/bin/perl

use strict;
use warnings;

#my @correct = (
#'Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD',
#'Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD',
#'Network,45.3.12.0,255.255.255.0,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD',
#);

my %masks = (
    21 => '255.255.248.0',
    22 => '255.255.252.0',
    23 => '255.255.254.0',
    24 => '255.255.255.0',
    25 => '255.255.255.128',
    26 => '255.255.255.192',
    27 => '255.255.255.224',
    28 => '255.255.255.240',
    29 => '255.255.255.248',
    30 => '255.255.255.252',
);

open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<IN>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]+),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    $rest =~ s{[/\$!']}{}g; # remove special characters
    $rest =~ s{\s\s+}{ }g; # remove extra spaces
    $rest =~ s{\.,}{,}g; # remove periods just before comma
    my $str = "Network,$addr,$mask,$gate,$rest";
#    my $corr = shift @correct;
#    if ($str ne $corr) {
#        my $loc = find_err($str, $corr);
#        print "lines do not match:\n",
#              "expected: $corr\n",
#              "got:      $str\n",
#              "          $loc\n\n";
#    } else {
        print $str, "\n";
#    }
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    die "don't have mask $mask mapped\n";
}

sub find_err {
    my ($str1, $str2) = @_;
    my @arr1 = split //, $str1;
    my @arr2 = split //, $str2;
    my $loc = '';
    for my $i (0..@arr1-1) {
        return "$loc^" unless (defined $arr1[$i] and defined $arr2[$i] and $arr1[$i] eq $arr2[$i]);
        $loc .= '.';
    }
    return 'could not locate mismatch';
}

__DATA__
Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor  Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M  to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD

Open in new window

richsarkAuthor Commented:
Hi,
For the doctor name. It's ok to have the dot in the names. But not like Dr.    Parkikh... but Dr.Parikh is ok.

If I elect to run it as perl transform.pl filename.csv > inid-convert.csv would that work rather then putting in the file in the script?

Lastly, if for some reason we mised a net mask, I just need to add it on your script right?

Thank you very much!
wilcoxonCommented:
As the code is right now, it will have "Dr. Parikh" (period and single space).

Sure, just comment out or remove line 25 and change line 29 to "while (<>) {" in order to call it that way.

Correct, you just need to add a line to the masks hash with numeric to quad conversion.
richsarkAuthor Commented:
Awesome , will test this puppy out and get back to you.

Thank you wilcoxon.
richsarkAuthor Commented:
Hi Wilcoxon,

OK, seems good, but I see some entries ( on the master csv) that  dont have nothing filled in for gateway. So for anything passed the word EA* for example on the header:
EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode

if there is no entry, replace the blank entry with "Not Provided"

Can that be done?
wilcoxonCommented:
Sure.  This should do it...
#!/usr/local/bin/perl

use strict;
use warnings;

my %masks = (
    21 => '255.255.248.0',
    22 => '255.255.252.0',
    23 => '255.255.254.0',
    24 => '255.255.255.0',
    25 => '255.255.255.128',
    26 => '255.255.255.192',
    27 => '255.255.255.224',
    28 => '255.255.255.240',
    29 => '255.255.255.248',
    30 => '255.255.255.252',
);

open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<IN>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]*),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    $gate = 'Not Provided' unless $gate;
    $rest =~ s{[/\$!']}{}g; # remove special characters
    $rest =~ s{\s\s+}{ }g; # remove extra spaces
    $rest =~ s{\.,}{,}g; # remove periods just before comma
    my $str = "Network,$addr,$mask,$gate,$rest";
    print $str, "\n";
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    die "don't have mask $mask mapped\n";
}

Open in new window

richsarkAuthor Commented:
Hi,

Ok. Looking good Wilcoxon!

Can we also remove quotes, I am seeing its causing an miss alignment

Network,10.171.12.96,255.255.255.224,10.171.12.97,Fishers YMCA Physical Therapy site STV-DHCPA users,443,"9012 E. 126th St, Fishers, IN 46038",,,,ININD

I assume that this is done at line 33 right?

Would this be right to remove the quotes?

$rest =~ s{[/\$!\"']}{}g; # remove special characters

I have attached the code I am using so we can keep in sync
#!/usr/bin/perl

use strict;
use warnings;

my %masks = (
    21 => '255.255.248.0',
    22 => '255.255.252.0',
    23 => '255.255.254.0',
    24 => '255.255.255.0',
    25 => '255.255.255.128',
    26 => '255.255.255.192',
    27 => '255.255.255.224',
    28 => '255.255.255.240',
    29 => '255.255.255.248',
    30 => '255.255.255.252',
    32 => '255.255.255.255',
);

open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<IN>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]*),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    $gate = 'Not Provided' unless $gate;
    $rest =~ s{[/\$!']}{}g; # remove special characters
    $rest =~ s{\s\s+}{ }g; # remove extra spaces
    $rest =~ s{\.,}{,}g; # remove periods just before comma
    my $str = "Network,$addr,$mask,$gate,$rest";
    print $str, "\n";
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    die "don't have mask $mask mapped\n";
}

Open in new window

wilcoxonCommented:
It's not the quotes that are throwing it off - it's the commas that are included inside the quotes.  Here is a modified version to change any embedded commas in strings into spaces (prior to removing extra spaces so you'll still only end up with at most one space).
#!/usr/bin/perl

use strict;
use warnings;

my %masks = (
    21 => '255.255.248.0',
    22 => '255.255.252.0',
    23 => '255.255.254.0',
    24 => '255.255.255.0',
    25 => '255.255.255.128',
    26 => '255.255.255.192',
    27 => '255.255.255.224',
    28 => '255.255.255.240',
    29 => '255.255.255.248',
    30 => '255.255.255.252',
    32 => '255.255.255.255',
);

open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";

print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";

while (<IN>) {
    chomp;
    next if m{^Network,Gateway}; # skip header row
    unless (m{^([\d\.]+)/(\d+),([\d\.]*),(.*)$}) {
        warn "could not parse record:\n$_\n";
        next;
    }
    my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
    $gate = 'Not Provided' unless $gate;
    $rest =~ s{[/\$!']}{}g; # remove special characters
    while ($rest =~ m{^([^"]*)"([^"]+)"}) {
        my ($beg, $qstr) = ($1, $2);
        my $ori = "$beg\"$qstr\"";
        $qstr =~ s{,}{ }g;
        $rest =~ s{^$ori}{$beg$qstr};
    }
    $rest =~ s{\s\s+}{ }g; # remove extra spaces
    $rest =~ s{\.,}{,}g; # remove periods just before comma
    my $str = "Network,$addr,$mask,$gate,$rest";
    print $str, "\n";
}

sub decode_mask {
    my ($mask) = @_;
    return $masks{$mask} if exists($masks{$mask});
    die "don't have mask $mask mapped\n";
}

Open in new window

richsarkAuthor Commented:
Hello Wilcoxon, I wanted to note that lines 598 to 594 are erroring out:

Wrong
HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,10.171.2.0,255.255.255.257,10.171.2.1,, Wired Data Users",300,136 Dixon Rd. Kokomo IN,,,,ININD

Open in new window


Correct, Notice the 1 less comma near SITECODE "ININD"

HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,10.171.2.0,255.255.255.257,10.171.2.1,, Wired Data Users",300,136 Dixon Rd. Kokomo IN,,,ININD

Open in new window


Perhaps because there is no entry under description. In this case, could we compensate for these types not to error out and simply proceed with a blank entry. If there are no entries for anything passed Description, its ok. I can still bring it in.

Importantly, The commas need to stay matched up.

I hope that makes sense.

Thanks again!
richsarkAuthor Commented:
HI,

I ran your new code, THANKS!

But now its hanging here
45.10.17.0

Can't source UNKNOWN to string in subst at C:\bin\CSV-convert.pl line 35, <> line 39.
Use of uninitialized value $rest in substitution (s///) at C:\bin\CSV-convert.pl line 38, <> line 39.
Use of uninitialized value $rest in pattern match (m//) at C:\bin\CSV-convert.pl line 38, <> line 39.
wilcoxonCommented:
The one gotcha with using a variable for regex is that it can contain regex special chars.  The problem in the current code is that it is looping infinitely on line 39 of the input because line 38 of the code is never matching anything.  The fix is to change line 38 to:

$rest =~ s{^\Q$ori}{$beg$qstr};

Now it will treat all regex special chars in $ori as literals.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
richsarkAuthor Commented:
Ok got it. So as long as the csv files I have contain Same headers this script will work right?

Thanks
wilcoxonCommented:
Yes.  It should work as long as the format of the files don't change.
wilcoxonCommented:
To be more specific, it really doesn't matter what the columns are provided that the first 2 columns remain the same (address/mask and gateway).  If the columns after that change, the only issue will be with the header row that is output not matching (since that is a static string).
richsarkAuthor Commented:
Excellent Response and Knowledge!
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Perl

From novice to tech pro — start learning today.