richsark
asked on
take a CSV file and convert contents
Hello,
I am seeking help with some advance way either via perl or another method to convert a csv file that looks like this:
Thanks for your advanced help
I am seeking help with some advance way either via perl or another method to convert a csv file that looks like this:
Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
and turn it into this:
HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
Network,45.3.12.0,255.255.255.0,45.14.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD
Thanks for your advanced help
You need to better explain the output. Is Network a constant string? Should 255.255.255.0 be hard-coded or interpreted from the /24 (in other words, does your data contain anything other than /24 (assuming that's the right mask for a /24))? Where does the IP address (45.2.12.1, 45.14.12.1, or 45.3.11.1) come from? I don't see that anywhere in the data.
ASKER
Hello Wilcoxon,.
Yes, Network is a constant string. the /24 needs to be interpreted, but I could have /28 /30 etc...
I made a copy/paste mistake: I have corrected it
S
45.2.12.1, 45.14.12.1, or 45.3.11.1) comes from the gateway address in field "EA-Gateway" or 4 column.
I corrected it as I noted above. I made a copy/paste mistake
Yes, Network is a constant string. the /24 needs to be interpreted, but I could have /28 /30 etc...
I made a copy/paste mistake: I have corrected it
S
Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
and turn it into this:
HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
Network,45.3.12.0,255.255.255.0,45.3.11.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
Network,45.3.11.0,255.255.255.0,45.3.12.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD
45.2.12.1, 45.14.12.1, or 45.3.11.1) comes from the gateway address in field "EA-Gateway" or 4 column.
I corrected it as I noted above. I made a copy/paste mistake
ASKER
Hi wilcoxon,
Did my reply answer your questions ?
Did my reply answer your questions ?
This should do what you want...
There are still some mismatches (and a typo) between what is generated and what you listed as desired so I've included my full debugging code and the output (showing mismatched lines).
The first line has some significant formatting differences with "Dr. Parikh / Children's Heart Tele-Echo". The second line has a double-space in the input but only a single space in the expected output. Should the script modify the input strings in some way or is this just a copy-paste error?
There are still some mismatches (and a typo) between what is generated and what you listed as desired so I've included my full debugging code and the output (showing mismatched lines).
The first line has some significant formatting differences with "Dr. Parikh / Children's Heart Tele-Echo". The second line has a double-space in the input but only a single space in the expected output. Should the script modify the input strings in some way or is this just a copy-paste error?
#!/usr/local/bin/perl
use strict;
use warnings;
my @correct = (
'Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD',
'Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD',
'Network,45.3.12.0,255.255.255.0,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD',
);
my %masks = (
24 => '255.255.255.0',
# replace below with correct mask and add any others you need
28 => '28-mask',
30 => '30-mask',
);
print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";
while (<DATA>) {
chomp;
next if m{^Network,Gateway}; # skip header row
unless (m{^([\d\.]+)/(\d+),([\d\.]+),(.*)$}) {
warn "could not parse record:\n$_\n";
next;
}
my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
my $str = "Network,$addr,$mask,$gate,$rest";
my $corr = shift @correct;
if ($str ne $corr) {
my $loc = find_err($str, $corr);
print "lines do not match:\n",
"expected: $corr\n",
"got: $str\n",
" $loc\n\n";
} else {
print $str, "\n";
}
}
sub decode_mask {
my ($mask) = @_;
return $masks{$mask} if exists($masks{$mask});
warn "don't have mask $mask mapped\n";
return 'unknown-mask';
}
sub find_err {
my ($str1, $str2) = @_;
my @arr1 = split //, $str1;
my @arr2 = split //, $str2;
my $loc = '';
for my $i (0..@arr1-1) {
return "$loc^" unless (defined $arr1[$i] and defined $arr2[$i] and $arr1[$i] eq $arr2[$i]);
$loc .= '.';
}
return 'could not locate mismatch';
}
__DATA__
Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
### output from script
HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
lines do not match:
expected: Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
got: Network,45.2.12.0,255.255.255.0,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
............................................^
lines do not match:
expected: Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD
got: Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
...............................................^
Network,45.3.12.0,255.255.255.0,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
ASKER
Hi Wilcoxon,
Yes, can we normalize the output so if there are too many spaces or crazy characters like, to many spaces, /, $, (,),! or anything like that, change to make the final extract exclude them. For to many spaces, just make it one space please.
Is there a way, ( if not to much to ask) include any possible netmasks /24, /32, /30, /28 etc....to calculate it for me.
I have actually attached the master csv file so you can actually see.
Thank you very much for all your help !
ININD-IP-Data.csv
Yes, can we normalize the output so if there are too many spaces or crazy characters like, to many spaces, /, $, (,),! or anything like that, change to make the final extract exclude them. For to many spaces, just make it one space please.
Is there a way, ( if not to much to ask) include any possible netmasks /24, /32, /30, /28 etc....to calculate it for me.
I have actually attached the master csv file so you can actually see.
Thank you very much for all your help !
ININD-IP-Data.csv
Here's modified code that reads the provided CSV file and outputs the modified rows. It removes the special characters you specified (if you run across more, you can just add them into the char class regex to remove them). It also removes apostrophe (as in your example) and removes period at the end of a string.
The one remaining difference in your sample is "Dr. Parikh" ends up as "DrParikh" in your expected output. Is this a mistake or what rule should be followed to make that transform?
The one remaining difference in your sample is "Dr. Parikh" ends up as "DrParikh" in your expected output. Is this a mistake or what rule should be followed to make that transform?
#!/usr/local/bin/perl
use strict;
use warnings;
#my @correct = (
#'Network,45.2.12.0,255.255.255.0,45.2.12.1,DrParikh Childrens Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD',
#'Network,45.3.11.0,255.255.255.0,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp,,,,ININD',
#'Network,45.3.12.0,255.255.255.0,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD',
#);
my %masks = (
21 => '255.255.248.0',
22 => '255.255.252.0',
23 => '255.255.254.0',
24 => '255.255.255.0',
25 => '255.255.255.128',
26 => '255.255.255.192',
27 => '255.255.255.224',
28 => '255.255.255.240',
29 => '255.255.255.248',
30 => '255.255.255.252',
);
open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";
print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";
while (<IN>) {
chomp;
next if m{^Network,Gateway}; # skip header row
unless (m{^([\d\.]+)/(\d+),([\d\.]+),(.*)$}) {
warn "could not parse record:\n$_\n";
next;
}
my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
$rest =~ s{[/\$!']}{}g; # remove special characters
$rest =~ s{\s\s+}{ }g; # remove extra spaces
$rest =~ s{\.,}{,}g; # remove periods just before comma
my $str = "Network,$addr,$mask,$gate,$rest";
# my $corr = shift @correct;
# if ($str ne $corr) {
# my $loc = find_err($str, $corr);
# print "lines do not match:\n",
# "expected: $corr\n",
# "got: $str\n",
# " $loc\n\n";
# } else {
print $str, "\n";
# }
}
sub decode_mask {
my ($mask) = @_;
return $masks{$mask} if exists($masks{$mask});
die "don't have mask $mask mapped\n";
}
sub find_err {
my ($str1, $str2) = @_;
my @arr1 = split //, $str1;
my @arr2 = split //, $str2;
my $loc = '';
for my $i (0..@arr1-1) {
return "$loc^" unless (defined $arr1[$i] and defined $arr2[$i] and $arr1[$i] eq $arr2[$i]);
$loc .= '.';
}
return 'could not locate mismatch';
}
__DATA__
Network,Gateway:,Description:,VLAN:,Site:,Purpose:,IGP ID:,IGP Number:,Sitecode:
45.2.12.0/24,45.2.12.1,Dr. Parikh / Children's Heart Tele-Echo,,2P3A 3rd Floor Cardiology POB,,,,ININD
45.3.11.0/24,45.3.11.1,SV1M to BSRLAN e4.2,,SV1M 1st Floor 86th St. Hosp.,,,,ININD
45.3.12.0/24,45.3.12.1,Marten House - Hope Lodge,527,Hope Lodge (Closet used to be in MH),,,,ININD
ASKER
Hi,
For the doctor name. It's ok to have the dot in the names. But not like Dr. Parkikh... but Dr.Parikh is ok.
If I elect to run it as perl transform.pl filename.csv > inid-convert.csv would that work rather then putting in the file in the script?
Lastly, if for some reason we mised a net mask, I just need to add it on your script right?
Thank you very much!
For the doctor name. It's ok to have the dot in the names. But not like Dr. Parkikh... but Dr.Parikh is ok.
If I elect to run it as perl transform.pl filename.csv > inid-convert.csv would that work rather then putting in the file in the script?
Lastly, if for some reason we mised a net mask, I just need to add it on your script right?
Thank you very much!
As the code is right now, it will have "Dr. Parikh" (period and single space).
Sure, just comment out or remove line 25 and change line 29 to "while (<>) {" in order to call it that way.
Correct, you just need to add a line to the masks hash with numeric to quad conversion.
Sure, just comment out or remove line 25 and change line 29 to "while (<>) {" in order to call it that way.
Correct, you just need to add a line to the masks hash with numeric to quad conversion.
ASKER
Awesome , will test this puppy out and get back to you.
Thank you wilcoxon.
Thank you wilcoxon.
ASKER
Hi Wilcoxon,
OK, seems good, but I see some entries ( on the master csv) that dont have nothing filled in for gateway. So for anything passed the word EA* for example on the header:
EA-Gateway,EA-Description, EA-VLAN,EA -Site,EA-P urpose,EA- IGP ID,EA-IGP Number,EA-Sitecode
if there is no entry, replace the blank entry with "Not Provided"
Can that be done?
OK, seems good, but I see some entries ( on the master csv) that dont have nothing filled in for gateway. So for anything passed the word EA* for example on the header:
EA-Gateway,EA-Description,
if there is no entry, replace the blank entry with "Not Provided"
Can that be done?
Sure. This should do it...
#!/usr/local/bin/perl
use strict;
use warnings;
my %masks = (
21 => '255.255.248.0',
22 => '255.255.252.0',
23 => '255.255.254.0',
24 => '255.255.255.0',
25 => '255.255.255.128',
26 => '255.255.255.192',
27 => '255.255.255.224',
28 => '255.255.255.240',
29 => '255.255.255.248',
30 => '255.255.255.252',
);
open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";
print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";
while (<IN>) {
chomp;
next if m{^Network,Gateway}; # skip header row
unless (m{^([\d\.]+)/(\d+),([\d\.]*),(.*)$}) {
warn "could not parse record:\n$_\n";
next;
}
my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
$gate = 'Not Provided' unless $gate;
$rest =~ s{[/\$!']}{}g; # remove special characters
$rest =~ s{\s\s+}{ }g; # remove extra spaces
$rest =~ s{\.,}{,}g; # remove periods just before comma
my $str = "Network,$addr,$mask,$gate,$rest";
print $str, "\n";
}
sub decode_mask {
my ($mask) = @_;
return $masks{$mask} if exists($masks{$mask});
die "don't have mask $mask mapped\n";
}
ASKER
Hi,
Ok. Looking good Wilcoxon!
Can we also remove quotes, I am seeing its causing an miss alignment
Network,10.171.12.96,255.2 55.255.224 ,10.171.12 .97,Fisher s YMCA Physical Therapy site STV-DHCPA users,443,"9012 E. 126th St, Fishers, IN 46038",,,,ININD
I assume that this is done at line 33 right?
Would this be right to remove the quotes?
$rest =~ s{[/\$!\"']}{}g; # remove special characters
I have attached the code I am using so we can keep in sync
Ok. Looking good Wilcoxon!
Can we also remove quotes, I am seeing its causing an miss alignment
Network,10.171.12.96,255.2
I assume that this is done at line 33 right?
Would this be right to remove the quotes?
$rest =~ s{[/\$!\"']}{}g; # remove special characters
I have attached the code I am using so we can keep in sync
#!/usr/bin/perl
use strict;
use warnings;
my %masks = (
21 => '255.255.248.0',
22 => '255.255.252.0',
23 => '255.255.254.0',
24 => '255.255.255.0',
25 => '255.255.255.128',
26 => '255.255.255.192',
27 => '255.255.255.224',
28 => '255.255.255.240',
29 => '255.255.255.248',
30 => '255.255.255.252',
32 => '255.255.255.255',
);
open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";
print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";
while (<IN>) {
chomp;
next if m{^Network,Gateway}; # skip header row
unless (m{^([\d\.]+)/(\d+),([\d\.]*),(.*)$}) {
warn "could not parse record:\n$_\n";
next;
}
my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
$gate = 'Not Provided' unless $gate;
$rest =~ s{[/\$!']}{}g; # remove special characters
$rest =~ s{\s\s+}{ }g; # remove extra spaces
$rest =~ s{\.,}{,}g; # remove periods just before comma
my $str = "Network,$addr,$mask,$gate,$rest";
print $str, "\n";
}
sub decode_mask {
my ($mask) = @_;
return $masks{$mask} if exists($masks{$mask});
die "don't have mask $mask mapped\n";
}
It's not the quotes that are throwing it off - it's the commas that are included inside the quotes. Here is a modified version to change any embedded commas in strings into spaces (prior to removing extra spaces so you'll still only end up with at most one space).
#!/usr/bin/perl
use strict;
use warnings;
my %masks = (
21 => '255.255.248.0',
22 => '255.255.252.0',
23 => '255.255.254.0',
24 => '255.255.255.0',
25 => '255.255.255.128',
26 => '255.255.255.192',
27 => '255.255.255.224',
28 => '255.255.255.240',
29 => '255.255.255.248',
30 => '255.255.255.252',
32 => '255.255.255.255',
);
open IN, 'ININD-IP-Data.csv' or die "could not open csv: $!";
print "HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode\n";
while (<IN>) {
chomp;
next if m{^Network,Gateway}; # skip header row
unless (m{^([\d\.]+)/(\d+),([\d\.]*),(.*)$}) {
warn "could not parse record:\n$_\n";
next;
}
my ($addr, $mask, $gate, $rest) = ($1, decode_mask($2), $3, $4);
$gate = 'Not Provided' unless $gate;
$rest =~ s{[/\$!']}{}g; # remove special characters
while ($rest =~ m{^([^"]*)"([^"]+)"}) {
my ($beg, $qstr) = ($1, $2);
my $ori = "$beg\"$qstr\"";
$qstr =~ s{,}{ }g;
$rest =~ s{^$ori}{$beg$qstr};
}
$rest =~ s{\s\s+}{ }g; # remove extra spaces
$rest =~ s{\.,}{,}g; # remove periods just before comma
my $str = "Network,$addr,$mask,$gate,$rest";
print $str, "\n";
}
sub decode_mask {
my ($mask) = @_;
return $masks{$mask} if exists($masks{$mask});
die "don't have mask $mask mapped\n";
}
ASKER
Hello Wilcoxon, I wanted to note that lines 598 to 594 are erroring out:
Wrong
Correct, Notice the 1 less comma near SITECODE "ININD"
Perhaps because there is no entry under description. In this case, could we compensate for these types not to error out and simply proceed with a blank entry. If there are no entries for anything passed Description, its ok. I can still bring it in.
Importantly, The commas need to stay matched up.
I hope that makes sense.
Thanks again!
Wrong
HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,10.171.2.0,255.255.255.257,10.171.2.1,, Wired Data Users",300,136 Dixon Rd. Kokomo IN,,,,ININD
Correct, Notice the 1 less comma near SITECODE "ININD"
HEADER-Network,Address*,Netmask*,EA-Gateway,EA-Description,EA-VLAN,EA-Site,EA-Purpose,EA-IGP ID,EA-IGP Number,EA-Sitecode
Network,10.171.2.0,255.255.255.257,10.171.2.1,, Wired Data Users",300,136 Dixon Rd. Kokomo IN,,,ININD
Perhaps because there is no entry under description. In this case, could we compensate for these types not to error out and simply proceed with a blank entry. If there are no entries for anything passed Description, its ok. I can still bring it in.
Importantly, The commas need to stay matched up.
I hope that makes sense.
Thanks again!
ASKER
HI,
I ran your new code, THANKS!
But now its hanging here
45.10.17.0
Can't source UNKNOWN to string in subst at C:\bin\CSV-convert.pl line 35, <> line 39.
Use of uninitialized value $rest in substitution (s///) at C:\bin\CSV-convert.pl line 38, <> line 39.
Use of uninitialized value $rest in pattern match (m//) at C:\bin\CSV-convert.pl line 38, <> line 39.
I ran your new code, THANKS!
But now its hanging here
45.10.17.0
Can't source UNKNOWN to string in subst at C:\bin\CSV-convert.pl line 35, <> line 39.
Use of uninitialized value $rest in substitution (s///) at C:\bin\CSV-convert.pl line 38, <> line 39.
Use of uninitialized value $rest in pattern match (m//) at C:\bin\CSV-convert.pl line 38, <> line 39.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Ok got it. So as long as the csv files I have contain Same headers this script will work right?
Thanks
Thanks
Yes. It should work as long as the format of the files don't change.
To be more specific, it really doesn't matter what the columns are provided that the first 2 columns remain the same (address/mask and gateway). If the columns after that change, the only issue will be with the header row that is output not matching (since that is a static string).
ASKER
Excellent Response and Knowledge!