summer_soccer
asked on
Perl script takes too long to finish
I have written a Perl script to parse and process a large number of big gzipped text files line by line. There are about 700k .gz files, and their total size is around 120 GB. The decompressed files should be tens of times larger.
I found that it takes about 8 hours to process just 800 .gz files. At that rate, all 700k files would take roughly 7,000 hours, which is close to a year.
I am wondering why Perl takes so long to process them. Would it be possible to improve the running speed by rewriting the code in C?
As Tintin says, if we saw your code and had some idea of the manipulations you were performing on the files, maybe we could offer suggestions to speed the task up. In answer to your other question, a C program would very likely be much faster, but we don't know yet what part of the operation is causing the process to be so slow -- the manipulations done by the script on the contents of the archives, or the actions done on the archives themselves (decompressing and perhaps recompressing, for example).
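One way to find out which side dominates is to time the two phases separately on a handful of files before rewriting anything. Below is a minimal sketch, not your script: `time_phases` is a hypothetical helper name, it assumes the core modules IO::Uncompress::Gunzip and Time::HiRes are available, and the callback stands in for whatever per-line work your script does. Calling the timer around every line adds overhead of its own, so treat the numbers as a rough split only.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper (not part of the original script).
# Returns (seconds spent reading/decompressing,
#          seconds spent in the per-line callback,
#          number of lines read).
sub time_phases {
    my ($path, $process_line) = @_;
    my $gz = IO::Uncompress::Gunzip->new($path)
        or die "cannot open $path: $GunzipError";
    my ($read_s, $proc_s, $lines) = (0, 0, 0);
    while (1) {
        my $t0   = time();
        my $line = $gz->getline();      # decompress and read one line
        my $t1   = time();
        $read_s += $t1 - $t0;
        last unless defined $line;      # undef means end of file
        $process_line->($line);         # stand-in for the real per-line work
        $proc_s += time() - $t1;
        $lines++;
    }
    $gz->close();
    return ($read_s, $proc_s, $lines);
}

# Example use on one file named on the command line:
# my ($r, $p, $n) = time_phases($ARGV[0], sub { my @f = split /\s+/, $_[0] });
# printf "read %.2fs, process %.2fs, %d lines\n", $r, $p, $n;
```

If the read time dominates, the cost is mostly decompression and I/O, which a C rewrite is unlikely to change much; if the callback dominates, the per-line logic is the place to look. A profiler such as Devel::NYTProf (from CPAN) gives a finer breakdown.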
ASKER
Okay, I have copied my code below. It is very long, more than 1000 lines.
#!/usr/bin/perl -w
use strict;
use File::Find;
use File::Basename;
use DBI;
use Net::IP;
use Net::Patricia;
use Time::Local;
## start-time is the selected starting time, end-time is the selected ending time for traceroute data processing, prefixasfile is the prefix-as mapping file name, inconsistent-as-path-output is the file to store inconsistent aspath entries, inconsistent-pop-path-output is the file to store inconsistent poppath entries, discarding-stats-output is the file to store files and traceroutes being discarded
if($#ARGV != 7) {
print "usage: process-traceroute.pl start-time end-time good-traceroute-file-list corrupted-traceroute-file-list prefix-as-mapping-file inconsistent-as-path-output inconsistent-pop-path-output policy-filtering-stats-output\n";
print "start-time and end-time in format YYMMDDHHMMSS\n";
exit(1);
}
my ($startingtime, $endingtime, $goodfilelist, $corruptedfilelist, $prefixasfile, $aspathoutput, $poppathoutput, $policyoutput) = @ARGV;
## open the prefix-as mapping file and store them in the Patricia handler
open(INPUT1, "<$prefixasfile") || die "cannot open $prefixasfile file for read.";
## open the good file list
open(INPUT2, "<$goodfilelist") || die "cannot open $goodfilelist file for read.";
## open the corrupted file list
open(INPUT3, "<$corruptedfilelist") || die "cannot open $corruptedfilelist file for read.";
## open the inconsistent as-path file for write
open(OUTPUT1, ">$aspathoutput") || die "cannot open $aspathoutput file for write.";
## open the inconsistent pop-path file for write
open(OUTPUT2, ">$poppathoutput") || die "cannot open $poppathoutput file for write.";
## open the policy-filtering-stats-output file for write
open(OUTPUT3, ">$policyoutput") || die "cannot open $policyoutput file for write.";
my $pt = new Net::Patricia;
my $prefixt = new Net::Patricia;
$startingtime =~ /(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/;
my ($yy1, $mm1, $dd1, $hh1, $min1, $ss1) = ($1, $2, $3, $4, $5, $6);
$endingtime =~ /(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/;
my ($yy2, $mm2, $dd2, $hh2, $min2, $ss2) = ($1, $2, $3, $4, $5, $6);
print OUTPUT3 "Traceroute files between 20$yy1-$mm1-$dd1 $hh1:$min1:$ss1 and 20$yy2-$mm2-$dd2 $hh2:$min2:$ss2 are checked.\n";
my $fileasn;
my $begintime = time(); ## get time in seconds since 1970
print OUTPUT3 "The code starts time is $begintime\n";
my @filecache = ();
while(my $oneline = <INPUT1>) {
push(@filecache, $oneline);
}
foreach my $oneline (@filecache) {
chomp($oneline);
my ($oneprefix, $oneas) = split(/\s+/, $oneline);
if($oneprefix =~ /\d{1,3}(.\d{1,3}){3}\/\d{1,2}/) {
$pt->add_string($oneprefix, $oneas);
$prefixt->add_string($oneprefix);
}
}
close(INPUT1);
my %timefilehash = ();
my $totalfiles = 0;
my $corruptedfiles = 0;
my $workingfiles = 0;
@filecache = ();
while(my $file=<INPUT2>) {
push(@filecache, $file);
}
foreach my $file (@filecache) {
chomp($file);
if( -f $file && $file =~ /_(\d{12})_re_\d+/ ) {
my $cnttime = $1;
my $diff1 = to_seconds($cnttime) - to_seconds($startingtime);
my $diff2 = to_seconds($endingtime) - to_seconds($cnttime);
if( $diff1 >= 0 && $diff2 >= 0 ) {
$totalfiles++;
$workingfiles++;
my $timefiles = $timefilehash{$cnttime};
if(not(defined($timefiles) )) {
$timefilehash{$cnttime} = [$file];
}
else {
push(@{$timefilehash{$cnttime}}, $file);
}
}
}
}
@filecache = ();
while(my $file=<INPUT3>) {
push(@filecache, $file);
}
foreach my $file (@filecache) {
chomp($file);
if(-f $file && $file =~ /_(\d{12})_re_\d+/ ) {
my $cnttime = $1;
my $diff1 = to_seconds($cnttime) - to_seconds($startingtime);
my $diff2 = to_seconds($endingtime) - to_seconds($cnttime);
if( $diff1 >= 0 && $diff2 >= 0 ) {
$totalfiles++;
$corruptedfiles++;
}
}
}
my @files = ();
sub to_seconds
{
use integer;
my $x = $_[0];
my $year = "20".substr($x,0,2);
my $mo = substr($x,2,2);
my $day = substr($x,4,2);
my $hour = substr($x,6,2);
my $minute = substr($x,8,2);
my $second = substr($x,10,2);
my $t = timelocal($second,$minute, $hour,$day ,$mo - 1,$year - 1900);
return($t);
}
my $numgroups= keys %timefilehash;
for my $onetime (sort keys %timefilehash) {
my @cntset = @{$timefilehash{$onetime}} ;
my $goodfiles = $#cntset+1;
foreach my $onefile (@{$timefilehash{$onetime} }) {
push(@files, $onefile);
print "The file is $onefile\n";
}
}
# connect to mySQL database for later data query and retrieval
my $dsn = "DBI:mysql:test_bm"; # data source name
my $user_name = "root"; # user name
my $password = "NewPw"; # password
my %ipasntable = ();
my %iplockeytable = ();
my %lockeyloctable = ();
my %iploctable = ();
my %bgphash = ();
my %igphash = ();
# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## select ipAddress, asn, lockey from the ipAddress table
my $sth = $dbh->prepare("SELECT ipAddress, asn, locKey FROM ipAddress");
$sth->execute();
## fetch query results from ipAddress table
while(my @ary = $sth->fetchrow_array()) {
my ($cntip, $cntasn, $cntkey) = @ary;
if($cntasn ne "NULL") {
if($cntasn > 0) {
$ipasntable{$cntip} = $cntasn;
}
}
else {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($cntip);
if(defined($cntasn)) {
$ipasntable{$cntip} = $cntasn;
}
}
if($cntkey ne "NULL") {
if($cntkey > 1) {
$iplockeytable{$cntip} = $cntkey;
}
}
}
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## select lockey, locName from the location table
$sth = $dbh->prepare("SELECT locKey, locName FROM location");
$sth->execute();
## fetch query results from location table
while(my @ary = $sth->fetchrow_array()) {
my ($cntkey, $cntloc) = @ary;
if($cntkey ne "NULL") {
if($cntkey > 1) {
$lockeyloctable{$cntkey} = $cntloc;
}
}
}
while ( my ($oneip, $onekey) = each(%iplockeytable) ) {
my $oneloc = $lockeyloctable{$onekey};
my $oneasn = $ipasntable{$oneip};
# print "For ip $oneip, its ASN is $oneasn, its PoP is $oneloc\n";
$iploctable{$oneip} = $oneloc;
}
## release iplockeytable and lockeyloctable memory
%iplockeytable = ();
%lockeyloctable = ();
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## drop inferred BGP table if it exists
my $bgpdrop = "
DROP TABLE IF EXISTS bgp";
$sth = $dbh->prepare($bgpdrop);
$sth->execute();
## create inferred BGP table
my $bgpcreate = "
CREATE TABLE bgp (
bkey int(12) unsigned NOT NULL auto_increment,
ptime datetime NOT NULL,
tstart datetime NOT NULL,
vpip varchar(16) NOT NULL,
dip varchar(24) NOT NULL,
cntas int(8) unsigned NOT NULL,
cntpop varchar(32) NOT NULL,
nextas int(8) unsigned,
nextpop varchar(32),
aspath varchar(64),
PRIMARY KEY (bkey)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8";
$sth = $dbh->prepare($bgpcreate);
$sth->execute();
## drop intra-AS PoP-path table if it exists
my $poppathdrop = "
DROP TABLE IF EXISTS poppath";
$sth = $dbh->prepare($poppathdrop);
$sth->execute();
## create intra-AS PoP-path table
my $poppathcreate = "
CREATE TABLE poppath (
pkey int(12) unsigned NOT NULL auto_increment,
ptime datetime NOT NULL,
tstart datetime NOT NULL,
vpip varchar(16) NOT NULL,
dip varchar(16) NOT NULL,
asn int(8) unsigned NOT NULL,
srcpop varchar(32) NOT NULL,
dstpop varchar(32) NOT NULL,
poppath varchar(256) NOT NULL,
ippathlen int(4) NOT NULL,
PRIMARY KEY (pkey)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8";
$sth = $dbh->prepare($poppathcreate);
$sth->execute();
## subroutine to check whether an ASN is a targeted ASN
sub istargetas {
my ($asn) = @_; ## the argument arrives in @_, not in $_
if($asn eq "1239" || $asn eq "16631" || $asn eq "1668" || $asn eq "209" ||
$asn eq "2828" || $asn eq "2856" || $asn eq "2914" || $asn eq "3257" ||
$asn eq "3320" || $asn eq "3356" || $asn eq "3549" || $asn eq "3561" ||
$asn eq "5511" || $asn eq "6395" || $asn eq "6453" || $asn eq "6461" ||
$asn eq "701" || $asn eq "7018") {
return 1;
}
else {
return 0;
}
}
sub bgpcontains {
my ($first, $second) = @_;
foreach my $one (@{$first}) {
if($one eq $second) {
return 1;
}
}
return 0;
}
sub igpcontains {
my ($first, $second) = @_;
foreach my $one (@{$first}) {
if($one eq $second) {
return 1;
}
}
return 0;
}
my $totalnextingress = 0;
my $nextingressmorethanonestarskipped = 0;
my $nextingressonestarnounknownincluded = 0; ## current last PoP is not NULL, next hop is *
my $nextingressonestarunknownincluded = 0; ## current last PoP is NULL, next hop is *
my $nextingressnostarnounknownincluded = 0; ## current last PoP is not NULL, next hop is non-*
my $nextingressnostarunknownincluded = 0; ## current last PoP is NULL, next hop is non-*
my $totalsameasnexthop = 0;
my $sameasnexthopegresstwounknownsdiscarded = 0;
my $sameasnexthopegressnotunknown = 0;
my $sameasnexthopegressunknown = 0;
my $totalaspath = 0;
my $aspathmorethanonestarskipped = 0;
my $aspathonestarincluded = 0;
my $aspathnostarincluded = 0;
my $totalpoppath = 0;
my $poppathmorethanoneunknownskipped = 0;
my $poppathoneunknownincluded = 0;
my $poppathnounknownincluded = 0;
## subroutine to process one traceroute, create bgp entries and poppath entries, and insert entries into bgp table and poppath table
sub processoneprobe {
my ($probetime, $starttime, @hops) = @_;
my $asstr = "";
my $lastasn="-1";
my @asgroups = ();
my $cntgroup = "";
my $cntindex = 0;
my $srchop = $hops[0];
my $dsthop = $hops[$#hops];
my ($srcip, $dummy1, $dummy2, $srcasn, $srcpop) = split(/:/, $srchop);
my ($dstip, $dummy3, $dummy4, $dstasn, $dstpop) = split(/:/, $dsthop);
my @noduplicates = ();
## remove duplicate IPs in the hops
my $lastip = "-1";
for(my $i=0; $i<=$#hops; $i++) {
if($hops[$i] eq "*") {
push(@noduplicates, $hops[$i]);
}
else {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, $hops[$i]);
if($cntip ne $lastip) {
push(@noduplicates, $hops[$i]);
$lastip = $cntip;
}
}
}
@hops = @noduplicates;
## get as-path and divide hops into AS groups by getting each AS group's hop indices
foreach my $onehop (@hops) {
## skip stars
if($onehop eq "*") {
$cntindex++;
next;
}
my ($cntIP, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $onehop);
## this is the first AS in the traceroute path
if($lastasn eq "-1") {
if($cntasn ne "NULL" && $cntasn ne "0") {
$asstr = $cntasn;
$cntgroup = $cntindex;
$lastasn = $cntasn;
}
}
else { ## Non-first AS in the traceroute path
if($cntasn ne "NULL" && $cntasn ne "0" && $cntasn ne $lastasn) {
push(@asgroups, $cntgroup);
$asstr .= ">$cntasn";
$cntgroup = $cntindex;
$lastasn = $cntasn;
}
elsif($cntasn ne "NULL" && $cntasn ne "0" && $cntasn eq $lastasn) {
$cntgroup .= ":$cntindex";
}
if($cntindex == $#hops) {
push(@asgroups, $cntgroup);
}
}
$cntindex++;
}
## get the PoP-level paths for the as groups
my @groups = ();
foreach my $onegroup (@asgroups) {
my (@indices) = split(/:/, $onegroup);
my $firstindex = $indices[0];
my $lastindex = $indices[$#indices];
my $groupstr = "";
my $lastpop = "-1";
my $i;
for($i=$firstindex; $i<=$lastindex; $i++) {
my $cnthop = $hops[$i];
if($cnthop eq "*") {
if($groupstr eq "") {
$groupstr .= $cnthop;
$lastpop = "NULL";
}
else {
$groupstr .= "|$cnthop";
$lastpop = "NULL";
}
}
else {
my ($cntIP, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $cnthop);
if($cntpop eq "NULL") {
if($groupstr eq "") {
$groupstr .= "$cntIP:$cntasn:$cntpop:$i"; ## i is the index of the hop in hops
$lastpop = "NULL";
}
else {
$groupstr .= "|$cntIP:$cntasn:$cntpop:$i";
$lastpop = "NULL";
}
}
elsif($cntpop ne $lastpop) {
if($groupstr eq "") {
$groupstr .= "$cntIP:$cntasn:$cntpop:$i";
$lastpop = $cntpop;
}
else {
$groupstr .= "|$cntIP:$cntasn:$cntpop:$i";
$lastpop = $cntpop;
}
}
}
} ## end for($i ... ...)
push(@groups, $groupstr);
}
## begin to process these groups one by one
my $ii; ## index for groups
for($ii=0; $ii<=$#groups; $ii++) {
my $onegroup = $groups[$ii];
my (@cntpops) = split(/\|/, $onegroup);
my ($cntIP, $cntasn, $cntpop, $cntindex) = split(/:/, $cntpops[0]);
if($cntasn != $fileasn) {
next;
}
my $cntlastasnpop = $cntpops[$#cntpops];
my $afterstars = 0;
my $afterhasstar = 0;
if($cntlastasnpop eq "*") { ## sanity check: the current last hop must never be *
print "Current asnpop is *. \n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my $tmpi;
my ($cntlastIP, $cntlastasn, $cntlastpop, $cntlastindex) = split(/:/, $cntlastasnpop);
for($tmpi=$cntlastindex+1; $tmpi<=$#hops; $tmpi++) {
if($hops[$tmpi] eq "*") {
$afterstars++;
$afterhasstar = 1;
if($afterstars == 2) {
last;
}
}
else {
$afterstars = 0;
}
}
my $partialaspath;
if($afterstars < 2) {
### get the partial as path from current as to the destination as
my (@ashops) = split(/>/, $asstr);
my $asindex;
my $jj;
for($jj=0; $jj<=$#ashops; $jj++) {
my $cntasn = $ashops[$jj];
if($cntasn == $cntlastasn) {
$asindex = $jj;
next;
}
}
$partialaspath = $ashops[$asindex];
for($jj=$asindex+1; $jj<=$#ashops; $jj++) {
$partialaspath .= ">$ashops[$jj]";
}
}
else {
$partialaspath = "NULL"; ## do not use the AS path if there exists two consecutive "*" after the current AS
}
## begin to test previous egress--next ingress PoP entry and populate it into the bgp table
if($ii<$#groups) { ## it is not the last AS
my $nextgroup = $groups[$ii+1];
my (@nextpops) = split(/\|/, $nextgroup);
my $nextfirstasnpop = $nextpops[0];
my $nextegresstype = 0;
my ($cntsecondlastIP, $cntsecondlastasn, $cntsecondlastpop, $secondcntlastindex);
if($cntlastasn ne "NULL" && $cntlastpop ne "NULL") { ## current last pop is valid
$nextegresstype = 1;
}
elsif($#cntpops >= 1) {
my $cntsecondlastasnpop = $cntpops[$#cntpops-1];
if($cntsecondlastasnpop ne "*") {
($cntsecondlastIP, $cntsecondlastasn, $cntsecondlastpop, $secondcntlastindex) = split(/:/, $cntsecondlastasnpop);
if($cntsecondlastasn ne "NULL" && $cntsecondlastpop ne "NULL") {
$nextegresstype = 2;
}
}
}
if($nextegresstype > 0) {
if($nextfirstasnpop ne "*") {
my ($nextfirstIP, $nextfirstasn, $nextfirstpop, $nextfirstindex) = split(/:/, $nextfirstasnpop);
## check whether there are more than 1 consecutive * hop between the last PoP and next-AS ingress IP
my $betweenstars = 0;
for($tmpi=$cntlastindex+1; $tmpi<$nextfirstindex; $tmpi++) {
if($hops[$tmpi] eq "*") {
$betweenstars++;
}
}
my $key;
if($nextegresstype == 1) {
$key = "$srcip<$dstip<$cntlastasn<$cntlastpop<$starttime";
}
else {
$key = "$srcip<$dstip<$cntsecondlastasn<$cntsecondlastpop<$starttime";
}
my $oneentry;
if($betweenstars >= 2) {
$oneentry = "$probetime<$nextfirstasn<NULL<$partialaspath";
}
else {
$oneentry = "$probetime<$nextfirstasn<$nextfirstIP<$partialaspath";
}
my $entries = $bgphash{$key};
if(not(defined($entries))) {
$totalnextingress++;
if($betweenstars >= 2) {
$nextingressmorethanonestarskipped++;
}
else {
if($betweenstars == 1 && $nextegresstype == 1) {
$nextingressonestarnounknownincluded++;
}
elsif($betweenstars == 1 && $nextegresstype == 2) {
$nextingressonestarunknownincluded++;
}
elsif($betweenstars == 0 && $nextegresstype == 1) {
$nextingressnostarnounknownincluded++;
}
elsif($betweenstars == 0 && $nextegresstype == 2) {
$nextingressnostarunknownincluded++;
}
} ## end if($betweenstars >= 2) { ... } else { ... }
$totalaspath++;
if($afterstars >= 2) {
$aspathmorethanonestarskipped++;
}
elsif($afterhasstar == 1) {
$aspathonestarincluded++;
}
else {
$aspathnostarincluded++;
}
if($betweenstars < 2 || $afterstars < 2) {
@{$bgphash{$key}} = ($oneentry);
}
}
else {
if(bgpcontains(\@{$entries}, $oneentry) == 0) {
$totalnextingress++;
if($betweenstars >= 2) {
$nextingressmorethanonestarskipped++;
}
else {
if($betweenstars == 1 && $nextegresstype == 1) {
$nextingressonestarnounknownincluded++;
}
elsif($betweenstars == 1 && $nextegresstype == 2) {
$nextingressonestarunknownincluded++;
}
elsif($betweenstars == 0 && $nextegresstype == 1) {
$nextingressnostarnounknownincluded++;
}
elsif($betweenstars == 0 && $nextegresstype == 2) {
$nextingressnostarunknownincluded++;
}
} ## end if($betweenstars >= 2) { ... } else { ... }
$totalaspath++;
if($afterstars >= 2) {
$aspathmorethanonestarskipped++;
}
elsif($afterhasstar == 1) {
$aspathonestarincluded++;
}
else {
$aspathnostarincluded++;
}
if($betweenstars < 2 || $afterstars < 2) {
push(@{$bgphash{$key}}, $oneentry);
}
}
}
} ## end if($nextfirstasnpop ne "*")
} ## end if($nextegresstype > 0)
}
## begin to generate bgp and IGP PoP-path entries for this group
my (@asnpops) = split(/\|/, $onegroup);
if($asnpops[0] eq "*") {
print "The first pop is *.\n";
print "The asnpops are @asnpops\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
if($asnpops[$#asnpops] eq "*") {
print "The last PoP is *\n";
print "The asnpops are @asnpops\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my ($egressIP, $egressasn, $egresspop, $egressindex) = split(/:/, $asnpops[$#asnpops]);
my $egresstype = 0; ## this variable keeps the type of the egress
if($egresspop ne "NULL") {
$egresstype = 1;
}
else {
if($#asnpops>=1) {
if($asnpops[$#asnpops-1] ne "*") {
($egressIP, $egressasn, $egresspop, $egressindex) = split(/:/, $asnpops[$#asnpops-1]);
if($egresspop ne "NULL") {
$egresstype = 2;
}
}
}
}
my $i;
my $j;
for($i=0; $i<=$#asnpops-1; $i++) {
if($asnpops[$i] eq "*") {
next;
}
my ($startIP, $startasn, $startpop, $startindex) = split(/:/, $asnpops[$i]);
if($startasn eq "NULL" || $startpop eq "NULL" || $startpop eq $egresspop) {
next;
}
$totalaspath++;
if($afterstars >= 2) {
$aspathmorethanonestarskipped++;
}
elsif($afterhasstar == 1) {
$aspathonestarincluded++;
}
else {
$aspathnostarincluded++;
}
my $key = "$srcip<$dstip<$startasn<$startpop<$starttime";
my $entries = $bgphash{$key};
my $oneentry = "$probetime<$egressasn<$egresspop<$partialaspath";
if(not(defined($entries))) {
$totalsameasnexthop++;
if($egresstype != 0) {
if($egresstype == 1) {
$sameasnexthopegressnotunknown++;
}
elsif($egresstype == 2) {
$sameasnexthopegressunknown++;
}
@{$bgphash{$key}} = ($oneentry);
}
else {
$sameasnexthopegresstwounknownsdiscarded++;
}
}
else {
if(bgpcontains(\@{$entries}, $oneentry) == 0) {
$totalsameasnexthop++;
if($egresstype != 0) {
if($egresstype == 1) {
$sameasnexthopegressnotunknown++;
}
elsif($egresstype == 2) {
$sameasnexthopegressunknown++;
}
push(@{$bgphash{$key}}, $oneentry);
}
else {
$sameasnexthopegresstwounknownsdiscarded++;
}
}
}
my $poppath = $startpop;
## begin to process the right-side paths of the current pop
for($j=$i+1; $j<=$#asnpops; $j++) {
if($asnpops[$j] eq "*") {
$poppath .= ">*";
next;
}
else {
my ($endip, $endasn, $endpop, $endindex) = split(/:/, $asnpops[$j]);
if($endasn ne "NULL" && $endasn ne $startasn) {
print "The end pop asn is $endasn, and the start asn is $startasn\n";
print "The group is $onegroup\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
if($endpop eq "NULL") {
$poppath .= ">NULL";
next;
}
else {
$poppath .= ">$endpop";
## begin to inspect and compress the PoP-path
my @pops = split(/>/, $poppath);
my @knownpops = ();
for(my $tmpkk=0; $tmpkk<=$#pops; $tmpkk++) {
## keep known PoP indices into PoP list
if($pops[$tmpkk] ne "*" && $pops[$tmpkk] ne "NULL") {
push(@knownpops, $tmpkk);
}
}
if($knownpops[0] != 0 || $knownpops[$#knownpops] != $#pops) {
print "The first known PoP index is $knownpops[0], and the last known PoP index is $knownpops[$#knownpops].\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my $twounknownnotbetweensamePoP = 0;
my $oneunknownnotbetweensamePoP = 0;
my $newpoppath = "$pops[0]";
my $lastkeptPoP = $pops[0];
my $lastPoPindex = 0;
for(my $tmpkk=1; $tmpkk<=$#knownpops; $tmpkk++) {
my $cntPoP = $pops[$knownpops[$tmpkk]];
if($cntPoP ne $lastkeptPoP) {
## there are more than one NULL PoP or * between current PoP and last known PoP
if($knownpops[$tmpkk]-$lastPoPindex > 2) {
$twounknownnotbetweensamePoP = 1;
last;
}
## there is one NULL PoP or * between current PoP and last known PoP, add a wild card
elsif($knownpops[$tmpkk]-$lastPoPindex == 2) {
if($oneunknownnotbetweensamePoP == 0) { ## this is the first NULL PoP or * between two known PoPs
$oneunknownnotbetweensamePoP = 1;
$lastPoPindex = $knownpops[$tmpkk];
$lastkeptPoP = $cntPoP;
$newpoppath .= ">*>$cntPoP";
}
else {
$twounknownnotbetweensamePoP = 1;
last;
}
}
else { ## this is no NULL PoP or * between two known PoPs
$lastPoPindex = $knownpops[$tmpkk];
$lastkeptPoP = $cntPoP;
$newpoppath .= ">$cntPoP";
}
}
}
## begin to find the earliest IP index in the same PoP as startIP
my $earliestindex = $startindex;
for(my $lll=$startindex; $lll>=0; $lll--) {
my $cnthop = $hops[$lll];
if($hops[$lll] ne "*") {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, $hops[$lll]);
if($cntasn eq $startasn && $cntpop eq $startpop) {
$earliestindex = $lll;
}
elsif($cntpop ne "NULL" && $cntpop ne $startpop) {
last;
}
elsif($cntasn ne "NULL" && $cntasn ne $startasn) {
last;
}
}
}
## begin to find the latest IP index in the same PoP as endIP
my $latestindex = $endindex;
for(my $lll=$endindex; $lll<=$#hops; $lll++) {
my $cnthop = $hops[$lll];
if($hops[$lll] ne "*") {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, $hops[$lll]);
if($cntasn eq $endasn && $cntpop eq $endpop) {
$latestindex = $lll;
}
elsif($cntpop ne "NULL" && $cntpop ne $endpop) {
last;
}
elsif($cntasn ne "NULL" && $cntasn ne $endasn) {
last;
}
}
}
my $ippathlen = $latestindex - $earliestindex + 1;
my $key = "$srcip<$dstip<$startasn<$startpop<$endpop<$starttime";
my $oneentry = "$probetime<$newpoppath<$ippathlen";
my $entries = $igphash{$key};
if(not(defined($entries))) {
$totalpoppath++;
if($twounknownnotbetweensamePoP == 1) {
$poppathmorethanoneunknownskipped++;
}
else {
if($oneunknownnotbetweensamePoP == 1) {
$poppathoneunknownincluded++;
}
else {
$poppathnounknownincluded++;
}
@{$igphash{$key}} = ($oneentry);
}
}
else {
if(igpcontains(\@{$entries}, $oneentry) == 0) {
$totalpoppath++;
if($twounknownnotbetweensamePoP == 1) {
$poppathmorethanoneunknownskipped++;
}
else {
if($oneunknownnotbetweensamePoP == 1) {
$poppathoneunknownincluded++;
}
else {
$poppathnounknownincluded++;
}
push(@{$igphash{$key}}, $oneentry);
}
} ## end if(igpcontains(\@{$entries }, $oneentry) == 0) { ... }
} ## end if(not(defined($entries))) { ... } else { ... }
} ## end if($endpop eq "NULL") { ... } else { ... }
} ## end if($asnpops[$j] eq "*") { ... } else { ... }
} ## end for($j=$i+1; $j<=$#asnpops; $j++)
} ## end for($i=0; $i<=$#asnpops-1; $i++)
} ## end for($ii=0; $ii<=$#groups; $ii++)
}
sub populatehashes {
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
my $str = "LOCK TABLES bgp WRITE";
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO bgp (ptime, tstart, vpip, dip, cntas, cntpop, nextas, nextpop, aspath) VALUES ";
my $first = 1;
my $entrycount = 0;
while ( my ($key, $val) = each(%bgphash) ) {
my ($srcip, $dstip, $asn, $pop, $starttime) = split(/</, $key);
my $count=0;
my $strtowrite = "";
$entrycount++;
foreach my $one (@{$val}) {
my ($probetime, $nextasn, $nextpop, $aspath) = split(/</, $one);
$count++;
if($first == 1) {
$str .= "('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$pop', $nextasn, '$nextpop', '$aspath')";
$first = 0;
}
else {
$str .= ", ('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$pop', $nextasn, '$nextpop', '$aspath')";
}
$strtowrite .= "$probetime\t$starttime\t$srcip\t$dstip\t$asn\t$pop\t$nextasn\t$nextpop\t$aspath\n";
}
if($count>1) {
print OUTPUT2 "$strtowrite";
}
if($entrycount >= 3000) {
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO bgp (ptime, tstart, vpip, dip, cntas, cntpop, nextas, nextpop, aspath) VALUES ";
$first = 1;
$entrycount = 0;
}
}
if($entrycount > 0) {
$sth = $dbh->prepare($str);
$sth->execute();
}
$str = "LOCK TABLES poppath WRITE";
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO poppath (ptime, tstart, vpip, dip, asn, srcpop, dstpop, poppath, ippathlen) VALUES ";
$first = 1;
$entrycount = 0;
while ( my ($key, $val) = each(%igphash) ) {
my ($srcip, $dstip, $asn, $srcpop, $dstpop, $starttime) = split(/</, $key);
my $count = 0;
my $strtowrite="";
$entrycount++;
foreach my $one (@{$val}) {
my ($probetime, $poppath, $ippathlen) = split(/</, $one);
$count++;
if($first == 1) {
$str .= "('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$srcpop', '$dstpop', '$poppath', '$ippathlen')";
$first = 0;
}
else {
$str .= ", ('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$srcpop', '$dstpop', '$poppath', '$ippathlen')";
}
$strtowrite .= "$probetime\t$starttime\t$srcip\t$dstip\t$asn\t$srcpop\t$dstpop\t$poppath\t$ippathlen\n";
}
if($count>1) {
print OUTPUT1 "$strtowrite";
}
if($entrycount >= 3000) {
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO poppath (ptime, tstart, vpip, dip, asn, srcpop, dstpop, poppath, ippathlen) VALUES ";
$first = 1;
$entrycount = 0;
}
}
if($entrycount>0) {
$sth = $dbh->prepare($str);
$sth->execute();
}
$str = "UNLOCK TABLES";
$sth = $dbh->prepare($str);
$sth->execute();
%bgphash = ();
%igphash = ();
}
my $filecount = 0;
my $discardedIPloops = 0;
my $discardedPoPloops = 0;
my $discardedASloops = 0;
my $processedtraceroutes = 0;
sub discardpath {
my (@hops) = @_;
## check for IP, PoP, AS loops
my %iphash = ();
my %pophash = ();
my %ashash = ();
my $lastip = "-1";
my $lastasn = -1;
my $lastpop = "-1";
my $aspath = "";
foreach my $onehop (@hops) {
if($onehop ne "*") {
my ($ip, $rtt, $ttl, $asn, $pop) = split(/:/, $onehop);
if($ip ne $lastip) {
if(not(defined($iphash{$ip}))) {
$iphash{$ip} = 1;
$lastip = $ip;
}
else {
$discardedIPloops++;
## print "The traceroute @hops contains an IP loop.\n";
## print "$ip appeared more than once.\n";
return 1;
}
}
if($asn ne "NULL") {
if($pop ne "NULL") {
my $cntpop = "$asn->$pop";
if($cntpop ne $lastpop) {
if(not(defined($pophash{$cntpop}))) {
$pophash{$cntpop} = 1;
$lastpop = $cntpop;
}
else {
$discardedPoPloops++;
## print "The traceroute @hops contains a PoP loop.\n";
return 2;
}
}
}
$aspath .= "$asn|";
if($asn ne $lastasn) {
if(not(defined($ashash{$asn}))) {
$ashash{$asn} = 1;
$lastasn = $asn;
}
else {
$discardedASloops++;
return 3;
}
}
}
}
}
return 0;
}
my $totaltraceroutes = 0;
my $processbegintime = time(); ## get time in seconds since 1970
my $lastprocesstime = $processbegintime;
my $cntprocesstime;
my $totalprocesstime = 0;
# begin to process traceroute plain texts, and map intermediate IPs to their AS numbers and locations (POPs)
foreach my $file (@files) {
my $lastinserttime="-1"; ## this stores the old data collection hour
# this is traceroute plain text file
if(-f $file) {
open(INPUT1, "zcat $file | ") || die "can't open file $file for read";
$filecount++;
print "File $filecount is $file\n";
my $tag;
my $date;
my $time;
my $srcaddr;
my $arrow;
my $dstaddr;
my $icmpstatus;
my $hopcount;
my @hops = ();
my $lastindex=0;
my $dummy;
my $cntindex=0;
my $cntIP;
my $cntrtt;
my $cntttl;
my $firsthop;
my $lasthop;
## begin to extract probing time from the file name
# extension is in the format of .*
$file =~ /.*\/([^\/]+)/;
my $filename = $1;
my ($site, $datetime, $re, $targetasn) = split(/_/, $filename);
$fileasn = $targetasn;
$datetime =~ /(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/;
my ($yy, $mm, $dd, $hh, $min, $ss) = ($1, $2, $3, $4, $5, $6);
my $yyyy;
my $probetime;
my $starttime;
$probetime = "20$yy-$mm-$dd $hh:$min:$ss";
my $startmin = 15*int(($min/15));
$starttime = "20$yy-$mm-$dd $hh:$startmin:00";
@filecache = ();
while(my $line = <INPUT1>) {
push(@filecache, $line);
}
## begin to process lines iteratively
foreach my $line (@filecache) {
chomp($line);
## this is the beginning of a new traceroute probe
if($line =~ /->/) {
($tag, $date, $time, $srcaddr, $arrow, $dstaddr, $icmpstatus, $hopcount) = split(/\s+/, $line);
## find out the asn and pop for the first IP
my $oneIP;
if(not($oneIP = new Net::IP($srcaddr))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
my $IPnum = $oneIP->intip();
my $cntasn;
my $cntpop;
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
## (the earlier version set "NULL" first, so this lookup never ran)
$cntasn = $pt->match_string($srcaddr);
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$firsthop = "$srcaddr:-1:-1:$cntasn:$cntpop";
if($lastinserttime ne "-1") {
my $oldtime = to_seconds($lastinserttime);
my $newtime = to_seconds("$yy$mm$dd$hh$min$ss");
my $timediff = $newtime - $oldtime;
## populate bgp and igp hash into database if it has been 3 hours since last insertion
if( $timediff > 10800 ) {
$cntprocesstime = time();
$totalprocesstime += $cntprocesstime - $lastprocesstime;
$lastprocesstime = $processbegintime;
populatehashes();
$lastinserttime = "$yy$mm$dd$hh$min$ss";
}
}
else {
$lastinserttime = "$yy$mm$dd$hh$min$ss";
}
## find out the asn and pop for the last IP
if(not($oneIP = new Net::IP($dstaddr))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
$IPnum = $oneIP->intip();
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
## (the earlier version set "NULL" first, so this lookup never ran)
$cntasn = $pt->match_string($dstaddr);
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$lasthop = "$dstaddr:-1:-1:$cntasn:$cntpop";
}
elsif($line =~ /duration/) {
next;
}
elsif($line !~ /^\s*$/) { # process the line if it contains other than white spaces
($dummy, $cntindex, $cntIP, $cntrtt, $cntttl) = split(/\s+/, $line);
if($lastindex != 0) { # not the beginning of the first hop
my $numstars = $cntindex-$lastindex-1;
my $i;
## fill in * for those missing hops
for($i=0; $i<$numstars; $i++) {
push(@hops, "*");
}
}
# print "Current ip is $cntIP\n";
my $oneIP;
if(not($oneIP = new Net::IP($cntIP))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
my $IPnum = $oneIP->intip();
my $cntasn;
my $cntpop;
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($cntIP);
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
my $hopstr = "$cntIP:$cntrtt:$cntttl:$cntasn:$cntpop";
push(@hops, $hopstr);
$lastindex = $cntindex;
}
elsif($line =~ /^\s*$/) { # this is the white space lines
## begin to process one traceroute probing result
if($#hops >=0) {
my @cnthops = ($firsthop, @hops, $lasthop);
$totaltraceroutes++;
if(discardpath(@cnthops) == 0) {
$processedtraceroutes++;
processoneprobe($probetime, $starttime, @cnthops);
}
}
## last line is the end of one traceroute probe
if($lastindex != 0) {
@hops = (); # reset hops array to empty
$lastindex = 0; # reset lastindex to 0
}
}
}
close(INPUT1);
}
}
$cntprocesstime = time();
$totalprocesstime += $cntprocesstime - $lastprocesstime;
$lastprocesstime = $processbegintime;
populatehashes();
$sth->finish ();
$dbh->disconnect ();
my $ratio;
print OUTPUT3 "$totalfiles traceroute files are checked.\n";
print OUTPUT3 "There are $numgroups groups(s) of probing in the checking interval.\n";
for my $onetime (sort keys %timefilehash) {
my @cntset = @{$timefilehash{$onetime}};
my $goodfiles = $#cntset+1;
print OUTPUT3 "At time $onetime, there are $goodfiles good files.\n";
foreach my $onefile (@{$timefilehash{$onetime}}) {
print "The file is $onefile\n";
}
}
print OUTPUT3 "\n\n";
$ratio = $corruptedfiles/$totalfiles;
print OUTPUT3 "$corruptedfiles, counted as $ratio, traceroute files are corrupted and are discarded.\n";
$ratio = $workingfiles/$totalfiles;
print OUTPUT3 "$workingfiles, counted as $ratio, traceroute files are good.\n\n\n";
print OUTPUT3 "$totaltraceroutes traceroute paths are parsed.\n";
$ratio = $discardedIPloops/$totaltraceroutes;
print OUTPUT3 "$discardedIPloops, counted as $ratio, traceroute paths are discarded due to IP loops.\n";
$ratio = $discardedPoPloops/$totaltraceroutes;
print OUTPUT3 "$discardedPoPloops, counted as $ratio, traceroute paths are discarded due to PoP loops.\n";
$ratio = $discardedASloops/$totaltraceroutes;
print OUTPUT3 "$discardedASloops traceroute paths, counted as $ratio, are discarded due to AS loops.\n";
$ratio = $processedtraceroutes/$totaltraceroutes;
print OUTPUT3 "$processedtraceroutes, counted as $ratio, traceroutes are processed.\n\n\n";
print OUTPUT3 "$totalnextingress BGP next-ingresses are checked.\n";
$ratio = $nextingressmorethanonestarskipped/$totalnextingress;
print OUTPUT3 "$nextingressmorethanonestarskipped, counted as $ratio, BGP next-ingresses are discarded due to more than one consecutive stars between the egress and next-ingress IP.\n";
$ratio = $nextingressonestarnounknownincluded/$totalnextingress;
print OUTPUT3 "$nextingressonestarnounknownincluded, counted as $ratio, BGP next-ingresses are included with last IP in AS mapped to PoP and one * between the egress and next-ingress IP.\n";
$ratio = $nextingressonestarunknownincluded/$totalnextingress;
print OUTPUT3 "$nextingressonestarunknownincluded, counted as $ratio, BGP next-ingresses are included with last IP in AS unmapped while second last IP mapped to PoP, and one * between the last IP and next-ingress IP.\n";
$ratio = $nextingressnostarnounknownincluded/$totalnextingress;
print OUTPUT3 "$nextingressnostarnounknownincluded, counted as $ratio, BGP next-ingresses are included with last IP in AS mapped to PoP and no star between the egress and next-ingress IP.\n";
$ratio = $nextingressnostarunknownincluded/$totalnextingress;
print OUTPUT3 "$nextingressnostarunknownincluded, counted as $ratio, BGP next-ingresses are included with last IP in AS unmapped while second last IP mapped to PoP, and no star between the egress and next-ingress IP.\n\n\n";
print OUTPUT3 "$totalsameasnexthop BGP same-AS egresses are checked.\n";
$ratio = $sameasnexthopegresstwounknownsdiscarded/$totalsameasnexthop;
print OUTPUT3 "$sameasnexthopegresstwounknownsdiscarded, counted as $ratio, BGP same-AS egresses are discarded due to last and second last IPs in AS unmapped.\n";
$ratio = $sameasnexthopegressnotunknown/$totalsameasnexthop;
print OUTPUT3 "$sameasnexthopegressnotunknown, counted as $ratio, BGP same-AS egresses are included with last IP in AS mapped to PoP.\n";
$ratio = $sameasnexthopegressunknown/$totalsameasnexthop;
print OUTPUT3 "$sameasnexthopegressunknown, counted as $ratio, BGP same-AS egresses are included with last IP in AS unmapped and second last IP mapped.\n\n\n";
print OUTPUT3 "$totalaspath BGP AS-paths are checked.\n";
$ratio = $aspathmorethanonestarskipped/$totalaspath;
print OUTPUT3 "$aspathmorethanonestarskipped, counted as $ratio, BGP AS-paths are discarded due to more than one consecutive stars on IP-path from current AS to destination host.\n";
$ratio = $aspathonestarincluded/$totalaspath;
print OUTPUT3 "$aspathonestarincluded, counted as $ratio, BGP AS-paths are included with single star(s) on IP-path from current AS to destination host.\n";
$ratio = $aspathnostarincluded/$totalaspath;
print OUTPUT3 "$aspathnostarincluded, counted as $ratio, BGP AS-paths are included with no star on IP-path from current AS to destination host.\n\n\n";
print OUTPUT3 "$totalpoppath IGP PoP-paths are checked.\n";
$ratio = $poppathmorethanoneunknownskipped/$totalpoppath;
print OUTPUT3 "$poppathmorethanoneunknownskipped, counted as $ratio, IGP PoP-paths are discarded due to more than one consecutive unknown PoPs (either NULL PoP or *).\n";
$ratio = $poppathoneunknownincluded/$totalpoppath;
print OUTPUT3 "$poppathoneunknownincluded, counted as $ratio, IGP PoP-paths are included with single unknown PoP(s) (either NULL PoP or *).\n";
$ratio = $poppathnounknownincluded/$totalpoppath;
print OUTPUT3 "$poppathnounknownincluded, counted as $ratio, IGP PoP-paths are included with no unknown PoP(s) (either NULL PoP or *).\n\n\n";
my $stoptime = time(); ## get time in seconds since 1970
my $elapsedtime = $stoptime - $begintime;
print OUTPUT3 "The code stops at time $stoptime\n\n";
print OUTPUT3 "The code runs $elapsedtime seconds\n";
print OUTPUT3 "The traceroute processing runs $totalprocesstime seconds\n";
close(OUTPUT1);
close(OUTPUT2);
close(OUTPUT3);
exit (0);
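One note on the summary block at the end of the script: every $ratio line divides by a counter ($totalfiles, $totaltraceroutes, $totalnextingress and so on), and Perl dies with "Illegal division by zero" if that counter is still 0, for example when no file falls inside the selected time window. Below is a minimal sketch of a guard. The helper name safe_ratio is hypothetical and not part of the script above, and the counters are stand-in values for illustration only.

```perl
use strict;
use warnings;

## safe_ratio is a hypothetical helper, not part of the original script.
## It returns $part/$total, or 0 when $total is 0 or undefined,
## so the summary can still be printed after a run that matched nothing.
sub safe_ratio {
    my ($part, $total) = @_;
    return 0 unless $total;
    return $part / $total;
}

## stand-in counters for illustration only
my $totalfiles = 0;
my $corruptedfiles = 0;
my $ratio = safe_ratio($corruptedfiles, $totalfiles);
print "$corruptedfiles, counted as $ratio, traceroute files are corrupted and are discarded.\n";
```

With a helper like this, each "$ratio = a/b;" line in the summary could become "$ratio = safe_ratio(a, b);" without changing the report format. Returning 0 for an empty total is an assumption; printing "n/a" instead would work just as well.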
#!/usr/bin/perl -w
use strict;
use File::Find;
use File::Basename;
use DBI;
use Net::IP;
use Net::Patricia;
use Time::Local;
## start-time is the selected starting time, end-time is the selected ending time for traceroute data processing, prefixasfile is the prefix-as mapping file name, inconsistent-as-path-outpu
if($#ARGV != 7) {
print "usage: process-traceroute.pl start-time end-time good-traceroute-file-list corrupted-traceroute-file-
print "start-time and end-time in format YYMMDDHHMMSS\n";
exit(1);
}
my ($startingtime, $endingtime, $goodfilelist, $corruptedfilelist, $prefixasfile, $aspathoutput, $poppathoutput, $policyoutput) = @ARGV;
## open the prefix-as mapping file and store them in the Patricia handler
open(INPUT1, "<$prefixasfile") || die "cannot open $prefixasfile file for read.";
## open the good file list
open(INPUT2, "<$goodfilelist") || die "cannot open $goodfilelist file for read.";
## open the corrupted file list
open(INPUT3, "<$corruptedfilelist") || die "cannot open $corruptedfilelist file for read.";
## open the inconsistent as-path file for write
open(OUTPUT1, ">$aspathoutput") || die "cannot open $aspathoutput file for write.";
## open the inconsistent pop-path file for write
open(OUTPUT2, ">$poppathoutput") || die "cannot open $poppathoutput file for write.";
## open the policy-filtering-stats-out
open(OUTPUT3, ">$policyoutput") || die "cannot open $policyoutput file for write.";
my $pt = new Net::Patricia;
my $prefixt = new Net::Patricia;
$startingtime =~ /(\d{2})(\d{2})(\d{2})(\d{
my ($yy1, $mm1, $dd1, $hh1, $min1, $ss1) = ($1, $2, $3, $4, $5, $6);
$endingtime =~ /(\d{2})(\d{2})(\d{2})(\d{
my ($yy2, $mm2, $dd2, $hh2, $min2, $ss2) = ($1, $2, $3, $4, $5, $6);
print OUTPUT3 "Traceroute files between 20$yy1-$mm1-$dd1 $hh1:$min1:$ss1 and 20$yy2-$mm2-$dd2 $hh2:$min2:$ss2 are checked.\n";
my $fileasn;
my $begintime = time(); ## get time in seconds since 1970
print OUTPUT3 "The code starts time is $begintime\n";
my @filecache = ();
while(my $oneline = <INPUT1>) {
push(@filecache, $oneline);
}
foreach my $oneline (@filecache) {
chomp($oneline);
my ($oneprefix, $oneas) = split(/\s+/, $oneline);
if($oneprefix =~ /\d{1,3}(.\d{1,3}){3}\/\d{
$pt->add_string($oneprefix
$prefixt->add_string($onep
}
}
close(INPUT1);
my %timefilehash = ();
my $totalfiles = 0;
my $corruptedfiles = 0;
my $workingfiles = 0;
@filecache = ();
while(my $file=<INPUT2>) {
push(@filecache, $file);
}
foreach my $file (@filecache) {
chomp($file);
if( -f $file && $file =~ /_(\d{12})_re_\d+/ ) {
my $cnttime = $1;
my $diff1 = to_seconds($cnttime) - to_seconds($startingtime);
my $diff2 = to_seconds($endingtime) - to_seconds($cnttime);
if( $diff1 >= 0 && $diff2 >= 0 ) {
$totalfiles++;
$workingfiles++;
my $timefiles = $timefilehash{$cnttime};
if(not(defined($timefiles)
$timefilehash{$cnttime} = [$file];
}
else {
push(@{$timefilehash{$cntt
}
}
}
}
@filecache = ();
while(my $file=<INPUT3>) {
push(@filecache, $file);
}
foreach my $file (@filecache) {
chomp($file);
if(-f $file && $file =~ /_(\d{12})_re_\d+/ ) {
my $cnttime = $1;
my $diff1 = to_seconds($cnttime) - to_seconds($startingtime);
my $diff2 = to_seconds($endingtime) - to_seconds($cnttime);
if( $diff1 >= 0 && $diff2 >= 0 ) {
$totalfiles++;
$corruptedfiles++;
}
}
}
my @files = ();
sub to_seconds
{
use integer;
my $x = $_[0];
my $year = "20".substr($x,0,2);
my $mo = substr($x,2,2);
my $day = substr($x,4,2);
my $hour = substr($x,6,2);
my $minute = substr($x,8,2);
my $second = substr($x,10,2);
my $t = timelocal($second,$minute,
return($t);
}
my $numgroups= keys %timefilehash;
for my $onetime (sort keys %timefilehash) {
my @cntset = @{$timefilehash{$onetime}}
my $goodfiles = $#cntset+1;
foreach my $onefile (@{$timefilehash{$onetime}
push(@files, $onefile);
print "The file is $onefile\n";
}
}
# connect to mySQL database for later data query and retrieval
my $dsn = "DBI:mysql:test_bm"; # data source name
my $user_name = "root"; # user name
my $password = "NewPw"; # password
my %ipasntable = ();
my %iplockeytable = ();
my %lockeyloctable = ();
my %iploctable = ();
my %bgphash = ();
my %igphash = ();
# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## select ipAddress, asn, lockey from the ipAddress table
my $sth = $dbh->prepare("SELECT ipAddress, asn, locKey FROM ipAddress");
$sth->execute();
## fetch query results from ipAddress table
while(my @ary = $sth->fetchrow_array()) {
my ($cntip, $cntasn, $cntkey) = @ary;
if($cntasn ne "NULL") {
if($cntasn > 0) {
$ipasntable{$cntip} = $cntasn;
}
}
else {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($cntip);
if(defined($cntasn)) {
$ipasntable{$cntip} = $cntasn;
}
}
if($cntkey ne "NULL") {
if($cntkey > 1) {
$iplockeytable{$cntip} = $cntkey;
}
}
}
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## select lockey, locName from the location table
$sth = $dbh->prepare("SELECT locKey, locName FROM location");
$sth->execute();
## fetch query results from location table
while(my @ary = $sth->fetchrow_array()) {
my ($cntkey, $cntloc) = @ary;
if($cntkey ne "NULL") {
if($cntkey > 1) {
$lockeyloctable{$cntkey} = $cntloc;
}
}
}
while ( my ($oneip, $onekey) = each(%iplockeytable) ) {
my $oneloc = $lockeyloctable{$onekey};
my $oneasn = $ipasntable{$oneip};
# print "For ip $oneip, its ASN is $oneasn, its PoP is $oneloc\n";
$iploctable{$oneip} = $oneloc;
}
## release iplockeytable and lockeyloctable memory
%iplockeytable = ();
%lockeyloctable = ();
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## drop inferred BGP table if it exists
my $bgpdrop = "
DROP TABLE IF EXISTS bgp";
$sth = $dbh->prepare($bgpdrop);
$sth->execute();
## create inferred BGP table
my $bgpcreate = "
CREATE TABLE bgp (
bkey int(12) unsigned NOT NULL auto_increment,
ptime datetime NOT NULL,
tstart datetime NOT NULL,
vpip varchar(16) NOT NULL,
dip varchar(24) NOT NULL,
cntas int(8) unsigned NOT NULL,
cntpop varchar(32) NOT NULL,
nextas int(8) unsigned,
nextpop varchar(32),
aspath varchar(64),
PRIMARY KEY (bkey)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8";
$sth = $dbh->prepare($bgpcreate);
$sth->execute();
## drop intra-AS PoP-path table if it exists
my $poppathdrop = "
DROP TABLE IF EXISTS poppath";
$sth = $dbh->prepare($poppathdrop
$sth->execute();
## create intra-AS PoP-path table
my $poppathcreate = "
CREATE TABLE poppath (
pkey int(12) unsigned NOT NULL auto_increment,
ptime datetime NOT NULL,
tstart datetime NOT NULL,
vpip varchar(16) NOT NULL,
dip varchar(16) NOT NULL,
asn int(8) unsigned NOT NULL,
srcpop varchar(32) NOT NULL,
dstpop varchar(32) NOT NULL,
poppath varchar(256) NOT NULL,
ippathlen int(4) NOT NULL,
PRIMARY KEY (pkey)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8";
$sth = $dbh->prepare($poppathcrea
$sth->execute();
## subroutine to check whether an ASN is a targeted ASN
sub istargetas {
my $asn = $_;
if($asn eq "1239" || $asn eq "16631" || $asn eq "1668" || $asn eq "209" ||
$asn eq "2828" || $asn eq "2856" || $asn eq "2914" || $asn eq "3257" ||
$asn eq "3320" || $asn eq "3356" || $asn eq "3549" || $asn eq "3561" ||
$asn eq "5511" || $asn eq "6395" || $asn eq "6453" || $asn eq "6461" ||
$asn eq "701" || $asn eq "7018") {
return 1;
}
else {
return 0;
}
}
sub bgpcontains {
my ($first, $second) = @_;
foreach my $one (@{$first}) {
if($one eq $second) {
return 1;
}
}
return 0;
}
sub igpcontains {
my ($first, $second) = @_;
foreach my $one (@{$first}) {
if($one eq $second) {
return 1;
}
}
return 0;
}
my $totalnextingress = 0;
my $nextingressmorethanonesta
my $nextingressonestarnounkno
my $nextingressonestarunknown
my $nextingressnostarnounknow
my $nextingressnostarunknowni
my $totalsameasnexthop = 0;
my $sameasnexthopegresstwounk
my $sameasnexthopegressnotunk
my $sameasnexthopegressunknow
my $totalaspath = 0;
my $aspathmorethanonestarskip
my $aspathonestarincluded = 0;
my $aspathnostarincluded = 0;
my $totalpoppath = 0;
my $poppathmorethanoneunknown
my $poppathoneunknownincluded
my $poppathnounknownincluded = 0;
## subroutine to process one traceroute, create bgp entries and poppath entries, and insert entries into bgp table and poppath table
sub processoneprobe {
my ($probetime, $starttime, @hops) = @_;
my $asstr = "";
my $lastasn="-1";
my @asgroups = ();
my $cntgroup = "";
my $cntindex = 0;
my $srchop = $hops[0];
my $dsthop = $hops[$#hops];
my ($srcip, $dummy1, $dummy2, $srcasn, $srcpop) = split(/:/, $srchop);
my ($dstip, $dummy3, $dummy4, $dstasn, $dstpop) = split(/:/, $dsthop);
my @noduplicates = ();
## remove duplicate IPs in the hops
my $lastip = "-1";
for(my $i=0; $i<=$#hops; $i++) {
if($hops[$i] eq "*") {
push(@noduplicates, $hops[$i]);
}
else {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, $hops[$i]);
if($cntip ne $lastip) {
push(@noduplicates, $hops[$i]);
$lastip = $cntip;
}
}
}
@hops = @noduplicates;
## get as-path and divide hops into AS groups by getting each AS group's hop indices
foreach my $onehop (@hops) {
## skip stars
if($onehop eq "*") {
$cntindex++;
next;
}
my ($cntIP, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $onehop);
## this is the first AS in the traceroute path
if($lastasn eq "-1") {
if($cntasn ne "NULL" && $cntasn ne "0") {
$asstr = $cntasn;
$cntgroup = $cntindex;
$lastasn = $cntasn;
}
}
else { ## Non-first AS in the traceroute path
if($cntasn ne "NULL" && $cntasn ne "0" && $cntasn ne $lastasn) {
push(@asgroups, $cntgroup);
$asstr .= ">$cntasn";
$cntgroup = $cntindex;
$lastasn = $cntasn;
}
elsif($cntasn ne "NULL" && $cntasn ne "0" && $cntasn eq $lastasn) {
$cntgroup .= ":$cntindex";
}
if($cntindex == $#hops) {
push(@asgroups, $cntgroup);
}
}
$cntindex++;
}
## get the PoP-level paths for the as groups
my @groups = ();
foreach my $onegroup (@asgroups) {
my (@indices) = split(/:/, $onegroup);
my $firstindex = $indices[0];
my $lastindex = $indices[$#indices];
my $groupstr = "";
my $lastpop = "-1";
my $i;
for($i=$firstindex; $i<=$lastindex; $i++) {
my $cnthop = $hops[$i];
if($cnthop eq "*") {
if($groupstr eq "") {
$groupstr .= $cnthop;
$lastpop = "NULL";
}
else {
$groupstr .= "|$cnthop";
$lastpop = "NULL";
}
}
else {
my ($cntIP, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $cnthop);
if($cntpop eq "NULL") {
if($groupstr eq "") {
$groupstr .= "$cntIP:$cntasn:$cntpop:$i
$lastpop = "NULL";
}
else {
$groupstr .= "|$cntIP:$cntasn:$cntpop:$
$lastpop = "NULL";
}
}
elsif($cntpop ne $lastpop) {
if($groupstr eq "") {
$groupstr .= "$cntIP:$cntasn:$cntpop:$i
$lastpop = $cntpop;
}
else {
$groupstr .= "|$cntIP:$cntasn:$cntpop:$
$lastpop = $cntpop;
}
}
}
} ## end for($i ... ...)
push(@groups, $groupstr);
}
## begin to process these groups one by one
my $ii; ## index for groups
for($ii=0; $ii<=$#groups; $ii++) {
my $onegroup = $groups[$ii];
my (@cntpops) = split(/\|/, $onegroup);
my ($cntIP, $cntasn, $cntpop, $cntindex) = split(/:/, $cntpops[0]);
if($cntasn != $fileasn) {
next;
}
my $cntlastasnpop = $cntpops[$#cntpops];
my $afterstars = 0;
my $afterhasstar = 0;
if($cntlastasnpop eq "*") { ## the current last hop is not *
print "Current asnpop is *. \n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my $tmpi;
my ($cntlastIP, $cntlastasn, $cntlastpop, $cntlastindex) = split(/:/, $cntlastasnpop);
for($tmpi=$cntlastindex+1;
if($hops[$tmpi] eq "*") {
$afterstars++;
$afterhasstar = 1;
if($afterstars == 2) {
last;
}
}
else {
$afterstars = 0;
}
}
my $partialaspath;
if($afterstars < 2) {
### get the partial as path from current as to the destination as
my (@ashops) = split(/>/, $asstr);
my $asindex;
my $jj;
for($jj=0; $jj<=$#ashops; $jj++) {
my $cntasn = $ashops[$jj];
if($cntasn == $cntlastasn) {
$asindex = $jj;
next;
}
}
$partialaspath = $ashops[$asindex];
for($jj=$asindex+1; $jj<=$#ashops; $jj++) {
$partialaspath .= ">$ashops[$jj]";
}
}
else {
$partialaspath = "NULL"; ## do not use the AS path if there exists two consecutive "*" after the current AS
}
## begin to test previous egress--next ingress PoP entry and populate it into the bgp table
if($ii<$#groups) { ## it is not the last AS
my $nextgroup = $groups[$ii+1];
my (@nextpops) = split(/\|/, $nextgroup);
my $nextfirstasnpop = $nextpops[0];
my $nextegresstype = 0;
my ($cntsecondlastIP, $cntsecondlastasn, $cntsecondlastpop, $secondcntlastindex);
if($cntlastasn ne "NULL" && $cntlastpop ne "NULL") { ## current last pop is valid
$nextegresstype = 1;
}
elsif($#cntpops >= 1) {
my $cntsecondlastasnpop = $cntpops[$#cntpops-1];
if($cntsecondlastasnpop ne "*") {
($cntsecondlastIP, $cntsecondlastasn, $cntsecondlastpop, $secondcntlastindex) = split(/:/, $cntsecondlastasnpop);
if($cntsecondlastasn ne "NULL" && $cntsecondlastpop ne "NULL") {
$nextegresstype = 2;
}
}
}
if($nextegresstype > 0) {
if($nextfirstasnpop ne "*") {
my ($nextfirstIP, $nextfirstasn, $nextfirstpop, $nextfirstindex) = split(/:/, $nextfirstasnpop);
## check whether there are more than 1 consecutive * hop between the last PoP and next-AS ingress IP
my $betweenstars = 0;
for($tmpi=$cntlastindex+1;
if($hops[$tmpi] eq "*") {
$betweenstars++;
}
}
my $key;
if($nextegresstype == 1) {
$key = "$srcip<$dstip<$cntlastasn
}
else {
$key = "$srcip<$dstip<$cntsecondl
}
my $oneentry;
if($betweenstars >= 2) {
$oneentry = "$probetime<$nextfirstasn<
}
else {
$oneentry = "$probetime<$nextfirstasn<
}
my $entries = $bgphash{$key};
if(not(defined($entries)))
$totalnextingress++;
if($betweenstars >= 2) {
$nextingressmorethanonesta
}
else {
if($betweenstars == 1 && $nextegresstype == 1) {
$nextingressonestarnounkno
}
elsif($betweenstars == 1 && $nextegresstype == 2) {
$nextingressonestarunknown
}
elsif($betweenstars == 0 && $nextegresstype == 1) {
$nextingressnostarnounknow
}
elsif($betweenstars == 0 && $nextegresstype == 2) {
$nextingressnostarunknowni
}
} ## end if($betweenstars >= 2) { ... } else { ... }
$totalaspath++;
if($afterstars >= 2) {
$aspathmorethanonestarskip
}
elsif($afterhasstar == 1) {
$aspathonestarincluded++;
}
else {
$aspathnostarincluded++;
}
if($betweenstars < 2 || $afterstars < 2) {
@{$bgphash{$key}} = ($oneentry);
}
}
else {
if(bgpcontains(\@{$entries
$totalnextingress++;
if($betweenstars >= 2) {
$nextingressmorethanonesta
}
else {
if($betweenstars == 1 && $nextegresstype == 1) {
$nextingressonestarnounkno
}
elsif($betweenstars == 1 && $nextegresstype == 2) {
$nextingressonestarunknown
}
elsif($betweenstars == 0 && $nextegresstype == 1) {
$nextingressnostarnounknow
}
elsif($betweenstars == 0 && $nextegresstype == 2) {
$nextingressnostarunknowni
}
} ## end if($betweenstars >= 2) { ... } else { ... }
$totalaspath++;
if($afterstars >= 2) {
$aspathmorethanonestarskip
}
elsif($afterhasstar == 1) {
$aspathonestarincluded++;
}
else {
$aspathnostarincluded++;
}
if($betweenstars < 2 || $afterstars < 2) {
push(@{$bgphash{$key}}, $oneentry);
}
}
}
} ## end if($nextfirstasnpop ne "*")
} ## end if($nextegresstype > 0)
}
## begin to generate bgp and IGP PoP-path entries for this group
my (@asnpops) = split(/\|/, $onegroup);
if($asnpops[0] eq "*") {
print "The first pop is *.\n";
print "The asnpops are @asnpops\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
if($asnpops[$#asnpops] eq "*") {
print "The last PoP is *\n";
print "The asnpops are @asnpops\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my ($egressIP, $egressasn, $egresspop, $egressindex) = split(/:/, $asnpops[$#asnpops]);
my $egresstype = 0; ## this variable keeps the type of the egress
if($egresspop ne "NULL") {
$egresstype = 1;
}
else {
if($#asnpops>=1) {
if($asnpops[$#asnpops-1] ne "*") {
($egressIP, $egressasn, $egresspop, $egressindex) = split(/:/, $asnpops[$#asnpops-1]);
if($egresspop ne "NULL") {
$egresstype = 2;
}
}
}
}
my $i;
my $j;
for($i=0; $i<=$#asnpops-1; $i++) {
if($asnpops[$i] eq "*") {
next;
}
my ($startIP, $startasn, $startpop, $startindex) = split(/:/, $asnpops[$i]);
if($startasn eq "NULL" || $startpop eq "NULL" || $startpop eq $egresspop) {
next;
}
$totalaspath++;
if($afterstars >= 2) {
$aspathmorethanonestarskip
}
elsif($afterhasstar == 1) {
$aspathonestarincluded++;
}
else {
$aspathnostarincluded++;
}
my $key = "$srcip<$dstip<$startasn<$
my $entries = $bgphash{$key};
my $oneentry = "$probetime<$egressasn<$eg
if(not(defined($entries)))
$totalsameasnexthop++;
if($egresstype != 0) {
if($egresstype == 1) {
$sameasnexthopegressnotunk
}
elsif($egresstype == 2) {
$sameasnexthopegressunknow
}
@{$bgphash{$key}} = ($oneentry);
}
else {
$sameasnexthopegresstwounk
}
}
else {
if(bgpcontains(\@{$entries
$totalsameasnexthop++;
if($egresstype != 0) {
if($egresstype == 1) {
$sameasnexthopegressnotunk
}
elsif($egresstype == 2) {
$sameasnexthopegressunknow
}
push(@{$bgphash{$key}}, $oneentry);
}
else {
$sameasnexthopegresstwounk
}
}
}
my $poppath = $startpop;
## begin to process the right-side paths of the current pop
for($j=$i+1; $j<=$#asnpops; $j++) {
if($asnpops[$j] eq "*") {
$poppath .= ">*";
next;
}
else {
my ($endip, $endasn, $endpop, $endindex) = split(/:/, $asnpops[$j]);
if($endasn ne "NULL" && $endasn ne $startasn) {
print "The end pop asn is $endasn, and the start asn is $startasn\n";
print "The group is $onegroup\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
if($endpop eq "NULL") {
$poppath .= ">NULL";
next;
}
else {
$poppath .= ">$endpop";
## begin to inspect and compress the PoP-path
my @pops = split(/>/, $poppath);
my @knownpops = ();
for(my $tmpkk=0; $tmpkk<=$#pops; $tmpkk++) {
## keep known PoP indices into PoP list
if($pops[$tmpkk] ne "*" && $pops[$tmpkk] ne "NULL") {
push(@knownpops, $tmpkk);
}
}
if($knownpops[0] != 0 || $knownpops[$#knownpops] != $#pops) {
print "The first known PoP index is $knownpops[0], and the last known PoP index is $knownpops[$#knownpops].\n
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my $twounknownnotbetweensameP
my $oneunknownnotbetweensameP
my $newpoppath = "$pops[0]";
my $lastkeptPoP = $pops[0];
my $lastPoPindex = 0;
for(my $tmpkk=1; $tmpkk<=$#knownpops; $tmpkk++) {
my $cntPoP = $pops[$knownpops[$tmpkk]];
if($cntPoP ne $lastkeptPoP) {
## there are more than one NULL PoP or * between current PoP and last known PoP
if($knownpops[$tmpkk]-$las
$twounknownnotbetweensameP
last;
}
## there is one NULL PoP or * between current PoP and last known PoP, add a wild card
elsif($knownpops[$tmpkk]-$
if($oneunknownnotbetweensa
$oneunknownnotbetweensameP
$lastPoPindex = $knownpops[$tmpkk];
$lastkeptPoP = $cntPoP;
$newpoppath .= ">*>$cntPoP";
}
else {
$twounknownnotbetweensameP
last;
}
}
else { ## this is no NULL PoP or * between two known PoPs
$lastPoPindex = $knownpops[$tmpkk];
$lastkeptPoP = $cntPoP;
$newpoppath .= ">$cntPoP";
}
}
}
## begin to find the earliest IP index in the same PoP as startIP
my $earliestindex = $startindex;
for(my $lll=$startindex; $lll>=0; $lll--) {
my $cnthop = $hops[$lll];
if($hops[$lll] ne "*") {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, $hops[$lll]);
if($cntasn eq $startasn && $cntpop eq $startpop) {
$earliestindex = $lll;
}
elsif($cntpop ne "NULL" && $cntpop ne $startpop) {
last;
}
elsif($cntasn ne "NULL" && $cntasn ne $startasn) {
last;
}
}
}
## begin to find the latest IP index in the same PoP as endIP
my $latestindex = $endindex;
for(my $lll=$endindex; $lll<=$#hops; $lll++) {
my $cnthop = $hops[$lll];
if($hops[$lll] ne "*") {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, $hops[$lll]);
if($cntasn eq $endasn && $cntpop eq $endpop) {
$latestindex = $lll;
}
elsif($cntpop ne "NULL" && $cntpop ne $endpop) {
last;
}
elsif($cntasn ne "NULL" && $cntasn ne $endasn) {
last;
}
}
}
my $ippathlen = $latestindex - $earliestindex + 1;
my $key = "$srcip<$dstip<$startasn<$
my $oneentry = "$probetime<$newpoppath<$i
my $entries = $igphash{$key};
if(not(defined($entries)))
$totalpoppath++;
if($twounknownnotbetweensa
$poppathmorethanoneunknown
}
else {
if($oneunknownnotbetweensa
$poppathoneunknownincluded
}
else {
$poppathnounknownincluded+
}
@{$igphash{$key}} = ($oneentry);
}
}
else {
if(igpcontains(\@{$entries
$totalpoppath++;
if($twounknownnotbetweensa
$poppathmorethanoneunknown
}
else {
if($oneunknownnotbetweensa
$poppathoneunknownincluded
}
else {
$poppathnounknownincluded+
}
push(@{$igphash{$key}}, $oneentry);
}
} ## end if(igpcontains(\@{$entries
} ## end if(not(defined($entries)))
} ## end if($endpop eq "NULL") { ... } else { ... }
} ## end if($asnpops[$j] eq "*") { ... } else { ... }
} ## end for($j=$i+1; $j<=$#asnpops; $j++)
} ## end for($i=0; $i<=$#asnpops-1; $i++)
} ## end for($ii=0; $ii<=$#groups; $ii++)
}
sub populatehashes {
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
my $str = "LOCK TABLES bgp WRITE";
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO bgp (ptime, tstart, vpip, dip, cntas, cntpop, nextas, nextpop, aspath) VALUES ";
my $first = 1;
my $entrycount = 0;
while ( my ($key, $val) = each(%bgphash) ) {
my ($srcip, $dstip, $asn, $pop, $starttime) = split(/</, $key);
my $count=0;
my $strtowrite = "";
$entrycount++;
foreach my $one (@{$val}) {
my ($probetime, $nextasn, $nextpop, $aspath) = split(/</, $one);
$count++;
if($first == 1) {
$str .= "('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$pop', $nextasn, '$nextpop', '$aspath')";
$first = 0;
}
else {
$str .= ", ('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$pop', $nextasn, '$nextpop', '$aspath')";
}
$strtowrite .= "$probetime\t$starttime\t$
}
if($count>1) {
print OUTPUT2 "$strtowrite";
}
if($entrycount >= 3000) {
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO bgp (ptime, tstart, vpip, dip, cntas, cntpop, nextas, nextpop, aspath) VALUES ";
$first = 1;
$entrycount = 0;
}
}
if($entrycount > 0) {
$sth = $dbh->prepare($str);
$sth->execute();
}
$str = "LOCK TABLES poppath WRITE";
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO poppath (ptime, tstart, vpip, dip, asn, srcpop, dstpop, poppath, ippathlen) VALUES ";
$first = 1;
$entrycount = 0;
while ( my ($key, $val) = each(%igphash) ) {
my ($srcip, $dstip, $asn, $srcpop, $dstpop, $starttime) = split(/</, $key);
my $count = 0;
my $strtowrite="";
$entrycount++;
foreach my $one (@{$val}) {
my ($probetime, $poppath, $ippathlen) = split(/</, $one);
$count++;
if($first == 1) {
$str .= "('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$srcpop', '$dstpop', '$poppath', '$ippathlen')";
$first = 0;
}
else {
$str .= ", ('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$srcpop', '$dstpop', '$poppath', '$ippathlen')";
}
$strtowrite .= "$probetime\t$starttime\t$
}
if($count>1) {
print OUTPUT1 "$strtowrite";
}
if($entrycount >= 3000) {
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO poppath (ptime, tstart, vpip, dip, asn, srcpop, dstpop, poppath, ippathlen) VALUES ";
$first = 1;
$entrycount = 0;
}
}
if($entrycount>0) {
$sth = $dbh->prepare($str);
$sth->execute();
}
$str = "UNLOCK TABLES";
$sth = $dbh->prepare($str);
$sth->execute();
%bgphash = ();
%igphash = ();
}
my $filecount = 0;
my $discardedIPloops = 0;
my $discardedPoPloops = 0;
my $discardedASloops = 0;
my $processedtraceroutes = 0;
sub discardpath {
my (@hops) = @_;
## check for IP, PoP, AS loops
my %iphash = ();
my %pophash = ();
my %ashash = ();
my $lastip = "-1";
my $lastasn = -1;
my $lastpop = "-1";
my $aspath = "";
foreach my $onehop (@hops) {
if($onehop ne "*") {
my ($ip, $rtt, $ttl, $asn, $pop) = split(/:/, $onehop);
if($ip ne $lastip) {
if(not(defined($iphash{$ip
$iphash{$ip} = 1;
$lastip = $ip;
}
else {
$discardedIPloops++;
## print "The traceroute @hops contains an IP loop.\n";
## print "$ip appeared more than once.\n";
return 1;
}
}
if($asn ne "NULL") {
if($pop ne "NULL") {
my $cntpop = "$asn->$pop";
if($cntpop ne $lastpop) {
if(not(defined($pophash{$c
$pophash{$cntpop} = 1;
$lastpop = $cntpop;
}
else {
$discardedPoPloops++;
## print "The traceroute @hops contains a PoP loop.\n";
return 2;
}
}
}
$aspath .= "$asn|";
if($asn ne $lastasn) {
if(not(defined($ashash{$as
$ashash{$asn} = 1;
$lastasn = $asn;
}
else {
$discardedASloops++;
return 3;
}
}
}
}
}
return 0;
}
my $totaltraceroutes = 0;
my $processbegintime = time(); ## get time in seconds since 1970
my $lastprocesstime = $processbegintime;
my $cntprocesstime;
my $totalprocesstime = 0;
# begin to process traceroute plain texts, and map intermediate IPs to their AS numbers and locations (POPs)
foreach my $file (@files) {
my $lastinserttime="-1"; ## this stores the old data collection hour
# this is traceroute plain text file
if(-f $file) {
open(INPUT1, "zcat $file | ") || die "can't open file $file for read";
$filecount++;
print "File $filecount is $file\n";
my $tag;
my $date;
my $time;
my $srcaddr;
my $arrow;
my $dstaddr;
my $icmpstatus;
my $hopcount;
my @hops = ();
my $lastindex=0;
my $dummy;
my $cntindex=0;
my $cntIP;
my $cntrtt;
my $cntttl;
my $firsthop;
my $lasthop;
## begin to extract probing time from the file name
# extension is in the format of .*
$file =~ /.*\/([^\/]+)/;
my $filename = $1;
my ($site, $datetime, $re, $targetasn) = split(/_/, $filename);
$fileasn = $targetasn;
$datetime =~ /(\d\d)(\d\d)(\d\d)(\d\d)(
my ($yy, $mm, $dd, $hh, $min, $ss) = ($1, $2, $3, $4, $5, $6);
my $yyyy;
my $probetime;
my $starttime;
$probetime = "20$yy-$mm-$dd $hh:$min:$ss";
my $startmin = 15*int(($min/15));
$starttime = "20$yy-$mm-$dd $hh:$startmin:00";
@filecache = ();
while(my $line = <INPUT1>) {
push(@filecache, $line);
}
## begin to process lines iteratively
foreach my $line (@filecache) {
chomp($line);
## this is the beginning of a new traceroute probe
if($line =~ /->/) {
($tag, $date, $time, $srcaddr, $arrow, $dstaddr, $icmpstatus, $hopcount) = split(/\s+/, $line);
## find out the asn and pop for the first IP
my $oneIP;
if(not($oneIP = new Net::IP($srcaddr))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
my $IPnum = $oneIP->intip();
my $cntasn;
my $cntpop;
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($srcaddr
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$firsthop = "$srcaddr:-1:$-1:$cntasn:$
if($lastinserttime ne "-1") {
my $oldtime = to_seconds($lastinserttime);
my $newtime = to_seconds("$yy$mm$dd$hh$min$ss");
my $timediff = $newtime - $oldtime;
## populate bgp and igp hash into database if it has been 3 hours since last insertion
if( $timediff > 10800 ) {
$cntprocesstime = time();
$totalprocesstime += $cntprocesstime - $lastprocesstime;
$lastprocesstime = $processbegintime;
populatehashes();
$lastinserttime = "$yy$mm$dd$hh$min$ss";
}
}
else {
$lastinserttime = "$yy$mm$dd$hh$min$ss";
}
## find out the asn and pop for the last IP
if(not($oneIP = new Net::IP($dstaddr))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
$IPnum = $oneIP->intip();
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($dstaddr);
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$lasthop = "$dstaddr:-1:$-1:$cntasn:$
}
elsif($line =~ /duration/) {
next;
}
elsif($line !~ /^\s*$/) { # process the line if it contains other than white spaces
($dummy, $cntindex, $cntIP, $cntrtt, $cntttl) = split(/\s+/, $line);
if($lastindex != 0) { # not the beginning of the first hop
my $numstars = $cntindex-$lastindex-1;
my $i;
## fill in * for those missing hops
for($i=0; $i<$numstars; $i++) {
push(@hops, "*");
}
}
# print "Current ip is $cntIP\n";
my $oneIP;
if(not($oneIP = new Net::IP($cntIP))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
my $IPnum = $oneIP->intip();
my $cntasn;
my $cntpop;
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($cntIP);
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
my $hopstr = "$cntIP:$cntrtt:$cntttl:$c
push(@hops, $hopstr);
$lastindex = $cntindex;
}
elsif($line =~ /^\s*$/) { # this is the white space lines
## begin to process one traceroute probing result
if($#hops >=0) {
my @cnthops = ($firsthop, @hops, $lasthop);
$totaltraceroutes++;
if(discardpath(@cnthops) == 0) {
$processedtraceroutes++;
processoneprobe($probetime
}
}
## last line is the end of one traceroute probe
if($lastindex != 0) {
@hops = (); # reset hops array to empty
$lastindex = 0; # reset lastindex to 0
}
}
}
close(INPUT1);
}
}
$cntprocesstime = time();
$totalprocesstime += $cntprocesstime - $lastprocesstime;
$lastprocesstime = $processbegintime;
populatehashes();
$sth->finish ();
$dbh->disconnect ();
my $ratio;
print OUTPUT3 "$totalfiles traceroute files are checked.\n";
print OUTPUT3 "There are $numgroups groups(s) of probing in the checking interval.\n";
for my $onetime (sort keys %timefilehash) {
my @cntset = @{$timefilehash{$onetime}};
my $goodfiles = $#cntset+1;
print OUTPUT3 "At time $onetime, there are $goodfiles good files.\n";
foreach my $onefile (@{$timefilehash{$onetime}}) {
print "The file is $onefile\n";
}
}
print OUTPUT3 "\n\n";
$ratio = $corruptedfiles/$totalfiles;
print OUTPUT3 "$corruptedfiles, counted as $ratio, traceroute files are corrupted and are discarded.\n";
$ratio = $workingfiles/$totalfiles;
print OUTPUT3 "$workingfiles, counted as $ratio, traceroute files are good.\n\n\n";
print OUTPUT3 "$totaltraceroutes traceroute paths are parsed.\n";
$ratio = $discardedIPloops/$totaltraceroutes;
print OUTPUT3 "$discardedIPloops, counted as $ratio, traceroute paths are discarded due to IP loops.\n";
$ratio = $discardedPoPloops/$totaltraceroutes;
print OUTPUT3 "$discardedPoPloops, counted as $ratio, traceroute paths are discarded due to PoP loops.\n";
$ratio = $discardedASloops/$totaltraceroutes;
print OUTPUT3 "$discardedASloops traceroute paths, counted as $ratio, are discarded due to AS loops.\n";
$ratio = $processedtraceroutes/$totaltraceroutes;
print OUTPUT3 "$processedtraceroutes, counted as $ratio, traceroutes are processed.\n\n\n";
print OUTPUT3 "$totalnextingress BGP next-ingresses are checked.\n";
$ratio = $nextingressmorethanonesta
print OUTPUT3 "$nextingressmorethanonest
$ratio = $nextingressonestarnounkno
print OUTPUT3 "$nextingressonestarnounkn
$ratio = $nextingressonestarunknown
print OUTPUT3 "$nextingressonestarunknow
$ratio = $nextingressnostarnounknow
print OUTPUT3 "$nextingressnostarnounkno
$ratio = $nextingressnostarunknowni
print OUTPUT3 "$nextingressnostarunknown
print OUTPUT3 "$totalsameasnexthop BGP same-AS egresses are checked.\n";
$ratio = $sameasnexthopegresstwounk
print OUTPUT3 "$sameasnexthopegresstwoun
$ratio = $sameasnexthopegressnotunk
print OUTPUT3 "$sameasnexthopegressnotun
$ratio = $sameasnexthopegressunknow
print OUTPUT3 "$sameasnexthopegressunkno
print OUTPUT3 "$totalaspath BGP AS-paths are checked.\n";
$ratio = $aspathmorethanonestarskip
print OUTPUT3 "$aspathmorethanonestarski
$ratio = $aspathonestarincluded/$to
print OUTPUT3 "$aspathonestarincluded, counted as $ratio, BGP AS-paths are included with single star(s) on IP-path from current AS to destination host.\n";
$ratio = $aspathnostarincluded/$tot
print OUTPUT3 "$aspathnostarincluded, counted as $ratio, BGP AS-paths are included with no star on IP-path from current AS to destination host.\n\n\n";
print OUTPUT3 "$totalpoppath IGP PoP-paths are checked.\n";
$ratio = $poppathmorethanoneunknown
print OUTPUT3 "$poppathmorethanoneunknow
$ratio = $poppathoneunknownincluded
print OUTPUT3 "$poppathoneunknowninclude
$ratio = $poppathnounknownincluded/
print OUTPUT3 "$poppathnounknownincluded
my $stoptime = time(); ## get time in seconds since 1970
my $elapsedtime = $stoptime - $begintime;
print OUTPUT3 "The code stops at time $stoptime\n\n";
print OUTPUT3 "The code runs $elapsedtime seconds\n";
print OUTPUT3 "The traceroute processing runs $totalprocesstime seconds\n";
close(OUTPUT1);
close(OUTPUT2);
close(OUTPUT3);
exit (0);
I don't see anything in the Perl code that handles gzip files. Do you have a wrapper script calling it?
ASKER
See the line:
open(INPUT1, "zcat $file | ") || die "can't open file $file for read";
I agree with Tintin. Look at it this way -- if it's taking 8 hours to process 800 files, that's 36 seconds per file. Of those 36 seconds, I'd bet 35 are spent on zcat operations, DBI operations, or network overhead. Since these are essentially fixed costs (they'll be the same regardless of the language the script is written in), there's little to gain by switching languages.
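For reference, the arithmetic behind the 36-seconds-per-file figure, and the projection to the full data set, can be checked with a few lines of Perl. This is only a sketch built from the numbers quoted in the thread (8 hours, 800 files, about 700k files in total); the helper names `seconds_per_file` and `projected_days` are made up for illustration.

```perl
use strict;
use warnings;

# Figures quoted in the thread.
my $hours_for_sample = 8;
my $files_in_sample  = 800;
my $total_files      = 700_000;

# Average wall-clock seconds spent on one file.
sub seconds_per_file {
    my ($hours, $files) = @_;
    return $hours * 3600 / $files;
}

# Days needed for a given number of files at a given per-file cost.
sub projected_days {
    my ($sec_per_file, $files) = @_;
    return $sec_per_file * $files / 86_400;
}

my $per_file = seconds_per_file($hours_for_sample, $files_in_sample);
my $days     = projected_days($per_file, $total_files);
printf "%.0f seconds per file, about %.0f days for %d files\n",
    $per_file, $days, $total_files;
```

At the measured rate the full run comes out a little under 300 days, which is in line with the asker's "about one year" estimate.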
Your biggest time chunk is undoubtedly used decompressing the files. I don't know the zcat program -- is there a faster alternative (like tar or gzip)? You're already using MySQL, which is about the fastest SQL DB program out there....
Can you distribute the work amongst several computers?
Or, depending on what you're doing, is it necessary to completely decompress the files? I assume each archive has several files in it -- is it necessary to decompress the entire file, or can you just extract the one you need to manipulate?
ASKER
Actually, I timed the code. The DBI and zcat operations took only a small amount of time to finish. Also, each gz file contains only one compressed plain-text file, and there are no network operations in my code.
Your script does this for INPUT1, INPUT2, and INPUT3:
my @filecache = ();
while(my $oneline = <INPUT1>) {
push(@filecache, $oneline);
}
foreach my $oneline (@filecache) {
It might be faster to do this (replace all of the above lines with this one line):
while(my $oneline=<INPUT1>) {
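A minimal sketch of the streaming approach, assuming the per-line work can be expressed as a callback. The helper `process_stream` and the sample text are hypothetical and not part of the original script; the in-memory handle only stands in for the zcat pipe so the example runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: read a handle one line at a time and hand each
# line to a callback, so no per-file array of lines is ever built.
sub process_stream {
    my ($fh, $on_line) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $on_line->($line);
        $count++;
    }
    return $count;
}

# Made-up sample data; a blank line separates records, as in the
# traceroute files described in the thread.
my $data = "line one\nline two\n\nline four\n";
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";

my @nonblank;
my $n = process_stream($fh, sub {
    my ($line) = @_;
    push @nonblank, $line if $line !~ /^\s*$/;
});
close($fh);

print "read $n lines, ", scalar(@nonblank), " non-blank\n";
```

For the real files the handle would come from the decompressor instead, for example `open(my $in, '-|', 'zcat', $file)`. The list form of a piped open avoids passing the file name through a shell. Whether this is measurably faster than caching depends on file size, but it does keep memory use flat.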
=======================
In this loop:
for my $onetime (sort keys %timefilehash) {
my @cntset = @{$timefilehash{$onetime}} ;
my $goodfiles = $#cntset+1;
foreach my $onefile (@{$timefilehash{$onetime} }) {
push(@files, $onefile);
print "The file is $onefile\n";
}
}
You create @cntset and use it to compute $goodfiles, but neither is used after this. Removing those two lines should give you a little speed.
I haven't looked through the rest of the file. When you looked at the timing, which portions were using the most time?
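One way to answer the "which portions" question is to accumulate wall-clock time per phase. The following is a sketch only: `timed` is a hypothetical helper, the phase names are invented, and the two workloads are stand-ins for the real parsing and lookup code. It uses Time::HiRes, which ships with Perl.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my %elapsed;    # phase name => accumulated seconds

# Hypothetical helper: run a code block and charge its wall-clock
# time to the named phase. Returns whatever the block returns.
sub timed {
    my ($phase, $code) = @_;
    my $start  = time();
    my @result = $code->();
    $elapsed{$phase} += time() - $start;
    return wantarray ? @result : $result[0];
}

# Stand-in workloads; in the real script these would wrap the
# per-line parsing, the hash lookups, and the database inserts.
my $sum = timed('parse', sub {
    my $s = 0;
    $s += $_ for 1 .. 100_000;
    return $s;
});
timed('lookup', sub {
    my %h = map { $_ => 1 } 1 .. 10_000;
    return scalar keys %h;
});

# Report the phases, most expensive first.
for my $phase (sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed) {
    printf "%-8s %.6f s\n", $phase, $elapsed{$phase};
}
```

If hand-placed timers are too coarse, a line-level profiler such as Devel::NYTProf (run with `perl -d:NYTProf script.pl`, then `nytprofhtml`) may show where the time goes without editing the script.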
ASKER
I have modified the code to make it more readable. The new code is as follows.
#!/usr/bin/perl -w
# process-traceroute-insert-mysql.pl
use strict;
use File::Find;
use File::Basename;
use DBI;
use Net::IP;
use Net::Patricia;
use Time::Local;
## start-time is the selected starting time, end-time is the selected ending time for traceroute data processing, prefixasfile is the prefix-as mapping file name, inconsistent-as-path-output is the file to store inconsistent aspath entries, inconsistent-pop-path-output is the file to store inconsistent poppath entries, discarding-stats-output is the file to store files and traceroutes being discarded
if($#ARGV != 7) {
print "usage: process-traceroute.pl start-time end-time good-traceroute-file-list corrupted-traceroute-file-list prefix-as-mapping-file inconsistent-as-path-output inconsistent-pop-path-output policy-filtering-stats-output\n";
print "start-time and end-time in format YYMMDDHHMMSS\n";
exit(1);
}
my ($startingtime, $endingtime, $goodfilelist, $corruptedfilelist, $prefixasfile, $aspathoutput, $poppathoutput, $policyoutput) = @ARGV;
## open the prefix-as mapping file and store them in the Patricia handler
open(INPUT1, "<$prefixasfile") || die "cannot open $prefixasfile file for read.";
## open the good file list
open(INPUT2, "<$goodfilelist") || die "cannot open $goodfilelist file for read.";
## open the corrupted file list
open(INPUT3, "<$corruptedfilelist") || die "cannot open $corruptedfilelist file for read.";
## open the inconsistent as-path file for write
open(OUTPUT1, ">$aspathoutput") || die "cannot open $aspathoutput file for write.";
## open the inconsistent pop-path file for write
open(OUTPUT2, ">$poppathoutput") || die "cannot open $poppathoutput file for write.";
## open the policy-filtering-stats-output file for write
open(OUTPUT3, ">$policyoutput") || die "cannot open $policyoutput file for write.";
## open the AS-loop file for write
open(OUTPUT4, ">as-loops.txt") || die "cannot open as-loops.txt for write.";
## open the AS-loop distribution file for write
open(OUTPUT5, ">as-loops-num-ips.txt") || die "cannot open as-loops-num-ips.txt for write.";
my $pt = new Net::Patricia;
my $prefixt = new Net::Patricia;
$startingtime =~ /(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/;
my ($yy1, $mm1, $dd1, $hh1, $min1, $ss1) = ($1, $2, $3, $4, $5, $6);
$endingtime =~ /(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/;
my ($yy2, $mm2, $dd2, $hh2, $min2, $ss2) = ($1, $2, $3, $4, $5, $6);
print OUTPUT3 "Traceroute files between 20$yy1-$mm1-$dd1 $hh1:$min1:$ss1 and 20$yy2-$mm2-$dd2 $hh2:$min2:$ss2 are checked.\n";
my $fileasn;
my $begintime = time(); ## get time in seconds since 1970
print OUTPUT3 "The code starts time is $begintime\n";
while(my $oneline = <INPUT1>) {
chomp($oneline);
my ($oneprefix, $oneas) = split(/\s+/, $oneline);
if($oneprefix =~ /\d{1,3}(\.\d{1,3}){3}\/\d{1,2}/) {
$pt->add_string($oneprefix, $oneas);
$prefixt->add_string($oneprefix);
}
}
close(INPUT1);
my %timefilehash = ();
my $totalfiles = 0;
my $corruptedfiles = 0;
my $workingfiles = 0;
while(my $file=<INPUT2>) {
chomp($file);
if( -f $file && $file =~ /_(\d{12})_re_\d+/ ) {
my $cnttime = $1;
my $diff1 = to_seconds($cnttime) - to_seconds($startingtime);
my $diff2 = to_seconds($endingtime) - to_seconds($cnttime);
if( $diff1 >= 0 && $diff2 >= 0 ) {
$totalfiles++;
$workingfiles++;
my $timefiles = $timefilehash{$cnttime};
if(not(defined($timefiles))) {
$timefilehash{$cnttime} = [$file];
}
else {
push(@{$timefilehash{$cnttime}}, $file);
}
}
}
}
while(my $file=<INPUT3>) {
chomp($file);
if(-f $file && $file =~ /_(\d{12})_re_\d+/ ) {
my $cnttime = $1;
my $diff1 = to_seconds($cnttime) - to_seconds($startingtime);
my $diff2 = to_seconds($endingtime) - to_seconds($cnttime);
if( $diff1 >= 0 && $diff2 >= 0 ) {
$totalfiles++;
$corruptedfiles++;
}
}
}
my @files = ();
sub to_seconds
{
use integer;
my $x = $_[0];
my $year = "20".substr($x,0,2);
my $mo = substr($x,2,2);
my $day = substr($x,4,2);
my $hour = substr($x,6,2);
my $minute = substr($x,8,2);
my $second = substr($x,10,2);
my $t = timelocal($second,$minute,$hour,$day,$mo - 1,$year - 1900);
return($t);
}
my $numgroups= keys %timefilehash;
for my $onetime (sort keys %timefilehash) {
foreach my $onefile (@{$timefilehash{$onetime}}) {
push(@files, $onefile);
print "The file is $onefile\n";
}
}
# connect to mySQL database for later data query and retrieval
my $dsn = "DBI:mysql:test_bm"; # data source name
my $user_name = "root"; # user name
my $password = "NewPw"; # password
my %ipasntable = (); ## this hash table keeps the ASN of an IP from DNS name mapping
my %iplockeytable = (); ## this hash table keeps the PoP key of an IP from DNS name mapping
my %lockeyloctable = (); ## this hash table keeps the PoP of an PoP key from DNS naming mapping
my %iploctable = (); ## this hash table keeps the PoP of an IP from DNS name mapping
my %ipasnpoptable = (); ## this hashtable keeps the ASN and PoP value
my %bgphash = ();
my %igphash = ();
# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## select ipAddress, asn, lockey from the ipAddress table
my $sth = $dbh->prepare("SELECT ipAddress, asn, locKey FROM ipAddress");
$sth->execute();
## fetch query results from ipAddress table
while(my @ary = $sth->fetchrow_array()) {
my ($cntip, $cntasn, $cntkey) = @ary;
if($cntasn ne "NULL") {
if($cntasn > 0) {
$ipasntable{$cntip} = $cntasn;
}
}
else {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($cntip);
if(defined($cntasn)) {
$ipasntable{$cntip} = $cntasn;
}
}
if($cntkey ne "NULL") {
if($cntkey > 1) {
$iplockeytable{$cntip} = $cntkey;
}
}
}
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## select lockey, locName from the location table
$sth = $dbh->prepare("SELECT locKey, locName FROM location");
$sth->execute();
## fetch query results from location table
while(my @ary = $sth->fetchrow_array()) {
my ($cntkey, $cntloc) = @ary;
if($cntkey ne "NULL") {
if($cntkey > 1) {
$lockeyloctable{$cntkey} = $cntloc;
}
}
}
while ( my ($oneip, $onekey) = each(%iplockeytable) ) {
my $oneloc = $lockeyloctable{$onekey};
my $oneasn = $ipasntable{$oneip};
# print "For ip $oneip, its ASN is $oneasn, its PoP is $oneloc\n";
$iploctable{$oneip} = $oneloc;
}
## release iplockeytable and lockeyloctable memory
%iplockeytable = ();
%lockeyloctable = ();
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## drop inferred BGP table if it exists
my $bgpdrop = "
DROP TABLE IF EXISTS bgp";
$sth = $dbh->prepare($bgpdrop);
$sth->execute();
## create inferred BGP table
my $bgpcreate = "
CREATE TABLE bgp (
bkey int(12) unsigned NOT NULL auto_increment,
ptime datetime NOT NULL,
tstart datetime NOT NULL,
vpip varchar(16) NOT NULL,
dip varchar(24) NOT NULL,
cntas int(8) unsigned NOT NULL,
cntpop varchar(32) NOT NULL,
nextas int(8) unsigned,
nextpop varchar(32),
aspath varchar(64),
PRIMARY KEY (bkey)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8";
$sth = $dbh->prepare($bgpcreate);
$sth->execute();
## drop intra-AS PoP-path table if it exists
my $poppathdrop = "
DROP TABLE IF EXISTS poppath";
$sth = $dbh->prepare($poppathdrop);
$sth->execute();
## create intra-AS PoP-path table
my $poppathcreate = "
CREATE TABLE poppath (
pkey int(12) unsigned NOT NULL auto_increment,
ptime datetime NOT NULL,
tstart datetime NOT NULL,
vpip varchar(16) NOT NULL,
dip varchar(16) NOT NULL,
asn int(8) unsigned NOT NULL,
srcpop varchar(32) NOT NULL,
dstpop varchar(32) NOT NULL,
poppath varchar(256) NOT NULL,
ippathlen int(4) NOT NULL,
PRIMARY KEY (pkey)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8";
$sth = $dbh->prepare($poppathcreate);
$sth->execute();
## subroutine to check whether an ASN is a targeted ASN
sub istargetas {
my $asn = $_[0]; ## first argument; a bare $_ is not the subroutine argument
if($asn eq "1239" || $asn eq "16631" || $asn eq "1668" || $asn eq "209" ||
$asn eq "2828" || $asn eq "2856" || $asn eq "2914" || $asn eq "3257" ||
$asn eq "3320" || $asn eq "3356" || $asn eq "3549" || $asn eq "3561" ||
$asn eq "5511" || $asn eq "6395" || $asn eq "6453" || $asn eq "6461" ||
$asn eq "701" || $asn eq "7018") {
return 1;
}
else {
return 0;
}
}
sub bgpcontains {
my ($first, $second) = @_;
foreach my $one (@{$first}) {
if($one eq $second) {
return 1;
}
}
return 0;
}
sub igpcontains {
my ($first, $second) = @_;
foreach my $one (@{$first}) {
if($one eq $second) {
return 1;
}
}
return 0;
}
my $totalnextingress = 0;
my $nextingressincluded = 0;
my $nextingressskipped = 0;
my $totalnextegress = 0;
my $nextegressskipped = 0;
my $nextegressincluded = 0;
my $totalaspath = 0;
my $aspathskipped = 0;
my $aspathincluded = 0;
my $totalpoppath = 0;
my $poppathskipped = 0;
my $poppathincluded = 0;
## this function prints out the IP-path of a traceroute path
sub getippath {
my (@hops) = @_;
my ($ippath) = split(/:/, $hops[0]);
for(my $i=1; $i<=$#hops; $i++) {
my ($cntip) = split(/:/, $hops[$i]);
$ippath .= "->$cntip";
}
return $ippath;
}
sub comparelist {
my ($list1, $list2) = @_;
my $listsize = $#{$list1};
my $i;
for($i=0; $i<=$listsize; $i++) {
if(@{$list1}[$i] ne @{$list2}[$i]) {
return 1;
}
}
return 0;
}
## this subroutine removes duplicate IPs
sub removeduplicates {
my @hops = @_;
my @noduplicates = ();
## remove duplicate IPs in the hops
my $lastip = "-1";
for(my $i=0; $i<=$#hops; $i++) {
if($hops[$i] eq "*") {
push(@noduplicates, $hops[$i]);
}
else {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, $hops[$i]);
if($cntip ne $lastip) {
push(@noduplicates, $hops[$i]);
$lastip = $cntip;
}
}
}
return @noduplicates;
}
## this subroutine divdes IP-hops into AS groups
sub dividehopsintoasgroups {
my (@hops) = @_;
my $asstr = "";
my @asgroups = ();
my $cntindex = 0;
my $cntgroup = "";
my $lastasn="-1";
my $hopsize = $#hops;
## get as-path and divide hops into AS groups by getting each AS group's hop indices
foreach my $onehop (@hops) {
## skip stars
if($onehop eq "*") {
$cntindex++;
next;
}
my ($cntIP, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $onehop);
## this is the first AS in the traceroute path
if($lastasn eq "-1") {
if($cntasn ne "NULL" && $cntasn ne "0") {
$asstr = $cntasn;
$cntgroup = $cntindex;
$lastasn = $cntasn;
}
}
else { ## not first AS in the traceroute path
if($cntasn ne "NULL" && $cntasn ne "0" && $cntasn ne $lastasn) {
push(@asgroups, $cntgroup);
$asstr .= ">$cntasn";
$cntgroup = $cntindex;
$lastasn = $cntasn;
}
elsif($cntasn ne "NULL" && $cntasn ne "0" && $cntasn eq $lastasn) {
my $i;
$cntgroup .= ":$cntindex";
}
if($cntindex == $hopsize) {
push(@asgroups, $cntgroup);
}
}
$cntindex++;
}
return($asstr, @asgroups);
}
## this subroutine gets PoP-level path for the as groups
sub getpoppaths {
## get the PoP-level paths for the as groups
my ($hops, $asgroups) = @_;
my @groups = ();
foreach my $onegroup (@{$asgroups}) {
my (@indices) = split(/:/, $onegroup);
my $firstindex = $indices[0];
my $lastindex = $indices[$#indices];
my $groupstr = "";
my $lastpop = "-1";
my $i;
for($i=$firstindex; $i<=$lastindex; $i++) {
my $cnthop = @{$hops}[$i];
if($cnthop eq "*") {
if($groupstr eq "") {
$groupstr .= $cnthop;
$lastpop = "NULL";
}
else {
$groupstr .= "|$cnthop";
$lastpop = "NULL";
}
}
else {
my ($cntIP, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $cnthop);
# if($cntpop eq "NULL") {
if($groupstr eq "") {
$groupstr .= "$cntIP:$cntasn:$cntpop:$i"; ## i is the index of the hop in hops
$lastpop = "NULL";
}
else {
$groupstr .= "|$cntIP:$cntasn:$cntpop:$i";
$lastpop = "NULL";
}
# }
#elsif($cntpop ne $lastpop) {
#if($groupstr eq "") {
# $groupstr .= "$cntIP:$cntasn:$cntpop:$i";
# $lastpop = $cntpop;
#}
#else {
# $groupstr .= "|$cntIP:$cntasn:$cntpop:$i";
# $lastpop = $cntpop;
#}
#}
}
} ## end for($i ... ...)
push(@groups, $groupstr);
}
return @groups;
}
## subroutine to get AS-path from current AS to destination host
sub getpartialaspath {
my ($afterstars, $asstr, $cntlastasn) = @_;
#print "The as string is $asstr\n";
my $partialaspath;
if($afterstars < 2) {
### get the partial as path from current as to the destination as
my (@ashops) = split(/>/, $asstr);
my $asindex;
my $jj;
for($jj=0; $jj<=$#ashops; $jj++) {
my $cntasn = $ashops[$jj];
if($cntasn == $cntlastasn) {
$asindex = $jj;
next;
}
}
$partialaspath = $ashops[$asindex];
for($jj=$asindex+1; $jj<=$#ashops; $jj++) {
$partialaspath .= ">$ashops[$jj]";
}
}
else {
$partialaspath = "NULL"; ## do not use the AS path if there exists two consecutive "*" after the current AS
}
return $partialaspath;
}
## this subroutine calculates the unknowns between current last group's last valid PoP and next group's first IP
sub checkunknownsbetweentwoases {
my ($groups, $cntgroupindex, $hops) = @_;
my $groupsize = $#{$groups};
my @cntgrouphops = split(/\|/, @{$groups}[$cntgroupindex]);
my $cntgroupsize = $#cntgrouphops;
#print "Current group hops are @cntgrouphops\n";
my ($cntgrouplastvalidIP, $cntgrouplastvalidasn, $cntgrouplastvalidpop, $cntgrouplastvalidindex);
my ($nextgroupfirstIP, $nextgroupfirstasn, $nextgroupfirstpop, $nextgroupfirstindex);
my $cntgrouphasvalidpop = 0;
my $islastgroup = 0;
## get the last valid hop elements at current group
for(my $i=$cntgroupsize; $i>=0; $i--) {
if($cntgrouphops[$i] ne "*") {
my ($cntIP, $cntasn, $cntpop, $cntindex) = split(/:/, $cntgrouphops[$i]);
if($cntasn ne "NULL" && $cntasn ne "0" && $cntpop ne "NULL") {
$cntgrouphasvalidpop = 1;
($cntgrouplastvalidIP, $cntgrouplastvalidasn, $cntgrouplastvalidpop, $cntgrouplastvalidindex) = ($cntIP, $cntasn, $cntpop, $cntindex);
}
}
}
## begin to check previous egress--next ingress PoP entry and populate it into the bgp table
if($cntgroupindex<$groupsize) { ## it is not the last AS
my $nextgroup = @{$groups}[$cntgroupindex+1];
my (@nextgrouphops) = split(/\|/, $nextgroup);
my $nextgroupfirsthop = $nextgrouphops[0];
if($nextgroupfirsthop eq "*") {
print "The first pop is *.\n";
print "The next group hops are @nextgrouphops\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
($nextgroupfirstIP, $nextgroupfirstasn, $nextgroupfirstpop, $nextgroupfirstindex) = split(/:/, $nextgroupfirsthop);
}
else {
$islastgroup = 1;
}
my $betweenunknowns;
## this variable indicates how many unknowns between the current group's last valid pop and next group's first IP
if($islastgroup == 0 && defined($cntgrouplastvalidindex) && defined($nextgroupfirstindex)) {
$betweenunknowns = $nextgroupfirstindex - $cntgrouplastvalidindex - 1;
}
return($islastgroup, $cntgrouphasvalidpop, $betweenunknowns, $cntgrouplastvalidasn, $cntgrouplastvalidpop, $cntgrouplastvalidindex, $nextgroupfirstasn, $nextgroupfirstIP);
}
## subroutine to generate next-ingress field for a bgp table entry
sub generatenextingressbgpentry {
my ($key, $oneentry, $afterstars) = @_;
my $entries = $bgphash{$key};
if(not(defined($entries))) {
$totalaspath++;
if($afterstars >= 2) {
$aspathskipped++;
}
else {
$aspathincluded++;
}
$totalnextingress++; ## increase the next ingress entry count
$nextingressincluded++;
@{$bgphash{$key}} = ($oneentry);
}
else {
if(bgpcontains(\@{$entries}, $oneentry) == 0) {
$totalaspath++;
if($afterstars >= 2) {
$aspathskipped++;
}
else {
$aspathincluded++;
}
$totalnextingress++;
$nextingressincluded++;
push(@{$bgphash{$key}}, $oneentry);
}
}
}
## this subroutine create one bgp next-egress entry and store it in the bgp hash table
sub generatenextegressbgpentry {
my ($key, $oneentry, $afterstars) = @_;
$totalaspath++;
if($afterstars >= 2) {
$aspathskipped++;
}
else {
$aspathincluded++;
}
my $entries = $bgphash{$key};
if(not(defined($entries))) {
$totalnextegress++;
$nextegressincluded++;
@{$bgphash{$key}} = ($oneentry);
}
else {
if(bgpcontains(\@{$entries}, $oneentry) == 0) {
$totalnextegress++;
$nextegressincluded++;
push(@{$bgphash{$key}}, $oneentry);
}
}
}
## subroutine to generate igp poppath table entries
sub generatepoppathentries {
my ($hops, $cntgrouphops, $cnthopindex, $startasn, $startpop, $startindex, $srcip, $dstip, $starttime, $probetime) = @_;
my $cntgroupsize = $#{$cntgrouphops};
my $hopsize = $#{$hops};
my ($i, $j);
my $poppath = $startpop;
## begin to process the right-side paths of the current pop
for($i=$cnthopindex+1; $i<=$cntgroupsize; $i++) {
if(@{$cntgrouphops}[$i] eq "*") {
$poppath .= ">*";
next;
}
else {
my ($endip, $endasn, $endpop, $endindex) = split(/:/, @{$cntgrouphops}[$i]);
if($endasn ne "NULL" && $endasn ne $startasn) {
print "The start PoP ASN is $startasn, and the end PoP ASN is $endasn\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
if($endpop eq "NULL") {
$poppath .= ">NULL";
next;
}
else {
$poppath .= ">$endpop";
## begin to inspect and compress the PoP-path
my @pops = split(/>/, $poppath);
my @knownpops = ();
for($j=0; $j<=$#pops; $j++) {
## keep known PoP indices into PoP list
if($pops[$j] ne "*" && $pops[$j] ne "NULL") {
push(@knownpops, $j);
}
}
if($knownpops[0] != 0 || $knownpops[$#knownpops] != $#pops) {
print "The first known PoP index is $knownpops[0], and the last known PoP index is $knownpops[$#knownpops].\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my $twounknownbetweendifferentpops = 0;
my $oneunknownbetweendifferentpops = 0;
my $newpoppath = "$pops[0]";
my $lastkeptPoP = $pops[0];
my $lastPoPindex = 0;
for(my $j=1; $j<=$#knownpops; $j++) {
my $cntPoP = $pops[$knownpops[$j]];
if($cntPoP ne $lastkeptPoP) {
## there are more than one NULL PoP or * between current PoP and last known PoP
if($knownpops[$j]-$lastPoPindex > 2) {
$twounknownbetweendifferentpops = 1;
last;
}
## there is one NULL PoP or * between current PoP and last known PoP, add a wild card
elsif($knownpops[$j]-$lastPoPindex == 2) {
if($oneunknownbetweendifferentpops == 0) { ## this is the first NULL PoP or * between two known PoPs
$oneunknownbetweendifferentpops = 1;
$lastPoPindex = $knownpops[$j];
$lastkeptPoP = $cntPoP;
$newpoppath .= ">*>$cntPoP";
}
else {
$twounknownbetweendifferentpops = 1;
last;
}
}
else { ## this is no NULL PoP or * between two known PoPs
$lastPoPindex = $knownpops[$j];
$lastkeptPoP = $cntPoP;
$newpoppath .= ">$cntPoP";
}
}
}
## begin to find the earliest IP index in the same PoP as startIP
my $earliestindex = $startindex;
for(my $j=$startindex; $j>=0; $j--) {
my $cnthop = @{$hops}[$j];
if(@{$hops}[$j] ne "*") {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, @{$hops}[$j]);
if($cntasn eq $startasn && $cntpop eq $startpop) {
$earliestindex = $j;
}
elsif($cntpop ne "NULL" && $cntpop ne $startpop) {
last;
}
elsif($cntasn ne "NULL" && $cntasn ne $startasn) {
last;
}
}
}
## begin to find the latest IP index in the same PoP as endIP
my $latestindex = $endindex;
for(my $j=$endindex; $j<=$hopsize; $j++) {
my $cnthop = @{$hops}[$j];
if(@{$hops}[$j] ne "*") {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, @{$hops}[$j]);
if($cntasn eq $endasn && $cntpop eq $endpop) {
$latestindex = $j;
}
elsif($cntpop ne "NULL" && $cntpop ne $endpop) {
last;
}
elsif($cntasn ne "NULL" && $cntasn ne $endasn) {
last;
}
}
}
my $ippathlen = $latestindex - $earliestindex + 1;
my $key = "$srcip<$dstip<$startasn<$startpop<$endpop<$starttime";
my $oneentry = "$probetime<$newpoppath<$ippathlen";
my $entries = $igphash{$key};
if(not(defined($entries))) {
$totalpoppath++;
if($twounknownbetweendifferentpops == 1) {
$poppathskipped++;
}
else {
$poppathincluded++;
@{$igphash{$key}} = ($oneentry);
}
}
else {
if(igpcontains(\@{$entries}, $oneentry) == 0) {
$totalpoppath++;
if($twounknownbetweendifferentpops == 1) {
$poppathskipped++;
}
else {
$poppathincluded++;
push(@{$igphash{$key}}, $oneentry);
}
} ## end if(igpcontains(\@{$entries}, $oneentry) == 0) { ... }
} ## end if(not(defined($entries))) { ... } else { ... }
} ## end if($endpop eq "NULL") { ... } else { ... }
} ## end if(@{$cntgrouphops}[$i] eq "*") {...} else {...}
} ## end for($i=$cnthopindex+1; $i<=$cntgroupsize; $i++)
}
## get the consecutive * from current group index to the end of the group
sub getconsecutivestarsonpath {
my ($cntgrouplastindex, $hops) = @_;
my ($i, $hopsize);
my $afterstars = 0;
$hopsize = $#{$hops};
for($i=$cntgrouplastindex+1; $i<=$hopsize; $i++) {
if(@{$hops}[$i] eq "*") {
$afterstars++;
if($afterstars == 2) {
last;
}
}
else {
$afterstars = 0;
}
}
return $afterstars;
}
## this subroutine generate table entries and
sub generatetableentriesforasgroups {
my ($hops, $groups, $asstr, $srcip, $dstip, $starttime, $probetime) = @_;
my ($hopsize, $groupsize) = ($#{$hops}, $#{$groups});
my ($i, $j);
for($i=0; $i<=$groupsize; $i++) { ## iterate through AS groups
my $onegroup = @{$groups}[$i];
my (@cntgrouphops) = split(/\|/, $onegroup);
my ($cntIP, $cntasn, $cntpop, $cntindex) = split(/:/, $cntgrouphops[0]);
if($cntasn != $fileasn) { ## only process the as group whose asn is specified in the file name
next;
}
if($cntgrouphops[0] eq "*") {
print "Current group first hop is *.\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my $cntgrouplasthop = $cntgrouphops[$#cntgrouphops];
if($cntgrouplasthop eq "*") { ## the current last hop is not *
print "Current group last hop is *. \n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my ($cntgrouplastIP, $cntgrouplastasn, $cntgrouplastpop, $cntgrouplastindex) = split(/:/, $cntgrouplasthop);
my $afterstars = getconsecutivestarsonpath($cntgrouplastindex, \@{$hops});
my $partialaspath = getpartialaspath($afterstars, $asstr, $cntgrouplastasn);
## print "The current asn is $cntlastasn, and the as path is $partialaspath, and the as string is $asstr, and file asn is $fileasn\n";
my ($islastgroup, $cntgrouphasvalidpop, $betweenunknowns, $cntgrouplastvalidasn, $cntgrouplastvalidpop, $cntgrouplastvalidindex, $nextgroupfirstasn, $nextgroupfirstIP) = checkunknownsbetweentwoases(\@{$groups}, $i, \@{$hops});
if($islastgroup == 0) {
if($cntgrouphasvalidpop == 1 && $betweenunknowns <= 1) {
my $key = "$srcip<$dstip<$cntgrouplastvalidasn<$cntgrouplastvalidpop<$starttime";
my $oneentry = "$probetime<$nextgroupfirstasn<$nextgroupfirstIP<$partialaspath";
generatenextingressbgpentry($key, $oneentry, $afterstars);
}
else {
$totalnextingress++;
$nextingressskipped++;
}## end if($cntgrouphasvalidpop == 1 && $betweenunknowns <= 1) {...} else {...}
}
## begin to generate bgp and IGP PoP-path entries for this group
for($j=0; $j<=$#cntgrouphops-1; $j++) {
if($cntgrouphops[$j] eq "*") {
next;
}
my ($startIP, $startasn, $startpop, $startindex) = split(/:/, $cntgrouphops[$j]);
if($startasn eq "0" || $startasn eq "NULL" || $startpop eq "NULL") {
next;
}
if(($islastgroup == 1 || $betweenunknowns <= 1) && $cntgrouphasvalidpop == 1) {
if($startindex >= $cntgrouplastvalidindex || $startpop eq $cntgrouplastvalidpop) {
next;
}
my $key = "$srcip<$dstip<$startasn<$startpop<$starttime";
my $oneentry = "$probetime<$cntgrouplastvalidasn<$cntgrouplastvalidpop<$partialaspath";
generatenextegressbgpentry($key, $oneentry, $afterstars);
}
else {
$totalnextegress++;
$nextegressskipped++;
}
## generate IGP pop-path table entries for this group
generatepoppathentries($hops, \@cntgrouphops, $j, $startasn, $startpop, $startindex, $srcip, $dstip, $starttime, $probetime);
} ## end for($j=0; $j<=$#cntgrouphops-1; $j++)
} ## end for($i=0; $i<=$#groups; $i++)
}
## subroutine to process one traceroute, create bgp entries and poppath entries, and insert entries into bgp table and poppath table
sub processonetraceroute {
my ($probetime, $starttime, @hops) = @_;
my $srchop = $hops[0];
my $dsthop = $hops[$#hops];
my ($srcip, $dummy1, $dummy2, $srcasn, $srcpop) = split(/:/, $srchop);
my ($dstip, $dummy3, $dummy4, $dstasn, $dstpop) = split(/:/, $dsthop);
@hops = removeduplicates(@hops); ## remove consecutive duplicate IPs in the traceroute hops
my ($asstr, @asgroups) = dividehopsintoasgroups(@hops); ## divide hops into as groups
my @groups = getpoppaths(\@hops, \@asgroups); ## get pop-path in ASes separated by "|"
generatetableentriesforasgroups(\@hops, \@groups, $asstr, $srcip, $dstip, $probetime, $starttime);
}
## this subroutine inserts bgp and igp entries into mysql database
sub populatehashes {
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
my $str = "LOCK TABLES bgp WRITE";
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO bgp (ptime, tstart, vpip, dip, cntas, cntpop, nextas, nextpop, aspath) VALUES ";
my $first = 1;
my $entrycount = 0;
while ( my ($key, $val) = each(%bgphash) ) {
my ($srcip, $dstip, $asn, $pop, $starttime) = split(/</, $key);
my $count=0;
my $strtowrite = "";
$entrycount++;
foreach my $one (@{$val}) {
my ($probetime, $nextasn, $nextpop, $aspath) = split(/</, $one);
$count++;
if($first == 1) {
$str .= "('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$pop', $nextasn, '$nextpop', '$aspath')";
$first = 0;
}
else {
$str .= ", ('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$pop', $nextasn, '$nextpop', '$aspath')";
}
$strtowrite .= "$probetime\t$starttime\t$srcip\t$dstip\t$asn\t$pop\t$nextasn\t$nextpop\t$aspath\n";
}
if($count>1) {
print OUTPUT2 "$strtowrite";
}
if($entrycount >= 3000) {
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO bgp (ptime, tstart, vpip, dip, cntas, cntpop, nextas, nextpop, aspath) VALUES ";
$first = 1;
$entrycount = 0;
}
}
if($entrycount > 0) {
$sth = $dbh->prepare($str);
$sth->execute();
}
$str = "LOCK TABLES poppath WRITE";
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO poppath (ptime, tstart, vpip, dip, asn, srcpop, dstpop, poppath, ippathlen) VALUES ";
$first = 1;
$entrycount = 0;
while ( my ($key, $val) = each(%igphash) ) {
my ($srcip, $dstip, $asn, $srcpop, $dstpop, $starttime) = split(/</, $key);
my $count = 0;
my $strtowrite="";
$entrycount++;
foreach my $one (@{$val}) {
my ($probetime, $poppath, $ippathlen) = split(/</, $one);
$count++;
if($first == 1) {
$str .= "('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$srcpop', '$dstpop', '$poppath', '$ippathlen')";
$first = 0;
}
else {
$str .= ", ('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$srcpop', '$dstpop', '$poppath', '$ippathlen')";
}
$strtowrite .= "$probetime\t$starttime\t$srcip\t$dstip\t$asn\t$srcpop\t$dstpop\t$poppath\t$ippathlen\n";
}
if($count>1) {
print OUTPUT1 "$strtowrite";
}
if($entrycount >= 3000) {
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO poppath (ptime, tstart, vpip, dip, asn, srcpop, dstpop, poppath, ippathlen) VALUES ";
$first = 1;
$entrycount = 0;
}
}
if($entrycount>0) {
$sth = $dbh->prepare($str);
$sth->execute();
}
$str = "UNLOCK TABLES";
$sth = $dbh->prepare($str);
$sth->execute();
%bgphash = ();
%igphash = ();
}
my $filecount = 0;
my $discardedIPloops = 0;
my $discardedPoPloops = 0;
my $discardedASloops = 0;
my $processedtraceroutes = 0;
sub findnumIPs {
my @hops = @_;
my %lastasindex = ();
my $lastasn = "-1";
my $loopstr = "";
for(my $i=0; $i<=$#hops; $i++) {
if($hops[$i] ne "*") {
my ($ip, $rtt, $ttl, $asn, $pop) = split(/:/, $hops[$i]);
if($asn eq "NULL") {
next;
}
if(not(defined($lastasindex{$asn}))) {
$lastasindex{$asn} = $i;
$lastasn = $asn;
}
else {
if($asn eq $lastasn) {
$lastasindex{$asn} = $i;
}
else {
my $lastindex = $lastasindex{$asn};
my $numipsonloop = $i - $lastindex;
my $distinctas = 0;
my $keptasn = -1;
for(my $j=$lastindex; $j<=$i; $j++) {
if($hops[$j] eq "*") {
if($loopstr eq "") {
$loopstr = "*";
}
else {
$loopstr .= ">*";
}
next;
}
my ($cntip, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $hops[$j]);
if($loopstr eq "") {
$loopstr = "$cntip:$cntasn";
}
else {
$loopstr .= ">$cntip:$cntasn";
}
if($cntasn ne "NULL" && $cntasn ne $keptasn) {
$distinctas++;
$keptasn = $cntasn;
}
} ## end for(my $j ... ...)
$distinctas--; ## decrease by one due to last repeating as
print OUTPUT5 "$numipsonloop $distinctas\n";
return $loopstr;
}
}
}
}
return $loopstr;
}
sub discardpath {
my (@hops) = @_;
## check for IP, PoP, AS loops
my %iphash = ();
my %pophash = ();
my %ashash = ();
my $lastip = "-1";
my $lastasn = -1;
my $lastpop = "-1";
my $aspath = "";
foreach my $onehop (@hops) {
if($onehop ne "*") {
my ($ip, $rtt, $ttl, $asn, $pop) = split(/:/, $onehop);
if($ip ne $lastip) {
if(not(defined($iphash{$ip}))) {
$iphash{$ip} = 1;
$lastip = $ip;
}
else {
$discardedIPloops++;
## print "The traceroute @hops contains an IP loop.\n";
## print "$ip appeared more than once.\n";
return 1;
}
}
if($asn ne "NULL") {
if($pop ne "NULL") {
my $cntpop = "$asn->$pop";
if($cntpop ne $lastpop) {
if(not(defined($pophash{$cntpop}))) {
$pophash{$cntpop} = 1;
$lastpop = $cntpop;
}
else {
$discardedPoPloops++;
## print "The traceroute @hops contains a PoP loop.\n";
return 2;
}
}
}
$aspath .= "$asn|";
if($asn ne $lastasn) {
if(not(defined($ashash{$asn}))) {
$ashash{$asn} = 1;
$lastasn = $asn;
}
else {
my $loopstr = findnumIPs(@hops);
print OUTPUT4 "The hops are @hops, there is an AS loop. The AS loops are $aspath\n";
print OUTPUT4 "The AS loop segments are $loopstr\n";
$discardedASloops++;
return 3;
}
}
}
}
}
return 0;
}
my $totaltraceroutes = 0;
my $processbegintime = time(); ## get time in seconds since 1970
my $lastprocesstime = $processbegintime;
my $cntprocesstime;
my $totalprocesstime = 0;
# begin to process traceroute plain texts, and map intermediate IPs to their AS numbers and locations (POPs)
foreach my $file (@files) {
my $lastinserttime="-1"; ## this stores the old data collection hour
# this is traceroute plain text file
if(-f $file) {
open(INPUT1, "zcat $file | ") || die "can't open file $file for read";
$filecount++;
print "File $filecount is $file\n";
my $tag;
my $date;
my $time;
my $srcaddr;
my $arrow;
my $dstaddr;
my $icmpstatus;
my $hopcount;
my @hops = ();
my $lastindex=0;
my $dummy;
my $cntindex=0;
my $cntIP;
my $cntrtt;
my $cntttl;
my $firsthop;
my $lasthop;
## begin to extract probing time from the file name
# extension is in the format of .*
$file =~ /.*\/([^\/]+)/;
my $filename = $1;
my ($site, $datetime, $re, $targetasn) = split(/_/, $filename);
$fileasn = $targetasn;
$datetime =~ /(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/;
my ($yy, $mm, $dd, $hh, $min, $ss) = ($1, $2, $3, $4, $5, $6);
my $yyyy;
my $probetime;
my $starttime;
$probetime = "20$yy-$mm-$dd $hh:$min:$ss";
my $startmin = 15*int(($min/15));
$starttime = "20$yy-$mm-$dd $hh:$startmin:00";
## begin to process lines iteratively
while(my $line = <INPUT1>) {
chomp($line);
## this is the beginning of a new traceroute probe
if($line =~ /->/) {
($tag, $date, $time, $srcaddr, $arrow, $dstaddr, $icmpstatus, $hopcount) = split(/\s+/, $line);
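my ($oneIP, $IPnum, $cntasn, $cntpop); ## $firsthop and $lasthop are declared once per file above; redeclaring them here would leave the outer copies undefined when the probe is processed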
if(not(defined($ipasnpoptable{$srcaddr}))) {
## find out the asn and pop for the first IP
if(not($oneIP = new Net::IP($srcaddr))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
$IPnum = $oneIP->intip();
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($srcaddr);
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$firsthop = "$srcaddr:-1:-1:$cntasn:$cntpop";
$ipasnpoptable{$srcaddr} = $firsthop;
}
else {
$firsthop = $ipasnpoptable{$srcaddr};
}
if($lastinserttime ne "-1") {
my $oldtime = to_seconds($lastinserttime);
my $newtime = to_seconds("$yy$mm$dd$hh$min$ss");
my $timediff = $newtime - $oldtime;
## populate bgp and igp hash into database if it has been 3 hours since last insertion
if( $timediff > 10800 ) {
$cntprocesstime = time();
$totalprocesstime += $cntprocesstime - $lastprocesstime;
$lastprocesstime = $processbegintime;
populatehashes();
$lastinserttime = "$yy$mm$dd$hh$min$ss";
}
}
else {
$lastinserttime = "$yy$mm$dd$hh$min$ss";
}
if(not(defined($ipasnpoptable{$dstaddr}))) {
## find out the asn and pop for the last IP
if(not($oneIP = new Net::IP($dstaddr))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
$IPnum = $oneIP->intip();
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($dstaddr);
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$lasthop = "$dstaddr:-1:-1:$cntasn:$cntpop";
$ipasnpoptable{$dstaddr} = $lasthop;
}
else {
$lasthop = $ipasnpoptable{$dstaddr};
}
}
elsif($line =~ /duration/) {
next;
}
elsif($line !~ /^\s*$/) { # process the line if it contains other than white spaces
($dummy, $cntindex, $cntIP, $cntrtt, $cntttl) = split(/\s+/, $line);
if($lastindex != 0) { # not the beginning of the first hop
my $numstars = $cntindex-$lastindex-1;
my $i;
## fill in * for those missing hops
for($i=0; $i<$numstars; $i++) {
push(@hops, "*");
}
}
my $hopstr;
if(not(defined($ipasnpoptable{$cntIP}))) {
# print "Current ip is $cntIP\n";
my $oneIP;
if(not($oneIP = new Net::IP($cntIP))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
my $IPnum = $oneIP->intip();
my $cntasn;
my $cntpop;
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($cntIP);
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$hopstr = "$cntIP:$cntrtt:$cntttl:$cntasn:$cntpop";
$ipasnpoptable{$cntIP} = $hopstr;
}
else {
$hopstr = $ipasnpoptable{$cntIP};
}
push(@hops, $hopstr);
$lastindex = $cntindex;
}
elsif($line =~ /^\s*$/) { # this is the white space lines
## begin to process one traceroute probing result
if($#hops >=0) {
my @cnthops = ($firsthop, @hops, $lasthop);
$totaltraceroutes++;
if(discardpath(@cnthops) == 0) {
$processedtraceroutes++;
processonetraceroute($probetime, $starttime, @cnthops);
}
}
## last line is the end of one traceroute probe
if($lastindex != 0) {
@hops = (); # reset hops array to empty
$lastindex = 0; # reset lastindex to 0
}
}
}
close(INPUT1);
}
}
$cntprocesstime = time();
$totalprocesstime += $cntprocesstime - $lastprocesstime;
$lastprocesstime = $processbegintime;
populatehashes();
$sth->finish ();
$dbh->disconnect ();
my $ratio;
print OUTPUT3 "$totalfiles traceroute files are checked.\n";
print OUTPUT3 "There are $numgroups group(s) of probing in the checking interval.\n";
for my $onetime (sort keys %timefilehash) {
my @cntset = @{$timefilehash{$onetime}};
my $goodfiles = $#cntset+1;
print OUTPUT3 "At time $onetime, there are $goodfiles good files.\n";
foreach my $onefile (@{$timefilehash{$onetime}}) {
print "The file is $onefile\n";
}
}
print OUTPUT3 "\n\n";
$ratio = $corruptedfiles/$totalfiles;
print OUTPUT3 "$corruptedfiles, counted as $ratio, traceroute files are corrupted and are discarded.\n";
$ratio = $workingfiles/$totalfiles;
print OUTPUT3 "$workingfiles, counted as $ratio, traceroute files are good.\n\n\n";
print OUTPUT3 "$totaltraceroutes traceroute paths are parsed.\n";
$ratio = $discardedIPloops/$totaltraceroutes;
print OUTPUT3 "$discardedIPloops, counted as $ratio, traceroute paths are discarded due to IP loops.\n";
$ratio = $discardedPoPloops/$totaltraceroutes;
print OUTPUT3 "$discardedPoPloops, counted as $ratio, traceroute paths are discarded due to PoP loops.\n";
$ratio = $discardedASloops/$totaltraceroutes;
print OUTPUT3 "$discardedASloops traceroute paths, counted as $ratio, are discarded due to AS loops.\n";
$ratio = $processedtraceroutes/$totaltraceroutes;
print OUTPUT3 "$processedtraceroutes, counted as $ratio, traceroutes are processed.\n\n\n";
print OUTPUT3 "$totalnextingress BGP next-ingresses are checked.\n";
$ratio = $nextingressskipped/$totalnextingress;
print OUTPUT3 "$nextingressskipped, counted as $ratio, BGP next-ingresses are discarded due to more than one consecutive unknowns between last valid PoP and next-ingress IP.\n";
$ratio = $nextingressincluded/$totalnextingress;
print OUTPUT3 "$nextingressincluded, counted as $ratio, BGP next-ingresses are included.\n";
print OUTPUT3 "$totalnextegress BGP same-AS egresses are checked.\n";
$ratio = $nextegressskipped/$totalnextegress;
print OUTPUT3 "$nextegressskipped, counted as $ratio, BGP same-AS egresses are discarded due to more than one consecutive unknowns between last valid PoP and next-ingress IP.\n";
$ratio = $nextegressincluded/$totalnextegress;
print OUTPUT3 "$nextegressincluded, counted as $ratio, BGP same-AS egresses are included.\n";
print OUTPUT3 "$totalaspath BGP AS-paths are checked.\n";
$ratio = $aspathskipped/$totalaspath;
print OUTPUT3 "$aspathskipped, counted as $ratio, BGP AS-paths are discarded due to more than one consecutive stars on IP-path from current AS to destination host.\n";
$ratio = $aspathincluded/$totalaspath;
print OUTPUT3 "$aspathincluded, counted as $ratio, BGP AS-paths are included.\n";
print OUTPUT3 "$totalpoppath IGP PoP-paths are checked.\n";
$ratio = $poppathskipped/$totalpoppath;
print OUTPUT3 "$poppathskipped, counted as $ratio, IGP PoP-paths are discarded due to more than one unknown PoPs (either NULL PoP or *) between different neighboring PoPs.\n";
$ratio = $poppathincluded/$totalpoppath;
print OUTPUT3 "$poppathincluded, counted as $ratio, IGP PoP-paths are included.\n";
my $stoptime = time(); ## get time in seconds since 1970
my $elapsedtime = $stoptime - $begintime;
print OUTPUT3 "The code stops at time $stoptime\n\n";
print OUTPUT3 "The code runs $elapsedtime seconds\n";
print OUTPUT3 "The traceroute processing runs $totalprocesstime seconds\n";
close(OUTPUT1);
close(OUTPUT2);
close(OUTPUT3);
close(OUTPUT4);
close(OUTPUT5);
exit (0);
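One thing that might be worth trying in populatehashes: the script builds each multi-row INSERT by pasting quoted values straight into the SQL string. Below is a minimal sketch of building the same statement with "?" placeholders and a flat list of bind values instead. The helper name build_batch_insert and the sample rows are made up for illustration and are not part of the script; whether this changes the running time at all is something only profiling would show.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): build one multi-row
# INSERT statement with "?" placeholders instead of interpolated values.
# $table and @$columns must be trusted identifiers; @$rows is a list of
# array refs, one per row, each with one value per column.
sub build_batch_insert {
    my ($table, $columns, $rows) = @_;
    # "(?, ?, ...)" with one placeholder per column
    my $one_row = '(' . join(', ', ('?') x @$columns) . ')';
    my $sql = "INSERT INTO $table (" . join(', ', @$columns) . ') VALUES '
            . join(', ', ($one_row) x @$rows);
    # flatten the rows into a single bind list, in row order
    my @binds = map { @$_ } @$rows;
    return ($sql, @binds);
}

# Column names are taken from the bgp table in the script;
# the two rows are made-up sample values.
my @columns = qw(ptime tstart vpip dip cntas cntpop nextas nextpop aspath);
my @rows = (
    ['2009-01-01 00:03:10', '2009-01-01 00:00:00', '192.0.2.1', '198.51.100.7', 701, 'popA', 3356, 'popB', '701>3356'],
    ['2009-01-01 00:04:55', '2009-01-01 00:00:00', '192.0.2.1', '198.51.100.9', 701, 'popA', 1239, 'popC', '701>1239'],
);

my ($sql, @binds) = build_batch_insert('bgp', \@columns, \@rows);
print "$sql\n";
print scalar(@binds), " bind values\n";

# With a connected DBI handle the batch would then be sent as:
#   $dbh->do($sql, undef, @binds);
```

The bind list keeps quoting out of the string building, so a value containing a quote character cannot break the statement.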
#!/usr/bin/perl -w
# process-traceroute-insert-
use strict;
use File::Find;
use File::Basename;
use DBI;
use Net::IP;
use Net::Patricia;
use Time::Local;
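## start-time is the selected starting time, end-time is the selected ending time for traceroute data processing, prefixasfile is the prefix-as mapping file name, inconsistent-as-path-output is the file to store inconsistent aspath entries, inconsistent-pop-path-output is the file to store inconsistent poppath entries, discarding-stats-output is the file to store files and traceroutes being discarded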
if($#ARGV != 7) {
print "usage: process-traceroute.pl start-time end-time good-traceroute-file-list corrupted-traceroute-file-
print "start-time and end-time in format YYMMDDHHMMSS\n";
exit(1);
}
my ($startingtime, $endingtime, $goodfilelist, $corruptedfilelist, $prefixasfile, $aspathoutput, $poppathoutput, $policyoutput) = @ARGV;
## open the prefix-as mapping file and store them in the Patricia handler
open(INPUT1, "<$prefixasfile") || die "cannot open $prefixasfile file for read.";
## open the good file list
open(INPUT2, "<$goodfilelist") || die "cannot open $goodfilelist file for read.";
## open the corrupted file list
open(INPUT3, "<$corruptedfilelist") || die "cannot open $corruptedfilelist file for read.";
## open the inconsistent as-path file for write
open(OUTPUT1, ">$aspathoutput") || die "cannot open $aspathoutput file for write.";
## open the inconsistent pop-path file for write
open(OUTPUT2, ">$poppathoutput") || die "cannot open $poppathoutput file for write.";
## open the policy-filtering-stats-out
open(OUTPUT3, ">$policyoutput") || die "cannot open $policyoutput file for write.";
## open the AS-loop file for write
open(OUTPUT4, ">as-loops.txt") || die "cannot open as-loops.txt for write.";
## open the AS-loop distribution file for write
open(OUTPUT5, ">as-loops-num-ips.txt") || die "cannot open as-loops-num-ips.txt for write.";
my $pt = new Net::Patricia;
my $prefixt = new Net::Patricia;
$startingtime =~ /(\d{2})(\d{2})(\d{2})(\d{
my ($yy1, $mm1, $dd1, $hh1, $min1, $ss1) = ($1, $2, $3, $4, $5, $6);
$endingtime =~ /(\d{2})(\d{2})(\d{2})(\d{
my ($yy2, $mm2, $dd2, $hh2, $min2, $ss2) = ($1, $2, $3, $4, $5, $6);
print OUTPUT3 "Traceroute files between 20$yy1-$mm1-$dd1 $hh1:$min1:$ss1 and 20$yy2-$mm2-$dd2 $hh2:$min2:$ss2 are checked.\n";
my $fileasn;
my $begintime = time(); ## get time in seconds since 1970
print OUTPUT3 "The code starts time is $begintime\n";
while(my $oneline = <INPUT1>) {
chomp($oneline);
my ($oneprefix, $oneas) = split(/\s+/, $oneline);
if($oneprefix =~ /\d{1,3}(.\d{1,3}){3}\/\d{
$pt->add_string($oneprefix
$prefixt->add_string($onep
}
}
close(INPUT1);
my %timefilehash = ();
my $totalfiles = 0;
my $corruptedfiles = 0;
my $workingfiles = 0;
while(my $file=<INPUT2>) {
chomp($file);
if( -f $file && $file =~ /_(\d{12})_re_\d+/ ) {
my $cnttime = $1;
my $diff1 = to_seconds($cnttime) - to_seconds($startingtime);
my $diff2 = to_seconds($endingtime) - to_seconds($cnttime);
if( $diff1 >= 0 && $diff2 >= 0 ) {
$totalfiles++;
$workingfiles++;
my $timefiles = $timefilehash{$cnttime};
if(not(defined($timefiles)
$timefilehash{$cnttime} = [$file];
}
else {
push(@{$timefilehash{$cntt
}
}
}
}
while(my $file=<INPUT3>) {
chomp($file);
if(-f $file && $file =~ /_(\d{12})_re_\d+/ ) {
my $cnttime = $1;
my $diff1 = to_seconds($cnttime) - to_seconds($startingtime);
my $diff2 = to_seconds($endingtime) - to_seconds($cnttime);
if( $diff1 >= 0 && $diff2 >= 0 ) {
$totalfiles++;
$corruptedfiles++;
}
}
}
my @files = ();
sub to_seconds
{
use integer;
my $x = $_[0];
my $year = "20".substr($x,0,2);
my $mo = substr($x,2,2);
my $day = substr($x,4,2);
my $hour = substr($x,6,2);
my $minute = substr($x,8,2);
my $second = substr($x,10,2);
my $t = timelocal($second,$minute,
return($t);
}
my $numgroups= keys %timefilehash;
for my $onetime (sort keys %timefilehash) {
foreach my $onefile (@{$timefilehash{$onetime}
push(@files, $onefile);
print "The file is $onefile\n";
}
}
# connect to mySQL database for later data query and retrieval
my $dsn = "DBI:mysql:test_bm"; # data source name
my $user_name = "root"; # user name
my $password = "NewPw"; # password
my %ipasntable = (); ## this hash table keeps the ASN of an IP from DNS name mapping
my %iplockeytable = (); ## this hash table keeps the PoP key of an IP from DNS name mapping
my %lockeyloctable = (); ## this hash table keeps the PoP of an PoP key from DNS naming mapping
my %iploctable = (); ## this hash table keeps the PoP of an IP from DNS name mapping
my %ipasnpoptable = (); ## this hashtable keeps the ASN and PoP value
my %bgphash = ();
my %igphash = ();
# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## select ipAddress, asn, lockey from the ipAddress table
my $sth = $dbh->prepare("SELECT ipAddress, asn, locKey FROM ipAddress");
$sth->execute();
## fetch query results from ipAddress table
while(my @ary = $sth->fetchrow_array()) {
my ($cntip, $cntasn, $cntkey) = @ary;
if($cntasn ne "NULL") {
if($cntasn > 0) {
$ipasntable{$cntip} = $cntasn;
}
}
else {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($cntip);
if(defined($cntasn)) {
$ipasntable{$cntip} = $cntasn;
}
}
if($cntkey ne "NULL") {
if($cntkey > 1) {
$iplockeytable{$cntip} = $cntkey;
}
}
}
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## select lockey, locName from the location table
$sth = $dbh->prepare("SELECT locKey, locName FROM location");
$sth->execute();
## fetch query results from location table
while(my @ary = $sth->fetchrow_array()) {
my ($cntkey, $cntloc) = @ary;
if($cntkey ne "NULL") {
if($cntkey > 1) {
$lockeyloctable{$cntkey} = $cntloc;
}
}
}
while ( my ($oneip, $onekey) = each(%iplockeytable) ) {
my $oneloc = $lockeyloctable{$onekey};
my $oneasn = $ipasntable{$oneip};
# print "For ip $oneip, its ASN is $oneasn, its PoP is $oneloc\n";
$iploctable{$oneip} = $oneloc;
}
## release iplockeytable and lockeyloctable memory
%iplockeytable = ();
%lockeyloctable = ();
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
## drop inferred BGP table if it exists
my $bgpdrop = "
DROP TABLE IF EXISTS bgp";
$sth = $dbh->prepare($bgpdrop);
$sth->execute();
## create inferred BGP table
my $bgpcreate = "
CREATE TABLE bgp (
bkey int(12) unsigned NOT NULL auto_increment,
ptime datetime NOT NULL,
tstart datetime NOT NULL,
vpip varchar(16) NOT NULL,
dip varchar(24) NOT NULL,
cntas int(8) unsigned NOT NULL,
cntpop varchar(32) NOT NULL,
nextas int(8) unsigned,
nextpop varchar(32),
aspath varchar(64),
PRIMARY KEY (bkey)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8";
$sth = $dbh->prepare($bgpcreate);
$sth->execute();
## drop intra-AS PoP-path table if it exists
my $poppathdrop = "
DROP TABLE IF EXISTS poppath";
$sth = $dbh->prepare($poppathdrop
$sth->execute();
## create intra-AS PoP-path table
my $poppathcreate = "
CREATE TABLE poppath (
pkey int(12) unsigned NOT NULL auto_increment,
ptime datetime NOT NULL,
tstart datetime NOT NULL,
vpip varchar(16) NOT NULL,
dip varchar(16) NOT NULL,
asn int(8) unsigned NOT NULL,
srcpop varchar(32) NOT NULL,
dstpop varchar(32) NOT NULL,
poppath varchar(256) NOT NULL,
ippathlen int(4) NOT NULL,
PRIMARY KEY (pkey)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8";
$sth = $dbh->prepare($poppathcrea
$sth->execute();
## subroutine to check whether an ASN is a targeted ASN
sub istargetas {
my ($asn) = @_; ## take the ASN from the argument list, not from $_
if($asn eq "1239" || $asn eq "16631" || $asn eq "1668" || $asn eq "209" ||
$asn eq "2828" || $asn eq "2856" || $asn eq "2914" || $asn eq "3257" ||
$asn eq "3320" || $asn eq "3356" || $asn eq "3549" || $asn eq "3561" ||
$asn eq "5511" || $asn eq "6395" || $asn eq "6453" || $asn eq "6461" ||
$asn eq "701" || $asn eq "7018") {
return 1;
}
else {
return 0;
}
}
sub bgpcontains {
my ($first, $second) = @_;
foreach my $one (@{$first}) {
if($one eq $second) {
return 1;
}
}
return 0;
}
sub igpcontains {
my ($first, $second) = @_;
foreach my $one (@{$first}) {
if($one eq $second) {
return 1;
}
}
return 0;
}
my $totalnextingress = 0;
my $nextingressincluded = 0;
my $nextingressskipped = 0;
my $totalnextegress = 0;
my $nextegressskipped = 0;
my $nextegressincluded = 0;
my $totalaspath = 0;
my $aspathskipped = 0;
my $aspathincluded = 0;
my $totalpoppath = 0;
my $poppathskipped = 0;
my $poppathincluded = 0;
## this function prints out the IP-path of a traceroute path
sub getippath {
my (@hops) = @_;
my ($ippath) = split(/:/, $hops[0]);
for(my $i=1; $i<=$#hops; $i++) {
my ($cntip) = split(/:/, $hops[$i]);
$ippath .= "->$cntip";
}
return $ippath;
}
sub comparelist {
my ($list1, $list2) = @_;
my $listsize = $#{$list1};
my $i;
for($i=0; $i<=$listsize; $i++) {
if(@{$list1}[$i] ne @{$list2}[$i]) {
return 1;
}
}
return 0;
}
## this subroutine removes duplicate IPs
sub removeduplicates {
my @hops = @_;
my @noduplicates = ();
## remove duplicate IPs in the hops
my $lastip = "-1";
for(my $i=0; $i<=$#hops; $i++) {
if($hops[$i] eq "*") {
push(@noduplicates, $hops[$i]);
}
else {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, $hops[$i]);
if($cntip ne $lastip) {
push(@noduplicates, $hops[$i]);
$lastip = $cntip;
}
}
}
return @noduplicates;
}
## this subroutine divides IP-hops into AS groups
sub dividehopsintoasgroups {
my (@hops) = @_;
my $asstr = "";
my @asgroups = ();
my $cntindex = 0;
my $cntgroup = "";
my $lastasn="-1";
my $hopsize = $#hops;
## get as-path and divide hops into AS groups by getting each AS group's hop indices
foreach my $onehop (@hops) {
## skip stars
if($onehop eq "*") {
$cntindex++;
next;
}
my ($cntIP, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $onehop);
## this is the first AS in the traceroute path
if($lastasn eq "-1") {
if($cntasn ne "NULL" && $cntasn ne "0") {
$asstr = $cntasn;
$cntgroup = $cntindex;
$lastasn = $cntasn;
}
}
else { ## not first AS in the traceroute path
if($cntasn ne "NULL" && $cntasn ne "0" && $cntasn ne $lastasn) {
push(@asgroups, $cntgroup);
$asstr .= ">$cntasn";
$cntgroup = $cntindex;
$lastasn = $cntasn;
}
elsif($cntasn ne "NULL" && $cntasn ne "0" && $cntasn eq $lastasn) {
my $i;
$cntgroup .= ":$cntindex";
}
if($cntindex == $hopsize) {
push(@asgroups, $cntgroup);
}
}
$cntindex++;
}
return($asstr, @asgroups);
}
## this subroutine gets PoP-level path for the as groups
sub getpoppaths {
## get the PoP-level paths for the as groups
my ($hops, $asgroups) = @_;
my @groups = ();
foreach my $onegroup (@{$asgroups}) {
my (@indices) = split(/:/, $onegroup);
my $firstindex = $indices[0];
my $lastindex = $indices[$#indices];
my $groupstr = "";
my $lastpop = "-1";
my $i;
for($i=$firstindex; $i<=$lastindex; $i++) {
my $cnthop = @{$hops}[$i];
if($cnthop eq "*") {
if($groupstr eq "") {
$groupstr .= $cnthop;
$lastpop = "NULL";
}
else {
$groupstr .= "|$cnthop";
$lastpop = "NULL";
}
}
else {
my ($cntIP, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $cnthop);
# if($cntpop eq "NULL") {
if($groupstr eq "") {
$groupstr .= "$cntIP:$cntasn:$cntpop:$i
$lastpop = "NULL";
}
else {
$groupstr .= "|$cntIP:$cntasn:$cntpop:$
$lastpop = "NULL";
}
# }
#elsif($cntpop ne $lastpop) {
#if($groupstr eq "") {
# $groupstr .= "$cntIP:$cntasn:$cntpop:$i
# $lastpop = $cntpop;
#}
#else {
# $groupstr .= "|$cntIP:$cntasn:$cntpop:$
# $lastpop = $cntpop;
#}
#}
}
} ## end for($i ... ...)
push(@groups, $groupstr);
}
return @groups;
}
## subroutine to get AS-path from current AS to destination host
sub getpartialaspath {
my ($afterstars, $asstr, $cntlastasn) = @_;
#print "The as string is $asstr\n";
my $partialaspath;
if($afterstars < 2) {
### get the partial as path from current as to the destination as
my (@ashops) = split(/>/, $asstr);
my $asindex;
my $jj;
for($jj=0; $jj<=$#ashops; $jj++) {
my $cntasn = $ashops[$jj];
if($cntasn == $cntlastasn) {
$asindex = $jj;
next;
}
}
$partialaspath = $ashops[$asindex];
for($jj=$asindex+1; $jj<=$#ashops; $jj++) {
$partialaspath .= ">$ashops[$jj]";
}
}
else {
$partialaspath = "NULL"; ## do not use the AS path if there exists two consecutive "*" after the current AS
}
return $partialaspath;
}
## this subroutine calculates the unknowns between current last group's last valid PoP and next group's first IP
sub checkunknownsbetweentwoase
my ($groups, $cntgroupindex, $hops) = @_;
my $groupsize = $#{$groups};
my @cntgrouphops = split(/\|/, @{$groups}[$cntgroupindex]
my $cntgroupsize = $#cntgrouphops;
#print "Current group hops are @cntgrouphops\n";
my ($cntgrouplastvalidIP, $cntgrouplastvalidasn, $cntgrouplastvalidpop, $cntgrouplastvalidindex);
my ($nextgroupfirstIP, $nextgroupfirstasn, $nextgroupfirstpop, $nextgroupfirstindex);
my $cntgrouphasvalidpop = 0;
my $islastgroup = 0;
## get the last valid hop elements at current group
for(my $i=$cntgroupsize; $i>=0; $i--) {
if($cntgrouphops[$i] ne "*") {
my ($cntIP, $cntasn, $cntpop, $cntindex) = split(/:/, $cntgrouphops[$i]);
if($cntasn ne "NULL" && $cntasn ne "0" && $cntpop ne "NULL") {
$cntgrouphasvalidpop = 1;
($cntgrouplastvalidIP, $cntgrouplastvalidasn, $cntgrouplastvalidpop, $cntgrouplastvalidindex) = ($cntIP, $cntasn, $cntpop, $cntindex);
}
}
}
## begin to check previous egress--next ingress PoP entry and populate it into the bgp table
if($cntgroupindex<$groupsi
my $nextgroup = @{$groups}[$cntgroupindex+
my (@nextgrouphops) = split(/\|/, $nextgroup);
my $nextgroupfirsthop = $nextgrouphops[0];
if($nextgroupfirsthop eq "*") {
print "The first pop is *.\n";
print "The next group hops are @nextgrouphops\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
($nextgroupfirstIP, $nextgroupfirstasn, $nextgroupfirstpop, $nextgroupfirstindex) = split(/:/, $nextgroupfirsthop);
}
else {
$islastgroup = 1;
}
my $betweenunknowns;
## this variable indicates how many unknowns between the current group's last valid pop and next group's first IP
if($islastgroup == 0 && defined($cntgrouplastvalid
$betweenunknowns = $nextgroupfirstindex - $cntgrouplastvalidindex - 1;
}
return($islastgroup, $cntgrouphasvalidpop, $betweenunknowns, $cntgrouplastvalidasn, $cntgrouplastvalidpop, $cntgrouplastvalidindex, $nextgroupfirstasn, $nextgroupfirstIP);
}
## subroutine to generate next-ingress field for a bgp table entry
sub generatenextingressbgpentr
my ($key, $oneentry, $afterstars) = @_;
my $entries = $bgphash{$key};
if(not(defined($entries)))
$totalaspath++;
if($afterstars >= 2) {
$aspathskipped++;
}
else {
$aspathincluded++;
}
$totalnextingress++; ## increase the next ingress entry count
$nextingressincluded++;
@{$bgphash{$key}} = ($oneentry);
}
else {
if(bgpcontains(\@{$entries
$totalaspath++;
if($afterstars >= 2) {
$aspathskipped++;
}
else {
$aspathincluded++;
}
$totalnextingress++;
$nextingressincluded++;
push(@{$bgphash{$key}}, $oneentry);
}
}
}
## this subroutine create one bgp next-egress entry and store it in the bgp hash table
sub generatenextegressbgpentry
my ($key, $oneentry, $afterstars) = @_;
$totalaspath++;
if($afterstars >= 2) {
$aspathskipped++;
}
else {
$aspathincluded++;
}
my $entries = $bgphash{$key};
if(not(defined($entries)))
$totalnextegress++;
$nextegressincluded++;
@{$bgphash{$key}} = ($oneentry);
}
else {
if(bgpcontains(\@{$entries
$totalnextegress++;
$nextegressincluded++;
push(@{$bgphash{$key}}, $oneentry);
}
}
}
## subroutine to generate igp poppath table entries
sub generatepoppathentries {
my ($hops, $cntgrouphops, $cnthopindex, $startasn, $startpop, $startindex, $srcip, $dstip, $starttime, $probetime) = @_;
my $cntgroupsize = $#{$cntgrouphops};
my $hopsize = $#{$hops};
my ($i, $j);
my $poppath = $startpop;
## begin to process the right-side paths of the current pop
for($i=$cnthopindex+1; $i<=$cntgroupsize; $i++) {
if(@{$cntgrouphops}[$i] eq "*") {
$poppath .= ">*";
next;
}
else {
my ($endip, $endasn, $endpop, $endindex) = split(/:/, @{$cntgrouphops}[$i]);
if($endasn ne "NULL" && $endasn ne $startasn) {
print "The start PoP ASN is $startasn, and the end PoP ASN is $endasn\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
if($endpop eq "NULL") {
$poppath .= ">NULL";
next;
}
else {
$poppath .= ">$endpop";
## begin to inspect and compress the PoP-path
my @pops = split(/>/, $poppath);
my @knownpops = ();
for($j=0; $j<=$#pops; $j++) {
## keep known PoP indices into PoP list
if($pops[$j] ne "*" && $pops[$j] ne "NULL") {
push(@knownpops, $j);
}
}
if($knownpops[0] != 0 || $knownpops[$#knownpops] != $#pops) {
print "The first known PoP index is $knownpops[0], and the last known PoP index is $knownpops[$#knownpops].\n
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my $twounknownbetweendifferen
my $oneunknownbetweendifferen
my $newpoppath = "$pops[0]";
my $lastkeptPoP = $pops[0];
my $lastPoPindex = 0;
for(my $j=1; $j<=$#knownpops; $j++) {
my $cntPoP = $pops[$knownpops[$j]];
if($cntPoP ne $lastkeptPoP) {
## there are more than one NULL PoP or * between current PoP and last known PoP
if($knownpops[$j]-$lastPoP
$twounknownbetweendifferen
last;
}
## there is one NULL PoP or * between current PoP and last known PoP, add a wild card
elsif($knownpops[$j]-$last
if($oneunknownbetweendiffe
$oneunknownbetweendifferen
$lastPoPindex = $knownpops[$j];
$lastkeptPoP = $cntPoP;
$newpoppath .= ">*>$cntPoP";
}
else {
$twounknownbetweendifferen
last;
}
}
else { ## there is no NULL PoP or * between two known PoPs
$lastPoPindex = $knownpops[$j];
$lastkeptPoP = $cntPoP;
$newpoppath .= ">$cntPoP";
}
}
}
## begin to find the earliest IP index in the same PoP as startIP
my $earliestindex = $startindex;
for(my $j=$startindex; $j>=0; $j--) {
my $cnthop = @{$hops}[$j];
if(@{$hops}[$j] ne "*") {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, @{$hops}[$j]);
if($cntasn eq $startasn && $cntpop eq $startpop) {
$earliestindex = $j;
}
elsif($cntpop ne "NULL" && $cntpop ne $startpop) {
last;
}
elsif($cntasn ne "NULL" && $cntasn ne $startasn) {
last;
}
}
}
## begin to find the latest IP index in the same PoP as endIP
my $latestindex = $endindex;
for(my $j=$endindex; $j<=$hopsize; $j++) {
my $cnthop = @{$hops}[$j];
if(@{$hops}[$j] ne "*") {
my ($cntip, $cntdummy1, $cntdummy2, $cntasn, $cntpop) = split(/:/, @{$hops}[$j]);
if($cntasn eq $endasn && $cntpop eq $endpop) {
$latestindex = $j;
}
elsif($cntpop ne "NULL" && $cntpop ne $endpop) {
last;
}
elsif($cntasn ne "NULL" && $cntasn ne $endasn) {
last;
}
}
}
my $ippathlen = $latestindex - $earliestindex + 1;
my $key = "$srcip<$dstip<$startasn<$
my $oneentry = "$probetime<$newpoppath<$i
my $entries = $igphash{$key};
if(not(defined($entries))) {
$totalpoppath++;
if($twounknownbetweendiffe
$poppathskipped++;
}
else {
$poppathincluded++;
@{$igphash{$key}} = ($oneentry);
}
}
else {
if(igpcontains(\@{$entries
$totalpoppath++;
if($twounknownbetweendiffe
$poppathskipped++;
}
else {
$poppathincluded++;
push(@{$igphash{$key}}, $oneentry);
}
} ## end if(igpcontains(\@{$entries
} ## end if(not(defined($entries)))
} ## end if($endpop eq "NULL") { ... } else { ... }
} ## end if(@{$cntgrouphops}[$i] eq "*") {...} else {...}
} ## end for($i=$cnthopindex+1; $i<=$cntgroupsize; $i++)
}
## get the consecutive * from current group index to the end of the group
sub getconsecutivestarsonpath {
my ($cntgrouplastindex, $hops) = @_;
my ($i, $hopsize);
my $afterstars = 0;
$hopsize = $#{$hops};
for($i=$cntgrouplastindex+
if(@{$hops}[$i] eq "*") {
$afterstars++;
if($afterstars == 2) {
last;
}
}
else {
$afterstars = 0;
}
}
return $afterstars;
}
## this subroutine generate table entries and
sub generatetableentriesforasg
my ($hops, $groups, $asstr, $srcip, $dstip, $starttime, $probetime) = @_;
my ($hopsize, $groupsize) = ($#{$hops}, $#{$groups});
my ($i, $j);
for($i=0; $i<=$groupsize; $i++) { ## iterate through AS groups
my $onegroup = @{$groups}[$i];
my (@cntgrouphops) = split(/\|/, $onegroup);
my ($cntIP, $cntasn, $cntpop, $cntindex) = split(/:/, $cntgrouphops[0]);
if($cntasn != $fileasn) { ## only process the as group whose asn is specified in the file name
next;
}
if($cntgrouphops[0] eq "*") {
print "Current group first hop is *.\n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my $cntgrouplasthop = $cntgrouphops[$#cntgroupho
if($cntgrouplasthop eq "*") { ## the last hop of a group should never be *
print "Current group last hop is *. \n";
print "Something is wrong with the code. Please check and fix it.\n";
exit(1);
}
my ($cntgrouplastIP, $cntgrouplastasn, $cntgrouplastpop, $cntgrouplastindex) = split(/:/, $cntgrouplasthop);
my $afterstars = getconsecutivestarsonpath(
my $partialaspath = getpartialaspath($aftersta
## print "The current asn is $cntlastasn, and the as path is $partialaspath, and the as string is $asstr, and file asn is $fileasn\n";
my ($islastgroup, $cntgrouphasvalidpop, $betweenunknowns, $cntgrouplastvalidasn, $cntgrouplastvalidpop, $cntgrouplastvalidindex, $nextgroupfirstasn, $nextgroupfirstIP) = checkunknownsbetweentwoase
if($islastgroup == 0) {
if($cntgrouphasvalidpop == 1 && $betweenunknowns <= 1) {
my $key = "$srcip<$dstip<$cntgroupla
my $oneentry = "$probetime<$nextgroupfirs
generatenextingressbgpentr
}
else {
$totalnextingress++;
$nextingressskipped++;
}## end if($cntgrouphasvalidpop == 1 && $betweenunknowns <= 1) {...} else {...}
}
## begin to generate bgp and IGP PoP-path entries for this group
for($j=0; $j<=$#cntgrouphops-1; $j++) {
if($cntgrouphops[$j] eq "*") {
next;
}
my ($startIP, $startasn, $startpop, $startindex) = split(/:/, $cntgrouphops[$j]);
if($startasn eq "0" || $startasn eq "NULL" || $startpop eq "NULL") {
next;
}
if(($islastgroup == 1 || $betweenunknowns <= 1) && $cntgrouphasvalidpop == 1) {
if($startindex >= $cntgrouplastvalidindex || $startpop eq $cntgrouplastvalidpop) {
next;
}
my $key = "$srcip<$dstip<$startasn<$
my $oneentry = "$probetime<$cntgrouplastv
generatenextegressbgpentry
}
else {
$totalnextegress++;
$nextegressskipped++;
}
## generate IGP pop-path table entries for this group
generatepoppathentries($ho
} ## end for($j=0; $j<=$#cntgrouphops-1; $j++)
} ## end for($i=0; $i<=$#groups; $i++)
}
## subroutine to process one traceroute, create bgp entries and poppath entries, and insert entries into bgp table and poppath table
sub processonetraceroute {
my ($probetime, $starttime, @hops) = @_;
my $srchop = $hops[0];
my $dsthop = $hops[$#hops];
my ($srcip, $dummy1, $dummy2, $srcasn, $srcpop) = split(/:/, $srchop);
my ($dstip, $dummy3, $dummy4, $dstasn, $dstpop) = split(/:/, $dsthop);
@hops = removeduplicates(@hops); ## remove consecutive duplicate IPs in the traceroute hops
my ($asstr, @asgroups) = dividehopsintoasgroups(@ho
my @groups = getpoppaths(\@hops, \@asgroups); ## get pop-path in ASes separated by "|"
generatetableentriesforasg
}
## this subroutine inserts bgp and igp entries into mysql database
sub populatehashes {
# connect to database
$dbh = DBI->connect ($dsn, $user_name, $password,
{ RaiseError => 1, PrintError => 0 });
my $str = "LOCK TABLES bgp WRITE";
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO bgp (ptime, tstart, vpip, dip, cntas, cntpop, nextas, nextpop, aspath) VALUES ";
my $first = 1;
my $entrycount = 0;
while ( my ($key, $val) = each(%bgphash) ) {
my ($srcip, $dstip, $asn, $pop, $starttime) = split(/</, $key);
my $count=0;
my $strtowrite = "";
$entrycount++;
foreach my $one (@{$val}) {
my ($probetime, $nextasn, $nextpop, $aspath) = split(/</, $one);
$count++;
if($first == 1) {
$str .= "('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$pop', $nextasn, '$nextpop', '$aspath')";
$first = 0;
}
else {
$str .= ", ('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$pop', $nextasn, '$nextpop', '$aspath')";
}
$strtowrite .= "$probetime\t$starttime\t$
}
if($count>1) {
print OUTPUT2 "$strtowrite";
}
if($entrycount >= 3000) {
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO bgp (ptime, tstart, vpip, dip, cntas, cntpop, nextas, nextpop, aspath) VALUES ";
$first = 1;
$entrycount = 0;
}
}
if($entrycount > 0) {
$sth = $dbh->prepare($str);
$sth->execute();
}
$str = "LOCK TABLES poppath WRITE";
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO poppath (ptime, tstart, vpip, dip, asn, srcpop, dstpop, poppath, ippathlen) VALUES ";
$first = 1;
$entrycount = 0;
while ( my ($key, $val) = each(%igphash) ) {
my ($srcip, $dstip, $asn, $srcpop, $dstpop, $starttime) = split(/</, $key);
my $count = 0;
my $strtowrite="";
$entrycount++;
foreach my $one (@{$val}) {
my ($probetime, $poppath, $ippathlen) = split(/</, $one);
$count++;
if($first == 1) {
$str .= "('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$srcpop', '$dstpop', '$poppath', '$ippathlen')";
$first = 0;
}
else {
$str .= ", ('$probetime', '$starttime', '$srcip', '$dstip', $asn, '$srcpop', '$dstpop', '$poppath', '$ippathlen')";
}
$strtowrite .= "$probetime\t$starttime\t$
}
if($count>1) {
print OUTPUT1 "$strtowrite";
}
if($entrycount >= 3000) {
$sth = $dbh->prepare($str);
$sth->execute();
$str = "INSERT INTO poppath (ptime, tstart, vpip, dip, asn, srcpop, dstpop, poppath, ippathlen) VALUES ";
$first = 1;
$entrycount = 0;
}
}
if($entrycount>0) {
$sth = $dbh->prepare($str);
$sth->execute();
}
$str = "UNLOCK TABLES";
$sth = $dbh->prepare($str);
$sth->execute();
%bgphash = ();
%igphash = ();
}
my $filecount = 0;
my $discardedIPloops = 0;
my $discardedPoPloops = 0;
my $discardedASloops = 0;
my $processedtraceroutes = 0;
sub findnumIPs {
my @hops = @_;
my %lastasindex = ();
my $lastasn = "-1";
my $loopstr = "";
for(my $i=0; $i<=$#hops; $i++) {
if($hops[$i] ne "*") {
my ($ip, $rtt, $ttl, $asn, $pop) = split(/:/, $hops[$i]);
if($asn eq "NULL") {
next;
}
if(not(defined($lastasinde
$lastasindex{$asn} = $i;
$lastasn = $asn;
}
else {
if($asn eq $lastasn) {
$lastasindex{$asn} = $i;
}
else {
my $lastindex = $lastasindex{$asn};
my $numipsonloop = $i - $lastindex;
my $distinctas = 0;
my $keptasn = -1;
for(my $j=$lastindex; $j<=$i; $j++) {
if($hops[$j] eq "*") {
if($loopstr eq "") {
$loopstr = "*";
}
else {
$loopstr .= ">*";
}
next;
}
my ($cntip, $cntrtt, $cntttl, $cntasn, $cntpop) = split(/:/, $hops[$j]);
if($loopstr eq "") {
$loopstr = "$cntip:$cntasn";
}
else {
$loopstr .= ">$cntip:$cntasn";
}
if($cntasn ne "NULL" && $cntasn ne $keptasn) {
$distinctas++;
$keptasn = $cntasn;
}
} ## end for(my $j ... ...)
$distinctas--; ## decrease by one due to last repeating as
print OUTPUT5 "$numipsonloop $distinctas\n";
return $loopstr;
}
}
}
}
return $loopstr;
}
sub discardpath {
my (@hops) = @_;
## check for IP, PoP, AS loops
my %iphash = ();
my %pophash = ();
my %ashash = ();
my $lastip = "-1";
my $lastasn = -1;
my $lastpop = "-1";
my $aspath = "";
foreach my $onehop (@hops) {
if($onehop ne "*") {
my ($ip, $rtt, $ttl, $asn, $pop) = split(/:/, $onehop);
if($ip ne $lastip) {
if(not(defined($iphash{$ip
$iphash{$ip} = 1;
$lastip = $ip;
}
else {
$discardedIPloops++;
## print "The traceroute @hops contains an IP loop.\n";
## print "$ip appeared more than once.\n";
return 1;
}
}
if($asn ne "NULL") {
if($pop ne "NULL") {
my $cntpop = "$asn->$pop";
if($cntpop ne $lastpop) {
if(not(defined($pophash{$c
$pophash{$cntpop} = 1;
$lastpop = $cntpop;
}
else {
$discardedPoPloops++;
## print "The traceroute @hops contains a PoP loop.\n";
return 2;
}
}
}
$aspath .= "$asn|";
if($asn ne $lastasn) {
if(not(defined($ashash{$as
$ashash{$asn} = 1;
$lastasn = $asn;
}
else {
my $loopstr = findnumIPs(@hops);
print OUTPUT4 "The hops are @hops, there is an AS loop. The AS loops are $aspath\n";
print OUTPUT4 "The AS loop segments are $loopstr\n";
$discardedASloops++;
return 3;
}
}
}
}
}
return 0;
}
my $totaltraceroutes = 0;
my $processbegintime = time(); ## get time in seconds since 1970
my $lastprocesstime = $processbegintime;
my $cntprocesstime;
my $totalprocesstime = 0;
# begin to process traceroute plain texts, and map intermediate IPs to their AS numbers and locations (POPs)
foreach my $file (@files) {
my $lastinserttime="-1"; ## this stores the old data collection hour
# this is traceroute plain text file
if(-f $file) {
open(INPUT1, "-|", "zcat", $file) || die "can't open file $file for read: $!";
$filecount++;
print "File $filecount is $file\n";
my $tag;
my $date;
my $time;
my $srcaddr;
my $arrow;
my $dstaddr;
my $icmpstatus;
my $hopcount;
my @hops = ();
my $lastindex=0;
my $dummy;
my $cntindex=0;
my $cntIP;
my $cntrtt;
my $cntttl;
my $firsthop;
my $lasthop;
## begin to extract probing time from the file name
# extension is in the format of .*
$file =~ /.*\/([^\/]+)/;
my $filename = $1;
my ($site, $datetime, $re, $targetasn) = split(/_/, $filename);
$fileasn = $targetasn;
$datetime =~ /(\d\d)(\d\d)(\d\d)(\d\d)(
my ($yy, $mm, $dd, $hh, $min, $ss) = ($1, $2, $3, $4, $5, $6);
my $yyyy;
my $probetime;
my $starttime;
$probetime = "20$yy-$mm-$dd $hh:$min:$ss";
my $startmin = 15*int(($min/15));
$starttime = "20$yy-$mm-$dd $hh:$startmin:00";
## begin to process lines iteratively
while(my $line = <INPUT1>) {
chomp($line);
## this is the beginning of a new traceroute probe
if($line =~ /->/) {
($tag, $date, $time, $srcaddr, $arrow, $dstaddr, $icmpstatus, $hopcount) = split(/\s+/, $line);
my ($firsthop, $lasthop, $oneIP, $IPnum, $cntasn, $cntpop);
if(not(defined($ipasnpopta
## find out the asn and pop for the first IP
if(not($oneIP = new Net::IP($srcaddr))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
$IPnum = $oneIP->intip();
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($srcaddr
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$firsthop = "$srcaddr:-1:$-1:$cntasn:$
$ipasnpoptable{$srcaddr} = $firsthop;
}
else {
$firsthop = $ipasnpoptable{$srcaddr};
}
if($lastinserttime ne "-1") {
my $oldtime = to_seconds($lastinserttime
my $newtime = to_seconds("$yy$mm$dd$hh$m
my $timediff = $newtime - $oldtime;
## populate bgp and igp hash into database if it has been 3 hours since last insertion
if( $timediff > 10800 ) {
$cntprocesstime = time();
$totalprocesstime += $cntprocesstime - $lastprocesstime;
$lastprocesstime = $processbegintime;
populatehashes();
$lastinserttime = "$yy$mm$dd$hh$min$ss";
}
}
else {
$lastinserttime = "$yy$mm$dd$hh$min$ss";
}
if(not(defined($ipasnpopta
## find out the asn and pop for the last IP
if(not($oneIP = new Net::IP($dstaddr))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
$IPnum = $oneIP->intip();
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($dstaddr
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$lasthop = "$dstaddr:-1:$-1:$cntasn:$
$ipasnpoptable{$dstaddr} = $lasthop;
}
else {
$lasthop = $ipasnpoptable{$dstaddr};
}
}
elsif($line =~ /duration/) {
next;
}
elsif($line !~ /^\s*$/) { # process the line if it contains other than white spaces
($dummy, $cntindex, $cntIP, $cntrtt, $cntttl) = split(/\s+/, $line);
if($lastindex != 0) { # not the beginning of the first hop
my $numstars = $cntindex-$lastindex-1;
my $i;
## fill in * for those missing hops
for($i=0; $i<$numstars; $i++) {
push(@hops, "*");
}
}
my $hopstr;
if(not(defined($ipasnpopta
# print "Current ip is $cntIP\n";
my $oneIP;
if(not($oneIP = new Net::IP($cntIP))) {
print "Net::IP::Error()\n";
print "The file is $filename, the line is $line\n";
last;
}
my $IPnum = $oneIP->intip();
my $cntasn;
my $cntpop;
$cntasn = $ipasntable{$IPnum};
if(not(defined($cntasn))) {
## look up asn value from the prefix-as mapping patricia handler
$cntasn = $pt->match_string($cntIP);
if(not(defined($cntasn))) {
$cntasn = "NULL";
}
}
$cntpop = $iploctable{$IPnum};
if(not(defined($cntpop))) {
$cntpop = "NULL";
}
$hopstr = "$cntIP:$cntrtt:$cntttl:$c
$ipasnpoptable{$cntIP} = $hopstr;
}
else {
$hopstr = $ipasnpoptable{$cntIP};
}
push(@hops, $hopstr);
$lastindex = $cntindex;
}
elsif($line =~ /^\s*$/) { # this is the white space lines
## begin to process one traceroute probing result
if($#hops >=0) {
my @cnthops = ($firsthop, @hops, $lasthop);
$totaltraceroutes++;
if(discardpath(@cnthops) == 0) {
$processedtraceroutes++;
processonetraceroute($prob
}
}
## last line is the end of one traceroute probe
if($lastindex != 0) {
@hops = (); # reset hops array to empty
$lastindex = 0; # reset lastindex to 0
}
}
}
close(INPUT1);
}
}
$cntprocesstime = time();
$totalprocesstime += $cntprocesstime - $lastprocesstime;
$lastprocesstime = $processbegintime;
populatehashes();
$sth->finish ();
$dbh->disconnect ();
my $ratio;
print OUTPUT3 "$totalfiles traceroute files are checked.\n";
print OUTPUT3 "There are $numgroups groups(s) of probing in the checking interval.\n";
for my $onetime (sort keys %timefilehash) {
my @cntset = @{$timefilehash{$onetime}}
my $goodfiles = $#cntset+1;
print OUTPUT3 "At time $onetime, there are $goodfiles good files.\n";
foreach my $onefile (@{$timefilehash{$onetime}
print "The file is $onefile\n";
}
}
print OUTPUT3 "\n\n";
$ratio = $corruptedfiles/$totalfile
print OUTPUT3 "$corruptedfiles, counted as $ratio, traceroute files are corrupted and are discarded.\n";
$ratio = $workingfiles/$totalfiles;
print OUTPUT3 "$workingfiles, counted as $ratio, traceroute files are good.\n\n\n";
print OUTPUT3 "$totaltraceroutes traceroute paths are parsed.\n";
$ratio = $discardedIPloops/$totaltr
print OUTPUT3 "$discardedIPloops, counted as $ratio, traceroute paths are discarded due to IP loops.\n";
$ratio = $discardedPoPloops/$totalt
print OUTPUT3 "$discardedPoPloops, counted as $ratio, traceroute paths are discarded due to PoP loops.\n";
$ratio = $discardedASloops/$totaltr
print OUTPUT3 "$discardedASloops traceroute paths, counted as $ratio, are discarded due to AS loops.\n";
$ratio = $processedtraceroutes/$tot
print OUTPUT3 "$processedtraceroutes, counted as $ratio, traceroutes are processed.\n\n\n";
print OUTPUT3 "$totalnextingress BGP next-ingresses are checked.\n";
$ratio = $nextingressskipped/$total
print OUTPUT3 "$nextingressskipped, counted as $ratio, BGP next-ingresses are discarded due to more than one consecutive unknowns between last valid PoP and next-ingress IP.\n";
$ratio = $nextingressincluded/$tota
print OUTPUT3 "$nextingressincluded, counted as $ratio, BGP next-ingresses are included.\n";
print OUTPUT3 "$totalnextegress BGP same-AS egresses are checked.\n";
$ratio = $nextegressskipped/$totaln
print OUTPUT3 "$nextegressskipped, counted as $ratio, BGP same-AS egresses are discarded due to more than one consecutive unknowns between last valid PoP and next-ingress IP.\n";
$ratio = $nextegressincluded/$total
print OUTPUT3 "$nextegressincluded, counted as $ratio, BGP same-AS egresses are included.\n";
print OUTPUT3 "$totalaspath BGP AS-paths are checked.\n";
$ratio = $aspathskipped/$totalaspat
print OUTPUT3 "$aspathskipped, counted as $ratio, BGP AS-paths are discarded due to more than one consecutive stars on IP-path from current AS to destination host.\n";
$ratio = $aspathincluded/$totalaspa
print OUTPUT3 "$aspathincluded, counted as $ratio, BGP AS-paths are included.\n";
print OUTPUT3 "$totalpoppath IGP PoP-paths are checked.\n";
$ratio = $poppathskipped/$totalpopp
print OUTPUT3 "$poppathskipped, counted as $ratio, IGP PoP-paths are discarded due to more than one unknown PoPs (either NULL PoP or *) between different neighboring PoPs.\n";
$ratio = $poppathincluded/$totalpop
print OUTPUT3 "$poppathincluded, counted as $ratio, IGP PoP-paths are included.\n";
my $stoptime = time(); ## get time in seconds since 1970
my $elapsedtime = $stoptime - $begintime;
print OUTPUT3 "The code stops at time $stoptime\n\n";
print OUTPUT3 "The code runs $elapsedtime seconds\n";
print OUTPUT3 "The traceroute processing runs $totalprocesstime seconds\n";
close(OUTPUT1);
close(OUTPUT2);
close(OUTPUT3);
close(OUTPUT4);
close(OUTPUT5);
exit (0);
If I copy/paste the above, the "exit(0)" is on line 1647... I'll use line numbers that match that:
Lines 99-100: you call to_seconds($startingtime) and to_seconds($endingtime) inside the while loop from line 93. These aren't changing, and should be calculated once before the loop, e.g.:
$startingtimeseconds = to_seconds($startingtime); # then use this inside the loop on line 99
Lines 106-113: this can be replaced with:
push @{$timefilehash{$cnttime}}, $file;
Lines 124-125: Same as lines 99-100
Lines 194-214: Don't create a copy of @ary in line 194... Just use $ary[0], $ary[1], and $ary[2]
Lines 228-234: Same as lines 194-214
That's about as far as I got so far... I don't think any of those things will make a big difference though.
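To make the first two suggestions concrete, here is a minimal sketch. It is not the asker's code: the `to_seconds` below is a hypothetical stand-in (the real one is not shown in this part of the thread), and the file list and "HH:MM:SS" time format are made up for illustration. It only shows the two ideas: convert the fixed bounds once before the loop, and let `push` create the array on first use.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script's to_seconds(); assumed input "HH:MM:SS".
sub to_seconds {
    my ($stamp) = @_;
    my ($h, $m, $s) = split /:/, $stamp;
    return $h * 3600 + $m * 60 + $s;
}

# Hypothetical input: [time-of-file, file name] pairs.
my @files = (
    ["00:10:00", "a.gz"],
    ["00:20:00", "b.gz"],
    ["00:20:00", "c.gz"],
    ["02:00:00", "d.gz"],
);

my ($startingtime, $endingtime) = ("00:15:00", "01:00:00");

# The bounds never change, so convert them once, before the loop.
my $startingtimeseconds = to_seconds($startingtime);
my $endingtimeseconds   = to_seconds($endingtime);

my %timefilehash;
for my $pair (@files) {
    my ($cnttime, $file) = @$pair;
    my $t = to_seconds($cnttime);
    next if $t < $startingtimeseconds || $t > $endingtimeseconds;

    # Autovivification creates the array on first use, so no
    # separate "does the key exist yet?" branch is needed.
    push @{ $timefilehash{$cnttime} }, $file;
}

for my $time (sort keys %timefilehash) {
    print "$time: @{ $timefilehash{$time} }\n";
}
```

With the made-up data above, only b.gz and c.gz fall inside the window, and both end up under the same key.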
Are you decompressing the files first before processing? If so, that will increase the processing time quite considerably.
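For reference, the posted script reads each file through a `zcat` pipe, which streams the data rather than writing a decompressed copy to disk. Another way to stream is the IO::Uncompress::Gunzip module that ships with Perl. The sketch below is only an illustration under that assumption: the temporary file and its three lines are made up so the example is self-contained, and whether this is faster or slower than the `zcat` pipe on the real data has not been measured.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);
use File::Temp             qw(tempfile);

# Build a small throwaway .gz file so the sketch runs on its own.
my ($tmpfh, $gzfile) = tempfile(SUFFIX => ".gz", UNLINK => 1);
close $tmpfh;
my $text = "line one\nline two\nline three\n";
gzip(\$text => $gzfile) or die "gzip failed: $GzipError";

# Read the compressed file line by line; no uncompressed copy is written to disk.
my $z = IO::Uncompress::Gunzip->new($gzfile)
    or die "gunzip failed: $GunzipError";

my $linecount = 0;
while (defined(my $line = $z->getline())) {
    chomp $line;
    $linecount++;    # the per-line parsing would go here
}
$z->close();

print "read $linecount lines\n";
```

Either way, timing the read loop on its own (with the parsing and database inserts disabled) would show how much of the eight hours is spent on decompression.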