Link to home
Start Free TrialLog in
Avatar of catalini
catalini

asked on

calculating distance on a huge graph

I'm following up on my previous question

https://www.experts-exchange.com/questions/23428010/Breadth-First-Search-reading-data-from-a-mysql-table.html?anchorAnswerId=21635922#a21635922

since the solution still has some problems.

I'm trying to calculate distance on a graph. I've seen plenty of implementations online, but I need one that would allow me to read data from my mysql database. Ideally the Dijkstra algorithm would do the job.

I've a table structured like this:

INDIVIDUAL1         INDIVIDUAL2
---------------------------------------
AH2223423           AS38423842
AH2223423           HS2772722
AH2223423           IASD727272
HGS282828          AS38423842

where every row corresponds to an edge (e.g. row1 means that  AH2223423 and AS38423842 are connected) on the graph and every INDIVIDUAL (e.g. AH2223423) is a vertex


The code below by adam314 (see Snippet) calculates the distances on a small graph, but fails with the tables I use (more than 200,000 edges). Is there any other efficient way to calculate the distances between connected individuals on a graph and save them into mysql?



An alternative code suggested by ozo is here below, but it does not seem to work...

#!/bin/perl
use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:my mysql database", username, password)
for(  $ldbh->selectall_arrayre("SELECT INDIVIDUAL1,INDIVIDUAL2 from edges") ){
  $edge{$_->[0]}{$_->[1]}++;
  $edge{$_->[1]}{$_->[0]}++;
}

#does it need to be depth first, or can you use any algorithm that finds the correct distance?
sub distance{
  my($from,$to)=@_;
  return 0 if  $from eq $to;
  my @first=($from);
  my %dist=($from=>0);
  while( my $first = shift @first ){
       $dist = $dist{$first}+1;
       return $dist if $edge{$first}{$to};
       for( grep !exists $dist{$_},keys %{$edge{$first} ){
                   $dist{$_}=$dist;        
                   push @first,$_;
        }
  }
  return undef;
}


use DBI;
use Graph;
 
#Connect to database
my $dbh = DBI->connect($data_source, $username, $password) or die "Could not connect: $DBI::errstr\n";
 
#Created undirected graph
my $graph = Graph->new(undirected=>1);
 
#Get data from database
my $sth = $dbh->prepare("SELECT INDIVIDUAL1, INDIVIDUAL2 FROM table");
$sth->execute;
 
#create and edge according to db (a vertex is automatically created if it doesn't already exist)
while(my ($a, $b) = $sth->fetchrow_array) {
        $graph->add_weighted_edge($a, $b, 1);
}
 
#Create statement to insert data
my $sth_insert = $dbh->prepare("INSERT INTO table_distances (Individual1, Individual2, Distance) VALUES (?, ?, ?);");
 
#Calculate shortest distances using the Floyd Warshall algorithm
my $apsp = $graph->APSP_Floyd_Warshall;
 
#Save data to database
foreach my $a ($graph->vertices) {
        foreach my $b ($graph->vertices) {
                $sth_insert->execute($a, $b, $apsp->path_length($a, $b));
        }
}

Open in new window

Avatar of Kim Ryan
Kim Ryan
Flag of Australia image

In what way does it fail, any error messages?
Avatar of catalini
catalini

ASKER

I get a syntax error at distance.pl line 4, near "){"
Avatar of ozo
use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:my mysql database", username, password); #  ; was missing
for(  $ldbh->selectall_arrayre("SELECT INDIVIDUAL1,INDIVIDUAL2 from edges") ){
  $edge{$_->[0]}{$_->[1]}++;
  $edge{$_->[1]}{$_->[0]}++;
}


#This uses meet in the middle rather than depth first,
sub distance{
  my($from,$to)=@_;
  return 0 if  $from eq $to;
  my %df=($from=>1);
  my %dr=($to=>1);
  my $d=0;
  my $k=0;
  while( !$k ){
      $k=1;
      ++$d;
      my($f,$r);
     if( keys %df <= keys %dr ){
        ($f,$r) = (\%df,\%dr);
     }else{
        ($f,$r) = (\%dr,\%df);
     }
    for( keys %$f ){
         for( keys %{$edge{$_}} ){
            $k =0 if !$f->{$_}++;
            return $d if $r->{$_};
         }
     }
  }
  return undef;
}

ozo, thanks. It seems to work. But how can i write the calculated distances in my mysql table (structured as INDIVIDUAL1, INDIVIDUAL2, DISTANCE)?

thanks

Do you want to find all distances, rather than just a single distance?
yes :-(
#untested
my $count=1;
while( $count ){
  last unless my $dbh = $ldbh-prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t1.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
  $count=0;
  while( my $r = $dbh->fetchrow_arrayref() ){
      ++$count;
      $ldbj->do("insert into edges set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2];");
  }
}
it returns

Undefined subroutine &main::prepare called at distance.pl line 42.

line 42 is:
  last unless my $dbh = $ldbh-prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t1.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();

sorry, try $ldbh->prepare
now i get

Can't call method "execute" on an undefined value at distance.pl line 42.

thanks ozo for helping me out on this :-)
#sorry, I would have caught those myself if I had DBD::mysql on this machine
#I also noticed a typo of t1 where I meant t2
my $dbh;
last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
DBD::mysql::st execute failed: Unknown column 't1.DISTANCE' in 'field list' at distance.pl line 42.

actually i do not have any field DISTANCE in the originating table (it's just INDIVIDUAL1, INDIVIDUAL2)...
#untested
my $count=1;
while( $count ){
my $dbh;
last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
  $count=0;
  while( my $r = $dbh->fetchrow_arrayref() ){
      ++$count;
      $ldbj->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2];");
  }
}

Open in new window

then what was the mysql table (structured as INDIVIDUAL1, INDIVIDUAL2, DISTANCE)?
Can you create a new table with DISTANCE=1 when (INDIVIDUAL1, INDIVIDUAL2) exists in the original table?
the INDIVIDUAL1, INDIVIDUAL2, DISTANCE is the final table I would like to save the distances into,

I've created a column called DISTANCE with value = 1 as you suggested, the code seems to be running now. I'll let you know asap if it works! thanks for now!
Can't call method "do" on an undefined value at distance.pl line 47.

line 47 is

$ldbj->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2];");
$ldbj should be the $ldbh from $ldbh = DBI->connect
It might be best to debug  it on a small graph so you can watch that all its steps make sense before trying it on a huge graph
now it's writing on the table DISTANCES but it gives all the individuals distance = 2 ?

maybe because of t1.DISTANCE+t2.DISTANCE?
I'm trying it on a 6000 nodes table
it has already written over 7,000,000 rows?!?!
Are there nodes which should have distance 3 or NULL?
What was the step that set that distance to 2?
A connected graph of 6000 nodes could have 36000000 distances
yes distances should go from NULL to 7, 8 at least...

it's probably because in the table edges all the DISTANCE=1 and when the sql joins them it sums 1+1=2?

just guessing....
It should only add an edge with t1.INDIVIDUAL1 to t2.INDIVIDUAL2 at distance 2 when there is an edge from t1.INDIVIDUAL1 to t1.INDIVIDUAL2 at distance 1
and an edge from t2.INDIVIDUAL1 to t2.INDIVIDUAL2 at distance 1
with t1.INDIVIDUAL2 = t2.INDIVIDUAL1  which means that there is  a path of length two between them is it adding such an edge when there is no such path?
I think the where condition should probably have been
not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2)
or exists (select * from   edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance > t1.DISTANCE+t2.DISTANCE)
No, that wouldn't explain the symptoms.
Could you print out the edge that it wrongly added, and what what rows it selected for t1, t2 and t3 that made it think that edge should be added?
I noticed another typo
insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2]
should have been
insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2]

but that doesn't explain the symptoms, since that would only add an edge from a node to itself.
the nodes to itself where already present, it made a huge table (over 350,000,000) with all possible pairs and all the distances = 2

my starting table (edges) is something like this

INDIVIDUAL1, INDIVIDUAL2, DISTANCE

A8282            B2828                 1
A8282            A7272                 1
A8282            B2222                 1
A7272            H1234                 1

when there is an edge from A8282 to B2828 then there is also an edge between B2828 and A8282 (i.e. network is undirected) and their distance is 1.
From above it should return that

A8282 is at distance 1 from B2828, A7272 and B2222
A8282 is at distance 2 from H1234
H1234 is at distance 3 from B2828 and so on....

:-)
my code ozo looks like this
#!/bin/perl
use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:database=test;user=XXXXX;password=XXXXXXXX") or die "Could not connect: $DBI::errstr\n";
 
for(  $ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges") ){
  $edge{$_->[0]}{$_->[1]}++;
  $edge{$_->[1]}{$_->[0]}++;
}
 
 
#This uses meet in the middle rather than depth first,
sub distance{
  my($from,$to)=@_;
  return 0 if  $from eq $to;
  my %df=($from=>1);
  my %dr=($to=>1);
  my $d=0;
  my $k=0;
  while( !$k ){
      $k=1;
      ++$d;
      my($f,$r);
     if( keys %df <= keys %dr ){
        ($f,$r) = (\%df,\%dr);
     }else{
        ($f,$r) = (\%dr,\%df);
     }
    for( keys %$f ){
         for( keys %{$edge{$_}} ){
            $k =0 if !$f->{$_}++;
            return $d if $r->{$_};
         }
     }
  }
  return undef;
}
 
 
#untested
my $count=1;
while( $count ){
my $dbh;
last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 or exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance > t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
  $count=0;
  while( my $r = $dbh->fetchrow_arrayref() ){
      ++$count;
      $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2];");
  }
}

Open in new window

I think I will try to load DBD::mysql on this machine so I can test it.
in the mean time, if you start with these four rows in your table:
A8282            B2828                 1
A8282            A7272                 1
A8282            B2222                 1
A7272            H1234                 1
can you find out exactly what rows were selected when it tries to insert
H1234            B2828                 2
I run the code exactly with your table (4 rows)

and it started writing without an end this row

A8282        H1234        2

into the table DISTANCES...

it created more than 387,000 records in a few seconds! :-))
I'm not sure how this compares time/memory wise to what ozo has, but an other option would be to have the Graph module calculate the distances.  Calculating them all at once with the APSP_Floyd_Warshall method is to slow/time consuming on a large graph (based on your other question).  But it should be easier to calculate one set of nodes at a time.  I'm also not sure how the cache'ing is done.  It might help to clear the cache every so often.

Replace lines 22-30 from your original script with this
#Calculate shortest distances
foreach my $a ($graph->vertices) {
	foreach my $b ($graph->vertices) {
		next if $a eq $b;  #skip from-to same node
		my @nodes = $graph->SP_Dijkstra($a, $b);
		next unless @nodes;  #skip when $a to $b is not reachable
		$sth_insert->execute($a, $b, $#nodes);
	}
	#clear cache, uncomment to enable clearing
	#$graph->SPT_Dijkstra_clear_cache;
}

Open in new window

adam:

i get "2Not an ARRAY reference at /usr/share/perl5/Heap/Elem.pm line 31."

and the lines it refers to in the Elem.pm file are below
# get or set heap slot
sub heap {
    @_ > 1 ? ($_[0][1] = $_[1]) : $_[0][1];     #THIS IS LINE 31
}

Open in new window

A8282        H1234        2
is correct because of
t1=
A8282            A7272                 1
t2=
A7272            H1234                 1
but the
not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)
should have prevented it from adding it again because
A8282        H1234        2
would exist as t3
ozo:

this is the query I have... and it still generates all the duplicates forever...

"SELECT t1.INDIVIDUAL1, t2.NODE, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.NODE = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.NODE = t2.NODE AND t3.distance > t1.DISTANCE+t2.DISTANCE)"
I think the > should be <=
and is NODE another name for INDIVIDUAL2?
yes, node is the same as INDIVIDUAL2

I've changed the ">" to "<= "

but i still get a table full of rows like this....

 A8282        H1234        2
If you don't want to continue trying with the Graph module, let me know... and I'll step back...

Which line from the code in post 21653805 is causing the error?
adam, the graph module could be just fine, I'm trying this code
use DBI;
use Graph;
 
#Connect to database
my $dbh = DBI->connect("DBI:mysql:database=test;user=XXXXXX;password=XXXXX") or die "Could not connect: $DBI::errstr\n";
 
#Created undirected graph
my $graph = Graph->new(undirected=>1);
 
#Get data from database
my $sth = $dbh->prepare("SELECT INDIVIDUAL1, INDIVIDUAL2 FROM edges");
$sth->execute;
 
#create and edge according to db (a vertex is automatically created if it doesn't already exist)
while(my ($a, $b) = $sth->fetchrow_array) {
        $graph->add_weighted_edge($a, $b, 1);
}
 
#Create statement to insert data
my $sth_insert = $dbh->prepare("INSERT INTO DISTANCES (INDIVIDUAL1, INDIVIDUAL2, DISTANCE) VALUES (?, ?, ?);");
 
#Calculate shortest distances
foreach my $a ($graph->vertices) {
        foreach my $b ($graph->vertices) {
                next if $a eq $b;  #skip from-to same node
                my @nodes = $graph->SP_Dijkstra($a, $b);
                next unless @nodes;  #skip when $a to $b is not reachable
                $sth_insert->execute($a, $b, $#nodes);
        }
       #clear cache, uncomment to enable clearing
 $graph->SPT_Dijkstra_clear_cache;
}

Open in new window

adam:

it does not tell me on which line the error is... it just returns...

Not an ARRAY reference at /usr/share/perl5/Heap/Elem.pm line 31.
ASKER CERTIFIED SOLUTION
Avatar of Adam314
Adam314

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
adam;

exactly the same error as before

Not an ARRAY reference at /usr/share/perl5/Heap/Elem.pm line 31.
adam:

after a google search it could be a bug with Heap and debian based system... not sure about it...
"SELECT t1.INDIVIDUAL1, t2.NODE, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.NODE = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.NODE = t2.NODE AND t3.distance <= t1.DISTANCE+t2.DISTANCE)"

"insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2];"

you are selecting on edges but inserting into DISTANCES
that,s probably why the  (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.NODE = t2.NODE AND t3.distance <= t1.DISTANCE+t2.DISTANCE)"
is not seeing the edge that was just inserted
adam:

I've installed Heap from cpan directly... I'll try your code right now
ozo:

so should I write directly into edges? I just didn't want to risk damaging the real data...
Can there be two distances between a pair INDIVIDUALs or NODEs?
if not, you may want to make INDIVIDUAL1, INDIVIDUAL2 or INDIVIDUAL1 NODE unique


use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:database=test;user=XXXXX;password=XXXXXXXX") or die "Could not connect: $DBI::errstr\n";
 
#copy edge data to DISTANCES
for(  $ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges") ){
    $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=1;");
    $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[1]', INDIVIDUAL2='$r->[0]', DISTANCE=1;");
}
 
#form transitive closure
my $count=1;
while( $count ){
    my $dbh;
    last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from DISTANCES as t1, DISTANCES as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from DISTANCES as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
    $count=0;
    while( my $r = $dbh->fetchrow_arrayref() ){
        $count += $ldbh->do("update DISTANCES set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]")
               || $ldbh->do("insert DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2];");

    }
}
ozo:

running the above script now returns only two rows (always using the test table you suggested), where INDIVIDUAL1 and INDIVIDUAL2 are blank and DISTANCE is set to 1...
ozo:

actually it returns 5 blank rows...
Sorry I pasted together pieces from two programs without adjusting the variable names to match
copy edge data to DISTANCES should have been
for $r (  $ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges") ){
    $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=1;");
    $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[1]', INDIVIDUAL2='$r->[0]', DISTANCE=1;");
}
ozo:

probably almost there...
it wrote two lines

INDIVIDUAL1           INDIVIDUAL2       DISTANCE
ARRAY(0x715          ARRAY(0x715             1
for $r ( @{$ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges")} ){
  #this may waste memory if the edge table is very large.
ozo:

DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A8282' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A7272' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A7272' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A7272' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A8282' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A8282' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A8282' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A7272' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.


my line 21 is below

$count += $ldbh->do("update DISTANCES2 set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]")

Open in new window

ok there was a missing '

but now i get strange distances...
A8282        A8282        2
H1234       A7272       1
A7272       H1234       1
A8282       A8282       2
B2222       A8282       1
A8282       B2222       1
A7272       A7272       2
A7272       A8282       1
A8282       A7272       1
B2828       A8282       1
A8282       B2828       1
H1234       B2828       3
H1234       B2222       3
B2828       H1234       3
B2222       H1234       3
A8282       B2828       1
B2828       A8282       1
A8282       A7272       1
A7272       A8282       1
A8282       B2222       1
B2222       A8282       1
A7272       H1234       1
H1234       A7272       1


original table was
INDIVIDUAL1       INDIVIDUAL2       DISTANCE
A8282       B2828       1
A8282       A7272       1
A8282       B2222       1
A7272       H1234       1
the distances are actually right, just the node with itself is useless and seems to do redundant calculation... (I'll try with a unique index as you suggested)
Is there a missing '
"update DISTANCES set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]'"

it would probably be better to use ? and bind,  we can tidy things up once its wotking
is
A8282        A8282        2
one of the strange distances
it probably comes from e.g.
A8282       B2222       1
B2222       A8282       1
We could set all distances from a node to itself as 0
Or we could let that be understood and not bother entering it into the table when t1.INDIVIDUAL1 = t2.INDIVIDUAL2
depending on how you want your table to work

we can just ignore them!

I've added a unique index to avoid duplicates... but now it only records the first set of distances... (i.e. always 1)

 ALTER TABLE `DISTANCES` ADD UNIQUE `COMPOSITE` ( `DISTANCE` , `INDIVIDUAL1` , `INDIVIDUAL2` )  

A7272        A8282        1
A7272       H1234       1
A8282       A7272       1
A8282       B2222       1
A8282       B2828       1
B2222       A8282       1
B2828       A8282       1
H1234       A7272       1
I think it should be (`INDIVIDUAL1` , `INDIVIDUAL2` )
ozo:

I don't know what's going on but now I only get 7 DISTANCE 1 ROWS

this is the code I'm using...

use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:database=test;user=XXXX;password=XXXXXXXX") or die "Could not connect: $DBI::errstr\n";
 
#copy edge data to DISTANCES2
for $r ( @{$ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges2")} ){
  #this may waste memory if the edge table is very large.
    $ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=1;");
    $ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[1]', INDIVIDUAL2='$r->[0]', DISTANCE=1;");
}
 
#form transitive closure
my $count=1;
while( $count ){
    my $dbh;
    last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from DISTANCES2 as t1, DISTANCES2 as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from DISTANCES2 as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.DISTANCE <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
    $count=0;
    while( my $r = $dbh->fetchrow_arrayref() ){
        $count += $ldbh->do("update DISTANCES2 set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]'")
               || $ldbh->do("insert DISTANCES2 set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2];");

    }
}


and this is the result

H1234        A7272        1
A7272       H1234       1
B2222       A8282       1
A8282       B2222       1
A7272       A8282       1
A8282       A7272       1
B2828       A8282       1
A8282       B2828       1

this is really tricky! :-)
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
To avoid distances from a node to itself, you can add
AND t1.INDIVIDUAL1 != t2.INDIVIDUAL2
to the where clause
using your debug code it worked :-)

A8282        B2828        1
B2828       A8282       1
A8282       A7272       1
A7272       A8282       1
A8282       B2222       1
B2222       A8282       1
A7272       H1234       1
H1234       A7272       1
B2828       B2828       2
A7272       B2828       2
B2222       B2828       2
A8282       A8282       2
B2828       A7272       2
A7272       A7272       2
B2222       A7272       2
H1234       A8282       2
B2828       B2222       2
A7272       B2222       2
B2222       B2222       2
A8282       H1234       2
H1234       H1234       2
H1234       B2828       3
H1234       B2222       3
B2828       H1234       3
B2222       H1234       3

I've tryed adding the code AND t1.INDIVIDUAL1 !=... but it started and infinite loop...


use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:database=test;user=XXXX;password=XXXXXX") or die "Could not connect: $DBI::errstr\n";
 
#copy edge data to DISTANCES2
for $r ( @{$ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges2")} ){
  #this may waste memory if the edge table is very large.
    $ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=1;");
    $ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[1]', INDIVIDUAL2='$r->[0]', DISTANCE=1;");
}
 
#form transitive closure
my $count=1;
 
while( $count ){
    my $dbh;
    last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from DISTANCES2 as t1, DISTANCES2 as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from DISTANCES2 as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.DISTANCE <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
    $count=0;
    while( my $r = $dbh->fetchrow_arrayref() ){
         print "fetchrow @$r\n";
         print "update: ",($d=$ldbh->do("update DISTANCES2 set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]'")),"\n";
         if( $d <= 0 ){
                 print "insert",($d=$ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2];")),"\n";
         }
         $count += $d;
    }
}

Open in new window

I've tried on a bigger table, it only returns some of the distances and all between 1 and 2...

:-(
How did you install the Graph module?  I just tried it on a ubuntu system, which is debian based, and it worked fine.

I installed the Graph module:
  as root:  perl -MCPAN -e 'install Graph'
adam:

it works now, the procedure is running...

do you think it should be faster with this enabled or not?


        $graph->SPT_Dijkstra_clear_cache;

Open in new window

I'm not sure.  I don't know how the Graph module works internally, or what is cached.  I could test it on my computer, but I think it'll depend more on your computer, how much memory, and the size of the graph.

If you wanted, you could have it print out how long it took after every node.  If you notice it getting slower as it goes on, then I think that clearing the cache will help.  If it stays about the same speed, then I'm guessing that clearing the cache would not help.

I also don't know how it will perform relative to the solution ozo was working on.
I'm running it right now... how could I add the debug time information? thanks!

Ozo solution could be fast, but does not work at the moment.

Thanks to both of you anyways!
Change the 100 in
    if($count==100)
to the number of nodes you want to process before checking the time

use Time::HiRes 'time';
 
 
#Calculate shortest distances
my $count=0;
my $time_a=time;
foreach my $a ($graph->vertices) {
	foreach my $b ($graph->vertices) {
		next if $a eq $b;  #skip from-to same node
		my @nodes = $graph->SP_Dijkstra($a, $b);
		$count++;
		if($count==100){
			$count=0;
			my $time_b=time;
			printf "Elapsed: %f\n", $time_b-$time_a;
			$time_a = $time_b;
		}
		next unless @nodes;  #skip when $a to $b is not reachable
		printf "Path  %2s - %2s: %d\n", $a, $b, $#nodes;
	}
	#clear cache, uncomment to enable clearing
	$graph->SPT_Dijkstra_clear_cache;
}

Open in new window

were the distances and all between 1 and 2 correct?
If there was a distance 3 missing, what was the path that should have produced a distance of 3?
I couldn't check it directly, since I've started with a 6000 rows table. But distances should be between 1 and 9 at least for sure :-(
thanks for all your time on this question!