Avatar of catalini
catalini asked on

calculating distance on a huge graph

I'm following up on my previous question

https://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_23428010.html#a21635922

since the solution still has some problems.

I'm trying to calculate distance on a graph. I've seen plenty of implementations online, but I need one that would allow me to read data from my mysql database. Ideally the Dijkstra algorithm would do the job.

I've a table structured like this:

INDIVIDUAL1         INDIVIDUAL2
---------------------------------------
AH2223423           AS38423842
AH2223423           HS2772722
AH2223423           IASD727272
HGS282828          AS38423842

where every row corresponds to an edge (e.g. row1 means that  AH2223423 and AS38423842 are connected) on the graph and every INDIVIDUAL (e.g. AH2223423) is a vertex


The code below by adam314 (see Snippet) calculates the distances on a small graph, but fails with the tables I use (more than 200,000 edges). Is there any other efficient way to calculate the distances between connected individuals on a graph and save them into mysql?



An alternative code suggested by ozo is here below, but it does not seem to work...

#!/bin/perl
use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:my mysql database", username, password)
for(  $ldbh->selectall_arrayre("SELECT INDIVIDUAL1,INDIVIDUAL2 from edges") ){
  $edge{$_->[0]}{$_->[1]}++;
  $edge{$_->[1]}{$_->[0]}++;
}

#does it need to be depth first, or can you use any algorithm that finds the correct distance?
sub distance{
  my($from,$to)=@_;
  return 0 if  $from eq $to;
  my @first=($from);
  my %dist=($from=>0);
  while( my $first = shift @first ){
       $dist = $dist{$first}+1;
       return $dist if $edge{$first}{$to};
       for( grep !exists $dist{$_},keys %{$edge{$first} ){
                   $dist{$_}=$dist;        
                   push @first,$_;
        }
  }
  return undef;
}


use DBI;
use Graph;
 
#Connect to database
my $dbh = DBI->connect($data_source, $username, $password) or die "Could not connect: $DBI::errstr\n";
 
#Created undirected graph
my $graph = Graph->new(undirected=>1);
 
#Get data from database
my $sth = $dbh->prepare("SELECT INDIVIDUAL1, INDIVIDUAL2 FROM table");
$sth->execute;
 
#create and edge according to db (a vertex is automatically created if it doesn't already exist)
while(my ($a, $b) = $sth->fetchrow_array) {
        $graph->add_weighted_edge($a, $b, 1);
}
 
#Create statement to insert data
my $sth_insert = $dbh->prepare("INSERT INTO table_distances (Individual1, Individual2, Distance) VALUES (?, ?, ?);");
 
#Calculate shortest distances using the Floyd Warshall algorithm
my $apsp = $graph->APSP_Floyd_Warshall;
 
#Save data to database
foreach my $a ($graph->vertices) {
        foreach my $b ($graph->vertices) {
                $sth_insert->execute($a, $b, $apsp->path_length($a, $b));
        }
}

Open in new window

PerlJava

Avatar of undefined
Last Comment
catalini

8/22/2022 - Mon
Kim Ryan

In what way does it fail, any error messages?
ASKER
catalini

I get a syntax error at distance.pl line 4, near "){"
ozo

use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:my mysql database", username, password); #  ; was missing
for(  $ldbh->selectall_arrayre("SELECT INDIVIDUAL1,INDIVIDUAL2 from edges") ){
  $edge{$_->[0]}{$_->[1]}++;
  $edge{$_->[1]}{$_->[0]}++;
}


#This uses meet in the middle rather than depth first,
sub distance{
  my($from,$to)=@_;
  return 0 if  $from eq $to;
  my %df=($from=>1);
  my %dr=($to=>1);
  my $d=0;
  my $k=0;
  while( !$k ){
      $k=1;
      ++$d;
      my($f,$r);
     if( keys %df <= keys %dr ){
        ($f,$r) = (\%df,\%dr);
     }else{
        ($f,$r) = (\%dr,\%df);
     }
    for( keys %$f ){
         for( keys %{$edge{$_}} ){
            $k =0 if !$f->{$_}++;
            return $d if $r->{$_};
         }
     }
  }
  return undef;
}

Your help has saved me hundreds of hours of internet surfing.
fblack61
ASKER
catalini

ozo, thanks. It seems to work. But how can i write the calculated distances in my mysql table (structured as INDIVIDUAL1, INDIVIDUAL2, DISTANCE)?

thanks

ozo

Do you want to find all distances, rather than just a single distance?
ASKER
catalini

yes :-(
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ozo

#untested
my $count=1;
while( $count ){
  last unless my $dbh = $ldbh-prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t1.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
  $count=0;
  while( my $r = $dbh->fetchrow_arrayref() ){
      ++$count;
      $ldbj->do("insert into edges set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2];");
  }
}
ASKER
catalini

it returns

Undefined subroutine &main::prepare called at distance.pl line 42.

line 42 is:
  last unless my $dbh = $ldbh-prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t1.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();

ozo

sorry, try $ldbh->prepare
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
ASKER
catalini

now i get

Can't call method "execute" on an undefined value at distance.pl line 42.

thanks ozo for helping me out on this :-)
ozo

#sorry, I would have caught those myself if I had DBD::mysql on this machine
#I also noticed a typo of t1 where I meant t2
my $dbh;
last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
ASKER
catalini

DBD::mysql::st execute failed: Unknown column 't1.DISTANCE' in 'field list' at distance.pl line 42.

actually i do not have any field DISTANCE in the originating table (it's just INDIVIDUAL1, INDIVIDUAL2)...
#untested
my $count=1;
while( $count ){
my $dbh;
last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
  $count=0;
  while( my $r = $dbh->fetchrow_arrayref() ){
      ++$count;
      $ldbj->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2];");
  }
}

Open in new window

Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ozo

then what was the mysql table (structured as INDIVIDUAL1, INDIVIDUAL2, DISTANCE)?
Can you create a new table with DISTANCE=1 when (INDIVIDUAL1, INDIVIDUAL2) exists in the original table?
ASKER
catalini

the INDIVIDUAL1, INDIVIDUAL2, DISTANCE is the final table I would like to save the distances into,

I've created a column called DISTANCE with value = 1 as you suggested, the code seems to be running now. I'll let you know asap if it works! thanks for now!
ASKER
catalini

Can't call method "do" on an undefined value at distance.pl line 47.

line 47 is

$ldbj->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2];");
Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy
ozo

$ldbj should be the $ldbh from $ldbh = DBI->connect
ozo

It might be best to debug  it on a small graph so you can watch that all its steps make sense before trying it on a huge graph
ASKER
catalini

now it's writing on the table DISTANCES but it gives all the individuals distance = 2 ?

maybe because of t1.DISTANCE+t2.DISTANCE?
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER
catalini

I'm trying it on a 6000 nodes table
ASKER
catalini

it has already written over 7,000,000 rows?!?!
ozo

Are there nodes which should have distance 3 or NULL?
What was the step that set that distance to 2?
A connected graph of 6000 nodes could have 36000000 distances
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
ASKER
catalini

yes distances should go from NULL to 7, 8 at least...

it's probably because in the table edges all the DISTANCE=1 and when the sql joins them it sums 1+1=2?

just guessing....
ozo

It should only add an edge with t1.INDIVIDUAL1 to t2.INDIVIDUAL2 at distance 2 when there is an edge from t1.INDIVIDUAL1 to t1.INDIVIDUAL2 at distance 1
and an edge from t2.INDIVIDUAL1 to t2.INDIVIDUAL2 at distance 1
with t1.INDIVIDUAL2 = t2.INDIVIDUAL1  which means that there is  a path of length two between them is it adding such an edge when there is no such path?
ozo

I think the where condition should probably have been
not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2)
or exists (select * from   edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance > t1.DISTANCE+t2.DISTANCE)
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ozo

No, that wouldn't explain the symptoms.
Could you print out the edge that it wrongly added, and what what rows it selected for t1, t2 and t3 that made it think that edge should be added?
ozo

I noticed another typo
insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2]
should have been
insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2]

but that doesn't explain the symptoms, since that would only add an edge from a node to itself.
ASKER
catalini

the nodes to itself where already present, it made a huge table (over 350,000,000) with all possible pairs and all the distances = 2

my starting table (edges) is something like this

INDIVIDUAL1, INDIVIDUAL2, DISTANCE

A8282            B2828                 1
A8282            A7272                 1
A8282            B2222                 1
A7272            H1234                 1

when there is an edge from A8282 to B2828 then there is also an edge between B2828 and A8282 (i.e. network is undirected) and their distance is 1.
From above it should return that

A8282 is at distance 1 from B2828, A7272 and B2222
A8282 is at distance 2 from H1234
H1234 is at distance 3 from B2828 and so on....

:-)
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
ASKER
catalini

my code ozo looks like this
#!/bin/perl
use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:database=test;user=XXXXX;password=XXXXXXXX") or die "Could not connect: $DBI::errstr\n";
 
for(  $ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges") ){
  $edge{$_->[0]}{$_->[1]}++;
  $edge{$_->[1]}{$_->[0]}++;
}
 
 
#This uses meet in the middle rather than depth first,
sub distance{
  my($from,$to)=@_;
  return 0 if  $from eq $to;
  my %df=($from=>1);
  my %dr=($to=>1);
  my $d=0;
  my $k=0;
  while( !$k ){
      $k=1;
      ++$d;
      my($f,$r);
     if( keys %df <= keys %dr ){
        ($f,$r) = (\%df,\%dr);
     }else{
        ($f,$r) = (\%dr,\%df);
     }
    for( keys %$f ){
         for( keys %{$edge{$_}} ){
            $k =0 if !$f->{$_}++;
            return $d if $r->{$_};
         }
     }
  }
  return undef;
}
 
 
#untested
my $count=1;
while( $count ){
my $dbh;
last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 or exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance > t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
  $count=0;
  while( my $r = $dbh->fetchrow_arrayref() ){
      ++$count;
      $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2];");
  }
}

Open in new window

ozo

I think I will try to load DBD::mysql on this machine so I can test it.
in the mean time, if you start with these four rows in your table:
A8282            B2828                 1
A8282            A7272                 1
A8282            B2222                 1
A7272            H1234                 1
can you find out exactly what rows were selected when it tries to insert
H1234            B2828                 2
ASKER
catalini

I run the code exactly with your table (4 rows)

and it started writing without an end this row

A8282        H1234        2

into the table DISTANCES...

it created more than 387,000 records in a few seconds! :-))
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Adam314

I'm not sure how this compares time/memory wise to what ozo has, but an other option would be to have the Graph module calculate the distances.  Calculating them all at once with the APSP_Floyd_Warshall method is to slow/time consuming on a large graph (based on your other question).  But it should be easier to calculate one set of nodes at a time.  I'm also not sure how the cache'ing is done.  It might help to clear the cache every so often.

Replace lines 22-30 from your original script with this
#Calculate shortest distances
foreach my $a ($graph->vertices) {
	foreach my $b ($graph->vertices) {
		next if $a eq $b;  #skip from-to same node
		my @nodes = $graph->SP_Dijkstra($a, $b);
		next unless @nodes;  #skip when $a to $b is not reachable
		$sth_insert->execute($a, $b, $#nodes);
	}
	#clear cache, uncomment to enable clearing
	#$graph->SPT_Dijkstra_clear_cache;
}

Open in new window

ASKER
catalini

adam:

i get "2Not an ARRAY reference at /usr/share/perl5/Heap/Elem.pm line 31."

and the lines it refers to in the Elem.pm file are below
# get or set heap slot
sub heap {
    @_ > 1 ? ($_[0][1] = $_[1]) : $_[0][1];     #THIS IS LINE 31
}

Open in new window

ozo

A8282        H1234        2
is correct because of
t1=
A8282            A7272                 1
t2=
A7272            H1234                 1
but the
not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)
should have prevented it from adding it again because
A8282        H1234        2
would exist as t3
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
ASKER
catalini

ozo:

this is the query I have... and it still generates all the duplicates forever...

"SELECT t1.INDIVIDUAL1, t2.NODE, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.NODE = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.NODE = t2.NODE AND t3.distance > t1.DISTANCE+t2.DISTANCE)"
ozo

I think the > should be <=
and is NODE another name for INDIVIDUAL2?
ASKER
catalini

yes, node is the same as INDIVIDUAL2

I've changed the ">" to "<= "

but i still get a table full of rows like this....

 A8282        H1234        2
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Adam314

If you don't want to continue trying with the Graph module, let me know... and I'll step back...

Which line from the code in post 21653805 is causing the error?
ASKER
catalini

adam, the graph module could be just fine, I'm trying this code
use DBI;
use Graph;
 
#Connect to database
my $dbh = DBI->connect("DBI:mysql:database=test;user=XXXXXX;password=XXXXX") or die "Could not connect: $DBI::errstr\n";
 
#Created undirected graph
my $graph = Graph->new(undirected=>1);
 
#Get data from database
my $sth = $dbh->prepare("SELECT INDIVIDUAL1, INDIVIDUAL2 FROM edges");
$sth->execute;
 
#create and edge according to db (a vertex is automatically created if it doesn't already exist)
while(my ($a, $b) = $sth->fetchrow_array) {
        $graph->add_weighted_edge($a, $b, 1);
}
 
#Create statement to insert data
my $sth_insert = $dbh->prepare("INSERT INTO DISTANCES (INDIVIDUAL1, INDIVIDUAL2, DISTANCE) VALUES (?, ?, ?);");
 
#Calculate shortest distances
foreach my $a ($graph->vertices) {
        foreach my $b ($graph->vertices) {
                next if $a eq $b;  #skip from-to same node
                my @nodes = $graph->SP_Dijkstra($a, $b);
                next unless @nodes;  #skip when $a to $b is not reachable
                $sth_insert->execute($a, $b, $#nodes);
        }
       #clear cache, uncomment to enable clearing
 $graph->SPT_Dijkstra_clear_cache;
}

Open in new window

ASKER
catalini

adam:

it does not tell me on which line the error is... it just returns...

Not an ARRAY reference at /usr/share/perl5/Heap/Elem.pm line 31.
Your help has saved me hundreds of hours of internet surfing.
fblack61
ASKER CERTIFIED SOLUTION
Adam314

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
See how we're fighting big data
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
ASKER
catalini

adam;

exactly the same error as before

Not an ARRAY reference at /usr/share/perl5/Heap/Elem.pm line 31.
ASKER
catalini

adam:

after a google search it could be a bug with Heap and debian based system... not sure about it...
ozo

"SELECT t1.INDIVIDUAL1, t2.NODE, t1.DISTANCE+t2.DISTANCE from edges as t1, edges as t2 where t1.NODE = t2.INDIVIDUAL1 and not exists (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.NODE = t2.NODE AND t3.distance <= t1.DISTANCE+t2.DISTANCE)"

"insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[0]', DISTANCE=$r->[2];"

you are selecting on edges but inserting into DISTANCES
that,s probably why the  (select * from edges as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.NODE = t2.NODE AND t3.distance <= t1.DISTANCE+t2.DISTANCE)"
is not seeing the edge that was just inserted
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER
catalini

adam:

I've installed Heap from cpan directly... I'll try your code right now
ASKER
catalini

ozo:

so should I write directly into edges? I just didn't want to risk damaging the real data...
ozo

Can there be two distances between a pair INDIVIDUALs or NODEs?
if not, you may want to make INDIVIDUAL1, INDIVIDUAL2 or INDIVIDUAL1 NODE unique


use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:database=test;user=XXXXX;password=XXXXXXXX") or die "Could not connect: $DBI::errstr\n";
 
#copy edge data to DISTANCES
for(  $ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges") ){
    $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=1;");
    $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[1]', INDIVIDUAL2='$r->[0]', DISTANCE=1;");
}
 
#form transitive closure
my $count=1;
while( $count ){
    my $dbh;
    last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from DISTANCES as t1, DISTANCES as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from DISTANCES as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.distance <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
    $count=0;
    while( my $r = $dbh->fetchrow_arrayref() ){
        $count += $ldbh->do("update DISTANCES set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]")
               || $ldbh->do("insert DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2];");

    }
}
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
ASKER
catalini

ozo:

running the above script now returns only two rows (always using the test table you suggested), where INDIVIDUAL1 and INDIVIDUAL2 are blank and DISTANCE is set to 1...
ASKER
catalini

ozo:

actually it returns 5 blank rows...
ozo

Sorry I pasted together pieces from two programs without adjusting the variable names to match
copy edge data to DISTANCES should have been
for $r (  $ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges") ){
    $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=1;");
    $ldbh->do("insert into DISTANCES set INDIVIDUAL1='$r->[1]', INDIVIDUAL2='$r->[0]', DISTANCE=1;");
}
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER
catalini

ozo:

probably almost there...
it wrote two lines

INDIVIDUAL1           INDIVIDUAL2       DISTANCE
ARRAY(0x715          ARRAY(0x715             1
ozo

for $r ( @{$ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges")} ){
  #this may waste memory if the edge table is very large.
ASKER
catalini

ozo:

DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A8282' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A7272' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A7272' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A7272' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A8282' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A8282' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A8282' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''A7272' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2828' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''B2222' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.
DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''H1234' at line 1 at bozo.pl line 21.


my line 21 is below

$count += $ldbh->do("update DISTANCES2 set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]")

Open in new window

Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy
ASKER
catalini

ok there was a missing '

but now i get strange distances...
A8282        A8282        2
H1234       A7272       1
A7272       H1234       1
A8282       A8282       2
B2222       A8282       1
A8282       B2222       1
A7272       A7272       2
A7272       A8282       1
A8282       A7272       1
B2828       A8282       1
A8282       B2828       1
H1234       B2828       3
H1234       B2222       3
B2828       H1234       3
B2222       H1234       3
A8282       B2828       1
B2828       A8282       1
A8282       A7272       1
A7272       A8282       1
A8282       B2222       1
B2222       A8282       1
A7272       H1234       1
H1234       A7272       1


original table was
INDIVIDUAL1       INDIVIDUAL2       DISTANCE
A8282       B2828       1
A8282       A7272       1
A8282       B2222       1
A7272       H1234       1
ASKER
catalini

the distances are actually right, just the node with itself is useless and seems to do redundant calculation... (I'll try with a unique index as you suggested)
ozo

Is there a missing '
"update DISTANCES set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]'"

it would probably be better to use ? and bind,  we can tidy things up once its wotking
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ozo

is
A8282        A8282        2
one of the strange distances
it probably comes from e.g.
A8282       B2222       1
B2222       A8282       1
We could set all distances from a node to itself as 0
Or we could let that be understood and not bother entering it into the table when t1.INDIVIDUAL1 = t2.INDIVIDUAL2
depending on how you want your table to work

ASKER
catalini

we can just ignore them!

I've added a unique index to avoid duplicates... but now it only records the first set of distances... (i.e. always 1)

 ALTER TABLE `DISTANCES` ADD UNIQUE `COMPOSITE` ( `DISTANCE` , `INDIVIDUAL1` , `INDIVIDUAL2` )  

A7272        A8282        1
A7272       H1234       1
A8282       A7272       1
A8282       B2222       1
A8282       B2828       1
B2222       A8282       1
B2828       A8282       1
H1234       A7272       1
ozo

I think it should be (`INDIVIDUAL1` , `INDIVIDUAL2` )
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
ASKER
catalini

ozo:

I don't know what's going on but now I only get 7 DISTANCE 1 ROWS

this is the code I'm using...

use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:database=test;user=XXXX;password=XXXXXXXX") or die "Could not connect: $DBI::errstr\n";
 
#copy edge data to DISTANCES2
for $r ( @{$ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges2")} ){
  #this may waste memory if the edge table is very large.
    $ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=1;");
    $ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[1]', INDIVIDUAL2='$r->[0]', DISTANCE=1;");
}
 
#form transitive closure
my $count=1;
while( $count ){
    my $dbh;
    last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from DISTANCES2 as t1, DISTANCES2 as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from DISTANCES2 as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.DISTANCE <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
    $count=0;
    while( my $r = $dbh->fetchrow_arrayref() ){
        $count += $ldbh->do("update DISTANCES2 set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]'")
               || $ldbh->do("insert DISTANCES2 set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2];");

    }
}


and this is the result

H1234        A7272        1
A7272       H1234       1
B2222       A8282       1
A8282       B2222       1
A7272       A8282       1
A8282       A7272       1
B2828       A8282       1
A8282       B2828       1

this is really tricky! :-)
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ozo

To avoid distances from a node to itself, you can add
AND t1.INDIVIDUAL1 != t2.INDIVIDUAL2
to the where clause
ASKER
catalini

using your debug code it worked :-)

A8282        B2828        1
B2828       A8282       1
A8282       A7272       1
A7272       A8282       1
A8282       B2222       1
B2222       A8282       1
A7272       H1234       1
H1234       A7272       1
B2828       B2828       2
A7272       B2828       2
B2222       B2828       2
A8282       A8282       2
B2828       A7272       2
A7272       A7272       2
B2222       A7272       2
H1234       A8282       2
B2828       B2222       2
A7272       B2222       2
B2222       B2222       2
A8282       H1234       2
H1234       H1234       2
H1234       B2828       3
H1234       B2222       3
B2828       H1234       3
B2222       H1234       3

I've tryed adding the code AND t1.INDIVIDUAL1 !=... but it started and infinite loop...


use DBD::mysql;
my  $ldbh = DBI->connect("DBI:mysql:database=test;user=XXXX;password=XXXXXX") or die "Could not connect: $DBI::errstr\n";
 
#copy edge data to DISTANCES2
for $r ( @{$ldbh->selectall_arrayref("SELECT INDIVIDUAL1, INDIVIDUAL2 from edges2")} ){
  #this may waste memory if the edge table is very large.
    $ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=1;");
    $ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[1]', INDIVIDUAL2='$r->[0]', DISTANCE=1;");
}
 
#form transitive closure
my $count=1;
 
while( $count ){
    my $dbh;
    last unless $dbh = $ldbh->prepare("SELECT t1.INDIVIDUAL1, t2.INDIVIDUAL2, t1.DISTANCE+t2.DISTANCE from DISTANCES2 as t1, DISTANCES2 as t2 where t1.INDIVIDUAL2 = t2.INDIVIDUAL1 and not exists (select * from DISTANCES2 as t3 where t3.INDIVIDUAL1 = t1.INDIVIDUAL1 and t3.INDIVIDUAL2 = t2.INDIVIDUAL2 AND t3.DISTANCE <= t1.DISTANCE+t2.DISTANCE)") and $dbh->execute();
    $count=0;
    while( my $r = $dbh->fetchrow_arrayref() ){
         print "fetchrow @$r\n";
         print "update: ",($d=$ldbh->do("update DISTANCES2 set DISTANCE=$r->[2] where INDIVIDUAL1='$r->[0]' and INDIVIDUAL2='$r->[1]'")),"\n";
         if( $d <= 0 ){
                 print "insert",($d=$ldbh->do("insert into DISTANCES2 set INDIVIDUAL1='$r->[0]', INDIVIDUAL2='$r->[1]', DISTANCE=$r->[2];")),"\n";
         }
         $count += $d;
    }
}

Open in new window

Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER
catalini

I've tried on a bigger table, it only returns some of the distances and all between 1 and 2...

:-(
Adam314

How did you install the Graph module?  I just tried it on a ubuntu system, which is debian based, and it worked fine.

I installed the Graph module:
  as root:  perl -MCPAN -e 'install Graph'
ASKER
catalini

adam:

it works now, the procedure is running...

do you think it should be faster with this enabled or not?


        $graph->SPT_Dijkstra_clear_cache;

Open in new window

All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
Adam314

I'm not sure.  I don't know how the Graph module works internally, or what is cached.  I could test it on my computer, but I think it'll depend more on your computer, how much memory, and the size of the graph.

If you wanted, you could have it print out how long it took after every node.  If you notice it getting slower as it goes on, then I think that clearing the cache will help.  If it stays about the same speed, then I'm guessing that clearing the cache would not help.

I also don't know how it will perform relative to the solution ozo was working on.
ASKER
catalini

I'm running it right now... how could I add the debug time information? thanks!

Ozo solution could be fast, but does not work at the moment.

Thanks to both of you anyways!
Adam314

Change the 100 in
    if($count==100)
to the number of nodes you want to process before checking the time

use Time::HiRes 'time';
 
 
#Calculate shortest distances
my $count=0;
my $time_a=time;
foreach my $a ($graph->vertices) {
	foreach my $b ($graph->vertices) {
		next if $a eq $b;  #skip from-to same node
		my @nodes = $graph->SP_Dijkstra($a, $b);
		$count++;
		if($count==100){
			$count=0;
			my $time_b=time;
			printf "Elapsed: %f\n", $time_b-$time_a;
			$time_a = $time_b;
		}
		next unless @nodes;  #skip when $a to $b is not reachable
		printf "Path  %2s - %2s: %d\n", $a, $b, $#nodes;
	}
	#clear cache, uncomment to enable clearing
	$graph->SPT_Dijkstra_clear_cache;
}

Open in new window

Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ozo

were the distances and all between 1 and 2 correct?
If there was a distance 3 missing, what was the path that should have produced a distance of 3?
ASKER
catalini

I couldn't check it directly, since I've started with a 6000 rows table. But distances should be between 1 and 9 at least for sure :-(
ASKER
catalini

thanks for all your time on this question!
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23