catalini
asked on
Breadth First Search reading data from a mysql table
I'm trying to calculate distance on a graph using the Breadth First Search algorithm. I've seen plenty of implementations online, but I need one that would allow me to read data from my mysql database.
I've a table structured like this:
INDIVIDUAL1 INDIVIDUAL2
-------------------------- ---------- ---
AH2223423 AS38423842
AH2223423 HS2772722
AH2223423 IASD727272
HGS282828 AS38423842
where every row corresponds to an edge (e.g. row1 means that AH2223423 and AS38423842 are connected) on the graph and every INDIVIDUAL (e.g. AH2223423) is a vertex
If necessary I could export the table as csv and use it directly as a file.
I've a table structured like this:
INDIVIDUAL1 INDIVIDUAL2
--------------------------
AH2223423 AS38423842
AH2223423 HS2772722
AH2223423 IASD727272
HGS282828 AS38423842
where every row corresponds to an edge (e.g. row1 means that AH2223423 and AS38423842 are connected) on the graph and every INDIVIDUAL (e.g. AH2223423) is a vertex
If necessary I could export the table as csv and use it directly as a file.
Is it an directed graph, or should we assume a path from AS38423842 to AH2223423
even if there is no entry like
INDIVIDUAL1 INDIVIDUAL2
-------------------------- ---------- ---
AS38423842 AH2223423
even if there is no entry like
INDIVIDUAL1 INDIVIDUAL2
--------------------------
AS38423842 AH2223423
It looks as if it is an unweighted graph... meaning each edge is the same distance. Is this correct?
If the database is small enough that the entire graph can fit in memory, I would first create a hash, where the key is the vertex, and the value is a hash ref or array ref of neighbors. This would then make it very easy to do the search.
What part would you like help with?
If the database is small enough that the entire graph can fit in memory, I would first create a hash, where the key is the vertex, and the value is a hash ref or array ref of neighbors. This would then make it very easy to do the search.
What part would you like help with?
ASKER
ozo:
it is undirected, if there is a path between
INDIVIDUAL1 INDIVIDUAL2
-------------------------- ---------- ---
AS38423842 AH2223423
then the reverse is also true (i.e AH2223423 is connected to AS38423842)
adam314:
yes each edge is the same distance. the table is 240,000 rows (edges)
I've found a nice python script http://code.google.com/p/python-graph/
but would like to know how to use it in case with my mysql table...
(if you know a better script let me know!)
thanks!!!
it is undirected, if there is a path between
INDIVIDUAL1 INDIVIDUAL2
--------------------------
AS38423842 AH2223423
then the reverse is also true (i.e AH2223423 is connected to AS38423842)
adam314:
yes each edge is the same distance. the table is 240,000 rows (edges)
I've found a nice python script http://code.google.com/p/python-graph/
but would like to know how to use it in case with my mysql table...
(if you know a better script let me know!)
thanks!!!
#!/usr/bin/env python
# Copyright (c) 2007 Pedro Matiello <pmatiello@gmail.com>
# License: MIT (see COPYING file)
import graph
print "Example"
# Graph creation
gr = graph.graph()
# Add nodes and edges
gr.add_nodes(["Portugal","Spain","France","Germany","Belgium","Netherlands","Italy"])
gr.add_nodes(["England","Ireland","Scotland","Wales"])
gr.add_nodes(["Atlantis","Wonderland","Brazil","Lemuria","Ys"])
gr.add_edge("Portugal", "Spain", wt=1)
gr.add_edge("Spain","France", wt=3)
gr.add_edge("France","Belgium", wt=2)
gr.add_edge("France","Germany", wt=3)
gr.add_edge("France","Italy", wt=3)
gr.add_edge("Belgium","Netherlands", wt=1)
gr.add_edge("Germany","Belgium", wt=2)
gr.add_edge("Germany","Netherlands", wt=2)
gr.add_edge("Germany","Italy", wt=3)
gr.add_edge("England","Wales", wt=1)
gr.add_edge("England","Scotland", wt=2)
gr.add_edge("Scotland","Wales", wt=1)
gr.add_edge("Atlantis","Wonderland")
gr.add_arrow("Atlantis","Brazil")
gr.add_edge("Atlantis","Lemuria")
gr.add_arrow("Lemuria","Ys")
gr.add_arrow("Ys","Wonderland")
gr.add_arrow("Ys","Brazil")
print "------------------------------------------------------------------------"
print "Breadth First Search"
print gr.breadth_first_search()
ASKER
one problem with the script above is that it just returns the connected elements, not the distance on the graph...
You could use the DBI module to connect to your database
http://search.cpan.org/~timb/DBI-1.604/DBI.pm
And the Graph module to create a graph
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod
And the Graph::Traversal::BFS to do a breadth-first-search
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph/Traversal/BFS.pm
Each of those modules has documentation. If you would like some sample code using those modules, let me know.
http://search.cpan.org/~timb/DBI-1.604/DBI.pm
And the Graph module to create a graph
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod
And the Graph::Traversal::BFS to do a breadth-first-search
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph/Traversal/BFS.pm
Each of those modules has documentation. If you would like some sample code using those modules, let me know.
Here is some sample code...
use DBI;
use Graph;
use Graph::Traversal::BFS;
#Connect to database
#you will need to define the $data_source, $username, and $password
my $dbh = DBI->connect($data_source, $username, $password) or die "Could not connect: $DBI::errstr\n";
#Created undirected graph
my $graph = Graph->new(undirected=>1);
#Get data from database
my $sth = $dbh->prepare("SELECT INDIVIDUAL1, INDIVIDUAL2 FROM table")
or die "Could not prepare: $DBI::errstr\n";
$sth->execute;
#create and edge according to db (a vertex is automatically created if it doesn't already exist)
while(my ($a, $b) = $sth->fetchrow_array) {
$graph->add_edge($a, $b);
}
#Create traversal object (see documentation for details)
my $b = Graph::Traversal::BFS->new($graph);
$b->bfs;
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
ozo:
I get a syntax error at distance.pl line 4, near "){"
I changed the connect line to
DBI->connect("dbi:mysql:da tabase=tes t;host=loc alhost;use r=root;pas sword=#### ####")
I get a syntax error at distance.pl line 4, near "){"
I changed the connect line to
DBI->connect("dbi:mysql:da
ASKER
adam:
is there a way to save the distances between each vertex pair back into a mysql table?
is there a way to save the distances between each vertex pair back into a mysql table?
ASKER
adam:
maybe I could use this?
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph/AdjacencyMatrix.pm
not sure how to add it to your code...
thanks!
maybe I could use this?
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph/AdjacencyMatrix.pm
not sure how to add it to your code...
thanks!
Saving back to mysql, do you want to create a new table with every possible vertex combination, and the distance to them?
ASKER
yes that would be great! I already have a table called DISTANCES with structure
INDIVIDUAL1, INDIVIDUAL2, DISTANCE
INDIVIDUAL1, INDIVIDUAL2, DISTANCE
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
adam, thanks! the procedure is almost perfect. Do you think there is a way to write only the positive distances? in this way it's writing also all the distances that are infinite (NULL)... that would probably speed things a little bit,
thanks!
thanks!
ASKER
furthermore it seems to crash my system, using all available resources (memory and 4GB of swap)... :-(
ASKER
thanks!
To skip those with NULL distances, you could put this before line 28:
my $len = $apsp->path_length($a, $b);
next unless $len > 0;
As for it crashing... computing the shortest distance from each vertex to each is very memory/time consuming. I'm not sure what to do about that. One thing that might help is to compute the shortest distance from one vertex to each, and add that to the database, then release it from memory, and go to the next node. Look in the Single-Source Shortest Path section:
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod#Single-Source_Shortest_Paths_(SSSP)
my $len = $apsp->path_length($a, $b);
next unless $len > 0;
As for it crashing... computing the shortest distance from each vertex to each is very memory/time consuming. I'm not sure what to do about that. One thing that might help is to compute the shortest distance from one vertex to each, and add that to the database, then release it from memory, and go to the next node. Look in the Single-Source Shortest Path section:
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod#Single-Source_Shortest_Paths_(SSSP)
ASKER
probably SSSP_Dijkstra is the way to go...
I've opened a similar question to solve this problem, maybe you can take a look ?
https://www.experts-exchange.com/questions/23433195/calculating-distance-on-a-huge-graph.html
thanks adam
I've opened a similar question to solve this problem, maybe you can take a look ?
https://www.experts-exchange.com/questions/23433195/calculating-distance-on-a-huge-graph.html
thanks adam
ASKER
http://code.google.com/p/python-graph/
but I'm not sure how to load the data....