Link to home
Start Free TrialLog in
Avatar of catalini
catalini

asked on

Breadth First Search reading data from a mysql table

I'm trying to calculate distance on a graph using the Breadth First Search algorithm. I've seen plenty of implementations online, but I need one that would allow me to read data from my mysql database.

I've a table structured like this:

INDIVIDUAL1         INDIVIDUAL2
---------------------------------------
AH2223423           AS38423842
AH2223423           HS2772722
AH2223423           IASD727272
HGS282828          AS38423842

where every row corresponds to an edge (e.g. row1 means that  AH2223423 and AS38423842 are connected) on the graph and every INDIVIDUAL (e.g. AH2223423) is a vertex

If necessary I could export the table as csv and use it directly as a file.
Avatar of catalini
catalini

ASKER

probably the best code is here

http://code.google.com/p/python-graph/

but I'm not sure how to load the data....
Avatar of ozo
Is it an directed graph, or should we assume a path from  AS38423842 to AH2223423
even if there is no entry like
INDIVIDUAL1         INDIVIDUAL2
---------------------------------------
AS38423842         AH2223423
It looks as if it is an unweighted graph... meaning each edge is the same distance.  Is this correct?
If the database is small enough that the entire graph can fit in memory, I would first create a hash, where the key is the vertex, and the value is a hash ref or array ref of neighbors.  This would then make it very easy to do the search.

What part would you like help with?
ozo:
it is undirected, if there is a path between

INDIVIDUAL1         INDIVIDUAL2
---------------------------------------
AS38423842         AH2223423

then the reverse is also true (i.e  AH2223423 is connected to AS38423842)


adam314:
yes each edge is the same distance. the table is 240,000 rows (edges)
I've found a nice python script http://code.google.com/p/python-graph/ 
but would like to know how to use it in case with my mysql table...
(if you know a better script let me know!)

thanks!!!



#!/usr/bin/env python
 
# Copyright (c) 2007 Pedro Matiello <pmatiello@gmail.com>
# License: MIT (see COPYING file)
 
import graph
 
print "Example"
# Graph creation
gr = graph.graph()
 
# Add nodes and edges
gr.add_nodes(["Portugal","Spain","France","Germany","Belgium","Netherlands","Italy"])
gr.add_nodes(["England","Ireland","Scotland","Wales"])
gr.add_nodes(["Atlantis","Wonderland","Brazil","Lemuria","Ys"])
gr.add_edge("Portugal", "Spain", wt=1)
gr.add_edge("Spain","France", wt=3)
gr.add_edge("France","Belgium", wt=2)
gr.add_edge("France","Germany", wt=3)
gr.add_edge("France","Italy", wt=3)
gr.add_edge("Belgium","Netherlands", wt=1)
gr.add_edge("Germany","Belgium", wt=2)
gr.add_edge("Germany","Netherlands", wt=2)
gr.add_edge("Germany","Italy", wt=3)
gr.add_edge("England","Wales", wt=1)
gr.add_edge("England","Scotland", wt=2)
gr.add_edge("Scotland","Wales", wt=1)
 
gr.add_edge("Atlantis","Wonderland")
gr.add_arrow("Atlantis","Brazil")
gr.add_edge("Atlantis","Lemuria")
gr.add_arrow("Lemuria","Ys")
gr.add_arrow("Ys","Wonderland")
gr.add_arrow("Ys","Brazil")
 
print "------------------------------------------------------------------------"
print "Breadth First Search"
print gr.breadth_first_search()

Open in new window

one problem with the script above is that it just returns the connected elements, not the distance on the graph...
You could use the DBI module to connect to your database
http://search.cpan.org/~timb/DBI-1.604/DBI.pm

And the Graph module to create a graph
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod

And the Graph::Traversal::BFS to do a breadth-first-search
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph/Traversal/BFS.pm

Each of those modules has documentation.  If you would like some sample code using those modules, let me know.
Here is some sample code...
use DBI;
use Graph;
use Graph::Traversal::BFS;
 
#Connect to database
#you will need to define the $data_source, $username, and $password
my $dbh = DBI->connect($data_source, $username, $password) or die "Could not connect: $DBI::errstr\n";
 
#Created undirected graph
my $graph = Graph->new(undirected=>1);
 
#Get data from database
my $sth = $dbh->prepare("SELECT INDIVIDUAL1, INDIVIDUAL2 FROM table")
  or die "Could not prepare: $DBI::errstr\n";
$sth->execute;
 
#create and edge according to db (a vertex is automatically created if it doesn't already exist)
while(my ($a, $b) = $sth->fetchrow_array) {
	$graph->add_edge($a, $b);
}
 
#Create traversal object (see documentation for details)
my $b = Graph::Traversal::BFS->new($graph);
$b->bfs;

Open in new window

SOLUTION
Avatar of ozo
ozo
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ozo:

I get a syntax error at distance.pl line 4, near "){"

I changed the connect line to

DBI->connect("dbi:mysql:database=test;host=localhost;user=root;password=########")


adam:

is there a way to save the distances between each vertex pair back into a mysql table?
adam:

maybe I could use this?

http://search.cpan.org/~jhi/Graph-0.84/lib/Graph/AdjacencyMatrix.pm

not sure how to add it to your code...

thanks!
Saving back to mysql, do you want to create a new table with every possible vertex combination, and the distance to them?
yes that would be great! I already have a table called DISTANCES with structure

INDIVIDUAL1, INDIVIDUAL2, DISTANCE
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
adam, thanks! the procedure is almost perfect. Do you think there is a way to write only the positive distances? in this way it's writing also all the distances that are infinite (NULL)... that would probably speed things a little bit,

thanks!
furthermore it seems to crash my system, using all available resources (memory and 4GB of swap)...   :-(
thanks!
To skip those with NULL distances, you could put this before line 28:
    my $len = $apsp->path_length($a, $b);
    next unless $len > 0;

As for it crashing... computing the shortest distance from each vertex to each is very memory/time consuming.  I'm not sure what to do about that.  One thing that might help is to compute the shortest distance from one vertex to each, and add that to the database, then release it from memory, and go to the next node.  Look in the Single-Source Shortest Path section:
http://search.cpan.org/~jhi/Graph-0.84/lib/Graph.pod#Single-Source_Shortest_Paths_(SSSP)
probably SSSP_Dijkstra  is the way to go...
I've opened a similar question to solve this problem, maybe you can take a look ?
https://www.experts-exchange.com/questions/23433195/calculating-distance-on-a-huge-graph.html

thanks adam