Solved

Programming/Interesting algorithm to find clusters of nodes in a map of related nodes

Posted on 2008-10-16
23
567 Views
Last Modified: 2012-05-05
I have a problem that I havent been able to solve, need assistance in helping me go in the right direction.

question is best described by the attached picture.

I have a number of nodes (the circles) and each one is related to another node by inputs on the node referencing another node. each node only references 1 layer of outputs, but can reference multiple nodes. combining all nodes with references creates a node map.

now I need some help finding areas that are segregated from the node map via the 1 output reference, as pictured.

whilst i have provided minimum information, I am looking for suggestions that may push me in the right direction. I initially populate all nodes using an recursive method.

node-map.png
0
Comment
Question by:thydzik
  • 12
  • 11
23 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 22728794
Is the set of 5 nodes under the left output of the left child of the root also a cluster?
Is the set of two nodes under the right child of the left child of the root also a cluster?
Is the set of one node under that also a cluster?
If not, why not?
0
 
LVL 11

Author Comment

by:thydzik
ID: 22728819
yep, sorry. rushed the diagram.
as per your comment, those should all be clusters. and need toa way to find out how to determine them.


node-map-2.png
0
 
LVL 84

Expert Comment

by:ozo
ID: 22729688
#!/usr/bin/perl
use strict;
use warnings;
my %fwd;
my %back;
my %ancestor;
my %descendant;
while( <DATA> ){
    if(  my($from,$to)=split/\W+/ ){
        $fwd{$from}{$to}++;
        $back{$to}{$from}++;
        @{$descendant{$_}}{keys %{$descendant{$to}},$to}=() for $from,keys %{$ancestor{$from}};
        @{$ancestor{$_}}{keys %{$ancestor{$from}},$from}=() for $to,keys %{$descendant{$to}};

    }
}
my @clusters;
for( keys %back ){
      if( !$fwd{$_} ){  #leaf node
          my %set=($_,$_);
          my %new=%set;
          while( my @refs=grep{!exists $set{$_}} map{keys %{$back{$_}}} keys %new ){
              push @clusters,[keys %set] if @refs==1;
              %new=();
              @new{ grep{!exists $set{$_}}map{keys %{$descendant{$_}},$_}@refs}=();
              @set{keys %new}=();
          }
     }
  }
print "[@$_]\n" for @clusters;
__DATA__
A -> B
A -> C
B -> D
E -> D
D -> F
F -> G
E -> H
H -> G
B -> I
I -> J
J -> K
C -> L
C -> M
L -> N
M -> N



Produces the following clusters
[M C N L]
[K]
[J K]
[J K I]
[F E H D G]
[F J K B E H D I G]

0
 
LVL 11

Author Comment

by:thydzik
ID: 22729818
thanks for the reply, but sorry to be a pain, could you please explain the code. or the basic principles

I am not familiar with the language,  I know vb, php, java
0
 
LVL 84

Expert Comment

by:ozo
ID: 22729899
basically, it traces sets back from leaf nodes noting a cluster whenever there is only one arrow from outside the set to inside the set.
0
 
LVL 11

Author Comment

by:thydzik
ID: 22730018
okay, I moderately follow.

trace back references finding the single link, make the start of a cluster.

but how do you approach when the left splits into two?
0
 
LVL 84

Expert Comment

by:ozo
ID: 22734543
when I add a node to a set, I also add all descendants of that node
0
 
LVL 11

Author Comment

by:thydzik
ID: 22737057
still not following

lets say we are trying to find the right hand side cluster of 4 nodes, please walk me through the steps.

we find the bottom leaf node as it has no outputs.
then I transverse up the left parent node, what tests do I perform? do I continue moving up or then transverse the missed right node?
0
 
LVL 84

Expert Comment

by:ozo
ID: 22737088
As I think about it, I think the above program may fail to find some clusters in a more complicated diagram
I assume that
If a node is in a cluster, then all of its descendants are also in that cluster
If a node is in a cluster, and that node has more than one parent, and those parents have a common ancestor,
then there will be a common ancestor in the cluster
0
 
LVL 11

Author Comment

by:thydzik
ID: 22737351
I guess bad drawing on my part, but we can also have the following situation.

but your solution got me thinking.

we find the single cluster nodes as it has no inputs or no outputs.
you transverse to any node adjacent node, as soon as it gets to a node with a path choice (ie multiple connections), you stop and go back to the previous missed path choice.
if you reach a node with the single parent connection and no missed paths, then those nodes are considered a cluster.
if you reach a node with no more paths, then go to anymore missed paths or consider everything as 1 cluster.

can you please confirm?

i have been thinking about avoidance of duplication, because you are starting at each single cluster node, a lot of doubling up will occur. to avoid this if each node with > 3 connections will have a processed tick, if it has been processed it will make this tick true. and any other clusters will stop processing it if already processed.

can you also confirm it will work?

Thanks
node-map-3.png
0
 
LVL 11

Author Comment

by:thydzik
ID: 22737621
okay, the more i think about it the more complicated it gets, and the method above doesn't work.

here is a new node map.
node-map-4.png
0
Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 84

Expert Comment

by:ozo
ID: 22737897
Could you label the nodes either circle or list  the clusters so we can test algorithms against it?
It looks to me like the node in the middle belongs to different clusters that overlap but one is not contained in the other.
(and also to some clusters such that one does contain the other)
In the worst case, testing all possible subsets to see if they are a cluster should work, but may become impractical for large maps
0
 
LVL 84

Assisted Solution

by:ozo
ozo earned 500 total points
ID: 22737913
I think it is still true that there can be no more than one cluster associated with each arrow,
So it should suffice to check each arrow to see if it defines a cluster.
In fact, can we say that every arrow that does not form a loop defines a cluster?
0
 
LVL 11

Author Comment

by:thydzik
ID: 22738023
yes, you are correct no more than one cluster per arrow.

i have updated the diagram, added labels and marked the clusters. I realised that each cluster will have an opposite cluster of the remaining nodes, so I have circled only the single side. made the diagram a little messy though.

node-map-5.png
0
 
LVL 84

Expert Comment

by:ozo
ID: 22738954
> I realised that each cluster will have an opposite cluster of the remaining nodes
So the arrow directions don't matter?
Then every edge that is not part of a cycle divides the graph into two clusters?
0
 
LVL 11

Author Comment

by:thydzik
ID: 22738978
yep, arrow directions don't matter.

Then every edge that is not part of a cycle divides the graph into two clusters?
don't follow.
0
 
LVL 84

Expert Comment

by:ozo
ID: 22739049
The edges between
A and D
B and D
C and D
D and E
E and F
F and L
F and K
F and J
F and L
F and M
L and M

Not the edges between
F and G
G and H
H and I
I and J
because
F G H I
form a cycle
0
 
LVL 11

Author Comment

by:thydzik
ID: 22739119
ahh, yes, a cluster should be connected via the single arrow only

0
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 22743556
So an algorithm could be
for each edge
   find all nodes reachable from one end of the edge without using that edge
   find all nodes reachable from the other  end of the edge without using that edge
end
if the edge is part of a cycle, then the entire graph will still be reachable without using that edge,
but if you consider the entire graph to be a cluster too, then you just need to eliminate the duplicate clusters that the above procedure will find
0
 
LVL 11

Author Comment

by:thydzik
ID: 22746585
okay, that makes sense

but what would be the best way to determine if an edge is part of a cycle?
0
 
LVL 84

Assisted Solution

by:ozo
ozo earned 500 total points
ID: 22746696
While in the process of finding all the nodes reachable from one end of the edge without using that edge,
if you ever reach the node on the other end of that edge, you know you have a cycle.
(this might be ok if a cluster is also formed by all the nodes -- and its opposite cluster, which is none of the nodes? )
But it may be less work to just  stop at that point and handle the all or none cases separately.
0
 
LVL 11

Author Comment

by:thydzik
ID: 22747453
thank you very much for your help.
I have successfully implemented the algorithm.

to summarise,
* looped through all connections (arrows)
* chose either the tail or head node of the arrow (lets say head)
* recursively find all nodes references to the head while populating a cluster set of nodes at the same time
* if any nodes reference the tail node, then stop the iterations, delete the working cluster set and move to the next connection
0
 
LVL 11

Author Closing Comment

by:thydzik
ID: 31506630
thanks again for taking the time to answer this relatively complex problem
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Article by: Nadia
Suppose you use Uber application as a rider and you request a ride to go from one place to another. Your driver just arrived at the parking lot of your place. The only thing you know about the ride is the license plate number. How do you find your U…
Article by: Nadia
Linear search (searching each index in an array one by one) works almost everywhere but it is not optimal in many cases. Let's assume, we have a book which has 42949672960 pages. We also have a table of contents. Now we want to read the content on p…
Illustrator's Shape Builder tool will let you combine shapes visually and interactively. This video shows the Mac version, but the tool works the same way in Windows. To follow along with this video, you can draw your own shapes or download the file…
This video demonstrates how to create an example email signature rule for a department in a company using CodeTwo Exchange Rules. The signature will be inserted beneath users' latest emails in conversations and will be displayed in users' Sent Items…

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now