Link to home
Start Free TrialLog in
Avatar of william007
william007

asked on

Hash Join Algorithm

I have two tables, table implements using List<int[]> structure.
The table has two columns(which mean int array is of size two)

List<int[]> tablea;
List<int[]> tableb;

Content of tablea
 a b
{1,2}
{3,4}
{5,6}

content of tableb
 b c
{2,3}
{2,4}
{6,7}

Now I wish to join two tables by field b, hence the result is
 a b c
{1,2,3}
{1,2,4}
{5,6,7}

I know I can do this is sort merge join algorithms, and the algorithms is quite intuitive.
But I heard hybrid hashjoin(mentioned here http://en.wikipedia.org/wiki/Hash_join#_note-1) is faster.

May I know how to use hybrid hashjoin algorithms to join tablea and tableb in C#(Preferably) or java?
(What I concern here is speed rather than space)
Avatar of oleber
oleber
Flag of Portugal image

I would first of all reconsider the data struture. Is it necessary to have List<int[]>

Or can you have a Hash<int[]>, having the keys has your first column and the value with all the second value.

In this way your work is nothing more then a simple loop in your keys
Avatar of william007
william007

ASKER

Hi, there is no key column in the table, but each row is guarantee to be unique.
SOLUTION
Avatar of Göran Andersson
Göran Andersson
Flag of Sweden image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
look to my idea in Java, notice that I'm considering that you have the 2 array

HashTable<Integer, List<Integer>> hashb = new HashTable<Integer, List<Integer>>();

for (int[] ai: tableb) {
   if (! hashb.contains(ai[0])) {
      hashb.put(ai[0], new List<Integer>);
   }
   hashb.get(ai[0]).add(ai[0]);
}

Now to get your tuples

for (int[] ai: tablea) {
   if (! hashb.contains(ai[1])) {
      for (int i: hashb.get(ai[1])) {
         System.out.println(ai[0]+" "+ ai[1] +" " + i)
      }
   }
}

The complexity shall be m+n since you are passing once each array.

Note: the code shall not be 100% correct, but the idea is there
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks :)
What happed in here?

The guy copy's my solution, transforms the code to C# and gets the points?



How do I filter out this user?
I certainly did not copy your solution!