Hash Join Algorithm

I have two tables, table implements using List<int[]> structure.
The table has two columns(which mean int array is of size two)

List<int[]> tablea;
List<int[]> tableb;

Content of tablea
 a b
{1,2}
{3,4}
{5,6}

content of tableb
 b c
{2,3}
{2,4}
{6,7}

Now I wish to join two tables by field b, hence the result is
 a b c
{1,2,3}
{1,2,4}
{5,6,7}

I know I can do this is sort merge join algorithms, and the algorithms is quite intuitive.
But I heard hybrid hashjoin(mentioned here http://en.wikipedia.org/wiki/Hash_join#_note-1) is faster.

May I know how to use hybrid hashjoin algorithms to join tablea and tableb in C#(Preferably) or java?
(What I concern here is speed rather than space)
LVL 9
william007Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

oleberCommented:
I would first of all reconsider the data struture. Is it necessary to have List<int[]>

Or can you have a Hash<int[]>, having the keys has your first column and the value with all the second value.

In this way your work is nothing more then a simple loop in your keys
0
william007Author Commented:
Hi, there is no key column in the table, but each row is guarantee to be unique.
0
Göran AnderssonCommented:
Just make a dictionary from tablea with the b value as key, then loop through tableb and look for matches.

If you expect a big difference in size, you should make a dictionary of the smaller table, but otherwise it doesn't matter much.

The value that you use as key in the dictionary has to be unique, though. Otherwise you will need to use a Dictionary<int, List<int[]>> and collect all the items with the same key in the list, then when joining them loop throught the list and add a line to the result for each item in the list.
Dictionary<int, int[]> hashA = new Dictionary<int, int[]>();
foreach (int[] a in tablea) {
   hashA.Add(a[1], a);
}
List<int[]> result = new List<int[]>();
foreach (int[] b in tableb) {
   int[] a;
   if (hashA.TryGetValue(b[0], out a) {
      result.Add(new int[] { a[0], a[1], b[1] });
   }
}

Open in new window

0
Cloud Class® Course: Microsoft Exchange Server

The MCTS: Microsoft Exchange Server 2010 certification validates your skills in supporting the maintenance and administration of the Exchange servers in an enterprise environment. Learn everything you need to know with this course.

oleberCommented:
look to my idea in Java, notice that I'm considering that you have the 2 array

HashTable<Integer, List<Integer>> hashb = new HashTable<Integer, List<Integer>>();

for (int[] ai: tableb) {
   if (! hashb.contains(ai[0])) {
      hashb.put(ai[0], new List<Integer>);
   }
   hashb.get(ai[0]).add(ai[0]);
}

Now to get your tuples

for (int[] ai: tablea) {
   if (! hashb.contains(ai[1])) {
      for (int i: hashb.get(ai[1])) {
         System.out.println(ai[0]+" "+ ai[1] +" " + i)
      }
   }
}

The complexity shall be m+n since you are passing once each array.

Note: the code shall not be 100% correct, but the idea is there
0
Göran AnderssonCommented:
The ode that can handle non-unique key values, would be like this:
Dictionary<int, List<int[]>> hashA = new Dictionary<int, List<int[]>>();
foreach (int[] a in tablea) {
   List<int> list;
   if (!hashA.TryGetValue(a[1], out list) {
      list = new List<int>();
      hashA.Add(a[1], list);
   }
   list.Add(a);
}
List<int[]> result = new List<int[]>();
foreach (int[] b in tableb) {
   List<int[]> a;
   if (hashA.TryGetValue(b[0], out a) {
      foreach (int[] line in a) {
         result.Add(new int[] { line[0], line[1], b[1] });
      }
   }
}

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
william007Author Commented:
Thanks :)
0
oleberCommented:
What happed in here?

The guy copy's my solution, transforms the code to C# and gets the points?



How do I filter out this user?
0
Göran AnderssonCommented:
I certainly did not copy your solution!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Algorithms

From novice to tech pro — start learning today.