Hash Join Algorithm

I have two tables, table implements using List<int[]> structure.
The table has two columns(which mean int array is of size two)

List<int[]> tablea;
List<int[]> tableb;

Content of tablea
 a b
{1,2}
{3,4}
{5,6}

content of tableb
 b c
{2,3}
{2,4}
{6,7}

Now I wish to join two tables by field b, hence the result is
 a b c
{1,2,3}
{1,2,4}
{5,6,7}

I know I can do this is sort merge join algorithms, and the algorithms is quite intuitive.
But I heard hybrid hashjoin(mentioned here http://en.wikipedia.org/wiki/Hash_join#_note-1) is faster.

May I know how to use hybrid hashjoin algorithms to join tablea and tableb in C#(Preferably) or java?
(What I concern here is speed rather than space)
LVL 9
william007Asked:
Who is Participating?

[Webinar] Streamline your web hosting managementRegister Today

x
 
Göran AnderssonConnect With a Mentor Commented:
The ode that can handle non-unique key values, would be like this:
Dictionary<int, List<int[]>> hashA = new Dictionary<int, List<int[]>>();
foreach (int[] a in tablea) {
   List<int> list;
   if (!hashA.TryGetValue(a[1], out list) {
      list = new List<int>();
      hashA.Add(a[1], list);
   }
   list.Add(a);
}
List<int[]> result = new List<int[]>();
foreach (int[] b in tableb) {
   List<int[]> a;
   if (hashA.TryGetValue(b[0], out a) {
      foreach (int[] line in a) {
         result.Add(new int[] { line[0], line[1], b[1] });
      }
   }
}

Open in new window

0
 
oleberCommented:
I would first of all reconsider the data struture. Is it necessary to have List<int[]>

Or can you have a Hash<int[]>, having the keys has your first column and the value with all the second value.

In this way your work is nothing more then a simple loop in your keys
0
 
william007Author Commented:
Hi, there is no key column in the table, but each row is guarantee to be unique.
0
Never miss a deadline with monday.com

The revolutionary project management tool is here!   Plan visually with a single glance and make sure your projects get done.

 
Göran AnderssonConnect With a Mentor Commented:
Just make a dictionary from tablea with the b value as key, then loop through tableb and look for matches.

If you expect a big difference in size, you should make a dictionary of the smaller table, but otherwise it doesn't matter much.

The value that you use as key in the dictionary has to be unique, though. Otherwise you will need to use a Dictionary<int, List<int[]>> and collect all the items with the same key in the list, then when joining them loop throught the list and add a line to the result for each item in the list.
Dictionary<int, int[]> hashA = new Dictionary<int, int[]>();
foreach (int[] a in tablea) {
   hashA.Add(a[1], a);
}
List<int[]> result = new List<int[]>();
foreach (int[] b in tableb) {
   int[] a;
   if (hashA.TryGetValue(b[0], out a) {
      result.Add(new int[] { a[0], a[1], b[1] });
   }
}

Open in new window

0
 
oleberCommented:
look to my idea in Java, notice that I'm considering that you have the 2 array

HashTable<Integer, List<Integer>> hashb = new HashTable<Integer, List<Integer>>();

for (int[] ai: tableb) {
   if (! hashb.contains(ai[0])) {
      hashb.put(ai[0], new List<Integer>);
   }
   hashb.get(ai[0]).add(ai[0]);
}

Now to get your tuples

for (int[] ai: tablea) {
   if (! hashb.contains(ai[1])) {
      for (int i: hashb.get(ai[1])) {
         System.out.println(ai[0]+" "+ ai[1] +" " + i)
      }
   }
}

The complexity shall be m+n since you are passing once each array.

Note: the code shall not be 100% correct, but the idea is there
0
 
william007Author Commented:
Thanks :)
0
 
oleberCommented:
What happed in here?

The guy copy's my solution, transforms the code to C# and gets the points?



How do I filter out this user?
0
 
Göran AnderssonCommented:
I certainly did not copy your solution!
0
All Courses

From novice to tech pro — start learning today.