william007
asked on
Hash Join Algorithm
I have two tables, table implements using List<int[]> structure.
The table has two columns(which mean int array is of size two)
List<int[]> tablea;
List<int[]> tableb;
Content of tablea
a b
{1,2}
{3,4}
{5,6}
content of tableb
b c
{2,3}
{2,4}
{6,7}
Now I wish to join two tables by field b, hence the result is
a b c
{1,2,3}
{1,2,4}
{5,6,7}
I know I can do this is sort merge join algorithms, and the algorithms is quite intuitive.
But I heard hybrid hashjoin(mentioned here http://en.wikipedia.org/wiki/Hash_join#_note-1) is faster.
May I know how to use hybrid hashjoin algorithms to join tablea and tableb in C#(Preferably) or java?
(What I concern here is speed rather than space)
The table has two columns(which mean int array is of size two)
List<int[]> tablea;
List<int[]> tableb;
Content of tablea
a b
{1,2}
{3,4}
{5,6}
content of tableb
b c
{2,3}
{2,4}
{6,7}
Now I wish to join two tables by field b, hence the result is
a b c
{1,2,3}
{1,2,4}
{5,6,7}
I know I can do this is sort merge join algorithms, and the algorithms is quite intuitive.
But I heard hybrid hashjoin(mentioned here http://en.wikipedia.org/wiki/Hash_join#_note-1) is faster.
May I know how to use hybrid hashjoin algorithms to join tablea and tableb in C#(Preferably) or java?
(What I concern here is speed rather than space)
ASKER
Hi, there is no key column in the table, but each row is guarantee to be unique.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
look to my idea in Java, notice that I'm considering that you have the 2 array
HashTable<Integer, List<Integer>> hashb = new HashTable<Integer, List<Integer>>();
for (int[] ai: tableb) {
if (! hashb.contains(ai[0])) {
hashb.put(ai[0], new List<Integer>);
}
hashb.get(ai[0]).add(ai[0] );
}
Now to get your tuples
for (int[] ai: tablea) {
if (! hashb.contains(ai[1])) {
for (int i: hashb.get(ai[1])) {
System.out.println(ai[0]+" "+ ai[1] +" " + i)
}
}
}
The complexity shall be m+n since you are passing once each array.
Note: the code shall not be 100% correct, but the idea is there
HashTable<Integer, List<Integer>> hashb = new HashTable<Integer, List<Integer>>();
for (int[] ai: tableb) {
if (! hashb.contains(ai[0])) {
hashb.put(ai[0], new List<Integer>);
}
hashb.get(ai[0]).add(ai[0]
}
Now to get your tuples
for (int[] ai: tablea) {
if (! hashb.contains(ai[1])) {
for (int i: hashb.get(ai[1])) {
System.out.println(ai[0]+"
}
}
}
The complexity shall be m+n since you are passing once each array.
Note: the code shall not be 100% correct, but the idea is there
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks :)
What happed in here?
The guy copy's my solution, transforms the code to C# and gets the points?
How do I filter out this user?
The guy copy's my solution, transforms the code to C# and gets the points?
How do I filter out this user?
I certainly did not copy your solution!
Or can you have a Hash<int[]>, having the keys has your first column and the value with all the second value.
In this way your work is nothing more then a simple loop in your keys