Bloom Filter intersection prone to Birthday Paradox?
Posted on 2010-08-24
Consider the following abstract idea, for bloom filter matching, that was proposed to me as an alternative to looking up individual candidates one at a time...
Assume we are using a general purpose bloom filter that sets K bits per entry where K is undetermined but > 1
We build a bloom filter for a collection. We then have a candidate list of items we want to check against that filter so we build another filter out of the candidate list. We then perform perform a bit intersection (binary and operation) on the two filters. It is assumed that if any of the bits in the result are set there is a reasonably good probability that one of the items in Set B can be found in set A.
It is my contention that, actually, the probability of a false positive is very high because we are saying if any bit in filter A matches any bit in filter B there is a match. I see this as being no different to the Birthday Paradox where we ask if any person in the room shares any birthday.
Am I wrong or right and (without going beyond simple high school maths - as I am really not that clever) can you prove it? Would this still be an issue if we only set 1 bit per entry?
NB. This is not a "homework question" - check my profile - it is a real world discussion I am having.