Assume a Bloom filter API with two parameters: (1) the number of bits in the Bloom filter (n) and (2) the expected number of insertions (m).
Question:
Will m > n always lead to complete false positives? By "complete" I mean: will every call to the 'contains(element)' method return true once m > n?
1 solution
#1
1
A Bloom filter answers yes to every query not when m > n, but when all n of its bits are 1: every query then inspects h positions (where h is the number of hash functions) and finds h ones. Having m > n does not by itself force that state, because different insertions can set the same bits; it only makes false positives more likely. The typical setup that optimizes the space vs. false positive rate tradeoff is the one where the probability of any given bit being set is 1/2. The analysis is shown in the Wikipedia article on Bloom filters: http://en.wikipedia.org/wiki/Bloom_filter
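To make the point concrete, here is a minimal sketch of a toy Bloom filter in Python. It is not the API from the question: the class name `BloomFilter`, the `is_saturated` and `expected_fill` helpers, and the salted SHA-256 hashing scheme are all assumptions chosen for illustration. It uses the question's letters (n bits, m insertions, h hash functions); note that the Wikipedia article uses m for bits and n for elements. `expected_fill` assumes uniform, independent hash functions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter with n bits and h hash functions (illustrative only)."""

    def __init__(self, n: int, h: int) -> None:
        self.n = n
        self.h = h
        self.bits = [False] * n

    def _positions(self, item: str):
        # Assumed scheme: derive h positions by salting SHA-256 with the hash index.
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def contains(self, item: str) -> bool:
        # True only if all h probed bits are set.
        return all(self.bits[pos] for pos in self._positions(item))

    def is_saturated(self) -> bool:
        # Once every bit is 1, contains() returns True for any input.
        return all(self.bits)


def expected_fill(n: int, h: int, m: int) -> float:
    """Expected fraction of bits set after m insertions (uniform hashing assumed)."""
    return 1.0 - (1.0 - 1.0 / n) ** (h * m)
```

Under these assumptions, `expected_fill(64, 1, 65)` is roughly 0.64, so a filter with one hash function and m = n + 1 insertions is expected to have about a third of its bits still at 0, and `contains` can still return false for elements that were never added.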