eXept Software AG Logo

Smalltalk/X Webserver

Documentation of class 'BloomFilter':

Home

Documentation
www.exept.de
Everywhere
for:
[back]

Class: BloomFilter


Inheritance:

   Object
   |
   +--Collection
      |
      +--BloomFilter

Package:
stx:libbasic2
Category:
Collections-Unordered
Version:
rev: 1.3 date: 2019/05/30 09:39:24
user: cg
file: BloomFilter.st directory: libbasic2
module: stx stc-classLibrary: libbasic2

Description:


a BloomFilter (see eg. https://de.wikipedia.org/wiki/Bloomfilter)
is a Set-like data structure which can quickly answer whether
an element is definitely NOT in the set, but might generate false positives
(i.e. it might answer true, although the element is not actually present).

Bloom filters are used for tuning, when accesses (eg. to the disk or database)
are expensive, and can be avoided in many cases (in the NOT present case),
and false positives only lead to additional useless queries.

Bloom filters cannot remove or enumerate elements. 
The only useful operations are adding and inclusion testing (includes).

For example, a database query (eg. 'is something present'), 
might make use of a bloom filter, to avoid the query in many cases.

Bloom filters false positive rate degrades as the number of elements is added,
thus a good choice of its initial size is mandatory.


Class protocol:

instance creation
o  new: initialNumberOfBits
(comment from inherited method)
return an instance of myself with anInteger indexed variables

o  new: initialNumberOfBits hashes: hashActionBlocks


Instance protocol:

adding
o  add: aString
add the arguemnt, aString to the set

blocked
o  do: aBlock
(comment from inherited method)
evaluate the argument, aBlock for each element

o  remove: aString
raise an error - elements cannot be removed

initialization
o  numberOfBits: initialNumberOfBits hashes: hashingBlocksArg

queries
o  includes: aString
returns false, if the arguemnt, aString is definitly NOT in the set.
However, it may return true as a false positive


Examples:


<<END
    |filter set nFalsePositive|

    filter := BloomFilter new:10000.
    set := Set new:10000.
    
    "/ add all words from the Collection class's source code
    Collection sourceStream 
        linesDo:[:eachLine |
            eachLine 
               splitOn:[:ch | ch isSeparator]
               do:[:word | 
                  filter add:word.
                  set add:word.
               ].
        ];
        close.
    
    "/ check some which are known to be present
    set do:[:w |
        self assert:(filter includes:w)
    ].    
    "/ check for false positives
    nFalsePositive :=
        (1 to:1000) count:[:n |
            |randomWord isFalsePositive|

            isFalsePositive := false.
            randomWord := (1 to:6) collect:[:i | ($a to: $z) atRandom] as:String.
            (set includes:randomWord) ifFalse:[
                (filter includes:randomWord) ifTrue:[
                    Transcript showCR:'false positive: ',randomWord.
                    isFalsePositive := true.
                ].    
            ].
            isFalsePositive
        ].
    Transcript show:nFalsePositive; showCR:' false positives'.
    filter inspect.
END>>"

ST/X 7.2.0.0; WebServer 1.670 at bd0aa1f87cdd.unknown:8081; Thu, 19 Sep 2019 09:01:51 GMT