eXept Software AG Logo

Smalltalk/X Webserver

Documentation of class 'FuzzyMatcher':

Home

Documentation
www.exept.de
Everywhere
for:
[back]

Class: FuzzyMatcher


Inheritance:

   Object
   |
   +--FuzzyMatcher

Package:
stx:libbasic2
Category:
Collections-Text-Support
Version:
rev: 1.14 date: 2023/12/13 11:04:54
user: cg
file: FuzzyMatcher.st directory: libbasic2
module: stx stc-classLibrary: libbasic2

Description:


FuzzyMatcher is an approximate string matching algorithm that can determine if a string includes a given pattern.
For example, the string 'axby' matches both the pattern 'ab' and, 'ay', but not 'ba'. 
I.e. it matches if the searched string contains a sequence of chars, probably intermixed by other chars,
which matches the given search pattern or part of it.

The algorithm is based on lib_fts[1], and includes an optional scoring algorithm 
that can be used to sort all the matches based on their similarity to the pattern.
It is used (among others) in the sublime text editor.

[caveat:]
    although this works great for class searches,
    it is strange that 'dabc' scores lower against 'abc' than 'adbc'
    (dabc has a longer common subsequence without interruptions...)

copyright

COPYRIGHT (c) 2018 by eXept Software AG All Rights Reserved This software is furnished under a license and may be used only in accordance with the terms of that license and with the inclusion of the above copyright notice. This software may not be provided or otherwise made available to, or used by, any other person. No title to or ownership of the software is hereby transferred.

example

|matcher| matcher := FuzzyMatcher pattern:'abc'. matcher match:'somearbitrarysequence' ifScored: [:score | Transcript show:('''somearbitrarysequence'' scores '); showCR:score]. matcher match:'someabcd' ifScored: [:score | Transcript show:('''someabcd'' scores '); showCR:score]. matcher match:'abcd' ifScored: [:score | Transcript show:('''abcd'' scores '); showCR:score]. matcher match:'dabc' ifScored: [:score | Transcript show:('''dabc'' scores '); showCR:score]. matcher match:'adbc' ifScored: [:score | Transcript show:('''adbc'' scores '); showCR:score]. matcher match:'abc' ifScored: [:score | Transcript show:('''abc'' scores '); showCR:score]. |top lv list field patternHolder names| patternHolder := '' asValue. list := List new. top := StandardSystemView new. lv := ListView origin:(0.0@30) corner:(1.0@1.0) in:top. lv model:list. field := EditField origin:(0.0@0.0) corner:(1.0@30) in:top. field model:patternHolder. field immediateAccept:true. names := Smalltalk allClasses collect:#name. patternHolder onChangeEvaluate:[ |matcher pattern matches| pattern := patternHolder value. pattern notEmpty ifTrue:[ matcher := FuzzyMatcher pattern:pattern. matches := OrderedCollection new. names do:[:eachClassName | matcher match:eachClassName ifScored: [ :score | matches add: eachClassName -> score ] ]. matches sort:[:a :b | a value < b value or:[ a value = b value and:[ a key > b key]] ]. list removeAll. list addAllReversed:(matches collect:[:nameScoreAssoc | '[%1] %2' bindWith:nameScoreAssoc value with:nameScoreAssoc key]) ]. ]. top open. patternHolder value:'mph'.

Class protocol:

instance creation
o  new
return an initialized instance

o  pattern: aString
(self pattern:'mrp') matches:'ButtonMorph'
(self pattern:'mrp') matches:'ButtonMorh'

(self pattern:'mrp') matches:'ButtonMorph'
(self pattern:'mrp') matches:'ButtonMorh'

utilities api
o  allMatching: aPattern in: aCollection
Assumes that the collection is a collection of Strings;
return all those which match

Usage example(s):

     self 
        allMatching:'clu' 
        in:(Smalltalk allClasses collect:#name)

o  allMatching: aPattern in: aCollection by: aBlockReturningString
selects matching elements from aCollection.
aBlockReturningString is applied to elements to get the string representation
(can be used eg. to sort classes)

Usage example(s):

         self 
            allMatching:'clu' 
            in:(Smalltalk allClasses)
            by:[:cls | cls name]

Usage example(s):

         self 
            allMatching:'clu' 
            in:(Smalltalk allClasses)
            by:#name

o  allSortedByScoreMatching: aPattern in: aCollection
Assumes that the collection is a collection of Strings;
returns matching strings sorted by score (level of similarity)

Usage example(s):

     self 
        allSortedByScoreMatching:'clu' 
        in:(Smalltalk allClasses collect:#name)

Usage example(s):

     self 
        allSortedByScoreMatching:'nary' 
        in:(Smalltalk allClasses collect:#name)

o  allSortedByScoreMatching: aPattern in: aCollection by: aBlockReturningString
selects matching elements from aCollection.
aBlockReturningString is applied to elements to get the string representation.
Returns them sorted by score (i.e. similarity).
(can be used eg. to sort classes)

Usage example(s):

     self 
        allSortedByScoreMatching:'' 
        in:(Smalltalk allClasses)
        by:[:cls | cls name]

Usage example(s):

     self 
        allSortedByScoreMatching:'nary' 
        in:(Smalltalk allClasses)
        by:[:cls | cls name]

Usage example(s):

     self 
        allSortedByScoreMatching:'nary' 
        in:(Smalltalk allClasses)
        by:#name

o  allWithScoresSortedByScoreMatching: aPattern in: aCollection by: aBlockReturningString
selects matching elements from aCollection.
aBlockReturningString is applied to elements to get the string representation.
Returns them sorted by score (i.e. similarity) associated to their scores.
(can be used eg. to sort classes)

Usage example(s):

     self 
        allWithScoresSortedByScoreMatching:'' 
        in:(Smalltalk allClasses)
        by:[:cls | cls name]

Usage example(s):

     self 
        allWithScoresSortedByScoreMatching:'OC' 
        in:(Smalltalk allClasses)
        by:[:cls | cls name]

Usage example(s):

     self 
        allWithScoresSortedByScoreMatching:'nary' 
        in:(Smalltalk allClasses)
        by:[:cls | cls name]

Usage example(s):

     self 
        allWithScoresSortedByScoreMatching:'nary' 
        in:(Smalltalk allClasses)
        by:#name


Instance protocol:

accessing
o  indexes
only valid inside the match callback block

o  pattern

o  pattern: aString
Modified (format): / 14-07-2017 / 12:59:15 / cg

api - comparing
o  match: aString ifScored: aBlock
If there is a match, evaluate aBlock, passing the score value

o  matchScoreOrNil: aString
return the scrore if there is a match; nil otherwise.

o  matches: aString
return true if there is a match; false otherwise.

initialization
o  initialize
Modified (format): / 14-07-2017 / 13:23:26 / cg

private
o  firstScore: aString at: anIndex

o  indexScore
Modified (format): / 14-07-2017 / 13:24:07 / cg

o  isSeparator: aCharacter

o  score: aString at: stringIndex patternAt: patternIndex

scoring-bonus
o  adjacencyBonus

o  adjacencyIncrease

o  adjacentCaseEqualBonus

o  caseEqualBonus

o  firstLetterBonus

o  separatorBonus

scoring-penalty
o  leadingLetterPenalty

o  maxLeadingLetterPenalty

o  unmatchedLetterPenalty



ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Mon, 18 Nov 2024 04:41:07 GMT