
Smalltalk/X Webserver

Documentation of class 'StringUtilities':

Class: StringUtilities


Inheritance:

   Object
   |
   +--StringUtilities

Package:
stx:libbasic2
Category:
Collections-Text-Support
Version:
rev: 1.14 date: 2022/12/15 11:58:13
user: stefan
file: StringUtilities.st directory: libbasic2
module: stx stc-classLibrary: libbasic2

Description:


Some less frequently used algorithms have been moved here
to keep libbasic compact.

copyright

COPYRIGHT (c) 1994 by Claus Gittinger COPYRIGHT (c) 2009 by eXept Software AG All Rights Reserved This software is furnished under a license and may be used only in accordance with the terms of that license and with the inclusion of the above copyright notice. This software may not be provided or otherwise made available to, or used by, any other person. No title to or ownership of the software is hereby transferred.

Class protocol:

edit distance
o  editDistanceFrom: s1 to: s2 s: substWeight k: kbdTypoWeight c: caseWeight e: exchangeWeight i: insrtWeight
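Another, simpler edit distance between two strings.
The weight arguments are named as in levenshteinDistanceFrom:to:s:k:c:e:i:d:;
this variant takes no deletion weight.
See also: levenshtein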

Usage example(s):

     'comptuer' levenshteinTo:'computer'                                    -> 8
     self editDistanceFrom:'comptuer' to:'computer' s:4 k:2 c:1 e:nil i:2   -> 8     

     'computr' levenshteinTo:'computer'                                     -> 2
     self editDistanceFrom:'computr' to:'computer' s:4 k:2 c:1 e:nil i:2    -> 2    

     'computer' levenshteinTo:'computre'                                    -> 8
     self editDistanceFrom:'computer' to:'computre' s:4 k:2 c:1 e:nil i:2   -> 8     

     'copmuter' levenshteinTo:'computer'                                    -> 8
     self editDistanceFrom:'copmuter' to:'computer' s:4 k:2 c:1 e:nil i:2   -> 8
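The listed results can be checked by hand for three of the examples. The arithmetic below assumes that each swapped pair is charged as two plain substitutions (weight s=4, since t/u and p/m are not adjacent keys on a QWERTY keyboard) and that a missing character is charged as one insertion (weight i=2). It is a hand check of the numbers, not a trace of the actual algorithm.

```latex
\begin{aligned}
\mathrm{dist}(\texttt{comptuer}, \texttt{computer}) &= 2s = 2 \cdot 4 = 8 && (\texttt{t} \to \texttt{u},\ \texttt{u} \to \texttt{t})\\
\mathrm{dist}(\texttt{copmuter}, \texttt{computer}) &= 2s = 2 \cdot 4 = 8 && (\texttt{p} \to \texttt{m},\ \texttt{m} \to \texttt{p})\\
\mathrm{dist}(\texttt{computr}, \texttt{computer}) &= i = 2 && (\text{insert } \texttt{e})
\end{aligned}
```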

o  isKey: k1 nextTo: k2
Return true if k1 and k2 are adjacent keys on the keyboard.
This is used to give special priority to plausible typing errors involving adjacent keys.

Usage example(s):

     self isKey:$a nextTo:$a   
     self isKey:$a nextTo:$s   
     self isKey:$a nextTo:$q   
     self isKey:$a nextTo:$w    
     self isKey:$a nextTo:$y    
     self isKey:$a nextTo:$z    
     self isKey:$a nextTo:$x    
     self isKey:$ö nextTo:$ä onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)   
     self isKey:$ü nextTo:$ä onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)   
     self isKey:$t nextTo:$z onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)   
     self isKey:$t nextTo:$z onKeyboard:(StringUtilities keyboardLayoutForLanguage:#en)   

o  isKey: k1 nextTo: k2 onKeyboard: keys
Return true if k1 and k2 are adjacent keys on the keyboard layout given by keys
(for example, as returned by keyboardLayoutForLanguage:).

Usage example(s):

     self isKey:$a nextTo:$q onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)
     self isKey:$a nextTo:$x onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)
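The adjacency test described by both isKey: methods can be illustrated with a minimal Python sketch. This is not the StringUtilities implementation. It assumes a layout is a tuple of strings, one per key row, with unshifted keys only, and that two keys count as adjacent when their row and column indices each differ by at most one. The names QWERTY_ROWS, key_position and is_key_next_to are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical default layout: simplified US QWERTY, one string per key row.
QWERTY_ROWS: Tuple[str, ...] = ("1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm")


def key_position(ch: str, rows: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (row, column) of ch in the layout, or None if it is not on it."""
    ch = ch.lower()
    for row_index, row in enumerate(rows):
        col = row.find(ch)
        if col >= 0:
            return row_index, col
    return None


def is_key_next_to(k1: str, k2: str, rows: Sequence[str] = QWERTY_ROWS) -> bool:
    """True if k1 and k2 are different keys whose rows and columns each differ by at most one."""
    p1, p2 = key_position(k1, rows), key_position(k2, rows)
    if p1 is None or p2 is None or p1 == p2:
        return False
    return abs(p1[0] - p2[0]) <= 1 and abs(p1[1] - p2[1]) <= 1
```

The plain-grid model ignores the physical stagger of the key rows, so it treats a and x as neighbours. A more faithful model would add a per-row offset. The sketch also reports a key as not adjacent to itself, which is a design choice of the sketch.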

o  keyboardLayout
Return the keyboard layout (used by algorithms that look for possible typing errors,
for example the edit distance computed by levenshtein).

Usage example(s):

     self keyboardLayout

o  keyboardLayoutForLanguage: lang
Return the keyboard layout for the language lang (used by algorithms that look for
possible nearby-key typing errors, for example the edit distance computed by levenshtein).
CAVEAT:
the keyboards of some common languages are hard-coded here; they should go into a resource file.

Usage example(s):

     self keyboardLayoutForLanguage:#de 
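A per-language layout table can be illustrated as a dictionary lookup. This Python sketch is not how Smalltalk/X stores its layouts. It assumes simplified US QWERTY and German QWERTZ layouts (digits, letters and umlauts only) and falls back to the English layout for unknown languages. KEYBOARD_LAYOUTS, keyboard_layout_for_language and key_position are hypothetical names.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical table: language code -> key rows (unshifted keys, top row first).
KEYBOARD_LAYOUTS: Dict[str, Tuple[str, ...]] = {
    "en": ("1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"),        # simplified US QWERTY
    "de": ("1234567890ß", "qwertzuiopü", "asdfghjklöä", "yxcvbnm"),    # simplified German QWERTZ
}


def keyboard_layout_for_language(lang: str, default: str = "en") -> Tuple[str, ...]:
    """Return the rows for lang; unknown languages get the default layout (an assumption)."""
    return KEYBOARD_LAYOUTS.get(lang, KEYBOARD_LAYOUTS[default])


def key_position(ch: str, rows: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (row, column) of ch in the given rows, or None if the key is absent."""
    ch = ch.lower()
    for row_index, row in enumerate(rows):
        col = row.find(ch)
        if col >= 0:
            return row_index, col
    return None
```

The same letter can sit in a different place depending on the language. On the German rows z is next to t in the top letter row, whereas on the US rows it is in the bottom row. This is why nearby-key typo weights depend on which layout is chosen.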

o  levenshteinDistanceFrom: string1 to: string2 s: substWeight k: kbdTypoWeight c: caseWeight e: exchangeWeight i: insrtWeight d: deleteWeight
Parametrized levenshtein.
Return the levenshtein distance of two strings;
this value corresponds to the (weighted) number of edit operations needed
to turn string1 into string2. The smaller the returned number,
the more similar the two strings are.

This levenshtein is customizable and, with suitable parameters, better suited
to finding matches while programming (typically: case differences and swapped characters).
(The default entry via levenshteinTo: provides the standard weights, as documented and implemented elsewhere.)

The arguments are the costs of the following operations on characters:
   s: substitution
   k: keyboard-typo substitution (an adjacent key); if nil, s is used
   c: case change; if nil, s is used
   i: insertion
   d: deletion
   e: exchange (of two adjacent characters); if nil, s*2 is used
The default levenshtein has k=nil, c=nil and e=nil;
when searching for code, for example, better matches are obtained
with c < k < s and e < s*2.
See IEEE Transactions on Computers, 1976, pp. 172 ff.

Usage example(s):

     'comptuer' levenshteinTo:'computer'       

     self levenshteinDistanceFrom:'comptuer' to:'computer' 
            s:4 k:2 c:1 e:nil i:2 d:6    
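The weighting scheme above can be illustrated with a small dynamic-programming sketch in Python. It is not the Smalltalk/X code. It applies the nil rules listed above (k and c fall back to s, e falls back to s*2). It treats "exchange" as swapping two adjacent characters, which is the restricted Damerau-Levenshtein (optimal string alignment) variant. The keyboard weight k uses a hypothetical plain-grid US QWERTY model. The names weighted_levenshtein and _adjacent_keys are hypothetical.

```python
from typing import Optional

# Hypothetical keyboard model for the k weight: simplified US QWERTY rows on a plain grid.
ROWS = ("1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm")


def _adjacent_keys(a: str, b: str) -> bool:
    """True if a and b are different keys at most one row and one column apart."""
    def pos(ch):
        for r, row in enumerate(ROWS):
            col = row.find(ch.lower())
            if col >= 0:
                return r, col
        return None
    pa, pb = pos(a), pos(b)
    if pa is None or pb is None or pa == pb:
        return False
    return abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1


def weighted_levenshtein(s1: str, s2: str, s: int, i: int, d: int,
                         k: Optional[int] = None, c: Optional[int] = None,
                         e: Optional[int] = None) -> int:
    """Weighted cost of turning s1 into s2 (smaller means more similar)."""
    k = s if k is None else k        # keyboard typo defaults to a plain substitution
    c = s if c is None else c        # case change defaults to a plain substitution
    e = 2 * s if e is None else e    # exchange defaults to two substitutions
    n, m = len(s1), len(s2)
    # dist[x][y] = cost of turning the first x chars of s1 into the first y chars of s2
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for x in range(1, n + 1):
        dist[x][0] = x * d
    for y in range(1, m + 1):
        dist[0][y] = y * i
    for x in range(1, n + 1):
        for y in range(1, m + 1):
            a, b = s1[x - 1], s2[y - 1]
            if a == b:
                sub = 0
            elif a.lower() == b.lower():
                sub = c
            elif _adjacent_keys(a, b):
                sub = k
            else:
                sub = s
            best = min(dist[x - 1][y - 1] + sub,  # keep, or substitute a by b
                       dist[x - 1][y] + d,        # delete a
                       dist[x][y - 1] + i)        # insert b
            # exchange: "ba" at the end of the s1 prefix matches "ab" in s2
            if x > 1 and y > 1 and a != b and a == s2[y - 2] and s1[x - 2] == b:
                best = min(best, dist[x - 2][y - 2] + e)
            dist[x][y] = best
    return dist[n][m]
```

With s=4, k=2, c=1, i=2, d=6 and e=nil (the weights of the example above), this sketch returns 8 for 'comptuer' versus 'computer', because t and u are not neighbouring keys. Lowering e to 3 makes the swap the cheapest edit, and the result drops to 3.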

matching
o  stringMatchFunctionFor: aMultiPattern glob: searchForGlobPattern regex: searchForRegexPattern caseSensitive: searchIsCaseSensitive
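Generate a check function which, given a string, checks it for a match.
The pattern argument aMultiPattern may contain multiple patterns
separated by ';' (and-search: all must match) or '|' (or-search: any one must match).
The glob: and regex: flags select glob-pattern or regular-expression matching,
and caseSensitive: controls whether case matters.
If the pattern is invalid, nil is returned and an information notification
is signalled.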

Usage example(s):

     |fn|
     fn := self stringMatchFunctionFor:'aaa|bbb' glob:false regex:false caseSensitive:false.
     fn value:'   aaa   '.
     fn value:'   aa   '.
     fn value:'   bbb   '.
     fn value:'  aa bb   '.

Usage example(s):

     |fn|
     fn := self stringMatchFunctionFor:'aaa;bbb' glob:false regex:false caseSensitive:false.
     fn value:'   aaa   '.
     fn value:'   aa   '.
     fn value:'   bbb   '.
     fn value:'  aa bb   '.
     fn value:'  aaa bb   '.
     fn value:'  aaa bbb   '.

Usage example(s):

     |fn|
     fn := self stringMatchFunctionFor:'aa*;bb*' glob:true regex:false caseSensitive:false.
     fn value:'   aaa   '.
     fn value:'   aa   '.
     fn value:'   bbb   '.
     fn value:'  aa bb   '.
     fn value:'  aaa bb   '.
     fn value:'  aaa bbb   '.
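A check function of this kind can be illustrated in Python as a closure. This sketch makes several assumptions that the description above does not settle:
- A pattern containing ';' is split only on ';' (and-search). Any other pattern is split on '|' (or-search).
- Plain patterns are substring tests.
- Glob patterns must match the whole string (fnmatch semantics).
- Regular expressions may match anywhere (re.search).

Instead of signalling a notification, the sketch simply returns None for an invalid regular expression. string_match_function_for is a hypothetical name.

```python
import fnmatch
import re
from typing import Callable, List, Optional


def string_match_function_for(multi_pattern: str, glob: bool, regex: bool,
                              case_sensitive: bool) -> Optional[Callable[[str], bool]]:
    """Return a predicate testing a string against multi_pattern, or None if it is invalid."""
    # Assumption: ';' (and-search) takes precedence; otherwise '|' means or-search.
    if ';' in multi_pattern:
        parts, combine = multi_pattern.split(';'), all
    else:
        parts, combine = multi_pattern.split('|'), any

    def fold(text: str) -> str:
        # Normalize case only for case-insensitive searches.
        return text if case_sensitive else text.lower()

    checks: List[Callable[[str], bool]] = []
    for part in parts:
        if regex:
            try:
                rx = re.compile(part, 0 if case_sensitive else re.IGNORECASE)
            except re.error:
                return None  # invalid pattern: no check function
            checks.append(lambda text, rx=rx: rx.search(text) is not None)
        elif glob:
            # Whole-string glob match, e.g. 'aa*' needs the string to start with 'aa'.
            checks.append(lambda text, p=fold(part): fnmatch.fnmatchcase(fold(text), p))
        else:
            # Plain substring search.
            checks.append(lambda text, p=fold(part): p in fold(text))

    return lambda text: combine(check(text) for check in checks)
```

Because glob patterns in this sketch must match the whole string, a search for text anywhere in a line needs leading and trailing wildcards (for example '*aa*;*bb*').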

queries
o  isUtilityClass
(comment from inherited method)
A utility class is one which is not meant to be instantiated,
but only provides a number of utility functions on the class side.
It is usually also abstract.



ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Mon, 18 Nov 2024 04:25:04 GMT