Class: StringUtilities
Object
|
+--StringUtilities
- Package: stx:libbasic2
- Category: Collections-Text-Support
- Version: rev: 1.14 date: 2022/12/15 11:58:13
- user: stefan
- file: StringUtilities.st directory: libbasic2
- module: stx stc-classLibrary: libbasic2
Some less frequently used algorithms have been moved here to
keep libbasic compact.
copyright
COPYRIGHT (c) 1994 by Claus Gittinger
COPYRIGHT (c) 2009 by eXept Software AG
All Rights Reserved
This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.
edit distance
-
editDistanceFrom: s1 to: s2 s: substWeight k: kbdTypoWeight c: caseWeight e: exchangeWeight i: insrtWeight
-
Another, simpler edit distance between two strings.
See also: levenshtein
Usage example(s):
'comptuer' levenshteinTo:'computer' -> 8
self editDistanceFrom:'comptuer' to:'computer' s:4 k:2 c:1 e:nil i:2 -> 8
'computr' levenshteinTo:'computer' -> 2
self editDistanceFrom:'computr' to:'computer' s:4 k:2 c:1 e:nil i:2 -> 2
'computer' levenshteinTo:'computre' -> 8
self editDistanceFrom:'computer' to:'computre' s:4 k:2 c:1 e:nil i:2 -> 8
'copmuter' levenshteinTo:'computer' -> 8
self editDistanceFrom:'copmuter' to:'computer' s:4 k:2 c:1 e:nil i:2 -> 8
|
-
isKey: k1 nextTo: k2
-
Returns true if k1 and k2 are adjacent keys on the keyboard.
This is used to give special priority to plausible typing errors between adjacent keys.
Usage example(s):
self isKey:$a nextTo:$a
self isKey:$a nextTo:$s
self isKey:$a nextTo:$q
self isKey:$a nextTo:$w
self isKey:$a nextTo:$y
self isKey:$a nextTo:$z
self isKey:$a nextTo:$x
self isKey:$ö nextTo:$ä onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)
self isKey:$ü nextTo:$ä onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)
self isKey:$t nextTo:$z onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)
self isKey:$t nextTo:$z onKeyboard:(StringUtilities keyboardLayoutForLanguage:#en)
|
-
isKey: k1 nextTo: k2 onKeyboard: keys
-
Returns true if k1 and k2 are adjacent keys on the keyboard defined by keys.
Usage example(s):
self isKey:$a nextTo:$q onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)
self isKey:$a nextTo:$x onKeyboard:(StringUtilities keyboardLayoutForLanguage:#de)
|
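The adjacency test described above can be illustrated with a minimal Python sketch. It is not the StringUtilities implementation: the row data, the function names and the stagger rule (each row shifted roughly half a key to the right of the row above) are assumptions made for illustration, and the library may treat cases such as identical keys differently.

```python
# Hypothetical layout data: one string per keyboard row, top to bottom.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
QWERTZ_ROWS = ["qwertzuiop", "asdfghjkl", "yxcvbnm"]


def key_positions(rows):
    """Map each key to its (row, column) position."""
    return {ch: (r, c) for r, row in enumerate(rows) for c, ch in enumerate(row)}


def is_key_next_to(k1, k2, rows=QWERTY_ROWS):
    """Return True if k1 and k2 touch on a staggered keyboard (assumed rule)."""
    pos = key_positions(rows)
    if k1 not in pos or k2 not in pos or k1 == k2:
        return False
    (r1, c1), (r2, c2) = pos[k1], pos[k2]
    dr, dc = r2 - r1, c2 - c1
    if dr == 0:       # same row: direct left/right neighbour
        return abs(dc) == 1
    if dr == -1:      # k2 is one row above: same column or one to the right
        return dc in (0, 1)
    if dr == 1:       # k2 is one row below: same column or one to the left
        return dc in (-1, 0)
    return False
```

Under these assumptions, `is_key_next_to("t", "z")` is false for the QWERTY rows but true when `QWERTZ_ROWS` is passed, which is the kind of layout dependence the `#de` and `#en` examples above probe.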
-
keyboardLayout
-
the keyboard layout (used with algorithms to find possible typing errors,
for example: edit distance in levenshtein)
Usage example(s):
-
keyboardLayoutForLanguage: lang
-
the keyboard layout (used with algorithms to find possible nearby-key typing errors,
for example: edit distance in levenshtein).
CAVEAT:
the keyboards of some common languages are hard-coded here; they should go into a resource file.
Usage example(s):
self keyboardLayoutForLanguage:#de
|
-
levenshteinDistanceFrom: string1 to: string2 s: substWeight k: kbdTypoWeight c: caseWeight e: exchangeWeight i: insrtWeight d: deleteWeight
-
Parametrized Levenshtein.
Returns the Levenshtein distance of two strings;
this value corresponds to the number of replacements that have to be
made to get string2 from string1. The smaller the returned number,
the more similar the two strings are.
This Levenshtein is customizable and (with proper parameters) better suited
to finding matches while programming (typically: case differences and swapped-character typos).
(The default entry via levenshtein: provides the standard weights, as documented and implemented elsewhere.)
The arguments are the costs for
s: substitution
k: keyboard typo (substitution of an adjacent key); if nil, s is used
c: case change; if nil, s is used
i: insertion
d: deletion
e: exchange; if nil, s*2 is used
of a character.
The default Levenshtein has k=nil, c=nil and e=nil,
whereas when searching for e.g. code, better matches are generated
when c < k < s and e < 2*s.
See IEEE Transactions on Computers, 1976, p. 172 ff.
Usage example(s):
'comptuer' levenshteinTo:'computer'
self levenshteinDistanceFrom:'comptuer' to:'computer'
s:4 k:2 c:1 e:nil i:2 d:6
|
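How the weights interact can be seen in a small Python sketch of a weighted edit distance. This is an illustration only and makes several assumptions: it uses the optimal-string-alignment recurrence (adjacent exchange allowed once per position), the default values for `s`, `i` and `d` are invented, and the `adjacent` callback is a hypothetical stand-in for a keyboard-adjacency test. It is not claimed to reproduce the values returned by levenshteinDistanceFrom:to:s:k:c:e:i:d:.

```python
def weighted_distance(a, b, s=4, k=None, c=None, e=None, i=2, d=2, adjacent=None):
    """Weighted edit distance from a to b (illustrative sketch)."""
    k = s if k is None else k          # keyboard typo falls back to substitution
    c = s if c is None else c          # case change falls back to substitution
    e = 2 * s if e is None else e      # exchange falls back to two substitutions
    adjacent = adjacent or (lambda x, y: False)

    def subst_cost(x, y):
        if x == y:
            return 0
        if x.lower() == y.lower():
            return c
        if adjacent(x.lower(), y.lower()):
            return k
        return s

    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for r in range(1, rows):
        dist[r][0] = r * d             # delete every character of a
    for col in range(1, cols):
        dist[0][col] = col * i         # insert every character of b
    for r in range(1, rows):
        for col in range(1, cols):
            best = min(
                dist[r - 1][col] + d,
                dist[r][col - 1] + i,
                dist[r - 1][col - 1] + subst_cost(a[r - 1], b[col - 1]),
            )
            # exchange of two adjacent, different characters
            if (r > 1 and col > 1 and a[r - 1] == b[col - 2]
                    and a[r - 2] == b[col - 1] and a[r - 1] != a[r - 2]):
                best = min(best, dist[r - 2][col - 2] + e)
            dist[r][col] = best
    return dist[-1][-1]
```

In this sketch, lowering `e` below the cost of the cheapest alternative makes a swapped pair such as 'comptuer' rank closer to 'computer', and a small `c` does the same for pure case differences.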
matching
-
stringMatchFunctionFor: aMultiPattern glob: searchForGlobPattern regex: searchForRegexPattern caseSensitive: searchIsCaseSensitive
-
Generates a check function which, given a string, checks for a match.
The match-pattern argument aMultiPattern
may contain multiple patterns separated by ';' (for and-search) or '|' (for or-search).
If the pattern is invalid, nil is returned and an information-notification
is signalled.
Usage example(s):
|fn|
fn := self stringMatchFunctionFor:'aaa|bbb' glob:false regex:false caseSensitive:false.
fn value:' aaa '.
fn value:' aa '.
fn value:' bbb '.
fn value:' aa bb '.
|
Usage example(s):
|fn|
fn := self stringMatchFunctionFor:'aaa;bbb' glob:false regex:false caseSensitive:false.
fn value:' aaa '.
fn value:' aa '.
fn value:' bbb '.
fn value:' aa bb '.
fn value:' aaa bb '.
fn value:' aaa bbb '.
|
Usage example(s):
|fn|
fn := self stringMatchFunctionFor:'aa*;bb*' glob:true regex:false caseSensitive:false.
fn value:' aaa '.
fn value:' aa '.
fn value:' bbb '.
fn value:' aa bb '.
fn value:' aaa bb '.
fn value:' aaa bbb '.
|
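The idea of turning a multi-pattern into a check function can be sketched in Python as follows. The details are assumptions, not the documented behaviour of stringMatchFunctionFor:glob:regex:caseSensitive:: plain patterns are treated as substring tests, glob patterns are matched against the whole string, a pattern is assumed to use only one kind of separator, and an invalid regex simply yields `None` without any notification. Note also that splitting on '|' in regex mode means alternation inside a single regex is not supported by this sketch.

```python
import fnmatch
import re


def string_match_function(multi_pattern, glob=False, regex=False, case_sensitive=True):
    """Build a predicate from 'a;b' (all must match) or 'a|b' (any may match)."""
    use_and = ";" in multi_pattern
    combine = all if use_and else any
    parts = [p for p in multi_pattern.split(";" if use_and else "|") if p]
    if not parts:
        return None
    fold = (lambda text: text) if case_sensitive else str.lower

    checks = []
    for part in parts:
        if regex:
            try:
                compiled = re.compile(part, 0 if case_sensitive else re.IGNORECASE)
            except re.error:
                return None  # invalid pattern
            checks.append(lambda text, rx=compiled: rx.search(text) is not None)
        elif glob:
            checks.append(lambda text, p=fold(part): fnmatch.fnmatchcase(fold(text), p))
        else:
            checks.append(lambda text, p=fold(part): p in fold(text))
    return lambda text: combine(check(text) for check in checks)
```

For example, `string_match_function('aaa;bbb', case_sensitive=False)` returns a function that accepts `' aaa bbb '` but rejects `' aaa bb '` under the substring assumption stated above.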
queries
-
isUtilityClass
-
(comment from inherited method)
A utility class is one which is not to be instantiated,
but only provides a number of utility functions on the class side.
It is usually also abstract.
|