|
Class: PhoneticStringUtilities
Object
|
+--PhoneticStringUtilities
- Package:
- stx:libbasic2
- Category:
- Collections-Text-Support
- Version:
- rev: 1.31 date: 2017/10/09 15:11:32
- user: stefan
- file: PhoneticStringUtilities.st directory: libbasic2
- module: stx stc-classLibrary: libbasic2
Utilities for phonetic string searches and comparisons.
All of them are variations on, or improvements of, the soundex algorithm, which usually
gives poor results for non-English languages.
soundexCode
this algorithm was originally contained in the CharacterArray class.
nysiis
a modified soundex algorithm
miracode
another modified soundex algorithm ('American soundex') used in the 1880 census.
mySQLSoundex
another modified soundex algorithm used in MySQL.
koelner phoneticCode
provides functionality similar to soundex, but tuned much more towards the German language.
Double metaphone
works with most European languages.
phonem
described in Georg Wilde and Carsten Meyer, 'Doppelgaenger gesucht - Ein Programm fuer kontextsensitive phonetische Textumwandlung'
from 'ct Magazin fuer Computer & Technik 25/1999'.
mra
Match Rating Approach, a phonetic algorithm developed by Western Airlines in 1977.
caverphone2
a revised version of the Caverphone algorithm; gives better results than soundex.
spanish phonetic code
an algorithm slightly adjusted to Spanish names.
More information (in German) can be found at:
http://www.uni-koeln.de/phil-fak/phonetik/Lehre/MA-Arbeiten/magister_wilz.pdf
phonetic codes
-
koelnerPhoneticCodeOf: aString
-
return a koelner phonetic code.
The koelnerPhonetic code is for the German language what the soundex code is for English;
it returns similar strings for similar sounding words.
There are some differences to soundex, though:
its length is not limited to 4, but depends on the length of the original string;
it does not start with the first character of the input.
This algorithm is described by Postel 1969.
usage example(s):
#(
'Müller'
'Miller'
'Mueller'
'Mühler'
'Mühlherr'
'Mülherr'
'Myler'
'Millar'
'Myller'
'Müllar'
'Müler'
'Muehler'
'Mülller'
'Müllerr'
'Muehlherr'
'Muellar'
'Mueler'
'Mülleer'
'Mueller'
'Nüller'
'Nyller'
'Niler'
'Czerny'
'Tscherny'
'Czernie'
'Tschernie'
'Schernie'
'Scherny'
'Scherno'
'Czerne'
'Zerny'
'Tzernie'
'Breschnew'
) do:[:w |
Transcript show:w; show:'->'; showCR:(PhoneticStringUtilities koelnerPhoneticCodeOf:w)
].
|
usage example(s):
PhoneticStringUtilities koelnerPhoneticCodeOf:'Breschnew'. '17863'.
PhoneticStringUtilities koelnerPhoneticCodeOf:'Breschneff'. '17863'.
PhoneticStringUtilities koelnerPhoneticCodeOf:'Braeschneff'. '17863'.
PhoneticStringUtilities koelnerPhoneticCodeOf:'Braessneff'. '17863'.
PhoneticStringUtilities koelnerPhoneticCodeOf:'Pressneff'. '17863'.
PhoneticStringUtilities koelnerPhoneticCodeOf:'Presznäph'. '17863'.
PhoneticStringUtilities koelnerPhoneticCodeOf:'Preschnjiev'. '17863'.
|
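A minimal sketch, assuming only the koelnerPhoneticCodeOf: method documented above plus standard collection protocol: it maps the spelling variants from the previous example to their codes and collects the distinct codes in a Set. Going by the results annotated in that example, the Set should end up holding a single code.

```smalltalk
| variants codes |

variants := #('Breschnew' 'Breschneff' 'Braeschneff' 'Braessneff'
              'Pressneff' 'Presznäph' 'Preschnjiev').

"map every variant to its phonetic code and drop duplicate codes"
codes := (variants collect:[:w |
              PhoneticStringUtilities koelnerPhoneticCodeOf:w]) asSet.

"number of distinct codes, followed by the codes themselves"
Transcript showCR:codes size printString.
codes do:[:c | Transcript showCR:c].
```

Comparing the number of distinct codes with the number of input words is a quick way to judge how aggressively an algorithm merges spellings.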
-
miracodeCodeOf: aString
-
return a miracode soundex phonetic code or nil.
Miracode is a slightly modified soundex algorithm.
Note that there are better algorithms around (doubleMetaphone).
usage example(s):
PhoneticStringUtilities miracodeCodeOf:'claus'
PhoneticStringUtilities miracodeCodeOf:'clause'
PhoneticStringUtilities miracodeCodeOf:'close'
PhoneticStringUtilities miracodeCodeOf:'smalltalk'
PhoneticStringUtilities miracodeCodeOf:'smaltalk'
PhoneticStringUtilities miracodeCodeOf:'smaltak'
PhoneticStringUtilities miracodeCodeOf:'smaltok'
PhoneticStringUtilities miracodeCodeOf:'smoltok'
PhoneticStringUtilities miracodeCodeOf:'aa'
PhoneticStringUtilities miracodeCodeOf:'by'
PhoneticStringUtilities miracodeCodeOf:'bab'
PhoneticStringUtilities miracodeCodeOf:'bob'
PhoneticStringUtilities miracodeCodeOf:'bop'
PhoneticStringUtilities miracodeCodeOf:'pub'
|
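The following sketch shows one way the code might be used for a fuzzy lookup: it selects those candidates whose miracode equals the code of a query word. The candidate list and the query are made up for illustration, and the nil check is there because the method is documented as possibly returning nil. Which candidates actually match depends on the implementation, so no result is claimed here.

```smalltalk
| candidates queryCode matches |

"hypothetical word list and query, for illustration only"
candidates := #('claus' 'clause' 'close' 'smalltalk' 'smaltalk' 'bob' 'bop').
queryCode := PhoneticStringUtilities miracodeCodeOf:'smoltok'.

"a nil code matches nothing; otherwise keep the candidates with an equal code"
matches := queryCode isNil
    ifTrue:[#()]
    ifFalse:[
        candidates select:[:w |
            (PhoneticStringUtilities miracodeCodeOf:w) = queryCode]
    ].

Transcript showCR:matches printString.
```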
-
mySQLSoundexCodeOf: aString
-
return the MySQL soundex code. The MySQL soundex code differs from the miracode ('American') soundex
(no 4-character limit; different order of vowel elimination vs. duplicate-code elimination).
Note that there are better algorithms around (doubleMetaphone).
usage example(s):
#(
'Müller'
'Miller'
'Mueller'
'Mühler'
'Mühlherr'
'Mülherr'
'Myler'
'Millar'
'Myller'
'Müllar'
'Müler'
'Muehler'
'Mülller'
'Müllerr'
'Muehlherr'
'Muellar'
'Mueler'
'Mülleer'
'Mueller'
'Nüller'
'Nyller'
'Niler'
'Czerny'
'Tscherny'
'Czernie'
'Tschernie'
'Schernie'
'Scherny'
'Scherno'
'Czerne'
'Zerny'
'Tzernie'
'Breschnew'
) do:[:w |
Transcript show:w; show:'->'; showCR:(PhoneticStringUtilities mySQLSoundexCodeOf:w)
].
|
usage example(s):
PhoneticStringUtilities mySQLSoundexCodeOf:'Breschnew'.
PhoneticStringUtilities mySQLSoundexCodeOf:'Breschneff'.
PhoneticStringUtilities mySQLSoundexCodeOf:'Braeschneff'.
PhoneticStringUtilities mySQLSoundexCodeOf:'Braessneff'.
PhoneticStringUtilities mySQLSoundexCodeOf:'Pressneff'.
PhoneticStringUtilities mySQLSoundexCodeOf:'Presznäph'.
PhoneticStringUtilities mySQLSoundexCodeOf:'Preschnjiev'.
|
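To see the differences to miracode in practice, the two codes can be printed side by side. This is only a sketch that uses the two methods documented on this page; the word list is arbitrary, and printString is used so that a nil result is shown instead of raising an error.

```smalltalk
#('Breschnew' 'Tschernie' 'smalltalk') do:[:w |
    | miracode mySQL |

    miracode := PhoneticStringUtilities miracodeCodeOf:w.
    mySQL := PhoneticStringUtilities mySQLSoundexCodeOf:w.

    "printString also copes with a nil code"
    Transcript
        show:w;
        show:'  miracode: '; show:miracode printString;
        show:'  mySQL: ';    showCR:mySQL printString
].
```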
-
soundexCodeOf: aString
-
return a soundex phonetic code or nil.
Soundex (1918, 1922) returns similar codes for similar sounding words, making it a useful
tool when searching for words whose correct spelling is unknown.
(read Knuth or search the web if you don't know what a soundex code is).
Caveat: 'similar sounding words' means: 'similar sounding in English'.
Note that there are better algorithms around (doubleMetaphone).
usage example(s):
PhoneticStringUtilities soundexCodeOf:'claus'
PhoneticStringUtilities soundexCodeOf:'clause'
PhoneticStringUtilities soundexCodeOf:'close'
PhoneticStringUtilities soundexCodeOf:'smalltalk'
PhoneticStringUtilities soundexCodeOf:'smaltalk'
PhoneticStringUtilities soundexCodeOf:'smaltak'
PhoneticStringUtilities soundexCodeOf:'smaltok'
PhoneticStringUtilities soundexCodeOf:'smoltok'
PhoneticStringUtilities soundexCodeOf:'aa'
PhoneticStringUtilities soundexCodeOf:'by'
PhoneticStringUtilities soundexCodeOf:'bab'
PhoneticStringUtilities soundexCodeOf:'bob'
PhoneticStringUtilities soundexCodeOf:'bop'
|
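For readers who do not know soundex, here is a rough sketch of the textbook (American) variant as a block: keep the first letter, replace the remaining consonants by digit classes, skip vowels, collapse equal neighbouring codes (h and w do not separate them), and pad or cut to 4 characters. This is an illustration only and is not taken from the soundexCodeOf: implementation, whose results may differ in detail; the names codes and soundexOf are hypothetical.

```smalltalk
| codes soundexOf |

"digit classes of the textbook algorithm; letters not listed have no code"
codes := Dictionary new.
'bfpv'     do:[:c | codes at:c put:$1].
'cgjkqsxz' do:[:c | codes at:c put:$2].
'dt'       do:[:c | codes at:c put:$3].
codes at:$l put:$4.
'mn'       do:[:c | codes at:c put:$5].
codes at:$r put:$6.

soundexOf := [:word |
    | letters out last |

    letters := word asLowercase select:[:c | c isLetter].
    letters isEmpty
        ifTrue:[nil]
        ifFalse:[
            out := WriteStream on:String new.
            out nextPut:letters first asUppercase.
            last := codes at:letters first ifAbsent:[nil].
            2 to:letters size do:[:i |
                | c code |
                c := letters at:i.
                code := codes at:c ifAbsent:[nil].
                "emit a digit only if it differs from the previous code"
                (code notNil and:[code ~= last]) ifTrue:[out nextPut:code].
                "h and w keep the previous code; vowels reset it"
                ('hw' includes:c) ifFalse:[last := code]
            ].
            "pad with zeros and cut to 4 characters"
            (out contents , '000') copyFrom:1 to:4
        ]
].

Transcript showCR:(soundexOf value:'Robert').     "R163"
Transcript showCR:(soundexOf value:'Ashcraft').   "A261"
Transcript showCR:(soundexOf value:'Tymczak').    "T522"
```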
queries
-
isUtilityClass
-
Caverphone2StringComparator
DaitchMokotoffStringComparator
DoubleMetaphoneStringComparator
ExtendedSoundexStringComparator
KoelnerPhoneticCodeStringComparator
MRAStringComparator
MetaphoneStringComparator
MiracodeStringComparator
MySQLSoundexStringComparator
NYSIISStringComparator
PhonemStringComparator
PhoneticStringComparator
SingleResultPhoneticStringComparator
SoundexStringComparator
SpanishPhoneticCodeStringComparator
|