Documentation of class 'PhoneticStringUtilities':

Class: PhoneticStringUtilities


Inheritance:

   Object
   |
   +--PhoneticStringUtilities

Package:
stx:libbasic2
Category:
Collections-Text-Support
Version:
rev: 1.31 date: 2017/10/09 15:11:32
user: stefan
file: PhoneticStringUtilities.st directory: libbasic2
module: stx stc-classLibrary: libbasic2

Description:


Utilities for phonetic string searches and comparisons.
These are all variations or improvements of the soundex algorithm, which usually
gives poor results for languages other than English.

soundexCode
    this algorithm was originally contained in the CharacterArray class.

nysiis
    a modified soundex algorithm

miracode
    another modified soundex algorithm ('American soundex') used in the 1880 census.

mySQLSoundex
    another modified soundex algorithm, used in MySQL.

koelner phoneticCode 
    provides functionality similar to soundex, but tuned to the German language.

Double metaphone 
    works with most European languages.

phonem
    described in Georg Wilde and Carsten Meyer, 'Doppelgaenger gesucht - Ein Programm fuer kontextsensitive phonetische Textumwandlung'
    from 'ct Magazin fuer Computer & Technik 25/1999'.

mra
    Match Rating Approach, a phonetic algorithm developed by Western Airlines in 1977.

caverphone2
    Caverphone 2.0; generally gives better results than soundex.

spanish phonetic code
    an algorithm slightly adjusted to Spanish names.

More information for German readers can be found in:
    http://www.uni-koeln.de/phil-fak/phonetik/Lehre/MA-Arbeiten/magister_wilz.pdf


Class protocol:

phonetic codes
o  koelnerPhoneticCodeOf: aString
return a koelner phonetic code.
The koelner phonetic code is for the German language what the soundex code is for English:
it returns similar codes for similar sounding words.
It differs from soundex in two ways:
its length is not limited to 4, but depends on the length of the original string;
it does not start with the first character of the input.
The algorithm was described by Postel in 1969.

usage example(s):

     #(
        'Müller'
        'Miller'
        'Mueller'
        'Mühler'
        'Mühlherr'
        'Mülherr'
        'Myler'
        'Millar'
        'Myller'
        'Müllar'
        'Müler'
        'Muehler'
        'Mülller'
        'Müllerr'
        'Muehlherr'
        'Muellar'
        'Mueler'
        'Mülleer'
        'Mueller'
        'Nüller'
        'Nyller'
        'Niler'
        'Czerny'
        'Tscherny'
        'Czernie'
        'Tschernie'
        'Schernie'
        'Scherny'
        'Scherno'
        'Czerne'
        'Zerny'
        'Tzernie'
        'Breschnew'
     ) do:[:w |
         Transcript show:w; show:'->'; showCR:(PhoneticStringUtilities koelnerPhoneticCodeOf:w)
     ].

usage example(s):

     PhoneticStringUtilities koelnerPhoneticCodeOf:'Breschnew'.      "-> '17863'"
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Breschneff'.     "-> '17863'"
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Braeschneff'.    "-> '17863'"
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Braessneff'.     "-> '17863'"
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Pressneff'.      "-> '17863'"
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Presznäph'.      "-> '17863'"
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Preschnjiev'.    "-> '17863'"
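
Because similar sounding names share one code, the code can serve as a dictionary key for
grouping spelling variants. The following is a minimal sketch that assumes only the
koelnerPhoneticCodeOf: method documented here; the variable names are illustrative, and
which names end up in the same group depends on the implementation.

```smalltalk
"Sketch: group spelling variants by their koelner phonetic code.
 'groups' and 'code' are illustrative names, not part of the class."
| groups |
groups := Dictionary new.
#('Müller' 'Miller' 'Mueller' 'Czerny' 'Tscherny' 'Zerny') do:[:name |
    | code |
    code := PhoneticStringUtilities koelnerPhoneticCodeOf:name.
    "create the group on first use, then add the name to it"
    (groups at:code ifAbsentPut:[OrderedCollection new]) add:name
].
groups keysAndValuesDo:[:code :names |
    Transcript show:code; show:' -> '; showCR:names printString
].
```

Using the code as a key avoids comparing every name with every other name.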

o  miracodeCodeOf: aString
return a miracode soundex phonetic code, or nil.
Miracode is a slightly modified soundex algorithm.
Note that better algorithms are available (doubleMetaphone).

usage example(s):

     PhoneticStringUtilities miracodeCodeOf:'claus'   
     PhoneticStringUtilities miracodeCodeOf:'clause'   
     PhoneticStringUtilities miracodeCodeOf:'close'   
     PhoneticStringUtilities miracodeCodeOf:'smalltalk' 
     PhoneticStringUtilities miracodeCodeOf:'smaltalk'  
     PhoneticStringUtilities miracodeCodeOf:'smaltak'   
     PhoneticStringUtilities miracodeCodeOf:'smaltok'   
     PhoneticStringUtilities miracodeCodeOf:'smoltok'   
     PhoneticStringUtilities miracodeCodeOf:'aa'        
     PhoneticStringUtilities miracodeCodeOf:'by'        
     PhoneticStringUtilities miracodeCodeOf:'bab'       
     PhoneticStringUtilities miracodeCodeOf:'bob'       
     PhoneticStringUtilities miracodeCodeOf:'bop'       
     PhoneticStringUtilities miracodeCodeOf:'pub'       

o  mySQLSoundexCodeOf: aString
return the MySQL soundex code. The MySQL soundex code differs from the miracode ('American') soundex:
there is no 4-character limit, and vowels and duplicate codes are eliminated in a different order.
Note that better algorithms are available (doubleMetaphone).

usage example(s):

     #(
        'Müller'
        'Miller'
        'Mueller'
        'Mühler'
        'Mühlherr'
        'Mülherr'
        'Myler'
        'Millar'
        'Myller'
        'Müllar'
        'Müler'
        'Muehler'
        'Mülller'
        'Müllerr'
        'Muehlherr'
        'Muellar'
        'Mueler'
        'Mülleer'
        'Mueller'
        'Nüller'
        'Nyller'
        'Niler'
        'Czerny'
        'Tscherny'
        'Czernie'
        'Tschernie'
        'Schernie'
        'Scherny'
        'Scherno'
        'Czerne'
        'Zerny'
        'Tzernie'
        'Breschnew'
     ) do:[:w |
         Transcript show:w; show:'->'; showCR:(PhoneticStringUtilities mySQLSoundexCodeOf:w)
     ].

usage example(s):

     PhoneticStringUtilities mySQLSoundexCodeOf:'Breschnew'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Breschneff'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Braeschneff'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Braessneff'.
     PhoneticStringUtilities mySQLSoundexCodeOf:'Pressneff'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Presznäph'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Preschnjiev'.
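
The effect of the elimination order can be made concrete with a small sketch. It is not
the code used by PhoneticStringUtilities: all block and variable names are hypothetical,
and the h/w rule, padding and truncation are left out so that only the ordering differs.
The classic order collapses equal neighbouring codes before dropping vowels, so a vowel
between two equal codes keeps both; the MySQL-like order drops vowels first, so such codes
merge.

```smalltalk
"Sketch only; NOT the code used by PhoneticStringUtilities.
 All names below are hypothetical."
| codes digitsOf collapse dropZeros finish classicOrder mySQLOrder |

"soundex digit per consonant; every other letter is marked $0"
codes := Dictionary new.
'bfpv'     do:[:c | codes at:c put:$1].
'cgjkqsxz' do:[:c | codes at:c put:$2].
'dt'       do:[:c | codes at:c put:$3].
'l'        do:[:c | codes at:c put:$4].
'mn'       do:[:c | codes at:c put:$5].
'r'        do:[:c | codes at:c put:$6].

"translate a word into its digit string"
digitsOf := [:w | | out |
    out := WriteStream on:(String new).
    w asLowercase do:[:c | out nextPut:(codes at:c ifAbsent:[$0])].
    out contents
].

"collapse runs of equal digits into one"
collapse := [:s | | out last |
    out := WriteStream on:(String new).
    s do:[:c |
        c = last ifFalse:[out nextPut:c].
        last := c
    ].
    out contents
].

"remove the $0 markers, but keep position 1 (the initial letter)"
dropZeros := [:s | | out |
    out := WriteStream on:(String new).
    1 to:s size do:[:i |
        (i = 1 or:[(s at:i) ~= $0]) ifTrue:[out nextPut:(s at:i)]
    ].
    out contents
].

"replace the first digit by the uppercase initial letter"
finish := [:w :s | (String with:w first asUppercase) , (s copyFrom:2 to:s size)].

"classic order: collapse duplicates, then drop vowels"
classicOrder := [:w |
    finish value:w value:(dropZeros value:(collapse value:(digitsOf value:w)))
].

"MySQL-like order: drop vowels, then collapse duplicates"
mySQLOrder := [:w |
    finish value:w value:(collapse value:(dropZeros value:(digitsOf value:w)))
].

Transcript showCR:(classicOrder value:'Tymczak').   "T522"
Transcript showCR:(mySQLOrder value:'Tymczak').     "T52"
```

In 'Tymczak' the vowel between 'cz' and 'k' separates two equal codes, which is why the
two orders give different results for this name.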

o  soundexCodeOf: aString
return a soundex phonetic code, or nil.
Soundex (1918, 1922) returns similar codes for similar sounding words, making it a useful
tool when searching for words whose correct spelling is unknown
(read Knuth or search the web if you don't know what a soundex code is).
Caveat: 'similar sounding words' means 'similar sounding in English'.
Note that better algorithms are available (doubleMetaphone).

usage example(s):

     PhoneticStringUtilities soundexCodeOf:'claus'   
     PhoneticStringUtilities soundexCodeOf:'clause'   
     PhoneticStringUtilities soundexCodeOf:'close'   
     PhoneticStringUtilities soundexCodeOf:'smalltalk' 
     PhoneticStringUtilities soundexCodeOf:'smaltalk'  
     PhoneticStringUtilities soundexCodeOf:'smaltak'   
     PhoneticStringUtilities soundexCodeOf:'smaltok'   
     PhoneticStringUtilities soundexCodeOf:'smoltok'   
     PhoneticStringUtilities soundexCodeOf:'aa'        
     PhoneticStringUtilities soundexCodeOf:'by'        
     PhoneticStringUtilities soundexCodeOf:'bab'       
     PhoneticStringUtilities soundexCodeOf:'bob'       
     PhoneticStringUtilities soundexCodeOf:'bop'       
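
Searching with an uncertain spelling can be sketched as follows, assuming only the
soundexCodeOf: method documented here. The word list and variable names are illustrative;
since the method may return nil, the sketch guards against that case.

```smalltalk
"Sketch: find candidates for a misspelled query by comparing soundex codes.
 'names', 'query', 'wanted' and 'matches' are illustrative names."
| names query wanted matches |
names := #('smalltalk' 'smaltalk' 'claus' 'close' 'bob').
query := 'smoltok'.
wanted := PhoneticStringUtilities soundexCodeOf:query.
"no code for the query means no candidates"
matches := wanted isNil
    ifTrue:[#()]
    ifFalse:[names select:[:each | (PhoneticStringUtilities soundexCodeOf:each) = wanted]].
Transcript showCR:matches printString.
```

For larger word lists the codes would normally be computed once and stored, rather than
recomputed for every query.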

queries
o  isUtilityClass


Private classes:

    Caverphone2StringComparator
    DaitchMokotoffStringComparator
    DoubleMetaphoneStringComparator
    ExtendedSoundexStringComparator
    KoelnerPhoneticCodeStringComparator
    MRAStringComparator
    MetaphoneStringComparator
    MiracodeStringComparator
    MySQLSoundexStringComparator
    NYSIISStringComparator
    PhonemStringComparator
    PhoneticStringComparator
    SingleResultPhoneticStringComparator
    SoundexStringComparator
    SpanishPhoneticCodeStringComparator


ST/X 7.2.0.0; WebServer 1.670 at bd0aa1f87cdd.unknown:8081; Thu, 28 Mar 2024 13:49:54 GMT