Documentation of class 'PhoneticStringUtilities':


Class: PhoneticStringUtilities


Inheritance:

   Object
   |
   +--PhoneticStringUtilities

Package:
stx:libbasic2
Category:
Collections-Text-Support
Version:
rev: 1.43 date: 2023/09/08 14:10:04
user: cg
file: PhoneticStringUtilities.st directory: libbasic2
module: stx stc-classLibrary: libbasic2

Description:


Utilities for phonetic string searches and comparisons.
These are all variations or improvements of the soundex algorithm, which usually fails
to provide good results for non-English languages.

soundexCode
    this algorithm was originally contained in the CharacterArray class.

nysiis
    a modified soundex algorithm (New York State Identification and Intelligence System).

miracode
    another modified soundex algorithm ('American soundex') used in the 1880 census.

mySQLSoundex
    another modified soundex algorithm, as used in MySQL.

koelner phoneticCode 
    provides functionality similar to soundex, but is tuned much more towards the German language.

Double metaphone 
    works with most European languages.

phonem
    described in Georg Wilde and Carsten Meyer, 'Doppelgaenger gesucht - Ein Programm fuer kontextsensitive phonetische Textumwandlung'
    from 'ct Magazin fuer Computer & Technik 25/1999'.

mra
    Match Rating Approach, a phonetic algorithm developed by Western Airlines in 1977.

caverphone2
    the revised Caverphone algorithm; intended to give better matches than soundex.

spanish phonetic code
    an algorithm slightly adjusted to Spanish names.

More information for German readers can be found in:
    http://www.uni-koeln.de/phil-fak/phonetik/Lehre/MA-Arbeiten/magister_wilz.pdf

copyright

COPYRIGHT (c) 1994 by Claus Gittinger
COPYRIGHT (c) 2009 by eXept Software AG
All Rights Reserved

This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.

sampleData

for the 50 most common German names, we get:

    name       soundex  ext. soundex  metaphone  phonet   phonet2  phonix    daitsch  phonem  koeln  caverphone2  mra
    müller     M460     54600000      MLR        MÜLA     NILA     M4000000  689000   MYLR    657    MLA1111111   MLR
    schmidt    S530     25300000      SKMTT      SHMIT    ZNIT     S5300000  463000   CMYD    862    SKMT111111   SCHMDT
    schneider  S536     25360000      SKNTR      SHNEIDA  ZNEITA   S5300000  463900   CNAYDR  8627   SKNTA11111   SCHNDR
    fischer    F260     12600000      FSKR       FISHA    FIZA     F8000000  749000   VYCR    387    FSKA111111   FSCHR
    weber      W160     16000000      WBR        WEBA     FEBA     $1000000  779000   VBR     317    WPA1111111   WBR
    meyer      M600     56000000      MYR        MEIA     NEIA     M0000000  619000   MAYR    67     MA11111111   MYR
    wagner     W256     25600000      WKNR       WAKNA    FAKNA    $2500000  756900   VACNR   3467   WKNA111111   WGNR
    schulz     S420     24200000      SKLS       SHULS    ZULZ     S4800000  484000   CULC    858    SKS1111111   SCHLZ
    becker     B260     12600000      BKR        BEKA     BEKA     B2000000  759000   BCR     147    PKA1111111   BCKR
    hoffmann   H155     15500000      HFMN       HOFMAN   UFNAN    $7550000  576600   OVMAN   036    AFMN111111   HFMN
    schäfer    S16ß     21600000      SKFR       SHEFA    ZEFA     S7000000  479000   CVR     837    SKFA111111   SCHFR

     |cls|

     cls := MRAStringComparator.
     cls := SoundexStringComparator.
     cls := KoelnerPhoneticCodeStringComparator.
     cls := Caverphone2StringComparator.

     #('müller' 'schmidt' 'schneider' 'fischer' 'weber' 'meyer' 'wagner'
       'schulz' 'becker' 'hoffmann' 'schäfer'
     ) do:[:name |
         Transcript show:''''; show:name; show:''' -> '''; show:(cls encode:name); showCR:''''.
     ].

     KoelnerPhoneticCodeStringComparator encode:'Müller-Lüdenscheidt' -> '65752682'

Class protocol:

phonetic codes
o  koelnerPhoneticCodeOf: aString
return a koelner phonetic code.
The koelner phonetic code is for the German language what the soundex code is for English;
it returns similar codes for similar sounding words.
There are some differences to soundex, though:
its length is not limited to 4, but depends on the length of the original string;
it does not start with the first character of the input.
This algorithm was described by Postel in 1969.

Usage example(s):

     #(
        'Müller'
        'Mulier'
        'Moliere'
        'Miller'
        'Mueller'
        'Mühler'
        'Mühlherr'
        'Mülherr'
        'Myler'
        'Millar'
        'Myller'
        'Müllar'
        'Müler'
        'Muehler'
        'Mülller'
        'Müllerr'
        'Muehlherr'
        'Muellar'
        'Mueler'
        'Mülleer'
        'Mueller'
        'Nüller'
        'Nyller'
        'Niler'
        'Czerny'
        'Tscherny'
        'Czernie'
        'Tschernie'
        'Schernie'
        'Scherny'
        'Scherno'
        'Czerne'
        'Zerny'
        'Tzernie'
        'Breschnew'
     ) do:[:w |
         Transcript show:w; show:'->'; showCR:(PhoneticStringUtilities koelnerPhoneticCodeOf:w)
     ].

Usage example(s):

     PhoneticStringUtilities koelnerPhoneticCodeOf:'Breschnew'. '17863'.
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Breschneff'. '17863'.
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Braeschneff'. '17863'.
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Braessneff'. '17863'.
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Pressneff'. '17863'.
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Presznäph'. '17863'.
     PhoneticStringUtilities koelnerPhoneticCodeOf:'Preschnjiev'. '17863'.
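
The description above can be made concrete with a small standalone sketch. It is written in Python rather than Smalltalk so that it runs on its own, and it follows the commonly published Koelner Phonetik rule table; it is not the ST/X source. The function name koelner_phonetik and the handling of umlauts and non-letters are assumptions of this sketch, and edge cases may be coded differently by koelnerPhoneticCodeOf:.

```python
# Sketch of the published Koelner Phonetik rules; NOT the ST/X implementation.
VOWELS = frozenset("AEIJOUY")
CSZ = frozenset("CSZ")
SZ = frozenset("SZ")
CKQ = frozenset("CKQ")
C_INITIAL_HARD = frozenset("AHKLOQRUX")   # initial C before these -> 4
C_INNER_HARD = frozenset("AHKOQUX")       # inner C before these -> 4


def koelner_phonetik(word):
    # Fold case and umlauts (assumption); str.upper() already maps 'ß' to 'SS'.
    text = word.upper().translate(str.maketrans("ÄÖÜ", "AOU"))
    letters = [c for c in text if "A" <= c <= "Z"]  # drop hyphens, digits, ...

    codes = []
    for i, c in enumerate(letters):
        prev = letters[i - 1] if i > 0 else None
        nxt = letters[i + 1] if i + 1 < len(letters) else None
        if c in VOWELS:
            code = "0"
        elif c == "H":
            code = ""                      # H produces no code at all
        elif c == "B":
            code = "1"
        elif c == "P":
            code = "3" if nxt == "H" else "1"
        elif c in "DT":
            code = "8" if nxt in CSZ else "2"
        elif c in "FVW":
            code = "3"
        elif c in "GKQ":
            code = "4"
        elif c == "C":
            if prev is None:
                code = "4" if nxt in C_INITIAL_HARD else "8"
            elif prev in SZ:
                code = "8"
            else:
                code = "4" if nxt in C_INNER_HARD else "8"
        elif c == "X":
            code = "8" if prev in CKQ else "48"
        elif c == "L":
            code = "5"
        elif c in "MN":
            code = "6"
        elif c == "R":
            code = "7"
        else:                              # S and Z
            code = "8"
        codes.append(code)

    raw = "".join(codes)
    # Step 1: collapse runs of identical digits.
    deduped = "".join(d for j, d in enumerate(raw) if j == 0 or d != raw[j - 1])
    # Step 2: remove every '0' except one in the first position.
    return deduped[:1] + deduped[1:].replace("0", "")


print(koelner_phonetik("Müller-Lüdenscheidt"))
print(koelner_phonetik("Breschnew"))
```

Both differences to soundex named above are visible here: the result grows with the input instead of being cut to four characters, and the first letter is coded like any other instead of being copied into the result.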

o  miracodeCodeOf: aString
return a miracode soundex phonetic code or nil.
Miracode is a slightly modified soundex algorithm.
Note that better algorithms exist (e.g. doubleMetaphone).

Usage example(s):

     PhoneticStringUtilities miracodeCodeOf:'claus'   
     PhoneticStringUtilities miracodeCodeOf:'clause'   
     PhoneticStringUtilities miracodeCodeOf:'close'   
     PhoneticStringUtilities miracodeCodeOf:'smalltalk' 
     PhoneticStringUtilities miracodeCodeOf:'smaltalk'  
     PhoneticStringUtilities miracodeCodeOf:'smaltak'   
     PhoneticStringUtilities miracodeCodeOf:'smaltok'   
     PhoneticStringUtilities miracodeCodeOf:'smoltok'   
     PhoneticStringUtilities miracodeCodeOf:'aa'        
     PhoneticStringUtilities miracodeCodeOf:'by'        
     PhoneticStringUtilities miracodeCodeOf:'bab'       
     PhoneticStringUtilities miracodeCodeOf:'bob'       
     PhoneticStringUtilities miracodeCodeOf:'bop'       
     PhoneticStringUtilities miracodeCodeOf:'pub'       

o  mySQLSoundexCodeOf: aString
return the MySQL soundex code. The MySQL soundex code is different from the miracode 'American' soundex
(no 4-character limitation; different order of duplicate vowel vs. duplicate code elimination).
Note that better algorithms exist (e.g. doubleMetaphone).

Usage example(s):

     #(
        'Müller'
        'Miller'
        'Mueller'
        'Mühler'
        'Mühlherr'
        'Mülherr'
        'Myler'
        'Millar'
        'Myller'
        'Müllar'
        'Müler'
        'Muehler'
        'Mülller'
        'Müllerr'
        'Muehlherr'
        'Muellar'
        'Mueler'
        'Mülleer'
        'Mueller'
        'Nüller'
        'Nyller'
        'Niler'
        'Czerny'
        'Tscherny'
        'Czernie'
        'Tschernie'
        'Schernie'
        'Scherny'
        'Scherno'
        'Czerne'
        'Zerny'
        'Tzernie'
        'Breschnew'
     ) do:[:w |
         Transcript show:w; show:'->'; showCR:(PhoneticStringUtilities mySQLSoundexCodeOf:w)
     ].

Usage example(s):

     PhoneticStringUtilities mySQLSoundexCodeOf:'Breschnew'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Breschneff'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Braeschneff'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Braessneff'.
     PhoneticStringUtilities mySQLSoundexCodeOf:'Pressneff'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Presznäph'. 
     PhoneticStringUtilities mySQLSoundexCodeOf:'Preschnjiev'.
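
The two differences named in the description can be sketched as follows. This is a standalone Python illustration, not the ST/X source; it assumes that the different elimination order means vowels (and H, W, Y) are dropped first and duplicate codes are collapsed afterwards, so that equal codes separated only by vowels merge. The name mysql_style_soundex and the handling of non-ASCII letters (they are simply dropped) are assumptions of this sketch.

```python
# Sketch of a MySQL-style soundex; NOT the ST/X implementation.
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in group:
        CODES[ch] = digit


def mysql_style_soundex(word):
    # Assumption: only ASCII letters are considered; everything else is dropped.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept verbatim
    last = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if not code:
            continue                    # vowels, H, W, Y vanish and do not separate duplicates
        if code != last:
            result += code              # no cut-off after four characters
        last = code
    return result.ljust(4, "0")         # pad short codes to at least four characters


for name in ("Hello", "Quadratically", "Hoffmann"):
    print(name, "->", mysql_style_soundex(name))
```

With this ordering the sketch yields 'H150' for 'Hoffmann', because the M and the N collapse across the vowel between them, and a long word such as 'Quadratically' yields a code longer than four characters.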

o  soundexCodeOf: aString
return a soundex phonetic code or nil.
Soundex (1918, 1922) returns similar codes for similar sounding words, making it a useful
tool when searching for words where the correct spelling is unknown
(read Knuth or search the web if you don't know what a soundex code is).
Caveat: 'similar sounding words' means 'similar sounding in English'.
Note that better algorithms exist (e.g. doubleMetaphone).

Usage example(s):

     PhoneticStringUtilities soundexCodeOf:'claus'   
     PhoneticStringUtilities soundexCodeOf:'clause'   
     PhoneticStringUtilities soundexCodeOf:'close'   
     PhoneticStringUtilities soundexCodeOf:'smalltalk' 
     PhoneticStringUtilities soundexCodeOf:'smaltalk'  
     PhoneticStringUtilities soundexCodeOf:'smaltak'   
     PhoneticStringUtilities soundexCodeOf:'smaltok'   
     PhoneticStringUtilities soundexCodeOf:'smoltok'   
     PhoneticStringUtilities soundexCodeOf:'aa'        
     PhoneticStringUtilities soundexCodeOf:'by'        
     PhoneticStringUtilities soundexCodeOf:'bab'       
     PhoneticStringUtilities soundexCodeOf:'bob'       
     PhoneticStringUtilities soundexCodeOf:'bop'       
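
For readers who do not have Knuth at hand, the classic 'American' soundex rules can be sketched in a few lines. This is a standalone Python illustration, not the ST/X source; the name american_soundex, the treatment of non-ASCII letters (they are dropped) and the use of None for the nil result are assumptions of this sketch.

```python
# Sketch of the classic American soundex rules; NOT the ST/X implementation.
SOUNDEX_CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in group:
        SOUNDEX_CODES[ch] = digit


def american_soundex(word):
    # Assumption: only ASCII letters are considered; everything else is dropped.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return None                     # mirrors the 'or nil' in the description
    result = letters[0]                 # the first letter is kept verbatim
    last = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue                    # H and W do not separate equal codes
        code = SOUNDEX_CODES.get(c, "")
        if code and code != last:
            result += code
            if len(result) == 4:
                break                   # letter plus three digits
        last = code                     # a vowel resets 'last', so repeats count again
    return result.ljust(4, "0")         # pad short codes with zeros


for name in ("Robert", "Rupert", "schmidt", "hoffmann"):
    print(name, "->", american_soundex(name))
```

For 'schmidt' and 'hoffmann' the sketch yields 'S530' and 'H155', the same values as the soundex column of the sample data. The caveat above also shows: only ASCII letters contribute, so a name like 'müller' is coded as if the umlaut were absent.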

queries
o  isUtilityClass
(comment from inherited method)
a utility class is one which is not to be instantiated,
but only provides a number of utility functions on the class side.
It is usually also abstract.


Private classes:

    Caverphone2StringComparator
    DaitchMokotoffStringComparator
    DoubleMetaphoneStringComparator
    ExtendedSoundexStringComparator
    KoelnerPhoneticCodeStringComparator
    MRAStringComparator
    MetaphoneStringComparator
    MiracodeStringComparator
    MySQLSoundexStringComparator
    NYSIISStringComparator
    PhonemStringComparator
    PhoneticStringComparator
    SingleResultPhoneticStringComparator
    SoundexStringComparator
    SpanishPhoneticCodeStringComparator


ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Wed, 15 Jan 2025 08:32:57 GMT