
Smalltalk/X Webserver

Documentation of class 'PhoneticStringUtilities':


Class: PhoneticStringUtilities


Inheritance:

   Object
   |
   +--PhoneticStringUtilities

Package:
stx:libbasic2
Category:
Collections-Text-Support
Version:
rev: 1.10 date: 2010/04/30 09:50:19
user: cg
file: PhoneticStringUtilities.st directory: libbasic2
module: stx stc-classLibrary: libbasic2

Description:


Utilities for phonetic string searches and comparisons.
These are all variations or improvements of the soundex algorithm, which usually fails
to provide good results for non-English languages.

soundexCode
    this algorithm was originally contained in the CharacterArray class.

nysiis
    a modified soundex algorithm

miracode
    another modified soundex algorithm ('american soundex') used in the 1880 census.

mySQLSoundex
    another modified soundex algorithm used in mySQL.

koelner phoneticCode
    provides functionality similar to soundex, but tuned much more closely to the German language

Double metaphone
    works with most European languages.

phonem
    described in Georg Wilde and Carsten Meyer, 'Doppelgaenger gesucht - Ein Programm fuer kontextsensitive phonetische Textumwandlung'
    from 'ct Magazin fuer Computer & Technik 25/1999'.

More information for German readers can be found in:
    http://www.uni-koeln.de/phil-fak/phonetik/Lehre/MA-Arbeiten/magister_wilz.pdf


Class protocol:

phonetic codes
o  koelnerPhoneticCodeOf: aString
return a koelner phonetic code.
The koelner phonetic code is for the German language what the soundex code is for English:
it returns similar strings for similar sounding words.
There are some differences from soundex, though:
its length is not limited to 4, but depends on the length of the original string;
it does not start with the first character of the input.
This algorithm was described by Postel in 1969.
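
The two differences from soundex listed above can be seen in a small sketch. The following
Python function is only an illustration built on the commonly published Koelner Phonetik rule
table; it is not the Smalltalk/X implementation, and the name koelner_phonetic as well as the
handling of umlauts and non-letters are assumptions made for this example.

```python
def koelner_phonetic(word):
    """Illustrative Koelner Phonetik encoder (hypothetical helper, not the ST/X code)."""
    # Assumption: umlauts are folded to their base vowels; Python's upper() turns 'ß' into 'SS'.
    text = word.upper().translate(str.maketrans("ÄÖÜ", "AOU"))
    letters = [c for c in text if "A" <= c <= "Z"]  # non-letters are dropped

    codes = []
    for i, ch in enumerate(letters):
        prev = letters[i - 1] if i > 0 else ""
        nxt = letters[i + 1] if i + 1 < len(letters) else ""
        if ch in "AEIJOUY":
            code = "0"
        elif ch == "H":
            code = ""  # H is ignored entirely
        elif ch == "B":
            code = "1"
        elif ch == "P":
            code = "3" if nxt == "H" else "1"
        elif ch in "DT":
            code = "8" if nxt in set("CSZ") else "2"
        elif ch in "FVW":
            code = "3"
        elif ch in "GKQ":
            code = "4"
        elif ch == "C":
            if i == 0:
                code = "4" if nxt in set("AHKLOQRUX") else "8"
            elif prev in set("SZ"):
                code = "8"
            else:
                code = "4" if nxt in set("AHKOQUX") else "8"
        elif ch == "X":
            code = "8" if prev in set("CKQ") else "48"
        elif ch == "L":
            code = "5"
        elif ch in "MN":
            code = "6"
        elif ch == "R":
            code = "7"
        else:  # S and Z
            code = "8"
        codes.append(code)

    # First collapse runs of equal digits, then drop every 0 except a leading one.
    collapsed = []
    for digit in "".join(codes):
        if not collapsed or collapsed[-1] != digit:
            collapsed.append(digit)
    if not collapsed:
        return ""
    return collapsed[0] + "".join(d for d in collapsed[1:] if d != "0")


print(koelner_phonetic("Müller-Lüdenscheidt"))  # 65752682
print(koelner_phonetic("Wikipedia"))            # 3412
```

The result consists of digits only and grows with the input, so neither the leading letter
nor the 4-character limit of soundex applies.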

o  mySQLSoundexCodeOf: aString
return the mySQL soundex code. The mySQL soundex code is different from the miracode 'american' soundex
(no 4-character limit; vowel elimination and duplicate-code elimination are applied in a different order)
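
To make the two differences concrete, here is a hedged Python sketch of a soundex variant that
drops uncoded letters first, collapses duplicate codes second, and does not truncate the result.
It is an illustration only and has not been checked against the Smalltalk/X code; the function
name mysql_style_soundex, the padding to at least 4 characters, and the None result for input
without letters are assumptions made for this example.

```python
# Standard soundex digit classes; vowels, H, W and Y carry no code.
CODES = {}
for digit, group in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in group:
        CODES[letter] = str(digit)


def mysql_style_soundex(word):
    """Illustrative 'vowels first, duplicates second' soundex (hypothetical helper)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return None  # assumption: no code for input without letters

    # Step 1: discard every uncoded letter after the first one.
    coded = [CODES[c] for c in letters[1:] if c in CODES]

    # Step 2: collapse runs of equal codes, starting from the first letter's own code.
    result = [letters[0]]
    prev = CODES.get(letters[0], "")
    for code in coded:
        if code != prev:
            result.append(code)
        prev = code

    # No truncation to 4 characters; short codes are padded with zeros.
    return "".join(result).ljust(4, "0")


print(mysql_style_soundex("Quadratically"))  # Q36324
print(mysql_style_soundex("Tymczak"))        # T520
```

In 'Tymczak' the letters C, Z and K share one code; because the vowel between Z and K is removed
before duplicates are collapsed, all three merge into a single digit, whereas a
duplicates-first soundex would keep the K as a separate digit.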

o  soundexCodeOf: aString
return a soundex phonetic code or nil.
Soundex (1918, 1922) returns similar codes for similar sounding words, making it a useful
tool when searching for words where the correct spelling is unknown.
(read Knuth or search the web if you don't know what a soundex code is).
Caveat: 'similar sounding words' means: 'similar sounding in English'.
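
For readers who do not have Knuth at hand, the following Python sketch shows the widely
documented 'american' soundex rules. It is an illustration, not the Smalltalk/X
implementation; the function name soundex and the choice to return None for input without
letters (mirroring the 'or nil' above) are assumptions made for this example.

```python
# Soundex digit classes; vowels, H, W and Y carry no code.
CODES = {}
for digit, group in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(word):
    """Illustrative american soundex: first letter plus three digits (hypothetical helper)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return None  # assumption: no code for input without letters

    result = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so an equal code after it is kept

    # Truncate or zero-pad to exactly 4 characters.
    return "".join(result)[:4].ljust(4, "0")


print(soundex("Robert"))  # R163
print(soundex("Rupert"))  # R163
```

'Robert' and 'Rupert' receive the same code, which is what makes the code usable as a
search key when the exact spelling is unknown.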


Private classes:

    DoubleMetaphoneStringComparator
    ExtendedSoundexStringComparator
    KoelnerPhoneticCodeStringComparator
    MiracodeStringComparator
    MySQLSoundexStringComparator
    NYSIISStringComparator
    PhonemStringComparator
    PhoneticStringComparator
    SoundexStringComparator


ST/X 6.1.1; WebServer 1.620 at exept:8081; Tue, 22 May 2012 21:36:10 GMT