Documentation of class 'PhoneticStringUtilities::Caverphone2StringComparator':

Class: Caverphone2StringComparator (private in PhoneticStringUtilities)

This class is only visible from within PhoneticStringUtilities.

Inheritance:

   Object
   |
   +--PhoneticStringUtilities::PhoneticStringComparator
      |
      +--PhoneticStringUtilities::SingleResultPhoneticStringComparator
         |
         +--PhoneticStringUtilities::Caverphone2StringComparator

Package:
stx:libbasic2
Category:
Collections-Text-Support
Owner:
PhoneticStringUtilities

Description:


Caverphone (2) Algorithm:

see http://caversham.otago.ac.nz/files/working/ctp150804.pdf

Caverphone 2.0 is being made available for free use for the benefit of anyone who has a use for it,
with the proviso that the Caversham Project at the University of Otago should be acknowledged as the
original source (which is hereby done ;-).

•  Start with a surname or first name
•  Convert to lowercase
    This coding system is case sensitive; implementations must treat a and A as different characters.
•  Remove anything not a-z
    The main intention of this is to remove spaces, hyphens, and apostrophes.
    example:  o'brian becomes obrian
•  If the name starts with cough make it cou2f
    2 is being used as a temporary placeholder to indicate a consonant which we are no longer interested in.
•  If the name starts with rough make it rou2f
•  If the name starts with tough make it tou2f
•  If the name starts with enough make it enou2f
•  If the name starts with gn make it 2n
•  If the name ends with mb make it m2
•  replace cq with 2q
•  replace ci with si
•  replace ce with se
•  replace cy with sy
•  replace tch with 2ch
•  replace c with k
•  replace q with k
•  replace x with k
•  replace v with f
•  replace dg with 2g
•  replace tio with sio
•  replace tia with sia
•  replace d with t
•  replace ph with fh
•  replace b with p
•  replace sh with s2
•  replace z with s
•  replace an initial vowel with an A
•  replace all other vowels with a 3
    3 is a temporary placeholder marking a vowel
•  replace 3gh3 with 3kh3
    Exceptions are dealt with before the general case. gh between vowels is an exception to the more general gh rule.
•  replace gh with 22
•  replace g with k
•  replace groups of the letter s with a S
    Continuous strings of s are replaced by a single S
•  replace groups of the letter t with a T
•  replace groups of the letter p with a P
•  replace groups of the letter k with a K
•  replace groups of the letter f with a F
•  replace groups of the letter m with a M
•  replace groups of the letter n with a N
•  replace w3 with W3
•  replace wy with Wy
•  replace wh3 with Wh3
•  replace why with Why
•  replace w with 2
•  replace an initial h with an A
•  replace all other occurrences of h with a 2
•  replace r3 with R3
•  replace ry with Ry
•  replace r with 2
•  replace l3 with L3
•  replace ly with Ly
•  replace l with 2
•  replace j with y
•  replace y3 with Y3
•  replace y with 2
•  remove all 2s
•  remove all 3s
•  put six (Caverphone 1) or ten (Caverphone 2) 1s on the end
•  take the first six (Caverphone 1) or ten (Caverphone 2) characters as the code
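
The steps listed above can be sketched as an ordered table of replacements. The following Python sketch is not the source of this class: the names RULES and encode are invented for illustration, and it applies only the rules exactly as they are listed. The list has no rule that produces an A anywhere but at the start of a name, so documented results such as 'Peter' -> 'PTA1111111' evidently rely on rules that are not spelled out here; the sketch makes no attempt to guess them.

```python
import re

# (pattern, replacement) pairs, in the order the list gives them.
# Every pattern is a regular expression; most are plain literals.
RULES = [
    (r"^cough", "cou2f"), (r"^rough", "rou2f"),
    (r"^tough", "tou2f"), (r"^enough", "enou2f"),
    (r"^gn", "2n"), (r"mb$", "m2"),
    ("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"),
    ("tch", "2ch"), ("c", "k"), ("q", "k"), ("x", "k"), ("v", "f"),
    ("dg", "2g"), ("tio", "sio"), ("tia", "sia"), ("d", "t"),
    ("ph", "fh"), ("b", "p"), ("sh", "s2"), ("z", "s"),
    (r"^[aeiou]", "A"),          # initial vowel
    (r"[aeiou]", "3"),           # all other vowels (3 = vowel placeholder)
    ("3gh3", "3kh3"), ("gh", "22"), ("g", "k"),
    (r"s+", "S"), (r"t+", "T"), (r"p+", "P"), (r"k+", "K"),
    (r"f+", "F"), (r"m+", "M"), (r"n+", "N"),
    ("w3", "W3"), ("wy", "Wy"), ("wh3", "Wh3"), ("why", "Why"), ("w", "2"),
    (r"^h", "A"), ("h", "2"),
    ("r3", "R3"), ("ry", "Ry"), ("r", "2"),
    ("l3", "L3"), ("ly", "Ly"), ("l", "2"),
    ("j", "y"), ("y3", "Y3"), ("y", "2"),
    ("2", ""), ("3", ""),        # drop both placeholders
]

def encode(word, length=10):
    """Apply the listed steps literally; length is 10 (v2) or 6 (v1)."""
    # Lowercase first, then drop everything that is not a-z.
    text = re.sub(r"[^a-z]", "", word.lower())
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    # Pad with 1s and cut to the code length.
    return (text + "1" * length)[:length]

print(encode("david"))       # TFT1111111
print(encode("Stevenson"))   # STFNSN1111
```

Keeping the rules in one ordered list makes the dependency on ordering explicit: for instance, ci, ce and cy must be rewritten before the general c -> k rule. Calling encode with length=6 gives the six-character form that the list attributes to Caverphone 1.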

 self new encode:'david'      -> 'TFT1111111'
 self new encode:'whittle'    -> 'WTA1111111'

 self new encode:'Stevenson'  -> 'STFNSN1111'
 self new encode:'Peter'      -> 'PTA1111111'

 self new encode:'washington' -> 'WSNKTN1111'
 self new encode:'lee'        -> 'LA11111111'
 self new encode:'Gutierrez'  -> 'KTRS111111'
 self new encode:'Pfister'    -> 'PFSTA11111'
 self new encode:'Jackson'    -> 'YKSN111111'
 self new encode:'Tymczak'    -> 'TMKSK11111'

 self new encode:'add'        -> 'AT11111111'
 self new encode:'aid'        -> 'AT11111111'
 self new encode:'at'         -> 'AT11111111'
 self new encode:'art'        -> 'AT11111111'
 self new encode:'earth'      -> 'AT11111111'
 self new encode:'head'       -> 'AT11111111'
 self new encode:'old'        -> 'AT11111111'

 self new encode:'ready'      -> 'RTA1111111'
 self new encode:'rather'     -> 'RTA1111111'
 self new encode:'able'       -> 'APA1111111'
 self new encode:'appear'     -> 'APA1111111'

 self new encode:'Deedee'     -> 'TTA1111111'
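
Several of the examples above share one code, and words with equal codes are presumably the ones a comparator of this kind treats as sounding alike. The Python sketch below buckets words by code to make that visible. It uses only (word, code) pairs copied from the examples; group_by_code is a hypothetical helper and not part of the class.

```python
from collections import defaultdict

# (word, code) pairs copied verbatim from the documented examples.
DOCUMENTED = [
    ("add", "AT11111111"), ("aid", "AT11111111"), ("at", "AT11111111"),
    ("art", "AT11111111"), ("earth", "AT11111111"), ("head", "AT11111111"),
    ("old", "AT11111111"),
    ("ready", "RTA1111111"), ("rather", "RTA1111111"),
    ("able", "APA1111111"), ("appear", "APA1111111"),
    ("david", "TFT1111111"),
]

def group_by_code(pairs):
    """Map each code to the words that produce it, in input order."""
    groups = defaultdict(list)
    for word, code in pairs:
        groups[code].append(word)
    return dict(groups)

groups = group_by_code(DOCUMENTED)
for code, words in groups.items():
    print(code, words)
```

The AT11111111 bucket shows how coarse the code is for short words: add, earth and old all land in the same group.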


Instance protocol:

api
o  encode: word
return the ten-character Caverphone 2 code of word; the first step converts it to lowercase

usage example(s):

     self new encode:'david'      -> 'TFT1111111'
     self new encode:'whittle'    -> 'WTA1111111'

     self new encode:'Stevenson'  -> 'STFNSN1111'
     self new encode:'Peter'      -> 'PTA1111111'

     self new encode:'washington' -> 'WSNKTN1111'
     self new encode:'lee'        -> 'LA11111111'
     self new encode:'Gutierrez'  -> 'KTRS111111'
     self new encode:'Pfister'    -> 'PFSTA11111'
     self new encode:'Jackson'    -> 'YKSN111111'
     self new encode:'Tymczak'    -> 'TMKSK11111'

     self new encode:'add'        -> 'AT11111111'
     self new encode:'aid'        -> 'AT11111111'
     self new encode:'at'         -> 'AT11111111'
     self new encode:'art'        -> 'AT11111111'
     self new encode:'earth'      -> 'AT11111111'
     self new encode:'head'       -> 'AT11111111'
     self new encode:'old'        -> 'AT11111111'

     self new encode:'ready'      -> 'RTA1111111'
     self new encode:'rather'     -> 'RTA1111111'
     self new encode:'able'       -> 'APA1111111'
     self new encode:'appear'     -> 'APA1111111'

     self new encode:'Deedee'     -> 'TTA1111111'



ST/X 7.2.0.0; WebServer 1.670 at bd0aa1f87cdd.unknown:8081; Fri, 26 Apr 2024 09:18:07 GMT