Collections are containers for other objects. Smalltalk provides a rich set of collection classes ready to be used.
The protocol offered by these collections is very orthogonal, making
good use of polymorphism. For example, most collections respond to
messages
like #at:
, #includes:
or #do:
,
even though the implementations are totally different and the performance
may vary greatly between different collections and access methods
(*).
To get a rough idea of the common protocol,
take a look at the
Collection
and SequenceableCollection
classes.
Collection
is the superclass of all collections,
and SequenceableCollection
is inherited by all collections which use
a numeric index (among others: Array
, String
and OrderedCollection
).
These implement most of the common operations,
so knowledge about those classes' protocol is highly recommended.
The most useful and most used collection classes are:
less frequently used collections:
seldom used or highly specialized collections:
The typical uses of arrays are:
anArray := Array new:size
anArray := someCollection asArray
{ expr1 . expr2 ... exprN }
theSize := anArray size
anArray at:index put:anObject
element := anArray at:index
anArray includes:anObject
anArray indexOf:anObject
anArray do:[:element | ... do something with element ...]
#( element1 element2 ... elementN )
where the individual elements must be literals (numbers, characters,
strings or other arrays).
Nested literal Arrays can be written as:
#(
(element1.1 element1.2 ... element1.M)
(element2.1 element2.2 ... element2.M)
...
(elementN.1 elementN.2 ... elementN.M)
)
Notice, the above is an Array of 3 Arrays. Not to confuse
with an array containing 9 elements (which you get, if the inner parenthesises
are ommited).
Smalltalk/X also allows the objects true
,
false
and nil
to be treated like constants in
literal arrays. This is NOT compatible with some older Smalltalk implementations (but is now part of the Ansi standard),
so care should be taken if that code is to be ported to other systems later.
For example,
represents a 5-element array literal consisting of
#( nil 1 'foo' true #true )
nil
,
the integer 1
, the string 'foo'
, the boolean
true
and the symbol #true
.
More details are found in
"Array's class description
".
In practice, arrays are very very seldom used except as literal constants "#( ... )".
Beginners, especially with a C/C++ background, often tend to overuse arrays - please look at the
description of OrderedCollection,
which behaves much like arrays do,
but offer functionality for growing and shrinking and are usually a better
choice. Also, for searching, counting, mapping etc. please use Set and Dictionary.
Depending on the number of bits which are required to store a character, strings use a single- or double-byte storage scheme. Unicode is used as internal encoding. However, various converters assist in transcoding to/from different coding schemes used with fonts or in files.
Typical operations on strings are:
theSize := aString size
aString at:index put:someCharacter
element := aString at:index
newString := aString copyReplaceString:string1 withString:string2
newString := aString copyReplace:character1 with:character2
newString := aString copyReplaceAll:character1 with:character2
aString replaceFrom:index1 to:index2 with:otherCollection startingAt:startIndex
newString := string1 , string2
newString := aString copyFrom:index1 to:index2
newString := aString copyFrom:index1
newString := aString copyTo:index1
newString := aString contractTo:size
Transcript show:aString
Transcript showCR:aString
aString print
aString printCR
newString := aString withoutSeparators
newString := aString withoutSpaces
newString := aString withoutLeadingSpaces
newString := aString withoutTrailingSpaces
newString := aString asUppercase
newString := aString asUppercaseFirst
newString := someString leftPaddedTo:size
newString := someString paddedTo:size
newString := someString centerPaddedTo:size
newString := someString decimalPaddedTo:size
index := aString indexOf:aCharacter startingAt:startIndex
index := aString indexOfAny:aCharacterCollection startingAt:startIndex
index := aString indexOfSeparatorStartingAt:startIndex
index := aString indexOfNonSeparatorStartingAt:startIndex
index := aString indexOfSubCollection:subString startingAt:startIndex
aString includes:character
aString includesString:subString
aString startsWith: anotherString
aString endsWith: anotherString
aPatternString match:someString
index := someString findMatchString:aPatternString startingAt:index
aPatternString matchesRegex:someString
words := aString asCollectionOfWords
words := aString asCollectionOfLines
parts := aString asCollectionOfSubstringsSeparatedBy:aCharacter
parts := aString asCollectionOfSubstringsSeparatedByAny:sepCharacters
parts := aString asCollectionOfSubstringsSeparatedByAll:aString
filename := aString asFilename
number := aString asNumber
symbol := aString asSymbol
aString isAlphaNumeric
aString isNumeric
string := aString asSoundexCode
number := aString spellAgainst: anotherString
number := aString levenshteinTo: anotherString
Constant strings (string literals) are entered by enclosing the characters in single quotes
('
). Single quotes within string constants are to be doubled.
String literals can cross line boundaries; in this case, the newline character
is part of the string.
Examples:
Currently, there exists no standard on how non printable characters
are to be represented in strings via escape sequences.
For embedded newline characters,
Smalltalk-80 provides the
'a normal string constant'
'a string constant with embedded ''quotes'''
'a string constant
with mutliple
lines'
#withCRs
method,
VisualAge calls this #addLineDelimiters
.
Both are supported by ST/X and convert embedded '\' (backslash)
characters to the newLine character, as in:
In addition, the (non standard) method
'a string\with newlines\' withCRs
#withEscapes
converts
embedded C-language escape sequences:
'a string\n\t\twith newline and tabs' withEscapes
More details are found in the class documentation of
"String
".
and (highly recommended) especially in its superclass
"CharacterArray
".
This is where most of the functionality of String
,
TwoByteString
and Unicode16String
is actually implemented.
Therefore, symbols can be compared using identity compare (==), which is much faster than value compare (=). Identity compare is technically implemented as a pointer compare, while value compare on strings is implementated by comparing each element of the two objects.
Symbols are internally used by the system to assign names to class objects (in the Smalltalk global table), and to assign method names to method objects (in each classes method/selector association). Also, symbols are perfect to use as keys in IdentityDictionaries.
Since symbols are handled just as fast as integers, they are also perfect for flags, states (state machines) or other variables which are represented by enumeration types or define-constants in other programming languages.
Symbols do not allow any change of their character elements; the at:put:
method will raise an error. Other than that, all read-accesses are the same as for
strings (actually, the Symbol class inherits all those methods from the String class).
Consider symbols as being some kind of readonly string.
Symbols can be created (at runtime) from a string with:
also, symbol constants (i.e. symbol literals)
can be written directly by prefixing some identifier with the #-character as in:
sym := aString asSymbol
These symbol literals are created at compile time -
NOT at execution time.
Therefore, the cost of using them is not more than using integers
sym := #helloThere.
Which is why there is no need for enumeration-types in Smalltalk;
simply use a symbol, wherever other programming languages would
use a numeric define or an enumeration type.
Only alphanumeric identifiers or valid message selectors
are allowed in the above example.
Examples for valid symbol iterals are:
Use quotes around the characters for other symbol literals:
#helloThere
#FooBar
#+
#at:
#at:put:
The quotes are not part of the symbols characters;
therefore
sym := #'a symbol with spaces and .... other stuff'
#+
and #'+'
represent the same symbol.
Symbol literals can also be present in array literals
as in:
Within such an array literal, the leading #-character can be ommitted
(for your convenience).
Thus, the above is equivalent to:
#( #foo #bar #'baz' )
but, NOT equivalent to:
#( foo bar baz )
which is an array of strings.
#( 'foo' 'bar' 'baz' )
Since true
, false
and nil
are treated
as constant objects,
you'd have to add a #-prefix if you ever want those symbols in an array literal,
as in:
Which - you have guessed it - is not the same as:
#( #true #false #nil)
#( true false nil)
More details are found in
"Symbol's class description
".
=
, which typically compares objects values.
1
(the Integer "one") will be considered
equal to 1.0
(the Float "one") in
set-inclusion tests.
IdentitySet
uses identity compare
(==
) and treats 1
and 1.0
as different
objects.
aSet := Set new
aSet := Set new:1000
theSize := aSet size
aSet add:anObject
aSet remove:anObject
aSet includes:anObject
aSet do:[:element | ... do something with element ...]
#at:
and #at:put:
) are
not allowed and trigger an error.
Sets and identitySets use hashing algorithms internally for inclusion tests. Thus, they are usually fast and show a constant access time which is determined by their fill-grade and the quality of the hash key algorithm; not by the absolute size of the set. The hash key generation is implemented by the elements - not by the set.
Sets automatically grow (and shrink) as required when elements are added or removed. To get good hashing performance, they grow larger (by some percentage) than the actual size required for the elements. Expect some 20% storage overhead when using sets.
IdentitySets are sometimes faster than Sets, since both hashKey generation
and comparing is implemented more efficiently.
If your keys are small integers or symbols, you may use an IdentitySet.
However, for string elements or float numbers, using identitySets is usually a bad choice.
More details are found in
"Set's class description
"
and
"IdentitySet's class description
".
#at:
and #at:put:
messages, using an access key.
However, in contrast to arrays they allow ANY other object to be used as key.
Thus an array can be thought of as a special dictionary using numeric keys
only (if you want to do this, it is faster to use an array rightaway - except
if your array is filled very sparse).
Like sets, dictionaries come in two flavours: Dictionary
uses
value compare (=
) and IdentityDictionary
which uses
identity compare (==
) for key accesses.
Thus:
will return the expected
|d value|
d := Dictionary new.
d at:1 put:'one'.
...
value := d at:1.0
'one'
in the variable "value",
because the integer "1" compares equal to the float "1.0" and therefore,
the dictionary will find a value under the key "1.0".
In contrast,
will raise an error, since there is no element stored under the key "1.0"
(remember: "
|d value|
d := IdentityDictionary new.
d at:1 put:'one'.
...
value := d at:1.0
1 = 1.0
" returns true, while "1 == 1.0
"
returns false).
Like with sets, dictionaries use hashing internally, and therefore show a constant access time depending on the fill-rate only. Not on the absolute size. (However, they do depend upon the quality of the hashing algorithm used).
Dictionaries are very powerful collections, and worth a second look.
You can put associations between objects into dictionaries.
A simple example:
You can also use Dictionaries as a replacement for simple data structures,
which are called ``record'' or ``struct'' in other programming languages:
|nameToAge|
nameToAge := Dictionary new.
nameToAge at:'fred' put:17.
nameToAge at:'peter' put:21.
...
age := nameToAgeAt:name ifAbsent:['I do not know'].
Of course, real programmers define a Person class and create an
instance of it ... this is both more 'object oriented' and provides
faster access than using dictionaries (which are not meant to be used
for the above 'constant' key access).
|peter marry|
peter := IdentityDictionary new.
peter at:#age put:23.
peter at:#firstName put:'peter'.
peter at:#middleName put:'m'.
peter at:#name put:'smalltalker'.
peter at:#salary put:51234.
...
marry := IdentityDictionary new.
marry at:#age put:21.
marry at:#firstName put:'marry'.
marry at:#middleName put:'s'.
marry at:#name put:'smalltalker'.
marry at:#husband put:peter.
...
age := peter at:#age.
firstNameOfHusband := (marry at:#husband) at:#firstName.
However, the above scheme may be used, if the number of information slots
to be held is not constant, or not foreseeable by the programmer.
A typical example would be an application, in which the end-user is allowed to
add user-defined fields to some entity.
More details are found in
"Dictionary's class description
"
and
"IdentityDictionary's class description
".
typical use:
coll := OrderedCollection new
coll := OrderedCollection new:1000
coll := OrderedCollection new grow:1000
theSize := coll size
coll add:anObject
coll addFirst:anObject
coll removeFirst
coll removeLast
coll includes:anObject
coll do:[:element | ... do something with element ...]
new:
instance creation method,
which uses the numeric argument as a size hint,
not to specify the logical size of the resulting collection.
It therefore returns a logically empty collection, which has some slots
"preallocated" for performance reasons.
One may argue about this and many do not like it. However, as it is too late to change
it now without affecting thousands of programs, you will have to live with it (just like the others have to).
Notice, that the above #includes:
message
is usually much slower for large orderedCollections or arrays than it
is for sets or dictionaries. The reason is that the former have to do
a sequential search over all elements, while the later can use a fast
hashed access, typically finding an element after a few compares.
Stacks are implemented by using #addLast:
as push operation,
and #removeLast
as a pop operation.
Queues are implemented using #addLast:
and #removeFirst
.
Also, Smalltalk provides a specialized Queue class, which is tuned
for this particular type of access.
More details are found in
"OrderedCollection's class description
".
Sorting is done using quickSort and binary search (when inserting elements). typical use:
sorted := someOtherCollection asSortedCollection
sorted := SortedCollection new
sorted sortBlock:[:a :b | a > b]
sorted sortBlock:[:a :b | a < b]
sorted sortBlock:[:a :b | a asUppercase < b asUppercase]
sorted add:someObject
sorted includes:someObject
sort
message.
Or else to use another collection, such as a BTree
, which is more specialized
to insertion/deletion of individual elements.
More details are found in
"SortedCollection's class description
".
Bytearrays are also valuable for data interchange with other programs via
files, sockets, shared memory or pipes.
Protocolwise, they behave much like Arrays.
More details are found in
"ByteArray's class description
".
More details are found in
"FloatArray's class description
".
(2 to:10 by:2)
.
More details are found in
"Interval's class description
".
Link
or
ValueLink
instances. They provide reasonable fast
methods for insertion at either ends,
but show poor performance when any inner element is to be accessed
(since the list has to be walked up to the element).
Although often overemphasized in existing computer science literature,
linkedLists are not very often used in Smalltalk, due to their poor
indexed access performance and their storage overhead.
In most cases, an orderedColllection or a specialized tree implementation
will outperform a linkedList.
More details are found in
"LinkedList's class description
".
SharedQueue
(which is a subclass of Queue
) and helps in writing
producer/consumer applications with separate writer/reader processes.
Notice, that queue functionality is also provided by OrderedCollection
(addLast:
/ removeFirst
messages);
however the Queue-class protocol mimics a stream (next
and nextPut:
messages)
and is therefore slightly
different.
If you don't care for this and if you don't need the synchronization
of a SharedQueue, use whichever you find more appealing.
More details are found in
"Queue's class description
".
More details are found in
"WeakArray's class description
" or
"WeakIdentitySet's class description
" or
"WeakIdentityDictionary's class description
".
More details are found in
"Registry's class description
" or
"HandleRegistry's class description
".
ResourcePacks are used to keep track of string translations for national language variants. Technically, they are dictionaries. However, the class provides additional protocol for loading resourcePacks from so called resource files.
More details are found in
"CacheDictionary's class description
" or
"ResourcePack's class description
".
More details are found in
"MappedCollection's class description
".
Notes:
(*)
Some collections are tuned for growing, while others are not;
some use hashing or other fast access methods to support fast
search and/or inclusion test, while others do sequential searches.
Please check out and understand the individual collection classes
and use the one which best fits your needs to get best performance and memory utilisation.
Most collection classes provide detailed information in their
#documentation method.
Copyright © 1996-2016 Claus Gittinger Development & Consulting
<info@exept.de>