Experienced Smalltalkers may skip this document.
This document does not deal with possible conflicts between
performance and elegance/reusability.
Be aware that in some cases a compromise has to be made.
Below, we try to quantify performance aspects only, to give you
the background information required for coding decisions.
Be aware that this document was written in 1995. Although execution times have improved considerably in the meantime, the performance facts are still valid and the time ratios still hold. Simply divide the times given below by whatever factor your CPU has improved in speed.
Before worrying about any specific (microscopic) performance tuning, always make certain that your general concept is correct and the best algorithms are used.
For example, a bubbleSort, even if written in assembler, is slower than a quickSort written in a slow interpreted language, once the number of objects exceeds a certain limit.
On the other hand, if only a few elements are to be sorted (say 5), even a naive sort algorithm will outperform a quickSort.
These algorithmic effects (and also choice of collection as seen below), usually have a much higher impact on performance than microscopic fine tuning.
For example, do not write:
index1 to:index2 do:[:idx |
someCollection at:idx put:someValue
].
ignoring the fact that a specialized method for doing this already exists:
someCollection from:index1 to:index2 put:someValue.
This not only saves code, but is also faster by roughly a factor
of 4 (because #from:to:put: is implemented as a highly
optimized primitive method).
This applies mostly to collections and streams.
If you speed up the 10% of your code where 90% of the time is spent by a factor of two, you get an overall speedup factor of 1.8. Whereas even if you speed up the remaining 90% by an infinite amount, your overall win will only be a factor of 1.1. Thus, do not waste your time tuning code which is seldom executed.
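The arithmetic behind this 90/10 rule can be checked with a small Python sketch (the function name is my own; this is Amdahl's law, not ST/X code):

```python
# Overall speedup when a fraction f of the runtime is accelerated by factor s:
# new time = (1 - f) + f / s, relative to an original time of 1.
def overall_speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

# Doubling the speed of the code accounting for 90% of the time:
print(round(overall_speedup(0.9, 2), 2))             # about 1.82 overall
# Infinitely accelerating the code accounting for the other 10%:
print(round(overall_speedup(0.1, float('inf')), 2))  # about 1.11 overall
```

The second result shows why tuning rarely-executed code is wasted effort: even an infinite speedup there barely moves the total.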
To get the list of top time-consumers, execute your program (or parts of it) under a messageTally. It will give you a list which summarizes the execution times sorted by method. Both the accumulated time (i.e. the time spent in a method and all of its callees) and the leaf time (i.e. the time spent in a particular method alone) are reported.
Since the data is acquired by probing the execution
state of the program at regular time intervals, the information may not be exact
and in some cases be completely wrong (for example, if the program executes
in cycles which are in sync with the probe intervals).
This is of course a general problem
and always occurs when sampling with a (relatively) low frequency.
Despite its shortcomings, the information is generally quite
valuable to start with.
The default sampling occurs in 10ms ticks, but the MessageTally
class offers an interface where the sampling interval can be specified as
argument.
For reasonable statistics, the program should at least execute for a second; the longer, the better. If required, execute it multiple times in a loop.
For example, the output of:
|data rnd|
data := Array new:10000.
rnd := Random new.
1 to:10000 do:[:i |
data at:i put:(rnd nextIntegerBetween:1 and:100000)
].
MessageTally spyOn:[
10 timesRepeat:[data sort].
]
is:
total execution time: 2189 ms (108 probes ; error >= 0.9%)

MessageTally » MessageTally execute (total 100.0%)
  Array » SequenceableCollection sort (total 100.0%)
    Array » SequenceableCollection quickSortFrom:to: (total 100.0%)(leaf 8.3%)
      Array » SequenceableCollection quickSortFrom:to: (total 91.7%)(leaf 9.3%)
        Array » SequenceableCollection quickSortFrom:to: (total 82.4%)(leaf 1.9%)
          Array » SequenceableCollection quickSortFrom:to: (total 80.6%)(leaf 0.9%)
            Array » SequenceableCollection quickSortFrom:to: (total 79.6%)(leaf 8.3%)
              Array » SequenceableCollection quickSortFrom:to: (total 71.3%)(leaf 8.3%)
                Array » SequenceableCollection quickSortFrom:to: (total 63.0%)(leaf 4.6%)
                  Array » SequenceableCollection quickSortFrom:to: (total 58.3%)(leaf 9.3%)
                    Array » SequenceableCollection quickSortFrom:to: (total 49.1%)(leaf 17.6%)
                      Array » SequenceableCollection quickSortFrom:to: (total 31.5%)(leaf 7.4%)
                        Array » SequenceableCollection quickSortFrom:to: (total 24.1%)(leaf 11.1%)
                          Array » SequenceableCollection quickSortFrom:to: (total 13.0%)(leaf 5.6%)
                            Array » SequenceableCollection quickSortFrom:to: (total 7.4%)(leaf 1.9%)
                              Array » SequenceableCollection quickSortFrom:to: (total 5.6%)(leaf 0.9%)
                                Array » SequenceableCollection quickSortFrom:to: (total 4.6%)(leaf 1.9%)
                                  Array » SequenceableCollection quickSortFrom:to: (total 2.8%)
                                    Array » SequenceableCollection quickSortFrom:to: (total 2.8%)(leaf 0.9%)
                                      Array » SequenceableCollection quickSortFrom:to: (total 1.9%)
                                        Array » SequenceableCollection quickSortFrom:to: (total 1.9%)
                                          Array » SequenceableCollection quickSortFrom:to: (total 1.9%)(leaf 0.9%)
                                            Array » SequenceableCollection quickSortFrom:to: (total 0.9%)
                                              Array » SequenceableCollection quickSortFrom:to: (total 0.9%)
                                                Array » SequenceableCollection quickSortFrom:to: (total 0.9%)(leaf 0.9%)

leafs of calling tree:
Array » SequenceableCollection quickSortFrom:to: (total 100.0%)(leaf 100.0%)

Of course, this is what we expected; all time is spent in the quickSort method (which calls itself recursively).
The execution time as reported is not the same as when the program is executed
without a messageTally; the messageTally itself introduces some overhead,
slowing down the execution
(during its execution, it builds a call tree which may become quite large).
In most situations, this is not a problem - the relative
numbers as reported are independent of the absolute execution time.
(exception: if you want to measure execution time of some program which
depends on real time.)
The number of probes as reported gives a hint on how exact the statistic data is. With more probes, the measured numbers become more exact. If the number of probes is too small, you should execute the program in a loop, or use a messageTally with a smaller probe tick-time.
In the following, the execution time is too short for a reasonable measure:
|data rnd|
data := Array new:2000.
rnd := Random new.
1 to:2000 do:[:i |
data at:i put:(rnd nextIntegerBetween:1 and:100000)
].
MessageTally spyOn:[
data sort.
]
because only a few probes occur. In such a case, use #spyDetailedOn: instead:
|data rnd|
data := Array new:2000.
rnd := Random new.
1 to:2000 do:[:i |
data at:i put:(rnd nextIntegerBetween:1 and:100000)
].
MessageTally spyDetailedOn:[
data sort.
]
this samples in 1ms intervals.
Notice that not all operating systems support timer intervals this short; on some, the resolution may be no better than 20ms.
If you cannot run your code under a messageTally (for example, because it is an interrupt handler, or synchronized by the timer), try to find the executions times by placing only parts of your code into a timing block, as in:
...
t1 := Time millisecondsToRun:[
... some code here ...
].
...
t2 := Time millisecondsToRun:[
... more code here ...
].
...
'time spent in first block: ' print. t1 printNL.
'time spent in second block: ' print. t2 printNL.
If it is not possible to print the results (because the printing
itself disturbs the operation & timing), keep the time values
in some collection which is printed later, at an uncritical time.
"Array add:"
is much more expensive than "anOrderedCollection add:"
).
Check out the use of collections and consider replacing them by others which offer better performance. There is no "general perfect collection" which satisfies all needs; performance depends heavily on the way a collection is used.
You should at least gather a rough idea of how collections are implemented
internally, to get a feeling for which operations are to be considered
"dangerous" with respect to performance. This may be somewhat in
conflict with the general "black box" or "need to know"
approach, but experience shows
that most mistakes are due to misuse instead of proper reuse.
After all, that is what a browser and the full source
code are provided with the system for.
One typical situation is when elements are searched for in a collection. Many collections' search time grows linearly with the number of elements (Array, OrderedCollection, LinkedList etc.). If the collection in question is large, replace arrays or orderedCollections by sortedCollections if the elements can be compared (log-n search time), or by sets/dictionaries (almost constant search time).
Of course, there is the usual time/space tradeoff; hashed collections require some additional space (typically 20-30%), binary trees require 2 additional reference entries per element, etc.
Be aware that some collections are slower than others
when adding elements (SortedCollection vs. OrderedCollection),
and that the time may depend on whether the elements arrive
in ascending order or in random order (SortedCollection).
In some situations, it makes sense to build up
a collection as an OrderedCollection, Set or Dictionary,
and convert it as appropriate when all elements have been added,
for shorter search times later.
If many elements are to be added to a SortedCollection,
it is preferable to first add them unsorted (i.e. build it as an
OrderedCollection) and sort the elements once at the end;
instead of letting the collection sort each new element individually.
For example:
|randomValues rnd s|
rnd := Random new.
randomValues := (1 to:10000) collect:[:i | rnd next].
s := SortedCollection new.
Time millisecondsToRun:[
randomValues do:[:v | s add:v]
]
is about 4 times slower than:
...
Time millisecondsToRun:[
s := randomValues asSortedCollection
]
or:
...
Time millisecondsToRun:[
s addAll:randomValues
]
which uses the same algorithm internally.
In contrast, if we add elements in presorted order, as in:
|presortedValues s|
presortedValues := (1 to:10000).
s := SortedCollection new.
Time millisecondsToRun:[
presortedValues do:[:v | s add:v]
]
the difference drops to roughly a factor of 1.6
(but adding them all at once is still faster)
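The underlying effect is language-independent. As an illustration (not ST/X code), here is the same comparison in Python, where `bisect.insort` plays the role of SortedCollection's element-wise add and a single sort at the end plays the role of asSortedCollection:

```python
import bisect
import random

random.seed(1)
values = [random.random() for _ in range(10_000)]

# Insert each element into its sorted position: each insertion costs
# O(log n) to search plus O(n) to shift elements -- O(n^2) overall.
one_by_one = []
for v in values:
    bisect.insort(one_by_one, v)

# Add everything first, sort once at the end: O(n log n) overall.
all_at_once = sorted(values)

assert one_by_one == all_at_once
```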
Let's analyze and measure a concrete searching example; the following (naive) code reads a file and builds up a list of unique words:
|data stream line words|
data := OrderedCollection new.
stream := 'Makefile' asFilename readStream.
[stream atEnd] whileFalse:[
line := stream nextLine.
words := line asCollectionOfWords.
words do:[:word |
(data includes:word) ifFalse:[
data add:word
]
]
].
stream close
Executing this shows very poor performance, and you may ask yourself: why?
At first, you may think that the splitting of lines into individual words
produces too much overhead, or that temporary storage allocation and
reclamation is the problem.
Before starting to replace #asCollectionOfWords
by an inline scanning loop, either think about what may happen,
or use a messageTally to see what really happens.
A messageTally reveals that virtually all of the time is spent in
OrderedCollection » includes: (a linear search).
We used an inappropriate collection for this type of search, leading to quadratic runtime behavior. Such an algorithm's execution time can grow quite rapidly; in our case (with roughly 2200 unique words out of a total of 8600 words), this accounts for millions of string compare operations.
Of course, we would like our algorithm to show an execution time
which grows linearly with the size of the input, (possibly) unaffected by the
number of unique words.
Replacing the OrderedCollection by a Set accomplishes this:
|data stream line words|
data := Set new.
stream := 'Makefile' asFilename readStream.
[stream atEnd] whileFalse:[
line := stream nextLine.
words := line asCollectionOfWords.
words do:[:word |
(data includes:word) ifFalse:[
data add:word
]
]
].
stream close. data
and speeds up the program by some factor of 10.
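For readers more familiar with other languages, the same fix can be sketched in Python (the word list here is synthetic, not the Makefile from the example):

```python
# Membership tests against a list are O(n) each (quadratic overall);
# against a set they are O(1) on average (linear overall).
words = ('the quick brown fox jumps over the lazy dog ' * 500).split()

# Naive: a linear search per word, like OrderedCollection>>includes:
unique_list = []
for w in words:
    if w not in unique_list:      # O(n) scan
        unique_list.append(w)

# Hashed: constant-time membership, like Set>>includes:
unique_set = set()
for w in words:
    if w not in unique_set:       # O(1) average
        unique_set.add(w)

assert set(unique_list) == unique_set
```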
Avoid creating many temporary objects. For example:
|list|
list := Array new.
1 to:10000 do:[:i |
list := list copyWith:i
]
creates 9999 intermediate throwAway arrays of increasing size (1 to 9999).
All those temporaries account for a rough total of 200Mb garbage,
which produces quite a bit of overhead
in creation, initialization (nilling) and reclamation.
The same argument is true for the , (comma) message, which creates new collections. I.e.:
|string|
string := ''.
1 to:10000 do:[:i |
string := string , (i printString)
]
allocates huge amounts of temporary strings (which are immediately thrown away).
To avoid this, use a stream to collect the data:
|stream string|
stream := WriteStream on:''.
1 to:10000 do:[:i |
stream nextPutAll:(i printString).
].
string := stream contents.
this is much faster, because the stream uses a buffer, which is doubled in
size when full, thus only performing log2(N) allocations, for a final string size of N.
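As a cross-language illustration (not ST/X code), the same quadratic-vs-buffered contrast looks like this in Python, where `io.StringIO` plays the role of the writeStream:

```python
import io

# Repeated concatenation copies the whole string each time (quadratic);
# a write buffer grows geometrically, so appending is amortized linear.
slow = ''
for i in range(1000):
    slow = slow + str(i)          # copies the entire string every time

buf = io.StringIO()
for i in range(1000):
    buf.write(str(i))             # appends into a growing buffer
fast = buf.getvalue()

assert slow == fast
```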
If the final size is known in advance, it is much better to preallocate the collection:
|list|
list := Array new:10000.
1 to:10000 do:[:i |
list at:i put:i
]
This takes only 90ms on the same machine (with interpreted code)
and drops to 4ms if compiled to machine code.
An even more compact way is:
list := (1 to:10000) asArray
which does exactly the same as above (but is considered better coding style).
If the size is not known in advance, use an OrderedCollection and convert it at the end:
|list temp|
temp := OrderedCollection new.
1 to:10000 do:[:i |
temp add:i
].
list := temp asArray
or a writeStream, which is also tuned for growing its underlying collection:
|list stream|
stream := WriteStream on:(Array new).
1 to:10000 do:[:i |
stream nextPut:i
].
list := stream contents
on the same hardware, the last two versions both execute in about
110ms if interpreted or about 35ms if compiled.
All of the above is also true for the #, (comma) concatenation message,
which is often used with string objects.
Especially when building strings, use of a writeStream is generally the better choice.
Removing inner elements from an OrderedCollection is expensive.
The reason is that to remove an inner element, (all) following elements
in the underlying container have to be copied towards the remove index.
In contrast, removing at either end is much cheaper, since only the startIndex
or lastIndex instance variable has to be modified and no block copying
is needed.
If you have to remove many elements from a huge orderedCollection, it may be faster (depending on the size), to create a new collection from scratch - even though this may involve some garbage collector overhead.
For example, to remove all odd elements from a 10000 element collection, the following code:
coll removeAllSuchThat:[:el | el even].
takes around one second on a P5/133.
If we avoid #removeAllSuchThat: and do things ourselves, as in:
coll size to:1 by:-1 do:[:index |
(coll at:index) even ifTrue:[
coll removeAtIndex:index
]
]
things become a bit faster (around 530 ms).
The fastest here is to create a fresh collection as in:
coll := coll select:[:el | el odd].
which executes in about 64 ms on the same machine.
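A Python sketch of the same comparison (rebuilding beats in-place removal; index conventions differ from Smalltalk's 1-based ones):

```python
coll = list(range(10_000))

# Removing in place: each del shifts all following elements -- O(n^2).
# Walking backwards keeps the not-yet-visited indices valid.
in_place = coll.copy()
for index in range(len(in_place) - 1, -1, -1):
    if in_place[index] % 2 == 0:
        del in_place[index]

# Rebuilding from scratch: a single O(n) pass, like #select:
rebuilt = [el for el in coll if el % 2 != 0]

assert in_place == rebuilt
```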
The same is true for SortedCollection, which is based upon OrderedCollection.
Notice that in ST/X, the Set and Dictionary classes use a special
removal algorithm, so they do NOT suffer from the above problem.
Other Smalltalk systems may show poor performance with Sets and Dictionaries
as well, due to excessive rehashing when elements are removed.
In any case, avoid indexed accesses on a linkedList.
For protocol completeness, indexed access is provided by the #at:
method, but it is internally implemented by walking from the
list's anchor up to the specified element.
Some methods inherited by LinkedList walk over the receiver
doing indexed accesses - these show quadratic runtime behavior
when applied to linkedLists.
The worst performance is encountered when walking over a linked list
in reverse order - this is terribly slow.
Of course, it would be easy to rewrite these methods, to avoid this behavior.
However, this was not done, since LinkedLists are rarely used in practice.
(If you insist on using linkedLists, and you need those methods, please
define a private subclass of LinkedList and add tuned implementations of
those algorithms to that class; top candidates are:
#collect:, #select: and all other methods which
are inherited by SequenceableCollection and walk over the receiver's elements
using #at: to fetch elements.)
Also notice that linkedLists require their elements to inherit from
Link, or to provide a similar protocol.
Thus, linkedLists are limited in the type of their elements, while
other collections work with any object.
Finally, linkedLists are dangerous in another area: an element can only be on
one linkedList at any time. If you add an element to another list, the original
list will be corrupted unless the element was removed from it before.
Therefore, we highly recommend using ValueLink elements,
instead of placing your objects directly into a linkedList.
In most Smalltalk implementations, arrays cannot grow in size.
(I.e. "someArray add:anElement"
is not possible).
In ST/X, arrays support growing for protocol completeness,
but since this operation is a very slow one,
a warning message is sent to the terminal.
As an example,
|a|
a := Array new.
1 to:100000 do:[:i |
a add:i
]
may take forever, due to the very expensive resizing in #add:.
Notice: the real reason for Array » add: being so slow is that the
underlying basic mechanism (#become:) is slow in many
Smalltalk implementations. Read more on this below.
Sets are perfect for inclusion tests or to find unique values.
Always consider using a set, if you have to search the collection using
#includes:
.
Bags are great for counting elements.
If your access keys are integers or symbols, use identitySets or identityDictionaries. These are slightly faster in most Smalltalk implementations.
For sparse arrays, a dictionary will do (there is also a class found in the goody-directory which implements sparseArrays on top of a dictionary).
Comparing collections with #= can be expensive;
the compare method will recursively walk down the receiver and
the argument and compare elements (which may be other collections).
If you have to compare collections (searching game trees, pattern matching etc.)
you can avoid useless compares,
by adding some hashKey or signature to your collections,
and check these first for a fast trivial reject.
Since sets and dictionaries limit the number of compares when searching for
entries, there is no specific performance penalty when this kind of
object is stored into them. However, make certain that your elements
provide a good hash value (some simply inherit the #hash method
from Object, which is only good for hashing on identity).
If you have a collection of mixed type objects, and want to search for some specific element, think about this comparison cost. It may be better to use separate containers or somehow tag elements.
When redefining the hash method, do not make its algorithm too expensive.
(Sets and dictionaries ask for the hashKey with every add:, remove:, includes: etc.)
You may have to compare the comparison time with the time it takes to
generate the hash value, and make a "good" compromise.
If you need big byteArrays (for example, in image processing),
you often have to allocate huge byteArrays, which are initialized with
pixel data soon after allocation.
In all Smalltalk memory systems, a major fraction of an object's creation
cost is the time required to initialize the new object (for normal objects,
the entries are nilled; for byteArrays, they are set to zero).
Of course, for normal objects, the nilling is required (otherwise, you could
get invalid object references), but for byteArrays, it is possible to create
them without caring for the initial values.
This is done by:
...
data := ByteArray uninitializedNew:size
...
which is similar to the regular #basicNew:, but does not
set the elements to zero (instead, you will find random values in the byteArray).
BTW: for huge truth-valued collections, it may make sense to define a
BooleanArray as a variableByteSubclass, and redefine the #at:
and #at:put: methods to return boolean values instead of
numeric ones. This can save considerable memory (a factor of four if you
store one boolean per byte, or a factor of 32 if you compress 8 booleans
into one byte).
Actually, you will find such a BooleanArray already implemented in the system.
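A minimal Python sketch of the bit-packing idea (class and method names are my own; ST/X's BooleanArray may differ in detail, and indices here are 0-based rather than Smalltalk's 1-based):

```python
# Pack 8 booleans per byte; at/at_put mirror Smalltalk's #at: / #at:put:.
class BitArray:
    def __init__(self, size):
        self.size = size
        self.bytes = bytearray((size + 7) // 8)

    def at(self, i):
        return (self.bytes[i // 8] >> (i % 8)) & 1 == 1

    def at_put(self, i, flag):
        if flag:
            self.bytes[i // 8] |= 1 << (i % 8)
        else:
            self.bytes[i // 8] &= ~(1 << (i % 8))

bits = BitArray(1000)
bits.at_put(3, True)
bits.at_put(999, True)
assert bits.at(3) and bits.at(999) and not bits.at(4)
assert len(bits.bytes) == 125        # 1000 booleans stored in 125 bytes
```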
Since in practice, fractions are relatively seldom used, no specific tuning has been done. If there is a need, future releases may include a modified fraction class, which performs the above lazily (i.e. keeps the values unreduced as long as possible).
Float arithmetic creates temporary objects. For example:
|sum|
sum := 1.0.
1 to:1000000 do:[:i |
sum := sum + 1
]
The above creates a million new floating point objects (which are, except for
the last one, discarded and reclaimed by the garbage collector).
All in all, the above creates and reclaims about 20Mb of garbage.
The cost of allocating and collecting a float is in the order of 1-2 microseconds.
SmallIntegers do not suffer from this problem.
FloatArray and DoubleArray may also be subclassed (even with additional named instance variables) for special requirements.
Typical applications are matrix or vector classes.
Add primitive code for heavy duty operations (like matrix inversions, matrix multiplication etc.). These primitives can be written in a way to avoid any temporary garbage, which is not possible if regular floats are used.
Try to avoid creating many weakArrays. One reason for having many shadow objects created is the default dependency mechanism (i.e. non-Models). This creates a small weakArray for every object which has at least one dependent.
This is unfortunate, since using weakArrays for the dependency mechanism makes
a lot of sense - it allows you to forget about releasing dependencies.
(other systems use regular arrays for dependencies, and you have to manually
release these in order to allow the garbage collector to free objects.
Without a release, objects are kept from being freed by having a dependent.)
In practice, weakArray overhead is less of a problem,
since most dependencies are for models or objects derived from the Model class.
These have a redefined dependency mechanism, which is not based on weakArrays.
Future implementations may be rewritten, to reduce or even avoid this additional overhead.
There are certain Smalltalk specific aspects, which may be considered when tuning code. The following describes various microscopic effects. Do not expect as dramatic speedups as in the above examples. However, these microscopic effects may accumulate to a macroscopic effect, especially when present in loops which are executed many times.
Often, there is a conflict between the desire to write flexible, reusable and elegant code on one side, and the performance aspect on the other. We will not discuss this in the following, but instead simply list (and quantify) things.
Avoid the #isKindOf: message.
The #isKindOf: method has to get the receiver's class
and follow that class's superclass chain
all the way up to the top in the false case,
and up to the checked class in the true case.
In either case, this is one message send PLUS the superclass chain walk.
For average superclass chains, #isKindOf:
executes in the order of a microsecond (on my machine),
but it can take longer for deeply nested class hierarchies.
It is always better to write a check method, which returns false in
the default case and true for instances of your desired class.
I.e., instead of writing "foo isKindOf:MyClass", you should
implement two methods:
in Object:
respondsToWhatINeed
^ false
or:
isMyClass
^ false
and in your class:
respondsToWhatINeed
^ true
or:
isMyClass
^ true
and write "foo respondsToWhatINeed" or "foo isMyClass".
This will always execute in a time shorter than or equal to the #isKindOf: time
(a single messageSend is required,
which takes some 300ns on the same machine)
and is more flexible, since you may implement
this check method in other classes, without any need to stay in a particular
subclass hierarchy.
If you look into the Object class's protocol,
you will already find a bunch of these check methods
- #respondsToArithmetic, #isStream, #isString and so on.
Measurements to prove the above (on SGI-Indy):
'hello' isKindOf:Number 1120ns (has to walk up for a miss)
1 isKindOf:Number 785ns (has to walk up 2 classes for a hit)
1.0 isKindOf:Number 787ns (has to walk up 2 classes for a hit)
'hello' isNumber 305ns (immediate return false in Object)
1 isNumber 305ns (immediate return true in Number)
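The check-method pattern itself is not Smalltalk-specific; a Python sketch (with made-up class names, shadowing no real library) looks like this:

```python
# A cheap check method instead of walking the class hierarchy:
# the root answers False, and only the class of interest answers True.
class Object:
    def is_number(self):
        return False                # default: answer false

class Number(Object):
    def is_number(self):
        return True                 # redefined in the class of interest

class String(Object):
    pass                            # inherits the False default

assert Number().is_number() is True
assert String().is_number() is False
```

As in the Smalltalk version, any unrelated class can opt in simply by defining the method, without having to live in a particular subtree of the hierarchy.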
Performance wise, it is better to access instance variables directly.
The cost of an instance variable access is in the order of 20-70ns, while
a message send returning an instance variable takes some 200-500ns.
Example:
Object subclass:MyClass
...
instanceVariableNames:'foo bar'
...
!MyClass methodsFor:'accessing'!
foo
^ foo
!
...
example
...
y := self foo.
...
!
is slightly slower than:
...
example
...
y := foo.
...
!
...
but of course, less flexible.
Despite the fact that direct access is faster,
you should not ignore maintenance and reusability issues.
Quite a few methods had to be rewritten in the past to use self sends
instead of direct instVar accesses
(to allow things to be redefined in added subclasses).
Care for this "optimization" only after all other tuning fails. Never "optimize" classes this way, which are intended as abstract or framework classes. Only consider it in places which are executed millions of times in a loop (better: move the self send out of the loop).
Declare temporaries in the innermost scope in which they are needed. For example:
|tmp|
1000 timesRepeat:[
...
tmp := ....
...
]
is slightly slower than:
1000 timesRepeat:[
|tmp|
...
tmp := ....
...
]
Also, local variable access is faster than globals, class variable or
instance variable access.
This may not be true for some (older?) versions of Smalltalk/V, especially in systems where block locals are technically implemented as method locals and are shared between the method and its blocks.
Blocks which only reference globals, class variables, block arguments or inner block locals are created once at compilation time and reused for every invocation (so called "cheap blocks"). These do not refer to the home context.
Currently, ST/X treats blocks which access read-only local variables (so called "copying blocks") just like full blocks. Later versions will handle these cases as well, and their cost will be somewhere in between cheap blocks and full blocks.
Try to avoid full blocks, if possible.
For example:
|i|
i := 1.
someArray do:[:element |
... do something with element and i ...
i := i + 1.
]
creates a full block, since the outer variable i is accessed
(and even modified) within the block.
Rewriting this as:
someArray keysAndValuesDo:[:i :element |
... do something with element and i ...
]
is shorter and faster in execution, since the compiler can
create a cheap or copying block in many situations.
The following applies to full and copying blocks only.
All enumeration messages expect a block argument; if loops are nested, try to have the inner loop(s) be executed more often than the outer ones. For example:
1 to:10 do:[:i |
1 to:1000 do:[:j |
...
]
]
creates 10 garbage blocks (*), while:
1 to:1000 do:[:j |
1 to:10 do:[:i |
...
]
]
creates 1000 garbage blocks when executed (*).
Note: (*)
Actually, in the above concrete example, this is not really true, since
the stc-compiler inlines to:do: messages,
iff its type analyzer can find out that the loop bounds are both integers.
Inlined loops do not create any garbage blocks at all.
Block creation cost is in the microsecond order for full blocks.
Block invocation time is between 150 (R4000/P5) and 500ns (485/50).
The cost of instantiating surviving contexts (i.e. an accessible blocks home) is also in the microsecond order and also creates additional GC overhead.
...
i := 1.
[i < aCollection size] whileTrue:[
...
do something with aCollection
...
]
here, "aCollection size" is evaluated over and over,
since the compiler cannot know (in general) whether the collection's size changes
within the loop (actually, it does not even know the meaning of the
size message; it does not even know that aCollection is some kind of
collection).
Instead, write:
|sz|
...
i := 1.
sz := aCollection size.
[i < sz] whileTrue:[
...
do something with aCollection
...
]
...
or (better):
...
1 to:aCollection size do:[:i |
...
do something with aCollection
...
]
...
Of course, all of the above can be considered "bad Smalltalk style".
A more general solution (if you don't modify the collection) is to write:
...
aCollection do:[:element |
...
do something with element
...
]
...
since it can operate on any type of collection, while the above examples
only work with sequenceable collections (those that use a numeric index).
The same is true for arithmetic expressions, indexed collection access etc.
I.e. instead of:
...
... aCollection at:index+1 ...
...
... aCollection at:index+1 ...
...
use:
|nextIndex|
...
nextIndex := index + 1.
...
... aCollection at:nextIndex ...
...
... aCollection at:nextIndex ...
...
or (if possible) reuse the element:
|nextElement|
...
... (nextElement := aCollection at:index + 1) ...
...
... nextElement ...
...
Remark:
Some programmers use "&" (ampersand) for logical
and, where they could (should) use "and:".
Since "&" always evaluates its right expression (even if the left
already returned false), the evaluation of conditional expressions
can usually be sped up by using "and:" and
arranging logical expressions in a way that the
minimum number of evaluations is required.
I.e. instead of:
...
condition1 & condition2 ifTrue:[
...
].
...
(which evaluates both conditions in any case),
it is better to write:
...
(condition1 and:[condition2]) ifTrue:[
...
].
...
which only evaluates condition2 if the first returned true.
Of course, there are rare situations where it is intended that the
right expression is evaluated for its side effect, and "&" should
be used.
The same is true for "|" vs. "or:".
Notice that the above may not be true for systems which do not create inline code for those blocks, but all modern Smalltalks do so (ST/X does).
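The eager-versus-short-circuit distinction can be demonstrated in Python, where `&` on booleans is eager and `and` short-circuits (the counter and function name are my own):

```python
# Count how often the right-hand condition is evaluated.
calls = 0

def expensive_condition():
    global calls
    calls += 1
    return True

# Eager, like "&": both sides are always evaluated.
eager = False & expensive_condition()
assert calls == 1

# Short-circuit, like "and:": the right side is skipped when the left is False.
lazy = False and expensive_condition()
assert calls == 1        # expensive_condition was not called again

assert eager == lazy == False
```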
Arrange the conditions so that the result can be computed with a minimum number of subexpression evaluations. I.e. (if the computation of the subconditions is expensive), instead of:
...
(mostlyFalse or:[mostlyTrue]) ifTrue:[
...
].
...
better write:
...
(mostlyTrue or:[mostlyFalse]) ifTrue:[
...
].
...
Of course, this is only possible if the second subcondition does not
depend on side effects resulting from the first subcondition's evaluation.
Where legal, use #== (identity compare) instead of #= (value compare).
For #=, there is no speed difference if the two objects are identical, since
the compiler produces inline code which checks for identity in any case.
However, if the objects are not identical, a regular message send to the #=
method is executed, which takes some 300-500ns
(in contrast to the inline compare, which is in the order of 20-50ns).
However, for integers, you must be certain that the values are in the SmallInteger
range (i.e. -2^30 to +2^30-1).
Otherwise, the compare will always fail, since instances of the other number
classes (especially: LargeInteger) never compare identical.
For symbols, #== is a pointer compare, while #= is effectively a string compare.
On the other hand, be careful to never compare strings using #==;
this will usually return false.
This (performance) argument conflicts with the coding style argument; you must be certain about the objects' type (i.e. SmallInteger) for this "optimization" to be valid.
If comparing different objects to one constant object in a loop,
use the constant object as receiver;
i.e. instead of:
|foo|
...
collection do:[:element |
element = foo ifTrue:[
...
]
].
...
better write:
|foo|
...
collection do:[:element |
foo = element ifTrue:[
...
]
].
...
which is (slightly) faster in most Smalltalks
(the reason lies deeply buried in the message caching mechanism).
Compare:
(1 to:10) do:[:i |
...
]
and:
1 to:10 do:[:i |
...
]
The first example creates a temporary interval object (via #to:) and
then loops over this object (via #do:).
The second directly sends #to:do: to an integer number.
Therefore, the second example requires only 1 message send and does not create a new temporary object (and is therefore faster by a few microseconds).
The second version may also be compiled into faster code,
iff the compiler can somehow find out (or gets a hint)
that the receiver and argument of the #to:do: message are
integers. If this is the case, fast inline code is generated for the loop
and the block is inlined
(in the above example, this is certainly true).
When comparing against nil, use #isNil, #notNil, #== or #~~
instead of #= or #~=.
The same holds for comparisons against false or true,
although these compares are mostly implicit, via messages like #ifTrue:
or #ifFalse:.
I.e. instead of:
something = nil ifTrue:[
...
]
or (often found in old ST-80 code "for performance"):
nil = something ifTrue:[
...
]
better write:
something isNil ifTrue:[
...
]
or:
something == nil ifTrue:[
...
]
The above are equivalent, since no other object may (should) return true
for a nil compare.
Therefore, replacing the value compare by an identity
compare should always be legal.
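Python has the same idiom: `is None` is an identity check that, unlike `== None`, cannot be fooled by a redefined equality method. A sketch (the class name is my own):

```python
# An object whose equality method claims equality with everything.
class Chameleon:
    def __eq__(self, other):
        return True

c = Chameleon()
assert (c == None) is True          # value compare: misleading answer
assert (c is None) is False         # identity compare: always reliable
```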
Avoid #become:.
The #become: operation has to search through all objects and exchange
all (pointer-) references.
Although #become: has been optimized and can be
quite fast in many cases (if the objects are of the same size or if they
are located in newSpace), the worst case (different sizes and
either one in oldSpace) involves a search through all of memory and executes
in the order of 10-100ms, depending on the size of object memory.
In large applications and/or if paging is involved,
it can take up to a second or more.
All collection classes in ST/X have been designed to avoid
#become: in their grow operations - this is done by keeping
the variable part of the collection in an array held in an instance
variable (instead of having it in the object's indexed part).
All direct pointer Smalltalk implementations have this problem in common - so the above is also true for the Squeak Smalltalk system (among others).
In your own classes, avoid #become:, and implement collections which need to grow by holding onto a container object.
The following message sends are inline coded (and therefore much faster) if the receivers and/or arguments types are known to be integer:
#+, #-, #*, #// and #\\
#between:and:
#asInteger, #asFloat, #rounded and #truncated
#abs, #negated
#timesRepeat:, #to:do: and #to:by:do:
#at: and #at:put: (if the compiler "feels" that the receiver is probably an array, byteArray or string)
You should use type hints only in well-debugged, performance-critical methods.
You will find methods which make good use of type hints in the Image class -
especially where bulk pixel data is processed, type hints can speed up these
operations by factors.
Copyright © 1995 Claus Gittinger, all rights reserved
<cg at exept.de>