Differences between ST/X and other Smalltalks

Short Introduction
PoolDictionaries
Floats vs. Doubles
Float Arithmetic Over/Underflow
LargeInteger
MethodDictionary
Class Names
New Strings
String Compare
String Compare against Symbols
Dependents
Direct Pointers
Variable Collections
WriteStream
Restarting / Returning from Contexts
Processes after an Image Restart
Class and Method-Categories
Fileout Order; Class vs. Instance Methods

Short Introduction

All Smalltalk dialects differ slightly in some minor implementation details, some of which are visible at the programmer level. This document tries to summarize at least the ones known to the author.

The dialects mentioned are:

Smalltalk-80 or ST-80
referring to the original "Blue Book" Smalltalk.
Squeak and Pharo
these started from ST-80 as a base, but have moved on a lot since then.
VisualWorks or VW
also started from ST-80 as a base, but has also changed a lot since then.
ST/V and VSE (Visual Smalltalk Enterprise)
departed early from ST-80. It used to be fairly widespread, but has not been supported since ParcPlace bought the product. These days (2015), it is very seldom still in use.
V'Age (VisualAge Smalltalk)
originally from IBM, and now maintained by Instantiations.
It was used heavily in the commercial world until IBM lost interest in it.
ObjectStudio (formerly: Enfin Smalltalk)
bought by Cincom, and now integrated into VisualWorks.

Semantic Differences

A few border cases are not (or not well) defined in the language standard, and/or are differently implemented in various dialects. Avoid depending on any of the following (the list is probably incomplete, and will be extended as we know more):

Construct	Comment	ST/X	VW	Squeak	V'Age	ObjectStudio
#foo='foo'	Symbol = String	true	false	sq?	va?	os?
1=0 ifTrue:[1]	value of a false ifTrue:	nil	nil	sq?	va?	false
[:n|] value:1	value of empty block	nil	1	sq?	va?	nil
Array new add:1;yourself	can add to array	#(1) (w)	ERROR	sq?	va?	#(1)
Array new add:1	answer of add:	1 (w)	ERROR	sq?	va?	#(1)
(Array new:1) at:1 put:7	answer of at:put:	7	7	sq?	va?	#(7)

Note: (w) means: a warning is sent to stderr/Transcript
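
As a hedged illustration of how to stay clear of the cases listed in the table, the following fragment spells out both branches of a conditional, uses an OrderedCollection where elements are added, and keeps the stored value in a variable instead of relying on the answer of at:put:. The variable names are made up for this sketch:

    |flag coll arr value|

    "give both branches explicitly instead of relying on the value of a false ifTrue:"
    flag := (1 = 0) ifTrue:[1] ifFalse:[nil].

    "use a growable collection; adding to an Array is not portable"
    coll := OrderedCollection new.
    coll add:1.

    "keep the value yourself; do not use the answer of at:put:"
    arr := Array new:1.
    value := 7.
    arr at:1 put:value.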

PoolDictionaries

Before release 5.3, ST/X did not support pool dictionaries.
The implementation of pool dictionaries is based on class variables (i.e. pool variables are actually implemented as class variables of a SharedPool class).
A pool variable must be initialized in the pool class's #initialize method.
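
A minimal sketch of what such a pool might look like, assuming the usual class definition message. The names MyConstants, MaxRetries and MyClient are hypothetical and only serve as an illustration:

    SharedPool subclass:#MyConstants
        instanceVariableNames:''
        classVariableNames:'MaxRetries'
        poolDictionaries:''
        category:'Examples'

    !MyConstants class methodsFor:'initialization'!

    initialize
        "pool variables are class variables of the pool class;
         they get their values here"
        MaxRetries := 3
    ! !

    "a (hypothetical) user of the pool names it in its definition"
    Object subclass:#MyClient
        instanceVariableNames:''
        classVariableNames:''
        poolDictionaries:'MyConstants'
        category:'Examples'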

Floats vs. Doubles

Smalltalk-80 supported only 4-byte single precision real numbers, named Float.
VisualWorks added 8-byte double precision reals, named Double.
ST/V only supports 8-byte floats, which are named Float there.

ST/X provides single precision real numbers, called ShortFloat, and double precision numbers, called Float. An alias named Double is provided for compatibility. This scheme provides the best compatibility of ST/X with all of the above dialects. If imported code refers to the Float class, it will work (although the precision of the reals will be higher than in VisualWorks).

Float Arithmetic Over/Underflow

Floating point arithmetic uses the underlying machine (i.e. C-language) support for double-precision arithmetic. If, in case of over- or underflow, the machine does NOT create a NaN (Not a Number) and does NOT raise a floating point exception, ST/X will NOT detect these conditions.
(In practice, however, all real-world systems do so.)

LargeInteger

ST/X uses a single class to represent LargeIntegers, where the instances keep track of their sign plus absolute value. This is in contrast to ST-80, VisualWorks and Squeak, where the sign is encoded in the class by using separate LargePositive- and LargeNegativeInteger classes.
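
For portable code, this suggests asking the number about its sign instead of testing its class. A small sketch (the variable name is arbitrary):

    |n|

    n := (2 raisedTo:100) negated.

    "dialect dependent: per the text above, a single LargeInteger class
     in ST/X, a LargeNegativeInteger in ST-80, VisualWorks and Squeak"
    Transcript show:n class name; cr.

    "portable: ask the number, not its class"
    Transcript show:n negative printString; cr.
    Transcript show:n isInteger printString; cr.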

Method Dictionary

In ST/X, up to rev. 2.10.9, classes kept the methods and selectors in separate Arrays instead of a dictionary. This was done to avoid the need for knowledge about the implementation of dictionaries in the runtime system.
Thus, in ST/X it is allowed to change the layout of Set and Dictionary without affecting the workings of the runtime system.

ST-80 uses dictionaries (or a variant: "MethodDictionary") to hold the selector-to-method associations.
Unless you access these instance variables directly, the protocol makes both implementations compatible (i.e. the #methodDictionary method in ST/X creates a true dictionary from these arrays and the #methodDictionary: method extracts things from the supplied dictionary).
For portability, you should always use those access methods - never directly manipulate those instance variables.

Starting with rev. 2.10, ST/X uses a lightweight MethodDictionary class. Its protocol looks much like a dictionary to the outside world, but it is implemented differently and its layout is known & understood by the VM.
You cannot change the instance layout of MethodDictionary.

Class Names

All major Smalltalk dialects (and ST/X) return a class's name as a symbol. The only exception is ST/V, which returns a string.

New Strings

In ST/X, newly created strings are filled with space characters (Character space); ST-80 fills them with a null character (Character value:0).
If you care for compatibility, you should create your strings in ST/X with:

	String new:n withAll:Character space

This will return the same (space filled) string on all systems.

String Compare

In ST-80, the equal method (#=) of strings does a case insensitive compare;
i.e. ('foo' = 'FoO' -> true).
In ST/X, the compare is case sensitive.
To compare strings while ignoring case differences, use the #sameAs: message in ST/X.
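
The difference can be tried out with a fragment like the following; it is only a sketch, and the answer of the first comparison depends on the dialect as described above:

    |a b|

    a := 'foo'.
    b := 'FoO'.

    "case sensitive in ST/X; dialect dependent according to the text above"
    a = b.

    "ignores case differences"
    a sameAs:b.

    "explicit alternative: normalize both strings, then compare"
    a asLowercase = b asLowercase.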

String Compare against Symbols

In ST-80, strings and symbols do not compare equal, even if they contain the same characters as elements.
In ST/X, the comparison returns true if the elements are the same; i.e. ('foo' = #'foo' -> true).

Dependents

ST-80 stores dependents in a Set; ST/X uses a WeakSet.
This has the advantage that a missing #release of a dependent will not lead to memory leaks. On the other hand, dependents are not kept from being garbage collected simply by being a dependent of some object.

It has been reported that some ST-80 programs seem to depend on this being true, keeping references to some objects only via the dependency relationship.
To the author, this looks like a questionable design.

For programs which depend on the ST-80 semantics, ST/X offers an additional non-weak dependency mechanism, which is available via the messages: #addNonWeakDependent:, #nonWeakDependents and #removeNonWeakDependent:. These methods are found in the Object class.
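
A short sketch of how the non-weak protocol named above might be used. The two objects are placeholders; in a real program they would be a model and its observer:

    |model observer|

    model := Object new.       "placeholder for a model"
    observer := Object new.    "placeholder for a dependent"

    "non-weak: the model keeps the observer alive until it is removed"
    model addNonWeakDependent:observer.

    "answers the collection of non-weak dependents"
    model nonWeakDependents.

    "remove it again when no longer needed; otherwise it stays referenced"
    model removeNonWeakDependent:observer.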

Direct Pointers

ST/X does NOT use an object table, but represents objects by direct pointers to the underlying storage. This is an implementation issue and has no impact on the semantics of the language.
Using direct pointers should give the system some speed advantage in the normal case, by avoiding a memory indirection in every object access.

However, a possible drawback is that it makes the "become:" operation slower in some cases, since instead of a simple pointer exchange, the whole memory may have to be scanned for references (this is the worst case; in many situations, only a search through a smaller part of the memory is required).

To avoid this, most collection classes have been rewritten to avoid "become:", which may make these classes less compatible for subclassing (more on this below).

It is not guaranteed that this will hold in future versions: an experimental indirect version is planned to measure the speed (dis)advantage, and the decision will be based on these results (due to simplifications in the garbage collector, it still has to be proved whether there really is a disadvantage).

Another possible problem is identityHashing, which cannot be based upon the pointer (i.e. address of the object table entry) in ST/X. To support identityHash, ST/X reserves some bits in the object header which contain the hash key. Since only 12 bits are currently available (in 32bit systems), hash collisions are to be expected in IdentitySets/IdentityDictionaries with more than 4096 elements (usually, collisions occur before that many elements are added). Notice that more bits are available in 64bit systems, and that 32bit systems are going to become obsolete sooner or later.

If you plan to hash heavily on instances of some new class and those hashtables are going to be (much) larger than 4k elements, you can (and should) provide a different identityHash implementation, which assigns unique hashKeys (e.g. from a simple counter) to new instances and keeps this hashKey in an instance variable.
Redefine identityHash in that class to return the value of this instance variable.
I.e. implement:

    ... subclass:#MyClass
	...
	instanceVariableNames:'... hashKey ...'
	classVariableNames:'NextHashKey'
	...

    !MyClass class methodsFor:'initialization'!

    initialize
	NextHashKey := 1
    ! !

    !MyClass methodsFor:'hashing'!

    identityHash
	"get my hashKey"

	"my key is nil, when asked for the hashKey the very first
	 time. Then assign a new unique key.
	 When asked again, return the (now nonNil) hashKey, as assigned
	 previously.
	"
	hashKey isNil ifTrue:[
	    hashKey := NextHashKey.
	    NextHashKey := NextHashKey + 1
	].
	^ hashKey
    ! !

If you plan to hash heavily on instances of existing system classes, there is no easy fix, since the field reserved in the object header cannot easily be made larger.

Notice that the expected number of hash collisions does not grow too fast; the default hash provides reasonably good behavior for sizes up to (say) 50000 elements.

Measuring code:

    |set names t|

    #(5000 10000 50000 100000 200000) do:[:n |
	"
	 only want to measure the time spent in the set;
	 therefore, create the names before doing the timing:
	"
	set := IdentitySet new:n.
	names := (1 to:n) collect:[:i | i printString asSymbol].

	t := Time millisecondsToRun:[
		names do:[:nm | set add:nm]
	     ].

	Transcript show:'with '; show:n printString; show:' elements; adding -> ';
		   show:t printString; show:'ms'; endEntry.

	t := Time millisecondsToRun:[
		names do:[:nm | set includes:nm]
	     ].

	Transcript show:' testing -> ';
		   show:t printString; show:'ms'; cr; endEntry.

    ].
    "get rid of the 200000 new symbols"
    set := names := nil.
    ObjectMemory reclaimSymbols.

The code above was executed three times on a 100 MHz R4000 (32 MB SGI Indy, no second level cache) and on a 133 MHz P5 with 32 MB and 256 KB second level cache.

With a varying number of elements, the systems show the following runtime behavior (both tests were executed with interpreted bytecode; compiled code is slightly faster):
R4000:

    with   5000 elements; adding ->  103ms testing ->   91ms
    with  10000 elements; adding ->  234ms testing ->  195ms
    with  50000 elements; adding -> 1132ms testing -> 1037ms
    with 100000 elements; adding -> 5115ms testing -> 4642ms
    with 200000 elements; adding -> 6455ms testing -> 4296ms

Pentium:

    with   5000 elements; adding ->   72ms testing ->   54ms
    with  10000 elements; adding ->  142ms testing ->  116ms
    with  50000 elements; adding ->  764ms testing ->  578ms
    with 100000 elements; adding -> 2533ms testing -> 2003ms
    with 200000 elements; adding -> 2925ms testing -> 2410ms

Notice that the above execution times are affected more by memory reclamation speed and garbage collector effects than by the raw hashing speed, which explains the unreasonable result in the 100000 element test run (on the tested systems, the newSpace's size is 400k; thus, sets with sizes up to about 50k elements fit into it, while bigger ones are allocated in the oldSpace).

I'd be interested in the results on other Smalltalk systems.

Variable Collections

In ST/X, collections are not implemented as variable subclasses of Collection. Instead, most collections have an added instance variable which holds the variable part.
For example, in ST-80, OrderedCollection may be defined as:

    Collection variableSubclass:#OrderedCollection
	       instanceVariableNames:'index1 index2'
		...

while in ST/X, the corresponding definition is:

    Collection subclass:#OrderedCollection
	       instanceVariableNames:'contentsArray index1 index2'
		...

Originally, the reason to do so was to avoid the use of #become:, which can be expensive in direct pointer implementations (see above).
Thus, to grow an orderedCollection, ST-80 would do:

    ...
    newCollection := OrderedCollection new:newSize.
    newCollection replaceFrom:1 with:self startingAt:1.
    self become:newCollection
    ...

while in ST/X, the code for grow is:

    ...
    newContents := Array new:newSize.
    newContents replaceFrom:1 with:contentsArray startingAt:1.
    contentsArray := newContents
    ...

Without arguing which is more elegant, this adds some incompatibility when subclasses of those collections are filed in from ST-80. To port such code to ST/X, you have to replace all access messages to the receiver (self) by corresponding messages to the contents array;
i.e. within the subclass,

    ...
    self basicAt:index
    ...

has to be changed to:

    ...
    contentsArray basicAt:index
    ...

WriteStream

In ST-80, a WriteStream's #contents method returns a copy of the currently accumulated contents, while #reset simply positions the write pointer back to the beginning.
In ST/X, #contents may return the contents directly (i.e. NOT a copy) in some cases, and #reset always creates a new empty contents collection. This was done (in ST/X) after investigating the typical uses of a WriteStream and finding that most are for the generation of some collection via append operations (to avoid concatenation).
With that change, the creation of another garbage object is avoided in most situations.
You have to be aware of this if you access the contents of a WriteStream AND continue to write to the stream after that.
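
If the contents are needed while writing continues, taking an explicit copy behaves the same way in both implementations. A minimal sketch (variable names are arbitrary):

    |ws firstPart|

    ws := WriteStream on:String new.
    ws nextPutAll:'hello'.

    "take a private copy before writing more;
     this no longer depends on whether #contents itself copies"
    firstPart := ws contents copy.

    ws nextPutAll:' world'.

    "firstPart still holds 'hello'; the stream now holds 'hello world'"
    Transcript show:firstPart; cr.
    Transcript show:ws contents; cr.

The extra copy costs one additional object in ST-80 style implementations, which is usually acceptable for code that has to run on several dialects.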

Restarting / Returning from Contexts

In ST/X, contexts are only made restartable/returnable if their home method/block contains a non-inlined block. This means that you cannot return from or restart every context in the calling chain from within the debugger. Also, code which follows the sender chain (i.e. "thisContext sender") cannot assume that these contexts are returnable or restartable.
Contexts which contain an exception handler are always restartable/returnable, since they will contain a non inlined block.
This little inconvenience was the price to pay for slightly faster execution (less state has to be saved in a method's entry code).

Processes after an Image Restart

In ST/X, processes cannot be continued after an image restart. Instead, all processes must be recreated manually.
A process can be instructed to automatically restart itself (from the beginning) at image restart time, but continuation (at the point where it was left) is not and will probably never be implemented.
The reason is that to implement this, many CPU and compiler dependencies (layout of the machine stack) would have to be coded into ST/X's runtime system, thereby limiting its high portability.

All view processes are automatically restarted by the system, however their process priority is not restored correctly.

Other processes should be restarted in an #update method of an object which is a dependent of ObjectMemory. After restart, these dependents will be notified by ObjectMemory doing a self changed:#restart.
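
A sketch of this scheme, assuming the usual change/update protocol in which the dependent receives update: with the changed aspect as argument. MyServer and startServerProcess are hypothetical names; the latter stands for whatever code forks the process:

    !MyServer class methodsFor:'initialization'!

    initialize
        "become a dependent of ObjectMemory, to be notified at restart"
        ObjectMemory addDependent:self
    ! !

    !MyServer class methodsFor:'change & update'!

    update:something
        "sent to the dependents of ObjectMemory;
         #restart is the aspect mentioned in the text above"

        something == #restart ifTrue:[
            self startServerProcess
        ]
    ! !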

Late note:
With rel. 2.10.4 of ST/X, processes can be marked as restartable. These processes will automatically be restarted (executing the first statement of their creation block) when an image is snapped in.
Of course, this still does not continue the process where it left off, but at least it relieves you from caring about image restart, installing dependents etc.

Your process should decide on some flag (instance variable of some object) whether it has been restarted and try to continue where it left off.
However, be aware that it is not possible to restart or continue any context objects which were created in its previous life.

Class and Method-Categories

ST/V and VisualAge dialects do not support class- and method categories.
If such code is filed into ST/X, classes will be categorized under the pseudo category "ST/V classes" and methods under "no category".

ST-80 holds the class categories in a special organization object, which keeps category vs. class relations in a dictionary-like fashion.
In that, a category may even exist (and persist) without any classes belonging to it.

In ST/X, the classCategory is kept in an instance variable of the class object - therefore, removing a category's last class also logically removes that category.
As a side effect, categories created in the browser vanish, if no class is created in that category.

The above is also true for methodCategories - here, the category is kept in the method object - NOT in the class object (as done in ST-80).
Likewise, methodCategories vanish if no method is ever created for them, or if the last method within a category is removed.

Fileout Order; Class vs. Instance Methods

In an ST/X fileOut, class methods are stored before instance methods; in VW / ST-80, class methods come last (which is wrong).

The reason is that in VW, a class cannot be filed in again if it redefines the syntax of its methods (via #compilerClass); for example, it is not possible to correctly transport SQL classes via fileOut-fileIn, since those redefine the compilerClass (in a class method), which comes after the instance methods. Therefore, compilation errors will arise during fileIn of the instance methods.

<cg at exept.de>

Doc $Revision: 1.33 $ $Date: 2021/12/13 21:44:43 $