Smalltalk basics

					To a hammer,
					everything looks like a nail.

Motivation

Smalltalk provides many features which are hard or impossible to implement in many other programming languages - namely closures, full reflection, dynamic late binding and a powerful, truly integrated development environment.

Before describing the language and how you can create your own programs, we should explain a few basics - both to give you some background and to define the technical terms used in the documentation (and literature).

Keep in mind that this text is only a short introduction - for more detailed information, we recommend reading a standard textbook on the language (-> 'literature').

Definitions, Nomenclature and Concepts

Objects

In Smalltalk, everything is about objects.
Objects can be as simple as a number, a character or as complex as a graphic being displayed, windows, user dialogs or complete applications. Typical applications consist of many objects (hundreds or thousands), each being specialized for and responsible for a particular functionality.

In contrast to hybrid systems like C++ or Java, "everything" really means "everything" in Smalltalk. This includes integers, characters, arrays, classes and even a program's stackframes, which hold the local variables during execution.
In Smalltalk, there are no such things as "builtin" types or classes which have to be treated differently, or which do not behave exactly like other objects with respect to message sending, inheritance or debuggability. For example, in Smalltalk, classes like Integer, Character, String or classes themselves can be given new or modified methods - even at runtime, by dynamically loading new code.
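
As a small illustration (a sketch meant to be evaluated line by line in a workspace; the class names shown may differ slightly between Smalltalk dialects), even literals and classes answer ordinary messages:

```smalltalk
3 printString.      "-> '3'"
$a asUppercase.     "-> $A"
nil isNil.          "-> true"
3 class.            "the integer's class (SmallInteger in most dialects)"
3 class name.       "classes answer messages too - here: their name"
```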

Messages & Protocol

Objects communicate by sending messages to other objects. In other languages, this is sometimes called a "virtual function call", "virtual function" or even "function" alone.
All of those other names are a bit misleading, and - in the mathematical sense - actually wrong, because these message-send operations may have side effects, or the result may depend on more than the argument(s) alone.

We will therefore use the term "message" or "message send" for the act of asking for an operation by name, and, as we will see later, the term "method" for the actual code which will eventually perform the operation.

To the outside world, any internals of an object are hidden - all interaction happens only via messages. The set of messages an object understands is called its "message protocol", or "protocol" for short.

For example:

a number's protocol contains the +, -, *, etc. messages
a string's protocol contains the asUppercase, asLowercase, etc. messages

When an object receives a message, it is the object itself which decides how to react to the message - the Smalltalk language does not attach any semantic meaning to it.

Therefore, theoretically, an object may add the "+" message to its protocol and perform an operation which has nothing to do with the mathematical concept of adding numbers.

In practice, this is avoided in Smalltalk, since it makes programs less understandable. For example, the Java operator to concatenate strings is "+", whereas in Smalltalk it is "," (comma).
This was done on purpose, to make the code easier to understand.

However, it is useful to keep in mind that only the message's receiver is responsible for the outcome, and in theory, any operator or message selector can be redefined by any object. (As we will see, this is also the reason for the uncommon precedence rules in binary operations.)

On the other hand, it makes the system very flexible. For example, it is very easy to extend the numeric class hierarchy with additional things like Complex numbers, Matrices, Functional Objects etc.
All that is required for those new objects to be handled correctly is that they respond to some basic mathematical protocol for arithmetic, comparison etc. Existing mathematical code is usually not affected by such extensions, which makes Smalltalk one of the best environments for code reuse and sharing.
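
The following workspace expressions are a minimal sketch of these points: string concatenation uses the comma message, the "+" message is understood by different numeric classes, and an object can be asked whether a selector is part of its protocol (respondsTo: is assumed to be available, as it is in the common dialects):

```smalltalk
'Hello' , ' world'.               "-> 'Hello world'"
3 + 4.                            "-> 7"
(1/2) + (1/3).                    "the same + message sent to a fraction -> the fraction 5/6"
3 respondsTo: #+.                 "-> true"
'abc' respondsTo: #asUppercase.   "-> true"
```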

Classes & Instances

Since there are often many objects which share the same protocol, Smalltalk groups similar objects into classes. Every object is an instance of some class, and instances of the same class have the same protocol (although each may have different internal state).
This does not imply that instances of different classes must implement different protocols - actually, there are classes in the system (numeric classes, collection classes) which implement a common protocol, but are implemented completely differently internally. This is called polymorphism.

Classes may have zero, one or many instances.
You may wonder how a class without instances could be useful - this will become clear when inheritance and abstract classes are described further down in this document.

Examples:

1, 99 and -8 are instances of the Integer class
1.0 and 3.14159 are instances of the Float class
'hello' and 'foo' are instances of the String class
the buttons in a window are instances of the Button class
nil is the one and only instance of the UndefinedObject class

For the curious: we say that Smalltalk is a class based object oriented language. There are other languages which are not based upon the concept of classes - the Self programming language, for example.
However, most object oriented languages (C++, Eiffel, Java, C# and many others) are class based.

So, in Smalltalk every object is an instance of a class, and it is the class which describes the behavior of its instances (by defining the protocol and thereby defining how instances react to messages). Smalltalk allows a program to access this class at runtime and to gather information from that "class thing". This is called reflection.
Because Smalltalk is a pure object oriented language, this "class thing" is also an object and therefore responds to a set of messages which is called its metaclass protocol (more on this below).
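
A few reflective queries, as a sketch for a workspace (the selectors used here are found in the common dialects; the class names printed depend on the system):

```smalltalk
3 class.                        "the receiver's class (SmallInteger in most dialects)"
3 class superclass.             "one step up the inheritance chain"
3 isKindOf: Integer.            "-> true"
'hello' isKindOf: Integer.      "-> false"
Integer inheritsFrom: Number.   "-> true"
```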

Methods

When a message is sent to an object, a corresponding action has to be performed - technically, a piece of code must be executed. This piece of code is called a method.

Every class keeps a table (called "MethodDictionary") which associates the name of the message (the so-called message selector) with a method.
When a message is sent to an object, the method table of the object's class is searched for a corresponding entry and - if found - the associated method is invoked (more details below ...).

Since Smalltalk is a pure object oriented language, this table is also an object and accessible at execution time; it may even be modified during execution and allows objects to learn about new messages dynamically.
Of course, the interactive programming environment heavily depends on this; for example, the browser is a tool which adds new items to this table when a method's new or changed code is to be installed.
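
The method table can be queried like any other object. The sketch below assumes the reflection selectors includesSelector:, compiledMethodAt: and selectors, which exist in the common dialects; #noSuchMessage is a made-up selector used only for illustration:

```smalltalk
Object includesSelector: #printString.     "-> true"
Object includesSelector: #noSuchMessage.   "-> false"
Object compiledMethodAt: #printString.     "the method object stored in the table"
Object selectors size.                     "number of messages defined directly in Object"
```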

Inheritance

In Smalltalk (like in most other object oriented languages), classes are organized as a tree. Every class has a so called superclass, and is called a subclass of its superclass. (C++ programmers call these "baseclass" and "derived class" respectively).
Since the superclass may itself have a superclass, we get a superclass-chain, which (typically) ends in a class called "Object" (*).
In Smalltalk, a class can have only a single superclass (as opposed to C++, for example, where classes can and often must inherit from multiple baseclasses) (**).

A class inherits all protocol as defined by its superclass(es) and may optionally redefine individual methods or provide additional protocol.

Therefore, a message send performs the following actions (***):

the object's class is asked for its method table
the table is searched for a method associated with the selector
if found, the method's code is executed
if not found, the search is repeated with the superclass's table
if there is no further superclass, an error is reported (see below)
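
The lookup steps above can be imitated in a workspace. This is only a sketch of the idea, not how any virtual machine implements it; the variable names are illustrative, and it uses blocks and loops, which are described later:

```smalltalk
| receiver selector cls definingClass |
receiver := 3.
selector := #printString.
cls := receiver class.
definingClass := nil.
"walk up the superclass chain until some class defines the selector"
[cls notNil and: [definingClass isNil]] whileTrue: [
    (cls includesSelector: selector)
        ifTrue: [definingClass := cls]
        ifFalse: [cls := cls superclass]].
definingClass    "the class whose method would be executed, or nil if none was found"
```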

Error reporting is done by packaging the failed message's selector and arguments into a so-called "message" object and sending another message (#doesNotUnderstand:) to the receiver with the message object as argument.
This mechanism can be used to implement special error handling and recovery mechanisms.
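
As a hedged sketch of such error handling: the expression below sends the made-up message fooBar to nil and catches the resulting error. It assumes that the ANSI-standard exception class MessageNotUnderstood is available in your system:

```smalltalk
| result |
result := [ nil fooBar ]
    on: MessageNotUnderstood
    do: [:ex | ex return: ex message selector ].
result      "-> #fooBar, the selector of the message that was not understood"
```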

Footnotes:
(*): For the curious:
Although most classes eventually do inherit from Object, there is no need to. Actually, it may occasionally make sense for a class to inherit from no class at all (i.e. to have no superclass). The effect is that instances of such classes do not inherit ANY protocol and will therefore trigger an error for all received messages.
This behavior is useful to implement advanced features, such as proxies (placeholders) for remote objects, message tracers, etc.
(**): For the curious:
Support for multiple inheritance (MI) in C++ serves two purposes: first, to inherit private variables and operations (i.e. reuse) and second, to support polymorphism.
As multiple inheritance can make things very complicated and can lead to a number of problems, Smalltalk no longer supports it. There used to be an experimental implementation in early ST versions, but it was abandoned later.
Java provides interfaces which do solve the polymorphism issues of MI. Smalltalk does not require interfaces for that reason, as any message can be sent to any object (no matter what the class of the receiver is).
(***): For the curious:
All Smalltalk implementations use various tricks (caching) to avoid the above search (also called method lookup) if possible. In most situations, the method's code which corresponds to a selector is reached quickly by an indirect function call.

Instance Variables

An object may contain internal state (also called attributes). In Smalltalk, this state usually consists of references to other objects (*).
The slots which hold those references are called instance variables (**).
(Some refer to them as "private variables".)

All instances of a class provide the same message protocol, but typically contain different internal state.
It is actually the class, which provides the definition of the protocol and amount of internal state of its instances.

Example: 'hi' and 'world' are both instances of the String class and respond to the same set of messages. But the internal state of the first string consists of the characters "h" and "i", whereas the second contains the characters "w", "o", "r", "l", "d".

An object's instance variables are only accessible via protocol, which is provided by the object - there is no way to access an object's internals except by sending messages to it.
This is true for every object - even for the strings in the example above.
There is no need for the sender of a message to actually know the class of the receiver - as long as it responds to the message and performs the appropriate action.

Example: a string provides access to its individual characters via the 'at:' message. You could write an ExternalString class, which fetches characters from a file and returns them from this message. The sender of the 'at:' message would not be affected at all by this (except for a possible performance degradation ;-).
What is more important: as long as the required protocol is implemented, every program which used to work with instances of String will also work unchanged with instances of ExternalString - there is no need to change the program in any way; there is not even a need to recompile, rebuild or otherwise tell the system about this new class.
Such additions are even possible while the program is executing.
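
A short workspace sketch of both points: state is only reached through messages, and the same message can be answered by instances of different classes. The Point example assumes the usual "@" message for creating points, which the common dialects provide:

```smalltalk
| p |
p := 3 @ 4.            "a Point; its coordinates are held in instance variables"
p x.                   "-> 3, read through an accessor message"
p y.                   "-> 4"
'hello' at: 1.         "-> $h"
#(10 20 30) at: 1.     "-> 10, the same message answered by a different class"
```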

Footnotes:
(*): For the curious:
Other state, which is not held in instance variables and which does not consist of references to other objects, includes the instance's size (for collections) and its hashKey.
These are not accessible as instanceVariables - special protocol is provided to access those (#basicSize, #identityHash etc.).
(**): For the curious:
technically, those references are mostly pointers to the referred objects, with a few exceptions: smallIntegers keep the numeric value as a bit pattern, strings and others store raw bytes but simulate holding character objects to the outside world. Finally, some Smalltalk implementations (like ST/X) represent the nil-Object internally by a special NULL-pointer.

Metaclasses

Since Smalltalk is a pure object oriented language, everything within the Smalltalk world is an object - this implies that every object's behavior is determined by its class.
This is even true for classes themselves - to the Smalltalk system, these are just like any other object and their protocol is specified by the class's class. These `class classes' are called Metaclasses.

Thus, when we send a message to some `normal' object, the corresponding class object provides the behavior - when some message is sent to a class object, the corresponding metaclass provides the behavior.
Technically, messages to classes are treated exactly the same way as messages to non-class objects: take the receiver's class, lookup the method in its method table, execute the method's code.

Since different metaclasses may provide different protocol for their class instances, it is possible to add or redefine class messages just like any other message.
As a concrete example, take instance creation which is done in Smalltalk by sending a "new"-message to a class.
In Smalltalk, there is no such thing as a built-in "new" (or any other built-in) instance creation message - the behavior of those instance creation (class) messages is defined exclusively by metaclass protocol.
Therefore, it is possible (and often done) to redefine the "new" method for special handling; for example singletons (classes which have only a single unique instance), caching and pooling (the "new" message returns an existing instance from a cache), tracing and many more are easily implemented by redefining class protocol.
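
The class/metaclass relation can be explored in a workspace. This is a sketch; the printed names depend on the dialect, and the last metaclass level is named Metaclass in the common systems:

```smalltalk
3 class.                  "SmallInteger in most dialects"
3 class class.            "its metaclass, usually printed as 'SmallInteger class'"
Object class class.       "-> Metaclass"
OrderedCollection new.    "'new' is an ordinary message, sent to the class object"
(String new: 5) size.     "-> 5"
```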

Abstract Classes

Abstract classes are classes which are not meant to be instantiated (i.e. no instances of them are to be created). Their purpose is to provide common functionality for their subclass(es).
In Smalltalk, the most obvious abstract class is the Object-Class, which provides a rich protocol useful for all kinds of objects (comparing, dependency mechanism, reflection etc.).

Smalltalk Language Syntax

To a newcomer, the Smalltalk language syntax may look somewhat strange at the beginning; however, you will notice that the syntax is highly orthogonal and pretty simple compared to most other programming languages (except Lisp ;-). Especially the syntax of blocks (which are described later) is one of the most beautifully designed ever.
Interestingly, people who have not been previously exposed to languages such as C or C++ find Smalltalk much more intuitive than hard core programmers do.

As we will see shortly, Smalltalk programs only consist of messages being sent to objects.
Since even control structures (i.e. conditional evaluation, loops etc.) are conceptually implemented as messages, a common syntax is used in your programs both for the program's flow control and for manipulating objects.
Once you know how to send messages to an object, you also know how to write and use fancy control structures.

Smalltalk's power (and difficulty to learn) does not lie in the language itself, but instead in the huge protocol provided by the class libraries.

Let's start with the language's building blocks...

Spaces and Program Layout

The Smalltalk syntax is format free (as opposed to the Fortran language, for example). Spaces and line breaks may be added to a Smalltalk program without changing the meaning, except for the following:

within an identifier (i.e. a variable's name)
within a numeric constant
within some compound tokens, such as the assignment token: ":="

So, although you are free to use any indentation you like, we highly recommend that you adhere to a standard. Otherwise your code might be hard to read and understand by you and others later. Remember the golden coder's rule: "Code is written only once, but read many many times".

Comments

Regular Comments

In Smalltalk a comment is anything enclosed in double-quotes ("). A comment may span multiple lines.
Examples:

    "some comment"

    "this
     is
     a
     multiline comment"

    "
     another multiline comment
    "

End-of-Line Comments

As a language extension, ST/X also allows end-of-line comments. These are introduced by the character sequence "/ (doublequote-slash) and treat everything up to the end of the line as a comment:

    "/ this is an end-of-line comment

If the remaining line contains comment characters, these are ignored. As such, End-of-line comments are especially useful to comment-out code which contains comments.

Token Comments

Finally, the character sequence "<< (doublequote-less-less) introduces a token-comment. The word following that sequence (possibly separated by optional spaces) is treated as an identifier (i.e. the "token"), and everything up to a line which starts with that token is ignored.
For example:

    "<< END
    some comment line
    more lines
    a line with "another comment"
    and followed by
    "/ an end of line comment
    plus more stuff here
    END

will all be ignored and treated as a comment, even if those lines contain other comments. As such, Token comments are highly useful to comment-out code which contains any other comment.

Literal Constants

Literal constants in a Smalltalk source code are processed by the compiler, which creates corresponding objects at compilation time and places references to these constants into the generated code.
This is in contrast to run-time created objects, which are typically created by some variant of the #new message sent to a class or the #copy message sent to an instance.

The following literal constant types are allowed:

Integer constants (possibly negative):
6,
-1,
12345678901234567890
with a radix (number base):
8r0777,
16r80000000000,
16rAFFE, -16r1000 and 16r-1000,
16r123456789abcdef0123456789abcdef,
2r0111000
There is no limit on the integer constant's value; e.g. 1234567890123456789012345678901234567890 is a valid integer literal (and NOT truncated, overflowing or leading to an error).
Exact Rational (Fraction) constants:
1/3,
-1/3,
Fractions consist of an integer numerator and an integer denominator, both being arbitrary integers (i.e. unlimited in size).
Inexact Rational (Float) constants:
1.234,
1e10,
1.5e15
Float constants with radix (i.e. "16r10.1" or "2r10.1") are allowed, but should not be used in practice
(because the exponent character 'e' is itself a valid digit in any radix greater than 14, such as hex, float constants with such a radix cannot have an exponent).
The name "Float" is a historic leftover - internally the IEEE double precision floating point representation is used (independent of the exponent character).
For compatibility with other Smalltalk systems, the "d"-character is also recognized as an exponential character. I.e. 1d10 has the same value as 1e10.
FixedPoint constants (Scaled Decimals):
1.234s4,
10s4,
FixedPoint constants are rational numbers which print themselves as a scaled decimal number with the given number of post-decimal-point digits. Thus "1.234s4" prints as "1.2340" and "10s4" as "10.0000". Scaled Decimals are mostly used for monetary values and to format tabular data in a nice way.
Because scaled decimals are not supported by all Smalltalk systems, the compiler can be configured to treat them as errors via the settings dialog. If you want to ensure that your program is portable, disable them.
Boolean constants:
true, false
The UndefinedObject constant:
nil
Character constants from the 8-bit iso8859-1 character set:
$c
ST/X also allows unicode character constants with a codepoint above 16rFF, of up to 30 bit (i.e. up to 16r3FFFFFFF).
Therefore, $≠ is also a valid character constant in ST/X and represents a character with a codePoint of 16r2260 (8800).
Be aware that not all Smalltalk dialects support unicode. Most noteworthy is VisualAge Smalltalk, which does not. However, most modern Smalltalks do (Squeak, VisualWorks and GNU-Smalltalk). So your program may be less portable if you use such constants. If portability to such old Smalltalk versions is an issue, we recommend at least extracting unicode specific code into easily maintainable extra methods.
String constants:
'foo' or
'a long string constant'
String constants may span multiple lines.
The Smalltalk standard does not define any special escapes or other mechanisms to represent unprintable characters (such as <cr>, <tab> or <backspace>) in a string. This is certainly a major missing feature and something that ought to be added in a future Smalltalk standard.
ST/X provides two mechanisms, one being compatible with the syntax of other Smalltalks, the other being a language extension, which will make your code non-portable. For portability, use the "withoutCEscapes"-message (i.e. 'foo\nbar\tbaz' withoutCEscapes).
If you do not care for portability, prefix the string constant with a "c"-character, as in:
c'foo\nbar\tbaz'.
Both will unescape in a C-language like fashion, handling "\n" (newline), "\r" (return), "\t" (tab), "\b" (backspace), "\f" (formfeed), "\g" (bell), "\0" (null) and "\xXX" (hex-byte). However, the withoutCEscapes message will unescape at execution time, whereas the c-prefixed string is unescaped at compilation time. Thus the latter will execute faster.
ST/X also allows unicode string constants where individual characters may have a codepoint of up to 30 bit (i.e. up to 16r3FFFFFFF). However, the above mentioned character portability issues apply.
Symbol constants:
#'bar',
#'++' or
#'foo bar baz'
#foo - see below
Symbols are unique immutable strings - that is, the system arranges that for a given sequence of characters, at most one corresponding symbol object exists. (Lispers call them Atoms)
Symbols can be used much like readonly Strings, with the big advantage that they can be compared using identity compare (== / ~~) whereas Strings usually have to be compared using equality (i.e. contents-) compare operators (= / ~=).
If the symbol's characters are all alphanumeric or all from the set of binary special characters (+, -, *, and a few others), the quotes can be omitted and the short form #bar can be used instead of #'bar'. Until you've learned the exact details, always place those quotes around, to be sure.
Symbols are limited to the Latin-1 character set, and we do not intend to change this. The reason is that we do not want class names and method names to be written in languages other than English (it is hard enough if some programmers do not follow that rule and write their code in various East European languages...). Sorry to non-western natives; but as you are currently reading this, you obviously understand English better than, and prefer it to, German. Also, a Chinese programmer will probably have more trouble reading (say) Hindi than English ;-) So this is one way to enforce at least a western language (the compiler is not smart enough to detect and complain about non-English names).

More information on symbols is found in "collection classes".
Array constants:
#(1 2 $b 'hello' 3.14159)
The elements of an array constant must be literal constants, and can be any of the literals described in this section.
Elements can themselves be array literals - i.e. array literals can be nested, as in:
#(1 #two #(3 4) #( #(5 6) 7) ).
For simple symbol constants (identifiers), the leading '#' may be omitted within an array constant, unless the identifier is one of 'true', 'false' or 'nil' (however, we do not recommend doing so).
Likewise, array constants within an array constant are allowed to be written without the leading '#'-character. Therefore, the above array constant can also be written as:
#(1 two (3 4) ( (5 6) 7) )
ByteArray constants:
#[0 1 2 3 4]
The elements must be integer constants in the range 0..255. ByteArrays can be seen as a compact, more memory friendly version of Arrays, and are often used when bulk data (such as bitmap images) is processed.
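
To see some of these literals in action, the following expressions can be evaluated in a workspace. They are a sketch which only uses the portable forms described above; the comments give the expected answers:

```smalltalk
16rFF.                             "-> 255"
2r1010.                            "-> 10"
8r777.                             "-> 511"
100 factorial printString size.    "-> 158 (digits); no overflow"
(1/3) + (2/3).                     "exact arithmetic -> 1"
#foo == #foo.                      "symbols are unique -> true"
'abc' copy = 'abc'.                "same contents -> true"
'abc' copy == 'abc'.               "but two distinct objects -> false"
#(1 #two #(3 4)) size.             "-> 3; the nested array counts as one element"
#[0 1 255] at: 3.                  "-> 255"
```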

Identifiers

Identifiers (variable names) identify a variable. In Smalltalk, a variable holds a reference to some object (technically, a pointer to some object - not the object's contents).
Variables come in various flavours - differing in their scope (i.e. their visibility) and their lifetime.
Among others, there are global variables, class variables, class-instance variables, instance variables, arguments and local variables.

Identifiers must start with a letter or an underscore character. The remaining characters may be letters, digits or the underscore character (*).
Examples:

foo

foo123

foo_123

aVeryLongIdentifier

anIdentifier_with_underline_characters

On purpose, identifiers consist of latin-1 alphanumeric characters (plus the underscore). Please read the section on symbols above for the reason.

Conventions

By convention, identifiers starting with an uppercase character are used for global- and class-Variables.
Instance variables, arguments and local variables should start with a lowercase character. You will really confuse other Smalltalkers, and the compiler will give a warning, if you do not follow this rule.

Footnotes:
(*): Characters in variable names:
since not all Smalltalk dialects allow underscore characters in a variable name, this can be disabled in ST/X, to support portability checking of your code.
For portability with some (VMS-)VisualWorks Smalltalk variants, a dollar character ($) can also be allowed inside an identifier as a compiler option (the $ was used in the VMS Smalltalk version of ST/X).

Special Identifiers (Builtin Names)

nil
The one-and-only instance of the UndefinedObject class.
true and false
The two boolean truth values.
self
The receiver within a method.
super
Like self, but with different message lookup semantics if used as a message receiver.
This will be described later.
thisContext
The stackFrame of the currently executing method or block, as an object. It holds the receiver, message selector, arguments and local variables.
This will be described later.
here
Like self, but with different message lookup semantics if used as a message receiver.
This will be described later.
Since "here" is a Smalltalk/X language extension, its builtin-ness is less strict than that of the other special variables: if a variable named "here" is defined and visible in the current variable scope, here will refer to that variable; otherwise, it refers to the receiver (with different lookup semantics).

Messages

A message consists of three parts:

the receiver
the message name, called the selector
optional arguments

In contrast to other programming languages, Smalltalk uses a special syntax for messages, which makes the code readable almost like English. The syntax depends mainly upon the number of arguments. Notice that a message corresponds roughly to what C++ programmers refer to as a "virtual function call". If you have a Java background, you may want to read "Smalltalk for Java/JavaScript Guys".

Unary Messages

Messages without arguments are called Unary Messages. The name of a unary message (the "message selector") is a single word consisting of letters, digits or the underscore character. The first character must not be a digit.
For example:

    1 negative

sends the message "negative" to the number 1, which is the receiver of the message.

Unary messages, like all other messages, return a result, which is simply another object.
In the above case, the answer from the "negative" message is the boolean false object.

Evaluate this in a workspace (using printIt); try different receivers (especially: try a negative number).

Unary messages parse left to right, so, for example:

    1 negative not

first sends the "negative"-message to the number 1. Then, the "not"-message is sent to the returned value. The response of this second message is returned as the final value. If you evaluate this in a workspace, the returned value will be the boolean true.

Try a few unary messages/expressions in a workspace:

    1 negated

    -1 negated

    false not

    false not not

    -1 abs

    1 abs

    10 factorial

    10 factorial sqrt

    5 sqrt

    1 isNumber

    $a isNumber

    $a isNumber not

    1 isCharacter

    $a isCharacter

    'someString' first

    'hello world' size

    'hello world' asUppercase

    'hello world' copy

    'hello world' copy sort

    #( 17 99 1 57 13) copy sort

    1 class name

    1 class name asUppercase

    WorkspaceApplication open

Notice that in the above examples, you already encountered polymorphism: both strings and arrays respond to the sort message and sort their contents in place.
Also notice that classes respond to messages, just like any other object. The last example sends the "open"-message to the WorkspaceApplication class.

Keyword Messages

This type of message allows for arguments to be passed with a message. A keyword message consists of one or more keywords, each followed by an argument.
Each keyword is simply a name followed by a colon; by convention, its first character should be lower case.
The arguments may be literal constants, variables or other message expressions (which must be grouped using parentheses, if another keyword message's result is to be used as argument).
For instance, in the message

    5 between:3 and:8

"between:" and "and:" are the keywords, the numbers 3 and 8 are the arguments and the number 5 is the receiver of the message.

The message's actual selector (i.e. the message name) is formed by the concatenation of all individual keywords; in the above example, the message selector is "between:and:".

As a beginner, keep in mind that this is different to both a "between:" and an "and:"-message. And of course, also "between:and:" and "and:between:" are different messages.
In the browser, the method will be listed under the name: "between:and:".

Keyword messages parse left to right, but if another keyword follows a keyword message, the expression is parsed as a single message (taking the concatenation of the keywords as selector).
Thus, the expression:

    a max: 5 min: 3

would send a "max:min:"-message to the object referred to by the variable "a".
This is not the same as:

    (a max: 5) min: 3

which first sends the "max:"-message to "a", then sends the "min:"-message to the result.
Try these in a workspace (don't fear the error...)

To send two separate keyword messages in one expression, you must place parentheses around the message which is to be evaluated first.
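
For illustration, here are two parenthesized variants with concrete numbers (chosen only for this example), which show that the grouping changes the answer:

```smalltalk
(7 max: 5) min: 3.     "7 max: 5 is 7, then 7 min: 3 -> 3"
7 max: (5 min: 3).     "5 min: 3 is 3, then 7 max: 3 -> 7"
```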

Try a few keyword messages/expressions in a workspace (also see what happens if you omit or change the parentheses):

    1 max: 2

    1 min: 2

    (2 max: 3) between: 1 and: 3

    (1 max: 2) raisedTo: (2 min: 3)

    'Hello' at: 1

    #(100 200 300) at: 2

    #(10 20 30 40 50 60) indexOf: 30

    #(10 20 30 40 50 60) at:('Hello' indexOf: $e)

Unary messages have higher precedence than keyword messages, thus:

    9 max: 16 sqrt

evaluates to 9.
(because it is evaluated as: "9 max: (16 sqrt)" which is "9 max:4".
It is not "(9 max: 16) sqrt", which is "16 sqrt" and would give 4 as answer.)

Binary Messages

A binary message takes one argument. Its selector is formed from one or two non-alphanumeric special characters. Some characters, such as braces, parentheses or the period, cannot be used as binary selectors (*).

Binary messages are typically used for arithmetic operations - although, this is not enforced by the system. No semantic meaning is known or implied by the Smalltalk compiler, and binary messages could be defined and used for any class and any operation.

A typical example of a binary message is the one which implements arithmetic addition for numeric receivers (it is implemented in the Number classes):

    1 + 5

This is interpreted as a message sent to the object 1 with the selector '+' and one argument, the object 5. In a browser, the message will be listed under the name "+".

Binary messages parse left to right (like unary messages).
Therefore,

    2 + 5 * 3

results in 21, not 17.
(because of left-to-right evaluation, first '+' is sent to 2, with 5 as argument. This first message returns 7.
Then, '*' is sent to 7, with 3 as argument, resulting in 21 being answered.)

To change the execution order or to avoid ambiguity, place parentheses around the part which is to be evaluated first:

    2 + (5 * 3)

Now, the execution order has changed and the new result will be 17.

Unary messages have higher precedence than binary messages, thus

    9 + 16 sqrt

evaluates as "9 + (16 sqrt)", not "(9 + 16) sqrt". (notice, that sqrt returns a float result, and '+' is sent to the integer 9, with a float 4.0 as argument. All numeric operations support such "mixed-mode" operations and return an appropriate result object.)

On the other hand, binary messages have higher precedence than keyword messages, thus

    9 + 16 max: 3 + 4

evaluates as "(9 + 16) max: (3 + 4)" which is "25 max: 7" and answers 25.
It is not the same as "9 + (16 max: 3) + 4" (which results in 29) or "((9 + 16) max: 3) + 4" (which in this case also results in 29)

Again, we highly recommend the use of parentheses - even when the default evaluation order matches the desired order; it makes your code much more readable, and helps beginners a lot.

To practice, try a few binary messages/expressions in a workspace:

    1 + 2

    1 + 2 * 3

    (1 + 2) * 3

    1 + (2 * 3)

    -1 * 2 abs

    (-1 * 2) abs

    5 between:1 + 2 and:64 sqrt

    5 between:(1 + 2) and:(64 sqrt)

    #(100 200 300) at: (1+1)

The second example above shows why parentheses are so useful: it is not apparent from reading the code whether the evaluation order was intended or is a mistake.
You will be happy to see parentheses when you have to debug or fix a program which contains a lot of numeric computations.
Here are a few more "difficult" examples:

    1 negated min: 2 negated

    1 + 2 min: 2 + 3 negated
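
The two "difficult" expressions above can be checked against fully parenthesized versions. The following is only a sketch for a workspace; the parenthesization is our reading of the precedence rules described so far (unary first, then binary, then keyword). Evaluate each expression separately and compare the answer with that of the unparenthesized original:

    "same as: 1 negated min: 2 negated   (answers -2)"
    (1 negated) min: (2 negated)

    "same as: 1 + 2 min: 2 + 3 negated   (answers -1)"
    (1 + 2) min: (2 + (3 negated))

In the second expression, "negated" is sent to 3 first, then the two "+" messages are sent, and only then is "min:" sent to the result of "1 + 2".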

"Strange" Binary Messages

There are a few binary messages found in the system, which look like syntax at first sight, and are therefore a bit difficult to understand and read for beginners.
Examples for such fancy messages which are worth mentioning are:

, (comma)
The ","-message is understood by collections, and mostly used for strings (which are collections of characters). As a binary message, it expects a single argument and returns the concatenation of the receiver and argument (i.e. a collection which contains the receiver's elements and those of the argument).
Thus, "'Hello','World'" returns the new string: 'HelloWorld'.
It also works with many other collections; the following concatenates two array objects:
"#(10 20 30),#(50 60 70)"

@
The "@"-message is understood by numbers. As a binary message, it expects a single argument. It returns a Point-object (coordinate in 2D space) with the receiver as x, and the argument as y value.
Thus, "10 @ 20" returns the same as "(Point new x:10 y:20)".

->
The "->"-message is similar to the above "@" in that it is a shorthand instance creation message. It is understood by any object and returns an association (a pair) object.
The message "10 -> 20" returns the same as "(Association new key:10 value:20)".

?
The "?"-message returns the receiver if it is non-nil, and the argument otherwise. It is used to deal with possibly uninitialized variables in assignments or as message argument.
Thus, "a ? 20" returns the same as "(a notNil ifTrue:[a] ifFalse:[20])".
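
To get a feeling for these messages, the first three can be tried in a workspace. This is only a sketch: the accessor messages "x", "key" and "value" and the addition of two points are not introduced above and are assumed to be available, as they are in common Smalltalk class libraries. Evaluate each expression separately:

    "concatenation; answers the string 'Hello World'"
    'Hello' , ' ' , 'World'

    "ask a point for its x coordinate; answers 10"
    (10 @ 20) x

    "add two points; answers the point 11 @ 22"
    (10 @ 20) + (1 @ 2)

    "ask an association for its two parts; answers 1 and 'one'"
    (1 -> 'one') key

    (1 -> 'one') value

Because "@" and "->" are ordinary binary messages, the usual left-to-right rule applies to them as well; this is why the points in the third expression are put in parentheses.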

Notes:
(*): Binary Characters:
There is no real standard on which characters are actually allowed. For example, ST/X does allow for "#" or "!" to be used as binary selector, while other Smalltalk implementations do not.
Also, ST/X allows up to three characters, while other Smalltalk implementations only allow two.
For portable code, do not use more than 2 characters per selector, and no characters other than:
"+" ,"-" , "*" , "/" , "\" , "," , "%" , "&" , "|" , "<" , ">" , "=" , "?".
In ST/X, the actual set of allowed characters can be queried from the system by evaluating (and printing) the expression "Scanner binarySelectorCharacters".
(**): For the curious:
Technically, binary messages do not add any new functionality to the Smalltalk language - they are just syntactic sugar, and Smalltalk could easily have been defined without them (i.e. in a Lisp-style, using keyword messages like 'plus:', 'minus:' etc.)

Message Syntax Summary

For some (especially for C or Java programmers), Smalltalk's message syntax might seem strange at first.
Interestingly, people with less programming experience seem to have fewer problems with this syntax - they often even find it more intuitive!

If you compare your favorite programming language against regular English, you will find Smalltalk to be much more similar to plain English than most other programming languages. For example, consider the order to a person called "tom", to send an email message to a person called "jane":
(assuming that tom, jane, theEmail refer to objects)

English:      tom, send an email to jane.
Smalltalk:    tom sendEmailTo: jane.
Java / C++:   tom.sendEmail(jane);
              tom->sendEmail(jane);

English:      tom, send theEmail to jane.
Smalltalk:    tom send: theEmail to: jane.
Java / C++:   tom.sendEmail(theEmail, jane);
              tom->sendEmail(theEmail, jane);

English:      tom, send theEmail to jane with subject: 'hi'.
Smalltalk:    tom send: theEmail to: jane withSubject: 'hi'.
Java / C++:   tom.sendEmail(theEmail, jane, "hi");
              tom->sendEmail(theEmail, jane, "hi");

Now, viewed from that angle, Smalltalk's syntax looks less strange and is actually very easy to read. You can see code that looks like:

    album play.
    album playTrack: 1.
    album repeatTracksFrom: 1 to: 10.

and it does exactly what it looks like.

Another plus in Smalltalk is that the meaning of an argument is described by the keyword before it. In Java or C++, in contrast, you have to look at a function's definition to get information on the order and type of its arguments, unless you use fancy function names like "sendEmail_to_withSubject()", which actually mimic the Smalltalk way.

Smalltalk was originally designed to be easily readable by both programmers AND non-programmers. Rumor has it that this is one reason why some programmers do not like Smalltalk syntax: they fear to lose their "guru" aura if others understand their code ;-) .

Message Examples & Explanations

Here are a few message expressions as examples:

    1 negated

sends "negated" to the number 1, which gives us a -1 (minus one) as result.

    1 negated abs

demonstrates left-to-right evaluation of unary messages; first sends "negated" to the number 1, which gives us an intermediate result of -1 (minus one); then, the message "abs" is sent to it, giving us a final result of 1 (positive one).

    -1 abs negated

first sends "abs" to the number -1 (minus one), which gives us a 1 (positive one) as intermediate result. Then this object gets a "negated" message.
The final return value is the number "-1" (minus one).

    1 + 2

that seems obvious, but is a message send in Smalltalk: it sends the message "+" to the number 1, passing it the number 2 as argument. The returned object is 3.
Notice that, strictly speaking, the Smalltalk language does not define or require that the performed operation is an addition; instead, this is defined by how numbers react to (i.e. implement) the "+" message.
However, programmers would have a hard time if this was not defined as "addition"; therefore, in general, messages in the Smalltalk class libraries perform the action one would expect.

    1 + 2 + 3

demonstrates left-to-right evaluation of binary messages; first, the message "+" is sent to the number 1, passing it the number 2 as argument. Then, another "+" message is sent to the intermediate result, passing the integer-object 3 as argument.

    1 + 2 * 3

that is less obvious - however, from the above you should understand that left-to-right evaluation is always done in Smalltalk (since the language does not define any arithmetic semantics for any message).
So, the outcome will be 9; not 7 as one would expect from mathematical precedence rules.
Use parentheses to change the evaluation order.

    -1 abs + 2

demonstrates precedence rules when mixing unary and binary messages.
First sends "abs" to the number -1 (minus one), then sends "+" to the result, passing 2 as argument.
The final return value is the number "3".

    1 + -2 abs

demonstrates precedence rules when mixing unary and binary messages.
First sends "abs" to the number -2, then sends "+" to the number 1, passing the result of the first message as argument.
The final return value is the number "3".
Remember: unary messages have higher precedence than binary messages.

    -1 abs + -2 abs

demonstrates precedence rules when mixing unary and binary messages.
First sends "abs" to the number -1 (minus one) and remembers the result. Then sends "abs" to the number -2 and passes this as argument of the "+" message to the remembered object.
The final return value is the number "3".

    1 + 2 sqrt

demonstrates precedence rules when mixing unary and binary messages.
First sends "sqrt" to the number 2, then passes this as argument of the "+" message to the number 1.
The final return value is the number "2.41421".
Remember: unary messages have higher precedence than binary messages.

    (1 + 2) sqrt

first sends "+" to the number 1, passing 2 as argument. Then sends "sqrt" to the result.
The final return value is the number "1.73205".

    1 min: 2

sends the "min:" (minimum) message to the number 1, passing 2 as argument.
The return value is the number "1" (the smaller one).

    (1 max: 2) max: 3

first sends the "max:" (maximum) message to the number 1, passing 2 as argument. Then sends "max:" to the returned value, passing 3 as argument.
The final return value is the number "3" (the largest one).

    (1 + 2 max: 3 + 4) min: 5 + 6

first sends "+" to the number 1 passing 2 as argument and remembers the result. Then, "+" is sent to the number 3, passing 4 as argument. Then, "max:" is sent to the remembered first result, passing the second result as argument. The result is again remembered. Then, "+" is sent to the number 5, passing 6 as argument. Finally, the "min:" message is sent to the remembered result from the first max: message, passing the result from the "+" message.
The final return value is the number "7".
Remember: binary messages have higher precedence than keyword messages.

    1 max: 2 max: 3

tries to send a "max:max:" message to the number 1, passing the two arguments, 2 and 3.
Since numbers do not respond to a "max:max:" message, this leads to an error (message-not-understood).

This example illustrates why parentheses are highly recommended - especially with concatenated keyword messages.

    'hello' at:1

sends the "at:" message to the string constant.
The return value is the character "h" (which displays itself prefixed by a $ dollar).

    'hello' , ' world'

sends the "," binary message to the first string constant, passing another string as argument.
This message is implemented in the String class, and returns a concatenation of the receiver object and its argument.
The returned object is a new string, consisting of the characters 'hello world'.

    'hello' , ' ' , 'world'

first sends the "," binary message to the first string constant, passing ' ' as argument. Then, the result gets another "," message, passing 'world' as argument.
The returned object is a new string, consisting of the characters 'hello world'.

    #(10 2 15 99 123) min

sends the "min" unary message to an array object (in this case: a constant array literal). All collections respond to the "min" message by searching for their smallest element and returning it.

    WorkspaceApplication new open

first sends the "new" unary message to the WorkspaceApplication class object, which returns a new instance of itself. Then, this new instance gets the "open" message, which asks for a window to be shown.

Statements

Multiple message expressions or assignments (see below) may be evaluated in sequence by separating individual expressions with a '.' (period) character. For example:

    -1 negated.
    1 + 2.

first sends the "negated" message to -1 (minus one), ignoring the result. Then, the "+" message is sent to 1 (positive one), passing the number 2 as argument.

Notice that there is actually no need for a period after the last statement (it is a statement-separator) - it does not hurt, though.
We will encounter more (useful) examples for multiple statements below.

Variables

In Smalltalk, a variable holds a reference to some object - we say, a variable "is bound" to some object.
A variable may refer to any object - there is no limitation as to which type of object (i.e. the object's class) a variable may refer to. Every variable is automatically initialized to refer to nil, when created.

Important Note to C, C++ and C# programmers:

Smalltalk variables always hold a reference (pointer) to some object. Every object "knows" its type; it is NOT the pointer which knows the type of the object it points to. In Smalltalk, it is impossible to treat a pointer as an integer or as a pointer to something else. There is no such thing as a cast in Smalltalk. Therefore we say that Smalltalk is a "dynamically strongly typed language", in contrast to C++, which is a "statically weakly typed language".
In Smalltalk, all objects are conceptually always created on the dynamic, garbage-collected heap. There is no such thing as "boxing" or "unboxing". Assignments never copy the value, but only the reference to the object. When arguments are passed in a message, references are passed.

For now, only global variables and local variables are described (because we need them for more interesting examples); the other variable types will be described later.

Global Variables

Global variables are only used for objects of common interest. Especially, most classes are referred to by a global variable (for the curious: it is possible to create anonymous classes, which are not referred to by a global variable).

Besides classes, only a few other objects are bound to globals; the most interesting for now are:

Transcript
refers to the transcript window. This is a text window to which diagnostic output can be written. We will use the transcript in the examples below.
The most useful messages that can be sent to the transcript window are:

show:something
shows a printed representation of something at the current text cursor position and advances the cursor. The argument, something, is typically a string - if any other object is passed as argument, it is converted as appropriate (actually it is asked to generate a printed representation of itself).

cr
moves the text cursor of the transcript output window to the beginning of the next line. If required, the text is scrolled.

showCR:something
a short form for show: followed by cr.

flash
flashes the view, to get the user's attention (try it).

Smalltalk
refers to the set of global variable bindings. It responds to messages to add, remove and query for global variables. Also, the command line arguments, language settings and some other configuration parameters are accessible via this object.

Stdin, Stdout and Stderr
These refer to the standard input, output and error streams. They are usually only relevant for non-graphical (i.e. non-GUI) applications which deal with those input/output streams. When used with a GUI, input/output of Smalltalk is usually done through the Transcript or specialized Workspace windows. Under the Windows operating system, the "stx.exe" program has no console and therefore no interactive input/output. Data written to the output streams will be found in the log file (in the user-specific temp directory). When reading from the input stream, an End-of-File condition is signalled.

Logger
refers to a logger object, which handles warning messages and debug output. By default, it is set up to send the messages to the Transcript and/or Stderr. But other loggers are available (e.g. to send them to the operating system's syslog or event log).

In general, from a software engineering point of view, the use of global variables for anything other than classes is considered to be bad style.
Making something globally visible is actually not required - we highly recommend using class variables and providing access to those via access-protocol, or using pool variables (see both below).

Even simple references to the Transcript, UserPreferences or Display screen lead to trouble when multiple threads/sessions/users are to be supported. For this, ST/X provides queries like "Transcript current", "UserPreferences current" or "Screen current", which return thread-local references. So each thread may have its own, private I/O devices and settings.

That said (and kept in mind), being able to access the console via the Transcript is often very helpful: it allows you to send debugging and informative messages from the program.
For example:

    Transcript show: 'Hello world'

shows that greeting in the Transcript window, and

    Transcript cr

advances its text cursor to the next line.
There is also a combined message, as in:

    Transcript showCR: 'Hello world'

Finally, to wake up a sleepy user, try:

    Transcript topView raise.
    Transcript showCR: 'Ring Ring - Wakeup!'.
    Transcript flash.

So, How is a Global Variable Created Then?

A global is created by sending the message at:put: to the global called Smalltalk, passing the name of the new global as a symbol. For example:

    Smalltalk at:#Foo put: 'Now there is a Foo'

and can then be used:

    Smalltalk at:#Foo

or simply:

    Foo

If you want Smalltalk to forget about that variable, execute

    Smalltalk removeKey:#Foo

(be careful not to remove one of your loved ones by accident).
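
The whole life cycle of such a global can be followed on the Transcript. This is only a sketch: the name "DemoGreeting" is made up for this illustration, and the messages "includesKey:" and "printString" are not introduced above (we assume that the Smalltalk object answers "includesKey:" like a dictionary, as it does in common Smalltalk systems):

    "create the global and read it back"
    Smalltalk at: #DemoGreeting put: 'hello'.
    Transcript showCR: (Smalltalk at: #DemoGreeting).

    "ask whether the global exists; shows true"
    Transcript showCR: (Smalltalk includesKey: #DemoGreeting) printString.

    "remove it again; the same query now shows false"
    Smalltalk removeKey: #DemoGreeting.
    Transcript showCR: (Smalltalk includesKey: #DemoGreeting) printString.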

Having said this, you had better forget about global variables right away.

Workspace Variables

When executing (example-) expressions in a workspace window, it is often helpful to be able to refer to an object via a known name (for example, to be able to send messages to it later) but without the danger of overwriting existing global variables which might interfere with Smalltalk internals.
For this, Smalltalk/X provides Workspace variables.
These behave much like global variables, in that their lifetime is not limited to a method or block execution or to a particular instance. However, they are only visible in the context of a workspace's doIt evaluation - they do not conflict with a global of the same name.

Workspace variables are created and destroyed via corresponding menu functions in the workspace window. You can also configure the workspace to auto-define any unknown variable as a workspace variable (in the workspace's "Workspace" - "Settings"-menu). That's the way to go for the remainder of this lecture, because it makes your life so much easier.

Be aware of the fact that workspace variables are invisible to compiled code - i.e. any reference to such a variable from within compiled code will actually refer to a global variable with the same name (which will be seen as nil if no value has ever been assigned to it).

Class Variables, Class Instance Variables and Shared Pools

These are variables with a visibility that is limited to a class and its subclasses or a limited group of classes.

Class Variables (now also called "Statics" in new VisualWorks versions)

A "Class Variable" is a single cell which is visible and shared by a class and all of its subclasses.
If some class "A" defines a class variable "Va", this "Va" can be seen, read and written to from all methods of class "A" and all of its subclasses. If "A" changes the value of "Va", the changed value is also seen in all of the subclasses and vice versa.

Class Instance Variables

A "Class Instance Variable" is defined in a class as a slot of the class object and is as such inherited by the subclasses. However, each of the class objects has its own private slot. This is very similar to the definition of an instance variable, which is also defined in one place and then inherited by subclasses, but with each individual instance holding a possibly different value in it.
If some class "A" defines a class instance variable "Va", each class gets its own private slot named "Va". If "A" changes the value of its "Va", the other "Va"-slots of all the subclasses are not affected and vice versa.

For a C++, Java or C# programmer, class instance variables are hard to understand, unless they see the class objects as real objects with private slots, protocol etc. This is because none of those languages offers a similar construct.

Pool variables

A "Shared Pool" is a collection of variable bindings which are visible to more than a single class. The pool itself defines the names of the variables it contains, and also provides the initialization code to set the initial values. Other classes can then "attach" to the pool, which makes these variables visible inside the class. Pool variables are read-only everywhere outside the SharedPool itself. They are typically used to hold and provide shared constants, parameters and definitions that do not change.

More info on those variables will be presented below, after classes and metaclasses have been explained.

Instance Variables

Instance variables are private to some object and their lifetime is the lifetime of the object.
We will come back to instance variables, once we have learned how classes are defined.

Local Variables

A local variable declaration consists of an opening '|' (vertical bar) character, a list of identifiers and a closing '|'. It must be located before any statement within a code entity (a doIt-evaluation, block or method; the latter being described below).
For example:

    | foo bar baz |

declares 3 local variables, named 'foo', 'bar' and 'baz'.

A local variable's lifetime is limited to the time the enclosing context is active - typically, a method or a block.

When a piece of code is evaluated in a workspace window, the system generates an anonymous method and calls it for the execution. Therefore, a local variable declaration is also allowed with doIt-evaluations (the variable's lifetime will be the time of the execution).

Assigning a Value to a Variable

A variable is "bound to" (made to refer to) an object by an assignment expression.
Assuming that "foo" and "bar" have been declared as variables before, you can assign a value with:

    foo := 1

or:

    bar := 'hello world'

This makes the variable refer to the object as specified by what is written after (to the right of) the assignment symbol. This may be either a literal (i.e. a constant), the value of another variable, or the outcome of a message expression.
Multiple assignments are allowed, as in:

    foo := bar := baz := 1

Notice:

Beginners should be careful to not forget the colon character ":" in ":=".
If you write "=" instead, you will get a binary message send expression which means "is equal to" (i.e. it is a comparison operator).
Therefore,

    foo := baz = 1.

would assign true or false to "foo", depending on whether "baz" is equal to 1 or not.

To make the intention clear, good programmers will often place the right side of the above expression in parentheses,

    foo := (baz = 1).

Even if they are not required, it is a bit easier to read.

All variables are initially bound to nil.
This is similar to the default initialization of fields in Java or C#, and in contrast to C or C++, where an uninitialized variable may hold an arbitrary value. You will never get random or even invalid values in a Smalltalk variable.

Keep in mind that only a reference to an object is stored into the variable, not the state of the object itself. This means that multiple variables may refer to the same object.
For example:

    |var1 var2|

    "create an Array with 5 elements ... and assign it to var1"
    var1 := Array new:5.

    "and also to var2"
    var2 := var1.

    "change the 2nd element..."
    var1 at:2 put:1.

    Transcript show:'var1 is '. Transcript showCR:var1.
    Transcript show:'var2 is '. Transcript showCR:var2.

The previous example demonstrates that both var1 and var2 refer to the same array object, i.e. that in Smalltalk, a variable actually holds a reference to an object, and that more than one variable may refer to the same object.
Technically speaking: a variable holds a pointer to the object.

This is especially true with multiple assignments; so:

    foo := bar := 'hello'

binds both "foo" and "bar" to the same string object.
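
That two variables really refer to one and the same object can be checked with an identity comparison. The following is a sketch for a workspace; the messages "==" (is the same object), "copy" and "printString" are not introduced above and are assumed to behave as in common Smalltalk class libraries:

    | foo bar baz |

    foo := bar := 'hello'.
    baz := foo copy.

    "true: foo and bar are two names for one object"
    Transcript showCR: (foo == bar) printString.

    "false: baz refers to a separate copy"
    Transcript showCR: (foo == baz) printString.

    "true: the copy has the same contents, though"
    Transcript showCR: (foo = baz) printString.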

Side Effects

What happened in the above "at:put:" example is called a side effect. As a "side effect" of sending a message to var1, the world also changed as seen from var2. Such side effects are one of the biggest troublemakers in bigger projects, because they are sometimes very hard to find and track down, and because their effect is usually seen much later - when the cause is no longer in sight.

One good strategy to prevent such trouble is to write code in a so-called "functional coding style". In this style, you avoid manipulating an object's state; instead, you create a copy of the original with the change applied, thus never affecting others that also hold on to the object. Of course, taken to its extreme, such a coding style can lead to a huge overhead, and in practice a compromise has to be found. However, for strings, simple collections, 2D and 3D coordinates and a number of other simple objects it is usually the way to go.
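
A minimal sketch of this copy-instead-of-modify style, for a workspace. The messages "with:with:with:", "copy" and "printString" are not introduced above and are assumed to be available as in common Smalltalk class libraries:

    | original changed |

    original := Array with: 10 with: 20 with: 30.

    "apply the change to a copy, not to the original"
    changed := original copy.
    changed at: 2 put: 99.

    "the original is untouched: shows 20"
    Transcript showCR: (original at: 2) printString.

    "only the copy has changed: shows 99"
    Transcript showCR: (changed at: 2) printString.

Anyone else who still holds a reference to the original array sees it unchanged.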

Assignment to Globals

Be careful when assigning to globals - do not (by accident) overwrite a reference to some other object, especially not to a class by writing:

    Array := nil

To prevent beginners from doing harm to the system, ST/X checks for this situation and gives a warning.
However, other (Smalltalk-) systems may silently perform the assignment and leave you with an unusable system.
Keep in mind that this danger is also one of the reasons for Smalltalk's flexibility: it allows for almost any existing class to be replaced by your own. So it is a double-edged sword: the strict prohibition of such changes (as advocated by Java and C#) might help absolute beginners, but also hinders those who want to enhance the system with fancy new features. In practice, this has never really been a problem: if it ever happens, restart your last working image, reapply the changes and continue. You will make this mistake only once. Try it, now!

As a general rule:

do not assign to global variables - it is usually a sign of very very bad design if you have to. As you read above and will see below, there are other variable types which can be used in most situations.

Examples

Knowing about variables, we can try more interesting messages:

Ask the Float class for the π (pi) constant:

    Float pi

Ask the Transcript object to raise its top view:

    Transcript topView raise

Ask the Transcript object to flash its view:

    Transcript flash

Ask the WorkspaceApplication class to create a new instance and open a view for it:

    WorkspaceApplication open

Declare a local variable, assign a value and display it on the transcript window:

    |foo|

    Transcript show:'foo is initially bound to: '.
    Transcript showCR:foo.

    foo := -1.
    Transcript show:'foo is now bound to: '.
    Transcript showCR:foo.

    foo := foo + 2.
    Transcript show:'foo is now bound to: '.
    Transcript showCR:foo.

Remember, that a variable may refer to any object.
Thus, the following is legal (although not considered good style):

    |foo|

    foo := -1.
    Transcript show:'foo is: '.
    Transcript show:foo.
    Transcript cr.
    Transcript show:'and it is a: '.
    Transcript showCR:foo class name.

    foo := 'hello'.
    Transcript show:'foo is now: '.
    Transcript show:foo.
    Transcript cr.
    Transcript show:'and it is a: '.
    Transcript showCR:foo class name.

A rule of wisdom:
do not reuse variables (as in the above case) unless needed for accumulating something. Having an extra variable in a method does not cost anything (neither time, nor space). However, it helps a lot in readability. Sometimes it is even worth using a temporary variable just for its name, to document what an intermediate result represents.
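
As an illustration of this rule, here is a small made-up computation in which every intermediate result gets its own, descriptive variable (all names and numbers are invented for this sketch; "printString" is not introduced above):

    | netPrice taxPercent tax grossPrice |

    netPrice := 200.
    taxPercent := 19.

    "the extra variable documents what the intermediate result is"
    tax := netPrice * taxPercent / 100.
    grossPrice := netPrice + tax.

    "shows 238"
    Transcript showCR: grossPrice printString.

The same result could be computed in a single expression, or by reusing one variable for all steps, but then a reader would have to work out what each partial result means.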

Cascade Message Expressions

Sometimes, it is useful to send multiple messages to the same receiver.
For example, to add elements to a freshly created collection, you could write:

    | coll |

    coll := Set new.    "/ create an empty Set-collection
    coll add:'one'.
    coll add:'two'.
    coll add:3.

A cascade expression (semicolon) allows this to be written a little shorter: it sends another message - possibly with arguments - to the previous receiver.

The following cascade is semantically equivalent to the above albeit a bit shorter:

    | coll |

    coll := Set new.    "/ create an empty Set-collection
    coll add:'one'; add:'two'; add:3.
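
Cascades are also handy for output. As a small sketch, the following sends three "show:" messages and one "cr" message, all to the Transcript:

    "shows: one and two"
    Transcript show: 'one'; show: ' and '; show: 'two'; cr.

This does the same as four separate statements which each start with "Transcript".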

Hint and Warning

If you need a reference to such a freshly created collection, you should be aware that the "add:" method returns its argument (for historic reasons beyond my understanding). This means that the following code does NOT do what it looks like:

    | coll |

    coll := Set new
		add:1; add:2.   "/ Attention: add returns its argument

Instead of the expected, it leaves the integer 2 in the variable named "coll", because the assigned value is the value of the last "add:" message.

Because this is a recurring pattern, a method named "yourself" has been added to the Object class. As the name implies, it simply returns the receiver. Use this as the last message of the cascade:

    | coll |

    coll := Set new
		add:1; add:2;
		yourself.       "/ returns the receiver - i.e. the Set

to prevent the above problem and get the expected value assigned. You may encounter this kind of code at various places in the system.
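
The difference can be made visible by asking both results for their class. This is a sketch for a workspace; the variable names are made up, and the exact name of the integer class may differ between Smalltalk systems:

    | wrong right |

    wrong := Set new add: 1; add: 2.
    right := Set new add: 1; add: 2; yourself.

    "an integer class (typically SmallInteger): the argument of the last add:"
    Transcript showCR: wrong class name.

    "Set: the collection itself"
    Transcript showCR: right class name.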

Blocks

Blocks are one of the most powerful features of the Smalltalk language. They allow for what is called "higher order function" algorithms - that is functions which return functions or get a function as argument.
Take a few minutes to understand this concept, because it is unfamiliar to many C/C++ and Java programmers.

A block represents a piece of executable code. Being a "real object" just like any other, it can be stored in a variable, passed around as argument or returned as value from a method. When required, the block can be evaluated at any later time, which results in the execution of the block's statement(s). The fancy thing is that the block's statements can see and are allowed to access all of the surrounding variables, i.e. those which are visible within the static block scope.

For C/C++ and Java programmers: As a first approximation, regard a block as a reference to an anonymous function, which can be defined without a name, passed to other objects and eventually executed. However, blocks are more powerful, as they have access to variables of their statically enclosing, defining context, and especially: they can return from it.
For Lispers/Schemers: Blocks are closures (lambdas with access to their static enclosing environment)!

Defining and Evaluating a Block

A block is defined simply by enclosing its statements in brackets, as in:

    | someBlock |

    someBlock := [  Transcript flash ].

Later, when the block has to be evaluated (i.e. its statements executed), send it the "#value" message:

    ...
    someBlock value.
    ...
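
Because a block's statements can access the variables of the surrounding code, a block can also change them. A small sketch for a workspace (the variable names are made up; "printString" is not introduced above):

    | count increment |

    count := 0.

    "the block refers to the variable count of the surrounding code"
    increment := [ count := count + 1 ].

    increment value.
    increment value.
    increment value.

    "shows 3: each evaluation changed the same variable"
    Transcript showCR: count printString.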

Blocks may be defined with 0 (zero) or more argument(s).
A block with argument(s) is defined by giving the formal argument identifiers after the opening bracket - each preceded by a colon character. The list is finished by a vertical bar.
For example:

    |someBlock|

    ...
    someBlock := [:a | Transcript showCR:a ].
    ...

defines a block which expects (exactly) one argument.
To evaluate it, send it the "#value:" message, passing the desired argument object.
For example, the above block can be evaluated as:

    someBlock value:'hello'

(here, a string object is passed as argument).

For multiple arguments, declare each formal argument preceded by a colon. For evaluation, a message of the form "#value:...value:" with a corresponding number of arguments must be used.
For example, the block:

    |someBlock|

    ...
    someBlock := [:a :b :c |
			Transcript show:a.
			Transcript show:' '.
			Transcript show:b.
			Transcript show:' '.
			Transcript show:c.
			Transcript cr
		  ].
    ...

can be evaluated with:

    someBlock value:1 value:2 value:3
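
The number of arguments passed must match the number of formal arguments of the block; otherwise an error is reported. As a minimal sketch (the variable name adder is made up for illustration), a block can be asked for its expected argument count with the numArgs message:

    |adder|

    adder := [:a :b | a + b].
    Transcript showCR:(adder numArgs) printString.            "shows 2"
    Transcript showCR:(adder value:3 value:4) printString.    "shows 7"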

Block Evaluation Yields a Value

When a block is evaluated, the value returned by the value-message is the value of the block's last expression.

    |someBlock|

    ...
    someBlock := [:a :b :c | a + b + c].
    ...
    Transcript showCR:(someBlock value:1 value:2 value:3).
    ...

When executed, the above will display "6" on the Transcript window.
Likewise,

    |someBlock result|

    ...
    someBlock := [:a :b :c | Transcript showCR:'hello'. a + b + c].
    ...
    result := someBlock value:1 value:2 value:3.
    ...

will assign the numeric value 6 to the result variable.

Notice that blocks are closures; they "close over the variables" of the environment which was active at the time the closure was created. Also notice that a block creates such a variable environment of its own each time it is executed. This means that in the following:

    |actions|

    actions := (1 to:10) collect:[:factor | [:arg | arg * factor] ].
    (actions at:5) value:10.

the "actions at:5" retrieves a block which has captured the value which the factor variable had when that block was created (which was 5) and therefore multiplies the argument by 5.
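
To check this, the results can be printed. The following sketch repeats the definition and evaluates two of the captured blocks; each one remembers its own factor:

    |actions|

    actions := (1 to:10) collect:[:factor | [:arg | arg * factor] ].
    Transcript showCR:((actions at:5) value:10) printString.    "shows 50"
    Transcript showCR:((actions at:2) value:10) printString.    "shows 20"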

Blocks have many nice applications: for example, a GUI-Button's action can be defined using blocks, a timer may be given a block for later execution, a batch processing queue may use a queue of block-actions, a background process may be forked to execute a block and a sorted collection may use a block to specify how elements are to be compared.

However, the most striking application of blocks is in defining control structures (like if, while, repeat, loops etc.), and as "higher order functions" when enumerating or processing collections and the like.

Control Structures

Recall that the above description of the Smalltalk language did not describe any syntax for control-flow - the reason is simple: there is none!
Instead, all program control is defined by appropriate message protocol; mostly in the Boolean, Block and the Collection classes.

Conditional Execution (if)

Conditional execution is defined by the ifTrue: / ifFalse: protocol as implemented by the two boolean objects, which are bound to the globals "true" and "false":

aBoolean ifTrue: aBlock - evaluates aBlock if aBoolean is true
aBoolean ifFalse: aBlock - evaluates aBlock if aBoolean is false
aBoolean ifTrue: trueBlock ifFalse: falseBlock - evaluates trueBlock if aBoolean is true, falseBlock if false
aBoolean ifFalse: falseBlock ifTrue: trueBlock - evaluates trueBlock if aBoolean is true, falseBlock if false

these correspond to the if-then and if-then-else statements in traditional languages.
If you have problems understanding this, then think of the above as being an order to the receiver, saying: "if you are true, then here is some code for you to execute; and if you are false, then there is some other code for you"
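
Conceptually, this "order to the receiver" can be pictured as two tiny methods, one in the class of true and one in the class of false. The following is only a sketch of the idea (shown as method source, not as workspace code); real systems usually compile these messages inline for speed:

    "in class True:"
    ifTrue:trueBlock ifFalse:falseBlock
        "I am true, so I evaluate the first block"
        ^ trueBlock value

    "in class False:"
    ifTrue:trueBlock ifFalse:falseBlock
        "I am false, so I evaluate the second block"
        ^ falseBlock value

No comparison or branch instruction is needed; the choice is made by the ordinary method lookup.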

So, to compare a variable against a number and send some message to the Transcript window, you can write:

    ...
    (someVariable > 0) ifTrue:[ Transcript showCR:'yes' ].
    ...

of course, you may change the indentation to reflect the program flow;
this is what a C-Hacker (like I used to be) would write:

    ...
    (someVariable > 0) ifTrue:[
	(someVariable < 10) ifTrue:[
	    Transcript showCR:'between 1 and 9'
	] ifFalse:[
	    Transcript showCR:'positive'
	]
    ] ifFalse:[
	Transcript showCR:'zero or negative'
    ].
    ...

and that is how a Lisper (and many Smalltalkers) would write it:

    ...
    (someVariable > 0)
	ifTrue:
	    [(someVariable < 10)
		ifTrue:
		    [Transcript showCR:'between 1 and 9']
		ifFalse:
		    [Transcript showCR:'positive']]
	ifFalse:
	    [Transcript showCR:'zero or negative'].
    ...

Because the above constructs are actually message sends (NOT statement syntax), they do also return a value when invoked. Thus, some Smalltalkers or Lispers would probably prefer a more functional style, as in:

    ...
    Transcript showCR:
	((someVariable > 0)
	    ifTrue:
		[(someVariable < 10)
		    ifTrue:['between 1 and 9']
		    ifFalse:['positive']]
	    ifFalse:
		['zero or negative']).
    ...

Which one you prefer is mostly a matter of style, and you should use the one which is more readable - sometimes, deeply nested expressions can become quite complicated and hard to read.

As a final trick, noticing the fact that every object responds to the #value-message, and that the #if-messages actually send #value to one of the alternatives and return that, you may even encounter the following coding style (notice the non-block args of the inner ifs):

    ...
    Transcript showCR:
	((someVariable > 0)
	    ifTrue:
		[(someVariable < 10)
		    ifTrue:'between 1 and 9'
		    ifFalse:'positive']
	    ifFalse:
		'zero or negative').
    ...

The above "trick" should (if at all) only be used for constant if-arguments and only when using the "if" for its value. With message-send arguments, both alternatives would be evaluated, which is probably not the desired effect. Also be aware that some other objects implement value and will not return themselves. Most noteworthy are instances of Association and the ValueModel hierarchy.

Warning:: It is a common beginner's error to forget that the above are really messages to some object and that the argument(s) of an if-message ought to be blocks.
Therefore, except for the above "trick", it is usually an error to use round parentheses instead of brackets.
(the if-expression would evaluate both alternatives and use the condition to choose the returned value.)
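
The difference can be seen in the following sketch (the variable x is made up for illustration). Depending on the compiler, the first variant may additionally produce a warning:

    |x|

    x := -1.

    "wrong: the argument is evaluated before ifTrue: is sent,
     so 'yes' is shown although x is negative"
    (x > 0) ifTrue:(Transcript showCR:'yes').

    "right: the block is only evaluated if the receiver is true,
     so nothing is shown here"
    (x > 0) ifTrue:[ Transcript showCR:'yes' ].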

Looping (while)

While-loops are defined in the Block class:

aBooleanBlock whileTrue: loopBlock - repeats evaluating loopBlock as long as aBooleanBlock evaluates to true
aBooleanBlock whileFalse: loopBlock - repeats evaluating loopBlock as long as aBooleanBlock evaluates to false
aBooleanBlock whileTrue - repeats evaluating aBooleanBlock until it returns false
aBooleanBlock whileFalse - repeats evaluating aBooleanBlock until it returns true

Examples:

    |someVar|

    someVar := 1.
    [someVar < 10] whileTrue:[
	Transcript showCR:someVar.
	someVar := someVar + 1.
    ]

Warning:: It is a common beginner's error to forget that the above are really messages to some (in this case) block object and that the receiver of a while-message ought to be a block.
Therefore, it is an error to use round parentheses instead of brackets.
(i.e. "(someVar < 10)" would return a boolean, which does not implement the while messages.)
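
The whileFalse: variant works the same way, with the condition inverted. As a small sketch, the following doubles a number until it has reached at least 1000:

    |n|

    n := 1.
    [n >= 1000] whileFalse:[
        n := n * 2.
    ].
    Transcript showCR:n printString.    "shows 1024"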

A nice use of this (and a demonstration of how powerful blocks are) is when the condition block is not static as in the above example, but passed in as an argument to some looping code. For example:

    condition := [ something evaluating to a Boolean ].
    ...

    condition whileTrue:[
	...
    ]

If while-loops are used that way, the condition is typically passed in as an argument or configured in some instance variable.
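
A minimal sketch of this idea follows; the names repeatWhile and i are made up for illustration. Both the condition and the loop body are handed in as block arguments to a third block, which knows nothing about what they do:

    |repeatWhile i|

    repeatWhile := [:condition :action | condition whileTrue:action ].

    i := 0.
    repeatWhile value:[ i < 3 ] value:[ i := i + 1 ].
    Transcript showCR:i printString.    "shows 3"

In a real program, repeatWhile would more likely be a method which gets the two blocks as message arguments.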

The above while-loops check the condition at the beginning - i.e. if the condition block evaluates to false initially, the loop-block is not executed at all.

The Block class also provides looping protocol for condition checking at the end (i.e. where the loop-block is executed at least once):

    [
	...
	loop statements
	...
    ] doWhile: [ ...condition... ]

and also:

    [
	...
	loop statements
	...
    ] doUntil: [ ...condition... ]
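
As a small sketch of the difference from whileTrue: (assuming the doWhile: semantics described above), the loop body below is executed once, although the condition is false from the very beginning:

    |n|

    n := 10.
    [
        n := n + 1.
    ] doWhile:[ n < 10 ].
    Transcript showCR:n printString.    "shows 11"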

Endless Loop (forever)

An endless loop is normally not what the programmer wants, except for server processes (which handle incoming requests) or iterative calculations. Such loops can be written as an endless loop, which is left (if at all) by other means (typically by terminating the process, via an exception or by returning from the method which contains the loop).

Of course, an obvious way to write an endless loop is:

    [true] whileTrue:[
	...
	endless loop statements
	...
    ]

However, to document the programmer's intention, it is better to use one of the explicit endless loop constructs (#loop or #repeat), as in:

    [
	...
	endless loop statements
	...
    ] loop

or:

    [
	...
	endless loop statements
	...
	someCondition ifTrue:[ ^ something ].
	...
    ] loop

this one demonstrates that a return statement inside a block will actually force a return from the enclosing method. Especially C, C++, C#, Java and JavaScript programmers should raise their eyebrows here.

Finally, take a look at:

    [:exit |
	...
	endless loop statements
	...
	someCondition ifTrue:[ exit value ]
	...
    ] loopWithExit

this one is interesting, as the exit object passed in as argument exits the loop when #value is sent to it. Thus, because ifTrue: sends #value to its argument, the loop can also be written as:

    [:exit |
	...
	endless loop statements
	...
	someCondition ifTrue: exit
	...
    ] loopWithExit

Repeating a Number of Times

To repeat the execution a number of times, use:

    n timesRepeat:[
	...
	repeated statements
	...
    ]

where n stands for an integer value (constant, variable or message expression).

Looping Over a Range of Numbers

The traditional (C and Java) loop styles, where a range of numbers is enumerated, are also available in Smalltalk:

    |anArray|

    anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).

    1 to: 6 do: [:idx |
	Transcript showCR: (anArray at: idx)
    ].

or, with an increment,

    |anArray|

    anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).

    1 to: 6 by: 2 do: [:idx |
	Transcript showCR: (anArray at: idx)
    ].
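
The increment may also be negative, in which case the range is enumerated downward. A small sketch, independent of any collection:

    |sum|

    "counts down: shows 10, 7, 4 and 1"
    10 to:1 by:-3 do:[:i |
        Transcript showCR:i printString
    ].

    "sums up the numbers from 1 to 100"
    sum := 0.
    1 to:100 do:[:i | sum := sum + i ].
    Transcript showCR:sum printString.    "shows 5050"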

However, no real Smalltalk programmer would use "to:do:" to enumerate a collection's elements.
There are many, many useful enumeration messages provided in the collection classes, and we highly recommend that you have a look at them.
A real Smalltalk programmer would instead write:

    |anArray|

    anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).

    anArray do:[:eachElement |
	Transcript showCR: eachElement
    ].

Notice that this example also demonstrates good vs. bad reusability of the code: the first version (using to:do:) uses a numeric-index-based address to fetch each element. This implies that the collection must be some kind of numerically-sequenceable collection. The second version simply leaves that decision to the collection itself. It will therefore work with any kind of collection (lists, trees, hashtables, sets, etc.). Of course, in the above example we hardcoded an array as receiver, which is known to allow access via a numeric index. However, in practice, the collection is often coming from elsewhere via a message argument or variable value. In that case, a changing collection representation in other parts of the program will not affect the enumeration loop.

Open a browser and look at the implementation of #reverseDo:, #collect:, #detect:, #select:, #findFirst: etc.
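
To give a first impression, here is a short sketch using a few of the most common enumeration messages on a small array of numbers (the variable name numbers is made up for illustration). The comments describe the resulting values; the exact printed representation of the collections may differ between systems:

    |numbers|

    numbers := #( 3 8 1 12 7 ).

    "a collection with the elements 8, 12 and 7"
    Transcript showCR:(numbers select:[:n | n > 5]) printString.

    "a collection with the squares 9, 64, 1, 144 and 49"
    Transcript showCR:(numbers collect:[:n | n * n]) printString.

    "the first element greater than 10, which is 12"
    Transcript showCR:(numbers detect:[:n | n > 10] ifNone:[nil]) printString.

    "the sum of all elements, which is 31"
    Transcript showCR:(numbers inject:0 into:[:sum :n | sum + n]) printString.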

Hint:: It is very common for beginners to use simple "do"- or even "while"-loops with indexing to enumerate elements for element searching or processing.
Please do have a look at the full enumeration protocol and browse for uses of them. It really helps, saves code and avoids bugs. In addition, many of the enumeration messages are implemented in a much more efficient way than naive loop code would be. Do not reinvent the wheel!

Exception Handling

Errors and Exceptions can be handled programmatically:

    [
	'nonExistingFile' asFilename contents
    ] on:Error do:[:exceptionInfo |
	Transcript showCR:(exceptionInfo description).
    ].

The above code should be read as a demonstration example; it catches any error, not only file-not-found exceptions. In practice, more specific handlers are usually set up:

    [
	10 / 'nonExistingFile' asFilename contents size
    ] on:StreamError do:[:exceptionInfo |
	Transcript showCR:(exceptionInfo description).
    ] on:ArithmeticError do:[:exceptionInfo |
	Transcript showCR:('Oops: ',exceptionInfo description).
    ].

(Hint: create a file named 'nonExistingFile' and try it)

In Smalltalk, a handler may decide to repair things, and either restart the computation, or proceed:

    [
	|input divisor|

	input := Dialog request:'Enter a divisor (try 0)'.
	divisor := input asNumber.
	Dialog information:'The result is ',(10 / divisor) asString
    ] on:ArithmeticError do:[:e |
	Dialog information:'Mhmh - I will proceed with 0'.
	e proceedWith:0.
    ].
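
Besides proceeding, another common option is to abandon the protected block altogether and provide a replacement value for the whole on:do: expression. The following sketch uses the return: message of the ANSI exception protocol for this:

    |result|

    result := [ 10 / 0 ] on:ZeroDivide do:[:e |
        e return:-1
    ].
    Transcript showCR:result printString.    "shows -1"

In contrast to proceedWith:, execution does not continue after the failed division here; the value is delivered as the result of the on:do: message.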

it may also decide to not handle the error, and pass it on to either another handler or the default exception handler (called "rejecting the error"), which typically opens a debugger:

    [
	|input divisor|

	input := Dialog request:'Enter a divisor (try 0)'.
	divisor := input asNumber.
	Dialog information:'The result is ',(10 / divisor) asString
    ] on:ArithmeticError do:[:e |
	(Dialog confirm:'Proceed with 0 or debug?') ifTrue:[
	    e proceedWith:0.
	].
	e reject.
    ].

You can also combine exceptions into a so-called "handler set" and handle a bunch of otherwise unrelated exceptions (meaning: "not inheriting from each other") with a common handler:

    [
	|input divisor|

	input := Dialog request:'Enter a divisor (try 0)'.
	divisor := input asNumber.
	Dialog information:'The result is ',(10 / divisor) asString
    ] on:(ZeroDivide, StreamError, ImaginaryResultError) do:[:e |
	(Dialog confirm:'Proceed with 0 or debug?') ifTrue:[
	    e proceedWith:0.
	].
	e reject.
    ].

Ensuring Cleanup Actions

In some situations, a cleanup action is required to be always performed, even if some operation gets aborted (by an exception or a user interrupt). Typical situations are closing a file, closing a window or turning off some device. For this, use an ensure block, which will be evaluated after some other action, even if the action gets aborted (unwound). This is similar to the "finally" construct of Java or the "unwind-protect" of Common Lisp.

    |s|

    [
	s := 'someFile' asFilename writeStream.
	Transcript showCR:'start writing...'.
	s nextPutLine:'hello'.
	"/ now, an error occurs and a debugger is opened
	self error:'please abort (here or in the debugger)'.
	"/ so this line is not executed:
	Transcript showCR:'not reached'.
    ] ensure:[
	"/ but this is:
	Transcript showCR:'cleaning up'.
	s close.
	'someFile' asFilename remove.
    ].
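
The same behavior can be observed without a file or a debugger. In the following sketch (the variable name log is made up for illustration), an inner action is aborted by a division error, which is caught by an outer handler:

    |log|

    log := OrderedCollection new.
    [
        [
            log add:'start'.
            1 / 0.
            log add:'not reached'
        ] ensure:[
            log add:'cleanup'
        ]
    ] on:ZeroDivide do:[:e |
        log add:'handled'
    ].
    Transcript showCR:log printString.

Afterwards, the log contains 'start', 'handled' and 'cleanup', but not 'not reached'. The handler typically runs before the stack is unwound, which is why 'handled' is expected to appear before 'cleanup'.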

There is also a combined handler+ensure method which corresponds to other languages' try-catch-finally statement:

    [
	some action
    ] on:Error do:[
	error handler
    ] ensure:[
	cleanup action
    ]

Higher Order Functions

The term "Higher Order Function" originated in the functional programming area and refers to functions which expect functions as arguments or return them as their values. Higher order functions are natural in programming languages where functions and executable code can be treated like any other object (i.e. where they are so-called "first class citizens").

Smalltalk's blocks are perfectly well suited for this style of programming, because they allow for all of the above. And actually, they are used heavily as arguments in the collection class protocol.

Looping Over Elements of a Collection (enumerating)

All collection classes (Array, Set, Dictionary etc.) provide messages to enumerate their elements and evaluate a given block for each of them. The most useful of those enumeration messages is:

aCollection do: aOneArgBlock

For example, the enumeration of an array's elements is easily done as in:

    |anArray|

    anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).
    anArray do:[:eachElement | Transcript showCR:eachElement ].

of course, you should indent the code to reflect the intended control flow. With C-style indentation, the code looks like this:

    |anArray|

    anArray := #( 'one' 'deux' 'drei' 'quatro' 5 6.0 ).
    anArray do:[:eachElement |
	Transcript showCR:eachElement
    ].
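
The same style works for collections without a numeric index. As a sketch (the names and values are made up for illustration), a Set is enumerated with do:, and a Dictionary with keysAndValuesDo:, which passes both the key and the value to a two-argument block. The enumeration order of such collections is not specified:

    |ages|

    ages := Dictionary new.
    ages at:'peter' put:42.
    ages at:'paul' put:37.

    ages keysAndValuesDo:[:name :age |
        Transcript showCR:(name , ' is ' , age printString)
    ].

    "duplicates are removed by the Set, so 3 lines are shown"
    (Set withAll:#( 1 2 2 3 )) do:[:each |
        Transcript showCR:each printString
    ].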

The Power of Blocks: a Concrete Example

To give you a rough idea of how powerful these blocks are, here is a piece of code to enumerate all files of a directory, split each into words, count them and give a histogram of the top-10 used words:

    |bag mostUsed|

    bag := Bag new.

    '../../doc/online/english/getstart' asFilename directoryContentsAsFilenames
	select:[:eachFile | eachFile isDirectory not]
	thenDo:[:eachFile |
	    eachFile contents do:[:eachLine |
		bag addAll: eachLine asCollectionOfWords.
	    ].
	].

    mostUsed := (bag valuesAndCounts asArray sort:[:a :b | a value > b value ]) first:10.

    CodingExamples_GUI::HistogrammView new
	extent:500@300;
	labels:(mostUsed collect:[:eachPair | eachPair key storeString]);
	values:(mostUsed collect:[:eachPair | eachPair value]);
	open.

Blocks are used as arguments of higher order functions in the following places:

as argument1 to select:thenDo:
as argument2 to select:thenDo:
as argument to do:
as argument to sort:
as argument to collect:
as argument to collect:

That makes 6 uses as higher order function - some even nested. Notice how nicely the block syntax fits the select operation - it looks almost like mathematics. You are welcome to try this without blocks, by using streams or even explicit loops.
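
The counting step of the above example can be tried in isolation. The following sketch uses a constant string instead of files; it relies on the same asCollectionOfWords message as the code above:

    |bag|

    bag := Bag new.
    bag addAll:('the cat and the dog and the bird' asCollectionOfWords).

    Transcript showCR:(bag occurrencesOf:'the') printString.    "shows 3"
    Transcript showCR:(bag occurrencesOf:'and') printString.    "shows 2"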

The Power of Blocks: Another Example

As another demonstration of how powerful blocks are, here is a code fragment to measure the execution times of a function (a block), and to show it as histogram. The operation to be measured counts the number of $x characters in a string. It is repeated a million times to get measurable time durations.

    |function measureData|

    function :=
	    [
		1000000 timesRepeat:[
		    'abcdefxghijklxmn' occurrencesOf:$x
		]
	    ].

    measureData := (1 to:30)
			collect:[:n |
			     Time millisecondsToRun: function.
			].

    CodingExamples_GUI::HistogrammView new
	extent:750@400;
	labels:nil;
	values:measureData;
	open.

Notice again that blocks are used in three places: as the function to be measured itself, as argument to timesRepeat: and as argument to collect:.
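
Because the code to be measured is just an object, the measuring itself can be wrapped into a reusable block. The following is a sketch only; the name timeIt and the label text are made up for illustration:

    |timeIt|

    timeIt := [:label :aBlock |
        |ms|

        ms := Time millisecondsToRun:aBlock.
        Transcript showCR:(label , ': ' , ms printString , ' ms').
        ms
    ].

    timeIt
        value:'summing'
        value:[ (1 to:100000) inject:0 into:[:sum :n | sum + n] ].

The timeIt block returns the measured time as its value, so the result can also be collected for a histogram as shown above.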

Syntactic Sugar

The term "Syntactic Sugar" is used for language constructs that do not offer new functionality (i.e. new semantics), but solely exist to make the life of the programmer easier. Typically these are syntactic variants of other constructs created to be shorter (for faster coding) and/or to be more readable (for better understandability).

Historically, due to its very readable, English-like syntax, Smalltalk does not have lots of syntactic sugar. Everything is expressed as message-sends to objects. This includes class- and method-definition, variable initialization, looping, exception handling etc.

In contrast, most other programming languages provide separate syntactic constructs for each of the above mentioned issues (Lisp being a well-respected exception here). The only syntactic sugar in classic Smalltalk is the additional message syntax for binary selectors (which was added to make mathematical expressions more readable) and the cascade message.

Brace-Array-Constructor

In more recent times, one syntactic construct which originated in the Squeak community has become very popular: the "brace-constructor" to construct arrays at run time.
Its syntax is:

    { expression1 . expression2 .  ... expressionN }

to construct a new Array (at runtime) with N elements, computed by the corresponding expressions. Please notice the separating periods (to separate the expressions).
For example:

    { 'str'.  Date today.  Time now. 1.  #sym }

creates a 5-element array at run time.

Notice that the brace-constructor shows the same behavior as a "with:...with:" message to the Array class, or (for more than a small number of elements) as an "Array new:" followed by a bunch of at:put: messages; in other words: it is equivalent to:

    Array with:expression1 with:expression2 ... with:expressionN

but without the restriction on the max. number of arguments.

Thus, the above is equivalent to:

    (Array new:5)
	at:1 put:'str';
	at:2 put:(Date today);
	at:3 put:(Time now);
	at:4 put:1;
	at:5 put:#sym;
	yourself

If you use this feature, be aware that "#( )" and "{ }" both return an empty array. However, the array returned by "#( )" has been created at compilation time, and the same identical object will be returned, whenever the "#( )"-expression is evaluated again.
In contrast, every evaluation of "{ }" will construct and return a new Array at runtime.
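
The two forms also differ in how their contents are interpreted: a literal array contains literals only, whereas the elements of a brace-constructor are expressions which are evaluated. A small sketch:

    |a b|

    a := #( 1 + 2 ).    "3 literal elements: the number 1, the symbol + and the number 2"
    b := { 1 + 2 }.     "1 element: the computed value 3"

    Transcript showCR:a size printString.     "shows 3"
    Transcript showCR:b size printString.     "shows 1"
    Transcript showCR:b first printString.    "shows 3"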

Note by the author: I personally have one critique of the brace constructor: why should the Array class be so special as to justify a special syntactic sugar construct? Most collections in real life are variable in size, so creating an OrderedCollection could be justified just as well. But then, why exclude Set, Dictionary and all other fancy collections? Why exclude Matrix or Vectors? In addition, those with a functional background would definitely love to have a simple constructor for Lisp-like linked lists or cons-objects.
In other words: the brace constructor seems to be a quick hack for a single programmer's needs (laziness?). It should have been thought through more carefully, towards a more generic solution, before finding its way into thousands of methods.
(an alternative possible syntax could have been: "<ClassName>{ ... }")

Strings with Embedded (Sliced in) Expressions

It is very common, that messages must be constructed from parts which are evaluated dynamically. For example, consider the following:

    Transcript
	showCR:'Today is ',Date today asString,' and the time is ',Time now asString.

this concatenates a longer string from the four parts, of which two are computed dynamically.

This becomes especially ugly, if you have to consider national language variants; and even more so, as not all languages will order the parts of the sentence the same way. For example, in German, you may want to write:

    Transcript
	showCR:'Heute ist der ',Date today asString,' und es ist ',Time now asString,' Uhr'.

For this, ST/X provides a national language translation mechanism, which is based on a getter named 'resources', which is understood by all classes and all application instances. You can give it a string, which it will translate according to the current language setting. Different sentence ordering is supported by passing in the English string, with placeholders for the parts to be filled in:

    Transcript showCR:(
	self class resources
	    string:'Today is %1 and the time is %2'
	    with:(Date today)
	    with:(Time now))

this is very flexible, in that you can add a resource file named "de.rs" and add the translation for:

'Today is %1 and the time is %2' 'Heute ist der %1 und es ist %2 Uhr'

or even change the sentence structure in German to:

'Today is %1 and the time is %2' 'Es ist %2 am %1'

However, as the above code looks rather ugly, ST/X provides syntactic sugar for national language strings; you can also write:

    Transcript showCR: i'Today is {Date today} and the time is {Time now}'

You can also embed newlines and other special characters in a C-language fashion:

    Transcript show: i'Today is {Date today} and the time is {Time now}\n'

The "i"-prefix before the string tells the compiler that this is a string with embedded expressions, which is to be translated via the resources (a so-called "internationalized string" or "i-string" for short).
There is also an untranslated variant (named "expanded string" or "e-string" for short):

    Transcript show: e'Today is {Date today} and the time is {Time now}\n'

Class Library

We have now reached a point where we realize that the key to becoming a Smalltalker lies in the knowledge of the system's class library. Although this is true for all big programming systems, it is even more true for Smalltalk, since even control structures and looping are implemented by message protocol as opposed to being a syntax feature.

No programming is possible if you don't know the protocol of the classes in the system, or at least part of it.
To give you a starting point, we have compiled a list of the most useful messages as implemented by various classes in the "list of useful selectors" document.

A rough overview of the most common classes and their typical use is found in the "Basic Classes Overview". Please, read this document now.

Continue in "Playing with objects".

<cg at exept.de>

Doc $Revision: 1.26 $ $Date: 1999/10/14 13:13:06 $