Smalltalk offers various ways to store objects to and retrieve them from the external world.
Besides the well-known #storeOn: method, binary storage is supported by every object.
Binary storage is both denser (i.e. requires less space) and faster than
textual storage in the normal case. In addition, the format used by the binary storage mechanism
allows recursive and cyclic objects to be handled correctly,
which is not possible with the ascii representation used by the normal #storeOn: mechanism.
The disadvantages are that (1) the binary storage format is proprietary to each Smalltalk dialect, so communication with other dialects is usually not possible "out-of-the-box", and (2) the binary format "knows" the stored object layout, so conversion may be needed if stored objects are loaded after a class has changed its instance layout.
In Smalltalk, all classes support the #storeOn: message, which
asks the object to append a textual (i.e. ascii) representation of itself to
a stream, from which a copy of this object can be reconstructed.
This scheme works for simple objects which do NOT contain self references or cycles.
Also, this format is compatible among different Smalltalk implementations if
the layout of the instances is the same across them (i.e. the instance's class exists with
the same instance layout on the target system).
For example:
    |myObject outStream|
    myObject := Array with:'hello world'
                      with:1.2345
                      with:#(1 2 3 4 5)
                      with:('one' -> #one).
    outStream := 'data' asFilename writeStream.
    myObject storeOn:outStream.
    outStream close.
stores the array (myObject) in the file named "data".
If you inspect this file, you will notice that it contains a single Smalltalk
expression (in textual representation) which, when evaluated, recreates the original array.
From this, the object can be reconstructed
by asking the compiler to evaluate that expression:
    |string myObject|
    string := 'data' asFilename readStream contents asString.
    myObject := Compiler evaluate:string.
    myObject inspect.
The above has been wrapped into an easier-to-use method, which is understood
by any class:
    |string myObject|
    string := 'data' asFilename readStream contents asString.
    myObject := Object readFromString:string.
    myObject inspect.
or, alternatively, reading directly from the stream:
    |inStream myObject|
    inStream := 'data' asFilename readStream.
    myObject := Object readFrom:inStream.
    myObject inspect.
Thus, any object can be stored by sending it #storeOn:
and retrieved by sending #readFrom: to Object or a class.
Note that #storeOn: does not preserve object identity, as the following example shows:
    |original retrieved hello inStream outStream|
    hello := 'hello'.
    original := Array with:hello
                      with:hello.
    outStream := 'data.txt' asFilename writeStream.
    original storeOn:outStream.
    outStream close.
    inStream := 'data.txt' asFilename readStream.
    retrieved := Object readFrom:inStream.
    inStream close.
    Transcript showCR:(original at:1) == (original at:2).    "evaluates to true"
    Transcript showCR:(original at:1) = (original at:2).     "obviously evaluates to true"
    Transcript showCR:(retrieved at:1) == (retrieved at:2).  "evaluates to false"
    Transcript showCR:(retrieved at:1) = (retrieved at:2).   "evaluates to true"
The above limitation makes that mechanism unusable when objects are to be stored which
depend on identity (for example: IdentityDictionaries, IdentitySets and some others).
Self-referencing or cyclic objects cannot be stored this way either.
For example:
    |original retrieved hello inStream outStream|
    original := Array new:3.
    original at:1 put:'hello'.
    original at:2 put:'world'.
    original at:3 put:original.
    outStream := 'data' asFilename writeStream.
    original storeOn:outStream.
    outStream close.
    inStream := 'data' asFilename readStream.
    retrieved := Object readFrom:inStream.
    inStream close.
    Inspector openOn:original title:'original'.
    Inspector openOn:retrieved title:'retrieved'.
The above is of course an unrealistic example; however, objects like trees,
doubly linked lists etc. are typical examples of self-referencing objects.
You will notice a message on the Transcript when the object is stored,
and find that the self reference has been lost in the reconstructed object
when inspecting it
(i.e. in Smalltalk/X, cycles are detected in the store method,
and a warning is sent to the Transcript.
Notice that not all Smalltalk implementations allow the above expression
to be evaluated - some will be caught by the endless-recursion trap and/or crash).
With "storeOn:", as a programmer, you have two choices: either use the default implementation
(as inherited from "Object"), or redefine this method in your class.
Therefore, you would have to redefine the "storeOn:" method, to create a
storeString which does not depend on the instance variable offsets.
This can be done by generating a storeString which invokes setter methods
(instead of the "instVarAt:put:" expressions generated by the default implementation).
This would allow future class versions to retrieve previously stored objects.
But never forget to update the "storeOn:" method
whenever any new instance variables are added to or removed from the class
(when an instvar is removed, leave empty "do-nothing" setters in the class;
those might still get called when reading old objects).
In summary: there is a potential for future trouble. However, by generating a storeString consisting of setters only, you have total freedom in how the object may be represented in the future.
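As a minimal sketch of such a setter-based store method, assume a hypothetical class Person with the instance variables firstName and lastName and the setters firstName: and lastName: (these names are made up for illustration and are not part of the system):

```smalltalk
storeOn:aStream
    "append an expression to aStream which recreates the receiver
     using setter messages only (no instVarAt:put: expressions).
     Person, firstName: and lastName: are hypothetical names."

    aStream nextPutAll:'(Person new'.
    aStream nextPutAll:' firstName:'.
    firstName storeOn:aStream.
    aStream nextPutAll:'; lastName:'.
    lastName storeOn:aStream.
    aStream nextPutAll:'; yourself)'.
```

The generated text has the form (Person new firstName:'Joe'; lastName:'Sampleman'; yourself). Because it names the setters rather than instance variable offsets, it should remain readable after the instance variables have been reordered, as long as the setters are kept.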
The literalArrayEncoding is also very useful to store objects and descriptions in the program itself - that's what the windowSpec methods are actually for - they simply return an array which describes the original (spec-) object.
This store format is independent of the object's instance variable order.
When an object is later retrieved,
the setter methods are invoked for the values that were present at store time.
This means that even classes which are changed in the future can provide a backward compatibility protocol.
Using a literalArrayEncoding, you can store your objects with:
    anObject literalArrayEncoding storeOn:aStream
or, you may want to provide a storeLiteralOn: method in your class(es), as:
    storeLiteralOn:aStream
        self literalArrayEncoding storeOn:aStream
Perfectionists may even use the already present pretty-printer from the
GUI framework, to create a nice, indented storeFormat:
    storeLiteralPrettyOn:aStream
        UISpecification
            prettyPrintSpecArray:self literalArrayEncoding
            on:aStream
            indent:0
and:
    storeLiteralPrettyOnFile:aFilename
        |s|
        s := aFilename asFilename writeStream.
        self storeLiteralPrettyOn:s.
        s close.
Retrieval is by:
    fromLiteralFile:aFilename
        |s arr|
        s := aFilename asFilename readStream.
        arr := Array readFrom:s.
        s close.
        ^ arr decodeAsLiteralArray
This format does solve some of the storeOn: problem, but still cannot handle recursive or self referencing object structures. Also, it does not preserve object identity. However, it may be suitable for many simple applications.
Sample code is found in the file: 'doc/coding/StoringObjectsAscii-example.st'. It contains a class named "User" which stores and retrieves instances using literalArray encoding. File this in using the FileBrowser and explore it in the SystemBrowser.
The JSON support classes are provided in a separate package, so you may have to
load it first with:
    Smalltalk loadPackage:'stx:goodies/json'
To write objects, use a JSONPrinter; to read, a JSONReader:
    |o1 s o2|
    o1 := Dictionary withKeysAndValues:#('one' 1 'two' 2 'three' 3.0 'four' 'vier').
    s := JSONReader toJSON:o1.
    o2 := JSONReader fromJSON:s
For more information, refer to their class documentation and examples found there.
Be aware that the set of supported objects which can be stored/retrieved is very limited:
basically, they must be Numbers, Strings, Booleans, Arrays and Dictionaries thereof.
If required, convert the objects into some Dictionary format, and store/retrieve those. Data can only be stored by value - no references (and definitely no recursive references) can be stored or retrieved.
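Such a conversion might look like the following sketch, which uses a Point as the object to be stored. It relies only on the toJSON: and fromJSON: messages shown in this document, and it assumes (this is not confirmed here) that fromJSON: answers a Dictionary with String keys:

```smalltalk
|point record json restored copy|

point := 3 @ 4.

"convert the object into a Dictionary of plain values"
record := Dictionary new.
record at:'x' put:point x.
record at:'y' put:point y.
json := JSONReader toJSON:record.

"rebuild an equal (but not identical) Point from the retrieved Dictionary;
 assumes that String keys are preserved by the reader"
restored := JSONReader fromJSON:json.
copy := (restored at:'x') @ (restored at:'y').
copy inspect
```

The rebuilt point is a new object; as stated above, only values survive the round trip, never references.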
In contrast to the #storeOn: format described above,
this format is not meant to be human readable.
Also, since it uses all 8 bits of a byte, it may not be possible
to send binary encoded objects directly via some ancient transport mechanisms
(e.g. old electronic mail transports which only support 7-bit ascii,
when not using uuencode). This limitation is probably no longer relevant these days.
Binary storage has the disadvantage that it is not compatible between different Smalltalk implementations. Although all Smalltalk dialects do support some form of binary object storage with similar functionality, no common encoding standard exists.
It is used in pretty much the same way as above; simply replace
#storeOn: by #storeBinaryOn: and #readFrom: by #readBinaryFrom: :
    |original retrieved hello inStream outStream|
    hello := 'hello'.
    original := Array with:hello
                      with:hello.
    outStream := 'data.bos' asFilename writeStream binary.
    original storeBinaryOn:outStream.
    outStream close.
    inStream := 'data.bos' asFilename readStream binary.
    retrieved := Object readBinaryFrom:inStream.
    inStream close.
    Transcript showCR:(original at:1) == (original at:2).    "evaluates to true"
    Transcript showCR:(retrieved at:1) == (retrieved at:2).  "evaluates to true"
The above can be used on any stream which supports reading/writing of bytes
(i.e. a WriteStream on a ByteArray, FileStreams, Sockets, Pipes etc.).
The binary storage mechanism handles cyclic or self-referencing structures, preserving object identity. It does so by assigning unique object IDs (i.e. integers) to stored objects. It keeps track of previously assigned IDs, and writes only the ID when an object is to be stored which was already stored before. In addition to preserving object identity, this also creates a more compact output, as each individual object's contents are only stored once. (The process of converting an arbitrary graph of objects into a flat sequence is also referred to as flattening or marshalling.)
At retrieval time, the reverse is done, keeping track of objectIDs as objects are restored and reconstructing the original references from the ID.
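The ID bookkeeping can be pictured with the following sketch. It is not the actual implementation of the binary storage mechanism; it merely uses an IdentityDictionary to decide whether an object is seen for the first time (a definition is written) or again (only a reference is written):

```smalltalk
|ids nextID hello objects|

ids := IdentityDictionary new.    "maps object -> assigned ID"
nextID := 0.
hello := 'hello'.
objects := Array with:hello with:hello with:'world'.

objects do:[:each |
    |id|
    id := ids at:each ifAbsent:[nil].
    id isNil ifTrue:[
        "first encounter: assign a new ID and write the contents"
        nextID := nextID + 1.
        ids at:each put:nextID.
        Transcript showCR:'definition ID: ' , nextID printString
                          , ' contents: ' , each printString.
    ] ifFalse:[
        "seen before: write the ID only"
        Transcript showCR:'reference  ID: ' , id printString.
    ].
].
```

The second occurrence of the hello string is reported as a reference to the ID assigned at its first occurrence, which is what allows identity to be restored on reading.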
Example (storing the above self-referencing object):
    |original retrieved hello inStream outStream|
    original := Array new:3.
    original at:1 put:'hello'.
    original at:2 put:'world'.
    original at:3 put:original.
    outStream := 'data.bos' asFilename writeStream binary.
    original storeBinaryOn:outStream.
    outStream close.
    inStream := 'data.bos' asFilename readStream binary.
    retrieved := Object readBinaryFrom:inStream.
    inStream close.
    Inspector openOn:original title:'original'.
    Inspector openOn:retrieved title:'retrieved'.
Looking into the retrieved object in the inspector, you will find that the
original self reference was correctly reconstructed.
The class PersistencyManager implements a dictionary-like
protocol and allows storage and retrieval of objects by a key.
The low-level mechanism used by PersistencyManager is based upon
the "db-1.6" Berkeley database library, which is a successor of
the well-known "dbm/ndbm" library.
Using PersistencyManager, objects can be stored with:
    ...
    manager := PersistencyManager file:'<somefileName>'.
    ...
    manager at:<someKey> put:<someObject>.
    ...
    manager close
and retrieved with:
    ...
    manager := PersistencyManager file:'<somefileName>'.
    ...
    <someObject> := manager at:<someKey>.
    ...
    manager close
The #at: / #at:put: interface is especially convenient,
as you can test your application using in-memory dictionaries first,
and switch to an external database later.
Like with ordinary Dictionaries, any object is allowed as key.
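A small sketch of that development style: the application code only sends #at: and #at:put: to a variable (called store here, a name chosen for illustration), so that only the line creating the store has to change later.

```smalltalk
|store|

"during development: a plain in-memory Dictionary"
store := Dictionary new.

"later, only the line above is replaced, for example by:
    store := PersistencyManager file:'sampleData'.
 (the file name is the one used in the examples of this document)"

store at:123456 put:'Joe Sampleman'.
Transcript showCR:(store at:123456).
```

When the store is a PersistencyManager, it additionally has to be closed or released at the end, as in the other examples.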
Example (storing):
    |manager record|
    manager := PersistencyManager file:'sampleData'.
    record := IdentityDictionary new.
    record at:#firstName put:'Joe'.
    record at:#lastName put:'Sampleman'.
    record at:#age put:35.
    record at:#salary put:75000.
    record at:#personalID put:123456.
    manager at:(record at:#personalID) put:record.
    record := IdentityDictionary new.
    record at:#firstName put:'Boris'.
    record at:#lastName put:'Jelzin'.
    record at:#age put:99.
    record at:#salary put:175000.
    record at:#personalID put:34561.
    manager at:(record at:#personalID) put:record.
    record := IdentityDictionary new.
    record at:#firstName put:'Tony'.
    record at:#lastName put:'Friedman'.
    record at:#age put:25.
    record at:#salary put:35000.
    record at:#personalID put:78905.
    manager at:(record at:#personalID) put:record.
    manager release.
(in a real-world application, you would create a PersonRecord class,
and store its instances - instead of dictionaries).
Example (retrieving):
    |manager record|
    manager := PersistencyManager file:'sampleData'.
    record := manager at:78905.
    manager release.
    record inspect
PersistencyManager does not provide the functionality of
a real database - it is just a goody thrown in, for simple applications.
Don't blame us for this - after all, this is a free goody.
First, build the "lastName to personID" mapping (ignoring duplicates, for simplicity):
    |manager nameToIDMapping|
    manager := PersistencyManager file:'sampleData'.
    nameToIDMapping := Dictionary new.
    manager do:[:record |
        "/ ignore non-person objects
        (record includesKey:#personalID) ifTrue:[
            nameToIDMapping
                at:(record at:#lastName)
                put:(record at:#personalID)
        ]
    ].
    "/ store the mapping under a special key
    manager at:#nameToIDMapping put:nameToIDMapping.
    manager release.
Then, retrieve the nameToIDMapping first, and use this to fetch records by name:
    |manager nameToIDMapping record|
    manager := PersistencyManager file:'sampleData'.
    nameToIDMapping := manager at:#nameToIDMapping.
    record := manager at:(nameToIDMapping at:'Friedman' ifAbsent:[nil]).
    manager release.
    record inspect
You may skip this section - and use binary storage while ignoring these internals.
A binary object stream consists of a sequence of typeID bytes
and objectID bytes.
The typeID specifies how following bytes are to be interpreted.
Basically, there are four major typeIDs:
As an example, the binary representation of:
    s := 'hello'.
    Array
        with:1
        with:s
        with:s.
looks like:
    classDefinition
        ID: 1
        name: 'String'
        ...
    objectDefinition
        ID: 2
        classID: 1
        contents: 'hello'
    classDefinition
        ID: 3
        name: 'Array'
        ...
    objectDefinition
        ID: 4
        classID: 3
        contents:
            specialObject(SmallInteger) 1
            objectReference ID: 2
            objectReference ID: 2
(the above is a conceptual picture - the real encoding is somewhat different)
The interesting thing is that classes are stored by name, not by contents. This is done to limit the amount of stored data.
If this was not done, and the class structure were treated like any other object instead, a binary store would trace & dump all classes along the object's superclass chain; thereby dumping class variables, metaclasses and in most cases traversing the full set of existing objects. (Because it may encounter the list of global variables in the Smalltalk object - from which almost every other object can be reached.) Obviously, this is not a behavior we want (it is cheaper to save a snapshot image to get this ;-).
Since classes are stored by name, a corresponding class must be available at reconstruction time (see below on how the system behaves if that is not the case). To catch the case of changed class layouts, additional information (a so called signature) is written with the name in a classDefinition block. This signature is checked against the existing class's signature at reload time and an exception is raised if they do not match.
Also, every binaryStore and binaryRead
operation starts with a new, empty association table and saves
class definitions again
(assuming that the objects stored in individual #storeBinaryOn:
operations are to be reconstructed using individual #readBinaryFrom: operations later).
Therefore, there is a big difference in the time/space requirements
of the following two examples:
    |array element outStream|
    element := 1@1.
    array := Array new:1000 withAll:element.
    outStream := 'data1.bos' asFilename writeStream binary.
    array storeBinaryOn:outStream.
    outStream close.
and:
    |array element outStream|
    element := 1@1.
    array := Array new:1000 withAll:element.
    outStream := 'data2.bos' asFilename writeStream binary.
    array do:[:el |
        el storeBinaryOn:outStream.
    ].
    outStream close.
The first stores the definition of the Point class only once, reusing it
for every stored point.
The second stores this class definition once for each individual point.
Looking at the size of the created file shows this. The first requires 1.9Kb,
while the second requires 24.4Kb. Also, the times required to store/load the data
are quite different: 130ms vs. 2800ms (stored via NFS to a disk on a remote machine.
Your actual numbers will be different, but the ratio should be alike).
The second example has the advantage that individual elements can be read from the file (if you remember the file positions). In contrast, the first example's data can only be reconstructed as a whole array.
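Remembering the file positions could be done roughly as follows. This is only a sketch: it assumes that the file streams answer their current position to #position and can be repositioned with #position:, which is the usual positionable-stream protocol but is not shown elsewhere in this document; the file name 'data4.bos' is made up.

```smalltalk
|points positions outStream inStream third|

points := Array with:1@1 with:2@2 with:3@3.
positions := OrderedCollection new.

"store each element individually, remembering where it starts"
outStream := 'data4.bos' asFilename writeStream binary.
points do:[:el |
    positions add:outStream position.
    el storeBinaryOn:outStream.
].
outStream close.

"read back only the third element"
inStream := 'data4.bos' asFilename readStream binary.
inStream position:(positions at:3).
third := Object readBinaryFrom:inStream.
inStream close.
third inspect
```

This works because each individual #storeBinaryOn: writes a self-contained chunk, including its class definition, which is also the reason for the larger file.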
In some cases, you may want to avoid the above overhead,
AND store data while reusing information about previously stored classes/objects.
This makes sense, if:
storing:
    |array element outStream manager|
    element := 1@1.
    array := Array new:1000 withAll:element.
    manager := BinaryOutputManager new.
    outStream := 'data3.bos' asFilename writeStream binary.
    array do:[:el |
        el storeBinaryOn:outStream manager:manager.
    ].
    outStream close.
    manager release.
loading:
    |array element inStream manager|
    array := Array new:1000.
    inStream := 'data3.bos' asFilename readStream binary.
    manager := BinaryInputManager on:inStream.
    1 to:array size do:[:index |
        array at:index put:(manager nextObject).
    ].
    inStream close.
    array inspect
As a concrete example, consider the case where you have a tree of person
objects, consisting of firstName, lastName and whatever,
but you only want to binaryStore the firstName values of each node:
    |tree outStream manager|
    ...
    manager := BinaryOutputManager new.
    outStream := 'namedata.bos' asFilename writeStream binary.
    tree inOrderDo:[:aNode |
        |name|
        name := aNode firstName.
        name storeBinaryOn:outStream manager:manager.
    ].
    outStream close.
    manager release.
and reconstruct the tree with the names only:
    |tree name inStream manager|
    ...
    tree := NameTree new.
    ...
    inStream := 'namedata.bos' asFilename readStream binary.
    manager := BinaryInputManager on:inStream.
    [inStream atEnd] whileFalse:[
        name := manager nextObject.
        tree insertNode:(PersonNode for:name).
    ].
    inStream close.
    ...
Using a little trick, it is also possible to extract individual objects
from this data stream;
to do this, you have to read and skip all objects before the one to
be reconstructed
(to let the manager build up its ID information table).
The following stores 1000 points:
    |array element outStream manager|
    array := ((1 to:1000) collect:[:i | i @ i]) asArray.
    manager := BinaryOutputManager new.
    outStream := 'data3.bos' asFilename writeStream binary.
    array do:[:el |
        el storeBinaryOn:outStream manager:manager.
    ].
    outStream close.
    manager release.
and reads the 400th point:
    |element inStream manager|
    inStream := 'data3.bos' asFilename readStream binary.
    manager := BinaryInputManager on:inStream.
    399 timesRepeat:[manager nextObject].
    element := manager nextObject.
    inStream close.
    element inspect
The input manager offers a (slightly faster) skipObject method for skipping:
    |element inStream manager|
    inStream := 'data3.bos' asFilename readStream binary.
    manager := BinaryInputManager on:inStream.
    399 timesRepeat:[manager skipObject].
    element := manager nextObject.
    inStream close.
    element inspect
Since all class and object definitions still have to
be processed, do not expect skipObject to be dramatically faster than nextObject.
Notice: this is also true with textual storage for most classes, since the default storeOn:
as defined in the Object class stores a description which reconstructs the object based on instVarAt:put:.
Of course, this also reconstructs a wrong object if the relative offsets of instance variables have changed. (If you want to take precautions against this, reimplement the storeOn:
method in your classes, to not create instVarAt:put: expressions, but write expressions sending instance variable access messages instead.)
To avoid this, some classes redefine storeOn:
and create an expression based on an instance variable's name.
Smalltalk/X offers an error handling mechanism to catch situations when an object is restored for which no valid class exists. As usual, the error is signalled using the exception mechanism, by raising some signal (see ``Exception handling and signals'').
    BinaryIOManager binaryLoadErrorSignal
    BinaryIOManager invalidClassSignal
    BinaryIOManager nonExistingClassSignal
    BinaryIOManager changedInstLayoutSignal
    BinaryIOManager changedInstSizeSignal
    BinaryIOManager changedIndexedInstSignal
It is possible to handle these signals and either:
- proceed without a return value (i.e. with nil):
  in this case, a placeholder class is created (as a subclass of ObsoleteObject)
  and the loaded object is made an instance of it.
  Instances of ObsoleteObject (and therefore the retrieved object
  as well) will trap on all messages into a messageNotUnderstood exception.
  This allows for your obsolete objects to be loaded (at least) and
  be fixed later, either manually, by walking over all derived instances
  of ObsoleteObject and fixing things in an inspector, or by a converter procedure.
  In other words: the contents of those objects are available
  - without the semantics.
- proceed with a class as return value:
  the loaded object is made an instance of the returned class
  (instead of a subclass of ObsoleteObject).
- return:, abort or terminate:
  the binary load is aborted.
- restart:
  the handler may first repair the situation (for example, by loading the missing class with #fileIn:)
  and restart the whole binary load operation.
The handler gets a subclass of ObsoleteObject as parameter;
this allows for the handler to decide for every detected class individually
how things are to be handled. That class is named after the original class's
name, and has all required meta information at hand; especially,
instance size and names of instance variables may be of interest.
After a proceed, the handler will not be called again for the same class; any further retrieved objects of the same class will be silently made instances of the same class (either as obsolete, or whatever the handler returned in the first place).
Examples:
Abort the binary load on any error:
    |inStream data|
    ...
    inStream := .... asFilename readStream binary.
    ...
    BinaryIOManager binaryIOError handle:[:ex |
        "
         other error (such as corrupted file etc.)
        "
        Transcript showCR:'some other error occurred in binary load'.
        Transcript showCR:'abort the load ...'.
        ex return.
    ] do:[
        BinaryIOManager invalidClassSignal handle:[:ex |
            |oldClass|
            oldClass := ex parameter.
            Transcript showCR:'cannot restore instance of ' , oldClass name.
            Transcript showCR:'reason: ' , ex signal notifierString.
            Transcript showCR:'abort the load ...'.
            ex return.
        ] do:[
            data := Object readBinaryFrom:inStream.
        ]
    ].
    ...
    inStream close.
    ...
In the above, the binary read will be aborted, and nil will be left in data.
Ignoring the error to return an obsoleteObject:
    |inStream data|
    ...
    inStream := .... asFilename readStream binary.
    ...
    BinaryIOManager binaryIOError handle:[:ex |
        ...
    ] do:[
        BinaryIOManager invalidClassSignal handle:[:ex |
            |oldClass|
            oldClass := ex parameter.
            Transcript showCR:'cannot restore instance of ' , oldClass name.
            Transcript showCR:'reason: ' , ex signal notifierString.
            Transcript showCR:'continue with obsolete object...'.
            ex proceed.
        ] do:[
            data := Object readBinaryFrom:inStream.
        ]
    ].
    ...
    inStream close.
    ...
In the above, data may contain an instance of a subclass of ObsoleteObject.
This object will not be usable, since it traps on most messages into a
messageNotUnderstood exception.
However, it will contain the original values, so manual or programmatic
conversion is possible.
(A concrete application could provide some kind of database conversion
procedure to convert all obsoleteObjects into something useful.)
Return a replacement class and retrieve these objects as instances of that:
    |inStream data|
    ...
    inStream := .... asFilename readStream binary.
    ...
    BinaryIOManager binaryIOError handle:[:ex |
        ...
    ] do:[
        BinaryIOManager invalidClassSignal handle:[:ex |
            |oldClass|
            oldClass := ex parameter.
            Transcript showCR:'cannot restore instance of ' , oldClass name.
            Transcript showCR:'reason: ' , ex signal notifierString.
            Transcript showCR:'return as instance of another class ...'.
            ex proceedWith:ReplacementClass.
        ] do:[
            data := Object readBinaryFrom:inStream.
        ]
    ].
    ...
    inStream close.
    ...
See example code in "doc/coding/BOSS-errors".
To do so, the binaryLoader will raise the requestConversion
exception, passing the existing class and the obsolete object as
arguments to the exception handler.
The handler should somehow try to convert the obsolete object
and proceed with the new object as value.
This conversion signal is only raised by the binary loader if an exception handler is present; therefore, not handling (or ignoring) the conversionSignal results in obsoleteObjects being returned from the binary load (as described above).
Also, since any invalidClass exceptions are raised before
any conversion is tried, these must be handled as described above.
The reason is that during binaryStore/binaryRead, classes are written/encountered first,
before any instances. Therefore, all class related exceptions will occur first;
but only once per class, since classes (like any other object) are only stored once.
Conversion requests are signalled for each individual obsolete object being loaded (in contrast to the above invalidClass signals, which are only signalled once per class).
The existing (new) class can provide a conversion method (#cloneFrom:),
which should create and return
a new instance of itself based on a template object.
Here, the template object is the obsolete object as retrieved from the binary load.
A default #cloneFrom: method is provided, which creates an
object with all named and indexed instance variables preserved.
However, for special needs, your class may redefine this method and do whatever is
required for conversion (or even decide to return nil ...).
For more details, see example code in "doc/coding/BOSS-errors".
#representBinaryOn: and #readBinaryContentsFromData:manager:.
#representBinaryOn:
#readBinaryContentsFromData:manager: is supposed to set the instance variables from an array (as previously returned by #representBinaryOn:).
"doc/coding/BOSS-special"
Since views require a process for proper operation (the windowgroup process), this limitation results in the inability to store and retrieve views.
Copyright © 1995 Claus Gittinger, all rights reserved
<cg at exept.de>