CParser & CTypes

Introduction
Area of use
Overview
Parsing C-Types
Meta knowledge of C-Types
Allocating C-Data
Manipulating C-Data
Pointers into C-Data
ByteOrder issues
Examples

Introduction

The CParser and CType class hierarchies provide a framework for reading C-language header files containing type declarations and #define directives.

While parsing, the CParser generates the corresponding type information into a hierarchy of CType objects, which can then be used to create and manipulate byte-oriented data blocks.

Licensing

The CParser/CType package is not included in the standard distribution; it is delivered as an extra (non-free) add-on package.
Please contact eXept for license information & pricing.

Area of use

The CParser & CType framework is especially useful for interfacing with C-language data or programs, either via shared data files or via some communication mechanism (such as pipes or sockets).

In contrast to ST/X's inline C-code features, CParser and CType are implemented completely in Smalltalk and are therefore easier to use and less error-prone.
However, they may be slower than corresponding hardcoded inline C routines, since a lot of meta information is kept in the CType hierarchy.

Overview

Use of the framework consists of three major parts:

parsing the C-Types
generating structured C-Data (CDatums)
accessing C-Data structures

Parsing C-Types

As a first step, a C-header file (or a string containing the C-language source) must be given to CParser and parsed.
The resulting collection of C-types should be kept by the application (typically in a class variable).
A good place to perform this task is a class's #initialize method.

Example:

    ...
    classVariableNames:'CTypes'
    ...


    initialize
	"parse C-Types from the file cDefs.h,
	 which contains C-Language types and #defines"

	|parser|

	CTypes isNil ifTrue:[
	    parser := CParser new.

	    parser parse:('cDefs.h' asFilename readStream).
	    "/ fetch types ...
	    CTypes := parser types.
	].
	...

Now, the class variable CTypes refers to a dictionary of C-types, where the keys are the type names. Thus, if the C-header file contained the definition

    struct myStruct {
	int foo;
	float bar;
    };

a corresponding entry will be found in the dictionary under the key myStruct,
i.e.

    ...
    myStructType := CTypes at:'myStruct'.
    ...

Besides reading types, the CParser also keeps track of #define definitions. Defines can be retrieved from the parser via the #defines message.
Note that defines are typeless, i.e. the CParser treats and returns all defines as strings. However, some protocol exists to extract a #define's integer value (which might be required for bit constants or array dimensions).
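
To make the string nature of defines concrete, here is a minimal C sketch of what turning a define's replacement text into an integer involves. The helper name define_as_long is made up for this illustration; it is not CParser's protocol (look that up in the browser), and it only covers plain integer literals, not expressions such as (1<<3).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not part of CParser): interpret the replacement
   text of a #define as an integer.
   Returns 1 and stores the value on success,
   0 if the text is not a plain integer literal. */
int define_as_long(const char *text, long *out)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtol(text, &end, 0);      /* base 0: accepts 10, 0x10 and 010 */
    if (end == text || errno == ERANGE)
        return 0;                       /* no digits at all, or out of range */

    while (*end == ' ' || *end == '\t') /* tolerate trailing blanks */
        end++;
    if (*end != '\0')
        return 0;                       /* trailing garbage, e.g. an expression */

    *out = value;
    return 1;
}
```

A define such as NUM_CHARS with the text 10 converts cleanly, whereas a string define or an expression is rejected and has to be handled by the caller.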

#include and #if directives are not handled by the CParser. If they are needed, run the header file through cpp (the C preprocessor) first and give the filtered output to CParser.

Meta knowledge of C-Types

CTypes keep all information as collected by the CParser; therefore, it is possible to query the type for various aspects. Of special interest are:

aCType isCArray
true if type is a C-array of something
aCType isCCompound
true if type is a C-struct or C-union
aCType isCEnum
true if type is a C-enumeration type
aCType isCNumber
true if type is a C-int, float or double
aCType isCStruct
true if type is a C-struct
aCType isCUnion
true if type is a C-union
aCType isIndexed
same as isCArray
aCType memberNames
a collection of field names for structs or unions
aCType sizeof
size in bytes of a corresponding datum
aCType dimension
dimension of a C-Array type
aCType elementType
element type of a C-Array type

(Note that there are many other query methods; see the CType implementation in the browser for a complete list.)
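
For orientation, the quantities reported by sizeof, dimension and the member queries have direct counterparts on the C side. The following is a minimal C sketch using the myStruct type from the parsing section; the helper names and the IntTen array type are made up for illustration. The actual numbers depend on the C compiler and platform, and whether a CType arrives at the same padding as a particular compiler is an assumption that should be checked for the target system.

```c
#include <assert.h>
#include <stddef.h>

/* the example type from the parsing section */
struct myStruct {
    int   foo;
    float bar;
};

/* hypothetical array type, used to illustrate "dimension" */
typedef int IntTen[10];

/* size in bytes of one struct datum, including any padding */
size_t myStruct_size(void)
{
    return sizeof(struct myStruct);
}

/* byte offset of the field bar within the struct */
size_t myStruct_bar_offset(void)
{
    return offsetof(struct myStruct, bar);
}

/* number of elements of the array type */
size_t IntTen_dimension(void)
{
    assert(sizeof(IntTen) % sizeof(int) == 0);
    return sizeof(IntTen) / sizeof(int);
}
```

Comparing such compiler-reported values against the results of the corresponding CType queries is a cheap sanity check before exchanging data with a C program.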

Allocating C-Data

Given a CType, you can allocate a corresponding CDatum by sending one of the following messages to the CType:

aCType new
allocate a new CDatum in Smalltalk memory
The CDatum will use a garbage collected ByteArray as storage.
The data area may be moved around by the compressing garbage collector and is automatically reclaimed when the CDatum is no longer referenced.
Therefore, such a CDatum should never be passed to C functions which keep a reference to the data.
aCType new:dim
like above, for C-arrays.
aCType malloc
allocate a new CDatum in external memory
The CDatum will use a malloc'd memory block as storage. This memory will not be freed automatically (i.e. it must be freed by the program).
aCType malloc:dim
like above, for C-arrays.
aCType gcMalloc
allocate a new CDatum in external memory
The CDatum will use a malloc'd memory block as storage. This memory will be freed automatically whenever the CDatum is no longer referenced.
aCType gcMalloc:dim
like above, for C-arrays.
aCType onBytes:bytes
wrap a type over a buffer
The CDatum is mapped over the given byte-accessible storage, which is usually either a ByteArray or an instance of ExternalBytes.

Use onBytes: if the data has either been allocated elsewhere (for example, in a C-primitive function or library routine) or has been read from a file or communication channel, for example when reading data from a database or via a socket.

Use new / new: for all data which is not given to C code directly (i.e. for message/data buffers which are written to a file or sent to another program via a pipe or socket).

Use gcMalloc / gcMalloc: for data which is passed to either inline C-code or to a C-library function, when it is known that the C-code does not keep a reference to the datum internally. This memory will be freed automatically when Smalltalk has no more references to it.

Use malloc / malloc: for data which is passed to either inline C-code or to a C-library function, when the C-code is known to keep references internally or when it is unknown whether it does.
Be very careful to avoid memory leaks, since the storage must be freed manually (via the #free message) by the programmer.
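
The difference between C code that keeps a reference and C code that does not can be sketched in C as follows. The three functions are a made-up stand-in for a library (none of them exists in ST/X or in any real library); they only illustrate the two kinds of behaviour that decide between gcMalloc and malloc.

```c
#include <assert.h>
#include <stddef.h>

/* reference retained by the hypothetical library */
static const unsigned char *kept_buffer = NULL;
static size_t kept_size = 0;

/* Uses the buffer only during the call and keeps no reference:
   memory that is reclaimed automatically later is fine here. */
unsigned lib_checksum(const unsigned char *buf, size_t n)
{
    unsigned sum = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Keeps the pointer beyond the call: the memory must neither move
   nor be freed until the library is done with it. */
void lib_register(const unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    kept_buffer = buf;
    kept_size = n;
}

/* A later call that reads through the retained pointer. */
unsigned lib_checksum_registered(void)
{
    assert(kept_buffer != NULL);
    return lib_checksum(kept_buffer, kept_size);
}
```

A function in the style of lib_checksum is a candidate for gcMalloc; one in the style of lib_register calls for malloc, with #free sent only after the library has dropped its reference.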

Manipulating C-Data

CDatum objects respond to the same query protocol as described above, plus the additional protocol:

aCDatum type
return the CDatum's type.
aCDatum at:index
get an array element.
aCDatum at:index put:val
set an array element.
aCDatum memberAt:fieldName
get a struct/union field
aCDatum memberAt:fieldName put:val
set a struct/union field.

CDatums provide protocol to access indexed elements via #at: / #at:put: and field members via #memberAt: / #memberAt:put: messages.

If the elementType (for arrays) or fieldType (for struct/union) is a scalar type (i.e. char, int, float or double), the get methods return Smalltalk integers or floats, and the set methods accept Smalltalk numeric objects as value.

For a non-scalar element type, the get methods return another CDatum (i.e. a copy) and the set methods expect a CDatum.
For convenience, some Smalltalk collections are also accepted by the set methods:

structures - a dictionary containing fieldName - value associations is accepted; the corresponding fields of the struct/union are set.
arrays - a sequenceable collection is accepted; the corresponding indexed elements are set (but note that Smalltalk indexing starts at 1, i.e. it is offset by one compared to C)

In addition, the doesNotUnderstand: method is redefined to allow member access in the typical Smalltalk fashion (i.e. get/set protocol).
Thus, field members can also be accessed via:

aCDatum fieldName
get a struct/union field's value.
aCDatum fieldName:val
set a struct/union field's value.

For example, the myStruct data structure from above can be allocated and manipulated as:

    ...
    myStructType := CTypes at:'myStruct'.
    myDatum := myStructType new.
    myDatum memberAt:'foo' put:15.
    myDatum memberAt:'bar' put:3.14159.
    ...

or:

    ...
    myDatum foo:15.
    myDatum bar:3.14159.
    ...

Pointers into C-Data

Sometimes, it is useful to create pointers into a CDatum - for example, to use a common helper method which manipulates a substructure, or to process a substructure without the need to copy the underlying storage.

Remember that the #memberAt: message extracts a field, which results in an expensive copy if the field is a structure, union or array.

You can create a CPointer (which points into another CDatum) with:

aCDatum refMemberAt:fieldName
get a pointer to a struct/union field

ByteOrder issues

Often, when data is passed between machines, the byteOrder differs between the CPU architectures.
To provide a convenient solution for this problem, CDatums keep the byteOrder of their data and allow it to be queried or changed.
By default, CDatums assume that the byteOrder is that of the underlying CPU (i.e. LSB for intel/alpha, MSB for hp/sparc).

At any time, a CDatum's byteOrder can be queried or changed via the following messages:

aCDatum msb
return the CDatum's byteOrder
aCDatum msb:aBoolean
change the CDatum's byteOrder

Thus, when some data has been retrieved via a socket or pipe, and the data is known to be bigEndian (i.e. MSB-first), simply send the CDatum the message:

    cDatum msb:true

All followup accesses will assume bigEndian data.

You should always set the byteOrder when communicating with external processes/machines (i.e. do not depend upon the default, because it is not the same on all ST/X implementations).

Examples

The following code fragment can be used to send and receive C-structured data blocks to/from a C-program via a Socket. Data is transferred in msb-first (i.e. network) byteOrder.

receiver:

    |buffer socket datum foo bar|

    ...
    buffer := ByteArray new:1024.
    ...
    socket readWait.
    socket nextAvailableInto:buffer.
    ...
    datum := myStructType onBytes:buffer.
    datum msb:true.
    ...
    foo := datum foo.
    bar := datum bar.
    ...

sender:

    |buffer socket datum foo bar|

    ...
    buffer := ByteArray new:1024.
    ...
    datum := myStructType onBytes:buffer.
    datum msb:true.
    ...
    datum foo:123.
    datum bar:1.2345.
    ...
    socket nextPutBytes:datum sizeof from:datum data.
    ...
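
For the C program at the other end of the socket, a possible counterpart is sketched here. It assumes that int and float are both 4 bytes wide, that float uses the IEEE 754 single format, and that the message consists of exactly the two fields of myStruct, MSB-first and without padding (8 bytes). These are assumptions about the layout, not statements about what CDatum produces on every platform. The function names are made up, and the socket I/O itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* the example type from the parsing section */
struct myStruct {
    int   foo;
    float bar;
};

/* store a 32-bit value MSB-first (network byte order) */
static void put_u32_msb(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* fetch a 32-bit value stored MSB-first */
static uint32_t get_u32_msb(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* hypothetical helper: encode the struct into an 8-byte message */
void pack_myStruct(const struct myStruct *s, unsigned char out[8])
{
    uint32_t bits;

    assert(sizeof(int) == 4 && sizeof(float) == 4);  /* layout assumption */
    memcpy(&bits, &s->foo, 4);   /* copy the bit pattern, no conversion */
    put_u32_msb(out, bits);
    memcpy(&bits, &s->bar, 4);
    put_u32_msb(out + 4, bits);
}

/* hypothetical helper: decode an 8-byte message into the struct */
void unpack_myStruct(const unsigned char in[8], struct myStruct *s)
{
    uint32_t bits;

    assert(sizeof(int) == 4 && sizeof(float) == 4);  /* layout assumption */
    bits = get_u32_msb(in);
    memcpy(&s->foo, &bits, 4);
    bits = get_u32_msb(in + 4);
    memcpy(&s->bar, &bits, 4);
}
```

Encoding field by field, instead of writing the struct's memory directly, keeps the message format independent of the C compiler's padding and of the host's byte order.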

The following code fragment uses a helper method to initialize the fields of a structure; a CPointer is passed to it to fill a substructure.
The corresponding C-header definitions are:

    #define NUM_CHARS  10

    typedef struct foo {
	int     foo1;
	float   foo2;
    } foo;

    typedef struct bar {
	foo     innerFoo;
	int     bar1;
	char    bar2[NUM_CHARS];
    } bar;

The Smalltalk code is:

initFoo:aFoo
    aFoo foo1:10.
    aFoo foo2:(Float pi).
    ^ self


    ...
    cTypes := cparser types.
    cDefines := cparser defines.
    ...
    fooType := cTypes at:'foo'.
    barType := cTypes at:'bar'.
    ...
    aBar := barType new.
    ...
    self initFoo:(aBar refMemberAt:'innerFoo').
    ...
    NUM_CHARS := Integer fromString:(cDefines at:'NUM_CHARS').
    ...

Note that passing the result of #memberAt: to the initFoo: method would not work in the above example, since that would pass a copy of the inner structure and leave the original (outer) datum unchanged.
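
The same distinction exists in plain C, where passing a struct by value corresponds to working on a copy and passing its address corresponds to working through a pointer. The following minimal sketch uses the foo and bar types from this example; the function names init_foo_copy and init_foo_ref are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CHARS  10

typedef struct foo {
    int     foo1;
    float   foo2;
} foo;

typedef struct bar {
    foo     innerFoo;
    int     bar1;
    char    bar2[NUM_CHARS];
} bar;

/* Receives a copy of the inner structure: the assignments change
   only the copy. Returns the value written into the copy. */
int init_foo_copy(foo aFoo)
{
    aFoo.foo1 = 10;
    aFoo.foo2 = 3.14159f;
    return aFoo.foo1;
}

/* Receives a pointer into the outer structure: the assignments
   change the original storage. */
void init_foo_ref(foo *aFoo)
{
    assert(aFoo != NULL);
    aFoo->foo1 = 10;
    aFoo->foo2 = 3.14159f;
}
```

Calling init_foo_copy with the innerFoo of a bar leaves that bar untouched, while init_foo_ref with the address of innerFoo fills it in place.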

<info@exept.de>

Doc $Revision: 1.15 $