Data representation
The CHICKEN system has two kinds of data objects:
immediate and non-immediate objects.
Immediate objects
Immediate objects are represented by a single machine word, 32 or 64 bits depending on the architecture. They come in four different flavors:
- fixnums, that is, small exact integers, where the lowest-order bit is set to 1. This gives fixnums a range of 31 bits for the actual numeric value (63 bits on 64-bit architectures).
- characters, where the four lowest-order bits are equal to C_CHARACTER_BITS, currently 1010. The Unicode code point of the character is encoded in bits 9 to 32, counting the lowest-order bit as bit 1 (that is, the code point is shifted left by 8 bits).
- booleans, where the four lowest-order bits are equal to C_BOOLEAN_BITS, currently 0110. The next bit is one for #t and zero for #f.
- other values: the empty list, the value of unbound identifiers, the undefined value (void), and end-of-file. The four lowest-order bits are equal to C_SPECIAL_BITS, currently 1110. The next four bits contain an identifying number for this type of object, one of:
  - C_SCHEME_END_OF_LIST, currently 0000;
  - C_SCHEME_UNDEFINED, currently 0001;
  - C_SCHEME_UNBOUND, currently 0010; or
  - C_SCHEME_END_OF_FILE, currently 0011.
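The flavors above can be sketched in C. This is a minimal illustration, assuming a machine word is a uintptr_t; the names word, make_fixnum, fixnum_value, make_boolean and make_special are hypothetical and are not the macros chicken.h defines. The tag patterns are the ones listed above; characters are omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word; /* one machine word (assumption) */

#define BOOLEAN_BITS ((word)0x6) /* 0110 */
#define SPECIAL_BITS ((word)0xe) /* 1110 */

/* Fixnum: the value shifted left one bit, lowest bit set.
   No range check: n must fit in 31 (or 63) bits. */
static word make_fixnum(intptr_t n) {
    return ((word)n << 1) | 1u;
}

/* For a fixnum w == 2n + 1, w - 1 == 2n, so halving it is exact. */
static intptr_t fixnum_value(word w) {
    assert(w & 1u);
    return (intptr_t)(w - 1u) / 2;
}

/* Boolean: BOOLEAN_BITS in bits 0-3; bit 4 is 1 for #t, 0 for #f. */
static word make_boolean(int b) {
    return ((word)(b != 0) << 4) | BOOLEAN_BITS;
}

/* Special value: SPECIAL_BITS in bits 0-3, identifier in bits 4-7. */
enum special_id { END_OF_LIST = 0, UNDEFINED = 1, UNBOUND = 2, END_OF_FILE = 3 };

static word make_special(enum special_id id) {
    return ((word)id << 4) | SPECIAL_BITS;
}
```

With these definitions, make_fixnum(5) yields 11 (binary 1011) and make_boolean(1) yields 0x16; the identifier numbers of the special values are the ones listed above.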
Non-immediate objects
Collectively, the two lowest-order bits are known as the immediate mark bits. When the lowest bit is set, the object is a fixnum, as described above, and the next bit is part of its value. When the lowest bit is clear but the next bit is set, it is an immediate object other than a fixnum. If neither bit is set, the object is non-immediate, as described below.
Non-immediate objects are blocks of data represented by a pointer into
the heap. The pointer's immediate mark bits must be zero to indicate that the object is non-immediate,
so the data block must be aligned on at least a 4-byte boundary. Modern architectures
require data words to be aligned anyway, so the ability to distinguish between immediate and non-immediate objects comes for free.
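A small C sketch of the mark-bit test and of the pointer requirement. It assumes a machine word is a uintptr_t and that block addresses are at least 4-byte aligned; classify and from_block are hypothetical helpers, not part of the CHICKEN API.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word; /* one machine word (assumption) */

enum kind { KIND_FIXNUM, KIND_OTHER_IMMEDIATE, KIND_NON_IMMEDIATE };

/* Classify a word by its two immediate mark bits (bits 0 and 1). */
static enum kind classify(word w) {
    if (w & 1u)
        return KIND_FIXNUM;          /* ...1: bit 1 is part of the value */
    if (w & 2u)
        return KIND_OTHER_IMMEDIATE; /* ..10: character, boolean or special */
    return KIND_NON_IMMEDIATE;       /* ..00: pointer to a data block */
}

/* Turn a block address into a word. Alignment keeps the mark bits
   clear, which is what makes the pointer recognizable as such. */
static word from_block(const void *block) {
    word w = (word)block;
    assert((w & 3u) == 0); /* requires at least 4-byte alignment */
    return w;
}
```

Note that the test never inspects more than the two lowest-order bits; telling a character from a boolean or special value needs the full four-bit tag.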
The first word of the data block is a header, a single machine word
that gives information about the type of the object.
The 24 lowest-order bits (56 bits on 64-bit architectures) contain the
length of the data object, which is either the number of bytes in a
string or byte-vector, or the number of elements for a vector or record type.
The remaining eight bits sit at the high-order end of the header.
The four highest-order bits are flags used for garbage collection or
internal data type dispatching; the four bits below them encode the
object's type.
- C_GC_FORWARDING_BIT: flag used for forwarding garbage-collected object pointers.
- C_BYTEBLOCK_BIT: flag that specifies whether this data object contains raw bytes (a string or byte-vector) or pointers to other data objects.
- C_SPECIALBLOCK_BIT: flag that specifies whether this object contains a special non-object pointer value in its first slot. An example of this kind of object is a closure, which is a vector-type object with the code pointer as its first item.
- C_8ALIGN_BIT: flag that specifies whether the data area of this block should be aligned on an 8-byte boundary (floating-point numbers, for example).
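The header layout on a 32-bit machine can be illustrated as follows. This sketch assumes that the flags occupy the top four bits in the order listed above (C_GC_FORWARDING_BIT highest) and that, as in chicken.h, the four bits below them hold a type code; the macro and function names are hypothetical stand-ins rather than the definitions in chicken.h.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit header layout (assumed flag order: as listed, top bit first). */
#define GC_FORWARDING_BIT 0x80000000u
#define BYTEBLOCK_BIT     0x40000000u
#define SPECIALBLOCK_BIT  0x20000000u
#define ALIGN8_BIT        0x10000000u
#define FLAG_MASK         0xf0000000u /* four flag bits */
#define TYPE_MASK         0x0f000000u /* four type bits (assumed) */
#define SIZE_MASK         0x00ffffffu /* 24-bit length field */

/* Pack flags, a 4-bit type code and a length into one header word. */
static uint32_t make_header(uint32_t flags, uint32_t type, uint32_t size) {
    assert((flags & ~FLAG_MASK) == 0);
    assert(type <= 0xfu);
    assert(size <= SIZE_MASK);
    return flags | (type << 24) | size;
}

/* Length: a byte count for byte-blocks, a slot count otherwise. */
static uint32_t header_size(uint32_t h) { return h & SIZE_MASK; }

static uint32_t header_type(uint32_t h) { return (h & TYPE_MASK) >> 24; }

static int has_flag(uint32_t h, uint32_t flag) { return (h & flag) != 0; }
```

For example, make_header(BYTEBLOCK_BIT, 0xa, 5) describes a five-byte byte-block with the arbitrary type code 0xa and evaluates to 0x4a000005.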
The actual data follows immediately after the header. Note that
block addresses are always aligned to the native machine-word
boundary. Scheme data objects map to blocks in the following manner:
- pairs: vector-like object (type bits C_PAIR_TYPE), where the car and the cdr are contained in the first and second slots, respectively.
- vectors: vector object (type bits C_VECTOR_TYPE).
- strings: byte-vector object (type bits C_STRING_TYPE).
- procedures: special vector object (type bits C_CLOSURE_TYPE). The first slot contains a pointer to a compiled C function. Any extra slots contain the free variables (since a flat closure representation is used).
- flonums: a byte-vector object (type bits C_FLONUM_TYPE). Slots one and two (or a single slot on 64-bit architectures) contain a 64-bit floating-point number, in the representation used by the host system's C compiler.
- symbols: a vector object (type bits C_SYMBOL_TYPE). Slots one and two contain the toplevel variable value and the print-name (a string) of the symbol, respectively.
- ports: a special vector object (type bits C_PORT_TYPE). The first slot contains a pointer to a file stream if this is a file port, or NULL if not. The other slots contain housekeeping data used for this port.
- structures: a vector object (type bits C_STRUCTURE_TYPE). The first slot contains a symbol that specifies the kind of structure this record is an instance of. The other slots contain the actual record items.
- pointers: a special vector object (type bits C_POINTER_TYPE). The single slot contains a machine pointer.
- tagged pointers: similar to a pointer (type bits C_TAGGED_POINTER_TYPE), but the object contains an additional slot with a tag (an arbitrary data object) that identifies the type of the pointer.
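The following C sketch lays out a pair and a flat closure as word arrays. It assumes a uintptr_t word and a header with flags in the top four bits, a type code in the next four (as in chicken.h) and the length below them. PAIR_TYPE and CLOSURE_TYPE are arbitrary codes, every name here is hypothetical, and fixnums are encoded as 2n + 1.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word; /* one machine word (assumption) */

#define WORD_BITS        (sizeof(word) * 8)
#define SPECIALBLOCK_BIT ((word)1 << (WORD_BITS - 3))
#define SIZE_MASK        (((word)1 << (WORD_BITS - 8)) - 1)
#define PAIR_TYPE        ((word)0x3) /* hypothetical type codes */
#define CLOSURE_TYPE     ((word)0x4)

static word make_header(word flags, word type, word size) {
    assert(type <= 0xf && size <= SIZE_MASK);
    return flags | (type << (WORD_BITS - 8)) | size;
}

static word header_size(word h) { return h & SIZE_MASK; }

/* Pair: header, then car in the first slot (block[1]) and cdr in the
   second (block[2]). */
static word *init_pair(word block[3], word car, word cdr) {
    block[0] = make_header(0, PAIR_TYPE, 2);
    block[1] = car;
    block[2] = cdr;
    return block;
}

static word pair_car(const word *p) { return p[1]; }
static word pair_cdr(const word *p) { return p[2]; }

/* Flat closure: first slot is the code pointer, later slots hold the
   free variables. The code receives the closure to reach them. */
typedef word (*code_ptr)(const word *self, word arg);

/* Code for a closure like (lambda (x) (+ x k)) with k free. Both are
   fixnums: (2x + 1) + (2k + 1) - 1 == 2(x + k) + 1. */
static word add_k_code(const word *self, word x) {
    word k = self[2];
    return x + k - 1;
}

static word *init_add_closure(word block[3], word k) {
    block[0] = make_header(SPECIALBLOCK_BIT, CLOSURE_TYPE, 2);
    /* Storing a function pointer in an integer word is
       implementation-defined in ISO C but works on common platforms. */
    block[1] = (word)add_k_code;
    block[2] = k;
    return block;
}

static word call_closure(const word *closure, word arg) {
    code_ptr code = (code_ptr)closure[1];
    return code(closure, arg);
}
```

Calling call_closure on a closure built with k = 7 (the fixnum 3) and argument 9 (the fixnum 4) returns 15, the fixnum 7. Because the free variable is copied into the closure block, the code reaches it with a single indexed load.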
Data objects may be allocated outside of the garbage-collected heap, as
long as their layout follows the scheme described above. Care must be
taken, however, not to store heap data (that is, non-immediate objects)
into such objects, because this will confuse the garbage collector.
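As an illustration of a block living outside the heap, this C sketch builds a string in static storage. It assumes a uintptr_t word, a header whose length occupies all but the top eight bits (as in chicken.h), a hypothetical STRING_TYPE code and hypothetical helper names. The block stores only raw bytes and never a heap pointer, so it stays within the restriction above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t word; /* one machine word (assumption) */

#define WORD_BITS     (sizeof(word) * 8)
#define BYTEBLOCK_BIT ((word)1 << (WORD_BITS - 2))
#define SIZE_MASK     (((word)1 << (WORD_BITS - 8)) - 1)
#define STRING_TYPE   ((word)0xa) /* hypothetical type code */

/* A header word followed directly by the raw bytes. No pointers are
   stored, so there is nothing for the garbage collector to trace. */
static struct {
    word header;
    char bytes[8]; /* 5 bytes used, rest is padding */
} hello_block = {
    BYTEBLOCK_BIT | (STRING_TYPE << (WORD_BITS - 8)) | 5,
    "hello"
};

static const word *hello_string(void) { return &hello_block.header; }

/* The raw bytes start right after the header word. */
static const char *block_bytes(const word *block) {
    assert(block[0] & BYTEBLOCK_BIT);
    return (const char *)block + sizeof(word);
}

static word block_length(const word *block) { return block[0] & SIZE_MASK; }

static int string_equals(const word *block, const char *s) {
    size_t n = strlen(s);
    return block_length(block) == n && memcmp(block_bytes(block), s, n) == 0;
}
```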
For more information, see the header file chicken.h.