means, all the object's data (including the type tagging information that is required to identify the object's type) must fit into 32 bits.

- Heap objects -- meaning that the SCM variable holds a pointer into the heap. On systems where a pointer needs more than 32 bits, this means that scm_t_bits and SCM variables need to be large enough to hold such pointers. In contrast to immediates, the data associated with a heap object can consume arbitrary amounts of memory.

The 'heap' is the memory area that is under the control of Guile's garbage collector. It holds allocated memory of various sizes. The impact on the runtime type system is that Guile needs to be able to determine the type of an object given a pointer to it. Usually Guile does this by storing a "type tag" in the first word of the object.

Some objects are common enough that they get special treatment. Since Guile guarantees that the address of a GC-allocated object on the heap is 8-byte aligned, Guile can play tricks with the lower 3 bits: a pointer to a heap object always has these bits set to zero, so the three least significant bits of an SCM value can be used to store additional information. These bits describe the object's type and are therefore called tc3-bits, where tc stands for type-code.

For a given SCM value, whether it holds an immediate or a heap object is decided by the tc3-bits of its scm_t_bits equivalent: if the tc3-bits equal #b000, then the SCM value holds a heap object, and the scm_t_bits variable's value is simply the pointer to the heap cell.
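Because the heap/immediate distinction needs only a mask and a compare, it is cheap to test. The C sketch below illustrates the idea, assuming a pointer-sized unsigned word stands in for scm_t_bits; the names demo_bits, demo_tc3, demo_classify and the other demo_* identifiers are hypothetical and are not Guile's API. The tc3 assignments it uses for small integers and other immediates are the ones summarized further below.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for scm_t_bits: an unsigned integer as wide as a pointer.
typedef uintptr_t demo_bits;

// Hypothetical labels for the tc3 patterns a word may carry.
typedef enum {
    DEMO_HEAP_OBJECT,   // tc3 == 000: the word is a pointer to a heap cell
    DEMO_SMALL_INTEGER, // tc3 == 010 or 110: a fixnum
    DEMO_IMMEDIATE,     // tc3 == 100: boolean, character or other special object
    DEMO_NOT_A_VALUE    // tc1 == 1: only valid as the first word of a non-pair cell
} demo_kind;

// The tc3-bits are the three least significant bits of the word.
static unsigned demo_tc3(demo_bits x) { return (unsigned)(x & 7u); }

// An 8-byte-aligned address always has tc3 == 000, so this test is enough.
static bool demo_is_heap_object(demo_bits x) { return demo_tc3(x) == 0u; }

static demo_kind demo_classify(demo_bits x) {
    if (x & 1u)                  // odd tc3: never a Scheme value
        return DEMO_NOT_A_VALUE;
    if (demo_is_heap_object(x))
        return DEMO_HEAP_OBJECT;
    if ((x & 3u) == 2u)          // tc2 == 10 covers both 010 and 110
        return DEMO_SMALL_INTEGER;
    return DEMO_IMMEDIATE;       // the only remaining even pattern is 100
}

// A two-word stand-in for a heap cell, aligned like a GC allocation.
static _Alignas(8) demo_bits demo_cell[2];
```

Under these assumptions, the address of demo_cell classifies as a heap object, 0x12 (tc3 #b010) and 0x0E (tc3 #b110) as small integers, and 0x04 (tc3 #b100) as a non-integer immediate. The sketch only mirrors the bit logic; it does not reproduce Guile's own macros.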
In summary, the data of a scheme object that is represented by an SCM variable consists of a) the SCM variable itself, b) in the case of heap objects, the memory that the SCM object points to, c) in the case of heap objects, potentially additional data outside of the heap (for example, malloc'ed data), and d) in the case of heap objects, potentially additional data inside the heap, since data stored in b) and c) may hold references to other cells.

Immediates

Operations on immediate objects can typically be processed faster than operations on heap objects. The reason is that the object's data can be extracted directly from the SCM variable (or rather a corresponding scm_t_bits variable), instead of having to perform additional memory accesses to obtain the object's data from the heap. In order to get the best possible performance, frequently used data types should be realized as immediates. As mentioned above, this is only possible if the objects can be represented with 32 bits (including type tagging).

In Guile, the following data types and special objects are realized as immediates: booleans, characters, small integers (see below), the empty list, the end of file object, the 'unspecified' object (which is delivered as a return value by functions for which the return value is unspecified), a 'nil' object used in the elisp-compatibility mode, and certain other 'special' objects which are only used internally in Guile.

Integers in Guile can be arbitrarily large. On the other hand, integers are one of the most frequently used data types, and integers with fewer than 32 bits are especially common. Thus, internally and transparently for application code, Guile distinguishes between small and large integers. Whether an integer is a large or a small integer depends on the number of bits needed to represent its value. Small integers are those which can be represented as immediates.
Since they don't require more than a fixed number of bits for their representation, they are also known as 'fixnums'.

The tc3-combinations #b010 and #b110 are used to represent small integers, which allows the most significant bit of the tc3-bits to be used as part of the integer value being represented. This means that all integers with up to 30 bits (including one bit for the sign) can be represented as immediates. On systems where SCM and scm_t_bits variables hold more than 32 bits, the number of bits usable for small integers is even larger. The tc3-code #b100 is shared among booleans, characters and the other special objects listed above.

Heap Objects

All object types not mentioned above in the list of immediate objects are represented as heap objects. The amount of memory referenced by a heap object depends on the object's type, namely on the set of attributes that have to be stored with objects of that type. Every heap object type is allowed to define its own layout and interpretation of the data stored in its cell (with some restrictions, see below).

One of the design goals of Guile's type system is to make it possible to store a scheme pair with as little memory usage as possible. The minimum amount of memory required to store two scheme objects (the car and cdr of a pair) is the memory required by two scm_t_bits or SCM variables. Therefore pairs in Guile are stored in two words, and are tagged with a bit pattern in the SCM value, not with a type tag on the heap.

Garbage collection

During garbage collection, unreachable objects on the heap will be freed. To determine the set of reachable objects, by default, the GC just traces all words in all heap objects. It is possible to register custom tracing ("marking") procedures.

If an object is unreachable, by default, the GC just notes this fact and moves on. Later allocations will clear out the memory associated with the object, and re-use it.
It is possible to register custom finalizers, however.

Run-time type introspection

Guile's type system is designed to make it possible to determine the type of a heap object from the object's first scm_t_bits variable. (Given an SCM variable X holding a heap object, the macro SCM_CELL_TYPE(X) will deliver the corresponding object's first scm_t_bits variable.)

If the object holds a scheme pair, then we already know that the first scm_t_bits variable of the cell will hold a scheme object with one of the following tc3-codes: #b000 (heap object), #b010 (small integer), #b110 (small integer), #b100 (non-integer immediate). All these tc3-codes have in common that their least significant bit is #b0. This fact is used by the garbage collector to identify cells that hold pairs. The remaining tc3-codes are assigned as follows: #b001 (class instance or, more precisely, a struct, of which a class instance is a special case), #b011 (closure), #b101/#b111 (all remaining heap object types).

Summary of type codes of scheme objects (SCM variables)

Here is a summary of tagging bits as they might occur in a scheme object. The notation is as follows: tc stands for type code as before, and tc<n>, with n being a number, indicates a type code formed by the n least significant bits of the SCM variable's corresponding scm_t_bits value.

Note that (as has been explained above) tc1==1 can only occur in the first scm_t_bits variable of a cell belonging to a heap object that is not a pair. For an explanation of the tc tags with tc1==1, see the next section with the summary of the type codes on the heap.

tc1:
   0:  For scheme objects, tc1==0 must be fulfilled.
  (1:  This can never be the case for a scheme object.)

tc2:
   00:  Either a heap object or some non-integer immediate
  (01:  This can never be the case for a scheme object.)
   10:  Small integer
  (11:  This can never be the case for a scheme object.)

tc3:
   000:  a heap object (pair, closure, class instance etc.)
  (001:  This can never be the case for a scheme object.)
   010:  an even small integer (least significant bit is 0).
  (011:  This can never be the case for a scheme object.)
   100:  Non-integer immediate
  (101:  This can never be the case for a scheme object.)
   110:  an odd small integer (least significant bit is 1).
  (111:  This can never be the case for a scheme object.)

The remaining bits of the heap objects form the pointer to the heap cell. The remaining bits of the small integers form the integer's value and sign. Thus, the only scheme objects for which a further subdivision is of interest are the ones with tc3==100.

tc8 (for objects with tc3==100):
   00000-100:  special objects ('flags')
   00001-100:  characters
   00010-100:  unused
   00011-100:  unused

Summary of type codes on the heap

Here is a summary of tagging in scm_t_bits values as they might occur in the first scm_t_bits variable of a heap cell.

tc1:
   0:  the cell belongs to a pair.
   1:  the cell belongs to a non-pair.

tc2:
   00:  the cell belongs to a pair with no small integer in its car.
   01:  the cell belongs to a non-pair (struct or some other heap object).
   10:  the cell belongs to a pair with a small integer in its car.
   11:  the cell belongs to a non-pair (closure or some other heap object).

tc3:
   000:  the cell belongs to a pair with a heap object in its car.
   001:  the cell belongs to a struct.
   010:  the cell belongs to a pair with an even small integer in its car.
   011:  the cell belongs to a closure.
   100:  the cell belongs to a pair with a non-integer immediate in its car.
   101:  the cell belongs to some other heap object.
   110:  the cell belongs to a pair with an odd small integer in its car.
   111:  the cell belongs to some other heap object.

tc7 (for tc3==1x1):
   See below for the list of types. Three special tc7-codes are of interest: numbers, ports and smobs, each of which in fact represents a collection of types that is subdivided using tc16-codes.
tc16 (for tc7==scm_tc7_smob): The largest part of the space of smob types is not subdivided in a predefined way, since smobs can be added arbitrarily by user C code. */