Table of Contents

This is an implementation of
**efficient tree data structures** for GUILE Scheme.
Binary search trees are the only important data structures
that are missing in this interpreter. Hashing tables are
outstanding when one needs to represent associated arrays with
(almost) time-constant access. But hashing tables have one big
disadvantage---they cannot represent
a *partially ordered set* in a natural way.
On the other hand, trees (namely binary trees) can do the job
with a minimal growth of the overhead.
Elements of hashing tables are “spread in large space”,
so it's not possible to implement efficient iterators for such
data structures while trees consist of directly connected nodes
and the iteration algorithm is entirely straightforward.

The general problem with binary search trees is that a tree data
structure can degenerate readily. For example, assume a
binary tree when inserting numbers from 1 to 1000. The tree
will transmute to a (overheaded) list. In consequence, the
complexity of important tree manupulation functions will no
longer be logarithmic.
When this situation occurs, the structure will loose its efficiency.
Such situations can be avoided by ensuring that the tree will be
in a balanced form after every insertion/deletion procedure call.
Leaf nodes of *pure balanced trees* are of the same depth
(eventually, their depth can be “depth - 1”). The balancing
of such structures is very expensive operation. But if the balance
criteria is not so strict,
one can obtain low-cost and reliable structures
with only imponderable grow of the overhead.

The **Red-Black** trees consist of red and black nodes.
Red node cannot be followed by a red node, leaf nodes are always
black and the *black height* of the tree must be constant
(black node count from the root to an arbitrary leaf node must
be the same). Possibly the previous definition looks strange
but Red-Black trees can be easily balanced using the
“discoloring of nodes” and “subtree rotations”.

The **AVL** trees
(named after **A**dison--**V**elski and **L**andis)
consist only of such subtrees, which depth differs by 1 at most.
This seems to be almost like a pure balanced tree definition
but the constraint is applied to the *subtree depth*
(not leaf depth). This *little change*
has a dramatic fall-out. It's possible to balance AVL trees
using simple rotations. Each node of the AVL tree must contain
the additional overhead information --- whether the subtree
is prone to the left/right or balanced.

The following two examples show the different structures after inserting nodes from 30 to 1.

I've decided to represent the tree data structures using SMOBS (i.e. SMall OBjects). Tree nodes can be represented the same way for both tree implementations. The overhead information (red/black and balance factor) is stored in the lowest two bits of the (for example---parent) pointer in the similar manner as guile does (e.g. short integers). This mechanism will spare about 4 MB for 1 million nodes at the cost of few AND operations. Each node contains a key and a value (both represented by SCM) and some node pointers. It's sufficient to keep only children pointers, but if you have an additional parent node, you can easily determine the node successors and predecessors.

In consequence, all the tree manipulator functions can be
split into two groups. The first are the **modifiers**, they are
directly dependent on the tree type (Red-Black / AVL), because
they must balance the tree, and this operation is naturally
different for each implementation. For example *insertion*,
*deletion* and *tree construction* are typical
tree modifiers. While **accessors** don't change the
tree structure, the same functions can be used for both
tree implementations (e.g. *tree-ref*, *iterators*,
*flatting functions*). This is a big advantage, it allows
to implement more tree structures (splay trees?)
and use amount of existing functions.

The rest is on you. If you are interested in this project, look at the source code, it's very close commented, so I think it won't be a problem to read.

The latest source snapshot is available from here. If you want to contribute, please send me patches, suggestions and comments to <vilem.vychodil@upol.cz>.

Further documentation and examples can be found in the reference manual. This manual contains similar information as this page. In addition to that, there is a commented list of all functions and predicates of Gtrees. Unfortunately, as the module system of Guile Scheme is somewhat changing, the attached examples must not work with every version of the interpreter. Sorry for that. On the other hand, each Gtrees source archive contains its own operational examples.

If you are interested in this project, contact me via e-mail.

Currently, the implementation of both tree structures and generic functions is done. But I'm not sure, that my interpretation of how the smobs and garbage collector work was right, so the code maybe contains some (for me) hidden bugs, memory leaks, etc.

I want thereby ask patient and experienced schemers to look
at my source code
(if they don't expect that all is a nonsense, of course).
I've tested it with amount of data, but I'm not sure about the
stuff around the GC. `:-)` Thank you.