Gtrees Project

Table of Contents
Preface
Implementation
Source Code
Documentation and Examples
Contacts
Release Notes

Preface

This is an implementation of efficient tree data structures for GUILE Scheme. Binary search trees are the only important data structures that are missing in this interpreter. Hashing tables are outstanding when one needs to represent associated arrays with (almost) time-constant access. But hashing tables have one big disadvantage---they cannot represent a partially ordered set in a natural way. On the other hand, trees (namely binary trees) can do the job with a minimal growth of the overhead. Elements of hashing tables are “spread in large space”, so it's not possible to implement efficient iterators for such data structures while trees consist of directly connected nodes and the iteration algorithm is entirely straightforward.

The general problem with binary search trees is that a tree data structure can degenerate readily. For example, assume a binary tree when inserting numbers from 1 to 1000. The tree will transmute to a (overheaded) list. In consequence, the complexity of important tree manupulation functions will no longer be logarithmic. When this situation occurs, the structure will loose its efficiency. Such situations can be avoided by ensuring that the tree will be in a balanced form after every insertion/deletion procedure call. Leaf nodes of pure balanced trees are of the same depth (eventually, their depth can be “depth - 1”). The balancing of such structures is very expensive operation. But if the balance criteria is not so strict, one can obtain low-cost and reliable structures with only imponderable grow of the overhead.

The Red-Black trees consist of red and black nodes. Red node cannot be followed by a red node, leaf nodes are always black and the black height of the tree must be constant (black node count from the root to an arbitrary leaf node must be the same). Possibly the previous definition looks strange but Red-Black trees can be easily balanced using the “discoloring of nodes” and “subtree rotations”.

The AVL trees (named after Adison--Velski and Landis) consist only of such subtrees, which depth differs by 1 at most. This seems to be almost like a pure balanced tree definition but the constraint is applied to the subtree depth (not leaf depth). This little change has a dramatic fall-out. It's possible to balance AVL trees using simple rotations. Each node of the AVL tree must contain the additional overhead information --- whether the subtree is prone to the left/right or balanced.

The following two examples show the different structures after inserting nodes from 30 to 1.

Implementation

I've decided to represent the tree data structures using SMOBS (i.e. SMall OBjects). Tree nodes can be represented the same way for both tree implementations. The overhead information (red/black and balance factor) is stored in the lowest two bits of the (for example---parent) pointer in the similar manner as guile does (e.g. short integers). This mechanism will spare about 4 MB for 1 million nodes at the cost of few AND operations. Each node contains a key and a value (both represented by SCM) and some node pointers. It's sufficient to keep only children pointers, but if you have an additional parent node, you can easily determine the node successors and predecessors.

In consequence, all the tree manipulator functions can be split into two groups. The first are the modifiers, they are directly dependent on the tree type (Red-Black / AVL), because they must balance the tree, and this operation is naturally different for each implementation. For example insertion, deletion and tree construction are typical tree modifiers. While accessors don't change the tree structure, the same functions can be used for both tree implementations (e.g. tree-ref, iterators, flatting functions). This is a big advantage, it allows to implement more tree structures (splay trees?) and use amount of existing functions.

The rest is on you. If you are interested in this project, look at the source code, it's very close commented, so I think it won't be a problem to read.

Source Code

The latest source snapshot is available from here. If you want to contribute, please send me patches, suggestions and comments to <vilem.vychodil@upol.cz>.

Documentation and Examples

Further documentation and examples can be found in the reference manual. This manual contains similar information as this page. In addition to that, there is a commented list of all functions and predicates of Gtrees. Unfortunately, as the module system of Guile Scheme is somewhat changing, the attached examples must not work with every version of the interpreter. Sorry for that. On the other hand, each Gtrees source archive contains its own operational examples.

Contacts

If you are interested in this project, contact me via e-mail.

Release Notes

Currently, the implementation of both tree structures and generic functions is done. But I'm not sure, that my interpretation of how the smobs and garbage collector work was right, so the code maybe contains some (for me) hidden bugs, memory leaks, etc.

I want thereby ask patient and experienced schemers to look at my source code (if they don't expect that all is a nonsense, of course). I've tested it with amount of data, but I'm not sure about the stuff around the GC. :-) Thank you.