Graph & Tree Traversals in Rust

Memory-safe tree traversals using arena allocators and the visitor pattern

Jun 01, 2020

This story was originally published at sachanganesh.com. Please view the article there to read it in full.

Introduction

Like a lot of folks stuck at home during lockdown, I’m currently spending my newfound free time on a personal project. I’m developing it in Rust because I need fast execution speed and memory efficiency, although guaranteed memory safety is certainly a great perk.

The program I’m building is a phylogenetic inference tool (used to estimate evolutionary trees), so there’s a lot of constructing and modifying tree data structures. Naturally, tree traversals are fundamental to many of the algorithms I’ll use, but implementing one in Rust was not as easy as I expected.

In this article I’ll walk through how a traditional pre-order traversal is implemented in Python, to set a reasonable baseline for comparison. Then we’ll port the same code over to Rust.

We’ll first implement an immutable traversal, and follow with an attempt to develop a mutable traversal. The issues we’ll encounter with these two Rust programs will lead us to our final solution using arena allocators and the visitor pattern. Eventually, we’ll develop an implementation that will be more memory-safe and flexible than the original Python code.

The lessons learned from this exercise will obviously be useful when writing other tree traversals, but they can also be generalized to graph traversal algorithms as well.

Don’t Shoot the Messenger

I really enjoy what Rust brings to the table, and I think the memory ownership model is a great abstraction once you get used to it. But when starting out, it’s difficult to wrap your head around why code that just works in one language won’t just work when ported to Rust. I think everyone goes through this “fighting the borrow-checker” stage before they start to enjoy the language, but there are some particularly difficult habits to break before you get there.

There was definitely a point at which I became extremely frustrated with the language. After all, what good is Rust if I can’t even implement a simple tree traversal? It begged the question: what else would the language restrict me from in the future?

It’s at these moments that it’s important to realize that the borrow-checker isn’t stupid. It’s complaining for a reason, and probably a very good one. It isn’t the language that’s denying you because of its limitations; it’s just recognizing that there’s a memory-safety issue in your code. It’s forcing you to think differently about your problem, and to be more conscious about memory-safety.

Hopefully this article will show just how useful the borrow-checker is in helping us write safe, fast programs. Yes, it will be incredibly frustrating at times, but I think the end result is well worth the trouble.

Before we continue it will be relevant and useful to learn the memory ownership rules I mentioned before. If you’re already familiar with these, feel free to skip ahead to the next section, in which we’ll write a pre-order traversal in Python.

Reviewing Rust Memory Ownership Rules

This will be an informal review for those who don’t already know the ownership rules, and a more thorough reference can be found in the Rust Book 2nd Ed.

Type-safety and memory-safety are laudable goals, and preventing user error is one of the prime benefits of choosing Rust. But as always, rules are restricting. You’ll definitely feel limited at first because you won’t have access to all the tools and methods you’ve used before. We’ll see that this is a good thing, eventually, as we’ll learn how to write memory-safe code.

Hint: Substack doesn’t render code very nicely. Try reading it on my website instead.

All data, such as values and references, will have one owner. Typically when the scope of the owner ends, the owned memory is de-allocated.

let x = 5;  // here `x` is the owner of the data

2. Ownership of data can be transferred by a move operation. The original owner also does not have access to moved data if it was moved to a new owner.

let x = Box::new(5);
let y = x;          // move occurs here
                    // `y` is now the owner of the data
println!("{}", x);  // ILLEGAL: will not compile because data was moved out of `x`

3. An owner can lend the owned data out as borrowed references. A borrow is an immutable (read-only) reference to data.

You can have as many immutable borrows as you want.
As long as the data is being borrowed, it won’t be de-allocated.

let x = 5;
let y = &x;  // `&` signifies an immutable borrow

let z = &x;  // borrow again

4. A mutable borrow is a read-write reference to data. You can only have one mutable borrow to owned data at a time.

let mut x = Box::new(5);  // `mut` specifies that owned data can be mutated

let y = &x;

let z = &mut x;  // `&mut` signifies a mutable borrow
                 // `y` can't be used now because `x` was mutably borrowed


z += 1;

println!("{}", y);  // ILLEGAL: will not compile due to the mutable borrow

As I noted in the comment above, this code will not compile because mutable borrows can’t be used simultaneously with immutable borrows. When a mutable borrow occurs, all immutable borrows up to that point should be considered stale.
One could change the data using a mutable borrow, so in order to avoid a data race the previous immutable borrows should be discarded.
To preserve program soundness, we must re-borrow to get a new read-only reference, as shown below.

let mut x = 5
let y = &x
let z = &mut x;     // `&mut` signifies a mutable borrow
                    // `y` cannot be used anymore because `x` was mutably borrowed

z += 1

let y = &x;         // borrow again from `x`println!("{}", y);
                    // works because `x` is borrowed again after the modification

These rules are great in theory. But when we start building complicated programs with the same mindset as before, we’ll quickly run into issues. We have to be mindful of ownership and borrowing every step of the way.

A Motivating Example in Python

As an example, let’s examine how a simple pre-order traversal of the depicted tree can be implemented in Python 3. This will give us a baseline to compare to when we implement the same thing in Rust.

As part of the traversal, I want to visit every node, so that I can mutate each node’s internal data.

Tree Node

This class represents a node in the tree. An instance holds an arbitrary value and the child node references for the left and right subtrees.

Tree

This class represents the tree structure itself. In our example it only tracks the root node. We also implement the __iter__ function to return the pre-order iterator instance that we'll define next.

Pre-order Traversal

This class holds the logic for performing a pre-order traversal over the tree nodes. It implements the __next__ handler so that we can use the iterator in loops.

Running a Simple Test

In our test, we’ll build the example tree depicted above and then confirm two things: that we can iterate through every node, and that we can modify the tree during iteration. First, we’ll update the contents of the tree during the first iteration, as seen at line 10, by multiplying the node values by 10. Then in our second pass we’ll print the updated value of each node.

It’s certainly a contrived example, but it fits my use case regarding the implementation of branch-swapping algorithms.

You can see the output for yourself by running the script below. The printed values should be 10, 20, 40, 50, and 30 in that order.

Can we take a moment to appreciate how easy Python makes it for us? Simple, succint, and tidy. Nothing really special to see here, and that’s the point! Unfortunately, it won’t be so simple in Rust.

An Immutable Traversal in Rust

Recall that in Rust, we have two different kinds of references: immutable and mutable borrows. Like in Python, we preserve the Tree’s ownership of all the TreeNodes. In other words, the tree struct holds all the references to its nodes. But unlike in Python, we have to decide whether the reference we return during iteration will be immutable or mutable.

To get an accurate rewrite of our Python code, we eventually need to use mutable references so that we can modify the node values as we traverse. For now, however, let’s start with the simpler case of a read-only traversal. If we can get that working, then the mutable implementation should require just a few changes.

Tree Node

The above code is an example of how we explicitly reason about null-able types through the use of Option. An Option can either be None (null) or Some, where Some is a container for a non-null value.

The Box signifies that the data being wrapped/boxed is going to be stored on the heap, and the box acts as a pointer that we can de-reference later. As our node structure is recursive through its left and right fields (a TreeNode has references to other TreeNodes), we use the Box to inform the compiler of the TreeNode's byte-size at compile-time. Otherwise we'd get a compiler error, as it won't be able to infer the size by itself.

Tree

The following code is very similar to the Tree class definition we observed in Python.

The as_ref() function on line 13 is used to borrow the contents of the Option type. In this case, self.root.as_ref() will return data of the type Option<&Box<TreeNode>>, signifying that the Box is borrowed instead of the Option itself (&Option<Box<TreeNode>>).

Immutable Pre-order Traversal

As we can see, Rust is more verbose than Python, but the similarities in the code are still visible. On line 19, we see that the Iterator trait is being implemented for the data structure, and then we implement the next function on line 21 just as we did in Python.

One thing to be aware of is that Rust forces us to be explicit about how null-able types are handled. Just as we defined data to be Optional, we have to unwrap the Option when we want to use its contents. On line 22, we can see how we unwrap the Option within the conditional statement by unpacking the Some value.

Unfortunately, the code as it is written right now will not compile. Try it for yourself below.

You should encounter errors for missing lifetime parameter s. We haven't done anything special yet, but we've already hit our first road-block! Let's take a closer look.

Lifetimes

Typically the borrow-checker is able to automatically reason about some data’s lifetime, or how long it lives for before it can be de-allocated from memory. For example, when some owner of data goes out of scope, then the lifetime of the data could end at that point too, so the compiler might be able to free that memory.

However, the compiler will not free data that is still being borrowed. This is to avoid dangling reference issues often encountered in languages like C or C++, in which data is freed but pointers still point to the inaccessible memory. Thus borrows have lifetimes as well, so that the compiler can determine when a borrow ends and when the borrowed data can eventually be freed.

This makes sense, because we don’t want the tree nodes to be freed while they’re still being borrowed by the iterator! The question the compiler is then asking is: What is the lifetime of the borrow? In other words, when will the borrow end?

We know that when the PreorderIter struct borrows some data, that borrow should live for as long as the struct lives for. But when we use a trait like Iterator, which contains generic parameters like type Item, we introduce ambiguity that the compiler can't resolve without help.

We can eliminate this ambiguity by explicitly annotating the lifetimes of each ambiguous reference. A lifetime is denoted by an apostrophe ' followed by the lifetime identifier (some unique name for the lifetime). For example 'a signifies a lifetime a.

We can then explicitly annotate a struct’s lifetime by attaching <'a> to the end of it, like so: PreorderIter<'a>. Then we can use the lifetime 'a anywhere within that struct's scope to annotate borrows.

When implementing functions or traits on the struct, we have to first declare the lifetime for that implementation like this: impl<'a>. Then it can be used throughout that scope such as when annotating the struct or borrows.

With the lifetimes explicitly annotated, we’ve removed any ambiguities the compiler may encounter with these borrowed references and made it clear that the borrows will live for as long as the PreorderIter struct lives.

For a more in-depth treatment of these concepts, please see Arthur Liao’s article “Rust Borrows and Lifetimes”.

Now that we have lifetimes addressed and our code successfully compiles, let’s write the main routine to test the iterator.

Running the Test

Pay special attention to the different loop implementations. They’re equivalent, with the only difference being that the while loop on line 16 is more explicit about the iteration. We'll use these while loops again later.

Try commenting out line 17 and re-running the test ( hint: it won’t work). Recall that we can’t mutate data through immutable borrows. We need to make a mutable iterator instead that returns mutable borrows.

A Failed Attempt at Mutable Traversal

With the immutable iterator working, we now have the foundation to start implementing a mutable one. The easiest way may be to simply update our immutable iterator to use mutable borrows.

This article is too long to be included in full in this email. To continue reading about the solution, see the full article on my website.

If you liked this content, subscribe to my newsletter! I write about technical topics twice a month.

Originally published at https://sachanganesh.com on June 1, 2020.

The Data Dump

Discussion about this post

Ready for more?