Mapping Every Key, Part 1

A survey of common partial map data structures, and how total maps somewhat change the landscape

I recently found myself wanting an abstraction that denotes a total function, can be randomly generated, and can be efficiently updated like a dictionary. In the spirit of a side quest, I decided to overdo it.

It may seem that total maps might as well be implemented the same ways that partial maps are implemented, but as we will see, total maps require some slightly different considerations. This post is part 1 of a 3-part series. Part 1 is a brief survey of common dictionary data structures and why I am ultimately going to choose lazily-generated tries. Part 2 will derive the data structure and operations from a specification. Part 3 will present a Haskell implementation.

Partial Maps

Most dictionary data structures are partial maps. That is, not every possible key will have a corresponding value in the map. In an "empty" map, no keys map to anything at all, and a good implementation only uses a small constant amount of space. Here's a brief cost analysis of a few data structures for partial maps.

Hash tables

The most common data structure for partial maps is the hash table.

People normally claim that lookup, insertion, and deletion on a hash table that contains \(n\) elements take average (under the uniform hashing assumption) \(\mathcal{O}(1)\) time and worst-case \(\mathcal{O}(n)\) time. However, this assumes that hashing and comparing keys both take constant time. They don't! In order for a hash table to store \(n\) unique keys drawn from a universe of size \(N\), the average key must have size \(\mathcal{O}(\log(N))\). In other words, these operations take average \(\mathcal{O}(\log(N))\) and worst-case \(\mathcal{O}(n \log(N))\) time. It may seem pedantic, but without distinguishing \(N\) from \(n\), comparing hash tables against tries later would be unfair.

Hash tables normally have excellent constant factors, however. They remain the undisputed king of partial maps in most programming languages.

Balanced Binary Search Trees

A binary search tree usually requires more indirections than a hash table, and more key comparisons, but has the advantage that the keys are sorted. It's also possible to perform some operations more efficiently than is possible with a hash table, especially when the tree is purely functional, such as splitting one into two at a given key.

Asymptotically, balanced binary search trees are worse than hash tables' average times, but they avoid the hash table's bad worst case. Lookup, insertion, and deletion take \(\mathcal{O}(\log(n)\log(N))\) time (that's \(\mathcal{O}(\log(n))\) comparisons, each taking \(\mathcal{O}(\log(N))\) time).

Tries

In theory, tries have great asymptotics. In practice, they are not very common. Hash tables and binary search trees don't need to understand much about their keys in their representation, as long as they know how to compare them (and how to hash them, in the case of hash tables); they can just store pointers to them or store them directly. Tries are different, because the structure of a trie depends on the structure of its keys. This makes general tries harder to implement. Most implementations are for specific key types, like strings.

Tries usually have the most indirections to follow, because the depth to traverse equals the size of the key. On the other hand, each indirection takes only \(\mathcal{O}(1)\) time (or at least that is the common assumption). There is no key hashing or comparison required. If the path for a key exists in the trie, the value at the end of the path is the correct one. Lookup, insertion, and deletion simply take \(\mathcal{O}(\log(N))\) time.
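To make the shape concrete, here is a minimal sketch of a partial binary trie in Haskell, keyed by lists of bits (the names `Trie`, `lookupTrie`, and `insertTrie` are my own, not from any library). Each bit of the key selects one child, so every operation follows one pointer per bit:

```haskell
-- A binary trie: each node may hold a value (for the key ending here)
-- and may have a child for a 0 bit and a child for a 1 bit.
data Trie v = Node (Maybe v) (Maybe (Trie v)) (Maybe (Trie v))

emptyTrie :: Trie v
emptyTrie = Node Nothing Nothing Nothing

-- Lookup walks one edge per bit of the key: O(key size) steps.
lookupTrie :: [Bool] -> Trie v -> Maybe v
lookupTrie []           (Node mv _ _) = mv
lookupTrie (False : bs) (Node _ l _)  = l >>= lookupTrie bs
lookupTrie (True  : bs) (Node _ _ r)  = r >>= lookupTrie bs

-- Insertion rebuilds (persistently) the path for the key,
-- creating missing nodes along the way.
insertTrie :: [Bool] -> v -> Trie v -> Trie v
insertTrie []           v (Node _  l r) = Node (Just v) l r
insertTrie (False : bs) v (Node mv l r) =
  Node mv (Just (insertTrie bs v (maybe emptyTrie id l))) r
insertTrie (True  : bs) v (Node mv l r) =
  Node mv l (Just (insertTrie bs v (maybe emptyTrie id r)))
```

A key drawn from a universe of size \(N\) has \(\mathcal{O}(\log(N))\) bits, which is exactly the traversal depth claimed above.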

Total Maps

A total map denotes a (total) function. Every key in the domain maps to a corresponding value. Because a total map must provide a mapping for every possible key, the choice of data structure for a total map is not as straightforward as for a partial map.

Hash Tables, Balanced Binary Search Trees, and Tries

You can use the same data structure that you would for a partial map. The catch is that every key has to be present. For small \(N\), this might be fine. For large \(N\), you won't have enough memory. It also means that constructing one from some function will require \(\mathcal{O}(N)\) function evaluations.

Asymptotically, \(n = N\), so you end up with:

  • hash tables: average case \(\mathcal{O}(\log(N))\), worst case \(\mathcal{O}(N \log(N))\)
  • binary search trees: \(\mathcal{O}(\log^2(N))\)
  • tries: \(\mathcal{O}(\log(N))\)
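For illustration, fully materializing a total map into an ordinary dictionary might look like this sketch, using `Data.Map` from the `containers` package (the `materialize` name is my own). Note the \(N\) up-front evaluations of the function, one per key in the universe:

```haskell
import qualified Data.Map as Map

-- Build a dictionary holding an entry for *every* key in the type.
-- This costs O(N) function evaluations and O(N) space, where N is
-- the size of the key universe.
materialize :: (Bounded k, Enum k, Ord k) => (k -> v) -> Map.Map k v
materialize f = Map.fromList [ (k, f k) | k <- [minBound .. maxBound] ]
```

This is only viable for small key types like `Bool` or `Word8`; for something like `Int64`, the universe is far too large to enumerate.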

Constant With Overrides

You can represent a total map as a constant value paired with a partial map of overrides. To look up a key, you check the overrides map first, falling back to the constant when absent. This reduces the space usage to be on the order of the number of overrides, and it brings back the original asymptotics of whichever data structure you use for the partial map, just in terms of the number of overrides instead of the number of mappings. However, it obviously can only handle a constant "default" case, not an arbitrary function.
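A minimal sketch of this representation in Haskell, again on top of `Data.Map` (the type and function names here are my own invention):

```haskell
import qualified Data.Map as Map

-- A total map: a default value plus a partial map of overrides.
data TotalMap k v = TotalMap v (Map.Map k v)

-- The constant total map has no overrides.
constMap :: v -> TotalMap k v
constMap d = TotalMap d Map.empty

-- Lookup checks the overrides, falling back to the default.
lookupTM :: Ord k => k -> TotalMap k v -> v
lookupTM k (TotalMap d m) = Map.findWithDefault d k m

-- Updating a key just records an override.
insertTM :: Ord k => k -> v -> TotalMap k v -> TotalMap k v
insertTM k v (TotalMap d m) = TotalMap d (Map.insert k v m)
```

Space and lookup cost now scale with the number of overrides rather than with the size of the key universe.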

Function With Overrides

You can represent a total map as a function paired with a partial map of overrides. This lifts the restriction of only having a constant fallback. The asymptotics remain identical to the constant-with-overrides data structure. On the other hand, if the function is slow, looking up anything that hasn't been overridden is just as slow. The map of overrides can optionally be used as a memo table, caching the results of function applications even if they are not overridden, but this has the tradeoff of growing in size with the number of queried keys.
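The same sketch with the constant generalized to a function (names again hypothetical). Thanks to Haskell's laziness, `Map.findWithDefault` only evaluates its default argument when the key is absent, so the fallback function runs only for non-overridden keys:

```haskell
import qualified Data.Map as Map

-- A total map: a fallback function plus a partial map of overrides.
data FnMap k v = FnMap (k -> v) (Map.Map k v)

fromFunction :: (k -> v) -> FnMap k v
fromFunction f = FnMap f Map.empty

-- Lookup checks the overrides first; laziness ensures (f k) is
-- only evaluated when k has no override.
lookupFM :: Ord k => k -> FnMap k v -> v
lookupFM k (FnMap f m) = Map.findWithDefault (f k) k m

overrideFM :: Ord k => k -> v -> FnMap k v -> FnMap k v
overrideFM k v (FnMap f m) = FnMap f (Map.insert k v m)
```

A memoizing variant would insert `(k, f k)` into the map on every lookup miss, trading space for repeated-query speed.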

Lookup Table

For a sufficiently small domain that also has an efficient mapping to array indices, you could just use an array. This is about as optimal as it gets, with \(\mathcal{O}(1)\) lookup and update, but does not generalize well.
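For example, with `Data.Array` (from the `array` package that ships with GHC), any `Bounded` key type with an `Ix` instance gets a total map for free; this `tabulate` helper is my own name for the construction:

```haskell
import Data.Array

-- Tabulate a function over its entire (small!) key space.
-- Lookup is (!) and update is (//), both via direct indexing.
tabulate :: (Ix k, Bounded k) => (k -> v) -> Array k v
tabulate f =
  listArray (minBound, maxBound) (map f (range (minBound, maxBound)))
```

As with any materialized representation, this costs one function evaluation per possible key up front, so it only makes sense for small domains.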

Lazily-Generated Structures

A neat thing about knowing that every key will be present is that you know statically what shape the fully-expanded data structure will have. This doesn't help a whole lot in the case of a hash table, but things are a bit more interesting with binary search trees and tries, because they start small and grow on demand. They have most of the downsides of the function-with-overrides-and-memoization approach, but have one advantage: they are purely functional.

Between binary search trees and tries, the latter is a much better fit. We're already basing the internal structure of the map on the space of keys, so we lose the simplicity of implementation that normally makes binary trees attractive. Worse, lazily generating a binary search tree requires storing keys in the internal nodes, even though the tree must be complete anyway, meaning that information is already implied by the paths themselves; at that point it might as well be a trie. Binary search trees as total maps are both more redundant and more complicated than tries.
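As a preview of where this is heading (the real derivation comes in Part 2, and every name below is a placeholder of mine), a lazily-generated total trie over bit-list keys can be a conceptually infinite structure whose nodes are only forced on demand:

```haskell
-- A *total* trie: every node holds a value and both children
-- always exist. Laziness means nodes are built only when forced.
data TotalTrie v = TotalTrie v (TotalTrie v) (TotalTrie v)

-- Tabulate a total function over bit-list keys as an infinite,
-- lazily-expanded trie.
tabulateT :: ([Bool] -> v) -> TotalTrie v
tabulateT f =
  TotalTrie (f [])
            (tabulateT (f . (False :)))
            (tabulateT (f . (True :)))

-- Lookup follows one edge per bit; no Maybe needed, since every
-- path exists by construction.
lookupT :: [Bool] -> TotalTrie v -> v
lookupT []           (TotalTrie v _ _) = v
lookupT (False : bs) (TotalTrie _ l _) = lookupT bs l
lookupT (True  : bs) (TotalTrie _ _ r) = lookupT bs r

-- Update rebuilds only the path to the key, sharing the rest,
-- so the structure stays purely functional.
updateT :: [Bool] -> v -> TotalTrie v -> TotalTrie v
updateT []           v (TotalTrie _ l r) = TotalTrie v l r
updateT (False : bs) v (TotalTrie x l r) = TotalTrie x (updateT bs v l) r
updateT (True  : bs) v (TotalTrie x l r) = TotalTrie x l (updateT bs v r)
```

Only the paths that are actually looked up or updated ever get materialized, which is what makes this representation practical despite denoting a total function.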

Let's Try Tries

Lazily-generated tries are the best fit for me. They allow me to use Haskell's random package, which provides pure, splittable PRNGs, to lazily generate them, and they are efficiently updatable. As we'll see in the next post, tries can also be explained with some cute theory. We can easily derive the representation and functions from a specification in Part 2, and when we implement it in Haskell in Part 3, we will get the laziness for free.