Splay Top Trees

The top tree data structure is an important and fundamental tool in dynamic graph algorithms. Top trees have existed for decades, and today serve as an ingredient in many state-of-the-art algorithms for dynamic graphs. In this work, we give a new direct proof of the existence of top trees, facilitating simpler and more direct implementations of top trees, based on ideas from splay trees. This result hinges on new insights into the structure of top trees, and in particular the structure of each root path in a top tree. In amortized analysis, our top trees match the asymptotic bounds of the state of the art.


Introduction
An interesting topic in graph algorithms is dynamic graphs: graphs that undergo updates or changes. Throughout the past half century, fundamental algorithmic questions about graphs such as connectivity, minimum spanning tree, shortest paths, matchings, and planarity have been studied in the dynamic setting, where edges may appear and disappear adversarially [20,18,27,46,34,48,38,32,40,16,47,26,1,35,41,39,44,6,9,8,25,24,19,17,29]. For such problems, researchers pursue a data structure with polylogarithmic (and as efficient as possible) algorithms for updates and queries, or (conditional) lower bounds certifying that such algorithms are unlikely to exist. For a wide range of problems, polylogarithmic-time data structures do indeed exist, and often such algorithms use top trees as a tool towards efficient solutions.
The top tree [4] is a data structure that can be used in combination with any (explicitly maintained) dynamic spanning forest of the changing graph. The strength of top trees is that they can be used to efficiently store concise summaries of information about subtrees of a tree in a way that allows for efficient updates and queries. If we think of the dynamic graph problem as a calculation, the top tree stores a tree of summaries of properties of subgraphs, and upon changes to the graph, we only need to recompute a logarithmically bounded number of subgraph summaries.
More formally, the basic structure of a top tree can be seen as a binary tree of contractions in the underlying tree. The leaf nodes of the top tree are the edges in the underlying tree, and the internal nodes correspond to the union, or merge, of the two children. Each node must behave as a "generalized edge" in the sense that it may share at most two vertices with the rest of the graph. We call the set of edges corresponding to a node in the top tree a cluster. Top trees build on similar principles as topology trees, which is the tool introduced and used by Frederickson [20] to get the first non-trivial update algorithms for connectivity-related questions about dynamic graphs [20,21,33]. However, top trees and topology trees differ in a fundamental way: while top trees use a notion of clusters that generalizes edges, topology trees use a notion of clusters that generalizes low-degree vertices.
Not only do top trees build on similar principles as topology trees, the original proof that top trees of logarithmic height can be maintained goes via reduction to topology trees [4, Section 6.2]. Most implementations of top trees described to date either take the form of an extra interface imposed on topology trees or are implemented as an interface on Sleator and Tarjan's link-cut trees [45].
In this work, we show a different route to top trees, and provide new proofs of top tree properties. We show that top trees with logarithmic amortized update times may be directly implemented in a manner akin to splay trees. We claim that such direct implementations are simpler, and provide pseudo-code and C code as proof of concept. While our approach is indeed simple to implement, it is not trivial that it has good theoretical guarantees, and we give proofs of its efficient amortized update times.
As with the state-of-the-art implementation by Tarjan and Werneck [45], a caveat of our work is that all running times are only amortized. However, there are many results involving top trees for which this is not a problem. For one, there are many dynamic problems for which the efficient (polylog-time) solution using top trees is already amortized. Examples include biconnectivity, 2-edge connectivity, planarity testing, and diameter [27,46,30,28,29,4]. Secondly, dynamic algorithms have a range of applications in static problems, where the solution to the static problem is computed via a sequence of subproblems, dynamically extracted from the original problem [10,23,37,5]. When used as a subroutine this way, amortized running time is just as valuable as worst-case.

Related work.
Data structures for dynamic forests include link-cut trees, which were introduced by Sleator and Tarjan [42], who used them, among other applications, to solve flow problems. Topology trees were introduced by Frederickson [20] and used to obtain the first sublinear-time algorithms for dynamic graph connectivity. Later, top trees [4] were introduced and used to give new polylogarithmic update-time algorithms for connectivity and related problems [27,31]. Splay trees were introduced by Sleator and Tarjan [43] under the name of Self-adjusting Binary Search trees, as a simple way to obtain efficient amortized time per operation, without guaranteeing that the tree is always balanced. The same paper also introduced the notion of semi-splaying as an attempt to reduce the total number of rotations by a constant, while still guaranteeing efficient amortized time per operation. In [12] it was confirmed empirically that using semi-splaying, "the practical performance is better for a very broad variety of access patterns".
Splay trees have been conjectured to be dynamically optimal [14,13], meaning that for any sequence of operations, they do at most a constant factor more work than an optimal algorithm that knows the whole sequence of operations in advance. The subject of dynamic optimality has been addressed also in connection with Tango trees [15], and in recent advances on search trees on trees [11,7].
In practice, the state-of-the-art implementation of top trees is by Tarjan and Werneck [45]. Their implementation is based on link-cut trees that are enhanced with extra information, and their update times are amortized. An experimental evaluation of topology trees in comparison with link-cut trees is given by Frederickson [22], and Albers and Karpinski [2] study splay trees in theory and practice, proposing a randomized variant with fast (i.e. O(log n)) expected worst-case time per operation.

Overview of paper.
First, in Section 2, we give an overview of the data structure and the operations we support. Then, Sections 3, 4, 5, 6, and 7 are dedicated to the details of those data structures and operations: In Section 3, we describe some very simple local queries in the top tree. In Section 4, we describe the rotation-like subroutine that the splay tree uses. Section 5 is devoted to splaying, and we describe and prove structural properties that make both semi-splaying and splaying easy to implement in O(log n) amortized time. Section 6 describes an important internal operation used as a subroutine by the top tree for several of its fundamental operations. Section 7 describes the dynamic operations that are called directly by the user of the top tree: expose, deexpose, link, and cut. In Section 8, we provide practical information about our implementation, which is available for download. In Appendix A, we describe a more complicated way to implement expose that does not rely on full_splay. This reduces the number of changes to the top tree, and we conjecture that it is faster in practice.

Overview of the data structure
A top tree is a data structure that represents an underlying tree, usually from some underlying forest. A top tree can be augmented to store additional information, e.g. edge weights, and to answer various types of queries on the underlying tree. It does this by maintaining a hierarchy of summaries of certain connected subgraphs of the underlying tree, from which the answers can be efficiently computed. Up to two vertices in each underlying tree can be marked as exposed, which affects the structure of the top tree and what summaries are stored, and therefore which queries the top tree is ready to answer quickly at any given time. With no exposed vertices this could e.g. be the size or diameter of the underlying tree. With one vertex exposed, we can think of the underlying tree as rooted in that vertex and we might e.g. answer queries about the height. With two vertices exposed, that would typically be some question related to the path between them, e.g. the length or the maximum edge weight. See [4] for more details.

External operations
Using $T_v$ to refer to the tree in the underlying forest containing the vertex v, and $\mathcal{T}_v$ to refer to the corresponding top tree, the operations we support are as follows:

expose(v):
Makes v exposed, and returns the root of $\mathcal{T}_v$. Requires that the tree $T_v$ containing v has at most 1 exposed vertex and that v is not currently exposed. Does not change which node is the root node of $\mathcal{T}_v$.

deexpose(v):
Makes v not exposed, and returns the root of $\mathcal{T}_v$. Requires that v is currently exposed in the tree $T_v$ containing it. Does not change which node is the root node of $\mathcal{T}_v$.

link(u, v):
Creates a new edge (u, v) in the forest and returns the root of the top tree for the resulting tree. Requires that u, v are in disjoint trees $T_u$, $T_v$ and that neither tree has any exposed vertices.

cut(e):
Deletes the edge e with endpoints u, v from the forest and returns the roots of the two resulting top trees $\mathcal{T}_u$, $\mathcal{T}_v$. Requires that the tree containing e has no exposed vertices.
This differs slightly from the standard description of top trees where expose is usually defined to take up to two arguments, but it should be clear that it is equivalent in terms of power. We give a direct proof that the amortized cost of each operation is O(log n).

Structure
The top tree is itself a rooted binary tree of nodes, where each node corresponds to a cluster, which is a connected set of edges in the underlying tree. Each leaf in the top tree corresponds to a single edge in the underlying tree, and each internal node corresponds to the disjoint union of its two children.
We say that a vertex is a boundary vertex of a cluster if it is incident to an edge in the cluster and either is exposed or is also incident to an edge outside the cluster. A cluster is valid if it has at most two boundary vertices. Conceptually, the boundary vertices lie on the boundary between the cluster and the rest of the tree, and since valid clusters have at most two boundary vertices, valid clusters can be thought of as "generalized edges" with the endpoints being the boundary vertices. It is possible for such a generalized edge to have fewer than two endpoints, but we can think of them as normal edges whose "missing" endpoints are irrelevant to us.
The top tree must satisfy the invariant that all clusters are valid. We note that this invariant requires that each tree in the underlying forest has at most two exposed vertices, because the root of the top tree would otherwise have too many boundary vertices.
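The definition of boundary vertices can be checked by brute force on small examples. The following is a minimal sketch, not part of the paper's implementation: the function names, the edge-list representation, and the example tree are all assumptions made for illustration, and the cluster is assumed to be connected already.

```python
from collections import defaultdict

def boundary_vertices(cluster_edges, tree_edges, exposed=frozenset()):
    """Boundary vertices of a cluster, straight from the definition.

    A vertex is a boundary vertex if it touches an edge of the cluster and
    is either exposed or also touches an edge outside the cluster.
    """
    cluster = {frozenset(e) for e in cluster_edges}
    incident = defaultdict(set)          # vertex -> edges of the whole tree at it
    for e in tree_edges:
        e = frozenset(e)
        for v in e:
            incident[v].add(e)
    boundary = set()
    for e in cluster:
        for v in e:
            if v in exposed or not incident[v] <= cluster:
                boundary.add(v)
    return boundary

def is_valid_cluster(cluster_edges, tree_edges, exposed=frozenset()):
    """Valid means at most two boundary vertices (connectivity is assumed)."""
    return len(boundary_vertices(cluster_edges, tree_edges, exposed)) <= 2

# Hypothetical example: the path a-b-c-d with an extra leaf e hanging off b.
tree = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]
```

In this example the single edge (a, b) is a point cluster with boundary vertex b, while {(a, b), (b, c)} is a path cluster with boundary vertices b and c. Exposing a would give that second cluster three boundary vertices, so it could not appear in a top tree.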
We categorize the (valid) clusters in the top tree as either path clusters or point clusters. A path cluster is a cluster with two boundary vertices, and a point cluster is a cluster with zero or one boundary vertices. For each internal node in the top tree, we also define its central vertex as the vertex shared by its two children. The set of boundary vertices in a node is always equal to the union of boundary vertices in its children, with or without the central vertex removed.

Figure 1: An example of a top tree that satisfies the orientation invariant.

Having some convention for what orientation is chosen for each node helps reduce the number of cases that need to be considered, both for the operations explored in this paper, and for any augmentations that need to maintain additional information as part of the stored summaries. In this paper we require the orientations in the top tree to satisfy the following orientation invariant: For any internal node, the leftmost boundary vertex of the right child and the rightmost boundary vertex of the left child must both exist and be equal to the central vertex of the node.

Orientation invariant
The invariant determines the orientation of every internal non-root node relative to the orientation of its parent, except for point clusters whose children are both point clusters.
A larger example of a top tree that satisfies the orientation invariant can be found in Figure 1. Note that $b_5$ has its only boundary vertex in the middle, so the subtree rooted at $b_5$ could be mirrored without breaking the invariant.
To maintain the orientation invariant as the structure changes, we sometimes have to flip the orientation of a whole subtree.

Internal operations
To support the external operations (link, cut, expose, deexpose) we introduce a collection of internal operations. Some of these are probably not directly useful for applications using the top tree, but form the basis for implementing the external operations.

is_point(node), is_path(node)
Returns whether node is a point cluster or a path cluster.

flip(node)
To maintain the orientation invariant as the structure changes, we sometimes have to flip the orientation of a whole subtree. Flipping can be done by reversing the left-to-right order in every node of the subtree, but for efficiency we need to do this in a lazy fashion. The idea is to just store a boolean flip bit in each node, indicating whether or not to conceptually flip the whole subtree rooted at that node. This operation therefore just inverts that bit, which trivially takes constant time. Since it is literally just that, our pseudocode manipulates that bit directly rather than calling a function to do it.
Since all our logic for the rest of the internal operations is symmetric, calling push_flip on the constant number of nodes where the relative orientation is relevant is sufficient to ensure the part of the top tree we are working on is stored in a way that we can ignore the flip bits in much of the rest of the logic.
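The lazy flip can be illustrated with a small sketch. The text does not spell out push_flip, so the version below is an assumption based on standard lazy propagation: it applies a pending flip to the node's own child order and defers the rest to the children. The Node class and the helper leaf_order (which reads off the conceptual left-to-right order without modifying anything) are hypothetical names introduced for this example.

```python
class Node:
    def __init__(self, left=None, right=None, endpoints=None):
        self.left = left            # None for leaves
        self.right = right
        self.endpoints = endpoints  # (left endpoint, right endpoint) for leaves
        self.flip = False           # pending mirror of the whole subtree

def push_flip(node):
    """Resolve node's flip bit locally and hand the flip down one level."""
    if not node.flip:
        return
    node.flip = False
    if node.left is None:
        # Leaf: mirroring an edge swaps its two endpoints.
        node.endpoints = node.endpoints[::-1]
    else:
        node.left, node.right = node.right, node.left
        node.left.flip = not node.left.flip
        node.right.flip = not node.right.flip

def leaf_order(node, inherited=False):
    """Conceptual left-to-right list of edges, honoring pending flip bits."""
    flipped = inherited != node.flip
    if node.left is None:
        return [node.endpoints[::-1] if flipped else node.endpoints]
    first, second = (node.right, node.left) if flipped else (node.left, node.right)
    return leaf_order(first, flipped) + leaf_order(second, flipped)

# Hypothetical example: two edges a-b and b-c merged into one cluster.
e1 = Node(endpoints=("a", "b"))
e2 = Node(endpoints=("b", "c"))
root = Node(left=e1, right=e2)

before = leaf_order(root)
root.flip = not root.flip       # flip(root): constant time
mirrored = leaf_order(root)
push_flip(root)                 # stored layout changes, meaning does not
after_push = leaf_order(root)
```

Inverting the bit mirrors the conceptual order of the whole subtree in constant time, and push_flip changes only how that order is stored.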

has_left_boundary(node), has_middle_boundary(node), has_right_boundary(node)
Returns whether node has a left, middle, or right boundary vertex respectively. These functions can be implemented in constant time if we know the number of boundary vertices of each cluster. In practice, it is useful to ignore the flip bits of the proper ancestors of node, but not the flip bit on node itself when defining what left and right means for these functions. In other words, the result is as if push_flip had been called on node first, so it is consistent with the orientation of the parent.

rotate_up(node)
This is similar to the rotate operation known from binary search trees. See Figure 2 for an illustration.
There are two major differences between rotate_up and rotations in binary search trees. One is that rotations are not always allowed. The rotate_up function is only allowed if sibling(node) ∪ sibling(parent(node)) is a valid cluster. The other difference is that it does not always preserve the ordering of the leaves.
The operation can be defined in the following way: Swap the parent pointers of node and sibling(parent(node)), then adjust child pointers, orientations, and any other remaining fields in a way such that all invariants are satisfied. It can be shown that this is always possible if sibling(node) ∪ sibling(parent(node)) is a valid cluster. Note here that in binary search trees, one typically calls the rotate operation on the node (A ∪ B), whereas in our case, that would be ambiguous. Instead, we define rotate_up to take the child of (A ∪ B) whose depth is reduced.

semi_splay_step(node)
This finds and executes one or two rotate_up operations in the top tree that are valid and together reduce the depth of node by one. It returns the root of the changed subtree. This is really the core of our new algorithm. We prove that if the depth of node is at least five, this operation always succeeds and only needs to look at a small constant number of nodes.
Conceptually, the semi-splay step serves the purpose that we use rotations for in binary search trees: decreasing the depth of the given node. We cannot use rotate_up for that directly because we aren't always allowed to call that function. However, rotate_up is allowed sufficiently often that we can still implement splay trees, and the semi-splay step is best thought of as a procedure that searches for such a rotation; understanding the precise details of how that happens is not important for understanding the rest of the paper.

semi_splay(node)
This uses semi_splay_step repeatedly but in a slightly less naïve way. The amortized cost of this operation is O(log n) − Ω(depth(node)), which means that it can be used to pay for other operations whose natural cost is O(depth(node)).
This also guarantees that the depth of node is reduced by a constant factor, and is similar to the balancing operation used in semi-splay trees.

full_splay(node)
This also uses semi_splay_step repeatedly. In addition to the guarantees that semi_splay provides, this method also guarantees that node is moved to have depth at most 4.
It does this by considering two semi_splay_step calls at a time, and is similar to the logic in splay trees where we look for either a Zig, a Zig-Zig, or a Zig-Zag step.

find_consuming_node(v)
Returns the consuming node of a vertex, defined as the lowest common ancestor in the top tree of all edges incident to the vertex. More details and pseudocode can be found in Section 6. Note that although the query itself does not require any changes to the tree, our implementation will perform a semi-splay in the tree first to make the amortization work. This is similar to a search in a standard splay tree, where you also need to do a splay during or after each search.
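The definition of the consuming node can be made concrete with a brute-force reference version. This is only a sketch of the definition and is not the method of Section 6: it walks whole root paths, performs no splaying, and takes time proportional to the degree of the vertex times the depth of the tree. The Node class and the function names are assumptions made for illustration.

```python
class Node:
    def __init__(self, name, left=None, right=None):
        self.name = name
        self.parent = None
        self.left = left
        self.right = right
        for child in (left, right):
            if child is not None:
                child.parent = self

def root_path(node):
    """Nodes from node up to the root of its top tree, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path

def consuming_node_bruteforce(incident_leaves):
    """Lowest common ancestor of the leaves for the edges at one vertex.

    Returns None for an isolated vertex (no incident edges).
    """
    if not incident_leaves:
        return None
    first, *rest = [root_path(leaf) for leaf in incident_leaves]
    others = [{id(n) for n in path} for path in rest]
    for node in first:  # scanned bottom-up, so the first hit is the lowest
        if all(id(node) in ids for ids in others):
            return node
    return None

# Hypothetical example: path a-b-c-d with edges ab, bc, cd.
ab, bc, cd = Node("ab"), Node("bc"), Node("cd")
x = Node("ab+bc", ab, bc)
root = Node("ab+bc+cd", x, cd)
```

Here the consuming node of b is x, the consuming node of c is the root, and the consuming node of the degree-one vertex a is the leaf ab itself.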

Data structure
We store the top tree as a rooted binary tree. For each node, we store the following information:
• A pointer to the parent node, or a null pointer if it has no parent.
• For internal nodes, a left and right child pointer. Neither pointer may be null.
• For leaf nodes, a vertex id or pointer for the left and right endpoints of the edge. (Or a pointer to the edge, which in turn stores the left and right endpoints.)
• A counter storing the number of boundary vertices of the node. This can be zero, one or two.
• A flip bit for the subtree rooted at that node. It represents whether the subtree rooted at the node conceptually needs to be flipped relative to its parent.
It is also necessary to store the underlying tree itself. This can be done by storing the edges incident to each vertex in a linked list. The necessary operations on the underlying tree are: insert/delete edge, a way to get any edge incident to a given vertex, a way to get the endpoints of an edge, and a way to determine whether the degree of a vertex is "zero or one" or "at least two". (We have queries like degree(v) >= 2, but we do not actually need the precise number. This is convenient when storing the edges incident to a vertex in a linked list, as we do not need a separate counter for its length.)
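The per-node fields listed above map naturally onto a small record type. The sketch below is one possible layout in Python rather than the paper's C structure; the field names, the make_internal helper, and the example are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(eq=False)
class Node:
    parent: Optional["Node"] = None
    left: Optional["Node"] = None                # internal nodes only
    right: Optional["Node"] = None               # internal nodes only
    endpoints: Optional[Tuple[int, int]] = None  # leaf nodes only
    num_boundary: int = 0                        # zero, one or two
    flip: bool = False                           # lazy flip of the subtree

def is_leaf(node):
    return node.left is None

def is_path(node):
    return node.num_boundary == 2

def is_point(node):
    return node.num_boundary < 2

def make_internal(left, right, num_boundary):
    """Merge two clusters under a new parent and set the parent pointers."""
    node = Node(left=left, right=right, num_boundary=num_boundary)
    left.parent = node
    right.parent = node
    return node

# Hypothetical example: the path 0-1-2 with no exposed vertices. Each edge
# has the shared vertex 1 as its only boundary vertex; the root has none.
e01 = Node(endpoints=(0, 1), num_boundary=1)
e12 = Node(endpoints=(1, 2), num_boundary=1)
root = make_internal(e01, e12, num_boundary=0)
```

Storing only the boundary count, and not the boundary vertices themselves, is enough for is_point and is_path, which is what keeps these queries constant time.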

User data
Most applications of top trees require that you store some kind of user-data in the vertices or edges in the underlying tree, and summaries based on that user-data in each cluster of the top tree. Normally, this user-data must be computed "bottom-up", and the user-data usually doesn't support changes in the middle of the top tree (e.g. rotations). However, it turns out that this is not actually a problem for our operations, because they always operate on the entire root path, and not only locally in the middle of the top tree. This means that when modifying the top tree, you can first destroy the user-data on the nodes of the root path, then you can run the algorithm from this paper, and then you can rebuild the user-data on nodes of the new root path. Rebuilding can be done either at the end of each operation, or deferred until the data is actually needed. Either way, the amortized number of times node user-data must be (re)computed stays O(log n).

Detecting boundary vertex positions
Our other internal operations need to determine whether the boundary vertices are to the left, middle, or to the right. This can be done by looking at how many boundary vertices the node and its children have. The operations can be implemented as follows. Checking whether a node has a left boundary vertex can be done by checking whether the appropriate child is a path cluster, because if the child is a point cluster, then its only boundary vertex is the central vertex of our node, and if the child is a path cluster, then it has a boundary vertex different from the central vertex, and that extra boundary vertex is also a boundary vertex of the parent. The code for has_middle_boundary works because left + middle + right is the total number of boundary vertices. (Using the convention that booleans can be used interchangeably with the integers zero and one.)
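The reasoning above can be turned into a short sketch for internal nodes. This is a reconstruction from the prose and not the paper's pseudocode: the Node class is a stand-in, leaf nodes are not handled, and the way the node's own flip bit selects the "appropriate child" is an assumption consistent with the statement that the result is as if push_flip had been called on the node first.

```python
class Node:
    def __init__(self, num_boundary, left=None, right=None, flip=False):
        self.num_boundary = num_boundary
        self.left = left
        self.right = right
        self.flip = flip

def is_path(node):
    return node.num_boundary == 2

def has_left_boundary(node):
    # The conceptual left child is the stored right child when flipped.
    child = node.right if node.flip else node.left
    return is_path(child)

def has_right_boundary(node):
    child = node.left if node.flip else node.right
    return is_path(child)

def has_middle_boundary(node):
    # left + middle + right equals the total number of boundary vertices,
    # with booleans counted as the integers zero and one.
    return bool(node.num_boundary
                - has_left_boundary(node)
                - has_right_boundary(node))

# Hypothetical example: a path cluster on the left, a point cluster on the
# right, and a parent whose two boundary vertices are the far end of the
# left child and the central vertex.
left_child = Node(num_boundary=2)
right_child = Node(num_boundary=1)
parent = Node(num_boundary=2, left=left_child, right=right_child)
flipped_parent = Node(num_boundary=2, left=left_child, right=right_child, flip=True)
```

For parent this reports a left and a middle boundary vertex but no right one; for flipped_parent the left and right answers swap while the middle answer is unchanged.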

Rotations
The rotation is one of the basic operations that the top tree algorithms are built on. To move a given node up, we swap it with its parent's sibling. The rotate_up operation is allowed if and only if sibling(node) ∪ sibling(parent(node)) is a valid cluster. There are no guarantees about which orientations the nodes are given, except that they satisfy the orientation invariant. When performing a rotation, the underlying tree can look in one of two ways as illustrated in Figure 3. Additionally, the orientations of the top tree can appear with node and sibling to the same or opposite sides as illustrated in Figure 4. Using this, we can analyze the implementation of rotate_up on a case-by-case basis.
First, we should argue that to_same_sides && sibling_is_path is true if and only if we are rotating around a path. If we assume that we are rotating around a path, then sibling must be a path cluster, and it follows from the orientation invariant that we are in the case of Figure 4a. If we assume that we are rotating around a star and sibling is a path cluster, then we can't be in case Figure 4a [...] to opposite sides of sibling due to the orientation invariant.
We now show correctness for rotations around a path like in Figure 3a. We know that we are in Figure 4a, and the orientations are updated such that node, sibling, and uncle appear in the same order before and after the rotation. This means that the orientations match without having to flip node, sibling, uncle, or parent. The line that computes new_parent_is_path needs to be true when {sibling, uncle} has two boundary vertices. That cluster always has the boundary vertex between sibling and node, so we just need to check whether one of the two other vertices is a boundary vertex. The vertex between sibling and uncle is a boundary vertex of the new parent if and only if it is a boundary vertex of the old grandparent, and it is the central vertex of the old grandparent, so we can check that by asking whether grandparent had a middle boundary vertex. The vertex at the other end of uncle is a boundary vertex of {sibling, uncle} whenever it is a boundary vertex of uncle. We can check this by asking whether uncle has two boundary vertices.
The code that looks at ggp checks for the situation where {node, sibling, uncle} has a single boundary vertex in the middle before the rotation, and a single boundary vertex to the left or right after the rotation.If the new boundary vertex is to the right, then the grandparent no longer has a leftmost boundary vertex, so if the grandparent is a right child, we need to flip it to avoid breaking the orientation invariant at ggp.
We now consider rotations on a star like in Figure 3b. We first consider the case in Figure 4a, where sibling must be a point cluster to avoid violating the orientation invariant. Here, we put the nodes together such that node, sibling, and uncle appear in the same order before and after the rotation. We flip sibling because it changes between being a left and right child. (The flip isn't necessary when sibling has the boundary vertex in the middle, but it is also not incorrect to flip in this case.) The new cluster is a path cluster exactly when uncle is a path cluster since we know that sibling is a point cluster. The orientation of the grandparent remains correct since the rotation does not change which vertex is to the left, in the middle, and to the right.

Copyright © 2023
This paper is available under the CC-BY 4.0 license 313 Downloaded 03/30/23 to 192.38.90.17 . Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/terms-privacy

Next, we consider rotations on a star like in Figure 4b with sibling being a point cluster. Here, node must be a point cluster to avoid violating the orientation invariant. The only two possible boundary vertices of the new parent and grandparent are the two endpoints of uncle. Thus, the new parent is a path cluster when uncle is. To avoid breaking the orientation invariant above grandparent, we note that the code puts the tree back together such that the two endpoints of uncle do not change positions in grandparent.
The last case to consider is Figure 4b with sibling being a path cluster. Then, both node and uncle must be point clusters, because otherwise we have a cluster with three boundary vertices before or after the rotation. Thus, the only two possible boundary vertices of the new parent and grandparent are the two endpoints of sibling. Thus, parent is a path cluster when sibling is. To avoid breaking the orientation invariant above grandparent, we note that the code puts the tree back together such that the two endpoints of sibling do not change positions in grandparent.
The above analysis was based on the two cases in Figure 4, but the two cases can also appear flipped.The algorithm also works in those cases because the code only cares about whether things hang off to the same or different sides, and not which side is left or right, so it treats the flipped cases identically.
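The case analysis above determines, in each situation, whether the merged cluster {sibling, uncle} is a path cluster after the rotation. The following sketch only summarizes those conclusions as a single function. It is not the paper's rotate_up code: it performs no pointer or flip updates, and the parameter names are chosen here to echo the identifiers mentioned in the text.

```python
def new_parent_is_path(to_same_sides, sibling_is_path,
                       uncle_is_path, gparent_has_middle_boundary):
    """Whether {sibling, uncle} has two boundary vertices after rotate_up."""
    if to_same_sides and sibling_is_path:
        # Rotating around a path (Figures 3a and 4a): the vertex between
        # sibling and node is always a boundary vertex, so one more is
        # needed, either the old central vertex or the far end of uncle.
        return gparent_has_middle_boundary or uncle_is_path
    if not sibling_is_path:
        # Rotating around a star with sibling a point cluster
        # (Figure 4a or Figure 4b): the result follows uncle.
        return uncle_is_path
    # Star, opposite sides, sibling a path cluster (Figure 4b): node and
    # uncle are point clusters and both endpoints of sibling stay boundary
    # vertices of the new parent.
    return True

# One example per case, with hypothetical inputs.
path_case = new_parent_is_path(True, True, False, True)
star_same_side = new_parent_is_path(True, False, True, False)
star_opposite_point = new_parent_is_path(False, False, False, False)
star_opposite_path = new_parent_is_path(False, True, False, False)
```

A full rotate_up additionally has to rewire parent and child pointers and, in the cases discussed above, flip sibling or the grandparent so that the orientation invariant continues to hold.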

When are rotations valid?
The exact condition for when rotations are allowed is that sibling(node) ∪ sibling(parent(node)) is a valid cluster, but proving this directly every time we wish to make a rotation quickly becomes quite cumbersome. For this reason, we provide two lemmas that cover some cases in which rotations are allowed.
We note that since rotate_up may modify the orientations in any way it wants to as long as the invariants are satisfied, there are multiple correct ways to implement it.However, the lemmas below do not look inside our implementation, so they hold for any correct implementation of rotate_up.
First, however, we prove some helper lemmas. In the following we define the phrase "a and b hang off to the same side" as "both a and b are left children, or both are right children". The phrase "a and b hang off to opposite sides" will be its negation.
[...] was a connected set of edges, then they would share a vertex u, but this is impossible as there would be a path from u to w through b, then from w to v through a, then from v to u through x. This is a cycle in a tree. Thus, x ∪ b is not connected.
Using these helper lemmas, we can prove our lemmas for when rotations are allowed. The variable names in the lemmas are defined to match Figure 5.
Lemma 4.4. Let x be a valid cluster in a top tree. If x and its grandparent are both point clusters, then it is valid to call rotate_up(x).
Proof. Let y be the parent of x and let z be the parent of y. Let also a be the sibling of x and b be the sibling of y. We need to prove that a ∪ b is a valid cluster. To do that, we first argue that a and b share a vertex v. By Lemma 4.2, all boundary vertices of y are also boundary vertices of a. However, one of the boundary vertices of y is the one it shares with its sibling b. Thus, a and b share a boundary vertex.
It remains to show that a ∪ b is valid. Since x ∪ a already exists in the tree, the only boundary vertex in x must be the one it shares with a. Thus, the cluster (a ∪ b) ∪ x has at most one fewer boundary vertex than a ∪ b. Since (a ∪ b) ∪ x already exists in the tree as the point cluster z, this means that a ∪ b has at most two boundary vertices as desired.
Lemma 4.5. Let x, y, z, a, b be valid clusters in a top tree with z the parent of y, b and y the parent of x, a. If y is a path cluster and x hangs off to the same side as y, then it is valid to call rotate_up(x).
If z is a path cluster and not the root, then a ∪ b hangs off to the same side as z after the rotation if and only if b hung off to the same side as z before the rotation.
If z is a point cluster, then a ∪ b is a point cluster after the rotation.
Proof. We need to show that a ∪ b is a valid cluster. To show that it is connected, let v be the central vertex of z. It is a boundary vertex of its children y and b. Since y is a path cluster, it has another boundary vertex w, and due to the orientation invariant, the assumption that x, y hang off to the same side implies that w is a boundary vertex of x and v is a boundary vertex of a. Thus, since a and b share the vertex v, they are connected.
Let us show that a ∪ b is a valid cluster. If a or b is a point cluster, then we are done by Lemma 4.1, so assume both are path clusters. Since y is also a path cluster, it follows from Lemma 4.1 that v is not a boundary vertex of z. Since z = x ∪ (a ∪ b), it follows that v can only be a boundary vertex of a ∪ b if v is the vertex shared with x, but x ∪ b is disconnected by Lemma 4.3, making it impossible for v to be a boundary vertex of x. Thus, v is not a boundary vertex of a ∪ b, so it is valid by Lemma 4.1.
Assume that z is a path cluster and that its sibling c exists. The cluster z has a boundary vertex that isn't in the middle. Either it comes from x ∪ a before and x after, or it comes from b before and a ∪ b after. If that boundary vertex is shared with c, then the child it comes from must hang off to the same side as c both before and after the rotation. If it isn't shared with c, then the child it comes from must hang off to the opposite side as c both before and after the rotation. The desired post-condition follows.
Assume that z is a point cluster. Since x, y hang off to the same side, the only boundary vertex of z must be w. However, since w is not in the middle at z, it must come exclusively from one child, and we know that it is a boundary vertex of x, so it must come from x. This implies that a ∪ b cannot be a path cluster after the rotation.

Splaying
Splaying is a well-known strategy for balancing binary search trees, and typically comes in two variants: full splays and semi-splays. Splaying does not translate verbatim to top trees since not all rotations are allowed in top trees, but it turns out that rotations are allowed sufficiently often that we can still implement splay-like operations. This section will describe both a semi-splay and a full splay, where the semi-splay will reduce the depth of the given node by a constant factor, and the full splay will reduce the depth to four or less.
If we only consider the asymptotic amortized running times in terms of n, then there is no reason to ever use a semi-splay, since the full splay provides strictly stronger guarantees and has the same amortized running time asymptotically.However, since semi-splays make fewer structural updates to the tree, they are generally faster by a constant factor, which we conjecture may be significant in practice.In this paper, we will only use a full splay when we need the additional guarantees it provides.
Our proofs that the operations run in amortized logarithmic time use the amortization potential $\Phi = \sum_{\mathcal{T}} \sum_{x \in \mathcal{T}} r(x)$, with $r(x) = \log_2(s(x))$ and $s(x)$ the number of leaves in the subtree rooted at x. This is similar to the potential used for splay trees, which uses the number of nodes rather than leaves in the subtree.
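As a small worked example of this potential, the sketch below evaluates the sum of r(x) over the nodes of a single top tree. The nested-tuple encoding (a leaf is a string label, an internal node is a pair of subtrees) and the function name are assumptions made for illustration; for a forest, the values of the individual top trees would be added together.

```python
import math

def potential(tree):
    """Sum of log2(number of leaves below x) over all nodes x of one tree."""
    def visit(node):
        if not isinstance(node, tuple):
            return 1, 0.0                      # a leaf: s = 1, r = log2(1) = 0
        left_size, left_pot = visit(node[0])
        right_size, right_pot = visit(node[1])
        size = left_size + right_size
        return size, left_pot + right_pot + math.log2(size)
    return visit(tree)[1]

# Two top-tree shapes over the same four edges.
balanced = (("e1", "e2"), ("e3", "e4"))
chain = ((("e1", "e2"), "e3"), "e4")

balanced_potential = potential(balanced)   # 1 + 1 + 2
chain_potential = potential(chain)         # 1 + log2(3) + 2
```

The balanced shape has potential 4, while the chain has potential 3 + log2(3), roughly 4.58. Deeper, more unbalanced trees carry more potential, and that stored potential is what pays for expensive operations in the amortized analysis.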

Semi-splay step
To implement this, we will first define a semi-splay step. Implementing a semi-splay or a full splay involves repeatedly calling the semi-splay step operation. Conceptually, the semi-splay step serves the purpose that rotations serve in binary search trees: decreasing the depth of the given node. We cannot use rotate_up for that directly because we aren't always allowed to call that function. However, rotate_up is allowed sufficiently often that we can still implement splay trees, and the semi-splay step is best thought of as a procedure that searches for such a rotation. Understanding the precise details of how that happens is not important for understanding the rest of the paper.
The pseudo-code for a semi-splay step is found below:

Since rotations are not always valid, the semi_splay_step operation will follow the root path $b_0, b_1, \ldots, b_k$ until it matches a pattern where the rotation is allowed. Once the pattern is detected, the algorithm knows that it has found two nodes hanging off the root path that can be merged, and it merges them using rotations. If the pattern doesn't match, it tries again by calling itself with $b_0$ being the previous $b_1$. The pattern is such that it can only fail a few times, and the first node in the match will be one of b

The patterns that the algorithm will match are:

Proof of Lemma 5.1. If the semi-splay step hits the is_point(node) && is_point(gparent) branch, then the rotation is allowed by Lemma 4.4. If we hit the node_is_left == parent_is_left branch, then since parent is a path cluster, the rotation is valid by Lemma 4.5.
If we hit the parent_is_left == gparent_is_left branch, then it must be the case that gparent is a path cluster. To see this, assume for contradiction that it is a point cluster, and assume without loss of generality that parent is a left child. Then gparent must have a left boundary vertex. This means that gparent has no rightmost boundary vertex, but it is a left child so its rightmost boundary vertex must exist. Thus, gparent is a path cluster, and the rotation is valid by Lemma 4.5.
If we hit the node_is_left == gparent_is_left branch, then since parent is a path cluster, the first rotation is valid by Lemma 4.5. If gparent is a path cluster (the rotation doesn't change this), then the first post-condition of Lemma 4.5 says that parent and gparent hang off to the same side after the rotation, so Lemma 4.5 says that the second rotation is valid. If gparent is a point cluster, then the second post-condition says that parent is a point cluster after the rotation, and this case is not reachable unless ggparent is also a point cluster, so the second rotation is allowed by Lemma 4.4.

Proof of Lemma 5.3. We get depth(x) ≤ 4 since the semi-splay step always succeeds for depth(x) ≥ 5 by Lemma 5.2. If the root cluster is a point cluster, then since we match all patterns ending in a point cluster, the failure to match must be because $b_3$ doesn't exist, so depth(x) ≤ 2. If x is also a point cluster, then the failure to match must be because $b_2$ doesn't exist, so depth(x) ≤ 1. Finally, if x is a point cluster and the root is a path cluster, then in every possible

Proof of Lemma 5.4. The semi-splay step either rotates once or twice. By drawing the tree before and after, we can see that if it rotates once, then the potential changes by $\Delta\Phi = r(\mathrm{sibling}(p^{\ell-1}(x'))) - r(p^{\ell}(x))$. If it rotates twice, then the potential changes by ∆Φ = r(sibling ). These expressions are equal to the expression in the Lemma because the remaining terms are just the same nodes added and subtracted before and after the operation, but none of those nodes had their number of leaves changed, so the extra terms cancel out.

Semi-splay
Using semi_splay_step as a subroutine, the pseudocode of semi_splay is:

This subroutine will reduce the depth of node by a constant factor. It does this by repeatedly calling semi_splay_step on an ancestor of node. It is important for the amortized analysis that the semi-splay steps do not "overlap", which the algorithm avoids because all nodes that the semi_splay_step modifies are descendants of the node that it returns.
We will need the following small lemma in our analysis:

In other words, the amortized cost of semi_splay is $O(1 + r(\mathrm{root}(x)) - r(x)) = O(\log n)$.

Each successful iteration reduces the depth of x by 1, so the resulting depth of x is at most $\mathrm{depth}(x) - \frac{1}{5}\,\mathrm{depth}(x) = \frac{4}{5}\,\mathrm{depth}(x)$ as claimed.

Full splay
Using semi_splay_step as a subroutine, the pseudocode for full_splay is:

This subroutine will move node close to the root so that its depth is bounded by a constant. It does this by alternating between calling semi_splay_step on node and on one of its ancestors top. If we just wanted to reduce the depth to a constant, then calling semi_splay_step(node) in a loop would suffice, but the second semi_splay_step is necessary to make the amortized running time work. This is similar to ordinary splay trees, which require Zig-Zig or Zig-Zag steps rather than simply repeating Zig steps.

Proof of Lemma 5.7. We consider the change of potential in an iteration of full_splay where both semi-splay steps succeed. We let $x$ be the node before the iteration, $x'$ the node after one semi-splay step, and $x''$ the node after two semi-splay steps. Let $p^{\ell_1}(x')$ and $p^{\ell_2}(x'')$ be the return values of each semi-splay step. By Lemma 5.2, we have $1 \le \ell_1 \le 4$ and $\ell_1 < \ell_2 \le 8$. By using Lemma 5.4 twice (first with $(a, k, b) = (1, 0, 9)$, then with $(a, k, b) = (1, \ell_1, 8)$), the two semi-splay steps change the potential by

$$\Delta\Phi = r(\mathrm{sibling}(p^{\ell_2-1}(x''))) + r(\mathrm{sibling}(p^{\ell_1-1}(x'))) + \sum_{i=1}^{7} r(p^i(x'')) - \sum_{i=1}^{9} r(p^i(x)). \tag{5.1}$$

Note that the term $\sum_{i=1}^{8} r(p^i(x'))$ has canceled in the above. Since $\mathrm{sibling}(p^{\ell_2-1}(x''))$ and $\mathrm{sibling}(p^{\ell_1-1}(x'))$ have disjoint subtrees and are both descendants of $p^8(x'')$, it follows by Lemma 5.5 that $r(\mathrm{sibling}(p^{\ell_2-1}(x''))) + r(\mathrm{sibling}(p^{\ell_1-1}(x'))) \le 2\,r(p^8(x'')) - 2 \le r(p^8(x'')) + r(p^9(x'')) - 2$, and hence $\Delta\Phi \le \sum_{i=1}^{9} r(p^i(x'')) - \sum_{i=1}^{9} r(p^i(x)) - 2$.

Analogously, the iterations where only one of the semi-splay steps succeeds increase the potential by at most $\Delta\Phi \le \sum_{i=1}^{9} r(p^i(x'')) - \sum_{i=1}^{9} r(p^i(x))$, without the minus two term.
Since both semi-splay steps always succeed when depth(x) ≥ 9 and the depth decreases by two each time, both semi-splay steps succeed at least $\frac{1}{2}(\mathrm{depth}(x) - 9)$ times. Summing the changes of potentials over all iterations telescopes to

$$\Delta\Phi \le \sum_{i=1}^{9} r(p^i(x')) - \sum_{i=1}^{9} r(p^i(x)) - (\mathrm{depth}(x) - 9) \le 9\bigl(1 + r(\mathrm{root}(x)) - r(x)\bigr) - \mathrm{depth}(x)$$

with $x$ the node before and $x'$ the node after the entire full splay. This is as desired. That the depth bounds are as desired follows by Lemma 5.3 since the semi-splay step has just failed on $x'$ when the algorithm returns.
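The "minus two" term in the analysis above is of the kind that, in standard splay tree analyses, comes from the inequality $\log_2 s_a + \log_2 s_b \le 2\log_2(s_a + s_b) - 2$, which follows from $s_a s_b \le (s_a + s_b)^2 / 4$. We assume, without the text here confirming it, that Lemma 5.5 is a statement of this form for two nodes with disjoint subtrees below a common ancestor. The Python sketch below checks the inequality numerically; the function names are hypothetical.

```python
import math


def rank_sum_gap(s_a, s_b):
    """Return 2*log2(s_a + s_b) - 2 - (log2(s_a) + log2(s_b)).

    s_a and s_b are the leaf counts of two disjoint subtrees, so a common
    ancestor has at least s_a + s_b leaves. The gap is expected to be
    non-negative, and zero exactly when s_a == s_b.
    """
    return 2 * math.log2(s_a + s_b) - 2 - (math.log2(s_a) + math.log2(s_b))


def inequality_holds(max_leaves, tolerance=1e-9):
    """Check the inequality for all leaf counts from 1 to max_leaves."""
    return all(
        rank_sum_gap(s_a, s_b) >= -tolerance
        for s_a in range(1, max_leaves + 1)
        for s_b in range(1, max_leaves + 1)
    )
```

The gap is smallest when the two subtrees have the same size, which is why the constant 2 cannot be improved in general.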
        delete node
    }

    fn cut(e) {
        // Assumes there are no exposed vertices
        u = e.vertices[0]
        v = e.vertices[1]
        full_splay(e)
        // now depth(e)<=2, and if e is a leaf edge, depth(e)<=1
        delete_all_ancestors(e)
        u.exposed = true
        v.exposed = true
        Tu = deexpose(u)
        Tv = deexpose(v)
        return (Tu, Tv)
    }

Since we assume that there are no exposed vertices, the full splay reduces the depth to 2 when e is a path cluster, or 1 when e is a point cluster. For a path cluster, the two top trees that remain when removing the ancestors of the edge must correspond to the two trees you get by removing the edge, since otherwise the original top tree would contain a cluster that isn't connected. For point clusters, removing the ancestors of the edge does not cut the top tree into several pieces, so here the remaining top tree also corresponds to the remaining piece of the underlying tree.
Note that after removing the ancestors, the vertices are marked as boundary vertices of all clusters they appear in, which is incorrect. Marking them as exposed restores the top tree invariants, and they can then be deexposed afterwards.
The amortized running time of the cut operation is O(log n) because that's the amortized cost of a full splay and the deexpose operations. Removing the edge and its at most 2 ancestors only decreases the amortization potential, so that is also okay.

Implementation and testing
To help verify the correctness of the pseudocode in this paper, we provide an implementation of the data structure in the C programming language. The implementation can be found at: https://gitlab.com/aliceryhl/toptree-c-example
Whenever a function in the C code also exists as pseudocode in the paper, we have ensured that they match line-for-line to make it easy to verify that they implement the algorithm in the same way. There are a few exceptions since the C code must handle e.g. allocation failures, but the equivalence should be clear.
The C code maintains the following user data: For each edge, a weight is stored. Additionally, for each path cluster, the maximum weight of any edge on the path between the two boundary vertices is stored. Using this information, we can use the top tree to dynamically maintain a minimum spanning tree of a weighted graph as new edges are added to the graph (see e.g. [27]).
To verify the correctness of the C code, we have implemented a testing utility that generates a random graph with a given number of vertices and edges, then verifies that the minimum spanning tree generated by our top tree has the same total weight as the minimum spanning tree generated by running Kruskal's algorithm [36] on the same graph. Furthermore, our implementation provides a way to check whether all invariants are satisfied, which is called periodically during the test. We have run this testing utility on a large number of random graphs. The two algorithms agreed on the total weight every time, and the invariants were satisfied every time we checked them. This makes it very likely that the implementation is correct.
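The testing strategy described above can be sketched in a few lines of Python. This is not the C testing utility: the top tree is replaced by a naive stand-in that finds the heaviest edge on a tree path by a linear-time search, and all function names (`kruskal_weight`, `incremental_weight`, `random_graph`) are hypothetical. The sketch only illustrates how an incrementally maintained spanning forest is compared against Kruskal's algorithm.

```python
import random


def kruskal_weight(n, edges):
    """Total weight of a minimum spanning forest, via Kruskal with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total = 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total


def incremental_weight(n, edges):
    """Insert edges one at a time while maintaining a minimum spanning forest.

    A top tree would answer the heaviest-edge-on-path query in O(log n);
    here a plain search over the current forest stands in for it.
    """
    adj = {v: {} for v in range(n)}  # adj[u][v] = weight of forest edge (u, v)

    def heaviest_on_path(u, v):
        # Returns the heaviest forest edge (w, a, b) on the u-v path,
        # or None if u and v lie in different trees.
        stack = [(u, None, None)]
        while stack:
            x, came_from, best = stack.pop()
            if x == v:
                return best
            for y, w in adj[x].items():
                if y != came_from:
                    cand = (w, x, y)
                    keep = cand if best is None or w > best[0] else best
                    stack.append((y, x, keep))
        return None

    total = 0
    for w, u, v in edges:
        if u == v:
            continue  # self-loops never belong to a spanning forest
        worst = heaviest_on_path(u, v)
        if worst is not None:
            if worst[0] <= w:
                continue  # the new edge does not improve the forest
            old_w, a, b = worst  # cut the heaviest edge on the cycle
            del adj[a][b], adj[b][a]
            total -= old_w
        adj[u][v] = adj[v][u] = w  # link the new edge
        total += w
    return total


def random_graph(n, m, rng):
    """m random weighted edges (w, u, v) on n >= 2 vertices, without self-loops."""
    edges = []
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.append((rng.randrange(1, 1000), u, v))
    return edges
```

Because the total weight of a minimum spanning forest is unique even when the forest itself is not, comparing total weights is a sound check that does not depend on tie-breaking.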
Copyright © 2023. This paper is available under the CC-BY 4.0 license.

The correctness of each rotation is seen by considering the figure and noticing that the new cluster must necessarily be valid in each case.

Lemma A.1. When the prepare_expose function returns, every cluster containing the vertex being exposed is either a point cluster, or already has the vertex as a boundary vertex.

Figure 2: A top tree containing clusters A, B, and C (middle), and the result of calling either rotate_up(B) (left) or rotate_up(A) (right). Note here that in binary search trees, one typically calls the rotate operation on the node (A ∪ B), whereas in our case, that would be ambiguous. Instead, we define rotate_up to take the child of (A ∪ B) whose depth is reduced.
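The structural effect of rotate_up can be sketched as follows. This is a purely illustrative Python model, not the paper's pseudocode: it ignores the left-to-right orientation of children, boundary vertices, and the validity conditions that decide whether the rotation is allowed at all. The `Node` class and the helpers `sibling` and `depth` are hypothetical.

```python
class Node:
    """A top tree node; leaves have no children. Names are for illustration."""

    def __init__(self, name, left=None, right=None):
        self.name = name
        self.parent = None
        self.children = []
        if left is not None:
            self.children = [left, right]
            left.parent = right.parent = self


def sibling(node):
    first, second = node.parent.children
    return second if first is node else first


def depth(node):
    d = 0
    while node.parent is not None:
        node, d = node.parent, d + 1
    return d


def rotate_up(node):
    """Move node one level closer to the root.

    Before: gparent = (node ∪ sibling) ∪ uncle.
    After:  gparent = node ∪ (sibling ∪ uncle).
    Assumes the caller has already checked that sibling ∪ uncle is valid.
    """
    parent = node.parent
    gparent = parent.parent
    sib = sibling(node)
    uncle = sibling(parent)
    # Reuse the parent node as the new cluster sibling ∪ uncle.
    parent.children = [sib, uncle]
    uncle.parent = parent
    parent.name = sib.name + "∪" + uncle.name
    # node takes the uncle's old place directly below gparent.
    gparent.children = [node, parent]
    node.parent = gparent
```

In this model, calling rotate_up(A) on the tree (A ∪ B) ∪ C yields A ∪ (B ∪ C): the depth of A drops by one, while B keeps its depth and C moves one level down.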

Figure 3: The two possible configurations in the underlying tree of the clusters involved in a rotation. Each cluster is drawn as an edge. The labels on the edges refer to variables in the pseudocode for rotate_up.

Figure 4: The two possible configurations in the top tree of sibling and uncle in relation to each other (ignoring reflections). The labels on the edges refer to variables in the pseudocode for rotate_up.

Lemma 4.1. If a and b are valid clusters of a top tree whose intersection is a single vertex v, then the cluster a ∪ b is invalid if and only if the following three conditions hold: a is a path cluster, b is a path cluster, and v is a boundary vertex of a ∪ b.

Proof. We have already seen that the boundary vertices of a cluster are the union of the boundary vertices of its children, possibly with the central vertex removed. Since a and b are valid and share v, the union of the boundary vertices of the children has three elements if and only if both a and b are path clusters. The central vertex of a ∪ b is v. The lemma follows from this.

Lemma 4.2. If a, b, c are valid clusters with c the parent of a and b, and if a is a point cluster, then all boundary vertices of c are also boundary vertices of b.

Proof. All boundary vertices of c are either boundary vertices of a or b. However, all boundary vertices of a are also boundary vertices of b because the only boundary vertex of a point cluster is the one it shares with its sibling.
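The counting argument behind Lemma 4.1 can be replayed mechanically. The Python sketch below models a cluster only by its set of boundary vertices, assuming that a path cluster has two boundary vertices, a point cluster has one, and a valid cluster has at most two. The function names and the vertex labels "u", "v", "w" are hypothetical.

```python
def union_boundary(a_boundary, b_boundary, v, v_stays_boundary):
    """Boundary vertices of a ∪ b, where v is the shared (central) vertex.

    The result is the union of the children's boundary vertices, with v
    removed when it is not a boundary vertex of the union.
    """
    boundary = set(a_boundary) | set(b_boundary)
    if not v_stays_boundary:
        boundary.discard(v)
    return boundary


def is_valid(boundary):
    """A cluster is modelled as valid when it has at most two boundary vertices."""
    return len(boundary) <= 2


def merge_is_invalid(a_is_path, b_is_path, v_stays_boundary):
    """Evaluate the merge for concrete boundary sets around the shared vertex v."""
    a_boundary = {"u", "v"} if a_is_path else {"v"}
    b_boundary = {"v", "w"} if b_is_path else {"v"}
    merged = union_boundary(a_boundary, b_boundary, "v", v_stays_boundary)
    return not is_valid(merged)
```

Enumerating all eight combinations confirms that the merge is invalid exactly when both children are path clusters and v remains a boundary vertex, which is the statement of the lemma.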

Lemma 4.3. Let x, y, z, a, b be valid clusters in a top tree with y the parent of x, a and z the parent of y, b. If a is a path cluster and x, y hang off to the same side, then x ∪ b is not a connected set of edges.

Proof. Assume without loss of generality that a, b are right children. Since a is a path cluster, it has a leftmost boundary vertex v and a rightmost boundary vertex w. By the orientation invariant, y has central vertex v and right boundary vertex w. Similarly, w is the central vertex of z. It follows that x has boundary vertex v and that b has boundary vertex w. If x ∪ b

Figure 5: An illustration of a tree with the variable names used in Section 4.1.
Here, some of the patterns are matched on $b_0 b_1 b_2$, and some are matched on $b_1 b_2 b_3$. The semi-splay step prefers to pick a pattern on $b_0 b_1 b_2$ if there is a match for both types.

Lemma 5.1. A semi-splay step never makes an invalid rotation in a valid top tree.

Lemma 5.2. If x has depth d ≥ 5, then semi_splay_step(x) reduces the depth of x by 1, and returns a node x′ with d − 5 ≤ depth(x′) ≤ d − 2 such that all modified nodes are in the subtree rooted at x′.

Lemma 5.3. If semi_splay_step(x) returns null, then no change was made to the tree and depth(x) ≤ 4. If x is a point cluster, then depth(x) ≤ 3. If the root is a point cluster, then depth(x) ≤ 2. If both x and the root are point clusters, then depth(x) ≤ 1.

pattern for $b_0 b_1 b_2 b_3 b_4$ either both of $b_0, b_2$ are point clusters, or $b_3$ is a point cluster, or both of $b_2, b_3$ are path clusters. In each case semi_splay_step(x) would return non-null, so $b_4$ can't exist and depth(x) ≤ 3.

Lemma 5.4. Consider the operation semi_splay_step($p^k(x)$) with $p^i(x)$ the $i$th ancestor of $x$, or the root if $i \ge \mathrm{depth}(x)$. Let $x'$ be the node $x$ after the semi-splay step. Let $0 \le k \le b$ and $0 \le a \le k + 1$ be such that semi_splay_step($p^k(x)$) returns $p^{\ell}(x')$ for some $\ell \le b$ (this is the same node as $p^{\ell+1}(x)$). Then, the splay step changes the amortization potential by

$$\Delta\Phi = r(\mathrm{sibling}(p^{\ell-1}(x'))) + \sum_{i=a}^{b-1} r(p^i(x')) - \sum_{i=a}^{b} r(p^i(x)).$$

Lemma 5.7. Calling full_splay on a node x does O(depth(x)) work, changes the potential by $\Delta\Phi \le O(1 + r(\mathrm{root}(x)) - r(x)) - \Omega(\mathrm{depth}(x))$, and reduces the depth of x so it satisfies the same bounds as in Lemma 5.3. In other words, the amortized cost of full_splay is $O(1 + r(\mathrm{root}(x)) - r(x)) = O(\log n)$.
(a) B is exposed, next edge on A. We merge AB and DA.
(b) B is exposed, next edge on C. We merge BC and CD. Symmetric to Case (a).
(c) A is exposed, edge to D is point cluster and connects on B. We merge AB and BD.
(d) A is exposed, next edge connects on C. We merge BC and CD.
(e) A is exposed, edge to D is path cluster on B, and edge to E connects to B. We merge BD and DE.
(f) A is exposed, edge to D is path cluster on B, and edge to E connects to C. We swap BD and CE, leaving situation (d) for next iteration.

Figure 6: The different situations one can encounter during the subroutine from section A.1 when the current node has two boundary vertices. The two edges AB and BC inside the dashed circles are the two children of the current node. The gray vertex corresponds to the vertex we want to expose. The two other vertices inside the dashed circle are boundary vertices of the current node. The edge connecting to D is the sibling of the current node. The edge connecting to E is the sibling of the parent of the current node.
When working with top trees, it is useful to think of the children of each node as having a specific left-to-right order, or orientation (i.e., one child is the left child and the other child is the right child). Given the orientation of each node, we can define whether a boundary vertex of a node is to the left, in the middle, or to the right. For internal nodes, a boundary vertex is in the middle if it is the central vertex of the node. Otherwise, a boundary vertex is to the right if it comes from the right child, and to the left if it comes from the left child. The leftmost boundary vertex is then defined as the left boundary vertex if one exists, otherwise the middle boundary vertex if one exists. If the node only has a right boundary vertex, then we say that it has no leftmost boundary vertex. The definition of rightmost is analogous. For leaf nodes, the left boundary vertex is the left endpoint and the right boundary vertex is the right endpoint, if those vertices are boundary vertices of the edge. Leaf nodes never have a central or middle vertex.
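The definitions of leftmost and rightmost boundary vertex translate directly into code. The following Python sketch assumes a hypothetical `Boundary` record that stores the left, middle, and right boundary vertex of a node, with `None` marking a position that has no boundary vertex; it is not how the C implementation represents this information.

```python
from typing import NamedTuple, Optional


class Boundary(NamedTuple):
    """Boundary vertices of a node by position; None means no such vertex."""

    left: Optional[str] = None
    middle: Optional[str] = None
    right: Optional[str] = None


def leftmost(boundary: Boundary) -> Optional[str]:
    """The left boundary vertex if one exists, otherwise the middle one.

    Returns None when the node only has a right boundary vertex.
    """
    if boundary.left is not None:
        return boundary.left
    return boundary.middle


def rightmost(boundary: Boundary) -> Optional[str]:
    """The right boundary vertex if one exists, otherwise the middle one.

    Returns None when the node only has a left boundary vertex.
    """
    if boundary.right is not None:
        return boundary.right
    return boundary.middle
```

For example, a node whose boundary vertices are in the middle and to the right has the middle vertex as its leftmost boundary vertex, while a node with only a right boundary vertex has no leftmost boundary vertex at all.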