Working With Red-Black Trees in C#
See how red-black trees facilitate faster searches by staying better balanced over time than similar data structures, and see how to build and search a red-black tree in C#.
Binary search trees (BSTs) are widely used, but as data gets added, an unbalanced BST tends to degenerate over time into what is effectively a linked list. The "red-black tree" is a related type of binary tree that is often more suitable for search-heavy workloads. This article discusses how red-black trees keep searches fast by remaining better balanced than plain BSTs, and shows how to build and search a red-black tree in C#.
A tree is a data structure that represents hierarchical data by storing it in nodes that are related to one another in specific ways. The topmost node of a tree is called the root. It is the node from which all the other nodes of the tree descend. A tree can have one and only one root, so a non-empty tree has at least one node: the root node. In a binary tree, each node may have at most two children, called the left and right child nodes. A BST is a binary tree in which each node contains a value and the nodes are kept in order: for any given node, every value in its left subtree is less than the node's value, while every value in its right subtree is greater. Such trees have a "height," which you can calculate by counting the number of links from the root node to the deepest node (the one furthest away from the root) in the tree.
The tree in Figure 1 has nine nodes, and a height of three—the distance from the root node (8) to any of the nodes 4, 7, or 13. Note how all left nodes have values less than their parent nodes while right nodes store values greater than their parents. The nodes that have no children are called "leaf nodes." In Figure 1, the nodes that correspond to the values 1, 4, 7 and 13 are all leaf nodes.
To search for a specific item in a binary search tree, you start at the root node and repeatedly take either the left or right branch depending on whether the value that you are searching for is less than or greater than the value of the current node. Search operations take O(h) time, where h represents the height of the tree. A BST can easily degenerate into an unbalanced, list-like structure when added nodes fall disproportionately into one branch of the tree, making that branch far longer than others, so searches that end in that branch take longer. According to MSDN, "The disadvantage of BSTs is that in the worst-case their asymptotic running time is reduced to linear time. This happens if the items inserted into the BST are inserted in order or in near-order. In such a case, a BST performs no better than an array." This is where red-black trees fit into the picture.
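To make the degeneration concrete, here is a minimal sketch of an unbalanced BST that reports its height for two insertion orders. The BstNode and BstHeightDemo names are hypothetical and are not taken from the article's listings. The first data set is chosen to be consistent with the description of Figure 1 (the values 3, 6, 10 and 14 are assumptions, since the figure itself is not reproduced here); the second inserts 1,000 values in ascending order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal BST node, used only to illustrate how height grows.
class BstNode
{
    public int Value;
    public BstNode Left, Right;
    public BstNode(int value) { Value = value; }
}

static class BstHeightDemo
{
    // Plain BST insert with no rebalancing; duplicate values are ignored.
    public static BstNode Insert(BstNode root, int value)
    {
        if (root == null) return new BstNode(value);
        BstNode current = root;
        while (true)
        {
            if (value < current.Value)
            {
                if (current.Left == null) { current.Left = new BstNode(value); break; }
                current = current.Left;
            }
            else if (value > current.Value)
            {
                if (current.Right == null) { current.Right = new BstNode(value); break; }
                current = current.Right;
            }
            else break; // value already present
        }
        return root;
    }

    public static BstNode Build(IEnumerable<int> values)
    {
        BstNode root = null;
        foreach (int v in values) root = Insert(root, v);
        return root;
    }

    // Height in links from the root to the deepest node; an empty tree is -1.
    public static int Height(BstNode node) =>
        node == null ? -1 : 1 + Math.Max(Height(node.Left), Height(node.Right));

    public static void Demo()
    {
        int[] mixedOrder = { 8, 3, 10, 1, 6, 14, 4, 7, 13 };
        Console.WriteLine(Height(Build(mixedOrder))); // prints 3

        var ascending = new List<int>();
        for (int i = 1; i <= 1000; i++) ascending.Add(i);
        Console.WriteLine(Height(Build(ascending)));  // prints 999
    }
}
```

With ascending input every new node becomes the right child of the previous one, so the height equals the node count minus one and a search for the largest value visits every node.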
A red-black tree is a self-balancing variant of a BST that adds a color attribute to each node. The value of this color attribute is always either red or black. The root node is always black. In addition to color, each node contains a reference to its parent node and to its left and right child nodes, as well as an optional key value. Red-black trees perform better than ordinary BSTs because they use the color attribute and the node references to maintain a better balance.
Red-black trees always follow these rules:
- The root node of a red-black tree is always black.
- Every path from the root of the tree to a leaf node contains the same number of black nodes, known as the "black-height" of the tree.
- Both children of a red node are always black.
- All "external" nodes—leaf nodes—are always colored black.
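The rules above can be checked mechanically. The following is a minimal sketch of such a checker, assuming a hypothetical RuleNode type in which a null child stands for a black external node. It verifies the color rules only and does not check that the keys are in BST order.

```csharp
using System;

enum RuleColor { Red, Black }

// Hypothetical node type for this sketch; null children act as black external nodes.
class RuleNode
{
    public int Key;
    public RuleColor Color;
    public RuleNode Left, Right;

    public RuleNode(int key, RuleColor color, RuleNode left = null, RuleNode right = null)
    {
        Key = key; Color = color; Left = left; Right = right;
    }
}

static class RedBlackRules
{
    static bool IsRed(RuleNode node) => node != null && node.Color == RuleColor.Red;

    // Returns the black-height of the subtree, or -1 if any rule is violated.
    static int BlackHeight(RuleNode node)
    {
        if (node == null) return 1; // external node: counts as one black node

        // Both children of a red node must be black.
        if (IsRed(node) && (IsRed(node.Left) || IsRed(node.Right))) return -1;

        int left = BlackHeight(node.Left);
        int right = BlackHeight(node.Right);

        // Every path below this node must contain the same number of black nodes.
        if (left == -1 || right == -1 || left != right) return -1;

        return left + (node.Color == RuleColor.Black ? 1 : 0);
    }

    // An empty tree is treated as valid; otherwise the root must be black.
    public static bool IsValid(RuleNode root) =>
        root == null || (root.Color == RuleColor.Black && BlackHeight(root) != -1);
}
```

For example, a black root with two red children passes the check, while a black root with a single black child fails it because the two sides have different black-heights.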
The maximum height of a red-black tree that contains n nodes is 2 log2(n + 1), where the logarithm is base 2. For example, a tree holding 1,000,000 nodes can never be more than about 40 levels deep. Searching for any item in a red-black tree therefore takes O(log n) time, which makes it a good choice for search operations. Figure 2 shows a typical red-black tree.
Implementing a BST in C#
It's worth looking at the code for a binary search tree first; you can use it to see how the red-black tree implementation differs, and for comparison testing. The code in Listing 1 implements a simple binary search tree. The recursive code for searching the tree is particularly worth studying.
The search method compares the string values of the key parameter and the key of the passed-in node. The result of that comparison sets up the recursive call to search when the key value is either less than (search the left child) or greater than (search the right child) the key value of the node parameter. As the search progresses down the tree, eventually either the key value will match a node value, resulting in a successful search, or the search will fail and the method will return null.
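The search described above could look roughly like the following sketch. The StringNode type and the Search signature are assumptions, as is the use of string.CompareOrdinal for the comparison; the original listing may compare keys differently.

```csharp
using System;

// Hypothetical node type holding a string key, for illustration only.
class StringNode
{
    public string Key;
    public StringNode Left, Right;
    public StringNode(string key) { Key = key; }
}

static class BstSearch
{
    // Returns the node whose key matches, or null if the key is not in the tree.
    public static StringNode Search(StringNode node, string key)
    {
        if (node == null) return null;              // fell off the tree: not found

        int comparison = string.CompareOrdinal(key, node.Key);
        if (comparison < 0) return Search(node.Left, key);   // key is smaller: go left
        if (comparison > 0) return Search(node.Right, key);  // key is larger: go right
        return node;                                 // keys match
    }
}
```

Because the keys are compared as strings rather than numbers, "10" sorts before "9". That is harmless as long as insertion and search use the same comparison.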
Implementing a Red-Black Tree in C#
The rest of this article discusses a red-black tree implementation, shows how you can use it to search data, and shows an example comparing the efficiency of a search operation over a large data set between the red-black tree and the binary tree implementations.
Start by creating an enum containing two integer constants that represent the colors of the red-black tree nodes:
You also need an enum that holds a direction, represented by constants called Left and Right:
The Node class shown below represents a single red-black tree node. It has two overloaded constructors: one accepts the data that the new node should hold, and the other accepts both the node data and references to its left and right child nodes:
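As a rough illustration of the three types just described, here is a minimal sketch. The enum and member names, the int data type, and the choice to start new nodes as red are all assumptions made for this example and may differ from the author's listings.

```csharp
using System;

// Two integer constants representing the node colors (names are assumed).
enum Color { Red = 0, Black = 1 }

// Direction constants, as described in the text.
enum Direction { Left, Right }

class Node
{
    public int Data;
    public Color Color = Color.Red;   // assumption: new nodes start out red
    public Node Left, Right, Parent;

    // Constructor that accepts only the data the node should hold.
    public Node(int data) : this(data, null, null) { }

    // Constructor that also accepts the left and right child nodes.
    public Node(int data, Node left, Node right)
    {
        Data = data;
        Left = left;
        Right = right;
        if (left != null) left.Parent = this;    // keep parent links consistent
        if (right != null) right.Parent = this;
    }
}
```

Storing a Parent reference costs one extra field per node, but it lets rebalancing code walk upward from a newly inserted node without keeping a separate stack of ancestors.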
Next, here's a base class called Tree from which the RedBlackTree class (discussed later) will inherit. This class contains methods for searching and comparing node data and a method called Display() to display the tree data as text. It also contains references to the root node, the current node, and a "nil" node that serves as the single reference for all leaf or "external" nodes:
With the base class in place, you can create a RedBlackTree class. It needs the added color attribute, references to the parent and grandparent nodes, and a temporary node reference. Here's the code for the RedBlackTree class.
The Insert() method adds new nodes to the RedBlackTree. The insert operation places the new node either to the left or the right of the parent node, depending on whether its value is less than or greater than the parent node's value:
You may have noticed calls to a ReArrange method in the preceding code. That's necessary because when you add or delete nodes from a red-black tree, you may need to move nodes around or change their color to meet the red-black tree rules discussed earlier. The ReArrange operation recolors nodes and calls the Rotate methods shown below to restore the red-black tree rules, while making sure that the in-order sequence of the tree's values is preserved:
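The following is a minimal sketch of insertion with rebalancing, using the standard textbook fix-up procedure. It borrows the ReArrange name from the text, but every type and member here is hypothetical and the author's implementation may be organized quite differently. Null children stand in for black external nodes.

```csharp
using System;

enum RbColor { Red, Black }

class RbTreeNode
{
    public int Key;
    public RbColor Color = RbColor.Red;          // new nodes start red
    public RbTreeNode Left, Right, Parent;
    public RbTreeNode(int key) { Key = key; }
}

class SketchRedBlackTree
{
    public RbTreeNode Root;

    static bool IsRed(RbTreeNode n) => n != null && n.Color == RbColor.Red;

    // Puts y where x used to hang (under x's parent, or as the root).
    void Replace(RbTreeNode x, RbTreeNode y)
    {
        y.Parent = x.Parent;
        if (x.Parent == null) Root = y;
        else if (x == x.Parent.Left) x.Parent.Left = y;
        else x.Parent.Right = y;
    }

    // Rotations change the shape of the tree but not its in-order sequence.
    void RotateLeft(RbTreeNode x)
    {
        RbTreeNode y = x.Right;
        x.Right = y.Left;
        if (y.Left != null) y.Left.Parent = x;
        Replace(x, y);
        y.Left = x;
        x.Parent = y;
    }

    void RotateRight(RbTreeNode x)
    {
        RbTreeNode y = x.Left;
        x.Left = y.Right;
        if (y.Right != null) y.Right.Parent = x;
        Replace(x, y);
        y.Right = x;
        x.Parent = y;
    }

    public void Insert(int key)
    {
        RbTreeNode parent = null, current = Root;
        while (current != null)                   // ordinary BST descent
        {
            parent = current;
            if (key == current.Key) return;       // duplicates are ignored
            current = key < current.Key ? current.Left : current.Right;
        }
        var node = new RbTreeNode(key) { Parent = parent };
        if (parent == null) Root = node;
        else if (key < parent.Key) parent.Left = node;
        else parent.Right = node;
        ReArrange(node);
    }

    // Restores the color rules after inserting the red node z.
    void ReArrange(RbTreeNode z)
    {
        while (IsRed(z.Parent))
        {
            RbTreeNode parent = z.Parent;
            RbTreeNode grand = parent.Parent;     // exists: a red node is never the root
            bool parentIsLeft = parent == grand.Left;
            RbTreeNode uncle = parentIsLeft ? grand.Right : grand.Left;

            if (IsRed(uncle))                     // red uncle: recolor and move up
            {
                parent.Color = RbColor.Black;
                uncle.Color = RbColor.Black;
                grand.Color = RbColor.Red;
                z = grand;
                continue;
            }
            if (parentIsLeft)                     // black uncle: one or two rotations
            {
                if (z == parent.Right) { RotateLeft(parent); parent = z; }
                parent.Color = RbColor.Black;
                grand.Color = RbColor.Red;
                RotateRight(grand);
            }
            else
            {
                if (z == parent.Left) { RotateRight(parent); parent = z; }
                parent.Color = RbColor.Black;
                grand.Color = RbColor.Red;
                RotateLeft(grand);
            }
            break;                                // rotations fully repair the tree
        }
        Root.Color = RbColor.Black;               // the root is always black
    }

    // Height in links; an empty subtree is -1.
    public static int Height(RbTreeNode n) =>
        n == null ? -1 : 1 + Math.Max(Height(n.Left), Height(n.Right));
}
```

Inserting keys in ascending order into this sketch, for instance 1 through 1,000, leaves the height logarithmic in the number of nodes instead of growing by one link per insert as it does in a plain BST.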
Working with Red-Black Trees
Here's an example showing how you can use the RedBlackTree class. The Main() method shown below creates a new RedBlackTree instance and populates it with 1,000,000 nodes containing random integer values between 1 and 1,000,000. Finally, it inserts a node with the value 1,000,001 (which is larger than every other value, so it becomes the rightmost node of the tree), and then searches for it, printing the elapsed time.
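The same experiment can be approximated without the article's RedBlackTree class. The sketch below substitutes the framework's SortedSet<int>, which the .NET reference source implements as a red-black tree (an implementation detail rather than a documented guarantee). The SearchTiming class, its Run method, and the fixed seed are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class SearchTiming
{
    // Fills a set with `count` random values in [1, count], adds count + 1,
    // then times a single lookup of that largest value.
    public static (bool Found, TimeSpan Elapsed) Run(int count, int seed = 42)
    {
        var random = new Random(seed);
        var tree = new SortedSet<int>();          // stand-in for RedBlackTree

        for (int i = 0; i < count; i++)
            tree.Add(random.Next(1, count + 1));  // upper bound is exclusive

        int target = count + 1;
        tree.Add(target);

        Stopwatch watch = Stopwatch.StartNew();
        bool found = tree.Contains(target);
        watch.Stop();

        return (found, watch.Elapsed);
    }

    public static void Demo()
    {
        var (found, elapsed) = Run(1_000_000);
        Console.WriteLine($"Found: {found}, elapsed: {elapsed.TotalMilliseconds} ms");
    }
}
```

A single lookup is very short compared with timer resolution and JIT warm-up, so repeating the search many times and averaging gives a steadier figure than one measurement.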
When I run the preceding application on my system, the search requires just three milliseconds. Your system's timing may vary. To explore how much faster searching a red-black tree is compared to searching a binary search tree, I built both tree types in a similar manner, populated them with identical node values, and ran a comparison test. Here's the test code:
On my system the search using the red-black tree took scarcely any time compared to the same search using a binary search tree. The large difference is because, as discussed earlier, BST performance degrades quickly when you populate the tree with ordered or near-ordered values. In contrast, the red-black tree maintains branch balance even for ordered data, which is the root cause of the difference in search performance.
At this point, you've seen how red-black trees provide more efficient binary search operations for some types of data by maintaining more balance among the branches of the search tree than is usually possible with a typical binary search tree implementation. While adding nodes to red-black trees does take a little longer, that time is usually more than offset by improved search performance as the data volume in the tree grows. This article was originally published at DevX. http://www.devx.com/DevX/Article/36196
If you want more information on building red-black trees and other data structures in C#, I suggest you read this MSDN article.
Published at DZone with permission of Joydip Kanjilal. See the original article here.