Growing Trees in Cyberspace

What are trees in programming, and what are they used for?

6 min readMay 30, 2021

In programming, a tree is a non-linear data structure. It has its root at the top, and the leaves are at the bottom, just like in real life.

Each data point in a tree is called a node, and parent nodes are linked to their children with connections called edges. Children of the same parent are called siblings, and nodes without children are called leaf nodes. Children may also be referred to as sub trees of the parent.

A tree with N nodes will have N-1 edges. If there are two edges between the root node and a child node, we can say that the child has a depth of two. The height of the tree is defined as the depth of the deepest node. A tree with one node would have a height of 0, just like indices of an array.

Some applications of a tree are:

Storing naturally hierarchical data
Organizing data for quick search, insertion and deletion
Network routing

A binary tree is a tree where every node has, at most, 2 children. Binary trees have a very strict social system. Imagine if your parents told you that you could only have 2 children!

However, that still leaves quite a few options. We could have only left nodes only right nodes, alternating right and left nodes, mostly left but with random right nodes, etc.

Trees where each node has only one child are called degenerate trees. They are more like linked lists than trees.

This is clearly not strict enough - the society is descending into anarchy.

Luckily, we can go further.

The next rule we can apply is that all levels except the last must be completely filled, in order from left to right. This is called a complete binary tree.

If we keep going and require all levels to be filled, this is called a perfect binary tree.

Finally, we have achieved perfection. Or have we?

The height of a perfect binary tree is equal to log₂(n+1)-1, where n is the number of nodes. This may look confusing, but it only means that every time you add a new layer you double the width. This is fairly intuitive: height 0 has 1 node, height 1 has 2 , height 2 has 4, 3 has 8, and so on.

Each new layer multiplies the previous by 2, because each node has 2 children. This means the maximum number of nodes in each tree can also be expressed as the sum of the heights squared, i.e. (0² +1² +2²… h²). This is also equivalent to 2^(h+1) -1. In turn, the height can be written as the inverse: log2(n+1) -1. The -1 is because height starts at 0. Why is this important?

It all comes down to time-cost complexity.

Let’s go back to the complete binary tree. This type of tree can also be called a balanced binary tree, because one side cannot be longer than the other by more than one node (with some exceptions).

Now let’s add some more structure. This will only work with data that can be expressed with ordered numeric values. Imagine we have a dataset of products, where each one has a unique ID. We want to organize these by their ID value in a tree, but what is the best way to do this? One way is a balanced binary search tree. There are several types of these, with great names like AVL tree, B tree, A-B tree, 2–3 tree, 2–3–4 tree, and red-black tree, to name a few. The essence of all these trees is that they are self balancing.

So how does this work? A balanced binary search tree is organized where the middle value is the root. So if you have data with values between 1 and 100, the root node would have a value of 50. The child on the left MUST be smaller than 50, and the child on the right MUST be larger. This is the “search” part of the tree, because it makes every operation orders of magnitude faster.

This is incorrect because 2 is smaller than 3, but it has ended up on the right side of 3!

I will not go into depth about time complexity, if you would like further reading, go here.

The main thing developers are interested in is the speed at which we can access, search, insert, and delete data. In a balanced binary tree, the worst case would be O(log(n)) for all of the above. A regular binary tree has a worst case of O(n) for all of the above. What is the difference, you may ask? For small datasets, it is negligible. But for large data, in the millions or billions, where you have to perform these operations many times per second, it can be the difference between a couple seconds and the age of the universe!

https://thumbor.forbes.com/thumbor/711x481/https://blogs-images.forbes.com/startswithabang/files/2018/05/history.jpg?width=960

Let’s say we had a dataset with a billion points, such as COVID-19 infection data, and we wanted to perform one million insertions into this set because we had one million new cases. If we made the mistake of using an unsorted binary tree, this would take 1,000,000,000,000,000 operations in the worst case, but with a balanced binary search tree, the worst case is only 50 operations! For vaccine researchers, this could be a matter of life and death.

This works because of the way you can search through sorted trees. If you want to find a data point, you can take a straight path through the tree to find it. If we have a dataset of 100 items, and we want to search a value of 31, we start by comparing against the root.

With 100 items, the root would be 50. Item 31 is less than 50, so we go to the left child of the root. Now everything on this side is lower than 50, so we repeat the process as if we had a tree with only 30 nodes. Item 31 is bigger than 30, so we go to the right. Now 31 is lower than 35, so we go to the left, and there it is!

We only used 3 operations to get there, which is also the height of the tree. If this tree were full, the worst case would still be the height of the tree, or log₂(101)-1. This is much faster than searching every node one by one.

This doesn’t mean that you should stop using arrays and switch everything over to balanced binary search trees. There are many options when choosing how to organize your data, make sure you are using the right structure for the right job.

Now you know how trees can grow in cyberspace!

Sources:

Red-black tree - Wikipedia

Red-black tree Type tree Invented 1972 Invented by Rudolf Bayer Algorithm Average Worst case Space Search Insert Delete…

en.wikipedia.org

https://www.geeksforgeeks.org/complexity-different-operations-binary-tree-binary-search-tree-avl-tree/#:~:text=Searching%3A%20For%20searching%20element%201,as%20left%20child%20of%201.

https://www.geeksforgeeks.org/red-black-tree-vs-avl-tree/

https://www.youtube.com/watch?v=qH6yxkw0u78&list=PL2_aWCzGMAwI3W_JlcBbtYTwiQSsOTa6P&index=25&ab_channel=mycodeschoolmycodeschool

Growing Trees in Cyberspace

What are trees in programming, and what are they used for?

Red-black tree - Wikipedia

Red-black tree Type tree Invented 1972 Invented by Rudolf Bayer Algorithm Average Worst case Space Search Insert Delete…

Written by Luke Penner