Implementation of a binary search tree

Implementation of a binary search tree
Implementation of a binary search tree
Anonim

Binary search tree - a structured database containing nodes, two links to other nodes, right and left. Nodes are an object of the class that has data, and NULL is the sign that marks the end of the tree.

Optimal Binary Search Trees
Optimal Binary Search Trees

It is often referred to as BST, which has a special property: nodes larger than the root are located to the right of it, and smaller ones to the left.

General theory and terminology

In a binary search tree, each node, excluding the root, is connected by a directed edge from one node to another, which is called the parent. Each of them can be connected to an arbitrary number of nodes, called children. Nodes without "children" are called leaves (outer nodes). Elements that are not leaves are called internal. Nodes with the same parent are siblings. The topmost node is called the root. In BST, assign an element to each node and make sure they have a special property set for them.

Tree terminology
Tree terminology

Tree terminology:

  1. The depth of a node is the number of edges defined from the root to it.
  2. Height of a node is the number of edges defined from it to the deepest leaf.
  3. The height of the tree is determined by the height of the root.
  4. Binary search tree is a special design, it provides the best ratio of height and number of nodes.
  5. Height h with N nodes at most O (log N).

You can easily prove this by counting the nodes at each level, starting from the root, assuming that it contains the largest number of them: n=1 + 2 + 4 + … + 2 h-1 + 2 h=2 h + 1 - 1 Solving this for h gives h=O (log n).

Benefits of wood:

  1. Reflect the structural relationships of the data.
  2. Used to represent hierarchies.
  3. Ensure efficient installation and search.
  4. Trees are very flexible data, allowing you to move subtrees with minimal effort.

Search method

In general, to determine if a value is in the BST, start a binary search tree at its root and determine if it meets the requirements:

  • be at the root;
  • be in the left subtree of root;
  • in the right subtree of root.

If no base register is satisfied, a recursive search is performed in the corresponding subtree. There are actually two basic options:

  1. The tree is empty - return false.
  2. The value is in the root node - return true.

It should be noted that a binary search tree with a developed schema always starts searching along the path from the root to the leaf. In the worst case, it goes all the way to the leaf. Therefore, the worst time is proportional to the length of the longest path from the root to the leaf, which is the height of the tree. In general, this is when you need to know how long it takes to look up as a function of the number of values stored in the tree.

In other words, there is a relationship between the number of nodes in a BST and the height of a tree, depending on its "shape". In the worst case, nodes have only one child, and a balanced binary search tree is essentially a linked list. For example:

50

/

10

15

30

/

20

This tree has 5 nodes and Finding values in the range 16-19 and 21-29 will require the following path from the root to the leaf (the node containing the value 20), i.e., it will take time proportional to the number of nodes. At best, they all have 2 children, and the leaves are located at the same depth.

The search tree has 7 nodes
The search tree has 7 nodes

This binary search tree has 7 nodes and In general, a tree like this (full tree) will have a height of approximately log 2 (N), where N is the number of nodes in the tree. The value of log 2 (N) is the number of times (2) that N can be divided before zero is reached.

Summarizing: the worst time needed to search in BST is O (tree height). The worst case "linear" tree is O(N), where N is the number of nodes in the tree. At best, a "complete" tree is O(log N).

BST binary insert

Wondering where should bethe new node is located in the BST, you need to understand the logic, it must be placed where the user finds it. In addition, you need to remember the rules:

  1. Duplicates are not allowed, attempting to insert a duplicate value will throw an exception.
  2. The public insert method uses a helper recursive "helper" method to actually insert.
  3. A node containing a new value is always inserted as a leaf in BST.
  4. The public insert method returns void, but the helper method returns a BSTnode. It does this to handle the case where the node passed to it is null.

In general, the helper method indicates that if the original binary search tree is empty, the result is a tree with one node. Otherwise, the result will be a pointer to the same node that was passed as an argument.

Deletion in binary algorithm

As you might expect, deleting an element involves finding a node that contains the value to be removed. There are several things in this code:

  1. BST uses a helper, overloaded delete method. If the element you are looking for is not in the tree, then the helper method will eventually be called with n==null. This is not considered an error, the tree simply does not change in this case. The delete helper method returns a value - a pointer to the updated tree.
  2. When a leaf is removed, the removal from the binary search tree sets the corresponding child pointer of its parent to null, or the root to null if the one being removed isthe node is a root and has no children.
  3. Note that the delete call must be one of the following: root=delete (root, key), n.setLeft (delete (n.getLeft (), key)), n.setRight (delete(n.getRight(), key)). Thus, in all three cases it is correct that the delete method simply returns null.
  4. When the search for the node containing the value to be deleted succeeds, there are three options: the node to be deleted is a leaf (has no children), the node to be deleted has one child, it has two children.
  5. When the node being removed has one child, you can simply replace it with a child, returning a pointer to the child.
  6. If the node to be deleted has zero or 1 children, then the delete method will "follow the path" from the root to that node. So the worst time is proportional to the height of the tree, for both search and insert.

If the node to be removed has two children, the following steps are taken:

  1. Find the node to be deleted, following the path from the root to it.
  2. Find the smallest value of v in the right subtree, continuing along the path to the leaf.
  3. Recursively remove the value of v, follow the same path as in step 2.
  4. Thus, in the worst case, the path from the root to the leaf is performed twice.

Order of traverses

Traversal is a process that visits all nodes in a tree. Because a C binary search tree is a non-linear data structure, there is no unique traversal. For example, sometimes several traversal algorithmsgrouped into the following two types:

  • crossing depth;
  • first pass.

There is only one kind of width crossing - bypassing the level. This traversal visits nodes level down and left, top and right.

There are three different types of depth crossings:

  1. Passing PreOrder - first visit the parent and then the left and right child.
  2. Passing InOrder - visiting the left child, then the parent and the right child.
  3. Past the PostOrder - visiting the left child, then the right child, then the parent.

Example for four traversals of a binary search tree:

  1. PreOrder - 8, 5, 9, 7, 1, 12, 2, 4, 11, 3.
  2. InOrder - 9, 5, 1, 7, 2, 12, 8, 4, 3, 11.
  3. PostOrder - 9, 1, 2, 12, 7, 5, 3, 11, 4, 8.
  4. LevelOrder - 8, 5, 4, 9, 7, 11, 1, 12, 3, 2.

The figure shows the order in which nodes are visited. The number 1 is the first node in a particular traversal, and 7 is the last node.

Indicates the last node
Indicates the last node

These general traversals can be represented as a single algorithm, assuming that each node is visited three times. The Euler tour is a walk around a binary tree where each edge is treated as a wall that the user cannot cross. In this walk, each node will be visited either on the left, or below, or on the right. The Euler tour, which visits the nodes on the left, causes the preposition to be bypassed. When the nodes below are visited, they get traversed in order. And when the nodes on the right are visited - getstep-by-step bypass.

Implementation and bypass
Implementation and bypass

Navigation and Debugging

To make it easier to navigate the tree, create functions that first check whether they are the left or right child. To change the position of a node, there must be easy access to the pointer at the parent node. Correctly implementing a tree is very difficult, so you need to know and apply debugging processes. A binary search tree with an implementation often has pointers that do not actually indicate the direction of travel.

To figure this all out, a function is used that checks if the tree can be correct, and helps to find many errors. For example, it checks if the parent node is a child node. With assert(is_wellformed(root)) many errors can be caught prematurely. Using a couple of given breakpoints within this function, you can also determine exactly which pointer is wrong.

Function Konsolenausgabe

This function flushes the entire tree to the console and is therefore very useful. The order in which the tree output goal is executed is:

  1. To do this, you first need to determine what information will be output through the node.
  2. And you also need to know how wide and tall the tree is to account for how much space to leave.
  3. The following functions calculate this information for the tree and each subtree. Since you can only write to the console line by line, you will also need to print the tree line by line.
  4. Now we need another way to withdrawthe whole tree, not just line by line.
  5. With the help of the dump function, you can read the tree and significantly improve the output algorithm, as far as speed is concerned.

However, this function will be difficult to use on large trees.

Copy constructor and destructor

Copy constructor and destructor
Copy constructor and destructor

Because a tree is not a trivial data structure, it's better to implement a copy constructor, a destructor, and an assignment operator. The destructor is easy to implement recursively. For very large trees, it can handle "heap overflow". In this case, it is formulated iteratively. The idea is to remove the leaf representing the smallest value of all leaves, so it is on the left side of the tree. Cutting off the first leaves creates new ones, and the tree shrinks until it finally ceases to exist.

The copy constructor can also be implemented recursively, but be careful if an exception is thrown. Otherwise, the tree quickly becomes confusing and error-prone. That's why the iterative version is preferred. The idea is to go through the old tree and the new tree, as you would in the destructor, cloning all the nodes that are in the old tree but not the new ones.

With this method, the binary search tree implementation is always in a he althy state and can be removed by the destructor even in an incomplete state. If an exception occurs, all you need to do is instruct the destructor to delete the semi-finished tree. assignment operatorcan be easily implemented using Copy & Swap.

Creating a binary search tree

Optimal binary search trees are incredibly efficient if managed properly. Some rules for binary search trees:

  1. A parent node has at most 2 child nodes.
  2. The left child node is always less than the parent node.
  3. A valid child node is always greater than or equal to the parent node.
Create 10 root node
Create 10 root node

The array that will be used to build the binary search tree:

  1. A base integer array of seven values in unsorted order.
  2. The first value in the array is 10, so the first step in building the tree is to create a 10 root node, as shown here.
  3. With a set of root nodes, all other values will be children of this node. Referring to the rules, the first step to be taken to add 7 to the tree is to compare it to the root node.
  4. If the value 7 is less than 10, it will become the left child node.
  5. If the value 7 is greater than or equal to 10, it will move to the right. Since 7 is known to be less than 10, it is designated as the left child node.
  6. Recursively perform comparisons for each element.
  7. Following the same pattern, perform the same comparison against the 14th value in the array.
  8. Comparing the value 14 to the root node 10, knowing that 14 is the correct child.
  9. Walking through the array,come to 20.
  10. Start by comparing the array with 10, whichever is greater. So move to the right and compare it to 14, he is over 14 and has no children on the right.
  11. Now there is a value of 1. Following the same pattern as the other values, compare 1 to 10, moving to the left and comparing to 7 and finally to the 1 left child of the 7th node.
  12. At 5, compare it to 10. Since 5 is less than 10, pass to the left and compare it to 7.
  13. Knowing that 5 is less than 7, continue down the tree and compare 5 with 1 value.
  14. If 1 has no children and 5 is greater than 1, then 5 is a valid child of 1 node.
  15. Finally insert the value 8 into the tree.
  16. When 8 is less than 10, move it to the left and compare it to 7, 8 is greater than 7, so move it to the right and complete the tree, making 8 a proper child of 7.
Creating a Binary Search Tree
Creating a Binary Search Tree

Getting and evaluating the simple elegance of optimal binary search trees. Like many other topics in programming, the power of binary search trees comes from their ability to resolve data into small, related components. From now on, you can work with the full dataset in an organized way.

Potential Binary Search Issues

Potential Binary Search Issues
Potential Binary Search Issues

Binary search trees are great, but there are a few caveats to keep in mind. They are usually only effective if they are balanced. A balanced tree is a tree in whichthe difference between the heights of the subtrees of any node in the tree is at most one. Let's look at an example that might help clarify the rules. Let's imagine that the array starts as sortable.

If you try to run a binary search tree algorithm on this tree, it will perform exactly as if it were just iterating through the array until the desired value is found. The power of binary search lies in the ability to quickly filter out unwanted values. When a tree is not balanced, it will not provide the same benefits as a balanced tree.

It is very important to examine the data the user is working with when creating a binary search tree. You can integrate routines such as array randomization before implementing a binary search tree for integers to balance it out.

Binary search calculation examples

We need to determine what kind of tree will result if 25 is inserted into the following binary search tree:

10

/

/

5 15

/ /

/ /

2 12 20

When inserting x into a tree T that does not yet contain x, the key x is always placed in a new leaf. In connection with this, the new tree will look like:

10

/

/

5 15

/ /

/ /

2 12 20

25

What kind of tree would you get if you inserted 7 into the following binary search tree?

10

/

/

5 15

/ /

/ /

2 12 20

Answer:

10

/

/

/

5 15

/ / /

/ / /

2 7 12 20

A binary search tree can be used to store any object. The advantage of using a binary search tree instead of a linked list is that if the tree is reasonably balanced and more like a "full" tree than a "linear" one, insertion, search, and all delete operations can be implemented to run in O(log N) time.

Recommended: