### AVL BST tree

########################
####    AVL tree    ####
########################

as discussed in BST section of the tree data structure article, an un-balanced tree can be just a linear list, making the search efficiency O(n).

AVL tree, along with B-tree, 2-3 tree, red-black tree, AA tree, splay tree, is a balanced BST structure.

##
## core property
##

--- for any node, the heights of the two child sub-trees differ by at most one.
// if differ by 2 or more after insert/delete operations, then re-balance (aka rotation).

NOTE: this means a tree like below is still valid.


|
----
|        |
- -[]
|
-[]

(wiki)
http://en.wikipedia.org/wiki/AVL_tree
https://www.cs.auckland.ac.nz/software/AlgAnim/AVL.html

#########################
####   complexity    ####
#########################

## time (worst/average/best)

search     :  O(log n)
re-balance :  O(log n)   // aka rotation
insert     :  O(log n)
delete     :  O(log n)

===> because we always maintain the tree balanced, we always achieve the best possible case O(log n)

## space

auxiliary: O(1)
total: O(n)

##########################################
####   re-balance(rotation) scheme    ####
##########################################

rebalance operation is needed when an elem is inserted/deleted and causes the height of two child subtrees to differ by two or more.

this consistency check should be conducted on all ancestor nodes of the inserted/deleted node.
(why?)
because before insertion, each of its ancestors may have a balance factor 1 or -1, which may have changed due to an addition of a node in its downstream. thus, to check the latest balance factor, need to traverse up to the root from the leaf where you inserted a new node.
(how?)
two options;
- use parent pointers. but often BST tree nodes come without pointers to parent nodes.
- use recursive roll up.
(ref)
http://www.prefield.com/algorithm/container/avl_tree.html

# logic flow for rebalance (after insertion)

the rebalance is done via rotation, and there are 4 patterns of rotation. (wikipedia explains well)

(1) left-right
(2) left-left
(3) right-left
(4) right-right

also, for each node, we use the concept of "balance factor" = height(left subtree) - height(right subtree)

(1) left-right
(2) left-left

e.g.

    // root node. balance factor = 3 - 1 = 2
|
-[d]  // node  balance factor = 1 - 2 = -1
|
[a]-
|
[b]-[c]

===> balance factor 2 for a given node means the left subtree is too long.
===> then we look at the balance factor of the immediate left child node (in this case it's )
===> if it's -1, then now we call this left-right case.

we promote the right child node of , and demote . i.e.  becomes the left child node of  in the above example.
when you do this, as always, the left subtree of  needs to become the right subtree of .

so the resulting tree will be as below.

        // balance factor still 2
|
-[d]      // node  balance factor 1
|
-[c]
|
[a]-[b]

===>  a given node balance factor 2, and its immediate left child node's balance factor 1
===>  we call this left-left case.

we promote , and demote . as always, the right subtree of  becomes the left subtree of 


|
-----
|       |
[a]-[b] [c]-[d]

(3) right-left
(4) right-right

this is just a mirror of the (1),(2) cases.
right-left:  a given node balance factor -2, and its immediate right child node has balance factor 1
right-right: a given node balance factor -2, and its immediate right child node has balance factor -1

rotate like (1),(2)

===> see the below psuedo code from wikipedia.

(ref)
http://en.wikipedia.org/wiki/AVL_tree

----------------------------------------------------- // from wikipedia

if (balance_factor(L) == 2) { //The left column
let P=left_child(L)
if (balance_factor(P) == -1) { //The "Left Right Case"
rotate_left(P) //reduce to "Left Left Case"
}
//Left Left Case
rotate_right(L);
} else { // balance_factor(L) == -2, the right column
let P=right_child(L)
if (balance_factor(P) == 1) { //The "Right Left Case"
rotate_right(P) //reduce to "Right Right Case"
}
//Right Right Case
rotate_left(L);
}

-----------------------------------------------------

##
##  AVL node delete operation
##

(ref)
http://en.wikipedia.org/wiki/AVL_tree

wikipedia explains the good old delete(node) operation of bST, and add a step for balancing all ancester nodes of the node being moved.
i.e.

lets say we want to delete node X.
take the right-most node(=Y) of the left subtree of X.
replace X with Y.
if Y had any left child(=Z), then put Z into Y's old place
and then trace and balance all nodes from Z's parent to root.

e.g.  lets remove 


|
-----
|       |
[a]-[b] [c]-[d]
|
[e]-NULL

=====>  we promote [b] into 's place, and put [e] to where [b] used to be.

[b]
|
-----
|       |
[a]-[e] [c]-[d]

====> and then balance(3), balance(b), up until balance(root)

this concept is intuitive because the only height that has changed is the right subtree of  in this case, thus it makes sense to conduct our usual "balance each node tracing up till root" operations should starts from .

now this can be tricky implementation-wise.

with parent pointers available in nodes, then it's fairly simple, though the code will not be elegant. (see the below code)
without parent pointers, we need to apply slightly different steps leveraging recursion and balance() function feature. (see the code) further comments after the code.

##
## logic flow for rebalance (after deletion)
##

first of all, what remove(node n) does in a BST tree is;

remove n; then
(a) if(no child) no action;
(b) else if(only one side left|right child) promote the child to n's place;
(c) else if(both sides have child) promote the right most leaf of the left subtree of n

from the balance factor perspective,
for the case (a)(b), only subtress of n.parent's level or above is changing.
for the case (c), it is simpler in a way, because putting the right most child(lets call it x) of the left subtree of n, to n's place means x.parent's BF increments by one.

e.g. case: (a)&(b)

[a]  // BF = 1               [a]      // BF = 0
|                            |
[n]---[b]             ==>     [c]-[b]
|
[c]-[]

===> depending on whether [n] is the left or right child of its parent, BF of the parent node either decrements/increments.
===> thus only need to start the balance factor check from [a] onward.

e.g. case: (c)

[n]                       [d]
|                         |
[a]---[b]        ===>     [a]---[b]    // [a]'s BF incremented by 1
|                         |
[c]-[d]                   [c]-[]

====> it's a simple right-child deletion from [a]'s perspective. BF[a] += 1; regardless of the left child existence for [d]
===> thus only need to start the balance factor check from [a] onward.

###################################################################
####   implementation example in C++  (with parent pointers)   ####
###################################################################

we will re-use the BST code from the tree article.

just need to rotate(), balance(), and insert them into insert(), remove()

as you can see, the existence of a parent pointer in each node is adding extra complications for insert(), remove(), rotate() operations.
parent pointers are really not necessary as we will see in the next section that recursion will facilitate the upward traversal.

also one cannot help but notice we should be able to consolidate the code for left/right node rotation. it's essentially mirrorred procedure.

(ref)
http://cskill.wordpress.com/2012/09/02/avl-tree-with-parent-node/
( see the pointer manipulations inside the rotate() func def, very well written. )

------------------------------------------------ // avl.h

#include <iostream>
using namespace std;

template<typename T> struct node

node(T t): left(0), right(0), parent(0){ data = t; }
T data;
node* left;
node* right;
node* parent;
};

template<typename T> class AVL

public:
AVL(): size(0), root(0){}
void insert(T);
void remove(T);
int search(T);
int getSize(){return size;}
void preOrder();
private:
void balance(node<T>*);
void rotate(node<T>*,int);
int max(int,int);
int height(node<T>*);
int getBF(node<T>*);
void m_preOrder(node<T>*);
node<T>* root;
int size;
};

template<typename T> int AVL<T>::max(int x, int y)

if( x > y )
return x;
else
return y;

template<typename T> int AVL<T>::height(node<T>* nd)

if( nd == 0 )
return 0;
else if(nd->left == 0 && nd->right == 0)
return 1; // normally, this is 0, but to distinguish from the above nd=null case, lets mark this 1. so height(root)=1
else
return 1 + max( height(nd->left), height(nd->right));

template<typename T> int AVL<T>::getBF(node<T>* nd)

if( nd == 0 )
return 0;
return height(nd->left) - height(nd->right);

template<typename T> int AVL<T>::search(T t)

node<T>* tmp = root;
while(1)
{
if(tmp == 0)
return 0;
else if(tmp->data == t)
return 1;
else if(tmp->data > t)
tmp = tmp->left;
else
tmp = tmp->right;
}

template<typename T> void AVL<T>::insert(T t)

node<T>* new_node = new node<T>(t);
node<T>* it = root;
node<T>* prev = 0;
node<T>** ptr = &root;

while(1)
{
it = *ptr;
if(it == 0)
{
*ptr = new_node;
(*ptr)->parent = prev;
break;
}
else if(it->data == t)
return; // lets just say this is the spec
else if(it->data > t)
ptr = &(it->left);
else
ptr = &(it->right);
prev = it;
}

size++;
balance(*ptr);

template<typename T> void AVL<T>::balance(node<T>* nd)

if(nd == 0) // root's parent is null
return;

node<T>* tmp;
if( getBF(nd) == 2){
tmp = nd->left;
if( getBF(tmp) == -1)  // left-right case
rotate(tmp,0);
rotate(nd,1);
}else if( getBF(nd) == -2){
tmp = nd->right;
if( getBF(tmp) == 1) // right-left case
rotate(tmp,1);
rotate(nd,0);
}else  // -2 < BF < 2, then check parent
balance(nd->parent);

template<typename T> void AVL<T>::rotate(node<T>* nd, int side) // 0 = left, 1 = right

node<T>* tmpc;  // tmp child
node<T>* tmpp;  // tmp parent
node<T>* tmpcc; // tmp child's child

if(side == 0) // rotate left
{
tmpp = nd->parent;
tmpc = nd->right;
tmpcc = tmpc->left;

if(tmpp == 0){
root = tmpc;
tmpc->parent = 0;
}else{
tmpc->parent = tmpp;
if(tmpc->data < tmpp->data) // a bit unelegant, but need to check if its parent node's left or right child
tmpp->left = tmpc;
else
tmpp->right = tmpc;
}

tmpc->left = nd;
nd->parent = tmpc;
nd->right = tmpcc;
if(tmpcc != 0)
tmpcc->parent = nd;
}
else if(side == 1) // rotate right
{
tmpp = nd->parent;
tmpc = nd->left;
tmpcc = tmpc->right;

if(tmpp == 0){
root = tmpc;
tmpc->parent = 0;
}else{
tmpc->parent = tmpp;
if(tmpc->data < tmpp->data)
tmpp->left = tmpc;
else
tmpp->right = tmpc;
}

tmpc->right = nd;
nd->parent = tmpc;
nd->left = tmpcc;
if(tmpcc != 0)
tmpcc->parent = nd;  // this line is missing in the code from the ref link
}

template<typename T> void AVL<T>::remove(T t)

node<T>* it = root; // for iteration
node<T>* subit;

node<T>* tmpp;
node<T>* tmpr;
node<T>* tmpl;
node<T>* tmpcl;

while(1)
{
if(it == 0)      // key not found
break;
else if(it->data == t)  // found the key to delete
{
subit = it;
if(it->left == 0 && it->right == 0) // if no child, just delete it
{
tmpp = it->parent;
if(tmpp->data > it->data)
tmpp->left = 0;
else
tmpp->right = 0;
}
else if(it->left == 0)  // only right child exists, so promote it
{
tmpp = it->parent;
tmpr = it->right;

tmpp->right = tmpr;
tmpr->parent = tmpp;
}
else if(it->right == 0) // only left child exists, so promote it
{
tmpp = it->parent;
tmpr = it->left;

tmpp->left = tmpr;
tmpr->parent = tmpp;
}
else // both child nodes exist, then promote the right-most leaf of the left subtree
{
subit = it->left;
while(1)
{
if(subit->right == 0) // subit = right-most leaf of the left subtree of nd
break;
subit = subit->right;
}
// replacing the right-most leaf(X) of the left subtree with the left subtree of X
tmpcl = subit->left;
tmpp = subit->parent;
tmpl = it->left;
tmpr = it->right;

tmpl->parent = subit;
tmpr->parent = subit;
subit->right = tmpr;
subit->left = tmpl;

tmpcl->parent = subit->parent;
tmpp->right = tmpcl;
}
delete it;
balance(subit->parent);
break;
}
else if(it->data > t)
it = it->left;
else
it = it->right;
} // while loop

template<typename T> void AVL<T>::preOrder()

m_preOrder(root);

template<typename T> void AVL<T>::m_preOrder(node<T>* nd)

if(nd == 0)
return;
if(nd->parent == 0)
cout << nd->data << " parent:root" << endl;
else
cout << nd->data << " parent:" << nd->parent->data << endl;
m_preOrder(nd->left);
m_preOrder(nd->right);

---------------------------------------------------
--------------------------------------------------- // avl.cpp

#include "avl.h"
#include <iostream>
using namespace std;

int main()

AVL<int> avl;

// one easy way to visualize the transition is you insert avl.preOrder() after each insert()

avl.insert(11);
avl.insert(22);
avl.insert(9);
avl.insert(5);
avl.insert(7);    // left-right case
avl.insert(3);    // left-left case
avl.insert(33);
avl.insert(44);   // right-right case
avl.insert(30);    // right-left case

avl.preOrder();

avl.remove(5);  // right-right case
avl.remove(3);  // right-left case

cout << endl;
avl.preOrder();

return 0;

---------------------------------------------------

######################################################################
####   implementation example in C++  (without parent pointers)   ####
######################################################################

now we noticed from the above implementation example that there are a few things we can dramatically improve.
- use of leaf instead of explicit independent pointer variables left|right in node def.
- use of recursion can reduce code significantly

(ref)
http://www.prefield.com/algorithm/container/avl_tree.html

---------------------------------------------------- // avl2.h

#include <iostream>
using namespace std;

template<typename T> struct node

node(T t){ data = t; leaf = 0; leaf = 0;}
T data;
node* leaf;
};

template<typename T> class AVL

public:
AVL(): size(0), root(0){}
void insert(T);
void remove(T);
int search(T);
int getSize(){return size;}
void preOrder();
private:
node<T>* balance(node<T>*);
void rotate(node<T>*,int);
node<T>* insert(node<T>*, node<T>*);
node<T>* remove(node<T>*, T t);
node<T>* rightMost(node<T>*, node<T>*);
int max(int,int);
int height(node<T>*);
int getBF(node<T>*);
void m_preOrder(node<T>*);
node<T>* root;
int size;
};

template<typename T> int AVL<T>::max(int x, int y)

if( x > y )
return x;
else
return y;

template<typename T> int AVL<T>::height(node<T>* nd)

if( nd == 0 )
return 0;
else if(nd->leaf == 0 && nd->leaf == 0)
return 1; // normally, this is 0, but to distinguish from the above nd=null case, lets mark this 1. so height(root)=1
else
return 1 + max( height(nd->leaf), height(nd->leaf));

template<typename T> int AVL<T>::getBF(node<T>* nd)

if( nd == 0 )
return 0;
return height(nd->leaf) - height(nd->leaf);

template<typename T> int AVL<T>::search(T t)

node<T>* tmp = root;
while(1)
{
if(tmp == 0)
return 0;
else if(tmp->data == t)
return 1;
int i = tmp->data < t ;  // this is where leaf[] helps reduce code size
tmp = tmp->leaf[i];
}

template<typename T> void AVL<T>::insert(T t)

node<T>* new_node = new node<T>(t);
root = insert(root, new_node);
size++;

template<typename T> node<T>* AVL<T>::insert(node<T>* it , node<T>* ins)

if(it == 0)
return ins;
int i = it->data < ins->data;
it->leaf[i] = insert(it->leaf[i], ins);
return balance(it);

template<typename T> node<T>* AVL<T>::balance(node<T>* nd)

if(nd == 0) // root's parent is null
return nd;

node<T>* tmp;
node<T>* rt = nd;

// some clever way to optimize this in the code from the ref link
// we dont change code here because to leverage leaf[], i have to get rid of getBF() and use height(leaf[i]) scheme
// which certainly helps the code size reduction by half, but triples the difficulty on intuitive readability.

if( getBF(nd) == 2){
tmp = rt = nd->leaf;
if( getBF(tmp) == -1){  // left-right case
cout << "left right - ";
rt = tmp->leaf;
rotate(tmp,0);
nd->leaf = rt;
}
cout << "left left" << endl;
rotate(nd,1);
}else if( getBF(nd) == -2){
tmp = rt = nd->leaf;
if( getBF(tmp) == 1){ // right-left case
cout << "right left - ";
rt = tmp->leaf;
rotate(tmp,1);
nd->leaf = rt;
}
cout << "right right" << endl;
rotate(nd,0);
}
return rt;

template<typename T> void AVL<T>::rotate(node<T>* nd, int side) // 0 = left, 1 = right

node<T>* tmpc;  // tmp child
node<T>* tmpcc; // tmp child's child

tmpc = nd->leaf[!side];
tmpcc = tmpc->leaf[side];

tmpc->leaf[side] = nd;
nd->leaf[!side] = tmpcc;

template<typename T> void AVL<T>::remove(T t)  // similar recursive technique to insert()

root = remove(root,t);

template<typename T> node<T>* AVL<T>::remove(node<T>* nd, T t)

if(nd == 0)
return 0;
else if(nd->data == t){
//  cout << "remove " << nd->data << endl;
node<T>* tmpl = nd->leaf;
node<T>* tmpr = nd->leaf;

delete nd;
if(tmpl == 0 && tmpr == 0)
{
cout << "case 1 " << endl;
return 0;
}
else if(tmpl == 0)
{
cout << "case 2 " << endl;
return tmpr;
}
else if(tmpr == 0)
{
cout << "case 3 " << endl;
return tmpl;
}
else
{
cout << "case 4 " << endl;
return rightMost(tmpl, tmpr);
}

}else{
int i = nd->data < t;
nd->leaf[i] = remove(nd->leaf[i],t);
return balance(nd);  // here you see a clever recursive roll up to the root node
}

template<typename T> node<T>* AVL<T>::rightMost(node<T>* nd, node<T>* rhs) // used the same logic from the ref link

if(nd == 0)
return rhs;
else
nd->leaf = rightMost(nd->leaf,rhs);
return balance(nd);

template<typename T> void AVL<T>::preOrder()

m_preOrder(root);

template<typename T> void AVL<T>::m_preOrder(node<T>* nd)

if(nd == 0)
return;
cout << nd->data << endl;
m_preOrder(nd->leaf);
m_preOrder(nd->leaf);

----------------------------------------------------
---------------------------------------------------- // avl2.cpp
#include "avl2.h"
#include <iostream>
using namespace std;

int main()

AVL<int> avl;

// one easy way to visualize the transition is you insert avl.preOrder() after each insert()

avl.insert(11);
avl.insert(22);
avl.insert(9);
avl.insert(5);
avl.insert(7);    // left-right case
avl.insert(3);    // left-left case
avl.insert(33);
avl.insert(44);   // right-right case
avl.insert(30);   // right-left case

avl.preOrder();

avl.remove(5);  // case 3,  right-right
avl.remove(3);  // case 1,  right-left,
avl.remove(22); // case 4,  right-right, then right-left
avl.remove(33); // case 4
avl.remove(44); // case 1
avl.remove(11); // case 4
avl.remove(7);  // case 1
avl.remove(9);  // case 2
avl.remove(30); // case 1

cout << endl;
avl.preOrder();

return 0;

----------------------------------------------------

=====> notice how remove() is implemented. until reaching the node to delete, it's essentially the same recursive traversal from the root as we already used in insert().
=====> but after finding the target node(=X) for deletion, instead of popping the right-most leaf node(=Y) of the left subtree of X, we just put Y.right = X.right, then start balance(Y), tracing up the tree until where X used to be.

the reason for doing  Y.right = X.right at the bottom of the recursion is because after X is removed, we need to join X.left and X.right subtrees.
the biggest elem in X.left is Y. the smallest elem in X.right subtree is X.right. so when deleting X, i.e. promoting Y, X.right subtree should come into Y.right.

and we just balance(Y) upward to X's position, that ensures the balancing. very clever. and simplifies the implementation.

(ref)
http://www.prefield.com/algorithm/container/avl_tree.html

also, notice, for illustrative purpose, we separated case 1-4, but for code optimization, since rightMost() comprehensively covers all vases using the logic in case 4, we only need the code for case 4.

change

---------------------------------------- // BEFORE

if(tmpl == 0 && tmpr == 0)
{
cout << "case 1 " << endl;
return 0;
}
else if(tmpl == 0)
{
cout << "case 2 " << endl;
return tmpr;
}
else if(tmpr == 0)
{
cout << "case 3 " << endl;
return tmpl;
}
else
{
cout << "case 4 " << endl;
return rightMost(tmpl, tmpr);
}

----------------------------------------

into

---------------------------------------- // AFTER

cout << "case 4 " << endl;
return rightMost(tmpl, tmpr);

----------------------------------------

1. 2014-07-28 23:56:32 |
2. Category : data_structure
3. Page View: