kenics.net

Technical notes on perl, python, php, sql, cgi, c/c++, q/kdb+, unix/shell, revision control tools, data structures & algorithms, and their applications into web services and other various forms of software engineering.

tree data structure

 
############################# 
####   tree structure    #### 
############################# 
 
- an ADT data strcuture. 
- the "root" node linked with child nodes of different depth thru a hierarchical strcture. 
 
 
        []            // root 
        | 
     []----[]         // child (or "parent" from its child perspective) 
     |      | 
  []---[] []--[]      // grand child, leaf 
 
- a node can have zero to multiple child nodes, depending on the requirement. 
- often comes with a specific search algo technique. 
- a tree where a node has only the max of two child nodes is called a "binary" tree. 
- nodes with no child are sometimes called "leaf" nodes. 
- the "depth/height" of a tree is the number of hops/edges between the root and the furthest/deepst child node. 
(so the above tree is height 2. root is height zero. height usually refers to the deepest depth of the tree, whereas depth can be used to describe "depth X nodes in a height Y tree" etc) 
- sometimes you hear the term descendant/ancestor nodes. 
 
## 
## full binary tree: a binary tree where every node has either two or zero child. 
## 
 
## 
## perfect/complete binary tree: a full binary tree where every leaf node has the same depth. 
## 
 
### 
###  balanced (binary) tree: for every node, the height of its left subtree and right subtree differ at most by one. 
### 
 
===> this is useful, because the min path from the root to the furthest node is minimized. 
===> but obviously, not always efficient, e.g. when you have certain items more frequently accessed, etc, then you might want to build an imbalanced tree. 
===> a "balanced" "binary" tree is always a classic CS topic, many implementations. (AVL tree, red-black tree, AA tree, splay tree, etc) 
 
(ref) 
http://stackoverflow.com/questions/8015630/definition-of-a-balanced-tree 
http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree 
 
 
an example C++ struct definition for a LL-based binary tree node would look like below. 
 
 
------------------------------------ 
 
template<typename T> strcut node{ 
  struct node* left; 
  struct node* right; 
  struct node* parent;    // often this is omitted when the binary tree is used for DFS searching purpose only. 

 
------------------------------------ 
 
====> intuitive, but we can also express a tree structure using array. 
 
e.g. we declare an array, 
 
root:  arr[1] 
left:  arr[2i] 
right: arr[2i+1] 
 
[0][1][2][3][4][5][6][7][8][9] 
 
====> we will see a real example in the heap sort section. 
 
 
(ref) 
http://en.wikipedia.org/wiki/Tree_(data_structure) 
http://ppp-lab.sakura.ne.jp/ProgrammingPlacePlus/algorithm/data_struct/007.html 
 
 
 
##################################### 
####    tree traversal methods    ### 
##################################### 
 
There are two kinds: 
(1) depth-first search (DFS)      // aka backtracking 
(2) breadth-first search (BFS) 
 
===> the name essentially says everything, both are traversal methods. 
===> depth-first explores each branch to the end then backtrack (both recursive and non-recursive are possible). commonly used in binary search tree. 
===> breadth-first explores all immediate nearest-hop nodes, then their nearest, and so on. commonly used in graph algo. 
 
(ref) 
http://en.wikipedia.org/wiki/Depth-first_search 
http://en.wikipedia.org/wiki/Breadth-first_search 
 
 
for binary tree, (1) depth-first search is commonly used. 
there are three kinds of DFS traversal(search) methods. 
 
(1) pre-order      //  root, left, right 
(2) in-order       //  left, root, right 
(3) post-order     //  left, right, root        // imagine, you want to free the whole tree where you malloc'ed each node, then this is the choice. 
 
if we use breadth-first search, then we traverse every depth(level) one by one. 
 
----------------------------------------// BFS_print_tree psuedo code 
Q.enqueue(root); 
while(!Q.isEmpty()) 

   u = Q.dequeue(); 
   print u.data; 
   if( !u.left ) Q.enqueue(u.left); 
   if( !u.right) Q.enqueue(u.right); 

---------------------------------------- 
 
 
(ref) 
http://en.wikipedia.org/wiki/Tree_traversal 
 
 
 
###################################### 
####   binary search tree (BST)   #### 
###################################### 
 
at any given subtree, the value of each node has to satisfy the below relation. 
 
      left_child < parent < right_child 
 
so searching for an elem in a binary tree can be done in O(log n), "assuming" you keep it balanced. 
an ultimately imbalanced tree is essentially just a LL. so the complexity can be O(n) 
 
NOTE: how to deal with if the same value, but to simplify, lets not allow dupe elems. 
NOTE: depending on how you construct the BST, it can be inefficient. 
 
 
e.g. 
 
     [4] 
      | 
  [2]---[5] 
   |     | 
[1]-[3][]-[6] 
 
 
===> a well balanced tree 
 
 
e.g. 
 
          [6] 
           | 
        [5]-[] 
         | 
      [4]-[] 
       | 
    [3]-[] 
     | 
  [2]-[] 
   | 
[1]-[] 
 
===> an imbalanced tree 
 
 
 
############################################### 
####   delete(node) operation for a tree   #### 
############################################### 
 
when deleting a node, you need to promote a descendant node, specifically either (1) the right most (biggest) node of the left subtree, or (2) the left most (smallest) node of the right subtree. 
 
e.g. 
 
        [123] 
          | 
   [55]-------[777] 
    |           | 
[3]--[66]  [150]--[9999] 
      | 
  [60]-[] 
    | 
[59]-[61] 
 
====> if you delete [123], then promote either [66] or [150] in its place. 
====> notice how we leave [66]'s left subtree in its place 
 
 
        [66] 
          | 
   [55]-------[777] 
    |           | 
[3]--[60]  [150]--[9999] 
      | 
  [59]-[61] 
 
 
 
 
 
 
################################################################# 
####   DFS traversal of a BST implementation example (C++)   #### 
################################################################# 
 
lets do all (1) pre-order, (2) in-order, (3) post-order, in a simple BST tree. no self-balancing ability. 
 
 
---------------------------------------------------------- // bt.h 
 
#include <iostream> 
using namespace std; 
 
template<typename T> struct node 

  node(T t): left(0),right(0){ data = t; } 
  T data; 
  struct node* left; 
  struct node* right; 
}; 
 
template<typename T> class BT 

 public: 
 BT(): root(0){} 
  void add(T); 
  void del(T); 
  int search(T); 
  void preOrder(); 
  void inOrder(); 
  void postOrder(); 
 private: 
  void m_preOrder(struct node<T>*); 
  void m_inOrder(struct node<T>*); 
  void m_postOrder(struct node<T>*); 
  struct node<T>* root; 
}; 
 
template<typename T> void BT<T>::add(T t) 

  struct node<T>* new_node = new node<T>(t); 
  struct node<T>* tmp = root; 
  if(root == 0){ 
    root = new_node; 
    return; 
  } 
 
  while(1) 
    { 
      if(tmp->data == t) 
          return; 
      else if(tmp->data > t) 
          if(tmp->left == 0) 
            tmp->left = new_node; 
          else 
            tmp = tmp->left; 
      else 
          if(tmp->right == 0) 
            tmp->right = new_node; 
          else 
            tmp = tmp->right; 
    } 

 
 
template<typename T> void BT<T>::del(T t) 

  if(root == 0) 
    return; 
 
  struct node<T>** tmp = &root; 
  struct node<T>* it = root; 
  // as explained in pointer/reference article, *tmp = 0 will really overwrite root = 0, but it = 0 will not change root = 0 
  /* 
 
    it -> node_A -> node_B 
    root -> node_A -> node_B 
 
    so, while overwriting it will only change where that particular pointer points to, if you overwrite it->next = 0, 
    that will overwrite root->next as well. because both are essentially node_A->next pointer 
 
   */ 
  struct node<T>** sub; 
  struct node<T>* subit; 
 
  while(1) 
    { 
      it = *tmp; 
      if(it == 0)              // key not found 
        break; 
      else if(it->data == t)   // found the guy to delete 
        { 
          if(it->left == 0 && it->right == 0) // if no child just delete it 
            *tmp = 0; 
          else if(it->left == 0) // only the right node present, then promote it 
            *tmp = it->right; 
          else if(it->right == 0) // only the left node present, then promote it 
            *tmp = it->left; 
          else // both child nodes present, then promote the right-most leaf(biggest val) of the left subtree 
            {  // but be careful, that right-most leaf node X can have its own left subtree Y which we need to leave in X's place 
 
              sub = &(it->left);   // this sub/subit business is the same deal as tmp/it trick 
              while(1) 
                { 
                  subit = *sub; 
                  if(subit->right == 0) 
                    break; 
                  sub = &(subit->right); 
                } 
              *sub = subit->left;   // replacing the right-most leaf(X) of the left subtree with the left subtree of X 
              *tmp = subit; 
              (*tmp)->right = it->right; 
              (*tmp)->left = it->left; 
            } 
          delete it; 
        } 
      else if(it->data > t) 
        tmp = &(it->left);    // precisely due to the above pointer explanation we need to do stuff like this 
      else 
        tmp = &(it->right); 
    } 

 
 
template<typename T> int BT<T>::search(T t) 

  int depth = 0; 
  struct node<T>* temp = root; 
 
  while(1) 
    { 
      if(temp == 0) 
        return -1;                // depth -1 is for no match. 
      else if(temp->data == t) 
        return depth; 
      else if(temp->data > t) 
        temp = temp->right; 
      else 
        temp = temp->left; 
      depth++; 
    } 

 
template<typename T> void BT<T>::preOrder() 

  m_preOrder(root); 

 
template<typename T> void BT<T>::m_preOrder(struct node<T>* tmp) 

  if(tmp == 0) 
    return; 
 
  cout << tmp->data << endl; 
  m_preOrder(tmp->left); 
  m_preOrder(tmp->right); 

 
template<typename T> void BT<T>::inOrder() 

  m_inOrder(root); 

 
template<typename T> void BT<T>::m_inOrder(struct node<T>* tmp) 

  if(tmp == 0) 
    return; 
 
  m_inOrder(tmp->left); 
  cout << tmp->data << endl; 
  m_inOrder(tmp->right); 

 
 
template<typename T> void BT<T>::postOrder() 

  m_postOrder(root); 

 
template<typename T> void BT<T>::m_postOrder(struct node<T>* tmp) 

  if(tmp == 0) 
    return; 
 
  m_postOrder(tmp->left); 
  m_postOrder(tmp->right); 
  cout << tmp->data << endl; 

 
---------------------------------------------------------- 
---------------------------------------------------------- // main.cpp 
 
#include "bt.h" 
#include <iostream> 
using namespace std; 
 
int main() 

  BT<int> bt; 
  bt.add(123);    // insert to root 
  bt.add(55);     // insert to left 
  bt.add(777);    // insert to right 
  bt.add(9999);   // insert to right right 
  bt.add(66);     // insert to left right 
  bt.add(150);    // insert to right left 
  bt.add(3);      // insert to left left 
  bt.add(60);     // insert to left right left 
  bt.add(7);      // insert to left left left 
 
  /* 
 
           [123] 
             | 
      [55]--------[777] 
       |            | 
  [3]----[66]  [150]--[9999] 
   |       | 
 []-[7][60]-[] 
 
   */ 
 
 
  cout << "preOrder" << endl;    // (root -> left -> right) 
  bt.preOrder();                 // 123 55 3 7 66 60 777 150 9999 
 
  cout << "inOrder" << endl;     // (left -> root -> right) 
  bt.inOrder();                  // 3 7 55 60 66 123 150 777 9999 
 
  cout << "postOrder" << endl;   // (left -> ritht -> root) 
  bt.postOrder();                // 7 3 60 66 55 150 9999 777 123 
 
 
  bt.del(9999);   // validates no-child case 
  bt.del(777);    // validates only-left-child case 
  bt.del(123);    // validates both-children case 
  bt.del(3);      // validates only-right-child case 
 
  cout << "inOrder" << endl;     // (left -> root -> right) 
  bt.inOrder();                  // 7 55 60 66 150 
 
  /* 
 
           [66] 
             | 
      [55]--------[150] 
       | 
  [7]----[60] 
 
  */ 
 
 
  return 0; 

 
 
---------------------------------------------------------- 
 
 
################################## 
####    height of the tree    #### 
################################## 
 
psuedo code: 
-------------------------------------- 
func height(node) 

   if(node == null) 
     return -1; 
   else 
     return 1 + max(height(node->left),height(node->right)); 

-------------------------------------- 
 
===> by changing max() with min() you get the shortest branch height. by comparing the two, you can check if it's BST. 
 
 
################################### 
####   interpolation search    #### 
################################### 
 
one simple technique to enhance a binary search is called interpolation search. 
 
like any other binary search, we need the target data set to be "already sorted". 
 
a usual structure for binary search can be either LL-based or array-based, but for this interpolation search technique, we need the array-based structure for random accessibility. 
 
the logic is: 
lets say you have the array of size 300. and the data is between 1 to 1000, already sorted. 
and your search val is 750. where do you look? probably at the 75% mark of the array. (ASSUMING the uniform distribution of the input data) 
 
 
so in code, instead of 
 
if ( target > arr[(first + last) / 2] ) 
 
we do 
 
if ( target >  ( (target - arr[first]) / (arr[last] - arr[first]) ) * (last - first)  ) 
 
 
(ref) 
http://trivial-contents.com/programming/algorithm/search/interpolation_search.html 
http://ppp-lab.sakura.ne.jp/ProgrammingPlacePlus/algorithm/search/005.html 
http://en.wikipedia.org/wiki/Interpolation_search 
 

  1. 2014-06-10 00:48:36 |
  2. Category : data_structure
  3. Page View:

Google Ads