kenics.net

Technical notes on perl, python, php, sql, cgi, c/c++, q/kdb+, unix/shell, revision control tools, data structures & algorithms, and their applications into web services and other various forms of software engineering.

counting sort

 
############################ 
####   counting sort    #### 
############################ 
 
while any comparison-based sort algo's average/worst case time complexity cannot beat O(n*log(n)), there is a way to achieve O(n), but with tradeoff with space. 
 
# input assumption 
when sorting N elements, if it's known that each element's value is in the range [0,k), where k is much smaller than N, then we can achieve O(n). 
 
count the occurrence of each of 0 to k-1 by traversing each of N. then print each of 0 to k-1 as many times as they occur. 
 
e.g. 
 
input:  3 1 2 0 3 3 1 2 0 0 3 1 2 
 
you have a knowledge that the input data has value [0,k) 
in this case, k = 4 
 
3 appears 4 times 
2 appears 3 times 
1 appears 3 times 
0 appears 3 times 
 
print from 0 to k-1 as many times as they appear. 
 
 
(ref) 
http://rosettacode.org/wiki/Sorting_algorithms/Counting_sort 
 
 
########################################### 
####   implementation example in C++   #### 
########################################### 
 
 
lets use the same example input from the above. 
 
 
---------------------------------//  counting.cpp 
#include <iostream> 
using namespace std; 
 
void counting_sort(int* arr, int size, int k) 

  int* temp = new int[4]; 
 
  int i; 
  for(i = 0; i < size ; i++) 
    temp[arr[i]]++; 
 
  int idx = 0; 
  for(i = 0; i < k; i++) 
    while(temp[i]-- > 0) 
      arr[idx++] = i; 
 
  delete [] temp; 

 
void printArray(int* arr, int size) 

  for(int i = 0; i < size ; i++) 
    cout << arr[i]; 
  cout << endl; 

 
int main() 

  int arr[] = {3,1,2,0,3,3,1,2,0,0,3,1,2}; 
  int n = sizeof(arr)/sizeof(arr[0]);      // n = 13 
  int k = 4;                               // this is given 
  printArray(arr,n);           // 3120331200312 
  counting_sort(arr,n,k); 
  printArray(arr,n);           // 0001112223333 
 
  return 0; 

 
----------------------------------- 
 
 
 
 
################################# 
####   complexity analysis   #### 
################################# 
 
as seen above, the algo iterates thru the input array twice, first time for counting, and second time printing/writing the output to the original array. the size of that print is the size of elements counted in the first loop that is N. so in total we have 2*N operations. regardless of best/average/worst. thus O(2n) = O(n) 
 

# time 

 
worst   : O(n) 
average : O(n) 
best    : O(n) 
 

# space 

 
total     : O(n+k) 
auxiliary : O(k) 
 
other than the input array, we need extra array[k] for storing the counting result. as stated at the beginning, the assumption is k is much smaller than n for this algo to be efficient. but if not, space inefficiency becomes apparent, which leads us to other non-comparision-based algo such as radix sort and bucket(bin) sort. 
 
if your only knowledge about the input data range is, for example, just an unsigned integer. then k = 2^32 = approx 4GB size, which is ridiculously huge. and N can be just 10, etc. in such a case, counting sort is not suitable. (who allocates 4GB auxiliary space for sorting 10 integers?) 
 
 
 
################################ 
####     using hashmap      #### 
################################ 
 
using a hashmap data structure enables us to do counting-sort in a more efficient way. 
 
 
---------------------------------- 
 
#!/usr/bin/perl 
 
$file = "<file_dir>";        # suppose each line is an elem 
open(FH,"<$file") or die "open error"; 
 
%h = {}; 
 
while(<FH>) 

    chomp $_; 
    $h{$_}++; 

 
for $k (sort keys %h) 

    for($i=0; $i < $h{$k} ; $i++) 
    { 
        print $k; 
    } 

 
close(FH) or die "close error"; 
 
-------------------------------- 

  1. 2014-06-30 03:59:44 |
  2. Category : algo
  3. Page View:

Google Ads