### counting sort

############################
####   counting sort    ####
############################

while any comparison-based sort algo's average/worst case time complexity cannot beat O(n*log(n)), there is a way to achieve O(n), but with tradeoff with space.

# input assumption
when sorting N elements, if it's known that each element's value is in the range [0,k), where k is much smaller than N, then we can achieve O(n).

count the occurrence of each of 0 to k-1 by traversing each of N. then print each of 0 to k-1 as many times as they occur.

e.g.

input:  3 1 2 0 3 3 1 2 0 0 3 1 2

you have a knowledge that the input data has value [0,k)
in this case, k = 4

3 appears 4 times
2 appears 3 times
1 appears 3 times
0 appears 3 times

print from 0 to k-1 as many times as they appear.

(ref)
http://rosettacode.org/wiki/Sorting_algorithms/Counting_sort

###########################################
####   implementation example in C++   ####
###########################################

lets use the same example input from the above.

---------------------------------//  counting.cpp
#include <iostream>
using namespace std;

void counting_sort(int* arr, int size, int k)

int* temp = new int[4];

int i;
for(i = 0; i < size ; i++)
temp[arr[i]]++;

int idx = 0;
for(i = 0; i < k; i++)
while(temp[i]-- > 0)
arr[idx++] = i;

delete [] temp;

void printArray(int* arr, int size)

for(int i = 0; i < size ; i++)
cout << arr[i];
cout << endl;

int main()

int arr[] = {3,1,2,0,3,3,1,2,0,0,3,1,2};
int n = sizeof(arr)/sizeof(arr[0]);      // n = 13
int k = 4;                               // this is given
printArray(arr,n);           // 3120331200312
counting_sort(arr,n,k);
printArray(arr,n);           // 0001112223333

return 0;

-----------------------------------

#################################
####   complexity analysis   ####
#################################

as seen above, the algo iterates thru the input array twice, first time for counting, and second time printing/writing the output to the original array. the size of that print is the size of elements counted in the first loop that is N. so in total we have 2*N operations. regardless of best/average/worst. thus O(2n) = O(n)

# time

worst   : O(n)
average : O(n)
best    : O(n)

# space

total     : O(n+k)
auxiliary : O(k)

other than the input array, we need extra array[k] for storing the counting result. as stated at the beginning, the assumption is k is much smaller than n for this algo to be efficient. but if not, space inefficiency becomes apparent, which leads us to other non-comparision-based algo such as radix sort and bucket(bin) sort.

if your only knowledge about the input data range is, for example, just an unsigned integer. then k = 2^32 = approx 4GB size, which is ridiculously huge. and N can be just 10, etc. in such a case, counting sort is not suitable. (who allocates 4GB auxiliary space for sorting 10 integers?)

################################
####     using hashmap      ####
################################

using a hashmap data structure enables us to do counting-sort in a more efficient way.

----------------------------------

#!/usr/bin/perl

\$file = "<file_dir>";        # suppose each line is an elem
open(FH,"<\$file") or die "open error";

%h = {};

while(<FH>)

chomp \$_;
\$h{\$_}++;

for \$k (sort keys %h)

for(\$i=0; \$i < \$h{\$k} ; \$i++)
{
print \$k;
}

close(FH) or die "close error";

--------------------------------

1. 2014-06-30 03:59:44 |
2. Category : algo
3. Page View: