####################
### algorithm ###
####################
there are a few kinds, some of which are obviously interrelated.
(0) sort
(1) search
(2) merge, swap, randomize, encryption/decryption, etc
##################################
### computation complexity ###
##################################
understanding computation efficiency/complexity of each algorithm is critical.
there are two kinds, (0) time and (1) space, and often they are in a tradeoff relationship.
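a tiny sketch of that tradeoff (the fibonacci example below is an illustration of mine, not from these notes): caching buys near-linear time at the price of O(n) extra space.

```python
from functools import lru_cache

def fib_slow(n):
    # exponential time, no extra space beyond the recursion stack:
    # the same subproblems get recomputed over and over
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

@lru_cache(maxsize=None)
def fib_fast(n):
    # O(n) time, but pays O(n) extra space for the cache:
    # we buy time by spending space
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

print(fib_fast(30))  # 832040
```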
##
## "asymptotic" analysis
##
in slightly oversimplified terms, "asymptotic" means, just as in maths/calculus, "(gradually/eventually) approaching the limit"
e.g. when talking about some sort algo, "asymptotically speaking" means "as the input data size n increases toward infinity".
in some algos, the "limiting" behavior may not be about the simple input data size n. it can be about, say, the ratio of input data size N versus some fixed size M table, approaching zero, etc.
so "asymptotic" analysis is about the algo's as-it-approaches-the-limit behavior. that "it" is usually the input data size n. note we never actually reach such a limit.
on the more conceptual side of the term, the curve y = 1/x is described as "asymptotic" to both the x & y axes, as the curve approaches but never touches them when x approaches infinity (or 0).
there are a few well established asymptotic notations: big-O, big-Omega, big-Theta
##
## big O notation : upper bound
##
bigOh expresses the asymptotic upper bound of an algorithm's running time.
as in O(n^2) # O is for order.
lets say you have O( 5*(n^2) + k*n + 3*log(n) + 1/4 )
where k is a constant coefficient
===> we keep only the asymptotically strongest term, and ignore coefficients.
===> because the purpose here is just to gauge the rate of growth of the execution time of the algo.
===> O(n) means when n becomes 1000 times bigger, the computation time becomes 1000 times bigger.
===> O(log(n)) means when n becomes 1000 times bigger, the computation time only grows by an additive constant log(1000), which is about 10 in base 2.
===> so, for example, the above becomes O(n^2)
e.g.
O(3n) = O(n/2) = O(n)
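a quick way to see why only the strongest term matters (illustrative python, using the example cost function above with an assumed k = 2):

```python
import math

def f(n, k=2):
    # the example cost from above: 5*n^2 + k*n + 3*log(n) + 1/4
    return 5 * n**2 + k * n + 3 * math.log(n) + 0.25

# as n grows, the n^2 term dominates everything else:
# f(n) / n^2 approaches the leading coefficient 5,
# which we then also drop, leaving just O(n^2)
for n in (10, 1000, 100000):
    print(n, f(n) / n**2)
```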
lets classify the common complexity families in the order of efficiency.
O(1) < O(log(n)) < O(n^d) < O(n) < O(n*log(n)) < O(n^2) < O(k^n) < O(n!)
where 0 < d < 1, and k > 1 is a constant
constant < logarithmic < sublinear < linear < n*log(n) < polynomial (typically quadratic) < exponential < factorial
NOTE: in reality, constants (especially coefficient constants) matter as N is not always a big number. but here we talk in theory.
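as a sanity check of the ranking above, an illustrative snippet evaluating each family at a single n (n = 64 here, kept small because 2^n overflows a float for much larger n):

```python
import math

n = 64
families = [
    ("constant    O(1)",       1),
    ("logarithmic O(log n)",   math.log2(n)),
    ("sublinear   O(n^0.5)",   n ** 0.5),
    ("linear      O(n)",       n),
    ("            O(n log n)", n * math.log2(n)),
    ("quadratic   O(n^2)",     n ** 2),
    ("exponential O(2^n)",     2.0 ** n),
    ("factorial   O(n!)",      math.factorial(n)),
]
values = [v for _, v in families]
# at this n the families already line up exactly as ranked above
assert values == sorted(values)
for name, v in families:
    print(f"{name:24} {v:.3g}")
```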
##
## omega: lower bound
##
omega expresses the asymptotic lower bound of an algorithm's running time: T(n) = Omega(g(n)) means T(n) grows at least as fast as g(n) for large n.
##
## theta: tight bound
##
theta expresses the asymptotic tight bound: T(n) = Theta(g(n)) means T(n) is both O(g(n)) and Omega(g(n)), i.e. bounded above and below by g(n) up to constant factors.

to provide some guideline, we generally consider (1) worst case, (2) average case, (3) best case.
but each algo has its own advantages depending on certain conditions, such as the number of inputs, whether the inputs are already mostly sorted, etc.
for many problems there is no single absolute optimal solution. algorithm A can perform better/worse than algorithm B depending on the pattern/condition of the input data set.
so be mindful of the assumptions that come with the analysis of any algorithm.
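for example, insertion sort (a sketch of mine, not discussed above) does ~n comparisons on already-sorted input but ~n^2/2 on reversed input, which shows how much the input condition matters:

```python
def insertion_sort(a):
    """Sort a copy of `a`; return (sorted_list, number_of_comparisons)."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] <= a[j]:   # already in order: stop early
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a, comparisons

n = 100
_, best = insertion_sort(range(n))          # best case: already sorted
_, worst = insertion_sort(range(n, 0, -1))  # worst case: reversed
print(best, worst)                          # ~n vs ~n^2/2 comparisons
```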
(ref)
http://en.wikipedia.org/wiki/Asymptotic_analysis
http://www.cs.utexas.edu/users/djimenez/utsa/cs3343/lecture3.html
https://answers.yahoo.com/question/index?qid=20081030122228AAb0SRo
http://bigocheatsheet.com/
###############################
#### space complexity ####
###############################
there are two kinds.
(1) auxiliary space ## this is the extra/temp/additional space required
(2) total space ## input plus auxiliary space
====> "worst case space complexity" usually means the latter.
====> but the total space is almost always O(n) anyway, because an algo seldom requires more than O(n) auxiliary space (and the input alone is already O(n)).
====> so it makes more sense to discuss space in terms of auxiliary space.
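to illustrate the distinction (a sketch of mine, not from the reference): reversing a list in place needs only O(1) auxiliary space, while building a reversed copy needs O(n), even though both have O(n) total space.

```python
def reverse_in_place(a):
    # O(1) auxiliary space: only two index variables,
    # total space is still O(n) because of the input itself
    i, j = 0, len(a) - 1
    while i < j:
        a[i], a[j] = a[j], a[i]
        i, j = i + 1, j - 1
    return a

def reverse_copy(a):
    # O(n) auxiliary space: allocates a second list of the same size
    return a[::-1]

print(reverse_in_place([1, 2, 3, 4]))  # [4, 3, 2, 1]
```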
(ref)
http://www.geeksforgeeks.org/gfact86/
###############################################
#### amortized case VS average case ####
###############################################
the average case is, strictly speaking, often problematic, because we cannot/should not make assumptions about the probability distribution of inputs. also, we need to look at the entire sequence of operations: within an algorithm, there may be some expensive operation (like insert) that is O(n) but occurs very rarely, while all the other operations (like search) are cheap.
so instead of the average case we use the "amortized" cost, which bounds the average cost of a single operation taken over a worst-case sequence of operations, with no probability assumptions needed.
another description quoted from the reference below is "the amortized time is (a bound for) the average time of an operation in the worst case."
one analogy is income across types of jobs: compare the total salary received over a career in government VS at a wall street firm, divided by the years worked; averaging over some assumed mix of jobs does not mean much, while amortizing each career's actual total does.
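the classic example is a doubling dynamic array: the rare O(n) resize amortizes to O(1) per append. a small counting sketch (illustrative, assuming a double-when-full growth policy):

```python
def append_all(n):
    """Simulate a doubling dynamic array; return the total number of
    element copies performed across n appends (the resize work)."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:   # rare expensive operation: O(size) copy
            copies += size
            capacity *= 2
        size += 1              # the cheap common operation: O(1)
    return copies

n = 1_000_000
copies = append_all(n)
# total resize work stays below 2n, so each append is amortized O(1)
print(copies, copies / n)
```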
(ref)
http://stackoverflow.com/questions/15079327/amortized-complexity-in-laymans-terms
http://www.leda-tutorial.org/en/official/ch02s02s03.html
#########################################
#### sort - stable VS unstable ###
#########################################
(ref)
http://en.wikipedia.org/wiki/Sorting_algorithm
stable sort - does not change the relative order of equal-valued entries.
unstable sort - may change that order.
e.g.
lets say you are sorting 4 cards.
# from left to right
5 (heart), 4 (club), 8 (diamond), 5 (spade)
===> sort output will be 4,5,5,8. stable sort guarantees it will be 4,5(heart),5(spade),8 while unstable sort may change the order of the two 5s.
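python's built-in sort (timsort) is guaranteed stable, so the card example can be checked directly:

```python
# sort by rank only; a stable sort keeps equal ranks in their
# original left-to-right order
cards = [(5, "heart"), (4, "club"), (8, "diamond"), (5, "spade")]
by_rank = sorted(cards, key=lambda card: card[0])
print(by_rank)
# the 5 of hearts stays ahead of the 5 of spades, as stability guarantees
```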
##############################################################
#### n*log(n) limit for comparison-based sort algo ####
##############################################################
the average and worst case of any pairwise comparison-based sort cannot beat O(n*log(n))
the exact bound derived below is (1/2)*n*log(n) - n/2, but since the n*log(n) term asymptotically dominates the n/2 term, we just write O( n*log(n) )
this is proved using a binary decision tree illustration.
lets say we sort three elements a,b,c
n = 3
n! = 3! = 6 possible permutations
                  a < b ?
            yes /         \ no
         b < c ?           a < c ?
       yes /   \ no      yes /   \ no
    a,b,c    a < c ?   b,a,c    b < c ?
           yes / \ no         yes / \ no
        a,c,b   c,a,b      b,c,a   c,b,a
===> each leaf node is an underlying permutation.
===> a path from the root to a leaf is a sequence of comparisons
===> some paths take 2 comparisons, others take 3 in the above example
===> the height h of the tree is the longest path
sometimes, like the above case, you may not get a complete binary tree. but in the above case, can the height 3 be any shorter?
no, because with h = 2 the tree can have at most 4 leaves, while we need 6 permutations.
height          | 0 | 1 | 2 |  3 | ... | h
max # of nodes  | 1 | 3 | 7 | 15 | ... | 2^(h+1) - 1
max # of leaves | 1 | 2 | 4 |  8 | ... | 2^h
h = log(max # of leaves)
2^h = max # of leaves >= # of leaves = n! permutations
h >= log(n!)
or, since h is an integer,
h >= ceiling( log(n!) )
ceiling(x) rounds x up to the nearest integer. e.g. ceiling(3.1) = 4
h >= log(n!)
   = log(n * (n-1) * (n-2) * ... * 2 * 1)
   > log(n * (n-1) * (n-2) * ... * n/2)    // keep only the factors down to n/2
   > log((n/2)^(n/2))                      // each kept factor is >= n/2
   = (n/2) * log(n/2)
   = (n/2) * ( log(n) - 1 )                // log is base 2
   = (1/2) * n * log(n) - (1/2) * n        // just rewriting by expanding
if you put the last expression in big-O terms,
it becomes O(n*log(n))
so, for sorting n elements using a pairwise comparison-based algo, the number of comparisons h is Omega(n*log(n))
the above proves the worst case, but even considering the average case (e.g. thinking of the input as already half sorted, i.e. n/2),
you still cannot beat O(n*log(n)) asymptotically.
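a quick numerical check of the bound (illustrative): the minimum tree height ceil(log2(n!)) stays within a constant factor of n*log2(n), and the ratio climbs toward 1 as n grows:

```python
import math

# minimum decision-tree height h = ceil(log2(n!)) versus n*log2(n):
# consistent with the Omega(n*log(n)) comparison lower bound
for n in (8, 64, 1024):
    h = math.ceil(math.log2(math.factorial(n)))
    print(n, h, round(h / (n * math.log2(n)), 3))
```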
(ref)
http://crypto.stanford.edu/~dabo/courses/cs161_spring01/cs16107.pdf