### computation theory

Computability, Complexity and Algorithms: CS6505 course notes (the course later evolved into CS6515, Graduate Algorithms)

##
##  math notes
##

for digital editing, use Google Docs or MS Word.

# inclusion-exclusion principle
http://mathworld.wolfram.com/Inclusion-ExclusionPrinciple.html

# operators e.g. ":="  flipped E,A etc
http://www.abstractmath.org/MM/MMOtherSymbols.htm

# misc
http://en.wikipedia.org/wiki/Contraposition
http://en.wikipedia.org/wiki/Cardinality

##
## textbook/aid
##

(mit discrete math review)

(textbook)
Discrete Mathematics and Its Applications - Seventh Edition - Kenneth H. Rosen
--- http://faculty.wwu.edu/curgus/Courses/209_201110/S_1_1.pdf
Introduction to the Theory of Computation - Third Edition - Michael Sipser
Introduction to Algorithms -Third Edition - Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein

(instructor provided notes) https://www.udacity.com/wiki/ud557/notes

########################################
####    Languages & Countability    ####
########################################

####
####  Function  VS  Computation
####

A "relation" is a set of ordered pairs.
A "function" is a special kind of relation that maps each element of the "domain" into exactly one element in the "range".

input -> (computer) -> output

to express input/output, we have
Alphabet : a "finite" set of symbols (e.g. [0-9a-zA-Z_] )
String   : a "finite" sequence of symbols from an alphabet  (e.g.  w = 001011, assuming the alphabet is {0,1})
Language : a set of strings (can be either finite or infinite)
e.g. a set X = a list of names in a class. (finite)
a set Y = a set of even numbers. (infinite)
a set Z = the empty set

thus, any set of strings over an alphabet is a language.

===> there can be only "countably" many strings (Z* is countably infinite). e.g. with alphabet {0,1}:
string length : possible strings
0 :  empty
1 :  0,1
2 :  00,01,10,11
3 :  000,001,010,011,100,101,110,111
N :  k^N strings, where k is the number of symbols in the alphabet (k=2 in this case)
====> no matter how deep you go, you can always count the number of strings in this k^N fashion.
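the k^N counting above can be sketched in a few lines of python (itertools.product enumerates all strings of a given length; the helper name strings_of_length is mine):

```python
from itertools import product

# Sketch of the k^N counting above: for each length N there are k^N
# strings (k = alphabet size), so strings can be listed length by length.
def strings_of_length(alphabet, n):
    return [''.join(p) for p in product(alphabet, repeat=n)]

alphabet = ['0', '1']
for n in range(4):
    s = strings_of_length(alphabet, n)
    print(n, len(s), s)    # len(s) == 2^n
```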

(quote/modified from instructor's note) https://www.udacity.com/wiki/ud557/notes
------------
A language, in general, is any set of strings over some alphabet. We will define a computable language to be the set of all strings from its alphabet that some machine accepts. (If we call the machine M, then the set of strings M accepts is called the language of M, denoted L(M). In other words L(M) is the language M recognizes) So if M is a machine whose alphabet is {0,1} and which accepts binary strings with an even number of 0s and rejects all other binary strings, then L(M) = {binary strings with an even number of 0s}, and therefore this language is a computable language.

In this way, a machine's language determines a function whose domain is the set of strings over an alphabet Σ and whose range is the finite set {ACCEPT,REJECT}. A function of this type that is determined by the language of some machine is called a computable function.
------------

NOTE:  L(M) = { x in Z* | M accepts x }

NOTE: as a general math term, the "power" set of any set S, written as P(S), is the set of all subsets of S, including the empty set and S itself.

###
###  operations on languages
###

- union
- intersection
- complement
- concatenation
- Kleene star and plus

note: two sets are said to be disjoint if their intersection is empty

e.g.

A = {0,10}
B = {0,11}
A ∪ B = {0,10,11}
A ∩ B = {0}
Ā = {null,1,00,01,11,000,001,...}   # infinite. notice how Ā (the complement of A) excludes 0 & 10
Ā ∩ A = ∅   # a set and its complement are always disjoint. (note: "null" above is the empty STRING, which is in Ā since it is not in A. the empty SET is a subset of every set — don't confuse the two.)

==> to recap, in this case, A & B are both languages(sets)
==> alphabets/symbols are 0 or 1
==> strings are any finite sequence of alphabet

AB = {xy | x belongs to A and y to B}     # concatenation
= {00,011,100,1011}

AA = A^2
= {00,010,100,1010}

A^0 = {null}

A^k = {x1 x2 x3 ... xk | each xi belongs to A}     # k strings from A concatenated

# concatenation is analogous to the Cartesian product (pairs (x,y) vs concatenated strings xy)
(ref) http://en.wikipedia.org/wiki/Cartesian_product
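the operations above can be sketched with python sets ('' plays the role of "null", the empty string):

```python
# Sketch of the language operations above using python sets.
A = {'0', '10'}
B = {'0', '11'}

union        = A | B                        # {'0', '10', '11'}
intersection = A & B                        # {'0'}
AB = {x + y for x in A for y in B}          # concatenation
AA = {x + y for x in A for y in A}          # A^2
A0 = {''}                                   # A^0 = {null}

print(sorted(AB))   # ['00', '011', '100', '1011']
print(sorted(AA))   # ['00', '010', '100', '1010']
```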

# Kleene Star and Plus

===> to concatenate any number of strings from a language, we write

      ∞
A* =  ∪ A^i
     i=0

      ∞
A+ =  ∪ A^i       # plus means exclude the empty string
     i=1

===> think of this as a union of all possible powers of the language

e.g.
010001010 belongs to A*   because it can be broken down to 0 10 0 0 10 10, each being a string from A

===> NOTE A* does NOT include any infinite sequence of strings. each string has finite length; you are only allowed to concatenate a finite number of those strings.

===> notice this is essentially the regex star we are familiar with in programming.
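as a sketch, membership in A* for a finite A can be checked with the same break-down idea used for 010001010 above (a simple word-break dynamic program; the helper name in_star is mine):

```python
# Sketch: decide membership in A* for a finite language A by checking
# whether the string can be broken into pieces from A -- the same
# break-down used for 010001010 above.
def in_star(s, A):
    n = len(s)
    ok = [False] * (n + 1)
    ok[0] = True                          # the empty string is in A*
    for i in range(1, n + 1):
        for w in A:
            if i >= len(w) and s[i - len(w):i] == w and ok[i - len(w)]:
                ok[i] = True              # s[:i] splits into pieces from A
                break
    return ok[n]

A = {'0', '10'}
print(in_star('010001010', A))   # True:  0 10 0 0 10 10
print(in_star('11', A))          # False: no way to build 11 from strings in A
```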

####
####  Countability
####

some functions are not computable.

countably infinite sets VS uncountably infinite sets

countably infinite sets  : the ones whose elements you can enumerate in order. e.g. the even numbers = 2,4,6,...
uncountably infinite sets: the ones you cannot enumerate in order. e.g. the real numbers.

the set of all strings over an alphabet is countable.
the set of languages is UNcountable.

Formal definition: a set is countable if it is finite or there is a one-to-one correspondence (a bijection) with the natural numbers.
"one-to-one" means no two elements of the domain map to the same element in the range, and "onto" means every element in the range is mapped to by some element of the domain.

e.g. f(n) = 2n

domain      range
-----------------
1             2
2             4
3             6
4             8
.             .
.             .
.             .

notice because it's a one-to-one correspondence, the two sets have the same cardinality:
there are as many even numbers as natural numbers.

Theorem: for any finite alphabet Z, the set of strings Z* is countable.
Proof  :

             ∞
recall Z* =  ∪ Z^i
            i=0

let Nk = total number of strings of length at most k
       = |Z^0| + |Z^1| + .... + |Z^k|

domain          range
----------------------------------
1 = N0          Z^0 = empty string
2 = N0 + 1      Z^1
3 = N0 + 2      Z^1
.               .
.               .
N1              Z^1  # depending on how many symbols for alphabet Z, there can be many strings of length 1 = Z^1
N1 + 1          Z^2
N1 + 2          Z^2  # similarly, there can be many strings of length 2
.               .    # if Z = [a-z] then |Z^2| = 26^2
.               .
N2              Z^2
.               .
.               .
Nk-1 + 1        Z^k
Nk-1 + 2        Z^k
.               .
.               .
Nk              Z^k

e.g.
lets say a binary alphabet Z = {0,1}
we can enumerate Z*

domain   range
---------------
1        Empty
2        0
3        1
4        00
5        01
6        10
7        11
8        000
.        .
.        .
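the table above is just enumeration in order of increasing length (shortlex order), which can be sketched as a python generator (the names enumerate_star and first are mine):

```python
from itertools import count, islice, product

# Sketch of the enumeration table above: list Z* for Z = {0,1} in order
# of increasing length. every string shows up at a finite position --
# which is exactly what countability means.
def enumerate_star(alphabet):
    for n in count(0):                        # lengths 0, 1, 2, ...
        for p in product(alphabet, repeat=n):
            yield ''.join(p)

first = list(islice(enumerate_star('01'), 8))
print(first)   # ['', '0', '1', '00', '01', '10', '11', '000']
```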

=====> now, the above argument also shows the below theorem

Theorem: a countable union of finite sets is countable.

=====> at this point we can say "any set of strings is countable" !!
=====> now, let's go even further and prove the theorem below.

Theorem: a countable union of "countable" sets is countable.

Proof:
suppose the sets (aka languages) S0, S1, S2, ... are all disjoint.
(if the sets aren't disjoint, the number of elements in the union won't equal the sum of the sizes of the sets, because common elements are counted only once in the union. this matters because we want a one-to-one correspondence, i.e. the domain and range must have equally many elements.)

domain             range
------------------------
{1,...,N0}          S0
{N0 +1,...,N1}      S1
.                   .
.                   .
{Nk-1 +1,...,Nk}    Sk

S0 = {X00, X01, X02,,,,}
Sk = {Xk0, Xk1, Xk2,,,,}

      0   1   2   3
0   X00 X01 X02 X03 ....
1   X10 X11 X12 X13 ....
2   X20 X21 X22 X23 ....
3   X30 X31 X32 X33 ....

===> use the indices of each X for the enumeration
===> e.g. think of this as the rational numbers, as in;

      0   1   2   3
0   0/1 1/1 2/1 3/1 ....
1   0/2 1/2 2/2 3/2 ....
2   0/3 1/3 2/3 3/3 ....
3   0/4 1/4 2/4 3/4 ....

===> each row is infinite, but each diagonal is finite, so we can enumerate the diagonals one after another.

 ∞        ∞
 ∪ Si  =  ∪ {xij | i+j = k}     # each {} is a diagonal, which is always finite. we can enumerate.
i=0      k=0

(ref) http://en.wikipedia.org/wiki/Rational_number
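the diagonal walk can be sketched as a generator over index pairs (the helper name diagonals is mine):

```python
from itertools import islice

# Sketch of the diagonal walk above: enumerate all index pairs (i, j)
# one diagonal (i + j = k) at a time. each diagonal is finite, so every
# pair -- and hence every x_ij in the countable union -- gets a finite
# position in the enumeration.
def diagonals():
    k = 0
    while True:
        for i in range(k + 1):
            yield (i, k - i)          # pairs on diagonal i + j = k
        k += 1

pairs = list(islice(diagonals(), 6))
print(pairs)   # [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```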

Theorem:  "the set of all languages over a finite alphabet is UNcountable"
Proof:

we will prove this by contradiction, using the diagonalization trick. ( = the same trick we will use later in the Undecidability context)

suppose (for contradiction) that the languages over an alphabet Z can be enumerated as L1, L2, ...
Let x1, x2, ... be the strings in Z* (these ARE enumerable, as shown above).

e.g.

     x1 x2 x3 x4 .... xk
L1    0  1  0  1            # this means strings x1,x3 are not in L1, but x2,x4 are in L1
L2    1  1  0  1
L3    1  0  1  1
L4    1  0  1  0
.
.
Lk    1  0  1  0 .... ??

==> now, consider the language L = {xi | xi is NOT in Li}   # i.e. take the diagonal and flip each bit
==> if the enumeration above really lists ALL languages, L must be Lk for some k.
==> (in the table above, x1 is not in L1 and x4 is not in L4, so x1 and x4 are in L.)
==> i.e. L is defined as the opposite of what's on the diagonal.
==> is xk in Lk? it cannot be decided either way: if xk is in Lk = L, then by definition of L, xk is not in Lk. if xk is not in Lk, then xk is in L = Lk. contradiction.

==> so no enumeration of all languages exists. thus the set of all languages over a finite alphabet is UNcountable.

Q.E.D.
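on any finite prefix of the table, the diagonal flip can be demonstrated directly (the table values below are the example rows above):

```python
# Sketch of the diagonalization trick on a finite prefix of the table
# above: rows are the membership vectors of L1..L4 over x1..x4.
# flipping the diagonal yields a row that differs from every Li at
# position i, so it cannot equal any row in the enumeration.
table = [
    [0, 1, 0, 1],   # L1
    [1, 1, 0, 1],   # L2
    [1, 0, 1, 1],   # L3
    [1, 0, 1, 0],   # L4
]

diagonal_flipped = [1 - table[i][i] for i in range(len(table))]
print(diagonal_flipped)   # [1, 0, 0, 1]

for i, row in enumerate(table):
    assert row[i] != diagonal_flipped[i]   # the new row differs from Li at xi
```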

(good instructor's note) https://www.udacity.com/wiki/ud557/notes#existence-of-uncomputable-functions

###
###  consequences of the uncountability of languages
###

Let Z be the set of ascii chars.
{string w | w represents a valid python program} is a subset of Z*

each w is finite, thus the set of valid python programs is countable. this means there can be only countably many algorithms, assuming an algorithm can be encoded by a finite string over an alphabet, and each algorithm computes one language.
(substitute python with any other programming language)

for any Language L, let fL(x) = 1 if x is in L, else 0.
all such functions are distinct, thus the set {fL | L is a language over Z} is UNcountable.
i.e. the set of functions from Z* to {0,1} is uncountable.

====> this means there must be some functions that we simply cannot write computer programs for.
====> so what are the problems that cannot be solved by computer programs ?
====> let's study the Turing Machine

#####################################
#####      Turing Machines      #####
#####################################

we need a model of computation (a notion of computability) that captures everything any computer can do now or in the future.
===> Turing Machine

recall  input --> computer --> output
input/output can be expressed in string

A Turing Machine consists of
1) finite set of states Q
2) input alphabet Z (finite. whitespace is excluded, as it is used as the input terminator)
3) tape alphabet T (finite. whitespace included)
4) transition function F: Q,T -> Q,T,{L,R}
5) start state q0
6) accept state qa
7) reject state qr

===> the transition function is the heart of the TM
===> based on the current state and the symbol its head reads, it decides the next state, what to write, and which direction to move the head. (the head cannot move past the left end of the tape. by convention, trying to move left at the first square makes the head stay at that position instead.)

NOTE:
- everything used to specify a TM is "finite"  as in Q,Z,T are all finite.
- q0,qa,qr are all in Q, and qa != qr.
- the tape is one-way infinite: (input string)(infinite blanks). it is basically unlimited memory space. the input string sits at the leftmost part of the tape, and the TM head starts at the leftmost square. (the head cannot move off the left end — see the convention above.)
- there can be "useless" states which are never entered on any input.
- to "rewind" the tape (i.e. bring the head back to the leftmost square), you just mark the first symbol, or use the right-shift trick you will see in the exercise below.

##
##  TM example : testing the oddness (as opposed to evenness)
##

This TM reads the binary input and tells if it is odd or not.

Q = {q0, q1, qa, qr}
Z = {0,1}
T = {0,1,_}   # _ as a whitespace
F =  q0,0 -> q0,0,R
q0,1 -> q0,1,R
q0,_ -> q1,_,L
q1,0 -> qr,0,R
q1,1 -> qa,1,R    # if unknown Q,T pair is encountered, TM halts and rejects, by convention.

===> essentially: keep reading the input bits until the terminating whitespace, go back one bit, and determine the oddness based on that final bit.

===> this transition function can be represented as a state diagram (similar to what is done for finite automata)

as for notation (aka configuration sequences), we write as follows;

input binary is 1011 (= 11 in decimal)

state and input
---------------
q0 1011
1 q0 011
10 q0 11
101 q0 1
1011 q0 _
101 q1 1
1011 qa _

===> this shows the position of the tape and TM state
===> everything expressed as strings (important)
===> also tells what action was taken. thus this sequence itself shows the computation
===> "configuration" refers to the setting of the three (state, tape content, head location)
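a minimal sketch of a single-tape TM simulator, using the same delta-dictionary format as the exercises below ('b' = blank; the function name simulate and the step cap are my own choices, not course-provided code):

```python
# Minimal single-tape TM simulator for the delta-dictionary format used
# in the exercises below ('b' is the blank symbol).
def simulate(delta, q_0, q_a, q_r, tape, max_steps=10_000):
    tape = list(tape)
    state, head = q_0, 0
    for _ in range(max_steps):
        if state in (q_a, q_r):
            return state == q_a, ''.join(tape).rstrip('b')
        if head == len(tape):
            tape.append('b')                 # extend the tape with blanks on demand
        key = (state, tape[head])
        if key not in delta:                 # unknown (state, symbol): reject by convention
            return False, ''.join(tape).rstrip('b')
        state, tape[head], move = delta[key]
        head = head + 1 if move == 'R' else max(head - 1, 0)   # can't move past the left end
    raise RuntimeError('step limit reached (the machine may be looping)')

# the oddness tester above: states 0=q0, 1=q1, 3=qa, 2=qr
delta = {(0, '0'): (0, '0', 'R'), (0, '1'): (0, '1', 'R'),
         (0, 'b'): (1, 'b', 'L'), (1, '0'): (2, '0', 'R'), (1, '1'): (3, '1', 'R')}
print(simulate(delta, 0, 3, 2, '1011'))   # (True, '1011')  -- 1011 is odd
```

running it prints the accept/reject verdict together with the final tape contents.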

##
##  TM example  :  equality testing
##

task: given input over {0,1,#}*, accept if it is of the form w#w where w is in {0,1}*. otherwise reject.
input: 01#01

strategy: start from the left and mark the symbol you read with x, then cross over # to check whether the corresponding symbol matches. if not, reject immediately. otherwise mark it x as well, go back across # to the rightmost x on the left side, and repeat. once every 0/1 before # is marked x, cross # one last time and confirm only blanks remain after the x's on the right side of #.

configuration sequence
----------------------
q0 01#01
x q1 1#01
x1 q1 #01
x1# q3 01
x1# q5 x1
x1 q6 #x1
x q6 1#x1
x q0 1#x1
xx q2 #x1
xx# q4 x1
xx#x q4 1
xx#x q5 x
xx# q5 xx
xx q6 #xx
x q6 x#xx
xx q0 #xx
xx# q7 xx
xx#x q7 x
xx#xx q7 _
xx#xx qa

NOTE: w#w is easy because we have the delimiter "#", but we can also construct a TM that decides {ww | w is in Z*}: first mark the very first char (e.g. upper-case it), then go all the way to the last char and mark it too, and repeat. eventually you reach the middle of the string, and since only the case of each char was changed, we can still compare the two halves.

##
##  TM exercise : right shift
##

input : w     # some string
output: $w    # prepend a dollar sign

strategy is

Q_0 : if 0|1, go right, stay Q_0. if blank, go left, go Q_1
Q_1 : if 0, blank it, go right, go Q_2.  if 1, blank it, go right, go Q_3.  if blank, make it \$, go Q_accept
Q_2 : if blank, make it 0, go left, go Q_4
Q_3 : if blank, make it 1, go left, go Q_4
Q_4 : if blank, go left, go Q_1

here is the pseudo python code

-----------------------------------

######################################################################
# Task: Describe a Turing machine that shifts the input string one
# space to the right and places the symbol ($) in the first location
# on the tape.
######################################################################
# Enter values below for
# q_0 : the initial state (an int)
# q_a : the accept state (an int)
# q_r : the reject state (an int)
# delta : the transition function expressed as a dictionary
#         with keys (state, symbol) and values (state, symbol, 'L' or 'R')
# Use the 'b' character for the blank symbol.

# For example, you might express the TM that appends a 1 as follows:

# q_0 = 0
# q_a = 1
# q_r = 2
# delta = {}
# delta[(0,'0')] = (0,'0','R')
# delta[(0,'1')] = (0,'1','R')
# delta[(0,'b')] = (1,'1','R')
######################################################################

test_tape = ['0','1']

q_0 = 0
q_1 = 1
q_2 = 2
q_3 = 3
q_4 = 4
q_a = 5
q_r = 6

delta = {}
delta[(q_0,'0')] = (q_0,'0','R')
delta[(q_0,'1')] = (q_0,'1','R')
delta[(q_0,'b')] = (q_1,'b','L')
delta[(q_1,'0')] = (q_2,'b','R')
delta[(q_1,'1')] = (q_3,'b','R')
delta[(q_2,'b')] = (q_4,'0','L')
delta[(q_3,'b')] = (q_4,'1','L')
delta[(q_4,'b')] = (q_1,'b','L')
delta[(q_1,'b')] = (q_a,'$','R')

--------------------------------

##
##  TM exercise : balanced strings
##

input : { w | w has an equal number of 0s and 1s}

e.g.  01001    # this should be rejected. one too many 0

strategy is
if blank, accept. otherwise mark it blank.
then go right till you hit the corresponding symbol. mark it x, then go left till blank, then go right. the rest repeats.

-------------------// pseudo python code

######################################################################
# Describe a Turing Machine that decides the language
# {w | w has an equal number of 0s and 1s}
######################################################################
# Enter values below for
# q_0 : the initial state (an int)
# q_a : the accept state (an int)
# q_r : the reject state (an int)
# delta : the transition function expressed as a dictionary
#         with keys (state, symbol) and values (state, symbol, 'L' or 'R')
# Use the 'b' character for the blank symbol.

# For example, you might express the TM that appends a 1 as follows:

# q_0 = 0
# q_a = 1
# q_r = 2
# delta = {}
# delta[(0,'0')] = (0,'0','R')
# delta[(0,'1')] = (0,'1','R')
# delta[(0,'b')] = (1,'1','R')
######################################################################

test_tape = ['0','0','1', '0', '1', '1']

q_0 = 0
q_1 = 1
q_2 = 2
q_3 = 3
q_a = 4
q_r = 5

delta = {}
delta[(q_0,'b')] = (q_a,'b','L')
delta[(q_0,'x')] = (q_0,'x','R')
delta[(q_0,'0')] = (q_1,'b','R')
delta[(q_0,'1')] = (q_2,'b','R')

delta[(q_1,'0')] = (q_1,'0','R')
delta[(q_1,'x')] = (q_1,'x','R')
delta[(q_1,'1')] = (q_3,'x','L')
delta[(q_1,'b')] = (q_r,'b','L')

delta[(q_2,'0')] = (q_3,'x','L')
delta[(q_2,'x')] = (q_2,'x','R')
delta[(q_2,'1')] = (q_2,'1','R')
delta[(q_2,'b')] = (q_r,'b','L')

delta[(q_3,'0')] = (q_3,'0','L')
delta[(q_3,'1')] = (q_3,'1','L')
delta[(q_3,'x')] = (q_3,'x','L')
delta[(q_3,'b')] = (q_0,'x','R')

----------------------------

####
####  Language Decider
####

A Turing Machine "decides" a language L  if it accepts every x in L and it rejects every x not in L.

e.g.
equality checker TM decides L = { w#w | w is in {0,1}* }
i.e.
equality checker TM computes the function f(x) = 1 if x is in L, 0 otherwise

####
####   Language Recognizer
####

A Turing Machine "recognizes" a language L  if it accepts every x in L and it does NOT accept any x not in L.

e.g.
TM computes L = {w in {0,1}* | w contains a 1 }
===> lets say
q_0: if 0 or blank then stay q_0, move right. if 1, then q_accept

===> what if the input string never contains 1?  then it loops forever.
===> make sure the reject logic is there for the TM to "decide"; otherwise it just "recognizes"

q_0 if 0, then stay q_0, move right. if 1, then q_accept. if blank, then q_reject
====> now this TM can properly "decide" this language.

====> as above, a decider only accepts|rejects (it never loops). every decider is a recognizer.

Define L(M) = {x | M accepts x} as the language of machine M

(good summary on "decide" vs "recognize")

##########################################
####       Church-Turing Thesis      #####
##########################################

a thesis : everything computable is computable by a TM.

TM <-> Multi-tape <-> RAM model    # random access mem model

==> we will formally argue that "A language|function is computable iff it can be decided by a TM."
#note : computable as in a generic sense.

###
###  Stay-Put Machines
###

transition function F :  Q,T -> Q,T,{L,R,S}

===> we added "S" = stay put for a possible tape head movement choice.
===> this can easily be simulated on a simple TM with just {L,R}: replace each S-move with an R-move into an extra state that, on every symbol, simply moves back one square to the left.
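the construction can be sketched as a transformation on the delta dictionaries used in the exercises above (the function name remove_stay is mine):

```python
# Sketch: compile a {L,R,S} transition table into a plain {L,R} one.
# each S-move becomes an R-move into a fresh "bounce" state that, on
# every tape symbol, moves back left without changing anything.
def remove_stay(delta, tape_alphabet):
    states = {q for (q, _) in delta} | {q2 for (q2, _, _) in delta.values()}
    fresh = max(states) + 1            # first unused state number
    out, bounce = {}, {}               # bounce: target state -> helper state
    for (q, a), (q2, a2, move) in delta.items():
        if move != 'S':
            out[(q, a)] = (q2, a2, move)
            continue
        if q2 not in bounce:           # make a helper that steps back left
            bounce[q2] = fresh
            for sym in tape_alphabet:
                out[(fresh, sym)] = (q2, sym, 'L')
            fresh += 1
        out[(q, a)] = (bounce[q2], a2, 'R')   # the S-move becomes an R-move
    return out

compiled = remove_stay({(0, '1'): (1, '1', 'S')}, ['0', '1', 'b'])
print(compiled[(0, '1')])   # (2, '1', 'R')  -- the old S-move, now an R into the helper
```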

Two TMs are equivalent when both of the below are true;
(1) accept, reject, loop on the same inputs
(2) have the same tape contents when they halt (when relevant)

###
###  Multi-tape TM
###

essentially the same as the single tape

for a TM with k tapes;
transition function F : Q,T^k -> Q,T^k,{L,R,S}^k

e.g.
duplicate inputs

tape 1 contains input
tape 2 blank

===> copy the input from tape1 to tape2, then go back to the beginning of tape2, and append its content to tape1.

###
###  TM exercise : substring search
###

--------------------------- // pseudo python code

######################################################################
# Task: Describe a two-tape Turing machine that recognizes the
# language {x#y | x is a substring of y and x,y in {0,1}^* }
######################################################################
# Enter values below for
# q_0 : the initial state (an int)
# q_a : the accept state (an int)
# q_r : the reject state (an int)
# delta : the transition function expressed as a dictionary
#         with keys (state, symbol, symbol) and
#       values (state, symbol, symbol,{L,S,R},{L,S,R})
# Use the 'b' character for the blank symbol.

# For example, you might express the TM that turns input x into
# x#x as follows:

# q_0 = 0
# q_a = 4
# q_r = 5

# delta = {}
# #Leave a blank at the beginning of the second tape.
# delta[(0,'0','b')] = (1,'0','b','S','R')
# delta[(0,'1','b')] = (1,'1','b','S','R')
# delta[(0,'b','b')] = (4,'#','b','R','R')

# #Copy the 1st onto the 2nd tape
# delta[(1,'0','b')] = (1,'0','0','R','R')
# delta[(1,'1','b')] = (1,'1','1','R','R')
# delta[(1,'b','b')] = (2,'#','b','S','L')

# #Rewind the 2nd tape
# delta[(2,'#','0')] = (2,'#','0','S','L')
# delta[(2,'#','1')] = (2,'#','1','S','L')
# delta[(2,'#','b')] = (3,'#','b','R','R')

# #Copy from 2nd to 1st
# delta[(3,'b','0')] = (3,'0','0','R','R')
# delta[(3,'b','1')] = (3,'1','1','R','R')
# delta[(3,'b','b')] = (4,'b','b','S','S')
######################################################################

#Test Run will test your machine on this input.
test_tape = ['0','1','#','1','0','1','0']

#Specify the Multitape machine here
q_0 = 0
q_1 = 1
q_2 = 2
q_3 = 3
q_4 = 4
q_a = 5
q_r = 6

delta = {}
#Leave a blank at the beginning of the second tape.
delta[(q_0,'0','b')] = (q_1,'0','b','S','R')
delta[(q_0,'1','b')] = (q_1,'1','b','S','R')
delta[(q_0,'b','b')] = (q_r,'#','b','R','R')
delta[(q_0,'#','b')] = (q_a,'#','b','S','S')   # edge case: an empty x is a substring of any y

#Copy the 1st onto the 2nd tape
delta[(q_1,'0','b')] = (q_1,'0','0','R','R')
delta[(q_1,'1','b')] = (q_1,'1','1','R','R')
delta[(q_1,'#','b')] = (q_2,'#','b','S','L')
delta[(q_1,'b','b')] = (q_r,'b','b','S','L')

#Rewind both tapes
delta[(q_2,'#','0')] = (q_2,'#','0','S','L')
delta[(q_2,'#','1')] = (q_2,'#','1','S','L')
delta[(q_2,'0','0')] = (q_2,'0','0','L','L')
delta[(q_2,'1','1')] = (q_2,'1','1','L','L')
delta[(q_2,'0','1')] = (q_2,'0','1','L','L')
delta[(q_2,'1','0')] = (q_2,'1','0','L','L')
delta[(q_2,'0','b')] = (q_2,'0','b','L','S')
delta[(q_2,'1','b')] = (q_2,'1','b','L','S')
delta[(q_2,'#','b')] = (q_3,'#','b','R','R')

#Compare the 2nd to the 1st
delta[(q_3,'0','0')] = (q_4,'#','0','R','R')
delta[(q_3,'1','1')] = (q_4,'#','1','R','R')
delta[(q_3,'1','0')] = (q_3,'#','0','R','S')
delta[(q_3,'0','1')] = (q_3,'#','1','R','S')
delta[(q_3,'b','0')] = (q_r,'b','0','S','S')
delta[(q_3,'b','1')] = (q_r,'b','1','S','S')

delta[(q_4,'0','0')] = (q_4,'0','0','R','R')
delta[(q_4,'1','1')] = (q_4,'1','1','R','R')
delta[(q_4,'1','0')] = (q_2,'1','0','S','L')
delta[(q_4,'0','1')] = (q_2,'0','1','S','L')
delta[(q_4,'b','0')] = (q_r,'b','0','S','S')
delta[(q_4,'b','1')] = (q_r,'b','1','S','S')
delta[(q_4,'b','b')] = (q_a,'b','b','S','S')
delta[(q_4,'1','b')] = (q_a,'1','b','S','S')
delta[(q_4,'0','b')] = (q_a,'0','b','S','S')

---------------------------------------------

####
####   Multi-tape and Single-tape Equivalence
####

Multi-tape and single-tape TMs are equivalent, given the tricks below;
- delimit each tape's content with some delimiter.
- prepare "dotted" alphabets (yes, basically a dotted/marked version of every tape symbol — doubling the alphabet) to mark the head position of each simulated tape.
- to append a new char on any tape, just right-shift the rest of the subsequent tape contents.

##
##  complexity comparison: multi-tape VS single-tape
##

Let M be a multitape TM and let S be its single-tape equivalent. assume M has k tapes.
If M halts after t >= |x| steps on an input x, then S halts after O(t^2) steps.
because
1. certain operations that can be done in O(1) on the multitape machine take time linear in the tape contents on a single tape.
thus each of the t steps can take O(t), so in total O(t^2) for S.
e.g.
recall how we simulated the multi-tape in a single tape.
lets say we have 3-tapes
A = 01
B = 10
C = 11
then we express the contents in a single tape as follows;
X = 01#10#11
when appending a symbol to tape A, it is O(1) for the multi-tape machine but O(n) for the single tape, as the TM has to shift the rest of the tape contents by one (n = number of symbols currently on the tape). if M (the multi-tape TM) halts after t steps, then S can take O(t) per simulated step, thus halts after O(t^2).

2. another way to look at this: the number of symbols on the tape at step i of M is at most |x| + k*i.
there are t steps in total for M; each step takes constant time for M but can take time linear in t for S, thus O(t^2).

(TA's explanation)
1. The condition t >= |X| is implicitly used in the analysis, I will come back to that later.
2./3. When Charles referred to the total number of symbols in the lecture video, he was talking about the total number of symbols "printed" on the tape. When simulating the multitape on a single tape you have to go over the whole tape (over all printed symbols), reading the "Head Positions" from the multitape equivalent. And you have to go over the tape again for writing the new symbols and moving these "Head Positions". These two steps are therefore clearly proportional to the number of symbols on the tape. There might even be more than two passes, if you have to do additional right shifting. But as Charles pointed out in the lecture, this involves at the maximum (k-1) passes for k tapes (the very last tape / last section on the single tape can be extended without shifting, because there are blanks at the right end).
For the upper bound, Big-Oh notation, we neglect constants and can say that one step on the multitape machine will take O(n) steps on the single tape machine, where n is the number of symbols used at this particular step.
4. The total number of symbols printed on the tape depends on the step, because in each step, we can add another symbol to the tape per simulated head. Assume you start with an input of |X|, then the total number of symbols on the tape can grow by some constant per step. This constant is k, because by convention, for k tapes in the multitape machine, we have k heads that can write to the tape. Thus we add k symbols at the maximum per step and the total number of symbols in step i is bounded by |X| + ki.
5. For the first term |X| (the size of the input), we can put a O(t^2) bound, because we pass these |X| symbols at least twice per step (read/write), as explained in 2., and for t steps we can conclude that O(t |X|) is O(t^2), because t >= |X|.
The second term we have to consider is ki, the number of additional symbols printed during the computation as explained in 4. Again, we are using the upper bound big-Oh and can therefore neglect any constants. Thus the bound is given by the arithmetic sequence as described by Charles in the lecture video. This sequence is ( (t+1) t ) / 2, which is O(t^2) as well.
Thus we can finally conclude that for both terms O(t^2) + O(t^2) we have an upper bound of O(t^2).

####
####   RAM (random access model)
####

RAM more closely resembles the basic CPU register memory paradigm.

instead of a finite alphabet, RAM operates with non-negative integers which can be arbitrarily large.
- register : storing operands
- memory   : infinite random access storage, kind of like tape in TM. but (1) RAM access, (2) each pos stores a number
- program  : a sequence of instructions
- PC       : a special register called program counter which keeps track of which instruction to execute next.

every instruction comes from a finite set that resembles assembly code instructions.
e.g.
instruction   operand
---------------------
write         j       # writes R0 into mem[j]
store         j
add           j       # increase R0 by mem[j]   # multiplication can be built from repeated addition
jump          5(R0)   # jumps to a certain place in the program, this updates PC
halt                  # end

====> each operation increments PC (except jump)
====> the final value in R0 determines accept|reject

====> more mathematically, we can define the property
k : number of registers (a natural number)
Pi: a sequence of instructions. Pi = ( Pi0 Pi1 Pi2 .... )

A configuration of a RAM machine;
1. PC in {0....p}               # 0 = halt
2. register values R0 .... Rk-1
3. memory contents T : N+ -> N0    # expressed as a function. address -> value

NOTE : only a finite number of the addresses will contain a non-zero, so this function T always has a finite representation.
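the pieces above (registers, unbounded memory, program, PC) fit in a short sketch. the instruction names and operand conventions below are a toy subset of my own choosing, not the course's exact RAM definition:

```python
# Minimal RAM sketch: one register R0, an unbounded memory (a dict from
# addresses to numbers, uninitialized = 0), a program list, and a PC.
def run(program, memory=None, max_steps=10_000):
    R0, mem, pc = 0, dict(memory or {}), 0
    for _ in range(max_steps):
        if pc >= len(program):
            return R0                          # fell off the end: halt
        op, arg = program[pc]
        if op == 'load':    R0 = mem.get(arg, 0)      # R0 <- mem[arg]
        elif op == 'store': mem[arg] = R0             # mem[arg] <- R0
        elif op == 'add':   R0 += mem.get(arg, 0)     # R0 <- R0 + mem[arg]
        elif op == 'jump':  pc = arg; continue        # update the PC directly
        elif op == 'halt':  return R0                 # final R0 decides accept|reject
        pc += 1                                # every other instruction advances the PC
    raise RuntimeError('step limit reached')

# double mem[0] into R0: load it, then add it to itself
prog = [('load', 0), ('add', 0), ('halt', None)]
print(run(prog, {0: 21}))   # 42
```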

####
####   Equivalence of RAM and TM
####

formally, we need to define a function to translate the symbol representation in TM to the numbers in RAM.
let E : {_} ∪ Z -> {0,....,|Z|} be a one-to-one correspondence where E(_) = 0   i.e. blank = uninitialized = zero

RAM          -    TM
--------------------------
memory       -    tape
R1           -    head position          # R1 or some other fixed register
program/PC   -    transition function    # PC points to the current state q_i

lets look at this the other way around.

TM    -    RAM
----------------
each register gets its own tape
1 tape for scratch work
1 tape for memory
1 tape for IO
transition funcs for programs/PC

====> they can simulate each other. thus modern computers and TMs are equivalent:
anything a TM can[not] solve, a CPU can[not] solve.

##
##  RAM run time complexity
##

Let R be a RAM machine
Let M be its multi-tape equivalent.
Let n be the length of the binary encoding of the input to R
Let t be the number of the steps taken by R

then M takes O((t+n)^3)

===> the longest representation of a number after i steps is r_i = O(n+i)
===> the total number of symbols used, l, increases by O(r_i) as step i is simulated. thus after step j,

            j
l_j <= n +  Σ r_i = O((n+j)^2)
           i=1

===> each step j is simulated with O(l_j) steps, so the total number of steps is

 t
 Σ l_j = O((n+t)^3)
j=1

##
## Turing's insights;
##
(1) a computer program can be viewed as data as part of the input to another program.
(2) a universal TM : a TM that can simulate any other machine; the input is a string representation of that TM
===> as in, UTM(<M>) behaves as M

####################################
#####       Universality       #####
####################################

program code == another data (file)
fonts == data == program that generates readable characters of arbitrary sizes
TM can simulate program code

but how do we encode an arbitrarily large number of tape symbols and states?
===> well, let's say binary. see below for more details.

####
####   Encoding a Turing Machine  (= encoding the transition func)
####

Let M be a TM with states Q = {q0,,,,qn} and tape alphabet T = {a1,,,,am}
Define i & j such that 2^i >= |Q|  and 2^j >= |T|
Encode qk as the string qw, where w is the i-bit binary representation of k.
(by convention, q0 = initial state, q1 = accept, q2 = reject)
Encode ak as the string aw, where w is the j-bit binary representation of k.

e.g.  if |Q| = 6 (i.e. the # of states is 6), then encode q3 as q011 because i must be 3 (or bigger)
e.g.  if |T| = 10(i.e. there are 10 symbols), then encode a5 as a0101

<M> is a string representation of M.
<M> = (input state, input symbol, output state, output symbol, direction) (trans func 2) (trans func 3) ....

e.g.
<M> = (q000, a01, q011, a11, R) (q011, a01, q100, a10, R) (trans func 3) ....

we also need a string representation <I> of the input I, because a TM takes its input as a string.
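the encoding scheme can be sketched directly (the helper name encode is mine; the widths i and j are chosen so 2^i >= |Q| and 2^j >= |T|):

```python
# Sketch of the encoding above: states become q + binary(k), tape
# symbols become a + binary(k), and each transition is a parenthesized
# 5-tuple.
def encode(delta, num_states, num_symbols):
    i = max(1, (num_states - 1).bit_length())    # 2^i >= |Q|
    j = max(1, (num_symbols - 1).bit_length())   # 2^j >= |T|
    q = lambda k: 'q' + format(k, f'0{i}b')
    a = lambda k: 'a' + format(k, f'0{j}b')
    return ''.join(f'({q(s)},{a(x)},{q(s2)},{a(x2)},{d})'
                   for (s, x), (s2, x2, d) in sorted(delta.items()))

# in state q0 reading a1: go to q3, write a3, move R  (|Q|=6, |T|=4)
print(encode({(0, 1): (3, 3, 'R')}, 6, 4))   # (q000,a01,q011,a11,R)
```

with |Q| = 6 and |T| = 4 this reproduces the first tuple of the example above.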

####
####  Universal TM
####

task : given input <w>#<M>, accept|reject|loop as M would on w.
==> TM simulating another TM! i.e. we can pass a description of a TM to another TM as input for it to simulate.
==> when M halts, we want the simulating TM to leave the same tape contents/output as M.

the simulation itself is simple: it is done with a 3-tape TM.

tape1 : <w>     # input string
tape2 : <M>     # another input string that encodes a TM, this basically consists of all the transition functions
tape3 : current state
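a minimal sketch of that simulation, assuming the machine is handed over as a Python dict mapping (state, symbol) -> (state, symbol, direction) rather than the literal string encoding. like a real UTM, it loops forever if the simulated machine does.

```python
def simulate(delta, w, start='q0', accept='q1', reject='q2', blank='_'):
    """Run transition table delta on input w; True on accept, False on reject."""
    tape = dict(enumerate(w))   # tape 1: the (sparse) tape content
    state, head = start, 0      # tape 3: current state; plus head position
    while state not in (accept, reject):
        symbol = tape.get(head, blank)
        state, tape[head], move = delta[(state, symbol)]  # tape 2: the transition table
        head += 1 if move == 'R' else -1
    return state == accept

# example: a machine that accepts iff the input contains a '1'
contains_one = {('q0', '0'): ('q0', '0', 'R'),
                ('q0', '1'): ('q1', '1', 'R'),
                ('q0', '_'): ('q2', '_', 'R')}
```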

####
####  abstraction
####

now that we've established the universality of TMs, let's look back on the claim that "not all functions are computable" or "not all languages can be decided by a TM"

the languages we ask about from here on consist of strings that encode TMs.

####
####   recognizability VS decidability
####

a language is recognizable (by anything, any computer) if there is a TM that recognizes it.
a language is decidable (by anything, any computer) if there is a TM that decides it.

==> based on the notion that TM can do what any computer can do.

recognizable = recursively enumerable = semi/partially decidable = acceptable
decidable = recursive = computable

NOTE: a decidable language is recognizable.

(recap)
a language A which a Turing Machine M recognizes is called the language of the TM, denoted A = L(M)
when we say a language is decidable, we mean there's a TM that accepts the string x iff x is in the language, and rejects iff x is not in the language.

##
##  decidability quiz
##

suppose M1 recognizes L, and M2 recognizes !L (the complement of L). can you decide the language using M1 & M2?

def D(x):
    if M1 accepts x, accept
    elif M2 accepts x, reject

====> seemingly, this is fine, but there is one pitfall: M1(x) might not halt. M1(x) might go into a loop, and then control never reaches the second branch of the if-else. an M2 loop is not a concern, because M1 accepts anything M2 rejects or loops on; similarly, if M1 rejects, then M2 accepts. so if the program returns any answer, the answer is correct.

===> how do we overcome the M1 loop pitfall?  ideally, run the two machines in parallel, but a TM cannot literally do that.
===> so switch between the two at every step.

####
####  alternating trick
####

suppose M1 recognizes L, and M2 recognizes !L (the complement of L). can you decide the language using M1 & M2?

def D(x):
    i = 1
    while True:
        if M1 accepts x within i steps, accept
        if M2 accepts x within i steps, reject
        i = i + 1

====> then this MUST halt after a finite number of steps.

Theorem : a language L is decidable if and only if L and !L are recognizable.
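the alternating trick can be sketched in Python if we model recognizers as generators that yield once per "step" and return True on accept (a looping machine just yields forever). decide, accepts_within, m1 and m2 are illustrative names, not course-provided code.

```python
def accepts_within(m, x, i):
    """Restart recognizer m on x and run at most i steps; True iff it accepted."""
    g = m(x)
    try:
        for _ in range(i):
            next(g)
    except StopIteration as stop:
        return stop.value is True   # generator's return value rides on StopIteration
    return False

def decide(m1, m2, x):
    """m1 recognizes L, m2 recognizes !L; alternate until one of them accepts."""
    i = 1
    while True:
        if accepts_within(m1, x, i):
            return True    # x in L
        if accepts_within(m2, x, i):
            return False   # x in !L
        i += 1

def m1(x):  # recognizes strings containing '1'; loops on the rest
    for c in x:
        yield
        if c == '1':
            return True
    while True:
        yield

def m2(x):  # recognizes strings with no '1'; loops on the rest
    for c in x:
        yield
        if c == '1':
            while True:
                yield
    return True
```

restarting the machines from scratch at each i is wasteful but matches the pseudocode, and it still halts after finitely many steps on every input.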

###
###  recognizability exercise
###

(1)  Let L = { <M> | M has at most 100 states}

Is L  recognizable ?   yes
Is !L recognizable ?   yes

===> simply go thru <M> and count the number of states, then accept or reject; no looping.
===> thus L is decidable.

(2)  Let L = { <M> | M halts on 100010}    # 34 in decimal

Is L  recognizable ?   yes     # just feed 100010 as input, see if it halts, then accept.
Is !L recognizable ?   no      # we cannot run the machine until it does not halt

===> does not halt == loop/cycle. one may argue we should detect a cycle, but it is not possible as the tape can be infinite.

(3)  Let L = { <M> | M accepts nothing }    i.e. L = {<M>| L(M) = {} }    !L = {<M>| L(M) != {} }

Is L  recognizable ?   no   # we can spot some obvious members (e.g. a TM that immediately enters q_reject), but not in general.
Is !L recognizable ?   yes  # M accepts something

===> we can run M on all strings. string empty, len 1, len 2, so on, in parallel. (aka "Dovetailing")

btw, "M accepts everything" means L(M) = Z*

####
####   Dovetailing
####

= A way to run a countable set of computations at once.
much like the diagonalization trick.

don't start by fully processing the empty string, as that computation might never halt; then you could never test the rest of the input strings. so we go diagonally: M(null), M(0), ... = a countable set of computations
thus eventually every cell of the table is reached. thus an M that accepts something will be recognized, because it must accept something after a finite number of steps, while the complement { <M> | M accepts nothing } cannot be recognized this way. (as such, dovetailing has an affinity with non-determinism, though this step-by-step diagonalization trick lets a deterministic TM handle it too.)

(step)
|  1   2   3   4   5  ....
---------------------------------
M(null)|  1   3   6
M(0)   |  2   5
M(1)   |  4
M(10)  |
M(11)  |           .(accept)
.
.
.
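the diagonal schedule itself can be sketched as follows. dovetail_order is a made-up helper; input index 1 corresponds to M(null), 2 to M(0), 3 to M(1), and so on, reproducing the numbering 1..6 in the table above.

```python
def dovetail_order(n):
    """Return the first n (input_index, steps) pairs in dovetailing order."""
    order = []
    d = 2                                # d = input_index + steps (anti-diagonal)
    while len(order) < n:
        for inp in range(d - 1, 0, -1):  # newest input gets its first step
            order.append((inp, d - inp))
        d += 1
    return order[:n]
```

every pair (input, steps) appears after finitely many global steps, which is exactly why an M that accepts *something* is eventually caught.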

(4)  Let L = {<M> | M halts on every input}

is L  recognizable ?  no   # cannot check every input even with Dovetailing. there will always be some input yet to be checked.
is !L recognizable ?  no   # M that does not halt on some input should be recognizable? well how do we know if it does not halt? it comes back to the discussion of how do we detect a cycle/loop?

===> in general, when proving (un)recognizability, try to find a TM that recognizes it. or try to reduce from un-recognizable language. e.g. {<M>| M loops on ''} {<M>| M does not accept <M>} {<M>| M accepts nothing}

###
###  finding a complement of a language(or a set)
###

visualize with a Venn diagram, to capture the idea that the whole universe should be split MECE between L and !L.
also, thinking mathematically helps.

L  = {<M> | M halts on every input}
L  = {<M> | for all x, M(x) halts}
!L = {<M> | for some x, M(x) does not halt}

e.g.
A = {<M> | M  accepts all even numbers}   # not recognizable, there is always some even num you have not checked
!A = {<M> | M does not accept some even numbers} # again, does not accept = reject|loop, and if loop, we cannot detect.

logical negation
(ref) https://www.math.washington.edu/~aloveles/Math300Summer2011/m300Quantifiers.pdf

#####################################
#####       Undecidability      #####
#####################################

the halting problem: it is IMPOSSIBLE to determine whether an arbitrary piece of computer code halts or loops.

the best you can do is to simulate cases where it halts, but not much more.

we will prove this observation.

####
####  diagonalization
####

diagonalization trick applied to English adjectives.

"long"  "polysyllabic"  "French" (string representation of the adjectives)
------------------------------------------------
long         |    0           1            0       # "long" is not a long word
polysyllabic |    0           1            0       # "polysyllabic" is a polysyllabic word
French       |    0           0            0       # "French" is not a French word

===>  now add "heterological" which means a property that its representation does not possess.
===>  i.e.  "long" is not a long word, thus it is heterological

"long"  "polysyllabic"  "French"   "heterological"
------------------------------------------------------------------
long         |    0           1            0             1
polysyllabic |    0           1            0             1
French       |    0           0            0             0
heterological|    1           0            1            ???

====> paradox. if heterological is a heterological word, then it has to be both 0 & 1 in ???...

consider "the set of all sets that do not contain themselves." is this set the member of itself ???
===> need to define sets properly

consider "this statement is false"   # again, false is not well defined mathematically in this case

####
####    An Undecidable Language
####

Let {M1, M2, ... } be the set of TMs.  since TM can be expressed as string, there can only be countably many TMs.
i.e. we can enumerate in correspondence with integers
Define f(i,j) = 1 if Mj accepts <Mi> else 0

<M1>  <M2>   <M3>
---------------------------
M1 |   0     1     1          # values are not important
M2 |   1     1     0          # lets say these are the values
M3 |   0     1     0

Now, consider the language L = {<M>| M does NOT accept <M>}
i.e. <M> is not in L(M)
L consists of string representations of machines that do NOT accept themselves.
now, lets say ML recognizes L. (soon to be proven contradictory below)

<M1>  <M2>   <M3>  <ML>
-----------------------------
M1 |   0     1     1
M2 |   1     1     0
M3 |   0     1     0
ML |   1     0     1    ???      # again, paradox

===> if ML does NOT accept <ML>, then <ML> is not in L(ML), which puts <ML> into L.
===> then by the definition of L, ML should accept <ML>. Contradiction!

By diagonalization,
L = {<M>| M does NOT accept <M> }  i.e. <M> is not in L(M)
is not recognizable.

thus, by definition, L is undecidable.

thus, !L = Dtm = {<M>| M accepts <M>}  i.e. <M> is in L(M)
is undecidable.

####
####   The Dumaflache (in place of TM)
####

1. halts on all inputs.  # possible
2. can interpret dumaflache descriptions and simulate them. (i.e. a universal dumaflache exists)
3. invert the result of an interpreted dumaflache.

Can this inverter-dumaflache exist?   ---  No

====> think about defining the inverter dumaflache.

def invD(<D>):
    if D(<D>) accepts, reject
    if D(<D>) rejects, accept

===> then what happens when you invD(<invD>) ?

def invD(<invD>):
    if invD(<invD>) accepts, reject
    if invD(<invD>) rejects, accept

===> paradox, thus the inverter dumaflache cannot exist.

###
###   Atm
###

Atm = { <M,w> | M accepts w } is recognizable, but undecidable.
Htm = { <M> | M halts on '' }
Etm = { <M> | M accepts nothing }
Dtm = { <M> | M accepts <M> }

####
####  Mapping Reductions   # a technique to turn an instance of a prob to an instance of another
####

(Definition):  language A is mapping reducible to language B (written A <=m B) if there is a computable function f:Z*->Z* where for every w, w is in A <=> f(w) is in B.   # <=> denotes iff

video visualizes the above statement very very well

NOTE: notice the reducer maps A->B and !A->!B    i.e. A <=m B <=> !A <=m !B
the mapping does not have to be one-to-one or onto; it just maps "whether w is in A" to "whether f(w) is in B"
thus, you can map every element of A to the same element of B if that is easier.

NOTE: reduction itself must be always computable

NOTE: (def) a function f:Z*->Z* is a computable function if some TM M, on every input w, halts with just f(w) on its tape.

e.g.
every multiplication(17*3) problem can be reduced(converted) to an addition(17+17+17) problem. but not vice versa.
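that remark as a toy mapping reduction (the function names are made up): an instance of multiplication is rewritten so that an addition-only "solver" answers it.

```python
def R(a, b):
    """Map the multiplication instance a*b to a list of addends."""
    return [a] * b

def addition_solver(terms):
    total = 0
    for t in terms:
        total = total + t   # only addition is ever used
    return total
```

so a machine that can only add still computes 17*3 via addition_solver(R(17, 3)).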

# exercise

------------------------------------------------------
#Consider the language E over the binary alphabet
#consisting of strings representing even non-negative integers.
#I.e. E = {x | x[-1] == '0'}.

#Reduce E to the language {'Even'} by implementing
#the function R
def R(x):
    if x[-1] == '0':
        return "Even"
    else:
        return "Odd"

def main():
    #A few test cases
    assert(R('10010') in ['Even'])
    assert(R('110011') not in ['Even'])

if __name__ == '__main__':
    main()
----------------------------------------------------
----------------------------------------------------
#Consider the language A = {'John'}.  Reduce
#A to the complement of A by implementing
#the function R.

def R(x):
    if x == 'John':
        return "not john"
    else:
        return 'John'

def main():
    #A few test cases
    assert(R('John') not in ['John'])
    assert(R('110011') in ['John'])

if __name__ == '__main__':
    main()
-------------------------------------------------
-------------------------------------------------

#Reduce the language {'Jane'} to the halting problem {<M> | M halts on the empty string}.

#Since we can't very well test whether your output function
#actually loops, we ask you to raise the exception Looping
#defined below with

# raise Looping()

#We use the convention that a machine accepting corresponds
#to a procedure returning true and a machine rejecting
#corresponds to a procedure returning false.

#For those less familiar with python, one can define
#nested functions and then return these functions as objects.
#For example,
def foo(w):
    def returns_ww():
        return w+w
    return returns_ww

bar = foo('foobar')
assert(bar() == 'foobarfoobar')

#Here is the Looping exception
class Looping(Exception):
    pass

def R(x):
    def func(y):
        if y == '':
            return True
        else:
            raise Looping()
    def func2(y):
        if y == '':
            raise Looping()
        else:
            return True
    if x == 'Jane':
        return func
    else:
        return func2

def main():
    #A few test cases
    N = R('Jane')

    assert(N('') == True)

    try:
        N = R('not Jane')
        N('')
        #Code should not be reached
        assert(False)
    except Looping:
        pass

if __name__ == '__main__':
    main()

#  instructor comment

It is often helpful to think about reduction mapping one "question" to another (or one "problem" to another).

In this case, the input question is: "Is this string equal to 'Jane'?", i.e L = {'Jane'}.

The output question is: "Does this string represent a Turing machine that halts on the empty input?" i.e. {<M> | M halts on the empty string}.

For every string x, the answer to "Is x equal to Jane?" ("Is x in {'Jane'}?") should be the same answer to "Does R(x) represent a Turing machine that halts on the empty input?" (Is R(x) in {<M> | M halts on the empty string}).

When actually writing python code, we allow our "reductions" to return function objects (like returns_ww in the example) instead of strings.
----------------------------------------------------

##
##  the empty language = the empty set
##
(from instructor notes)

the empty language is the empty set. In our python-ish convention, returning True corresponds to accept and returning False corresponds to reject.

Here is a machine that recognizes the empty language.
-----------------
def M(x):
    return False
-----------------
That is, the machine rejects every input.  What does it accept?  Nothing.

Here is another machine that recognizes the empty language
------------------
def M(x):
    while True:
        continue
------------------
This machine loops on every input.  What does it accept?  Nothing.

Here is a machine that recognizes the empty string
------------------
def M(x):
    return x == ''
------------------
What does it accept?  Exactly {''}.

Here is another
------------------------
def M(x):
    if x == '':
        return True
    else:
        while True:
            continue
------------------------
What does it accept?  Exactly {''}.

####
####   Reductions and (Un)decidability
####

suppose language A is reducible to B.   A <=m B

one wants to know if some given string x is in A. (we can know if A is recognizable)

suppose we have a decider D(y) for B.

def D(y):
    somehow decides B    # accepts if y in B, rejects if y not in B

the reduction function is computable:

def R(x):
    implements A <=m B

sequence:    x -> R(x) -> y -> D(y)

Thus, x is in A <=> R(x) is in B <=> D(R(x)) accepts

i.e.  x in A <=> y in B
i.e.  x not in A <=> y not in B

===> in the above case, notice the composition of the reduction and B's decider became the decider for A.

Thus, we can conclude the following;

If A is reducible to B, then
1. B is decidable      => A is decidable
2. B is recognizable   => A is recognizable
3. A is undecidable    => B is undecidable
4. A is unrecognizable => B is unrecognizable

===> 1. and 2. are shown above.
===> 3. and 4. are just the contrapositives of 1. and 2.: if B had a decider (recognizer), composing it with R would give one for A.

====> note we CANNOT say: A is decidable => B is decidable, or B is undecidable => A is undecidable

====> trick to remember is "A is reducible to B" translates to "B is as hard as A" i.e. "A is easier than B"
====> to quote Sipser "if A reduces to B, then you can use solution to B to solve A. Because any instance of problem A can be reduced to an instance of B."

####
####   reduction technique in action
####

WTS:  B = {<M>| M accepts something}  i.e. L(M) != empty
is undecidable

strategy: reduce Dtm = {<M>| M accepts <M>} to B.   i.e. Dtm <=m B    # Dtm = the diagonal language = undecidable

def R(<M>):
    def N(x):
        return M(<M>)
    return <N>

===> R() implements another TM, N, and returns its description.
===> notice N() accepts every x (= everything) if M(<M>) accepts; otherwise it accepts nothing.
note: everything means L(N) = Z*
      nothing means L(N) = empty
===> decider for B should be able to tell the diff, whether the machine N accepts nothing or something.
===> but if B is decidable, then Dtm must be decidable. it cannot be the case.
===> Dtm is undecidable, B must be too.

!D = {<M>| M does not accept <M>}
!B = {<M>| M accepts nothing}

If M(<M>) accepts (which means it is in D), then N accepts everything (which is a subset of N accepts something) because it'll return True regardless of what x is, so N is in B.

If M(<M>) rejects (which means it is in !D), then N rejects everything since it returns False (which means it accepts nothing), which is !B.

If M(<M>) loops (which is also in !D), so will N, which means N is again in !B.

####
####   The Halting Problem
####

given any input, we cannot know if the program/TM halts. i.e. is it looping or taking a long time?
in fact, even without input, we still cannot know.

WTS: language Htm = {<M>| M halts on empty string ''} is undecidable.

strategy: reduce Dtm = {<M>| M accepts <M>} to Htm.  Dtm <=m Htm

a possible (among many others) reduction function;

def R(<M>):
    def N(x):
        if M(<M>) rejects, then loop
        else, accept
    return <N>

====> This is intuitive enough. If M(<M>) rejects, N loops; if M(<M>) loops, N loops. Thus N does not halt on anything; in this case, L(N) is the empty set {}.
====> if M(<M>) accepts, then N accepts everything, which guarantees N halts on the empty string. Again, the reduction does not have to be a one-to-one mapping.

====> we know Dtm is undecidable, so Htm is undecidable.
====> if Htm is decidable, then Dtm must be decidable, but we know via diagonalization trick Dtm is undecidable.

####
####   Filtering
####

WTS: Stm = {<M>| M accepts exactly one string ( |L(M)| = 1) }  is undecidable.

strategy: reduce Htm = {<M>| M halts on empty string} to Stm.  i.e. Htm <=m Stm
(it doesn't matter which undecidable language we reduce from; we just want to use the same reduction technique as before to prove Stm is undecidable.)

a possible (among many) reduction function;

def R(<M>):
    def N(x):
        M('')               # if M('') loops, then L(N) = {}
        return x == ''      # otherwise, L(N) = {''}
    return <N>

NOTE: empty set means really nothing. no string in the language. L(N)={}
whereas L(M)= {''}  still counts as one (empty) string

A decider for Stm would have to tell the difference between the above two cases, but then Htm would be decidable. We already proved Htm is undecidable, so by contradiction, Stm is undecidable.

return x == ''  looks too far-fetched?  well, it is a valid reduction: we mapped every TM that halts on '' to this one TM that accepts exactly one (the empty) string, within the pool of all TMs that accept exactly one string. recall, the mapping does not need to be one-to-one or onto.
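the same reduction can be made runnable under the Python convention used in this course's exercises, where accepting is returning True and "looping" is modeled by raising a Looping exception:

```python
class Looping(Exception):      # stands in for "runs forever"
    pass

def R(M):
    def N(x):
        M('')              # if M loops on '', N "loops" on every input: L(N) = {}
        return x == ''     # otherwise L(N) = {''}, exactly one string
    return N
```

feeding R a machine that halts on '' yields an N accepting only the empty string; feeding it a machine that loops yields an N that accepts nothing.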

##
## reduction exercise
##
-----------------------------------------------
######################################################################
#Provide a reduction from LOOP = {<M> | M loops on the empty string} to
# {<M> | M does NOT accept the string '100010'}.

#You must define a function R that takes a function M as a parameter
#and returns another function N.  The function R must always
#halt.

#In general, it is only necessary that <N> be in L if M loops and not in
#L if M otherwise.  For this exercise, N must also always halt if M does.

#For example, consider what the reduction from HALT to
#{<M> | M accepts only ''} might look like

#def R(M):
# def N(x):
#   M('')
#   return x ==''
# return N

######################################################################
def R(M):
    def N(x):
        M('')                  # if M loops on '', then N does not accept '100010'
        return x == '100010'   # otherwise N accepts '100010' (or return True to accept everything, including '100010')
    return N
---------------------------------------------------------

(explanation from classmate)
the key is

- "Accepts unconditionally" definitely "Accepts 100010".
- "Loop unconditionally" definitely "Does not accept 100010".

So if the input is a machine that halts, we end up being a machine that accepts "100010", and if the input is a machine that doesn't halt then we end up being a machine that does not accept "100010".

Of course, you're absolutely free to change "Accept unconditionally" to "Accept if the input is '100010'" if you really want to, but it's not necessary.

===> again, always be conscious of the reduction concept. it is called reduction, but essentially it is a mapping, and A <=m B means B is as hard as A (A is no harder than B). thus: A being decidable says nothing about B's decidability, but B being decidable means A is decidable; A being undecidable means B, which is at least as hard, is undecidable; B being undecidable says nothing about A's decidability.

####
####  more reduction exercise
####
(video)

---------------------------------------------
######################################################################
# Provide a reduction from HALT = {<M> | M halts on the empty string}
# to ZO = {<M> | M accepts strings of the form 0^n1^n for n >= 0 and
# no others}.

# You must define a function R that takes a function M as a parameter
# and returns another function N.  The function R must always
# halt.

# In general, it is only necessary that <N> be in ZO if M halts and not in
# ZO if M does not halt (or vice-versa).

#For this exercise, it must also be the case that <N> is in ZO if M halts
#and not in ZO otherwise.  Also, N must always halt if M does.

#For example, consider what the reduction from HALT to
#{<M> | M accepts only ''} might look like

#def R(M):
#    def N(x):
#        M('')
#        return x ==''
#    return N

# assert( OnlyEmptyString( R(accepts) ))
# assert( OnlyEmptyString( R(rejects) ))
# assert( not OnlyEmptyString ( R(loops) ))
######################################################################
def R(M):
    def N(x):        # assume x is a binary string, e.g. 000111: split in half, check all 0s then all 1s
        M('')        # if M('') loops, L(N) = {}
        return x == '0'*(len(x)//2) + '1'*(len(x)//2)  # cannot simply return True; L(N) = {0^n 1^n}  (// = integer division)
    return N
----------------------------------------------------

===>  a decider for ZO could tell the difference between L(N) = {} and L(N) = {0^n1^n | n >= 0}, but then there would be a TM deciding the halting problem. not possible. thus ZO is undecidable.

####
####   more reduction quiz
####

let Htm = {<M>| M halts on empty string}    # Htm is recognizable, !Htm is not
let Etm = {<M>| M accepts nothing}          # Etm is not recognizable, !Etm is

(reducible?)
(1) !Htm <=m Etm   true
(2)  Htm <=m Etm   false
(3)  Etm <=m Htm   false

For (1),
!Htm = {<M>| M loops on empty string}

def R(<M>):
    def N(x):
        M('')        # if M loops on '', N accepts nothing. good
        return True  # if M halts on '', N accepts everything (strictly, !Etm = {<M>| M accepts something}, but accepting everything is accepting something)
    return <N>

==> thus since !Htm is undecidable, Etm is undecidable.
also, !Htm is unrecognizable, Etm is unrecognizable.

For (2), we cannot create R()

def R(<M>):
    def N(x):
        M('')        # when M loops on '', N needs to accept, but since M loops, we cannot let N accept. not reducible.
        return False
    return <N>

For (3), similarly,,

def R(<M>):
    def N(x):
        if M(x) accepts, loop if x == ''
        if M(x) rejects, return x == ''
        if M(x) loops, return x == ''   # but if M(x) loops, then N cannot halt on '', thus this reduction doesn't work
    return <N>

Also, more formally, for (2)(3), we can prove by contradiction.
(to show contradiction) suppose Htm is reducible to Etm,  i.e.  <M> is in Htm <=> R(<M>) is in Etm.
then let A be the recognizer for !Etm (we talked about !Etm's recognizability in Dovetailing section).
!Etm = {<M>| M accepts something}

run M('') and A(R(<M>)) in parallel to decide Htm   # to show contradiction
==> if M('') halts, Htm is recognized.
==> if M('') loops, <M> is in !Htm, and R(<M>) is in !Etm, which A() will recognize.
==> thus we constructed a machine recognizing both Htm and !Htm, i.e. a decider for Htm.
==> but Htm is undecidable as proven previously. contradiction. this reduction is not possible.

MORAL : you cannot reduce a recognizable language to a co-recognizable one or vice-versa when proving undecidability.

NOTE : a co-recognizable language is one whose complement is recognizable.

(video)

####
####    Rice's Theorem
####

===> describes a whole family of uncomputable functions. basically, we cannot decide anything about a machine based solely on the language it recognizes.

so far, the pattern was WTS : L = {<M>| L(M) is in P} is undecidable.   # P for Property

(1) membership in L depends only on L(M) = the set of strings that M accepts
(2) assume that there exists some <M1> in L, and some other <M2> not in L

recall in all our reduction practice, we created a machine N that either accepts nothing or else it does something.
similarly there are two cases for Rice's Theorem.

(1) empty set is in P      # i.e. every machine that does not accept anything is in L
(2) empty set is NOT in P  # every machine that does not accept anything is NOT in L

NOTE: "does not accept" means either reject or loop

For (2),  reduce from Htm = {<M>| M halts on empty string}

def R(<M>):
    def N(x):
        M('')          # if M('') loops, so does N: L(N) = {}
        return M1(x)   # otherwise N acts like M1: L(N) = L(M1)
    return <N>

=====> a decider for L (if it existed at all) would distinguish the L(N)=L(M1) case from the L(N)={} case, thus deciding Htm.
=====> but Htm is undecidable. contradiction.

For (1),  essentially the same. replace M1 by M2. also reduce from !Htm = {<M>| M loops on empty string}

def R(<M>):
    def N(x):
        M('')
        return M2(x)
    return <N>

=====> all of the above proves Rice's Theorem.
Rice's Theorem : Let L be a subset of strings representing TMs, where
(1) If M1 & M2 recognize the same language, then either <M1>&<M2> are in L, or <M1>&<M2> are NOT in L.
(2) There exist M1,M2 such that <M1> is in L, <M2> is NOT in L
Then, L is undecidable.

===> (1) just says membership depends only on the behavior of the machine (by behavior, we mean the language the machine recognizes), not its implementation (trans func, states, etc) or alphabet.
===> (2) says the language is not trivial. a "trivial property" is one that includes or excludes every TM.

#  Rice's theorem quiz

Which of the following properties is decidable?
- is a virus ?  (have both properties, thus undecidable)
- decide if an input binary string has an equal number of 0s and 1s (again, undecidable)
- implements a reduction showing that Htm <=m Etm  (no such reduction exists, since Htm is recognizable and Etm is co-recognizable, so simply reject everything; this is about implementation anyway) thus decidable
- halts in at most 10000 steps (decidable. run up to that step)

##
##  trivial property
##

The property P is defined as a subset of the recognizable languages.  It is trivial if it includes every recognizable language or if it is empty.

e.g.
P={L | L is the language of some Turing machine that has an odd number of states}
Here P is the set of languages that can be recognized by a Turing machine with an odd number of states.

##
##  Rice theorem examples
##

P1 = {L|L is decided by some 5-state TM}

If L is decided by some 5-state TM, it is also decided by TMs with 6 or more states (add unreachable states), but not necessarily by a TM with 4 or fewer states.

(1) Let M1 and M2 be TMs. If L(M1) = L(M2), then either both in P1, or not in P1.
(2) there are some M not in L, and some in L. thus P1 is non-trivial.
==> e.g. a 5-state TM can accept 0011 and reject 1100, but a 1-state TM cannot do the same.

P2 = {<M>|M is a TM that always enters state q4}

(1) If L(M1) = L(M2), still M1 may enter q4 while M2 may NOT. transition functions and states are implementation details, not a property of the language. REMEMBER, in the Rice's Theorem context, a property is a set of languages.
(2) P2 is not trivial. Some TM enters q4, and other dont.

P3 = {<M>| M halts after 10 steps}

--> again, let's say M1 immediately rejects when starting, thus L(M1) = {}; then M2 rejects everything after 15 steps, so L(M2) = L(M1), but <M1> is not in P3 while <M2> is. This means Rice's Theorem does not apply here.

##
##  rice theorem ; hard example
##

S ={<M> | M accepts exactly one string}
Then the complement is !S = {<M> | M accepts nothing, or M accepts at least 2 strings}

Can S be mapping-reduced to !S?

- no

(instructor explanation)

It follows from the recursion theorem.  See ch6 of Sipser, the fixed point theorem in particular.

Suppose that such a reduction R did exist.  Then consider the following TM

def M(x):
    Obtain <M> via the recursion theorem
    Compute <N> = R(<M>)
    return N(x)

By construction, <N> = R(<M>), and yet L(M) = L(N) since N just simulates M.

Ask yourself, "is <M> in S?"

(This is well beyond the scope of the course.)

(uncomputable prob example: busy beaver)
https://github.com/pkrumins/busy-beaver

###
###  self-printing code
###

demonstrates "recursion theorem" from computation theory.

Th: Let T be a Turing machine that computes a function t: Z* x Z* -> Z*.
There is a Turing machine R that computes a function r: Z* -> Z*, where for every w,
r(w) = t(<R>, w)

many examples on the web. here is a two-liner python example.

----------quine.py-------------
s = "print \"s = \\\"\"+s.replace(\"\\\\\",\"\\\\\\\\\").replace(\"\\\"\",\"\\\\\\\"\")+\"\\\"\"+\"\\n\"+s"
print "s = \""+s.replace("\\","\\\\").replace("\"","\\\"")+"\""+"\n"+s
-------------------------------

Essentially, the string defines the program core, while the program core prints the string plus the rest of the code (which may include header, footer, variable declaration, etc). Notice we print the string twice: the first time for the string definition itself, and the second time for the program core that the string defines. The only tricky part is handling escape characters, which is the source of the headache above. There are many more elegant examples on the web that make use of each language's builtin functions, etc.

------a shorter version------
s = 's = %r\nprint(s%%s)'
print(s%s)
-----------------------------
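one can check the quine property of the two-liner mechanically: reconstruct its source, run it, and compare what it prints against that source (a sanity check, not part of the theorem).

```python
import io
import contextlib

s = 's = %r\nprint(s%%s)'
src = (s % s) + '\n'            # the program's full source text

buf = io.StringIO()
with contextlib.redirect_stdout(buf):   # capture what the program prints
    exec(src, {})
assert buf.getvalue() == src            # output == source: it is a quine
```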

#################################
######      P and NP       ######
#################################

(P vs NP lecture by Sipser)  https://www.youtube.com/watch?v=msp2y_Y5MLE

so far, we focused on "computability" of things, now we look at "complexity" (efficiency) of computable things.

e.g.
finding a clique of size N in a graph of M nodes where N < M is hard.

a clique means every node is connected to every other node.

P: sorting, multiplication
NP: factoring, clique

factoring takes long
- brute force search takes long
- is search necessary? yes, for all we know today.

clique problem
100 nodes, k size clique
we need to try every 100Ck combination of nodes.
is searching necessary, for all we know, yes.

scheduling
map coloring
protein folding
theorem proving
packing
puzzles
traveling salesman
many more

is searching necessary? we don't know, so yes for all we know.

can we solve search problems without searching?
is P = NP or P != NP?
hence, the P vs NP problem

# a strange way to test primality
old theorem (Fermat's little theorem): for prime p and 0 < a < p,
a^(p-1) ≡ 1 (mod p)
===> no need to factorize, thus primality testing can be in P
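a sketch of the resulting primality test (the Fermat test). it is probabilistic, and rare composites (Carmichael numbers) can fool it, but each trial runs in polynomial time thanks to fast modular exponentiation (three-argument pow).

```python
import random

def fermat_probably_prime(p, trials=20):
    """Probably-prime test based on a^(p-1) ≡ 1 (mod p) for prime p."""
    if p < 4:
        return p in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, p - 1)
        if pow(a, p - 1, p) != 1:   # a is a witness: p is certainly composite
            return False
    return True                     # no witness found: probably prime
```

note this only decides "composite" with certainty; "prime" is a high-confidence guess, which is why it is a strange (but fast) way to test primality.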

###
###   P and NP
###

P  : problems "solvable" in polynomial time.
NP : problems "verifiable" in polynomial time.
NPC: hardest problems in NP.   # NPC = NP-complete

---------NP----------------------
|                               |
|   ----P------    ----NPC----  |
|   |         |    |         |  |
|   -----------    -----------  |
|                               |
---------------------------------

###
###   Delicacy of Tractability
###

polynomial(tractable)
---------------------------------------
shortest-path btw two nodes in a graph
vertex cover in Bipartite Graphs  (ref) http://en.wikipedia.org/wiki/Bipartite_graph
Linear Programming
Eulerian cycle  (touches each edge once)
2-CNF SAT  (boolean expression with two literals per clause)

NP-complete
---------------------------------------
traveling salesman | (in a shop, a customer orders items A,B,C and the clerk needs to go get them via an efficient route)
(or the best route to hit all the attractions in a Disneyland)
vertex cover in general graphs
integer programming
Hamiltonian cycle  (touches each vertex once)
3-CNF SAT  (boolean expression with three literals per clause)

=====>  the difference is sometimes subtle, but the ability to distinguish P vs NP-complete is important. if P, you can build on any of the great existing algorithms. if NP-complete, all you can do is build an algorithm that works on smaller instances of the problem or gives a good-enough approximation of an answer, never an exact algorithm that solves the problem efficiently in a general, scalable way.

####
####   Running Time Analysis
####

def:  Let M be a TM that halts on all inputs. The running time of M is f:N->N where
f(n) = max {m | m is the number of steps taken by M on an input string of length n}

# N in this case represents natural numbers

e.g.

consider a TM that checks if the input binary-encoded string contains "1"

transition functions;
q_0,0 -> q_0,0,R
q_0,1 -> q_a,1,R
q_0,b -> q_r,b,R

input    steps
--------------
0        2
1        1
00       3
01       2
10       1
11       1
.        .
.        .

===> therefore, f(1) = 2   and   f(2) = 3

===> exact runtime is often not required; we want some kind of approximation. in this case, we use asymptotic analysis.

###
###  asymptotic analysis
###

define:    O(f(n)) = {g:N->N | there exist c,N such that for all n>=N, g(n) <= c*f(n)}

In English,  O(f(n)) is the set of functions g over natural numbers such that c*f(n) is at least g(n) for all n>=N
by convention, we write  g(n) = O(f(n))  instead of  g(n) is in O(f(n))

must see video for good visualized examples
e.g.
g(n) = n^2 - n + 10     then g(n) = O(n^2) with c=1, N=10 (or c=10, N=1)
g(n) = O(n^3) with c=1, N=3
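the (c, N) witnesses above can be sanity-checked numerically. a finite loop cannot prove an asymptotic claim, but it makes the definition concrete (function name is ours):

```python
def g(n):
    return n * n - n + 10

# witnesses c=1, N=10 for g(n) = O(n^2): g(n) <= 1 * n^2 once n >= 10
assert all(g(n) <= 1 * n * n for n in range(10, 1000))

# the alternative witnesses c=10, N=1 work too
assert all(g(n) <= 10 * n * n for n in range(1, 1000))
```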

NOTE: n is the length of the input string

(ref)   http://mathforum.org/library/drmath/view/51433.html

aside:  log_b(n) = log_2(n) / log_2(b)
(ref)  http://en.wikipedia.org/wiki/List_of_logarithmic_identities#Changing_the_base

####
####  The Class P
####

def:   P = {L| L is a language recognized by an O(n^k) time deterministic TM for some k in N }     # N = natural numbers

NOTE:
- we say "language" instead of "problem" because problems are encoded as languages.
- "deterministic" means given a current state, and a tape symbol being read, there is one transition for the machine to follow.
- polynomial does not mean tractable. what if k is big? well class P is defined to separate from NP in this regard, not to define tractable. besides computing power changes all the time.
- P is the same for single-tape, multi-tape, RAM
- P is closed under composition. if algorithm B makes polynomially many calls to a polynomial-time subroutine A, then B still runs in polynomial time.
- to show polynomiality of an algo, (1) show it has a poly number of stages, (2) show each stage takes a poly number of steps

##
##  problems and encoding
##

"how we encode problems" can affect the P-ness of the problems. for the sake of our study, we mostly ignore this encoding problem.

###
###  P-ness quiz
###

which of the following is in P ?

U = a set of some integers written in binary

(1) does U contain a number greater than 100 ?      # P   - scan thru all elems. it's O(n)
(2) do any 3 elems of U sum to zero ?               # P   - try all combinations. nC3 = O(n^3)
(3) does any non-empty subset of U sum to zero ?    # NPC - all subsets of n means 2^n possibilities
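the contrast between (2) and (3) is the number of candidates to try: O(n^3) triples vs 2^n subsets. a brute-force sketch of both (function names are ours):

```python
from itertools import combinations

def has_triple_summing_to_zero(U):
    # quiz (2): O(n^3) — polynomially many 3-element combinations
    return any(a + b + c == 0 for a, b, c in combinations(U, 3))

def has_subset_summing_to_zero(U):
    # quiz (3): brute force over all 2^n - 1 non-empty subsets — exponential
    return any(sum(s) == 0
               for r in range(1, len(U) + 1)
               for s in combinations(U, r))

# has_triple_summing_to_zero([-7, 3, 4, 12]) -> True   (-7 + 3 + 4 == 0)
```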

NOTE: diff btw polynomial and exponential

polynomial : some variable is raised to the power of some integers.
e.g.  x^3 - x^2 + x

exponential: some real number is raised to the power of variable x (or some function of x)
e.g.  7^x   e^x

====> polynomial-time algo: the problem is in P.  exponential brute-force search: all we currently know for many NP problems.

####
####  Non-Deterministic TM
####

recall TM configuration sequence. (given a current state, and the tape symbol being read, the transition function decides what to do.)

in "deterministic" TM computation, given an input, the cfg sequence is unique.

o - o - o - o - o - o - o - o
|                           |
(initial cfg)           (accept/reject cfg)

in "non-deterministic" TM, the transition cfg sequence can branch to multiple paths.

o - o - o - o - o - (reject)
 \
  o - o - o - o - (loop)
   \
    o - o - o - (accept)   # if one branch accepts, the entire TM accepts.
     \
      o - (reject)

====> we need to re-write our "transition function" definition for non-deterministic TM

f: Q × T -> P(Q × T × {L,R})     # now the range is the power set of all possible (state, symbol, direction) transitions.

NOTE: A non-deterministic TM accepts if there is any valid sequence of configurations resulting in an accept state.
Similarly, a NTM rejects when every branch reaches a reject state.
Similarly, a NTM loops when no accepting branch, and any of the branches loops.

NOTE: NTM is asymmetric: we cannot simply swap accept and reject the way we can for a DTM. thus the complements of NP problems are often not known to be in NP.

###
###   composite numbers (non-prime numbers)
###

task: decide the language { x | x = str(p*q) for p,q > 1 }

- a possible implementation of deterministic TM;

---------------------
decider(x):
    p = 1
    while True:
        p = p + 1
        if p*p > x:         # hence O( x^(1/2) ) steps
            reject
        elif x%p == 0:
            accept
----------------------

===> each iteration of the loop takes the number of steps polynomial to the number of bits used to represent x.

complexity is  O( x^(1/2) ) = O( 2^(len(x)/2) )    # remember x is represented in binary
# meaning, 2^len(x) - 1 is the biggest possible number
# ( 2^len(x) - 1 )^(1/2) = O( 2^(len(x)/2) )
NOTE:  x^(1/n) =  y   # where y^n = x
e.g.  25^(1/2) = 5
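the deterministic decider above is directly runnable; accept/reject become return values (function name is ours):

```python
def is_composite(x):
    """Deterministic decider for { x | x = p*q for p,q > 1 }.

    Trial division up to sqrt(x): O(x^(1/2)) iterations, which is
    O(2^(len(x)/2)) in the length of x's binary encoding — exponential."""
    p = 1
    while True:
        p += 1
        if p * p > x:
            return False        # reject: no factor up to sqrt(x)
        if x % p == 0:
            return True         # accept: x = p * (x // p)

# is_composite(91) -> True (91 = 7*13), is_composite(97) -> False (prime)
```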

- a possible decider implementation of non-deterministic TM

----------------------------
decider(x):
    p = '1'     # we will treat p in binary representation
    while True:
        Foo:
        (branch 1)  p = p + '0'      # each iteration takes care of one additional bit
        (branch 2)  p = p + '1'      # how many bits in x?  well, O(log x) = O(len(x))
        if p*p > x:
            reject
        else:
            (branch 3) goto Foo        # just running the check in parallel
            (branch 4) if x%p == 0:
                           accept
                       else:
                           reject
------------------------------

====> O(log x) = O(len(x))  iterations

##
##  running time for non-deterministic TM
##

because we can run things in parallel, the runtime of the machine as a whole is the max number of steps taken on any branch.
NOTE: once you have a bound on the length of any accepting cfg sequence, you can bound the number of steps and cut off any looping branch.

####
####   The Class NP
####

NP = {L| L is a language recognized by an O(n^k) non-deterministic TM for some k in N }   # N is natural numbers

NP stands for "non-deterministic polynomial"

NOTE: any TM that recognizes a language in polynomial time can be turned into a polynomial decider if we can add a bound (aka time-out).

###
###   NP = verifiability
###

def: a verifier for a language L is an algorithm V where L = {w| V accepts <w,c> for some string c }

===> w is like a statement and c is its certificate V can check.
===> a verifier V is polynomial "if" V halts in O(|w|^k) steps

Claim: NP = {L| L has a polynomial verifier}

recall non-deterministic computation tree

o - o - o - o - o - (reject)
 \
  o - o - o - o - (loop)
   \
    o - o - o - (accept)   # if one branch accepts, the entire TM accepts.
     \
      o - (reject)

(1) if NP, then verifier

==> if a language is in NP, then there is a non-deterministic TM that recognizes(accepts) it in polynomial time.
==> though the verifier cannot simulate the whole tree of non-deterministic computation in polynomial time, it can simulate a single path. think of the certificate as directions telling the verifier exactly which path to follow.

(2) if verifier, then NP

==> the non-deterministic machine branches off many ways to choose "c" to append to the input w
==> then for each branch (i.e. for each c), simulate V(<w,c>). if any branch accepts, there you have a poly-time NTM recognizing L.
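the verifier view made concrete for the COMPOSITES language from earlier: the certificate c is a claimed nontrivial factor, and checking it is polynomial in len(x) even though finding it may not be (function name is ours):

```python
def verify_composite(x, c):
    """Polynomial-time verifier: the certificate c is a claimed
    nontrivial factor of x, checkable with one division."""
    return 1 < c < x and x % c == 0

# a nondeterministic machine "guesses" c; the verifier only checks it:
# verify_composite(91, 7) -> True, verify_composite(91, 8) -> False
```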

###
###  NP quiz
###

which of the following is in NP ?

(1) is a graph connected ?
==>  NP   just do DFS. O(V+E) which is in P, thus in NP also

(2) does a graph have a set of k vertices with no edge in between?
==>  NP. given a certificate listing the k vertices, V just checks, for each vertex, whether it has an edge to any of the other k-1. this is k*(k-1) = O(k^2) checks.

(3) does a TM accept exactly one string?
==>  not in NP. in fact, this is not even recognizable, as we've seen before.

NOTE: there are decidable problems that are not in NP. e.g. deciding if a non-deterministic TM halts after k steps where k is encoded in binary. (proof is beyond the scope of this course.)

##################################
####      NP-completeness     ####
##################################

###
###  The Hardest Problems in NP
###

P is a subset of NP.

How to show a problem is intractable:

(1) show it is not in P.
(2) reduce another intractable problem to it.
(3) reduce an NP-complete problem to it.

===> we focus on (3), specifically "polynomial reduction" and "NPC" using cook-levin SAT

###
###   Polynomial Reduction
###

Def:  Language A is "polynomial time reducible" to language B (written A <=p B) if there is a polynomial time computable function f: Σ* -> Σ* where for every w, w is in A <=> f(w) is in B

NOTE: very similar to reduction, but notice the function has to be "polynomial time"

e.g.   Is x in A ?

lets say we have a decider TM M for language B. we just need a machine N that computes f(x) in polynomial time. then the combination N+M becomes a "polynomial" decider for language A, because x is in A iff f(x) is in B

x -> N -> f(x) -> M -> tells if f(x) is in B (i.e. if x is in A)

(must see good visual video)  https://www.youtube.com/watch?v=WtqaJ28QzrA

e.g.

Let N be an algorithm with running time p(n) >= n
Let M be an algorithm with running time q(n) >= n

input    algo   output/input  algo   output
---------------------------------------------
x  ->  N   ->   N(x)     -> M   -> M(N(x))

==> n is the length of the input x
==> the most appropriate bound for the running time of N*M is    O(q(p(n)))

IMPORTANT NOTE: the composition of polynomials is a polynomial.
i.e. if each algorithm's running time is bounded by a polynomial, then their composition is bounded by a polynomial also.
====> hence the theorem : if A <=p B, and if B is in P, then A is in P
====> thus the decider for A you get via polynomial reduction of a polynomial f() to the polynomial decider B is polynomial.
====> similarly, if A <=p B, and if A is not in P, then B is not in P  (B is harder than A)

####
####  Independent Set (clique) problem
####

Def:   given a graph G=(V,E),  a subset S of V is an "independent set" if there are no edges between vertices in S.

e.g.

a - b - c
    |   |
    d - e

===> S(a,c,d) is an independent set of size 3

Input : G=(V,E), integer k <= |V|
Decision : does G have an independent set of size k ?  (this problem is in NP)

===> usually the question is what is the max size independent set in a given G?

aside: a clique is a set where every vertex pair is connected, so finding an independent set in G is the same as finding a clique in the complement graph !G

####
####   Vertex Cover
####

Given a graph G=(V,E), a subset S of V is a "vertex cover" if every edge is incident on a vertex in S.

e.g.

a - b - c
    |   |
    d - e

==> S(b,e) is a vertex cover, touching every edge. (that is what we mean by incident)

Input : G=(V,E), integer k <= |V|
Decision: does G have a vertex cover of size k ?    (this problem is in NP)

===> usually we wanna know the smallest possible k for a given G.

####
####  polynomial reducing "independent set" to "vertex cover"
####

e.g.

a - b - c
    |   |
    d - e

===> S(a,c,d) is an independent set.
===> S(b,e) is a vertex cover.
===> notice how they are complements.

lets examine the definitions.

S is an independent set if there are no edges (btw vertices) within S.
==(rephrase)==>
S is an independent set if every edge is incident (btw vertices) on V-S.
==>  V-S is a vertex cover!!!

observation: S is an independent set iff V-S is a vertex cover.
corollary  : G has an independent set of size >=k iff G has a vertex cover of size <= |V|-k

          f
(G,k) --------> (G, |V|-k)
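the complement observation can be checked exhaustively on the small example graph. a sketch (function names are ours; we read the drawing's edges as a-b, b-c, b-d, c-e, d-e, which is consistent with the claimed S(a,c,d) and S(b,e)):

```python
from itertools import combinations

def is_independent_set(edges, S):
    return all(not (u in S and v in S) for u, v in edges)

def is_vertex_cover(edges, S):
    return all(u in S or v in S for u, v in edges)

edges = [('a','b'), ('b','c'), ('b','d'), ('c','e'), ('d','e')]
V = {'a', 'b', 'c', 'd', 'e'}

# the reduction f(G,k) = (G, |V|-k) rests on:
# S is independent  <=>  V - S is a vertex cover, for EVERY subset S
for r in range(len(V) + 1):
    for S in combinations(V, r):
        assert is_independent_set(edges, set(S)) == is_vertex_cover(edges, V - set(S))
```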

####
####  Transitivity of Reducibility
####

Theorem: if A <=p B and B <=p C   then A <=p C

e.g. independent set <=p vertex cover   &&   vertex cover <=p hamiltonian path
then independent set <=p hamiltonian path

proof:

let M reduce A to B, and let N reduce B to C.

x -> M -> M(x) -> N -> N(M(x))    # consider N(M(x)) = R(x)
# R is polynomial

x is in A <=> M(x) is in B <=> R(x) is in C

===> the composition of M and N is a polynomial reduction function

####
####  NP-completeness
####

Def: language L is NPC if        # L aka a decision problem
(1) L is in NP
(2) for all other L' in NP, L' <=p L

===> (1) is because NPC is the intersection of NP and NP-hard. must see the visual in ref.
===> (2) basically says NPC problem is the hardest of NP problems.

(ref) http://en.wikipedia.org/wiki/NP-hard

===> though the rest of the lesson focuses on (2), (1) is also important.
===> for reduction proof, make sure you prove BOTH directions, and the reduction is done in "polynomial" time.

the reason for BOTH directions is because suppose we say something like
x is true iff y is true
VS
x is true if y is true   # this means x can be true while y may not be. but we want x being true to be mapped to (simulated by) y being true. and x being false mapped to y being false. thus both directions are needed.

# (Cook Levin) Theorem : boolean satisfiability(aka SAT) is NPC

####
####  CNF satisfiability    # conjunctive normal form (CNF)
####

CNF boolean formula:

- variables: e.g. {x,y,z}
- literals : e.g. x,!y
- clauses  : disjunctions (logical ORs) of literals; each clause is delimited by parentheses
- CNF formula: a conjunction(combinations of logical ANDs) of clauses

e.g.  (x | y) & (~y | z)

A formula is "satisfiable" if there exists a truth assignment such that the formula evaluates to true.

Input: a formula
Decision: is there a satisfying assignment?

====> clearly an NP problem. brute force tries O(2^k) assignments where k is the number of variables, but verifying a given assignment is polynomial.

SAT = { <bf> | bf is a satisfiable boolean formula}
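the O(2^k) brute force is easy to sketch. here clauses are lists of signed integers, +i for x_i and -i for ~x_i (a common convention, not from the notes; function name is ours):

```python
from itertools import product

def cnf_satisfiable(clauses, n_vars):
    """Brute-force SAT: try all 2^n assignments; each check is polynomial."""
    for bits in product([False, True], repeat=n_vars):
        value = lambda lit: bits[abs(lit) - 1] == (lit > 0)
        if all(any(value(lit) for lit in clause) for clause in clauses):
            return True
    return False

# (x1 | x2) & (~x2 | x3):
# cnf_satisfiable([[1, 2], [-2, 3]], 3) -> True
# cnf_satisfiable([[1], [-1]], 1) -> False
```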

##
##  quiz:
##

if P=NP, then you can build a polynomial time function f that, given a boolean formula bf, f(bf) computes a satisfying assignment.

http://cs.jhu.edu/~cs363/fall2013/assign9_sln.pdf
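the quiz answer uses self-reducibility: given a (hypothetical) polynomial-time SAT decider, n calls pin down a satisfying assignment one variable at a time. a runnable sketch where the decider is stubbed with brute force so the idea executes (all names are ours; if P=NP, swapping in a polynomial decider makes the whole search polynomial):

```python
from itertools import product

def brute_force_sat(clauses, n_vars, fixed):
    """Stand-in for a hypothetical poly-time SAT decider.
    fixed: dict var -> bool of variables already pinned down."""
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        assign = dict(fixed)
        assign.update(zip(free, bits))
        if all(any(assign[abs(l)] == (l > 0) for l in clause) for clause in clauses):
            return True
    return False

def find_assignment(clauses, n_vars, sat=brute_force_sat):
    """Self-reducibility: one decider call per variable extracts a model."""
    if not sat(clauses, n_vars, {}):
        return None
    fixed = {}
    for v in range(1, n_vars + 1):
        fixed[v] = True
        if not sat(clauses, n_vars, fixed):
            fixed[v] = False   # True failed, so False must keep it satisfiable
    return fixed
```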

####
####  Cook-Levin Theorem
####

Theorem:  SAT is NPC
(1) given a candidate formula, we can verify in poly time
(2) we show we can reduce any NP prob to SAT, in poly time. (not easy, but we will show below)
Let L in NP, and let M be a non-deterministic TM that decides L in polynomial time <= p(n)

          | state | pos   | sqr 1 | sqr 2 | .... | sqr p(n)
-----------------------------------------------------------
step 0    | q_0   | sqr 1 |  a    |  1    | .... |  b
step 1    |       |       |       |       | .... |
step 2    |       |       |       |       | .... |
  .       |  .    |  .    |  .    |  .    | .... |
  .       |  .    |  .    |  .    |  .    | .... |
step p(n) |       |       |       |       | .... |

##
##  variables
##

State: Qik
==> after step i, the state is q_k

Head position: Hij
==> after step i, the head is over square j

Tape contents: Sijk
==> after step i, square j contains tape symbol k

Cook-Levin says that the existence of accepting computation == satisfying all constraints on filling out the above table.

##
##  configuration clauses
##

(1) exactly one state being true after each step

e.g.
(Qi0 | Qi1 | Qi2 | .... | Qir) for all i.   # r is the number of states
(~Qij | ~Qik) for all i, and j != k

(2) head in exactly one position after each step

e.g.
(Hi0 | Hi1 | Hi2 | .... | Hip(n) ) for all i
(~Hij | ~Hik) for all i, and j != k

(3) each square has exactly one symbol after each step

e.g.
(Sij0 | Sij1 | Sij2 | .... | Sijs) for all i and j   # s is the number of tape symbols
(~Sijk | ~Sijt) for all i and j, and k != t

(4) initial configuration at step 0

e.g.
Q00           # q0 is init state
H01           # head is at sqr 1
(S01k_1)(S02k_2)....(S0|w|k_|w|)     # first part of the tape contains input string w
(S0,|w|+1,b)....(S0,p(n),b)          # rest of the tape is blank

(5) ends in an accepting state

e.g.
Qp(n)y   # qy is the accepting state

##
##  transition clauses
##

a tuple (k,l,m,n,d) is valid if transition function f(q_k,s_l) contains (q_m,s_n,d)

for every step i, position j, and invalid (k,l,m,n,d),
include in the formula the clause (~Hij | ~Qik | ~Sijl | ~Hi+1,j+d | ~Qi+1,m | ~Si+1,j,n)
                                   -------------------   --------------------------------
                                   if f(q_k,s_l) applies   the machine didn't follow an invalid transition

===> it is only saying whatever literal(variable) assignments must be valid in terms of available transition functions of the TM.
===> we add clauses to exclude impossible assignments. e.g  A implies ~B means (~A | ~B)

####
####  Cook-Levin Summary
####

essentially says any NP problem can be polynomial reduced to CNF SAT, thus making CNF SAT the hardest = NPC

L is in NP

is x in L ?

x -> [cook-levin reducer for L] -> boolean formula f that is satisfiable iff x is in L

(1) is the reduction correct?
x is in L  implies  f is satisfiable
f is satisfiable  implies  x is in L    #  (1) tableaux is well defined
#  (2) starts in init cfg
#  (3) every transition is valid
#  (4) ends in accepting cfg
==> thus non-deterministic TM will accept if x is in L

(2) is the reduction polynomial in the size of the input?
- running time is polynomial in the output formula length.
- output formula is polynomial in the input length.

(assuming the size of the alphabet is constant)
number of steps : p(n)
number of bits needed to distinguish n literals : log(n)
e.g. 8 literals. log2(8) = 3bits

===> now revisit each clause to estimate the order of its string length.
(the reason we care about string length: that is the computation run time when the tape reads symbols one by one)

##  configuration clauses
(1) exactly one state being true after each step        # O( p(n) * log(n) ) string length
(2) head in exactly one position after each step        # O( p(n)^3 * log(n))   =>  one p(n) for the # of steps, and the other two for the number of head-pos pairs.
(3) each square has exactly one symbol after each step  # O( p(n)^2 * log(n))   => p(n)^2 combinations of steps-squares
(4) initial configuration at step 0                     # O( p(n) * log(n))
(5) accept state                                        # O(log(n))

##  transition clause
O( p(n)^2 * log(n))
# p(n)^2 for all (step, position) pairs; the number of invalid tuples is a constant

=====> all in all, the total length of all the above clauses is still polynomial (a sum of polynomials is a polynomial.)

NOTE: the point is, CNF-SAT is an NPC problem. so to prove some other problem is NPC, we can reduce CNF-SAT to that problem (and show the problem is in NP).

#######################################
######       NPC problems        ######
#######################################

per cook-levin theorem, any arbitrary NP problem can be reduced to CNF SAT, thus making CNF SAT the hardest(NPC) problem.
here, we will reduce CNF-SAT to 3-CNF-SAT to clique(independent set) to vertex cover.
thus proving all of 3-CNF-SAT, clique, vertex cover are NPC. (recall transitivity of poly reduction)

then, we practice reducing 3-CNF-SAT to other problems to prove their NPC-ness

###
###  strategy
###

map CNF formula f to a 3-CNF formula f' which (1) includes additional variables Y and (2) has the property that for any truth assignment t for f,
t satisfies f  <=>  there exists t_y : Y -> {T,F} such that (t ∨ t_y) satisfies f'

e.g.

(Z1 | Z2 | Z3 | Z4)  ->  (Z1 | Z2 | ~y) & (y | Z3 | Z4)
where Zi is in {x_1, ~x_1, x_2, ~x_2, x_3, ~x_3,,,, }  # just a set of literals
where Y is in {y}

=====> this is intuitive. suppose the left side statement (Z1 | Z2 | Z3 | Z4) is true; then we want the right side statement to be true. now, if y=true, then (y | Z3 | Z4) becomes true obviously, and for (Z1 | Z2 | ~y) to be true, since ~y=false, Z1 or Z2 must be true; and vice versa for the y=false case. thus, the existence of y makes the right side statement say "either Z1|Z2 or Z3|Z4 must be true", which is basically the same as the left side clause.

###
###  transforming one clause
###

consider one clause (Z1 | Z2 | Z3 | .... | Zk) where each Zi is a literal in {x_1, ~x_1, x_2, ~x_2,,,,}

case      new variables      replacement clauses  (we wanna make it "3" CNF)
-------------------------------------------------------------------------------------------------------
k=1       y1, y2             (Z1 | y1 | y2) & (Z1 | y1 | ~y2) & (Z1 | ~y1 | y2) & (Z1 | ~y1 | ~y2)
k=2       y1                 (Z1 | Z2 | y1) & (Z1 | Z2 | ~y1)
k=3       NA                 already 3-CNF, nothing to do
k>3       y1,,,yk-3          (Z1 | Z2 | ~y1) & (y1 | Z3 | ~y2) & .... & (yk-4 | Zk-2 | ~yk-3) & (yk-3 | Zk-1 | Zk)

e.g. (Z1 | Z2 | Z3 | Z4 | Z5 | Z6)

think of y3 = Z1 | Z2 | Z3 | Z4
then resolve the original statement to (Z1 | Z2 | Z3 | Z4 | ~y3) & (y3 | Z5 | Z6)

then the rest is essentially a repeat.

think of y2 = Z1 | Z2 | Z3
then resolve to (Z1 | Z2 | Z3 | ~y2) & (y2 | Z4 | ~y3) & (y3 | Z5 | Z6)
think of y1 = Z1 | Z2
then resolve to (Z1 | Z2 | ~y1) & (y1 | Z3 | ~y2) & (y2 | Z4 | ~y3) & (y3 | Z5 | Z6)

===> polynomial steps
===> argument can extend to any k>3 by induction
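the case table above can be written out as code. a sketch (function name and the signed-integer literal encoding, +i for x_i and -i for ~x_i, are ours; fresh variables are numbered from next_var):

```python
def clause_to_3cnf(clause, next_var):
    """Split one clause into equisatisfiable 3-literal clauses,
    following the k=1 / k=2 / k=3 / k>3 cases above."""
    k = len(clause)
    if k == 1:
        z, y1, y2 = clause[0], next_var, next_var + 1
        return [[z, y1, y2], [z, y1, -y2], [z, -y1, y2], [z, -y1, -y2]], next_var + 2
    if k == 2:
        y1 = next_var
        return [[clause[0], clause[1], y1], [clause[0], clause[1], -y1]], next_var + 1
    if k == 3:
        return [clause], next_var
    # k > 3: chain the clause through k-3 fresh variables
    ys = list(range(next_var, next_var + k - 3))
    out = [[clause[0], clause[1], -ys[0]]]
    for i in range(1, k - 3):
        out.append([ys[i - 1], clause[i + 1], -ys[i]])
    out.append([ys[-1], clause[-2], clause[-1]])
    return out, next_var + k - 3

# (Z1|Z2|Z3|Z4) with fresh variable 5:
# clause_to_3cnf([1, 2, 3, 4], 5) -> ([[1, 2, -5], [5, 3, 4]], 6)
```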

###
###   CNF SAT  <=p  3-CNF SAT
###

poly reduction: transform each clause individually with new variables for each.

(can we reduce to 2-CNF? no (unless P=NP): 2-CNF SAT is solvable in polynomial time via its implication graph. we skip that discussion here.)

###
###    3-CNF SAT  <=p  IND SET
###

input :  A 3-CNF of m clauses  (Z11 | Z12 | Z13) & .... & (Zm1 | Zm2 | Zm3)
output:  Graph G that has an independent set of size m iff the input is satisfiable.

reduction step:

e.g.   (a | b | c) & (~b | c | d) & (a | ~b | ~c)

===> connect all literals as nodes within each clause
(we call these edges "within-clause edges")

a--b    ~b--c    a--~b
 \ |     \ |      \ |
  \c      \d       \~c

===> then connect all contradictory literals between clauses.
(we call these edges "between-clause edges" or "contradiction edges")

    ________________
   /                \
a--b----~b--c    a--~b
 \ |      \ |\     \ |
  \c       \d \_____\~c
   \________________/

NOTE: when considering a poly time reduction of A to B, we look for structures in language B that can map/simulate A. such structures are called "gadgets"
e.g.
in case of 3-CNF-SAT <=p IND SET, nodes in the graph simulate literal occurrences, and triangles simulate clauses
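the gadget construction can be sketched directly: one node per literal occurrence, a triangle inside each clause, plus contradiction edges across clauses (function name and encoding are ours; +i means x_i, -i means ~x_i; a node is (clause index, position, literal)):

```python
def threecnf_to_graph(clauses):
    """G has an independent set of size len(clauses) iff the 3-CNF is satisfiable."""
    nodes = [(i, j, lit) for i, clause in enumerate(clauses)
                         for j, lit in enumerate(clause)]
    edges = set()
    for u in nodes:
        for v in nodes:
            if u < v:
                same_clause = u[0] == v[0]       # within-clause (triangle) edge
                contradictory = u[2] == -v[2]    # between-clause contradiction edge
                if same_clause or contradictory:
                    edges.add((u, v))
    return nodes, edges

# the running example (a|b|c) & (~b|c|d) & (a|~b|~c), with a,b,c,d = 1,2,3,4:
nodes, edges = threecnf_to_graph([[1, 2, 3], [-2, 3, 4], [1, -2, -3]])
```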

###
###  proof  of 3-CNF SAT  <=p  IND SET
###

WTS:  f is satisfiable => G has an independent set of size m

let t be a satisfying assignment for f.
choose one literal from each clause in f that is true under t to form S.
clearly |S| = m
vertices from distinct clauses  => no within-clause edges
all chosen literals are true => no between-clause edges

WTS: G has an independent set of size m  =>  f is satisfiable

let S be an independent set of size m
there are no edges between any of the nodes in S: no within-clause edges => the literals come from distinct clauses.
also no between-clause edges => literals in S are non-contradictory
THUS, any assignment consistent with literals of S will satisfy f

####
####   Subset Sum
####
(multiset means the set can have duplicate elements)

given : multiset A = {a1, a2,,,, am} where each ai is a positive integer, and a target integer k
decision: is there a subset S ⊆ A such that Σ_{a ∈ S} a = k ?   i.e. the elements of S sum up to k

e.g.  is there a way to partition {1,1,4,5,7}  so that a subset gets 1/2 the total ?

1+1+4+5+7 = 18
1+1+7     = 9     #  yes
4+5       = 9     #  yes

===> subset sum is an NP problem. verifying a subset takes polynomial time (just add the elements up), but brute-force solving tries 2^m subsets, which is exponential.

##
##  a subset sum algorithm
##

def subsetSum(a, m, k):
    w[0][0] = True            # all other w[0][j] are False
    for i = 1 to m:
        for j = 0 to k:
            w[i][j] = w[i-1][j] or (j >= a[i] and w[i-1][j - a[i]])   # some subset of a[1..i-1] sums to j, or to j - a[i]
    return w[m][k]

====> O(m*k)
====> this looks polynomial, but we use binary encoding for k: input size n = log(k) bits.
====> so k = 2^n, and O(m*k) = O(m * 2^n) — exponential in the input size (pseudo-polynomial).
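the DP above runs as written once the j >= ai guard is explicit. a runnable sketch (function name is ours):

```python
def subset_sum(a, k):
    """w[i][j] = can some subset of a[:i] sum to j?  O(m*k) time —
    polynomial in the VALUE of k, exponential in its binary length."""
    m = len(a)
    w = [[False] * (k + 1) for _ in range(m + 1)]
    w[0][0] = True
    for i in range(1, m + 1):
        for j in range(k + 1):
            w[i][j] = w[i - 1][j] or (j >= a[i - 1] and w[i - 1][j - a[i - 1]])
    return w[m][k]

# subset_sum([1, 1, 4, 5, 7], 9) -> True   (1+1+7 == 9, or 4+5)
```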

------- perfect explanation from fellow student -----

The reason is that we measure with respect to the size of the input, and the encoding affects the size of the input.

In binary, the number 37 is represented as 100101. In unary, it's represented as 1 1111 1111 1111 1111 1111 1111 1111 1111 1111. So as you can see, it's quite a bit bigger. The number n in binary can be represented in log(n) bits, whereas in unary it takes n bits.

For a simple example, what is the runtime of the following algorithm, "with respect to the size of the input" ?

def some_algo(k):
    for i in range(k):
        do_something_in_constant_time() # O(1)

You might be inclined to say O(n), but that's not the case. If k is represented in binary, the number k only takes log(k) bits to represent. So if we let n=log(k) - that is, n is the size of our input - then k=2^n. Thus some_algo runs in O(2^n) with respect to the size of the input (remember, n is the size of our input, and k is the value of the input).

However, suppose k was unary. Then the number k now takes k bits to represent. So then the size of our input is k, and the algorithm is O(n) with respect to the size of our input. So now, just by changing the encoding of the input, we've made this exponential time algorithm turn into a linear time algorithm, because the size of the input is so much bigger.

So the same thing is happening here, with subset sum. By measuring against the unary representation, we make our input exponentially bigger, and so the runtime with respect to the size of the input decreases. Think of how you'd solve subset sum: try every permutation of subsets and see if they sum to k. There are 2^n possible subsets, so it takes exponential time to do this. But if we make n exponentially larger, we turn this into a linear time algorithm (but with respect to a much bigger input).

------------------------------------------------------

####
####   3-CNF SAT   <=p  subset sum
####

(good slides)  http://cs.mcgill.ca/~lyepre/pdf/assignment2-solutions/subsetSumNPCompleteness.pdf

e.g.  (x1 | x2 | x3) & (~x2 | x3 | x4) & (x1 | ~x2 | ~x3)

lets define;
yi is in S <=> xi = true
zi is in S <=> xi = false

      x1 x2 x3 x4 c1 c2 c3       # c for clause
------------------------------   # ci means if xi is in the clause or not
y1 |  1  0  0  0  1  0  1      # is x1 in c1,c2,c3 ?
z1 |  1  0  0  0  0  0  0      # is ~x1 in c1,c2,c3 ?
y2 |     1  0  0  1  0  0      # is x2 in c1,c2,c3 ?
z2 |     1  0  0  0  1  1      # so on
y3 |        1  0  1  1  0
z3 |        1  0  0  0  1
y4 |           1  0  1  0
z4 |           1  0  0  0
g1 |              1  0  0      # gi,hi pair is for each ci
h1 |              1  0  0      # in a valid truth assignment, the number of true var in a clause can be 1,2 or 3
g2 |                 1  0      # to always keep the sum 3 for t, we use gi,hi
h2 |                 1  0      # keeping always 3 is important to be able to transform 3CNF to subset sum
g3 |                    1
h3 |                    1
------------------------------
t  |  1  1  1  1  3  3  3     (desired total is 1 for each literal, and 3 for each clause)
# the rows, read as decimal numbers, form the multiset A; the target is k with digits 1,1,1,1,3,3,3

==> think of each constituent digit of t as the decimal digit.
==> thus t = 1111333 (one million one hundred eleven thousand three hundred thirty-three)
==> the existence of gi and hi is purely manufactured to make the mapping logic consistent.

(instructor notes)
The high-level idea is that we describe a polynomial algorithm that takes as input any question of the form

Given: A 3-CNF
Decision: Is there a satisfying assignment?

and then algorithm outputs a question of the form

Given: A set of numbers (the rows in the table) and k (=111....3....)?
Decision: Is there a subset of the numbers that sum to k?

where the answer to the first question is "yes" if and only if the answer to the second question is "yes."

####
####   proof of  3-CNF SAT   <=p  subset sum
####

let f be a 3CNF with n variables and m clauses.
let the multiset A over positive integers (denoted Z+) and k = int(1n3m) be the result of the transformation.   # the digit 1 repeated n times, then 3 repeated m times

WTS :  f is satisfiable  =>  there exists S that is in A summing to k

let t be a satisfying assignment.
include yi in S if t(xi) is true
include zi in S if t(~xi) is true

if <3 literals of clause j are satisfied, include gj
if <2, include hj as well

then sum of all elems in S must be k

WTS :  there exists S that is in A summing to k  =>  f is satisfiable

k = int(1n3m)
think of each digit for xi,ci
since each digit stays below 10, there is no carry between digits: each digit is self contained.
the digit 1 in position xi of k = int(1n3m) means exactly one of yi or zi is included in S.
if yi is in S, then set t(xi) true
if zi is in S, then set t(xi) false
also, the digit 3 in position cj means S contains at most gj and hj (contributing at most 2), so at least one included literal of clause j must be true.

==> thus f is satisfiable.

###
###  other NPC
###

you can reduce subset sum to knapsack
vertex cover to hamiltonian cycle to traveling salesman

e.g.  3-CNF-SAT <=p 3-coloring graph

(ref) https://courses.cs.washington.edu/courses/csep521/99sp/lectures/lecture05.pdf
http://cgi.csc.liv.ac.uk/~igor/COMP309/3CP.pdf

e.g.  reduce hamilton path to CNF

(ref)  http://www.csie.ntu.edu.tw/~lyuu/complexity/2011/20111018.pdf

e.g.  reduce subset sum to knapsack

(ref)  http://cgm.cs.mcgill.ca/~avis/courses/360/2003/assignments/sol4.pdf

e.g.  reduce TSP to metric TSP

(ref) http://cstheory.stackexchange.com/questions/12885/guidelines-to-reduce-general-tsp-to-triangle-tsp

===> the rest of the course will focus on algorithm in P class.

uncountably many undecidable languages
countably many decidable languages (contains NPC,NP,P)

#######################################
######    Dynamic Programming     #####
#######################################

DP concept : divide and conquer

# overview

DP is a general design strategy/technique, not a particular algo.

3 examples
- sequence alignment/edit distance
- chain matrix multiplication
- all-pairs shortest paths in graph

###
###  Optimal Similar Substructure
###

aka optimal substructure and overlapping subproblems

"a problem is said to have optimal substructure if an optimal solution can be constructed efficiently from optimal solutions of its subproblems."
(ref) http://en.wikipedia.org/wiki/Optimal_substructure
http://www.geeksforgeeks.org/dynamic-programming-set-1/
http://www.geeksforgeeks.org/dynamic-programming-set-2-optimal-substructure-property/

e.g.
fibonacci

                fib(5)
               /      \
          fib(3)      fib(4)
         /     \     /      \
    fib(1)  fib(2) fib(2)  fib(3)
                           /    \
                      fib(1)  fib(2)

===> yes, there are many overlapping subproblems, specifically fib(1), fib(2), fib(3) are computed multiple times.
===> optimal substructure? yes, because combining optimal solutions to subproblems yields an optimal solution to the whole problem.

##
##  ways to avoid re-computation
##

(1)  memoization of answers to subproblems   (top down)
(2)  do the subproblems in the right order   (bottom up)

====> (2) is possible because as you can see in the above fib tree example, substructure is acyclic.
====> if it is cyclic, that means a circular dependency, and recursion won't work either.

====> we focus on (2)

(ref)  http://stackoverflow.com/questions/6164629/dynamic-programming-and-memoization-bottom-up-vs-top-down-approaches
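both approaches on the fib example above, as a minimal sketch (function names are ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    # (1) top-down: recurse, but cache answers to overlapping subproblems
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_bottom_up(n):
    # (2) bottom-up: solve subproblems in dependency order
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# both are O(n) instead of the naive O(2^n):
# fib_memo(10) -> 55, fib_bottom_up(10) -> 55
```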

####
####   Sequence Alignment (Edit Distance)
####

X: A G A C G T A G      ->     A G A - C - G T A G
Y: G T T C A G A        ->     - G T T C A G - A -

---> align the two strings so the number of insert/delete ops to convert one to the other is minimized.

Consider sequences X = (x1,x2,,,xn) and Y = (y1,y2,,,ym) over an alphabet Z.
An alignment is a subset A ⊆ {1...n} × {1...m} such that for any distinct (i,j) and (i',j') in A: i != i', j != j', and i<i' => j<j'

===> i.e.  A = subset of pairs of elements from the two sequences, and the pairs never cross.

Cost(A)  =  # of unmatched characters (= inserts + deletes)  +  cost of pairing
         =  (n+m) - 2*|A|  +  Σ_{(i,j)∈A} f(xi,yj)

f(xi,yj) is a function that computes the cost of pairing (xi,yj)
usually it is defined like  f(a,b) = 0 if a==b, otherwise 1.5  # at least it should be over 1

if X=Y, then Σ f(xi,yj) will be 0, and the total cost is 0

##
##  Prefix Substructure
##

the key optimal substructure for the sequence alignment problem comes from aligning prefixes of the sequences that we want to align.

Let c(i,j) be the minimum cost of aligning  x1,x2,,,xi  with  y1,y2,,,yj
i.e. c(i,j) computes the min cost of aligning the first i chars of X with the first j chars of Y
Goal is to find c(n,m)  since X has n chars, and Y has m chars.

Case 1: match the last char of X and Y
e.g.

A G T C G
|
- T T C A

===> total cost is  c(n-1,m-1) + f(Xn,Ym)

Case 2: leave Xn unmatched
e.g.

A G T C G    # the last G = Xn
T T C A -

===> total cost is  c(n-1,m) + 1

Case 3: leave Ym unmatched
e.g.

A G T C G -
- - T T C A   # the last A = Ym

===> total cost is  c(n,m-1) + 1

Thus,

c(n,m) = min {c(n-1,m-1) + f(Xn,Ym) , c(n-1,m) + 1 , c(n,m-1) + 1 }

===> generalize it with n=i, m=j
===> just like fibonacci sequence, this is yet another optimal substructure with overlapping subproblems

Base cases:  c(0,j) = j   and  c(i,0) = i

c(0,0) - c(0,1) - c(0,2) - c(0,3)
   |   \    |   \    |   \    |
c(1,0) - c(1,1) - c(1,2) - c(1,3)
   |   \    |   \    |   \    |
c(2,0) - c(2,1) - c(2,2) - c(2,3)
   |   \    |   \    |   \    |
c(3,0) - c(3,1) - c(3,2) - c(3,3)

===> these dependencies form an acyclic graph, which we can linearize in any number of ways.
===> a simple example algo below

----------------------
for i = 0 to n:
    c(i,0) = i
for j = 0 to m:
    c(0,j) = j
for i = 1 to n:
    for j = 1 to m:
        c(i,j) = min { c(i-1,j-1) + f(Xi,Yj), c(i-1,j) + 1, c(i,j-1) + 1 }
-----------------------

##
##  sequence alignment summary
##

Theorem: there is an algorithm which, given two sequences of lengths n and m, finds a minimum cost alignment in O(nm) time.

Optimal similar substructure: minimum cost of aligning prefixes

Recurrence: c(i,j) = min {  c(i-1,j-1) + f(Xi,Yj),  c(i-1,j) + 1,  c(i,j-1) + 1}
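the recurrence and base cases above translate directly into a bottom-up table fill; a minimal Python sketch (the function name and the default mismatch cost f are mine):

```python
def align_cost(x, y, f=lambda a, b: 0 if a == b else 1.5):
    """Min-cost alignment of x and y: unmatched chars cost 1 each,
    a matched pair (a, b) costs f(a, b). Fills the c(i, j) table bottom-up."""
    n, m = len(x), len(y)
    c = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        c[i][0] = i                    # base case: align a prefix of x with ""
    for j in range(m + 1):
        c[0][j] = j                    # base case: align "" with a prefix of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c[i][j] = min(c[i-1][j-1] + f(x[i-1], y[j-1]),  # pair xi with yj
                          c[i-1][j] + 1,                    # leave xi unmatched
                          c[i][j-1] + 1)                    # leave yj unmatched
    return c[n][m]
```

the table has (n+1)*(m+1) cells and each cell is O(1) work, matching the O(nm) bound in the theorem.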

####
####  Chain Matrix Multiplication (CMM)
####

Given matrices A1, A2,,,, An of sizes M0 * M1, M1 * M2,,,, M(n-1) * Mn, compute the product A1 .... An efficiently.

(note: the number of the columns in one matrix matches the number of rows in the next matrix so this multiplication is possible.)
(ref) http://en.wikipedia.org/wiki/Matrix_multiplication#Rectangular_matrices

        n                     p                       p
    ---------           ------------            -------------
 m |         |    *   n |            |    =   m |             |
    ---------           ------------            -------------

====> m*n*p operations

also recall matrix multiplication is associative:  (AB)C = A(BC)
thus, the order in which we multiply the matrices doesn't change the result.
BUT, the number of scalar operations may dramatically change.

e.g.

A : 50 x 20,   B : 20 x 50,   C : 50 x 1

if we do (AB)C:  AB = 50*20*50 = 50000 ops, producing a 50-by-50 matrix; then (AB)C = 50*50*1 = 2500 ops
thus in total, 50000 + 2500 = 52500

if we do A(BC):  BC = 20*50*1 = 1000 ops, producing a 20-by-1 matrix; then A(BC) = 50*20*1 = 1000 ops
thus in total, 1000 + 1000 = 2000

===> the difference is huge.

###
###  subchain substructure
###

we can express the ordering of the matrix multiplication in a tree structure.

((AB)C)D              (AB)(CD)

      .                    .
     / \                 /   \
    .   D               .     .
   / \                 / \   / \
  .   C               A   B C   D
 / \
A   B

====> so we try all possible binary trees, and pick the one with the smallest operation cost.
====> (btw, notice there are overlapping subproblems, A*B appears in both cases above)

given the multiplication
A1 * A2 * A3 * .... * Ak+1 * .... *  An-1 * An

the first step is where to partition (where to split), there are n-1 possible choices.
(A1 * A2 * .. * Ak) * (Ak+1 * .. * An)
then the rest repeats.

[Formal statement]
Let C(i,j) be the minimum cost of multiplying Ai .. Aj

C(i,j) =   min   { C(i,k) + C(k+1,j) + M(i-1) * Mk * Mj }
         i<=k<j

===> the cost C(i,j) is the sum of the cost of two subtrees, and the cost of combining the resulting two matrices.

Base case: C(i,i) = 0 for all i

###
###  CMM algorithm
###

there are only O(n^2) subproblems (one C(i,j) per pair i<=j, roughly nC2 of them), so we can memoize them in an n-by-n table.

e.g. n=5

        j
     ----------
     | 0     x      # C(1,1) = 0;  we want C(1,n), the cost of the full chain A1 * .. * An, at the top-right
     |   0          # C(2,2) = 0
  i  |     0
     |       0
     |         0

C(1,n) = min { C(1,1) + C(2,n) + mult cost ,    # k=1
               C(1,2) + C(3,n) + mult cost ,    # k=2
               C(1,3) + C(4,n) + mult cost ,    # k=3
               ...
             }

===> in general, the entry in the table depends on the entries to its left and downwards.

===> there are many ways to linearize this directed acyclic graph, but one elegant way is to fill in the diagonals.

----------------------CMM algo
for i = 1 to n:
    C(i,i) = 0
for s = 1 to n-1:
    for i = 1 to n-s:
        j = i + s
        C(i,j) =   min   { C(i,k) + C(k+1,j) + M(i-1) * Mk * Mj }
                 i<=k<j
return C(1,n)
-------------------------------

###
###  CMM summary
###

Theorem:  there is an algorithm that, given the dimensions of n matrices, gives the expression tree that requires the fewest scalar multiplications in O(n^3) time.

optimal similar substructure:  minimum cost of subchains

Recurrence:  C(i,j) =   min   { C(i,k) + C(k+1,j) + M(i-1) * Mk * Mj }
                      i<=k<j
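the diagonal-by-diagonal table fill can be sketched in Python as follows (the dims convention — matrix Ai has size dims[i-1] x dims[i] — is an assumption of this sketch):

```python
def cmm(dims):
    """Min number of scalar multiplications to compute A1 * ... * An,
    where Ai is dims[i-1] x dims[i]. Fills C(i, j) diagonal by diagonal:
    s is the subchain length minus one."""
    n = len(dims) - 1
    C = [[0] * (n + 1) for _ in range(n + 1)]   # C[i][i] = 0 base case
    for s in range(1, n):
        for i in range(1, n - s + 1):
            j = i + s
            # try every split point k, plus the cost of combining the results
            C[i][j] = min(C[i][k] + C[k+1][j] + dims[i-1] * dims[k] * dims[j]
                          for k in range(i, j))
    return C[1][n]
```

on the example above (A: 50x20, B: 20x50, C: 50x1), `cmm([50, 20, 50, 1])` returns 2000, the A(BC) ordering.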

####
####  All-pairs shortest path
####

Given a graph G=(V,E) and w: E-> R, find a shortest path between u and v for all (u,v) in V*V.
i.e. minimize   Σ  w(e)
               e∈P

recall [single source shortest path]
- Dijkstra's algo with a Fibonacci heap: O( |V| * log|V| + |E| )    # requires w >= 0

- Bellman Ford: O( |V| * |E| )

===> to make SSSP into APSP, just run SSSP on every vertex.
===> complexity-wise, multiply by |V| as you run it on every vertex
e.g.
for Dijkstra-based APSP    : O( |V|^2 * log|V| + |E|*|V| )
for Bellman Ford based APSP: O( |V|^2 * |E| )

==> Dijkstra is good, but its weakness is only works for non-negative weight edges.
==> in a dense graph, every pair has an edge, |E| = |V|C2 = almost |V|^2
==> thus for Bellman Ford based APSP, it can be |V|^4

###
###  shortest path substructure
###

(key insight from SSSP)  subpaths of a shortest path are shortest paths.

u -- x -- y -- v    # if any shorter path btw x,y then just replace it.

##
##  Recurse on path length
##

Let d(u,v,k) be the minimum weight of a u-v path that uses at most k edges.
N(v) denotes the (in-)neighbors of v

d(u,v,k) = min { d(u,v,k-1),    min    ( d(u,x,k-1) + w(x,v) ) }
                              x ∈ N(v)

====> yields O(|V|^3 * log|V|) matrix mult. algo. (see CLRS)

##
##  Recurse on potential intermediate vertices
##

WLOG, assume V={1,2,,,,n}
Let d(u,v,k) be the min weight of a path from u to v, using only 1 to k as intermediate vertices.

d(u,v,k) = min { d(u,v,k-1),  d(u,k,k-1) + d(k,v,k-1) }

[base case]
d(u,v,0) = w(u,v)  if (u,v) is in E
         = ∞       otherwise

###
###  Floyd-Warshall Algorithm
###

recall our shortest path finding algo that recurses on potential intermediate vertices.
------------------------
d(u,v,k) = min { d(u,v,k-1),  d(u,k,k-1) + d(k,v,k-1) }

[base case]
d(u,v,0) = w(u,v)  if (u,v) is in E
         = ∞       otherwise
------------------------

===>  d(u,k,k-1) + d(k,v,k-1)  parts involve lots of recomputations
===>  lets fill out a table

for i = 1 to n:      # init
    for j = 1 to n:
        d[i][j] = w(i,j)  if (i,j) is in E
                = 0       if i = j
                = ∞       otherwise
        p[i][j] = i       if (i,j) in E      # p for predecessor
                = null    otherwise

for k = 1 to n:
    for i = 1 to n:
        for j = 1 to n:
            d[i][j] = min{ d[i][j] ,  d[i][k] + d[k][j] }
                              |               |
                        pred remains     p[i][j] = p[k][j]

###
###  All-pairs shortest path  summary
###

Theorem:  the Floyd-Warshall algo finds the shortest path in a weighted graph for all pairs of vertices in O(|V|^3) time.

Optimal Similar Substructure: optimal paths via a restricted set of vertices 1 to k.

Recurrence:  d(u,v,k) = min { d(u,v,k-1),  d(u,k,k-1) + d(k,v,k-1) }
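a minimal Python sketch of Floyd-Warshall as stated (the edge-dict input format is my assumption; predecessor tracking omitted):

```python
INF = float('inf')

def floyd_warshall(n, edges):
    """edges: dict mapping (u, v) -> weight, vertices numbered 0..n-1.
    Returns d with d[u][v] = shortest-path weight, following the
    d(u,v,k) recurrence: k is the largest intermediate vertex allowed."""
    d = [[0 if u == v else edges.get((u, v), INF) for v in range(n)]
         for u in range(n)]
    for k in range(n):                  # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

three nested loops over |V| vertices give the O(|V|^3) bound of the theorem.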

####
####  Transitive Closure
####

consider a relation R over a set A.  i.e. R is in A*A
compute the reflexive transitive closure R* of R

R - a preference relation     # say soccer > tennis, and tennis > golf
A - sports      = {a1,a2,,,,an}

===> we can infer soccer > golf
===> we can use the Floyd-Warshall approach

for i = 1 to n:
    for j = 1 to n:
        t[i][j] = 1  if i=j or (ai,aj) is in R
                = 0  otherwise
for k = 1 to n:
    for i = 1 to n:
        for j = 1 to n:
            t[i][j] = t[i][j] | ( t[i][k] & t[k][j] )
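the same triple loop over booleans, as a runnable Python sketch (the input format is my assumption):

```python
def transitive_closure(n, rel):
    """rel: set of ordered pairs over {0..n-1}. Returns the reflexive
    transitive closure t as a boolean matrix, Floyd-Warshall style."""
    t = [[i == j or (i, j) in rel for j in range(n)] for i in range(n)]
    for k in range(n):                  # allow element k as an intermediate
        for i in range(n):
            for j in range(n):
                t[i][j] = t[i][j] or (t[i][k] and t[k][j])
    return t
```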

##########################################
####   FFT : fast fourier transform   ####
##########################################

continuing from DP, we will identify overlapping subproblems(recursive elements) and exploit.

unlike prev FFT lectures on solving differential equations, signal processing, we only focus on "multiplying polynomials quickly" aspect of FFT.

###
###  Prerequisites
###

- complex numbers, polar representation  (r * e^(iθ))
- matrix inverse    # A * A^(-1) = I    # a square matrix A is invertible iff its determinant |A| != 0
- master theorem (for recurrences). Big-O = upper bound, Big-Omega = lower bound, Big-Theta = tight (both upper/lower bound)

(ref) http://www.mathsisfun.com/algebra/matrix-inverse.html
http://mathworld.wolfram.com/MatrixInverse.html
http://en.wikipedia.org/wiki/Determinant
http://www.mathportal.org/algebra/complex-numbers/complex-numbers-polar-representation.php
http://en.wikipedia.org/wiki/Master_theorem   (really good notes)

###
###  Convolution
###

FFT is a fast algorithm for the discrete fourier transform.
in an algorithms context, FFT is useful for convolving two sequences of numbers.

e.g.

a = {1,0,2,1}
b = {2,0,1}

       k
ck  =  Σ  ai * b(k-i)
      i=0

k=0:  c0 = a0*b0                  = 1*2              = 2
k=1:  c1 = a0*b1 + a1*b0          = 1*0 + 0*2        = 0
k=2:  c2 = a0*b2 + a1*b1 + a2*b0  = 1*1 + 0*0 + 2*2  = 5
k=3:  c3 = a1*b2 + a2*b1 + a3*b0  = 0*1 + 2*0 + 1*2  = 2
k=4:  c4 = a2*b2 + a3*b1          = 2*1 + 1*0        = 2
k=5:  c5 = a3*b2                  = 1*1              = 1

===> this ends here as no further overlap.

c = {2,0,5,2,2,1}

===> this directly applies to polynomial multiplication.

think of the sequence a as the coefficients of a polynomial  1 + 0x + 2x^2 + 1x^3
think of the sequence b as the coefficients of a polynomial  2 + 0x + 1x^2

result is c = 2 + 0x + 5x^2 + 2x^3 + 2x^4 + 1x^5
which gives the coefficients of the product a * b

FACT:  convolving a sequence a0,,,,a(n-1) with b0,,,,b(m-1) via the above naive formula takes Θ(nm) operations, where
n = |a|
m = |b|

====> FFT will improve on O(nm)
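the naive Θ(nm) convolution can be sketched directly from the formula (function name is mine):

```python
def convolve(a, b):
    """c[k] = sum over i of a[i] * b[k-i]  --  Theta(nm) operations."""
    n, m = len(a), len(b)
    c = [0] * (n + m - 1)
    for i in range(n):
        for j in range(m):
            c[i + j] += a[i] * b[j]   # a[i]*b[j] contributes to c[i+j]
    return c
```

on the example above, `convolve([1, 0, 2, 1], [2, 0, 1])` gives the sequence c = {2,0,5,2,2,1}.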

###
###  representations of polynomials
###

coefficients:
A(x) = a0 * x^0  +  a1 * x^1  +  .... + an-1 * x^(n-1)

e.g.
A(x) = -2 + 3x + x^2

Fact: an order n polynomial is uniquely characterized by its values at any n distinct points.
(the order of poly means its degree plus one)

===> leveraging this fact, we might as well represent a polynomial by its "values" at a sequence of input points

Values:
A(x0), A(x1),,,,A(xn-1)

e.g.
A(x) = -2 + x + x^2      # order is 3
A(-1) = -2
A(0) = -2
A(1) = 0

##
##  Vandermonde Matrix (always invertible)
##

think of the A(xi) values in terms of a matrix multiplication of the coefficients by the powers of each evaluation point xi

A(x0)          | 1   x0       x0^2       ....  x0^(n-1)      |     a0
A(x1)          | 1   x1       x1^2       ....  x1^(n-1)      |     a1
  .       =    | ....                                        |  *  a2
  .            | ....                                        |      .
A(x(n-1))      | 1   x(n-1)   x(n-1)^2   ....  x(n-1)^(n-1)  |     a(n-1)

====> this coefficient matrix is the Vandermonde matrix
====> its determinant is non-zero whenever the xi are distinct, and a non-zero determinant -> matrix is invertible

##
##  multiplying polynomials
##

[coefficient based]
multiplying via the convolution equation takes O(nm)

e.g.
a0,a1,a2,,,,a(n-1)          k
b0,b1,b2,,,,b(m-1)   ==>    Σ  ai * b(k-i)   ==>   c0,c1,c2,,,,c(n+m-2)
                           i=0
i=0

[value based]
takes O(n+m)

A(x0),A(x1),,,,A(x_(n+m-1))
B(x0),B(x1),,,,B(x_(n+m-1))
=========================>  C(xi) = A(xi) * B(xi)  ==>  C(x0),C(x1),,,,C(x_(n+m-1))

====> to leverage the efficiency of the value-based method, we convert coefficient-rep to value-rep

     O(n(n+m))                                       O(n+m)                          O((n+m)^2)
a0,a1,,,,a(n-1)  ==> eval ==> A(x0),,,,A(x_(n+m-1)) --\
                                                       >-- pointwise multiplication --> Lagrange interpolation --> c0,c1,,,,c(n+m-2)
b0,b1,,,,b(m-1)  ==> eval ==> B(x0),,,,B(x_(n+m-1)) --/
     O(m(n+m))

eval        = convert coefficient-rep to value-rep
interpolate = convert value-rep back to coefficient-rep

NOTE: multiplication is fast now that we use value-based mult, but the eval & interpolate steps take long...
NOTE: interpolation takes O((n+m)^3) if we use Gaussian elimination
=====> as time-consuming as convolution, but there is a way to improve this by choosing our own set of Xs.

let's create a Vandermonde matrix, evaluating at x = 1, 2, -1, -2

A: 1 + 0x + x^2 + 0x^3
B: 1 + 2x + 0x^2 + 0x^3

    X          A         A(xi)
----------    ---        -----
1  1  1  1   |  1 |     |  2 |     #  A(1)
1  2  4  8   |  0 |  =  |  5 |     #  A(2)
1 -1  1 -1   |  1 |     |  2 |     #  A(-1)
1 -2  4 -8   |  0 |     |  5 |     #  A(-2)

    X          B         B(xi)
----------    ---        -----
1  1  1  1   |  1 |     |  3 |     #  B(1)
1  2  4  8   |  2 |  =  |  5 |     #  B(2)
1 -1  1 -1   |  0 |     | -1 |     #  B(-1)
1 -2  4 -8   |  0 |     | -3 |     #  B(-2)

    X          C         C(xi) = A(xi) * B(xi)
----------    ----       -----
1  1  1  1   | c0 |     |   6 |    #  C(1)  = 2*3
1  2  4  8   | c1 |  =  |  25 |    #  C(2)  = 5*5
1 -1  1 -1   | c2 |     |  -2 |    #  C(-1) = 2*(-1)
1 -2  4 -8   | c3 |     | -15 |    #  C(-2) = 5*(-3)

====> solve this for C (invert X), then you get  C: 1 + 2x + x^2 + 2x^3

###
###  Divide and Conquer  inspiration
###

Goal: evaluate a polynomial A of order N at n distinct points
(convert coefficient-rep to value-rep)

let  A(x)  = a0 + a1x + a2x^2 + a3x^3 + ....
     A(-x) = a0 - a1x + a2x^2 - a3x^3 + ....
     Ae(x) = a0 + a2x + a4x^2 + ....   # even coefficients
     Ao(x) = a1 + a3x + a5x^2 + ....   # odd coefficients

Thus,  A(x)  = Ae(x^2) + x*Ao(x^2)
       A(-x) = Ae(x^2) - x*Ao(x^2)

choose x_i  s.t.  x_i = -x_(i+N/2)  for i in {0,,,,N/2-1}. then,

A(x_i)       = Ae(x_i^2) + x_i * Ao(x_i^2)
A(x_(i+N/2)) = Ae(x_i^2) - x_i * Ao(x_i^2)

===> evaluate 2 polynomials of order N/2 at N/2 points. Do N more additions and mults.

###
###  Roots of Unity
###

Goal:
A(x_i)       = Ae(x_i^2) + x_i * Ao(x_i^2)
A(x_(i+N/2)) = Ae(x_i^2) - x_i * Ao(x_i^2)

===> compute 2 polynomials for the price of one.

Assume: N is a power of 2.

Properties of x0,,,,x_(N-1)
------------------------
1. Distinct : xk = xi <=> k=i
2. (anti)symmetric
   xi = -x_(i+N/2)  for i in {0,,,,N/2-1}
3. Recursive
   x0^2,,,,x_(N/2-1)^2 have these same properties

Define w_N = e^(2πi/N)              # a primitive Nth root of unity
let x_j = w_N^j = e^(2πij/N)

then x_j^2 = w_N^(2j) = w_(N/2)^j   # squares of the Nth roots of unity are the (N/2)th roots of unity, giving the recursion

###
###  FFT example
###

N=4,  w_4=i       # N=4 means, a*x^0 + b*x^1 + c*x^2 + d*x^3

Ae(1)  +   Ao(1)  = A(1)
Ae(-1) + i*Ao(-1) = A(i)
Ae(1)  -   Ao(1)  = A(-1)
Ae(-1) - i*Ao(-1) = A(-i)

===> we just need Ae(1), Ae(-1), Ao(1), Ao(-1)

Aee(1)  + Aoe(1) = Ae(1)
Aee(1)  - Aoe(1) = Ae(-1)
Aeo(1)  + Aoo(1) = Ao(1)
Aeo(1)  - Aoo(1) = Ao(-1)

===> we just need Aee(1), Aoe(1), Aeo(1), Aoo(1)

###
###   FFT algorithm (generalized)
###

input : (a0,a1,,,,a_(n-1)) where n is a power of 2, and w = a primitive nth root of unity
        # w^0, w^1,,,, w^(n-1) are the n roots of unity

                n-1
output: A(x) =   Σ  aj * x^j   evaluated at the nth roots of unity
                j=0

if n=1, return (a0)
else:
    (s0,s1,,,,s_(n/2-1))    = FFT(a0,a2,a4,,,,a_(n-2), w^2)      # even coefficients
    (s0',s1',,,,s_(n/2-1)') = FFT(a1,a3,a5,,,,a_(n-1), w^2)      # odd coefficients
    for j = 0 to n/2 - 1:
        r_j       = sj + w^j * sj'
        r_(j+n/2) = sj - w^j * sj'
    return (r0,r1,r2,,,,r_(n-1))

[Analysis]   T(n) = 2T(n/2) + O(n)  =>  T(n) = O(n*log n)    # by the master theorem

=====> yes, for the evaluation part we finally beat the O(n^2) of the naive method from the prev section.
=====> but the interpolation part still remains O(n^2)
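the recursive algorithm above, as a runnable Python sketch using complex roots of unity (names are mine):

```python
import cmath

def fft(a, w=None):
    """Evaluate the polynomial with coefficient list a (len a power of 2)
    at the nth roots of unity w^0, w^1, ..., w^(n-1)."""
    n = len(a)
    if w is None:
        w = cmath.exp(2j * cmath.pi / n)   # primitive nth root of unity
    if n == 1:
        return a[:]
    s = fft(a[0::2], w * w)                # even coefficients at (n/2)th roots
    t = fft(a[1::2], w * w)                # odd coefficients at (n/2)th roots
    r = [0] * n
    for j in range(n // 2):
        r[j]          = s[j] + w**j * t[j]   # A(w^j)       = Ae + w^j * Ao
        r[j + n//2]   = s[j] - w**j * t[j]   # A(w^(j+n/2)) = Ae - w^j * Ao
    return r
```

e.g. for A(x) = 1 + x^2, `fft([1, 0, 1, 0])` evaluates A at 1, i, -1, -i, giving (up to rounding) 2, 0, 2, 0.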

###
###  Butterfly Network
###

###
###  Vandermonde at Roots of Unity
###

Mn(w) denotes the Vandermonde matrix at the nth roots of unity.

         | 1    1          1           1     ,,,,  1            |
         | 1    w          w^2         w^3   ,,,,  w^(n-1)      |
Mn(w) =  | 1    w^2        w^4         w^6   ,,,,  w^2(n-1)     |
         | 1    w^3        w^6         w^9   ,,,,  w^3(n-1)     |
         | .    .          .           .     ,,,,  .            |
         | 1    w^(n-1)    w^2(n-1)    .     ,,,,  w^(n-1)(n-1) |

#  element kj  =  w^((k-1)(j-1))

CLAIM : let w be a primitive nth root of unity ( w^0, w^1,,,, w^(n-1) are distinct ). then Mn(w) * Mn(w^-1) = nI
proof: consider element kj.

n-1                             n-1                                        1 - w^(n(k-j))
 Σ  w^((k-1)i) * w^(-(j-1)i) =   Σ  w^((k-j)i)  =  n if k=j,  otherwise  -----------------  = 0    # geometric series; w^n = 1
i=0                             i=0                                        1 - w^(k-j)

====> just use FFT with w^(-1)  !!!!!!

####
####  Inverse FFT
####

input : sequence  A(w^0), A(w^1),,,, A(w^(n-1))   where n is a power of 2, and w is a primitive nth root of unity.

                                                         n-1
output: sequence  a0,a1,,,, a_(n-1)       s.t.  A(x) =    Σ  ai * x^i
                                                         i=0
===> given values, we interpolate coefficients.

===>  simply write as :   return  FFT( input, w^(-1) )  / n
===>  amazingly simple

NOTE:
the value vector is the Vandermonde matrix times the coefficient vector:

  | A(w^0)     |             | a0     |
  | A(w^1)     |  =  Mn(w) * | a1     |
  |   .        |             |  .     |
  | A(w^(n-1)) |             | a(n-1) |

so applying Mn(w^-1) to the value vector gives

  Mn(w^-1) * Mn(w) * a  =  nI * a  =  n * a

===> see how dividing the result by n gives you the original coefficients exactly.

####
####   recap of FFT for polynomial multiplication
####

evaluation    : O(n(m+n))  = O(n^2)   ->   O(n log(n)) via FFT
interpolation : O((n+m)^2) = O(n^2)   ->   O(n log(n)) via inverse FFT

===> order N polynomials can be multiplied in O(n log(n)) time, i.e. O((n+m) * log(n+m)) !

(as you could see, FFT for multiplying poly has DP aspects)
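putting eval, pointwise multiplication, and the inverse FFT together — a sketch assuming real integer coefficients, with floating-point noise rounded away (all names are mine):

```python
import cmath

def _fft(a, w):
    """Evaluate coefficient list a (len a power of 2) at w^0..w^(n-1)."""
    n = len(a)
    if n == 1:
        return a[:]
    s = _fft(a[0::2], w * w)               # even coefficients
    t = _fft(a[1::2], w * w)               # odd coefficients
    half = n // 2
    return ([s[j] + w**j * t[j] for j in range(half)] +
            [s[j] - w**j * t[j] for j in range(half)])

def poly_multiply(a, b):
    """Multiply polynomials given as coefficient lists in O(N log N):
    evaluate both at the nth roots of unity, multiply pointwise, then
    interpolate with the inverse FFT (FFT at w^-1, divided by n)."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2                             # pad up to a power of 2
    w = cmath.exp(2j * cmath.pi / n)
    fa = _fft(a + [0] * (n - len(a)), w)
    fb = _fft(b + [0] * (n - len(b)), w)
    fc = [x * y for x, y in zip(fa, fb)]   # pointwise C(xi) = A(xi)*B(xi)
    c = [v / n for v in _fft(fc, 1 / w)]   # inverse FFT
    return [round(v.real, 6) for v in c[:len(a) + len(b) - 1]]
```

on the convolution example, `poly_multiply([1, 0, 2, 1], [2, 0, 1])` recovers c = {2,0,5,2,2,1}.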

###############################
#####    Maximum Flow     #####
###############################

#  prerequisites

- graph (directed, undirected)
- degrees (how many edges each vertex touches)
- BFS, DFS (basic traversal methods)

#  lesson overview

1. flow NWs, capacities, residual NWs, etc
2. Ford-Fulkerson method for finding a maximum flow. show correctness via the "max-flow min-cut" theorem
3. scaling algo and Edmonds-Karp for improved performance

an edge may have capacity A, and can have flow [0,A]
e.g.
an edge (u,v) with capacity 10, i.e. you can send up to 10 fwd.
if you send f(u,v) = 10, then we call the fwd edge "full" but it means you can send up to 10 backward.
i.e. if you don't send anything on fwd edge, then backward edge is empty.
i.e. you can only send stuff on non-full fwd edge aka non-empty backward edge.

###
###  Flow Networks
###

[flow network]

- a directed graph G(V,E); (u,v) in E => (v,u) NOT in E
- source "s"
- sink "t"      # aka target, destination
- for all v in V,  (v,s) & (t,v) NOT in E   # meaning, no vertex goes TO s, FROM t
- C: V * V -> N∪{0}                         # C for capacity, N = natural numbers 1,2,3,,,, so the capacity of any edge is a non-negative integer
- C(u,v) = 0 if (u,v) NOT in E              # capacity is zero if no edge

[flow]  # i.e. how much stuff flows within the capacity

f : V * V  -> [0,∞)   where  1. f(u,v) <= c(u,v)
                             2. f(u,v) * f(v,u) = 0    # no flow in both directions; one of them must be 0
                             3. f_in(u) = f_out(u) for every internal u

f_in(u)  =   Σ  f(v,u)    # sum of all flows into u
            v∈V

f_out(u) =   Σ  f(u,v)    # sum of all flows out of u
            v∈V

[flow value]

v(f) = f_out(s) = f_in(t)        # i.e. every node must be in equilibrium

##
##

by simple tricks, we can allow the below 3 cases

(1) rational capacities
- we can allow rational capacities ( 2/3, 1/2, 17/33, etc ): simply multiply every capacity by a common denominator k to get an all-integer capacity graph.

(2) anti-parallel edges
for any edge (u,v), (v,u) cannot exist, but we can essentially create an anti-parallel edge by adding a new vertex w s.t. (v,w),(w,u)

(3) multiple sources and sinks
e.g.
s1,s2  and  t1,t2,t3
===> just create s0,t0.  have s0 connect to all sn, and all tn to t0, with infinite capacity.

###
###   Residual Network
###

[Residual Capacity]    # defines what flows we can add

given flow f over G(V,E)

           |  C(u,v) - f(u,v)   if (u,v) in E      # hence "residual capacity"
Cf(u,v) =  |  f(v,u)            if (v,u) in E
           |  0                 otherwise

[Residual Graph]
Vf = V
Ef = {(u,v) is in V*V : Cf(u,v) > 0}

###
###   Augmentations
###

A path with available capacity (i.e. non-full fwd edges aka non-empty backward edges) is called an augmenting path.

Let f  be a flow in G.
Let f' be a flow in Gf.   # Gf = residual NW of G.

(f+f')(u,v)  =  f(u,v) + f'(u,v) - f'(v,u) if (u,v) in E
=  0 otherwise

CLAIM: (f+f') is a flow in G with value v(f+f') = v(f) + v(f')

####
####   Ford-Fulkerson method
####

to be strict, it is called a "method" rather than an "algorithm" because it doesn't specify how to find augmenting paths in the residual graph Gf,
but people call it an algorithm anyway.
it is a maxflow algorithm. (the Edmonds-Karp algo is one implementation of the FF method)

[FF method steps]
1. initialize f=0
2. while ∃ an augmenting path p from s to t in Gf, s.t. Cf(e) > 0 ∀ e∈p     # find an augmenting path
   - calculate b =  min  Cf(e)                                              # find the bottleneck capacity; e is just an edge (u,v)
                    e∈p
   - set f = f+f', where f'(u,v) = b  if (u,v) ∈ p                          # augment each edge f'(u,v) and the total flow f
                                 = 0  otherwise
3. return f

===> the idea: starting with zero flow f=0, keep finding an s-t path with non-zero residual capacity on all of its edges (which makes it an augmenting path by definition), send the bottleneck amount of flow along it, and update each edge and the total flow. when no augmenting path remains, we have (hopefully) reached the maxflow for the given G.

(ref) https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm

#  complexity

[per iteration]
- finding p  : O(|E|) using DFS or BFS         # using BFS (shortest augmenting path) gives the Edmonds-Karp algo
- finding Gf : calculating the residual NW is checking each edge, so O(|E|)
- finding b  : at most O(|E|) if p contains all edges
- setting f  : at most O(|E|) as the path has at most |E| edges

===> thus per-iteration complexity is  4 * O(|E|) == O(|E|)    # i.e. proportional to the number of edges

[how many iterations?]
at most v(f*), i.e. the value of the maxflow, because f=0 at the beginning and each iteration increases the total flow by at least 1 (integer capacities).

===> thus the overall FF method complexity is O(|E| * v(f*))          # as we will see next, the returned f == f*

#  remaining questions

1. v(f) = v(f*) ?      # where f is the flow returned from the algo, and f* is the max flow
2. better analysis or algo for the number of iterations?

####
####  flows and cuts
####

max flow = somewhere between 0 and   Σ  c(s,v)
                                    v∈V

            max flow point (Ford-Fulkerson will determine)
                          |
            ----(flow)---><----(cut)--------
  0 ________________________________________  Σ c(s,v)
  |                                        | v∈V
theoretical min                   theoretical max

###
###   Cut basics
###

Def: an s-t cut of a flow network is a partition of the vertices (A,B) s.t.  s in A, t in B, B=V-A

Lemma: let f be an s-t flow, and (A,B) an s-t cut. Then v(f) = f_out(A) - f_in(A)
                                                             = f_in(B) - f_out(B)

proof: v(f) = f_out(s) =  Σ  ( f_out(v) - f_in(v) )     # every term except s's vanishes by conservation
                         v∈A

terms cancel if (u,v) in A*A, so  v(f) =      Σ     f(u,v)  -      Σ     f(u,v)
                                          (u,v)∈A*B            (u,v)∈B*A
                                       = f_out(A) - f_in(A)

###
###   Cut capacity
###

Def: the capacity of an s-t cut (A,B) is  c(A,B) =      Σ     c(u,v)
                                                    (u,v)∈A*B

Lemma: let f be a flow and (A,B) be an s-t cut in a flow network.
       then v(f) <= c(A,B)

proof: v(f) =  f_out(A) - f_in(A)
            <= f_out(A)
            =       Σ     f(u,v)
               (u,v)∈A*B
            <=      Σ     c(u,v)   =  c(A,B)
               (u,v)∈A*B

===> at this point, intuitively, the minimum cut capacity upper-bounds (and, as shown next, equals) the max flow

####
####  max-flow min-cut theorem
####

Theorem:  the following are equivalent.
1. f is a maximum flow in G
2. Gf has no augmenting path
3. there exists an s-t cut (A,B) s.t. v(f) = c(A,B)

Corollary: the Ford Fulkerson algo returns a maximum flow

Proof:
1. f is a maximum flow in G  => Gf has no augmenting paths
suppose not, and let f' be an augmenting flow.
then f+f' is a flow and v(f+f') = v(f) + v(f') > v(f)
# but f is a maximum flow -- contradiction

2. Gf has no augmenting paths => there exists an s-t cut (A,B) where v(f) = c(A,B)
let A be the set of vertices reachable in Gf from s.
B=V-A
(u,v) in A*B  => f(u,v) = c(u,v)
# because the outbound A*B edge must be saturated otherwise non-zero residual flow f'(u,v) can exist. but we defined A to be the set of vertices reachable from s in residual NW.
(u,v) in B*A  => f(u,v) = 0
# because no augmenting flow exists. o/w u will be in Gf.

recall v(f) = f_out(A) - f_in(A)      # but we argued that here f_in(A)=0
            = f_out(A)
            =      Σ     c(u,v)  =  c(A,B)
              (u,v)∈A*B

3. there exists an s-t cut (A,B) where v(f)=c(A,B)   =>  f is a max flow in G
if not, then cut capacity wouldn't be an upper bound on flow value. (but by the previous lemma, it is)

###
###  choosing augmenting paths
###

ideas:  1. prefer heavier paths
2. prefer shorter paths

###
###   scaling algorithm  (prefer heavier paths)
###

we set a threshold delta D.

Def:  Gf(D) = (V, {(u,v): Cf(u,v) >= D})

Gf(1) = Gf    # when D=1, the same as our traditional definition of the residual network Gf
              # D=1 means the network of non-zero residual capacities = Gf

1. initialize f=0
2. initialize D =  max  c(s,v)
                  v∈V
3. while D >= 1                               #  O(log X) phases, where X = initial val of D
       while there exists a path p in Gf(D)   #  finding p    = O(|E|)
           use p to augment f                 #  augmenting f = O(|E|)
       set D = D/2
4. return f

WTS: the number of iterations in a single scaling phase is at most 2|E|

lemma: if Gf(D) has no s-t paths, then there exists an s-t cut (A,B) s.t. c(A,B) <= v(f) + |E|(D-1)

proof: let A be the set of vertices reachable from s in Gf(D).
       let B = V-A

       (u,v) in A*B  =>  c(u,v) - f(u,v) <= D-1
       # edges from A to B must have residual capacity smaller than D, otherwise v would be reachable from s in Gf(D) and would have been in A, not B.
       (u,v) in B*A  =>  f(u,v) <= D-1
       # if f(u,v) >= D, the backward residual edge (v,u) would have Cf(v,u) >= D, making u reachable from s in Gf(D), so u would be in A.

       v(f) =  f_out(A) - f_in(A)
            >=      Σ     ( c(u,v) - (D-1) )   -      Σ     (D-1)
               (u,v)∈A*B                          (u,v)∈B*A
            >=  c(A,B) - |E|(D-1)

proof (iteration bound): the base case D=X is trivial, since each augmenting path there saturates one of the edges out of the source.
let f be the flow after the scaling phase with threshold D.
let g be the flow after the prev phase (threshold 2D), when Gf(2D) had no s-t paths.
v(f*) = max flow
by the lemma,  v(f) <= v(f*) <= c(A,B) <= v(g) + |E|(2D-1) <= v(g) + 2|E|D
let k be the number of iterations in phase D; each augmentation adds at least D flow, so
kD  <=  v(f) - v(g)  <=  2|E|D
k   <=  2|E|

[revisiting the scaling algo]

1. initialize f=0
2. initialize D =  max  c(s,v)
                  v∈V
3. while D >= 1                               #  O(log X) phases, where X = initial val of D
       while there exists a path p in Gf(D)   #  finding p = O(|E|); at most 2|E| iterations per phase
           use p to augment f                 #  augmenting f = O(|E|)
       set D = D/2
4. return f

Theorem:  the scaling algo returns a max flow in O( |E|^2 * logX ) time,  where X =  max  c(s,v)
                                                                                    v∈V

####
####   Edmonds-Karp  (prefer shorter paths)
####

1. initialize f=0
2. while there exists an s-t path in Gf
   - find a shortest s-t path p        # almost identical to Ford-Fulkerson, but we pick the "shortest" p in each iteration
   - let b =    min     Cf(u,v)
             (u,v) ∈ p
   - augment f by b over p
3. return f

====>  while loop = O(|E||V|), each iteration (inside the loop) = O(|E|)
====>  thus in overall, O(|E|^2 * |V|)
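a minimal Python sketch of Edmonds-Karp — BFS for a shortest augmenting path, then augment by the bottleneck (the adjacency-dict input format is my assumption):

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """cap: dict u -> {v: capacity}. Repeatedly augments along a
    shortest (fewest-edges) s-t path in the residual graph, found by BFS."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}   # residual capacities
    for u in cap:                                      # add zero-cap reverse edges
        for v in cap[u]:
            res.setdefault(v, {}).setdefault(u, 0)
    res.setdefault(s, {})
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:                   # BFS: shortest path in Gf
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow                                # no augmenting path left
        b, v = float('inf'), t                         # bottleneck along the path
        while parent[v] is not None:
            b = min(b, res[parent[v]][v])
            v = parent[v]
        v = t                                          # augment: push b, open reverse edges
        while parent[v] is not None:
            res[parent[v]][v] -= b
            res[v][parent[v]] += b
            v = parent[v]
        flow += b
```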

WTS: O(|E||V|) augmentations are used.

Def (level graph):
let L(v) be the shortest path distance from s to v in G.
the level graph of G is the subgraph with edges (u,v)  s.t. L(v) = L(u)+1    # i.e. only edges on shortest paths

[observations]
1. augmenting along a shortest path only creates longer ones.
2. shortest path distance must increase every |E| iterations. (as each iteration deletes an edge)
3. only |V| possible shortest path lengths. (obvious)

====> strongly polynomial.
====> capacities don't need to be integers.

###
###  Dinic Algo  (further refinement of Edmonds-Karp)
###

- reuse the computation

1. initialize f=0
2. repeat
   - build the level graph L from Gf, and let k be the shortest path length.
   - while there exists a positive s-t path p in L of length k
       - augment f by p
   until there are no more s-t paths in Gf
3. return f

[analysis]
===> one iteration of the outer loop (repeat) is called a phase. there are O(|V|) phases.
===> each phase increases k++
===> building level graph done via BFS = O(|E|)
===> while loop = O(|E||V|)

====> thus overall, O(|E| * |V|^2)

Theorem: Dinic's algo returns a max flow in O(|E| * |V|^2) time.
===> better than O(|E|^2 * |V|) of Edmonds Karp.

WTS: each phase takes O(|V||E|) time.

we will use the level graph, which we build by BFS; keep the fwd (level-increasing) edges, and remove the backward ones.

within a phase the newly created (reverse) edges are useless, so there is no need to rebuild the level graph of Gf after every augmentation. instead:
- build a path by always taking the "first edge in the adjacency list."
- augment if t is reached. (every augmentation deletes its bottleneck edge)   # at most O(|E|) augmentations
- if stuck, delete the blocking vertex from the level graph of Gf             # at most O(|V|) deletions

===> this cycle repeats O(|E|) times, each path-build costing O(|V|)
===> overall O(|E||V|) per phase

=======>  we will further discuss max flow in Duality lecture

(good ref)  http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=maxFlow2

#######################################
#####     Bipartite Matching      #####
#######################################

## overview

1. definitions: bipartite graphs, matching, etc
2. reduction to max-flow gives an O(|E||V|) algo.
3. vertex cover is the dual problem to BP matching.
4. Hopcroft-Karp algo is O(|E||V|^(1/2))

###
###  Bipartite Graphs
###

Def: an undirected graph G=(V,E) is bipartite if there exists L in V and R=V-L s.t. every edge has one vertex in L and one in R.

[observations]
1. bipartite <=> 2-colorable
2. bipartite <=> no odd-length cycles
3. L,R sometimes included in definition.
e.g. here is a bipartite graph G(L,R,E). this unequivocally defines L,R when there can be more than one partitioning.

###
###   Matching
###

Def: given G=(V,E),  M in E is a matching if no two edges in M share a vertex. (does not have to be bipartite)

note: we call edges in M "matched edges" and vertices in M as "matched vertices"

Def: a max-matching is a matching of maximum cardinality. (utmost matched edges, i.e. biggest possible |M|)

note: maximal != maximum

####
####   Reduction to max-flow
####

1. build a flow NW where V' = V ∪ {s,t},  E' = E ∪ ({s} × L) ∪ (R × {t}),  C=1   # orient the edges of E from L to R; cap of every edge is 1; building is O(|E|)
2. run Ford-Fulkerson on the NW.                              # O(|E||V|) since the max flow is at most |V|
3. return the edges of E with positive flow as the matching.  # O(|E|)

===> thus overall O(|E||V|)

###
###  reduction correctness
###

Theorem:
consider a bipartite graph G and the associated flow network F. The edges in G which have flows of 1 in a maximum flow obtained by Ford-Fulkerson on F constitute a maximum matching in G.

Proof:
flow is 0 or 1. Conservation of flow implies each vertex is in at most 1 matched edge. Hence {e: f(e)=1} is a matching.
if there were a larger matching, it would be trivial to construct a larger flow.

####
####   Augmenting Paths
####

given f and G, augmenting paths are the s-t paths in Gf.

[observation]

1. intermediate flows found by Ford-Fulkerson correspond to matchings.
2. since every edge has cap=1, a path uses either an original fwd edge or its reverse edge, never both.
3. in finding augmenting paths and updating the residual NW, matched edges get reversed, and only unmatched vertices keep their from-s / to-t edges. so, any augmenting path must;
   (1) start at s and enter an unmatched vertex in L
   (2) then cross an unmatched edge
   (3) continue, alternating matched and unmatched edges, until reaching an unmatched vertex in R, then go to t

Def:
given a matching M, an "alternating" path is a path whose edges are alternately in M and not in M.
an augmenting path is an alternating path where the first and last vertices are unmatched.

====> we flip the unmatched/matched edges along an augmenting path
====> NOTICE this increases the matched edges by one, because the path has the shape s -> unmatched -> {matched -> unmatched}* -> t

====> we can re-state Ford-Fulkerson in BP matching terms.

1. initialize M=∅
2. while there exists an augmenting path p,
       M = M (+) p         #  A(+)B = (A|B) - (A&B), symmetric difference: keep an edge iff it is in exactly one of them
3. return M
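the matching-flavored Ford-Fulkerson above is essentially Kuhn's augmenting-path algorithm; a minimal Python sketch (names and input format are mine):

```python
def max_bipartite_matching(adj, n_left, n_right):
    """adj[u] = list of R-neighbors of left vertex u.
    For each left vertex, search for an augmenting (alternating)
    path and flip it, growing the matching by one each success."""
    match_r = [-1] * n_right            # match_r[v] = left partner of v, or -1

    def try_augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                # v is unmatched, or its current partner can be re-matched elsewhere
                if match_r[v] == -1 or try_augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False

    return sum(try_augment(u, set()) for u in range(n_left))
```

each search is O(|E|) and there are |L| of them, matching the O(|E||V|) bound of the reduction.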

###
###   Vertex Cover
###

Def: given G=(V,E), S in V is a vertex cover if every edge is incident on a vertex in S.
# does not have to be minimum cardinality |S|. redundancy is fine. if we want, we say minimum vertex cover.

Prop: given G(V,E), the size of a matching in G is at most the size of any vertex cover

Proof: for every edge in the matching, at least one of its vertices must be in the cover. all these must be distinct by def of matching, i.e. no two edges share a vertex.

===> so the min vertex cover size |S| is an upper bound on the matching size. (the bound holds in any graph; equality holds in bipartite graphs, NOT in general graphs.)

####
####   max-matching / min-vertex-cover theorem
####

Theorem:
consider a bipartite graph G, and a matching M.
the following are equivalent.

1. M is a maximum matching.
2. M admits no augmenting paths
3. there is a vertex cover of size |M|

Proof:  M is a max-matching  =>  M admits no augmenting paths.
suppose not. there exists an augmenting path p, then M' = M(+)p is a larger matching. contradiction.

Proof:  M admits no augmenting paths  =>  there exists a vertex cover of size |M|
let H be the set of vertices reachable via alternating paths from unmatched vertices in L.
recall fwd edge (L->R) is unmatched, and reverse edge (L<-R) is matched. and no augmenting paths means the alternating paths must terminate at some matched vertex in L.

L  R
.->.
. /.
./ .
.  .

let S = (L-H) | (R & H)     # see video for visual, super simple
S contains exactly one vertex of each edge in M, so |S| = |M|
no edges from L & H to R-H (by definition), also no edges are internal to L,R themselves, so S is a vertex cover.

Proof:  there exists a vertex cover of size |M|  =>  M is a max matching.
if not, then vertex cover would not be an upper bound. but we proved it is upper bound.

####
####   Frobenius-Hall  theorem
####

for a subset of vertices X in V,  N(X) = union of the neighbors of v in X.

(intuitively) if you select a set of vertices X from L, then if |X| < |N(X)|, then there may be a better matching, but if |X| > |N(X)| no way better matching exists.

[Corollary]
given a bipartite graph G=(L,R,E), there exists a matching of size |L| if and only if for every X in L, |N(X)| >= |X|

Proof:  fwd direction
let M be a matching of size |L|
let X in L
let Y = {v: (u,v) in M, u in X}   # v in R
then |Y| = |X|, and Y is in N(X)
so |Y| <= |N(X)|

Proof:  rev direction
suppose not.
let M be a max matching with |M| < |L|
let H be the set of vertices reachable via an alternating path from an unmatched vertex in L.
then N(L&H) = R&H, but |L & H| > |R & H| because there is at least 1 unmatched vertex.

####
####  Perfect Matching
####

consider a bipartite graph G=(L,R,E) where |L| = |R|, then a matching M is a perfect matching if |L| = |M|
i.e.  all vertices are matched.

if |L| = |R|, then perfect matching is implied by;
- for every Y in R, |N(Y)| >= |Y|     # frobenius-hall theorem
- there is M that admits an alternating hamiltonian cycle.

####
####   toward a better BP-matching algo
####

recall two approaches we examined in max-flow;
1. prefer heavier augmenting paths (which aug paths let the heavier flow in Gf?)
2. prefer shorter augmenting paths (smallest # of edges, recall Dinic, BFS re-done only when the shortest path length changes, not in every augmentation)

===> 1. does not work in BP matching, because any augmenting path increase the matching size by one.
===> 2. we will explore deeper: hopcroft-karp algo

####
####   Hopcroft-Karp algo
####

1. initialize M=0
2. while there is an augmenting path
- build alternating level graph rooted at unmatched vertices in L, using BFS.
- augment M with a maximal set of vertex disjoint shortest-length paths.
3. return M

# vertex disjoint paths are paths that share no vertex with each other.
# edge disjoint paths/cycles, the same but for edges.

===> each iteration (aka phase) in the loop is O(|E|)
===> there are O(|V|^(1/2)) phases   # proof later

Theorem:
Given a bipartite graph G=(V,E),  Hopcroft-Karp finds a maximum matching in O(|E||V|^(1/2))
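a compact Python sketch of the algorithm (all names are my own; the BFS builds the alternating level structure from the unmatched left vertices and the DFS augments only along level-respecting paths; it omits the explicit nil-vertex cutoff some presentations use, which affects the bookkeeping in the complexity proof but not the maximality of the returned matching):

```python
from collections import deque

def hopcroft_karp(adj, n_right):
    """Maximum bipartite matching; adj maps left vertices to right neighbors."""
    INF = float("inf")
    n_left = len(adj)
    match_l, match_r = [-1] * n_left, [-1] * n_right
    dist = [0] * n_left

    def bfs():
        q = deque()
        for u in range(n_left):           # level 0 = unmatched left vertices
            dist[u] = 0 if match_l[u] == -1 else INF
            if match_l[u] == -1:
                q.append(u)
        found = False
        while q:
            u = q.popleft()
            for v in adj[u]:
                w = match_r[v]
                if w == -1:
                    found = True          # some augmenting path exists
                elif dist[w] == INF:      # follow the matched edge back to L
                    dist[w] = dist[u] + 1
                    q.append(w)
        return found

    def dfs(u):
        for v in adj[u]:
            w = match_r[v]
            if w == -1 or (dist[w] == dist[u] + 1 and dfs(w)):
                match_l[u], match_r[v] = v, u   # flip edges along the path
                return True
        dist[u] = INF                     # dead end: prune u for this phase
        return False

    matching = 0
    while bfs():                          # one phase: one BFS, then many DFS
        for u in range(n_left):
            if match_l[u] == -1 and dfs(u):
                matching += 1
    return matching

print(hopcroft_karp([[0, 1], [0], [1]], 2))  # 2
```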

##
##   max-matching (final result)  VS  intermediate matching (algo produces along the way)
##

Claim: consider a graph G=(V,E), and matchings M and M'

the following are all true about G=( V, M(+)M' )
- every vertex has degree at most 2              # because one edge from each matching
- every path is alternating with respect to M and M'   # see visual. obvious from the above.
- G consists of a set of vertex-disjoint paths and cycles.  # degree at most 2 means this.

####
####   Shortest Augmenting Paths
####

Lemma:
let M be a matching.
let P be a shortest augmenting path for M.
let Q be an augmenting path for M(+)P.
then |Q| >= |P| + 2|P&Q|      # in terms of the edge count

Proof:
we have |M(+)P(+)Q| - |M| = 2, so (M(+)P(+)Q) (+) M = P(+)Q must contain 2 "vertex disjoint" augmenting paths for M, each of length at least |P| (since P is a shortest augmenting path)
therefore, |P(+)Q| >= 2|P|
|Q| + |P| = |P(+)Q| + 2|P&Q| >= 2|P| + 2|P&Q|     # each edge of P and Q is either in P(+)Q once or in P&Q twice

Corollary:
let M,P and Q as above.
if |Q| = |P|, then P and Q are vertex disjoint.

Proof:
suppose not. then let v be a shared vertex. then v is matched in M(+)P, so Q must share the matched edge with P. but Q is obtained by flipping whatever is matched in M(+)P. and if they are the same length, that is not possible.

####
####   Analysis of a phase
####

lemma:
each phase increases the length of the shortest path by at least two.

Proof:
let Q be a shortest augmenting path after a phase with path length k.
|Q| < k is impossible by the previous lemma. |Q| = k implies that Q is vertex disjoint from all paths found in the phase. but then it would have been part of the phase (the phase augments along a maximal set of such paths).
thus |Q| > k and also odd, so |Q| >= k+2
(k is odd: as we discussed, an augmenting path is unmatched edge + {matched edge + unmatched edge}* = odd number of edges)

####
####   number of phases
####

lemma:
the number of phases used by Hopcroft-Karp is O(|V|^(1/2))

Proof:
let M' be the matching found by HK and let M be the matching after ceiling(|V|^(1/2)) phases.
the length of the shortest augmenting path for M is at least 2*ceiling(|V|^(1/2)) + 1
hence no path in M'(+)M that is augmenting for M can have shorter length.
this implies that there are at most |V| / (2*ceiling(|V|^(1/2))+1) augmenting paths for M in M(+)M'
then |M'| - |M| <= |V| / (2*ceiling(|V|^(1/2))+1) <= |V|^(1/2)
so at most |V|^(1/2) more phases are needed (each phase increases the matching by at least one), giving O(|V|^(1/2)) phases total.

####
####    exercise
####

a graph is said to be k-regular if the degree of each vertex is exactly k.
let G = (L,R,E) be k-regular and bipartite.

a. prove |L| = |R|

By the definition of bipartite, we can think of G as 2-colorable, as in. L = red, R = green.
Let N be the number of red vertices in G. Now let's count the green vertices.
Every red vertex has exactly k green neighbors, so counting neighbors over all red vertices gives N * k = Nk. But each green vertex was counted more than once: every green vertex has k red neighbors, so it was counted k times. Dividing Nk by k gives the number of green vertices, Nk / k = N. Thus |L| = |R|.

#######################################
#####     Linear Programming      #####
#######################################

LP : "optimizing a linear objective function under linear equality and inequality constraints. Many problems, such as maximum flow, can be seen as special cases of LP"
(ref) http://www.cis.temple.edu/~wangp/9615-AA/Lecture/11-LinearPro.htm

simple, and yet general and practical.

- matrix expressions and inequality
- polynomial time algo

##
## overview
##
- LP in 2-dimension (high school version)
- generalize to N-dimension, fundamental theorem of LP
- Simplex Algorithm

####
####  preliminaries
####

Linear Algebra
- systems of equations as matrices
- linear independence of vectors   # i.e. a vector not in the span of other vectors in a set.
# span is any space covered by linear combination of vectors
# (ref) https://youtu.be/CrV1xCWdY-g
- matrix rank                      # number of linear independent column (column rank = row rank)
- matrix inverse                   # AA^-1 = I   # identity matrix where diagonal is all 1s, and the rest is zeros.

(gilbert lang lecture on linear algebra, linear independence)

###
###  2-dimensional LP
###

prose) a student has to balance work and rest. sleeping, eating and commuting leave him 14hrs per day. for every 2hrs he works, he has to rest 30min before he can work effectively for another hour. how does he maximize his work hours?

let x1 be the number of hours worked.
let x2 be the number of hours relaxed/rested.

goal: maximize x1

conditions:
x1 >= 0
x2 >= 0
x1 + x2 <= 14
x1 - 2*x2 <= 2    # after 2hrs, then 1hr-30min combination

solving the equations gives x1 = 10, x2 = 4

watch video for visualizing this on x-y plane.
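in lieu of the video, the picture can be reproduced in code: a bounded 2D LP attains its optimum at a corner of the feasible region, so enumerating intersections of constraint boundary lines is enough. a brute-force sketch (the constraint encoding and names are my own):

```python
from itertools import combinations

# constraints in the form a*x1 + b*x2 <= c (non-negativity included)
cons = [(-1.0, 0.0, 0.0),   # x1 >= 0
        (0.0, -1.0, 0.0),   # x2 >= 0
        (1.0, 1.0, 14.0),   # x1 + x2 <= 14
        (1.0, -2.0, 2.0)]   # x1 - 2*x2 <= 2

def feasible(p, eps=1e-9):
    x1, x2 = p
    return all(a * x1 + b * x2 <= c + eps for a, b, c in cons)

# intersect every pair of constraint boundaries and keep the feasible corners
corners = []
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue                       # parallel boundary lines
    p = ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)  # Cramer's rule
    if feasible(p):
        corners.append(p)

# a bounded LP attains its optimum at a corner; objective here is "maximize x1"
best = max(corners, key=lambda p: p[0])
print(best)  # (10.0, 4.0)
```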

==> basically, there can be 3 cases;
1. region (aka polytope) is bounded, so there is an optimal answer.
2. region is unbounded, (there can be one, or) no optimal answer.
3. region does not exist, e.g. x > 5  and x < 0.  no answer that satisfies the constraints.

(simple quiz)  https://youtu.be/ZuUqu_Jw8oI

####
####  generalizing to N dimensions
####

maximize  c1x1 + c2x2 + .... + cnxn
s.t. a11x1 + a12x2 + .... + a1nxn <= b1
a21x1 + a22x2 + .... + a2nxn <= b2
.
.
am1x1 + am2x2 + .... + amnxn <= bm

x1,x2, .... ,xn >= 0

====>  expressing this in Matrix form,

maximize c^T * x     # ^T is transpose, i.e. all columns become rows. column1 -> r1, columne2 -> r2,,,
s.t. Ax <= b
x >= 0

##
##  key transformations:
##

1.  max <-> min     # just negate the vector c
max c^T * x  --> min -c^T * x     # vice versa

2.  >=  <->  <=     # negate both sides
777 > 555   <->   -777 < -555

3.  =   ->  >=
a = b   ->  a >= b,  -a >= -b

4.  >= or <=  ->  =
a >= b   ->  a - y = b, y >= 0
a <= b   ->  a + y = b, y >= 0

5.  free variables  ->  non-negative ones   # see quiz
(a) use an equality to eliminate the variable. e.g. if y is a free var, and x - y = 1, then substitute y = x-1 everywhere
(b) substitute y = x'-x'' for y where x',x'' >= 0

(good quiz) https://youtu.be/TNOiuOacHWo

####
####  favored LP forms
####

(1) symmetric form
max  c^T * x
s.t. Ax <= b,  x >= 0

(2) standard form
max  c^T * x
s.t. Ax = b,  x >= 0

to go from symmetric to standard, introduce m slack variables x(n+1),,,x(n+m):
c^T -> [c^T | 0],   A -> [A|I],   and x1,x2,,,xn >= 0 becomes x1,x2,,,x(n+m) >= 0

===> only diff is inequality to equality.

# c^T * x is called objective function
# A is a matrix of coefficients, x is a vector of variables
# c and b are vectors of known coefficients

NOTE:  1. we still use n for dimension in standard form. (n in standard form is n+m in symmetric form)
2. assume A is full rank (i.e. all rows are linearly independent)

(we focus on Standard Form for the rest of the lesson)
(good reference)   http://www.cis.temple.edu/~wangp/9615-AA/Lecture/11-LinearPro.htm

###
###   basic solutions and feasibility
###

Given:
Ax = b
x >= 0

Def: vector x is a solution iff Ax = b
Def: let 1 <= i1 < i2 < ,,,, < im <= n be indices s.t.
the columns ai1, ai2,,,,aim of A are linearly independent, and let B be the resulting m-by-m submatrix of A.
define x(B) = B^-1 * b and form a new vector x in E^n s.t. xj = x(B)k if j = ik, and xj = 0 otherwise.
then x is a basic solution (with basis B).

A = [B|D]      # B is m rows by m columns. recall the num of LI rows is the num of LI cols
# D is m rows by n-m columns

[B|D] * [ B^-1 * b ; 0 ] = B * (B^-1 * b) = b      #  i.e. Ax = b, so this x = [B^-1 * b ; 0] is a basic solution

===> called "basic" because B (a LI set of columns) forms a basis for the column space of A.
===> note it is possible for more than one basis to yield the same basic solution if some entries in x(B) are zeros. (such a sol is called degenerate)

Def:  a basic solution is degenerate if it has fewer than m non-zero entries.

(simply put, say you look for a vertex in the 2D plane: you only need two lines to form the intersection that defines the vertex, so any extraneous lines, i.e. variables, can be removed, and then you have a basic solution.)

===> so far we only considered Ax = b, but we now look at  x >= 0  aspect (non-negativity)

Def: a vector x is a feasible solution if it satisfies both Ax = b and x >= 0    # x as a vector has all non-negative entries

(must see good quiz)  https://youtu.be/DxxQCK8RZJw

####
####   Fundamental Theorem of LP
####

wikipedia is good on this.
(ref)   http://en.wikipedia.org/wiki/Linear_programming

Given a linear program in the (standard) form;
max c^T * x       # c^T * x is called objective function
s.t. Ax = b       # A is a matrix of coefficients, x is a vector of variables
x >= 0       # c and b are vectors of known coefficients
where A is an m-by-n matrix of rank m
i)  if there is a feasible solution, there is a "basic" feasible solution.
ii) if there is an optimal feasible solution, there is an optimal "basic" feasible solution.

Proof:   i)  if there is a feasible solution, there is a "basic" feasible solution.
let x be a feasible solution and WLOG let x1, x2,,,,xp be the positive entries. thus x1a1 + x2a2 +,,,,+ xpap = b
CASE 1: a1,a2,,,,ap are linearly independent.
p > m :  impossible, since rank(A) = m. so p <= m
p = m :  x is basic
p < m :  x is degenerate. add m-p more columns of A to extend to a linearly independent basis.
CASE 2: a1,a2,,,,ap are linearly dependent.
there exists a set of coefficients y1,y2,,,,yp, not all zero, s.t. y1a1 + y2a2 +,,,,+ ypap = 0; WLOG at least one yi > 0 (else negate all the yi)
let e = min{xi/yi : yi > 0}
then, (x1 - e*y1)a1 + ,,,,+ (xp - e*yp)ap = b     # so this is yet another feasible solution
but at least one coefficient becomes zero while the others remain non-negative
so we reduced the number of positive entries in x; repeat until we reach CASE 1

Proof:  ii) if there is an optimal feasible solution, there is an optimal "basic" feasible solution.
let x be an optimal feasible solution with non-zero entries x1,x2,,,,xp.
then x1a1 + ,,,,+ xpap = b    and   c^T * x = max{ c^T * x' : x' feasible }    # because that is what it means for x to be optimal
CASE 1 : a1,a2,,,,ap are linearly independent.
(the rest of the argument is the same as in the proof of i)
CASE 2 : a1,a2,,,,ap are linearly dependent.
there exists y1,y2,,,,yp s.t. y1a1 + ,,,, + ypap = 0 with at least one yi > 0   # what it means to be LD
for e sufficiently close to zero (positive or negative),  x-ey is feasible.
thus c^T * y = 0. otherwise, c^T * x < c^T * (x-ey) for the right e, which is impossible because x is optimal.
choose e = min{xi/yi : yi > 0} to yield an optimal feasible solution with smaller support. repeat until CASE 1.
(essentially, by choosing the right e, we set one of the entries of x-ey to zero, one step closer to linear independence)

====> in a way, it is intuitive that any feasible solution comes with a basic one, basic meaning by selecting m LI columns for the basis.

(good quiz)  https://youtu.be/thtUvQ0tCLE

####
####   Simplex Equation
####

Simplex Algo.
- a way to move from one basic feasible sol to a better one. (we repeat till reaching optimal)
- not polynomial time in the worst case. but still useful in practice.

A = [B|D]                 # m rows, B has m cols, D has n-m cols
c^T = [c(B)^T | c(D)^T]   # just rewriting c^T in terms of B and D
x = [x(B)^T | x(D)^T]     #

max  c(B)^T * x(B) + c(D)^T * x(D)    # this is just rewriting the standard form
s.t. Bx(B) + Dx(D) = b                 # in terms of the above
x(B), x(D) >= 0                   #

For the Simplex Algo, we consider swapping a column of B with another valid one, one that improves the objective function.

===> we parameterize constraints solely in terms of x(D), then we move xi(i.e. variables in x(D)) into B.
===> notice moving columns and moving variables mean the same thing.

===> now we solve the equality constraints for x(B) = B^-1 * b - B^-1 * D * x(D)
r(D)^T = c(D)^T - c(B)^T * B^-1 * D        # the "reduced cost" coefficients for x(D)
===> now the standard form can be re-phrased as;
max r(D)^T * x(D)
s.t. B^-1 * D * x(D) <= B^-1 * b
x(D) >= 0

notation gets messy, see video

Q. which columns of D, d1,d2,,,,d(n-m), are good candidates to enter the basis?
A. any dq s.t. r(D)q > 0

===> any column of D corresponding to a positive entry of r(D) is a good candidate.
===> we want the entry of r(D) to be positive because then the objective increases as we increase the corresponding variable.

Q. which column of B to kick out?
A. let u = B^-1 * dq and v = B^-1 * b.
pick q s.t. r(D)q > 0 and set x(D) = G * eq, where G = min{ vi/ui : ui > 0 }
this is the largest G that keeps G * u <= v satisfied, and the minimizing index i tells which column of B leaves the basis.

Procedure:
1. calculate r(D).  if r(D) <= 0, the current sol x = [B^-1 * b ; 0] is optimal; return it.
2. pick q s.t. r(D)q > 0,  let dq be the q'th column of D
3. calculate u = B^-1 * dq.  if u <= 0, report that the problem is unbounded.
otherwise find i = argmin_i { vi/ui : ui > 0 }  where v = B^-1 * b.  let bi be the i'th column of B
4. swap bi for dq in the basis. repeat.

###
###  Simplex Example
###

===> notice how we just find one basic feasible solution, almost arbitrarily, then use the Simplex algo to improve the basis step by step toward the optimum.
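the whole loop can also be sketched as a small tableau-based implementation. a minimal Python sketch for the symmetric form max c^T x s.t. Ax <= b, x >= 0 with b >= 0 (so the slack basis is an obvious initial basic feasible solution); all names are my own, and there is no anti-cycling safeguard beyond a smallest-index entering rule:

```python
def simplex(c, A, b):
    """Tableau simplex (sketch) for: max c^T x  s.t.  Ax <= b, x >= 0,
    assuming b >= 0 so the slack variables form an initial feasible basis."""
    m, n = len(A), len(c)
    # tableau rows: [A | I | b]; bottom row: [-c | 0 | 0] holds reduced costs
    T = [list(map(float, A[i])) + [1.0 if j == i else 0.0 for j in range(m)]
         + [float(b[i])] for i in range(m)]
    T.append([-float(ci) for ci in c] + [0.0] * (m + 1))
    basis = list(range(n, n + m))              # slacks start in the basis
    while True:
        # entering column: smallest index with negative reduced cost
        q = next((j for j in range(n + m) if T[-1][j] < -1e-9), None)
        if q is None:
            break                              # optimal: no improving column
        rows = [i for i in range(m) if T[i][q] > 1e-9]
        if not rows:
            raise ValueError("LP is unbounded")
        r = min(rows, key=lambda i: T[i][-1] / T[i][q])   # ratio test
        piv = T[r][q]
        T[r] = [v / piv for v in T[r]]
        for i in range(m + 1):                 # eliminate column q elsewhere
            if i != r and abs(T[i][q]) > 1e-12:
                f = T[i][q]
                T[i] = [v - f * w for v, w in zip(T[i], T[r])]
        basis[r] = q                           # the swap: dq enters, br leaves
    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return x, T[-1][-1]                        # optimal x and objective value

# the student's work/rest LP: max x1 s.t. x1 + x2 <= 14, x1 - 2*x2 <= 2
x, val = simplex([1, 0], [[1, 1], [1, -2]], [14, 2])
print(x, val)  # [10.0, 4.0] 10.0
```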

###
###  Simplex Correctness
###

CASE :  Bounded LP

1. make progress in each iteration
2. finite number of steps (only nCm possible bases)
3. don't stop too early

CASE : Unbounded LP

1. won't hit the r(D)^T <= 0 condition
2. won't run forever
3. stops when unbounded

(must see video)

###
###   how to find a basic feasible sol to begin with
###

- Two Phase Approach

given the standard form;
max c^T * x
s.t. Ax=b
x >= 0

often it is easy to find a basic feasible sol to begin with just by inspection. but sometimes not.
so we define an auxiliary program: first enforce b >= 0 (negate rows as needed), then introduce an artificial slack vector y
min 1^T * y
s.t. Ax + y = b
x,y >= 0

=====> x=0, y=b is a basic feasible solution !!
=====> so we can start Simplex algo on it.

if optimal is zero, then start Simplex on the original standard form at x
otherwise the original form is infeasible.

####################################
#####         Duality         ######
####################################

every LP has a dual program, which mirrors the behavior of the original.

(ref)
http://www.cs.columbia.edu/coms6998-3/lpprimer.pdf

####
####   Bounding an LP
####

notice how you can transform an LP(called "primal") to another one(called "dual").

every LP problem has a dual program, which is another LP.

####
####   Dual Programs
####

as seen in the above quiz, a dual program can be seen as minimizing the upper bound of the primal program.

Primal                      Dual
---------------------------------------
max c^T * x           min b^T * y
s.t. Ax <= b          s.t. A^T * y >= c
x >= 0                     y >= 0

For all feasible x and y,  b^T * y  =  y^T * b  >=  y^T * Ax  >=  c^T * x           # generalized statement from the quiz
( first inequality: b >= Ax and y >= 0;  second: A^T * y >= c and x >= 0 )

[Weak Duality Lemma]
if x is feasible for the primal and y is feasible for the dual, then c^T * x  <=  b^T * y

====> it says the value of the primal is at most the value of the dual.
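weak duality can be sanity-checked numerically on a toy primal/dual pair (the numbers below are my own illustration, not from the lecture):

```python
# primal: max x1 + x2   s.t. x1 + 2*x2 <= 4,  3*x1 + x2 <= 6,  x >= 0
# dual  : min 4*y1 + 6*y2   s.t. y1 + 3*y2 >= 1,  2*y1 + y2 >= 1,  y >= 0
c, b = [1.0, 1.0], [4.0, 6.0]

x = [1.0, 1.0]   # primal feasible: 1 + 2 = 3 <= 4 and 3 + 1 = 4 <= 6
y = [0.5, 0.5]   # dual feasible:   0.5 + 1.5 = 2 >= 1 and 1.0 + 0.5 = 1.5 >= 1

primal_val = sum(ci * xi for ci, xi in zip(c, x))   # c^T * x = 2.0
dual_val = sum(bi * yi for bi, yi in zip(b, y))     # b^T * y = 5.0
assert primal_val <= dual_val                       # weak duality
print(primal_val, dual_val)  # 2.0 5.0
```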

####
####   Duality Theorem
####

(1) if either the primal problem or the dual has a finite optimal solution, so does the other, and the optimum objective values are equal.
(2) if either problem has an unbounded objective value, then the other is infeasible.

Proof for (2):
- suppose the primal is unbounded and y is feasible for the dual.
- by weak duality, b^T * y >= c^T * x for all feasible x. since the primal is unbounded, however, there exists x s.t. c^T * x > b^T * y  thus contradiction.

Proof for (1):
- if the primal has a finite optimal solution, then it has a finite optimal basic solution by the fundamental theorem of LP.
- let [x(B)^T,x(D)^T]^T be such a basic solution and let A=[B,D] and c^T = [c(B)^T,c(D)^T]
- recall (from simplex algo) that 0 >= r(D)^T = c(D)^T - c(B)^T * B^-1 * D
(now we will construct a solution for the dual)
- let y^T = c(B)^T * B^-1. we have y^T * D >= c(D)^T   and y^T * A = [y^T * B, y^T * D] = [c(B)^T, c(B)^T * B^-1 * D] >= [c(B)^T, c(D)^T] = c^T
- thus y is feasible for the dual.
- moreover, y^T * b = c(B)^T * B^-1 * b = c(B)^T * x(B). by weak duality, y^T * b is optimal.

=====> notice how we just gave a way to determine the dual optimal solution.

####
####   Dual Optimal Solution
####

let the linear program (standard form)

max c^T * x
s.t. Ax=b
x>=0

have an optimal basic feasible solution corresponding to the basis formed by the columns of B.
note  x(B) = B^-1 * b

then y^T = c(B)^T * B^-1 is an optimal solution to the dual problem
min y^T * b
s.t. y^T * A >= c^T

moreover, the optimum values of the problems are equal.

####
####   Duality of Max Matching
####

[primal]  ------------------><----------------- [dual]
max-flow                                        min-cut
max-matching in b.p. graphs                     min vertex cover in b.p. graphs

Let xij indicate whether (i,j) is included in the matching.

as an LP, the problem becomes to maximize the number of matched edges,
subject to the constraints that no vertex in R can be matched more than once, and no vertex in L can be matched more than once.
and of course, we cannot have negatively matched edges.

max  Σ_{i∈L,j∈R} xij

s.t. Σ_{i∈L} xij <= 1  for all j in R     # dual variable yj
Σ_{j∈R} xij <= 1  for all i in L     # dual variable yi
xij >= 0

Its dual is

min  Σ_{i∈L} yi  +  Σ_{j∈R} yj

s.t. yi + yj >= 1  for all i∈L, j∈R
yi,yj >= 0  for all i∈L, j∈R

===> interpretation is straight-fwd.
vertex i is in the cover if and only if yi = 1
vertex j is in the cover if and only if yj = 1
every edge must have at least one vertex in the cover.
and we are trying to minimize the size of the cover.
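this primal-dual pair says max-matching = min-vertex-cover in bipartite graphs, which can be sanity-checked by brute force on a tiny example (illustrative code; exponential enumeration is fine at this size):

```python
from itertools import combinations

# tiny bipartite graph: L = {0,1,2}, R = {3,4}
edges = [(0, 3), (0, 4), (1, 3), (2, 4)]
vertices = {v for e in edges for v in e}

def is_matching(es):
    used = [v for e in es for v in e]
    return len(used) == len(set(used))        # no shared endpoints

def is_cover(vs):
    return all(u in vs or v in vs for u, v in edges)

max_matching = max(k for k in range(len(edges) + 1)
                   for es in combinations(edges, k) if is_matching(es))
min_cover = min(len(vs) for k in range(len(vertices) + 1)
                for vs in combinations(sorted(vertices), k) if is_cover(vs))
assert max_matching == min_cover              # LP duality / König's theorem
print(max_matching, min_cover)  # 2 2
```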

====> every decision problem in P can be converted into a LP as LP is P-complete

####
####   Duality of Max-Flow
####

Let fuv be the flow and cuv be the capacity along (u,v) in E.

our goal is to maximize the flow out of the source;
subject to the conservation of flow constraints, and the capacity constraints.

max  Σ_{v:(s,v)∈E} fsv

s.t. Σ_{v:(v,u)∈E} fvu  -  Σ_{v:(u,v)∈E} fuv  =  0   for all u in V-{s,t}   # conservation of flow, dual variable Yu

fuv <= cuv     # capacity constraint, dual variable Yuv
fuv >= 0       # flows must be non-negative

its dual is to minimize the sum over all the edges of Yuv times cuv.
note Yu has no role in the objective function because its coefficient is zero.
the constraints for the dual involve several cases

min  Σ_{(u,v)∈E} Yuv * cuv

s.t. Ysv + Yv >= 1   for all (s,v) in E
Yvt - Yv >= 0   for all (v,t) in E
Yuv + Yv - Yu >= 0   for all other edges (u,v)
Yuv >= 0

interpreting an optimal dual as a min cut (A = source side, B = sink side):
Yuv = 1  iff u in A, v in B     # i.e. edge (u,v) crosses the cut
Yv  = 1  iff v in A
Yv  = 0  iff v in B

note : optimal is a local optimum
similarly maximal (local) <-> maximum (global)
minimal (local) <-> minimum (global)
(ref)
http://english.stackexchange.com/questions/50843/what-is-the-difference-between-optimal-and-optimum

How to construct a LP for which both the primal and the dual are infeasible.
(ref)
http://math.stackexchange.com/questions/393818/construct-a-linear-programming-problem-for-which-both-the-primal-and-the-dual-pr

How to remove the absolute value in optimization problem and turn it into LP
(ref)

###############################################
#####       Approximation Algorithms      #####
###############################################

so far, our focus was on polynomial time algos.
but there are many NP/NPC problems that we wanna solve.
exponential algos are ok for smaller instances, but also we can resort to "approximation" algos.

####
####  Min Vertex Cover Approximation
####

find the smallest set of vertices that touch all edges. (recall this is NPC)

---------------------------
ApproxVC( G=(V,E) )
  C = empty
  while E != empty
    choose (u,v) in E arbitrarily
    C = C | {u,v}
    E = E - {(u,w)|w in V} - {(w,v)|w in V}
  return C
--------------------------

===> while E is not empty, pick an edge randomly, and put the two vertices of the edge into the answer set C,
===> then remove all edges incident to that edge(=those two vertices).
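ApproxVC can be written directly in Python; scanning the edges in any fixed order is one valid way to "choose arbitrarily", and checking coverage replaces explicitly deleting the incident edges:

```python
def approx_vertex_cover(edges):
    """2-approximation: take any uncovered edge, add BOTH endpoints to C."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge still uncovered
            cover |= {u, v}                     # C = C | {u,v}
    return cover

# path 1-2-3-4: the optimum cover {2,3} has size 2; we return at most 4
print(approx_vertex_cover([(1, 2), (2, 3), (3, 4)]))
```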

Theorem:
given a graph G, having a minimum vertex cover C*, the algorithm ApproxVC returns a cover C such that |C| / |C*| <= 2
i.e. in the worst case, the approximation is twice as big.

Proof:
lets call the set of edges chosen in the while loop M.
edges of M are vertex disjoint, so any cover must include at least 1 vertex from each.
thus |M| is the lower bound for |C*|      # see visual in the video
|M| <= |C*| <= |C| = 2|M| <= 2|C*|        # see how we define the lower & upper bounds for |C|

(very good visual)  https://youtu.be/-N8p3d9Njao
(very good quiz)    https://youtu.be/U_kYM3-xN6I

####
####  Lower Bounding the Optimum
####

minimum vertex cover.

0  <=  |M|  <=  |C*|  <=  |C| = 2|M|  <=  |V|

finding |C*| is NPC, so we just defined the lower bound, and the upper bound in terms of the lower bound.
as discussed, 1  <=  |C| / |C*|  <= 2
but this is only an approx algo, and does not let us solve a decision problem like "is there a vertex cover of size 6?"
and yet it lets us approximately solve certain optimization problems.

####
####   Optimization
####

Definition:                                                   # E.g. Min Vertex Cover
1. set of valid instances                                     # 1. set of undirected graphs
2. for each instance, there is a set of feasible solutions.   # 2. set of vertex covers
3. objective function (the thing we will try to min|max)      # 3. size of the cover
4. goal (min or max)                                          # 4. min

==> 1.2.3. are polynomially computable. (can be verified in P)
==> optimization prob has a decision version.
e.g.                                                       # e.g.
whether OPT <= T for min                                  # |C*| <= T
whether OPT >= T for max
==> optimization is NP-hard if decision version is.
==> a problem is NP-hard if an NPC prob can be reduced to it. # NPC prob Max-ind set can be reduced to Min Vertex Cover, thus MVC is NP-hard

####
####   Approximation algo
####

Def: an algorithm is a factor D approximation algorithm if on instance I, it produces a solution s such that obj(s,I) <= D(|I|) * OPT(I)

# D for delta
# Delta can be a function of the instance size
# inequality <= gets reversed >= for maximization problem
e.g.
min vertex cover has a polynomial 2-factor approximation algorithm

==> can we further trade-off speed and accuracy?  (not really for vertex cover, but for some problems, yes)
==> we will see an example below.

####
####   FPTAS (fully polynomial time approximation scheme)  for Subset Sum
####

Subset Sum  (optimization version)    # original is if a subset of S sums to t
-------------
input : a set S = {x1,x2,,,xn} of numbers and threshold t
goal  : find a subset S' of S  s.t.  Σ_{x∈S'} x  is maximized without exceeding t

Theorem:
for every e > 0, there is an O( (n^2 * log(t))  /  e) time, factor (1+e)^-1 approximation algorithm for subset sum.

===> the smaller e is better accuracy, but the bigger e is faster.
===> polynomial time approximation scheme (PTAS for short)     # poly in n for each fixed e; 1/e may appear in the exponent
===> fully PTAS (FPTAS for short)                              # poly in both n and 1/e

(video)  https://youtu.be/Vyr0Ct9ISfQ
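a sketch of the trimming idea behind such an FPTAS (this follows the usual CLRS-style analysis with delta = e/2n; the exact constants in the theorem above come from a more careful account, so treat this as illustrative, and the names are my own):

```python
def approx_subset_sum(S, t, eps):
    """FPTAS sketch: maintain a short sorted list of achievable subset sums
    <= t, trimming values within a (1+delta) factor of an already-kept one."""
    L = [0]
    delta = eps / (2 * len(S))
    for x in S:
        # merge in the sums that use x, discarding anything above t
        merged = sorted(set(L + [y + x for y in L if y + x <= t]))
        # trim: keep a value only if it is not redundant w.r.t. the last kept
        trimmed, last = [merged[0]], merged[0]
        for y in merged[1:]:
            if y > last * (1 + delta):
                trimmed.append(y)
                last = y
        L = trimmed
    return max(L)

best = approx_subset_sum([104, 102, 201, 101], 308, 0.40)
print(best)   # within a factor (1+0.40) of the true optimum 307, and <= 308
```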

====> does every NPC problem admit an FPTAS ?  (no, unless P=NP)
====> there are some problems where approximating the optimum within certain factors leads to a polynomial algo that solves every NPC problem.
====> this phenomenon is called "the hardness of approximation" (big topic in the study of complexity)

####
####   General TSP
####

we will illustrate that general TSP (an NPC problem) cannot be approximated within any constant factor (unless P=NP).

Given: graph G=(V,E)    # usually complete, i.e. every edge is present
cost C:E -> N_0

Goal : find the minimum cost Hamiltonian cycle.  # i.e. visit every vertex without visiting the same one twice.

===> equivalently, minimize the sum of the costs of the edges in the cycle you choose.

Theorem:  if P!=NP, then for any constant factor A >= 1, there is no poly time A-approximation algo for TSP.

Proof: reduce from Hamiltonian cycle.
given a graph G (not necessarily complete), we assign edge cost as below
c(u,v)=1 for (u,v) in E, and
c(u,v)= A|V|+1 for (u,v) NOT in E

If G has a Hamiltonian cycle, then c(H*) = |V|            # H* is optimal ham cycle
and an A-approximation will return H s.t. c(H) <= A|V|

If G has NO Hamiltonian cycle, then c(H) >= c(H*) >= (|V|-1) + (A|V|+1) = (A+1)|V|     # every ham cycle must use at least one non-edge

Thus, a poly time A-approximation algo for TSP would decide whether G has a Hamiltonian cycle (returned cost <= A|V| vs >= (A+1)|V|), which is impossible unless P=NP because Hamiltonian cycle is NPC.

(visual)  https://youtu.be/ar5-aJKg7tU

####
####  Metric TSP approximation
####

Given : graph G=(V,E)     # usually complete
cost c: E -> N_0  s.t. c(u,v) <= c(u,w) + c(w,v)   # triangular inequality

Goal : find the minimum cost hamiltonian cycle.

ApproxMetricTSP(G,c):
- build a min spanning tree T    #  O(V^2)
- run DFS on T          # keep track of the discovery order of vertices, O(V)
- return the cycle induced by the discovery order of the vertices as H

===> discovery order works only for metric TSP because we can assume triangular inequality

(must see visuals)  https://youtu.be/3noYwBdRhH0
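the three steps can be sketched in Python for points in the plane (Prim's MST, then a preorder walk of the tree; all names are my own):

```python
import math

def approx_metric_tsp(points):
    """2-approximation sketch for metric TSP on Euclidean points:
    build an MST with Prim's algorithm, then return the DFS discovery order."""
    n = len(points)
    d = lambda i, j: math.dist(points[i], points[j])   # Python 3.8+
    in_tree = {0}
    best = {j: (d(0, j), 0) for j in range(1, n)}      # cheapest edge into tree
    children = {i: [] for i in range(n)}
    while len(in_tree) < n:
        j = min((j for j in best if j not in in_tree), key=lambda j: best[j][0])
        in_tree.add(j)
        children[best[j][1]].append(j)                 # record the tree edge
        for k in range(1, n):
            if k not in in_tree and d(j, k) < best[k][0]:
                best[k] = (d(j, k), j)
    order, stack = [], [0]                             # iterative preorder DFS
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children[v]))
    return order

# corners of the unit square: the tour found here happens to be optimal
print(approx_metric_tsp([(0, 0), (0, 1), (1, 1), (1, 0)]))  # [0, 1, 2, 3]
```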

Theorem: ApproxMetricTSP() is poly-time 2-approximation algo for metric TSP.
Proof  :
min spanning tree and DFS are both poly-time, so ApproxMetricTSP is poly-time.
let H* be a min cost Hamiltonian cycle. then  Σ_{e∈T} c(e)  <=  Σ_{e∈H*} c(e)
because removing an edge from H* leaves a spanning tree of cost at most c(H*), and T is a minimum spanning tree.
the cost of the DFS traversal (each tree edge walked twice) is  2 * Σ_{e∈T} c(e)    # see the visual
by the triangle inequality, shortcutting to the discovery order only helps:  Σ_{e∈H} c(e)  <=  2 * Σ_{e∈T} c(e)  <=  2 * Σ_{e∈H*} c(e)   # see the visual

(visual) https://youtu.be/smO8FTxXqV8

###############################################
####          Randomized Algorithms        ####
###############################################

what is random? we just assume a random number generator exists.

####
####   Verifying Polynomial Identities
####

input : representations of polynomials A(x) and B(x) with degree d
goal  : decide if A(x) == B(x)

-------------------------
def polyequal(A,B,d):
  x = randInt(1,100*d)   # returns random int in {1,,,,100*d}
  return A(x)==B(x)
-------------------------
unless A - B is identically zero, A(x) - B(x) is a non-zero polynomial of degree at most d, so it has at most d roots.
chance of picking one is <= d/100d = 1/100 = 1%
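a runnable Python version of this test (the `trials` parameter already folds in the repeated-trials improvement discussed below; names are my own):

```python
import random

def poly_equal(A, B, d, trials=1):
    """One-sided Monte Carlo identity test: A and B are callables representing
    polynomials of degree <= d. Always True when A == B; when A != B, each
    trial errs with probability <= 1/100, so error <= (1/100)**trials."""
    for _ in range(trials):
        x = random.randint(1, 100 * d)
        if A(x) != B(x):
            return False        # found a witness: definitely not equal
    return True                 # probably equal

# (x+1)^2 and x^2 + 2x + 1 agree everywhere; x^2 + 2x never agrees (diff = 1)
print(poly_equal(lambda x: (x + 1) ** 2, lambda x: x * x + 2 * x + 1, 2))  # True
print(poly_equal(lambda x: (x + 1) ** 2, lambda x: x * x + 2 * x, 2))      # False
```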

####
####   Discrete Probability Spaces
####

Def:
a discrete probability space consists of
1. a sample space Omega, finite or countably infinite.
2. a probability function
Pr: {E | E ⊆ Omega} -> R                  # E for Event
a. for any E ⊆ Omega, 0 <= Pr(E) <= 1
b. Pr(Omega) = 1
c. for any pairwise disjoint E1,E2,E3,,,,   Pr( U_{i>=1} Ei )  =  Σ_{i>=1} Pr(Ei)

===> lets think about the above with polyequal() example

1. sample space Omega = {1,,,,100d}
2. let Fi = {i} for i = 1,,,,100d
and define Pr(Fi) = 1/|Omega|
thus, for any event S ⊆ Omega, Pr(S) = Pr( U_{i∈S} Fi ) = Σ_{i∈S} Pr(Fi) = |S| / |Omega|

===> in the above example, we assumed uniform probability distrib, i.e. all events are equally likely.

(good quiz)  https://youtu.be/C6hg04WgA58

####
####   Repeated Trials
####

improved version of polyequal()

-----------------------
def polyequal(A,B,d):
equal = True
for i in 1...k:
x = randInt(1,100d)
equal = equal and A(x)==B(x)
return equal
-----------------------

now, |Omega| = (100d)^k
at most d^k of these outcomes make polyequal() fail (every coordinate must hit a root).
let F be the event that A(x)==B(x) in all k iterations (while A != B).
then Pr(F) <= d^k / (100d)^k  =  1 / 100^k

####
####  Independence & Conditional Probability
####

Def:
The conditional probability that event E occurs given that event F occurs is Pr(E|F) = Pr(E ^ F) / Pr(F)

when Pr(E|F) = Pr(E),   it means E and F are independent.

Def:
Events E and F are independent if Pr(E ^ F) = Pr(E) * Pr(F)

####
####  Monte Carlo & Las Vegas
####

an algo that can return a wrong answer is called Monte Carlo algo.

-----------------------
def polyequal(A,B,d):
equal = True
for i in 1...k:
x = randInt(1,100d)
equal = equal and A(x)==B(x)
return equal
-----------------------

if A==B, then  Pr[polyequal(A,B,d)] = 1
if A!=B, then  Pr[polyequal(A,B,d)] <= 100^-k

===> since wrong answers may be given only when A!=B, we call this a one-sided Monte Carlo algo.
===> there can be two-sided Monte Carlo algos in general.
===> an algo that never returns incorrect answers is called a "Las Vegas" algo

Here is the Las Vegas version of polyequal()

-----------------------
def polyequal(A,B,d):
  order = sample(range(1,100*d+1),d+1)   # without replacement, i.e. we eliminate what is already checked from the sample pool
  for x in order:
    if A(x) != B(x):
      return False
  return True
-----------------------

if A==B, then  Pr[polyequal(A,B,d) = True] = 1
if A!=B, then  Pr[polyequal(A,B,d) = True] = 0     # two distinct polynomials of degree <= d can agree on at most d of the d+1 checked points
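A runnable Python sketch of the Las Vegas version (my own rendering: polyequal_lv as a name, random.sample for the pseudocode's sample(), and A, B as callables):

```python
import random

def polyequal_lv(A, B, d):
    """Las Vegas polynomial identity test: check d+1 distinct points.
    Two distinct polynomials of degree <= d agree on at most d points,
    so the answer is always correct."""
    order = random.sample(range(1, 100 * d + 1), d + 1)   # without replacement
    return all(A(x) == B(x) for x in order)
```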

Let Ei be the event that A(order[i]) == B(order[i])

      k-1             k-1
Pr(    ^   Ei )  =     Π   (d-i) / (100d-i)   <=   (1/100)^k
      i=0             i=0

let Ei be the event that the ith element chosen from S is a root.
Pr(E1 ^ E2 ^ E3) = Pr(E3 | E1 ^ E2) * Pr(E2 | E1) * Pr(E1)

(good quiz)  https://youtu.be/D1BuqatlxX4

####
####  Random Variables
####

Def:  a random variable on a sample space Omega is a function X: Omega -> R

e.g. let X be the sum of two die throws.
sample space is Omega: {1,,,,6} * {1,,,,6}
X(i,j) = i + j

Notation: we write (X=a) to indicate the event where X is a
(X=a) = {s in Omega| X(s)=a }

e.g. (X=3) = {(1,2),(2,1)}
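The dice example can be checked by enumerating the sample space (a sketch; the names are my own):

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# X = sum of two fair dice, as a function on the sample space
omega = list(product(range(1, 7), repeat=2))
def X(s):
    return s[0] + s[1]

# the event (X=a) is a set of outcomes; under the uniform distribution
# Pr(X=a) = |{s in Omega : X(s)=a}| / |Omega|
counts = Counter(X(s) for s in omega)
PrX = {a: Fraction(c, len(omega)) for a, c in counts.items()}
```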

####
####   Quick sort
####

pseudo code   # assuming the elems are distinct
-------------------------------------------------
def quicksort(A):
    if len(A) < 2:
        return A
    pivot = choice(A)                    # how do we choose pivot? middle-index? random?
    AL = [x for x in A if x < pivot]
    AR = [x for x in A if x > pivot]
    return quicksort(AL) + [pivot] + quicksort(AR)
-------------------------------------------------

Suppose quicksort is used to sort a list of elements a1 < a2 < a3 ,,,, < an
Let Eijk be the event that ai is separated from aj by the kth choice of a pivot.
Let the indicator variable Xij = 1 if the algo compares ai with aj, and Xij = 0 otherwise.

CLAIM: E[Xij] = 2/(j-i+1)
Proof:
E[Xij] = 1 * Pr(Xij = 1) + 0 * Pr(Xij = 0)     # this is true of any 0/1 indicator variable
       =  Σ  Pr(Xij = 1 ^ Eijk)                # law of total probability
          k
       =  Σ  Pr(Xij = 1 | Eijk) * Pr(Eijk)
          k: Pr(Eijk)>0
       =  Σ  2/(j-i+1) * Pr(Eijk)
          k: Pr(Eijk)>0
       = 2/(j-i+1)

==> ai is compared to aj only when one of them is chosen as the pivot

Pr(Xij = 1 | Eijk) =  2/(j-i+1)        # to be separated by the kth pivot, ai,,,,aj must all still be in the same sublist,
                                       # and the comparison occurs only if the pivot is ai or aj: 2 of the j-i+1 choices
(visual)   https://youtu.be/G3eA5FceOao

By the linearity of expectation, the expected number of comparisons is

                                  n   n
  Σ E[Xij]  =   Σ 2/(j-i+1)  <    Σ   Σ 2/j  =  O(n*log n)
 i<j           i<j               i=1 j=1

Theorem:
for any input of distinct elements, quicksort with pivots chosen uniformly at random makes O(n*log n) comparisons in expectation.

"concentration bound" - distrib is concentrated.
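The theorem can be sanity-checked empirically (a sketch; comparisons are approximated as len(A)-1 per call, i.e. every non-pivot element compared once against the pivot):

```python
import random
import math

def quicksort_count(A):
    """Random-pivot quicksort returning (sorted list, #comparisons),
    counting len(A)-1 pivot comparisons per recursive call."""
    if len(A) < 2:
        return list(A), 0
    pivot = random.choice(A)
    AL = [x for x in A if x < pivot]
    AR = [x for x in A if x > pivot]
    left, cl = quicksort_count(AL)
    right, cr = quicksort_count(AR)
    return left + [pivot] + right, cl + cr + len(A) - 1
```

For n distinct elements the expected count is about 2*n*ln(n), well below the n^2/2 worst case.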


####
####   1/2 factor approximation algo for finding the maximum acyclic subgraph
####

Given a directed graph G=(V,E), find a maximum cardinality subset E' of E such that G'=(V,E') is acyclic.

==> just take any linear ordering of all the vertices, count forward and backward edges, and keep whichever direction has more edges.
==> the kept set is acyclic: following only forward (or only backward) edges, you can never return to an earlier vertex.
==> the kept set has at least |E|/2 edges, which makes this a 1/2 factor approx.
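A minimal sketch of this 1/2-approximation (the function and variable names are my own):

```python
def max_acyclic_subgraph(vertices, edges):
    """Keep the larger of the forward / backward edge sets under an
    arbitrary linear ordering of the vertices; the result is acyclic
    and contains at least |E|/2 edges."""
    pos = {v: i for i, v in enumerate(vertices)}   # arbitrary linear order
    forward  = [(u, v) for (u, v) in edges if pos[u] < pos[v]]
    backward = [(u, v) for (u, v) in edges if pos[u] > pos[v]]
    return forward if len(forward) >= len(backward) else backward
```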

####
####   A minimum cut algorithm
####

Given :  a connected graph G=(V,E)
Output:  a set S of minimum size s.t. (V,E-S) has two connected components.

===> S is a min "cut set"
===> basically: to split G into two connected components, what is the minimum number of edges you must remove?
(ref visual)  http://en.wikipedia.org/wiki/Minimum_cut

-------------------------
def randMinCutSet(V,E):
    while |V| > 2:
        e = choice(E)                 # random choice is often good enough
        (V,E) = contractEdge(V,E,e)
    return E
-------------------------

NOTE:  contract an edge means merge two vertices into one.
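A runnable sketch of the contraction algorithm (the union-find bookkeeping and the name karger_min_cut are my own implementation choices, not from the notes):

```python
import random

def karger_min_cut(vertices, edges):
    """One run of the random contraction algorithm; returns a candidate cut set.
    Merged vertices are tracked with a union-find structure."""
    parent = {v: v for v in vertices}
    def find(v):                        # super-vertex currently containing v
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    pool = list(edges)
    remaining = len(vertices)
    while remaining > 2:
        u, v = random.choice(pool)
        ru, rv = find(u), find(v)
        if ru == rv:                    # endpoints already merged: drop the self-loop
            pool.remove((u, v))
            continue
        parent[rv] = ru                 # contract the chosen edge
        remaining -= 1
    # the cut is every original edge crossing the two remaining super-vertices
    return [(u, v) for (u, v) in edges if find(u) != find(v)]
```

Per the corollary below the theorem, repeating this many times and keeping the smallest returned set finds a minimum cut with high probability.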

Theorem:  the algo randMinCutSet() outputs a min-cut set with probability at least 2/(n*(n-1))

Proof: let C be a minimum cut set with |C| = c. let Ei be the event that the edge chosen in iteration i is not in C.
every vertex has degree >= c (else its incident edges would form a smaller cut), so before iteration i there are
at least (n-i+1)*c/2 edges left, and Pr(Ei | E1 ^ ,,, ^ Ei-1) >= 1 - c/((n-i+1)*c/2) = (n-i-1)/(n-i+1).
the product over i = 1,,,,n-2 telescopes to 2/(n*(n-1)).

Corollary: the smallest set among those returned by (n*(n-1)/2) * ln(1/k) independent calls to randMinCutSet() is a minimum cut set with probability at least 1-k.

Proof: each call is independent, so the probability that all fail is at most (1 - 2/(n(n-1)))^((n(n-1)/2)*ln(1/k)) < exp(-ln(1/k)) = k

####
####  Max 3-SAT
####

Given: A collection of clauses c1,c2,,,cm over variables x1,x2,,,,xn with each clause consisting of 3 literals of distinct variables.

output: an assignment to x1,x2,,,xn s.t. a maximum number of clauses is satisfied.

Theorem: there is an assignment that satisfies at least 7/8 of the clauses.

Proof: consider an assignment chosen uniformly at random.
let Yi = 1 if ci is satisfied, and Yi = 0 otherwise

           m
let Y  =   Σ Yi
          i=1

           m
E[Y]   =   Σ E[Yi]  =  m * 7/8    # each variable is T or F and each clause has 3 vars, so 8 patterns;
          i=1                     # only one of them causes Yi to be 0

===> "expectation argument" in probabilistic method
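The per-clause expectation E[Yi] = 7/8 can be checked by brute-force enumeration (negating some literals just relabels the 8 cases, so positive literals suffice):

```python
from itertools import product

# a clause over 3 distinct variables is falsified by exactly one of the
# 2^3 = 8 equally likely assignments to its variables
satisfying = sum(1 for a in product([False, True], repeat=3) if any(a))
```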

##
##  derandomization
##

def instantiate:    ## assigns value to var in every clause in C, simplifies, and returns the result.

e.g.
var = x1, value = T

then we can derive

(x2 | x4 | x7)  = (x2 | x4 | x7)
(x1 | x2 | x5)  = (T)
(!x1 | x3 | x7) = (x3 | x7)
(!x1)           = F

def EofY(C):        ## calculates E[Y] for the set of C of clauses.

e.g.
(T)            = 1
(F)            = 0
(x3)           = 1/2
(x1 | !x2)     = 3/4
(x2 | x4 | x5) = 7/8

------------------------------
def approxMax3SAT(C):    # our derandomization algo
    assignment = []
    for i = 1 to n:
        Cp = instantiate(C, xi, T)
        Cn = instantiate(C, xi, F)
        if EofY(Cp) >= EofY(Cn):
            C = Cp
            assignment.append(xi = T)
        else:
            C = Cn
            assignment.append(xi = F)
    return assignment
------------------------------
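A runnable version of instantiate(), EofY(), and the derandomization loop (a sketch: the DIMACS-style signed-integer literals and the "T" marker for satisfied clauses are my own representation, not from the lecture):

```python
from fractions import Fraction

# a clause is a list of nonzero ints: literal i means xi, -i means NOT xi
# a clause already satisfied is represented as the string "T"; [] means False

def instantiate(clauses, var, value):
    """Assign x_var := value and simplify every clause."""
    out = []
    for c in clauses:
        if c == "T":
            out.append("T")
            continue
        lits, satisfied = [], False
        for lit in c:
            if abs(lit) != var:
                lits.append(lit)            # literal untouched by this assignment
            elif (lit > 0) == value:
                satisfied = True            # literal made true -> clause satisfied
        out.append("T" if satisfied else lits)
    return out

def EofY(clauses):
    """Expected number of satisfied clauses under a uniform random
    assignment to the still-unassigned variables."""
    total = Fraction(0)
    for c in clauses:
        # "T" contributes 1; a clause with r live literals contributes 1 - 2^-r
        total += 1 if c == "T" else 1 - Fraction(1, 2 ** len(c))
    return total

def approxMax3SAT(clauses, n):
    """Greedily fix x1,,,,xn, always keeping the branch with the
    larger conditional expectation."""
    assignment = {}
    for i in range(1, n + 1):
        Cp = instantiate(clauses, i, True)
        Cn = instantiate(clauses, i, False)
        if EofY(Cp) >= EofY(Cn):
            clauses, assignment[i] = Cp, True
        else:
            clauses, assignment[i] = Cn, False
    return assignment
```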

===> for each variable, picks the value with the higher conditional expectation.
===> maintains the invariant  EofY(C) >= 7/8 * m  throughout, since

     7/8 * m  <=  EofY(C)  =  EofY(Cp)/2 + EofY(Cn)/2  <=  max( EofY(Cp), EofY(Cn) )

===> known as the "method of conditional expectations"

Theorem:
There is a deterministic algorithm which, given any 3-CNF formula, finds an assignment that satisfies at least 7/8 of the clauses.

(this is the best we can do, unless P=NP)

(derandomization for max cut: really good paper)  http://i.cs.hku.hk/~hubert/teaching/c8601_2011/notes2.pdf

####
####  PCP theorem  (probabilistically checkable proof)
####

let S denote the set of all 3CNF formulas.

theorem:
for any constant c > 7/8, there is a polytime computable function f:S->S  s.t. for every p in S over a sufficient number of variables:
- if p is satisfiable, then f(p) is satisfiable.
- if p is not satisfiable, then every assignment for f(p) satisfies fewer than a c fraction of the clauses.

===> very useful for proving the hard-ness of approximation

the instructor note:
----
The PCP theorem itself is the hardest thing to understand.  I would focus on what it does :

1. transforms 3CNF formulas that are satisfiable into formulas that are satisfiable
2. transforms 3CNF formulas that are unsatisfiable to outputs where you can't even get close to satisfying it.
----

1. 2015-01-05 22:47:58 |
2. Category : gatech