### path finding AI – game tree – minimax negmax alphabeta

Three well established algorithms for path finding in game trees.

- minimax
- negmax
- alphabeta pruning

the goal is to find a move that leads to the greatest chance of victory or at least a draw.
just like it was with algos for search trees, we usually (depending on branching factor and CPU/mem resources) need to define the max depth.

##
## assumption
##

- all moves are accessble for a given state.
- opponents will make no mistake, and make the best possible move.
- an evaluation fuction is available to assess the score of a state for a player. the score tells how advantageous to the player a particular move is. the higher the better for the player (the worse for the opponent)

#######################
####    MiniMax    ####
#######################

considering the nature of a game tree, where each level of the tree alternates between player and opponent, for level N for player, we need to chose the move that maximize the score(state,player), then at its next level N+1, we need to choose the move that minimizes the score(state,opponent), and alternates so on. hence the name MiniMax.

##
## input:
##
- a starting state s
- max search depth
- player/opponent identifiers

##
## output:
##
- return the best move m that maxmimizes player's victory. (or at least a draw)

-------------------------------- // psuedo code

bestMove(s,maxDepth,player,opponent) // s = a given state at which we start the minimax processing

original = player;
[move,score] = minimax(s,maxDepth,player,opponent);
return move;

minimax(s,maxDepth,player,opponent)

best = [null,null];
if(maxDepth == 0 || no valid moves from s for player)
{
score = evaluate(s,original);  // evaluate s for original player
return [null,score];
}

foreach m (all valid moves of s for player)
{
apply m to s;
[move,score] = minimax(s,maxDepth-1,opponent,player); // decrement the maxDepth, sawp p and o
undo m from s;
if(player == original)
if(score > best.score) best = [m,score]; // maximize score for player
else
if(score < best.score) best = [m,score]; // minimize score for opponent
}
return best;

--------------------------------

===> as above, it's very simple. just recursively traverse the tree down, and iterate all child nodes, and pick the min score node for the opponent level, and pick the max score node for the player level.

===> clearly, we can further generalize this logic by applying an alternating negation at each recursion level as we will see in NegMax section below.

##
## evaluation function:
##
- this is hard. but crucial. the entire algo depends on it. but it's more of a craft work than science. often just a heuristic measure.
- assumption here is eval-func(state,player) returns a positive number for an advantageous state, and a negative num for a no-good state for palyer.

##
## note: the total number of board states being examined
##

if branching factor b is fixed, then given the search depth d, it's simply  sum[i=0,d] b^i
but if branching factor decrements at each turn of the game, e.g. tic-tac-toe, then it's   sum[i=1,d] b!/(b-i)!

##
## complexity analysis:
##

best   :  O(b^d)
average:  O(b^d)
worst  :  O(b^d)

b = branching factor
d = max search depth

######################
####    NegMax    ####
######################

##
## input/output : same as minimax
##

it's essentially the same as minimax, but to simplify, make the scores of child nodes negative, then pick the biggest.

note  maxValue == -minValue

---------------------------------- // psuedo code

bestMove(s,maxDepth,player,opponent)  // notice how we don't need to record the original player

[move,score] = negmax(s,maxDepth,player,opponent);
return move;

negmax(s,maxDepth,player,opponent)

best = [null,null];
if(maxDepth == 0 || no valid moves of s for player)
{
score = evaluate s for player;  // not for original
return [null,score];
}

foreach m (all valid moves of s for player)
{
apply m to s;
[move,score] = negmax(s,maxDepth-1,opponent,player); // decrement maxDepth, swap p & o
undo m from s;
if(-score > best.score) best = [m, -score]; // cleverly simplifies the minimax
}
return best;

----------------------------------

====> to intuitively grasp this, think of the first couple of iterations.
====> maxDepth=0 would just be looking at the start state only, meaning no move.
====> maxDepth=1 would look at all immediate child nodes from s. but scores for them are calculated from opponent's perspective, so we want the mimimum score state, which you can obtain by negating all scores, and picking the biggest.
====> maxDepth=2 is easy, just think of maxDepth=2 as maxDepth=1 from opponent's perspective. all it has to do is minimize its opponent(opponent's opponent = player)'s score by first negating all child nodes scores and then picking the biggest.
====> the rest is recursion.

##
##  complexity analysis:
##

same as minimax.

###############################
####   AlphaBeta Pruning   ####
###############################

minimax/negmax needs to examine all child nodes at each state until reaching the maxDepth.
but that is computationally inefficient, hopefully you can specify upper & lower bounds.

##
## input/output: same as minimax & negmax
##

alpha : the lower bound of the score that player can at least get.
beta  : the upper bound of the score that player may get.

at start, we initialize [alpha,beta] = [-inf,+inf]

our usual evaluate(state,player) function will only give the lower bound, and we keep improving the lower bound thru our iteration.
but that is the same as cutting down the upper bound for the opponent.
thus [alpha,beta] indicates the "window of opportunity" for player's score.

given [alpha,beta] for player, we can easily translate it for the opponent's [alpha,beta]
(because we use the same eval function for both players.)

player [alpha,beta] = opponent [-beta,-alpha]

if alpha = +inf, player found a winning move. (which means beta=-inf for opponent)

the rest of the algo is the same as negmax.

------------------------------------- // psuedo code

bestMove(s,maxDepth,player,opponent)

[move,score] = alphabeta(s,maxDepth,player,opponent,-inf,+inf);
return move;

alphabeta(s,maxDepth,player,opponent,low,high)    // see how we retain low,high

best = [null,null];
if(maxDepth == 0 || no valid moves of s for player)
{
score = evaluate(s,player);
return [null,score];
}

foreach m (all moves from s for player)
{
apply m to s;
[move,score] = alphabeta(s,maxDepth-1,opponent,player,-high,-low);
undo m from s;
if(-score > best.score)
{
low = -score;
best = [m,low];
}
if(low >= high) return best;   // pruning happens here
}
return best;

-------------------------------------

=====> obviously, by retaining/updating the lower/upper bounds for both players at each recursive call, we can identify a certain state in which the window closes for player.

##
## complexity analysis:
##

theoretically, the first level is the same as negmax because we start with [alpha,beta] = [-inf,+inf], but at the end of the first level, the lower bound will update for player, which updates the upper bound for opponent. this requires examining all child nodes of s.
but at the second level, when it checks its first child node, the lower bound may immediately close the alpha-beta window, then the rest of the nodes need not be checked.

in other words, at each layer, at the most optimal case where the best move is found at the first of iteration and we can prune the rest, then it's b*1*b*1... for the duration of d.

best   :  O(b^(d/2))
average:  O(b^(d/2))
worst  :  O(b^(d/2))

b = branching factor
d = max search depth

(ref)
http://www.cs.utsa.edu/~bylander/cs5233/a-b-analysis.pdf
http://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning
"Algorithms in a nutshell" Heineman et al, O'Reilly

1. 2014-08-28 10:46:26 |
2. Category : algo
3. Page View: