graph dfs bfs

########################
####      graph     ####
########################

graph is yet another data structure.

a graph consists of;
- vertex (vertices in plural) : aka node
- edge : aka link between vertices

##
## undirected VS directed
##

it's about whether edges have direction. if undirected, v1 -> v2 and v2 -> v1 are treated the same.

##
## weighted graph:
##

each edge comes with a cost (can be a negative number of even non-numeric, e.g. negative val representing profit/loss, or name of the path between two stations, etc)

now the term "the shortest path" can mean either "the shortest edge count" or "the minimum total cost"

##
## hypergraph:
##

it's an elaborate extension of edge concept where an edge can link more than two vertices, like a sub-grouping of vertices within a group of vertices.

#############################################################
####     underlying structures for implementing graph    ####
#############################################################

(0) objects and pointers (simple enough, LL is a graph in this context)

lets say we have a directed graph with V(v0-v4)

v0-v2 (with weight 16)
v0-v3 (with weight 23)
v1-v4 (with weight 14)
v3-v2 (with weight 77)

we use an array of lists as below.

v0 - v2(16) - v3(23)
v1 - v4(14)
v2
v3 - v2(77)
v4

====> if undirected, then add entries for both ways as below.
====> (hence, space cost doubles for undirected graph using adjacency list)

v0 - v2(16) - v3(23)
v1 - v4(14)
v2 - v0(16) - v3(77)
v3 - v0(23) - v2(77)
v4 - v1(14)

====> there are ways to reduce the operational costs (both time & space) for adjacency list based graph.
(a) sort each list, which can be useful depending on the input pattern
(b) aggregate to a range if contiguous input data set.
e.g.
v0 - v3 - v4 - v5 - v6 - v9 - v10 - v11 - v16 - v17 - v18
// do
v0 - v{3-6} - v{9-11} - v{16-18}

we use an N-by-N matrix A, where index A[i][j] represents the weight of the edge v(i) -> v(j)

v0 v1 v2 v3 v4
v0 -  -  16 23 -
v1 -  -  -  -  14
v2 -  -  -  -  -
v3 -  -  77 -  -
v4 -  -  -  -  -

=====> similarly, if undirected, then just put the weight entries in both ways. this does not incur any extra storage space.
=====> obviously, an N-by-N two dimensional matrix is very inefficient space-wise if the graph is "sparse".

########################################
####    Depth first search (DFS)    ####
########################################

a recursive back-tracking method to exchaustively search each vertex of the graph. needs a source vertex to begin with, and then after having traversed all the vertices, we will know the path, if it exists, between the source vertex and the target destination vertex, which is NOT necessarily the shortest path.

assumption is each vertex from the input graph G is accessble/iterable.

##
## logic flow
##

lets say G(V,E), s = source vertex
and to keep track of which vertex we already checked, we progressively mark each vertex WHITE, GRAY, BLACK

WHITE = not checked at all
GRAY = the vertex first visited
BLACK = DFS done on the vertex

also,

d[v] = ++counter; // order in which v is marked GRAY
f[v] = ++counter; // track the order for marking vertices BLACK
pred[v] = predesessor vertex to v (theoretically there can be multiple, but we just choose the first one we visit)

------------------------------------// pseudo code taken/modified from "Algorithms in a nutshell" by Heineman, et al.

DFS(G,s)

foreach v (V)     // initialization
{
d[v] = f[v] = pred[v] = -1;
color[v] = WHITE;
}
counter_d = 0;
counter_f = 0;

// back-track starts
dfs_visit(s);

// to cover vertices unconnected from s
foreach v (V)
{
if(color[v] == WHITE)
dfs_visit(v);
}

dfs_visit(u)

color[u] = GRAY;
d[u] = ++counter_d;    // means we start from 1

foreach v (immediate neighbors of u)
{
if(color[v] == WHITE)
{
pred[v] = u;
dfs_visit(v);   // recursive call
}
}

color[u] = BLACK;
f[u] = ++counter_f;

--------------------------------------

===> as you can see above, dfs_visit(v) recursively traverses all vertices in DFS back-track way. but not guaranteeing the path we identify between two vertices is the shortest possible path (in terms of edge count).

===> outputs are three arrays:  pred[], d[], f[]  that give you good information.
===> e.g. you pick some v as source, and other v as target, just keep tracing pred[t] till you reach s.
===> d[] & f[] tell you exactly how each vertex was DFS traversed.

NOTE: DFS works for both directed and undirected graphs.

##
## complexity : O(V+E)
##

initialization iterates the entire set of vertices: O(V)

we iterate thru all vertices and run dfs_visit() on each vertex exactly once. that is O(V)

within each dfs_visit(v), we check all of v's immediate neighbors, which is O(E)

thus in total, O(2V+E) = O(V+E)

this holds true for worst/average/best cases.

#########################################
####    Breadth first search (BFS)   ####
#########################################

given a source vertex s in a graph G(V,E), BFS visits all vertices that are k edges away, then k+1 edges away vertices, then k+2, and so on. (hence the name breadth) until no more vertices to reach from s.

in other words, unlike DFS, BFS will not visits vertices unconnected from s.

NOTE: works for both directed and undirected graphs.

##
## logic flow
##

lets say G(V,E) and source vertex s.

we use the same concept of WHITE, GRAY, BLACK from DFS.

also,

pred[v] = predesessor vertext to v
dist[v] = distance(edge count) from vertex s. dist[s] = 0.

------------------------- // taken/modified from Heineman book

BFS(G,s)

// initialization
foreach v (V)
{
pred[v] = dist[v] = -1;
color[v] = WHITE;
}

color[s] = GRAY;
dist[s] = 0;

Q = empty Queue;

Q.enqueue(s);

while( !Q.empty() )
{

foreach v (immediate neighbors of u)
if(color[v] == WHITE)
{
dist[v] = dist[u] + 1;
pred[v] = u;
color[v] = GRAY;
Q.enqueue(v);
}

Q.dequeue(); // popping u out
color[u] = BLACK;
}

----------------------------------

===> dist[v] will contain the shortest path edge count from s, which you can trace via pred[]
===> unreachable vertices from s will have pred[v] = -1

##
## complexity
##

space complexity: O(V)   // size of queue
----
time complexity: O(V+E)

initialization iterates the entire V, thus O(V)

each vertex is enqueued/dequeued exactly once, thus O(V)
and before dequeueing a vertex, we traverse all its immediate neighbors, thus O(E)

thus in total, O(2V+E) = O(V+E)

1. 2014-08-14 02:55:44 |
2. Category : algo
3. Page View: