
graph dfs bfs

 
######################## 
####      graph     #### 
######################## 
 
graph is yet another data structure. 
 
a graph consists of: 
- vertex (vertices in plural) : aka node 
- edge : aka link between vertices 
 
## 
## undirected VS directed 
## 
 
it's about whether edges have direction. if undirected, v1 -> v2 and v2 -> v1 are treated the same. 
 
## 
## weighted graph: 
## 
 
each edge comes with a cost (which can be a negative number or even non-numeric, e.g. a negative value representing profit/loss, or the name of the path between two stations, etc.) 
 
now the term "the shortest path" can mean either "the fewest edges" or "the minimum total cost". e.g. if s -> t has weight 5 while s -> a -> t has weights 1 + 1, then s -> t is the shortest by edge count but s -> a -> t is the shortest by total cost. 
 
## 
## hypergraph: 
## 
 
it's an extension of the edge concept where a single edge (a hyperedge) can link any number of vertices, like a sub-grouping of vertices within a group of vertices. 
 
 
############################################################# 
####     underlying structures for implementing graph    #### 
############################################################# 
 
(0) objects and pointers (simple enough; e.g. a linked list is already a graph in this context. see the python sketch below the example graph) 
(1) adjacency list 
(2) adjacency matrix 
 
let's say we have a directed graph with vertices v0-v4 and the following weighted edges: 
 
v0-v2 (with weight 16) 
v0-v3 (with weight 23) 
v1-v4 (with weight 14) 
v3-v2 (with weight 77) 
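 
(0) objects and pointers 
 
to illustrate the objects-and-pointers approach with this example graph: a minimal python sketch, where the Vertex class and connect() helper are just illustrative names (not from any library). each vertex object simply holds references (pointers) to its neighbors together with the edge weights. 
 
-------------------------------------- // python sketch: (0) objects and pointers 
 
class Vertex: 
    def __init__(self, name): 
        self.name = name 
        self.neighbors = []        # list of (Vertex, weight) pairs, i.e. outgoing pointers 
 
def connect(src, dst, weight):     # directed edge src -> dst 
    src.neighbors.append((dst, weight)) 
 
v = {i: Vertex("v%d" % i) for i in range(5)}   # v0 .. v4 
connect(v[0], v[2], 16) 
connect(v[0], v[3], 23) 
connect(v[1], v[4], 14) 
connect(v[3], v[2], 77) 
 
-------------------------------------- 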
 
 
(1) adjacency list 
 
we use an array of lists as below. 
 
v0 - v2(16) - v3(23) 
v1 - v4(14) 
v2 
v3 - v2(77) 
v4 
 
====> if undirected, then add entries for both ways as below. 
====> (hence, space cost doubles for undirected graph using adjacency list) 
 
v0 - v2(16) - v3(23) 
v1 - v4(14) 
v2 - v0(16) - v3(77) 
v3 - v0(23) - v2(77) 
v4 - v1(14) 
 
====> there are ways to reduce the operational costs (both time & space) for adjacency list based graph. 
(a) sort each list, which can be useful depending on the input pattern 
(b) aggregate contiguous vertices into a range, when the input data set is contiguous. 
e.g. 
// instead of 
v0 - v3 - v4 - v5 - v6 - v9 - v10 - v11 - v16 - v17 - v18 
// do 
v0 - v{3-6} - v{9-11} - v{16-18} 
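 
a minimal python sketch of the adjacency list idea, assuming a plain dict that maps each vertex to a list of (neighbor, weight) pairs (no particular library): 
 
-------------------------------------- // python sketch: (1) adjacency list 
 
adj = {v: [] for v in range(5)}            # v0 .. v4 
 
def add_edge(adj, u, v, w, directed=True): 
    adj[u].append((v, w)) 
    if not directed: 
        adj[v].append((u, w))              # undirected: store the entry both ways 
 
add_edge(adj, 0, 2, 16) 
add_edge(adj, 0, 3, 23) 
add_edge(adj, 1, 4, 14) 
add_edge(adj, 3, 2, 77) 
 
# adj == {0: [(2, 16), (3, 23)], 1: [(4, 14)], 2: [], 3: [(2, 77)], 4: []} 
 
-------------------------------------- 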
 
 
(2) adjacency matrix 
 
we use an N-by-N matrix A, where the entry A[i][j] holds the weight of the edge v(i) -> v(j) (and a sentinel value, "-" below, where there is no edge) 
 
   v0 v1 v2 v3 v4 
v0 -  -  16 23 - 
v1 -  -  -  -  14 
v2 -  -  -  -  - 
v3 -  -  77 -  - 
v4 -  -  -  -  - 
 
 
=====> similarly, if undirected, then just put the weight entry both ways (A[i][j] and A[j][i]). this does not incur any extra storage space, since the N-by-N matrix is allocated regardless. 
=====> obviously, an N-by-N two dimensional matrix is very inefficient space-wise if the graph is "sparse". 
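 
a minimal python sketch of the adjacency matrix, using None as the "no edge" marker (an assumption; 0 or infinity are also common choices): 
 
-------------------------------------- // python sketch: (2) adjacency matrix 
 
N = 5 
A = [[None] * N for _ in range(N)]     # A[i][j] = weight of edge v(i) -> v(j), None = no edge 
 
for i, j, w in [(0, 2, 16), (0, 3, 23), (1, 4, 14), (3, 2, 77)]: 
    A[i][j] = w 
    # if undirected, also set A[j][i] = w 
 
-------------------------------------- 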
 
 
 
######################################## 
####    Depth first search (DFS)    #### 
######################################## 
 
a recursive back-tracking method to exhaustively search every vertex of the graph. it needs a source vertex to begin with, and after traversing all the vertices we know a path, if one exists, from the source vertex to the target destination vertex. that path is NOT necessarily the shortest path. 
 
the assumption is that each vertex of the input graph G is accessible/iterable. 
 
 
## 
## logic flow 
## 
 
let's say G(V,E), s = source vertex 
and to keep track of which vertices we have already checked, we progressively mark each vertex WHITE, GRAY, BLACK 
 
WHITE = not checked at all 
GRAY = the vertex first visited 
BLACK = DFS done on the vertex 
 
also, 
 
d[v] = ++counter_d; // order in which v is marked GRAY (discovery order) 
f[v] = ++counter_f; // order in which v is marked BLACK (finish order) 
pred[v] = predecessor vertex of v (theoretically there can be multiple, but we just keep the first one we visit from) 
 
------------------------------------// pseudo code taken/modified from "Algorithms in a nutshell" by Heineman, et al. 
 
DFS(G,s) 

  foreach v (V)     // initialization 
  { 
    d[v] = f[v] = pred[v] = -1; 
    color[v] = WHITE; 
  } 
  counter_d = 0; 
  counter_f = 0; 
 
  // back-track starts 
  dfs_visit(s); 
 
  // to cover vertices unconnected from s 
  foreach v (V) 
  { 
    if(color[v] == WHITE) 
      dfs_visit(v); 
  } 

 
dfs_visit(u) 

  color[u] = GRAY; 
  d[u] = ++counter_d;    // means we start from 1 
 
  foreach v (immediate neighbors of u) 
  { 
    if(color[v] == WHITE) 
    { 
       pred[v] = u; 
       dfs_visit(v);   // recursive call 
    } 
  } 
 
  color[u] = BLACK; 
  f[u] = ++counter_f; 

 
-------------------------------------- 
 
===> as you can see above, dfs_visit() recursively traverses all vertices in a DFS back-tracking way, but it does not guarantee that the path identified between two vertices is the shortest possible path (in terms of edge count). 
 
===> the outputs are three arrays: pred[], d[], f[] that give you useful information. 
===> e.g. pick some vertex s as source and some vertex t as target, then keep tracing pred[t], pred[pred[t]], ... till you reach s to recover a path. 
===> d[] & f[] tell you exactly in what order each vertex was discovered and finished during the DFS. 
 
NOTE: DFS works for both directed and undirected graphs. 
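 
a runnable python version of the pseudo code above, as a minimal sketch: it assumes the graph is given as a dict adjacency list {vertex: [neighbors]} (weights are omitted since DFS ignores them) and uses two counters exactly like the pseudo code. 
 
-------------------------------------- // python sketch: DFS 
 
WHITE, GRAY, BLACK = 0, 1, 2 
 
def dfs(G, s): 
    color = {v: WHITE for v in G}      # initialization 
    d    = {v: -1 for v in G} 
    f    = {v: -1 for v in G} 
    pred = {v: -1 for v in G} 
    counter = {"d": 0, "f": 0} 
 
    def dfs_visit(u): 
        color[u] = GRAY 
        counter["d"] += 1 
        d[u] = counter["d"] 
        for v in G[u]:                 # immediate neighbors of u 
            if color[v] == WHITE: 
                pred[v] = u 
                dfs_visit(v)           # recursive call 
        color[u] = BLACK 
        counter["f"] += 1 
        f[u] = counter["f"] 
 
    dfs_visit(s)                       # back-track starts 
    for v in G:                        # cover vertices unconnected from s 
        if color[v] == WHITE: 
            dfs_visit(v) 
    return pred, d, f 
 
# example: the directed graph from earlier (weights dropped) 
G = {0: [2, 3], 1: [4], 2: [], 3: [2], 4: []} 
pred, d, f = dfs(G, 0) 
 
-------------------------------------- 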
 
## 
## complexity : O(V+E) 
## 
 
initialization iterates the entire set of vertices: O(V) 
 
we iterate thru all vertices and run dfs_visit() on each vertex exactly once. that is O(V) 
 
within each dfs_visit(v), we check all of v's immediate neighbors; summed over all vertices, this touches every edge a constant number of times, which is O(E) in total 
 
thus in total, O(2V+E) = O(V+E) 
 
this holds true for worst/average/best cases. 
 
 
 
######################################### 
####    Breadth first search (BFS)   #### 
######################################### 
 
given a source vertex s in a graph G(V,E), BFS visits all vertices that are 1 edge away from s, then all vertices 2 edges away, then 3, and so on (hence the name breadth), until no more vertices can be reached from s. 
 
in other words, unlike the DFS above (which has the extra loop over all of V), BFS will not visit vertices unconnected from s. 
 
 
NOTE: works for both directed and undirected graphs. 
 
 
## 
## logic flow 
## 
 
lets say G(V,E) and source vertex s. 
 
we use the same concept of WHITE, GRAY, BLACK from DFS. 
 
also, 
 
pred[v] = predecessor vertex of v 
dist[v] = distance(edge count) from vertex s. dist[s] = 0. 
 
------------------------- // taken/modified from Heineman book 
 
BFS(G,s) 

  // initialization 
  foreach v (V) 
  { 
    pred[v] = dist[v] = -1; 
    color[v] = WHITE; 
  } 
 
  color[s] = GRAY; 
  dist[s] = 0; 
 
  Q = empty Queue; 
 
  Q.enqueue(s); 
 
  while( !Q.empty() ) 
  { 
    u = Q.head(); 
 
    foreach v (immediate neighbors of u) 
      if(color[v] == WHITE) 
      { 
        dist[v] = dist[u] + 1; 
        pred[v] = u; 
        color[v] = GRAY; 
        Q.enqueue(v); 
      } 
 
    Q.dequeue(); // popping u out 
    color[u] = BLACK; 
  } 
 

 
---------------------------------- 
 
===> dist[v] will contain the shortest path edge count from s to v, and you can recover the actual path via pred[] 
===> vertices unreachable from s will keep dist[v] = pred[v] = -1 
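 
a runnable python version of the BFS pseudo code, as a minimal sketch: it assumes the same dict adjacency list as in the DFS example and uses collections.deque as the queue. 
 
-------------------------------------- // python sketch: BFS 
 
from collections import deque 
 
WHITE, GRAY, BLACK = 0, 1, 2 
 
def bfs(G, s): 
    color = {v: WHITE for v in G}      # initialization 
    dist  = {v: -1 for v in G} 
    pred  = {v: -1 for v in G} 
 
    color[s] = GRAY 
    dist[s] = 0 
 
    Q = deque() 
    Q.append(s)                        # enqueue s 
 
    while Q: 
        u = Q[0]                       # peek at the head 
        for v in G[u]:                 # immediate neighbors of u 
            if color[v] == WHITE: 
                dist[v] = dist[u] + 1 
                pred[v] = u 
                color[v] = GRAY 
                Q.append(v)            # enqueue v 
        Q.popleft()                    # popping u out 
        color[u] = BLACK 
 
    return pred, dist 
 
# example: shortest path (edge count) from s=0 to t=2 in the earlier graph 
G = {0: [2, 3], 1: [4], 2: [], 3: [2], 4: []} 
pred, dist = bfs(G, 0) 
# dist[2] == 1 and pred[2] == 0, so the path is 0 -> 2 
 
-------------------------------------- 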
 
 
## 
## complexity 
## 
 
space complexity: O(V)   // size of queue 
---- 
time complexity: O(V+E) 
 
initialization iterates the entire V, thus O(V) 
 
each vertex is enqueued/dequeued exactly once, thus O(V) 
and before dequeueing each vertex, we traverse all its immediate neighbors; summed over all vertices, that is O(E) 
 
thus in total, O(2V+E) = O(V+E) 
 
 
