纸上谈兵: 拓扑排序

Vamei

发布于 2018-01-18 16:08:25

8500

《文明》是一款风靡20多年的回合制策略游戏，由Sid Meier开发。《文明》结构宏大，内容丰富，玩法多样，游戏性强，称得上是历史上最伟大的游戏。在文明中，你可以选择某个文明的，从部落开始发展，在接下来的几千年的历史中，发展科技、开荒拓野、发动战争等等。游戏在保持自由度的前提下，对各个社会文明的发展顺序有很好的仿真性，让玩家仿佛置身于历史长河，坐观文明的起落。美国的一些大学的历史系甚至于使用该游戏作为教学工具。

(作为《文明》的忠实爱好者，多少个昼夜耗费在一张张地图上啊。好吧，是为了学习历史。)

“科技树”

《文明》中的科技树

游戏有一个“科技树”系统，即你可以按照该图所示的顺序来发展科技。“科技树”是这样一个概念: 为了研发某个科技，我们必须已经掌握了该科技的所有前提科技(prerequisite)。科技树中有一些箭头，从A科技指向B科技，那么A就是B的前提科技。比如，根据上面的科技树，为了研发物理(Physics)，我们必须已经掌握了化学(Chemistry)和天文学(Astronomy)。而为了研发化学(Chemistry)，我们又必须掌握了火药(Gunpowder)。如果一个科技没有其它科技指向，那么就不需要任何前提科技就可以研发，比如图中的封建主义(Feudalism)。如果同一时间只能研发一项科技，那么玩家的科技研发就是一个序列，比如封建主义，工程(Engineering)，发明(Invention)，火药，一神教(monotheism)，骑士制度(Chivalry)…… 有些序列是不合法的，比如工程，发明，火药……，在研发的“发明”时，并没有满足“封建主义”的前提条件。

一个有趣的问题是，如何找到合法的序列呢？当我们在设计计算机玩家时，很可能需要解决这样一个问题。

图(graph)中的拓扑排序算法(Topological Sort)可以给出一个合法序列。虽然在游戏中被称为“科技树”，但“科技树”并不符合数据结构中的树结构。在数据结构中，树的每个节点只能由一个父节点，整个树只有一个根节点。因此，“科技树”是一个不折不扣的图结构。此外，该树是一个有向的无环(acyclic)图。图中不存在环 (cycle, 环是一个长度大于0的路径，其两端为同一节点)。

拓扑排序

拓扑排序利用了一个事实，即在一个无环网络中，有一些节点是没有箭头指入的，比如科技树中的一神教、封建主义、工程。它们可以作为序列的起点。

选择没有箭头指入的节点中的任一个，可以作为合法序列的起点，放入序列。
当我们将某个节点放入序列后，去掉该节点以及从该节点指出的所有箭头。在新的图中，选择没有箭头指向的节点，作为序列中的下一个点。
重复上面的过程，直到所有的节点都被放入序列。

对于每个节点，我们可以使用一个量入度(indegree)，用来表示有多少个箭头指入该节点。当某个节点被删除时，图发生变化，我们需要更新图中节点的入度。

为了方便，我将“科技树”中的节点编号，为了符合C语言中的传统，编号从0开始。下面是对应关系:

0	1	2	3	4	5	6	7	8	9
一神教	神学	印刷术	民主	自由艺术	骑士制度	音乐理论	银行	经济学	封建主义
10	11	12	13	14	15	16	17	18	19
教育	天文学	航海术	物理	重力理论	发明	火药	化学	磁学	工程
20	21
冶金	军事传统

我们根据编号，绘制上述图(graph)。我同时用三种颜色，来表示不同点的入度(indegree):

我们使用邻接表的数据结构(参考纸上谈兵: 图)，来实现图。构建代码如下:

/* By Vamei */
#include <stdio.h>
#include <stdlib.h>

#define NUM_V 22

typedef struct node *position;

/* node */
struct node {
    int element;
    position next;
};

/* 
 * operations (stereotype)
 */
void insert_edge(position, int, int);
void print_graph(position graph, int nv);

/* for testing purpose */
void main()
{
    struct node graph[NUM_V];
    int i;

    // initialize the vertices
    for(i=0; i<NUM_V; i++) {
        (graph+i)->element = i;
        (graph+i)->next    = NULL;
    }

    // insert edges
    insert_edge(graph,0,1);
    insert_edge(graph,0,5);
    insert_edge(graph,1,2);
    insert_edge(graph,1,10);
    insert_edge(graph,2,3);
    insert_edge(graph,3,4);
    insert_edge(graph,7,8);
    insert_edge(graph,9,5);
    insert_edge(graph,9,15);
    insert_edge(graph,10,6);
    insert_edge(graph,10,7);
    insert_edge(graph,10,11);
    insert_edge(graph,11,12);
    insert_edge(graph,11,13);
    insert_edge(graph,13,14);
    insert_edge(graph,13,18);
    insert_edge(graph,15,16);
    insert_edge(graph,16,17);
    insert_edge(graph,17,13);
    insert_edge(graph,17,20);
    insert_edge(graph,19,15);
    insert_edge(graph,20,21);

    print_graph(graph,NUM_V);
}

/* print the graph */
void print_graph(position graph, int nv) {
    int i;
    position p;
    for(i=0; i<nv; i++) {
        p = (graph + i)->next;
        printf("From %3d: ", i);
        while(p != NULL) {
            printf("%d->%d; ", i, p->element);
            p = p->next;
        }
        printf("\n");
    }
}

/*
 * insert an edge
 */
void insert_edge(position graph,int from, int to)
{
    position np;
    position nodeAddr;

    np = graph + from;

    nodeAddr = (position) malloc(sizeof(struct node));
    nodeAddr->element = to;
    nodeAddr->next    = np->next;
    np->next = nodeAddr;
}

我们可以根据上面建立的图，来获得各个节点的入度。使用一个数组indeg来储存各个节点的入度，即indeg[i] = indgree of ith node。下面的init_indeg()用于初始化各节点的入度:

// according to the graph, initialize the indegree at each vertice
void init_indeg(position graph, int nv, int indeg[]) {
    int i;
    position p;

    // initialize
    for(i=0; i<nv; i++) {
        indeg[i] = 0;
    }

    // assimilate the graph
    for(i=0; i<nv; i++) {
        p = (graph + i)->next;
        while(p != NULL) {
            (indeg[p->element])++;
            p = p->next;
        }
    }   
}

正如我们之前叙述的，我们需要找到入度为0的节点，并将这些节点放入序列。find_next()就是用于寻找下一个入度为0的节点。找到对应节点后，返回该节点，并将该节点的入度更新为-1:

/* find the vertice with 0 indegree*/
int find_next(int indeg[], int nv) {
    int next;
    int i;
    for(i=0; i<nv; i++) {
        if(indeg[i] == 0) break;
    }
    indeg[i] = -1;

    return i;
}

当我们取出一个节点放入序列时，从该节点出发，指向的节点的入度会减1，我们用update_indeg()表示该操作:

// update indeg when ver is removed
void update_indeg(position graph, int nv, int indeg[], int ver) {
    position p;
    p = (graph + ver)->next;
    while(p != NULL) {
        (indeg[p->element])--;
        p = p->next;
    }
}

有了上面的准备，我们可以寻找序列。使用数组seq来记录序列中的节点。下面的get_seq()用于获得拓扑排序序列:

// return the sequence
void get_seq(position graph, int nv, int indeg[], int seq[]){
    int i;
    int ver;
    for(i = 0; i<nv; i++) {
        ver = find_next(indeg, nv);
        seq[i] = ver;
        update_indeg(graph, nv, indeg, ver);
    }
}

综合上面的叙述，我们可以使用下面代码，来实现拓扑排序:

/* By Vamei */
#include <stdio.h>
#include <stdlib.h>

#define NUM_V 22

typedef struct node *position;

/* node */
struct node {
    int element;
    position next;
};

/* 
 * operations (stereotype)
 */
void insert_edge(position, int, int);
void print_graph(position, int);
void init_indeg(position, int , int *);
void update_indeg(position, int, int *, int);
void get_seq(position, int, int *, int *);
int find_next(int *, int);

/* for testing purpose */
void main()
{
    struct node graph[NUM_V];
    int indeg[NUM_V];
    int seq[NUM_V];
    int i;
    
    // initialize the graph
    for(i=0; i<NUM_V; i++) {
        (graph+i)->element = i; 
        (graph+i)->next    = NULL;
    }

    
    insert_edge(graph,0,1);
    insert_edge(graph,0,5);
    insert_edge(graph,1,2);
    insert_edge(graph,1,10);
    insert_edge(graph,2,3);
    insert_edge(graph,3,4);
    insert_edge(graph,7,8);
    insert_edge(graph,9,5);
    insert_edge(graph,9,15);
    insert_edge(graph,10,6);
    insert_edge(graph,10,7);
    insert_edge(graph,10,11);
    insert_edge(graph,11,12);
    insert_edge(graph,11,13);
    insert_edge(graph,13,14);
    insert_edge(graph,13,18);
    insert_edge(graph,15,16);
    insert_edge(graph,16,17);
    insert_edge(graph,17,13);
    insert_edge(graph,17,20);
    insert_edge(graph,19,15);
    insert_edge(graph,20,21);

    print_graph(graph,NUM_V);
    
    init_indeg(graph,NUM_V,indeg);
    get_seq(graph, NUM_V, indeg, seq);
    for (i=0; i<NUM_V; i++) {
        printf("%d,", seq[i]);
    }
}

void print_graph(position graph, int nv) {
    int i;
    position p;
    for(i=0; i<nv; i++) {
        p = (graph + i)->next;
    printf("From %3d: ", i);
    while(p != NULL) {
        printf("%d->%d; ", i, p->element);
        p = p->next;
    }
    printf("\n");
    }
}
/*
 * insert an edge
 */
void insert_edge(position graph,int from, int to) 
{
    position np;
    position nodeAddr;
    
    np = graph + from;

    nodeAddr = (position) malloc(sizeof(struct node));
    nodeAddr->element = to;
    nodeAddr->next    = np->next;
    np->next = nodeAddr;    
}

void init_indeg(position graph, int nv, int indeg[]) {
    int i;
    position p;

    // initialize
    for(i=0; i<nv; i++) {
        indeg[i] = 0;
    }

    // update
    for(i=0; i<nv; i++) {
        p = (graph + i)->next;
        while(p != NULL) {
            (indeg[p->element])++;
            p = p->next;
        }
    }  
}

// update indeg when ver is removed
void update_indeg(position graph, int nv, int indeg[], int ver) {
    position p;
    p = (graph + ver)->next;
    while(p != NULL) {
        (indeg[p->element])--;
        p = p->next;
    }
}

/* find the vertice with 0 indegree*/
int find_next(int indeg[], int nv) {
    int next;
    int i;
    for(i=0; i<nv; i++) {
        if(indeg[i] == 0) break;
    }
    indeg[i] = -1;

    return i;
}

// return the sequence
void get_seq(position graph, int nv, int indeg[], int seq[]){
    int i;
    int ver;
    for(i = 0; i<nv; i++) {
        ver = find_next(indeg, nv);
        seq[i] = ver;
        update_indeg(graph, nv, indeg, ver);
    }
}

上面算法中有一个遍历所有节点的for循环，而循环中的find_next()函数也会遍历所有的节点。因此，算法复杂度为[$O(|V|^2)$]。

在find_next()中，我们将放入序列的节点入度记为-1。find_next()每次会遍历所有的节点，没有必要遍历入度为-1的节点。为了改善算法复杂度，我们可以使用一个队列(或者栈)来记录入度为0的元素。我们每次从队列中取出一个元素放入拓扑序列，并更新相邻元素的入度。如果该元素的相邻元素的入度变为0，那么将它们放入队列中。可以证明，这样的改造后，算法复杂度为[$O(|V|+|E|)$]