Yukuan's Blog: Graph-based Modeling on Python

Agent-based modeling 的電腦實驗，最核心的架構不外乎一個大迴圈（super loop）包裹著一群規則。大迴圈每跑一輪，系統就更新一次狀態，就如同時鐘的滴答（tick）聲般。通常系統每次滴答都會收集一次統計資料。這類實驗，有許多現成的 famework 可用，如最經典的 Swarm 及其後進 Repast ，還有我模仿 Repast ，自己搞的一個，它們都提供了 start, pause, stop 等流程控制的介面。

模擬複雜網路，也可以套用 Agent-based modeling 架構。不過諸如網路的群聚度（clustering coefficient, C）及網路特徵的路經長度（characteristic path length, L）等統計數據計算需耗費的時間，隨著網路的規模成長很快，所以不適合運作太頻繁。但我們卻得靠這些統計數據來判斷網路是否收斂、試驗是否該終止了，想試驗各個參數排列組合時，更是雪上加霜。

透過 Intermediate File 是很直覺的解法。程式每次執行都餵入一個輸入檔來初始網路結構，並以參數決定 super loop 要跑幾次。統計數據只在程式要結束時計算，然後將網路狀態及統計數據吐給輸出檔。下次要執行，就以這個輸出檔當作新的輸入，然後一樣決定要跑幾圈，最後得到另一個輸出檔。採取接力的方式，不用每次都從頭跑，不會浪費先前程式執行的時間。

Multithread 是另一個可行作法。採用 Agent-based modeling framework 的方式，一個 thread 跑核心的規則。另一個 thread 則控制試驗的流程，讓 user 決定何時該暫停下來，計算統計數據，然後決定是否繼續執行。

Dynamic Programming Language 是這次的正解。利用動態程式語言，如 Python ，可以很方便於執行時期控制模擬試驗的程式流程。我就是採用 Python ，搭配 NetworkX 來處理 Graph ，並以 matplotlib 來繪製圖表。接下來就看看如何架設這個試驗平台：

下載 Python Enthought Edition 的 installer 。它是 Python 的加強版，除了 Python 基本工具外，還整合了許多有用的工具，例如 matplotlib 。
到這裡下載 NetworkX 的 installer 。
依序執行這兩個 installer 。

裝好了後，就以 Emergence of a small world from local interactions: modeling acquaintance networks 裡描述的實驗來操刀，試寫第一個 Python 程式：

"""
Simulation for the verifying of "Emergence of a Small World from Local
Interactions: Modeling Acquaintance Network", Davidsen J. et al. 2002.
 
Public Variables:
    - Iters: the interations of loop, used in loop()
 
Private Variables:
    - _G: graph of the acquaintance network
    - _N: the total of nodes
    - _p: the death-and-birth probability, used in _step2()
    - _run: a generator returned by loop()
 
Usage
=====
 
This module must run on a python shell with interative mode; and call
public functions e.g. runnext().
 
Example
-------
You can simply run this module with default settings.
 
    > python -i acq2002.py
    >>> runnext(10)         # loops 10 times of Iters time-steps.
    ...
 
Setting parameters is allowed.
 
    >>> start(1000, 0.04)   # starts the simulation with new N, and p
    >>> Iters=10000         # sets the iterations of each loop
    >>> runnext(5)          # loops 5 times of 10000 time-steps.
    ...
    >>> Iters=100000        # sets the iterations of each loop
    >>> runnext(2)          # loops another 2 times of 100000 time-steps
    ...
 
After above, you may want to save the result.
 
    >>> savepkfig()
    >>> savenet()
 
"""
__author__ = "Jiang Yu-Kuan, yukuan.jiang(at)gmail.com"
__date__ = "2006/08/22~2006/10/02"
__revision__ = "1.10"
 
import random as rnd
import networkx as nx # Must import this as a name to avoid namespace collision!
 
 
def _create():
    """Create an empty acquaintance network.
 
    Adds _N nodes to _G, and _G contains no any edges.
    """
    global _G
    _G = nx.Graph()
    _G.add_nodes_from(xrange(_N))  # adds n nodes into _G
 
 
def _step1():
    """Friend making of two persons.
 
    One randomly chosen person picks any two his acquaintances and
    introduces them to each another. If they have not met before, a new
    link between them is formed. In case the person chosen has less than
    two acquaintances, he introduces himself to one other random person.
    """
    u, v = rnd.sample(xrange(_N), 2)
    nb = _G.neighbors(u)
    if len(nb) > 1:
        u, v = rnd.sample(nb, 2)
    _G.add_edge(u, v)
 
 
def _step2(p):
    """Death and Birth of a chosen person.
 
    With probability p, one randomly chosen person is removed from the
    network, including all links connected to this node, and replaced by
    a new person with one randomly chosen acquaintance.
    """
    if rnd.random() < p:
        # sol. 1
        v = rnd.randrange(_N)
        _G.delete_node(v)
        _G.add_node(v)
 
        # sol. 2: slower than sol. 1.
        #_G.delete_edges_from(_G.edges(rnd.randrange(_N)))
 
 
def _avgK2():
    """Return the average square of degree of all nodes."""
    kss = [k*k for k in _G.degree(_G.nodes())]
    return sum(kss)/float(_N)
 
 
def _avgSPL():
    """Return the average shortest path length of Graph _G."""
    pathlengths=[]
    for v in _G.nodes():
        # sol. 1
        #spl = nx.single_source_shortest_path_length(_G,v)  # including v to v
        #for pl in spl.values():
        #    pathlengths.append(pl)
 
        # sol. 2
        pathlengths += nx.single_source_shortest_path_length(_G,v).values()
 
    return sum(pathlengths) / float(len(pathlengths)-_N)
 
 
def _stat():
    """Gather statistics for Graph _G."""
    print "N, p = %d, %f" % (_N, _p)
    print "Degree histogram:", nx.degree_histogram(_G)
    print "Avg. degree, <k>:", 2.*_G.number_of_edges()/_N
    print "Avg. square of degree, <k^2>:", _avgK2()
    print "Avg. clustering coefficient, C:", nx.average_clustering(_G)
    print "Avg. shortest path length, L:", _avgSPL()  # spends huge time
 
 
def savepkfig(fn='acq2002pk.png'):
    """Save the p(k) figure.
 
    - fn: file name
    """
    import pylab as mpl
    dh = nx.degree_histogram(_G)
    pk = mpl.array(dh)/float(_N)
 
    mpl.loglog(pk, 'r--')
    mpl.grid(True)
    mpl.gca().xaxis.grid(True, which='minor')
    mpl.xlabel('k')
    mpl.ylabel('p(k)')
    mpl.title('N=%d, p=%d' % (_N, _p) )
    mpl.savefig(fn)
 
 
def savenet(fn='acq2002', fm='GPickle'):
    """Save the acquaintance network.
 
    - fn: file name
    - fm: file format
    """
    save = {'edgelist':nx.write_edgelist,   # edge list
            'adjlist':nx.write_adjlist,     # node adjacency-list
            'yaml':nx.write_yaml,           # YAML text format
            'gpickle':nx.write_gpickle      # Python pickle format.
            }
    fm = fm.lower()
    if fm not in save.keys():
        fm = 'edgelist'
 
    import os
    fn, ext = os.path.splitext(fn)
    ext = ext[1:].lower()
    if ext in save.keys():
        fm = ext
    save[fm](_G, '.'.join([fn, fm]))
 
 
def _loop():
    """Loop _step1 and _step2 of Iters times and gather statistics.
 
    This function uses yield statement and returns the total iterations of loop
    """
    import time
    t_start = time.time()  # records the start time
    t_ttl = 0  # clears the total time spent
    i_cnt = i_ttl = 0  # clears the loop counter and total loops.
 
    _create()  # create an empty network
 
    while 1:
        i_cnt += 1
 
        _step1()
        _step2(_p)
 
        if i_cnt == Iters:
            i_ttl += i_cnt
            i_cnt = 0
 
            _stat()
 
            t_inc = time.time() - t_start
            t_ttl += t_inc
            print "Time spent (increment,total): (%f,%f)" % (t_inc, t_ttl)
            yield i_ttl
            t_start = time.time()
 
 
def runnext(n=1):
    """Call _run.next() of n times."""
    for i in xrange(n):
        print _run.next()
 
 
def start(N=100, p=0.04):
    """Start the module.
 
    - N: the total of nodes of the network
    - p: the death-and-birth probability
    """
    global _N, _p, _run
    _N, _p = N, p
    _run = _loop()  # gets a generator
 
 
Iters = 100000
 
if __name__ == "__main__":
    # Uses Psyco if available
    try:
        import psyco
        psyco.full()
        print "Psyco: OK"
    except ImportError:
        print "Psyco: FAIL"
 
    start(100, .04)
 
    """
    import profile, pstats
    profile.run('runnext()', 'acq2002.prof')
 
    s = pstats.Stats('acq2002.prof')
    s.sort_stats('time','name').print_stats()
    """

行 203~211 的 start() 除了測試這個模組外，也用作主程式。
行 211 讓我們取得 generator 給 _run 。
行 214 設定每次 _run.next() 要跑幾輪 super loop ，每 _run.next() 一次，才統計一次數據。
行 225 執行 start() ，並指定不同的 node 數 N 及死生發生的機率 p 。
這個模組執行後，可以在 Python shell 下 _run.next() ，每 _run.next() 一次，大迴圈就跑 Iters 輪。此外，還可以在 Python shell 中改變 Iters 的次數。
行 197~200 的 runnext() 讓 Python shell 下，調用多次 _run.next() 更方便。
行 217~223 使用 Psyco 來加速。

註：

NetworkX 的用法可以參考其 Tutorial 及 API Reference 。
Python 的 random 模組說明可以參考這裡。
要以 matplotlib 繪製圖表，可以參考其 Tutorial 及 Screenshots 。其刻意模仿 Matlab 的用法，用來倍感親切。
其他 Python 的說明文件可以參考 Python Tutorials 。

6 comments:

Gardener-Sinya said...: 哇~~

好特別的圖啊!!
^^; 8/31/2006 11:27 AM
York said...: To sinya-planter:

只是為將來自己論文要跑的模擬實驗作一下熱身 ^___^; 8/31/2006 4:30 PM
York said...: 今天為程式作了更新，請參考內文。; 9/04/2006 9:41 PM
York said...: 昨天學妹 Email 了 graph-tool 的 link 過來。

細看之下發現其功能還滿完整的，且可以秀出漂亮的統計圖表， kernel 更是使用先前稍微介紹過的 Boost Graph Library ，程式執行效率應該會比 NetworkX 好很多。

美中不足的是，其使用方式不合我胃口，且網上的說明文件也不是很完整。等有機會，農閒時再來玩玩。; 9/13/2006 2:52 PM
York said...: 經過多次的重構，這段 Python code，越來越 Pythonic 了。; 9/21/2006 4:09 AM
York said...: 這一、兩天測試了 Psyco ，它可以讓這個採用 Graph 的程式執行速度提昇近一倍。（參考 init 副程式）

可惜對 XGraph 的版本沒有明顯效益。

此外，聽說 for .Net 的 IronPython 可改善 Python 程式執行效率，有 1.7 倍的乘數。也許以後再實際試試。; 9/23/2006 5:41 PM

Yukuan's Blog

Tuesday, August 29, 2006

Graph-based Modeling on Python

6 comments:

Recent Posts

Recent Comments

Blog Labels