Unverified commit b505f7d4, authored by Huang Zhengjie, committed by GitHub

Merge pull request #3 from PaddlePaddle/master

Merge From paddle/PGL
<img src="./docs/source/_static/logo.png" alt="The logo of Paddle Graph Learning (PGL)" width="320">
[DOC](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/instruction.html) | [中文](./README.zh.md)
[DOC](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/quick_start/instruction.html) | [中文](./README.zh.md)
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
@@ -76,12 +76,12 @@ Because of the different node types on the heterogeneous graph, the message deli
## Large-Scale: Support distributed graph storage and distributed training algorithms
In most cases of large-scale graph learning, we need distributed graph storage and distributed training support. As shown in the following figure, PGL provides a general solution for large-scale training: we adopted [PaddleFleet](https://github.com/PaddlePaddle/Fleet) as our distributed parameter server, which supports large-scale distributed embeddings, together with a lightweight distributed storage engine, so tcan easily set up large-scale distributed training algorithms on MPI clusters.
In most cases of large-scale graph learning, we need distributed graph storage and distributed training support. As shown in the following figure, PGL provides a general solution for large-scale training: we adopted [PaddleFleet](https://github.com/PaddlePaddle/Fleet) as our distributed parameter server, which supports large-scale distributed embeddings, together with a lightweight distributed storage engine, so it can easily set up large-scale distributed training algorithms on MPI clusters.
<img src="./docs/source/_static/distributed_frame.png" alt="The distributed frame of PGL" width="800">
## Highlight: Tons of Models
## Model Zoo
The following are 13 graph learning models that have been implemented in the framework. See the details [here](https://pgl.readthedocs.io/en/latest/introduction.html#highlight-tons-of-models).
|Model | feature |
@@ -125,6 +125,8 @@ pip install pgl
PGL is developed and maintained by the NLP and Paddle teams at Baidu.
E-mail: nlp-gnn[at]baidu.com
## License
PGL uses Apache License 2.0.
<img src="./docs/source/_static/logo.png" alt="The logo of Paddle Graph Learning (PGL)" width="320">
[Documentation](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/instruction.html) | [English](./README.md)
[Documentation](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/quick_start/instruction.html) | [English](./README.md)
Paddle Graph Learning (PGL) is an efficient and easy-to-use graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
@@ -28,7 +28,7 @@ Paddle Graph Learning (PGL) is an efficient and easy-to-use graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Padd
    return fluid.layers.sequence_pool(msg, "sum")
```
Although DGL applies kernel fusion to optimize common aggregation functions such as sum and max with scatter-gather, for **complex user-defined functions** its Degree Bucketing algorithm processes the different buckets serially and does not fully exploit the GPU for acceleration. In PGL, by contrast, message passing based on LodTensor can make full use of GPU parallelism: with complex user-defined functions, PGL ran up to 13x faster than DGL in our experiments. Even without the scatter-gather optimization, PGL still performs efficiently. Of course, we also provide scatter-optimized aggregation functions.
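To make this concrete: a user-defined reduce function in PGL receives all messages arriving at one destination node as a single LoDTensor sequence, so one pooling op aggregates every node's inbox in parallel. A minimal sketch follows; the send-function signature and the feature name `h` mirror PGL's send/recv pattern, but treat the exact names as assumptions:

```python
import paddle.fluid as fluid

def copy_send(src_feat, dst_feat, edge_feat):
    # Message function: every edge forwards the source node's
    # feature "h" unchanged ("h" is an assumed feature name).
    return src_feat["h"]

def sum_recv(msg):
    # Reduce function: messages for the same destination node form
    # one LoDTensor sequence, pooled in parallel on the GPU.
    return fluid.layers.sequence_pool(msg, pool_type="sum")
```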
### Performance Tests
@@ -75,12 +75,12 @@ Paddle Graph Learning (PGL) is an efficient and easy-to-use graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Padd
<img src="./docs/source/_static/distributed_frame.png" alt="The distributed frame of PGL" width="800">
## Highlight: Richness - Covering Most Graph Learning Networks in the Industry
## Richness - Covering Most Graph Learning Networks in the Industry
The framework already ships with the following thirteen graph learning models
The framework already ships with the following thirteen graph learning models. See the details [here](https://pgl.readthedocs.io/en/latest/introduction.html#highlight-tons-of-models)
| Model | Feature |
|---|---|--- |
|---|---|
| GCN | Graph convolutional network |
| GAT | Attention-based graph convolutional network |
| GraphSage | Large-scale graph convolutional network with neighbor sampling |
@@ -121,6 +121,8 @@ pip install pgl
PGL is developed and maintained by the NLP and Paddle teams at Baidu.
Contact E-mail: nlp-gnn[at]baidu.com
## License
PGL uses Apache License 2.0.
docs/source/_static/logo.png (image changed: 50.4 KB -> 45.2 KB)
@@ -96,7 +96,7 @@ In most cases of large-scale graph learning, we need distributed graph storage a
<div/>
## Highlight: Tons of Models
## Model Zoo
The following are 13 graph learning models that have been implemented in the framework.
|Model | feature |
......
@@ -95,7 +95,7 @@ After defining the GCN layer, we can construct a deeper GCN model with two GCN l
```python
output = gcn_layer(gw, gw.node_feat['feature'],
                   hidden_size=8, name='gcn_layer_1', activation='relu')
output = gcn_layer(gw, output, hidden_size=2,
output = gcn_layer(gw, output, hidden_size=1,
                   name='gcn_layer_2', activation=None)
```
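For context, `gcn_layer` here is the helper defined earlier in the tutorial on top of the graph wrapper's send/recv interface. A minimal sketch of such a layer, modeled on the quick-start example (the exact signatures and names are assumptions, not a verbatim copy):

```python
import paddle.fluid as fluid

def gcn_layer(gw, feature, hidden_size, name, activation):
    # gw: a PGL GraphWrapper; feature: the node feature tensor.
    def send_func(src_feat, dst_feat, edge_feat):
        return src_feat['h']  # each edge forwards the source feature

    def recv_func(msg):
        # sum all messages arriving at each destination node
        return fluid.layers.sequence_pool(msg, pool_type='sum')

    msg = gw.send(send_func, nfeat_list=[('h', feature)])
    output = gw.recv(msg, recv_func)
    # linear transform plus optional nonlinearity
    return fluid.layers.fc(output, size=hidden_size,
                           bias_attr=False, act=activation, name=name)
```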
......
@@ -108,9 +108,7 @@ def build_complied_prog(train_program, model_loss):
    compiled_prog = F.compiler.CompiledProgram(
        train_program).with_data_parallel(
            loss_name=model_loss.name,
            build_strategy=build_strategy,
            exec_strategy=exec_strategy)
            loss_name=model_loss.name)
    return compiled_prog
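Put together, the helper after this change reduces to the following sketch (assuming the module imports `paddle.fluid as F`, as the `F.compiler` reference above suggests):

```python
import paddle.fluid as F

def build_complied_prog(train_program, model_loss):
    # The explicit build_strategy/exec_strategy arguments are dropped,
    # so with_data_parallel falls back to its default strategies.
    compiled_prog = F.compiler.CompiledProgram(
        train_program).with_data_parallel(loss_name=model_loss.name)
    return compiled_prog
```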
......
@@ -253,10 +253,22 @@ def sample_subset_with_eid(list nids, list eids, long long maxdegree, shuffle=Fa
@cython.boundscheck(False)
@cython.wraparound(False)
def skip_gram_gen_pair(vector[long long] walk, long win_size=5):
def skip_gram_gen_pair(vector[long long] walk_path, long win_size=5):
"""Return node paris generated by skip-gram algorithm.
This function will auto remove the pair which src node is the same
as dst node.
Args:
walk_path: List of nodes as a walk path.
win_size: the windows size used in skip-gram.
Return:
A tuple of (src node list, dst node list).
"""
    cdef vector[long long] src
    cdef vector[long long] dst
    cdef long long l = len(walk)
    cdef long long l = len(walk_path)
    cdef long long real_win_size, left, right, i
    cdef np.ndarray[np.int64_t, ndim=1] rnd = np.random.randint(1, win_size+1,
                                                                dtype=np.int64, size=l)
@@ -270,15 +282,23 @@ def skip_gram_gen_pair(vector[long long] walk, long win_size=5):
        if right >= l:
            right = l - 1
        for j in xrange(left, right+1):
            if walk[i] == walk[j]:
            if walk_path[i] == walk_path[j]:
                continue
            src.push_back(walk[i])
            dst.push_back(walk[j])
            src.push_back(walk_path[i])
            dst.push_back(walk_path[j])
    return src, dst
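A pure-Python sketch of the same pair generation may be easier to read than the Cython (the lower clamp of `left` at 0 is elided in the hunk above, so it is an assumption here):

```python
import numpy as np

def skip_gram_gen_pair_py(walk_path, win_size=5):
    # Illustrative Python equivalent of the Cython routine above:
    # each center i draws a window radius uniformly from [1, win_size],
    # and pairs with src == dst are skipped.
    src, dst = [], []
    l = len(walk_path)
    rnd = np.random.randint(1, win_size + 1, size=l)
    for i in range(l):
        left = max(i - rnd[i], 0)        # assumed lower clamp
        right = min(i + rnd[i], l - 1)   # upper clamp, as in the hunk
        for j in range(left, right + 1):
            if walk_path[i] == walk_path[j]:
                continue
            src.append(walk_path[i])
            dst.append(walk_path[j])
    return src, dst
```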
@cython.boundscheck(False)
@cython.wraparound(False)
def alias_sample_build_table(np.ndarray[np.float64_t, ndim=1] probs):
"""Return the alias table and event table for alias sampling.
Args:
porobs: A list of float numbers as the probability.
Return:
A tuple of (alias table, event table).
"""
    cdef long long l = len(probs)
    cdef np.ndarray[np.float64_t, ndim=1] alias = probs * l
    cdef np.ndarray[np.int64_t, ndim=1] events = np.zeros(l, dtype=np.int64)
......
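The visible setup lines match the standard alias method: probabilities are scaled by `l`, and entries above 1 donate mass to entries below 1. Since the construction loop is elided above, here is a pure-Python sketch of that scheme (Vose's method) plus an O(1) draw; treat it as a reconstruction, not the actual Cython body:

```python
import numpy as np

def alias_build_py(probs):
    # Illustrative alias-table construction (Vose's method).
    l = len(probs)
    alias = np.asarray(probs, dtype=np.float64) * l  # scaled probabilities
    events = np.zeros(l, dtype=np.int64)             # donor event per slot
    small = [i for i in range(l) if alias[i] < 1.0]
    large = [i for i in range(l) if alias[i] >= 1.0]
    while small and large:
        s, g = small.pop(), large.pop()
        events[s] = g                  # slot s is topped up by event g
        alias[g] -= 1.0 - alias[s]     # g donates exactly the shortfall
        (small if alias[g] < 1.0 else large).append(g)
    return alias, events

def alias_draw_py(alias, events):
    # O(1) sampling: pick a slot uniformly, then keep it or its donor.
    i = np.random.randint(len(alias))
    return i if np.random.random() < alias[i] else events[i]
```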