Unverified commit 97c241f3, authored by kirayummy, committed by GitHub

Merge pull request #2 from PaddlePaddle/master

merge
......@@ -23,7 +23,7 @@ repos:
sha: 5bf6c09bfa1297d3692cadd621ef95f1284e33c0
hooks:
- id: check-added-large-files
args: [--maxkb=1024]
args: [--maxkb=4096]
- id: check-merge-conflict
- id: check-symlinks
- id: detect-private-key
......
# Paddle Graph Learning (PGL)
<img src="./docs/source/_static/logo.png" alt="The logo of Paddle Graph Learning (PGL)" width="320">
[DOC](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/instruction.html)
[DOC](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/quick_start/instruction.html) | [中文](./README.zh.md)
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
<img src="https://github.com/PaddlePaddle/PGL/blob/master/docs/source/_static/framework_of_pgl.png" alt="The Framework of Paddle Graph Learning (PGL)" width="800">
<img src="./docs/source/_static/framework_of_pgl.png" alt="The Framework of Paddle Graph Learning (PGL)" width="800">
We provide Python interfaces for storing/reading/querying graph-structured data and two fundamental computational interfaces, the walk-based paradigm and the message-passing paradigm shown in the PGL framework above, for building cutting-edge graph learning algorithms. Combined with the PaddlePaddle deep learning framework, we are able to support both graph representation learning models and graph neural networks, and thus our framework has a wide range of graph-based applications.
The newly released PGL supports heterogeneous graph learning for both the walk-based paradigm and the message-passing paradigm by providing MetaPath sampling and a Message Passing mechanism on heterogeneous graphs. Furthermore, the newly released PGL also supports distributed graph storage and some distributed training algorithms, such as distributed DeepWalk and distributed GraphSAGE. Combined with the PaddlePaddle deep learning framework, we are able to support both graph representation learning models and graph neural networks, and thus our framework has a wide range of graph-based applications.
## Highlight: Efficient and Flexible Message Passing Paradigm
## Highlight: Efficiency - Support Scatter-Gather and LodTensor Message Passing
One of the most important benefits of graph neural networks compared to other models is the ability to use node-to-node connectivity information, but coding the communication between nodes is very cumbersome. At PGL we adopt a **Message Passing Paradigm** similar to [DGL](https://github.com/dmlc/dgl) to help build customized graph neural networks easily. Users only need to write ```send``` and ```recv``` functions to easily implement a simple GCN. As shown in the following figure, for the first step the send function is defined on the edges of the graph, and the user can customize the send function ![](http://latex.codecogs.com/gif.latex?\\phi^e) to send the message from the source to the target node. For the second step, the recv function ![](http://latex.codecogs.com/gif.latex?\\phi^v) is responsible for aggregating messages from different sources with the aggregation function ![](http://latex.codecogs.com/gif.latex?\\oplus).
<img src="https://github.com/PaddlePaddle/PGL/blob/master/docs/source/_static/message_passing_paradigm.png" alt="The basic idea of message passing paradigm" width="800">
<img src="./docs/source/_static/message_passing_paradigm.png" alt="The basic idea of message passing paradigm" width="800">
As shown in the left of the following figure, to adapt general user-defined message aggregate functions, DGL uses the degree bucketing method to combine nodes with the same degree into a batch and then applies an aggregate function ![](http://latex.codecogs.com/gif.latex?\\oplus) on each batch serially. For our PGL UDF aggregate function, we organize the message as a [LodTensor](http://www.paddlepaddle.org/documentation/docs/en/1.4/user_guides/howto/basic_concept/lod_tensor_en.html) in [PaddlePaddle](https://github.com/PaddlePaddle/Paddle), taking the messages as variable-length sequences. And we **utilize the features of LodTensor in Paddle to obtain fast parallel aggregation**.
<img src="https://github.com/PaddlePaddle/PGL/blob/master/docs/source/_static/parallel_degree_bucketing.png" alt="The parallel degree bucketing of PGL" width="800">
<img src="./docs/source/_static/parallel_degree_bucketing.png" alt="The parallel degree bucketing of PGL" width="800">
Users only need to call the ```sequence_ops``` functions provided by Paddle to easily implement efficient message aggregation. For example, use ```sequence_pool``` to sum the neighbor messages.
......@@ -33,14 +33,14 @@ Users only need to call the ```sequence_ops``` functions provided by Paddle to e
```
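The collapsed hunk above contains the `sequence_pool`-based aggregation that also appears verbatim later in this diff; a minimal sketch for reference:

```python
import paddle.fluid as fluid

def recv(msg):
    # sum the variable-length neighbor messages stored in the LodTensor
    return fluid.layers.sequence_pool(msg, "sum")
```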
Although DGL does some kernel fusion optimization for general sum, max and other aggregate functions with scatter-gather, for **complex user-defined functions** with the degree bucketing algorithm, the serial execution for each degree bucket cannot take full advantage of the performance improvement provided by GPUs. However, operations on the PGL LodTensor-based messages are performed in parallel, which can fully utilize GPU parallel optimization. Even without scatter-gather optimization, PGL still has excellent performance. Of course, we still provide built-in scatter-optimized message aggregation functions.
Although DGL does some kernel fusion optimization for general sum, max and other aggregate functions with scatter-gather, for **complex user-defined functions** with the degree bucketing algorithm, the serial execution for each degree bucket cannot take full advantage of the performance improvement provided by GPUs. However, operations on the PGL LodTensor-based messages are performed in parallel, which can fully utilize GPU parallel optimization. In our experiments, PGL can reach up to 13 times the speed of DGL with complex user-defined functions. Even without scatter-gather optimization, PGL still has excellent performance. Of course, we still provide built-in scatter-optimized message aggregation functions.
## Performance
### Performance
We test all the GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs to get average speeds. And we report the accuracy on the test dataset without early stopping.
We test all the following GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs to get average speeds. And we report the accuracy on the test dataset without early stopping.
| Dataset | Model | PGL Accuracy | PGL speed (epoch time) | DGL speed (epoch time) |
| Dataset | Model | PGL Accuracy | PGL speed (epoch time) | DGL 0.3.0 speed (epoch time) |
| -------- | ----- | ----------------- | ------------ | ------------------------------------ |
| Cora | GCN |81.75% | 0.0047s | **0.0045s** |
| Cora | GAT | 83.5% | **0.0119s** | 0.0141s |
......@@ -49,12 +49,64 @@ We test all the GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs t
| Citeseer | GCN |70.2%| **0.0045s** |0.0046s|
| Citeseer | GAT |68.8%| **0.0124s** |0.0139s|
If we use a complex user-defined aggregation like [GraphSAGE-LSTM](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) that aggregates neighbor features with an LSTM while ignoring the order of received messages, the optimized message passing in DGL is forced to degenerate into the degree bucketing scheme, and its speed becomes much slower than the PGL implementation. Performance may vary with the scale of the graph; in our experiments, PGL can reach up to 13 times the speed of DGL.
| Dataset | PGL speed (epoch time) | DGL 0.3.0 speed (epoch time) | Speed up|
| -------- | ------------ | ------------------------------------ |----|
| Cora | **0.0186s** | 0.1638s | 8.80x|
| Pubmed | **0.0388s** |0.5275s | 13.59x|
| Citeseer | **0.0150s** | 0.1278s | 8.52x |
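The GraphSAGE-LSTM implementation itself is not shown in this diff, so the following is only a hedged sketch of what such a user-defined LSTM aggregation could look like with Paddle's sequence ops; the `lstm_recv` name and the choice of pooling the last hidden state are assumptions.

```python
import paddle.fluid as fluid

def lstm_recv(msg, hidden_size):
    # msg is the LodTensor of variable-length neighbor messages;
    # dynamic_lstm expects its input width to be 4 * hidden_size (one slice per gate)
    gate_input = fluid.layers.fc(input=msg, size=hidden_size * 4, bias_attr=False)
    hidden, _ = fluid.layers.dynamic_lstm(input=gate_input, size=hidden_size * 4)
    # keep the last hidden state of each neighbor sequence as the aggregated message
    return fluid.layers.sequence_pool(hidden, "last")
```

Because the messages stay in one LodTensor, this aggregation runs over all destination nodes at once instead of looping over degree buckets.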
## Highlight: Flexibility - Natively Support Heterogeneous Graph Learning
Graphs can conveniently represent the relations between things in the real world, but the categories of things and the relations between them vary widely. Therefore, in a heterogeneous graph, we need to distinguish the node types and edge types in the graph network. PGL models heterogeneous graphs that contain multiple node types and multiple edge types, and can describe the complex connections between different types.
### Support meta path walk sampling on heterogeneous graph
<img src="./docs/source/_static/metapath_sampling.png" alt="The metapath sampling in heterogeneous graph" width="800">
The left side of the figure above shows a shopping social network. Its nodes fall into two categories, users and goods, and its relations include user-user, user-goods, and goods-goods. The right side of the figure shows a simple MetaPath sampling process: given the meta path UPU (user-product-user), sampling yields results like the following
<img src="./docs/source/_static/metapath_result.png" alt="The metapath result" width="320">
On this basis, word2vec-style methods can be introduced to support heterogeneous graph representation learning algorithms such as metapath2vec.
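The PGL sampling API itself is not shown in this diff, so the following toy sketch (all names assumed) only illustrates how a UPU-style walk alternates node types along the meta path:

```python
import random

def metapath_walk(start_node, metapath, neighbors, walk_length):
    # Toy meta-path walk: `metapath` is e.g. ['user', 'product', 'user'] and
    # `neighbors[(src_type, dst_type)]` maps a node to its neighbors of dst_type.
    walk = [start_node]
    step = 0
    while len(walk) < walk_length:
        hop = step % (len(metapath) - 1)
        src_type, dst_type = metapath[hop], metapath[hop + 1]
        candidates = neighbors.get((src_type, dst_type), {}).get(walk[-1], [])
        if not candidates:
            break
        walk.append(random.choice(candidates))
        step += 1
    return walk
```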
### Support Message Passing mechanism on heterogeneous graph
<img src="./docs/source/_static/him_message_passing.png" alt="The message passing mechanism on heterogeneous graph" width="800">
Because a heterogeneous graph contains different node types, message passing also differs by type. As shown on the left of the figure above, the target node has five neighbors belonging to two different node types. As shown on the right, messages from neighbors of different types are aggregated separately and then merged into the final message that updates the target node. On this basis, PGL supports message-passing-based heterogeneous graph algorithms such as GATNE.
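A minimal sketch of this two-stage aggregation (the function name and the merge-by-concatenation choice are assumptions; a complete `HeterGraphWrapper` example appears in the heterogeneous quick start later in this diff):

```python
import paddle.fluid as fluid

def hetero_recv(per_type_msgs):
    # aggregate the messages of each edge type separately ...
    aggregated = [fluid.layers.sequence_pool(msg, "sum") for msg in per_type_msgs]
    # ... then merge them into the final message that updates the target nodes
    return fluid.layers.concat(aggregated, axis=1)
```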
## Large-Scale: Support distributed graph storage and distributed training algorithms
In most large-scale graph learning scenarios, we need distributed graph storage and distributed training support. As shown in the following figure, PGL provides a general solution for large-scale training: we adopt [PaddleFleet](https://github.com/PaddlePaddle/Fleet) as our distributed parameter server, which supports large-scale distributed embeddings, together with a lightweight distributed storage engine, so a large-scale distributed training algorithm can easily be set up on MPI clusters.
<img src="./docs/source/_static/distributed_frame.png" alt="The distributed frame of PGL" width="800">
## Model Zoo
The following are 13 graph learning models that have been implemented in the framework. See the details [here](https://pgl.readthedocs.io/en/latest/introduction.html#highlight-tons-of-models)
|Model | feature |
|---|---|
| GCN | Graph Convolutional Neural Networks |
| GAT | Graph Attention Network |
| GraphSage |Large-scale graph convolution network based on neighborhood sampling|
| unSup-GraphSage | Unsupervised GraphSAGE |
| LINE | Representation learning based on first-order and second-order neighbors |
| DeepWalk | Representation learning by DFS random walk |
| MetaPath2Vec | Representation learning based on metapath |
| Node2Vec | Representation learning combining DFS and BFS |
| Struct2Vec | Representation learning based on structural similarity |
| SGC | Simplified graph convolution neural network |
| GES | Graph representation learning with node features |
| DGI | Unsupervised representation learning based on graph convolution network |
| GATNE | Representation Learning of Heterogeneous Graph based on MessagePassing |
The above models cover three areas: graph representation learning, graph neural networks, and heterogeneous graph learning, where the heterogeneous graph models are themselves divided into graph representation learning and graph neural networks.
## System requirements
PGL requires:
* paddle >= 1.5
* networkx
* paddle >= 1.6
* cython
......@@ -63,15 +115,18 @@ PGL supports both Python 2 & 3
## Installation
pip install pgl
You can simply install it via pip.
```sh
pip install pgl
```
## The Team
PGL is developed and maintained by NLP and Paddle Teams at Baidu
E-mail: nlp-gnn[at]baidu.com
## License
PGL uses Apache License 2.0.
<img src="./docs/source/_static/logo.png" alt="The logo of Paddle Graph Learning (PGL)" width="320">
[DOC](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/quick_start/instruction.html) | [English](./README.md)
Paddle Graph Learning (PGL) is an efficient and easy-to-use graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
<img src="./docs/source/_static/framework_of_pgl.png" alt="The Framework of Paddle Graph Learning (PGL)" width="800">
The newly released PGL introduces support for heterogeneous graphs: MetaPath sampling for heterogeneous graph representation learning and a heterogeneous-graph Message Passing mechanism for message-passing-based heterogeneous graph algorithms, so cutting-edge heterogeneous graph learning algorithms can easily be built with the new interfaces. The latest release also adds distributed graph storage and several distributed graph learning training algorithms, such as distributed DeepWalk and distributed GraphSAGE. Combined with the PaddlePaddle deep learning framework, our framework can cover most graph network applications, including graph representation learning and graph neural networks.
## Highlight: Efficiency - Support Scatter-Gather and LodTensor Message Passing
Compared with other models, the biggest advantage of graph neural network models is that they exploit the connectivity information between nodes, but writing code to model these node connections is very cumbersome. PGL adopts a **Message Passing Paradigm** similar to [DGL](https://github.com/dmlc/dgl) as the interface for building graph neural networks. Users only need to write simple ```send``` and ```recv``` functions to easily implement a simple GCN. As shown in the figure below, the send function is first defined on the edges between nodes, and the user-defined send function ![](http://latex.codecogs.com/gif.latex?\\phi^e) sends the message from the source node to the target node. Then, the recv function ![](http://latex.codecogs.com/gif.latex?\\phi^v) aggregates these messages with the aggregation function ![](http://latex.codecogs.com/gif.latex?\\oplus).
<img src="./docs/source/_static/message_passing_paradigm.png" alt="The basic idea of message passing paradigm" width="800">
As shown in the left figure below, to adapt to user-defined aggregation functions, DGL uses Degree Bucketing to group nodes with the same degree into a block and then applies the aggregation function ![](http://latex.codecogs.com/gif.latex?\\oplus) on each block. For PGL's user-defined aggregation functions, we instead handle the messages as PaddlePaddle [LodTensor](http://www.paddlepaddle.org/documentation/docs/en/1.4/user_guides/howto/basic_concept/lod_tensor_en.html)s, treating a group of messages as variable-length sequences, and **utilize the LodTensor features of PaddlePaddle for fast parallel message aggregation**.
<img src="./docs/source/_static/parallel_degree_bucketing.png" alt="The parallel degree bucketing of PGL" width="800">
Users only need to call PaddlePaddle's sequence-related ```sequence_ops``` functions to implement efficient message aggregation. For example, the snippet below simply uses ```sequence_pool``` to sum the neighbor messages.
```python
import paddle.fluid as fluid
def recv(msg):
return fluid.layers.sequence_pool(msg, "sum")
```
Although DGL uses kernel fusion to optimize common aggregation functions such as sum and max with scatter-gather, for **complex user-defined functions** its Degree Bucketing algorithm only processes the different buckets serially and does not make full use of the GPU for acceleration. In PGL, however, the LodTensor-based message passing can fully exploit GPU parallelism; with complex user-defined functions, PGL can even reach up to 13 times the speed of DGL in our experiments. Even without the scatter-gather optimization, PGL still delivers efficient performance. Of course, we also provide scatter-optimized aggregation functions.
### Performance
We tested all of the following GNN algorithms on a Tesla V100-SXM2-16G, running each for 200 epochs to compute the average speed. The accuracy is computed on the test set, and we did not use an early-stopping strategy.
| Dataset | Model | PGL Accuracy | PGL speed (epoch time) | DGL 0.3.0 speed (epoch time) |
| -------- | ----- | ----------------- | ------------ | ------------------------------------ |
| Cora | GCN |81.75% | 0.0047s | **0.0045s** |
| Cora | GAT | 83.5% | **0.0119s** | 0.0141s |
| Pubmed | GCN |79.2% |**0.0049s** |0.0051s |
| Pubmed | GAT | 77% |0.0193s|**0.0144s**|
| Citeseer | GCN |70.2%| **0.0045s** |0.0046s|
| Citeseer | GAT |68.8%| **0.0124s** |0.0139s|
If we use a complex user-defined aggregation function, for example [GraphSAGE-LSTM](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf), which aggregates the neighbor features of a node with an LSTM while ignoring the order in which the neighbor messages arrive, the message passing used by DGL degenerates into the Degree Bucketing mode; in this case the model implemented in DGL is much slower than the PGL one. The performance varies with the scale of the graph; in our experiments, PGL can even reach up to 13 times the speed of DGL.
| Dataset | PGL speed (epoch time) | DGL 0.3.0 speed (epoch time) | Speed up |
| -------- | ------------ | ------------------------------------ |----|
| Cora | **0.0186s** | 0.1638s | 8.80x|
| Pubmed | **0.0388s** |0.5275s | 13.59x|
| Citeseer | **0.0150s** | 0.1278s | 8.52x |
## Highlight: Flexibility - Natively Support Heterogeneous Graphs
Graphs can conveniently represent the relations between things in the real world, but the categories of things and the relations between them vary widely. Therefore, in a heterogeneous graph, we need to distinguish the node types and edge types in the graph network. PGL models heterogeneous graphs that contain multiple node types and multiple edge types, and can describe the complex connections between different types.
### Support MetaPath walk sampling on heterogeneous graphs
<img src="./docs/source/_static/metapath_sampling.png" alt="The metapath sampling in heterogeneous graph" width="800">
The left side of the figure above shows a shopping social network. Its nodes fall into two categories, users and goods, and its relations include user-user, user-goods, and goods-goods. The right side of the figure shows a simple MetaPath sampling process: given the input meta path UPU (user-product-user), the sampled results are as follows
<img src="./docs/source/_static/metapath_result.png" alt="The metapath result" width="320">
On this basis, word2vec-style methods can be introduced to support heterogeneous graph representation learning algorithms such as metapath2vec.
### Support the Message Passing mechanism on heterogeneous graphs
<img src="./docs/source/_static/him_message_passing.png" alt="The message passing mechanism on heterogeneous graph" width="800">
Because a heterogeneous graph contains different node types, message passing also differs by type. As shown on the left of the figure above, the target node has five neighbors belonging to two different node types. As shown on the right, messages from neighbors of different types are aggregated separately and then merged into the final message that updates the target node. On this basis, PGL supports message-passing-based heterogeneous graph algorithms such as GATNE.
## Highlight: Scalability - Support distributed graph storage and distributed learning algorithms
In large-scale graph learning, we usually need multi-machine graph storage and distributed training. As shown in the following figure, PGL provides a solution for large-scale training: we use [PaddleFleet](https://github.com/PaddlePaddle/Fleet) (which supports large-scale distributed embedding learning) as our parameter server module, together with a simple distributed storage scheme, so a distributed large-scale graph learning method can easily be set up on an MPI cluster.
<img src="./docs/source/_static/distributed_frame.png" alt="The distributed frame of PGL" width="800">
## Richness - Covering most graph learning models in the industry
The following thirteen graph learning models are already implemented in the framework. See the details [here](https://pgl.readthedocs.io/en/latest/introduction.html#highlight-tons-of-models)
| Model | Feature |
|---|---|
| GCN | Graph convolutional network |
| GAT | Attention-based graph convolutional network |
| GraphSage | Large-scale graph convolutional network based on neighborhood sampling |
| unSup-GraphSage | Unsupervised GraphSAGE |
| LINE | Representation learning based on first-order and second-order neighbors |
| DeepWalk | Representation learning by DFS random walk |
| MetaPath2Vec | Representation learning based on metapath |
| Node2Vec | Representation learning combining DFS and BFS |
| Struct2Vec | Representation learning based on structural similarity |
| SGC | Simplified graph convolutional network |
| GES | Graph representation learning with node features |
| DGI | Unsupervised representation learning based on graph convolutional networks |
| GATNE | Heterogeneous graph representation learning based on message passing |
The above models cover three areas: graph representation learning, graph neural networks, and heterogeneous graphs, where the heterogeneous graph part is itself divided into graph representation learning and graph neural networks.
## Requirements
PGL depends on:
* paddle >= 1.6
* cython
PGL supports both Python 2 and 3.
## Installation
You can simply install it via pip.
```sh
pip install pgl
```
## The Team
PGL is developed and maintained by the NLP and Paddle teams at Baidu.
Contact E-mail: nlp-gnn[at]baidu.com
## License
PGL uses Apache License 2.0.
sphinx==2.1.0
mistune
sphinx_rtd_theme
numpy >= 1.16.4
cython >= 0.25.2
paddlepaddle
pgl
pgl.contrib.heter\_graph module: Heterogeneous Graph Storage
================================================================
.. automodule:: pgl.contrib.heter_graph
:members:
:undoc-members:
:show-inheritance:
pgl.contrib.heter\_graph\_wrapper module: Heterogeneous Graph data holders for Paddle GNN.
===========================================================================================
.. automodule:: pgl.contrib.heter_graph_wrapper
:members:
:undoc-members:
:show-inheritance:
......@@ -8,3 +8,6 @@ API Reference
pgl.layers
pgl.data_loader
pgl.utils.paddle_helper
pgl.utils.mp_reader
pgl.contrib.heter_graph
pgl.contrib.heter_graph_wrapper
pgl.utils.mp\_reader module: MultiProcessing reader helper function for Paddle.
=================================================================================
.. automodule:: pgl.utils.mp_reader
:members:
:undoc-members:
:show-inheritance:
......@@ -40,7 +40,7 @@ copyright = '2019, PaddlePaddle'
author = 'PaddlePaddle'
# The full version, including alpha/beta/rc tags
release = '0.1.0.beta'
release = '1.0.1'
# -- General configuration ---------------------------------------------------
......@@ -73,13 +73,12 @@ lanaguage = "zh_cn"
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
html_show_sourcelink = False
#html_logo = 'pgl_logo.png'
html_logo = '_static/logo.png'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
'''
html_theme_options = {
'canonical_url': '',
'analytics_id': 'UA-XXXXXXX-1', # Provided by Google in your dashboard
......@@ -96,4 +95,3 @@ html_theme_options = {
'includehidden': True,
'titles_only': False
}
'''
.. mdinclude:: ../../../examples/dgi/README.md
.. mdinclude:: ../../../examples/distribute_deepwalk/README.md
.. mdinclude:: ../../../examples/distribute_graphsage/README.md
.. mdinclude:: md/gat_examples.md
.. mdinclude:: ../../../examples/gat/README.md
View the Code
=============
examples/gat/train.py
.. literalinclude:: ../../../examples/gat/train.py
:language: python
:linenos:
.. mdinclude:: ../../../examples/GATNE/README.md
.. mdinclude:: md/gcn_examples.md
.. mdinclude:: ../../../examples/gcn/README.md
View the Code
=============
examples/gcn/train.py
.. literalinclude:: ../../../examples/gcn/train.py
:language: python
:linenos:
.. mdinclude:: ../../../examples/ges/README.md
.. mdinclude:: md/graphsage_examples.md
.. mdinclude:: ../../../examples/graphsage/README.md
View the Code
=============
examples/graphsage/train.py
.. literalinclude:: ../../../examples/graphsage/train.py
:language: python
:linenos:
examples/graphsage/reader.py
.. literalinclude:: ../../../examples/graphsage/reader.py
:language: python
:linenos:
examples/graphsage/model.py
.. literalinclude:: ../../../examples/graphsage/model.py
:language: python
:linenos:
.. mdinclude:: ../../../examples/line/README.md
# Building Graph Attention Networks
[Graph Attention Networks \(GAT\)](https://arxiv.org/abs/1710.10903) is a novel architecture that operates on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. Based on PGL, we reproduce the GAT algorithm and reach the same level of indicators as the paper on citation network benchmarks.
### Simple example to build single head GAT
To build a GAT layer, one can use our pre-defined ```pgl.layers.gat``` or just write a GAT layer with the message passing interface.
```python
import paddle.fluid as fluid
def gat_layer(graph_wrapper, node_feature, hidden_size):
def send_func(src_feat, dst_feat, edge_feat):
logits = src_feat["a1"] + dst_feat["a2"]
logits = fluid.layers.leaky_relu(logits, alpha=0.2)
return {"logits": logits, "h": src_feat }
def recv_func(msg):
norm = fluid.layers.sequence_softmax(msg["logits"])
output = msg["h"] * norm
return output
h = fluid.layers.fc(node_feature, hidden_size, bias_attr=False, name="hidden")
a1 = fluid.layers.fc(node_feature, 1, name="a1_weight")
a2 = fluid.layers.fc(node_feature, 1, name="a2_weight")
message = graph_wrapper.send(send_func,
nfeat_list=[("h", h), ("a1", a1), ("a2", a2)])
output = graph_wrapper.recv(recv_func, message)
return output
```
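A hedged usage sketch of the layer above (the `gw` graph wrapper and the `feature` node feature name are assumptions, mirroring the quick-start walkthrough later in this diff):

```python
# assumed: gw is a pgl.graph_wrapper.GraphWrapper holding the graph and a
# node feature named 'feature'
output = gat_layer(gw, gw.node_feat['feature'], hidden_size=8)
```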
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)|
| --- | --- | --- |---|
| Cora | ~83% | 0.0188s | 0.0175s |
| Pubmed | ~78% | 0.0449s | 0.0295s |
| Citeseer | ~70% | 0.0275s | 0.0253s |
### How to run
For example, use GPU to train GAT on the Cora dataset.
```
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset "cora", "citeseer", "pubmed".
- use_cuda: Use GPU if use_cuda is set.
### View the Code
See the code [here](gat_examples_code.html)
# Building Graph Convolutional Network
[Graph Convolutional Network \(GCN\)](https://arxiv.org/abs/1609.02907) is a powerful neural network designed for machine learning on graphs. Based on PGL, we reproduce GCN algorithms and reach the same level of indicators as the paper in citation network benchmarks.
### Simple example to build GCN
To build a GCN layer, one can use our pre-defined ```pgl.layers.gcn``` or just write a GCN layer with the message passing interface.
```python
import paddle.fluid as fluid
def gcn_layer(graph_wrapper, node_feature, hidden_size, act):
def send_func(src_feat, dst_feat, edge_feat):
return src_feat["h"]
def recv_func(msg):
return fluid.layers.sequence_pool(msg, "sum")
message = graph_wrapper.send(send_func, nfeat_list=[("h", node_feature)])
output = graph_wrapper.recv(recv_func, message)
output = fluid.layers.fc(output, size=hidden_size, act=act)
return output
```
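A hedged usage sketch of the layer above (the `gw` graph wrapper and the `feature` node feature name are assumptions, mirroring the quick-start walkthrough later in this diff):

```python
# assumed: gw is a pgl.graph_wrapper.GraphWrapper holding the graph and a
# node feature named 'feature'
output = gcn_layer(gw, gw.node_feat['feature'], hidden_size=8, act='relu')
output = gcn_layer(gw, output, hidden_size=2, act=None)
```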
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)|
| --- | --- | --- |---|
| Cora | ~81% | 0.0106s | 0.0104s |
| Pubmed | ~79% | 0.0210s | 0.0154s |
| Citeseer | ~71% | 0.0175s | 0.0177s |
### How to run
For example, use GPU to train GCN on the Cora dataset.
```
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset "cora", "citeseer", "pubmed".
- use_cuda: Use GPU if use_cuda is set.
### View the Code
See the code [here](gcn_examples_code.html)
# Graph Representation Learning: Node2vec
[Node2vec](https://cs.stanford.edu/~jure/pubs/node2vec-kdd16.pdf) is an algorithmic framework for representational learning on graphs. Given any graph, it can learn continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks. Based on PGL, we reproduce node2vec algorithms and reach the same level of indicators as the paper.
## Datasets
The datasets contain two networks: [BlogCatalog](http://socialcomputing.asu.edu/datasets/BlogCatalog3) and [Arxiv](http://snap.stanford.edu/data/ca-AstroPh.html).
## Dependencies
- paddlepaddle>=1.4
- pgl
## How to run
For example, use GPU to train node2vec on the BlogCatalog dataset.
```sh
# multiclass task example
python node2vec.py --use_cuda --dataset BlogCatalog --save_path ./tmp/node2vec_BlogCatalog/ --offline_learning --epoch 400
python multi_class.py --use_cuda --ckpt_path ./tmp/node2vec_BlogCatalog/paddle_model --epoch 1000
# link prediction task example
python node2vec.py --use_cuda --dataset ArXiv --save_path ./tmp/node2vec_ArXiv --offline_learning --epoch 10
python link_predict.py --use_cuda --ckpt_path ./tmp/node2vec_ArXiv/paddle_model --epoch 400
```
## Hyperparameters
- dataset: The citation dataset "BlogCatalog" and "ArXiv".
- use_cuda: Use gpu if assign use_cuda.
### Experiment results
Dataset|model|Task|Metric|PGL Result|Reported Result
--|--|--|--|--|--
BlogCatalog|deepwalk|multi-label classification|MacroF1|0.250|0.211
BlogCatalog|node2vec|multi-label classification|MacroF1|0.262|0.258
ArXiv|deepwalk|link prediction|AUC|0.9538|0.9340
ArXiv|node2vec|link prediction|AUC|0.9541|0.9366
## View the Code
See the code [here](node2vec_examples_code.html)
# StaticGraphWrapper for GAT Speed Optimization
[Graph Attention Networks \(GAT\)](https://arxiv.org/abs/1710.10903) is a novel architecture that operates on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. Based on PGL, we reproduce the GAT algorithm and reach the same level of indicators as the paper on citation network benchmarks.
However, different from the reproduction in **examples/gat**, we use `pgl.graph_wrapper.StaticGraphWrapper` to preload the graph data into GPU or CPU memory, which achieves better speed.
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)| examples/gat | Improvement |
| --- | --- | --- |---| --- | --- |
| Cora | ~83% | 0.0145s | 0.0119s | 0.0175s | 1.47x |
| Pubmed | ~78% | 0.0352s | 0.0193s |0.0295s | 1.53x |
| Citeseer | ~70% | 0.0148s | 0.0124s |0.0253s | 2.04x |
### How to run
For example, use GPU to train GAT on the Cora dataset.
```sh
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset "cora", "citeseer", "pubmed".
- use_cuda: Use GPU if use_cuda is set.
### View the Code
See the code [here](static_gat_examples_code.html)
# StaticGraphWrapper for GCN Speed Optimization
[Graph Convolutional Network \(GCN\)](https://arxiv.org/abs/1609.02907) is a powerful neural network designed for machine learning on graphs. Based on PGL, we reproduce GCN algorithms and reach the same level of indicators as the paper in citation network benchmarks.
However, different from the reproduction in **examples/gcn**, we use `pgl.graph_wrapper.StaticGraphWrapper` to preload the graph data into GPU or CPU memory, which achieves better speed.
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)| examples/gcn | Improvement |
| --- | --- | --- |---| --- | --- |
| Cora | ~81% | 0.0053s | 0.0047s | 0.0104s | 2.21x |
| Pubmed | ~79% | 0.0105s | 0.0049s |0.0154s | 3.14x |
| Citeseer | ~71% | 0.0051s | 0.0045s |0.0177s | 3.93x |
### How to run
For example, use GPU to train GCN on the Cora dataset.
```sh
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset "cora", "citeseer", "pubmed".
- use_cuda: Use GPU if use_cuda is set.
### View the Code
See the code [here](static_gcn_examples_code.html)
.. mdinclude:: ../../../examples/metapath2vec/README.md
.. mdinclude:: md/node2vec_examples.md
.. mdinclude:: ../../../examples/node2vec/README.md
View the Code
=============
examples/node2vec/node2vec.py
.. literalinclude:: ../../../examples/node2vec/node2vec.py
:language: python
:linenos:
examples/node2vec/multi_class.py
.. literalinclude:: ../../../examples/node2vec/multi_class.py
:language: python
:linenos:
examples/node2vec/link_predict.py
.. literalinclude:: ../../../examples/node2vec/link_predict.py
:language: python
:linenos:
.. mdinclude:: ../../../examples/sgc/README.md
.. mdinclude:: md/static_gat_examples.md
.. mdinclude:: ../../../examples/static_gat/README.md
View the Code
=============
examples/static_gat/train.py
.. literalinclude:: ../../../examples/static_gat/train.py
:language: python
:linenos:
.. mdinclude:: md/static_gcn_examples.md
.. mdinclude:: ../../../examples/static_gcn/README.md
View the Code
=============
examples/static_gcn/train.py
.. literalinclude:: ../../../examples/static_gcn/train.py
:language: python
:linenos:
.. mdinclude:: ../../../examples/strucvec/README.md
.. mdinclude:: ../../../examples/unsup_graphsage/README.md
......@@ -15,14 +15,9 @@ Quick Start
.. toctree::
:maxdepth: 1
:caption: Quick Start
:hidden:
instruction.rst
See instruction_ for quick start.
.. _instruction: instruction.html
quick_start/instruction.rst
quick_start/introduction_for_hetergraph.rst
.. toctree::
......@@ -34,7 +29,16 @@ See instruction_ for quick start.
examples/static_graph_wrapper.rst
examples/node2vec_examples.rst
examples/graphsage_examples.rst
examples/dgi_examples.rst
examples/distribute_deepwalk_examples.rst
examples/distribute_graphsage_examples.rst
examples/ges_examples.rst
examples/line_examples.rst
examples/sgc_examples.rst
examples/strucvec_examples.rst
examples/gatne_examples.rst
examples/metapath2vec_examples.rst
examples/unsup_graphsage_examples.rst
.. toctree::
:maxdepth: 2
......
# Paddle Graph Learning (PGL)
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
<div />
......@@ -9,17 +8,18 @@ Paddle Graph Learning (PGL) is an efficient and flexible graph learning framewor
<center>The Framework of Paddle Graph Learning (PGL)</center>
<div />
We provide Python interfaces for storing/reading/querying graph-structured data and two fundamental computational interfaces, the walk-based paradigm and the message-passing paradigm shown in the PGL framework above, for building cutting-edge graph learning algorithms. Combined with the PaddlePaddle deep learning framework, we are able to support both graph representation learning models and graph neural networks, and thus our framework has a wide range of graph-based applications.
The newly released PGL supports heterogeneous graph learning for both the walk-based paradigm and the message-passing paradigm by providing MetaPath sampling and a Message Passing mechanism on heterogeneous graphs. Furthermore, the newly released PGL also supports distributed graph storage and some distributed training algorithms, such as distributed DeepWalk and distributed GraphSAGE. Combined with the PaddlePaddle deep learning framework, we are able to support both graph representation learning models and graph neural networks, and thus our framework has a wide range of graph-based applications.
## Highlight: Efficiency - Support Scatter-Gather and LodTensor Message Passing
## Highlight: Efficient and Flexible <br/> Message Passing Paradigm
One of the most important benefits of graph neural networks compared to other models is the ability to use node-to-node connectivity information, but coding the communication between nodes is very cumbersome. At PGL we adopt a **Message Passing Paradigm** similar to DGL to help build customized graph neural networks easily. Users only need to write ``send`` and ``recv`` functions to easily implement a simple GCN. As shown in the following figure, for the first step the send function is defined on the edges of the graph, and the user can customize the send function $\phi^e$ to send the message from the source to the target node. For the second step, the recv function $\phi^v$ is responsible for aggregating messages from different sources with the aggregation function $\oplus$.
<div />
<div align=center><img src="_static/message_passing_paradigm.png" width="700"></div>
<center>The basic idea of message passing paradigm</center>
<div />
As shown in the left of the following figure, to adapt general user-defined message aggregate functions, DGL uses the degree bucketing method to combine nodes with the same degree into a batch and then applies an aggregate function $\oplus$ on each batch serially. For our PGL UDF aggregate function, we organize the message as a [LodTensor](http://www.paddlepaddle.org/documentation/docs/en/1.4/user_guides/howto/basic_concept/lod_tensor_en.html) in [PaddlePaddle](https://github.com/PaddlePaddle/Paddle), taking the messages as variable-length sequences. And we **utilize the features of LodTensor in Paddle to obtain fast parallel aggregation**.
<div/>
......@@ -28,21 +28,22 @@ As shown in the left of the following figure, to adapt general user-defined mess
<div/>
Users only need to call the ```sequence_ops``` functions provided by Paddle to easily implement efficient message aggregation. For example, use ```sequence_pool``` to sum the neighbor messages.
Users only need to call the ``sequence_ops`` functions provided by Paddle to easily implement efficient message aggregation. For example, use ``sequence_pool`` to sum the neighbor messages.
```python
import paddle.fluid as fluid
def recv(msg):
return fluid.layers.sequence_pool(msg, "sum")
```
Although DGL does some kernel fusion optimization for general sum, max and other aggregate functions with scatter-gather, for **complex user-defined functions** with the degree bucketing algorithm, the serial execution for each degree bucket cannot take full advantage of the performance improvement provided by GPUs. However, operations on the PGL LodTensor-based messages are performed in parallel, which can fully utilize GPU parallel optimization. Even without scatter-gather optimization, PGL still has excellent performance. Of course, we still provide built-in scatter-optimized message aggregation functions.
Although DGL does some kernel fusion optimization for general sum, max and other aggregate functions with scatter-gather, for **complex user-defined functions** with the degree bucketing algorithm, the serial execution for each degree bucket cannot take full advantage of the performance improvement provided by GPUs. However, operations on the PGL LodTensor-based messages are performed in parallel, which can fully utilize GPU parallel optimization. In our experiments, PGL can reach up to 13 times the speed of DGL with complex user-defined functions. Even without scatter-gather optimization, PGL still has excellent performance. Of course, we still provide built-in scatter-optimized message aggregation functions.
## Performance
### Performance
We test all the GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs to get average speeds. And we report the accuracy on the test dataset without early stopping.
| Dataset | Model | PGL Accuracy | PGL speed (epoch time) | DGL speed (epoch time) |
We test all the following GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs to get average speeds. And we report the accuracy on the test dataset without early stopping.
| Dataset | Model | PGL Accuracy | PGL speed (epoch time) | DGL 0.3.0 speed (epoch time) |
| -------- | ----- | ----------------- | ------------ | ------------------------------------ |
| Cora | GCN |81.75% | 0.0047s | **0.0045s** |
| Cora | GAT | 83.5% | **0.0119s** | 0.0141s |
......@@ -50,3 +51,95 @@ We test all the GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs t
| Pubmed | GAT | 77% |0.0193s|**0.0144s**|
| Citeseer | GCN |70.2%| **0.0045s** |0.0046s|
| Citeseer | GAT |68.8%| **0.0124s** |0.0139s|
If we use a complex user-defined aggregation like [GraphSAGE-LSTM](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) that aggregates neighbor features with an LSTM while ignoring the order of received messages, the optimized message passing in DGL is forced to degenerate into the degree bucketing scheme, and its speed becomes much slower than the PGL implementation. Performance may vary with the scale of the graph; in our experiments, PGL can reach up to 13 times the speed of DGL.
| Dataset | PGL speed (epoch time) | DGL 0.3.0 speed (epoch time) | Speed up|
| -------- | ------------ | ------------------------------------ |----|
| Cora | **0.0186s** | 0.1638s | 8.80x|
| Pubmed | **0.0388s** |0.5275s | 13.59x|
| Citeseer | **0.0150s** | 0.1278s | 8.52x |
## Highlight: Flexibility - Natively Support Heterogeneous Graph Learning
Graphs can conveniently represent the relations between things in the real world, but the categories of things and the relations between them vary widely. Therefore, in a heterogeneous graph, we need to distinguish the node types and edge types in the graph network. PGL models heterogeneous graphs that contain multiple node types and multiple edge types, and can describe the complex connections between different types.
### Support meta path walk sampling on heterogeneous graph
<div/>
<div align=center><img src="_static/metapath_sampling.png" width="750"></div>
<center>The metapath sampling in heterogeneous graph</center>
<div/>
The left side of the figure above shows a shopping social network. Its nodes fall into two categories, users and goods, and its relations include user-user, user-goods, and goods-goods. The right side of the figure shows a simple MetaPath sampling process: given the meta path UPU (user-product-user), sampling yields results like the following
<div/>
<div align=center><img src="_static/metapath_result.png" width="300"></div>
<center>The metapath result</center>
<div/>
On this basis, word2vec-style methods can be introduced to support heterogeneous graph representation learning algorithms such as metapath2vec.
### Support Message Passing mechanism on heterogeneous graph
<div/>
<div align=center><img src="_static/him_message_passing.png" width="750"></div>
<center>The message passing mechanism on heterogeneous graph</center>
<div/>
Because a heterogeneous graph contains different node types, message passing also differs by type. As shown on the left of the figure above, the target node has five neighbors belonging to two different node types. As shown on the right, messages from neighbors of different types are aggregated separately and then merged into the final message that updates the target node. On this basis, PGL supports message-passing-based heterogeneous graph algorithms such as GATNE.
## Large-Scale: Support distributed graph storage and distributed training algorithms
In most large-scale graph learning scenarios, we need distributed graph storage and distributed training support. As shown in the following figure, PGL provides a general solution for large-scale training: we adopt [PaddleFleet](https://github.com/PaddlePaddle/Fleet) as our distributed parameter server, which supports large-scale distributed embeddings, together with a lightweight distributed storage engine, so a large-scale distributed training algorithm can easily be set up on MPI clusters.
<div/>
<div align=center><img src="_static/distributed_frame.png" width="750"></div>
<center>The distributed frame of PGL</center>
<div/>
## Model Zoo
The following are 13 graph learning models that have been implemented in the framework.
|Model | feature |
|---|---|
| [GCN](examples/gcn_examples.html)| Graph Convolutional Neural Networks |
| [GAT](examples/gat_examples.html)| Graph Attention Network |
| [GraphSage](examples/graphsage_examples.html)|Large-scale graph convolution network based on neighborhood sampling|
| [unSup-GraphSage](examples/unsup_graphsage_examples.html) | Unsupervised GraphSAGE |
| [LINE](examples/line_examples.html)| Representation learning based on first-order and second-order neighbors |
| [DeepWalk](examples/distribute_deepwalk_examples.html)| Representation learning by DFS random walk |
| [MetaPath2Vec](examples/metapath2vec_examples.html)| Representation learning based on metapath |
| [Node2Vec](examples/node2vec_examples.html)| Representation learning combining DFS and BFS |
| [Struct2Vec](examples/strucvec_examples.html)| Representation learning based on structural similarity |
| [SGC](examples/sgc_examples.html)| Simplified graph convolution neural network |
| [GES](examples/ges_examples.html)| Graph representation learning with node features |
| [DGI](examples/dgi_examples.html)| Unsupervised representation learning based on graph convolution network |
| [GATNE](examples/gatne_examples.html)| Representation Learning of Heterogeneous Graph based on MessagePassing |
The above models cover three areas: graph representation learning, graph neural networks, and heterogeneous graph learning, where the heterogeneous graph models are themselves divided into graph representation learning and graph neural networks.
## System requirements
PGL requires:
* paddle >= 1.6
* cython
PGL supports both Python 2 & 3
## Installation
You can simply install it via pip.
```sh
pip install pgl
```
## The Team
PGL is developed and maintained by NLP and Paddle Teams at Baidu
## License
PGL uses Apache License 2.0.
......@@ -8,8 +8,7 @@ To install Paddle Graph Learning, we need the following packages.
.. code-block:: sh
paddlepaddle >= 1.4 (Faster performance on 1.5)
networkx
paddlepaddle >= 1.6
cython
We can simply install pgl by pip.
......
Quick Start with Heterogeneous Graph
========================================
Install PGL
-----------
To install Paddle Graph Learning, we need the following packages.
.. code-block:: sh
paddlepaddle >= 1.6
cython
We can simply install pgl by pip.
.. code-block:: sh
pip install pgl
.. mdinclude:: md/quick_start_for_heterGraph.md
## Step 1: using PGL to create a graph
Suppose we have a graph with 10 nodes and 14 edges as shown in the following figure:
![A simple graph](_static/quick_start_graph.png)
![A simple graph](images/quick_start_graph.png)
Our purpose is to train a graph neural network to classify yellow and green nodes. So we can create this graph as follows:
```python
......@@ -49,7 +49,7 @@ Currently our PGL is developed based on static computational mode of paddle (we
import paddle.fluid as fluid
use_cuda = False
place = fluid.GPUPlace(0) if use_cuda else fluid.CPUPlace()
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
# use GraphWrapper as a container for graph data to construct a graph neural network
gw = pgl.graph_wrapper.GraphWrapper(name='graph',
......@@ -95,7 +95,7 @@ After defining the GCN layer, we can construct a deeper GCN model with two GCN l
```python
output = gcn_layer(gw, gw.node_feat['feature'],
hidden_size=8, name='gcn_layer_1', activation='relu')
output = gcn_layer(gw, output, hidden_size=2,
output = gcn_layer(gw, output, hidden_size=1,
name='gcn_layer_2', activation=None)
```
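The rest of the quick-start training program is collapsed by the diff; below is a hedged sketch of what typically follows, modeled on the heterogeneous quick start later in this diff (the binary `node_label` data layer and the Adam settings are assumptions):

```python
import paddle.fluid as fluid

# assumed: `output` is the logits tensor produced by the two GCN layers above
node_label = fluid.layers.data("node_label", shape=[None, 1],
                               dtype="float32", append_batch_size=False)
loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=output, label=node_label)
loss = fluid.layers.mean(loss)
fluid.optimizer.Adam(learning_rate=0.01).minimize(loss)
```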
......
## Introduction
In the real world, many graphs contain multiple types of nodes and edges, which we call heterogeneous graphs. Obviously, heterogeneous graphs are more complex than homogeneous graphs.
To deal with such heterogeneous graphs, PGL develops a graph framework to support graph neural network computations and meta-path-based sampling on heterogeneous graphs.
The goals of this tutorial:
* give an example of heterogeneous graph data;
* understand how PGL supports computations on heterogeneous graphs;
* use PGL to implement a simple heterogeneous graph neural network model to classify a particular type of node in a heterogeneous graph network.
## Example of heterogeneous graph
A lot of graph data consists of edges and nodes of multiple types. For example, an **e-commerce network** is a very common heterogeneous graph in the real world. It contains at least two types of nodes (user and item) and two types of edges (buy and click).
The following figure depicts several users clicking or buying some items. This graph has two types of nodes corresponding to "user" and "item". It also contains two types of edges, "buy" and "click".
![A simple heterogenous e-commerce graph](images/heter_graph_introduction.png)
## Creating a heterogeneous graph with PGL
In a heterogeneous graph, there exist multiple types of edges, so we should distinguish them. In PGL, edges are built in the format below:
```python
edges = {
'click': [(0, 4), (0, 7), (1, 6), (2, 5), (3, 6)],
'buy': [(0, 5), (1, 4), (1, 6), (2, 7), (3, 5)],
}
```
In a heterogeneous graph, nodes are also of different types. Therefore, you need to mark the type of each node; the format of the node types is as follows:
```python
node_types = [(0, 'user'), (1, 'user'), (2, 'user'), (3, 'user'), (4, 'item'),
(5, 'item'),(6, 'item'), (7, 'item')]
```
Because edges are of different types, edge features also need to be separated by type.
```python
import numpy as np
num_nodes = len(node_types)
node_features = {'features': np.random.randn(num_nodes, 8).astype("float32")}
edge_num_list = []
for edge_type in edges:
edge_num_list.append(len(edges[edge_type]))
edge_features = {
'click': {'h': np.random.randn(edge_num_list[0], 4)},
'buy': {'h':np.random.randn(edge_num_list[1], 4)},
}
```
Now, we can build a heterogeneous graph with PGL.
```python
import paddle.fluid as fluid
import paddle.fluid.layers as fl
import pgl
from pgl.contrib import heter_graph
from pgl.contrib import heter_graph_wrapper
g = heter_graph.HeterGraph(num_nodes=num_nodes,
edges=edges,
node_types=node_types,
node_feat=node_features,
edge_feat=edge_features)
```
In PGL, we need to use a graph wrapper as a container for graph data, so here we create a graph wrapper for each edge-type subgraph.
```python
place = fluid.CPUPlace()
# create a GraphWrapper as a container for graph data
gw = heter_graph_wrapper.HeterGraphWrapper(name='heter_graph',
place = place,
edge_types = g.edge_types_info(),
node_feat=g.node_feat_info(),
edge_feat=g.edge_feat_info())
```
## Message Passing
After building the heterogeneous graph, we can easily carry out message passing. In this case, we have two different types of edges, so we can write the message passing function as follows:
```python
def message_passing(gw, edge_types, features, name=''):
def __message(src_feat, dst_feat, edge_feat):
return src_feat['h']
def __reduce(feat):
return fluid.layers.sequence_pool(feat, pool_type='sum')
assert len(edge_types) == len(features)
output = []
for i in range(len(edge_types)):
msg = gw[edge_types[i]].send(__message, nfeat_list=[('h', features[i])])
out = gw[edge_types[i]].recv(msg, __reduce)
output.append(out)
# list of matrix
return output
```
```python
edge_types = ['click', 'buy']
features = []
for edge_type in edge_types:
features.append(gw[edge_type].node_feat['features'])
output = message_passing(gw, edge_types, features)
output = fl.concat(input=output, axis=1)
output = fluid.layers.fc(output, size=4, bias_attr=False, act='relu', name='fc1')
logits = fluid.layers.fc(output, size=1, bias_attr=False, act=None, name='fc2')
```
## Data preprocessing
In this case, we implement a simple node classifier, using 0 and 1 to represent the two classes.
```python
y = [0,1,0,1,0,1,1,0]
label = np.array(y, dtype="float32").reshape(-1,1)
```
## Setting up the training program
The training process of the heterogeneous graph node classification model is the same as training other PaddlePaddle-based models.
* First, build the loss function;
* Second, create an optimizer;
* Finally, create an executor and execute the training program.
```python
node_label = fluid.layers.data("node_label", shape=[None, 1], dtype="float32", append_batch_size=False)
loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=logits, label=node_label)
loss = fluid.layers.mean(loss)
adam = fluid.optimizer.Adam(learning_rate=0.01)
adam.minimize(loss)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
feed_dict = gw.to_feed(g)
for epoch in range(30):
feed_dict['node_label'] = label
train_loss = exe.run(fluid.default_main_program(), feed=feed_dict, fetch_list=[loss], return_numpy=True)
print('Epoch %d | Loss: %f'%(epoch, train_loss[0]))
```
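A hedged sketch of checking the fit after the loop above (assumption: simply fetch `logits` with one more pass of the same program and compare the sign of the predictions against the 0/1 labels defined earlier):

```python
import numpy as np

# note: this runs the training program once more just to fetch the logits
pred = exe.run(fluid.default_main_program(),
               feed=feed_dict, fetch_list=[logits], return_numpy=True)[0]
print('Train accuracy: %f' % np.mean((pred > 0) == (label > 0.5)))
```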
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file loads and preprocesses the dataset for GATNE model.
"""
import sys
import os
import tqdm
import numpy as np
import logging
import random
from pgl import heter_graph
import pickle as pkl
class Dataset(object):
"""Implementation of Dataset class
This is a simple implementation of loading and processing dataset for GATNE model.
Args:
config: dict, some configure parameters.
"""
def __init__(self, config):
self.train_edges_file = config['data_path'] + 'train.txt'
self.valid_edges_file = config['data_path'] + 'valid.txt'
self.test_edges_file = config['data_path'] + 'test.txt'
self.nodes_file = config['data_path'] + 'nodes.txt'
self.config = config
self.word2index = self.load_word2index()
self.build_graph()
self.valid_data = self.load_test_data(self.valid_edges_file)
self.test_data = self.load_test_data(self.test_edges_file)
def build_graph(self):
"""Build pgl heterogeneous graph.
"""
edge_data_by_type, all_edges, all_nodes = self.load_training_data(
self.train_edges_file,
slf_loop=self.config['slf_loop'],
symmetry_edge=self.config['symmetry_edge'])
num_nodes = len(all_nodes)
node_features = {
'index': np.array(
[i for i in range(num_nodes)], dtype=np.int64).reshape(-1, 1)
}
self.graph = heter_graph.HeterGraph(
num_nodes=num_nodes,
edges=edge_data_by_type,
node_types=None,
node_feat=node_features)
self.edge_types = sorted(self.graph.edge_types_info())
logging.info('total %d nodes are loaded' % (self.graph.num_nodes))
def load_training_data(self, file_, slf_loop=True, symmetry_edge=True):
"""Load train data from file and preprocess them.
Args:
file_: str, file name for loading data
slf_loop: bool, if true, add self loop edge for every node
symmetry_edge: bool, if true, add symmetry edge for every edge
"""
logging.info('loading data from %s' % file_)
edge_data_by_type = dict()
all_edges = list()
all_nodes = list()
with open(file_, 'r') as reader:
for line in reader:
words = line.strip().split(' ')
if words[0] not in edge_data_by_type:
edge_data_by_type[words[0]] = []
src, dst = words[1], words[2]
edge_data_by_type[words[0]].append((src, dst))
all_edges.append((src, dst))
all_nodes.append(src)
all_nodes.append(dst)
if symmetry_edge:
edge_data_by_type[words[0]].append((dst, src))
all_edges.append((dst, src))
all_nodes = list(set(all_nodes))
all_edges = list(set(all_edges))
# edge_data_by_type['Base'] = all_edges
if slf_loop:
for e_type in edge_data_by_type.keys():
for n in all_nodes:
edge_data_by_type[e_type].append((n, n))
# remapping to index
edges_by_type = {}
for edge_type, edges in edge_data_by_type.items():
res_edges = []
for edge in edges:
res_edges.append(
(self.word2index[edge[0]], self.word2index[edge[1]]))
edges_by_type[edge_type] = res_edges
return edges_by_type, all_edges, all_nodes
def load_test_data(self, file_):
"""Load testing data from file.
"""
logging.info('loading data from %s' % file_)
true_edge_data_by_type = {}
fake_edge_data_by_type = {}
with open(file_, 'r') as reader:
for line in reader:
words = line.strip().split(' ')
src, dst = self.word2index[words[1]], self.word2index[words[2]]
e_type = words[0]
if int(words[3]) == 1: # true edges
if e_type not in true_edge_data_by_type:
true_edge_data_by_type[e_type] = list()
true_edge_data_by_type[e_type].append((src, dst))
else: # fake edges
if e_type not in fake_edge_data_by_type:
fake_edge_data_by_type[e_type] = list()
fake_edge_data_by_type[e_type].append((src, dst))
return (true_edge_data_by_type, fake_edge_data_by_type)
def load_word2index(self):
"""Load words(nodes) from file and map to index.
"""
word2index = {}
with open(self.nodes_file, 'r') as reader:
for index, line in enumerate(reader):
node = line.strip()
word2index[node] = index
return word2index
def generate_walks(self):
"""Generate random walks for every edge type.
"""
all_walks = {}
for e_type in self.edge_types:
layer_walks = self.simulate_walks(
edge_type=e_type,
num_walks=self.config['num_walks'],
walk_length=self.config['walk_length'])
all_walks[e_type] = layer_walks
return all_walks
def simulate_walks(self, edge_type, num_walks, walk_length, schema=None):
"""Generate random walks in specified edge type.
"""
walks = []
nodes = list(range(0, self.graph[edge_type].num_nodes))
for walk_iter in tqdm.tqdm(range(num_walks)):
random.shuffle(nodes)
for node in nodes:
walk = self.graph[edge_type].random_walk(
[node], max_depth=walk_length - 1)
for i in range(len(walk)):
walks.append(walk[i])
return walks
def generate_pairs(self, all_walks):
"""Generate word pairs for training.
"""
logging.info(['edge_types before generate pairs', self.edge_types])
pairs = []
skip_window = self.config['win_size'] // 2
for layer_id, e_type in enumerate(self.edge_types):
walks = all_walks[e_type]
for walk in tqdm.tqdm(walks):
for i in range(len(walk)):
for j in range(1, skip_window + 1):
if i - j >= 0 and walk[i] != walk[i - j]:
neg_nodes = self.graph[e_type].sample_nodes(
self.config['neg_num'])
pairs.append(
(walk[i], walk[i - j], *neg_nodes, layer_id))
if i + j < len(walk) and walk[i] != walk[i + j]:
neg_nodes = self.graph[e_type].sample_nodes(
self.config['neg_num'])
pairs.append(
(walk[i], walk[i + j], *neg_nodes, layer_id))
return pairs
def fetch_batch(self, pairs, batch_size, for_test=False):
"""Produce batch pairs data for training.
"""
np.random.shuffle(pairs)
n_batches = (len(pairs) + (batch_size - 1)) // batch_size
neg_num = len(pairs[0]) - 3
result = []
        for i in range(1, n_batches + 1):
batch_pairs = np.array(
pairs[batch_size * (i - 1):batch_size * i], dtype=np.int64)
x = batch_pairs[:, 0].reshape(-1, ).astype(np.int64)
y = batch_pairs[:, 1].reshape(-1, 1, 1).astype(np.int64)
neg = batch_pairs[:, 2:2 + neg_num].reshape(-1, neg_num,
1).astype(np.int64)
t = batch_pairs[:, -1].reshape(-1, 1).astype(np.int64)
result.append((x, y, neg, t))
return result
if __name__ == "__main__":
config = {
'data_path': './data/youtube/',
'train_pairs_file': 'train_pairs.pkl',
'slf_loop': True,
'symmetry_edge': True,
'num_walks': 20,
'walk_length': 10,
'win_size': 5,
'neg_num': 5,
}
log_format = '%(asctime)s-%(levelname)s-%(name)s: %(message)s'
logging.basicConfig(level='INFO', format=log_format)
dataset = Dataset(config)
logging.info('generating walks')
all_walks = dataset.generate_walks()
logging.info('finishing generate walks')
logging.info(['length of all walks: ', all_walks.keys()])
train_pairs = dataset.generate_pairs(all_walks)
pkl.dump(train_pairs,
open(config['data_path'] + config['train_pairs_file'], 'wb'))
logging.info('finishing generate train_pairs')
# GATNE: General Attributed Multiplex HeTerogeneous Network Embedding
[GATNE](https://arxiv.org/pdf/1905.01669.pdf) is an algorithmic framework for embedding large-scale Attributed Multiplex Heterogeneous Networks (AMHN). Given a heterogeneous graph, which consists of nodes and edges of multiple types, it can learn continuous feature representations for every node. Based on PGL, we reproduce the GATNE algorithm.
## Datasets
The YouTube dataset contains 2,000 nodes, 1,310,617 edges and 5 edge types. We use the YouTube dataset as an example.
You can download the YouTube dataset from [here](https://github.com/THUDM/GATNE/tree/master/data).
After downloading the data, put it in ./data/ (the current directory is assumed to be the root directory of the GATNE model). The ./data/youtube/ directory then contains three files:
* train.txt
* valid.txt
* test.txt
Then run the following command to preprocess the data.
```sh
python data_process.py --input_file ./data/youtube/train.txt --output_file ./data/youtube/nodes.txt
```
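The expected file format, as parsed by `data_process.py` and the `Dataset` class, is one space-separated edge per line: `train.txt` lines are `edge_type src dst`, while `valid.txt` and `test.txt` carry an extra 0/1 label marking fake/true edges; the generated `nodes.txt` lists one node id per line. An illustration with made-up ids:
```
# train.txt: <edge_type> <src> <dst>
1 2205 11784
2 2205 723
# valid.txt / test.txt: <edge_type> <src> <dst> <label>  (1 = true edge, 0 = fake edge)
1 2205 11784 1
1 2205 3873 0
```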
## Dependencies
- paddlepaddle>=1.6
- pgl>=1.0.0
## Hyperparameters
All the hyperparameters are saved in the config.yaml file, so before training the GATNE model you can open config.yaml and modify the hyperparameters as you like.
For example, you can set "use_cuda" to "True" to train on GPU, or change "data_path" to use a different dataset.
Some important hyperparameters in config.yaml:
- use_cuda: use GPU to train model
- data_path: the directory of dataset
- lr: learning rate
- neg_num: number of negative samples.
- num_walks: number of walks started from each node
- walk_length: walk length
## How to run
After preprocessing the data, run the following command to train the model:
```sh
python main.py -c config.yaml
```
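After training, `save_model` pickles a Python dict that maps each edge type to a `(num_nodes, dimensions)` matrix of L2-normalized node embeddings. Below is a minimal sketch for inspecting such a checkpoint; the file name is an assumption and depends on `save_dir`, `task_name` and the epoch.
```python
import pickle as pkl

# Hypothetical checkpoint path; see `save_model` in main.py for the real name.
ckpt = "./checkpoints/train.gatne/dict_embed_model_epoch_2.pkl"

with open(ckpt, "rb") as f:
    final_model = pkl.load(f)

# One (num_nodes, dimensions) embedding matrix per edge type.
for edge_type, embeddings in final_model.items():
    print(edge_type, embeddings.shape)
```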
### Experiment results
| Metric | PGL result | Reported result |
|:---:|------------|-----------------|
| AUC | 84.83 | 84.61 |
| PR | 82.77 | 81.93 |
| F1 | 76.98 | 76.83 |
task_name: train.gatne
use_cuda: True
log_level: info
seed: 1667
optimizer:
type:
args:
lr: 0.005
trainer:
type: trainer
args:
epochs: 2
log_dir: logs/
save_dir: checkpoints/
output_dir: outputs/
data_loader:
type: Dataset
args:
data_path: ./data/youtube/
train_pairs_file: train_pairs.pkl
batch_size: 256
num_walks: 20
walk_length: 10
win_size: 5
neg_num: 5
slf_loop: True
symmetry_edge: True
model:
type: GATNE
args:
dimensions: 200
edge_dim: 32
att_dim: 32
att_head: 1
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file preprocesses the data before training.
"""
import sys
import argparse
def gen_nodes_file(file_, result_file):
"""calculate the total number of nodes and save them for latter processing.
"""
nodes = []
with open(file_, 'r') as reader:
for line in reader:
tokens = line.strip().split(' ')
nodes.append(tokens[1])
nodes.append(tokens[2])
nodes = list(set(nodes))
nodes.sort(key=int)
print('total number of nodes: %d' % len(nodes))
print('saving nodes file in %s' % (result_file))
with open(result_file, 'w') as writer:
for n in nodes:
writer.write(n + '\n')
print('finished')
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='GATNE')
parser.add_argument(
'--input_file',
default='./data/youtube/train.txt',
type=str,
help='input file')
parser.add_argument(
'--output_file',
default='./data/youtube/nodes.txt',
type=str,
help='output file')
args = parser.parse_args()
print('generating nodes file')
gen_nodes_file(args.input_file, args.output_file)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements the training process of the GATNE model.
"""
import os
import argparse
import time
import random
import numpy as np
import logging
import pickle as pkl
import pgl
from pgl.utils import paddle_helper
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers as fl
from utils import *
import Dataset
import model as Model
from sklearn.metrics import (auc, f1_score, precision_recall_curve,
roc_auc_score)
def set_seed(seed):
"""Set random seed.
"""
random.seed(seed)
np.random.seed(seed)
def produce_model(exe, program, dataset, model, feed_dict):
"""Output the learned model parameters for testing.
"""
edge_types = dataset.edge_types
num_nodes = dataset.graph[edge_types[0]].num_nodes
edge_types_count = len(edge_types)
neg_num = dataset.config['neg_num']
final_model = {}
feed_dict['train_inputs'] = np.array(
[n for n in range(num_nodes)], dtype=np.int64).reshape(-1, )
feed_dict['train_labels'] = np.array(
[n for n in range(num_nodes)], dtype=np.int64).reshape(-1, 1, 1)
feed_dict['train_negs'] = np.tile(feed_dict['train_labels'],
(1, neg_num)).reshape(-1, neg_num, 1)
for i in range(edge_types_count):
feed_dict['train_types'] = np.array(
[i for _ in range(num_nodes)], dtype=np.int64).reshape(-1, 1)
edge_node_embed = exe.run(program,
feed=feed_dict,
fetch_list=[model.last_node_embed],
return_numpy=True)[0]
final_model[edge_types[i]] = edge_node_embed
return final_model
def evaluate(final_model, edge_types, data):
"""Calculate the AUC score, F1 score and PR score of the final model
"""
edge_types_count = len(edge_types)
AUC, F1, PR = [], [], []
true_edge_data_by_type = data[0]
fake_edge_data_by_type = data[1]
for i in range(edge_types_count):
try:
local_model = final_model[edge_types[i]]
true_edges = true_edge_data_by_type[edge_types[i]]
fake_edges = fake_edge_data_by_type[edge_types[i]]
except Exception as e:
logging.warn('edge type not exists. %s' % str(e))
continue
tmp_auc, tmp_f1, tmp_pr = calculate_score(local_model, true_edges,
fake_edges)
AUC.append(tmp_auc)
F1.append(tmp_f1)
PR.append(tmp_pr)
return {'AUC': np.mean(AUC), 'F1': np.mean(F1), 'PR': np.mean(PR)}
def calculate_score(model, true_edges, fake_edges):
"""Calculate the AUC score, F1 score and PR score of specified edge type
"""
true_list = list()
prediction_list = list()
true_num = 0
for edge in true_edges:
tmp_score = get_score(model, edge)
if tmp_score is not None:
true_list.append(1)
prediction_list.append(tmp_score)
true_num += 1
for edge in fake_edges:
tmp_score = get_score(model, edge)
if tmp_score is not None:
true_list.append(0)
prediction_list.append(tmp_score)
sorted_pred = prediction_list[:]
sorted_pred.sort()
threshold = sorted_pred[-true_num]
y_pred = np.zeros(len(prediction_list), dtype=np.int32)
for i in range(len(prediction_list)):
if prediction_list[i] >= threshold:
y_pred[i] = 1
y_true = np.array(true_list)
y_scores = np.array(prediction_list)
ps, rs, _ = precision_recall_curve(y_true, y_scores)
return roc_auc_score(y_true, y_scores), f1_score(y_true, y_pred), auc(rs,
ps)
def get_score(local_model, edge):
"""Calculate the cosine similarity score between two nodes.
"""
try:
vector1 = local_model[edge[0]]
vector2 = local_model[edge[1]]
return np.dot(vector1, vector2) / (np.linalg.norm(vector1) *
np.linalg.norm(vector2))
except Exception as e:
logging.warn('get_score warning: %s' % str(e))
return None
def run_epoch(epoch,
config,
dataset,
data,
train_prog,
test_prog,
model,
feed_dict,
exe,
for_test=False):
"""Run training process of every epoch.
"""
total_loss = []
for idx, batch_data in enumerate(data):
feed_dict['train_inputs'] = batch_data[0]
feed_dict['train_labels'] = batch_data[1]
feed_dict['train_negs'] = batch_data[2]
feed_dict['train_types'] = batch_data[3]
loss, lr = exe.run(train_prog,
feed=feed_dict,
fetch_list=[model.loss, model.lr],
return_numpy=True)
total_loss.append(loss[0])
if (idx + 1) % 500 == 0:
avg_loss = np.mean(total_loss)
logging.info("epoch %d | step %d | lr %.4f | train_loss %f " %
(epoch, idx + 1, lr, avg_loss))
total_loss = []
return avg_loss
def save_model(program, exe, dataset, model, feed_dict, filename):
"""Save model.
"""
final_model = produce_model(exe, program, dataset, model, feed_dict)
logging.info('saving model in %s' % (filename))
pkl.dump(final_model, open(filename, 'wb'))
def test(program, exe, dataset, model, feed_dict):
"""Testing and validating.
"""
final_model = produce_model(exe, program, dataset, model, feed_dict)
valid_result = evaluate(final_model, dataset.edge_types,
dataset.valid_data)
test_result = evaluate(final_model, dataset.edge_types, dataset.test_data)
logging.info("valid_AUC %.4f | valid_PR %.4f | valid_F1 %.4f" %
(valid_result['AUC'], valid_result['PR'], valid_result['F1']))
logging.info("test_AUC %.4f | test_PR %.4f | test_F1 %.4f" %
(test_result['AUC'], test_result['PR'], test_result['F1']))
return test_result
def main(config):
"""main function for training GATNE model.
"""
logging.info(config)
set_seed(config['seed'])
dataset = getattr(
Dataset, config['data_loader']['type'])(config['data_loader']['args'])
edge_types = dataset.graph.edge_types_info()
logging.info(['total edge types: ', edge_types])
# train_pairs is a list of tuple: [(src1, dst1, neg, e1), (src2, dst2, neg, e2)]
    # e (int) is the edge type id, used to select the corresponding edge embedding
train_pairs_file = config['data_loader']['args']['data_path'] + \
config['data_loader']['args']['train_pairs_file']
if os.path.exists(train_pairs_file):
logging.info('loading train pairs from pkl file %s' % train_pairs_file)
train_pairs = pkl.load(open(train_pairs_file, 'rb'))
else:
logging.info('generating walks')
all_walks = dataset.generate_walks()
logging.info('generating train pairs')
train_pairs = dataset.generate_pairs(all_walks)
logging.info('dumping train pairs to %s' % (train_pairs_file))
pkl.dump(train_pairs, open(train_pairs_file, 'wb'))
logging.info('total train pairs: %d' % (len(train_pairs)))
data = dataset.fetch_batch(train_pairs,
config['data_loader']['args']['batch_size'])
place = fluid.CUDAPlace(0) if config['use_cuda'] else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
test_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
model = getattr(Model, config['model']['type'])(
config['model']['args'], dataset, place)
test_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
global_steps = len(data) * config['trainer']['args']['epochs']
model.backward(global_steps, config['optimizer']['args'])
# train
exe = fluid.Executor(place)
exe.run(startup_program)
feed_dict = model.gw.to_feed(dataset.graph)
logging.info('test before training...')
test(test_program, exe, dataset, model, feed_dict)
logging.info('training...')
for epoch in range(1, 1 + config['trainer']['args']['epochs']):
train_result = run_epoch(epoch, config['trainer']['args'], dataset,
data, train_program, test_program, model,
feed_dict, exe)
logging.info('validating and testing...')
test_result = test(test_program, exe, dataset, model, feed_dict)
filename = os.path.join(config['trainer']['args']['save_dir'],
'dict_embed_model_epoch_%d.pkl' % (epoch))
save_model(test_program, exe, dataset, model, feed_dict, filename)
logging.info(
"final_test_AUC %.4f | final_test_PR %.4f | fianl_test_F1 %.4f" % (
test_result['AUC'], test_result['PR'], test_result['F1']))
logging.info('training finished')
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='GATNE')
parser.add_argument(
'-c',
'--config',
default=None,
type=str,
help='config file path (default: None)')
parser.add_argument(
'-n',
'--taskname',
default=None,
type=str,
help='task name(default: None)')
args = parser.parse_args()
if args.config:
# load config file
config = Config(args.config, isCreate=True, isSave=True)
config = config()
else:
raise AssertionError(
"Configuration file need to be specified. Add '-c config.yaml', for example."
)
log_format = '%(asctime)s-%(levelname)s-%(name)s: %(message)s'
logging.basicConfig(
level=getattr(logging, config['log_level'].upper()), format=log_format)
main(config)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements the GATNE model.
"""
import numpy as np
import math
import logging
import paddle.fluid as fluid
import paddle.fluid.layers as fl
from pgl import heter_graph_wrapper
class GATNE(object):
"""Implementation of GATNE model.
Args:
config: dict, some configure parameters.
dataset: instance of Dataset class
place: GPU or CPU place
"""
def __init__(self, config, dataset, place):
logging.info(['model is: ', self.__class__.__name__])
self.config = config
self.graph = dataset.graph
        self.place = place
self.edge_types = sorted(self.graph.edge_types_info())
logging.info('edge_types in model: %s' % str(self.edge_types))
neg_num = dataset.config['neg_num']
# hyper parameters
self.num_nodes = self.graph.num_nodes
self.embedding_size = self.config['dimensions']
self.embedding_u_size = self.config['edge_dim']
self.dim_a = self.config['att_dim']
self.att_head = self.config['att_head']
self.edge_type_count = len(self.edge_types)
self.u_num = self.edge_type_count
self.gw = heter_graph_wrapper.HeterGraphWrapper(
name="heter_graph",
place=place,
edge_types=self.graph.edge_types_info(),
node_feat=self.graph.node_feat_info(),
edge_feat=self.graph.edge_feat_info())
self.train_inputs = fl.data(
'train_inputs', shape=[None], dtype='int64')
self.train_labels = fl.data(
'train_labels', shape=[None, 1, 1], dtype='int64')
self.train_types = fl.data(
'train_types', shape=[None, 1], dtype='int64')
self.train_negs = fl.data(
'train_negs', shape=[None, neg_num, 1], dtype='int64')
self.forward()
def forward(self):
"""Build the GATNE net.
"""
param_attr_init = fluid.initializer.Uniform(
low=-1.0, high=1.0, seed=np.random.randint(100))
embed_param_attrs = fluid.ParamAttr(
name='Base_node_embed', initializer=param_attr_init)
# node_embeddings
base_node_embed = fl.embedding(
input=fl.reshape(
self.train_inputs, shape=[-1, 1]),
size=[self.num_nodes, self.embedding_size],
param_attr=embed_param_attrs)
node_features = []
for edge_type in self.edge_types:
param_attr_init = fluid.initializer.Uniform(
low=-1.0, high=1.0, seed=np.random.randint(100))
embed_param_attrs = fluid.ParamAttr(
name='%s_node_embed' % edge_type, initializer=param_attr_init)
features = fl.embedding(
input=self.gw[edge_type].node_feat['index'],
size=[self.num_nodes, self.embedding_u_size],
param_attr=embed_param_attrs)
node_features.append(features)
# mp_output: list of embedding(self.num_nodes, dim)
mp_output = self.message_passing(self.gw, self.edge_types,
node_features)
# U : (num_type[m], num_nodes, dim[s])
node_type_embed = fl.stack(mp_output, axis=0)
# U : (num_nodes, num_type[m], dim[s])
node_type_embed = fl.transpose(node_type_embed, perm=[1, 0, 2])
#gather node_type_embed from train_inputs
node_type_embed = fl.gather(node_type_embed, self.train_inputs)
# M_r
trans_weights = fl.create_parameter(
shape=[
self.edge_type_count, self.embedding_u_size,
self.embedding_size // self.att_head
],
attr=fluid.initializer.TruncatedNormalInitializer(
loc=0.0, scale=1.0 / math.sqrt(self.embedding_size)),
dtype='float32',
name='trans_w')
# W_r
trans_weights_s1 = fl.create_parameter(
shape=[self.edge_type_count, self.embedding_u_size, self.dim_a],
attr=fluid.initializer.TruncatedNormalInitializer(
loc=0.0, scale=1.0 / math.sqrt(self.embedding_size)),
dtype='float32',
name='trans_w_s1')
# w_r
trans_weights_s2 = fl.create_parameter(
shape=[self.edge_type_count, self.dim_a, self.att_head],
attr=fluid.initializer.TruncatedNormalInitializer(
loc=0.0, scale=1.0 / math.sqrt(self.embedding_size)),
dtype='float32',
name='trans_w_s2')
trans_w = fl.gather(trans_weights, self.train_types)
trans_w_s1 = fl.gather(trans_weights_s1, self.train_types)
trans_w_s2 = fl.gather(trans_weights_s2, self.train_types)
attention = self.attention(node_type_embed, trans_w_s1, trans_w_s2)
node_type_embed = fl.matmul(attention, node_type_embed)
node_embed = base_node_embed + fl.reshape(
fl.matmul(node_type_embed, trans_w), [-1, self.embedding_size])
self.last_node_embed = fl.l2_normalize(node_embed, axis=1)
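        # Negative-sampling objective: the fused node embedding is scored
        # against the positive context node and `neg_num` sampled negatives,
        # both looked up from the shared 'nce_weight' embedding table.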
nce_weight_initializer = fluid.initializer.TruncatedNormalInitializer(
loc=0.0, scale=1.0 / math.sqrt(self.embedding_size))
nce_weight_attrs = fluid.ParamAttr(
name='nce_weight', initializer=nce_weight_initializer)
weight_pos = fl.embedding(
input=self.train_labels,
size=[self.num_nodes, self.embedding_size],
param_attr=nce_weight_attrs)
weight_neg = fl.embedding(
input=self.train_negs,
size=[self.num_nodes, self.embedding_size],
param_attr=nce_weight_attrs)
tmp_node_embed = fl.unsqueeze(self.last_node_embed, axes=[1])
pos_logits = fl.matmul(
tmp_node_embed, weight_pos, transpose_y=True) # [B, 1, 1]
neg_logits = fl.matmul(
tmp_node_embed, weight_neg, transpose_y=True) # [B, 1, neg_num]
pos_score = fl.squeeze(pos_logits, axes=[1])
pos_score = fl.clip(pos_score, min=-10, max=10)
pos_score = -1.0 * fl.logsigmoid(pos_score)
neg_score = fl.squeeze(neg_logits, axes=[1])
neg_score = fl.clip(neg_score, min=-10, max=10)
neg_score = -1.0 * fl.logsigmoid(-1.0 * neg_score)
neg_score = fl.reduce_sum(neg_score, dim=1, keep_dim=True)
self.loss = fl.reduce_mean(pos_score + neg_score)
def attention(self, node_type_embed, trans_w_s1, trans_w_s2):
"""Calculate attention weights.
"""
attention = fl.tanh(fl.matmul(node_type_embed, trans_w_s1))
attention = fl.matmul(attention, trans_w_s2)
attention = fl.reshape(attention, [-1, self.u_num])
attention = fl.softmax(attention)
attention = fl.reshape(attention, [-1, self.att_head, self.u_num])
return attention
def message_passing(self, gw, edge_types, features, name=''):
"""Message passing from source nodes to dstination nodes
"""
def __message(src_feat, dst_feat, edge_feat):
"""send function
"""
return src_feat['h']
def __reduce(feat):
"""recv function
"""
return fluid.layers.sequence_pool(feat, pool_type='average')
if not isinstance(edge_types, list):
edge_types = [edge_types]
if not isinstance(features, list):
features = [features]
assert len(edge_types) == len(features)
output = []
for i in range(len(edge_types)):
msg = gw[edge_types[i]].send(
__message, nfeat_list=[('h', features[i])])
neigh_feat = gw[edge_types[i]].recv(msg, __reduce)
neigh_feat = fl.fc(neigh_feat,
size=neigh_feat.shape[-1],
name='neigh_fc_%d' % (i),
act='sigmoid')
slf_feat = fl.fc(features[i],
size=neigh_feat.shape[-1],
name='slf_fc_%d' % (i),
act='sigmoid')
out = fluid.layers.concat([slf_feat, neigh_feat], axis=1)
out = fl.fc(out, size=neigh_feat.shape[-1], name='fc', act=None)
out = fluid.layers.l2_normalize(out, axis=1)
output.append(out)
# list of matrix
return output
def backward(self, global_steps, opt_config):
"""Build the optimizer.
"""
self.lr = fl.polynomial_decay(opt_config['lr'], global_steps, 0.001)
adam = fluid.optimizer.Adam(learning_rate=self.lr)
adam.minimize(self.loss)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements a class for model configuration.
"""
import datetime
import os
import yaml
import random
import shutil
class Config(object):
"""Implementation of Config class for model configure.
Args:
config_file(str): configure filename, which is a yaml file.
        isCreate(bool): if true, create the necessary directories for saving models, log files and other outputs.
        isSave(bool): if true, save the config file in order to record the configuration.
"""
def __init__(self, config_file, isCreate=False, isSave=False):
self.config_file = config_file
self.config = self.get_config_from_yaml(config_file)
if isCreate:
self.create_necessary_dirs()
if isSave:
self.save_config_file()
def get_config_from_yaml(self, yaml_file):
"""Get the configure hyperparameters from yaml file.
"""
try:
with open(yaml_file, 'r') as f:
config = yaml.load(f)
except Exception:
raise IOError("Error in parsing config file '%s'" % yaml_file)
return config
def create_necessary_dirs(self):
"""Create some necessary directories to save some important files.
"""
time_stamp = datetime.datetime.now().strftime('%m%d_%H%M')
self.config['trainer']['args']['log_dir'] = ''.join(
(self.config['trainer']['args']['log_dir'],
self.config['task_name'], '/')) # , '.%s/' % (time_stamp)))
self.config['trainer']['args']['save_dir'] = ''.join(
(self.config['trainer']['args']['save_dir'],
self.config['task_name'], '/')) # , '.%s/' % (time_stamp)))
self.config['trainer']['args']['output_dir'] = ''.join(
(self.config['trainer']['args']['output_dir'],
self.config['task_name'], '/')) # , '.%s/' % (time_stamp)))
# if os.path.exists(self.config['trainer']['args']['save_dir']):
# input('save_dir is existed, do you really want to continue?')
self.make_dir(self.config['trainer']['args']['log_dir'])
self.make_dir(self.config['trainer']['args']['save_dir'])
self.make_dir(self.config['trainer']['args']['output_dir'])
def save_config_file(self):
"""Save config file so that we can know the config when we look back
"""
filename = self.config_file.split('/')[-1]
targetpath = self.config['trainer']['args']['save_dir']
shutil.copyfile(self.config_file, targetpath + filename)
def make_dir(self, path):
"""Build directory"""
if not os.path.exists(path):
os.makedirs(path)
def __getitem__(self, key):
"""Return the configure dict"""
return self.config[key]
def __call__(self):
"""__call__"""
return self.config
# DGI: Deep Graph Infomax
[Deep Graph Infomax (DGI)](https://arxiv.org/abs/1809.10341) is a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs, both derived using established graph convolutional network architectures.
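Concretely, given patch representations `h_i` from the original graph, corrupted representations `h~_j` from a shuffled (negative) graph, and a graph-level summary vector `s`, DGI maximizes a binary cross-entropy style objective of the following form. This is a sketch following the paper; the implementation below uses a sigmoid-of-mean readout and a bilinear discriminator `D`.
```
\mathcal{J} = \frac{1}{N+M}\left(\sum_{i=1}^{N}\mathbb{E}\big[\log \mathcal{D}(\vec{h}_i,\vec{s})\big]
            + \sum_{j=1}^{M}\mathbb{E}\big[\log\big(1-\mathcal{D}(\tilde{\vec{h}}_j,\vec{s})\big)\big]\right)
```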
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.6
- pgl
### Performance
We use DGI to pretrain an embedding for each node. Then we freeze the embeddings and train a node classifier on top of them.
| Dataset | Accuracy |
| --- | --- |
| Cora | ~81% |
| Pubmed | ~77.6% |
| Citeseer | ~71.3% |
### How to run
For example, to pretrain with DGI and then train the node classifier on the cora dataset using GPU:
```
python dgi.py --dataset cora --use_cuda
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset "cora", "citeseer", "pubmed".
- use_cuda: Use GPU if --use_cuda is set.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
DGI Pretrain
"""
import os
import pgl
from pgl import data_loader
from pgl.utils.logger import log
import paddle.fluid as fluid
import numpy as np
import time
import argparse
def load(name):
"""Load dataset"""
if name == 'cora':
dataset = data_loader.CoraDataset()
elif name == "pubmed":
dataset = data_loader.CitationDataset("pubmed", symmetry_edges=False)
elif name == "citeseer":
dataset = data_loader.CitationDataset("citeseer", symmetry_edges=False)
else:
        raise ValueError(name + " dataset doesn't exist")
return dataset
def save_param(dirname, var_name_list):
"""save_param"""
for var_name in var_name_list:
var = fluid.global_scope().find_var(var_name)
var_tensor = var.get_tensor()
np.save(os.path.join(dirname, var_name + '.npy'), np.array(var_tensor))
def main(args):
"""main"""
dataset = load(args.dataset)
# normalize
indegree = dataset.graph.indegree()
norm = np.zeros_like(indegree, dtype="float32")
norm[indegree > 0] = np.power(indegree[indegree > 0], -0.5)
dataset.graph.node_feat["norm"] = np.expand_dims(norm, -1)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
hidden_size = 512
with fluid.program_guard(train_program, startup_program):
pos_gw = pgl.graph_wrapper.GraphWrapper(
name="pos_graph",
place=place,
node_feat=dataset.graph.node_feat_info())
neg_gw = pgl.graph_wrapper.GraphWrapper(
name="neg_graph",
place=place,
node_feat=dataset.graph.node_feat_info())
positive_feat = pgl.layers.gcn(pos_gw,
pos_gw.node_feat["words"],
hidden_size,
activation="relu",
norm=pos_gw.node_feat['norm'],
name="gcn_layer_1")
negative_feat = pgl.layers.gcn(neg_gw,
neg_gw.node_feat["words"],
hidden_size,
activation="relu",
norm=neg_gw.node_feat['norm'],
name="gcn_layer_1")
summary_feat = fluid.layers.sigmoid(
fluid.layers.reduce_mean(
positive_feat, [0], keep_dim=True))
summary_feat = fluid.layers.fc(summary_feat,
hidden_size,
bias_attr=False,
name="discriminator")
pos_logits = fluid.layers.matmul(
positive_feat, summary_feat, transpose_y=True)
neg_logits = fluid.layers.matmul(
negative_feat, summary_feat, transpose_y=True)
pos_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=pos_logits,
label=fluid.layers.ones(
shape=[dataset.graph.num_nodes, 1], dtype="float32"))
neg_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=neg_logits,
label=fluid.layers.zeros(
shape=[dataset.graph.num_nodes, 1], dtype="float32"))
loss = fluid.layers.reduce_mean(pos_loss) + fluid.layers.reduce_mean(
neg_loss)
adam = fluid.optimizer.Adam(learning_rate=1e-3)
adam.minimize(loss)
exe = fluid.Executor(place)
exe.run(startup_program)
best_loss = 1e9
dur = []
for epoch in range(args.epoch):
feed_dict = pos_gw.to_feed(dataset.graph)
node_feat = dataset.graph.node_feat["words"].copy()
perm = np.arange(0, dataset.graph.num_nodes)
np.random.shuffle(perm)
dataset.graph.node_feat["words"] = dataset.graph.node_feat["words"][
perm]
feed_dict.update(neg_gw.to_feed(dataset.graph))
dataset.graph.node_feat["words"] = node_feat
if epoch >= 3:
t0 = time.time()
train_loss = exe.run(train_program,
feed=feed_dict,
fetch_list=[loss],
return_numpy=True)
if train_loss[0] < best_loss:
best_loss = train_loss[0]
save_param(args.checkpoint, ["gcn_layer_1", "gcn_layer_1_bias"])
if epoch >= 3:
time_per_epoch = 1.0 * (time.time() - t0)
dur.append(time_per_epoch)
log.info("Epoch %d " % epoch + "(%.5lf sec) " % np.mean(dur) +
"Train Loss: %f " % train_loss[0])
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='DGI pretrain')
parser.add_argument(
"--dataset", type=str, default="cora", help="dataset (cora, pubmed)")
parser.add_argument(
"--checkpoint", type=str, default="best_model", help="checkpoint")
parser.add_argument(
"--epoch", type=int, default=200, help="pretrain epochs")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
args = parser.parse_args()
log.info(args)
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Train
"""
import os
import pgl
from pgl import data_loader
from pgl.utils.logger import log
import paddle.fluid as fluid
import numpy as np
import time
import argparse
def load(name):
"""Load"""
if name == 'cora':
dataset = data_loader.CoraDataset()
elif name == "pubmed":
dataset = data_loader.CitationDataset("pubmed", symmetry_edges=False)
elif name == "citeseer":
dataset = data_loader.CitationDataset("citeseer", symmetry_edges=False)
else:
        raise ValueError(name + " dataset doesn't exist")
return dataset
def load_param(dirname, var_name_list):
"""load_param"""
for var_name in var_name_list:
var = fluid.global_scope().find_var(var_name)
var_tensor = var.get_tensor()
var_tmp = np.load(os.path.join(dirname, var_name + '.npy'))
var_tensor.set(var_tmp, fluid.CPUPlace())
def main(args):
"""main"""
dataset = load(args.dataset)
# normalize
indegree = dataset.graph.indegree()
norm = np.zeros_like(indegree, dtype="float32")
norm[indegree > 0] = np.power(indegree[indegree > 0], -0.5)
dataset.graph.node_feat["norm"] = np.expand_dims(norm, -1)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
test_program = fluid.Program()
hidden_size = 512
with fluid.program_guard(train_program, startup_program):
gw = pgl.graph_wrapper.GraphWrapper(
name="graph",
place=place,
node_feat=dataset.graph.node_feat_info())
output = pgl.layers.gcn(gw,
gw.node_feat["words"],
hidden_size,
activation="relu",
norm=gw.node_feat['norm'],
name="gcn_layer_1")
output.stop_gradient = True
output = fluid.layers.fc(output,
dataset.num_classes,
act=None,
name="classifier")
node_index = fluid.layers.data(
"node_index",
shape=[None, 1],
dtype="int64",
append_batch_size=False)
node_label = fluid.layers.data(
"node_label",
shape=[None, 1],
dtype="int64",
append_batch_size=False)
pred = fluid.layers.gather(output, node_index)
loss, pred = fluid.layers.softmax_with_cross_entropy(
logits=pred, label=node_label, return_softmax=True)
acc = fluid.layers.accuracy(input=pred, label=node_label, k=1)
loss = fluid.layers.mean(loss)
test_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
adam = fluid.optimizer.Adam(learning_rate=1e-2)
adam.minimize(loss)
exe = fluid.Executor(place)
exe.run(startup_program)
load_param(args.checkpoint, ["gcn_layer_1", "gcn_layer_1_bias"])
feed_dict = gw.to_feed(dataset.graph)
train_index = dataset.train_index
train_label = np.expand_dims(dataset.y[train_index], -1)
train_index = np.expand_dims(train_index, -1)
val_index = dataset.val_index
val_label = np.expand_dims(dataset.y[val_index], -1)
val_index = np.expand_dims(val_index, -1)
test_index = dataset.test_index
test_label = np.expand_dims(dataset.y[test_index], -1)
test_index = np.expand_dims(test_index, -1)
dur = []
for epoch in range(200):
if epoch >= 3:
t0 = time.time()
feed_dict["node_index"] = np.array(train_index, dtype="int64")
feed_dict["node_label"] = np.array(train_label, dtype="int64")
train_loss, train_acc = exe.run(train_program,
feed=feed_dict,
fetch_list=[loss, acc],
return_numpy=True)
if epoch >= 3:
time_per_epoch = 1.0 * (time.time() - t0)
dur.append(time_per_epoch)
feed_dict["node_index"] = np.array(val_index, dtype="int64")
feed_dict["node_label"] = np.array(val_label, dtype="int64")
val_loss, val_acc = exe.run(test_program,
feed=feed_dict,
fetch_list=[loss, acc],
return_numpy=True)
log.info("Epoch %d " % epoch + "(%.5lf sec) " % np.mean(dur) +
"Train Loss: %f " % train_loss + "Train Acc: %f " % train_acc
+ "Val Loss: %f " % val_loss + "Val Acc: %f " % val_acc)
feed_dict["node_index"] = np.array(test_index, dtype="int64")
feed_dict["node_label"] = np.array(test_label, dtype="int64")
test_loss, test_acc = exe.run(test_program,
feed=feed_dict,
fetch_list=[loss, acc],
return_numpy=True)
log.info("Accuracy: %f" % test_acc)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
parser.add_argument(
"--dataset", type=str, default="cora", help="dataset (cora, pubmed)")
parser.add_argument(
"--checkpoint", type=str, default="best_model", help="checkpoint")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
args = parser.parse_args()
log.info(args)
main(args)
# Distributed Deepwalk in PGL
[Deepwalk](https://arxiv.org/pdf/1403.6652.pdf) is an algorithmic framework for representational learning on graphs. Given any graph, it can learn continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks. Based on PGL, we reproduce the distributed deepwalk algorithm and reach the same level of performance as reported in the paper.
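The training objective implemented in `model.py` is skip-gram with negative sampling: for a center node with embedding `u`, a context node `v` drawn from a random-walk window, and `neg_num = k` sampled negative nodes, the model minimizes (up to batch averaging and a constant rescaling) the loss sketched below.
```
\mathcal{L}(u, v) = -\,k\,\log\sigma\big(\vec{u}^{\top}\vec{v}\big)
                    - \sum_{i=1}^{k}\log\big(1-\sigma\big(\vec{u}^{\top}\vec{v}'_i\big)\big)
```
Here `sigma` is the sigmoid function, and the positive term is weighted by `k` so that it balances the `k` negative samples, matching the weighting used in `DeepwalkModel.forward`.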
## Datasets
The dataset used here is the [BlogCatalog](http://socialcomputing.asu.edu/datasets/BlogCatalog3) social network.
## Dependencies
- paddlepaddle>=1.6
- pgl>=1.0
## How to run
We adopt [PaddlePaddle Fleet](https://github.com/PaddlePaddle/Fleet) as our distributed training framework. ```pgl_deepwalk.cfg``` is the config file for the deepwalk hyperparameters, and ```local_config``` is the config file for the parameter servers. By default, we have 2 pservers and 2 trainers. You can use ```cloud_run.sh``` to start the parameter servers and model trainers.
For example, to train deepwalk in distributed mode on the BlogCatalog dataset:
```sh
# train deepwalk in distributed mode.
sh cloud_run.sh
# multiclass task example
python3 multi_class.py --use_cuda --ckpt_path ./model_path/4029 --epoch 1000
```
## Hyperparameters
- dataset: The dataset name, e.g. "BlogCatalog".
- hidden_size: Hidden size of the embedding.
- lr: Learning rate.
- neg_num: Number of negative samples.
- epoch: Number of training epoch.
### Experiment results
Dataset|model|Task|Metric|PGL Result|Reported Result
--|--|--|--|--|--
BlogCatalog|distributed deepwalk|multi-label classification|MacroF1|0.233|0.211
#!/bin/bash
set -x
source ./pgl_deepwalk.cfg
source ./local_config
unset http_proxy https_proxy
# build train_data
trainer_num=`echo $PADDLE_PORT | awk -F',' '{print NF}'`
rm -rf train_data && mkdir -p train_data
cd train_data
if [[ $build_train_data == True ]];then
seq 0 $((num_nodes-1)) | shuf | split -l $((num_nodes/trainer_num/CPU_NUM+1))
else
for i in `seq 1 $trainer_num`;do
touch $i
done
fi
cd -
# mkdir workspace
if [ -d ${BASE} ]; then
rm -rf ${BASE}
fi
mkdir ${BASE}
# start ps
for((i=0;i<${PADDLE_PSERVERS_NUM};i++))
do
echo "start ps server: ${i}"
echo $BASE
TRAINING_ROLE="PSERVER" PADDLE_TRAINER_ID=${i} sh job.sh &> $BASE/pserver.$i.log &
done
sleep 5s
# start trainers
for((j=0;j<${PADDLE_TRAINERS_NUM};j++))
do
echo "start ps work: ${j}"
TRAINING_ROLE="TRAINER" PADDLE_TRAINER_ID=${j} sh job.sh &> $BASE/worker.$j.log &
done
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import os
import math
from multiprocessing import Process
import numpy as np
import paddle.fluid as F
import paddle.fluid.layers as L
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.transpiler.distribute_transpiler import DistributeTranspilerConfig
import paddle.fluid.incubate.fleet.base.role_maker as role_maker
from pgl.utils.logger import log
from pgl import data_loader
from reader import DeepwalkReader
from model import DeepwalkModel
from utils import get_file_list
from utils import build_graph
from utils import build_fake_graph
from utils import build_gen_func
def init_role():
# reset the place according to role of parameter server
training_role = os.getenv("TRAINING_ROLE", "TRAINER")
paddle_role = role_maker.Role.WORKER
place = F.CPUPlace()
if training_role == "PSERVER":
paddle_role = role_maker.Role.SERVER
# set the fleet runtime environment according to configure
ports = os.getenv("PADDLE_PORT", "6174").split(",")
pserver_ips = os.getenv("PADDLE_PSERVERS").split(",") # ip,ip...
eplist = []
if len(ports) > 1:
# local debug mode, multi port
for port in ports:
eplist.append(':'.join([pserver_ips[0], port]))
else:
# distributed mode, multi ip
for ip in pserver_ips:
eplist.append(':'.join([ip, ports[0]]))
pserver_endpoints = eplist # ip:port,ip:port...
worker_num = int(os.getenv("PADDLE_TRAINERS_NUM", "0"))
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
role = role_maker.UserDefinedRoleMaker(
current_id=trainer_id,
role=paddle_role,
worker_num=worker_num,
server_endpoints=pserver_endpoints)
fleet.init(role)
def optimization(base_lr, loss, train_steps, optimizer='sgd'):
decayed_lr = L.learning_rate_scheduler.polynomial_decay(
learning_rate=base_lr,
decay_steps=train_steps,
end_learning_rate=0.0001 * base_lr,
power=1.0,
cycle=False)
if optimizer == 'sgd':
optimizer = F.optimizer.SGD(decayed_lr)
elif optimizer == 'adam':
optimizer = F.optimizer.Adam(decayed_lr, lazy_mode=True)
else:
raise ValueError
log.info('learning rate:%f' % (base_lr))
#create the DistributeTranspiler configure
config = DistributeTranspilerConfig()
config.sync_mode = False
#config.runtime_split_send_recv = False
config.slice_var_up = False
#create the distributed optimizer
optimizer = fleet.distributed_optimizer(optimizer, config)
optimizer.minimize(loss)
def build_complied_prog(train_program, model_loss):
num_threads = int(os.getenv("CPU_NUM", 10))
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
exec_strategy = F.ExecutionStrategy()
exec_strategy.num_threads = num_threads
#exec_strategy.use_experimental_executor = True
build_strategy = F.BuildStrategy()
build_strategy.enable_inplace = True
#build_strategy.memory_optimize = True
build_strategy.memory_optimize = False
build_strategy.remove_unnecessary_lock = False
if num_threads > 1:
build_strategy.reduce_strategy = F.BuildStrategy.ReduceStrategy.Reduce
compiled_prog = F.compiler.CompiledProgram(
train_program).with_data_parallel(
loss_name=model_loss.name)
return compiled_prog
def train_prog(exe, program, loss, node2vec_pyreader, args, train_steps):
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
step = 0
while True:
try:
begin_time = time.time()
loss_val, = exe.run(program, fetch_list=[loss])
log.info("step %s: loss %.5f speed: %.5f s/step" %
(step, np.mean(loss_val), time.time() - begin_time))
step += 1
except F.core.EOFException:
node2vec_pyreader.reset()
if step % args.steps_per_save == 0 or step == train_steps:
if trainer_id == 0 or args.is_distributed:
model_save_dir = args.save_path
model_path = os.path.join(model_save_dir, str(step))
if not os.path.exists(model_save_dir):
os.makedirs(model_save_dir)
fleet.save_persistables(exe, model_path)
if step == train_steps:
break
def test(args):
graph = build_graph(args.num_nodes, args.edge_path)
gen_func = build_gen_func(args, graph)
start = time.time()
num = 10
for idx, _ in enumerate(gen_func()):
if idx % num == num - 1:
log.info("%s" % (1.0 * (time.time() - start) / num))
start = time.time()
def walk(args):
graph = build_graph(args.num_nodes, args.edge_path)
num_sample_workers = args.num_sample_workers
if args.train_files is None or args.train_files == "None":
log.info("Walking from graph...")
train_files = [None for _ in range(num_sample_workers)]
else:
log.info("Walking from train_data...")
files = get_file_list(args.train_files)
train_files = [[] for i in range(num_sample_workers)]
for idx, f in enumerate(files):
train_files[idx % num_sample_workers].append(f)
def walk_to_file(walk_gen, filename, max_num):
with open(filename, "w") as outf:
num = 0
for walks in walk_gen:
for walk in walks:
outf.write("%s\n" % "\t".join([str(i) for i in walk]))
num += 1
if num % 1000 == 0:
log.info("Total: %s, %s walkpath is saved. " %
(max_num, num))
if num == max_num:
return
m_args = [(DeepwalkReader(
graph,
batch_size=args.batch_size,
walk_len=args.walk_len,
win_size=args.win_size,
neg_num=args.neg_num,
neg_sample_type=args.neg_sample_type,
walkpath_files=None,
train_files=train_files[i]).walk_generator(),
"%s/%s" % (args.walkpath_files, i),
args.epoch * args.num_nodes // args.num_sample_workers)
for i in range(num_sample_workers)]
ps = []
for i in range(num_sample_workers):
p = Process(target=walk_to_file, args=m_args[i])
p.start()
ps.append(p)
for i in range(num_sample_workers):
ps[i].join()
def train(args):
import logging
log.setLevel(logging.DEBUG)
log.info("start")
worker_num = int(os.getenv("PADDLE_TRAINERS_NUM", "0"))
num_devices = int(os.getenv("CPU_NUM", 10))
model = DeepwalkModel(args.num_nodes, args.hidden_size, args.neg_num,
args.is_sparse, args.is_distributed, 1.)
pyreader = model.pyreader
loss = model.forward()
# init fleet
init_role()
train_steps = math.ceil(1. * args.num_nodes * args.epoch /
args.batch_size / num_devices / worker_num)
log.info("Train step: %s" % train_steps)
if args.optimizer == "sgd":
args.lr *= args.batch_size * args.walk_len * args.win_size
optimization(args.lr, loss, train_steps, args.optimizer)
# init and run server or worker
if fleet.is_server():
fleet.init_server(args.warm_start_from_dir)
fleet.run_server()
if fleet.is_worker():
log.info("start init worker done")
fleet.init_worker()
#just the worker, load the sample
log.info("init worker done")
exe = F.Executor(F.CPUPlace())
exe.run(fleet.startup_program)
log.info("Startup done")
if args.dataset is not None:
if args.dataset == "BlogCatalog":
graph = data_loader.BlogCatalogDataset().graph
elif args.dataset == "ArXiv":
graph = data_loader.ArXivDataset().graph
else:
            raise ValueError(args.dataset + " dataset doesn't exist")
        log.info("Load built-in dataset %s done." % args.dataset)
elif args.walkpath_files is None or args.walkpath_files == "None":
graph = build_graph(args.num_nodes, args.edge_path)
log.info("Load graph from '%s' done." % args.edge_path)
else:
graph = build_fake_graph(args.num_nodes)
log.info("Load fake graph done.")
# bind gen
gen_func = build_gen_func(args, graph)
pyreader.decorate_tensor_provider(gen_func)
pyreader.start()
compiled_prog = build_complied_prog(fleet.main_program, loss)
train_prog(exe, compiled_prog, loss, pyreader, args, train_steps)
if __name__ == '__main__':
def str2bool(v):
if isinstance(v, bool):
return v
if v.lower() in ('yes', 'true', 't', 'y', '1'):
return True
elif v.lower() in ('no', 'false', 'f', 'n', '0'):
return False
else:
raise argparse.ArgumentTypeError('Boolean value expected.')
parser = argparse.ArgumentParser(description='Deepwalk')
parser.add_argument(
"--hidden_size",
type=int,
default=64,
help="Hidden size of the embedding.")
parser.add_argument(
"--lr", type=float, default=0.025, help="Learning rate.")
parser.add_argument(
"--neg_num", type=int, default=5, help="Number of negative samples.")
parser.add_argument(
"--epoch", type=int, default=1, help="Number of training epoch.")
parser.add_argument(
"--batch_size",
type=int,
default=128,
help="Numbert of walk paths in a batch.")
parser.add_argument(
"--walk_len", type=int, default=40, help="Length of a walk path.")
parser.add_argument(
"--win_size", type=int, default=5, help="Window size in skip-gram.")
parser.add_argument(
"--save_path",
type=str,
default="model_path",
help="Output path for saving model.")
parser.add_argument(
"--num_sample_workers",
type=int,
default=1,
help="Number of sampling workers.")
parser.add_argument(
"--steps_per_save",
type=int,
default=3000,
help="Steps for model saveing.")
parser.add_argument(
"--num_nodes",
type=int,
default=10000,
help="Number of nodes in graph.")
parser.add_argument("--edge_path", type=str, default="./graph_data")
parser.add_argument("--train_files", type=str, default=None)
parser.add_argument("--walkpath_files", type=str, default=None)
parser.add_argument("--is_distributed", type=str2bool, default=False)
parser.add_argument("--is_sparse", type=str2bool, default=True)
parser.add_argument("--warm_start_from_dir", type=str, default=None)
parser.add_argument("--dataset", type=str, default=None)
parser.add_argument(
"--neg_sample_type",
type=str,
default="average",
choices=["average", "outdegree"])
parser.add_argument(
"--mode",
type=str,
required=False,
choices=['train', 'walk'],
default="train")
parser.add_argument(
"--optimizer",
type=str,
required=False,
choices=['adam', 'sgd'],
default="sgd")
args = parser.parse_args()
log.info(args)
if args.mode == "train":
train(args)
elif args.mode == "walk":
walk(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import os
import numpy as np
import paddle.fluid as F
import paddle.fluid.layers as L
from pgl.utils.logger import log
from model import DeepwalkModel
from utils import build_graph
from utils import build_gen_func
def optimization(base_lr, loss, train_steps, optimizer='adam'):
decayed_lr = L.polynomial_decay(base_lr, train_steps, 0.0001)
if optimizer == 'sgd':
optimizer = F.optimizer.SGD(
decayed_lr,
regularization=F.regularizer.L2DecayRegularizer(
regularization_coeff=0.0025))
elif optimizer == 'adam':
# dont use gpu's lazy mode
optimizer = F.optimizer.Adam(decayed_lr)
else:
raise ValueError
log.info('learning rate:%f' % (base_lr))
optimizer.minimize(loss)
def get_parallel_exe(program, loss):
exec_strategy = F.ExecutionStrategy()
exec_strategy.num_threads = 1 #2 for fp32 4 for fp16
exec_strategy.use_experimental_executor = True
    exec_strategy.num_iteration_per_drop_scope = 1  # important: drop scope every iteration
build_strategy = F.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
build_strategy.remove_unnecessary_lock = True
#return compiled_prog
train_exe = F.ParallelExecutor(
use_cuda=True,
loss_name=loss.name,
build_strategy=build_strategy,
exec_strategy=exec_strategy,
main_program=program)
return train_exe
def train(train_exe, exe, program, loss, node2vec_pyreader, args, train_steps):
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
step = 0
while True:
try:
begin_time = time.time()
loss_val, = train_exe.run(fetch_list=[loss])
log.info("step %s: loss %.5f speed: %.5f s/step" %
(step, np.mean(loss_val), time.time() - begin_time))
step += 1
except F.core.EOFException:
node2vec_pyreader.reset()
if (step == train_steps or
step % args.steps_per_save == 0) and trainer_id == 0:
model_save_dir = args.output_path
model_path = os.path.join(model_save_dir, str(step))
if not os.path.exists(model_save_dir):
os.makedirs(model_save_dir)
F.io.save_params(exe, model_path, program)
if step == train_steps:
break
def main(args):
import logging
log.setLevel(logging.DEBUG)
log.info("start")
num_devices = len(F.cuda_places())
model = DeepwalkModel(args.num_nodes, args.hidden_size, args.neg_num,
False, False, 1.)
pyreader = model.pyreader
loss = model.forward()
train_steps = int(args.num_nodes * args.epoch / args.batch_size /
num_devices)
optimization(args.lr * num_devices, loss, train_steps, args.optimizer)
place = F.CUDAPlace(0)
exe = F.Executor(place)
exe.run(F.default_startup_program())
graph = build_graph(args.num_nodes, args.edge_path)
gen_func = build_gen_func(args, graph)
pyreader.decorate_tensor_provider(gen_func)
pyreader.start()
train_prog = F.default_main_program()
if args.warm_start_from_dir is not None:
F.io.load_params(exe, args.warm_start_from_dir, train_prog)
train_exe = get_parallel_exe(train_prog, loss)
train(train_exe, exe, train_prog, loss, pyreader, args, train_steps)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Deepwalk')
parser.add_argument("--hidden_size", type=int, default=64)
parser.add_argument("--lr", type=float, default=0.025)
parser.add_argument("--neg_num", type=int, default=5)
parser.add_argument("--epoch", type=int, default=100)
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--walk_len", type=int, default=40)
parser.add_argument("--win_size", type=int, default=5)
parser.add_argument("--output_path", type=str, default="output")
parser.add_argument("--num_sample_workers", type=int, default=1)
parser.add_argument("--steps_per_save", type=int, default=3000)
parser.add_argument("--num_nodes", type=int, default=10000)
parser.add_argument("--edge_path", type=str, default="./graph_data")
parser.add_argument("--walkpath_files", type=str, default=None)
parser.add_argument("--train_files", type=str, default="./train_data")
parser.add_argument("--warm_start_from_dir", type=str, default=None)
parser.add_argument(
"--neg_sample_type",
type=str,
default="average",
choices=["average", "outdegree"])
parser.add_argument(
"--optimizer",
type=str,
required=False,
choices=['adam', 'sgd'],
default="adam")
args = parser.parse_args()
log.info(args)
main(args)
#!/bin/bash
set -x
source ./pgl_deepwalk.cfg
export CPU_NUM=$CPU_NUM
export FLAGS_rpc_deadline=3000000
export FLAGS_communicator_send_queue_size=1
export FLAGS_communicator_min_send_grad_num_before_recv=0
export FLAGS_communicator_max_merge_var_num=1
export FLAGS_communicator_merge_sparse_grad=1
if [[ $build_train_data == True ]];then
train_files="./train_data"
else
train_files="None"
fi
if [[ $pre_walk == True ]]; then
walkpath_files="./walk_path"
if [[ $TRAINING_ROLE == "PSERVER" ]];then
while [[ ! -d train_data ]];do
sleep 60
echo "Waiting for train_data ..."
done
rm -rf $walkpath_files && mkdir -p $walkpath_files
python -u cluster_train.py --num_sample_workers $num_sample_workers --num_nodes $num_nodes --mode walk --walkpath_files $walkpath_files --epoch $epoch \
--walk_len $walk_len --batch_size $batch_size --train_files $train_files --dataset "BlogCatalog"
touch build_graph_done
fi
while [[ ! -f build_graph_done ]];do
sleep 60
echo "Waiting for walk_path ..."
done
else
walkpath_files="None"
fi
python -u cluster_train.py --num_sample_workers $num_sample_workers --num_nodes $num_nodes --optimizer $optimizer --walkpath_files $walkpath_files --epoch $epoch \
--is_distributed $distributed_embedding --lr $learning_rate --neg_num $neg_num --walk_len $walk_len --win_size $win_size --is_sparse $is_sparse --hidden_size $dim \
--batch_size $batch_size --steps_per_save $steps_per_save --train_files $train_files --dataset "BlogCatalog"
#!/bin/bash
export PADDLE_TRAINERS_NUM=2
export PADDLE_PSERVERS_NUM=2
export PADDLE_PORT=6184,6185
export PADDLE_PSERVERS="127.0.0.1"
export BASE="./local_dir"
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Deepwalk model file.
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import math
import paddle.fluid.layers as L
import paddle.fluid as F
def split_embedding(input,
dict_size,
hidden_size,
initializer,
name,
num_part=16,
is_sparse=False,
learning_rate=1.0):
""" split_embedding
"""
_part_size = hidden_size // num_part
if hidden_size % num_part != 0:
_part_size += 1
output_embedding = []
p_num = 0
while hidden_size > 0:
_part_size = min(_part_size, hidden_size)
hidden_size -= _part_size
print("part", p_num, "size=", (dict_size, _part_size))
part_embedding = L.embedding(
input=input,
size=(dict_size, _part_size),
is_sparse=is_sparse,
is_distributed=False,
param_attr=F.ParamAttr(
name=name + '_part%s' % p_num,
initializer=initializer,
learning_rate=learning_rate))
p_num += 1
output_embedding.append(part_embedding)
return L.concat(output_embedding, -1)
class DeepwalkModel(object):
def __init__(self,
num_nodes,
hidden_size=16,
neg_num=5,
is_sparse=False,
is_distributed=False,
embedding_lr=1.0):
self.pyreader = L.py_reader(
capacity=70,
shapes=[[-1, 1, 1], [-1, neg_num + 1, 1]],
dtypes=['int64', 'int64'],
lod_levels=[0, 0],
name='train',
use_double_buffer=True)
self.num_nodes = num_nodes
self.neg_num = neg_num
self.embed_init = F.initializer.Uniform(
low=-1. / math.sqrt(hidden_size), high=1. / math.sqrt(hidden_size))
self.is_sparse = is_sparse
self.is_distributed = is_distributed
self.hidden_size = hidden_size
self.loss = None
self.embedding_lr = embedding_lr
max_hidden_size = int(math.pow(2, 31) / 4 / num_nodes)
self.num_part = int(math.ceil(1. * hidden_size / max_hidden_size))
def forward(self):
src, dsts = L.read_file(self.pyreader)
if self.is_sparse:
# sparse mode use 2 dims input.
src = L.reshape(src, [-1, 1])
dsts = L.reshape(dsts, [-1, 1])
if self.num_part is not None and self.num_part != 1 and not self.is_distributed:
src_embed = split_embedding(
src,
self.num_nodes,
self.hidden_size,
self.embed_init,
"weight",
self.num_part,
self.is_sparse,
learning_rate=self.embedding_lr)
dsts_embed = split_embedding(
dsts,
self.num_nodes,
self.hidden_size,
self.embed_init,
"weight",
self.num_part,
self.is_sparse,
learning_rate=self.embedding_lr)
else:
src_embed = L.embedding(
src, (self.num_nodes, self.hidden_size),
self.is_sparse,
self.is_distributed,
param_attr=F.ParamAttr(
name="weight",
learning_rate=self.embedding_lr,
initializer=self.embed_init))
dsts_embed = L.embedding(
dsts, (self.num_nodes, self.hidden_size),
self.is_sparse,
self.is_distributed,
param_attr=F.ParamAttr(
name="weight",
learning_rate=self.embedding_lr,
initializer=self.embed_init))
if self.is_sparse:
# reshape back
src_embed = L.reshape(src_embed, [-1, 1, self.hidden_size])
dsts_embed = L.reshape(dsts_embed,
[-1, self.neg_num + 1, self.hidden_size])
logits = L.matmul(
src_embed, dsts_embed,
transpose_y=True) # [batch_size, 1, neg_num+1]
pos_label = L.fill_constant_batch_size_like(logits, [-1, 1, 1],
"float32", 1)
neg_label = L.fill_constant_batch_size_like(
logits, [-1, 1, self.neg_num], "float32", 0)
label = L.concat([pos_label, neg_label], -1)
pos_weight = L.fill_constant_batch_size_like(logits, [-1, 1, 1],
"float32", self.neg_num)
neg_weight = L.fill_constant_batch_size_like(
logits, [-1, 1, self.neg_num], "float32", 1)
weight = L.concat([pos_weight, neg_weight], -1)
weight.stop_gradient = True
label.stop_gradient = True
loss = L.sigmoid_cross_entropy_with_logits(logits, label)
loss = loss * weight
loss = L.reduce_mean(loss)
loss = loss * ((self.neg_num + 1) / 2 / self.neg_num)
loss.persistable = True
self.loss = loss
return loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Optimized Multiprocessing Reader for PaddlePaddle
"""
import multiprocessing
import numpy as np
import time
import paddle.fluid as fluid
import pyarrow
def _serialize_serializable(obj):
"""Serialize Feed Dict
"""
return {"type": type(obj), "data": obj.__dict__}
def _deserialize_serializable(obj):
"""Deserialize Feed Dict
"""
val = obj["type"].__new__(obj["type"])
val.__dict__.update(obj["data"])
return val
context = pyarrow.default_serialization_context()
context.register_type(
object,
"object",
custom_serializer=_serialize_serializable,
custom_deserializer=_deserialize_serializable)
def serialize_data(data):
"""serialize_data"""
return pyarrow.serialize(data, context=context).to_buffer().to_pybytes()
def deserialize_data(data):
"""deserialize_data"""
return pyarrow.deserialize(data, context=context)
def multiprocess_reader(readers, use_pipe=True, queue_size=1000):
"""
    multiprocess_reader uses Python multiprocessing to read data from the
    given readers and then merges all data through a multiprocessing.Queue
    or multiprocessing.Pipe. The number of processes equals the number of
    input readers; each process runs one reader.
    Note that multiprocessing.Queue requires read/write access to /dev/shm,
    which some platforms do not provide.
    You need to create the readers first, and they should be independent of
    each other so that each process can work on its own.
    An example:
    .. code-block:: python
        reader0 = reader(["file01", "file02"])
        reader1 = reader(["file11", "file12"])
        reader2 = reader(["file21", "file22"])
reader = multiprocess_reader([reader0, reader1, reader2],
queue_size=100, use_pipe=False)
"""
assert type(readers) is list and len(readers) > 0
def _read_into_queue(reader, queue):
"""read_into_queue"""
for sample in reader():
if sample is None:
raise ValueError("sample has None")
queue.put(serialize_data(sample))
queue.put(serialize_data(None))
def queue_reader():
"""queue_reader"""
queue = multiprocessing.Queue(queue_size)
for reader in readers:
p = multiprocessing.Process(
target=_read_into_queue, args=(reader, queue))
p.start()
reader_num = len(readers)
finish_num = 0
while finish_num < reader_num:
sample = deserialize_data(queue.get())
if sample is None:
finish_num += 1
else:
yield sample
def _read_into_pipe(reader, conn):
"""read_into_pipe"""
for sample in reader():
if sample is None:
raise ValueError("sample has None!")
conn.send(serialize_data(sample))
conn.send(serialize_data(None))
conn.close()
def pipe_reader():
"""pipe_reader"""
conns = []
for reader in readers:
parent_conn, child_conn = multiprocessing.Pipe()
conns.append(parent_conn)
p = multiprocessing.Process(
target=_read_into_pipe, args=(reader, child_conn))
p.start()
reader_num = len(readers)
finish_num = 0
finish_flag = np.zeros(len(conns), dtype="int32")
while finish_num < reader_num:
for conn_id, conn in enumerate(conns):
if finish_flag[conn_id] > 0:
continue
buff = conn.recv()
                sample = deserialize_data(buff)
if sample is None:
finish_num += 1
conn.close()
finish_flag[conn_id] = 1
else:
yield sample
if use_pipe:
return pipe_reader
else:
return queue_reader
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import math
import os
import numpy as np
import sklearn.metrics
from sklearn.metrics import f1_score
import pgl
from pgl import data_loader
from pgl.utils import op
from pgl.utils.logger import log
import paddle.fluid as fluid
import paddle.fluid.layers as l
np.random.seed(123)
def load(name):
if name == 'BlogCatalog':
dataset = data_loader.BlogCatalogDataset()
else:
        raise ValueError(name + " dataset doesn't exist")
return dataset
def node_classify_model(graph,
num_labels,
hidden_size=16,
name='node_classify_task'):
pyreader = l.py_reader(
capacity=70,
shapes=[[-1, 1], [-1, num_labels]],
dtypes=['int64', 'float32'],
lod_levels=[0, 0],
name=name + '_pyreader',
use_double_buffer=True)
nodes, labels = l.read_file(pyreader)
embed_nodes = l.embedding(
input=nodes,
size=[graph.num_nodes, hidden_size],
param_attr=fluid.ParamAttr(name='weight'))
embed_nodes.stop_gradient = True
logits = l.fc(input=embed_nodes, size=num_labels)
loss = l.sigmoid_cross_entropy_with_logits(logits, labels)
loss = l.reduce_mean(loss)
prob = l.sigmoid(logits)
topk = l.reduce_sum(labels, -1)
return pyreader, loss, prob, labels, topk
def node_classify_generator(graph,
all_nodes=None,
batch_size=512,
epoch=1,
shuffle=True):
if all_nodes is None:
all_nodes = np.arange(graph.num_nodes)
#labels = (np.random.rand(512, 39) > 0.95).astype(np.float32)
def batch_nodes_generator(shuffle=shuffle):
perm = np.arange(len(all_nodes), dtype=np.int64)
if shuffle:
np.random.shuffle(perm)
start = 0
while start < len(all_nodes):
yield all_nodes[perm[start:start + batch_size]]
start += batch_size
def wrapper():
for _ in range(epoch):
for batch_nodes in batch_nodes_generator():
batch_nodes_expanded = np.expand_dims(batch_nodes,
-1).astype(np.int64)
batch_labels = graph.node_feat['group_id'][batch_nodes].astype(
np.float32)
yield [batch_nodes_expanded, batch_labels]
return wrapper
def topk_f1_score(labels,
probs,
topk_list=None,
average="macro",
threshold=None):
    assert topk_list is not None or threshold is not None, "either topk_list or threshold must be provided"
if threshold is not None:
preds = probs > threshold
else:
preds = np.zeros_like(labels, dtype=np.int64)
for idx, (prob, topk) in enumerate(zip(np.argsort(probs), topk_list)):
preds[idx][prob[-int(topk):]] = 1
return f1_score(labels, preds, average=average)
def main(args):
hidden_size = args.hidden_size
epoch = args.epoch
ckpt_path = args.ckpt_path
threshold = args.threshold
dataset = load(args.dataset)
if args.batch_size is None:
batch_size = len(dataset.train_index)
else:
batch_size = args.batch_size
train_steps = (len(dataset.train_index) // batch_size) * epoch
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_prog = fluid.Program()
test_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
train_pyreader, train_loss, train_probs, train_labels, train_topk = node_classify_model(
dataset.graph,
dataset.num_groups,
hidden_size=hidden_size,
name='train')
lr = l.polynomial_decay(0.025, train_steps, 0.0001)
adam = fluid.optimizer.Adam(lr)
adam.minimize(train_loss)
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
test_pyreader, test_loss, test_probs, test_labels, test_topk = node_classify_model(
dataset.graph,
dataset.num_groups,
hidden_size=hidden_size,
name='test')
test_prog = test_prog.clone(for_test=True)
exe = fluid.Executor(place)
exe.run(startup_prog)
train_pyreader.decorate_tensor_provider(
node_classify_generator(
dataset.graph,
dataset.train_index,
batch_size=batch_size,
epoch=epoch))
test_pyreader.decorate_tensor_provider(
node_classify_generator(
dataset.graph, dataset.test_index, batch_size=batch_size, epoch=1))
def existed_params(var):
if not isinstance(var, fluid.framework.Parameter):
return False
return os.path.exists(os.path.join(ckpt_path, var.name))
fluid.io.load_vars(
exe, ckpt_path, main_program=train_prog, predicate=existed_params)
step = 0
prev_time = time.time()
train_pyreader.start()
while 1:
try:
train_loss_val, train_probs_val, train_labels_val, train_topk_val = exe.run(
train_prog,
fetch_list=[
train_loss, train_probs, train_labels, train_topk
],
return_numpy=True)
train_macro_f1 = topk_f1_score(train_labels_val, train_probs_val,
train_topk_val, "macro", threshold)
train_micro_f1 = topk_f1_score(train_labels_val, train_probs_val,
train_topk_val, "micro", threshold)
step += 1
log.info("Step %d " % step + "Train Loss: %f " % train_loss_val +
"Train Macro F1: %f " % train_macro_f1 +
"Train Micro F1: %f " % train_micro_f1)
except fluid.core.EOFException:
train_pyreader.reset()
break
test_pyreader.start()
test_probs_vals, test_labels_vals, test_topk_vals = [], [], []
while 1:
try:
test_loss_val, test_probs_val, test_labels_val, test_topk_val = exe.run(
test_prog,
fetch_list=[
test_loss, test_probs, test_labels, test_topk
],
return_numpy=True)
            test_probs_vals.append(test_probs_val)
            test_labels_vals.append(test_labels_val)
test_topk_vals.append(test_topk_val)
except fluid.core.EOFException:
test_pyreader.reset()
test_probs_array = np.concatenate(test_probs_vals)
test_labels_array = np.concatenate(test_labels_vals)
test_topk_array = np.concatenate(test_topk_vals)
test_macro_f1 = topk_f1_score(
test_labels_array, test_probs_array, test_topk_array,
"macro", threshold)
test_micro_f1 = topk_f1_score(
test_labels_array, test_probs_array, test_topk_array,
"micro", threshold)
log.info("\t\tStep %d " % step + "Test Loss: %f " %
test_loss_val + "Test Macro F1: %f " % test_macro_f1 +
"Test Micro F1: %f " % test_micro_f1)
break
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='node2vec')
parser.add_argument(
"--dataset",
type=str,
default="BlogCatalog",
help="dataset (BlogCatalog)")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--epoch", type=int, default=400)
parser.add_argument("--batch_size", type=int, default=None)
parser.add_argument("--threshold", type=float, default=0.3)
parser.add_argument(
"--ckpt_path",
type=str,
default="./tmp/baseline_node2vec/paddle_model")
args = parser.parse_args()
log.info(args)
main(args)
# deepwalk config
num_nodes=10312 # max node_id + 1
num_sample_workers=2
epoch=100
optimizer=sgd # sgd or adam
learning_rate=0.5
neg_num=5
walk_len=40
win_size=5
dim=128
batch_size=8
steps_per_save=5000
is_sparse=False
distributed_embedding=False # only used when num_nodes > 100,000,000; slower than normal embedding
build_train_data=True
pre_walk=False
CPU_NUM=16
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Reader file.
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
import time
import io
import os
import numpy as np
import paddle
from pgl.utils.logger import log
from pgl.sample import node2vec_sample
from pgl.sample import deepwalk_sample
from pgl.sample import alias_sample
from pgl.graph_kernel import skip_gram_gen_pair
from pgl.graph_kernel import alias_sample_build_table
from pgl.utils import mp_reader
class DeepwalkReader(object):
def __init__(self,
graph,
batch_size=512,
walk_len=40,
win_size=5,
neg_num=5,
train_files=None,
walkpath_files=None,
neg_sample_type="average"):
"""
Args:
            walkpath_files: if not None, walks are read from these files instead of being sampled from the graph
"""
self.graph = graph
self.batch_size = batch_size
self.walk_len = walk_len
self.win_size = win_size
self.neg_num = neg_num
self.train_files = train_files
self.walkpath_files = walkpath_files
self.neg_sample_type = neg_sample_type
def walk_from_files(self):
bucket = []
while True:
for filename in self.walkpath_files:
with io.open(filename) as inf:
for line in inf:
#walk = [hash_map[x] for x in line.strip('\n\t').split('\t')]
walk = [int(x) for x in line.strip('\n\t').split('\t')]
bucket.append(walk)
if len(bucket) == self.batch_size:
yield bucket
bucket = []
if len(bucket):
yield bucket
def walk_from_graph(self):
def node_generator():
if self.train_files is None:
while True:
for nodes in self.graph.node_batch_iter(self.batch_size):
yield nodes
else:
nodes = []
while True:
for filename in self.train_files:
with io.open(filename) as inf:
for line in inf:
node = int(line.strip('\n\t'))
nodes.append(node)
if len(nodes) == self.batch_size:
yield nodes
nodes = []
if len(nodes):
yield nodes
if "alias" in self.graph.node_feat and "events" in self.graph.node_feat:
log.info("Deepwalk using alias sample")
for nodes in node_generator():
if "alias" in self.graph.node_feat and "events" in self.graph.node_feat:
walks = deepwalk_sample(self.graph, nodes, self.walk_len,
"alias", "events")
else:
walks = deepwalk_sample(self.graph, nodes, self.walk_len)
yield walks
def walk_generator(self):
if self.walkpath_files is not None:
for i in self.walk_from_files():
yield i
else:
for i in self.walk_from_graph():
yield i
def __call__(self):
np.random.seed(os.getpid())
if self.neg_sample_type == "outdegree":
outdegree = self.graph.outdegree()
distribution = 1. * outdegree / outdegree.sum()
alias, events = alias_sample_build_table(distribution)
max_len = int(self.batch_size * self.walk_len * (
(1 + self.win_size) - 0.3))
for walks in self.walk_generator():
try:
src_list, pos_list = [], []
for walk in walks:
s, p = skip_gram_gen_pair(walk, self.win_size)
                    src_list.append(s[:max_len])
                    pos_list.append(p[:max_len])
src = [s for x in src_list for s in x]
pos = [s for x in pos_list for s in x]
                src = np.array(src, dtype=np.int64)
pos = np.array(pos, dtype=np.int64)
src, pos = np.reshape(src, [-1, 1, 1]), np.reshape(pos,
[-1, 1, 1])
neg_sample_size = [len(pos), self.neg_num, 1]
if src.shape[0] == 0:
continue
if self.neg_sample_type == "average":
negs = np.random.randint(
low=0, high=self.graph.num_nodes, size=neg_sample_size)
elif self.neg_sample_type == "outdegree":
negs = alias_sample(neg_sample_size, alias, events)
elif self.neg_sample_type == "inbatch":
pass
dst = np.concatenate([pos, negs], 1)
# [batch_size, 1, 1] [batch_size, neg_num+1, 1]
yield src[:max_len], dst[:max_len]
except Exception as e:
log.exception(e)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Utils file.
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import os
import time
import numpy as np
from pgl.utils.logger import log
from pgl.graph import Graph
from pgl.sample import graph_alias_sample_table
from reader import DeepwalkReader
import mp_reader
def get_file_list(path):
filelist = []
if os.path.isfile(path):
filelist = [path]
elif os.path.isdir(path):
filelist = [
os.path.join(dp, f)
for dp, dn, filenames in os.walk(path) for f in filenames
]
else:
raise ValueError(path + " not supported")
return filelist
def build_graph(num_nodes, edge_path):
filelist = []
if os.path.isfile(edge_path):
filelist = [edge_path]
elif os.path.isdir(edge_path):
filelist = [
os.path.join(dp, f)
for dp, dn, filenames in os.walk(edge_path) for f in filenames
]
else:
raise ValueError(edge_path + " not supported")
edges, edge_weight = [], []
for name in filelist:
with open(name) as inf:
for line in inf:
slots = line.strip("\n").split()
edges.append([slots[0], slots[1]])
edges.append([slots[1], slots[0]])
if len(slots) > 2:
edge_weight.extend([float(slots[2]), float(slots[2])])
edges = np.array(edges, dtype="int64")
    assert num_nodes > edges.max(
    ), "Node ids in the edge list should be smaller than num_nodes!"
edge_feat = dict()
if len(edge_weight) == len(edges):
edge_feat["weight"] = np.array(edge_weight)
graph = Graph(num_nodes, edges, edge_feat=edge_feat)
log.info("Build graph done")
graph.outdegree()
del edges, edge_feat
log.info("Build graph index done")
if "weight" in graph.edge_feat:
graph.node_feat["alias"], graph.node_feat[
"events"] = graph_alias_sample_table(graph, "weight")
log.info("Build graph alias sample table done")
return graph
def build_fake_graph(num_nodes):
class FakeGraph():
pass
graph = FakeGraph()
graph.num_nodes = num_nodes
return graph
def build_gen_func(args, graph):
num_sample_workers = args.num_sample_workers
if args.walkpath_files is None or args.walkpath_files == "None":
walkpath_files = [None for _ in range(num_sample_workers)]
else:
files = get_file_list(args.walkpath_files)
walkpath_files = [[] for i in range(num_sample_workers)]
for idx, f in enumerate(files):
walkpath_files[idx % num_sample_workers].append(f)
if args.train_files is None or args.train_files == "None":
train_files = [None for _ in range(num_sample_workers)]
else:
files = get_file_list(args.train_files)
train_files = [[] for i in range(num_sample_workers)]
for idx, f in enumerate(files):
train_files[idx % num_sample_workers].append(f)
gen_func_pool = [
DeepwalkReader(
graph,
batch_size=args.batch_size,
walk_len=args.walk_len,
win_size=args.win_size,
neg_num=args.neg_num,
neg_sample_type=args.neg_sample_type,
walkpath_files=walkpath_files[i],
train_files=train_files[i]) for i in range(num_sample_workers)
]
if num_sample_workers == 1:
gen_func = gen_func_pool[0]
else:
gen_func = mp_reader.multiprocess_reader(
gen_func_pool, use_pipe=True, queue_size=100)
return gen_func
def test_gen_speed(gen_func):
cur_time = time.time()
for idx, _ in enumerate(gen_func()):
log.info("iter %s: %s s" % (idx, time.time() - cur_time))
cur_time = time.time()
if idx == 100:
break
# GraphSAGE for Large-Scale Networks
# Distributed GraphSAGE in PGL
[GraphSAGE](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) is a general inductive framework that leverages node feature
information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, GraphSAGE learns a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Based on PGL, we reproduce the GraphSAGE algorithm and reach the same level of accuracy as reported in the paper on the Reddit dataset. This example also demonstrates subgraph sampling and training in PGL.
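The example's ```model.py``` (listed further below) builds each aggregator from PGL's ```send```/```recv``` primitives. As a quick orientation, a minimal mean-aggregator layer looks roughly like the sketch below, assuming ```gw``` is a ```pgl.graph_wrapper.GraphWrapper``` over the sampled subgraph and ```feature``` is its node feature tensor; this is a simplified sketch, not the exact code used in the example.
```python
import paddle.fluid as fluid


def copy_send(src_feat, dst_feat, edge_feat):
    # forward the source node feature "h" along every edge
    return src_feat["h"]


def mean_recv(feat):
    # average the incoming messages for each destination node
    return fluid.layers.sequence_pool(feat, pool_type="average")


def graphsage_mean_layer(gw, feature, hidden_size, name):
    # gw: pgl.graph_wrapper.GraphWrapper of the sampled subgraph (assumption)
    msg = gw.send(copy_send, nfeat_list=[("h", feature)])
    neigh_feature = gw.recv(msg, mean_recv)
    self_feature = fluid.layers.fc(feature, hidden_size, act="relu", name=name + "_l")
    neigh_feature = fluid.layers.fc(neigh_feature, hidden_size, act="relu", name=name + "_r")
    return fluid.layers.concat([self_feature, neigh_feature], axis=1)
```
The full versions in ```model.py``` additionally L2-normalize the concatenated output and provide max-pool, mean-pool and LSTM variants.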
### Datasets
The reddit dataset should be downloaded from the following links and placed in the ```./data``` directory. The details for the Reddit dataset can be found [here](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf).
For scalability, we use redis as the distributed graph storage solution and train GraphSAGE against a redis server; a short connection sketch follows the quickstart block below.
- reddit.npz https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J
- reddit_adj.npz: https://drive.google.com/open?id=174vb0Ws7Vxk_QTUtxqTgDHSQ4El4qDHt
### Datasets (Quickstart)
The reddit dataset should be downloaded from [reddit_adj.npz](https://drive.google.com/open?id=174vb0Ws7Vxk_QTUtxqTgDHSQ4El4qDHt) and [reddit.npz](https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J). The details for the Reddit dataset can be found [here](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf).
Alternatively, the reddit dataset has been preprocessed and packed into a docker image, which can be pulled with the following command.
```sh
docker pull githubutilities/reddit_redis_demo:v0.1
```
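For reference, the training reader of this example (```reader.py```, listed further below) connects to that redis graph service roughly as in the sketch below; the port ```7430``` and shard count ```64``` are the defaults used there, and the service started in the "How to run" section must already be up.
```python
import socket

from pgl import redis_graph

# Connect to the redis-backed graph service (assumed to be running locally on port 7430).
redis_configs = [{
    "host": socket.gethostbyname(socket.gethostname()),
    "port": 7430,
}]
graph = redis_graph.RedisGraph("sub_graph", redis_configs, 64)

# Sample up to 25 predecessors of a few seed nodes, as done during subgraph sampling.
pred, pred_eid = graph.sample_predecessor([0, 1, 2], max_degree=25, return_eids=True)
```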
### Dependencies
```txt
- paddlepaddle>=1.6
- pgl
- scipy
- redis==2.10.6
- redis-py-cluster==1.3.6
```
### How to run
To train a GraphSAGE model on the Reddit dataset, follow the two steps below.
#### 1. Start reddit data service
```sh
docker run \
--net=host \
-d --rm \
--name reddit_demo \
-it githubutilities/reddit_redis_demo:v0.1 \
/bin/bash -c "/bin/bash ./before_hook.sh && /bin/bash"
docker logs -f `docker ps -aqf "name=reddit_demo"`
```
#### 2. Train the GraphSAGE model
```sh
python train.py --use_cuda --epoch 10 --graphsage_type graphsage_mean --sample_workers 10
```
#### Hyperparameters
The following flags are accepted by ```train.py``` (an example invocation is shown after this list).
- epoch: Number of training epochs (default: 10).
- use_cuda: Use GPU if this flag is set.
- graphsage_type: We support 4 aggregator types including "graphsage_mean", "graphsage_maxpool", "graphsage_meanpool" and "graphsage_lstm".
- normalize: Normalize the input features if this flag is set.
- sample_workers: The number of workers for multiprocess subgraph sampling.
- lr: Learning rate.
- symmetry: Make the edges symmetric (i.e., treat the graph as undirected) if this flag is set.
- batch_size: Batch size.
- samples_1: The max neighbors for the first hop neighbor sampling. (default: 25)
- samples_2: The max neighbors for the second hop neighbor sampling. (default: 10)
- hidden_size: The hidden size of the GraphSAGE models.
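For instance, the invocation below trains a two-hop max-pool model; all flags are defined in ```train.py``` (listed further below), and the values shown are simply the documented defaults with an explicit aggregator choice.
```sh
python train.py --use_cuda \
    --graphsage_type graphsage_maxpool \
    --epoch 10 \
    --sample_workers 10 \
    --lr 0.01 \
    --batch_size 128 \
    --samples_1 25 --samples_2 10 \
    --hidden_size 128
```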
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Aggregator | Accuracy | Reported in paper |
| --- | --- | --- |
| Mean | 95.70% | 95.0% |
| Meanpool | 95.60% | 94.8% |
| Maxpool | 94.95% | 94.8% |
| LSTM | 95.13% | 95.4% |
### View the Code
See the code [here](graphsage_examples_code.html).
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.fluid as fluid
def copy_send(src_feat, dst_feat, edge_feat):
return src_feat["h"]
def mean_recv(feat):
return fluid.layers.sequence_pool(feat, pool_type="average")
def sum_recv(feat):
return fluid.layers.sequence_pool(feat, pool_type="sum")
def max_recv(feat):
return fluid.layers.sequence_pool(feat, pool_type="max")
def lstm_recv(feat):
hidden_dim = 128
forward, _ = fluid.layers.dynamic_lstm(
input=feat, size=hidden_dim * 4, use_peepholes=False)
output = fluid.layers.sequence_last_step(forward)
return output
def graphsage_mean(gw, feature, hidden_size, act, name):
msg = gw.send(copy_send, nfeat_list=[("h", feature)])
neigh_feature = gw.recv(msg, mean_recv)
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
def graphsage_meanpool(gw,
feature,
hidden_size,
act,
name,
inner_hidden_size=512):
neigh_feature = fluid.layers.fc(feature, inner_hidden_size, act="relu")
msg = gw.send(copy_send, nfeat_list=[("h", neigh_feature)])
neigh_feature = gw.recv(msg, mean_recv)
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
def graphsage_maxpool(gw,
feature,
hidden_size,
act,
name,
inner_hidden_size=512):
neigh_feature = fluid.layers.fc(feature, inner_hidden_size, act="relu")
msg = gw.send(copy_send, nfeat_list=[("h", neigh_feature)])
neigh_feature = gw.recv(msg, max_recv)
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
def graphsage_lstm(gw, feature, hidden_size, act, name):
inner_hidden_size = 128
neigh_feature = fluid.layers.fc(feature, inner_hidden_size, act="relu")
hidden_dim = 128
forward_proj = fluid.layers.fc(input=neigh_feature,
size=hidden_dim * 4,
bias_attr=False,
name="lstm_proj")
msg = gw.send(copy_send, nfeat_list=[("h", forward_proj)])
neigh_feature = gw.recv(msg, lstm_recv)
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import pickle as pkl
import paddle
import paddle.fluid as fluid
import socket
import pgl
import time
from pgl.utils import mp_reader
from pgl.utils.logger import log
from pgl import redis_graph
def node_batch_iter(nodes, node_label, batch_size):
"""node_batch_iter
"""
perm = np.arange(len(nodes))
np.random.shuffle(perm)
start = 0
while start < len(nodes):
index = perm[start:start + batch_size]
start += batch_size
yield nodes[index], node_label[index]
def traverse(item):
"""traverse
"""
if isinstance(item, list) or isinstance(item, np.ndarray):
for i in iter(item):
for j in traverse(i):
yield j
else:
yield item
def flat_node_and_edge(nodes, eids):
"""flat_node_and_edge
"""
nodes = list(set(traverse(nodes)))
eids = list(set(traverse(eids)))
return nodes, eids
def worker(batch_info, graph_wrapper, samples):
"""Worker
"""
def work():
"""work
"""
redis_configs = [{
"host": socket.gethostbyname(socket.gethostname()),
"port": 7430
}, ]
graph = redis_graph.RedisGraph("sub_graph", redis_configs, 64)
first = True
for batch_train_samples, batch_train_labels in batch_info:
start_nodes = batch_train_samples
nodes = start_nodes
eids = []
eid2edges = {}
for max_deg in samples:
pred, pred_eid = graph.sample_predecessor(
start_nodes, max_degree=max_deg, return_eids=True)
for _dst, _srcs, _eids in zip(start_nodes, pred, pred_eid):
for _src, _eid in zip(_srcs, _eids):
eid2edges[_eid] = (_src, _dst)
last_nodes = nodes
nodes = [nodes, pred]
eids = [eids, pred_eid]
nodes, eids = flat_node_and_edge(nodes, eids)
# Find new nodes
start_nodes = list(set(nodes) - set(last_nodes))
if len(start_nodes) == 0:
break
subgraph = graph.subgraph(nodes=nodes, eid=eids, edges=[ eid2edges[e] for e in eids])
sub_node_index = subgraph.reindex_from_parrent_nodes(
batch_train_samples)
feed_dict = graph_wrapper.to_feed(subgraph)
feed_dict["node_label"] = np.expand_dims(
np.array(
batch_train_labels, dtype="int64"), -1)
feed_dict["node_index"] = sub_node_index
yield feed_dict
return work
def multiprocess_graph_reader(
graph_wrapper,
samples,
node_index,
batch_size,
node_label,
num_workers=4):
"""multiprocess_graph_reader
"""
def parse_to_subgraph(rd):
"""parse_to_subgraph
"""
def work():
"""work
"""
            for feed_dict in rd():
                yield feed_dict
return work
def reader():
"""reader"""
batch_info = list(
node_batch_iter(
node_index, node_label, batch_size=batch_size))
block_size = int(len(batch_info) / num_workers + 1)
reader_pool = []
for i in range(num_workers):
reader_pool.append(
worker(batch_info[block_size * i:block_size * (i + 1)],
graph_wrapper, samples))
multi_process_sample = mp_reader.multiprocess_reader(
reader_pool, use_pipe=True, queue_size=1000)
r = parse_to_subgraph(multi_process_sample)
return paddle.reader.buffered(r, 1000)
return reader()
scipy
redis==2.10.6
redis-py-cluster==1.3.6
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import time
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import StandardScaler
import pgl
from pgl.utils.logger import log
from pgl.utils import paddle_helper
import paddle
import paddle.fluid as fluid
import reader
from model import graphsage_mean, graphsage_meanpool,\
graphsage_maxpool, graphsage_lstm
def load_data():
"""
data from https://github.com/matenure/FastGCN/issues/8
reddit.npz: https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J
reddit_index_label is preprocess from reddit.npz without feats key.
"""
data_dir = os.path.dirname(os.path.abspath(__file__))
data = np.load(os.path.join(data_dir, "data/reddit_index_label.npz"))
num_class = 41
train_label = data['y_train']
val_label = data['y_val']
test_label = data['y_test']
train_index = data['train_index']
val_index = data['val_index']
test_index = data['test_index']
return {
"train_index": train_index,
"train_label": train_label,
"val_label": val_label,
"val_index": val_index,
"test_index": test_index,
"test_label": test_label,
"num_class": 41
}
def build_graph_model(graph_wrapper, num_class, k_hop, graphsage_type,
hidden_size):
node_index = fluid.layers.data(
"node_index", shape=[None], dtype="int64", append_batch_size=False)
node_label = fluid.layers.data(
"node_label", shape=[None, 1], dtype="int64", append_batch_size=False)
#feature = fluid.layers.gather(feature, graph_wrapper.node_feat['feats'])
feature = graph_wrapper.node_feat['feats']
feature.stop_gradient = True
for i in range(k_hop):
if graphsage_type == 'graphsage_mean':
feature = graphsage_mean(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_mean_%s" % i)
elif graphsage_type == 'graphsage_meanpool':
feature = graphsage_meanpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_meanpool_%s" % i)
elif graphsage_type == 'graphsage_maxpool':
feature = graphsage_maxpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s" % i)
elif graphsage_type == 'graphsage_lstm':
feature = graphsage_lstm(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s" % i)
else:
raise ValueError("graphsage type %s is not"
" implemented" % graphsage_type)
feature = fluid.layers.gather(feature, node_index)
logits = fluid.layers.fc(feature,
num_class,
act=None,
name='classification_layer')
proba = fluid.layers.softmax(logits)
loss = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=node_label)
loss = fluid.layers.mean(loss)
acc = fluid.layers.accuracy(input=proba, label=node_label, k=1)
return loss, acc
def run_epoch(batch_iter,
exe,
program,
prefix,
model_loss,
model_acc,
epoch,
log_per_step=100):
batch = 0
total_loss = 0.
total_acc = 0.
total_sample = 0
start = time.time()
for batch_feed_dict in batch_iter():
batch += 1
batch_loss, batch_acc = exe.run(program,
fetch_list=[model_loss, model_acc],
feed=batch_feed_dict)
if batch % log_per_step == 0:
log.info("Batch %s %s-Loss %s %s-Acc %s" %
(batch, prefix, batch_loss, prefix, batch_acc))
num_samples = len(batch_feed_dict["node_index"])
total_loss += batch_loss * num_samples
total_acc += batch_acc * num_samples
total_sample += num_samples
end = time.time()
log.info("%s Epoch %s Loss %.5lf Acc %.5lf Speed(per batch) %.5lf sec" %
(prefix, epoch, total_loss / total_sample,
total_acc / total_sample, (end - start) / batch))
def main(args):
data = load_data()
log.info("preprocess finish")
log.info("Train Examples: %s" % len(data["train_index"]))
log.info("Val Examples: %s" % len(data["val_index"]))
log.info("Test Examples: %s" % len(data["test_index"]))
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
samples = []
if args.samples_1 > 0:
samples.append(args.samples_1)
if args.samples_2 > 0:
samples.append(args.samples_2)
with fluid.program_guard(train_program, startup_program):
graph_wrapper = pgl.graph_wrapper.GraphWrapper(
"sub_graph", fluid.CPUPlace(), node_feat=[('feats', [None, 602], np.dtype('float32'))])
model_loss, model_acc = build_graph_model(
graph_wrapper,
num_class=data["num_class"],
hidden_size=args.hidden_size,
graphsage_type=args.graphsage_type,
k_hop=len(samples))
test_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
adam = fluid.optimizer.Adam(learning_rate=args.lr)
adam.minimize(model_loss)
exe = fluid.Executor(place)
exe.run(startup_program)
train_iter = reader.multiprocess_graph_reader(
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['train_index'],
node_label=data["train_label"])
val_iter = reader.multiprocess_graph_reader(
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['val_index'],
node_label=data["val_label"])
test_iter = reader.multiprocess_graph_reader(
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['test_index'],
node_label=data["test_label"])
for epoch in range(args.epoch):
run_epoch(
train_iter,
program=train_program,
exe=exe,
prefix="train",
model_loss=model_loss,
model_acc=model_acc,
log_per_step=1,
epoch=epoch)
run_epoch(
val_iter,
program=test_program,
exe=exe,
prefix="val",
model_loss=model_loss,
model_acc=model_acc,
log_per_step=10000,
epoch=epoch)
run_epoch(
test_iter,
program=test_program,
prefix="test",
exe=exe,
model_loss=model_loss,
model_acc=model_acc,
log_per_step=10000,
epoch=epoch)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='graphsage')
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument(
"--normalize", action='store_true', help="normalize features")
parser.add_argument(
"--symmetry", action='store_true', help="undirect graph")
parser.add_argument("--graphsage_type", type=str, default="graphsage_mean")
parser.add_argument("--sample_workers", type=int, default=10)
parser.add_argument("--epoch", type=int, default=10)
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--samples_1", type=int, default=25)
parser.add_argument("--samples_2", type=int, default=10)
args = parser.parse_args()
log.info(args)
main(args)
# Distributed metapath2vec, metapath2vec++, multi-metapath2vec++ in PGL
[metapath2vec](https://ericdongyx.github.io/papers/KDD17-dong-chawla-swami-metapath2vec.pdf) is an algorithmic framework for representation learning in heterogeneous networks, which contain multiple types of nodes and links. Given a heterogeneous graph, metapath2vec first generates meta-path-based random walks and then trains a skip-gram model on them. Based on PGL, we reproduce the metapath2vec algorithm in distributed mode.
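As a toy illustration (not the implementation used in this example, which lives in ```walker.py``` and ```pgl.sample```), a walk along a scheme such as ```c2p-p2a-a2p-p2c``` cycles through the node types of the scheme and only moves to neighbors of the required type; ```successors(node, node_type)``` below is a hypothetical helper.
```python
import random


def metapath_walk(successors, start_node, meta_path, walk_len):
    """Toy sketch of a meta-path-guided random walk.

    `successors(node, node_type)` is a hypothetical helper returning the
    neighbors of `node` whose type is `node_type`. The scheme
    "c2p-p2a-a2p-p2c" corresponds to meta_path = ["c", "p", "a", "p", "c"].
    """
    walk = [start_node]
    for i in range(walk_len - 1):
        # the scheme starts and ends with the same type, so it repeats with period len - 1
        next_type = meta_path[i % (len(meta_path) - 1) + 1]
        candidates = successors(walk[-1], next_type)
        if not candidates:
            break
        walk.append(random.choice(candidates))
    return walk
```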
### Datasets
DBLP: The dataset contains 14376 papers (P), 20 conferences (C), 14475 authors (A), and 8920 terms (T). There are 37791 nodes in this dataset.
You can download the dataset from [here](https://github.com/librahu/HIN-Datasets-for-Recommendation-and-Network-Embedding)
We use the ```DBLP``` dataset as an example. After downloading the dataset, place the files in, say, ```./data/DBLP/```.
### Dependencies
- paddlepaddle>=1.6
- pgl>=1.0.0
### How to run
Before training, run the command below to preprocess the data.
```sh
python data_process.py --data_path ./data/DBLP --output_path ./data/data_processed
```
We adopt [PaddlePaddle Fleet](https://github.com/PaddlePaddle/Fleet) as our distributed training framework. ```config.yaml``` is the configuration file for the metapath2vec hyperparameters, and ```local_config``` configures the PaddlePaddle parameter servers. By default, we use 2 parameter servers and 2 trainers. ```cloud_run.sh``` helps start up the parameter servers and trainers.
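In this example ```local_config``` boils down to a handful of environment variables describing that default topology (the sketch below mirrors the export block shipped with this example); for a real cluster, replace the IPs and ports accordingly.
```sh
# local_config (sketch): 2 trainers and 2 parameter servers, all on localhost
export PADDLE_TRAINERS_NUM=2
export PADDLE_PSERVERS_NUM=2
export PADDLE_PORT=6184,6185
export PADDLE_PSERVERS="127.0.0.1"
```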
For example, to train metapath2vec in distributed mode on the DBLP dataset:
```sh
# train metapath2vec in distributed mode.
sh cloud_run.sh
# multiclass task example
python multi_class.py --dataset ./data/data_processed/author_label.txt --ckpt_path ./checkpoints/2000 --num_nodes 33791
```
### Model Selection
There are three models in this example: ```metapath2vec```, ```metapath2vec++``` and ```multi_metapath2vec++```. You can select among them by modifying ```config.yaml```.
To run the ```metapath2vec++``` model, simply set the hyperparameter **neg_sample_type** to **m2v_plus**.
```multi-metapath2vec++``` means that instead of a single metapath, several metapaths are used simultaneously to train the model. For example, you might want to use ```c2p-p2a-a2p-p2c``` and ```p2a-a2p``` at the same time. In that case, set the hyperparameters below in ```config.yaml``` (see the excerpt after this list).
- **neg_sample_type**: "m2v_plus"
- **walk_mode**: "multi_m2v"
- **meta_path**: "c2p-p2a-a2p-p2c;p2a-a2p"
- **first_node_type**: "c;p"
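Put together, the relevant part of ```config.yaml``` would read:
```yaml
# multi-metapath2vec++ settings (excerpt of config.yaml)
neg_sample_type: "m2v_plus"
walk_mode: "multi_m2v"
meta_path: "c2p-p2a-a2p-p2c;p2a-a2p"
first_node_type: "c;p"
```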
### Hyperparameters
All hyperparameters are stored in the ```config.yaml``` file, so before training you can edit ```config.yaml``` to adjust them as you like.
Some important hyperparameters in ```config.yaml```:
- **edge_path**: the directory of graph data that you want to load
- **lr**: learning rate
- **neg_num**: number of negative samples.
- **num_walks**: number of walks started from each node
- **walk_len**: walk length
- **meta_path**: meta path scheme
#!/bin/bash
set -x
mode=${1}
source ./utils.sh
unset http_proxy https_proxy
source ./local_config
if [ ! -d ${log_dir} ]; then
mkdir ${log_dir}
fi
for((i=0;i<${PADDLE_PSERVERS_NUM};i++))
do
echo "start ps server: ${i}"
echo $log_dir
TRAINING_ROLE="PSERVER" PADDLE_TRAINER_ID=${i} sh job.sh &> $log_dir/pserver.$i.log &
done
sleep 10s
for((j=0;j<${PADDLE_TRAINERS_NUM};j++))
do
echo "start ps work: ${j}"
TRAINING_ROLE="TRAINER" PADDLE_TRAINER_ID=${j} sh job.sh &> $log_dir/worker.$j.log &
done
tail -f $log_dir/worker.0.log
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import os
import math
import numpy as np
import paddle.fluid as F
import paddle.fluid.layers as L
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.transpiler.distribute_transpiler import DistributeTranspilerConfig
import paddle.fluid.incubate.fleet.base.role_maker as role_maker
from pgl.utils.logger import log
from model import Metapath2vecModel
from graph import m2vGraph
from utils import load_config
from walker import multiprocess_data_generator
def init_role():
# reset the place according to role of parameter server
training_role = os.getenv("TRAINING_ROLE", "TRAINER")
paddle_role = role_maker.Role.WORKER
place = F.CPUPlace()
if training_role == "PSERVER":
paddle_role = role_maker.Role.SERVER
# set the fleet runtime environment according to configure
ports = os.getenv("PADDLE_PORT", "6174").split(",")
pserver_ips = os.getenv("PADDLE_PSERVERS").split(",") # ip,ip...
eplist = []
if len(ports) > 1:
# local debug mode, multi port
for port in ports:
eplist.append(':'.join([pserver_ips[0], port]))
else:
# distributed mode, multi ip
for ip in pserver_ips:
eplist.append(':'.join([ip, ports[0]]))
pserver_endpoints = eplist # ip:port,ip:port...
worker_num = int(os.getenv("PADDLE_TRAINERS_NUM", "0"))
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
role = role_maker.UserDefinedRoleMaker(
current_id=trainer_id,
role=paddle_role,
worker_num=worker_num,
server_endpoints=pserver_endpoints)
fleet.init(role)
def optimization(base_lr, loss, train_steps, optimizer='sgd'):
decayed_lr = L.learning_rate_scheduler.polynomial_decay(
learning_rate=base_lr,
decay_steps=train_steps,
end_learning_rate=0.0001 * base_lr,
power=1.0,
cycle=False)
if optimizer == 'sgd':
optimizer = F.optimizer.SGD(decayed_lr)
elif optimizer == 'adam':
optimizer = F.optimizer.Adam(decayed_lr, lazy_mode=True)
else:
raise ValueError
log.info('learning rate:%f' % (base_lr))
#create the DistributeTranspiler configure
config = DistributeTranspilerConfig()
config.sync_mode = False
#config.runtime_split_send_recv = False
config.slice_var_up = False
#create the distributed optimizer
optimizer = fleet.distributed_optimizer(optimizer, config)
optimizer.minimize(loss)
def build_complied_prog(train_program, model_loss):
num_threads = int(os.getenv("CPU_NUM", 10))
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
exec_strategy = F.ExecutionStrategy()
exec_strategy.num_threads = num_threads
#exec_strategy.use_experimental_executor = True
build_strategy = F.BuildStrategy()
build_strategy.enable_inplace = True
#build_strategy.memory_optimize = True
build_strategy.memory_optimize = False
build_strategy.remove_unnecessary_lock = False
if num_threads > 1:
build_strategy.reduce_strategy = F.BuildStrategy.ReduceStrategy.Reduce
compiled_prog = F.compiler.CompiledProgram(
train_program).with_data_parallel(loss_name=model_loss.name)
return compiled_prog
def train_prog(exe, program, loss, node2vec_pyreader, args, train_steps):
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
step = 0
if not os.path.exists(args.save_path):
os.makedirs(args.save_path)
while True:
try:
begin_time = time.time()
loss_val, = exe.run(program, fetch_list=[loss])
log.info("step %s: loss %.5f speed: %.5f s/step" %
(step, np.mean(loss_val), time.time() - begin_time))
step += 1
except F.core.EOFException:
node2vec_pyreader.reset()
if step % args.steps_per_save == 0 or step == train_steps:
save_path = args.save_path
if trainer_id == 0:
model_path = os.path.join(save_path, "%s" % step)
fleet.save_persistables(exe, model_path)
if step == train_steps:
break
def main(args):
log.info("start")
worker_num = int(os.getenv("PADDLE_TRAINERS_NUM", "0"))
num_devices = int(os.getenv("CPU_NUM", 10))
model = Metapath2vecModel(config=args)
pyreader = model.pyreader
loss = model.forward()
# init fleet
init_role()
train_steps = math.ceil(args.num_nodes * args.epochs / args.batch_size /
num_devices / worker_num)
log.info("Train step: %s" % train_steps)
real_batch_size = args.batch_size * args.walk_len * args.win_size
if args.optimizer == "sgd":
args.lr *= real_batch_size
optimization(args.lr, loss, train_steps, args.optimizer)
# init and run server or worker
if fleet.is_server():
fleet.init_server(args.warm_start_from_dir)
fleet.run_server()
if fleet.is_worker():
log.info("start init worker done")
fleet.init_worker()
#just the worker, load the sample
log.info("init worker done")
exe = F.Executor(F.CPUPlace())
exe.run(fleet.startup_program)
log.info("Startup done")
dataset = m2vGraph(args)
log.info("Build graph done.")
data_generator = multiprocess_data_generator(args, dataset)
cur_time = time.time()
for idx, _ in enumerate(data_generator()):
log.info("iter %s: %s s" % (idx, time.time() - cur_time))
cur_time = time.time()
if idx == 100:
break
pyreader.decorate_tensor_provider(data_generator)
pyreader.start()
compiled_prog = build_complied_prog(fleet.main_program, loss)
train_prog(exe, compiled_prog, loss, pyreader, args, train_steps)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='metapath2vec')
parser.add_argument("-c", "--config", type=str, default="./config.yaml")
args = parser.parse_args()
config = load_config(args.config)
log.info(config)
main(config)
# graph data config
edge_path: "./data/data_processed"
edge_files: "p2a:paper_author.txt,p2c:paper_conference.txt,p2t:paper_type.txt"
node_types_file: "node_types.txt"
num_nodes: 37791
symmetry: True
# skipgram pair data config
win_size: 5
neg_num: 5
# average; m2v_plus
neg_sample_type: "average"
# random walk config
# m2v; multi_m2v;
walk_mode: "m2v"
meta_path: "c2p-p2a-a2p-p2c"
first_node_type: "c"
walk_len: 24
batch_size: 4
node_shuffle: True
node_files: null
num_sample_workers: 2
# model config
embed_dim: 64
is_sparse: True
# only used when num_nodes > 100,000,000; slower than normal embedding
is_distributed: False
# training config
epochs: 10
optimizer: "sgd"
lr: 0.1
warm_start_from_dir: null
walkpath_files: "None"
train_files: "None"
steps_per_save: 1000
save_path: "./checkpoints"
log_dir: "./logs"
CPU_NUM: 16
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Data preprocessing for DBLP dataset"""
import sys
import os
import argparse
import numpy as np
from collections import OrderedDict
AUTHOR = 14475
PAPER = 14376
CONF = 20
TYPE = 8920
LABEL = 4
def build_node_types(meta_node, outfile):
"""build_node_types"""
nt_ori2new = {}
with open(outfile, 'w') as writer:
offset = 0
for node_type, num_nodes in meta_node.items():
ori_id2new_id = {}
for i in range(num_nodes):
writer.write("%d\t%s\n" % (offset + i, node_type))
ori_id2new_id[i + 1] = offset + i
nt_ori2new[node_type] = ori_id2new_id
offset += num_nodes
return nt_ori2new
def remapping_index(args, src_dict, dst_dict, ori_file, new_file):
"""remapping_index"""
ori_file = os.path.join(args.data_path, ori_file)
new_file = os.path.join(args.output_path, new_file)
with open(ori_file, 'r') as reader, open(new_file, 'w') as writer:
for line in reader:
slots = line.strip().split()
s = int(slots[0])
d = int(slots[1])
new_s = src_dict[s]
new_d = dst_dict[d]
writer.write("%d\t%d\n" % (new_s, new_d))
def author_label(args, ori_id2pgl_id, ori_file, real_file, new_file):
"""author_label"""
ori_file = os.path.join(args.data_path, ori_file)
real_file = os.path.join(args.data_path, real_file)
new_file = os.path.join(args.output_path, new_file)
real_id2pgl_id = {}
with open(ori_file, 'r') as reader:
for line in reader:
slots = line.strip().split()
ori_id = int(slots[0])
real_id = int(slots[1])
pgl_id = ori_id2pgl_id[ori_id]
real_id2pgl_id[real_id] = pgl_id
with open(real_file, 'r') as reader, open(new_file, 'w') as writer:
for line in reader:
slots = line.strip().split()
real_id = int(slots[0])
label = int(slots[1])
pgl_id = real_id2pgl_id[real_id]
writer.write("%d\t%d\n" % (pgl_id, label))
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='DBLP data preprocessing')
parser.add_argument(
'--data_path',
default=None,
type=str,
help='original data path(default: None)')
parser.add_argument(
'--output_path',
default=None,
type=str,
help='output path(default: None)')
args = parser.parse_args()
meta_node = OrderedDict()
meta_node['a'] = AUTHOR
meta_node['p'] = PAPER
meta_node['c'] = CONF
meta_node['t'] = TYPE
if not os.path.exists(args.output_path):
os.makedirs(args.output_path)
node_types_file = os.path.join(args.output_path, "node_types.txt")
nt_ori2new = build_node_types(meta_node, node_types_file)
remapping_index(args, nt_ori2new['p'], nt_ori2new['a'], 'paper_author.dat',
'paper_author.txt')
remapping_index(args, nt_ori2new['p'], nt_ori2new['c'],
'paper_conference.dat', 'paper_conference.txt')
remapping_index(args, nt_ori2new['p'], nt_ori2new['t'], 'paper_type.dat',
'paper_type.txt')
author_label(args, nt_ori2new['a'], 'author_map_id.dat',
'author_label.dat', 'author_label.txt')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import sys
import os
import numpy as np
import pickle as pkl
import tqdm
import time
import random
from pgl.utils.logger import log
from pgl import heter_graph
class m2vGraph(object):
"""Implemetation of graph in order to sample metapath random walk.
"""
def __init__(self, config):
self.edge_path = config.edge_path
self.num_nodes = config.num_nodes
self.symmetry = config.symmetry
edge_files = config.edge_files
node_types_file = config.node_types_file
self.edge_file_list = []
for pair in edge_files.split(','):
e_type, filename = pair.split(':')
filename = os.path.join(self.edge_path, filename)
self.edge_file_list.append((e_type, filename))
self.node_types_file = os.path.join(self.edge_path, node_types_file)
self.build_graph()
def build_graph(self):
"""Build pgl heterogeneous graph.
"""
edges_by_types = {}
npy = self.edge_file_list[0][1] + ".npy"
if os.path.exists(npy):
log.info("load data from numpy file")
for pair in self.edge_file_list:
edges_by_types[pair[0]] = np.load(pair[1] + ".npy")
else:
log.info("load data from txt file")
for pair in self.edge_file_list:
edges_by_types[pair[0]] = self.load_edges(pair[1])
# np.save(pair[1] + ".npy", edges_by_types[pair[0]])
for e_type, edges in edges_by_types.items():
log.info(["number of %s edges: " % e_type, len(edges)])
if self.symmetry:
tmp = {}
for key, edges in edges_by_types.items():
n_list = key.split('2')
re_key = n_list[1] + '2' + n_list[0]
tmp[re_key] = edges_by_types[key][:, [1, 0]]
edges_by_types.update(tmp)
log.info(["finished loadding symmetry edges."])
node_types = self.load_node_types(self.node_types_file)
assert len(node_types) == self.num_nodes, \
"num_nodes should be equal to the length of node_types"
log.info(["number of nodes: ", len(node_types)])
node_features = {
'index': np.array([i for i in range(self.num_nodes)]).reshape(
-1, 1).astype(np.int64)
}
self.graph = heter_graph.HeterGraph(
num_nodes=self.num_nodes,
edges=edges_by_types,
node_types=node_types,
node_feat=node_features)
def load_edges(self, file_, symmetry=False):
"""Load edges from file.
"""
edges = []
with open(file_, 'r') as reader:
for line in reader:
items = line.strip().split()
src, dst = int(items[0]), int(items[1])
edges.append((src, dst))
if symmetry:
edges.append((dst, src))
edges = np.array(list(set(edges)), dtype=np.int64)
# edges = list(set(edges))
return edges
def load_node_types(self, file_):
"""Load node types
"""
node_types = []
log.info("node_types_file name: %s" % file_)
with open(file_, 'r') as reader:
for line in reader:
items = line.strip().split()
node_id = int(items[0])
n_type = items[1]
node_types.append((node_id, n_type))
return node_types
#!/bin/bash
set -x
source ./utils.sh
export CPU_NUM=$CPU_NUM
export FLAGS_rpc_deadline=3000000
export FLAGS_communicator_send_queue_size=1
export FLAGS_communicator_min_send_grad_num_before_recv=0
export FLAGS_communicator_max_merge_var_num=1
export FLAGS_communicator_merge_sparse_grad=0
python -u cluster_train.py -c config.yaml
#!/bin/bash
export PADDLE_TRAINERS_NUM=2
export PADDLE_PSERVERS_NUM=2
export PADDLE_PORT=6184,6185
export PADDLE_PSERVERS="127.0.0.1"
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
metapath2vec model.
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import math
import paddle.fluid.layers as L
import paddle.fluid as F
def distributed_embedding(input,
dict_size,
hidden_size,
initializer,
name,
num_part=16,
is_sparse=False,
learning_rate=1.0):
_part_size = hidden_size // num_part
if hidden_size % num_part != 0:
_part_size += 1
output_embedding = []
p_num = 0
while hidden_size > 0:
_part_size = min(_part_size, hidden_size)
hidden_size -= _part_size
print("part", p_num, "size=", (dict_size, _part_size))
part_embedding = L.embedding(
input=input,
size=(dict_size, int(_part_size)),
is_sparse=is_sparse,
is_distributed=False,
param_attr=F.ParamAttr(
name=name + '_part%s' % p_num,
initializer=initializer,
learning_rate=learning_rate))
p_num += 1
output_embedding.append(part_embedding)
return L.concat(output_embedding, -1)
class Metapath2vecModel(object):
def __init__(self, config, embedding_lr=1.0):
self.config = config
self.neg_num = self.config.neg_num
self.num_nodes = self.config.num_nodes
self.embed_dim = self.config.embed_dim
self.is_sparse = self.config.is_sparse
self.is_distributed = self.config.is_distributed
self.embedding_lr = embedding_lr
self.pyreader = L.py_reader(
capacity=70,
shapes=[[-1, 1, 1], [-1, self.neg_num + 1, 1]],
dtypes=['int64', 'int64'],
lod_levels=[0, 0],
name='train',
use_double_buffer=True)
bound = 1. / math.sqrt(self.embed_dim)
self.embed_init = F.initializer.Uniform(low=-bound, high=bound)
self.loss = None
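# Cap the width of each embedding slice so a single float32 parameter tensor stays below 2GB (2^31 bytes).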
max_hidden_size = int(math.pow(2, 31) / 4 / self.num_nodes)
self.num_part = int(math.ceil(1. * self.embed_dim / max_hidden_size))
def forward(self):
src, dsts = L.read_file(self.pyreader)
if self.is_sparse:
src = L.reshape(src, [-1, 1])
dsts = L.reshape(dsts, [-1, 1])
if self.num_part is not None and self.num_part != 1 and not self.is_distributed:
src_embed = distributed_embedding(
src,
self.num_nodes,
self.embed_dim,
self.embed_init,
"weight",
self.num_part,
self.is_sparse,
learning_rate=self.embedding_lr)
dsts_embed = distributed_embedding(
dsts,
self.num_nodes,
self.embed_dim,
self.embed_init,
"weight",
self.num_part,
self.is_sparse,
learning_rate=self.embedding_lr)
else:
src_embed = L.embedding(
src, (self.num_nodes, self.embed_dim),
self.is_sparse,
self.is_distributed,
param_attr=F.ParamAttr(
name="weight",
learning_rate=self.embedding_lr,
initializer=self.embed_init))
dsts_embed = L.embedding(
dsts, (self.num_nodes, self.embed_dim),
self.is_sparse,
self.is_distributed,
param_attr=F.ParamAttr(
name="weight",
learning_rate=self.embedding_lr,
initializer=self.embed_init))
if self.is_sparse:
src_embed = L.reshape(src_embed, [-1, 1, self.embed_dim])
dsts_embed = L.reshape(dsts_embed,
[-1, self.neg_num + 1, self.embed_dim])
logits = L.matmul(
src_embed, dsts_embed,
transpose_y=True) # [batch_size, 1, neg_num+1]
pos_label = L.fill_constant_batch_size_like(logits, [-1, 1, 1],
"float32", 1)
neg_label = L.fill_constant_batch_size_like(
logits, [-1, 1, self.neg_num], "float32", 0)
label = L.concat([pos_label, neg_label], -1)
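# The single positive sample gets weight neg_num so that positives and negatives contribute equally to the loss.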
pos_weight = L.fill_constant_batch_size_like(logits, [-1, 1, 1],
"float32", self.neg_num)
neg_weight = L.fill_constant_batch_size_like(
logits, [-1, 1, self.neg_num], "float32", 1)
weight = L.concat([pos_weight, neg_weight], -1)
weight.stop_gradient = True
label.stop_gradient = True
loss = L.sigmoid_cross_entropy_with_logits(logits, label)
loss = loss * weight
loss = L.reduce_mean(loss)
loss = loss * ((self.neg_num + 1) / 2 / self.neg_num)
loss.persistable = True
self.loss = loss
return loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Optimized Multiprocessing Reader for PaddlePaddle
"""
import multiprocessing
import numpy as np
import time
import paddle.fluid as fluid
import pyarrow
def _serialize_serializable(obj):
"""Serialize Feed Dict
"""
return {"type": type(obj), "data": obj.__dict__}
def _deserialize_serializable(obj):
"""Deserialize Feed Dict
"""
val = obj["type"].__new__(obj["type"])
val.__dict__.update(obj["data"])
return val
context = pyarrow.default_serialization_context()
context.register_type(
object,
"object",
custom_serializer=_serialize_serializable,
custom_deserializer=_deserialize_serializable)
def serialize_data(data):
"""serialize_data"""
return pyarrow.serialize(data, context=context).to_buffer().to_pybytes()
def deserialize_data(data):
"""deserialize_data"""
return pyarrow.deserialize(data, context=context)
def multiprocess_reader(readers, use_pipe=True, queue_size=1000):
"""
multiprocess_reader uses multiple Python processes to read data from the
given readers and merges the samples through a multiprocessing.Queue or
multiprocessing.Pipe. One process is started per reader, so the readers
should be independent of each other.
Note that multiprocessing.Queue requires read/write access to /dev/shm,
which some platforms do not provide.
An example:
.. code-block:: python
    reader0 = reader(["file01", "file02"])
    reader1 = reader(["file11", "file12"])
    reader2 = reader(["file21", "file22"])
    reader = multiprocess_reader([reader0, reader1, reader2],
        queue_size=100, use_pipe=False)
"""
assert type(readers) is list and len(readers) > 0
def _read_into_queue(reader, queue):
"""read_into_queue"""
for sample in reader():
if sample is None:
raise ValueError("sample has None")
queue.put(serialize_data(sample))
queue.put(serialize_data(None))
def queue_reader():
"""queue_reader"""
queue = multiprocessing.Queue(queue_size)
for reader in readers:
p = multiprocessing.Process(
target=_read_into_queue, args=(reader, queue))
p.start()
reader_num = len(readers)
finish_num = 0
while finish_num < reader_num:
sample = deserialize_data(queue.get())
if sample is None:
finish_num += 1
else:
yield sample
def _read_into_pipe(reader, conn):
"""read_into_pipe"""
for sample in reader():
if sample is None:
raise ValueError("sample has None!")
conn.send(serialize_data(sample))
conn.send(serialize_data(None))
conn.close()
def pipe_reader():
"""pipe_reader"""
conns = []
for reader in readers:
parent_conn, child_conn = multiprocessing.Pipe()
conns.append(parent_conn)
p = multiprocessing.Process(
target=_read_into_pipe, args=(reader, child_conn))
p.start()
reader_num = len(readers)
finish_num = 0
conn_to_remove = []
finish_flag = np.zeros(len(conns), dtype="int32")
while finish_num < reader_num:
for conn_id, conn in enumerate(conns):
if finish_flag[conn_id] > 0:
continue
buff = conn.recv()
now = time.time()
sample = deserialize_data(buff)
out = time.time() - now
if sample is None:
finish_num += 1
conn.close()
finish_flag[conn_id] = 1
else:
yield sample
if use_pipe:
return pipe_reader
else:
return queue_reader
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file provides the multi class task for testing the embedding learned by metapath2vec model.
"""
import argparse
import sys
import os
import tqdm
import time
import math
import logging
import random
import pickle as pkl
import numpy as np
import sklearn.metrics
from sklearn.metrics import f1_score
import pgl
import paddle.fluid as fluid
import paddle.fluid.layers as fl
def load_data(file_):
"""Load data for node classification.
"""
words_label = []
line_count = 0
with open(file_, 'r') as reader:
for line in reader:
line_count += 1
tokens = line.strip().split()
word, label = int(tokens[0]), int(tokens[1]) - 1
words_label.append((word, label))
words_label = np.array(words_label, dtype=np.int64)
np.random.shuffle(words_label)
logging.info('%d/%d word_label pairs have been loaded' %
(len(words_label), line_count))
return words_label
def node_classify_model(config):
"""Build node classify model.
"""
nodes = fl.data('nodes', shape=[None, 1], dtype='int64')
labels = fl.data('labels', shape=[None, 1], dtype='int64')
embed_nodes = fl.embedding(
input=nodes,
size=[config.num_nodes, config.embed_dim],
param_attr=fluid.ParamAttr(name='weight'))
embed_nodes.stop_gradient = True
probs = fl.fc(input=embed_nodes, size=config.num_labels, act='softmax')
predict = fl.argmax(probs, axis=-1)
loss = fl.cross_entropy(input=probs, label=labels)
loss = fl.reduce_mean(loss)
return {
'loss': loss,
'probs': probs,
'predict': predict,
'labels': labels,
}
def run_epoch(exe, prog, model, feed_dict, lr):
"""Run training process of every epoch.
"""
if lr is None:
loss, predict = exe.run(prog,
feed=feed_dict,
fetch_list=[model['loss'], model['predict']],
return_numpy=True)
lr_ = 0
else:
loss, predict, lr_ = exe.run(
prog,
feed=feed_dict,
fetch_list=[model['loss'], model['predict'], lr],
return_numpy=True)
macro_f1 = f1_score(feed_dict['labels'], predict, average="macro")
micro_f1 = f1_score(feed_dict['labels'], predict, average="micro")
return {
'loss': loss,
'pred': predict,
'lr': lr_,
'macro_f1': macro_f1,
'micro_f1': micro_f1
}
def main(args):
"""main function for training node classification task.
"""
words_label = load_data(args.dataset)
# split data for training and testing
split_position = int(words_label.shape[0] * args.train_percent)
train_words_label = words_label[0:split_position, :]
test_words_label = words_label[split_position:, :]
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_prog = fluid.Program()
test_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
model = node_classify_model(args)
test_prog = train_prog.clone(for_test=True)
with fluid.program_guard(train_prog, startup_prog):
lr = fl.polynomial_decay(args.lr, 1000, 0.001)
adam = fluid.optimizer.Adam(lr)
adam.minimize(model['loss'])
exe = fluid.Executor(place)
exe.run(startup_prog)
def existed_params(var):
if not isinstance(var, fluid.framework.Parameter):
return False
return os.path.exists(os.path.join(args.ckpt_path, var.name))
fluid.io.load_vars(
exe, args.ckpt_path, main_program=train_prog, predicate=existed_params)
# load_param(args.ckpt_path, ['content'])
feed_dict = {}
X = train_words_label[:, 0].reshape(-1, 1)
labels = train_words_label[:, 1].reshape(-1, 1)
logging.info('%d/%d data to train' %
(labels.shape[0], words_label.shape[0]))
test_feed_dict = {}
test_X = test_words_label[:, 0].reshape(-1, 1)
test_labels = test_words_label[:, 1].reshape(-1, 1)
logging.info('%d/%d data to test' %
(test_labels.shape[0], words_label.shape[0]))
for epoch in range(args.epochs):
feed_dict['nodes'] = X
feed_dict['labels'] = labels
train_result = run_epoch(exe, train_prog, model, feed_dict, lr)
test_feed_dict['nodes'] = test_X
test_feed_dict['labels'] = test_labels
test_result = run_epoch(exe, test_prog, model, test_feed_dict, lr=None)
logging.info(
'epoch %d | lr %.4f | train_loss %.5f | train_macro_F1 %.4f | train_micro_F1 %.4f | test_loss %.5f | test_macro_F1 %.4f | test_micro_F1 %.4f'
% (epoch, train_result['lr'], train_result['loss'],
train_result['macro_f1'], train_result['micro_f1'],
test_result['loss'], test_result['macro_f1'],
test_result['micro_f1']))
logging.info(
'final_test_macro_f1 score: %.4f | final_test_micro_f1 score: %.4f' %
(test_result['macro_f1'], test_result['micro_f1']))
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='multi_class')
parser.add_argument(
'--dataset',
default=None,
type=str,
help='training and testing data file(default: None)')
parser.add_argument(
'--ckpt_path', default=None, type=str, help='task name(default: None)')
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument(
'--train_percent',
default=0.5,
type=float,
help='train_percent(default: 0.5)')
parser.add_argument(
'--num_labels',
default=4,
type=int,
help='number of labels(default: 4)')
parser.add_argument(
'--epochs',
default=100,
type=int,
help='number of epochs for training(default: 100)')
parser.add_argument(
'--lr',
default=0.025,
type=float,
help='learning rate(default: 0.025)')
parser.add_argument(
'--num_nodes', default=0, type=int, help='number of nodes')
parser.add_argument(
'--embed_dim',
default=64,
type=int,
help='dimension of embedding(default: 64)')
args = parser.parse_args()
log_format = '%(asctime)s-%(levelname)s-%(name)s: %(message)s'
logging.basicConfig(level='INFO', format=log_format)
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Implementation of some helper functions"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import os
import time
import yaml
import numpy as np
from pgl.utils.logger import log
class AttrDict(dict):
"""Attr dict
"""
def __init__(self, d):
self.dict = d
def __getattr__(self, attr):
value = self.dict[attr]
if isinstance(value, dict):
return AttrDict(value)
else:
return value
def __str__(self):
return str(self.dict)
def load_config(config_file):
"""Load config file"""
with open(config_file) as f:
if hasattr(yaml, 'FullLoader'):
config = yaml.load(f, Loader=yaml.FullLoader)
else:
config = yaml.load(f)
return AttrDict(config)
# parse yaml file
function parse_yaml {
local prefix=$2
local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
sed -ne "s|^\($s\):|\1|" \
-e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
-e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $1 |
awk -F$fs '{
indent = length($1)/2;
vname[indent] = $2;
for (i in vname) {if (i > indent) {delete vname[i]}}
if (length($3) > 0) {
vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
}
}'
}
eval $(parse_yaml "$(dirname "${BASH_SOURCE}")"/config.yaml)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""doc
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
import time
import io
import os
import numpy as np
import random
from pgl.utils.logger import log
from pgl.sample import metapath_randomwalk
from pgl.graph_kernel import skip_gram_gen_pair
from pgl.graph_kernel import alias_sample_build_table
from utils import load_config
from graph import m2vGraph
import mp_reader
class NodeGenerator(object):
"""Node generator"""
def __init__(self, config, graph):
self.config = config
self.graph = graph
self.batch_size = self.config.batch_size
self.shuffle = self.config.node_shuffle
self.node_files = self.config.node_files
self.first_node_type = self.config.first_node_type
self.walk_mode = self.config.walk_mode
def __call__(self):
if self.walk_mode == "m2v":
generator = self.m2v_node_generate
log.info("node gen mode is : %s" % (self.walk_mode))
elif self.walk_mode == "multi_m2v":
generator = self.multi_m2v_node_generate
log.info("node gen mode is : %s" % (self.walk_mode))
elif self.walk_mode == "files":
generator = self.files_node_generate
log.info("node gen mode is : %s" % (self.walk_mode))
else:
generator = self.m2v_node_generate
log.info("node gen mode is : %s" % (self.walk_mode))
while True:
for nodes in generator():
yield nodes
def m2v_node_generate(self):
"""m2v_node_generate"""
for nodes in self.graph.node_batch_iter(
batch_size=self.batch_size,
n_type=self.first_node_type,
shuffle=self.shuffle):
yield nodes
def multi_m2v_node_generate(self):
"""multi_m2v_node_generate"""
n_type_list = self.first_node_type.split(';')
num_n_type = len(n_type_list)
node_types = np.unique(self.graph.node_types).tolist()
node_generators = {}
for n_type in node_types:
node_generators[n_type] = \
self.graph.node_batch_iter(self.batch_size, n_type=n_type)
cc = 0
while True:
idx = cc % num_n_type
n_type = n_type_list[idx]
try:
nodes = next(node_generators[n_type])
except StopIteration as e:
log.info("node type of %s iteration finished in one epoch" %
(n_type))
node_generators[n_type] = \
self.graph.node_batch_iter(self.batch_size, n_type=n_type)
break
yield (nodes, idx)
cc += 1
def files_node_generate(self):
"""files_node_generate"""
nodes = []
for filename in self.node_files:
with io.open(filename) as inf:
for line in inf:
node = int(line.strip('\n\t'))
nodes.append(node)
if len(nodes) == self.batch_size:
yield nodes
nodes = []
if len(nodes):
yield nodes
class WalkGenerator(object):
"""Walk generator"""
def __init__(self, config, dataset):
self.config = config
self.dataset = dataset
self.graph = self.dataset.graph
self.walk_mode = self.config.walk_mode
self.node_generator = NodeGenerator(self.config, self.graph)
if self.walk_mode == "multi_m2v":
num_path = len(self.config.meta_path.split(';'))
num_first_node_type = len(self.config.first_node_type.split(';'))
assert num_first_node_type == num_path, \
"In [multi_m2v] walk_mode, the number of metapath should be the same \
as the number of first_node_type"
assert num_path > 1, "In [multi_m2v] walk_mode, the number of metapath\
should be greater than 1"
def __call__(self):
np.random.seed(os.getpid())
if self.walk_mode == "m2v":
walk_generator = self.m2v_walk
log.info("walk mode is : %s" % (self.walk_mode))
elif self.walk_mode == "multi_m2v":
walk_generator = self.multi_m2v_walk
log.info("walk mode is : %s" % (self.walk_mode))
else:
raise ValueError("walk_mode [%s] is not matched" % self.walk_mode)
for walks in walk_generator():
yield walks
def m2v_walk(self):
"""Metapath2vec walker"""
for nodes in self.node_generator():
walks = metapath_randomwalk(
self.graph, nodes, self.config.meta_path, self.config.walk_len)
yield walks
def multi_m2v_walk(self):
"""Multi metapath2vec walker"""
meta_paths = self.config.meta_path.split(';')
for nodes, idx in self.node_generator():
walks = metapath_randomwalk(self.graph, nodes, meta_paths[idx],
self.config.walk_len)
yield walks
class DataGenerator(object):
def __init__(self, config, dataset):
self.config = config
self.dataset = dataset
self.graph = self.dataset.graph
self.walk_generator = WalkGenerator(self.config, self.dataset)
def __call__(self):
generator = self.pair_generate
for src, pos, negs in generator():
dst = np.concatenate([pos, negs], 1)
yield src, dst
def pair_generate(self):
for walks in self.walk_generator():
try:
src_list, pos_list = [], []
for walk in walks:
s, p = skip_gram_gen_pair(walk, self.config.win_size)
src_list.append(s), pos_list.append(p)
src = [s for x in src_list for s in x]
pos = [s for x in pos_list for s in x]
if len(src) == 0:
continue
negs = self.negative_sample(
src,
pos,
neg_num=self.config.neg_num,
neg_sample_type=self.config.neg_sample_type)
src = np.array(src, dtype=np.int64).reshape(-1, 1, 1)
pos = np.array(pos, dtype=np.int64).reshape(-1, 1, 1)
yield src, pos, negs
except Exception as e:
log.exception(e)
def negative_sample(self, src, pos, neg_num, neg_sample_type):
if neg_sample_type == "average":
neg_sample_size = [len(pos), neg_num, 1]
negs = np.random.randint(
low=0, high=self.graph.num_nodes, size=neg_sample_size)
elif neg_sample_type == "m2v_plus":
negs = []
for s in src:
neg = self.graph.sample_nodes(
sample_num=neg_num, n_type=self.graph.node_types[s])
negs.append(neg)
negs = np.vstack(negs).reshape(-1, neg_num, 1)
else: # equal to "average"
neg_sample_size = [len(pos), neg_num, 1]
negs = np.random.randint(
low=0, high=self.graph.num_nodes, size=neg_sample_size)
negs = negs.astype(np.int64)
return negs
def multiprocess_data_generator(config, dataset):
"""Multiprocess data generator.
"""
if config.num_sample_workers == 1:
data_generator = DataGenerator(config, dataset)
else:
pool = [
DataGenerator(config, dataset)
for i in range(config.num_sample_workers)
]
data_generator = mp_reader.multiprocess_reader(
pool, use_pipe=True, queue_size=100)
return data_generator
if __name__ == "__main__":
config_file = "./config.yaml"
config = load_config(config_file)
dataset = m2vGraph(config)
data_generator = multiprocess_data_generator(config, dataset)
start = time.time()
cc = 0
for src, dst in data_generator():
log.info(src.shape)
log.info("time: %.6f" % (time.time() - start))
start = time.time()
cc += 1
if cc == 100:
break
# PGL Examples for GAT
# GAT: Graph Attention Networks
[Graph Attention Networks \(GAT\)](https://arxiv.org/abs/1710.10903) is a novel architecture that operates on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. Based on PGL, we reproduce the GAT algorithm and match the accuracy reported in the paper on citation network benchmarks.
### Simple example to build single head GAT
......@@ -26,24 +26,25 @@ def gat_layer(graph_wrapper, node_feature, hidden_size):
return output
```
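Since the diff above elides most of `gat_layer`, the following is a minimal, hedged sketch of a single-head GAT layer built on PGL's high-level wrapper rather than the hand-written send/recv version; the `pgl.layers.gat` call and its argument names are assumed from the PGL 1.x API and may differ slightly from the code in this example.
```python
import pgl

def single_head_gat(graph_wrapper, node_feature, hidden_size, name):
    # One attention head, no dropout: pgl.layers.gat computes attention
    # coefficients over incoming edges and aggregates neighbor features
    # weighted by those coefficients (API assumed from PGL 1.x).
    output = pgl.layers.gat(graph_wrapper,
                            node_feature,
                            hidden_size,
                            activation="elu",
                            name=name,
                            num_heads=1,
                            feat_drop=0.0,
                            attn_drop=0.0)
    return output
```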
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- paddlepaddle>=1.6
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)|
| --- | --- | --- |---|
| Cora | ~83% | 0.0188s | 0.0175s |
| Pubmed | ~78% | 0.0449s | 0.0295s |
| Citeseer | ~70% | 0.0275 | 0.0253s |
| Dataset | Accuracy |
| --- | --- |
| Cora | ~83% |
| Pubmed | ~78% |
| Citeseer | ~70% |
### How to run
......
......@@ -68,7 +68,7 @@ def main(args):
node_index = fluid.layers.data(
"node_index",
shape=[None, 1],
dtype="int32",
dtype="int64",
append_batch_size=False)
node_label = fluid.layers.data(
"node_label",
......@@ -111,7 +111,7 @@ def main(args):
for epoch in range(200):
if epoch >= 3:
t0 = time.time()
feed_dict["node_index"] = np.array(train_index, dtype="int32")
feed_dict["node_index"] = np.array(train_index, dtype="int64")
feed_dict["node_label"] = np.array(train_label, dtype="int64")
train_loss, train_acc = exe.run(train_program,
feed=feed_dict,
......@@ -121,7 +121,7 @@ def main(args):
time_per_epoch = 1.0 * (time.time() - t0)
dur.append(time_per_epoch)
feed_dict["node_index"] = np.array(val_index, dtype="int32")
feed_dict["node_index"] = np.array(val_index, dtype="int64")
feed_dict["node_label"] = np.array(val_label, dtype="int64")
val_loss, val_acc = exe.run(test_program,
feed=feed_dict,
......@@ -132,7 +132,7 @@ def main(args):
"Train Loss: %f " % train_loss + "Train Acc: %f " % train_acc
+ "Val Loss: %f " % val_loss + "Val Acc: %f " % val_acc)
feed_dict["node_index"] = np.array(test_index, dtype="int32")
feed_dict["node_index"] = np.array(test_index, dtype="int64")
feed_dict["node_label"] = np.array(test_label, dtype="int64")
test_loss, test_acc = exe.run(test_program,
feed=feed_dict,
......
# PGL Examples for GCN
# GCN: Graph Convolutional Networks
[Graph Convolutional Network \(GCN\)](https://arxiv.org/abs/1609.02907) is a powerful neural network designed for machine learning on graphs. Based on PGL, we reproduce the GCN algorithm and match the accuracy reported in the paper on citation network benchmarks.
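As a reference for what the model computes, here is a framework-free NumPy sketch of the GCN propagation rule (dense adjacency, hypothetical helper name); the PGL example implements the same rule with sparse message passing rather than this dense formulation.
```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One GCN layer: H' = relu(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # D^-1/2 of the new degrees
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)
```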
......@@ -26,18 +26,18 @@ The datasets contain three citation networks: CORA, PUBMED, CITESEER. The detail
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- paddlepaddle>=1.6
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)|
| --- | --- | --- |---|
| Cora | ~81% | 0.0106s | 0.0104s |
| Pubmed | ~79% | 0.0210s | 0.0154s |
| Citeseer | ~71% | 0.0175s | 0.0177s |
| Dataset | Accuracy |
| --- | --- |
| Cora | ~81% |
| Pubmed | ~79% |
| Citeseer | ~71% |
### How to run
......
......@@ -70,7 +70,7 @@ def main(args):
node_index = fluid.layers.data(
"node_index",
shape=[None, 1],
dtype="int32",
dtype="int64",
append_batch_size=False)
node_label = fluid.layers.data(
"node_label",
......@@ -113,7 +113,7 @@ def main(args):
for epoch in range(200):
if epoch >= 3:
t0 = time.time()
feed_dict["node_index"] = np.array(train_index, dtype="int32")
feed_dict["node_index"] = np.array(train_index, dtype="int64")
feed_dict["node_label"] = np.array(train_label, dtype="int64")
train_loss, train_acc = exe.run(train_program,
feed=feed_dict,
......@@ -123,7 +123,7 @@ def main(args):
if epoch >= 3:
time_per_epoch = 1.0 * (time.time() - t0)
dur.append(time_per_epoch)
feed_dict["node_index"] = np.array(val_index, dtype="int32")
feed_dict["node_index"] = np.array(val_index, dtype="int64")
feed_dict["node_label"] = np.array(val_label, dtype="int64")
val_loss, val_acc = exe.run(test_program,
feed=feed_dict,
......@@ -134,7 +134,7 @@ def main(args):
"Train Loss: %f " % train_loss + "Train Acc: %f " % train_acc
+ "Val Loss: %f " % val_loss + "Val Acc: %f " % val_acc)
feed_dict["node_index"] = np.array(test_index, dtype="int32")
feed_dict["node_index"] = np.array(test_index, dtype="int64")
feed_dict["node_label"] = np.array(test_label, dtype="int64")
test_loss, test_acc = exe.run(test_program,
feed=feed_dict,
......
# GES: Graph Embedding with Side Information
[Graph Embedding with Side Information](https://arxiv.org/pdf/1803.02349.pdf) is an algorithmic framework for representation learning on graphs. Given any graph, it can learn continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks. Based on PGL, we reproduce the GES algorithm.
## Datasets
The dataset is the [BlogCatalog](http://socialcomputing.asu.edu/datasets/BlogCatalog3) social network.
## Dependencies
- paddlepaddle>=1.6
- pgl>=1.0.0
## How to run
For example, train GES on the BlogCatalog dataset.
```sh
# train GES with GPU
sh gpu_run.sh
```
## Hyperparameters
- dataset: The dataset to use, e.g. "BlogCatalog".
- hidden_size: Hidden size of the embedding.
- lr: Learning rate.
- neg_num: Number of negative samples.
- epoch: Number of training epochs.
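For intuition, the core idea of GES is to fuse a node's id embedding with the embeddings of its side information before applying the skip-gram objective. The sketch below (hypothetical names, NumPy only) illustrates the equal-weight average that the model in this example computes with `reduce_mean`; it is not the training code itself.
```python
import numpy as np

def ges_node_vector(node_id, side_info_ids, embedding_table):
    """Average the node embedding with its side-information embeddings."""
    ids = [node_id] + list(side_info_ids)
    return embedding_table[ids].mean(axis=0)   # one fused vector per node
```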
#!/bin/bash
export FLAGS_sync_nccl_allreduce=1
export FLAGS_eager_delete_tensor_gb=0
export FLAGS_fraction_of_gpu_memory_to_use=1
export NCCL_DEBUG=INFO
export NCCL_IB_GID_INDEX=3
export GLOG_v=1
export GLOG_logtostderr=1
num_nodes=10312
num_embedding=10351
num_sample_workers=20
# build train_data
rm -rf train_data && mkdir -p train_data
cd train_data
seq 0 $((num_nodes-1)) | shuf | split -l $((num_nodes/num_sample_workers+1))
cd -
python3 gpu_train.py --output_path ./output --epoch 100 --walk_len 40 --win_size 5 --neg_num 5 --batch_size 128 --hidden_size 128 \
--num_nodes $num_nodes --num_embedding $num_embedding --num_sample_workers $num_sample_workers --steps_per_save 2000 --dataset "BlogCatalog"
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" gpu_train
"""
import argparse
import time
import os
import glob
import numpy as np
import paddle.fluid as F
import paddle.fluid.layers as L
from pgl.utils.logger import log
from pgl.graph import Graph
from pgl.sample import graph_alias_sample_table
from pgl import data_loader
import mp_reader
from reader import GESReader
from model import GESModel
def get_file_list(path):
"""get_file_list
"""
filelist = []
if os.path.isfile(path):
filelist = [path]
elif os.path.isdir(path):
filelist = [
os.path.join(dp, f)
for dp, dn, filenames in os.walk(path) for f in filenames
]
else:
raise ValueError(path + " not supported")
return filelist
def build_graph(num_nodes, edge_path, output_path, undigraph=True):
""" build_graph
"""
edge_file = os.path.join(output_path, "edge.npy")
edge_weight_file = os.path.join(output_path, "edge_weight.npy")
alias_file = os.path.join(output_path, "alias.npy")
events_file = os.path.join(output_path, "events.npy")
if os.path.isfile(edge_file):
edges = np.load(edge_file)
edge_feat = dict()
if os.path.isfile(edge_weight_file):
log.info("Loading weight from cache")
edge_feat["weight"] = np.load(edge_weight_file, allow_pickle=True)
node_feat = dict()
if os.path.isfile(alias_file):
log.info("Loading alias from cache")
node_feat["alias"] = np.load(alias_file, allow_pickle=True)
if os.path.isfile(events_file):
log.info("Loading events from cache")
node_feat["events"] = np.load(events_file, allow_pickle=True)
else:
filelist = get_file_list(edge_path)
edges, edge_weight = [], []
log.info("Reading edge files")
for name in filelist:
with open(name) as inf:
for line in inf:
slots = line.strip("\n").split()
edges.append([slots[0], slots[1]])
if len(slots) > 2:
edge_weight.append(slots[2])
edges = np.array(edges, dtype="int64")
assert num_nodes > edges.max(
), "Node id in any edge should be smaller than num_nodes!"
log.info("Read edge files done.")
edge_feat = dict()
node_feat = dict()
if len(edge_weight) == len(edges):
edge_feat["weight"] = np.array(edge_weight, dtype="float32")
if undigraph is True:
edges = np.concatenate([edges, edges[:, [1, 0]]], 0)
if "weight" in edge_feat:
edge_feat["weight"] = np.concatenate(
[edge_feat["weight"], edge_feat["weight"]],
0).astype("float64")
graph = Graph(num_nodes, edges, node_feat, edge_feat=edge_feat)
log.info("Build graph done")
graph.outdegree()
log.info("Build graph index done")
if "weight" in graph.edge_feat and "alias" not in graph.node_feat and "events" not in graph.node_feat:
graph.node_feat["alias"], graph.node_feat[
"events"] = graph_alias_sample_table(graph, "weight")
log.info(
"Build graph alias sample table done, and saving alias & evnets cache"
)
np.save(alias_file, graph.node_feat["alias"])
np.save(events_file, graph.node_feat["events"])
return graph
def optimization(base_lr, loss, train_steps, optimizer='adam'):
""" optimization
"""
decayed_lr = L.polynomial_decay(base_lr, train_steps, 0.0001)
if optimizer == 'sgd':
optimizer = F.optimizer.SGD(
decayed_lr,
regularization=F.regularizer.L2DecayRegularizer(
regularization_coeff=0.0025))
elif optimizer == 'adam':
# dont use gpu's lazy mode
optimizer = F.optimizer.Adam(decayed_lr)
else:
raise ValueError
log.info('learning rate:%f' % (base_lr))
optimizer.minimize(loss)
def build_gen_func(args, graph, node_feat):
""" build_gen_func
"""
num_sample_workers = args.num_sample_workers
if args.walkpath_files is None:
walkpath_files = [None for _ in range(num_sample_workers)]
else:
files = get_file_list(args.walkpath_files)
walkpath_files = [[] for i in range(num_sample_workers)]
for idx, f in enumerate(files):
walkpath_files[idx % num_sample_workers].append(f)
if args.train_files is None:
train_files = [None for _ in range(num_sample_workers)]
else:
files = get_file_list(args.train_files)
train_files = [[] for i in range(num_sample_workers)]
for idx, f in enumerate(files):
train_files[idx % num_sample_workers].append(f)
gen_func_pool = [
GESReader(
graph,
node_feat,
batch_size=args.batch_size,
walk_len=args.walk_len,
win_size=args.win_size,
neg_num=args.neg_num,
neg_sample_type=args.neg_sample_type,
walkpath_files=walkpath_files[i],
train_files=train_files[i]) for i in range(num_sample_workers)
]
if num_sample_workers == 1:
gen_func = gen_func_pool[0]
else:
gen_func = mp_reader.multiprocess_reader(
gen_func_pool, use_pipe=True, queue_size=100)
return gen_func
def get_parallel_exe(program, loss):
""" get_parallel_exe
"""
exec_strategy = F.ExecutionStrategy()
exec_strategy.num_threads = 1 #2 for fp32 4 for fp16
exec_strategy.use_experimental_executor = True
exec_strategy.num_iteration_per_drop_scope = 10  # clean up local scopes periodically to save memory
build_strategy = F.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
build_strategy.remove_unnecessary_lock = True
#return compiled_prog
train_exe = F.ParallelExecutor(
use_cuda=True,
loss_name=loss.name,
build_strategy=build_strategy,
exec_strategy=exec_strategy,
main_program=program)
return train_exe
def train(train_exe, exe, program, loss, node2vec_pyreader, args, train_steps):
""" train
"""
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
step = 0
while True:
try:
begin_time = time.time()
loss_val, = train_exe.run(fetch_list=[loss])
log.info("step %s: loss %.5f speed: %.5f s/step" %
(step, np.mean(loss_val), time.time() - begin_time))
step += 1
except F.core.EOFException:
node2vec_pyreader.reset()
if (step % args.steps_per_save == 0 or
step == train_steps) and trainer_id == 0:
model_save_dir = args.output_path
model_path = os.path.join(model_save_dir, str(step))
if not os.path.exists(model_save_dir):
os.makedirs(model_save_dir)
F.io.save_params(exe, model_path, program)
if step == train_steps:
break
def test_gen_speed(gen_func):
""" test_gen_speed
"""
cur_time = time.time()
for idx, _ in enumerate(gen_func()):
log.info("iter %s: %s s" % (idx, time.time() - cur_time))
cur_time = time.time()
if idx == 100:
break
def main(args):
""" main
"""
import logging
log.setLevel(logging.DEBUG)
log.info("start")
if args.dataset is not None:
if args.dataset == "BlogCatalog":
graph = data_loader.BlogCatalogDataset().graph
else:
raise ValueError(args.dataset + " dataset doesn't exist")
log.info("Load buildin BlogCatalog dataset done.")
node_feat = np.expand_dims(graph.node_feat["group_id"].argmax(-1),
-1) + graph.num_nodes
args.num_nodes = graph.num_nodes
args.num_embedding = graph.num_nodes + graph.node_feat[
"group_id"].shape[-1]
else:
graph = build_graph(args.num_nodes, args.edge_path, args.output_path)
node_feat = np.load(args.node_feat_npy)
model = GESModel(args.num_embedding, node_feat.shape[1] + 1,
args.hidden_size, args.neg_num, False, 2)
pyreader = model.pyreader
loss = model.forward()
num_devices = len(F.cuda_places())
train_steps = int(args.num_nodes * args.epoch / args.batch_size /
num_devices)
log.info("Train steps: %s" % train_steps)
optimization(args.lr * num_devices, loss, train_steps, args.optimizer)
place = F.CUDAPlace(0)
exe = F.Executor(place)
exe.run(F.default_startup_program())
gen_func = build_gen_func(args, graph, node_feat)
pyreader.decorate_tensor_provider(gen_func)
pyreader.start()
train_prog = F.default_main_program()
train_exe = get_parallel_exe(train_prog, loss)
train(train_exe, exe, train_prog, loss, pyreader, args, train_steps)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Deepwalk')
parser.add_argument("--hidden_size", type=int, default=64)
parser.add_argument("--lr", type=float, default=0.025)
parser.add_argument("--neg_num", type=int, default=5)
parser.add_argument("--epoch", type=int, default=100)
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--walk_len", type=int, default=40)
parser.add_argument("--win_size", type=int, default=5)
parser.add_argument("--output_path", type=str, default="output")
parser.add_argument("--num_sample_workers", type=int, default=1)
parser.add_argument("--steps_per_save", type=int, default=3000)
parser.add_argument("--num_nodes", type=int, default=10000)
parser.add_argument("--num_embedding", type=int, default=10000)
parser.add_argument("--edge_path", type=str, default="./graph_data")
parser.add_argument("--walkpath_files", type=str, default=None)
parser.add_argument("--train_files", type=str, default="./train_data")
parser.add_argument("--node_feat_npy", type=str, default="./feat.npy")
parser.add_argument("--dataset", type=str, default=None)
parser.add_argument(
"--neg_sample_type",
type=str,
default="average",
choices=["average", "outdegree"])
parser.add_argument(
"--optimizer",
type=str,
required=False,
choices=['adam', 'sgd'],
default="adam")
args = parser.parse_args()
log.info(args)
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
GES model file.
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import math
import paddle.fluid.layers as L
import paddle.fluid as F
def split_embedding(input,
dict_size,
hidden_size,
initializer,
name,
num_part=16,
is_sparse=False,
learning_rate=1.0):
""" split_embedding
"""
_part_size = hidden_size // num_part
if hidden_size % num_part != 0:
_part_size += 1
output_embedding = []
p_num = 0
while hidden_size > 0:
_part_size = min(_part_size, hidden_size)
hidden_size -= _part_size
print("part", p_num, "size=", (dict_size, _part_size))
part_embedding = L.embedding(
input=input,
size=(dict_size, _part_size),
is_sparse=is_sparse,
is_distributed=False,
param_attr=F.ParamAttr(
name=name + '_part%s' % p_num,
initializer=initializer,
learning_rate=learning_rate))
p_num += 1
output_embedding.append(part_embedding)
return L.concat(output_embedding, -1)
class GESModel(object):
""" GESModel
"""
def __init__(self,
num_nodes,
num_featuers,
hidden_size=16,
neg_num=5,
is_sparse=False,
num_part=1):
self.pyreader = L.py_reader(
capacity=70,
shapes=[[-1, 1, num_featuers, 1],
[-1, neg_num + 1, num_featuers, 1]],
dtypes=['int64', 'int64'],
lod_levels=[0, 0],
name='train',
use_double_buffer=True)
self.num_nodes = num_nodes
self.num_featuers = num_featuers
self.neg_num = neg_num
self.embed_init = F.initializer.TruncatedNormal(scale=1.0 /
math.sqrt(hidden_size))
self.is_sparse = is_sparse
self.num_part = num_part
self.hidden_size = hidden_size
self.loss = None
def forward(self):
""" forward
"""
src, dst = L.read_file(self.pyreader)
if self.is_sparse:
# sparse mode use 2 dims input.
src = L.reshape(src, [-1, 1])
dst = L.reshape(dst, [-1, 1])
src_embed = split_embedding(src, self.num_nodes, self.hidden_size,
self.embed_init, "weight", self.num_part,
self.is_sparse)
dst_embed = split_embedding(dst, self.num_nodes, self.hidden_size,
self.embed_init, "weight", self.num_part,
self.is_sparse)
if self.is_sparse:
src_embed = L.reshape(
src_embed, [-1, 1, self.num_featuers, self.hidden_size])
dst_embed = L.reshape(
dst_embed,
[-1, self.neg_num + 1, self.num_featuers, self.hidden_size])
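# Average over the feature axis: fuse the node id embedding with its side-information embeddings into one vector (the GES aggregation).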
src_embed = L.reduce_mean(src_embed, 2)
dst_embed = L.reduce_mean(dst_embed, 2)
logits = L.matmul(
src_embed, dst_embed,
transpose_y=True) # [batch_size, 1, neg_num+1]
pos_label = L.fill_constant_batch_size_like(logits, [-1, 1, 1],
"float32", 1)
neg_label = L.fill_constant_batch_size_like(
logits, [-1, 1, self.neg_num], "float32", 0)
label = L.concat([pos_label, neg_label], -1)
pos_weight = L.fill_constant_batch_size_like(logits, [-1, 1, 1],
"float32", self.neg_num)
neg_weight = L.fill_constant_batch_size_like(
logits, [-1, 1, self.neg_num], "float32", 1)
weight = L.concat([pos_weight, neg_weight], -1)
weight.stop_gradient = True
label.stop_gradient = True
loss = L.sigmoid_cross_entropy_with_logits(logits, label)
loss = loss * weight
loss = L.reduce_mean(loss)
loss = loss * ((self.neg_num + 1) / 2 / self.neg_num)
loss.persistable = True
self.loss = loss
return loss
class EGESModel(GESModel):
""" EGESModel
"""
def forward(self):
""" forward
"""
src, dst = L.read_file(self.pyreader)
src_id = L.slice(src, [0, 1, 2, 3], [0, 0, 0, 0],
[int(math.pow(2, 30)) - 1, 1, 1, 1])
dst_id = L.slice(dst, [0, 1, 2, 3], [0, 0, 0, 0],
[int(math.pow(2, 30)) - 1, self.neg_num + 1, 1, 1])
if self.is_sparse:
# sparse mode use 2 dims input.
src = L.reshape(src, [-1, 1])
dst = L.reshape(dst, [-1, 1])
# [b, 1, f, h]
src_embed = split_embedding(src, self.num_nodes, self.hidden_size,
self.embed_init, "weight", self.num_part,
self.is_sparse)
# [b, n+1, f, h]
dst_embed = split_embedding(dst, self.num_nodes, self.hidden_size,
self.embed_init, "weight", self.num_part,
self.is_sparse)
if self.is_sparse:
src_embed = L.reshape(
src_embed, [-1, 1, self.num_featuers, self.hidden_size])
dst_embed = L.reshape(
dst_embed,
[-1, self.neg_num + 1, self.num_featuers, self.hidden_size])
# [b, 1, 1, f]
src_weight = L.softmax(
L.embedding(
src_id, [self.num_nodes, self.num_featuers],
param_attr=F.ParamAttr(name="alpha")))
# [b, n+1, 1, f]
dst_weight = L.softmax(
L.embedding(
dst_id, [self.num_nodes, self.num_featuers],
param_attr=F.ParamAttr(name="alpha")))
# [b, 1, h]
src_sum = L.squeeze(L.matmul(src_weight, src_embed), axes=[2])
# [b, n+1, h]
dst_sum = L.squeeze(L.matmul(dst_weight, dst_embed), axes=[2])
logits = L.matmul(
src_sum, dst_sum, transpose_y=True) # [batch_size, 1, neg_num+1]
pos_label = L.fill_constant_batch_size_like(logits, [-1, 1, 1],
"float32", 1)
neg_label = L.fill_constant_batch_size_like(
logits, [-1, 1, self.neg_num], "float32", 0)
label = L.concat([pos_label, neg_label], -1)
pos_weight = L.fill_constant_batch_size_like(logits, [-1, 1, 1],
"float32", self.neg_num)
neg_weight = L.fill_constant_batch_size_like(
logits, [-1, 1, self.neg_num], "float32", 1)
weight = L.concat([pos_weight, neg_weight], -1)
weight.stop_gradient = True
label.stop_gradient = True
loss = L.sigmoid_cross_entropy_with_logits(logits, label)
loss = loss * weight
loss = L.reduce_mean(loss)
loss = loss * ((self.neg_num + 1) / 2 / self.neg_num)
loss.persistable = True
self.loss = loss
return loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Optimized Multiprocessing Reader for PaddlePaddle
"""
import multiprocessing
import numpy as np
import time
import paddle.fluid as fluid
import pyarrow
def _serialize_serializable(obj):
"""Serialize Feed Dict
"""
return {"type": type(obj), "data": obj.__dict__}
def _deserialize_serializable(obj):
"""Deserialize Feed Dict
"""
val = obj["type"].__new__(obj["type"])
val.__dict__.update(obj["data"])
return val
context = pyarrow.default_serialization_context()
context.register_type(
object,
"object",
custom_serializer=_serialize_serializable,
custom_deserializer=_deserialize_serializable)
def serialize_data(data):
"""serialize_data"""
return pyarrow.serialize(data, context=context).to_buffer().to_pybytes()
def deserialize_data(data):
"""deserialize_data"""
return pyarrow.deserialize(data, context=context)
def multiprocess_reader(readers, use_pipe=True, queue_size=1000):
"""
multiprocess_reader uses multiple Python processes to read data from the
given readers and merges the samples through a multiprocessing.Queue or
multiprocessing.Pipe. One process is started per reader, so the readers
should be independent of each other.
Note that multiprocessing.Queue requires read/write access to /dev/shm,
which some platforms do not provide.
An example:
.. code-block:: python
    reader0 = reader(["file01", "file02"])
    reader1 = reader(["file11", "file12"])
    reader2 = reader(["file21", "file22"])
    reader = multiprocess_reader([reader0, reader1, reader2],
        queue_size=100, use_pipe=False)
"""
assert type(readers) is list and len(readers) > 0
def _read_into_queue(reader, queue):
"""read_into_queue"""
for sample in reader():
if sample is None:
raise ValueError("sample has None")
queue.put(serialize_data(sample))
queue.put(serialize_data(None))
def queue_reader():
"""queue_reader"""
queue = multiprocessing.Queue(queue_size)
for reader in readers:
p = multiprocessing.Process(
target=_read_into_queue, args=(reader, queue))
p.start()
reader_num = len(readers)
finish_num = 0
while finish_num < reader_num:
sample = deserialize_data(queue.get())
if sample is None:
finish_num += 1
else:
yield sample
def _read_into_pipe(reader, conn):
"""read_into_pipe"""
for sample in reader():
if sample is None:
raise ValueError("sample has None!")
conn.send(serialize_data(sample))
conn.send(serialize_data(None))
conn.close()
def pipe_reader():
"""pipe_reader"""
conns = []
for reader in readers:
parent_conn, child_conn = multiprocessing.Pipe()
conns.append(parent_conn)
p = multiprocessing.Process(
target=_read_into_pipe, args=(reader, child_conn))
p.start()
reader_num = len(readers)
finish_num = 0
conn_to_remove = []
finish_flag = np.zeros(len(conns), dtype="int32")
while finish_num < reader_num:
for conn_id, conn in enumerate(conns):
if finish_flag[conn_id] > 0:
continue
buff = conn.recv()
now = time.time()
sample = deserialize_data(buff)
out = time.time() - now
if sample is None:
finish_num += 1
conn.close()
finish_flag[conn_id] = 1
else:
yield sample
if use_pipe:
return pipe_reader
else:
return queue_reader
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Reader file.
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
import time
import io
import os
import numpy as np
import paddle
from pgl.utils.logger import log
from pgl.sample import node2vec_sample
from pgl.sample import deepwalk_sample
from pgl.sample import alias_sample
from pgl.graph_kernel import skip_gram_gen_pair
from pgl.graph_kernel import alias_sample_build_table
class GESReader(object):
""" GESReader
"""
def __init__(self,
graph,
node_feat,
batch_size=512,
walk_len=40,
win_size=5,
neg_num=5,
train_files=None,
walkpath_files=None,
neg_sample_type="average"):
"""
Args:
walkpath_files: if is not None, read walk path from walkpath_files
"""
self.graph = graph
self.node_feat = node_feat
self.batch_size = batch_size
self.walk_len = walk_len
self.win_size = win_size
self.neg_num = neg_num
self.train_files = train_files
self.walkpath_files = walkpath_files
self.neg_sample_type = neg_sample_type
def walk_from_files(self):
""" walk_from_files
"""
bucket = []
while True:
for filename in self.walkpath_files:
with io.open(filename) as inf:
for line in inf:
walk = [int(x) for x in line.strip('\n\t').split('\t')]
bucket.append(walk)
if len(bucket) == self.batch_size:
yield bucket
bucket = []
if len(bucket):
yield bucket
def walk_from_graph(self):
""" walk_from_graph
"""
def node_generator():
""" node_generator
"""
if self.train_files is None:
while True:
for nodes in self.graph.node_batch_iter(self.batch_size):
yield nodes
else:
nodes = []
while True:
for filename in self.train_files:
with io.open(filename) as inf:
for line in inf:
node = int(line.strip('\n\t'))
nodes.append(node)
if len(nodes) == self.batch_size:
yield nodes
nodes = []
if len(nodes):
yield nodes
if "alias" in self.graph.node_feat and "events" in self.graph.node_feat:
log.info("Deepwalk using alias sample")
for nodes in node_generator():
if "alias" in self.graph.node_feat and "events" in self.graph.node_feat:
walks = deepwalk_sample(self.graph, nodes, self.walk_len,
"alias", "events")
else:
walks = deepwalk_sample(self.graph, nodes, self.walk_len)
yield walks
def walk_generator(self):
""" walk_generator
"""
if self.walkpath_files is not None:
for i in self.walk_from_files():
yield i
else:
for i in self.walk_from_graph():
yield i
def __call__(self):
np.random.seed(os.getpid())
if self.neg_sample_type == "outdegree":
outdegree = self.graph.outdegree()
distribution = 1. * outdegree / outdegree.sum()
alias, events = alias_sample_build_table(distribution)
max_len = int(self.batch_size * self.walk_len * (
(1 + self.win_size) - 0.3))
for walks in self.walk_generator():
src, pos = [], []
for walk in walks:
s, p = skip_gram_gen_pair(walk, self.win_size)
src.extend(s), pos.extend(p)
src = np.array(src, dtype=np.int64)
pos = np.array(pos, dtype=np.int64)
src, pos = np.reshape(src, [-1, 1, 1]), np.reshape(pos, [-1, 1, 1])
if src.shape[0] == 0:
continue
neg_sample_size = [len(pos), self.neg_num, 1]
if self.neg_sample_type == "average":
negs = self.graph.sample_nodes(neg_sample_size)
elif self.neg_sample_type == "outdegree":
negs = alias_sample(neg_sample_size, alias, events)
# [batch_size, 1, 1] [batch_size, neg_num+1, 1]
dst = np.concatenate([pos, negs], 1)
src_feat = np.concatenate([src, self.node_feat[src[:, :, 0]]], -1)
dst_feat = np.concatenate([dst, self.node_feat[dst[:, :, 0]]], -1)
src_feat, dst_feat = np.expand_dims(src_feat, -1), np.expand_dims(
dst_feat, -1)
yield src_feat[:max_len], dst_feat[:max_len]
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implement the dataset for GIN model.
"""
import os
import sys
import numpy as np
from sklearn.model_selection import StratifiedKFold
import pgl
from pgl.utils.logger import log
def fold10_split(dataset, fold_idx=0, seed=0, shuffle=True):
"""10 fold splitter"""
assert 0 <= fold_idx and fold_idx < 10, print(
"fold_idx must be from 0 to 9.")
skf = StratifiedKFold(n_splits=10, shuffle=shuffle, random_state=seed)
labels = []
for i in range(len(dataset)):
g, c = dataset[i]
labels.append(c)
idx_list = []
for idx in skf.split(np.zeros(len(labels)), labels):
idx_list.append(idx)
train_idx, valid_idx = idx_list[fold_idx]
log.info("train_set : test_set == %d : %d" %
(len(train_idx), len(valid_idx)))
return Subset(dataset, train_idx), Subset(dataset, valid_idx)
def random_split(dataset, split_ratio=0.7, seed=0, shuffle=True):
"""random splitter"""
np.random.seed(seed)
indices = list(range(len(dataset)))
np.random.shuffle(indices)
split = int(split_ratio * len(dataset))
train_idx, valid_idx = indices[:split], indices[split:]
log.info("train_set : test_set == %d : %d" %
(len(train_idx), len(valid_idx)))
return Subset(dataset, train_idx), Subset(dataset, valid_idx)
class BaseDataset(object):
"""BaseDataset"""
def __init__(self):
pass
def __getitem__(self, idx):
"""getitem"""
raise NotImplementedError
def __len__(self):
"""len"""
raise NotImplementedError
class Subset(BaseDataset):
"""
Subset of a dataset at specified indices.
"""
def __init__(self, dataset, indices):
self.dataset = dataset
self.indices = indices
def __getitem__(self, idx):
"""getitem"""
return self.dataset[self.indices[idx]]
def __len__(self):
"""len"""
return len(self.indices)
class GINDataset(BaseDataset):
"""Dataset for Graph Isomorphism Network (GIN)
Adapted from https://github.com/weihua916/powerful-gnns/blob/master/dataset.zip.
"""
def __init__(self,
data_path,
dataset_name,
self_loop,
degree_as_nlabel=False):
self.data_path = data_path
self.dataset_name = dataset_name
self.self_loop = self_loop
self.degree_as_nlabel = degree_as_nlabel
self.graph_list = []
self.glabel_list = []
# relabel
self.glabel_dict = {}
self.nlabel_dict = {}
self.elabel_dict = {}
self.ndegree_dict = {}
# global num
self.num_graph = 0 # total graphs number
self.n = 0 # total nodes number
self.m = 0 # total edges number
# global num of classes
self.gclasses = 0
self.nclasses = 0
self.eclasses = 0
self.dim_nfeats = 0
# flags
self.degree_as_nlabel = degree_as_nlabel
self.nattrs_flag = False
self.nlabels_flag = False
self._load_data()
def __len__(self):
"""return the number of graphs"""
return len(self.graph_list)
def __getitem__(self, idx):
"""getitem"""
return self.graph_list[idx], self.glabel_list[idx]
def _load_data(self):
"""Loads dataset
"""
filename = os.path.join(self.data_path, self.dataset_name,
"%s.txt" % self.dataset_name)
log.info("loading data from %s" % filename)
with open(filename, 'r') as reader:
# first line --> N, means total number of graphs
self.num_graph = int(reader.readline().strip())
for i in range(self.num_graph):
if (i + 1) % int(self.num_graph / 10) == 0:
log.info("processing graph %s" % (i + 1))
graph = dict()
# second line --> [num_node, label]
# means [node number of a graph, class label of a graph]
grow = reader.readline().strip().split()
n_nodes, glabel = [int(w) for w in grow]
# relabel graphs
if glabel not in self.glabel_dict:
mapped = len(self.glabel_dict)
self.glabel_dict[glabel] = mapped
graph['num_nodes'] = n_nodes
self.glabel_list.append(self.glabel_dict[glabel])
nlabels = []
node_features = []
num_edges = 0
edges = []
for j in range(graph['num_nodes']):
slots = reader.readline().strip().split()
# handle edges and node feature(if has)
tmp = int(slots[
1]) + 2 # tmp == 2 + num_edges of current node
if tmp == len(slots):
# no node feature
nrow = [int(w) for w in slots]
nfeat = None
elif tmp < len(slots):
nrow = [int(w) for w in slots[:tmp]]
nfeat = [float(w) for w in slots[tmp:]]
node_features.append(nfeat)
else:
raise Exception('edge number is not correct!')
# relabel nodes if the dataset has node labels
# if it doesn't have node labels, then every nrow[0] == 0
if not nrow[0] in self.nlabel_dict:
mapped = len(self.nlabel_dict)
self.nlabel_dict[nrow[0]] = mapped
nlabels.append(self.nlabel_dict[nrow[0]])
num_edges += nrow[1]
edges.extend([(j, u) for u in nrow[2:]])
if self.self_loop:
num_edges += 1
edges.append((j, j))
if node_features != []:
node_features = np.stack(node_features)
graph['attr'] = node_features
self.nattrs_flag = True
else:
node_features = None
graph['attr'] = node_features
graph['nlabel'] = np.array(
nlabels, dtype="int64").reshape(-1, 1)
if len(self.nlabel_dict) > 1:
self.nlabels_flag = True
graph['edges'] = edges
assert num_edges == len(edges)
g = pgl.graph.Graph(
num_nodes=graph['num_nodes'],
edges=graph['edges'],
node_feat={
'nlabel': graph['nlabel'],
'attr': graph['attr']
})
self.graph_list.append(g)
# update statistics of graphs
self.n += graph['num_nodes']
self.m += num_edges
# if no attr
if not self.nattrs_flag:
log.info('there are no node features in this dataset!')
label2idx = {}
# generate node attr by node degree
if self.degree_as_nlabel:
log.info('generate node features by node degree...')
nlabel_set = set([])
for g in self.graph_list:
g.node_feat['nlabel'] = g.indegree()
# extracting unique node labels
nlabel_set = nlabel_set.union(set(g.node_feat['nlabel']))
g.node_feat['nlabel'] = g.node_feat['nlabel'].reshape(-1,
1)
nlabel_set = list(nlabel_set)
# in case the labels/degrees are not continuous number
self.ndegree_dict = {
nlabel_set[i]: i
for i in range(len(nlabel_set))
}
label2idx = self.ndegree_dict
# generate node attr by node label
else:
log.info('generate node features by node label...')
label2idx = self.nlabel_dict
for g in self.graph_list:
attr = np.zeros((g.num_nodes, len(label2idx)))
idx = [
label2idx[tag]
for tag in g.node_feat['nlabel'].reshape(-1, )
]
attr[np.arange(g.num_nodes), idx] = 1  # one-hot encode each node's label
g.node_feat['attr'] = attr.astype("float32")
# after load, get the #classes and #dim
self.gclasses = len(self.glabel_dict)
self.nclasses = len(self.nlabel_dict)
self.eclasses = len(self.elabel_dict)
self.dim_nfeats = len(self.graph_list[0].node_feat['attr'][0])
message = "finished loading data\n"
message += """
num_graph: %d
num_graph_class: %d
total_num_nodes: %d
node Classes: %d
node_features_dim: %d
num_edges: %d
edge_classes: %d
Avg. of #Nodes: %.2f
Avg. of #Edges: %.2f
Graph Relabeled: %s
Node Relabeled: %s
Degree Relabeled(If degree_as_nlabel=True): %s""" % (
self.num_graph,
self.gclasses,
self.n,
self.nclasses,
self.dim_nfeats,
self.m,
self.eclasses,
self.n / self.num_graph,
self.m / self.num_graph,
self.glabel_dict,
self.nlabel_dict,
self.ndegree_dict, )
log.info(message)
if __name__ == "__main__":
gindataset = GINDataset(
"./dataset/", "MUTAG", self_loop=True, degree_as_nlabel=False)
# Graph Isomorphism Network (GIN)
[Graph Isomorphism Network \(GIN\)](https://arxiv.org/pdf/1810.00826.pdf) is a simple graph neural network designed to be as discriminative as the Weisfeiler-Lehman graph isomorphism test. Based on PGL, we reproduce the GIN model.
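As a quick orientation before the full scripts, the sketch below stacks two `pgl.layers.conv.gin` layers inside a `GraphWrapper` program, mirroring the model code in this example; the node-feature name `attr`, its dimension (7) and the hidden size are placeholders, not values required by PGL.
```
import paddle.fluid as fluid
import paddle.fluid.layers as fl
import pgl
from pgl.layers.conv import gin

# A minimal sketch, assuming a float32 node feature named "attr" of dim 7.
prog, startup = fluid.Program(), fluid.Program()
with fluid.program_guard(prog, startup):
    gw = pgl.graph_wrapper.GraphWrapper(
        "gw", place=fluid.CPUPlace(),
        node_feat=[("attr", [None, 7], "float32")])
    h = gw.node_feat["attr"]
    for i in range(2):  # two GIN layers are enough for illustration
        h = gin(gw, h, hidden_size=64, activation="relu",
                name="gin_%s" % i, init_eps=0.0, train_eps=False)
        h = fl.batch_norm(h)
        h = fl.relu(h)
    graph_repr = pgl.layers.graph_pooling(gw, h, "sum")  # whole-graph readout
```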
### Datasets
The dataset can be downloaded from [here](https://github.com/weihua916/powerful-gnns/blob/master/dataset.zip)
### Dependencies
- paddlepaddle 1.6
- pgl 1.0.2
### How to run
For example, to train the GIN model on the MUTAG dataset with a GPU:
```
python main.py --use_cuda --dataset_name MUTAG
```
### Hyperparameters
- data\_path: the root path of your dataset
- dataset\_name: the name of the dataset
- fold\_idx: Which fold of the 10-fold split to hold out for testing. We use 10-fold cross-validation here (a minimal split sketch follows this list).
- train\_eps: Whether the $\epsilon$ parameter is learnable.
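The split itself is standard stratified 10-fold cross-validation. The sketch below shows one way to produce such a split with `sklearn.model_selection.StratifiedKFold`; it only illustrates the idea and is not necessarily identical to the `fold10_split` helper imported in `main.py`.
```
import numpy as np
from sklearn.model_selection import StratifiedKFold

def ten_fold_indices(graph_labels, fold_idx=0, seed=0):
    """Return (train_idx, test_idx) for the fold_idx-th of 10 stratified folds."""
    labels = np.asarray(graph_labels)
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    folds = list(skf.split(np.zeros((len(labels), 1)), labels))
    return folds[fold_idx]
```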
### Experiment results (Accuracy)
| |MUTAG | COLLAB | IMDBBINARY | IMDBMULTI |
|--|-------------|----------|------------|-----------------|
|PGL result | 90.8 | 78.6 | 76.8 | 50.8 |
|paper result |90.0 | 80.0 | 75.1 | 52.3 |
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements the graph dataloader.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import sys
import time
import argparse
import numpy as np
import collections
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers as fl
import pgl
from pgl.utils import mp_reader
from pgl.utils.logger import log
def batch_iter(data, batch_size, fid, num_workers):
"""node_batch_iter
"""
size = len(data)
perm = np.arange(size)
np.random.shuffle(perm)
start = 0
cc = 0
while start < size:
index = perm[start:start + batch_size]
start += batch_size
cc += 1
if cc % num_workers != fid:
continue
yield data[index]
def scan_batch_iter(data, batch_size, fid, num_workers):
"""scan_batch_iter
"""
batch = []
cc = 0
for line_example in data.scan():
cc += 1
if cc % num_workers != fid:
continue
batch.append(line_example)
if len(batch) == batch_size:
yield batch
batch = []
if len(batch) > 0:
yield batch
class GraphDataloader(object):
"""Graph Dataloader
"""
def __init__(
self,
dataset,
batch_size,
seed=0,
num_workers=1,
buf_size=1000,
shuffle=True, ):
self.shuffle = shuffle
self.seed = seed
self.num_workers = num_workers
self.buf_size = buf_size
self.batch_size = batch_size
self.dataset = dataset
def batch_fn(self, batch_examples):
""" batch_fn batch producer"""
graphs = [b[0] for b in batch_examples]
labels = [b[1] for b in batch_examples]
join_graph = pgl.graph.MultiGraph(graphs)
labels = np.array(labels, dtype="int64").reshape(-1, 1)
return join_graph, labels
# feed_dict = self.graph_wrapper.to_feed(join_graph)
# raise NotImplementedError("No defined Batch Fn")
def batch_iter(self, fid):
"""batch_iter"""
if self.shuffle:
for batch in batch_iter(self, self.batch_size, fid,
self.num_workers):
yield batch
else:
for batch in scan_batch_iter(self, self.batch_size, fid,
self.num_workers):
yield batch
def __len__(self):
"""__len__"""
return len(self.dataset)
def __getitem__(self, idx):
"""__getitem__"""
if isinstance(idx, collections.Iterable):
return [self[bidx] for bidx in idx]
else:
return self.dataset[idx]
def __iter__(self):
"""__iter__"""
def worker(filter_id):
def func_run():
for batch_examples in self.batch_iter(filter_id):
batch_dict = self.batch_fn(batch_examples)
yield batch_dict
return func_run
if self.num_workers == 1:
r = paddle.reader.buffered(worker(0), self.buf_size)
else:
worker_pool = [worker(wid) for wid in range(self.num_workers)]
worker = mp_reader.multiprocess_reader(
worker_pool, use_pipe=True, queue_size=1000)
r = paddle.reader.buffered(worker, self.buf_size)
for batch in r():
yield batch
def scan(self):
"""scan"""
for example in self.dataset:
yield example
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements the training process of the GIN model.
"""
import os
import sys
import time
import argparse
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as fl
import pgl
from pgl.utils.logger import log
from Dataset import GINDataset, fold10_split, random_split
from dataloader import GraphDataloader
from model import GINModel
def main(args):
"""main function"""
dataset = GINDataset(
args.data_path,
args.dataset_name,
self_loop=not args.train_eps,
degree_as_nlabel=True)
train_dataset, test_dataset = fold10_split(
dataset, fold_idx=args.fold_idx, seed=args.seed)
train_loader = GraphDataloader(train_dataset, batch_size=args.batch_size)
test_loader = GraphDataloader(
test_dataset, batch_size=args.batch_size, shuffle=False)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
gw = pgl.graph_wrapper.GraphWrapper(
"gw", place=place, node_feat=dataset[0][0].node_feat_info())
model = GINModel(args, gw, dataset.gclasses)
model.forward()
infer_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
epoch_step = int(len(train_dataset) / args.batch_size) + 1
boundaries = [
i
for i in range(50 * epoch_step, args.epochs * epoch_step,
epoch_step * 50)
]
values = [args.lr * 0.5**i for i in range(0, len(boundaries) + 1)]
lr = fl.piecewise_decay(boundaries=boundaries, values=values)
train_op = fluid.optimizer.Adam(lr).minimize(model.loss)
exe = fluid.Executor(place)
exe.run(startup_program)
# train and evaluate
global_step = 0
for epoch in range(1, args.epochs + 1):
for idx, batch_data in enumerate(train_loader):
g, labels = batch_data
feed_dict = gw.to_feed(g)
feed_dict['labels'] = labels
ret_loss, ret_lr, ret_acc = exe.run(
train_program,
feed=feed_dict,
fetch_list=[model.loss, lr, model.acc])
global_step += 1
if global_step % 10 == 0:
message = "epoch %d | step %d | " % (epoch, global_step)
message += "lr %.6f | loss %.6f | acc %.4f" % (
ret_lr, ret_loss, ret_acc)
log.info(message)
# evaluate
result = evaluate(exe, infer_program, model, gw, test_loader)
message = "evaluating result"
for key, value in result.items():
message += " | %s %.6f" % (key, value)
log.info(message)
def evaluate(exe, prog, model, gw, loader):
"""evaluate"""
total_loss = []
total_acc = []
for idx, batch_data in enumerate(loader):
g, labels = batch_data
feed_dict = gw.to_feed(g)
feed_dict['labels'] = labels
ret_loss, ret_acc = exe.run(prog,
feed=feed_dict,
fetch_list=[model.loss, model.acc])
total_loss.append(ret_loss)
total_acc.append(ret_acc)
total_loss = np.mean(total_loss)
total_acc = np.mean(total_acc)
return {"loss": total_loss, "acc": total_acc}
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--data_path', type=str, default='./dataset')
parser.add_argument('--dataset_name', type=str, default='MUTAG')
parser.add_argument('--batch_size', type=int, default=32)
parser.add_argument('--fold_idx', type=int, default=0)
parser.add_argument('--output_path', type=str, default='./outputs/')
parser.add_argument('--use_cuda', action='store_true')
parser.add_argument('--num_layers', type=int, default=5)
parser.add_argument('--num_mlp_layers', type=int, default=2)
parser.add_argument('--hidden_size', type=int, default=64)
parser.add_argument(
'--pool_type',
type=str,
default="sum",
choices=["sum", "average", "max"])
parser.add_argument('--train_eps', action='store_true')
parser.add_argument('--epochs', type=int, default=350)
parser.add_argument('--lr', type=float, default=0.01)
parser.add_argument('--dropout_prob', type=float, default=0.5)
parser.add_argument('--seed', type=int, default=0)
args = parser.parse_args()
log.info(args)
if not os.path.exists(args.output_path):
os.makedirs(args.output_path)
main(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""This file implement the GIN model.
"""
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as fl
import pgl
from pgl.layers.conv import gin
class GINModel(object):
"""GINModel"""
def __init__(self, args, gw, num_class):
self.args = args
self.num_layers = self.args.num_layers
self.hidden_size = self.args.hidden_size
self.train_eps = self.args.train_eps
self.pool_type = self.args.pool_type
self.dropout_prob = self.args.dropout_prob
self.num_class = num_class
self.gw = gw
self.labels = fl.data(name="labels", shape=[None, 1], dtype="int64")
def forward(self):
"""forward"""
features_list = [self.gw.node_feat["attr"]]
for i in range(self.num_layers):
h = gin(self.gw,
features_list[i],
hidden_size=self.hidden_size,
activation="relu",
name="gin_%s" % (i),
init_eps=0.0,
train_eps=self.train_eps)
h = fl.batch_norm(h)
h = fl.relu(h)
features_list.append(h)
output = 0
for i, h in enumerate(features_list):
pooled_h = pgl.layers.graph_pooling(self.gw, h, self.pool_type)
drop_h = fl.dropout(
pooled_h,
self.dropout_prob,
dropout_implementation="upscale_in_train")
output += fl.fc(drop_h,
size=self.num_class,
act=None,
param_attr=fluid.ParamAttr(name="final_fc_%s" %
(i)))
# calculate loss
self.loss = fl.softmax_with_cross_entropy(output, self.labels)
self.loss = fl.reduce_mean(self.loss)
self.acc = fl.accuracy(fl.softmax(output), self.labels)
# GraphSAGE in PGL
# GraphSAGE: Inductive Representation Learning on Large Graphs
[GraphSAGE](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) is a general inductive framework that leverages node feature
information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, GraphSAGE learns a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Based on PGL, we reproduce the GraphSAGE algorithm and reach the same level of accuracy as the paper on the Reddit dataset. This example also demonstrates subgraph sampling and training in PGL.
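To make the sample-and-aggregate idea concrete, here is a NumPy sketch of a single GraphSAGE-mean step for one node over its sampled neighbors; the function and variable names are hypothetical and this is only an illustration, not the PGL implementation used below.
```
import numpy as np

def graphsage_mean_step(self_feat, neigh_feats, w_self, w_neigh):
    """One mean-aggregator step for a single node (illustration only).

    self_feat:   (d,)   feature of the target node
    neigh_feats: (k, d) features of its sampled neighbors
    w_self, w_neigh: (d, h) trainable projection matrices
    """
    neigh_mean = neigh_feats.mean(axis=0)
    h = self_feat @ w_self + neigh_mean @ w_neigh
    return np.maximum(h, 0.0)  # ReLU
```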
......@@ -12,16 +12,23 @@ The reddit dataset should be downloaded from the following links and placed in d
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- paddlepaddle>=1.6
- pgl
### How to run
To train a GraphSAGE model on the Reddit dataset, run
```
python train.py --use_cuda --epoch 10 --graphsage_type graphsage_mean --normalize --symmetry
```
To train a GraphSAGE model with multiple GPUs, run
```
CUDA_VISIBLE_DEVICES=0,1 python train_multi.py --use_cuda --epoch 10 --graphsage_type graphsage_mean --normalize --symmetry --num_trainer 2
```
#### Hyperparameters
- epoch: Number of training epochs (default: 10)
......
......@@ -17,12 +17,15 @@ import paddle
import paddle.fluid as fluid
import pgl
import time
from pgl.utils import mp_reader
from pgl.utils.logger import log
import train
import time
import copy
def node_batch_iter(nodes, node_label, batch_size):
"""node_batch_iter
"""
perm = np.arange(len(nodes))
np.random.shuffle(perm)
start = 0
......@@ -33,6 +36,8 @@ def node_batch_iter(nodes, node_label, batch_size):
def traverse(item):
"""traverse
"""
if isinstance(item, list) or isinstance(item, np.ndarray):
for i in iter(item):
for j in traverse(i):
......@@ -41,35 +46,56 @@ def traverse(item):
yield item
def flat_node_and_edge(nodes, eids):
def flat_node_and_edge(nodes):
"""flat_node_and_edge
"""
nodes = list(set(traverse(nodes)))
eids = list(set(traverse(eids)))
return nodes, eids
return nodes
def worker(batch_info, graph, samples):
def worker(batch_info, graph, graph_wrapper, samples):
"""Worker
"""
def work():
"""work
"""
_graph_wrapper = copy.copy(graph_wrapper)
_graph_wrapper.node_feat_tensor_dict = {}
for batch_train_samples, batch_train_labels in batch_info:
start_nodes = batch_train_samples
nodes = start_nodes
eids = []
edges = []
for max_deg in samples:
pred, pred_eid = graph.sample_predecessor(
start_nodes, max_degree=max_deg, return_eids=True)
pred_nodes = graph.sample_predecessor(
start_nodes, max_degree=max_deg)
for dst_node, src_nodes in zip(start_nodes, pred_nodes):
for src_node in src_nodes:
edges.append((src_node, dst_node))
last_nodes = nodes
nodes = [nodes, pred]
eids = [eids, pred_eid]
nodes, eids = flat_node_and_edge(nodes, eids)
nodes = [nodes, pred_nodes]
nodes = flat_node_and_edge(nodes)
# Find new nodes
start_nodes = list(set(nodes) - set(last_nodes))
if len(start_nodes) == 0:
break
feed_dict = {}
feed_dict["nodes"] = [int(n) for n in nodes]
feed_dict["eids"] = [int(e) for e in eids]
feed_dict["node_label"] = [int(n) for n in batch_train_labels]
feed_dict["node_index"] = [int(n) for n in batch_train_samples]
subgraph = graph.subgraph(
nodes=nodes,
edges=edges,
with_node_feat=False,
with_edge_feat=False)
sub_node_index = subgraph.reindex_from_parrent_nodes(
batch_train_samples)
feed_dict = _graph_wrapper.to_feed(subgraph)
feed_dict["node_label"] = np.expand_dims(
np.array(
batch_train_labels, dtype="int64"), -1)
feed_dict["node_index"] = sub_node_index
feed_dict["parent_node_index"] = np.array(nodes, dtype="int64")
yield feed_dict
return work
......@@ -81,27 +107,31 @@ def multiprocess_graph_reader(graph,
node_index,
batch_size,
node_label,
with_parent_node_index=False,
num_workers=4):
def parse_to_subgraph(rd):
"""multiprocess_graph_reader
"""
def parse_to_subgraph(rd, prefix, node_feat, _with_parent_node_index):
"""parse_to_subgraph
"""
def work():
"""work
"""
for data in rd():
nodes = data["nodes"]
eids = data["eids"]
batch_train_labels = data["node_label"]
batch_train_samples = data["node_index"]
subgraph = graph.subgraph(nodes=nodes, eid=eids)
sub_node_index = subgraph.reindex_from_parrent_nodes(
batch_train_samples)
feed_dict = graph_wrapper.to_feed(subgraph)
feed_dict["node_label"] = np.expand_dims(
np.array(
batch_train_labels, dtype="int64"), -1)
feed_dict["node_index"] = sub_node_index
feed_dict = data
for key in node_feat:
feed_dict[prefix + '/node_feat/' + key] = node_feat[key][
feed_dict["parent_node_index"]]
if not _with_parent_node_index:
del feed_dict["parent_node_index"]
yield feed_dict
return work
def reader():
"""reader"""
batch_info = list(
node_batch_iter(
node_index, node_label, batch_size=batch_size))
......@@ -110,44 +140,18 @@ def multiprocess_graph_reader(graph,
for i in range(num_workers):
reader_pool.append(
worker(batch_info[block_size * i:block_size * (i + 1)], graph,
samples))
multi_process_sample = paddle.reader.multiprocess_reader(
reader_pool, use_pipe=False)
r = parse_to_subgraph(multi_process_sample)
return paddle.reader.buffered(r, 1000)
graph_wrapper, samples))
if len(reader_pool) == 1:
r = parse_to_subgraph(reader_pool[0],
repr(graph_wrapper), graph.node_feat,
with_parent_node_index)
else:
multi_process_sample = mp_reader.multiprocess_reader(
reader_pool, use_pipe=True, queue_size=1000)
r = parse_to_subgraph(multi_process_sample,
repr(graph_wrapper), graph.node_feat,
with_parent_node_index)
return paddle.reader.buffered(r, num_workers)
return reader()
def graph_reader(graph, graph_wrapper, samples, node_index, batch_size,
node_label):
def reader():
for batch_train_samples, batch_train_labels in node_batch_iter(
node_index, node_label, batch_size=batch_size):
start_nodes = batch_train_samples
nodes = start_nodes
eids = []
for max_deg in samples:
pred, pred_eid = graph.sample_predecessor(
start_nodes, max_degree=max_deg, return_eids=True)
last_nodes = nodes
nodes = [nodes, pred]
eids = [eids, pred_eid]
nodes, eids = flat_node_and_edge(nodes, eids)
# Find new nodes
start_nodes = list(set(nodes) - set(last_nodes))
if len(start_nodes) == 0:
break
subgraph = graph.subgraph(nodes=nodes, eid=eids)
feed_dict = graph_wrapper.to_feed(subgraph)
sub_node_index = subgraph.reindex_from_parrent_nodes(
batch_train_samples)
feed_dict["node_label"] = np.expand_dims(
np.array(
batch_train_labels, dtype="int64"), -1)
feed_dict["node_index"] = np.array(sub_node_index, dtype="int32")
yield feed_dict
return paddle.reader.buffered(reader, 1000)
......@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import time
......@@ -34,8 +35,9 @@ def load_data(normalize=True, symmetry=True):
reddit_adj.npz: https://drive.google.com/open?id=174vb0Ws7Vxk_QTUtxqTgDHSQ4El4qDHt
reddit.npz: https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J
"""
data = np.load("data/reddit.npz")
adj = sp.load_npz("data/reddit_adj.npz")
data_dir = os.path.dirname(os.path.abspath(__file__))
data = np.load(os.path.join(data_dir, "data/reddit.npz"))
adj = sp.load_npz(os.path.join(data_dir, "data/reddit_adj.npz"))
if symmetry:
adj = adj + adj.T
adj = adj.tocoo()
......@@ -61,10 +63,7 @@ def load_data(normalize=True, symmetry=True):
log.info("Feature shape %s" % (repr(feature.shape)))
graph = pgl.graph.Graph(
num_nodes=feature.shape[0],
edges=list(zip(src, dst)),
node_feat={"index": np.arange(
0, len(feature), dtype="int32")})
num_nodes=feature.shape[0], edges=list(zip(src, dst)))
return {
"graph": graph,
......@@ -82,12 +81,18 @@ def load_data(normalize=True, symmetry=True):
def build_graph_model(graph_wrapper, num_class, k_hop, graphsage_type,
hidden_size, feature):
node_index = fluid.layers.data(
"node_index", shape=[None], dtype="int32", append_batch_size=False)
"node_index", shape=[None], dtype="int64", append_batch_size=False)
node_label = fluid.layers.data(
"node_label", shape=[None, 1], dtype="int64", append_batch_size=False)
feature = fluid.layers.gather(feature, graph_wrapper.node_feat['index'])
parent_node_index = fluid.layers.data(
"parent_node_index",
shape=[None],
dtype="int64",
append_batch_size=False)
feature = fluid.layers.gather(feature, parent_node_index)
feature.stop_gradient = True
for i in range(k_hop):
......@@ -97,28 +102,28 @@ def build_graph_model(graph_wrapper, num_class, k_hop, graphsage_type,
feature,
hidden_size,
act="relu",
name="graphsage_mean_%s % i")
name="graphsage_mean_%s" % i)
elif graphsage_type == 'graphsage_meanpool':
feature = graphsage_meanpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_meanpool_%s % i")
name="graphsage_meanpool_%s" % i)
elif graphsage_type == 'graphsage_maxpool':
feature = graphsage_maxpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s % i")
name="graphsage_maxpool_%s" % i)
elif graphsage_type == 'graphsage_lstm':
feature = graphsage_lstm(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s % i")
name="graphsage_maxpool_%s" % i)
else:
raise ValueError("graphsage type %s is not"
" implemented" % graphsage_type)
......@@ -198,7 +203,9 @@ def main(args):
hide_batch_size=False)
graph_wrapper = pgl.graph_wrapper.GraphWrapper(
"sub_graph", place, node_feat=data['graph'].node_feat_info())
"sub_graph",
fluid.CPUPlace(),
node_feat=data['graph'].node_feat_info())
model_loss, model_acc = build_graph_model(
graph_wrapper,
num_class=data["num_class"],
......@@ -217,59 +224,35 @@ def main(args):
exe.run(startup_program)
feature_init(place)
if args.sample_workers > 1:
train_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['train_index'],
node_label=data["train_label"])
else:
train_iter = reader.graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
batch_size=args.batch_size,
node_index=data['train_index'],
node_label=data["train_label"])
if args.sample_workers > 1:
val_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['val_index'],
node_label=data["val_label"])
else:
val_iter = reader.graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
batch_size=args.batch_size,
node_index=data['val_index'],
node_label=data["val_label"])
if args.sample_workers > 1:
test_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['test_index'],
node_label=data["test_label"])
else:
test_iter = reader.graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
batch_size=args.batch_size,
node_index=data['test_index'],
node_label=data["test_label"])
train_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
with_parent_node_index=True,
node_index=data['train_index'],
node_label=data["train_label"])
val_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
with_parent_node_index=True,
node_index=data['val_index'],
node_label=data["val_label"])
test_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
with_parent_node_index=True,
node_index=data['test_index'],
node_label=data["test_label"])
for epoch in range(args.epoch):
run_epoch(
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import time
import sys
import traceback
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import StandardScaler
import pgl
from pgl.utils.logger import log
from pgl.utils import paddle_helper
import paddle
import paddle.fluid as fluid
import reader
from model import graphsage_mean, graphsage_meanpool,\
graphsage_maxpool, graphsage_lstm
def load_data(normalize=True, symmetry=True):
"""
data from https://github.com/matenure/FastGCN/issues/8
reddit_adj.npz: https://drive.google.com/open?id=174vb0Ws7Vxk_QTUtxqTgDHSQ4El4qDHt
reddit.npz: https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J
"""
data_dir = os.path.dirname(os.path.abspath(__file__))
data = np.load(os.path.join(data_dir, "data/reddit.npz"))
adj = sp.load_npz(os.path.join(data_dir, "data/reddit_adj.npz"))
if symmetry:
adj = adj + adj.T
adj = adj.tocoo()
src = adj.row
dst = adj.col
num_class = 41
train_label = data['y_train']
val_label = data['y_val']
test_label = data['y_test']
train_index = data['train_index']
val_index = data['val_index']
test_index = data['test_index']
feature = data["feats"].astype("float32")
if normalize:
scaler = StandardScaler()
scaler.fit(feature[train_index])
feature = scaler.transform(feature)
log.info("Feature shape %s" % (repr(feature.shape)))
graph = pgl.graph.Graph(
num_nodes=feature.shape[0],
edges=list(zip(src, dst)),
node_feat={"feat": feature.astype("float32")})
return {
"graph": graph,
"train_index": train_index,
"train_label": train_label,
"val_label": val_label,
"val_index": val_index,
"test_index": test_index,
"test_label": test_label,
"num_class": 41
}
def build_graph_model(graph_wrapper, num_class, k_hop, graphsage_type,
hidden_size):
"""build_graph_model"""
node_index = fluid.layers.data(
"node_index", shape=[None], dtype="int64", append_batch_size=False)
node_label = fluid.layers.data(
"node_label", shape=[None, 1], dtype="int64", append_batch_size=False)
feature = graph_wrapper.node_feat["feat"]
for i in range(k_hop):
if graphsage_type == 'graphsage_mean':
feature = graphsage_mean(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_mean_%s" % i)
elif graphsage_type == 'graphsage_meanpool':
feature = graphsage_meanpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_meanpool_%s" % i)
elif graphsage_type == 'graphsage_maxpool':
feature = graphsage_maxpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s" % i)
elif graphsage_type == 'graphsage_lstm':
feature = graphsage_lstm(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s" % i)
else:
raise ValueError("graphsage type %s is not"
" implemented" % graphsage_type)
feature = fluid.layers.gather(feature, node_index)
logits = fluid.layers.fc(feature,
num_class,
act=None,
name='classification_layer')
proba = fluid.layers.softmax(logits)
loss = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=node_label)
loss = fluid.layers.mean(loss)
acc = fluid.layers.accuracy(input=proba, label=node_label, k=1)
return loss, acc
def to_multidevice(batch_iter, num_trainer):
"""to_multidevice"""
batch_dict = []
for batch in batch_iter():
batch_dict.append(batch)
if len(batch_dict) == num_trainer:
yield batch_dict
batch_dict = []
if len(batch_dict) > 0:
log.warning("The batch (%s) can't fill all device (%s)"
"which will be discarded." %
(len(batch_dict), num_trainer))
def run_epoch(batch_iter,
exe,
program,
prefix,
model_loss,
model_acc,
epoch,
log_per_step=100,
num_trainer=1):
"""run_epoch"""
batch = 0
total_loss = 0.
total_acc = 0.
total_sample = 0
start = time.time()
if num_trainer > 1:
batch_iter = to_multidevice(batch_iter, num_trainer)
else:
batch_iter = batch_iter()
for batch_feed_dict in batch_iter:
batch += 1
if num_trainer > 1:
batch_loss, batch_acc = exe.run(
fetch_list=[model_loss.name, model_acc.name],
feed=batch_feed_dict)
batch_loss = np.mean(batch_loss)
batch_acc = np.mean(batch_acc)
else:
batch_loss, batch_acc = exe.run(
program,
fetch_list=[model_loss.name, model_acc.name],
feed=batch_feed_dict)
if batch % log_per_step == 0:
log.info("Batch %s %s-Loss %s %s-Acc %s" %
(batch, prefix, batch_loss, prefix, batch_acc))
if num_trainer > 1:
num_samples = sum(
[len(_batch["node_index"]) for _batch in batch_feed_dict])
else:
num_samples = len(batch_feed_dict["node_index"])
total_loss += batch_loss * num_samples
total_acc += batch_acc * num_samples
total_sample += num_samples
end = time.time()
log.info("%s Epoch %s Loss %.5lf Acc %.5lf Speed(per batch) %.5lf sec" %
(prefix, epoch, total_loss / total_sample,
total_acc / total_sample, (end - start) / batch))
def main(args):
"""main"""
data = load_data(args.normalize, args.symmetry)
log.info("preprocess finish")
log.info("Train Examples: %s" % len(data["train_index"]))
log.info("Val Examples: %s" % len(data["val_index"]))
log.info("Test Examples: %s" % len(data["test_index"]))
log.info("Num nodes %s" % data["graph"].num_nodes)
log.info("Num edges %s" % data["graph"].num_edges)
log.info("Average Degree %s" % np.mean(data["graph"].indegree()))
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
samples = []
if args.samples_1 > 0:
samples.append(args.samples_1)
if args.samples_2 > 0:
samples.append(args.samples_2)
with fluid.program_guard(train_program, startup_program):
graph_wrapper = pgl.graph_wrapper.GraphWrapper(
"sub_graph",
fluid.CPUPlace(),
node_feat=data['graph'].node_feat_info())
model_loss, model_acc = build_graph_model(
graph_wrapper,
num_class=data["num_class"],
hidden_size=args.hidden_size,
graphsage_type=args.graphsage_type,
k_hop=len(samples))
test_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
adam = fluid.optimizer.Adam(learning_rate=args.lr)
adam.minimize(model_loss)
exe = fluid.Executor(place)
exe.run(startup_program)
if args.num_trainer > 1:
build_strategy = fluid.BuildStrategy()
build_strategy.remove_unnecessary_lock = False
build_strategy.enable_sequential_execution = True
train_exe = fluid.ParallelExecutor(
use_cuda=args.use_cuda,
main_program=train_program,
build_strategy=build_strategy,
loss_name=model_loss.name)
else:
train_exe = exe
train_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['train_index'],
node_label=data["train_label"])
val_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['val_index'],
node_label=data["val_label"])
test_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['test_index'],
node_label=data["test_label"])
for epoch in range(args.epoch):
run_epoch(
train_iter,
program=train_program,
exe=train_exe,
prefix="train",
model_loss=model_loss,
model_acc=model_acc,
num_trainer=args.num_trainer,
epoch=epoch)
run_epoch(
val_iter,
program=test_program,
exe=exe,
prefix="val",
model_loss=model_loss,
model_acc=model_acc,
log_per_step=10000,
epoch=epoch)
run_epoch(
test_iter,
program=test_program,
prefix="test",
exe=exe,
model_loss=model_loss,
model_acc=model_acc,
log_per_step=10000,
epoch=epoch)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='graphsage')
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument(
"--normalize", action='store_true', help="normalize features")
parser.add_argument(
"--symmetry", action='store_true', help="undirect graph")
parser.add_argument("--graphsage_type", type=str, default="graphsage_mean")
parser.add_argument("--sample_workers", type=int, default=5)
parser.add_argument("--epoch", type=int, default=10)
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--num_trainer", type=int, default=1)
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--samples_1", type=int, default=25)
parser.add_argument("--samples_2", type=int, default=10)
args = parser.parse_args()
log.info(args)
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Multi-GPU settings
"""
import argparse
import time
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import StandardScaler
import pgl
from pgl.utils.logger import log
from pgl.utils import paddle_helper
import paddle
import paddle.fluid as fluid
import reader
from model import graphsage_mean, graphsage_meanpool,\
graphsage_maxpool, graphsage_lstm
def fixed_offset(data, num_nodes, scale):
"""Test
"""
len_data = len(data)
len_per_part = int(len_data / scale)
offset = np.arange(0, scale, dtype="int64")
offset = offset * num_nodes
offset = np.repeat(offset, len_per_part)
if len(data.shape) > 1:
data += offset.reshape([-1, 1])
else:
data += offset
def load_data(normalize=True, symmetry=True, scale=1):
"""
data from https://github.com/matenure/FastGCN/issues/8
reddit_adj.npz: https://drive.google.com/open?id=174vb0Ws7Vxk_QTUtxqTgDHSQ4El4qDHt
reddit.npz: https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J
"""
data = np.load("data/reddit.npz")
adj = sp.load_npz("data/reddit_adj.npz")
if symmetry:
adj = adj + adj.T
adj = adj.tocoo()
src = adj.row.reshape([-1, 1])
dst = adj.col.reshape([-1, 1])
edges = np.hstack([src, dst])
num_class = 41
train_label = data['y_train']
val_label = data['y_val']
test_label = data['y_test']
train_index = data['train_index']
val_index = data['val_index']
test_index = data['test_index']
feature = data["feats"].astype("float32")
if normalize:
scaler = StandardScaler()
scaler.fit(feature[train_index])
feature = scaler.transform(feature)
if scale > 1:
num_nodes = feature.shape[0]
feature = np.tile(feature, [scale, 1])
train_label = np.tile(train_label, [scale])
val_label = np.tile(val_label, [scale])
test_label = np.tile(test_label, [scale])
edges = np.tile(edges, [scale, 1])
fixed_offset(edges, num_nodes, scale)
train_index = np.tile(train_index, [scale])
fixed_offset(train_index, num_nodes, scale)
val_index = np.tile(val_index, [scale])
fixed_offset(val_index, num_nodes, scale)
test_index = np.tile(test_index, [scale])
fixed_offset(test_index, num_nodes, scale)
log.info("Feature shape %s" % (repr(feature.shape)))
graph = pgl.graph.Graph(
num_nodes=feature.shape[0],
edges=edges,
node_feat={"feature": feature})
return {
"graph": graph,
"train_index": train_index,
"train_label": train_label,
"val_label": val_label,
"val_index": val_index,
"test_index": test_index,
"test_label": test_label,
"feature": feature,
"num_class": 41
}
def build_graph_model(graph_wrapper, num_class, k_hop, graphsage_type,
hidden_size, feature):
"""Test"""
node_index = fluid.layers.data(
"node_index", shape=[None], dtype="int64", append_batch_size=False)
node_label = fluid.layers.data(
"node_label", shape=[None, 1], dtype="int64", append_batch_size=False)
for i in range(k_hop):
if graphsage_type == 'graphsage_mean':
feature = graphsage_mean(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_mean_%s % i")
elif graphsage_type == 'graphsage_meanpool':
feature = graphsage_meanpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_meanpool_%s % i")
elif graphsage_type == 'graphsage_maxpool':
feature = graphsage_maxpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s % i")
elif graphsage_type == 'graphsage_lstm':
feature = graphsage_lstm(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s % i")
else:
raise ValueError("graphsage type %s is not"
" implemented" % graphsage_type)
feature = fluid.layers.gather(feature, node_index)
logits = fluid.layers.fc(feature,
num_class,
act=None,
name='classification_layer')
proba = fluid.layers.softmax(logits)
loss = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=node_label)
loss = fluid.layers.mean(loss)
acc = fluid.layers.accuracy(input=proba, label=node_label, k=1)
return loss, acc
def run_epoch(batch_iter,
exe,
program,
prefix,
model_loss,
model_acc,
epoch,
log_per_step=100):
"""Test"""
batch = 0
total_loss = 0.
total_acc = 0.
total_sample = 0
start = time.time()
for batch_feed_dict in batch_iter():
batch += 1
batch_loss, batch_acc = exe.run(program,
fetch_list=[model_loss, model_acc],
feed=batch_feed_dict)
if batch % log_per_step == 0:
log.info("Batch %s %s-Loss %s %s-Acc %s" %
(batch, prefix, batch_loss, prefix, batch_acc))
num_samples = len(batch_feed_dict["node_index"])
total_loss += batch_loss * num_samples
total_acc += batch_acc * num_samples
total_sample += num_samples
end = time.time()
log.info("%s Epoch %s Loss %.5lf Acc %.5lf Speed(per batch) %.5lf sec" %
(prefix, epoch, total_loss / total_sample,
total_acc / total_sample, (end - start) / batch))
def main(args):
"""Test """
data = load_data(args.normalize, args.symmetry, args.scale)
log.info("preprocess finish")
log.info("Train Examples: %s" % len(data["train_index"]))
log.info("Val Examples: %s" % len(data["val_index"]))
log.info("Test Examples: %s" % len(data["test_index"]))
log.info("Num nodes %s" % data["graph"].num_nodes)
log.info("Num edges %s" % data["graph"].num_edges)
log.info("Average Degree %s" % np.mean(data["graph"].indegree()))
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
samples = []
if args.samples_1 > 0:
samples.append(args.samples_1)
if args.samples_2 > 0:
samples.append(args.samples_2)
with fluid.program_guard(train_program, startup_program):
graph_wrapper = pgl.graph_wrapper.GraphWrapper(
"sub_graph",
fluid.CPUPlace(),
node_feat=data['graph'].node_feat_info())
model_loss, model_acc = build_graph_model(
graph_wrapper,
num_class=data["num_class"],
feature=graph_wrapper.node_feat["feature"],
hidden_size=args.hidden_size,
graphsage_type=args.graphsage_type,
k_hop=len(samples))
test_program = train_program.clone(for_test=True)
train_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['train_index'],
node_label=data["train_label"])
val_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['val_index'],
node_label=data["val_label"])
test_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['test_index'],
node_label=data["test_label"])
with fluid.program_guard(train_program, startup_program):
adam = fluid.optimizer.Adam(learning_rate=args.lr)
adam.minimize(model_loss)
exe = fluid.Executor(place)
exe.run(startup_program)
for epoch in range(args.epoch):
run_epoch(
train_iter,
program=train_program,
exe=exe,
prefix="train",
model_loss=model_loss,
model_acc=model_acc,
epoch=epoch)
run_epoch(
val_iter,
program=test_program,
exe=exe,
prefix="val",
model_loss=model_loss,
model_acc=model_acc,
log_per_step=10000,
epoch=epoch)
run_epoch(
test_iter,
program=test_program,
prefix="test",
exe=exe,
model_loss=model_loss,
model_acc=model_acc,
log_per_step=10000,
epoch=epoch)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='graphsage')
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument(
"--normalize", action='store_true', help="normalize features")
parser.add_argument(
"--symmetry", action='store_true', help="undirect graph")
parser.add_argument("--graphsage_type", type=str, default="graphsage_mean")
parser.add_argument("--sample_workers", type=int, default=5)
parser.add_argument("--epoch", type=int, default=10)
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--samples_1", type=int, default=25)
parser.add_argument("--samples_2", type=int, default=10)
parser.add_argument("--scale", type=int, default=1)
args = parser.parse_args()
log.info(args)
main(args)
# LINE: Large-scale Information Network Embedding
[LINE](http://www.www2015.it/documents/proceedings/proceedings/p1067.pdf) is an algorithmic framework for embedding very large-scale information networks. It is suitable for a variety of networks with directed, undirected, binary or weighted edges. Based on PGL, we reproduce the LINE algorithm and reach the same level of accuracy as the paper.
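The training objective used later in `line.py` is a negative-sampling logistic loss over node pairs, `-mean(log sigmoid(label * <u_i, u_j>))`, where `label` is +1 for an observed edge and -1 for a sampled negative pair. The NumPy sketch below reproduces that loss for illustration only; the function name and inputs are hypothetical and it is not part of the training script.
```python
import numpy as np

def line_loss(emb_src, emb_dst, labels):
    """-mean(log sigmoid(label * <u_i, u_j>)) for a batch of node pairs."""
    inner = np.sum(emb_src * emb_dst, axis=1)
    # log(sigmoid(x)) == -logaddexp(0, -x); this form is numerically stable
    return np.mean(np.logaddexp(0.0, -labels * inner))
```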
## Datasets
[Flickr network](http://socialnetworks.mpi-sws.org/data-imc2007.html) is a social network, which contains 1715256 nodes and 22613981 edges.
You can download the data from [here](http://socialnetworks.mpi-sws.org/data-imc2007.html).
The Flickr network contains four files:
* flickr-groupmemberships.txt.gz
* flickr-groups.txt.gz
* flickr-links.txt.gz
* flickr-users.txt.gz
After downloading the data, uncompress it into, say, **./data/flickr/**. Note that the current directory is the root directory of the LINE model.
Then run the command below to preprocess the data.
```sh
python data_process.py
```
It will produce three files in the **./data/flickr/** directory:
* nodes.txt
* edges.txt
* nodes_label.txt
## Dependencies
- paddlepaddle>=1.6
- pgl
## How to run
For example, to train LINE on the Flickr dataset with a GPU:
```sh
# multiclass task example
python line.py --use_cuda --order first_order --data_path ./data/flickr/ --save_dir ./checkpoints/model/
python multi_class.py --ckpt_path ./checkpoints/model/model_epoch_20 --percent 0.5
```
## Hyperparameters
- use_cuda: Use GPU if this flag is set.
- order: Train LINE with first-order or second-order proximity.
- percent: The percentage of nodes used as training data in the multi-class task.
## Experiment results
Dataset|model|Task|Metric|PGL Result|Reported Result
--|--|--|--|--|--
Flickr|LINE with first_order|multi-label classification|MacroF1|0.626|0.627
Flickr|LINE with first_order|multi-label classification|MicroF1|0.637|0.639
Flickr|LINE with second_order|multi-label classification|MacroF1|0.615|0.621
Flickr|LINE with second_order|multi-label classification|MicroF1|0.630|0.635
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file provides the dataset for the LINE model.
"""
import os
import io
import sys
import numpy as np
from pgl import graph
from pgl.utils.logger import log
class FlickrDataset(object):
"""Flickr dataset implementation
Args:
        data_path: The path of the dataset directory.
symmetry_edges: Whether to create symmetry edges.
self_loop: Whether to contain self loop edges.
        train_percentage: The percentage of nodes used for training in the multi-class task.
Attributes:
graph: The :code:`Graph` data object.
num_groups: Number of classes.
train_index: The index for nodes in training set.
        test_index: The index for nodes in the test set.
"""
def __init__(self,
data_path,
symmetry_edges=False,
self_loop=False,
train_percentage=0.5):
self.path = data_path
# self.name = name
self.num_groups = 5
self.symmetry_edges = symmetry_edges
self.self_loop = self_loop
self.train_percentage = train_percentage
self._load_data()
def _load_data(self):
edge_path = os.path.join(self.path, 'edges.txt')
node_path = os.path.join(self.path, 'nodes.txt')
nodes_label_path = os.path.join(self.path, 'nodes_label.txt')
all_edges = []
edges_weight = []
with io.open(node_path) as inf:
num_nodes = len(inf.readlines())
node_feature = np.zeros((num_nodes, self.num_groups))
with io.open(nodes_label_path) as inf:
for line in inf:
# group_id means the label of the node
node_id, group_id = line.strip('\n').split(',')
node_id = int(node_id) - 1
labels = group_id.split(' ')
for i in labels:
node_feature[node_id][int(i) - 1] = 1
node_degree_list = [1 for _ in range(num_nodes)]
with io.open(edge_path) as inf:
for line in inf:
items = line.strip().split('\t')
if len(items) == 2:
u, v = int(items[0]), int(items[1])
weight = 1 # binary weight, default set to 1
else:
u, v, weight = int(items[0]), int(items[1]), float(items[
2]),
u, v = u - 1, v - 1
all_edges.append((u, v))
edges_weight.append(weight)
if self.symmetry_edges:
all_edges.append((v, u))
edges_weight.append(weight)
# sum the weights of the same node as the outdegree
node_degree_list[u] += weight
if self.self_loop:
for i in range(num_nodes):
all_edges.append((i, i))
edges_weight.append(1.)
all_edges = list(set(all_edges))
self.graph = graph.Graph(
num_nodes=num_nodes,
edges=all_edges,
node_feat={"group_id": node_feature})
perm = np.arange(0, num_nodes)
np.random.shuffle(perm)
train_num = int(num_nodes * self.train_percentage)
self.train_index = perm[:train_num]
self.test_index = perm[train_num:]
edge_distribution = np.array(edges_weight, dtype=np.float32)
self.edge_distribution = edge_distribution / np.sum(edge_distribution)
        self.edge_sampling = AliasSampling(prob=self.edge_distribution)  # use the normalized distribution
node_dist = np.array(node_degree_list, dtype=np.float32)
node_negative_distribution = np.power(node_dist, 0.75)
self.node_negative_distribution = node_negative_distribution / np.sum(
node_negative_distribution)
        self.node_sampling = AliasSampling(prob=self.node_negative_distribution)  # use the normalized distribution
self.node_index = {}
self.node_index_reversed = {}
for index, e in enumerate(self.graph.edges):
self.node_index[e[0]] = index
self.node_index_reversed[index] = e[0]
def fetch_batch(self,
batch_size=16,
K=10,
edge_sampling='alias',
node_sampling='alias'):
"""Fetch batch data from dataset.
"""
if edge_sampling == 'numpy':
edge_batch_index = np.random.choice(
self.graph.num_edges,
size=batch_size,
p=self.edge_distribution)
elif edge_sampling == 'alias':
edge_batch_index = self.edge_sampling.sampling(batch_size)
elif edge_sampling == 'uniform':
edge_batch_index = np.random.randint(
0, self.graph.num_edges, size=batch_size)
u_i = []
u_j = []
label = []
for edge_index in edge_batch_index:
edge = self.graph.edges[edge_index]
u_i.append(edge[0])
u_j.append(edge[1])
label.append(1)
for i in range(K):
while True:
if node_sampling == 'numpy':
negative_node = np.random.choice(
self.graph.num_nodes,
p=self.node_negative_distribution)
elif node_sampling == 'alias':
negative_node = self.node_sampling.sampling()
elif node_sampling == 'uniform':
negative_node = np.random.randint(0,
self.graph.num_nodes)
# make sure the sampled node has no edge with the source node
if not self.graph.has_edges_between(
np.array(
[self.node_index_reversed[negative_node]]),
np.array([self.node_index_reversed[edge[0]]])):
break
u_i.append(edge[0])
u_j.append(negative_node)
label.append(-1)
u_i = np.array([u_i], dtype=np.int64).T
u_j = np.array([u_j], dtype=np.int64).T
label = np.array(label, dtype=np.float32)
return u_i, u_j, label
class AliasSampling:
"""Implemention of Alias-Method
This is an implementation of Alias-Method for sampling efficiently from
a discrete probability distribution.
Reference: https://en.wikipedia.org/wiki/Alias_method
Args:
prob: The discrete probability distribution.
"""
def __init__(self, prob):
self.n = len(prob)
self.U = np.array(prob) * self.n
self.K = [i for i in range(len(prob))]
overfull, underfull = [], []
for i, U_i in enumerate(self.U):
if U_i > 1:
overfull.append(i)
elif U_i < 1:
underfull.append(i)
while len(overfull) and len(underfull):
i, j = overfull.pop(), underfull.pop()
self.K[j] = i
self.U[i] = self.U[i] - (1 - self.U[j])
if self.U[i] > 1:
overfull.append(i)
elif self.U[i] < 1:
underfull.append(i)
def sampling(self, n=1):
"""Sampling.
"""
x = np.random.rand(n)
i = np.floor(self.n * x)
y = self.n * x - i
i = i.astype(np.int64)
res = [i[k] if y[k] < self.U[i[k]] else self.K[i[k]] for k in range(n)]
if n == 1:
return res[0]
else:
return res
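# Illustrative usage of AliasSampling (the probabilities below are hypothetical):
#   sampler = AliasSampling(prob=[0.5, 0.3, 0.2])
#   single_draw = sampler.sampling()     # one index, drawn in O(1)
#   batch_draws = sampler.sampling(n=8)  # a list of eight indices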
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file preprocesses the Flickr dataset for the LINE model.
"""
import argparse
import operator
import os
def process_data(groupsMemberships_file, flickr_links_file, users_label_file,
edges_file, users_file):
"""Preprocess flickr network dataset.
Args:
groupsMemberships_file: flickr-groupmemberships.txt file,
each line is a pair (user, group), which indicates a user belongs to a group.
flickr_links_file: flickr-links.txt file,
each line is a pair (user, user), which indicates
the two users have a relationship.
users_label_file: each line is a pair (user, list of group),
each user may belong to multiple groups.
edges_file: each line is a pair (user, user), which indicates
the two users have a relationship. It filters some unused edges.
        users_file: each line is an int number, which indicates the ID of a user.
"""
group2users = {}
with open(groupsMemberships_file, 'r') as f:
for line in f:
user, group = line.strip().split()
try:
group2users[int(group)].append(user)
except:
group2users[int(group)] = [user]
# counting how many users belong to every group
group2usersNum = {}
for key, item in group2users.items():
group2usersNum[key] = len(item)
groups_sorted_by_usersNum = sorted(
group2usersNum.items(), key=operator.itemgetter(1), reverse=True)
    # the paper only uses the 5 groups with the largest number of users
label = 1 # remapping the 5 groups from 1 to 5
users_label = {}
for i in range(5):
users_list = group2users[groups_sorted_by_usersNum[i][0]]
for user in users_list:
# one user may have multi-labels
try:
users_label[user].append(label)
except:
users_label[user] = [label]
label += 1
    # remap the user IDs so they run consecutively from 1 to N
userID2nodeID = {}
count = 1
for key in sorted(users_label.keys()):
userID2nodeID[key] = count
count += 1
with open(users_label_file, 'w') as writer:
for key in sorted(users_label.keys()):
line = ' '.join([str(i) for i in users_label[key]])
writer.write(str(userID2nodeID[key]) + ',' + line + '\n')
# produce edges file
with open(flickr_links_file, 'r') as reader, open(edges_file,
'w') as writer:
for line in reader:
src, dst = line.strip().split('\t')
# filter unused user IDs
if src in users_label and dst in users_label:
# remapping the users IDs
src = userID2nodeID[src]
dst = userID2nodeID[dst]
writer.write(str(src) + '\t' + str(dst) + '\n')
# produce nodes file
with open(users_file, 'w') as writer:
for i in range(1, 1 + len(userID2nodeID)):
writer.write(str(i) + '\n')
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='LINE')
parser.add_argument(
'--groupmemberships',
type=str,
default='./data/flickr/flickr-groupmemberships.txt',
help='groupmemberships of flickr dataset')
parser.add_argument(
'--flickr_links',
type=str,
default='./data/flickr/flickr-links.txt',
help='the flickr-links.txt file for training')
parser.add_argument(
'--nodes_label',
type=str,
default='./data/flickr/nodes_label.txt',
help='nodes (users) label file for training')
parser.add_argument(
'--edges',
type=str,
default='./data/flickr/edges.txt',
help='the result edges (links) file for training')
parser.add_argument(
'--nodes',
type=str,
default='./data/flickr/nodes.txt',
help='the nodes (users) file for training')
args = parser.parse_args()
process_data(args.groupmemberships, args.flickr_links, args.nodes_label,
args.edges, args.nodes)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements the training process of the LINE model.
"""
import time
import argparse
import random
import os
import numpy as np
import pgl
import paddle.fluid as fluid
import paddle.fluid.layers as fl
from pgl.utils.logger import log
from data_loader import FlickrDataset
def make_dir(path):
"""Create directory if path is not existed.
Args:
path: The directory that wants to create.
"""
try:
os.makedirs(path)
except:
if not os.path.isdir(path):
raise
def save_param(dirname, var_name_list):
"""save_param"""
if not os.path.exists(dirname):
os.makedirs(dirname)
for var_name in var_name_list:
var = fluid.global_scope().find_var(var_name)
var_tensor = var.get_tensor()
np.save(os.path.join(dirname, var_name + '.npy'), np.array(var_tensor))
def set_seed(seed):
"""Set global random seed.
"""
random.seed(seed)
np.random.seed(seed)
def build_model(args, graph):
"""Build LINE model.
Args:
        args: The hyperparameters used to configure the model.
graph: The :code:`Graph` data object.
"""
u_i = fl.data(
name='u_i', shape=[None, 1], dtype='int64', append_batch_size=False)
u_j = fl.data(
name='u_j', shape=[None, 1], dtype='int64', append_batch_size=False)
label = fl.data(
name='label', shape=[None], dtype='float32', append_batch_size=False)
lr = fl.data(
name='learning_rate',
shape=[1],
dtype='float32',
append_batch_size=False)
u_i_embed = fl.embedding(
input=u_i,
size=[graph.num_nodes, args.embed_dim],
param_attr='shared_w')
if args.order == 'first_order':
u_j_embed = fl.embedding(
input=u_j,
size=[graph.num_nodes, args.embed_dim],
param_attr='shared_w')
elif args.order == 'second_order':
u_j_embed = fl.embedding(
input=u_j,
size=[graph.num_nodes, args.embed_dim],
param_attr='context_w')
else:
raise ValueError("order should be first_order or second_order, not %s"
% (args.order))
inner_product = fl.reduce_sum(u_i_embed * u_j_embed, dim=1)
loss = -1 * fl.reduce_mean(fl.logsigmoid(label * inner_product))
optimizer = fluid.optimizer.RMSPropOptimizer(learning_rate=lr)
train_op = optimizer.minimize(loss)
return loss, optimizer
def main(args):
"""The main funciton for training LINE model.
"""
make_dir(args.save_dir)
set_seed(args.seed)
dataset = FlickrDataset(args.data_path)
log.info('num nodes in graph: %d' % dataset.graph.num_nodes)
log.info('num edges in graph: %d' % dataset.graph.num_edges)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
main_program = fluid.default_main_program()
startup_program = fluid.default_startup_program()
# build model here
with fluid.program_guard(main_program, startup_program):
loss, opt = build_model(args, dataset.graph)
exe = fluid.Executor(place)
exe.run(startup_program) #initialize the parameters of the network
batchrange = int(dataset.graph.num_edges / args.batch_size)
T = batchrange * args.epochs
for epoch in range(args.epochs):
for b in range(batchrange):
lr = max(args.lr * (1 - (batchrange * epoch + b) / T), 0.0001)
u_i, u_j, label = dataset.fetch_batch(
batch_size=args.batch_size,
K=args.neg_sample_size,
edge_sampling=args.sample_method,
node_sampling=args.sample_method)
feed_dict = {
'u_i': u_i,
'u_j': u_j,
'label': label,
'learning_rate': lr
}
ret_loss = exe.run(main_program,
feed=feed_dict,
fetch_list=[loss],
return_numpy=True)
if b % 500 == 0:
log.info("Epoch %d | Step %d | Loss %f | lr: %f" %
(epoch, b, ret_loss[0], lr))
# save parameters in every epoch
log.info("saving persistables parameters...")
cur_save_path = os.path.join(args.save_dir,
"model_epoch_%d" % (epoch + 1))
save_param(cur_save_path, ['shared_w'])
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='LINE')
parser.add_argument(
'--data_path',
type=str,
default='./data/flickr/',
help='dataset for training')
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument("--epochs", type=int, default=20, help='total epochs')
parser.add_argument("--seed", type=int, default=1667, help='random seed')
parser.add_argument("--lr", type=float, default=0.01, help='learning rate')
parser.add_argument(
"--neg_sample_size",
type=int,
default=5,
help='negative sample number')
parser.add_argument("--save_dir", type=str, default="./checkpoints/model")
parser.add_argument("--batch_size", type=int, default=32)
parser.add_argument(
"--embed_dim",
type=int,
default=128,
help='the dimension of node embedding')
parser.add_argument(
"--sample_method",
type=str,
default="alias",
help='negative sample method (uniform, numpy, alias)')
parser.add_argument(
"--order",
type=str,
default="first_order",
help='the order of neighbors (first_order, second_order)')
args = parser.parse_args()
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file provides the multi-class task for evaluating the embeddings
learned by the LINE model.
"""
import argparse
import time
import math
import os
import random
import numpy as np
import sklearn.metrics
from sklearn.metrics import f1_score
import pgl
from pgl.utils import op
import paddle.fluid as fluid
import paddle.fluid.layers as l
from pgl.utils.logger import log
from data_loader import FlickrDataset
def load_param(dirname, var_name_list):
"""load_param"""
for var_name in var_name_list:
var = fluid.global_scope().find_var(var_name)
var_tensor = var.get_tensor()
var_tmp = np.load(os.path.join(dirname, var_name + '.npy'))
var_tensor.set(var_tmp, fluid.CPUPlace())
def set_seed(seed):
"""Set global random seed.
"""
random.seed(seed)
np.random.seed(seed)
def node_classify_model(graph,
num_labels,
embed_dim=16,
name='node_classify_task'):
"""Build node classify model.
Args:
graph: The :code:`Graph` data object.
num_labels: The number of labels.
embed_dim: The dimension of embedding.
name: The name of the model.
"""
pyreader = l.py_reader(
capacity=70,
shapes=[[-1, 1], [-1, num_labels]],
dtypes=['int64', 'float32'],
lod_levels=[0, 0],
name=name + '_pyreader',
use_double_buffer=True)
nodes, labels = l.read_file(pyreader)
embed_nodes = l.embedding(
input=nodes, size=[graph.num_nodes, embed_dim], param_attr='shared_w')
embed_nodes.stop_gradient = True
logits = l.fc(input=embed_nodes, size=num_labels)
loss = l.sigmoid_cross_entropy_with_logits(logits, labels)
loss = l.reduce_mean(loss)
prob = l.sigmoid(logits)
topk = l.reduce_sum(labels, -1)
return {
'pyreader': pyreader,
'loss': loss,
'prob': prob,
'labels': labels,
'topk': topk
}
# return pyreader, loss, prob, labels, topk
def node_classify_generator(graph,
all_nodes=None,
batch_size=512,
epoch=1,
shuffle=True):
"""Data generator for node classify.
Args:
graph: The :code:`Graph` data object.
all_nodes: the nodes used for generating batches; defaults to all nodes in the graph.
batch_size: batch size for training.
epoch: The number of epochs.
shuffle: Random shuffle of data.
"""
if all_nodes is None:
all_nodes = np.arange(graph.num_nodes)
def batch_nodes_generator(shuffle=shuffle):
"""Batch nodes generator.
"""
perm = np.arange(len(all_nodes), dtype=np.int64)
if shuffle:
np.random.shuffle(perm)
start = 0
while start < len(all_nodes):
yield all_nodes[perm[start:start + batch_size]]
start += batch_size
def wrapper():
"""Wrapper function.
"""
for _ in range(epoch):
for batch_nodes in batch_nodes_generator():
batch_nodes_expanded = np.expand_dims(batch_nodes,
-1).astype(np.int64)
batch_labels = graph.node_feat['group_id'][batch_nodes].astype(
np.float32)
yield [batch_nodes_expanded, batch_labels]
return wrapper
def topk_f1_score(labels,
probs,
topk_list=None,
average="macro",
threshold=None):
"""Calculate top K F1 score.
"""
assert topk_list is not None or threshold is not None, "one of topk_list and threshold must not be None"
if threshold is not None:
preds = probs > threshold
else:
preds = np.zeros_like(labels, dtype=np.int64)
for idx, (prob, topk) in enumerate(zip(np.argsort(probs), topk_list)):
preds[idx][prob[-int(topk):]] = 1
return f1_score(labels, preds, average=average)
def main(args):
"""The main funciton for nodes classify task.
"""
set_seed(args.seed)
log.info(args)
dataset = FlickrDataset(args.data_path, train_percentage=args.percent)
train_steps = (len(dataset.train_index) // args.batch_size) * args.epochs
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_prog = fluid.Program()
test_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
train_model = node_classify_model(
dataset.graph,
dataset.num_groups,
embed_dim=args.embed_dim,
name='train')
lr = l.polynomial_decay(args.lr, train_steps, 0.0001)
adam = fluid.optimizer.Adam(lr)
adam.minimize(train_model['loss'])
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
test_model = node_classify_model(
dataset.graph,
dataset.num_groups,
embed_dim=args.embed_dim,
name='test')
test_prog = test_prog.clone(for_test=True)
exe = fluid.Executor(place)
exe.run(startup_prog)
train_model['pyreader'].decorate_tensor_provider(
node_classify_generator(
dataset.graph,
dataset.train_index,
batch_size=args.batch_size,
epoch=args.epochs))
test_model['pyreader'].decorate_tensor_provider(
node_classify_generator(
dataset.graph,
dataset.test_index,
batch_size=args.batch_size,
epoch=1))
def existed_params(var):
"""existed_params
"""
if not isinstance(var, fluid.framework.Parameter):
return False
return os.path.exists(os.path.join(args.ckpt_path, var.name))
log.info('loading pretrained parameters from npy')
load_param(args.ckpt_path, ['shared_w'])
step = 0
prev_time = time.time()
train_model['pyreader'].start()
final_macro_f1 = 0.0
final_micro_f1 = 0.0
while 1:
try:
train_loss_val, train_probs_val, train_labels_val, train_topk_val = exe.run(
train_prog,
fetch_list=[
train_model['loss'], train_model['prob'],
train_model['labels'], train_model['topk']
],
return_numpy=True)
train_macro_f1 = topk_f1_score(train_labels_val, train_probs_val,
train_topk_val, "macro",
args.threshold)
train_micro_f1 = topk_f1_score(train_labels_val, train_probs_val,
train_topk_val, "micro",
args.threshold)
step += 1
log.info("Step %d " % step + "Train Loss: %f " % train_loss_val +
"Train Macro F1: %f " % train_macro_f1 +
"Train Micro F1: %f " % train_micro_f1)
except fluid.core.EOFException:
train_model['pyreader'].reset()
break
test_model['pyreader'].start()
test_probs_vals, test_labels_vals, test_topk_vals = [], [], []
while 1:
try:
test_loss_val, test_probs_val, test_labels_val, test_topk_val = exe.run(
test_prog,
fetch_list=[
test_model['loss'], test_model['prob'],
test_model['labels'], test_model['topk']
],
return_numpy=True)
test_probs_vals.append(test_probs_val)
test_labels_vals.append(test_labels_val)
test_topk_vals.append(test_topk_val)
except fluid.core.EOFException:
test_model['pyreader'].reset()
test_probs_array = np.concatenate(test_probs_vals)
test_labels_array = np.concatenate(test_labels_vals)
test_topk_array = np.concatenate(test_topk_vals)
test_macro_f1 = topk_f1_score(
test_labels_array, test_probs_array, test_topk_array,
"macro", args.threshold)
test_micro_f1 = topk_f1_score(
test_labels_array, test_probs_array, test_topk_array,
"micro", args.threshold)
log.info("\t\tStep %d " % step + "Test Loss: %f " %
test_loss_val + "Test Macro F1: %f " % test_macro_f1 +
"Test Micro F1: %f " % test_micro_f1)
final_macro_f1 = max(test_macro_f1, final_macro_f1)
final_micro_f1 = max(test_micro_f1, final_micro_f1)
break
log.info("\nFinal test Macro F1: %f " % final_macro_f1 +
"Final test Micro F1: %f " % final_micro_f1)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='LINE')
parser.add_argument(
'--data_path',
type=str,
default='./data/flickr/',
help='dataset for training')
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument("--epochs", type=int, default=5)
parser.add_argument("--seed", type=int, default=1667)
parser.add_argument(
"--lr", type=float, default=0.025, help='learning rate')
parser.add_argument("--embed_dim", type=int, default=128)
parser.add_argument("--batch_size", type=int, default=256)
parser.add_argument("--threshold", type=float, default=None)
parser.add_argument(
"--percent",
type=float,
default=0.5,
help="the percentage of data as training data")
parser.add_argument(
"--ckpt_path", type=str, default="./checkpoints/model/model_epoch_0/")
args = parser.parse_args()
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file loads and preprocesses the dataset for the metapath2vec model.
"""
import sys
import os
import glob
import numpy as np
import tqdm
import time
import logging
import random
from pgl import heter_graph
import pickle as pkl
class Dataset(object):
"""Implementation of Dataset class
This is a simple implementation for loading and processing the dataset for the metapath2vec model.
Args:
config: dict, some configure parameters.
"""
NEGATIVE_TABLE_SIZE = 1e8
def __init__(self, config):
self.config = config
self.walk_files = os.path.join(config['input_path'],
config['walk_path'])
self.word2id_file = os.path.join(config['input_path'],
config['word2id_file'])
self.word2freq = {}
self.word2id = {}
self.id2word = {}
self.sentences_count = 0
self.token_count = 0
self.negatives = []
self.discards = []
logging.info('reading sentences')
self.read_words()
logging.info('initializing discards')
self.initDiscards()
logging.info('initializing negatives')
self.initNegatives()
def read_words(self):
"""Read words(nodes) from walk files which are produced by sampler.
"""
word_freq = dict()
for walk_file in glob.glob(self.walk_files):
with open(walk_file, 'r') as reader:
for walk in reader:
walk = walk.strip().split()
if len(walk) > 1:
self.sentences_count += 1
for word in walk:
if int(word) >= self.config[
'paper_start_index']: # remove paper
continue
else:
self.token_count += 1
word_freq[word] = word_freq.get(word, 0) + 1
wid = 0
logging.info('Read %d sentences.' % self.sentences_count)
logging.info('Read %d words.' % self.token_count)
logging.info('%d words have been sampled.' % len(word_freq))
for w, c in word_freq.items():
if c < self.config['min_count']:
continue
self.word2id[w] = wid
self.id2word[wid] = w
self.word2freq[wid] = c
wid += 1
self.word_count = len(self.word2id)
logging.info(
'%d words that appear fewer than %d (min_count) times have been discarded.' %
(len(word_freq) - len(self.word2id), self.config['min_count']))
pkl.dump(self.word2id, open(self.word2id_file, 'wb'))
def initDiscards(self):
"""Get a frequency table for sub-sampling.
"""
t = 0.0001
f = np.array(list(self.word2freq.values())) / self.token_count
self.discards = np.sqrt(t / f) + (t / f)
def initNegatives(self):
"""Get a table for negative sampling
"""
pow_freq = np.array(list(self.word2freq.values()))**0.75
words_pow = sum(pow_freq)
ratio = pow_freq / words_pow
count = np.round(ratio * Dataset.NEGATIVE_TABLE_SIZE)
for wid, c in enumerate(count):
self.negatives += [wid] * int(c)
self.negatives = np.array(self.negatives)
np.random.shuffle(self.negatives)
self.sampling_prob = ratio
def getNegatives(self, size):
"""Get negative samples from negative samling table.
"""
return np.random.choice(self.negatives, size)
def walk_from_files(self, walkpath_files):
"""Generate walks from files.
"""
bucket = []
for filename in walkpath_files:
with open(filename) as reader:
for line in reader:
words = line.strip().split()
words = [
w for w in words
if int(w) < self.config['paper_start_index']
]
if len(words) > 1:
word_ids = [
self.word2id[w] for w in words if w in self.word2id
]
bucket.append(word_ids)
if len(bucket) == self.config['batch_size']:
yield bucket
bucket = []
if len(bucket):
yield bucket
def pairs_generator(self, walkpath_files):
"""Generate train pairs(src, pos, negs) for training model.
"""
def wrapper():
"""wrapper for multiprocess calling.
"""
for walks in self.walk_from_files(walkpath_files):
res = self.gen_pairs(walks)
yield res
return wrapper
def gen_pairs(self, walks):
"""Generate train pairs data for training model.
"""
src = []
pos = []
negs = []
skip_window = self.config['win_size'] // 2
for walk in walks:
for i in range(len(walk)):
for j in range(1, skip_window + 1):
if i - j >= 0:
src.append(walk[i])
pos.append(walk[i - j])
negs.append(
self.getNegatives(size=self.config['neg_num']))
if i + j < len(walk):
src.append(walk[i])
pos.append(walk[i + j])
negs.append(
self.getNegatives(size=self.config['neg_num']))
src = np.array(src, dtype=np.int64).reshape(-1, 1, 1)
pos = np.array(pos, dtype=np.int64).reshape(-1, 1, 1)
negs = np.expand_dims(np.array(negs, dtype=np.int64), -1)
return {"src": src, "pos": pos, "negs": negs}
if __name__ == "__main__":
config = {
'input_path': './data/out_aminer_CPAPC/',
'walk_path': 'aminer_walks_CPAPC_500num_100len/*',
'author_label_file': 'author_label.txt',
'venue_label_file': 'venue_label.txt',
'remapping_author_label_file': 'multi_class_author_label.txt',
'remapping_venue_label_file': 'multi_class_venue_label.txt',
'word2id_file': 'word2id.pkl',
'win_size': 7,
'neg_num': 5,
'min_count': 2,
'batch_size': 1,
}
log_format = '%(asctime)s-%(levelname)s-%(name)s: %(message)s'
logging.basicConfig(level=getattr(logging, 'INFO'), format=log_format)
dataset = Dataset(config)
# metapath2vec: Scalable Representation Learning for Heterogeneous Networks
[metapath2vec](https://ericdongyx.github.io/papers/KDD17-dong-chawla-swami-metapath2vec.pdf) is an algorithmic framework for representation learning in heterogeneous networks, which contain multiple types of nodes and links. Given a heterogeneous graph, metapath2vec first generates meta-path-based random walks and then trains a skipgram model on them as a language model. Based on PGL, we reproduce the metapath2vec algorithm.
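To make the second stage concrete, here is a minimal NumPy sketch of how (src, pos, negs) skipgram training triples can be built from a single walk. It mirrors the pair construction in this example's `Dataset.gen_pairs`, except that, for brevity, negatives are drawn uniformly instead of from the degree^0.75 table; the function and its arguments are illustrative only.

```python
import numpy as np

def gen_skipgram_pairs(walk, win_size=5, neg_num=5, num_nodes=1000, seed=0):
    """Build (src, pos, negs) skipgram triples from one random walk.

    Negatives are sampled uniformly here for simplicity; the real Dataset
    class samples them from a unigram^0.75 table.
    """
    rng = np.random.default_rng(seed)
    skip_window = win_size // 2
    src, pos, negs = [], [], []
    for i, center in enumerate(walk):
        for j in range(1, skip_window + 1):
            for ctx in (i - j, i + j):
                if 0 <= ctx < len(walk):
                    src.append(center)          # center node
                    pos.append(walk[ctx])       # positive context node
                    negs.append(rng.integers(0, num_nodes, size=neg_num))
    return np.array(src), np.array(pos), np.array(negs)

src, pos, negs = gen_skipgram_pairs([3, 17, 42, 8, 3], win_size=5, neg_num=5)
print(src.shape, pos.shape, negs.shape)  # (14,) (14,) (14, 5)
```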
## Datasets
You can download the datasets from [here](https://ericdongyx.github.io/metapath2vec/m2v.html).
We use the "aminer" data as an example. After downloading the aminer data, put it in, say, ./data/net_aminer/. The "label/" directory also needs to be placed in ./data/.
## Dependencies
- paddlepaddle>=1.6
- pgl>=1.0.0
## Hyperparameters
All the hyperparameters are saved in the config.yaml file, so before training you can open config.yaml and modify them as you like.
For example, you can set "use_cuda" to "True" to train with GPU, or modify "data_path" to specify the data you want; a short snippet for editing these values programmatically is shown after the list below.
Some important hyperparameters in config.yaml:
- **use_cuda**: use GPU to train model
- **data_path**: the directory of dataset that you want to load
- **lr**: learning rate
- **neg_num**: number of negative samples.
- **num_walks**: number of walks started from each node
- **walk_length**: walk length
- **metapath**: meta path scheme
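As a hedged illustration (assuming the config.yaml layout shipped with this example), the hyperparameters can also be edited programmatically with PyYAML before launching sample.py or main.py:

```python
import yaml

# Load the configuration, switch training to GPU, point it at another dataset,
# and write it back. The keys follow the config.yaml used in this example.
with open('config.yaml') as f:
    config = yaml.safe_load(f)

config['use_cuda'] = True
config['sampler']['args']['data_path'] = './data/net_aminer/'

with open('config.yaml', 'w') as f:
    yaml.safe_dump(config, f, default_flow_style=False)
```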
## Metapath random walk sampling
Before training, we need to generate meta-path random walks for the skipgram model. Run the command below to produce the random walk data (a small PGL-based sketch of what this step does is shown after the command).
```sh
python sample.py -c config.yaml
```
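For reference, the sketch below shows roughly what sample.py does internally, using the same PGL APIs (`heter_graph.HeterGraph`, `pgl.sample.metapath_randomwalk`). The toy graph, node IDs and walk length are made up for illustration, and node features are omitted; it is not a replacement for the real sampler.

```python
import numpy as np
from pgl import heter_graph
from pgl.sample import metapath_randomwalk

# A toy heterogeneous graph: 2 conferences, 2 authors, 3 papers.
node_types = [(0, 'conf'), (1, 'conf'), (2, 'author'), (3, 'author'),
              (4, 'paper'), (5, 'paper'), (6, 'paper')]
p2c = [(4, 0), (5, 0), (6, 1)]
p2a = [(4, 2), (5, 3), (6, 2)]
edges = {
    'p2c': p2c,
    'c2p': [(d, s) for s, d in p2c],
    'p2a': p2a,
    'a2p': [(d, s) for s, d in p2a],
}
graph = heter_graph.HeterGraph(
    num_nodes=len(node_types), edges=edges, node_types=node_types)

# Start walks from conference nodes and follow the conf-paper-author-paper-conf
# metapath, just like the sampler configured by config.yaml.
for start_nodes in graph.node_batch_iter(batch_size=2, n_type='conf'):
    walks = metapath_randomwalk(
        graph=graph,
        start_nodes=start_nodes,
        metapath='c2p-p2a-a2p-p2c',
        walk_length=9)
    for walk in walks:
        print(walk)
```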
## Training and Testing
After the random walk sampling finishes, run the commands below to train and evaluate the model.
```sh
python main.py -c config.yaml
python multi_class.py --dataset ./data/out_aminer_CPAPC/author_label.txt --word2id ./checkpoints/train.metapath2vec/word2id.pkl --ckpt_path ./checkpoints/train.metapath2vec/model_epoch5/
```
## Experiment results
| train_percent | Metric | PGL Result | Reported Result |
|---------------|----------|------------|-----------------|
| 50% | macro-F1 | 0.9249 | 0.9314 |
| 50% | micro-F1 | 0.9283 | 0.9365 |
task_name: train.metapath2vec
use_cuda: True
log_level: info
seed: 1667
sampler:
type:
args:
data_path: ./data/net_aminer/
author_label_file: ./data/label/googlescholar.8area.author.label.txt
venue_label_file: ./data/label/googlescholar.8area.venue.label.txt
output_path: ./data/out_aminer_CPAPC/
new_author_label_file: author_label.txt
new_venue_label_file: venue_label.txt
walk_saved_path: walks/
walk_batch_size: 1000
num_walks: 1000
walk_length: 100
num_sample_workers: 16
first_node_type: conf
metapath: c2p-p2a-a2p-p2c #conf-paper-author-paper-conf
optimizer:
type: Adam
args:
lr: 0.005
end_lr: 0.0001
trainer:
type: trainer
args:
epochs: 5
log_dir: logs/
save_dir: checkpoints/
output_dir: outputs/
num_sample_workers: 8
data_loader:
type: Dataset
args:
input_path: ./data/out_aminer_CPAPC/ # same path as output_path in sampler
walk_path: walks/*
word2id_file: word2id.pkl
batch_size: 32
win_size: 5 # default: 7
neg_num: 5
min_count: 10
paper_start_index: 1697414
model:
type: SkipgramModel
args:
embed_dim: 128
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements the training process of the metapath2vec model.
"""
import os
import sys
import random
import argparse
import time
import numpy as np
import logging
import pickle as pkl
import shutil
import glob
import pgl
from pgl.utils import paddle_helper
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers as fl
from utils import *
import Dataset
import model as Models
from pgl.utils import mp_reader
from sklearn.metrics import (auc, f1_score, precision_recall_curve,
roc_auc_score)
def set_seed(seed):
"""Set global random seed."""
random.seed(seed)
np.random.seed(seed)
def save_param(dirname, var_name_list):
"""save_param"""
if not os.path.exists(dirname):
os.makedirs(dirname)
for var_name in var_name_list:
var = fluid.global_scope().find_var(var_name)
var_tensor = var.get_tensor()
np.save(os.path.join(dirname, var_name + '.npy'), np.array(var_tensor))
def multiprocess_data_generator(config, dataset):
"""Using multiprocess to generate training data.
"""
num_sample_workers = config['trainer']['args']['num_sample_workers']
walkpath_files = [[] for i in range(num_sample_workers)]
for idx, f in enumerate(glob.glob(dataset.walk_files)):
walkpath_files[idx % num_sample_workers].append(f)
gen_data_pool = [
dataset.pairs_generator(files) for files in walkpath_files
]
if num_sample_workers == 1:
gen_data_func = gen_data_pool[0]
else:
gen_data_func = mp_reader.multiprocess_reader(
gen_data_pool, use_pipe=True, queue_size=100)
return gen_data_func
def run_epoch(epoch,
config,
data_generator,
train_prog,
model,
feed_dict,
exe,
for_test=False):
"""Run training process of every epoch.
"""
total_loss = []
for idx, batch_data in enumerate(data_generator()):
feed_dict['train_inputs'] = batch_data['src']
feed_dict['train_labels'] = batch_data['pos']
feed_dict['train_negs'] = batch_data['negs']
loss, lr = exe.run(train_prog,
feed=feed_dict,
fetch_list=[model.loss, model.lr],
return_numpy=True)
total_loss.append(loss[0])
if (idx + 1) % 500 == 0:
avg_loss = np.mean(total_loss)
logging.info("epoch %d | step %d | lr %.4f | train_loss %f " %
(epoch, idx + 1, lr, avg_loss))
total_loss = []
def main(config):
"""main function for training metapath2vec model.
"""
logging.info(config)
set_seed(config['seed'])
dataset = getattr(
Dataset, config['data_loader']['type'])(config['data_loader']['args'])
data_generator = multiprocess_data_generator(config, dataset)
# move word2id file to checkpoints directory
src_word2id_file = dataset.word2id_file
dst_word2id_file = config['trainer']['args']['save_dir'] + config[
'data_loader']['args']['word2id_file']
logging.info('backup word2id file to %s' % dst_word2id_file)
shutil.move(src_word2id_file, dst_word2id_file)
place = fluid.CUDAPlace(0) if config['use_cuda'] else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
model = getattr(Models, config['model']['type'])(
dataset=dataset, config=config['model']['args'], place=place)
with fluid.program_guard(train_program, startup_program):
global_steps = int(dataset.sentences_count *
config['trainer']['args']['epochs'] /
config['data_loader']['args']['batch_size'])
model.backward(global_steps, config['optimizer']['args'])
# train
exe = fluid.Executor(place)
exe.run(startup_program)
feed_dict = {}
logging.info('training...')
for epoch in range(1, 1 + config['trainer']['args']['epochs']):
run_epoch(epoch, config['trainer']['args'], data_generator,
train_program, model, feed_dict, exe)
logging.info('saving model...')
cur_save_path = os.path.join(config['trainer']['args']['save_dir'],
"model_epoch%d" % (epoch))
save_param(cur_save_path, ['content'])
logging.info('finishing training')
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='metapath2vec')
parser.add_argument(
'-c',
'--config',
default=None,
type=str,
help='config file path (default: None)')
parser.add_argument(
'-n',
'--taskname',
default=None,
type=str,
help='task name(default: None)')
args = parser.parse_args()
if args.config:
# load config file
config = Config(args.config, isCreate=True, isSave=True)
config = config()
else:
raise AssertionError(
"Configuration file need to be specified. Add '-c config.yaml', for example."
)
log_format = '%(asctime)s-%(levelname)s-%(name)s: %(message)s'
logging.basicConfig(
level=getattr(logging, config['log_level'].upper()), format=log_format)
main(config)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements the skipgram model for training metapath2vec.
"""
import argparse
import time
import math
import os
import io
from multiprocessing import Pool
import logging
import numpy as np
import glob
import pgl
from pgl import data_loader
from pgl.utils import op
from pgl.utils.logger import log
import paddle.fluid as fluid
import paddle.fluid.layers as fl
class SkipgramModel(object):
"""Implemetation of skipgram model.
Args:
config: dict, some configure parameters.
dataset: instance of Dataset class
place: GPU or CPU place
"""
def __init__(self, config, dataset, place):
self.config = config
self.dataset = dataset
self.place = place
self.neg_num = self.dataset.config['neg_num']
self.num_nodes = len(dataset.word2id)
self.train_inputs = fl.data(
'train_inputs', shape=[None, 1, 1], dtype='int64')
self.train_labels = fl.data(
'train_labels', shape=[None, 1, 1], dtype='int64')
self.train_negs = fl.data(
'train_negs', shape=[None, self.neg_num, 1], dtype='int64')
self.forward()
def backward(self, global_steps, opt_config):
"""Build the optimizer.
"""
self.lr = fl.polynomial_decay(opt_config['lr'], global_steps,
opt_config['end_lr'])
adam = fluid.optimizer.Adam(learning_rate=self.lr)
adam.minimize(self.loss)
def forward(self):
"""Build the skipgram model.
"""
initrange = 1.0 / self.config['embed_dim']
embed_init = fluid.initializer.UniformInitializer(
low=-initrange, high=initrange)
weight_init = fluid.initializer.TruncatedNormal(
scale=1.0 / math.sqrt(self.config['embed_dim']))
embed_src = fl.embedding(
input=self.train_inputs,
size=[self.num_nodes, self.config['embed_dim']],
param_attr=fluid.ParamAttr(
name='content', initializer=embed_init))
weight_pos = fl.embedding(
input=self.train_labels,
size=[self.num_nodes, self.config['embed_dim']],
param_attr=fluid.ParamAttr(
name='weight', initializer=weight_init))
weight_negs = fl.embedding(
input=self.train_negs,
size=[self.num_nodes, self.config['embed_dim']],
param_attr=fluid.ParamAttr(
name='weight', initializer=weight_init))
pos_logits = fl.matmul(
embed_src, weight_pos, transpose_y=True) # [batch_size, 1, 1]
pos_score = fl.squeeze(pos_logits, axes=[1])
pos_score = fl.clip(pos_score, min=-10, max=10)
pos_score = -self.neg_num * fl.logsigmoid(pos_score)
neg_logits = fl.matmul(
embed_src, weight_negs,
transpose_y=True) # [batch_size, 1, neg_num]
neg_score = fl.squeeze(neg_logits, axes=[1])
neg_score = fl.clip(neg_score, min=-10, max=10)
neg_score = -1.0 * fl.logsigmoid(-1.0 * neg_score)
neg_score = fl.reduce_sum(neg_score, dim=1, keep_dim=True)
self.loss = fl.reduce_mean(pos_score + neg_score) / self.neg_num / 2
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file provides the multi-class task for evaluating the embeddings learned by the metapath2vec model.
"""
import argparse
import sys
import os
import tqdm
import time
import math
import logging
import random
import pickle as pkl
import numpy as np
import sklearn.metrics
from sklearn.metrics import f1_score
import pgl
import paddle.fluid as fluid
import paddle.fluid.layers as fl
import Dataset
from utils import *
def load_param(dirname, var_name_list):
"""load_param"""
for var_name in var_name_list:
var = fluid.global_scope().find_var(var_name)
var_tensor = var.get_tensor()
var_tmp = np.load(os.path.join(dirname, var_name + '.npy'))
var_tensor.set(var_tmp, fluid.CPUPlace())
def load_data(file_, word2id):
"""Load data for node classification.
"""
words_label = []
line_count = 0
with open(file_, 'r') as reader:
for line in reader:
line_count += 1
tokens = line.strip().split(' ')
word, label = tokens[0], int(tokens[1]) - 1
if word in word2id:
words_label.append((word2id[word], label))
words_label = np.array(words_label, dtype=np.int64)
np.random.shuffle(words_label)
logging.info('%d/%d word_label pairs have been loaded' %
(len(words_label), line_count))
return words_label
def node_classify_model(word2id, num_labels, embed_dim=16):
"""Build node classify model.
Args:
word2id(dict): map word(node) to its corresponding index
num_labels: The number of labels.
embed_dim: The dimension of embedding.
"""
nodes = fl.data('nodes', shape=[None, 1], dtype='int64')
labels = fl.data('labels', shape=[None, 1], dtype='int64')
embed_nodes = fl.embedding(
input=nodes,
size=[len(word2id), embed_dim],
param_attr=fluid.ParamAttr(name='content'))
embed_nodes.stop_gradient = True
probs = fl.fc(input=embed_nodes, size=num_labels, act='softmax')
predict = fl.argmax(probs, axis=-1)
loss = fl.cross_entropy(input=probs, label=labels)
loss = fl.reduce_mean(loss)
return {
'loss': loss,
'probs': probs,
'predict': predict,
'labels': labels,
}
def run_epoch(exe, prog, model, feed_dict, lr):
"""Run training process of every epoch.
"""
if lr is None:
loss, predict = exe.run(prog,
feed=feed_dict,
fetch_list=[model['loss'], model['predict']],
return_numpy=True)
lr_ = 0
else:
loss, predict, lr_ = exe.run(
prog,
feed=feed_dict,
fetch_list=[model['loss'], model['predict'], lr],
return_numpy=True)
macro_f1 = f1_score(feed_dict['labels'], predict, average="macro")
micro_f1 = f1_score(feed_dict['labels'], predict, average="micro")
return {
'loss': loss,
'pred': predict,
'lr': lr_,
'macro_f1': macro_f1,
'micro_f1': micro_f1
}
def main(args):
"""main function for training node classification task.
"""
word2id = pkl.load(open(args.word2id, 'rb'))
words_label = load_data(args.dataset, word2id)
# split data for training and testing
split_position = int(words_label.shape[0] * args.train_percent)
train_words_label = words_label[0:split_position, :]
test_words_label = words_label[split_position:, :]
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_prog = fluid.Program()
test_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
model = node_classify_model(
word2id, args.num_labels, embed_dim=args.embed_dim)
test_prog = train_prog.clone(for_test=True)
with fluid.program_guard(train_prog, startup_prog):
lr = fl.polynomial_decay(args.lr, 1000, 0.001)
adam = fluid.optimizer.Adam(lr)
adam.minimize(model['loss'])
exe = fluid.Executor(place)
exe.run(startup_prog)
load_param(args.ckpt_path, ['content'])
feed_dict = {}
X = train_words_label[:, 0].reshape(-1, 1)
labels = train_words_label[:, 1].reshape(-1, 1)
logging.info('%d/%d data to train' %
(labels.shape[0], words_label.shape[0]))
test_feed_dict = {}
test_X = test_words_label[:, 0].reshape(-1, 1)
test_labels = test_words_label[:, 1].reshape(-1, 1)
logging.info('%d/%d data to test' %
(test_labels.shape[0], words_label.shape[0]))
for epoch in range(args.epochs):
feed_dict['nodes'] = X
feed_dict['labels'] = labels
train_result = run_epoch(exe, train_prog, model, feed_dict, lr)
test_feed_dict['nodes'] = test_X
test_feed_dict['labels'] = test_labels
test_result = run_epoch(exe, test_prog, model, test_feed_dict, lr=None)
logging.info(
'epoch %d | lr %.4f | train_loss %.5f | train_macro_F1 %.4f | train_micro_F1 %.4f | test_loss %.5f | test_macro_F1 %.4f | test_micro_F1 %.4f'
% (epoch, train_result['lr'], train_result['loss'],
train_result['macro_f1'], train_result['micro_f1'],
test_result['loss'], test_result['macro_f1'],
test_result['micro_f1']))
logging.info(
'final_test_macro_f1 score: %.4f | final_test_micro_f1 score: %.4f' %
(test_result['macro_f1'], test_result['micro_f1']))
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='multi_class')
parser.add_argument(
'--dataset',
default=None,
type=str,
help='training and testing data file(default: None)')
parser.add_argument(
'--word2id',
default=None,
type=str,
help='word2id file (default: None)')
parser.add_argument(
'--ckpt_path', default=None, type=str, help='checkpoint path (default: None)')
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument(
'--train_percent',
default=0.5,
type=float,
help='train_percent(default: 0.5)')
parser.add_argument(
'--num_labels',
default=8,
type=int,
help='number of labels(default: 8)')
parser.add_argument(
'--epochs',
default=100,
type=int,
help='number of epochs for training (default: 100)')
parser.add_argument(
'--lr',
default=0.025,
type=float,
help='learning rate(default: 0.025)')
parser.add_argument(
'--embed_dim',
default=128,
type=int,
help='dimension of embedding(default: 128)')
args = parser.parse_args()
log_format = '%(asctime)s-%(levelname)s-%(name)s: %(message)s'
logging.basicConfig(level='INFO', format=log_format)
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements the sampler that generates meta-path random walk sequences
for training the metapath2vec model.
"""
import multiprocessing
from multiprocessing import Pool
from multiprocessing import Process
import argparse
import sys
import os
import numpy as np
import pickle as pkl
import tqdm
import time
import logging
import random
from pgl import heter_graph
from pgl.sample import metapath_randomwalk
from utils import *
class Sampler(object):
"""Implemetation of sampler in order to sample metapath random walk.
Args:
config: dict, some configure parameters.
"""
def __init__(self, config):
self.config = config
self.build_graph()
def build_graph(self):
"""Build pgl heterogeneous graph.
"""
self.conf_id2index, self.conf_name2index, conf_node_type = self.remapping_id(
self.config['data_path'] + 'id_conf.txt',
start_index=0,
node_type='conf')
logging.info('%d venues have been loaded.' % (len(self.conf_id2index)))
self.author_id2index, self.author_name2index, author_node_type = self.remapping_id(
self.config['data_path'] + 'id_author.txt',
start_index=len(self.conf_id2index),
node_type='author')
logging.info('%d authors have been loaded.' %
(len(self.author_id2index)))
self.paper_id2index, self.paper_name2index, paper_node_type = self.remapping_id(
self.config['data_path'] + 'paper.txt',
start_index=(len(self.conf_id2index) + len(self.author_id2index)),
node_type='paper',
separator='\t')
logging.info('%d papers have been loaded.' %
(len(self.paper_id2index)))
node_types = conf_node_type + author_node_type + paper_node_type
num_nodes = len(node_types)
edges_by_types = {}
paper_author_edges = self.load_edges(
self.config['data_path'] + 'paper_author.txt', self.paper_id2index,
self.author_id2index)
paper_conf_edges = self.load_edges(
self.config['data_path'] + 'paper_conf.txt', self.paper_id2index,
self.conf_id2index)
# edges_by_types['edge'] = paper_author_edges + paper_conf_edges
edges_by_types['p2c'] = paper_conf_edges
edges_by_types['c2p'] = [(dst, src) for src, dst in paper_conf_edges]
edges_by_types['p2a'] = paper_author_edges
edges_by_types['a2p'] = [(dst, src) for src, dst in paper_author_edges]
# logging.info('%d edges have been loaded.' %
# (len(edges_by_types['edge'])))
node_features = {
'index': np.array([i for i in range(num_nodes)]).reshape(
-1, 1).astype(np.int64)
}
self.graph = heter_graph.HeterGraph(
num_nodes=num_nodes,
edges=edges_by_types,
node_types=node_types,
node_feat=node_features)
def remapping_id(self, file_, start_index, node_type, separator='\t'):
"""Mapp the ID and name of nodes to index.
"""
node_types = []
id2index = {}
name2index = {}
index = start_index
with open(file_, encoding="ISO-8859-1") as reader:
for line in reader:
tokens = line.strip().split(separator)
id2index[tokens[0]] = index
if len(tokens) == 2:
name2index[tokens[1]] = index
node_types.append((index, node_type))
index += 1
return id2index, name2index, node_types
def load_edges(self, file_, src2index, dst2index, symmetry=False):
"""Load edges from file.
"""
edges = []
with open(file_, 'r') as reader:
for line in reader:
items = line.strip().split()
src, dst = src2index[items[0]], dst2index[items[1]]
edges.append((src, dst))
if symmetry:
edges.append((dst, src))
edges = list(set(edges))
return edges
def generate_multi_class_data(self, name_label_file):
"""Mapp the data that will be used in multi class task to index.
"""
if 'author' in name_label_file:
name2index = self.author_name2index
else:
name2index = self.conf_name2index
index_label_list = []
with open(name_label_file, encoding="ISO-8859-1") as reader:
for line in reader:
tokens = line.strip().split(' ')
name, label = tokens[0], int(tokens[1])
index = name2index[name]
index_label_list.append((index, label))
return index_label_list
def walk_generator(graph, batch_size, metapath, n_type, walk_length):
"""Generate metapath random walk.
"""
np.random.seed(os.getpid())
while True:
for start_nodes in graph.node_batch_iter(
batch_size=batch_size, n_type=n_type):
walks = metapath_randomwalk(
graph=graph,
start_nodes=start_nodes,
metapath=metapath,
walk_length=walk_length)
yield walks
def walk_to_files(g, batch_size, metapath, n_type, walk_length, max_num,
filename):
"""Generate metapath randomwalk and save in files"""
# g, batch_size, metapath, n_type, walk_length, max_num, filename = args
with open(filename, 'w') as writer:
cc = 0
for walks in walk_generator(g, batch_size, metapath, n_type,
walk_length):
for walk in walks:
writer.write("%s\n" % "\t".join([str(i) for i in walk]))
cc += 1
if cc == max_num:
return
return
def multiprocess_generate_walks_to_files(graph, n_type, meta_path, num_walks,
walk_length, batch_size,
num_sample_workers, saved_path):
"""Use multiprocess to generate metapath random walk to files.
"""
num_nodes_by_type = graph.num_nodes_by_type(n_type)
logging.info("num_nodes_by_type: %s" % num_nodes_by_type)
max_num = (num_walks * num_nodes_by_type // num_sample_workers) + 1
logging.info("max sample number of every worker: %s" % max_num)
args = []
for i in range(num_sample_workers):
filename = os.path.join(saved_path, 'part-%05d' % (i))
args.append((graph, batch_size, meta_path, n_type, walk_length,
max_num, filename))
ps = []
for i in range(num_sample_workers):
p = Process(target=walk_to_files, args=args[i])
p.start()
ps.append(p)
for i in range(num_sample_workers):
ps[i].join()
# pool = Pool(num_sample_workers)
# pool.map(walk_to_files, args)
# pool.close()
# pool.join()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='metapath2vec')
parser.add_argument(
'-c',
'--config',
default=None,
type=str,
help='config file path (default: None)')
args = parser.parse_args()
if args.config:
# load config file
config = Config(args.config, isCreate=False, isSave=False)
config = config()
config = config['sampler']['args']
else:
raise AssertionError(
"Configuration file need to be specified. Add '-c config.yaml', for example."
)
log_format = '%(asctime)s-%(levelname)s-%(name)s: %(message)s'
logging.basicConfig(level="INFO", format=log_format)
logging.info(config)
if not os.path.exists(config['output_path']):
os.makedirs(config['output_path'])
config['walk_saved_path'] = config['output_path'] + config[
'walk_saved_path']
if not os.path.exists(config['walk_saved_path']):
os.makedirs(config['walk_saved_path'])
sampler = Sampler(config)
begin = time.time()
logging.info('multi process sampling')
multiprocess_generate_walks_to_files(
graph=sampler.graph,
n_type=config['first_node_type'],
meta_path=config['metapath'],
num_walks=config['num_walks'],
walk_length=config['walk_length'],
batch_size=config['walk_batch_size'],
num_sample_workers=config['num_sample_workers'],
saved_path=config['walk_saved_path'], )
logging.info('total time: %.4f' % (time.time() - begin))
logging.info('generating multi class data')
word_label_list = sampler.generate_multi_class_data(config[
'author_label_file'])
with open(config['output_path'] + config['new_author_label_file'],
'w') as writer:
for line in word_label_list:
line = [str(i) for i in line]
writer.write(' '.join(line) + '\n')
word_label_list = sampler.generate_multi_class_data(config[
'venue_label_file'])
with open(config['output_path'] + config['new_venue_label_file'],
'w') as writer:
for line in word_label_list:
line = [str(i) for i in line]
writer.write(' '.join(line) + '\n')
logging.info('finished')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements a class for model configuration.
"""
import datetime
import os
import yaml
import random
import shutil
class Config(object):
"""Implementation of Config class for model configure.
Args:
config_file(str): configure filename, which is a yaml file.
isCreate(bool): if true, create the necessary directories for saving models, logs and other outputs.
isSave(bool): if true, save the config file to record the configuration.
"""
def __init__(self, config_file, isCreate=False, isSave=False):
self.config_file = config_file
self.config = self.get_config_from_yaml(config_file)
if isCreate:
self.create_necessary_dirs()
if isSave:
self.save_config_file()
def get_config_from_yaml(self, yaml_file):
"""Get the configure hyperparameters from yaml file.
"""
try:
with open(yaml_file, 'r') as f:
config = yaml.load(f, Loader=yaml.FullLoader)
except Exception:
raise IOError("Error in parsing config file '%s'" % yaml_file)
return config
def create_necessary_dirs(self):
"""Create some necessary directories to save some important files.
"""
time_stamp = datetime.datetime.now().strftime('%m%d_%H%M')
self.config['trainer']['args']['log_dir'] = ''.join(
(self.config['trainer']['args']['log_dir'],
self.config['task_name'], '/')) # , '.%s/' % (time_stamp)))
self.config['trainer']['args']['save_dir'] = ''.join(
(self.config['trainer']['args']['save_dir'],
self.config['task_name'], '/')) # , '.%s/' % (time_stamp)))
self.config['trainer']['args']['output_dir'] = ''.join(
(self.config['trainer']['args']['output_dir'],
self.config['task_name'], '/')) # , '.%s/' % (time_stamp)))
# if os.path.exists(self.config['trainer']['args']['save_dir']):
# input('save_dir is existed, do you really want to continue?')
self.make_dir(self.config['trainer']['args']['log_dir'])
self.make_dir(self.config['trainer']['args']['save_dir'])
self.make_dir(self.config['trainer']['args']['output_dir'])
def save_config_file(self):
"""Save config file so that we can know the config when we look back
"""
filename = self.config_file.split('/')[-1]
targetpath = self.config['trainer']['args']['save_dir']
shutil.copyfile(self.config_file, targetpath + filename)
def make_dir(self, path):
"""Build directory"""
if not os.path.exists(path):
os.makedirs(path)
def __getitem__(self, key):
"""Return the configure dict"""
return self.config[key]
def __call__(self):
"""__call__"""
return self.config
# PGL Examples for node2vec
# node2vec: Scalable Feature Learning for Networks
[Node2vec](https://cs.stanford.edu/~jure/pubs/node2vec-kdd16.pdf) is an algorithmic framework for representation learning on graphs. Given any graph, it can learn continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks. Based on PGL, we reproduce the node2vec algorithm and reach the same level of performance as reported in the paper.
## Datasets
The datasets contain two networks: [BlogCatalog](http://socialcomputing.asu.edu/datasets/BlogCatalog3) and [Arxiv](http://snap.stanford.edu/data/ca-AstroPh.html).
......
# PGL - Knowledge Graph Embedding
This package is mainly for computing node and relation embeddings of knowledge graphs efficiently.
It reproduces the following knowledge graph embedding models (a minimal TransE scoring sketch follows the list):
- TransE
- TransR
- RotatE
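As a rough, hedged illustration of what these models optimize (a NumPy sketch, not the PGL-KG implementation), TransE scores a triple (h, r, t) as plausible when the translated head h + r lies close to the tail t:

```python
import numpy as np

def transe_score(head, relation, tail, norm=1):
    """TransE plausibility score ||h + r - t||; lower means more plausible."""
    return np.linalg.norm(head + relation - tail, ord=norm, axis=-1)

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 8))      # toy 8-dimensional embeddings
print(transe_score(h, r, t))           # score of a random (likely corrupted) triple
print(transe_score(h, r, h + r))       # a perfectly translated triple scores 0.0
```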
### Dataset
The WN18 and FB15k datasets were originally published with the TransE paper and can be downloaded [here](https://everest.hds.utc.fr/doku.php?id=en:transe).
FB15k: [https://drive.google.com/open?id=19I3LqaKjgq-3vOs0us7OgEL06TIs37W8](https://drive.google.com/open?id=19I3LqaKjgq-3vOs0us7OgEL06TIs37W8)
WN18: [https://drive.google.com/open?id=1MXy257ZsjeXQHZScHLeQeVnUTPjltlwD](https://drive.google.com/open?id=1MXy257ZsjeXQHZScHLeQeVnUTPjltlwD)
### Dependencies
If you want to use PGL-KG with Paddle, please install the following packages.
- paddlepaddle>=1.7
- pgl
### Hyperparameters
- use\_cuda: use CUDA to train.
- model: the PGL-KG model name. `TransE`, `TransR` and `RotatE` are currently available.
- data\_dir: the path of the dataset.
- optimizer: the optimizer used to train the model.
- batch\_size: batch size.
- learning\_rate: learning rate.
- epoch: number of epochs to run.
- evaluate\_per\_iteration: evaluate every given number of epochs.
- sample\_workers: number of sampling workers used to prepare data.
- margin: the margin hyperparameter used by some models.
For more hyperparameter usage, please refer to `main.py`. We also provide a `run.sh` script to reproduce the performance results (please download the datasets into `./data` and specify the data\_dir parameter).
### How to run
For example, to train the TransR model on the WN18 dataset with GPU
(please download the WN18 dataset to the `./data` folder first):
```
python main.py --use_cuda --model TransR --data_dir ./data/WN18
```
The `run.sh` script reproduces the following performance results.
### Experiment results
Here we report the experiment results on the FB15k and WN18 datasets. The evaluation metrics are MR (mean rank), MRR (mean reciprocal rank) and Hits@N (the proportion of correct entities ranked in the top N). The suffix `@f` denotes the filtered setting, in which triples already present in the dataset are filtered out before ranking. A short sketch of how these metrics are derived from the per-triple ranks is given below the tables.
FB15k dataset
| Models | MR | MRR | Hits@1 | Hits@3 | Hits@10 | MR@f | MRR@f | Hits@1@f | Hits@3@f | Hits@10@f |
|--------|----|-----|--------|--------|---------|------|-------|----------|----------|-----------|
| TransE | 215 | 0.205 | 0.093 | 0.234 | 0.446 | 74 |0.379| 0.235| 0.453| 0.647 |
| TransR | 304 | 0.193 | 0.092 | 0.211 | 0.418 | 156 |0.366| 0.232| 0.435| 0.623 |
| RotatE | 157 | 0.270 | 0.162 | 0.303 | 0.501 | 53 |0.478| 0.354| 0.547| 0.710 |
WN18 dataset
| Models | MR | MRR | Hits@1 | Hits@3 | Hits@10 | MR@f | MRR@f | Hits@1@f | Hits@3@f | Hits@10@f |
|--------|----|-----|--------|--------|---------|------|-------|----------|----------|-----------|
| TransE | 219 | 0.338 | 0.082 | 0.523 | 0.800 | 208 |0.463| 0.135| 0.771| 0.932 |
| TransR | 321 | 0.370 | 0.096 | 0.591 | 0.810 | 309 |0.513| 0.158| 0.941| 0.941 |
| RotatE | 167 | 0.623 | 0.476 | 0.688 | 0.830 | 155 |0.915| 0.884| 0.941| 0.957 |
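The sketch below (hedged; the `ranks` array is made up) shows how the reported metrics are derived from the per-triple ranks produced by the evaluator in this package:

```python
import numpy as np

ranks = np.array([1, 3, 2, 15, 1, 120, 7])   # hypothetical filtered ranks

print("MR :", ranks.mean())                  # mean rank
print("MRR:", (1.0 / ranks).mean())          # mean reciprocal rank
for k in (1, 3, 10):
    print("Hits@%d: %.4f" % (k, (ranks <= k).mean()))
```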
## References
[1]. [TransE: Translating embeddings for modeling multi-relational data.](https://ieeexplore.ieee.org/abstract/document/8047276)
[2]. [TransR: Learning entity and relation embeddings for knowledge graph completion.](http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/viewFile/9571/9523)
[3]. [RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space.](https://arxiv.org/abs/1902.10197)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Loader for the knowledge graph dataset.
"""
import os
import numpy as np
from collections import defaultdict
from pgl.utils.logger import log
#from pybloom import BloomFilter
class KGLoader:
"""
Load a knowledge graph dataset such as FB15k or WN18.
"""
def __init__(self, data_dir, batch_size, neg_mode, neg_times):
"""init"""
self.name = os.path.split(data_dir)[-1]
self._feed_list = ["pos_triple", "neg_triple"]
self._data_dir = data_dir
self._batch_size = batch_size
self._neg_mode = neg_mode
self._neg_times = neg_times
self._entity2id = {}
self._relation2id = {}
self.training_triple_pool = set()
self._triple_train = None
self._triple_test = None
self._triple_valid = None
self.entity_total = 0
self.relation_total = 0
self.train_num = 0
self.test_num = 0
self.valid_num = 0
self.load_data()
def test_data_batch(self, batch_size=None):
"""
Test data reader.
:param batch_size: Todo: batch_size > 1.
:return: None
"""
for i in range(self.test_num):
data = np.array(self._triple_test[i])
data = data.reshape((-1))
yield [data]
def training_data_no_filter(self, train_triple_positive):
"""faster, no filter for exists triples"""
size = len(train_triple_positive) * self._neg_times
train_triple_negative = train_triple_positive.repeat(
self._neg_times, axis=0)
replace_head_probability = 0.5 * np.ones(size)
replace_entity_id = np.random.randint(self.entity_total, size=size)
random_num = np.random.random(size=size)
# index_t == 1 corrupts the head entity; index_t == 0 corrupts the tail entity
index_t = (random_num < replace_head_probability) * 1
train_triple_negative[:, 0] = train_triple_negative[:, 0] + (
replace_entity_id - train_triple_negative[:, 0]) * index_t
train_triple_negative[:, 2] = replace_entity_id + (
train_triple_negative[:, 2] - replace_entity_id) * index_t
train_triple_positive = np.expand_dims(train_triple_positive, axis=2)
train_triple_negative = np.expand_dims(train_triple_negative, axis=2)
return train_triple_positive, train_triple_negative
def training_data_map(self, train_triple_positive):
"""
Map function for negative sampling.
:param train_triple_positive: the positive triples.
:return: the positive and negative triples.
"""
size = len(train_triple_positive)
train_triple_negative = []
for i in range(size):
corrupt_head_prob = np.random.binomial(1, 0.5)
head_neg = train_triple_positive[i][0]
relation = train_triple_positive[i][1]
tail_neg = train_triple_positive[i][2]
for j in range(0, self._neg_times):
sample = train_triple_positive[i] + 0
while True:
rand_id = np.random.randint(self.entity_total)
if corrupt_head_prob:
if (rand_id, relation, tail_neg
) not in self.training_triple_pool:
sample[0] = rand_id
train_triple_negative.append(sample)
break
else:
if (head_neg, relation, rand_id
) not in self.training_triple_pool:
sample[2] = rand_id
train_triple_negative.append(sample)
break
train_triple_positive = np.expand_dims(train_triple_positive, axis=2)
train_triple_negative = np.expand_dims(train_triple_negative, axis=2)
if self._neg_mode:
return train_triple_positive, train_triple_negative, np.array(
[corrupt_head_prob], dtype="float32")
return train_triple_positive, train_triple_negative
def training_data_batch(self):
"""
Batch iterator over positive training triples.
:return: batches of positive triples.
"""
n = len(self._triple_train)
rand_idx = np.random.permutation(n)
n_triple = len(rand_idx)
start = 0
while start < n_triple:
end = min(start + self._batch_size, n_triple)
train_triple_positive = self._triple_train[rand_idx[start:end]]
start = end
yield train_triple_positive
def load_kg_triple(self, file):
"""
Read in kg files.
"""
triples = []
with open(os.path.join(self._data_dir, file), "r") as f:
for line in f.readlines():
line_list = line.strip().split('\t')
assert len(line_list) == 3
head = self._entity2id[line_list[0]]
tail = self._entity2id[line_list[1]]
relation = self._relation2id[line_list[2]]
triples.append((head, relation, tail))
return np.array(triples)
def load_data(self):
"""
load kg dataset.
"""
log.info("Start loading the {} dataset".format(self.name))
with open(os.path.join(self._data_dir, 'entity2id.txt'), "r") as f:
for line in f.readlines():
line = line.strip().split('\t')
self._entity2id[line[0]] = int(line[1])
with open(os.path.join(self._data_dir, 'relation2id.txt'), "r") as f:
for line in f.readlines():
line = line.strip().split('\t')
self._relation2id[line[0]] = int(line[1])
self._triple_train = self.load_kg_triple('train.txt')
self._triple_test = self.load_kg_triple('test.txt')
self._triple_valid = self.load_kg_triple('valid.txt')
self.relation_total = len(self._relation2id)
self.entity_total = len(self._entity2id)
self.train_num = len(self._triple_train)
self.test_num = len(self._triple_test)
self.valid_num = len(self._triple_valid)
#bloom_capacity = len(self._triple_train) + len(self._triple_test) + len(self._triple_valid)
#self.training_triple_pool = BloomFilter(capacity=bloom_capacity, error_rate=0.01)
for i in range(len(self._triple_train)):
self.training_triple_pool.add(
(self._triple_train[i, 0], self._triple_train[i, 1],
self._triple_train[i, 2]))
for i in range(len(self._triple_test)):
self.training_triple_pool.add(
(self._triple_test[i, 0], self._triple_test[i, 1],
self._triple_test[i, 2]))
for i in range(len(self._triple_valid)):
self.training_triple_pool.add(
(self._triple_valid[i, 0], self._triple_valid[i, 1],
self._triple_valid[i, 2]))
log.info('entity number: {}'.format(self.entity_total))
log.info('relation number: {}'.format(self.relation_total))
log.info('training triple number: {}'.format(self.train_num))
log.info('testing triple number: {}'.format(self.test_num))
log.info('valid triple number: {}'.format(self.valid_num))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Evaluator for the results of knowledge graph embeddings.
"""
import numpy as np
import timeit
from mp_mapper import mp_reader_mapper
from pgl.utils.logger import log
class Evaluate:
"""
Evaluate for trained models.
"""
def __init__(self, reader):
self.reader = reader
self.training_triple_pool = self.reader.training_triple_pool
@staticmethod
def rank_extract(results, training_triple_pool):
"""
:param results: the scores of test examples.
:param training_triple_pool: existing edges.
:return: the ranks.
"""
eval_triple, head_score, tail_score = results
head_order = np.argsort(head_score)
tail_order = np.argsort(tail_score)
head, relation, tail = eval_triple[0], eval_triple[1], eval_triple[2]
head_rank_raw = 1
tail_rank_raw = 1
head_rank_filter = 1
tail_rank_filter = 1
for candidate in head_order:
if candidate == head:
break
else:
head_rank_raw += 1
if (candidate, relation, tail) in training_triple_pool:
continue
else:
head_rank_filter += 1
for candidate in tail_order:
if candidate == tail:
break
else:
tail_rank_raw += 1
if (head, relation, candidate) in training_triple_pool:
continue
else:
tail_rank_filter += 1
return head_rank_raw, tail_rank_raw, head_rank_filter, tail_rank_filter
def launch_evaluation(self,
exe,
program,
reader,
fetch_list,
num_workers=4):
"""
launch_evaluation
:param exe: executor.
:param program: paddle program.
:param reader: test reader.
:param fetch_list: fetch list.
:param num_workers: num of workers.
:return: None
"""
def func(training_triple_pool):
"""func"""
def run_func(results):
"""run_func"""
return self.rank_extract(results, training_triple_pool)
return run_func
def iterator():
"""iterator"""
n_used_eval_triple = 0
start = timeit.default_timer()
for batch_feed_dict in reader():
head_score, tail_score = exe.run(program=program,
fetch_list=fetch_list,
feed=batch_feed_dict)
yield batch_feed_dict["test_triple"], head_score, tail_score
n_used_eval_triple += 1
if n_used_eval_triple % 500 == 0:
print('[{:.3f}s] #evaluation triple: {}/{}'.format(
timeit.default_timer(
) - start, n_used_eval_triple, self.reader.test_num))
res_reader = mp_reader_mapper(
reader=iterator,
func=func(self.training_triple_pool),
num_works=num_workers)
self.result(res_reader)
@staticmethod
def result(rank_result_iter):
"""
Calculate the final results.
:param rank_result_iter: results iter.
:return: None
"""
all_rank = [[], []]
for data in rank_result_iter():
for i in range(4):
all_rank[i // 2].append(data[i])
raw_rank = np.array(all_rank[0])
filter_rank = np.array(all_rank[1])
log.info("-----Raw-Average-Results")
log.info(
'MeanRank: {:.2f}, MRR: {:.4f}, Hits@1: {:.4f}, Hits@3: {:.4f}, Hits@10: {:.4f}'.
format(raw_rank.mean(), (1 / raw_rank).mean(), (raw_rank <= 1).
mean(), (raw_rank <= 3).mean(), (raw_rank <= 10).mean()))
log.info("-----Filter-Average-Results")
log.info(
'MeanRank: {:.2f}, MRR: {:.4f}, Hits@1: {:.4f}, Hits@3: {:.4f}, Hits@10: {:.4f}'.
format(filter_rank.mean(), (1 / filter_rank).mean(), (
filter_rank <= 1).mean(), (filter_rank <= 3).mean(), (
filter_rank <= 10).mean()))
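For reference, the MeanRank / MRR / Hits@k numbers logged above can be reproduced with plain NumPy. The sketch below is illustrative only; the rank values are made up.

```
# Minimal sketch of the metric formulas used in `Evaluate.result`.
import numpy as np

# Made-up filtered ranks of the correct entity for five test triples.
ranks = np.array([1, 3, 7, 42, 2], dtype="float64")

mean_rank = ranks.mean()                              # MeanRank
mrr = (1.0 / ranks).mean()                            # Mean Reciprocal Rank
hits = {k: (ranks <= k).mean() for k in (1, 3, 10)}   # Hits@k

print("MeanRank: {:.2f}, MRR: {:.4f}".format(mean_rank, mrr))
print("Hits@1: {:.4f}, Hits@3: {:.4f}, Hits@10: {:.4f}".format(
    hits[1], hits[3], hits[10]))
```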
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
The script to run these models.
"""
import argparse
import timeit
import os
import numpy as np
import paddle.fluid as fluid
from data_loader import KGLoader
from evalutate import Evaluate
from model import model_dict
from model.utils import load_var
from mp_mapper import mp_reader_mapper
from pgl.utils.logger import log
def run_round(batch_iter,
program,
exe,
fetch_list,
epoch,
prefix="train",
log_per_step=1000):
"""
Run the program for one epoch.
:param batch_iter: the batch_iter of prepared data.
:param program: the running program, train_program or test program.
:param exe: the executor of paddle.
:param fetch_list: the variables to fetch.
:param epoch: the epoch number of train process.
:param prefix: the prefix name, type `string`.
:param log_per_step: log per step.
:return: None
"""
batch = 0
tmp_epoch = 0
loss = 0
tmp_loss = 0
run_time = 0
data_time = 0
t2 = timeit.default_timer()
start_epoch_time = timeit.default_timer()
for batch_feed_dict in batch_iter():
batch += 1
t1 = timeit.default_timer()
data_time += (t1 - t2)
batch_fetch = exe.run(program,
fetch_list=fetch_list,
feed=batch_feed_dict)
if prefix == "train":
loss += batch_fetch[0]
tmp_loss += batch_fetch[0]
if batch % log_per_step == 0:
tmp_epoch += 1
if prefix == "train":
log.info("Epoch %s (%.7f sec) Train Loss: %.7f" %
(epoch + tmp_epoch,
timeit.default_timer() - start_epoch_time,
tmp_loss[0] / batch))
start_epoch_time = timeit.default_timer()
else:
log.info("Batch %s" % batch)
batch = 0
tmp_loss = 0
t2 = timeit.default_timer()
run_time += (t2 - t1)
if prefix == "train":
log.info("GPU run time {}, Data prepare extra time {}".format(
run_time, data_time))
log.info("Epoch %s \t All Loss %s" % (epoch + tmp_epoch, loss))
def train(args):
"""
Train the knowledge graph embedding model.
:param args: all args.
:return: None
"""
kgreader = KGLoader(
batch_size=args.batch_size,
data_dir=args.data_dir,
neg_mode=args.neg_mode,
neg_times=args.neg_times)
if args.model in model_dict:
Model = model_dict[args.model]
else:
raise ValueError("No model for name {}".format(args.model))
model = Model(
data_reader=kgreader,
hidden_size=args.hidden_size,
margin=args.margin,
learning_rate=args.learning_rate,
args=args,
optimizer=args.optimizer)
def iter_map_wrapper(data_batch, repeat=1):
"""
wrapper for multiprocess reader
:param data_batch: the source data iter.
:param repeat: repeat data for multi epoch
:return: iterator of feed data
"""
def data_repeat():
"""repeat data for multi epoch"""
for i in range(repeat):
for d in data_batch():
yield d
reader = mp_reader_mapper(
data_repeat,
func=kgreader.training_data_no_filter
if args.nofilter else kgreader.training_data_map,
num_works=args.sample_workers)
return reader
def iter_wrapper(data_batch, feed_list):
"""
Wrapper that builds the feed dict.
:param data_batch: the source data iter.
:param feed_list: the feed list (names of variables).
:return: iterator of feed data.
"""
def work():
"""work"""
for batch in data_batch():
feed_dict = {}
for k, v in zip(feed_list, batch):
feed_dict[k] = v
yield feed_dict
return work
loader = fluid.io.DataLoader.from_generator(
feed_list=model.train_feed_vars, capacity=20, iterable=True)
places = fluid.cuda_places() if args.use_cuda else fluid.cpu_places()
exe = fluid.Executor(places[0])
exe.run(model.startup_program)
exe.run(fluid.default_startup_program())
if args.pretrain and model.model_name in ["TransR", "transr"]:
pretrain_ent = os.path.join(args.checkpoint,
model.ent_name.replace("TransR", "TransE"))
pretrain_rel = os.path.join(args.checkpoint,
model.rel_name.replace("TransR", "TransE"))
if os.path.exists(pretrain_ent):
print("loading pretrain!")
#var = fluid.global_scope().find_var(model.ent_name)
load_var(exe, model.train_program, model.ent_name, pretrain_ent)
#var = fluid.global_scope().find_var(model.rel_name)
load_var(exe, model.train_program, model.rel_name, pretrain_rel)
else:
raise ValueError("pretrain file {} not exists!".format(
pretrain_ent))
prog = fluid.CompiledProgram(model.train_program).with_data_parallel(
loss_name=model.train_fetch_vars[0].name)
if args.only_evaluate:
s = timeit.default_timer()
fluid.io.load_params(
exe, dirname=args.checkpoint, main_program=model.train_program)
Evaluate(kgreader).launch_evaluation(
exe=exe,
reader=iter_wrapper(kgreader.test_data_batch,
model.test_feed_list),
fetch_list=model.test_fetch_vars,
program=model.test_program,
num_workers=10)
log.info(timeit.default_timer() - s)
return None
batch_iter = iter_map_wrapper(
kgreader.training_data_batch,
repeat=args.evaluate_per_iteration, )
loader.set_batch_generator(batch_iter, places=places)
for epoch in range(0, args.epoch // args.evaluate_per_iteration):
run_round(
batch_iter=loader,
exe=exe,
prefix="train",
# program=model.train_program,
program=prog,
fetch_list=model.train_fetch_vars,
log_per_step=kgreader.train_num // args.batch_size,
epoch=epoch * args.evaluate_per_iteration)
log.info("epoch\t%s" % ((1 + epoch) * args.evaluate_per_iteration))
fluid.io.save_params(
exe, dirname=args.checkpoint, main_program=model.train_program)
if not args.noeval:
eva = Evaluate(kgreader)
eva.launch_evaluation(
exe=exe,
reader=iter_wrapper(kgreader.test_data_batch,
model.test_feed_list),
fetch_list=model.test_fetch_vars,
program=model.test_program,
num_workers=10)
def main():
"""
The main entry of all.
:return: None
"""
parser = argparse.ArgumentParser(
description="Knowledge Graph Embedding for PGL")
parser.add_argument('--use_cuda', action='store_true', help="use_cuda")
parser.add_argument(
'--data_dir',
dest='data_dir',
type=str,
help='the directory of dataset',
default='./data/WN18/')
parser.add_argument(
'--model',
dest='model',
type=str,
help="model to run",
default="TransE")
parser.add_argument(
'--learning_rate',
dest='learning_rate',
type=float,
help='learning rate',
default=0.001)
parser.add_argument(
'--epoch', dest='epoch', type=int, help='epoch to run', default=400)
parser.add_argument(
'--sample_workers',
dest='sample_workers',
type=int,
help='sample workers',
default=4)
parser.add_argument(
'--batch_size',
dest='batch_size',
type=int,
help="batch size",
default=1000)
parser.add_argument(
'--optimizer',
dest='optimizer',
type=str,
help='optimizer',
default='adam')
parser.add_argument(
'--hidden_size',
dest='hidden_size',
type=int,
help='embedding dimension',
default=50)
parser.add_argument(
'--margin', dest='margin', type=float, help='margin', default=4.0)
parser.add_argument(
'--checkpoint',
dest='checkpoint',
type=str,
help='directory to save checkpoints',
default='output/')
parser.add_argument(
'--evaluate_per_iteration',
dest='evaluate_per_iteration',
type=int,
help='evaluate the training result every x iterations',
default=50)
parser.add_argument(
'--only_evaluate',
dest='only_evaluate',
action='store_true',
help='only run the evaluation program',
default=False)
parser.add_argument(
'--adv_temp_value', type=float, help='adv_temp_value', default=2.0)
parser.add_argument('--neg_times', type=int, help='neg_times', default=1)
parser.add_argument(
'--neg_mode', type=bool, help='return neg mode flag', default=False)
parser.add_argument(
'--nofilter',
type=bool,
help='don\'t filter invalid examples',
default=False)
parser.add_argument(
'--pretrain',
type=bool,
help='load pretrained TransE embeddings for the TransR model',
default=False)
parser.add_argument(
'--noeval',
type=bool,
help='skip evaluation after training',
default=False)
args = parser.parse_args()
log.info(args)
train(args)
if __name__ == '__main__':
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Base model of the knowledge graph embedding model.
"""
from paddle import fluid
class Model(object):
"""
Base model.
"""
def __init__(self, **kwargs):
"""
Init model
"""
# Needed parameters
self.model_name = kwargs["model_name"]
self.data_reader = kwargs["data_reader"]
self._hidden_size = kwargs["hidden_size"]
self._learning_rate = kwargs["learning_rate"]
self._optimizer = kwargs["optimizer"]
self.args = kwargs["args"]
# Optional parameters
if "margin" in kwargs:
self._margin = kwargs["margin"]
self._prefix = "%s_%s_dim=%d_" % (
self.model_name, self.data_reader.name, self._hidden_size)
self.ent_name = self._prefix + "entity_embeddings"
self.rel_name = self._prefix + "relation_embeddings"
self._entity_total = self.data_reader.entity_total
self._relation_total = self.data_reader.relation_total
self._ent_shape = [self._entity_total, self._hidden_size]
self._rel_shape = [self._relation_total, self._hidden_size]
def construct(self):
"""
Construct the program
:return: None
"""
self.startup_program = fluid.Program()
self.train_program = fluid.Program()
self.test_program = fluid.Program()
with fluid.program_guard(self.train_program, self.startup_program):
self.train_pos_input = fluid.layers.data(
"pos_triple",
dtype="int64",
shape=[None, 3, 1],
append_batch_size=False)
self.train_neg_input = fluid.layers.data(
"neg_triple",
dtype="int64",
shape=[None, 3, 1],
append_batch_size=False)
self.train_feed_list = ["pos_triple", "neg_triple"]
self.train_feed_vars = [self.train_pos_input, self.train_neg_input]
self.train_fetch_vars = self.construct_train_program()
loss = self.train_fetch_vars[0]
self.apply_optimizer(loss, opt=self._optimizer)
with fluid.program_guard(self.test_program, self.startup_program):
self.test_input = fluid.layers.data(
"test_triple",
dtype="int64",
shape=[3],
append_batch_size=False)
self.test_feed_list = ["test_triple"]
self.test_fetch_vars = self.construct_test_program()
def apply_optimizer(self, loss, opt="sgd"):
"""
Construct the backward of the train program.
:param loss: `type : variable` final loss of the model.
:param opt: `type : string` the optimizer name
:return:
"""
optimizer_available = {
"adam": fluid.optimizer.Adam,
"sgd": fluid.optimizer.SGD,
"momentum": fluid.optimizer.Momentum
}
if opt in optimizer_available:
opt_func = optimizer_available[opt]
else:
opt_func = None
if opt_func is None:
raise ValueError("You should chose the optimizer in %s" %
optimizer_available.keys())
else:
optimizer = opt_func(learning_rate=self._learning_rate)
return optimizer.minimize(loss)
def construct_train_program(self):
"""
This function should construct the train program with the `self.train_pos_input`
and `self.train_neg_input`. These inputs are batch of triples.
:return: List of variables to fetch. Make sure the loss variable comes
first, e.g. [loss, variable1, variable2, ...].
"""
raise NotImplementedError(
"You should define the construct_train_program"
" function before use it!")
def construct_test_program(self):
"""
This function should construct the test (or evaluate) program with `self.test_input`.
For now, only a single triple at a time is supported when evaluating the ranks.
:return: the distances of all entities to the test triple (for both the head and the tail entity).
"""
raise NotImplementedError(
"You should define the construct_test_program"
" function before use it")
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
RotatE:
"RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space."
Sun, Zhiqing, et al.
https://arxiv.org/abs/1902.10197
"""
import paddle.fluid as fluid
from .Model import Model
from .utils import lookup_table
class RotatE(Model):
"""
RotatE model.
"""
def __init__(self,
data_reader,
hidden_size,
margin,
learning_rate,
args,
optimizer="adam"):
super(RotatE, self).__init__(
model_name="RotatE",
data_reader=data_reader,
hidden_size=hidden_size,
margin=margin,
learning_rate=learning_rate,
args=args,
optimizer=optimizer)
self._neg_times = self.args.neg_times
self._adv_temp_value = self.args.adv_temp_value
self._relation_hidden_size = self._hidden_size
self._entity_hidden_size = self._hidden_size * 2
self._entity_embedding_margin = (
self._margin + 2) / self._entity_hidden_size
self._relation_embedding_margin = (
self._margin + 2) / self._relation_hidden_size
self._rel_shape = [self._relation_total, self._relation_hidden_size]
self._ent_shape = [self._entity_total, self._entity_hidden_size]
self._pi = 3.141592654
self.construct_program()
def construct_program(self):
"""
construct the main program for train and test
"""
self.startup_program = fluid.Program()
self.train_program = fluid.Program()
self.test_program = fluid.Program()
with fluid.program_guard(self.train_program, self.startup_program):
self.train_pos_input = fluid.layers.data(
"pos_triple",
dtype="int64",
shape=[None, 3, 1],
append_batch_size=False)
self.train_neg_input = fluid.layers.data(
"neg_triple",
dtype="int64",
shape=[None, 3, 1],
append_batch_size=False)
self.train_neg_mode = fluid.layers.data(
"neg_mode",
dtype='float32',
shape=[1],
append_batch_size=False)
self.train_feed_vars = [
self.train_pos_input, self.train_neg_input, self.train_neg_mode
]
self.train_fetch_vars = self.construct_train_program()
loss = self.train_fetch_vars[0]
self.apply_optimizer(loss, opt=self._optimizer)
with fluid.program_guard(self.test_program, self.startup_program):
self.test_input = fluid.layers.data(
"test_triple",
dtype="int64",
shape=[3],
append_batch_size=False)
self.test_feed_list = ["test_triple"]
self.test_fetch_vars = self.construct_test_program()
def creat_share_variables(self):
"""
Share variables for train and test programs.
"""
entity_embedding = fluid.layers.create_parameter(
shape=self._ent_shape,
dtype="float32",
name=self.ent_name,
default_initializer=fluid.initializer.Uniform(
low=-1.0 * self._entity_embedding_margin,
high=1.0 * self._entity_embedding_margin))
relation_embedding = fluid.layers.create_parameter(
shape=self._rel_shape,
dtype="float32",
name=self.rel_name,
default_initializer=fluid.initializer.Uniform(
low=-1.0 * self._relation_embedding_margin,
high=1.0 * self._relation_embedding_margin))
return entity_embedding, relation_embedding
def score_with_l2_normalize(self, head, tail, rel, epsilon_var,
train_neg_mode):
"""
Score function of RotatE
"""
one_var = fluid.layers.fill_constant(
shape=[1], dtype='float32', value=1.0)
re_head, im_head = fluid.layers.split(head, num_or_sections=2, dim=-1)
re_tail, im_tail = fluid.layers.split(tail, num_or_sections=2, dim=-1)
phase_relation = rel / (self._relation_embedding_margin / self._pi)
re_relation = fluid.layers.cos(phase_relation)
im_relation = fluid.layers.sin(phase_relation)
re_score = re_relation * re_tail + im_relation * im_tail
im_score = re_relation * im_tail - im_relation * re_tail
re_score = re_score - re_head
im_score = im_score - im_head
#with fluid.layers.control_flow.Switch() as switch:
# with switch.case(train_neg_mode == one_var):
# re_score = re_relation * re_tail + im_relation * im_tail
# im_score = re_relation * im_tail - im_relation * re_tail
# re_score = re_score - re_head
# im_score = im_score - im_head
# with switch.default():
# re_score = re_head * re_relation - im_head * im_relation
# im_score = re_head * im_relation + im_head * re_relation
# re_score = re_score - re_tail
# im_score = im_score - im_tail
re_score = re_score * re_score
im_score = im_score * im_score
score = re_score + im_score
score = score + epsilon_var
score = fluid.layers.sqrt(score)
score = fluid.layers.reduce_sum(score, dim=-1)
return self._margin - score
def adverarial_weight(self, score):
"""
Compute the self-adversarial weights via a softmax over the negative scores.
"""
adv_score = self._adv_temp_value * score
adv_softmax = fluid.layers.softmax(adv_score)
return adv_softmax
def construct_train_program(self):
"""
Construct train program
"""
zero_var = fluid.layers.fill_constant(
shape=[1], dtype='float32', value=0.0)
epsilon_var = fluid.layers.fill_constant(
shape=[1], dtype='float32', value=1e-12)
entity_embedding, relation_embedding = self.creat_share_variables()
pos_head = lookup_table(self.train_pos_input[:, 0], entity_embedding)
pos_tail = lookup_table(self.train_pos_input[:, 2], entity_embedding)
pos_rel = lookup_table(self.train_pos_input[:, 1], relation_embedding)
neg_head = lookup_table(self.train_neg_input[:, 0], entity_embedding)
neg_tail = lookup_table(self.train_neg_input[:, 2], entity_embedding)
neg_rel = lookup_table(self.train_neg_input[:, 1], relation_embedding)
pos_score = self.score_with_l2_normalize(pos_head, pos_tail, pos_rel,
epsilon_var, zero_var)
neg_score = self.score_with_l2_normalize(
neg_head, neg_tail, neg_rel, epsilon_var, self.train_neg_mode)
neg_score = fluid.layers.reshape(
neg_score, shape=[-1, self._neg_times], inplace=True)
if self._adv_temp_value > 0.0:
sigmoid_pos_score = fluid.layers.logsigmoid(1.0 * pos_score)
sigmoid_neg_score = fluid.layers.logsigmoid(
-1.0 * neg_score) * self.adverarial_weight(neg_score)
sigmoid_neg_score = fluid.layers.reduce_sum(
sigmoid_neg_score, dim=-1)
else:
sigmoid_pos_score = fluid.layers.logsigmoid(pos_score)
sigmoid_neg_score = fluid.layers.logsigmoid(-1.0 * neg_score)
loss_1 = fluid.layers.mean(sigmoid_pos_score)
loss_2 = fluid.layers.mean(sigmoid_neg_score)
loss = -1.0 * (loss_1 + loss_2) / 2
return [loss]
def score_with_l2_normalize_with_validate(self, entity_embedding, head,
rel, tail, epsilon_var):
"""
the score function for validation
"""
re_entity_embedding, im_entity_embedding = fluid.layers.split(
entity_embedding, num_or_sections=2, dim=-1)
re_head, im_head = fluid.layers.split(head, num_or_sections=2, dim=-1)
re_tail, im_tail = fluid.layers.split(tail, num_or_sections=2, dim=-1)
phase_relation = rel / (self._relation_embedding_margin / self._pi)
re_relation = fluid.layers.cos(phase_relation)
im_relation = fluid.layers.sin(phase_relation)
re_score = re_relation * re_tail + im_relation * im_tail
im_score = re_relation * im_tail - im_relation * re_tail
re_score = re_entity_embedding - re_score
im_score = im_entity_embedding - im_score
re_score = re_score * re_score
im_score = im_score * im_score
head_score = re_score + im_score
head_score += epsilon_var
head_score = fluid.layers.sqrt(head_score)
head_score = fluid.layers.reduce_sum(head_score, dim=-1)
re_score = re_head * re_relation - im_head * im_relation
im_score = re_head * im_relation + im_head * re_relation
re_score = re_entity_embedding - re_score
im_score = im_entity_embedding - im_score
re_score = re_score * re_score
im_score = im_score * im_score
tail_score = re_score + im_score
tail_score += epsilon_var
tail_score = fluid.layers.sqrt(tail_score)
tail_score = fluid.layers.reduce_sum(tail_score, dim=-1)
return head_score, tail_score
def construct_test_program(self):
"""
Construct test program
"""
epsilon_var = fluid.layers.fill_constant(
shape=[1], dtype='float32', value=1e-12)
entity_embedding, relation_embedding = self.creat_share_variables()
head_vec = lookup_table(self.test_input[0], entity_embedding)
rel_vec = lookup_table(self.test_input[1], relation_embedding)
tail_vec = lookup_table(self.test_input[2], entity_embedding)
head_vec = fluid.layers.unsqueeze(head_vec, axes=[0])
rel_vec = fluid.layers.unsqueeze(rel_vec, axes=[0])
tail_vec = fluid.layers.unsqueeze(tail_vec, axes=[0])
id_replace_head, id_replace_tail = self.score_with_l2_normalize_with_validate(
entity_embedding, head_vec, rel_vec, tail_vec, epsilon_var)
id_replace_head = fluid.layers.logsigmoid(id_replace_head)
id_replace_tail = fluid.layers.logsigmoid(id_replace_tail)
return [id_replace_head, id_replace_tail]
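As a side note, the score above is the RotatE rotation in complex space: entities are complex vectors, relations are element-wise unit rotations, and the score is margin minus the summed moduli of h∘r − t. The NumPy sketch below is illustrative only, with made-up values.

```
# Illustrative NumPy view of the RotatE score (not part of the model code).
import numpy as np


def rotate_score(head, rel_phase, tail, margin):
    """head/tail: complex vectors; rel_phase: real phases of a unit rotation."""
    rotation = np.exp(1j * rel_phase)      # |rotation| == 1 element-wise
    diff = head * rotation - tail          # rotate the head, compare to the tail
    return margin - np.abs(diff).sum()


dim = 4
rng = np.random.default_rng(0)
head = rng.normal(size=dim) + 1j * rng.normal(size=dim)
phase = rng.uniform(-np.pi, np.pi, size=dim)
tail = head * np.exp(1j * phase)           # a triple that fits the relation exactly
print(rotate_score(head, phase, tail, margin=8.0))   # ~8.0, i.e. distance ~0
```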
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
TransE:
"Translating embeddings for modeling multi-relational data."
Bordes, Antoine, et al.
https://www.utc.fr/~bordesan/dokuwiki/_media/en/transe_nips13.pdf
"""
import paddle.fluid as fluid
from .Model import Model
from .utils import lookup_table
class TransE(Model):
"""
The TransE Model.
"""
def __init__(self,
data_reader,
hidden_size,
margin,
learning_rate,
args,
optimizer="adam"):
self._neg_times = args.neg_times
super(TransE, self).__init__(
model_name="TransE",
data_reader=data_reader,
hidden_size=hidden_size,
margin=margin,
learning_rate=learning_rate,
args=args,
optimizer=optimizer)
self.construct()
def creat_share_variables(self):
"""
Share variables for train and test programs.
"""
entity_embedding = fluid.layers.create_parameter(
shape=self._ent_shape, dtype="float32", name=self.ent_name)
relation_embedding = fluid.layers.create_parameter(
shape=self._rel_shape, dtype="float32", name=self.rel_name)
return entity_embedding, relation_embedding
@staticmethod
def score_with_l2_normalize(head, rel, tail):
"""
Score function of TransE
"""
head = fluid.layers.l2_normalize(head, axis=-1)
rel = fluid.layers.l2_normalize(rel, axis=-1)
tail = fluid.layers.l2_normalize(tail, axis=-1)
score = head + rel - tail
return score
def construct_train_program(self):
"""
Construct train program.
"""
entity_embedding, relation_embedding = self.creat_share_variables()
pos_head = lookup_table(self.train_pos_input[:, 0], entity_embedding)
pos_tail = lookup_table(self.train_pos_input[:, 2], entity_embedding)
pos_rel = lookup_table(self.train_pos_input[:, 1], relation_embedding)
neg_head = lookup_table(self.train_neg_input[:, 0], entity_embedding)
neg_tail = lookup_table(self.train_neg_input[:, 2], entity_embedding)
neg_rel = lookup_table(self.train_neg_input[:, 1], relation_embedding)
pos_score = self.score_with_l2_normalize(pos_head, pos_rel, pos_tail)
neg_score = self.score_with_l2_normalize(neg_head, neg_rel, neg_tail)
pos = fluid.layers.reduce_sum(
fluid.layers.abs(pos_score), 1, keep_dim=False)
neg = fluid.layers.reduce_sum(
fluid.layers.abs(neg_score), 1, keep_dim=False)
neg = fluid.layers.reshape(
neg, shape=[-1, self._neg_times], inplace=True)
loss = fluid.layers.reduce_mean(
fluid.layers.relu(pos - neg + self._margin))
return [loss]
def construct_test_program(self):
"""
Construct test program
"""
entity_embedding, relation_embedding = self.creat_share_variables()
entity_embedding = fluid.layers.l2_normalize(entity_embedding, axis=-1)
relation_embedding = fluid.layers.l2_normalize(
relation_embedding, axis=-1)
head_vec = lookup_table(self.test_input[0], entity_embedding)
rel_vec = lookup_table(self.test_input[1], relation_embedding)
tail_vec = lookup_table(self.test_input[2], entity_embedding)
# The paddle fluid.layers.topk GPU OP is very inefficient, so we do the
# sort operation in the evaluation step using multiprocessing.
id_replace_head = fluid.layers.reduce_sum(
fluid.layers.abs(entity_embedding + rel_vec - tail_vec), dim=1)
id_replace_tail = fluid.layers.reduce_sum(
fluid.layers.abs(entity_embedding - rel_vec - head_vec), dim=1)
return [id_replace_head, id_replace_tail]
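For intuition, the objective above is the classic TransE margin ranking loss over L1 distances of L2-normalized embeddings. The NumPy sketch below spells it out; values are made up and purely illustrative.

```
# Illustrative NumPy view of the TransE score and loss (not part of the model code).
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def transe_distance(h, r, t):
    """L1 distance ||h + r - t||_1 on L2-normalized embeddings."""
    h, r, t = l2_normalize(h), l2_normalize(r), l2_normalize(t)
    return np.abs(h + r - t).sum(axis=-1)


rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 8))           # one positive triple
t_neg = rng.normal(size=8)                  # a corrupted (negative) tail
pos = transe_distance(h, r, t)
neg = transe_distance(h, r, t_neg)
margin = 4.0
loss = np.maximum(pos - neg + margin, 0.0)  # margin ranking (hinge) loss
print(pos, neg, loss)
```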
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
TransR:
"Learning entity and relation embeddings for knowledge graph completion."
Lin, Yankai, et al.
https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9571/9523
"""
import numpy as np
import paddle.fluid as fluid
from .Model import Model
from .utils import lookup_table
class TransR(Model):
"""
TransR model.
"""
def __init__(self,
data_reader,
hidden_size,
margin,
learning_rate,
args,
optimizer="adam"):
"""init"""
self._neg_times = args.neg_times
super(TransR, self).__init__(
model_name="TransR",
data_reader=data_reader,
hidden_size=hidden_size,
margin=margin,
learning_rate=learning_rate,
args=args,
optimizer=optimizer)
self.construct()
def creat_share_variables(self):
"""
Share variables for train and test programs.
"""
entity_embedding = fluid.layers.create_parameter(
shape=self._ent_shape,
dtype="float32",
name=self.ent_name,
default_initializer=fluid.initializer.Xavier())
relation_embedding = fluid.layers.create_parameter(
shape=self._rel_shape,
dtype="float32",
name=self.rel_name,
default_initializer=fluid.initializer.Xavier())
init_values = np.tile(
np.identity(
self._hidden_size, dtype="float32").reshape(-1),
(self._relation_total, 1))
transfer_matrix = fluid.layers.create_parameter(
shape=[
self._relation_total, self._hidden_size * self._hidden_size
],
dtype="float32",
name=self._prefix + "transfer_matrix",
default_initializer=fluid.initializer.NumpyArrayInitializer(
init_values))
return entity_embedding, relation_embedding, transfer_matrix
def score_with_l2_normalize(self, head, rel, tail):
"""
Score function of TransR
"""
head = fluid.layers.l2_normalize(head, axis=-1)
rel = fluid.layers.l2_normalize(rel, axis=-1)
tail = fluid.layers.l2_normalize(tail, axis=-1)
score = head + rel - tail
return score
@staticmethod
def matmul_with_expend_dims(x, y):
"""matmul_with_expend_dims"""
x = fluid.layers.unsqueeze(x, axes=[1])
res = fluid.layers.matmul(x, y)
return fluid.layers.squeeze(res, axes=[1])
def construct_train_program(self):
"""
Construct train program
"""
entity_embedding, relation_embedding, transfer_matrix = self.creat_share_variables(
)
pos_head = lookup_table(self.train_pos_input[:, 0], entity_embedding)
pos_tail = lookup_table(self.train_pos_input[:, 2], entity_embedding)
pos_rel = lookup_table(self.train_pos_input[:, 1], relation_embedding)
neg_head = lookup_table(self.train_neg_input[:, 0], entity_embedding)
neg_tail = lookup_table(self.train_neg_input[:, 2], entity_embedding)
neg_rel = lookup_table(self.train_neg_input[:, 1], relation_embedding)
rel_matrix = fluid.layers.reshape(
lookup_table(self.train_pos_input[:, 1], transfer_matrix),
[-1, self._hidden_size, self._hidden_size])
pos_head_trans = self.matmul_with_expend_dims(pos_head, rel_matrix)
pos_tail_trans = self.matmul_with_expend_dims(pos_tail, rel_matrix)
trans_neg = True
if trans_neg:
rel_matrix_neg = fluid.layers.reshape(
lookup_table(self.train_neg_input[:, 1], transfer_matrix),
[-1, self._hidden_size, self._hidden_size])
neg_head_trans = self.matmul_with_expend_dims(neg_head,
rel_matrix_neg)
neg_tail_trans = self.matmul_with_expend_dims(neg_tail,
rel_matrix_neg)
else:
neg_head_trans = self.matmul_with_expend_dims(neg_head, rel_matrix)
neg_tail_trans = self.matmul_with_expend_dims(neg_tail, rel_matrix)
pos_score = self.score_with_l2_normalize(pos_head_trans, pos_rel,
pos_tail_trans)
neg_score = self.score_with_l2_normalize(neg_head_trans, neg_rel,
neg_tail_trans)
pos = fluid.layers.reduce_sum(
fluid.layers.abs(pos_score), -1, keep_dim=False)
neg = fluid.layers.reduce_sum(
fluid.layers.abs(neg_score), -1, keep_dim=False)
neg = fluid.layers.reshape(
neg, shape=[-1, self._neg_times], inplace=True)
loss = fluid.layers.reduce_mean(
fluid.layers.relu(pos - neg + self._margin))
return [loss]
def construct_test_program(self):
"""
Construct test program
"""
entity_embedding, relation_embedding, transfer_matrix = self.creat_share_variables(
)
rel_matrix = fluid.layers.reshape(
lookup_table(self.test_input[1], transfer_matrix),
[self._hidden_size, self._hidden_size])
entity_embedding_trans = fluid.layers.matmul(entity_embedding,
rel_matrix, False, False)
rel_vec = lookup_table(self.test_input[1], relation_embedding)
entity_embedding_trans = fluid.layers.l2_normalize(
entity_embedding_trans, axis=-1)
rel_vec = fluid.layers.l2_normalize(rel_vec, axis=-1)
head_vec = lookup_table(self.test_input[0], entity_embedding_trans)
tail_vec = lookup_table(self.test_input[2], entity_embedding_trans)
# The paddle fluid.layers.topk GPU OP is very inefficient, so we do the
# sort operation in the evaluation step using multiprocessing.
id_replace_head = fluid.layers.reduce_sum(
fluid.layers.abs(entity_embedding_trans + rel_vec - tail_vec),
dim=1)
id_replace_tail = fluid.layers.reduce_sum(
fluid.layers.abs(entity_embedding_trans - rel_vec - head_vec),
dim=1)
return [id_replace_head, id_replace_tail]
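For intuition, the NumPy sketch below (illustrative, made-up values) shows the TransR idea implemented above: project head and tail through a relation-specific matrix, which starts close to the identity as in `creat_share_variables`, then score with the usual translational distance.

```
# Illustrative NumPy view of the TransR projection (not part of the model code).
import numpy as np

dim = 4
rng = np.random.default_rng(0)
# The transfer matrix starts near the identity, mirroring the initializer above.
M_r = np.identity(dim, dtype="float32") + \
    0.01 * rng.normal(size=(dim, dim)).astype("float32")
h, r, t = rng.normal(size=(3, dim)).astype("float32")

h_r = h @ M_r                               # project head into the relation space
t_r = t @ M_r                               # project tail into the relation space
score = np.abs(h_r + r - t_r).sum()         # translational distance in that space
print(score)
```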
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""import all models"""
from .TransE import TransE
from .TransR import TransR
from .RotatE import RotatE
model_dict = {
"TransE": TransE,
"transe": TransE,
"TransR": TransR,
"transr": TransR,
"RotatE": RotatE
}
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Utils for the models.
"""
import paddle.fluid as fluid
from paddle.fluid.layer_helper import LayerHelper
def lookup_table(input, embedding_table, dtype='float32'):
"""
lookup table support for paddle.
:param input: int64 ids to look up.
:param embedding_table: the embedding parameter to look up from.
:param dtype: data type of the output.
:return: the looked-up embeddings.
"""
is_sparse = False
is_distributed = False
helper = LayerHelper('embedding', **locals())
remote_prefetch = is_sparse and (not is_distributed)
if remote_prefetch:
assert is_sparse is True and is_distributed is False
tmp = helper.create_variable_for_type_inference(dtype)
padding_idx = -1
helper.append_op(
type='lookup_table',
inputs={'Ids': input,
'W': embedding_table},
outputs={'Out': tmp},
attrs={
'is_sparse': is_sparse,
'is_distributed': is_distributed,
'remote_prefetch': remote_prefetch,
'padding_idx': padding_idx
})
return tmp
def lookup_table_gather(index, input):
"""
lookup table support for paddle by gather.
:param index: the indices to gather.
:param input: the tensor to gather from.
:return: the gathered rows.
"""
return fluid.layers.gather(index=index, input=input, overwrite=False)
def _clone_var_in_block_(block, var):
assert isinstance(var, fluid.Variable)
if var.desc.type() == fluid.core.VarDesc.VarType.LOD_TENSOR:
return block.create_var(
name=var.name,
shape=var.shape,
dtype=var.dtype,
type=var.type,
lod_level=var.lod_level,
persistable=True)
else:
return block.create_var(
name=var.name,
shape=var.shape,
dtype=var.dtype,
type=var.type,
persistable=True)
def load_var(executor, main_program=None, var=None, filename=None):
"""
Load a single variable from file into the given program.
:param executor: the executor.
:param main_program: the program that owns the variable.
:param var: the variable name in main_program.
:param filename: the file to load the variable from.
:return: None
"""
load_prog = fluid.Program()
load_block = load_prog.global_block()
if main_program is None:
main_program = fluid.default_main_program()
if not isinstance(main_program, fluid.Program):
raise TypeError("program should be as Program type or None")
vars = list(filter(None, main_program.list_vars()))
# save origin param shape
orig_para_shape = {}
load_var_map = {}
for each_var in vars:
if each_var.name != var:
continue
assert isinstance(each_var, fluid.Variable)
if each_var.type == fluid.core.VarDesc.VarType.RAW:
continue
if isinstance(each_var, fluid.framework.Parameter):
orig_para_shape[each_var.name] = tuple(each_var.desc.get_shape())
new_var = _clone_var_in_block_(load_block, each_var)
if filename is not None:
load_block.append_op(
type='load',
inputs={},
outputs={'Out': [new_var]},
attrs={'file_path': filename})
executor.run(load_prog)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file uses multiprocessing to perform the following process:
`
for data in reader():
yield func(data)
`
"""
#encoding=utf8
import numpy as np
import multiprocessing as mp
import traceback
from pgl.utils.logger import log
def mp_reader_mapper(reader, func, num_works=4):
"""
This function uses multiprocessing to perform the following process:
`
for data in reader():
yield func(data)
`
The input stream is `reader`; the mapper applies `func` to map it to an output stream.
Please ensure that `func` returns a meaningful value, not `None`!
:param reader: the data iterator.
:param func: the mapping function.
:param num_works: number of worker processes.
:return: a new iterator.
"""
def _read_into_pipe(func, conn):
"""
read into pipe, and use the `func` to get final data.
"""
while True:
data = conn.recv()
if data is None:
conn.send(None)
conn.close()
break
conn.send(func(data))
def pipe_reader():
"""pipe_reader"""
conns = []
all_process = []
for w in range(num_works):
parent_conn, child_conn = mp.Pipe()
conns.append(parent_conn)
p = mp.Process(target=_read_into_pipe, args=(func, child_conn))
p.start()
all_process.append(p)
data_iter = reader()
if not hasattr(data_iter, "__next__"):
__next__ = data_iter.next
else:
__next__ = data_iter.__next__
def next_data():
"""next_data"""
_next = None
try:
_next = __next__()
except StopIteration:
# log.debug(traceback.format_exc())
pass
except Exception:
log.debug(traceback.format_exc())
return _next
for i in range(num_works):
conns[i].send(next_data())
finish_num = 0
finish_flag = np.zeros(len(conns), dtype="int32")
while finish_num < num_works:
for conn_id, conn in enumerate(conns):
if finish_flag[conn_id] > 0:
continue
sample = conn.recv()
if sample is None:
finish_num += 1
conn.close()
finish_flag[conn_id] = 1
else:
yield sample
conns[conn_id].send(next_data())
return pipe_reader
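A possible usage of this helper is sketched below with a toy reader and map function; it assumes a fork-based multiprocessing start method (e.g. Linux) and is illustrative only.

```
# Usage sketch for mp_reader_mapper (not part of the module).
if __name__ == "__main__":

    def number_reader():
        """A toy input stream: yields 0..9."""
        for i in range(10):
            yield i

    def square(x):
        """The map function; must return something other than None."""
        return x * x

    mapped = mp_reader_mapper(number_reader, func=square, num_works=2)
    # `mapped` is itself a generator factory, just like `reader`.
    print(sorted(mapped()))
```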
device=3
CUDA_VISIBLE_DEVICES=$device \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python main.py \
--use_cuda \
--model TransE \
--data_dir ./data/FB15k \
--optimizer adam \
--batch_size=1024 \
--learning_rate=0.001 \
--epoch 200 \
--evaluate_per_iteration 200 \
--sample_workers 1 \
--margin 1.0 \
--nofilter True \
--neg_times 10 \
--neg_mode True
#--only_evaluate
# TransE FB15k
# -----Raw-Average-Results
# MeanRank: 214.94, MRR: 0.2051, Hits@1: 0.0929, Hits@3: 0.2343, Hits@10: 0.4458
# -----Filter-Average-Results
# MeanRank: 74.41, MRR: 0.3793, Hits@1: 0.2351, Hits@3: 0.4538, Hits@10: 0.6570
CUDA_VISIBLE_DEVICES=$device \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python main.py \
--use_cuda \
--model TransE \
--data_dir ./data/WN18 \
--optimizer adam \
--batch_size=1024 \
--learning_rate=0.001 \
--epoch 100 \
--evaluate_per_iteration 100 \
--sample_workers 1 \
--margin 4 \
--nofilter True \
--neg_times 10 \
--neg_mode True
# TransE WN18
# -----Raw-Average-Results
# MeanRank: 219.08, MRR: 0.3383, Hits@1: 0.0821, Hits@3: 0.5233, Hits@10: 0.7997
# -----Filter-Average-Results
# MeanRank: 207.72, MRR: 0.4631, Hits@1: 0.1349, Hits@3: 0.7708, Hits@10: 0.9315
# for pretrain
CUDA_VISIBLE_DEVICES=$device \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python main.py \
--use_cuda \
--model TransE \
--data_dir ./data/FB15k \
--optimizer adam \
--batch_size=512 \
--learning_rate=0.001 \
--epoch 30 \
--evaluate_per_iteration 30 \
--sample_workers 1 \
--margin 2.0 \
--nofilter True \
--noeval True \
--neg_times 10 \
--neg_mode True && \
CUDA_VISIBLE_DEVICES=$device \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python main.py \
--use_cuda \
--model TransR \
--data_dir ./data/FB15k \
--optimizer adam \
--batch_size=512 \
--learning_rate=0.001 \
--epoch 200 \
--evaluate_per_iteration 200 \
--sample_workers 1 \
--margin 2.0 \
--pretrain True \
--nofilter True \
--neg_times 10 \
--neg_mode True
# FB15k TransR 200, pretrain 20
# -----Raw-Average-Results
# MeanRank: 303.81, MRR: 0.1931, Hits@1: 0.0920, Hits@3: 0.2109, Hits@10: 0.4181
# -----Filter-Average-Results
# MeanRank: 156.30, MRR: 0.3663, Hits@1: 0.2318, Hits@3: 0.4352, Hits@10: 0.6231
# for pretrain
CUDA_VISIBLE_DEVICES=$device \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python main.py \
--use_cuda \
--model TransE \
--data_dir ./data/WN18 \
--optimizer adam \
--batch_size=512 \
--learning_rate=0.001 \
--epoch 30 \
--evaluate_per_iteration 30 \
--sample_workers 1 \
--margin 4.0 \
--nofilter True \
--noeval True \
--neg_times 10 \
--neg_mode True && \
CUDA_VISIBLE_DEVICES=$device \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python main.py \
--use_cuda \
--model TransR \
--data_dir ./data/WN18 \
--optimizer adam \
--batch_size=512 \
--learning_rate=0.001 \
--epoch 100 \
--evaluate_per_iteration 100 \
--sample_workers 1 \
--margin 4.0 \
--pretrain True \
--nofilter True \
--neg_times 10 \
--neg_mode True
# TransR WN18 100, pretrain 30
# -----Raw-Average-Results
# MeanRank: 321.41, MRR: 0.3706, Hits@1: 0.0955, Hits@3: 0.5906, Hits@10: 0.8099
# -----Filter-Average-Results
# MeanRank: 309.15, MRR: 0.5126, Hits@1: 0.1584, Hits@3: 0.8601, Hits@10: 0.9409
CUDA_VISIBLE_DEVICES=$device \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python main.py \
--use_cuda \
--model RotatE \
--data_dir ./data/FB15k \
--optimizer adam \
--batch_size=512 \
--learning_rate=0.001 \
--epoch 100 \
--evaluate_per_iteration 100 \
--sample_workers 10 \
--margin 8 \
--neg_times 10 \
--neg_mode True
# RotatE FB15k
# -----Raw-Average-Results
# MeanRank: 156.85, MRR: 0.2699, Hits@1: 0.1615, Hits@3: 0.3031, Hits@10: 0.5006
# -----Filter-Average-Results
# MeanRank: 53.35, MRR: 0.4776, Hits@1: 0.3537, Hits@3: 0.5473, Hits@10: 0.7062
CUDA_VISIBLE_DEVICES=$device \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python main.py \
--use_cuda \
--model RotatE \
--data_dir ./data/WN18 \
--optimizer adam \
--batch_size=512 \
--learning_rate=0.001 \
--epoch 100 \
--evaluate_per_iteration 100 \
--sample_workers 10 \
--margin 6 \
--neg_times 10 \
--neg_mode True
# RotatE WN18
# -----Raw-Average-Results
# MeanRank: 167.27, MRR: 0.6025, Hits@1: 0.4764, Hits@3: 0.6880, Hits@10: 0.8298
# -----Filter-Average-Results
# MeanRank: 155.23, MRR: 0.9145, Hits@1: 0.8843, Hits@3: 0.9412, Hits@10: 0.9570
# SGC: Simplifying Graph Convolutional Networks
[Simplifying Graph Convolutional Networks \(SGC\)](https://arxiv.org/pdf/1902.07153.pdf) is a simplified graph convolutional model for machine learning on graphs. Based on PGL, we reproduce the SGC algorithm and match the accuracy reported in the paper on citation network benchmarks.
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle 1.5
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.5 <br> (epoch time)|
| --- | --- | ---|
| Cora | 0.818 (paper: 0.810) | 0.0015s |
| Pubmed | 0.788 (paper: 0.789) | 0.0015s |
| Citeseer | 0.719 (paper: 0.719) | 0.0015s |
### How to run
For example, to train SGC on the Cora dataset with GPU:
```
python sgc.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset "cora", "citeseer", "pubmed".
- use_cuda: Use GPU if `--use_cuda` is specified.
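To make the "precompute, then fit a linear model" idea concrete, the following NumPy sketch (illustrative only, toy graph) shows what the `MessagePassing` step in the training script that follows effectively computes: K rounds of symmetrically normalized propagation, after which a single fully-connected layer is trained on the smoothed features.

```
# Minimal sketch of the SGC feature precomputation: X' = (D^-1/2 A D^-1/2)^K X.
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype="float32")   # toy adjacency matrix
X = np.eye(3, dtype="float32")               # toy node features

deg = A.sum(axis=1)
d_inv_sqrt = np.zeros_like(deg)
d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5   # D^{-1/2}, guarding isolated nodes
A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

K = 2
X_smooth = X
for _ in range(K):                           # K rounds of propagation
    X_smooth = A_hat @ X_smooth

print(X_smooth)  # these smoothed features feed a single linear (fc) classifier
```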
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implement the training process of SGC model with StaticGraphWrapper.
"""
import os
import argparse
import numpy as np
import random
import time
import pgl
from pgl import data_loader
from pgl.utils.logger import log
from pgl.utils import paddle_helper
import paddle.fluid as fluid
def load(name):
"""Load dataset."""
if name == 'cora':
dataset = data_loader.CoraDataset()
elif name == "pubmed":
dataset = data_loader.CitationDataset("pubmed", symmetry_edges=False)
elif name == "citeseer":
dataset = data_loader.CitationDataset("citeseer", symmetry_edges=False)
else:
raise ValueError(name + " dataset doesn't exists")
return dataset
def expand_data_dim(dataset):
"""Expand the dimension of data."""
train_index = dataset.train_index
train_label = np.expand_dims(dataset.y[train_index], -1)
train_index = np.expand_dims(train_index, -1)
val_index = dataset.val_index
val_label = np.expand_dims(dataset.y[val_index], -1)
val_index = np.expand_dims(val_index, -1)
test_index = dataset.test_index
test_label = np.expand_dims(dataset.y[test_index], -1)
test_index = np.expand_dims(test_index, -1)
return {
'train_index': train_index,
'train_label': train_label,
'val_index': val_index,
'val_label': val_label,
'test_index': test_index,
'test_label': test_label,
}
def MessagePassing(gw, feature, num_layers, norm=None):
"""Precomputing message passing.
"""
def send_src_copy(src_feat, dst_feat, edge_feat):
"""send_src_copy
"""
return src_feat["h"]
for _ in range(num_layers):
if norm is not None:
feature = feature * norm
msg = gw.send(send_src_copy, nfeat_list=[("h", feature)])
feature = gw.recv(msg, "sum")
if norm is not None:
feature = feature * norm
return feature
def pre_gather(features, name_prefix, node_index_val):
"""Get features with respect to node index.
"""
node_index, init = paddle_helper.constant(
"%s_node_index" % (name_prefix), dtype='int32', value=node_index_val)
logits = fluid.layers.gather(features, node_index)
return logits, init
def calculate_loss(name, np_cached_h, node_label_val, num_classes, args):
"""Calculate loss function.
"""
initializer = []
const_cached_h, init = paddle_helper.constant(
"const_%s_cached_h" % name, dtype='float32', value=np_cached_h)
initializer.append(init)
node_label, init = paddle_helper.constant(
"%s_node_label" % (name), dtype='int64', value=node_label_val)
initializer.append(init)
output = fluid.layers.fc(const_cached_h,
size=num_classes,
bias_attr=args.bias,
name='fc')
loss, probs = fluid.layers.softmax_with_cross_entropy(
logits=output, label=node_label, return_softmax=True)
loss = fluid.layers.mean(loss)
acc = None
if name != 'train':
acc = fluid.layers.accuracy(input=probs, label=node_label, k=1)
return {
'loss': loss,
'acc': acc,
'probs': probs,
'initializer': initializer
}
def main(args):
""""Main function."""
dataset = load(args.dataset)
# normalize
indegree = dataset.graph.indegree()
norm = np.zeros_like(indegree, dtype="float32")
norm[indegree > 0] = np.power(indegree[indegree > 0], -0.5)
dataset.graph.node_feat["norm"] = np.expand_dims(norm, -1)
data = expand_data_dim(dataset)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
precompute_program = fluid.Program()
startup_program = fluid.Program()
train_program = fluid.Program()
val_program = train_program.clone(for_test=True)
test_program = train_program.clone(for_test=True)
# precompute message passing and gather
initializer = []
with fluid.program_guard(precompute_program, startup_program):
gw = pgl.graph_wrapper.StaticGraphWrapper(
name="graph", place=place, graph=dataset.graph)
cached_h = MessagePassing(
gw,
gw.node_feat["words"],
num_layers=args.num_layers,
norm=gw.node_feat['norm'])
train_cached_h, init = pre_gather(cached_h, 'train',
data['train_index'])
initializer.append(init)
val_cached_h, init = pre_gather(cached_h, 'val', data['val_index'])
initializer.append(init)
test_cached_h, init = pre_gather(cached_h, 'test', data['test_index'])
initializer.append(init)
exe = fluid.Executor(place)
gw.initialize(place)
for init in initializer:
init(place)
# get train features, val features and test features
np_train_cached_h, np_val_cached_h, np_test_cached_h = exe.run(
precompute_program,
feed={},
fetch_list=[train_cached_h, val_cached_h, test_cached_h],
return_numpy=True)
initializer = []
with fluid.program_guard(train_program, startup_program):
with fluid.unique_name.guard():
train_handle = calculate_loss('train', np_train_cached_h,
data['train_label'],
dataset.num_classes, args)
initializer += train_handle['initializer']
adam = fluid.optimizer.Adam(
learning_rate=args.lr,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=args.weight_decay))
adam.minimize(train_handle['loss'])
with fluid.program_guard(val_program, startup_program):
with fluid.unique_name.guard():
val_handle = calculate_loss('val', np_val_cached_h,
data['val_label'], dataset.num_classes,
args)
initializer += val_handle['initializer']
with fluid.program_guard(test_program, startup_program):
with fluid.unique_name.guard():
test_handle = calculate_loss('test', np_test_cached_h,
data['test_label'],
dataset.num_classes, args)
initializer += test_handle['initializer']
exe.run(startup_program)
for init in initializer:
init(place)
dur = []
for epoch in range(args.epochs):
if epoch >= 3:
t0 = time.time()
train_loss_t = exe.run(train_program,
feed={},
fetch_list=[train_handle['loss']],
return_numpy=True)[0]
if epoch >= 3:
time_per_epoch = 1.0 * (time.time() - t0)
dur.append(time_per_epoch)
val_loss_t, val_acc_t = exe.run(
val_program,
feed={},
fetch_list=[val_handle['loss'], val_handle['acc']],
return_numpy=True)
log.info("Epoch %d " % epoch + "(%.5lf sec) " % np.mean(
dur) + "Train Loss: %f " % train_loss_t + "Val Loss: %f " %
val_loss_t + "Val Acc: %f " % val_acc_t)
test_loss_t, test_acc_t = exe.run(
test_program,
feed={},
fetch_list=[test_handle['loss'], test_handle['acc']],
return_numpy=True)
log.info("Test Accuracy: %f" % test_acc_t)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='SGC')
parser.add_argument(
"--dataset",
type=str,
default="cora",
help="dataset (cora, pubmed, citeseer)")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument(
"--seed", type=int, default=1667, help="global random seed")
parser.add_argument("--lr", type=float, default=0.2, help="learning rate")
parser.add_argument(
"--weight_decay",
type=float,
default=0.000005,
help="Weight for L2 loss")
parser.add_argument(
"--bias", action='store_true', default=False, help="flag to use bias")
parser.add_argument(
"--epochs", type=int, default=200, help="number of training epochs")
parser.add_argument(
"--num_layers", type=int, default=2, help="number of SGC layers")
args = parser.parse_args()
log.info(args)
main(args)
......@@ -11,7 +11,7 @@ The datasets contain three citation networks: CORA, PUBMED, CITESEER. The detail
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- paddlepaddle>=1.6
- pgl
### Performance
......@@ -19,11 +19,11 @@ The datasets contain three citation networks: CORA, PUBMED, CITESEER. The detail
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)| examples/gat | Improvement |
| --- | --- | --- |---| --- | --- |
| Cora | ~83% | 0.0145s | 0.0119s | 0.0175s | 1.47x |
| Pubmed | ~78% | 0.0352s | 0.0193s |0.0295s | 1.53x |
| Citeseer | ~70% | 0.0148s | 0.0124s |0.0253s | 2.04x |
| Dataset | Accuracy | epoch time | examples/gat | Improvement |
| --- | --- | --- | --- | --- |
| Cora | ~83% | 0.0119s | 0.0175s | 1.47x |
| Pubmed | ~78% | 0.0193s |0.0295s | 1.53x |
| Citeseer | ~70% | 0.0124s |0.0253s | 2.04x |
### How to run
......
......@@ -84,7 +84,7 @@ def main(args):
initializer = []
with fluid.program_guard(train_program, startup_program):
train_node_index, init = paddle_helper.constant(
"train_node_index", dtype="int32", value=train_index)
"train_node_index", dtype="int64", value=train_index)
initializer.append(init)
train_node_label, init = paddle_helper.constant(
......@@ -103,7 +103,7 @@ def main(args):
with fluid.program_guard(val_program, startup_program):
val_node_index, init = paddle_helper.constant(
"val_node_index", dtype="int32", value=val_index)
"val_node_index", dtype="int64", value=val_index)
initializer.append(init)
val_node_label, init = paddle_helper.constant(
......@@ -119,7 +119,7 @@ def main(args):
with fluid.program_guard(test_program, startup_program):
test_node_index, init = paddle_helper.constant(
"test_node_index", dtype="int32", value=test_index)
"test_node_index", dtype="int64", value=test_index)
initializer.append(init)
test_node_label, init = paddle_helper.constant(
......
......@@ -10,7 +10,7 @@ The datasets contain three citation networks: CORA, PUBMED, CITESEER. The detail
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- paddlepaddle>=1.6
- pgl
### Performance
......@@ -18,12 +18,11 @@ The datasets contain three citation networks: CORA, PUBMED, CITESEER. The detail
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)| examples/gcn | Improvement |
| --- | --- | --- |---| --- | --- |
| Cora | ~81% | 0.0053s | 0.0047s | 0.0104s | 2.21x |
| Pubmed | ~79% | 0.0105s | 0.0049s |0.0154s | 3.14x |
| Citeseer | ~71% | 0.0051s | 0.0045s |0.0177s | 3.93x |
| Dataset | Accuracy | epoch time | examples/gcn | Improvement |
| --- | --- | --- | --- | --- |
| Cora | ~81% | 0.0047s | 0.0104s | 2.21x |
| Pubmed | ~79% | 0.0049s |0.0154s | 3.14x |
| Citeseer | ~71% | 0.0045s |0.0177s | 3.93x |
### How to run
......
......@@ -85,7 +85,7 @@ def main(args):
initializer = []
with fluid.program_guard(train_program, startup_program):
train_node_index, init = paddle_helper.constant(
"train_node_index", dtype="int32", value=train_index)
"train_node_index", dtype="int64", value=train_index)
initializer.append(init)
train_node_label, init = paddle_helper.constant(
......@@ -104,7 +104,7 @@ def main(args):
with fluid.program_guard(val_program, startup_program):
val_node_index, init = paddle_helper.constant(
"val_node_index", dtype="int32", value=val_index)
"val_node_index", dtype="int64", value=val_index)
initializer.append(init)
val_node_label, init = paddle_helper.constant(
......@@ -120,7 +120,7 @@ def main(args):
with fluid.program_guard(test_program, startup_program):
test_node_index, init = paddle_helper.constant(
"test_node_index", dtype="int32", value=test_index)
"test_node_index", dtype="int64", value=test_index)
initializer.append(init)
test_node_label, init = paddle_helper.constant(
......
# STGCN: Spatio-Temporal Graph Convolutional Network
[Spatio-Temporal Graph Convolutional Network \(STGCN\)](https://arxiv.org/pdf/1709.04875.pdf) is a deep learning framework for time-series prediction on graphs. Based on PGL, we reproduce the STGCN algorithm to predict newly confirmed patients in a set of cities from historical migration records.
### Datasets
You can build your own dataset in the following format:
* input.csv: Historical migration records with shape of [num\_time\_steps * num\_cities].
* output.csv: Newly confirmed patient records with shape of [num\_time\_steps * num\_cities].
* W.csv: Weighted adjacency matrix with shape of [num\_cities * num\_cities].
* city.csv: Each line contains a city index and the corresponding city name.
### Dependencies
- paddlepaddle 1.6
- pgl 1.0.0
### How to run
For example, to train STGCN on your dataset with GPU:
```
python main.py --use_cuda --input_file dataset/input.csv --label_file dataset/output.csv --adj_mat_file dataset/W.csv --city_file dataset/city.csv
```
#### Hyperparameters
- n\_route: Number of cities.
- n\_his: Number of previous time steps of historical migration records used as model input.
- n\_pred: Number of future time steps of newly confirmed patient records to predict.
- Ks: Number of GCN layers.
- Kt: Kernel size of the temporal convolution.
- use\_cuda: Use GPU if `--use_cuda` is specified.
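For intuition, the following NumPy sketch (illustrative only, made-up values, simplified relative to `data_gen_mydata` further down) shows how the input and label series are stacked into training windows of shape [n\_his + n\_pred, n\_route, 1].

```
# Simplified sketch of the sliding-window construction used by the data loader.
import numpy as np

n_route, n_his, n_pred = 4, 5, 1
T = 20                                               # number of time steps
x = np.arange(T * n_route, dtype="float32").reshape(T, n_route)  # stands in for input.csv
y = x + 100.0                                                     # stands in for output.csv

windows = []
for i in range(T - n_his - n_pred + 1):
    frame = np.concatenate([x[i:i + n_his], y[i:i + n_pred]])     # n_his inputs + n_pred labels
    windows.append(frame.reshape(n_his + n_pred, n_route, 1))
data = np.stack(windows)
print(data.shape)                 # (num_samples, n_his + n_pred, n_route, 1)
```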
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""__init__"""
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""data processing
"""
import numpy as np
import pandas as pd
from utils.math_utils import z_score
class Dataset(object):
"""Dataset
"""
def __init__(self, data, stats):
self.__data = data
self.mean = stats['mean']
self.std = stats['std']
def get_data(self, type): # type: train, val or test
return self.__data[type]
def get_stats(self):
return {'mean': self.mean, 'std': self.std}
def get_len(self, type):
return len(self.__data[type])
def z_inverse(self, type):
return self.__data[type] * self.std + self.mean
def seq_gen(len_seq, data_seq, offset, n_frame, n_route, day_slot, C_0=1):
"""Generate data in the form of standard sequence unit."""
n_slot = day_slot - n_frame + 1
tmp_seq = np.zeros((len_seq * n_slot, n_frame, n_route, C_0))
for i in range(len_seq):
for j in range(n_slot):
sta = (i + offset) * day_slot + j
end = sta + n_frame
tmp_seq[i * n_slot + j, :, :, :] = np.reshape(
data_seq[sta:end, :], [n_frame, n_route, C_0])
return tmp_seq
def adj_matrx_gen_custom(input_file, city_file):
"""genenrate Adjacency Matrix from file
"""
print("generate adj_matrix data (take long time)...")
# data
df = pd.read_csv(
input_file,
sep='\t',
names=['date', '迁出省份', '迁出城市', '迁入省份', '迁入城市', '人数'])
# Keep only the data from 2020
df['date'] = pd.to_datetime(df['date'], format="%Y%m%d")
df = df.set_index('date')
df = df['2020']
city_df = pd.read_csv(city_file)
# Drop Wuhan (the first row)
city_df = city_df.drop(0)
num = len(city_df)
matrix = np.zeros([num, num])
for i in city_df['city']:
for j in city_df['city']:
if (i == j):
continue
# Select the daily number of people migrating from city i to city j
cut = df[df['迁出城市'].str.contains(i)]
cut = cut[cut['迁入城市'].str.contains(j)]
# Use the mean as the edge weight
average = cut['人数'].mean()
# Assign the weight to the adjacency matrix
i_index = int(city_df[city_df['city'] == i]['num']) - 1
j_index = int(city_df[city_df['city'] == j]['num']) - 1
matrix[i_index, j_index] = average
np.savetxt("dataset/W_74.csv", matrix, delimiter=",")
def data_gen_custom(input_file, output_file, city_file, n, n_his, n_pred,
n_config):
"""data_gen_custom"""
print("generate training data...")
# data
df = pd.read_csv(
input_file,
sep='\t',
names=['date', '迁出省份', '迁出城市', '迁入省份', '迁入城市', '人数'])
# Keep only the data from 2020
df['date'] = pd.to_datetime(df['date'], format="%Y%m%d")
df = df.set_index('date')
df = df['2020']
city_df = pd.read_csv(city_file)
input_df = pd.DataFrame()
out_df_wuhan = df[df['迁出城市'].str.contains('武汉')]
for i in city_df['city']:
# Filter the records whose destination city is i
in_df_i = out_df_wuhan[out_df_wuhan['迁入城市'].str.contains(i)]
# Ensure ascending time order
# in_df_i.sort_values("date",inplace=True)
# Insert the column in time order
in_df_i.reset_index(drop=True, inplace=True)
input_df[i] = in_df_i['人数']
# Replace NaN values with 0
input_df = input_df.replace(np.nan, 0)
x = input_df
y = pd.read_csv(output_file)
# Drop the unnamed first column
x.drop(
x.columns[x.columns.str.contains(
'unnamed', case=False)],
axis=1,
inplace=True)
y = y.drop(columns=['date'])
# Drop the columns of migration into Wuhan
x = x.drop(columns=['武汉'])
y = y.drop(columns=['武汉'])
# param
n_val, n_test = n_config
n_train = len(y) - n_val - n_test - 2
# (?,26,74,1)
df = pd.DataFrame(columns=x.columns)
for i in range(len(y) - n_pred + 1):
df = df.append(x[i:i + n_his])
df = df.append(y[i:i + n_pred])
data = df.values.reshape(-1, n_his + n_pred, n,
1) # n == num_nodes == city num
x_stats = {'mean': np.mean(data), 'std': np.std(data)}
x_train = data[:n_train]
x_val = data[n_train:n_train + n_val]
x_test = data[n_train + n_val:]
x_data = {'train': x_train, 'val': x_val, 'test': x_test}
dataset = Dataset(x_data, x_stats)
print("generate successfully!")
return dataset
def data_gen_mydata(input_file, label_file, n, n_his, n_pred, n_config):
"""data processing
"""
# data
x = pd.read_csv(input_file)
y = pd.read_csv(label_file)
x = x.drop(columns=['date'])
y = y.drop(columns=['date'])
x = x.drop(columns=['武汉'])
y = y.drop(columns=['武汉'])
# param
n_val, n_test = n_config
n_train = len(y) - n_val - n_test - 2
# (?,26,74,1)
df = pd.DataFrame(columns=x.columns)
for i in range(len(y) - n_pred + 1):
df = df.append(x[i:i + n_his])
df = df.append(y[i:i + n_pred])
data = df.values.reshape(-1, n_his + n_pred, n, 1)
x_stats = {'mean': np.mean(data), 'std': np.std(data)}
x_train = data[:n_train]
x_val = data[n_train:n_train + n_val]
x_test = data[n_train + n_val:]
x_data = {'train': x_train, 'val': x_val, 'test': x_test}
dataset = Dataset(x_data, x_stats)
return dataset
def data_gen(file_path, data_config, n_route, n_frame=21, day_slot=288):
"""Source file load and dataset generation."""
n_train, n_val, n_test = data_config
# generate training, validation and test data
try:
data_seq = pd.read_csv(file_path, header=None).values
except FileNotFoundError:
print(f'ERROR: input file was not found in {file_path}.')
raise
seq_train = seq_gen(n_train, data_seq, 0, n_frame, n_route, day_slot)
seq_val = seq_gen(n_val, data_seq, n_train, n_frame, n_route, day_slot)
seq_test = seq_gen(n_test, data_seq, n_train + n_val, n_frame, n_route,
day_slot)
# x_stats: dict, the stats for the train dataset, including the value of mean and standard deviation.
x_stats = {'mean': np.mean(seq_train), 'std': np.std(seq_train)}
# x_train, x_val, x_test: np.array, [sample_size, n_frame, n_route, channel_size].
x_train = z_score(seq_train, x_stats['mean'], x_stats['std'])
x_val = z_score(seq_val, x_stats['mean'], x_stats['std'])
x_test = z_score(seq_test, x_stats['mean'], x_stats['std'])
x_data = {'train': x_train, 'val': x_val, 'test': x_test}
dataset = Dataset(x_data, x_stats)
return dataset
def gen_batch(inputs, batch_size, dynamic_batch=False, shuffle=False):
"""Data iterator in batch.
Args:
inputs: np.ndarray, [len_seq, n_frame, n_route, C_0], standard sequence units.
batch_size: int, size of batch.
dynamic_batch: bool, whether to shrink the last batch
if its length is less than the default batch size.
shuffle: bool, whether to shuffle the batches.
"""
len_inputs = len(inputs)
if shuffle:
idx = np.arange(len_inputs)
np.random.shuffle(idx)
for start_idx in range(0, len_inputs, batch_size):
end_idx = start_idx + batch_size
if end_idx > len_inputs:
if dynamic_batch:
end_idx = len_inputs
else:
break
if shuffle:
slide = idx[start_idx:end_idx]
else:
slide = slice(start_idx, end_idx)
yield inputs[slide]
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PGL Graph
"""
import sys
import os
import numpy as np
import pandas as pd
from pgl.graph import Graph
def weight_matrix(file_path, sigma2=0.1, epsilon=0.5, scaling=True):
"""Load weight matrix function."""
try:
W = pd.read_csv(file_path, header=None).values
except FileNotFoundError:
print(f'ERROR: input file was not found in {file_path}.')
raise
# check whether W is a 0/1 matrix.
if set(np.unique(W)) == {0, 1}:
print('The input graph is a 0/1 matrix; set "scaling" to False.')
scaling = False
if scaling:
n = W.shape[0]
W = W / 10000.
W2, W_mask = W * W, np.ones([n, n]) - np.identity(n)
# refer to Eq.10
return np.exp(-W2 / sigma2) * (
np.exp(-W2 / sigma2) >= epsilon) * W_mask
else:
return W
class GraphFactory(object):
"""GraphFactory"""
def __init__(self, args):
self.args = args
self.adj_matrix = weight_matrix(self.args.adj_mat_file)
L = np.eye(self.adj_matrix.shape[0]) + self.adj_matrix
D = np.sum(self.adj_matrix, axis=1)
# L = D - self.adj_matrix
# import ipdb; ipdb.set_trace()
edges = []
weights = []
for i in range(self.adj_matrix.shape[0]):
for j in range(self.adj_matrix.shape[1]):
edges.append([i, j])
weights.append(L[i][j])
self.edges = np.array(edges, dtype=np.int64)
self.weights = np.array(weights, dtype=np.float32).reshape(-1, 1)
self.norm = np.zeros_like(D, dtype=np.float32)
self.norm[D > 0] = np.power(D[D > 0], -0.5)
self.norm = self.norm.reshape(-1, 1)
def build_graph(self, x_batch):
"""build graph"""
B, T, n, _ = x_batch.shape
batch = B * T
batch_edges = []
for i in range(batch):
batch_edges.append(self.edges + (i * n))
batch_edges = np.vstack(batch_edges)
num_nodes = B * T * n
node_feat = {'norm': np.tile(self.norm, [batch, 1])}
edge_feat = {'weights': np.tile(self.weights, [batch, 1])}
graph = Graph(
num_nodes=num_nodes,
edges=batch_edges,
node_feat=node_feat,
edge_feat=edge_feat)
return graph
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file implements the training process of the STGCN model.
"""
import os
import sys
import time
import argparse
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as fl
import pgl
from pgl.utils.logger import log
from data_loader.data_utils import data_gen_mydata, gen_batch
from data_loader.graph import GraphFactory
from models.model import STGCNModel
from models.tester import model_inference, model_test
def main(args):
"""main"""
PeMS = data_gen_mydata(args.input_file, args.label_file, args.n_route,
args.n_his, args.n_pred, (args.n_val, args.n_test))
log.info(PeMS.get_stats())
log.info(PeMS.get_len('train'))
gf = GraphFactory(args)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
gw = pgl.graph_wrapper.GraphWrapper(
"gw",
place,
node_feat=[('norm', [None, 1], "float32")],
edge_feat=[('weights', [None, 1], "float32")])
model = STGCNModel(args, gw)
train_loss, y_pred = model.forward()
infer_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
epoch_step = int(PeMS.get_len('train') / args.batch_size) + 1
lr = fl.exponential_decay(
learning_rate=args.lr,
decay_steps=5 * epoch_step,
decay_rate=0.7,
staircase=True)
if args.opt == 'RMSProp':
train_op = fluid.optimizer.RMSPropOptimizer(lr).minimize(
train_loss)
elif args.opt == 'ADAM':
train_op = fluid.optimizer.Adam(lr).minimize(train_loss)
exe = fluid.Executor(place)
exe.run(startup_program)
if args.inf_mode == 'sep':
# for inference mode 'sep', the type of step index is int.
step_idx = args.n_pred - 1
tmp_idx = [step_idx]
min_val = min_va_val = np.array([4e1, 1e5, 1e5])
elif args.inf_mode == 'merge':
# for inference mode 'merge', the type of step index is np.ndarray.
step_idx = tmp_idx = np.arange(3, args.n_pred + 1, 3) - 1
min_val = min_va_val = np.array([4e1, 1e5, 1e5]) * len(step_idx)
else:
raise ValueError(f'ERROR: test mode "{args.inf_mode}" is not defined.')
step = 0
for epoch in range(1, args.epochs + 1):
for idx, x_batch in enumerate(
gen_batch(
PeMS.get_data('train'),
args.batch_size,
dynamic_batch=True,
shuffle=True)):
x = np.array(x_batch[:, 0:args.n_his, :, :], dtype=np.float32)
graph = gf.build_graph(x)
feed = gw.to_feed(graph)
feed['input'] = np.array(
x_batch[:, 0:args.n_his + 1, :, :], dtype=np.float32)
b_loss, b_lr = exe.run(train_program,
feed=feed,
fetch_list=[train_loss, lr])
if idx % 5 == 0:
log.info("epoch %d | step %d | lr %.6f | loss %.6f" %
(epoch, idx, b_lr[0], b_loss[0]))
min_va_val, min_val = \
model_inference(exe, gw, gf, infer_program, y_pred, PeMS, args, \
step_idx, min_va_val, min_val)
for ix in tmp_idx:
va, te = min_va_val[ix - 2:ix + 1], min_val[ix - 2:ix + 1]
print(f'Time Step {ix + 1}: '
f'MAPE {va[0]:7.3%}, {te[0]:7.3%}; '
f'MAE {va[1]:4.3f}, {te[1]:4.3f}; '
f'RMSE {va[2]:6.3f}, {te[2]:6.3f}.')
if epoch % 5 == 0:
model_test(exe, gw, gf, infer_program, y_pred, PeMS, args)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--n_route', type=int, default=74)
parser.add_argument('--n_his', type=int, default=23)
parser.add_argument('--n_pred', type=int, default=3)
parser.add_argument('--batch_size', type=int, default=10)
parser.add_argument('--epochs', type=int, default=100)
parser.add_argument('--save', type=int, default=10)
parser.add_argument('--Ks', type=int, default=3) #equal to num_layers
parser.add_argument('--Kt', type=int, default=3)
parser.add_argument('--lr', type=float, default=1e-2)
parser.add_argument('--keep_prob', type=float, default=1.0)
parser.add_argument('--opt', type=str, default='RMSProp')
parser.add_argument('--inf_mode', type=str, default='sep')
parser.add_argument('--input_file', type=str, default='dataset/input.csv')
parser.add_argument('--label_file', type=str, default='dataset/output.csv')
parser.add_argument(
'--city_file', type=str, default='dataset/crawl_list.csv')
parser.add_argument('--adj_mat_file', type=str, default='dataset/W_74.csv')
parser.add_argument('--output_path', type=str, default='./outputs/')
parser.add_argument('--n_val', type=str, default=1)
parser.add_argument('--n_test', type=str, default=1)
parser.add_argument('--use_cuda', action='store_true')
args = parser.parse_args()
blocks = [[1, 32, 64], [64, 32, 128]]
args.blocks = blocks
log.info(args)
if not os.path.exists(args.output_path):
os.makedirs(args.output_path)
main(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""This file implement the STGCN model.
"""
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as fl
import pgl
class STGCNModel(object):
"""Implementation of Spatio-Temporal Graph Convolutional Networks"""
def __init__(self, args, gw):
self.args = args
self.gw = gw
self.input = fl.data(
name="input",
shape=[None, args.n_his + 1, args.n_route, 1],
dtype="float32")
def forward(self):
"""forward"""
x = self.input[:, 0:self.args.n_his, :, :]
# Ko>0: kernel size of temporal convolution in the output layer.
Ko = self.args.n_his
# ST-Block
for i, channels in enumerate(self.args.blocks):
x = self.st_conv_block(
x,
self.args.Ks,
self.args.Kt,
channels,
"st_conv_%d" % i,
self.args.keep_prob,
act_func='GLU')
# output layer
if Ko > 1:
y = self.output_layer(x, Ko, 'output_layer')
else:
raise ValueError(f'ERROR: kernel size Ko must be greater than 1, \
but received "{Ko}".')
label = self.input[:, self.args.n_his:self.args.n_his + 1, :, :]
train_loss = fl.reduce_sum((y - label) * (y - label))
single_pred = y[:, 0, :, :] # shape: [batch, n, 1]
return train_loss, single_pred
def st_conv_block(self,
x,
Ks,
Kt,
channels,
name,
keep_prob,
act_func='GLU'):
"""Spatio-Temporal convolution block"""
c_si, c_t, c_oo = channels
x_s = self.temporal_conv_layer(
x, Kt, c_si, c_t, "%s_tconv_in" % name, act_func=act_func)
x_t = self.spatio_conv_layer(x_s, Ks, c_t, c_t, "%s_sonv" % name)
x_o = self.temporal_conv_layer(x_t, Kt, c_t, c_oo,
"%s_tconv_out" % name)
x_ln = fl.layer_norm(x_o)
return fl.dropout(x_ln, dropout_prob=(1.0 - keep_prob))
def temporal_conv_layer(self, x, Kt, c_in, c_out, name, act_func='relu'):
"""Temporal convolution layer"""
_, T, n, _ = x.shape
if c_in > c_out:
x_input = fl.conv2d(
input=x,
num_filters=c_out,
filter_size=[1, 1],
stride=[1, 1],
padding="SAME",
data_format="NHWC",
param_attr=fluid.ParamAttr(name="%s_conv2d_1" % name))
elif c_in < c_out:
# if the number of input channels is less than the number of output
# channels, pad x up to the output channel size.
pad = fl.fill_constant_batch_size_like(
input=x,
shape=[-1, T, n, c_out - c_in],
dtype="float32",
value=0.0)
x_input = fl.concat([x, pad], axis=3)
else:
x_input = x
# x_input = x_input[:, Kt - 1:T, :, :]
if act_func == 'GLU':
# gated linear unit (GLU)
bt_init = fluid.initializer.ConstantInitializer(value=0.0)
bt = fl.create_parameter(
shape=[2 * c_out],
dtype="float32",
attr=fluid.ParamAttr(
name="%s_bt" % name, trainable=True, initializer=bt_init),
)
x_conv = fl.conv2d(
input=x,
num_filters=2 * c_out,
filter_size=[Kt, 1],
stride=[1, 1],
padding="SAME",
data_format="NHWC",
param_attr=fluid.ParamAttr(name="%s_conv2d_wt" % name))
x_conv = x_conv + bt
return (x_conv[:, :, :, 0:c_out] + x_input
) * fl.sigmoid(x_conv[:, :, :, -c_out:])
else:
bt_init = fluid.initializer.ConstantInitializer(value=0.0)
bt = fl.create_parameter(
shape=[c_out],
dtype="float32",
attr=fluid.ParamAttr(
name="%s_bt" % name, trainable=True, initializer=bt_init),
)
x_conv = fl.conv2d(
input=x,
num_filters=c_out,
filter_size=[Kt, 1],
stride=[1, 1],
padding="SAME",
data_format="NHWC",
param_attr=fluid.ParamAttr(name="%s_conv2d_wt" % name))
x_conv = x_conv + bt
if act_func == "linear":
return x_conv
elif act_func == "sigmoid":
return fl.sigmoid(x_conv)
elif act_func == "relu":
return fl.relu(x_conv + x_input)
else:
raise ValueError(
f'ERROR: activation function "{act_func}" is not defined.')
def spatio_conv_layer(self, x, Ks, c_in, c_out, name):
"""Spatio convolution layer"""
_, T, n, _ = x.shape
if c_in > c_out:
x_input = fl.conv2d(
input=x,
num_filters=c_out,
filter_size=[1, 1],
stride=[1, 1],
padding="SAME",
data_format="NHWC",
param_attr=fluid.ParamAttr(name="%s_conv2d_1" % name))
elif c_in < c_out:
# if the number of input channels is less than the number of output
# channels, pad x up to the output channel size.
pad = fl.fill_constant_batch_size_like(
input=x,
shape=[-1, T, n, c_out - c_in],
dtype="float32",
value=0.0)
x_input = fl.concat([x, pad], axis=3)
else:
x_input = x
for i in range(Ks):
# x_input shape: [B,T, num_nodes, c_out]
x_input = fl.reshape(x_input, [-1, c_out])
x_input = self.message_passing(
self.gw,
x_input,
name="%s_mp_%d" % (name, i),
norm=self.gw.node_feat["norm"])
x_input = fl.fc(x_input,
size=c_out,
bias_attr=False,
param_attr=fluid.ParamAttr(name="%s_gcn_fc_%d" %
(name, i)))
bias = fluid.layers.create_parameter(
shape=[c_out],
dtype='float32',
is_bias=True,
name='%s_gcn_bias_%d' % (name, i))
x_input = fluid.layers.elementwise_add(x_input, bias, act="relu")
x_input = fl.reshape(x_input, [-1, T, n, c_out])
return x_input
def message_passing(self, gw, feature, name, norm=None):
"""Message passing layer"""
def send_src_copy(src_feat, dst_feat, edge_feat):
"""send function"""
return src_feat["h"] * edge_feat['w']
if norm is not None:
feature = feature * norm
msg = gw.send(
send_src_copy,
nfeat_list=[("h", feature)],
efeat_list=[('w', gw.edge_feat['weights'])])
output = gw.recv(msg, "sum")
if norm is not None:
output = output * norm
return output
def output_layer(self, x, T, name, act_func='GLU'):
"""Output layer"""
_, _, n, channel = x.shape
# maps multi-steps to one.
x_i = self.temporal_conv_layer(
x=x,
Kt=T,
c_in=channel,
c_out=channel,
name="%s_in" % name,
act_func=act_func)
x_ln = fl.layer_norm(x_i)
x_o = self.temporal_conv_layer(
x=x_ln,
Kt=1,
c_in=channel,
c_out=channel,
name="%s_out" % name,
act_func='sigmoid')
# maps multi-channels to one.
x_fc = self.fully_con_layer(
x=x_o, n=n, channel=channel, name="%s_fc" % name)
return x_fc
def fully_con_layer(self, x, n, channel, name):
"""Fully connected layer"""
bt_init = fluid.initializer.ConstantInitializer(value=0.0)
bt = fl.create_parameter(
shape=[n, 1],
dtype="float32",
attr=fluid.ParamAttr(
name="%s_bt" % name, trainable=True, initializer=bt_init), )
x_conv = fl.conv2d(
input=x,
num_filters=1,
filter_size=[1, 1],
stride=[1, 1],
padding="SAME",
data_format="NHWC",
param_attr=fluid.ParamAttr(name="%s_conv2d" % name))
x_conv = x_conv + bt
return x_conv
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""This file implement the testing process of STGCN model.
"""
import os
import sys
import time
import argparse
import numpy as np
import pandas as pd
import paddle.fluid as fluid
import paddle.fluid.layers as fl
import pgl
from pgl.utils.logger import log
from data_loader.data_utils import gen_batch
from utils.math_utils import evaluation
def multi_pred(exe, gw, gf, program, y_pred, seq, batch_size, \
n_his, n_pred, step_idx, dynamic_batch=True):
"""multi step prediction"""
pred_list = []
for i in gen_batch(
seq, min(batch_size, len(seq)), dynamic_batch=dynamic_batch):
# Note: use np.copy() to avoid the modification of source data.
test_seq = np.copy(i[:, 0:n_his + 1, :, :]).astype(np.float32)
graph = gf.build_graph(i[:, 0:n_his, :, :])
feed = gw.to_feed(graph)
step_list = []
for j in range(n_pred):
feed['input'] = test_seq
pred = exe.run(program, feed=feed, fetch_list=[y_pred])
if isinstance(pred, list):
pred = np.array(pred[0])
test_seq[:, 0:n_his - 1, :, :] = test_seq[:, 1:n_his, :, :]
test_seq[:, n_his - 1, :, :] = pred
step_list.append(pred)
pred_list.append(step_list)
# pred_array -> [n_pred, len(seq), n_route, C_0)
pred_array = np.concatenate(pred_list, axis=1)
return pred_array, pred_array.shape[1]
def model_inference(exe, gw, gf, program, pred, inputs, args, step_idx,
min_va_val, min_val):
"""inference model"""
x_val, x_test, x_stats = inputs.get_data('val'), inputs.get_data(
'test'), inputs.get_stats()
if args.n_his + args.n_pred > x_val.shape[1]:
raise ValueError(
f'ERROR: the value of n_pred "{args.n_pred}" exceeds the length limit.'
)
# y_val shape: [n_pred, len(x_val), n_route, C_0)
y_val, len_val = multi_pred(exe, gw, gf, program, pred, \
x_val, args.batch_size, args.n_his, args.n_pred, step_idx)
evl_val = evaluation(x_val[0:len_val, step_idx + args.n_his, :, :],
y_val[step_idx], x_stats)
# chks: indicator that reflects the relationship of values between evl_val and min_va_val.
chks = evl_val < min_va_val
# update the metrics on the test set if the model's performance improved on the validation set.
if sum(chks):
min_va_val[chks] = evl_val[chks]
y_pred, len_pred = multi_pred(exe, gw, gf, program, pred, \
x_test, args.batch_size, args.n_his, args.n_pred, step_idx)
evl_pred = evaluation(x_test[0:len_pred, step_idx + args.n_his, :, :],
y_pred[step_idx], x_stats)
min_val = evl_pred
return min_va_val, min_val
def model_test(exe, gw, gf, program, pred, inputs, args):
"""test model"""
if args.inf_mode == 'sep':
# for inference mode 'sep', the type of step index is int.
step_idx = args.n_pred - 1
tmp_idx = [step_idx]
elif args.inf_mode == 'merge':
# for inference mode 'merge', the type of step index is np.ndarray.
step_idx = tmp_idx = np.arange(3, args.n_pred + 1, 3) - 1
print(step_idx)
else:
raise ValueError(f'ERROR: test mode "{args.inf_mode}" is not defined.')
x_test, x_stats = inputs.get_data('test'), inputs.get_stats()
y_test, len_test = multi_pred(exe, gw, gf, program, pred, \
x_test, args.batch_size, args.n_his, args.n_pred, step_idx)
# save result
gt = x_test[0:len_test, args.n_his:, :, :].reshape(-1, args.n_route)
y_pred = y_test.reshape(-1, args.n_route)
city_df = pd.read_csv(args.city_file)
city_df = city_df.drop(0)
np.savetxt(
os.path.join(args.output_path, "groundtruth.csv"),
gt.astype(np.int32),
fmt='%d',
delimiter=',',
header=",".join(city_df['city']))
np.savetxt(
os.path.join(args.output_path, "prediction.csv"),
y_pred.astype(np.int32),
fmt='%d',
delimiter=",",
header=",".join(city_df['city']))
for i in range(step_idx + 1):
evl = evaluation(x_test[0:len_test, step_idx + args.n_his, :, :],
y_test[i], x_stats)
for ix in tmp_idx:
te = evl[ix - 2:ix + 1]
print(
f'Time Step {i + 1}: MAPE {te[0]:7.3%}; MAE {te[1]:4.3f}; RMSE {te[2]:6.3f}.'
)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Evaluation"""
import os
import sys
import time
import argparse
import numpy as np
def z_score(x, mean, std):
"""z_score"""
return (x - mean) / std
def z_inverse(x, mean, std):
"""The inverse of function z_score"""
return x * std + mean
def MAPE(v, v_):
"""Mean absolute percentage error."""
return np.mean(np.abs(v_ - v) / (v + 1e-5))
def RMSE(v, v_):
"""Mean squared error."""
return np.sqrt(np.mean((v_ - v)**2))
def MAE(v, v_):
"""Mean absolute error."""
return np.mean(np.abs(v_ - v))
def evaluation(y, y_, x_stats):
"""Calculate MAPE, MAE and RMSE between ground truth and prediction."""
dim = len(y_.shape)
if dim == 3:
# single_step case
v = z_inverse(y, x_stats['mean'], x_stats['std'])
v_ = z_inverse(y_, x_stats['mean'], x_stats['std'])
return np.array([MAPE(v, v_), MAE(v, v_), RMSE(v, v_)])
else:
# multi_step case
tmp_list = []
# y -> [time_step, batch_size, n_route, 1]
y = np.swapaxes(y, 0, 1)
# recursively call
for i in range(y_.shape[0]):
tmp_res = evaluation(y[i], y_[i], x_stats)
tmp_list.append(tmp_res)
return np.concatenate(tmp_list, axis=-1)
# struc2vec: Learning Node Representations from Structural Identity
[Struc2vec](https://arxiv.org/abs/1704.03165) is built on the concept of structural identity, in which network nodes are identified according to the network structure and their relationship to other nodes. The struc2vec paper proposes a novel and flexible framework for learning such latent representations. We reproduce the Struc2vec algorithm in PGL.
### Dataset
The paper uses air-traffic networks to validate the Struc2vec algorithm.
Each edge in the dataset indicates that there is at least one flight between the two airports, and the connections between airports are used to predict their level of activity. The following files are used to validate the accuracy of the algorithm (a minimal sketch of reading the edge list follows the file list). The data was collected from the Bureau of Transportation Statistics from January to October 2016; the network has 1,190 nodes and 13,599 edges (the diameter is 8). [Link](https://www.transtats.bts.gov/)
- usa-airports.edgelist
- labels-usa-airports.txt
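For reference, each line of `usa-airports.edgelist` is expected to hold two whitespace-separated integer airport ids, which is how `data_loader.py` below parses it; a minimal reading sketch:
```python
edges = []
with open("data/usa-airports.edgelist") as f:
    for line in f:
        src, dst = (int(field) for field in line.split())
        edges.append((src, dst))
print(len(edges), "edges loaded")
```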
### Dependencies
If you want to use the struc2vec model in PGL, please additionally install gensim, pathos and fastdtw.
- paddlepaddle>=1.6
- pgl
- gensim
- pathos
- fastdtw
### How to use
For example, to train and validate the Struc2vec model on the American airport dataset:
> python struc2vec.py --edge_file data/usa-airports.edgelist --label_file data/labels-usa-airports.txt --train True --valid True --opt2 True
### Hyperparameters
| Args| Meaning|
| ------------- | ------------- |
| edge_file | input file name for edges|
| label_file | input file name for node labels|
| emb_file | output file name for the learned node embedding (see the loading sketch after this table)|
| walk_depth| The number of steps in each random walk|
| opt1| The flag to open optimization 1 to reduce time cost|
| opt2| The flag to open optimization 2 to reduce time cost|
| w2v_emb_size| The dimension of the output word2vec embedding|
| w2v_window_size| The context window size of word2vec|
| w2v_epoch| The number of epochs to train the word2vec model|
| train| The flag to run the struc2vec algorithm to get the w2v embedding|
| valid| The flag to use the w2v embedding to validate the classification result|
| num_class| The number of classes in the classification model to be trained|
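For reference, a minimal sketch of loading the learned embedding for downstream use; it assumes gensim is installed (see Dependencies) and that the default output name `w2v_emb` was used:
```python
import numpy as np
from gensim.models import KeyedVectors

# The training phase saves the node embedding in word2vec text format.
emb = KeyedVectors.load_word2vec_format("w2v_emb")

# Node ids are stored as string keys (after reindexing by the data loader).
node_id = "1"
if node_id in emb:
    vec = np.asarray(emb[node_id])
    print(vec.shape)  # (w2v_emb_size,), 128 by default
```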
### Experiment results
| Dataset | Model | Metric | PGL Result | Paper repo Result |
| ------------- | ------------- |------------- |------------- |------------- |
| American airport dataset | Struc2vec without time cost optimization| ACC |0.6483|0.6340|
| American airport dataset | Struc2vec with optimization 1| ACC |0.6466|0.6242|
| American airport dataset | Struc2vec with optimization 2| ACC |0.6252|0.6241|
| American airport dataset | Struc2vec with optimizations 1 & 2| ACC |0.6226|0.6083|
"""
classify.py
"""
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import paddle
import paddle.fluid as fluid
def build_lr_model(args):
"""
Build the LR model to train.
"""
emb_x = fluid.layers.data(
name="emb_x", dtype='float32', shape=[args.w2v_emb_size])
label = fluid.layers.data(name="label_y", dtype='int64', shape=[1])
logits = fluid.layers.fc(input=emb_x,
size=args.num_class,
act=None,
name='classification_layer')
proba = fluid.layers.softmax(logits)
loss = fluid.layers.softmax_with_cross_entropy(logits, label)
loss = fluid.layers.mean(loss)
acc = fluid.layers.accuracy(input=proba, label=label, k=1)
return loss, acc
def construct_feed_data(data):
"""
Construct the data to feed model.
"""
datas = []
labels = []
for sample in data:
if len(datas) < 16:
labels.append([sample[-1]])
datas.append(sample[1:-1])
else:
yield np.array(datas).astype(np.float32), np.array(labels).astype(
np.int64)
datas = []
labels = []
if len(datas) != 0:
yield np.array(datas).astype(np.float32), np.array(labels).astype(
np.int64)
def run_epoch(exe, data, program, stage, epoch, loss, acc):
"""
The function to run one epoch.
"""
print('start {} epoch of {}'.format(stage, epoch))
all_loss = 0.0
all_acc = 0.0
all_samples = 0.0
count = 0
for datas, labels in construct_feed_data(data):
batch_loss, batch_acc = exe.run(
program,
fetch_list=[loss, acc],
feed={"emb_x": datas,
"label_y": labels})
len_samples = len(datas)
all_loss += batch_loss * len_samples
all_acc += batch_acc * len_samples
all_samples += len_samples
count += 1
print("pass:{}, epoch:{}, loss:{}, acc:{}".format(
stage, epoch, all_loss / all_samples, all_acc / all_samples))
def train_lr_model(args, data):
"""
The main function to run the lr model.
"""
data_nums = len(data)
train_data_nums = int(0.8 * data_nums)
train_data = data[:train_data_nums]
test_data = data[train_data_nums:]
place = fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
loss, acc = build_lr_model(args)
test_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
adam = fluid.optimizer.Adam(learning_rate=args.lr)
adam.minimize(loss)
exe = fluid.Executor(place)
exe.run(startup_program)
for epoch in range(0, args.epoch):
run_epoch(exe, train_data, train_program, "train", epoch, loss, acc)
print('-------------------')
run_epoch(exe, test_data, test_program, "valid", epoch, loss, acc)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
data_loader.py
"""
from pgl import graph
import numpy as np
class EdgeDataset():
"""
The data loader reads the edge file and reindexes the source and destination nodes at the same time.
"""
def __init__(self, undirected=True, data_dir=""):
self._undirected = undirected
self._data_dir = data_dir
self._load_edge_data()
def _load_edge_data(self):
node_sets = set()
edges = []
with open(self._data_dir, "r") as f:
node_dict = dict()
for line in f:
src, dist = [
int(data) for data in line.strip("\n\r").split(" ")
]
if src not in node_dict:
node_dict[src] = len(node_dict) + 1
src = node_dict[src]
if dist not in node_dict:
node_dict[dist] = len(node_dict) + 1
dist = node_dict[dist]
node_sets.add(src)
node_sets.add(dist)
edges.append((src, dist))
if self._undirected:
edges.append((dist, src))
num_nodes = len(node_sets)
self.graph = graph.Graph(num_nodes=num_nodes + 1, edges=edges)
self.nodes = np.array(list(node_sets))
self.node_dict = node_dict
"""
sklearn_classify.py
"""
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
random_seed = 67
def train_lr_l2_model(args, data):
"""
The main function to train lr model with l2 regularization.
"""
acc_list = []
data = np.array(data)
data = data[data[:, 0].argsort()]
x_data = data[:, 1:-1]
y_data = data[:, -1]
for random_num in range(0, 10):
X_train, X_test, y_train, y_test = train_test_split(
x_data,
y_data,
test_size=0.2,
random_state=random_num + random_seed)
# use the one vs rest to train the lr model with l2
pred_test = []
for i in range(0, args.num_class):
y_train_relabel = np.where(y_train == i, 1, 0)
y_test_relabel = np.where(y_test == i, 1, 0)
lr = LogisticRegression(C=10.0, random_state=0, max_iter=100)
lr.fit(X_train, y_train_relabel)
pred = lr.predict_proba(X_test)
pred_test.append(pred[:, -1].tolist())
pred_test = np.array(pred_test)
pred_test = np.transpose(pred_test)
c_index = np.argmax(pred_test, axis=1)
acc = accuracy_score(y_test.flatten(), c_index)
acc_list.append(acc)
print("pass:{}-acc:{}".format(random_num, acc))
print("the avg acc is {}".format(np.mean(acc_list)))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
struc2vec.py
"""
import argparse
import math
import random
import numpy as np
import pgl
from pgl import graph
from pgl.graph_kernel import alias_sample_build_table
from pgl.sample import alias_sample
from data_loader import EdgeDataset
from classify import train_lr_model
from sklearn_classify import train_lr_l2_model
def selectDegrees(degree_root, index_left, index_right, degree_left,
degree_right):
"""
Select which degree to visit in the next step.
"""
if index_left == -1:
degree_now = degree_right
elif index_right == -1:
degree_now = degree_left
elif (abs(degree_left - degree_root) < abs(degree_right - degree_root)):
degree_now = degree_left
else:
degree_now = degree_right
return degree_now
class StrucVecGraph():
"""
This class wraps the PGL graph and provides the functions that implement the struc2vec algorithm.
"""
def __init__(self, graph, nodes, opt1, opt2, opt3, depth, num_walks,
walk_depth):
self.graph = graph
self.nodes = nodes
self.opt1 = opt1
self.opt2 = opt2
self.opt3 = opt3
self.num_walks = num_walks
self.walk_depth = walk_depth
self.tag = args.tag
self.degree_list = dict()
self.degree2nodes = dict()
self.node2degree = dict()
self.distance = dict()
self.degrees_sorted = None
self.layer_distance = dict()
self.layer_message = dict()
self.layer_norm_distance = dict()
self.sample_alias = dict()
self.sample_events = dict()
self.layer_node_weight_count = dict()
if opt3 == True:
self.depth = depth
else:
self.depth = 1000
def distance_func(self, a, b):
"""
The basic function to calculate the distance between two lists with different lengths.
"""
ep = 0.5
m = max(a, b) + ep
mi = min(a, b) + ep
return ((m / mi) - 1)
def distance_opt1_func(self, a, b):
"""
The optimized function to calculate the distance between two lists of (degree, count) pairs.
"""
ep = 0.5
m = max(a[0], b[0]) + ep
mi = min(a[0], b[0]) + ep
return ((m / mi) - 1) * max(a[1], b[1])
def add_degree_todict(self, node_id, degree, depth, opt1):
"""
Save the degree sequence of each node at each depth into a dict.
"""
if node_id not in self.degree_list:
self.degree_list[node_id] = dict()
if depth not in self.degree_list[node_id]:
self.degree_list[node_id][depth] = None
if opt1:
degree = np.array(np.unique(degree, return_counts=True)).T
self.degree_list[node_id][depth] = degree
def output_degree_with_depth(self, depth, opt1):
"""
Use BFS to collect the degree sequence of each layer.
"""
degree_dict = dict()
for node in self.nodes:
start_node = node
cur_node = node
cur_dep = 0
flag_visit = set()
while cur_node is not None and cur_dep < depth:
if not isinstance(cur_node, list):
cur_node = [cur_node]
filter_node = []
for node in cur_node:
if node not in flag_visit:
flag_visit.add(node)
filter_node.append(node)
cur_node = filter_node
if len(cur_node) == 0:
break
outdegree = self.graph.outdegree(cur_node)
mask = (outdegree != 0)
if np.any(mask):
outdegree = np.sort(outdegree[mask])
else:
break
# save the layer degree message to dict
self.add_degree_todict(start_node, outdegree[mask], cur_dep,
opt1)
succes = self.graph.successor(cur_node)
cur_node = []
for succ in succes:
if isinstance(succ, np.ndarray):
cur_node.extend(succ.flatten().tolist())
elif isinstance(succ, int):
cur_node.append(succ)
cur_node = list(set(cur_node))
cur_dep += 1
def get_sim_neighbours(self, node, selected_num):
"""
Select the neighbours by degree similarity.
"""
degree = self.node2degree[node]
select_count = 0
node_nbh_list = list()
for node_nbh in self.degree2nodes[degree]:
if node != node_nbh:
node_nbh_list.append(node_nbh)
select_count += 1
if select_count > selected_num:
return node_nbh_list
degree_vec_len = len(self.degrees_sorted)
index_degree = self.degrees_sorted.index(degree)
index_left = -1
index_right = -1
degree_left = -1
degree_right = -1
if index_degree != -1 and index_degree >= 1:
index_left = index_degree - 1
if index_degree != -1 and index_degree <= degree_vec_len - 2:
index_right = index_degree + 1
if index_left == -1 and index_right == -1:
return node_nbh_list
if index_left != -1:
degree_left = self.degrees_sorted[index_left]
if index_right != -1:
degree_right = self.degrees_sorted[index_right]
select_degree = selectDegrees(degree, index_left, index_right,
degree_left, degree_right)
while True:
for node_nbh in self.degree2nodes[select_degree]:
if node_nbh != node:
node_nbh_list.append(node_nbh)
select_count += 1
if select_count > selected_num:
return node_nbh_list
if select_degree == degree_left:
if index_left >= 1:
index_left = index_left - 1
else:
index_left = -1
else:
if index_right <= degree_vec_len - 2:
index_right += 1
else:
index_right = -1
if index_left == -1 and index_right == -1:
return node_nbh_list
if index_left != -1:
degree_left = self.degrees_sorted[index_left]
if index_right != -1:
degree_right = self.degrees_sorted[index_right]
select_degree = selectDegrees(degree, index_left, index_right,
degree_left, degree_right)
return node_nbh_list
def calc_node_with_neighbor_dtw_opt2(self, src):
"""
Use the optimized algorithm (opt2) to reduce the range of candidate neighbours.
"""
from fastdtw import fastdtw
node_nbh_list = self.get_sim_neighbours(src, self.selected_nbh_nums)
distance = {}
for dist in node_nbh_list:
calc_layer_len = min(len(self.degree_list[src]), \
len(self.degree_list[dist]))
distance_iteration = 0.0
distance[src, dist] = {}
for layer in range(0, calc_layer_len):
src_layer = self.degree_list[src][layer]
dist_layer = self.degree_list[dist][layer]
weight, path = fastdtw(
src_layer,
dist_layer,
radius=1,
dist=self.distance_calc_func)
distance_iteration += weight
distance[src, dist][layer] = distance_iteration
return distance
def calc_node_with_neighbor_dtw(self, src_index):
"""
Without the optimization, calculate the distance over all node pairs.
"""
from fastdtw import fastdtw
distance = {}
for dist_index in range(src_index + 1, self.graph.num_nodes - 1):
src = self.nodes[src_index]
dist = self.nodes[dist_index]
calc_layer_len = min(len(self.degree_list[src]), \
len(self.degree_list[dist]))
distance_iteration = 0.0
distance[src, dist] = {}
for layer in range(0, calc_layer_len):
src_layer = self.degree_list[src][layer]
dist_layer = self.degree_list[dist][layer]
weight, path = fastdtw(
src_layer,
dist_layer,
radius=1,
dist=self.distance_calc_func)
distance_iteration += weight
distance[src, dist][layer] = distance_iteration
return distance
def calc_distances_between_nodes(self):
"""
Use the dtw algorithm to calculate the distance between nodes.
"""
from fastdtw import fastdtw
from pathos.multiprocessing import Pool
# decide which distance function to use
if self.opt1 == True:
self.distance_calc_func = self.distance_opt1_func
else:
self.distance_calc_func = self.distance_func
dtws = []
if self.opt2:
depth = 0
for node in self.nodes:
if node in self.degree_list:
if depth in self.degree_list[node]:
degree = self.degree_list[node][depth]
if args.opt1:
degree = degree[0][0]
else:
degree = degree[0]
if degree not in self.degree2nodes:
self.degree2nodes[degree] = []
if node not in self.node2degree:
self.node2degree[node] = degree
self.degree2nodes[degree].append(node)
# select about 2 * log2(n) similar-degree candidates for each node
degree_keys = self.degree2nodes.keys()
degree_keys = np.array(list(degree_keys), dtype='int')
self.degrees_sorted = list(np.sort(degree_keys))
selected_nbh_nums = 2 * math.log(self.graph.num_nodes - 1, 2)
self.selected_nbh_nums = selected_nbh_nums
pool = Pool(10)
dtws = pool.map(self.calc_node_with_neighbor_dtw_opt2, self.nodes)
pool.close()
pool.join()
else:
src_indices = range(0, self.graph.num_nodes - 2)
pool = Pool(10)
dtws = pool.map(self.calc_node_with_neighbor_dtw, src_indices)
pool.close()
pool.join()
print('calc the dtw done.')
for dtw in dtws:
self.distance.update(dtw)
def normlization_layer_weight(self):
"""
Normalize the distances between nodes: weight[i] = distance[i] / sum(distance)
"""
for sd_keys, layer_weight in self.distance.items():
src, dist = sd_keys
layers, weights = layer_weight.keys(), layer_weight.values()
for layer, weight in zip(layers, weights):
if layer not in self.layer_distance:
self.layer_distance[layer] = {}
if layer not in self.layer_message:
self.layer_message[layer] = {}
self.layer_distance[layer][src, dist] = weight
if src not in self.layer_message[layer]:
self.layer_message[layer][src] = []
if dist not in self.layer_message[layer]:
self.layer_message[layer][dist] = []
self.layer_message[layer][src].append(dist)
self.layer_message[layer][dist].append(src)
# normalization the layer weight
for i in range(0, self.depth):
layer_weight = 0.0
layer_count = 0
if i not in self.layer_norm_distance:
self.layer_norm_distance[i] = {}
if i not in self.sample_alias:
self.sample_alias[i] = {}
if i not in self.sample_events:
self.sample_events[i] = {}
if i not in self.layer_message:
continue
for node in self.nodes:
if node not in self.layer_message[i]:
continue
nbhs = self.layer_message[i][node]
weights = []
sum_weight = 0.0
for dist in nbhs:
if (node, dist) in self.layer_distance[i]:
weight = self.layer_distance[i][node, dist]
else:
weight = self.layer_distance[i][dist, node]
weight = np.exp(-float(weight))
weights.append(weight)
# norm the weight
sum_weight = sum(weights)
if sum_weight == 0.0:
sum_weight = 1.0
weight_list = [weight / sum_weight for weight in weights]
self.layer_norm_distance[i][node] = weight_list
alias, events = alias_sample_build_table(np.array(weight_list))
self.sample_alias[i][node] = alias
self.sample_events[i][node] = events
layer_weight += 1.0
#layer_weight += sum(weight_list)
layer_count += len(weights)
layer_avg_weight = layer_weight / (1.0 * layer_count)
self.layer_node_weight_count[i] = dict()
for node in self.nodes:
if node not in self.layer_norm_distance[i]:
continue
weight_list = self.layer_norm_distance[i][node]
node_cnt = 0
for weight in weight_list:
if weight > layer_avg_weight:
node_cnt += 1
self.layer_node_weight_count[i][node] = node_cnt
def choose_neighbor_alias_method(self, node, layer):
"""
Choose a neighbour randomly via alias sampling.
"""
weight_list = self.layer_norm_distance[layer][node]
neighbors = self.layer_message[layer][node]
select_idx = alias_sample(1, self.sample_alias[layer][node],
self.sample_events[layer][node])
return neighbors[select_idx[0]]
def choose_layer_to_walk(self, node, layer):
"""
Choose the layer for the next random-walk step.
"""
random_value = random.random()
higher_neigbours_nums = self.layer_node_weight_count[layer][node]
prob = math.log(higher_neigbours_nums + math.e)
prob = prob / (1.0 + prob)
if random_value > prob:
if layer > 0:
layer = layer - 1
else:
if layer + 1 in self.layer_message and \
node in self.layer_message[layer + 1]:
layer = layer + 1
return layer
def executor_random_walk(self, walk_process_id):
"""
The main function to execute the structural random walk.
"""
nodes = self.nodes
random.shuffle(nodes)
walk_path_all_nodes = []
for node in nodes:
walk_path = []
walk_path.append(node)
layer = 0
while len(walk_path) < self.walk_depth:
prop = random.random()
if prop < 0.3:
node = self.choose_neighbor_alias_method(node, layer)
walk_path.append(node)
else:
layer = self.choose_layer_to_walk(node, layer)
walk_path_all_nodes.append(walk_path)
return walk_path_all_nodes
def random_walk_structual_sim(self):
"""
Walk the paths according to the structural distances.
"""
from pathos.multiprocessing import Pool
print('start process struc2vec random walk.')
walks_process_ids = [i for i in range(0, self.num_walks)]
pool = Pool(10)
walks = pool.map(self.executor_random_walk, walks_process_ids)
pool.close()
pool.join()
#save the final walk result
file_result = open(args.tag + "_walk_path", "w")
for walk in walks:
for walk_node in walk:
walk_node_str = " ".join([str(node) for node in walk_node])
file_result.write(walk_node_str + "\n")
file_result.close()
print('process struc2vec random walk done.')
def learning_embedding_from_struc2vec(args):
"""
Learn word2vec embeddings from the random walk paths.
"""
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence
struc_walks = LineSentence(args.tag + "_walk_path")
model = Word2Vec(struc_walks, size=args.w2v_emb_size, window=args.w2v_window_size, iter=args.w2v_epoch, \
min_count=0, hs=1, sg=1, workers=5)
model.wv.save_word2vec_format(args.emb_file)
def main(args):
"""
The main function to run the struc2vec algorithm.
"""
if args.train:
dataset = EdgeDataset(
undirected=args.undirected, data_dir=args.edge_file)
graph = StrucVecGraph(dataset.graph, dataset.nodes, args.opt1, args.opt2, args.opt3, args.depth,\
args.num_walks, args.walk_depth)
graph.output_degree_with_depth(args.depth, args.opt1)
graph.calc_distances_between_nodes()
graph.normlization_layer_weight()
graph.random_walk_structual_sim()
learning_embedding_from_struc2vec(args)
file_label = open(args.label_file)
file_label_reindex = open(args.label_file + "_reindex", "w")
for line in file_label:
items = line.strip("\n\r").split(" ")
try:
items = [int(item) for item in items]
except:
continue
if items[0] not in dataset.node_dict:
continue
reindex = dataset.node_dict[items[0]]
file_label_reindex.write(str(reindex) + " " + str(items[1]) + "\n")
file_label_reindex.close()
if args.valid:
emb_file = open(args.emb_file)
file_label_reindex = open(args.label_file + "_reindex")
label_dict = dict()
for line in file_label_reindex:
items = line.strip("\n\r").split(" ")
try:
label_dict[int(items[0])] = int(items[1])
except:
continue
data_for_train_valid = []
for line in emb_file:
items = line.strip("\n\r").split(" ")
if len(items) <= 2:
continue
index = int(items[0])
label = int(label_dict[index])
sample = []
sample.append(index)
feature_emb = items[1:]
feature_emb = [float(feature) for feature in feature_emb]
sample.extend(feature_emb)
sample.append(label)
data_for_train_valid.append(sample)
train_lr_l2_model(args, data_for_train_valid)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='struc2vec')
parser.add_argument("--edge_file", type=str, default="")
parser.add_argument("--label_file", type=str, default="")
parser.add_argument("--emb_file", type=str, default="w2v_emb")
parser.add_argument("--undirected", type=bool, default=True)
parser.add_argument("--depth", type=int, default=8)
parser.add_argument("--num_walks", type=int, default=10)
parser.add_argument("--walk_depth", type=int, default=80)
parser.add_argument("--opt1", type=bool, default=False)
parser.add_argument("--opt2", type=bool, default=False)
parser.add_argument("--opt3", type=bool, default=False)
parser.add_argument("--w2v_emb_size", type=int, default=128)
parser.add_argument("--w2v_window_size", type=int, default=10)
parser.add_argument("--w2v_epoch", type=int, default=5)
parser.add_argument("--train", type=bool, default=False)
parser.add_argument("--valid", type=bool, default=False)
parser.add_argument("--lr", type=float, default=0.0001)
parser.add_argument("--num_class", type=int, default=4)
parser.add_argument("--epoch", type=int, default=2000)
parser.add_argument("--tag", type=str, default="")
args = parser.parse_args()
main(args)
# Unsupervised GraphSAGE in PGL
[GraphSAGE](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) is a general inductive framework that leverages node feature
information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, GraphSAGE learns a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Based on PGL, we reproduce the GraphSAGE algorithm and reach the same level of metrics as the paper on the Reddit dataset. This example also demonstrates subgraph sampling and training in PGL.
For the purpose of unsupervised learning, we use graph edges as positive samples for GraphSAGE training.
### Datasets (Quickstart)
The dataset `./sample.txt` is a handcrafted bipartite graph for quick demo purposes; its format is `src \t dst` (a minimal loading sketch follows).
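For reference, a minimal sketch of reading `sample.txt` into positive `(src, dst)` pairs, assuming whitespace-separated integer node ids:
```python
import numpy as np

src, dst = [], []
with open("./sample.txt") as f:
    for line in f:
        s, d = line.split()
        src.append(int(s))
        dst.append(int(d))
src = np.array(src, dtype="int64")
dst = np.array(dst, dtype="int64")
# Each pair (src[i], dst[i]) is one edge, used as a positive sample in training.
print(src.shape, dst.shape)
```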
### Dependencies
```txt
- paddlepaddle>=1.6
- pgl
```
### How to run
#### 1. Training
```sh
python train.py --data_path ./sample.txt --num_nodes 2000 --phase train
```
#### 2. Predicting
```sh
python train.py --data_path ./sample.txt --num_nodes 2000 --phase predict
```
The resulting node embeddings are stored in the `emb.npy` file, which can later be loaded using `np.load`, as sketched below.
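A minimal sketch of loading the saved embeddings; it assumes `emb.npy` holds a 2-D array of shape `[num_nodes, embedding_dim]` (the exact layout depends on `train.py`, which is not shown here):
```python
import numpy as np

emb = np.load("emb.npy")
print(emb.shape)

# Example use: cosine similarity between the embeddings of node 0 and node 1.
a, b = emb[0], emb[1]
sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
print(sim)
```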
#### Hyperparameters
- epoch: Number of epochs. (default: 1)
- use_cuda: Use GPU if specified.
- layer_type: We support 4 aggregator types including "graphsage_mean", "graphsage_maxpool", "graphsage_meanpool" and "graphsage_lstm".
- sample_workers: The number of workers for multiprocessing subgraph sample.
- lr: Learning rate.
- batch_size: Batch size.
- samples: The max neighbors sampling rate for each hop. (default: [10, 10])
- num_layers: The number of layer for graph sampling. (default: 2)
- hidden_size: The hidden size of the GraphSAGE models.
- checkpoint: Path for the model checkpoint at each epoch. (default: 'model_ckpt')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""model.py"""
import paddle
import paddle.fluid as fluid
def copy_send(src_feat, dst_feat, edge_feat):
"""copy_send"""
return src_feat["h"]
def mean_recv(feat):
"""mean_recv"""
return fluid.layers.sequence_pool(feat, pool_type="average")
def sum_recv(feat):
"""sum_recv"""
return fluid.layers.sequence_pool(feat, pool_type="sum")
def max_recv(feat):
"""max_recv"""
return fluid.layers.sequence_pool(feat, pool_type="max")
def lstm_recv(feat):
"""lstm_recv"""
hidden_dim = 128
forward, _ = fluid.layers.dynamic_lstm(
input=feat, size=hidden_dim * 4, use_peepholes=False)
output = fluid.layers.sequence_last_step(forward)
return output
def graphsage_mean(gw, feature, hidden_size, act, name):
"""graphsage_mean"""
msg = gw.send(copy_send, nfeat_list=[("h", feature)])
neigh_feature = gw.recv(msg, mean_recv)
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
def graphsage_meanpool(gw,
feature,
hidden_size,
act,
name,
inner_hidden_size=512):
"""graphsage_meanpool"""
neigh_feature = fluid.layers.fc(feature, inner_hidden_size, act="relu")
msg = gw.send(copy_send, nfeat_list=[("h", neigh_feature)])
neigh_feature = gw.recv(msg, mean_recv)
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
def graphsage_maxpool(gw,
feature,
hidden_size,
act,
name,
inner_hidden_size=512):
"""graphsage_maxpool"""
neigh_feature = fluid.layers.fc(feature, inner_hidden_size, act="relu")
msg = gw.send(copy_send, nfeat_list=[("h", neigh_feature)])
neigh_feature = gw.recv(msg, max_recv)
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
def graphsage_lstm(gw, feature, hidden_size, act, name):
"""graphsage_lstm"""
inner_hidden_size = 128
neigh_feature = fluid.layers.fc(feature, inner_hidden_size, act="relu")
hidden_dim = 128
forward_proj = fluid.layers.fc(input=neigh_feature,
size=hidden_dim * 4,
bias_attr=False,
name="lstm_proj")
msg = gw.send(copy_send, nfeat_list=[("h", forward_proj)])
neigh_feature = gw.recv(msg, lstm_recv)
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""reader.py"""
import os
import numpy as np
import pickle as pkl
import paddle
import paddle.fluid as fluid
import pgl
import time
from pgl.utils.logger import log
from pgl.utils import mp_reader
def batch_iter(data, batch_size):
"""batch_iter"""
src, dst, eid = data
perm = np.arange(len(eid))
np.random.shuffle(perm)
start = 0
while start < len(src):
index = perm[start:start + batch_size]
start += batch_size
yield src[index], dst[index], eid[index]
def traverse(item):
"""traverse"""
if isinstance(item, list) or isinstance(item, np.ndarray):
for i in iter(item):
for j in traverse(i):
yield j
else:
yield item
def flat_node_and_edge(nodes, eids):
"""flat_node_and_edge"""
nodes = list(set(traverse(nodes)))
eids = list(set(traverse(eids)))
return nodes, eids
def graph_reader(num_layers,
graph_wrappers,
data,
batch_size,
samples,
num_workers,
feed_name_list,
use_pyreader=False,
graph=None,
predict=False):
"""graph_reader
"""
assert num_layers == len(samples), "Must be unified number of layers!"
if num_workers > 1:
return multiprocess_graph_reader(
num_layers,
graph_wrappers,
data,
batch_size,
samples,
num_workers,
feed_name_list,
use_pyreader,
graph=graph,
predict=predict)
batch_info = list(batch_iter(data, batch_size=batch_size))
work = worker(
num_layers,
batch_info,
graph_wrappers,
samples,
feed_name_list,
use_pyreader,
graph=graph,
predict=predict)
def reader():
"""reader"""
for batch in work():
yield batch
return reader
#return paddle.reader.buffered(reader, 100)
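# worker performs layer-wise predecessor sampling: starting from the batch
# edges' endpoints it repeatedly samples predecessors, accumulating the node
# and edge sets per layer. During training the positive edges (and their
# reversed copies) are removed from the innermost layer so the model cannot
# read the label off the graph structure.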
def worker(num_layers, batch_info, graph_wrappers, samples, feed_name_list,
use_pyreader, graph, predict):
"""worker
"""
pid = os.getppid()
np.random.seed((int(time.time() * 10000) + pid) % 65535)
graphs = [graph, graph]
def work():
"""work
"""
feed_dict = {}
ind = 0
perm = np.arange(0, len(batch_info))
np.random.shuffle(perm)
for p in perm:
batch_src, batch_dst, batch_eid = batch_info[p]
ind += 1
ind_start = time.time()
try:
nodes = start_nodes = np.concatenate([batch_src, batch_dst], 0)
eids = []
layer_nodes, layer_eids = [], []
for layer_idx in reversed(range(num_layers)):
if len(start_nodes) == 0:
layer_nodes = [nodes] + layer_nodes
layer_eids = [eids] + layer_eids
continue
pred_nodes, pred_eids = graphs[
layer_idx].sample_predecessor(
start_nodes, samples[layer_idx], return_eids=True)
last_nodes = nodes
nodes, eids = flat_node_and_edge([nodes, pred_nodes],
[eids, pred_eids])
layer_nodes = [nodes] + layer_nodes
layer_eids = [eids] + layer_eids
# Find new nodes
start_nodes = list(set(nodes) - set(last_nodes))
if predict is False:
eids = (batch_eid * 2 + 1).tolist() + (batch_eid * 2
).tolist()
layer_eids[0] = list(set(layer_eids[0]) - set(eids))
# layer_nodes[0]: use first layer nodes as all subgraphs' nodes
subgraph = graphs[0].subgraph(
nodes=layer_nodes[0], eid=layer_eids[0])
node_feat = np.array(layer_nodes[0], dtype="int64")
subgraph.node_feat["index"] = node_feat
except Exception as e:
print(e)
if len(feed_dict) > 0:
yield feed_dict
continue
feed_dict = graph_wrappers[0].to_feed(subgraph)
# only reindex from first subgraph
sub_src_idx = subgraph.reindex_from_parrent_nodes(batch_src)
sub_dst_idx = subgraph.reindex_from_parrent_nodes(batch_dst)
feed_dict["src_index"] = sub_src_idx.astype("int64")
feed_dict["dst_index"] = sub_dst_idx.astype("int64")
if predict:
feed_dict["node_id"] = batch_src.astype("int64")
if use_pyreader:
yield [feed_dict[name] for name in feed_name_list]
else:
yield feed_dict
return work
def multiprocess_graph_reader(num_layers, graph_wrappers, data, batch_size,
samples, num_workers, feed_name_list,
use_pyreader, graph, predict):
""" multiprocess_graph_reader
"""
def parse_to_subgraph(rd):
""" parse_to_subgraph
"""
def work():
""" work
"""
for data in rd():
yield data
return work
def reader():
""" reader
"""
batch_info = list(batch_iter(data, batch_size=batch_size))
        log.info("Number of batches: %d" % len(batch_info))
block_size = int(len(batch_info) / num_workers + 1)
reader_pool = []
for i in range(num_workers):
reader_pool.append(
worker(num_layers, batch_info[block_size * i:block_size * (
i + 1)], graph_wrappers, samples, feed_name_list,
use_pyreader, graph, predict))
use_pipe = True
multi_process_sample = mp_reader.multiprocess_reader(
reader_pool, use_pipe=use_pipe)
r = parse_to_subgraph(multi_process_sample)
if use_pipe:
return paddle.reader.buffered(r, 5 * num_workers)
else:
return r
return reader()
265 1599
979 1790
650 1488
638 1310
962 1916
239 1958
103 1763
918 1874
599 1924
47 1691
272 1978
550 1583
163 1142
561 1458
211 1447
188 1529
983 1039
68 1923
715 1900
657 1555
338 1937
379 1409
19 1978
224 1420
755 1499
618 1172
766 1294
401 1188
89 1257
149 1048
835 1526
358 1858
218 1187
227 1022
530 1643
197 1255
529 1672
960 1558
519 1176
433 1093
347 1495
572 1877
505 1047
988 1587
125 1249
555 1942
614 1586
836 1681
628 1076
28 1693
519 1398
133 1136
883 1493
158 1441
568 1928
723 1585
488 1331
719 1471
265 1113
174 1799
722 1226
744 1467
807 1075
839 1393
664 1380
689 1552
36 1864
211 1611
90 1444
819 1428
241 1551
746 1599
72 1098
712 1787
54 1575
677 1485
289 1007
289 1079
907 1144
7 1983
655 1272
638 1047
849 1957
492 1278
453 1304
657 1807
367 1002
141 1346
688 1450
984 1749
255 1240
156 1625
731 1051
211 1922
165 1805
765 1054
794 1555
709 1747
822 1099
805 1774
422 1240
728 1679
55 1299
314 1808
781 1689
558 1605
707 1110
510 1705
956 1064
568 1132
267 1257
868 1269
690 1453
858 1602
826 1373
338 1650
335 1453
458 1340
0 1818
729 1694
25 1816
679 1109
323 1609
614 1457
342 1028
436 1081
932 1139
190 1821
808 1623
717 1267
950 1265
177 1956
97 1380
500 1744
232 1582
119 1015
656 1462
730 1007
860 1142
771 1989
784 1623
976 1084
770 1642
527 1515
784 1943
527 1578
718 1396
942 1089
661 1705
787 1800
893 1932
849 1395
758 1482
424 1148
873 1470
896 1333
465 1021
137 1507
718 1027
7 1045
285 1932
371 1468
51 1692
249 1358
898 1858
688 1213
419 1289
328 1326
764 1786
142 1399
905 1738
976 1295
715 1537
994 1393
479 1291
165 1560
308 1446
691 1728
779 1162
320 1989
745 1579
586 1426
142 1517
45 1317
657 1339
191 1780
801 1216
124 1414
344 1717
682 1383
216 1891
24 1759
207 1080
707 1699
212 1606
902 1435
525 1174
349 1299
380 1840
265 1294
352 1390
439 1410
984 1481
423 1499
261 1484
70 1033
192 1909
36 1960
823 1109
132 1418
992 1257
126 1548
872 1488
287 1645
108 1836
990 1314
450 1119
132 1549
0 1003
748 1373
841 1475
75 1987
880 1458
447 1443
122 1385
209 1022
74 1724
355 1688
742 1892
900 1092
48 1220
525 1221
817 1010
957 1212
713 1558
504 1851
84 1860
695 1187
326 1524
33 1647
864 1637
905 1637
280 1617
47 1034
781 1137
792 1319
901 1850
183 1511
571 1725
111 1957
222 1030
794 1169
147 1973
588 1789
24 1581
597 1471
106 1786
432 1146
447 1325
521 1444
968 1417
13 1075
521 1478
853 1294
550 1550
673 1426
150 1684
369 1737
994 1038
601 1397
616 1400
958 1028
279 1177
920 1180
878 1584
661 1852
225 1631
793 1401
507 1289
177 1818
551 1836
473 1065
723 1383
337 1938
81 1601
62 1139
928 1853
122 1946
260 1289
541 1378
934 1069
52 1311
689 1420
307 1862
811 1691
636 1885
405 1883
337 1132
645 1261
969 1224
823 1106
727 1066
763 1126
54 1168
677 1750
699 1223
744 1183
343 1883
152 1440
534 1665
79 1853
272 1581
92 1309
756 1884
460 1305
595 1868
469 1904
552 1067
422 1318
673 1843
403 1174
224 1445
181 1566
389 1618
936 1479
80 1002
291 1611
776 1201
57 1495
397 1053
807 1810
763 1374
648 1054
869 1432
169 1083
891 1318
270 1200
833 1663
970 1653
363 1637
188 1192
116 1751
110 1035
204 1216
524 1995
914 1426
289 1814
357 1521
366 1808
176 1775
650 1959
775 1062
781 1712
396 1798
725 1577
864 1497
540 1188
321 1623
995 1622
719 1299
72 1656
348 1728
141 1547
722 1095
64 1689
747 1143
892 1758
381 1463
693 1199
89 1555
576 1313
253 1809
878 1466
954 1776
365 1366
716 1351
707 1441
325 1167
63 1385
430 1225
479 1159
13 1185
731 1653
373 1529
271 1904
631 1111
114 1758
502 1983
685 1261
719 1932
1 1646
738 1698
432 1294
197 1463
293 1626
434 1457
315 1481
552 1877
100 1103
294 1569
689 1377
84 1142
631 1935
87 1508
560 1358
5 1787
65 1877
114 1948
536 1435
223 1753
494 1230
139 1335
55 1306
481 1253
326 1662
7 1171
663 1992
353 1586
693 1397
70 1498
902 1897
729 1627
838 1296
9 1528
633 1988
216 1535
813 1534
528 1061
130 1705
889 1019
278 1810
937 1399
286 1498
166 1574
725 1506
202 1018
306 1420
553 1717
755 1731
561 1619
147 1981
862 1065
349 1219
573 1137
336 1871
473 1511
342 1051
983 1181
798 1663
197 1930
164 1477
954 1083
695 1879
964 1046
638 1817
404 1886
927 1211
554 1115
88 1417
345 1165
383 1551
412 1484
305 1532
57 1380
171 1550
15 1082
941 1507
199 1774
787 1953
125 1398
336 1958
640 1851
251 1127
740 1306
302 1217
786 1014
706 1811
835 1851
978 1262
629 1944
429 1202
714 1954
153 1381
103 1759
268 1286
346 1808
420 1343
947 1467
668 1857
833 1736
600 1008
137 1649
452 1985
480 1545
212 1182
150 1726
784 1217
362 1595
763 1365
68 1395
195 1041
92 1599
314 1397
971 1003
606 1914
711 1706
699 1056
119 1593
367 1476
725 1098
432 1234
684 1255
469 1606
440 1086
200 1848
294 1144
449 1888
376 1225
796 1352
767 1447
713 1845
223 1333
119 1797
752 1927
627 1464
279 1488
40 1562
62 1149
771 1058
600 1911
625 1164
366 1416
714 1530
513 1935
419 1485
963 1665
459 1648
977 1522
890 1521
931 1566
622 1838
158 1958
848 1520
357 1275
43 1440
404 1772
788 1930
841 1832
845 1281
516 1121
423 1130
86 1619
863 1928
195 1789
167 1944
589 1093
146 1206
74 1133
819 1445
678 1004
752 1725
366 1604
903 1738
882 1858
561 1195
436 1980
77 1894
353 1879
561 1166
989 1964
624 1013
572 1704
272 1077
509 1242
770 1001
279 1392
621 1924
542 1766
555 1951
577 1598
531 1148
806 1401
497 1115
872 1309
387 1880
430 1485
295 1175
400 1774
941 1522
336 1032
806 1873
576 1422
566 1974
241 1847
215 1645
670 1804
831 1834
734 1091
16 1641
952 1975
299 1587
442 1032
702 1341
570 1405
633 1651
444 1731
980 1774
381 1729
900 1661
875 1274
968 1095
894 1805
683 1961
130 1549
963 1350
817 1864
190 1281
91 1657
208 1194
621 1911
447 1338
538 1343
234 1534
765 1920
632 1263
96 1090
121 1659
47 1975
856 1354
601 1061
480 1236
808 1487
866 1999
861 1892
667 1124
425 1307
90 1002
725 1337
134 1749
272 1587
567 1276
43 1332
715 1084
967 1477
62 1731
244 1540
317 1112
893 1108
242 1443
688 1544
937 1475
761 1912
994 1219
827 1193
420 1966
109 1691
482 1767
564 1146
372 1215
954 1348
422 1045
987 1040
471 1247
919 1824
190 1615
874 1879
251 1198
611 1575
121 1733
596 1950
791 1492
504 1201
153 1680
719 1967
964 1095
889 1106
732 1770
967 1631
351 1061
912 1835
911 1925
501 1502
810 1406
948 1718
928 1080
384 1940
330 1301
143 1081
412 1649
686 1840
178 1544
266 1121
528 1714
296 1156
220 1753
726 1679
126 1416
364 1424
625 1539
721 1708
805 1639
384 1157
553 1693
570 1877
511 1984
774 1254
354 1949
823 1162
281 1204
657 1774
578 1943
902 1764
859 1063
543 1845
815 1052
430 1118
22 1210
477 1586
872 1692
478 1943
630 1850
928 1247
893 1126
757 1774
133 1275
740 1101
117 1200
931 1120
259 1184
16 1782
447 1131
637 1498
472 1859
760 1877
303 1511
903 1074
795 1227
398 1450
28 1339
428 1891
476 1680
934 1409
78 1737
467 1075
126 1830
0 1421
783 1357
584 1061
139 1166
122 1768
735 1219
202 1684
867 1405
619 1176
843 1833
553 1239
287 1080
373 1780
65 1816
227 1871
45 1701
38 1281
46 1077
911 1708
137 1478
20 1550
822 1631
831 1527
13 1001
509 1096
31 1751
196 1123
379 1614
777 1288
364 1222
478 1070
460 1580
986 1340
696 1498
679 1139
713 1343
91 1691
602 1696
377 1770
253 1021
957 1179
500 1423
487 1281
821 1652
180 1122
443 1247
583 1289
676 1258
781 1693
718 1500
832 1662
555 1029
575 1595
145 1801
471 1769
491 1388
269 1241
159 1428
631 1698
478 1268
925 1141
583 1096
759 1592
967 1352
862 1444
119 1991
534 1602
526 1226
880 1614
236 1615
448 1600
752 1041
25 1127
445 1853
414 1058
127 1913
512 1080
158 1522
787 1287
664 1744
914 1335
899 1630
187 1279
951 1942
884 1777
529 1937
395 1590
478 1066
790 1518
286 1614
640 1528
882 1707
102 1303
716 1794
919 1605
859 1759
236 1321
858 1608
732 1506
435 1263
93 1508
813 1260
640 1668
607 1185
402 1039
943 1569
523 1415
511 1786
637 1934
10 1885
507 1375
544 1988
709 1537
342 1717
324 1393
216 1090
788 1753
362 1308
64 1576
811 1726
555 1636
944 1715
259 1251
141 1888
48 1290
570 1331
957 1104
223 1233
494 1531
423 1433
151 1266
704 1002
694 1685
740 1001
174 1537
947 1359
49 1891
875 1386
274 1621
918 1610
631 1564
961 1960
702 1642
871 1489
384 1642
932 1559
886 1097
842 1143
950 1971
83 1986
944 1135
168 1923
900 1611
684 1389
540 1749
123 1265
673 1617
952 1921
767 1401
696 1941
868 1536
515 1953
438 1757
430 1411
661 1193
527 1882
147 1145
225 1101
710 1671
579 1255
30 1920
906 1298
333 1635
214 1127
362 1189
878 1530
808 1842
419 1559
861 1291
743 1043
333 1257
186 1604
141 1957
751 1236
573 1937
908 1460
627 1155
726 1885
332 1888
267 1040
28 1660
194 1200
971 1788
861 1122
582 1397
176 1091
397 1678
730 1307
309 1860
881 1255
701 1068
750 1103
755 1843
834 1786
900 1837
433 1601
897 1464
593 1661
451 1638
953 1101
122 1123
220 1792
35 1933
726 1751
715 1411
662 1307
197 1322
125 1658
478 1700
772 1881
547 1822
910 1280
924 1933
79 1740
466 1567
53 1768
500 1502
572 1048
751 1194
18 1187
374 1480
158 1135
712 1686
171 1466
25 1036
144 1847
664 1937
301 1129
641 1880
147 1709
885 1911
631 1910
338 1914
628 1257
909 1333
970 1790
971 1691
260 1724
693 1946
857 1056
918 1053
612 1838
479 1407
626 1359
273 1709
633 1008
364 1434
393 1873
294 1300
657 1988
355 1639
635 1468
914 1350
916 1148
305 1381
131 1748
756 1484
758 1203
825 1062
152 1209
441 1164
63 1885
864 1797
165 1036
124 1548
246 1053
810 1398
127 1091
277 1028
860 1069
700 1933
338 1962
211 1770
809 1483
489 1507
123 1382
669 1030
180 1996
972 1922
723 1670
647 1683
422 1440
391 1204
178 1071
421 1598
729 1466
339 1403
419 1326
407 1011
479 1867
722 1076
662 1802
110 1438
759 1868
22 1458
725 1648
958 1753
814 1656
673 1044
962 1020
475 1523
882 1513
802 1227
863 1121
772 1677
714 1072
112 1047
422 1664
419 1718
60 1864
570 1683
536 1673
581 1789
894 1074
739 1311
805 1863
861 1750
55 1748
47 1833
101 1108
872 1008
926 1907
909 1021
53 1233
617 1349
674 1909
507 1567
855 1723
690 1171
973 1859
686 1210
49 1435
146 1915
357 1620
208 1724
76 1583
133 1191
619 1426
190 1497
228 1868
365 1144
360 1770
329 1142
672 1408
91 1997
986 1299
654 1333
93 1475
146 1307
62 1772
502 1058
382 1427
181 1739
74 1104
170 1684
466 1861
147 1747
162 1027
499 1903
813 1621
591 1379
227 1518
110 1999
781 1791
415 1744
257 1846
942 1601
628 1696
317 1001
27 1681
80 1078
794 1279
330 1237
830 1994
728 1673
204 1943
295 1422
159 1499
207 1019
110 1497
439 1526
201 1323
620 1723
501 1157
305 1604
878 1784
483 1653
262 1539
21 1967
191 1836
199 1821
500 1910
232 1499
104 1750
868 1607
288 1013
434 1368
874 1055
870 1257
219 1143
990 1924
70 1764
207 1575
1 1364
405 1498
414 1507
65 1704
868 1415
256 1962
886 1425
834 1587
770 1842
74 1070
778 1750
550 1592
484 1948
669 1401
610 1909
480 1784
182 1147
842 1670
272 1923
371 1407
574 1985
978 1300
369 1286
884 1459
322 1261
456 1418
261 1718
330 1708
83 1249
473 1188
542 1281
551 1262
801 1288
372 1574
676 1927
44 1222
190 1020
284 1513
866 1845
828 1977
620 1854
288 1086
367 1606
71 1770
114 1316
571 1850
224 1272
406 1095
902 1571
576 1886
576 1562
767 1443
644 1201
295 1009
944 1751
90 1708
663 1042
283 1708
758 1027
851 1684
537 1204
271 1697
541 1885
973 1218
694 1904
822 1999
194 1872
276 1297
909 1886
312 1706
516 1473
844 1236
62 1617
366 1866
127 1474
743 1215
286 1096
87 1795
69 1711
757 1530
333 1844
257 1796
515 1491
66 1851
117 1510
18 1967
553 1979
267 1060
99 1321
861 1155
506 1067
944 1727
964 1171
329 1159
856 1018
858 1931
765 1617
951 1457
903 1184
241 1717
285 1533
320 1286
409 1400
924 1999
719 1501
14 1550
866 1246
86 1987
868 1551
620 1495
285 1918
810 1733
754 1871
755 1418
394 1528
839 1856
927 1964
321 1381
758 1337
635 1986
404 1038
854 1124
600 1507
342 1517
756 1567
498 1350
944 1048
481 1899
904 1335
412 1492
218 1021
636 1556
417 1354
116 1960
173 1267
525 1086
312 1389
973 1064
619 1103
987 1394
447 1188
862 1969
930 1485
419 1157
756 1787
860 1821
58 1662
353 1437
345 1290
753 1889
412 1688
37 1319
753 1201
136 1253
949 1592
459 1756
976 1522
450 1868
936 1384
393 1653
385 1936
704 1840
616 1709
786 1438
291 1830
848 1112
975 1595
967 1231
741 1672
160 1217
254 1634
530 1610
0 1445
170 1236
164 1316
127 1330
302 1627
953 1449
156 1583
784 1210
226 1551
397 1325
564 1825
42 1027
725 1612
114 1802
483 1384
684 1352
463 1908
978 1226
445 1217
800 1969
556 1274
49 1049
777 1808
732 1982
749 1590
574 1433
462 1515
637 1702
344 1224
489 1586
45 1242
755 1144
716 1293
319 1595
831 1657
154 1562
396 1814
657 1704
442 1405
898 1698
970 1287
967 1068
25 1761
211 1183
691 1905
466 1116
99 1521
834 1871
408 1809
8 1007
483 1336
485 1896
849 1467
192 1341
779 1801
678 1596
276 1051
709 1252
759 1656
27 1621
273 1911
697 1898
450 1995
688 1717
52 1966
920 1957
437 1549
533 1627
130 1315
392 1676
73 1886
650 1254
352 1079
165 1930
388 1236
426 1370
625 1648
457 1858
17 1109
926 1431
853 1530
90 1766
586 1275
894 1244
331 1469
447 1183
132 1167
230 1198
501 1240
440 1100
58 1665
85 1864
913 1448
738 1041
486 1012
162 1767
877 1060
10 1485
514 1807
224 1453
781 1340
311 1645
720 1837
259 1252
54 1174
788 1926
375 1440
23 1880
977 1632
389 1445
38 1508
517 1927
798 1598
483 1391
541 1788
46 1329
816 1758
158 1317
900 1577
369 1255
227 1795
37 1630
813 1565
965 1663
953 1963
503 1221
223 1064
161 1498
717 1855
527 1349
773 1813
522 1630
767 1275
582 1305
541 1563
79 1403
794 1544
74 1161
548 1543
18 1739
516 1516
697 1422
259 1840
195 1273
412 1222
571 1301
203 1914
420 1256
327 1277
894 1315
929 1302
773 1429
302 1309
488 1728
403 1256
549 1342
940 1764
524 1226
409 1076
233 1421
753 1667
664 1257
359 1079
291 1973
199 1373
654 1498
645 1074
481 1607
432 1852
692 1206
498 1726
586 1249
555 1338
107 1563
473 1300
51 1031
345 1236
757 1907
548 1088
680 1430
349 1468
435 1451
884 1301
683 1645
280 1388
84 1393
585 1561
86 1338
261 1972
941 1523
306 1697
718 1192
930 1121
726 1639
617 1399
939 1184
511 1084
832 1662
377 1881
371 1725
393 1653
415 1528
254 1572
927 1447
848 1355
797 1983
613 1417
127 1835
715 1471
974 1999
355 1178
675 1820
415 1601
593 1186
648 1907
922 1931
859 1828
110 1809
547 1809
944 1841
106 1446
635 1762
866 1431
199 1373
595 1454
991 1626
903 1720
989 1465
509 1506
168 1653
742 1892
644 1457
972 1046
87 1807
79 1596
24 1470
313 1732
772 1976
226 1146
835 1835
107 1057
430 1719
203 1810
643 1477
30 1918
889 1216
750 1501
180 1660
71 1463
966 1588
261 1858
829 1804
774 1379
342 1765
328 1943
296 1939
937 1444
628 1407
0 1977
233 1097
359 1438
910 1911
963 1026
942 1483
706 1997
682 1974
900 1513
298 1463
893 1855
322 1360
604 1122
948 1091
828 1158
682 1198
466 1781
661 1031
884 1744
891 1299
688 1266
89 1325
3 1026
299 1861
413 1062
775 1812
560 1926
799 1473
936 1445
537 1718
591 1680
202 1140
906 1163
977 1709
482 1904
345 1181
486 1502
445 1292
305 1328
87 1851
803 1197
94 1937
574 1546
643 1302
704 1633
536 1238
329 1663
737 1969
663 1278
335 1416
873 1390
705 1607
139 1436
740 1904
974 1321
338 1350
694 1456
779 1035
639 1238
603 1768
245 1363
390 1329
141 1680
483 1613
226 1632
820 1303
424 1655
54 1618
399 1297
130 1295
169 1996
78 1455
525 1409
741 1860
887 1664
347 1878
391 1343
66 1243
287 1876
35 1750
492 1261
789 1404
917 1041
937 1756
69 1239
218 1981
142 1382
882 1052
757 1290
178 1593
962 1504
781 1090
648 1912
207 1551
472 1372
937 1427
37 1270
511 1721
208 1491
299 1193
167 1718
781 1100
689 1177
732 1202
852 1665
556 1152
256 1908
261 1473
918 1941
755 1786
77 1062
208 1633
451 1502
181 1513
311 1571
240 1404
470 1720
913 1239
947 1553
706 1158
215 1968
912 1213
684 1117
560 1825
787 1083
764 1654
566 1252
238 1959
953 1954
985 1437
835 1434
88 1896
469 1447
655 1672
760 1631
919 1516
683 1698
811 1123
911 1961
302 1273
344 1399
89 1289
936 1236
395 1575
417 1981
10 1115
878 1839
213 1171
484 1475
460 1901
708 1299
320 1544
965 1375
451 1144
116 1959
143 1384
843 1051
368 1953
994 1141
704 1641
385 1729
240 1851
967 1306
719 1878
726 1439
550 1613
261 1660
550 1511
154 1782
12 1087
328 1120
618 1763
422 1667
519 1854
639 1719
942 1705
814 1893
576 1491
139 1499
422 1956
95 1082
676 1262
287 1965
60 1867
713 1444
435 1021
606 1042
86 1891
58 1035
311 1320
140 1463
82 1415
756 1991
505 1140
510 1982
701 1579
428 1787
388 1279
446 1709
222 1060
550 1363
798 1691
219 1181
137 1225
828 1955
721 1417
82 1675
854 1649
203 1355
352 1560
582 1633
118 1858
771 1304
321 1251
392 1206
958 1070
684 1713
939 1999
592 1726
56 1867
592 1988
736 1842
958 1559
989 1906
183 1749
462 1407
294 1890
771 1725
1 1897
49 1062
124 1558
575 1327
506 1243
154 1403
672 1573
423 1160
222 1950
67 1904
664 1802
585 1438
327 1353
284 1803
369 1251
291 1294
61 1509
551 1861
938 1061
765 1678
509 1323
145 1822
887 1975
768 1646
610 1140
690 1793
763 1262
96 1287
837 1876
632 1819
747 1141
71 1442
561 1709
290 1050
514 1106
87 1416
762 1666
83 1070
467 1271
7 1152
472 1509
861 1016
913 1109
934 1154
288 1197
175 1244
588 1960
316 1946
543 1882
359 1614
465 1779
892 1726
695 1531
542 1461
288 1190
966 1558
736 1064
997 1750
885 1427
888 1064
342 1553
77 1234
845 1636
407 1181
354 1114
670 1836
69 1065
12 1432
982 1944
837 1518
231 1274
2 1155
423 1136
377 1012
353 1203
257 1205
350 1753
479 1238
324 1619
705 1382
236 1249
695 1195
213 1906
231 1368
819 1392
509 1785
661 1546
210 1123
873 1301
363 1029
216 1998
240 1351
667 1195
515 1136
230 1779
385 1750
574 1432
435 1830
804 1902
249 1360
303 1158
969 1732
249 1526
159 1575
139 1833
347 1342
661 1731
887 1859
19 1001
748 1763
829 1878
828 1086
835 1791
895 1387
326 1003
568 1049
485 1750
760 1171
414 1394
987 1379
851 1857
8 1594
76 1655
363 1189
90 1630
976 1005
57 1457
886 1166
29 1658
543 1710
379 1142
499 1112
177 1843
746 1808
454 1523
676 1465
762 1980
309 1286
74 1330
359 1949
781 1590
874 1658
455 1770
790 1487
651 1249
855 1143
386 1439
298 1007
2 1028
217 1428
318 1191
968 1588
5 1329
625 1475
140 1718
401 1543
936 1260
311 1625
711 1886
832 1395
114 1259
782 1156
434 1891
539 1855
448 1748
199 1518
735 1380
908 1798
301 1759
876 1155
63 1637
739 1461
558 1305
533 1177
801 1914
97 1422
423 1377
920 1775
215 1512
691 1628
905 1824
540 1573
567 1285
573 1665
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""train.py
"""
import argparse
import time
import glob
import os
import numpy as np
import pgl
from pgl.utils.logger import log
from pgl.utils import paddle_helper
import paddle
import paddle.fluid as fluid
import tqdm
import reader
import model
def get_layer(layer_type, gw, feature, hidden_size, act, name, is_test=False):
"""get_layer"""
return getattr(model, layer_type)(gw, feature, hidden_size, act, name)
def load_pos_neg(data_path):
"""load_pos_neg"""
train_eid = []
train_src = []
train_dst = []
with open(data_path) as f:
eid = 0
for idx, line in tqdm.tqdm(enumerate(f)):
src, dst = line.strip().split('\t')
train_src.append(int(src))
train_dst.append(int(dst))
train_eid.append(int(eid))
eid += 1
    # concatenate the positive and negative samples
train_eid = np.array(train_eid, dtype="int64")
train_src = np.array(train_src, dtype="int64")
train_dst = np.array(train_dst, dtype="int64")
returns = {"train_data": (train_src, train_dst, train_eid), }
return returns
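# Illustrative example for binary_op (hypothetical 2-d embeddings): with
# u = [1.0, 2.0] and v = [3.0, 4.0],
#   "Average"     -> [2.0, 3.0]
#   "Hadamard"    -> [3.0, 8.0]
#   "Weighted-L1" -> [2.0, 2.0]
#   "Weighted-L2" -> [4.0, 4.0]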
def binary_op(u_embed, v_embed, binary_op_type):
"""binary_op"""
if binary_op_type == "Average":
edge_embed = (u_embed + v_embed) / 2
elif binary_op_type == "Hadamard":
edge_embed = u_embed * v_embed
elif binary_op_type == "Weighted-L1":
edge_embed = fluid.layers.abs(u_embed - v_embed)
elif binary_op_type == "Weighted-L2":
edge_embed = (u_embed - v_embed) * (u_embed - v_embed)
else:
        raise ValueError(binary_op_type + " binary_op_type doesn't exist")
return edge_embed
class RetDict(object):
"""RetDict"""
pass
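# build_graph_model wires up the GraphSAGE link-prediction network: a node
# embedding lookup indexed by the sampled subgraph, num_layers stacked
# aggregator layers, and two weight-tied fc heads (shared param name "feat")
# that project the gathered src/dst node representations before scoring.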
def build_graph_model(args):
"""build_graph_model"""
node_feature_info = [('index', [None], np.dtype('int64'))]
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
graph_wrappers = []
feed_list = []
graph_wrappers.append(
pgl.graph_wrapper.GraphWrapper(
"layer_0", fluid.CPUPlace(), node_feat=node_feature_info))
#edge_feat=[("f", [None, 1], "float32")]))
num_embed = args.num_nodes
num_layers = args.num_layers
src_index = fluid.layers.data(
"src_index", shape=[None], dtype="int64", append_batch_size=False)
dst_index = fluid.layers.data(
"dst_index", shape=[None], dtype="int64", append_batch_size=False)
feature = fluid.layers.embedding(
input=fluid.layers.reshape(graph_wrappers[0].node_feat['index'],
[-1, 1]),
size=(num_embed + 1, args.hidden_size),
is_sparse=args.is_sparse,
is_distributed=args.is_distributed)
features = [feature]
ret_dict = RetDict()
ret_dict.graph_wrappers = graph_wrappers
edge_data = [src_index, dst_index]
feed_list.extend(edge_data)
ret_dict.feed_list = feed_list
for i in range(num_layers):
if i == num_layers - 1:
act = None
else:
act = "leaky_relu"
feature = get_layer(
args.layer_type,
graph_wrappers[0],
feature,
args.hidden_size,
act,
name="%s_%s" % (args.layer_type, i))
features.append(feature)
src_feat = fluid.layers.gather(features[-1], src_index)
src_feat = fluid.layers.fc(src_feat,
args.hidden_size,
bias_attr=None,
param_attr=fluid.ParamAttr(name="feat"))
dst_feat = fluid.layers.gather(features[-1], dst_index)
dst_feat = fluid.layers.fc(dst_feat,
args.hidden_size,
bias_attr=None,
param_attr=fluid.ParamAttr(name="feat"))
if args.phase == "predict":
node_id = fluid.layers.data(
"node_id", shape=[None, 1], dtype="int64", append_batch_size=False)
ret_dict.src_feat = src_feat
ret_dict.dst_feat = dst_feat
ret_dict.id = node_id
return ret_dict
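    # In-batch negative sampling: cos_theta is a [batch_size, batch_size]
    # score matrix, the one-hot diagonal marks the true (src, dst) pairs and
    # every other dst in the batch acts as a negative. batch_loss_weight
    # re-balances the single positive against the batch_size - 1 negatives.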
batch_size = args.batch_size
batch_negative_label = fluid.layers.reshape(
fluid.layers.range(0, batch_size, 1, "int64"), [-1, 1])
batch_negative_label = fluid.layers.one_hot(batch_negative_label,
batch_size)
batch_loss_weight = (batch_negative_label *
(batch_size - 2) + 1.0) / (batch_size - 1)
batch_loss_weight.stop_gradient = True
batch_negative_label = fluid.layers.cast(
batch_negative_label, dtype="float32")
batch_negative_label.stop_gradient = True
cos_theta = fluid.layers.matmul(src_feat, dst_feat, transpose_y=True)
# Calc Loss
loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=cos_theta, label=batch_negative_label)
loss = loss * batch_loss_weight
#loss = fluid.layers.reduce_sum(loss, -1)
loss = fluid.layers.mean(loss)
# Calc AUC
proba = fluid.layers.sigmoid(cos_theta)
proba = fluid.layers.reshape(proba, [-1, 1])
proba = fluid.layers.concat([proba * -1 + 1, proba], axis=1)
gold_label = fluid.layers.reshape(batch_negative_label, [-1, 1])
gold_label = fluid.layers.cast(gold_label, "int64")
auc, batch_auc_out, [batch_stat_pos, batch_stat_neg, stat_pos, stat_neg] = \
fluid.layers.auc(input=proba, label=gold_label, curve='ROC', )
ret_dict.loss = loss
ret_dict.auc = batch_auc_out
return ret_dict
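# run_epoch drives training for one pass over the reader. Incomplete batches
# are skipped because the in-batch negative label matrix above is square
# (batch_size x batch_size); parameters are checkpointed every save_per_step
# batches and once more at the end of the epoch.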
def run_epoch(
py_reader,
exe,
program,
prefix,
model_dict,
epoch,
batch_size,
log_per_step=100,
save_per_step=10000, ):
"""run_epoch"""
batch = 0
start = time.time()
batch_end = time.time()
for batch_feed_dict in py_reader():
if prefix == "train":
if batch_feed_dict["src_index"].shape[0] != batch_size:
log.warning(
                    'batch_feed_dict["src_index"].shape[0] != batch_size, skip batch')
continue
batch_start = time.time()
batch += 1
batch_loss, batch_auc = exe.run(
program,
feed=batch_feed_dict,
fetch_list=[model_dict.loss.name, model_dict.auc.name])
batch_end = time.time()
if batch % log_per_step == 0:
log.info(
"Batch %s %s-Loss %s \t %s-Auc %s \t Speed(per batch) %.5lf sec"
% (batch, prefix, np.mean(batch_loss), prefix,
np.mean(batch_auc), batch_end - batch_start))
if batch != 0 and batch % save_per_step == 0:
fluid.io.save_params(
exe, dirname='checkpoint', main_program=program)
fluid.io.save_params(exe, dirname='checkpoint', main_program=program)
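# run_predict_epoch runs the network in inference mode, gathers the projected
# src features for every node id in the batch and writes the resulting
# [num_nodes, hidden_size] embedding table to emb.npy.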
def run_predict_epoch(py_reader,
exe,
program,
prefix,
model_dict,
num_nodes,
hidden_size,
log_per_step=100):
"""run_predict_epoch"""
batch = 0
start = time.time()
#use the parallel executor to speed up
batch_end = time.time()
all_feat = np.zeros((num_nodes, hidden_size), dtype="float32")
for batch_feed_dict in tqdm.tqdm(py_reader()):
batch_start = time.time()
batch += 1
batch_src_feat, batch_id = exe.run(
program,
feed=batch_feed_dict,
fetch_list=[model_dict.src_feat.name, model_dict.id.name])
for ind, id in enumerate(batch_id):
all_feat[id] = batch_src_feat[ind]
np.save("emb.npy", all_feat)
def main(args):
"""main"""
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
ret_dict = build_graph_model(args=args)
val_program = train_program.clone(for_test=True)
if args.phase == "train":
with fluid.program_guard(train_program, startup_program):
adam = fluid.optimizer.Adam(learning_rate=args.lr)
adam.minimize(ret_dict.loss)
# reset the place according to role of parameter server
exe.run(startup_program)
with open(args.data_path) as f:
log.info("Begin Load Graph")
src = []
dst = []
for idx, line in tqdm.tqdm(enumerate(f)):
s, d = line.strip().split()
src.append(s)
dst.append(d)
dst.append(s)
src.append(d)
src = np.array(src, dtype="int64").reshape(-1, 1)
dst = np.array(dst, dtype="int64").reshape(-1, 1)
edges = np.hstack([src, dst])
log.info("Begin Build Index")
ret_dict.graph = pgl.graph.Graph(num_nodes=args.num_nodes, edges=edges)
ret_dict.graph.indegree()
log.info("End Build Index")
if args.phase == "train":
#just the worker, load the sample
data = load_pos_neg(args.data_path)
feed_name_list = [var.name for var in ret_dict.feed_list]
train_iter = reader.graph_reader(
args.num_layers,
ret_dict.graph_wrappers,
batch_size=args.batch_size,
data=data['train_data'],
samples=args.samples,
num_workers=args.sample_workers,
feed_name_list=feed_name_list,
use_pyreader=args.use_pyreader,
graph=ret_dict.graph)
# get PyReader
for epoch in range(args.epoch):
epoch_start = time.time()
try:
run_epoch(
train_iter,
program=train_program,
exe=exe,
prefix="train",
model_dict=ret_dict,
epoch=epoch,
batch_size=args.batch_size,
log_per_step=1)
epoch_end = time.time()
                print("Epoch: {0}, Train total time: {1} ".format(
epoch, epoch_end - epoch_start))
except Exception as e:
log.info("Run Epoch Error %s" % e)
fluid.io.save_params(
exe,
dirname=args.checkpoint + '_%s' % (epoch + 1),
main_program=train_program)
log.info("EPOCH END")
log.info("RUN FINISH")
elif args.phase == "predict":
fluid.io.load_params(
exe,
dirname=args.checkpoint + '_%s' % args.epoch,
main_program=val_program)
test_src = np.arange(0, args.num_nodes, dtype="int64")
feed_name_list = [var.name for var in ret_dict.feed_list]
predict_iter = reader.graph_reader(
args.num_layers,
ret_dict.graph_wrappers,
batch_size=args.batch_size,
data=(test_src, test_src, test_src),
samples=args.samples,
num_workers=args.sample_workers,
feed_name_list=feed_name_list,
use_pyreader=args.use_pyreader,
graph=ret_dict.graph,
predict=True)
run_predict_epoch(
predict_iter,
program=val_program,
exe=exe,
prefix="predict",
hidden_size=args.hidden_size,
model_dict=ret_dict,
num_nodes=args.num_nodes,
log_per_step=100)
log.info("EPOCH END")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='graphsage')
parser.add_argument(
"--use_cuda", action='store_true', help="use_cuda", default=False)
parser.add_argument("--layer_type", type=str, default="graphsage_mean")
parser.add_argument("--epoch", type=int, default=1)
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--batch_size", type=int, default=1024)
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--num_layers", type=int, default=2)
parser.add_argument("--data_path", type=str, required=True)
parser.add_argument("--checkpoint", type=str, default="model_ckpt")
parser.add_argument("--cache_path", type=str, default="./tmp")
parser.add_argument("--phase", type=str, default="train")
parser.add_argument("--digraph", action='store_true', default=False)
parser.add_argument('--samples', nargs='+', type=int, default=[10, 10])
parser.add_argument("--sample_workers", type=int, default=10)
parser.add_argument("--num_nodes", type=int, required=True)
parser.add_argument("--is_sparse", action='store_true', default=False)
parser.add_argument("--is_distributed", action='store_true', default=False)
parser.add_argument("--real_graph", action='store_true', default=True)
parser.add_argument("--use_pyreader", action='store_true', default=False)
args = parser.parse_args()
log.info(args)
main(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""test ogb
"""
import argparse
import pgl
import numpy as np
import paddle.fluid as fluid
from pgl.contrib.ogb.graphproppred.dataset_pgl import PglGraphPropPredDataset
from pgl.utils import paddle_helper
from ogb.graphproppred import Evaluator
from pgl.contrib.ogb.graphproppred.mol_encoder import AtomEncoder, BondEncoder
def train(exe, batch_size, graph_wrapper, train_program, splitted_idx, dataset,
evaluator, fetch_loss, fetch_pred):
"""Train"""
graphs, labels = dataset[splitted_idx["train"]]
perm = np.arange(0, len(graphs))
np.random.shuffle(perm)
start_batch = 0
batch_no = 0
pred_output = np.zeros_like(labels, dtype="float32")
while start_batch < len(perm):
batch_index = perm[start_batch:start_batch + batch_size]
start_batch += batch_size
batch_graph = pgl.graph.MultiGraph(graphs[batch_index])
batch_label = labels[batch_index]
batch_valid = (batch_label == batch_label).astype("float32")
batch_label = np.nan_to_num(batch_label).astype("float32")
feed_dict = graph_wrapper.to_feed(batch_graph)
feed_dict["label"] = batch_label
feed_dict["weight"] = batch_valid
loss, pred = exe.run(train_program,
feed=feed_dict,
fetch_list=[fetch_loss, fetch_pred])
pred_output[batch_index] = pred
batch_no += 1
print("train", evaluator.eval({"y_true": labels, "y_pred": pred_output}))
def evaluate(exe, batch_size, graph_wrapper, val_program, splitted_idx,
dataset, mode, evaluator, fetch_pred):
"""Eval"""
graphs, labels = dataset[splitted_idx[mode]]
perm = np.arange(0, len(graphs))
start_batch = 0
batch_no = 0
pred_output = np.zeros_like(labels, dtype="float32")
while start_batch < len(perm):
batch_index = perm[start_batch:start_batch + batch_size]
start_batch += batch_size
batch_graph = pgl.graph.MultiGraph(graphs[batch_index])
feed_dict = graph_wrapper.to_feed(batch_graph)
pred = exe.run(val_program, feed=feed_dict, fetch_list=[fetch_pred])
pred_output[batch_index] = pred[0]
batch_no += 1
print(mode, evaluator.eval({"y_true": labels, "y_pred": pred_output}))
def send_func(src_feat, dst_feat, edge_feat):
"""Send"""
return src_feat["h"] + edge_feat["h"]
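# GNNModel for ogbg graph property prediction: atom and bond features are
# embedded with AtomEncoder/BondEncoder, each layer adds the summed
# (node + edge) messages back onto the node state (a residual update) and
# applies fc + relu, and the graph-level prediction is an fc layer over
# average-pooled node states.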
class GNNModel(object):
"""GNNModel"""
def __init__(self, name, emb_dim, num_task, num_layers):
self.num_task = num_task
self.emb_dim = emb_dim
self.num_layers = num_layers
self.name = name
self.atom_encoder = AtomEncoder(name=name, emb_dim=emb_dim)
self.bond_encoder = BondEncoder(name=name, emb_dim=emb_dim)
    def forward(self, graph):
        """forward"""
h_node = self.atom_encoder(graph.node_feat['feat'])
h_edge = self.bond_encoder(graph.edge_feat['feat'])
for layer in range(self.num_layers):
msg = graph.send(
send_func,
nfeat_list=[("h", h_node)],
efeat_list=[("h", h_edge)])
h_node = graph.recv(msg, 'sum') + h_node
h_node = fluid.layers.fc(h_node,
size=self.emb_dim,
name=self.name + '_%s' % layer,
act="relu")
graph_nodes = pgl.layers.graph_pooling(graph, h_node, "average")
graph_pred = fluid.layers.fc(graph_nodes, self.num_task, name="final")
return graph_pred
def main():
"""main
"""
# Training settings
parser = argparse.ArgumentParser(description='Graph Dataset')
parser.add_argument(
'--epochs',
type=int,
default=100,
help='number of epochs to train (default: 100)')
parser.add_argument(
'--dataset',
type=str,
default="ogbg-mol-tox21",
        help='dataset name (default: ogbg-mol-tox21)')
args = parser.parse_args()
place = fluid.CPUPlace() # Dataset too big to use GPU
### automatic dataloading and splitting
dataset = PglGraphPropPredDataset(name=args.dataset)
splitted_idx = dataset.get_idx_split()
### automatic evaluator. takes dataset name as input
evaluator = Evaluator(args.dataset)
graph_data, label = dataset[:2]
batch_graph = pgl.graph.MultiGraph(graph_data)
graph_data = batch_graph
train_program = fluid.Program()
startup_program = fluid.Program()
test_program = fluid.Program()
# degree normalize
graph_data.edge_feat["feat"] = graph_data.edge_feat["feat"].astype("int64")
graph_data.node_feat["feat"] = graph_data.node_feat["feat"].astype("int64")
model = GNNModel(
name="gnn", num_task=dataset.num_tasks, emb_dim=64, num_layers=2)
with fluid.program_guard(train_program, startup_program):
gw = pgl.graph_wrapper.GraphWrapper(
"graph",
place=place,
node_feat=graph_data.node_feat_info(),
edge_feat=graph_data.edge_feat_info())
pred = model.forward(gw)
sigmoid_pred = fluid.layers.sigmoid(pred)
val_program = train_program.clone(for_test=True)
initializer = []
with fluid.program_guard(train_program, startup_program):
train_label = fluid.layers.data(
name="label", dtype="float32", shape=[None, dataset.num_tasks])
train_weight = fluid.layers.data(
name="weight", dtype="float32", shape=[None, dataset.num_tasks])
train_loss_t = fluid.layers.sigmoid_cross_entropy_with_logits(
x=pred, label=train_label) * train_weight
train_loss_t = fluid.layers.reduce_sum(train_loss_t)
adam = fluid.optimizer.Adam(
learning_rate=1e-2,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0005))
adam.minimize(train_loss_t)
exe = fluid.Executor(place)
exe.run(startup_program)
for epoch in range(1, args.epochs + 1):
print("Epoch", epoch)
train(exe, 128, gw, train_program, splitted_idx, dataset, evaluator,
train_loss_t, sigmoid_pred)
evaluate(exe, 128, gw, val_program, splitted_idx, dataset, "valid",
evaluator, sigmoid_pred)
evaluate(exe, 128, gw, val_program, splitted_idx, dataset, "test",
evaluator, sigmoid_pred)
if __name__ == "__main__":
main()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""test ogb
"""
import argparse
import time
import logging
import numpy as np
import paddle.fluid as fluid
import pgl
from pgl.contrib.ogb.linkproppred.dataset_pgl import PglLinkPropPredDataset
from pgl.utils import paddle_helper
from ogb.linkproppred import Evaluator
def send_func(src_feat, dst_feat, edge_feat):
"""send_func"""
return src_feat["h"]
def recv_func(feat):
"""recv_func"""
return fluid.layers.sequence_pool(feat, pool_type="sum")
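# GNNModel for ogbl link prediction: a stack of GCN-style layers (sum
# aggregation followed by fc and a degree normalisation via
# node_feat["norm"]) over a learned node-embedding table. An edge (u, v) is
# scored by the element-wise product of its endpoint representations passed
# through a single fc layer.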
class GNNModel(object):
"""GNNModel"""
def __init__(self, name, num_nodes, emb_dim, num_layers):
self.num_nodes = num_nodes
self.emb_dim = emb_dim
self.num_layers = num_layers
self.name = name
self.src_nodes = fluid.layers.data(
name='src_nodes',
shape=[None],
dtype='int64', )
self.dst_nodes = fluid.layers.data(
name='dst_nodes',
shape=[None],
dtype='int64', )
self.edge_label = fluid.layers.data(
name='edge_label',
shape=[None, 1],
dtype='float32', )
def forward(self, graph):
"""forward"""
h = fluid.layers.create_parameter(
shape=[self.num_nodes, self.emb_dim],
dtype="float32",
name=self.name + "_embedding")
for layer in range(self.num_layers):
msg = graph.send(
send_func,
nfeat_list=[("h", h)], )
h = graph.recv(msg, recv_func)
h = fluid.layers.fc(
h,
size=self.emb_dim,
bias_attr=False,
param_attr=fluid.ParamAttr(name=self.name + '_%s' % layer))
h = h * graph.node_feat["norm"]
bias = fluid.layers.create_parameter(
shape=[self.emb_dim],
dtype='float32',
is_bias=True,
name=self.name + '_bias_%s' % layer)
h = fluid.layers.elementwise_add(h, bias, act="relu")
src = fluid.layers.gather(h, self.src_nodes, overwrite=False)
dst = fluid.layers.gather(h, self.dst_nodes, overwrite=False)
edge_embed = src * dst
pred = fluid.layers.fc(input=edge_embed,
size=1,
name=self.name + "_pred_output")
prob = fluid.layers.sigmoid(pred)
loss = fluid.layers.sigmoid_cross_entropy_with_logits(pred,
self.edge_label)
loss = fluid.layers.reduce_sum(loss)
return pred, prob, loss
def main():
"""main
"""
# Training settings
parser = argparse.ArgumentParser(description='Graph Dataset')
parser.add_argument(
'--epochs',
type=int,
default=4,
        help='number of epochs to train (default: 4)')
parser.add_argument(
'--dataset',
type=str,
default="ogbl-ppa",
        help='dataset name (default: ogbl-ppa, protein-protein association)')
parser.add_argument('--use_cuda', action='store_true')
parser.add_argument('--batch_size', type=int, default=5120)
parser.add_argument('--embed_dim', type=int, default=64)
parser.add_argument('--num_layers', type=int, default=2)
parser.add_argument('--lr', type=float, default=0.001)
args = parser.parse_args()
print(args)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
### automatic dataloading and splitting
    print("loading dataset")
dataset = PglLinkPropPredDataset(name=args.dataset)
splitted_edge = dataset.get_edge_split()
print(splitted_edge['train_edge'].shape)
print(splitted_edge['train_edge_label'].shape)
print("building evaluator")
### automatic evaluator. takes dataset name as input
evaluator = Evaluator(args.dataset)
graph_data = dataset[0]
print("num_nodes: %d" % graph_data.num_nodes)
train_program = fluid.Program()
startup_program = fluid.Program()
# degree normalize
indegree = graph_data.indegree()
norm = np.zeros_like(indegree, dtype="float32")
norm[indegree > 0] = np.power(indegree[indegree > 0], -0.5)
graph_data.node_feat["norm"] = np.expand_dims(norm, -1).astype("float32")
# graph_data.node_feat["index"] = np.array([i for i in range(graph_data.num_nodes)], dtype=np.int64).reshape(-1,1)
with fluid.program_guard(train_program, startup_program):
model = GNNModel(
name="gnn",
num_nodes=graph_data.num_nodes,
emb_dim=args.embed_dim,
num_layers=args.num_layers)
gw = pgl.graph_wrapper.GraphWrapper(
"graph",
place,
node_feat=graph_data.node_feat_info(),
edge_feat=graph_data.edge_feat_info())
pred, prob, loss = model.forward(gw)
val_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
global_steps = int(splitted_edge['train_edge'].shape[0] /
args.batch_size * 2)
learning_rate = fluid.layers.polynomial_decay(args.lr, global_steps,
0.00005)
adam = fluid.optimizer.Adam(
learning_rate=learning_rate,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0005))
adam.minimize(loss)
exe = fluid.Executor(place)
exe.run(startup_program)
feed = gw.to_feed(graph_data)
print("evaluate result before training: ")
result = test(exe, val_program, prob, evaluator, feed, splitted_edge)
print(result)
print("training")
cc = 0
for epoch in range(1, args.epochs + 1):
for batch_data, batch_label in data_generator(
graph_data,
splitted_edge["train_edge"],
splitted_edge["train_edge_label"],
batch_size=args.batch_size):
feed['src_nodes'] = batch_data[:, 0].reshape(-1, 1)
feed['dst_nodes'] = batch_data[:, 1].reshape(-1, 1)
feed['edge_label'] = batch_label.astype("float32")
res_loss, y_pred, b_lr = exe.run(
train_program,
feed=feed,
fetch_list=[loss, prob, learning_rate])
if cc % 1 == 0:
print("epoch %d | step %d | lr %s | Loss %s" %
(epoch, cc, b_lr[0], res_loss[0]))
cc += 1
if cc % 20 == 0:
print("Evaluating...")
result = test(exe, val_program, prob, evaluator, feed,
splitted_edge)
print("epoch %d | step %d" % (epoch, cc))
print(result)
def test(exe, val_program, prob, evaluator, feed, splitted_edge):
"""Evaluation"""
result = {}
feed['src_nodes'] = splitted_edge["valid_edge"][:, 0].reshape(-1, 1)
feed['dst_nodes'] = splitted_edge["valid_edge"][:, 1].reshape(-1, 1)
feed['edge_label'] = splitted_edge["valid_edge_label"].astype(
"float32").reshape(-1, 1)
y_pred = exe.run(val_program, feed=feed, fetch_list=[prob])[0]
input_dict = {
"y_pred_pos":
y_pred[splitted_edge["valid_edge_label"] == 1].reshape(-1, ),
"y_pred_neg":
y_pred[splitted_edge["valid_edge_label"] == 0].reshape(-1, )
}
result["valid"] = evaluator.eval(input_dict)
feed['src_nodes'] = splitted_edge["test_edge"][:, 0].reshape(-1, 1)
feed['dst_nodes'] = splitted_edge["test_edge"][:, 1].reshape(-1, 1)
feed['edge_label'] = splitted_edge["test_edge_label"].astype(
"float32").reshape(-1, 1)
y_pred = exe.run(val_program, feed=feed, fetch_list=[prob])[0]
input_dict = {
"y_pred_pos":
y_pred[splitted_edge["test_edge_label"] == 1].reshape(-1, ),
"y_pred_neg":
y_pred[splitted_edge["test_edge_label"] == 0].reshape(-1, )
}
result["test"] = evaluator.eval(input_dict)
return result
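# data_generator yields shuffled mini-batches of positive edges together with
# up to the same number of candidate negatives: for each positive source node
# a random destination is drawn from the batch's node pool, and candidates
# that correspond to real edges are filtered out with graph.has_edges_between.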
def data_generator(graph, data, label_data, batch_size, shuffle=True):
"""Data Generator"""
perm = np.arange(0, len(data))
if shuffle:
np.random.shuffle(perm)
offset = 0
while offset < len(perm):
batch_index = perm[offset:(offset + batch_size)]
offset += batch_size
pos_data = data[batch_index]
pos_label = label_data[batch_index]
neg_src_node = pos_data[:, 0]
neg_dst_node = np.random.choice(
pos_data.reshape(-1, ), size=len(neg_src_node))
neg_data = np.hstack(
[neg_src_node.reshape(-1, 1), neg_dst_node.reshape(-1, 1)])
exists = graph.has_edges_between(neg_src_node, neg_dst_node)
neg_data = neg_data[np.invert(exists)]
neg_label = np.zeros(shape=len(neg_data), dtype=np.int64)
batch_data = np.vstack([pos_data, neg_data])
label = np.vstack([pos_label.reshape(-1, 1), neg_label.reshape(-1, 1)])
yield batch_data, label
if __name__ == "__main__":
main()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""test ogb
"""
import argparse
import pgl
import numpy as np
import paddle.fluid as fluid
from pgl.contrib.ogb.nodeproppred.dataset_pgl import PglNodePropPredDataset
from pgl.utils import paddle_helper
from ogb.nodeproppred import Evaluator
def train():
pass
def send_func(src_feat, dst_feat, edge_feat):
return (src_feat["h"] + edge_feat["h"]) * src_feat["norm"]
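# GNNModel for ogbn node classification: node states start from a learned
# embedding of a dummy integer feature, edge features are projected with an
# fc layer, and each layer sends degree-normalised (node + edge) messages,
# sums them, and applies fc + bias + relu with a second normalisation on the
# receiving side.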
class GNNModel(object):
def __init__(self, name, emb_dim, num_task, num_layers):
self.num_task = num_task
self.emb_dim = emb_dim
self.num_layers = num_layers
self.name = name
def forward(self, graph):
h = fluid.layers.embedding(
graph.node_feat["x"],
size=(2, self.emb_dim)) # name=self.name + "_embedding")
edge_attr = fluid.layers.fc(graph.edge_feat["feat"], size=self.emb_dim)
for layer in range(self.num_layers):
msg = graph.send(
send_func,
nfeat_list=[("h", h), ("norm", graph.node_feat["norm"])],
efeat_list=[("h", edge_attr)])
h = graph.recv(msg, "sum")
h = fluid.layers.fc(
h,
size=self.emb_dim,
bias_attr=False,
param_attr=fluid.ParamAttr(name=self.name + '_%s' % layer))
h = h * graph.node_feat["norm"]
bias = fluid.layers.create_parameter(
shape=[self.emb_dim],
dtype='float32',
is_bias=True,
name=self.name + '_bias_%s' % layer)
h = fluid.layers.elementwise_add(h, bias, act="relu")
pred = fluid.layers.fc(h,
self.num_task,
act=None,
name=self.name + "_pred_output")
return pred
def main():
"""main
"""
# Training settings
parser = argparse.ArgumentParser(description='Graph Dataset')
parser.add_argument(
'--epochs',
type=int,
default=100,
help='number of epochs to train (default: 100)')
parser.add_argument(
'--dataset',
type=str,
default="ogbn-proteins",
        help='dataset name (default: ogbn-proteins)')
args = parser.parse_args()
#device = torch.device("cuda:" + str(args.device)) if torch.cuda.is_available() else torch.device("cpu")
#place = fluid.CUDAPlace(0)
place = fluid.CPUPlace() # Dataset too big to use GPU
### automatic dataloading and splitting
dataset = PglNodePropPredDataset(name=args.dataset)
splitted_idx = dataset.get_idx_split()
### automatic evaluator. takes dataset name as input
evaluator = Evaluator(args.dataset)
graph_data, label = dataset[0]
train_program = fluid.Program()
startup_program = fluid.Program()
test_program = fluid.Program()
# degree normalize
indegree = graph_data.indegree()
norm = np.zeros_like(indegree, dtype="float32")
norm[indegree > 0] = np.power(indegree[indegree > 0], -0.5)
graph_data.node_feat["norm"] = np.expand_dims(norm, -1).astype("float32")
graph_data.node_feat["x"] = np.zeros((len(indegree), 1), dtype="int64")
graph_data.edge_feat["feat"] = graph_data.edge_feat["feat"].astype(
"float32")
model = GNNModel(
name="gnn", num_task=dataset.num_tasks, emb_dim=64, num_layers=2)
with fluid.program_guard(train_program, startup_program):
gw = pgl.graph_wrapper.StaticGraphWrapper("graph", graph_data, place)
pred = model.forward(gw)
sigmoid_pred = fluid.layers.sigmoid(pred)
val_program = train_program.clone(for_test=True)
initializer = []
with fluid.program_guard(train_program, startup_program):
train_node_index, init = paddle_helper.constant(
"train_node_index", dtype="int64", value=splitted_idx["train"])
initializer.append(init)
train_node_label, init = paddle_helper.constant(
"train_node_label",
dtype="float32",
value=label[splitted_idx["train"]].astype("float32"))
initializer.append(init)
train_pred_t = fluid.layers.gather(pred, train_node_index)
train_loss_t = fluid.layers.sigmoid_cross_entropy_with_logits(
x=train_pred_t, label=train_node_label)
train_loss_t = fluid.layers.reduce_sum(train_loss_t)
train_pred_t = fluid.layers.sigmoid(train_pred_t)
adam = fluid.optimizer.Adam(
learning_rate=1e-2,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0005))
adam.minimize(train_loss_t)
exe = fluid.Executor(place)
exe.run(startup_program)
gw.initialize(place)
for init in initializer:
init(place)
for epoch in range(1, args.epochs + 1):
loss = exe.run(train_program, feed={}, fetch_list=[train_loss_t])
print("Loss %s" % loss[0])
print("Evaluating...")
y_pred = exe.run(val_program, feed={}, fetch_list=[sigmoid_pred])[0]
result = {}
input_dict = {
"y_true": label[splitted_idx["train"]],
"y_pred": y_pred[splitted_idx["train"]]
}
result["train"] = evaluator.eval(input_dict)
input_dict = {
"y_true": label[splitted_idx["valid"]],
"y_pred": y_pred[splitted_idx["valid"]]
}
result["valid"] = evaluator.eval(input_dict)
input_dict = {
"y_true": label[splitted_idx["test"]],
"y_pred": y_pred[splitted_idx["test"]]
}
result["test"] = evaluator.eval(input_dict)
print(result)
if __name__ == "__main__":
main()
......@@ -13,8 +13,11 @@
# limitations under the License.
"""Generate pgl apis
"""
__version__ = "0.1.0.beta"
__version__ = "1.0.2"
from pgl import layers
from pgl import graph_wrapper
from pgl import graph
from pgl import data_loader
from pgl import heter_graph
from pgl import heter_graph_wrapper
from pgl import contrib
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""__init__.py"""
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PglGraphPropPredDataset
"""
import pandas as pd
import shutil, os
import os.path as osp
import numpy as np
from ogb.utils.url import decide_download, download_url, extract_zip
from ogb.graphproppred import make_master_file
from pgl.contrib.ogb.io.read_graph_pgl import read_csv_graph_pgl
def to_bool(value):
"""to_bool"""
return np.array([value], dtype="bool")[0]
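# PglGraphPropPredDataset downloads (if needed) and pre-processes an OGB
# graph-property-prediction dataset into PGL graphs, and exposes the official
# train/valid/test split via get_idx_split().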
class PglGraphPropPredDataset(object):
"""PglGraphPropPredDataset"""
def __init__(self, name, root="dataset"):
self.name = name ## original name, e.g., ogbg-mol-tox21
self.dir_name = "_".join(
name.split("-")
        ) + "_pgl"  ## replace hyphen with underscore, e.g., ogbg_mol_tox21_pgl
self.original_root = root
self.root = osp.join(root, self.dir_name)
self.meta_info = make_master_file.df #pd.read_csv(
#os.path.join(os.path.dirname(__file__), "master.csv"), index_col=0)
        if self.name not in self.meta_info:
print(self.name)
error_mssg = "Invalid dataset name {}.\n".format(self.name)
error_mssg += "Available datasets are as follows:\n"
error_mssg += "\n".join(self.meta_info.keys())
raise ValueError(error_mssg)
self.download_name = self.meta_info[self.name][
"download_name"] ## name of downloaded file, e.g., tox21
self.num_tasks = int(self.meta_info[self.name]["num tasks"])
self.task_type = self.meta_info[self.name]["task type"]
super(PglGraphPropPredDataset, self).__init__()
self.pre_process()
def pre_process(self):
"""Pre-processing"""
processed_dir = osp.join(self.root, 'processed')
raw_dir = osp.join(self.root, 'raw')
pre_processed_file_path = osp.join(processed_dir, 'pgl_data_processed')
if os.path.exists(pre_processed_file_path):
# TODO: Load Preprocessed
pass
else:
### download
url = self.meta_info[self.name]["url"]
if decide_download(url):
path = download_url(url, self.original_root)
extract_zip(path, self.original_root)
os.unlink(path)
# delete folder if there exists
try:
shutil.rmtree(self.root)
except:
pass
shutil.move(
osp.join(self.original_root, self.download_name),
self.root)
else:
print("Stop download.")
exit(-1)
### preprocess
add_inverse_edge = to_bool(self.meta_info[self.name][
"add_inverse_edge"])
self.graphs = read_csv_graph_pgl(
raw_dir, add_inverse_edge=add_inverse_edge)
self.graphs = np.array(self.graphs)
self.labels = np.array(
pd.read_csv(
osp.join(raw_dir, "graph-label.csv.gz"),
compression="gzip",
header=None).values)
# TODO: Load Graph
### load preprocessed files
def get_idx_split(self):
"""Train/Valid/Test split"""
split_type = self.meta_info[self.name]["split"]
path = osp.join(self.root, "split", split_type)
train_idx = pd.read_csv(
osp.join(path, "train.csv.gz"), compression="gzip",
header=None).values.T[0]
valid_idx = pd.read_csv(
osp.join(path, "valid.csv.gz"), compression="gzip",
header=None).values.T[0]
test_idx = pd.read_csv(
osp.join(path, "test.csv.gz"), compression="gzip",
header=None).values.T[0]
return {
"train": np.array(
train_idx, dtype="int64"),
"valid": np.array(
valid_idx, dtype="int64"),
"test": np.array(
test_idx, dtype="int64")
}
def __getitem__(self, idx):
"""Get datapoint with index"""
return self.graphs[idx], self.labels[idx]
def __len__(self):
"""Length of the dataset
Returns
-------
int
Length of Dataset
"""
return len(self.graphs)
def __repr__(self): # pragma: no cover
return '{}({})'.format(self.__class__.__name__, len(self))
if __name__ == "__main__":
pgl_dataset = PglGraphPropPredDataset(name="ogbg-mol-bace")
splitted_index = pgl_dataset.get_idx_split()
print(pgl_dataset)
print(pgl_dataset[3:20])
#print(pgl_dataset[splitted_index["train"]])
#print(pgl_dataset[splitted_index["valid"]])
#print(pgl_dataset[splitted_index["test"]])
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""MolEncoder for ogb
"""
import paddle.fluid as fluid
from ogb.utils.features import get_atom_feature_dims, get_bond_feature_dims
class AtomEncoder(object):
"""AtomEncoder for encoding node features"""
def __init__(self, name, emb_dim):
self.emb_dim = emb_dim
self.name = name
def __call__(self, x):
atom_feature = get_atom_feature_dims()
atom_input = fluid.layers.split(
x, num_or_sections=len(atom_feature), dim=-1)
outputs = None
count = 0
for _x, _atom_input_dim in zip(atom_input, atom_feature):
count += 1
emb = fluid.layers.embedding(
_x,
size=(_atom_input_dim, self.emb_dim),
param_attr=fluid.ParamAttr(
name=self.name + '_atom_feat_%s' % count))
if outputs is None:
outputs = emb
else:
outputs = outputs + emb
return outputs
class BondEncoder(object):
"""Bond for encoding edge features"""
def __init__(self, name, emb_dim):
self.emb_dim = emb_dim
self.name = name
def __call__(self, x):
bond_feature = get_bond_feature_dims()
bond_input = fluid.layers.split(
x, num_or_sections=len(bond_feature), dim=-1)
outputs = None
count = 0
for _x, _bond_input_dim in zip(bond_input, bond_feature):
count += 1
emb = fluid.layers.embedding(
_x,
size=(_bond_input_dim, self.emb_dim),
param_attr=fluid.ParamAttr(
name=self.name + '_bond_feat_%s' % count))
if outputs is None:
outputs = emb
else:
outputs = outputs + emb
return outputs
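# A hedged numpy illustration (not a fluid program) of what the encoders above
# compute: every categorical column gets its own embedding table and the
# per-column embeddings are summed into one vector per atom. The helper name,
# the random tables and `emb_dim` are illustrative assumptions.
def _numpy_atom_encoder_sketch(x, emb_dim=8, seed=0):
    """x: int array of shape [num_atoms, num_atom_columns] with raw categorical features."""
    import numpy as np
    rng = np.random.RandomState(seed)
    outputs = None
    for i, dim in enumerate(get_atom_feature_dims()):
        table = rng.randn(dim, emb_dim)  # stand-in for one learned embedding table
        emb = table[x[:, i]]             # per-column lookup
        outputs = emb if outputs is None else outputs + emb
    return outputs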
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""__init__.py
"""
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""pgl read_csv_graph for ogb
"""
import pandas as pd
import os.path as osp
import numpy as np
import pgl
from ogb.io.read_graph_raw import read_csv_graph_raw
def read_csv_graph_pgl(raw_dir, add_inverse_edge=False):
"""Read CSV data and build PGL Graph
"""
graph_list = read_csv_graph_raw(raw_dir, add_inverse_edge)
pgl_graph_list = []
for graph in graph_list:
edges = list(zip(graph["edge_index"][0], graph["edge_index"][1]))
g = pgl.graph.Graph(num_nodes=graph["num_nodes"], edges=edges)
if graph["edge_feat"] is not None:
g.edge_feat["feat"] = graph["edge_feat"]
if graph["node_feat"] is not None:
g.node_feat["feat"] = graph["node_feat"]
pgl_graph_list.append(g)
return pgl_graph_list
if __name__ == "__main__":
# graph_list = read_csv_graph_dgl('dataset/proteinfunc_v2/raw', add_inverse_edge = True)
graph_list = read_csv_graph_pgl(
'dataset/ogbn_proteins_pgl/raw', add_inverse_edge=True)
print(graph_list)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""__init__.py
"""
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""LinkPropPredDataset for pgl
"""
import pandas as pd
import shutil, os
import os.path as osp
import numpy as np
from ogb.utils.url import decide_download, download_url, extract_zip
from ogb.linkproppred import make_master_file
from pgl.contrib.ogb.io.read_graph_pgl import read_csv_graph_pgl
def to_bool(value):
"""to_bool"""
return np.array([value], dtype="bool")[0]
class PglLinkPropPredDataset(object):
"""PglLinkPropPredDataset
"""
def __init__(self, name, root="dataset"):
self.name = name ## original name, e.g., ogbl-ppa
self.dir_name = "_".join(name.split(
"-")) + "_pgl" ## replace hyphen with underline, e.g., ogbl_ppa_pgl
self.original_root = root
self.root = osp.join(root, self.dir_name)
self.meta_info = make_master_file.df #pd.read_csv(os.path.join(os.path.dirname(__file__), "master.csv"), index_col=0)
        if self.name not in self.meta_info:
print(self.name)
error_mssg = "Invalid dataset name {}.\n".format(self.name)
error_mssg += "Available datasets are as follows:\n"
error_mssg += "\n".join(self.meta_info.keys())
raise ValueError(error_mssg)
self.download_name = self.meta_info[self.name][
"download_name"] ## name of downloaded file, e.g., ppassoc
self.task_type = self.meta_info[self.name]["task type"]
super(PglLinkPropPredDataset, self).__init__()
self.pre_process()
def pre_process(self):
"""pre_process downlaoding data
"""
processed_dir = osp.join(self.root, 'processed')
pre_processed_file_path = osp.join(processed_dir, 'pgl_data_processed')
if osp.exists(pre_processed_file_path):
#TODO: Reload Preprocess files
pass
else:
### check download
if not osp.exists(osp.join(self.root, "raw", "edge.csv.gz")):
url = self.meta_info[self.name]["url"]
if decide_download(url):
path = download_url(url, self.original_root)
extract_zip(path, self.original_root)
os.unlink(path)
                    # delete the folder if it already exists
try:
shutil.rmtree(self.root)
except:
pass
shutil.move(
osp.join(self.original_root, self.download_name),
self.root)
else:
print("Stop download.")
exit(-1)
raw_dir = osp.join(self.root, "raw")
### pre-process and save
add_inverse_edge = to_bool(self.meta_info[self.name][
"add_inverse_edge"])
self.graph = read_csv_graph_pgl(
raw_dir, add_inverse_edge=add_inverse_edge)
#TODO: SAVE preprocess graph
def get_edge_split(self):
"""Train/Validation/Test split
"""
split_type = self.meta_info[self.name]["split"]
path = osp.join(self.root, "split", split_type)
train_idx = pd.read_csv(
osp.join(path, "train.csv.gz"), compression="gzip",
header=None).values
valid_idx = pd.read_csv(
osp.join(path, "valid.csv.gz"), compression="gzip",
header=None).values
test_idx = pd.read_csv(
osp.join(path, "test.csv.gz"), compression="gzip",
header=None).values
if self.task_type == "link prediction":
target_type = np.int64
else:
target_type = np.float32
return {
"train_edge": np.array(
train_idx[:, :2], dtype="int64"),
"train_edge_label": np.array(
train_idx[:, 2], dtype=target_type),
"valid_edge": np.array(
valid_idx[:, :2], dtype="int64"),
"valid_edge_label": np.array(
valid_idx[:, 2], dtype=target_type),
"test_edge": np.array(
test_idx[:, :2], dtype="int64"),
"test_edge_label": np.array(
test_idx[:, 2], dtype=target_type)
}
def __getitem__(self, idx):
assert idx == 0, "This dataset has only one graph"
return self.graph[0]
def __len__(self):
return 1
def __repr__(self): # pragma: no cover
return '{}({})'.format(self.__class__.__name__, len(self))
if __name__ == "__main__":
pgl_dataset = PglLinkPropPredDataset(name="ogbl-ppa")
splitted_edge = pgl_dataset.get_edge_split()
print(pgl_dataset[0])
print(splitted_edge)
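# Hedged follow-up sketch (the helper name is illustrative): the arrays
# returned by get_edge_split and how their dtypes follow the task type
# resolved above.
def _example_edge_split(dataset):
    split = dataset.get_edge_split()
    train_edge = split["train_edge"]          # int64 array of shape [num_train, 2]
    train_label = split["train_edge_label"]   # int64 for link prediction, float32 otherwise
    return train_edge.shape, train_label.dtype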
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""__init__.py
"""
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""NodePropPredDataset for pgl
"""
import pandas as pd
import shutil, os
import os.path as osp
import numpy as np
from ogb.utils.url import decide_download, download_url, extract_zip
from ogb.nodeproppred import make_master_file # create master.csv
from pgl.contrib.ogb.io.read_graph_pgl import read_csv_graph_pgl
def to_bool(value):
"""to_bool"""
return np.array([value], dtype="bool")[0]
class PglNodePropPredDataset(object):
"""PglNodePropPredDataset
"""
def __init__(self, name, root="dataset"):
self.name = name ## original name, e.g., ogbn-proteins
self.dir_name = "_".join(
name.split("-")
) + "_pgl" ## replace hyphen with underline, e.g., ogbn_proteins_pgl
self.original_root = root
self.root = osp.join(root, self.dir_name)
self.meta_info = make_master_file.df #pd.read_csv(
#os.path.join(os.path.dirname(__file__), "master.csv"), index_col=0)
        if self.name not in self.meta_info:
error_mssg = "Invalid dataset name {}.\n".format(self.name)
error_mssg += "Available datasets are as follows:\n"
error_mssg += "\n".join(self.meta_info.keys())
raise ValueError(error_mssg)
self.download_name = self.meta_info[self.name][
"download_name"] ## name of downloaded file, e.g., tox21
self.num_tasks = int(self.meta_info[self.name]["num tasks"])
self.task_type = self.meta_info[self.name]["task type"]
super(PglNodePropPredDataset, self).__init__()
self.pre_process()
def pre_process(self):
"""pre_process downlaoding data
"""
processed_dir = osp.join(self.root, 'processed')
pre_processed_file_path = osp.join(processed_dir, 'pgl_data_processed')
if osp.exists(pre_processed_file_path):
# TODO: Reload Preprocess files
pass
else:
### check download
if not osp.exists(osp.join(self.root, "raw", "edge.csv.gz")):
url = self.meta_info[self.name]["url"]
if decide_download(url):
path = download_url(url, self.original_root)
extract_zip(path, self.original_root)
os.unlink(path)
                    # delete the folder if it already exists
try:
shutil.rmtree(self.root)
except:
pass
shutil.move(
osp.join(self.original_root, self.download_name),
self.root)
else:
print("Stop download.")
exit(-1)
raw_dir = osp.join(self.root, "raw")
### pre-process and save
add_inverse_edge = to_bool(self.meta_info[self.name][
"add_inverse_edge"])
self.graph = read_csv_graph_pgl(
raw_dir, add_inverse_edge=add_inverse_edge)
### adding prediction target
node_label = pd.read_csv(
osp.join(raw_dir, 'node-label.csv.gz'),
compression="gzip",
header=None).values
if "classification" in self.task_type:
node_label = np.array(node_label, dtype=np.int64)
else:
node_label = np.array(node_label, dtype=np.float32)
label_dict = {"labels": node_label}
# TODO: SAVE preprocess graph
self.labels = label_dict['labels']
def get_idx_split(self):
"""Train/Validation/Test split
"""
split_type = self.meta_info[self.name]["split"]
path = osp.join(self.root, "split", split_type)
train_idx = pd.read_csv(
osp.join(path, "train.csv.gz"), compression="gzip",
header=None).values.T[0]
valid_idx = pd.read_csv(
osp.join(path, "valid.csv.gz"), compression="gzip",
header=None).values.T[0]
test_idx = pd.read_csv(
osp.join(path, "test.csv.gz"), compression="gzip",
header=None).values.T[0]
return {
"train": np.array(
train_idx, dtype="int64"),
"valid": np.array(
valid_idx, dtype="int64"),
"test": np.array(
test_idx, dtype="int64")
}
def __getitem__(self, idx):
assert idx == 0, "This dataset has only one graph"
return self.graph[idx], self.labels
def __len__(self):
return 1
def __repr__(self): # pragma: no cover
return '{}({})'.format(self.__class__.__name__, len(self))
if __name__ == "__main__":
pgl_dataset = PglNodePropPredDataset(name="ogbn-proteins")
splitted_index = pgl_dataset.get_idx_split()
print(pgl_dataset[0])
print(splitted_index)
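# Hedged follow-up sketch (the helper name is illustrative): slice the node
# labels of the single graph by the split indices returned above.
def _example_split_node_labels(dataset):
    graph, labels = dataset[0]
    split = dataset.get_idx_split()
    train_labels = labels[split["train"]]
    return graph.num_nodes, train_labels.shape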
......@@ -20,7 +20,6 @@ import io
import sys
import numpy as np
import pickle as pkl
import networkx as nx
from pgl import graph
from pgl.utils.logger import log
......@@ -91,6 +90,7 @@ class CitationDataset(object):
def _load_data(self):
"""Load data
"""
import networkx as nx
objnames = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']
objects = []
for i in range(len(objnames)):
......@@ -98,7 +98,7 @@ class CitationDataset(object):
'rb') as f:
objects.append(_pickle_load(f))
x, y, tx, ty, allx, ally, _graph = tuple(objects)
x, y, tx, ty, allx, ally, _graph = objects
test_idx_reorder = _parse_index_file("{}/ind.{}.test.index".format(
self.path, self.name))
test_idx_range = np.sort(test_idx_reorder)
......
......@@ -15,12 +15,14 @@
This package implements the graph structure for handling graph data.
"""
import os
import numpy as np
import pickle as pkl
import time
import pgl.graph_kernel as graph_kernel
from collections import defaultdict
__all__ = ['Graph', 'SubGraph']
__all__ = ['Graph', 'SubGraph', 'MultiGraph']
def _hide_num_nodes(shape):
......@@ -43,8 +45,8 @@ class EdgeIndex(object):
"""
def __init__(self, u, v, num_nodes):
self._v, self._eid, self._degree, self._sorted_u,\
self._sorted_v, self._sorted_eid = graph_kernel.build_index(u, v, num_nodes)
self._degree, self._sorted_v, self._sorted_u, \
self._sorted_eid, self._indptr = graph_kernel.build_index(u, v, num_nodes)
@property
def degree(self):
......@@ -52,23 +54,40 @@ class EdgeIndex(object):
"""
return self._degree
@property
def v(self):
"""Return the compressed v.
def view_v(self, u=None):
"""Return the compressed v for given u.
"""
return self._v
if u is None:
return np.split(self._sorted_v, self._indptr[1:])
else:
u = np.array(u, dtype="int64")
return graph_kernel.slice_by_index(
self._sorted_v, self._indptr, index=u)
@property
def eid(self):
"""Return the edge id.
def view_eid(self, u=None):
"""Return the compressed edge id for given u.
"""
return self._eid
if u is None:
return np.split(self._sorted_eid, self._indptr[1:])
else:
u = np.array(u, dtype="int64")
return graph_kernel.slice_by_index(
self._sorted_eid, self._indptr, index=u)
def triples(self):
"""Return the sorted (u, v, eid) tuples.
"""
return self._sorted_u, self._sorted_v, self._sorted_eid
def dump(self, path):
if not os.path.exists(path):
os.makedirs(path)
np.save(os.path.join(path, 'degree.npy'), self._degree)
np.save(os.path.join(path, 'sorted_u.npy'), self._sorted_u)
np.save(os.path.join(path, 'sorted_v.npy'), self._sorted_v)
np.save(os.path.join(path, 'sorted_eid.npy'), self._sorted_eid)
np.save(os.path.join(path, 'indptr.npy'), self._indptr)
class Graph(object):
"""Implementation of graph structure in pgl.
......@@ -114,25 +133,76 @@ class Graph(object):
self._edge_feat = {}
if isinstance(edges, np.ndarray):
if edges.dtype != "int32":
edges = edges.astype("int32")
if edges.dtype != "int64":
edges = edges.astype("int64")
else:
edges = np.array(edges, dtype="int32")
edges = np.array(edges, dtype="int64")
self._edges = edges
self._num_nodes = num_nodes
if len(edges) == 0:
            # check empty edges
src, dst = np.array([], dtype="int32"), np.array([], dtype="int32")
else:
src = edges[:, 0]
dst = edges[:, 1]
self._adj_src_index = None
self._adj_dst_index = None
self.indegree()
self._num_graph = 1
self._graph_lod = np.array([0, self.num_nodes], dtype="int32")
def dump(self, path):
if not os.path.exists(path):
os.makedirs(path)
np.save(os.path.join(path, 'num_nodes.npy'), self._num_nodes)
np.save(os.path.join(path, 'edges.npy'), self._edges)
if self._adj_src_index:
self._adj_src_index.dump(os.path.join(path, 'adj_src'))
if self._adj_dst_index:
self._adj_dst_index.dump(os.path.join(path, 'adj_dst'))
def dump_feat(feat_path, feat):
"""Dump all features to .npy file.
"""
if len(feat) == 0:
return
if not os.path.exists(feat_path):
os.makedirs(feat_path)
for key in feat:
np.save(os.path.join(feat_path, key + ".npy"), feat[key])
dump_feat(os.path.join(path, "node_feat"), self.node_feat)
dump_feat(os.path.join(path, "edge_feat"), self.edge_feat)
@property
def adj_src_index(self):
"""Return an EdgeIndex object for src.
"""
if self._adj_src_index is None:
if len(self._edges) == 0:
u = np.array([], dtype="int64")
v = np.array([], dtype="int64")
else:
u = self._edges[:, 0]
v = self._edges[:, 1]
self._adj_src_index = EdgeIndex(
u=u, v=v, num_nodes=self._num_nodes)
return self._adj_src_index
@property
def adj_dst_index(self):
"""Return an EdgeIndex object for dst.
"""
if self._adj_dst_index is None:
if len(self._edges) == 0:
v = np.array([], dtype="int64")
u = np.array([], dtype="int64")
else:
v = self._edges[:, 0]
u = self._edges[:, 1]
self._adj_src_index = EdgeIndex(
u=src, v=dst, num_nodes=self._num_nodes)
self._adj_dst_index = EdgeIndex(
u=dst, v=src, num_nodes=self._num_nodes)
self._adj_dst_index = EdgeIndex(
u=u, v=v, num_nodes=self._num_nodes)
return self._adj_dst_index
@property
def edge_feat(self):
......@@ -180,16 +250,16 @@ class Graph(object):
if sort_by not in ["src", "dst"]:
raise ValueError("sort_by should be in 'src' or 'dst'.")
if sort_by == 'src':
src, dst, eid = self._adj_src_index.triples()
src, dst, eid = self.adj_src_index.triples()
else:
dst, src, eid = self._adj_dst_index.triples()
dst, src, eid = self.adj_dst_index.triples()
return src, dst, eid
@property
def nodes(self):
"""Return all nodes id from 0 to :code:`num_nodes - 1`
"""
return np.arange(self._num_nodes, dtype="int32")
return np.arange(self._num_nodes, dtype="int64")
def indegree(self, nodes=None):
"""Return the indegree of the given nodes
......@@ -204,9 +274,9 @@ class Graph(object):
A numpy.ndarray as the given nodes' indegree.
"""
if nodes is None:
return self._adj_dst_index.degree
return self.adj_dst_index.degree
else:
return self._adj_dst_index.degree[nodes]
return self.adj_dst_index.degree[nodes]
def outdegree(self, nodes=None):
"""Return the outdegree of the given nodes.
......@@ -221,9 +291,9 @@ class Graph(object):
A numpy.array as the given nodes' outdegree.
"""
if nodes is None:
return self._adj_src_index.degree
return self.adj_src_index.degree
else:
return self._adj_src_index.degree[nodes]
return self.adj_src_index.degree[nodes]
def successor(self, nodes=None, return_eids=False):
"""Find successor of given nodes.
......@@ -271,19 +341,17 @@ class Graph(object):
[]]
"""
if nodes is None:
if return_eids:
return self._adj_src_index.v, self._adj_src_index.eid
else:
return self._adj_src_index.v
if return_eids:
return self.adj_src_index.view_v(
nodes), self.adj_src_index.view_eid(nodes)
else:
if return_eids:
return self._adj_src_index.v[nodes], self._adj_src_index.eid[
nodes]
else:
return self._adj_src_index.v[nodes]
return self.adj_src_index.view_v(nodes)
def sample_successor(self, nodes, max_degree, return_eids=False):
def sample_successor(self,
nodes,
max_degree,
return_eids=False,
shuffle=False):
"""Sample successors of given nodes.
Args:
......@@ -304,26 +372,20 @@ class Graph(object):
node_succ = self.successor(nodes, return_eids=return_eids)
if return_eids:
node_succ, node_succ_eid = node_succ
if nodes is None:
nodes = self.nodes
sample_succ, sample_succ_eid = [], []
for i in range(len(nodes)):
max_size = min(max_degree, len(node_succ[i]))
if max_size == 0:
sample_succ.append([])
if return_eids:
sample_succ_eid.append([])
else:
ind = np.random.choice(
len(node_succ[i]), max_size, replace=False)
sample_succ.append(node_succ[i][ind])
if return_eids:
sample_succ_eid.append(node_succ_eid[i][ind])
node_succ = node_succ.tolist()
if return_eids:
node_succ_eid = node_succ_eid.tolist()
if return_eids:
return sample_succ, sample_succ_eid
return graph_kernel.sample_subset_with_eid(
node_succ, node_succ_eid, max_degree, shuffle)
else:
return sample_succ
return graph_kernel.sample_subset(node_succ, max_degree, shuffle)
def predecessor(self, nodes=None, return_eids=False):
"""Find predecessor of given nodes.
......@@ -371,19 +433,17 @@ class Graph(object):
[2]]
"""
if nodes is None:
if return_eids:
return self._adj_dst_index.v, self._adj_dst_index.eid
else:
return self._adj_dst_index.v
if return_eids:
return self.adj_dst_index.view_v(
nodes), self.adj_dst_index.view_eid(nodes)
else:
if return_eids:
return self._adj_dst_index.v[nodes], self._adj_dst_index.eid[
nodes]
else:
return self._adj_dst_index.v[nodes]
return self.adj_dst_index.view_v(nodes)
def sample_predecessor(self, nodes, max_degree, return_eids=False):
def sample_predecessor(self,
nodes,
max_degree,
return_eids=False,
shuffle=False):
"""Sample predecessor of given nodes.
Args:
......@@ -407,24 +467,16 @@ class Graph(object):
if nodes is None:
nodes = self.nodes
sample_pred, sample_pred_eid = [], []
for i in range(len(nodes)):
max_size = min(max_degree, len(node_pred[i]))
if max_size == 0:
sample_pred.append([])
if return_eids:
sample_pred_eid.append([])
else:
ind = np.random.choice(
len(node_pred[i]), max_size, replace=False)
sample_pred.append(node_pred[i][ind])
if return_eids:
sample_pred_eid.append(node_pred_eid[i][ind])
node_pred = node_pred.tolist()
if return_eids:
node_pred_eid = node_pred_eid.tolist()
if return_eids:
return sample_pred, sample_pred_eid
return graph_kernel.sample_subset_with_eid(
node_pred, node_pred_eid, max_degree, shuffle)
else:
return sample_pred
return graph_kernel.sample_subset(node_pred, max_degree, shuffle)
def node_feat_info(self):
"""Return the information of node feature for GraphWrapper.
......@@ -500,19 +552,31 @@ class Graph(object):
(key, _hide_num_nodes(value.shape), value.dtype))
return edge_feat_info
def subgraph(self, nodes, eid):
def subgraph(self,
nodes,
eid=None,
edges=None,
edge_feats=None,
with_node_feat=True,
with_edge_feat=True):
"""Generate subgraph with nodes and edge ids.
This function will generate a :code:`pgl.graph.Subgraph` object and
        copy all corresponding node and edge features. Nodes and edges will
        be re-indexed from 0.
        be re-indexed from 0. ``eid`` and ``edges`` cannot both be None.
        WARNING: all nodes referenced by ``eid`` (or ``edges``) must be included in ``nodes``.
Args:
nodes: Node ids which will be included in the subgraph.
eid: Edge ids which will be included in the subgraph.
eid (optional): Edge ids which will be included in the subgraph.
edges (optional): Edge(src, dst) list which will be included in the subgraph.
with_node_feat: Whether to inherit node features from parent graph.
with_edge_feat: Whether to inherit edge features from parent graph.
Return:
A :code:`pgl.graph.Subgraph` object.
......@@ -522,16 +586,33 @@ class Graph(object):
for ind, node in enumerate(nodes):
reindex[node] = ind
eid = np.array(eid, dtype="int32")
sub_edges = graph_kernel.map_edges(eid, self._edges, reindex)
if eid is None and edges is None:
raise ValueError("Eid and edges can't be None at the same time.")
if edges is None:
edges = self._edges[eid]
else:
edges = np.array(edges, dtype="int64")
sub_edges = graph_kernel.map_edges(
np.arange(
len(edges), dtype="int64"), edges, reindex)
sub_edge_feat = {}
for key, value in self._edge_feat.items():
sub_edge_feat[key] = value[eid]
if edges is None:
if with_edge_feat:
for key, value in self._edge_feat.items():
if eid is None:
raise ValueError(
"Eid can not be None with edge features.")
sub_edge_feat[key] = value[eid]
else:
sub_edge_feat = edge_feats
sub_node_feat = {}
for key, value in self._node_feat.items():
sub_node_feat[key] = value[nodes]
if with_node_feat:
for key, value in self._node_feat.items():
sub_node_feat[key] = value[nodes]
subgraph = SubGraph(
num_nodes=len(nodes),
......@@ -554,7 +635,7 @@ class Graph(object):
Return:
Batch iterator
"""
perm = np.arange(self._num_nodes, dtype="int32")
perm = np.arange(self._num_nodes, dtype="int64")
if shuffle:
np.random.shuffle(perm)
start = 0
......@@ -644,7 +725,7 @@ class Graph(object):
break
succ = self.successor(cur_nodes)
sample_index = np.floor(
np.random.rand(outdegree.shape[0]) * outdegree).astype("int32")
np.random.rand(outdegree.shape[0]) * outdegree).astype("int64")
nxt_cur_nodes = []
for s, ind, walk_id in zip(succ, sample_index, cur_walk_ids):
......@@ -677,8 +758,8 @@ class Graph(object):
cur_walk_ids = np.arange(0, len(nodes))
cur_nodes = np.array(nodes)
prev_nodes = np.array([-1] * len(nodes), dtype="int32")
prev_succs = np.array([[]] * len(nodes), dtype="int32")
prev_nodes = np.array([-1] * len(nodes), dtype="int64")
prev_succs = np.array([[]] * len(nodes), dtype="int64")
for l in range(max_depth):
# select the walks not end
outdegree = self.outdegree(cur_nodes)
......@@ -693,7 +774,7 @@ class Graph(object):
break
cur_succs = self.successor(cur_nodes)
num_nodes = cur_nodes.shape[0]
nxt_nodes = np.zeros(num_nodes, dtype="int32")
nxt_nodes = np.zeros(num_nodes, dtype="int64")
for idx, (succ, prev_succ, walk_id, prev_node) in enumerate(
zip(cur_succs, prev_succs, cur_walk_ids, prev_nodes)):
......@@ -707,6 +788,16 @@ class Graph(object):
cur_nodes = nxt_nodes
return walk
@property
def num_graph(self):
""" Return Number of Graphs"""
return self._num_graph
@property
def graph_lod(self):
""" Return Graph Lod Index for Paddle Computation"""
return self._graph_lod
class SubGraph(Graph):
"""Implementation of SubGraph in pgl.
......@@ -760,3 +851,120 @@ class SubGraph(Graph):
A list of node ids in parent graph.
"""
return graph_kernel.map_nodes(nodes, self._to_reindex)
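# A minimal, hedged sketch (node/edge values and the helper name are made up)
# of the subgraph API described above: the selected nodes and edges are copied
# and re-indexed from 0 inside the returned SubGraph.
def _example_subgraph():
    g = Graph(num_nodes=5, edges=[(0, 1), (1, 2), (3, 4)])
    # keep nodes 0..2 and only the edges whose endpoints all lie inside them
    sub = g.subgraph(nodes=[0, 1, 2], edges=[(0, 1), (1, 2)])
    return sub.num_nodes, sub.edges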
class MultiGraph(Graph):
"""Implementation of multiple disjoint graph structure in pgl.
    This is a simple implementation that packs a list of disjoint graphs into one batched graph.
Args:
graph_list : A list of Graph Instances
Examples:
.. code-block:: python
batch_graph = MultiGraph([graph1, graph2, graph3])
"""
def __init__(self, graph_list):
num_nodes = np.sum([g.num_nodes for g in graph_list])
node_feat = self._join_node_feature(graph_list)
edge_feat = self._join_edge_feature(graph_list)
edges = self._join_edges(graph_list)
super(MultiGraph, self).__init__(
num_nodes=num_nodes,
edges=edges,
node_feat=node_feat,
edge_feat=edge_feat)
self._num_graph = len(graph_list)
self._src_graph = graph_list
graph_lod = [g.num_nodes for g in graph_list]
graph_lod = np.cumsum(graph_lod, dtype="int32")
graph_lod = np.insert(graph_lod, 0, 0)
self._graph_lod = graph_lod
def __getitem__(self, index):
return self._src_graph[index]
def _join_node_feature(self, graph_list):
"""join node features for multiple graph"""
node_feat = defaultdict(lambda: [])
for graph in graph_list:
for key in graph.node_feat:
node_feat[key].append(graph.node_feat[key])
ret_node_feat = {}
for key in node_feat:
ret_node_feat[key] = np.vstack(node_feat[key])
return ret_node_feat
def _join_edge_feature(self, graph_list):
"""join edge features for multiple graph"""
edge_feat = defaultdict(lambda: [])
for graph in graph_list:
for key in graph.edge_feat:
efeat = graph.edge_feat[key]
if len(efeat) > 0:
edge_feat[key].append(efeat)
ret_edge_feat = {}
for key in edge_feat:
ret_edge_feat[key] = np.vstack(edge_feat[key])
return ret_edge_feat
def _join_edges(self, graph_list):
"""join edges for multiple graph"""
list_edges = []
start_offset = 0
for graph in graph_list:
edges = graph.edges
if len(edges) > 0:
edges = edges + start_offset
list_edges.append(edges)
start_offset += graph.num_nodes
edges = np.vstack(list_edges)
return edges
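# A minimal, hedged usage sketch (graph sizes are illustrative): joining two
# small graphs shows how graph_lod stores cumulative node offsets and how the
# second graph's edges are shifted by the node count of the first.
def _example_multigraph_lod():
    g1 = Graph(num_nodes=3, edges=[(0, 1), (1, 2)])
    g2 = Graph(num_nodes=2, edges=[(0, 1)])
    batch = MultiGraph([g1, g2])
    # batch.num_nodes == 5, batch.graph_lod == [0, 3, 5], g2's edge becomes (3, 4)
    return batch.num_nodes, batch.graph_lod, batch.edges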
class MemmapEdgeIndex(EdgeIndex):
def __init__(self, path):
self._degree = np.load(os.path.join(path, 'degree.npy'), mmap_mode="r")
self._sorted_u = np.load(
os.path.join(path, 'sorted_u.npy'), mmap_mode="r")
self._sorted_v = np.load(
os.path.join(path, 'sorted_v.npy'), mmap_mode="r")
self._sorted_eid = np.load(
os.path.join(path, 'sorted_eid.npy'), mmap_mode="r")
self._indptr = np.load(os.path.join(path, 'indptr.npy'), mmap_mode="r")
class MemmapGraph(Graph):
def __init__(self, path):
self._num_nodes = np.load(os.path.join(path, 'num_nodes.npy'))
self._edges = np.load(os.path.join(path, 'edges.npy'), mmap_mode="r")
if os.path.isdir(os.path.join(path, 'adj_src')):
self._adj_src_index = MemmapEdgeIndex(
os.path.join(path, 'adj_src'))
else:
self._adj_src_index = None
if os.path.isdir(os.path.join(path, 'adj_dst')):
self._adj_dst_index = MemmapEdgeIndex(
os.path.join(path, 'adj_dst'))
else:
self._adj_dst_index = None
def load_feat(feat_path):
"""Load features from .npy file.
"""
feat = {}
if os.path.isdir(feat_path):
for feat_name in os.listdir(feat_path):
feat[os.path.splitext(feat_name)[0]] = np.load(
os.path.join(feat_path, feat_name), mmap_mode="r")
return feat
self._node_feat = load_feat(os.path.join(path, 'node_feat'))
self._edge_feat = load_feat(os.path.join(path, 'edge_feat'))
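# A minimal, hedged sketch (the dump path is illustrative) of the dump /
# memory-map round trip enabled by Graph.dump and MemmapGraph above.
def _example_dump_and_mmap(tmp_path="/tmp/pgl_graph_dump"):
    g = Graph(num_nodes=4, edges=[(0, 1), (1, 2), (2, 3)])
    _ = g.adj_src_index  # build both indexes so dump() also writes adj_src/adj_dst
    _ = g.adj_dst_index
    g.dump(tmp_path)
    mm = MemmapGraph(tmp_path)  # arrays are now opened with mmap_mode="r"
    return mm.num_nodes, mm.outdegree()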
......@@ -26,20 +26,20 @@ from libc.stdlib cimport rand, RAND_MAX
@cython.boundscheck(False)
@cython.wraparound(False)
def build_index(np.ndarray[np.int32_t, ndim=1] u,
np.ndarray[np.int32_t, ndim=1] v,
int num_nodes):
def build_index(np.ndarray[np.int64_t, ndim=1] u,
np.ndarray[np.int64_t, ndim=1] v,
long long num_nodes):
"""Building Edge Index
"""
cdef int i
cdef int h=len(u)
cdef int n_size = num_nodes
cdef np.ndarray[np.int32_t, ndim=1] degree = np.zeros([n_size], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] count = np.zeros([n_size], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] _tmp_v = np.zeros([h], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] _tmp_u = np.zeros([h], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] _tmp_eid = np.zeros([h], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] indptr = np.zeros([n_size + 1], dtype=np.int32)
cdef long long i
cdef long long h=len(u)
cdef long long n_size = num_nodes
cdef np.ndarray[np.int64_t, ndim=1] degree = np.zeros([n_size], dtype=np.int64)
cdef np.ndarray[np.int64_t, ndim=1] count = np.zeros([n_size], dtype=np.int64)
cdef np.ndarray[np.int64_t, ndim=1] _tmp_v = np.zeros([h], dtype=np.int64)
cdef np.ndarray[np.int64_t, ndim=1] _tmp_u = np.zeros([h], dtype=np.int64)
cdef np.ndarray[np.int64_t, ndim=1] _tmp_eid = np.zeros([h], dtype=np.int64)
cdef np.ndarray[np.int64_t, ndim=1] indptr = np.zeros([n_size + 1], dtype=np.int64)
with nogil:
for i in xrange(h):
......@@ -53,27 +53,34 @@ def build_index(np.ndarray[np.int32_t, ndim=1] u,
_tmp_eid[indptr[u[i]] + count[u[i]]] = i
_tmp_u[indptr[u[i]] + count[u[i]]] = u[i]
count[u[i]] += 1
return degree, _tmp_v, _tmp_u, _tmp_eid, indptr
cdef list output_eid = []
cdef list output_v = []
for i in xrange(n_size):
output_eid.append(_tmp_eid[indptr[i]:indptr[i+1]])
output_v.append(_tmp_v[indptr[i]:indptr[i+1]])
return np.array(output_v), np.array(output_eid), degree, _tmp_u, _tmp_v, _tmp_eid
@cython.boundscheck(False)
@cython.wraparound(False)
def slice_by_index(np.ndarray[np.int64_t, ndim=1] u,
np.ndarray[np.int64_t, ndim=1] indptr,
np.ndarray[np.int64_t, ndim=1] index):
cdef list output = []
cdef long long i
cdef long long h = len(index)
cdef long long j
for i in xrange(h):
j = index[i]
output.append(u[indptr[j]:indptr[j+1]])
return np.array(output)
@cython.boundscheck(False)
@cython.wraparound(False)
def map_edges(np.ndarray[np.int32_t, ndim=1] eid,
np.ndarray[np.int32_t, ndim=2] edges,
def map_edges(np.ndarray[np.int64_t, ndim=1] eid,
np.ndarray[np.int64_t, ndim=2] edges,
reindex):
"""Mapping edges by given dictionary
"""
cdef unordered_map[int, int] m = reindex
cdef int i = 0
cdef int h = len(eid)
cdef np.ndarray[np.int32_t, ndim=2] r_edges = np.zeros([h, 2], dtype=np.int32)
cdef int j
cdef unordered_map[long long, long long] m = reindex
cdef long long i = 0
cdef long long h = len(eid)
cdef np.ndarray[np.int64_t, ndim=2] r_edges = np.zeros([h, 2], dtype=np.int64)
cdef long long j
with nogil:
for i in xrange(h):
j = eid[i]
......@@ -86,31 +93,33 @@ def map_edges(np.ndarray[np.int32_t, ndim=1] eid,
def map_nodes(nodes, reindex):
"""Mapping nodes by given dictionary
"""
cdef unordered_map[int, int] m = reindex
cdef int i = 0
cdef int h = len(nodes)
cdef np.ndarray[np.int32_t, ndim=1] new_nodes = np.zeros([h], dtype=np.int32)
cdef int j
for i in xrange(h):
j = nodes[i]
new_nodes[i] = m[j]
cdef np.ndarray[np.int64_t, ndim=1] t_nodes = np.array(nodes, dtype=np.int64)
cdef unordered_map[long long, long long] m = reindex
cdef long long i = 0
cdef long long h = len(nodes)
cdef np.ndarray[np.int64_t, ndim=1] new_nodes = np.zeros([h], dtype=np.int64)
cdef long long j
with nogil:
for i in xrange(h):
j = t_nodes[i]
new_nodes[i] = m[j]
return new_nodes
@cython.boundscheck(False)
@cython.wraparound(False)
def node2vec_sample(np.ndarray[np.int32_t, ndim=1] succ,
np.ndarray[np.int32_t, ndim=1] prev_succ, int prev_node,
def node2vec_sample(np.ndarray[np.int64_t, ndim=1] succ,
np.ndarray[np.int64_t, ndim=1] prev_succ, long long prev_node,
float p, float q):
"""Fast implement of node2vec sampling
"""
cdef int i
cdef long long i
cdef succ_len = len(succ)
cdef prev_succ_len = len(prev_succ)
cdef vector[float] probs
cdef float prob_sum = 0
cdef unordered_set[int] prev_succ_set
cdef unordered_set[long long] prev_succ_set
for i in xrange(prev_succ_len):
prev_succ_set.insert(prev_succ[i])
......@@ -127,9 +136,188 @@ def node2vec_sample(np.ndarray[np.int32_t, ndim=1] succ,
cdef float rand_num = float(rand())/RAND_MAX * prob_sum
cdef int sample_succ = 0
cdef long long sample_succ = 0
for i in xrange(succ_len):
rand_num -= probs[i]
if rand_num <= 0:
sample_succ = succ[i]
return sample_succ
@cython.boundscheck(False)
@cython.wraparound(False)
def subset_choose_index(long long s_size,
np.ndarray[ndim=1, dtype=np.int64_t] nid,
np.ndarray[ndim=1, dtype=np.int64_t] rnd,
np.ndarray[ndim=1, dtype=np.int64_t] buff_nid,
long long offset):
cdef long long n_size = len(nid)
cdef long long i
cdef long long j
cdef unordered_map[long long, long long] m
with nogil:
for i in xrange(s_size):
j = rnd[offset + i] % n_size
if j >= i:
buff_nid[offset + i] = nid[j] if m.find(j) == m.end() else nid[m[j]]
m[j] = i if m.find(i) == m.end() else m[i]
else:
buff_nid[offset + i] = buff_nid[offset + j]
buff_nid[offset + j] = nid[i] if m.find(i) == m.end() else nid[m[i]]
@cython.boundscheck(False)
@cython.wraparound(False)
def subset_choose_index_eid(long long s_size,
np.ndarray[ndim=1, dtype=np.int64_t] nid,
np.ndarray[ndim=1, dtype=np.int64_t] eid,
np.ndarray[ndim=1, dtype=np.int64_t] rnd,
np.ndarray[ndim=1, dtype=np.int64_t] buff_nid,
np.ndarray[ndim=1, dtype=np.int64_t] buff_eid,
long long offset):
cdef long long n_size = len(nid)
cdef long long i
cdef long long j
cdef unordered_map[long long, long long] m
with nogil:
for i in xrange(s_size):
j = rnd[offset + i] % n_size
if j >= i:
if m.find(j) == m.end():
buff_nid[offset + i], buff_eid[offset + i] = nid[j], eid[j]
else:
buff_nid[offset + i], buff_eid[offset + i] = nid[m[j]], eid[m[j]]
m[j] = i if m.find(i) == m.end() else m[i]
else:
buff_nid[offset + i], buff_eid[offset + i] = buff_nid[offset + j], buff_eid[offset + j]
if m.find(i) == m.end():
buff_nid[offset + j], buff_eid[offset + j] = nid[i], eid[i]
else:
buff_nid[offset + j], buff_eid[offset + j] = nid[m[i]], eid[m[i]]
@cython.boundscheck(False)
@cython.wraparound(False)
def sample_subset(list nids, long long maxdegree, shuffle=False):
cdef np.ndarray[ndim=1, dtype=np.int64_t] buff_index
cdef long long buff_size, sample_size
cdef long long total_buff_size = 0
cdef long long inc = 0
cdef list output = []
for inc in xrange(len(nids)):
buff_size = len(nids[inc])
if buff_size > maxdegree:
total_buff_size += maxdegree
elif shuffle:
total_buff_size += buff_size
cdef np.ndarray[ndim=1, dtype=np.int64_t] buff_nid = np.zeros([total_buff_size], dtype=np.int64)
cdef np.ndarray[np.int64_t, ndim=1] rnd = np.random.randint(0, np.iinfo(np.int64).max,
dtype=np.int64, size=total_buff_size)
cdef long long offset = 0
for inc in xrange(len(nids)):
buff_size = len(nids[inc])
if not shuffle and buff_size <= maxdegree:
output.append(nids[inc])
else:
sample_size = buff_size if buff_size <= maxdegree else maxdegree
if isinstance(nids[inc], list):
tmp = np.array(nids[inc], dtype=np.int64)
else:
tmp = nids[inc]
subset_choose_index(sample_size, tmp, rnd, buff_nid, offset)
output.append(buff_nid[offset:offset+sample_size])
offset += sample_size
return output
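# A minimal, hedged usage sketch (values are illustrative): every neighbour
# list longer than max_degree is down-sampled, while shorter lists pass
# through unchanged when shuffle is False.
def _example_sample_subset():
    neigh = [np.array([1, 2, 3, 4], dtype=np.int64),
             np.array([7], dtype=np.int64)]
    return sample_subset(neigh, 2)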
@cython.boundscheck(False)
@cython.wraparound(False)
def sample_subset_with_eid(list nids, list eids, long long maxdegree, shuffle=False):
cdef np.ndarray[ndim=1, dtype=np.int64_t] buff_index
cdef long long buff_size, sample_size
cdef long long total_buff_size = 0
cdef long long inc = 0
cdef list output = []
cdef list output_eid = []
for inc in xrange(len(nids)):
buff_size = len(nids[inc])
if buff_size > maxdegree:
total_buff_size += maxdegree
elif shuffle:
total_buff_size += buff_size
cdef np.ndarray[ndim=1, dtype=np.int64_t] buff_nid = np.zeros([total_buff_size], dtype=np.int64)
cdef np.ndarray[ndim=1, dtype=np.int64_t] buff_eid = np.zeros([total_buff_size], dtype=np.int64)
cdef np.ndarray[np.int64_t, ndim=1] rnd = np.random.randint(0, np.iinfo(np.int64).max,
dtype=np.int64, size=total_buff_size)
cdef long long offset = 0
for inc in xrange(len(nids)):
buff_size = len(nids[inc])
if not shuffle and buff_size <= maxdegree:
output.append(nids[inc])
output_eid.append(eids[inc])
else:
sample_size = buff_size if buff_size <= maxdegree else maxdegree
if isinstance(nids[inc], list):
tmp = np.array(nids[inc], dtype=np.int64)
tmp_eids = np.array(eids[inc], dtype=np.int64)
else:
tmp = nids[inc]
tmp_eids = eids[inc]
subset_choose_index_eid(sample_size, tmp, tmp_eids, rnd, buff_nid, buff_eid, offset)
output.append(buff_nid[offset:offset+sample_size])
output_eid.append(buff_eid[offset:offset+sample_size])
offset += sample_size
return output, output_eid
@cython.boundscheck(False)
@cython.wraparound(False)
def skip_gram_gen_pair(vector[long long] walk, long win_size=5):
cdef vector[long long] src
cdef vector[long long] dst
cdef long long l = len(walk)
cdef long long real_win_size, left, right, i
cdef np.ndarray[np.int64_t, ndim=1] rnd = np.random.randint(1, win_size+1,
dtype=np.int64, size=l)
with nogil:
for i in xrange(l):
real_win_size = rnd[i]
left = i - real_win_size
if left < 0:
left = 0
right = i + real_win_size
if right >= l:
right = l - 1
for j in xrange(left, right+1):
if walk[i] == walk[j]:
continue
src.push_back(walk[i])
dst.push_back(walk[j])
return src, dst
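# A minimal, hedged usage sketch (walk values are illustrative): every returned
# (src[i], dst[i]) pair lies inside a randomly shrunken window of radius at
# most win_size, and pairs with identical node values are skipped.
def _example_skip_gram_pairs():
    walk = [3, 1, 4, 1, 5, 9, 2, 6]
    return skip_gram_gen_pair(walk, win_size=2)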
@cython.boundscheck(False)
@cython.wraparound(False)
def alias_sample_build_table(np.ndarray[np.float64_t, ndim=1] probs):
cdef long long l = len(probs)
cdef np.ndarray[np.float64_t, ndim=1] alias = probs * l
cdef np.ndarray[np.int64_t, ndim=1] events = np.zeros(l, dtype=np.int64)
cdef vector[long long] larger_num, smaller_num
cdef long long i, s_i, l_i
with nogil:
for i in xrange(l):
if alias[i] > 1:
larger_num.push_back(i)
elif alias[i] < 1:
smaller_num.push_back(i)
while smaller_num.size() > 0 and larger_num.size() > 0:
s_i = smaller_num.back()
l_i = larger_num.back()
smaller_num.pop_back()
events[s_i] = l_i
alias[l_i] -= (1 - alias[s_i])
if alias[l_i] <= 1:
larger_num.pop_back()
if alias[l_i] < 1:
smaller_num.push_back(l_i)
return alias, events
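# A hedged companion sketch in plain numpy (the helper name is an assumption):
# the standard alias-method draw that consumes the (alias, events) tables
# built above.
def alias_sample_sketch(size, alias, events):
    """Draw `size` indices distributed according to the original `probs`."""
    idx = np.random.randint(0, len(alias), size=size)          # pick a column uniformly
    accept = np.random.uniform(0.0, 1.0, size=size) < alias[idx]
    return np.where(accept, idx, events[idx])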
......@@ -36,19 +36,22 @@ def send(src, dst, nfeat, efeat, message_func):
return msg
def recv(dst, uniq_dst, bucketing_index, msg, reduce_function, node_ids):
def recv(dst, uniq_dst, bucketing_index, msg, reduce_function, num_nodes,
num_edges):
"""Recv message from given msg to dst nodes.
"""
empty_msg_flag = fluid.layers.cast(num_edges > 0, dtype="float32")
if reduce_function == "sum":
if isinstance(msg, dict):
raise TypeError("The message for build-in function"
" should be Tensor not dict.")
try:
out_dims = msg.shape[-1]
init_output = fluid.layers.fill_constant_batch_size_like(
node_ids, shape=[1, out_dims], value=0, dtype="float32")
out_dim = msg.shape[-1]
init_output = fluid.layers.fill_constant(
shape=[num_nodes, out_dim], value=0, dtype="float32")
init_output.stop_gradient = False
msg = msg * empty_msg_flag
output = paddle_helper.scatter_add(init_output, dst, msg)
return output
except TypeError as e:
......@@ -60,17 +63,16 @@ def recv(dst, uniq_dst, bucketing_index, msg, reduce_function, node_ids):
reduce_function = sum_func
# convert msg into lodtensor
bucketed_msg = op.nested_lod_reset(msg, bucketing_index)
# Check dim for bucketed_msg equal to out_dims
output = reduce_function(bucketed_msg)
out_dims = output.shape[-1]
output_dim = output.shape[-1]
output = output * empty_msg_flag
init_output = fluid.layers.fill_constant_batch_size_like(
node_ids, shape=[1, out_dims], value=0, dtype="float32")
init_output.stop_gradient = False
output = fluid.layers.scatter(init_output, uniq_dst, output)
return output
init_output = fluid.layers.fill_constant(
shape=[num_nodes, output_dim], value=0, dtype="float32")
init_output.stop_gradient = True
final_output = fluid.layers.scatter(init_output, uniq_dst, output)
return final_output
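# A hedged numpy illustration (not fluid code) of the "sum" reduce path above:
# edge messages are scatter-added into one slot per destination node.
def _numpy_scatter_sum_sketch(dst, msg, num_nodes):
    """dst: int array [num_edges]; msg: float array [num_edges, hidden_dim]."""
    import numpy as np
    out = np.zeros((num_nodes, msg.shape[-1]), dtype=msg.dtype)
    np.add.at(out, dst, msg)  # mirrors paddle_helper.scatter_add(init_output, dst, msg)
    return out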
class BaseGraphWrapper(object):
......@@ -89,16 +91,21 @@ class BaseGraphWrapper(object):
"""
def __init__(self):
self._node_feat_tensor_dict = {}
self._edge_feat_tensor_dict = {}
self.node_feat_tensor_dict = {}
self.edge_feat_tensor_dict = {}
self._edges_src = None
self._edges_dst = None
self._num_nodes = None
self._indegree = None
self._edge_uniq_dst = None
self._edge_uniq_dst_count = None
self._bucketing_index = None
self._node_ids = None
self._graph_lod = None
self._num_graph = None
self._data_name_prefix = ""
def __repr__(self):
return self._data_name_prefix
def send(self, message_func, nfeat_list=None, efeat_list=None):
"""Send message from all src nodes to dst nodes.
......@@ -188,10 +195,11 @@ class BaseGraphWrapper(object):
output = recv(
dst=self._edges_dst,
uniq_dst=self._edge_uniq_dst,
bucketing_index=self._bucketing_index,
bucketing_index=self._edge_uniq_dst_count,
msg=msg,
reduce_function=reduce_function,
node_ids=self._node_ids)
num_edges=self._num_edges,
num_nodes=self._num_nodes)
return output
@property
......@@ -200,7 +208,7 @@ class BaseGraphWrapper(object):
Return:
A tuple of Tensor (src, dst). Src and dst are both
tensor with shape (num_edges, ) and dtype int32.
tensor with shape (num_edges, ) and dtype int64.
"""
return self._edges_src, self._edges_dst
......@@ -209,10 +217,28 @@ class BaseGraphWrapper(object):
"""Return a variable of number of nodes
Return:
A variable with shape (1,) as the number of nodes in int32.
A variable with shape (1,) as the number of nodes in int64.
"""
return self._num_nodes
@property
def graph_lod(self):
"""Return graph index for graphs
Return:
A variable with shape [None ] as the Lod information of multiple-graph.
"""
return self._graph_lod
@property
def num_graph(self):
"""Return a variable of number of graphs
Return:
A variable with shape (1,) as the number of Graphs in int64.
"""
return self._num_graph
@property
def edge_feat(self):
"""Return a dictionary of tensor representing edge features.
......@@ -221,7 +247,7 @@ class BaseGraphWrapper(object):
A dictionary whose keys are the feature names and the values
are feature tensor.
"""
return self._edge_feat_tensor_dict
return self.edge_feat_tensor_dict
@property
def node_feat(self):
......@@ -231,13 +257,13 @@ class BaseGraphWrapper(object):
A dictionary whose keys are the feature names and the values
are feature tensor.
"""
return self._node_feat_tensor_dict
return self.node_feat_tensor_dict
def indegree(self):
"""Return the indegree tensor for all nodes.
Return:
A tensor of shape (num_nodes, ) in int32.
A tensor of shape (num_nodes, ) in int64.
"""
return self._indegree
......@@ -252,7 +278,7 @@ class StaticGraphWrapper(BaseGraphWrapper):
graph: The static graph that should be put into memory
place: fluid.CPUPlace or fluid.GPUPlace(n) indicating the
place: fluid.CPUPlace or fluid.CUDAPlace(n) indicating the
device to hold the graph data.
Examples:
......@@ -299,19 +325,31 @@ class StaticGraphWrapper(BaseGraphWrapper):
def __init__(self, name, graph, place):
super(StaticGraphWrapper, self).__init__()
self._data_name_prefix = name
self._initializers = []
self.__data_name_prefix = name
self.__create_graph_attr(graph)
def __create_graph_attr(self, graph):
"""Create graph attributes for paddlepaddle.
"""
src, dst = list(zip(*graph.edges))
src, dst, eid = graph.sorted_edges(sort_by="dst")
indegree = graph.indegree()
nodes = graph.nodes
uniq_dst = nodes[indegree > 0]
uniq_dst_count = indegree[indegree > 0]
uniq_dst_count = np.cumsum(uniq_dst_count, dtype='int32')
uniq_dst_count = np.insert(uniq_dst_count, 0, 0)
graph_lod = graph.graph_lod
num_graph = graph.num_graph
num_edges = len(src)
if num_edges == 0:
# Fake Graph
src = np.array([0], dtype="int64")
dst = np.array([0], dtype="int64")
eid = np.array([0], dtype="int64")
uniq_dst_count = np.array([0, 1], dtype="int32")
uniq_dst = np.array([0], dtype="int64")
edge_feat = {}
......@@ -322,57 +360,67 @@ class StaticGraphWrapper(BaseGraphWrapper):
self.__create_graph_node_feat(node_feat, self._initializers)
self.__create_graph_edge_feat(edge_feat, self._initializers)
self._num_edges, init = paddle_helper.constant(
dtype="int64",
value=np.array(
[num_edges], dtype="int64"),
name=self._data_name_prefix + '/num_edges')
self._initializers.append(init)
self._num_graph, init = paddle_helper.constant(
dtype="int64",
value=np.array(
[num_graph], dtype="int64"),
name=self._data_name_prefix + '/num_graph')
self._initializers.append(init)
self._edges_src, init = paddle_helper.constant(
dtype="int32",
dtype="int64",
value=src,
name=self.__data_name_prefix + '_edges_src')
name=self._data_name_prefix + '/edges_src')
self._initializers.append(init)
self._edges_dst, init = paddle_helper.constant(
dtype="int32",
dtype="int64",
value=dst,
name=self.__data_name_prefix + '_edges_dst')
name=self._data_name_prefix + '/edges_dst')
self._initializers.append(init)
self._num_nodes, init = paddle_helper.constant(
dtype="int32",
dtype="int64",
hide_batch_size=False,
value=np.array([graph.num_nodes]),
name=self.__data_name_prefix + '_num_nodes')
name=self._data_name_prefix + '/num_nodes')
self._initializers.append(init)
self._edge_uniq_dst, init = paddle_helper.constant(
name=self.__data_name_prefix + "_uniq_dst",
dtype="int32",
name=self._data_name_prefix + "/uniq_dst",
dtype="int64",
value=uniq_dst)
self._initializers.append(init)
self._edge_uniq_dst_count, init = paddle_helper.constant(
name=self.__data_name_prefix + "_uniq_dst_count",
name=self._data_name_prefix + "/uniq_dst_count",
dtype="int32",
value=uniq_dst_count)
self._initializers.append(init)
bucket_value = np.expand_dims(
np.arange(
0, len(dst), dtype="int32"), -1)
self._bucketing_index, init = paddle_helper.lod_constant(
name=self.__data_name_prefix + "_bucketing_index",
self._graph_lod, init = paddle_helper.constant(
name=self._data_name_prefix + "/graph_lod",
dtype="int32",
lod=list(uniq_dst_count),
value=bucket_value)
value=graph_lod)
self._initializers.append(init)
node_ids_value = np.arange(0, graph.num_nodes, dtype="int32")
node_ids_value = np.arange(0, graph.num_nodes, dtype="int64")
self._node_ids, init = paddle_helper.constant(
name=self.__data_name_prefix + "_node_ids",
dtype="int32",
name=self._data_name_prefix + "/node_ids",
dtype="int64",
value=node_ids_value)
self._initializers.append(init)
self._indegree, init = paddle_helper.constant(
name=self.__data_name_prefix + "_indegree",
dtype="int32",
name=self._data_name_prefix + "/indegree",
dtype="int64",
value=indegree)
self._initializers.append(init)
......@@ -382,9 +430,10 @@ class StaticGraphWrapper(BaseGraphWrapper):
for node_feat_name, node_feat_value in node_feat.items():
node_feat_shape = node_feat_value.shape
node_feat_dtype = node_feat_value.dtype
self._node_feat_tensor_dict[
self.node_feat_tensor_dict[
node_feat_name], init = paddle_helper.constant(
name=self.__data_name_prefix + '_' + node_feat_name,
name=self._data_name_prefix + '/node_feat/' +
node_feat_name,
dtype=node_feat_dtype,
value=node_feat_value)
collector.append(init)
......@@ -395,9 +444,10 @@ class StaticGraphWrapper(BaseGraphWrapper):
for edge_feat_name, edge_feat_value in edge_feat.items():
edge_feat_shape = edge_feat_value.shape
edge_feat_dtype = edge_feat_value.dtype
self._edge_feat_tensor_dict[
self.edge_feat_tensor_dict[
edge_feat_name], init = paddle_helper.constant(
name=self.__data_name_prefix + '_' + edge_feat_name,
name=self._data_name_prefix + '/edge_feat/' +
edge_feat_name,
dtype=edge_feat_dtype,
value=edge_feat_value)
collector.append(init)
......@@ -406,7 +456,7 @@ class StaticGraphWrapper(BaseGraphWrapper):
"""Placing the graph data into the devices.
Args:
place: fluid.CPUPlace or fluid.GPUPlace(n) indicating the
place: fluid.CPUPlace or fluid.CUDAPlace(n) indicating the
device to hold the graph data.
"""
log.info(
......@@ -425,7 +475,7 @@ class GraphWrapper(BaseGraphWrapper):
Args:
name: The graph data prefix
place: fluid.CPUPlace or fluid.GPUPlace(n) indicating the
place: fluid.CPUPlace or fluid.CUDAPlace(n) indicating the
device to hold the graph data.
        node_feat: A list of tuples that describe the details of node
......@@ -483,7 +533,9 @@ class GraphWrapper(BaseGraphWrapper):
def __init__(self, name, place, node_feat=[], edge_feat=[]):
super(GraphWrapper, self).__init__()
self.__data_name_prefix = name
# collect holders for PyReader
self._data_name_prefix = name
self._holder_list = []
self._place = place
self.__create_graph_attr_holders()
for node_feat_name, node_feat_shape, node_feat_dtype in node_feat:
......@@ -497,79 +549,108 @@ class GraphWrapper(BaseGraphWrapper):
def __create_graph_attr_holders(self):
"""Create data holders for graph attributes.
"""
self._num_edges = fluid.layers.data(
self._data_name_prefix + '/num_edges',
shape=[1],
append_batch_size=False,
dtype="int64",
stop_gradient=True)
self._num_graph = fluid.layers.data(
self._data_name_prefix + '/num_graph',
shape=[1],
append_batch_size=False,
dtype="int64",
stop_gradient=True)
self._edges_src = fluid.layers.data(
self.__data_name_prefix + '_edges_src',
self._data_name_prefix + '/edges_src',
shape=[None],
append_batch_size=False,
dtype="int32",
dtype="int64",
stop_gradient=True)
self._edges_dst = fluid.layers.data(
self.__data_name_prefix + '_edges_dst',
self._data_name_prefix + '/edges_dst',
shape=[None],
append_batch_size=False,
dtype="int32",
dtype="int64",
stop_gradient=True)
self._num_nodes = fluid.layers.data(
self.__data_name_prefix + '_num_nodes',
self._data_name_prefix + '/num_nodes',
shape=[1],
append_batch_size=False,
dtype='int32',
dtype='int64',
stop_gradient=True)
self._edge_uniq_dst = fluid.layers.data(
self.__data_name_prefix + "_uniq_dst",
self._data_name_prefix + "/uniq_dst",
shape=[None],
append_batch_size=False,
dtype="int32",
dtype="int64",
stop_gradient=True)
self._edge_uniq_dst_count = fluid.layers.data(
self.__data_name_prefix + "_uniq_dst_count",
self._graph_lod = fluid.layers.data(
self._data_name_prefix + "/graph_lod",
shape=[None],
append_batch_size=False,
dtype="int32",
stop_gradient=True)
self._bucketing_index = fluid.layers.data(
self.__data_name_prefix + "_bucketing_index",
shape=[None, 1],
self._edge_uniq_dst_count = fluid.layers.data(
self._data_name_prefix + "/uniq_dst_count",
shape=[None],
append_batch_size=False,
dtype="int32",
lod_level=1,
stop_gradient=True)
self._node_ids = fluid.layers.data(
self.__data_name_prefix + "_node_ids",
self._data_name_prefix + "/node_ids",
shape=[None],
append_batch_size=False,
dtype="int32",
dtype="int64",
stop_gradient=True)
self._indegree = fluid.layers.data(
self.__data_name_prefix + "_indegree",
self._data_name_prefix + "/indegree",
shape=[None],
append_batch_size=False,
dtype="int32",
dtype="int64",
stop_gradient=True)
self._holder_list.extend([
self._edges_src,
self._edges_dst,
self._num_nodes,
self._edge_uniq_dst,
self._edge_uniq_dst_count,
self._node_ids,
self._indegree,
self._graph_lod,
self._num_graph,
self._num_edges,
])
def __create_graph_node_feat_holders(self, node_feat_name, node_feat_shape,
node_feat_dtype):
"""Create data holders for node features.
"""
feat_holder = fluid.layers.data(
self.__data_name_prefix + '_' + node_feat_name,
self._data_name_prefix + '/node_feat/' + node_feat_name,
shape=node_feat_shape,
append_batch_size=False,
dtype=node_feat_dtype,
stop_gradient=True)
self._node_feat_tensor_dict[node_feat_name] = feat_holder
self.node_feat_tensor_dict[node_feat_name] = feat_holder
self._holder_list.append(feat_holder)
def __create_graph_edge_feat_holders(self, edge_feat_name, edge_feat_shape,
edge_feat_dtype):
"""Create edge holders for edge features.
"""
feat_holder = fluid.layers.data(
self.__data_name_prefix + '_' + edge_feat_name,
self._data_name_prefix + '/edge_feat/' + edge_feat_name,
shape=edge_feat_shape,
append_batch_size=False,
dtype=edge_feat_dtype,
stop_gradient=True)
self._edge_feat_tensor_dict[edge_feat_name] = feat_holder
self.edge_feat_tensor_dict[edge_feat_name] = feat_holder
self._holder_list.append(feat_holder)
def to_feed(self, graph):
"""Convert the graph into feed_dict.
......@@ -588,8 +669,22 @@ class GraphWrapper(BaseGraphWrapper):
src, dst, eid = graph.sorted_edges(sort_by="dst")
indegree = graph.indegree()
nodes = graph.nodes
num_edges = len(src)
uniq_dst = nodes[indegree > 0]
uniq_dst_count = indegree[indegree > 0]
uniq_dst_count = np.cumsum(uniq_dst_count, dtype='int32')
uniq_dst_count = np.insert(uniq_dst_count, 0, 0)
num_graph = graph.num_graph
graph_lod = graph.graph_lod
if num_edges == 0:
# Fake Graph
src = np.array([0], dtype="int64")
dst = np.array([0], dtype="int64")
eid = np.array([0], dtype="int64")
uniq_dst_count = np.array([0, 1], dtype="int32")
uniq_dst = np.array([0], dtype="int64")
edge_feat = {}
......@@ -597,21 +692,33 @@ class GraphWrapper(BaseGraphWrapper):
edge_feat[key] = value[eid]
node_feat = graph.node_feat
feed_dict[self.__data_name_prefix + '_edges_src'] = src
feed_dict[self.__data_name_prefix + '_edges_dst'] = dst
feed_dict[self.__data_name_prefix + '_num_nodes'] = graph.num_nodes
feed_dict[self.__data_name_prefix + '_uniq_dst'] = uniq_dst
feed_dict[self.__data_name_prefix + '_uniq_dst_count'] = uniq_dst_count
feed_dict[self.__data_name_prefix + '_node_ids'] = graph.nodes
feed_dict[self.__data_name_prefix + '_indegree'] = indegree
feed_dict[self.__data_name_prefix + '_bucketing_index'] = \
fluid.create_lod_tensor(np.expand_dims(np.arange(0, len(dst), dtype="int32"), -1),
[list(uniq_dst_count)], self._place)
for key in self._node_feat_tensor_dict:
feed_dict[self.__data_name_prefix + '_' + key] = node_feat[key]
for key in self._edge_feat_tensor_dict:
feed_dict[self.__data_name_prefix + '_' + key] = edge_feat[key]
feed_dict[self._data_name_prefix + '/num_edges'] = np.array(
[num_edges], dtype="int64")
feed_dict[self._data_name_prefix + '/edges_src'] = src
feed_dict[self._data_name_prefix + '/edges_dst'] = dst
feed_dict[self._data_name_prefix + '/num_nodes'] = np.array(
[graph.num_nodes], dtype="int64")
feed_dict[self._data_name_prefix + '/uniq_dst'] = uniq_dst
feed_dict[self._data_name_prefix + '/uniq_dst_count'] = uniq_dst_count
feed_dict[self._data_name_prefix + '/node_ids'] = graph.nodes
feed_dict[self._data_name_prefix + '/indegree'] = indegree
feed_dict[self._data_name_prefix + '/graph_lod'] = graph_lod
feed_dict[self._data_name_prefix + '/num_graph'] = np.array(
[num_graph], dtype="int64")
feed_dict[self._data_name_prefix + '/indegree'] = indegree
for key in self.node_feat_tensor_dict:
feed_dict[self._data_name_prefix + '/node_feat/' +
key] = node_feat[key]
for key in self.edge_feat_tensor_dict:
feed_dict[self._data_name_prefix + '/edge_feat/' +
key] = edge_feat[key]
return feed_dict
@property
def holder_list(self):
"""Return the holder list.
"""
return self._holder_list
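# A small sketch of what ``GraphWrapper.to_feed`` returns: a plain dict that maps
# holder names of the form ``<name>/<field>`` (and ``<name>/node_feat/<key>``,
# ``<name>/edge_feat/<key>``) to numpy arrays. It assumes the ``pgl.graph.Graph``
# constructor used in the tests of this repository; the helper name and the toy
# data are illustrative only.
def _graph_wrapper_to_feed_sketch():
    import numpy as np
    import paddle.fluid as fluid
    from pgl import graph

    g = graph.Graph(
        num_nodes=3,
        edges=[(0, 1), (1, 2)],
        node_feat={"feature": np.ones((3, 4), dtype="float32")})

    place = fluid.CPUPlace()
    prog, startup = fluid.Program(), fluid.Program()
    with fluid.program_guard(prog, startup):
        gw = GraphWrapper(
            name="graph",
            place=place,
            node_feat=g.node_feat_info(),
            edge_feat=g.edge_feat_info())

    feed_dict = gw.to_feed(g)
    # e.g. feed_dict["graph/edges_src"], feed_dict["graph/num_nodes"] and
    # feed_dict["graph/node_feat/feature"] are all numpy arrays.
    return feed_dict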
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This package implement Heterogeneous Graph structure for handling Heterogeneous graph data.
"""
import time
import numpy as np
import pickle as pkl
import time
import pgl.graph_kernel as graph_kernel
from pgl.graph import Graph
__all__ = ['HeterGraph', 'SubHeterGraph']
def _hide_num_nodes(shape):
"""Set the first dimension as unknown
"""
shape = list(shape)
shape[0] = None
return shape
class HeterGraph(object):
"""Implementation of heterogeneous graph structure in pgl
This is a simple implementation of heterogeneous graph structure in pgl.
Args:
num_nodes: number of nodes in a heterogeneous graph
edges: dict, every element in dict is a list of (u, v) tuples.
node_types (optional): list of (u, node_type) tuples to specify the node type of every node
node_feat (optional): a dict of numpy array as node features
edge_feat (optional): a dict of dict as edge features for every edge type
Examples:
.. code-block:: python
import numpy as np
num_nodes = 4
node_types = [(0, 'user'), (1, 'item'), (2, 'item'), (3, 'user')]
edges = {
'edges_type1': [(0,1), (3,2)],
'edges_type2': [(1,2), (3,1)],
}
node_feat = {'feature': np.random.randn(4, 16)}
edges_feat = {
'edges_type1': {'h': np.random.randn(2, 16)},
'edges_type2': {'h': np.random.randn(2, 16)},
}
g = heter_graph.HeterGraph(
num_nodes=num_nodes,
edges=edges,
node_types=node_types,
node_feat=node_feat,
edge_feat=edges_feat)
"""
def __init__(self,
num_nodes,
edges,
node_types=None,
node_feat=None,
edge_feat=None):
self._num_nodes = num_nodes
self._edges_dict = edges
if isinstance(node_types, list):
self._node_types = np.array(node_types, dtype=object)[:, 1]
else:
self._node_types = node_types
self._nodes_type_dict = {}
for n_type in np.unique(self._node_types):
self._nodes_type_dict[n_type] = np.where(
self._node_types == n_type)[0]
if node_feat is not None:
self._node_feat = node_feat
else:
self._node_feat = {}
if edge_feat is not None:
self._edge_feat = edge_feat
else:
self._edge_feat = {}
self._multi_graph = {}
for key, value in self._edges_dict.items():
if not self._edge_feat:
edge_feat = None
else:
edge_feat = self._edge_feat[key]
self._multi_graph[key] = Graph(
num_nodes=self._num_nodes,
edges=value,
node_feat=self._node_feat,
edge_feat=edge_feat)
self._edge_types = self.edge_types_info()
@property
def edge_types(self):
"""Return a list of edge types.
"""
return self._edge_types
@property
def num_nodes(self):
"""Return the number of nodes.
"""
return self._num_nodes
@property
def num_edges(self):
"""Return edges number of all edge types.
"""
n_edges = {}
for e_type in self._edge_types:
n_edges[e_type] = self._multi_graph[e_type].num_edges
return n_edges
@property
def node_types(self):
"""Return the node types.
"""
return self._node_types
@property
def edge_feat(self):
"""Return edge features of all edge types.
"""
return self._edge_feat
@property
def node_feat(self):
"""Return a dictionary of node features.
"""
return self._node_feat
@property
def nodes(self):
"""Return all nodes id from 0 to :code:`num_nodes - 1`
"""
return np.arange(self._num_nodes, dtype='int64')
def __getitem__(self, edge_type):
"""__getitem__
"""
return self._multi_graph[edge_type]
def num_nodes_by_type(self, n_type=None):
"""Return the number of nodes with the specified node type.
"""
if n_type not in self._nodes_type_dict:
raise ("%s is not in valid node type" % n_type)
else:
return len(self._nodes_type_dict[n_type])
def indegree(self, nodes=None, edge_type=None):
"""Return the indegree of the given nodes with the specified edge_type.
Args:
nodes: Return the indegree of the given nodes.
If nodes is None, return the indegree of all nodes.
edge_type: Return the indegree for the specified edge_type.
If edge_type is None, return the total indegree of the given nodes.
Return:
A numpy.ndarray as the given nodes' indegree.
"""
if edge_type is None:
indegrees = []
for e_type in self._edge_types:
indegrees.append(self._multi_graph[e_type].indegree(nodes))
indegrees = np.sum(np.vstack(indegrees), axis=0)
return indegrees
else:
return self._multi_graph[edge_type].indegree(nodes)
def outdegree(self, nodes=None, edge_type=None):
"""Return the outdegree of the given nodes with the specified edge_type.
Args:
nodes: Return the outdegree of the given nodes.
If nodes is None, return the outdegree of all nodes.
edge_type: Return the outdegree for the specified edge_type.
If edge_type is None, return the total outdegree of the given nodes.
Return:
A numpy.array as the given nodes' outdegree.
"""
if edge_type is None:
outdegrees = []
for e_type in self._edge_types:
outdegrees.append(self._multi_graph[e_type].outdegree(nodes))
outdegrees = np.sum(np.vstack(outdegrees), axis=0)
return outdegrees
else:
return self._multi_graph[edge_type].outdegree(nodes)
def successor(self, edge_type, nodes=None, return_eids=False):
"""Find successor of given nodes with the specified edge_type.
Args:
nodes: Return the successors of the given nodes.
If nodes is None, return the successors of all nodes.
edge_type: Return the successors for the specified edge_type.
If edge_type is None, return all successors of the given nodes,
and the eids are invalid in this case.
return_eids: If True, return the nodes together with the corresponding eids.
"""
return self._multi_graph[edge_type].successor(nodes, return_eids)
def sample_successor(self,
edge_type,
nodes,
max_degree,
return_eids=False,
shuffle=False):
"""Sample successors of given nodes with the specified edge_type.
Args:
edge_type: The specified edge_type.
nodes: Given nodes whose successors will be sampled.
max_degree: The maximum number of sampled successors for each node.
return_eids: Whether to return the corresponding eids.
Return:
Return a list of numpy.ndarray, where each numpy.ndarray is a list
of sampled successor ids for the given nodes with the specified edge type.
If :code:`return_eids=True`, there will be an additional list of
numpy.ndarray, where each numpy.ndarray holds the eids that connect
the nodes to their successors.
"""
return self._multi_graph[edge_type].sample_successor(
nodes=nodes,
max_degree=max_degree,
return_eids=return_eids,
shuffle=shuffle)
def predecessor(self, edge_type, nodes=None, return_eids=False):
"""Find predecessor of given nodes with the specified edge_type.
Args:
nodes: Return the predecessors of the given nodes.
If nodes is None, return the predecessors of all nodes.
edge_type: Return the predecessors for the specified edge_type.
return_eids: If True, return the nodes together with the corresponding eids.
"""
return self._multi_graph[edge_type].predecessor(nodes, return_eids)
def sample_predecessor(self,
edge_type,
nodes,
max_degree,
return_eids=False,
shuffle=False):
"""Sample predecessors of given nodes with the specified edge_type.
Args:
edge_type: The specified edge_type.
nodes: Given nodes whose predecessors will be sampled.
max_degree: The maximum number of sampled predecessors for each node.
return_eids: Whether to return the corresponding eids.
Return:
Return a list of numpy.ndarray, where each numpy.ndarray is a list
of sampled predecessor ids for the given nodes with the specified edge type.
If :code:`return_eids=True`, there will be an additional list of
numpy.ndarray, where each numpy.ndarray holds the eids that connect
the nodes to their predecessors.
"""
return self._multi_graph[edge_type].sample_predecessor(
nodes=nodes,
max_degree=max_degree,
return_eids=return_eids,
shuffle=shuffle)
def node_batch_iter(self, batch_size, shuffle=True, n_type=None):
"""Node batch iterator
Iterate all nodes by batch with the specified node type.
Args:
batch_size: The batch size of each batch of nodes.
shuffle: Whether to shuffle the nodes.
n_type: Iterate over the nodes with the specified node type. If n_type is None,
iterate over all nodes by batch.
Return:
Batch iterator
"""
if n_type is None:
nodes = np.arange(self._num_nodes, dtype="int64")
else:
nodes = self._nodes_type_dict[n_type]
if shuffle:
np.random.shuffle(nodes)
start = 0
while start < len(nodes):
yield nodes[start:start + batch_size]
start += batch_size
def sample_nodes(self, sample_num, n_type=None):
"""Sample nodes with the specified n_type from the graph
This function helps to sample nodes with the specified n_type from the graph.
If n_type is None, this function will sample nodes from all nodes.
Nodes might be duplicated.
Args:
sample_num: The number of samples
n_type: The type of nodes to be sampled
Return:
A list of nodes
"""
if n_type is not None:
return np.random.choice(
self._nodes_type_dict[n_type], size=sample_num)
else:
return np.random.randint(
low=0, high=self._num_nodes, size=sample_num)
def node_feat_info(self):
"""Return the information of node feature for HeterGraphWrapper.
This function returns the information of node features for all node types,
and it is used to help construct a HeterGraphWrapper.
Return:
A list of tuple (name, shape, dtype) for all given node feature.
"""
node_feat_info = []
for feat_name, feat in self._node_feat.items():
node_feat_info.append(
(feat_name, _hide_num_nodes(feat.shape), feat.dtype))
return node_feat_info
def edge_feat_info(self):
"""Return the information of edge feature for HeterGraphWrapper.
This function returns the information of edge features for all edge types,
and it is used to help construct a HeterGraphWrapper.
Return:
A dict of list of tuple (name, shape, dtype) for all given edge feature.
"""
edge_feat_info = {}
for edge_type_name, feat_dict in self._edge_feat.items():
tmp_edge_feat_info = []
for feat_name, feat in feat_dict.items():
full_name = feat_name
tmp_edge_feat_info.append(
(full_name, _hide_num_nodes(feat.shape), feat.dtype))
edge_feat_info[edge_type_name] = tmp_edge_feat_info
return edge_feat_info
def edge_types_info(self):
"""Return the information of all edge types.
Return:
A list of all edge types.
"""
edge_types_info = []
for key, _ in self._edges_dict.items():
edge_types_info.append(key)
return edge_types_info
class SubHeterGraph(HeterGraph):
"""Implementation of SubHeterGraph in pgl.
SubHeterGraph inherits from :code:`HeterGraph`.
Args:
num_nodes: number of nodes in a heterogeneous graph
edges: dict, every element in dict is a list of (u, v) tuples.
node_types (optional): list of (u, node_type) tuples to specify the node type of every node
node_feat (optional): a dict of numpy array as node features
edge_feat (optional): a dict of dict as edge features for every edge type
reindex: A dictionary that maps parent hetergraph node id to subhetergraph node id.
"""
def __init__(self,
num_nodes,
edges,
node_types=None,
node_feat=None,
edge_feat=None,
reindex=None):
super(SubHeterGraph, self).__init__(
num_nodes=num_nodes,
edges=edges,
node_types=node_types,
node_feat=node_feat,
edge_feat=edge_feat)
if reindex is None:
reindex = {}
self._from_reindex = reindex
self._to_reindex = {u: v for v, u in reindex.items()}
def reindex_from_parrent_nodes(self, nodes):
"""Map the given parent graph node id to subgraph id.
Args:
nodes: A list of nodes from parent graph.
Return:
A list of subgraph ids.
"""
return graph_kernel.map_nodes(nodes, self._from_reindex)
def reindex_to_parrent_nodes(self, nodes):
"""Map the given subgraph node id to parent graph id.
Args:
nodes: A list of nodes in this subgraph.
Return:
A list of node ids in parent graph.
"""
return graph_kernel.map_nodes(nodes, self._to_reindex)
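# A short sketch of querying a ``HeterGraph``: per-edge-type degrees, typed
# node iteration and typed sampling. It reuses the toy data from the class
# docstring above; the edge type names and the helper name are illustrative only.
def _heter_graph_query_sketch():
    num_nodes = 4
    node_types = [(0, 'user'), (1, 'item'), (2, 'item'), (3, 'user')]
    edges = {
        'edges_type1': [(0, 1), (3, 2)],
        'edges_type2': [(1, 2), (3, 1)],
    }
    g = HeterGraph(num_nodes=num_nodes, edges=edges, node_types=node_types)

    # Total indegree sums over all edge types; a typed query looks at one type.
    total_indeg = g.indegree()
    typed_indeg = g.indegree(edge_type='edges_type1')

    # Iterate only over 'user' nodes, two per batch.
    user_batches = list(g.node_batch_iter(batch_size=2, shuffle=False, n_type='user'))

    # Sample three 'item' nodes (duplicates are allowed).
    sampled_items = g.sample_nodes(sample_num=3, n_type='item')
    return total_indeg, typed_indeg, user_batches, sampled_items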
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This package provides interface to help building static computational graph
for PaddlePaddle.
"""
import warnings
import numpy as np
import paddle.fluid as fluid
from pgl.utils import op
from pgl.utils import paddle_helper
from pgl.utils.logger import log
from pgl.graph_wrapper import GraphWrapper
ALL = "__ALL__"
__all__ = ["HeterGraphWrapper"]
def is_all(arg):
"""is_all
"""
return isinstance(arg, str) and arg == ALL
class HeterGraphWrapper(object):
"""Implement a heterogeneous graph wrapper that creates a graph data holders
that attributes and features in the heterogeneous graph.
And we provide interface :code:`to_feed` to help converting :code:`Graph`
data into :code:`feed_dict`.
Args:
name: The heterogeneous graph data prefix
place: fluid.CPUPlace or fluid.CUDAPlace(n) indicating the
device to hold the graph data.
node_feat: A list of tuples that describe the details of the node
feature tensors. Each tuple must be (name, shape, dtype)
and the first dimension of the shape must be set unknown
(-1 or None) or we can easily use :code:`HeterGraph.node_feat_info()`
to get the node_feat settings.
edge_feat: A dict of lists of tuples that describe the details of the edge
feature tensors. Each tuple must be (name, shape, dtype)
and the first dimension of the shape must be set unknown
(-1 or None) or we can easily use :code:`HeterGraph.edge_feat_info()`
to get the edge_feat settings.
Examples:
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
from pgl import heter_graph
from pgl import heter_graph_wrapper
num_nodes = 4
node_types = [(0, 'user'), (1, 'item'), (2, 'item'), (3, 'user')]
edges = {
'edges_type1': [(0,1), (3,2)],
'edges_type2': [(1,2), (3,1)],
}
node_feat = {'feature': np.random.randn(4, 16)}
edges_feat = {
'edges_type1': {'h': np.random.randn(2, 16)},
'edges_type2': {'h': np.random.randn(2, 16)},
}
g = heter_graph.HeterGraph(
num_nodes=num_nodes,
edges=edges,
node_types=node_types,
node_feat=node_feat,
edge_feat=edges_feat)
place = fluid.CPUPlace()
gw = heter_graph_wrapper.HeterGraphWrapper(
name='heter_graph',
place = place,
edge_types = g.edge_types_info(),
node_feat=g.node_feat_info(),
edge_feat=g.edge_feat_info())
"""
def __init__(self, name, place, edge_types, node_feat={}, edge_feat={}):
self.__data_name_prefix = name
self._place = place
self._edge_types = edge_types
self._multi_gw = {}
for edge_type in self._edge_types:
type_name = self.__data_name_prefix + '/' + edge_type
if node_feat:
n_feat = node_feat
else:
n_feat = {}
if edge_feat:
e_feat = edge_feat[edge_type]
else:
e_feat = {}
self._multi_gw[edge_type] = GraphWrapper(
name=type_name,
place=self._place,
node_feat=n_feat,
edge_feat=e_feat)
def to_feed(self, heterGraph, edge_types_list=ALL):
"""Convert the graph into feed_dict.
This function helps to convert graph data into a feed dict
for :code:`fluid.Executor` to run the model.
Args:
heterGraph: the :code:`HeterGraph` data object
edge_types_list: the edge types list to be fed
Return:
A dictionary that contains the data holder names and their corresponding data.
"""
multi_graphs = heterGraph._multi_graph
if is_all(edge_types_list):
edge_types_list = self._edge_types
feed_dict = {}
for edge_type in edge_types_list:
feed_d = self._multi_gw[edge_type].to_feed(multi_graphs[edge_type])
feed_dict.update(feed_d)
return feed_dict
def __getitem__(self, edge_type):
"""__getitem__
"""
return self._multi_gw[edge_type]
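# A brief sketch of using the per-edge-type ``GraphWrapper`` objects inside a
# ``HeterGraphWrapper``: one message-passing branch per edge type, plus a single
# ``to_feed`` call for the whole heterogeneous graph. It reuses the toy graph
# from the class docstring above; the helper name and the "sum" merge are
# illustrative choices only.
def _heter_graph_wrapper_usage_sketch():
    import numpy as np
    import paddle.fluid as fluid
    from pgl import heter_graph
    from pgl import heter_graph_wrapper

    node_types = [(0, 'user'), (1, 'item'), (2, 'item'), (3, 'user')]
    edges = {'edges_type1': [(0, 1), (3, 2)], 'edges_type2': [(1, 2), (3, 1)]}
    node_feat = {'feature': np.random.randn(4, 16).astype("float32")}
    g = heter_graph.HeterGraph(
        num_nodes=4, edges=edges, node_types=node_types, node_feat=node_feat)

    place = fluid.CPUPlace()
    prog, startup = fluid.Program(), fluid.Program()
    with fluid.program_guard(prog, startup):
        gw = heter_graph_wrapper.HeterGraphWrapper(
            name='heter_graph',
            place=place,
            edge_types=g.edge_types_info(),
            node_feat=g.node_feat_info(),
            edge_feat=g.edge_feat_info())
        outputs = []
        for e_type in g.edge_types_info():
            sub_gw = gw[e_type]  # a plain GraphWrapper for one edge type
            msg = sub_gw.send(
                lambda src, dst, edge: src["h"],
                nfeat_list=[("h", sub_gw.node_feat["feature"])])
            outputs.append(sub_gw.recv(msg, "sum"))
        merged = fluid.layers.sums(outputs)

    exe = fluid.Executor(place)
    exe.run(startup)
    # One feed dict covers every edge type (or a subset via edge_types_list).
    return exe.run(prog, feed=gw.to_feed(g), fetch_list=[merged])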
......@@ -16,6 +16,12 @@
from pgl.layers import conv
from pgl.layers.conv import *
from pgl.layers import set2set
from pgl.layers.set2set import *
from pgl.layers import graph_pool
from pgl.layers.graph_pool import *
__all__ = []
__all__ += conv.__all__
__all__ += set2set.__all__
__all__ += graph_pool.__all__
......@@ -18,7 +18,7 @@ import paddle.fluid as fluid
from pgl import graph_wrapper
from pgl.utils import paddle_helper
__all__ = ['gcn', 'gat', 'gin']
def gcn(gw, feature, hidden_size, activation, name, norm=None):
......@@ -53,7 +53,7 @@ def gcn(gw, feature, hidden_size, activation, name, norm=None):
feature = fluid.layers.fc(feature,
size=hidden_size,
bias_attr=False,
param_attr=fluid.ParamAttr(name=name))
if norm is not None:
feature = feature * norm
......@@ -67,7 +67,7 @@ def gcn(gw, feature, hidden_size, activation, name, norm=None):
output = fluid.layers.fc(output,
size=hidden_size,
bias_attr=False,
param_attr=fluid.ParamAttr(name=name))
if norm is not None:
output = output * norm
......@@ -152,7 +152,7 @@ def gat(gw,
ft = fluid.layers.fc(feature,
hidden_size * num_heads,
bias_attr=False,
param_attr=fluid.ParamAttr(name=name + '_weight'))
left_a = fluid.layers.create_parameter(
shape=[num_heads, hidden_size],
dtype='float32',
......@@ -178,3 +178,73 @@ def gat(gw,
bias.stop_gradient = True
output = fluid.layers.elementwise_add(output, bias, act=activation)
return output
def gin(gw,
feature,
hidden_size,
activation,
name,
init_eps=0.0,
train_eps=False):
"""Implementation of Graph Isomorphism Network (GIN) layer.
This is an implementation of the paper How Powerful are Graph Neural Networks?
(https://arxiv.org/pdf/1810.00826.pdf).
In their implementation, all MLPs have 2 layers. Batch normalization is applied
on every hidden layer.
Args:
gw: Graph wrapper object (:code:`StaticGraphWrapper` or :code:`GraphWrapper`)
feature: A tensor with shape (num_nodes, feature_size).
name: The GIN layer name.
hidden_size: The hidden size for gin.
activation: The activation for the output.
init_eps: float, optional
Initial :math:`\epsilon` value, default is 0.
train_eps: bool, optional
If True, :math:`\epsilon` will be a learnable parameter.
Return:
A tensor with shape (num_nodes, hidden_size).
"""
def send_src_copy(src_feat, dst_feat, edge_feat):
return src_feat["h"]
epsilon = fluid.layers.create_parameter(
shape=[1, 1],
dtype="float32",
attr=fluid.ParamAttr(name="%s_eps" % name),
default_initializer=fluid.initializer.ConstantInitializer(
value=init_eps))
if not train_eps:
epsilon.stop_gradient = True
msg = gw.send(send_src_copy, nfeat_list=[("h", feature)])
output = gw.recv(msg, "sum") + feature * (epsilon + 1.0)
output = fluid.layers.fc(output,
size=hidden_size,
act=None,
param_attr=fluid.ParamAttr(name="%s_w_0" % name),
bias_attr=fluid.ParamAttr(name="%s_b_0" % name))
output = fluid.layers.batch_norm(output)
output = getattr(fluid.layers, activation)(output)
output = fluid.layers.fc(output,
size=hidden_size,
act=activation,
param_attr=fluid.ParamAttr(name="%s_w_1" % name),
bias_attr=fluid.ParamAttr(name="%s_b_1" % name))
return output
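# A numpy sketch of the aggregation that ``gin`` performs before its MLP: for
# every node v, out[v] = (1 + eps) * h[v] + sum of h[u] over incoming edges
# (u, v). This mirrors ``gw.recv(msg, "sum") + feature * (epsilon + 1.0)``
# above; the toy arrays and the helper name are illustrative only.
def _gin_aggregation_sketch():
    import numpy as np
    num_nodes, dim, eps = 4, 3, 0.1
    h = np.random.rand(num_nodes, dim).astype("float32")
    edges = np.array([(0, 1), (2, 1), (3, 0)])  # (src, dst) pairs
    out = (1.0 + eps) * h
    for src, dst in edges:
        out[dst] += h[src]  # "sum" aggregation of messages copied from src
    return out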
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""This package implements common layers to help building
graph neural networks.
"""
import paddle.fluid as fluid
from pgl import graph_wrapper
from pgl.utils import paddle_helper
from pgl.utils import op
__all__ = ['graph_pooling', 'graph_norm']
def graph_pooling(gw, node_feat, pool_type):
"""Implementation of graph pooling
This layer pools node features into graph-level features.
Args:
gw: Graph wrapper object (:code:`StaticGraphWrapper` or :code:`GraphWrapper`)
node_feat: A tensor with shape (num_nodes, feature_size).
pool_type: The type of pooling ("sum", "average", "min")
Return:
A tensor with shape (num_graph, feature_size)
"""
graph_feat = op.nested_lod_reset(node_feat, gw.graph_lod)
graph_feat = fluid.layers.sequence_pool(graph_feat, pool_type)
return graph_feat
def graph_norm(gw, feature):
"""Implementation of graph normalization
Reference Paper: BENCHMARKING GRAPH NEURAL NETWORKS
Each node feature is divided by sqrt(num_nodes) of the graph it belongs to.
Args:
gw: Graph wrapper object (:code:`StaticGraphWrapper` or :code:`GraphWrapper`)
feature: A tensor with shape (num_nodes, hidden_size)
Return:
A tensor with shape (num_nodes, hidden_size)
"""
nodes = fluid.layers.fill_constant(
[gw.num_nodes, 1], dtype="float32", value=1.0)
norm = graph_pooling(gw, nodes, pool_type="sum")
norm = fluid.layers.sqrt(norm)
feature_lod = op.nested_lod_reset(feature, gw.graph_lod)
norm = fluid.layers.sequence_expand_as(norm, feature_lod)
norm.stop_gradient = True
return feature_lod / norm
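# A numpy sketch of what ``graph_norm`` computes: every node feature row is
# divided by sqrt(num_nodes) of the graph the node belongs to. ``graph_lod``
# is assumed to be the cumulative node offsets of a batch of graphs, as used
# by ``GraphWrapper``; the toy values and the helper name are illustrative only.
def _graph_norm_sketch():
    import numpy as np
    feature = np.ones((5, 2), dtype="float32")      # 5 nodes from 2 graphs
    graph_lod = np.array([0, 2, 5], dtype="int64")  # graph 0: nodes 0-1, graph 1: nodes 2-4
    out = feature.copy()
    for i in range(len(graph_lod) - 1):
        start, end = graph_lod[i], graph_lod[i + 1]
        out[start:end] /= np.sqrt(float(end - start))
    return out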
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""This package implements common layers to help building pooling operators.
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import paddle.fluid as F
import paddle.fluid.layers as L
import pgl
__all__ = ['Set2Set']
class Set2Set(object):
"""Implementation of set2set pooling operator.
This is an implementation of the paper ORDER MATTERS: SEQUENCE TO SEQUENCE
FOR SETS (https://arxiv.org/pdf/1511.06391.pdf).
"""
def __init__(self, input_dim, n_iters, n_layers):
"""
Args:
input_dim: hidden size of input data.
n_iters: number of set2set iterations.
n_layers: number of lstm layers.
"""
self.input_dim = input_dim
self.output_dim = 2 * input_dim
self.n_iters = n_iters
# this's set2set n_layers, lstm n_layers = 1
self.n_layers = n_layers
def forward(self, feat):
"""
Args:
feat: input feature with shape [batch, n_edges, dim].
Return:
output_feat: output feature of set2set pooling with shape [batch, 2*dim].
"""
seqlen = 1
h = L.fill_constant_batch_size_like(
feat, [1, self.n_layers, self.input_dim], "float32", 0)
h = L.transpose(h, [1, 0, 2])
c = h
# [seqlen, batch, dim]
q_star = L.fill_constant_batch_size_like(
feat, [1, seqlen, self.output_dim], "float32", 0)
q_star = L.transpose(q_star, [1, 0, 2])
for _ in range(self.n_iters):
# q [seqlen, batch, dim]
# h [layer, batch, dim]
q, h, c = L.lstm(
q_star,
h,
c,
seqlen,
self.input_dim,
self.n_layers,
is_bidirec=False)
# e [batch, seqlen, n_edges]
e = L.matmul(L.transpose(q, [1, 0, 2]), feat, transpose_y=True)
# alpha [batch, seqlen, n_edges]
alpha = L.softmax(e)
# readout [batch, seqlen, dim]
readout = L.matmul(alpha, feat)
readout = L.transpose(readout, [1, 0, 2])
# q_star [seqlen, batch, dim + dim]
q_star = L.concat([q, readout], -1)
return L.squeeze(q_star, [0])
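# A numpy sketch of one set2set iteration's readout: the query attends over the
# set elements with a softmax, and the weighted sum is concatenated back onto
# the query, matching the ``matmul`` / ``softmax`` / ``concat`` sequence in
# ``forward`` above. The toy shapes and the helper name are illustrative only.
def _set2set_readout_sketch():
    import numpy as np
    batch, n_elems, dim = 2, 4, 3
    feat = np.random.rand(batch, n_elems, dim)   # set elements
    q = np.random.rand(batch, 1, dim)            # current LSTM query

    e = np.matmul(q, feat.transpose(0, 2, 1))    # [batch, 1, n_elems]
    alpha = np.exp(e) / np.exp(e).sum(axis=-1, keepdims=True)  # attention weights
    readout = np.matmul(alpha, feat)             # [batch, 1, dim]
    q_star = np.concatenate([q, readout], axis=-1)  # [batch, 1, 2 * dim]
    return q_star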
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""redis_graph"""
import pgl
import redis
from redis import BlockingConnectionPool, StrictRedis
from redis._compat import b, unicode, bytes, long, basestring
from rediscluster.nodemanager import NodeManager
from rediscluster.crc import crc16
from collections import OrderedDict
import threading
import numpy as np
import time
import json
import pgl.graph as pgraph
import pickle as pkl
from pgl.utils.logger import log
import pgl.graph_kernel as graph_kernel
def encode(value):
"""
Return a bytestring representation of the value.
This method is copied from Redis' connection.py:Connection.encode
"""
if isinstance(value, bytes):
return value
elif isinstance(value, (int, long)):
value = b(str(value))
elif isinstance(value, float):
value = b(repr(value))
elif not isinstance(value, basestring):
value = unicode(value)
if isinstance(value, unicode):
value = value.encode('utf-8')
return value
def crc16_hash(data):
"""crc16_hash"""
return crc16(encode(data))
LUA_SCRIPT = """
math.randomseed(tonumber(ARGV[1]))
local function permute(tab, count, bucket_size)
local n = #tab / bucket_size
local o_ret = {}
local o_dict = {}
for i = 1, count do
local j = math.random(i, n)
o_ret[i] = string.sub(tab, (i - 1) * bucket_size + 1, i * bucket_size)
if j > count then
if o_dict[j] ~= nil then
o_ret[i], o_dict[j] = o_dict[j], o_ret[i]
else
o_dict[j], o_ret[i] = o_ret[i], string.sub(tab, (j - 1) * bucket_size + 1, j * bucket_size)
end
end
end
return table.concat(o_ret)
end
local bucket_size = 16
local ret = {}
local sample_size = tonumber(ARGV[2])
for i=1, #ARGV - 2 do
local tab = redis.call("HGET", KEYS[1], ARGV[i + 2])
if tab then
if #tab / bucket_size <= sample_size then
ret[i] = tab
else
ret[i] = permute(tab, sample_size, bucket_size)
end
else
ret[i] = tab
end
end
return ret
"""
class RedisCluster(object):
"""RedisCluster"""
def __init__(self, startup_nodes):
self.nodemanager = NodeManager(startup_nodes=startup_nodes)
self.nodemanager.initialize()
self.redis_worker = {}
for node, config in self.nodemanager.nodes.items():
rdp = BlockingConnectionPool(
host=config["host"], port=config["port"])
self.redis_worker[node] = {
"worker": StrictRedis(
connection_pool=rdp, decode_responses=False),
"type": config["server_type"]
}
def get(self, key):
"""get"""
slot = self.nodemanager.keyslot(key)
node = np.random.choice(self.nodemanager.slots[slot])
worker = self.redis_worker[node['name']]
if worker["type"] == "slave":
worker["worker"].execute_command("READONLY")
return worker["worker"].get(key)
def hmget(self, key, fields):
"""hmget"""
while True:
retry = 0
try:
slot = self.nodemanager.keyslot(key)
node = np.random.choice(self.nodemanager.slots[slot])
worker = self.redis_worker[node['name']]
if worker["type"] == "slave":
worker["worker"].execute_command("READONLY")
ret = worker["worker"].hmget(key, fields)
break
except Exception as e:
retry += 1
if retry > 5:
raise e
print("RETRY hmget after 1 sec. Retry Time %s" % retry)
time.sleep(1)
return ret
def hmget_sample(self, key, fields, sample):
"""hmget_sample"""
while True:
retry = 0
try:
slot = self.nodemanager.keyslot(key)
node = np.random.choice(self.nodemanager.slots[slot])
worker = self.redis_worker[node['name']]
if worker["type"] == "slave":
worker["worker"].execute_command("READONLY")
func = worker["worker"].register_script(LUA_SCRIPT)
ret = func(
keys=[key],
args=[np.random.randint(4294967295), sample] + fields)
break
except Exception as e:
retry += 1
if retry > 5:
raise e
print("RETRY hmget_sample after 1 sec. Retry Time %s" % retry)
time.sleep(1)
return ret
def hmget_sample_helper(rs, query, num_parts, sample_size):
"""hmget_sample_helper"""
buff = [b""] * len(query)
part_dict = {}
part_ind_dict = {}
for ind, q in enumerate(query):
part = crc16_hash(q) % num_parts
part = "part-%s" % part
if part not in part_dict:
part_dict[part] = []
part_ind_dict[part] = []
part_dict[part].append(q)
part_ind_dict[part].append(ind)
def worker(_key, _value, _buff, _rs, _part_ind_dict, _sample_size):
"""worker"""
response = _rs.hmget_sample(_key, _value, _sample_size)
for res, ind in zip(response, _part_ind_dict[_key]):
_buff[ind] = res
def hmget(_part_dict, _rs, _buff, _part_ind_dict, _sample_size):
"""hmget"""
key_value = list(_part_dict.items())
np.random.shuffle(key_value)
for key, value in key_value:
worker(key, value, _buff, _rs, _part_ind_dict, _sample_size)
hmget(part_dict, rs, buff, part_ind_dict, sample_size)
return buff
def hmget_helper(rs, query, num_parts):
"""hmget_helper"""
buff = [b""] * len(query)
part_dict = {}
part_ind_dict = {}
for ind, q in enumerate(query):
part = crc16_hash(q) % num_parts
part = "part-%s" % part
if part not in part_dict:
part_dict[part] = []
part_ind_dict[part] = []
part_dict[part].append(q)
part_ind_dict[part].append(ind)
def worker(_key, _value, _buff, _rs, _part_ind_dict):
"""worker"""
response = _rs.hmget(_key, _value)
for res, ind in zip(response, _part_ind_dict[_key]):
_buff[ind] = res
def hmget(_part_dict, _rs, _buff, _part_ind_dict):
"""hmget"""
key_value = list(_part_dict.items())
np.random.shuffle(key_value)
for key, value in key_value:
worker(key, value, _buff, _rs, _part_ind_dict)
hmget(part_dict, rs, buff, part_ind_dict)
return buff
class RedisGraph(pgraph.Graph):
"""RedisGraph"""
def __init__(self, name, redis_config, num_parts):
self._rs = RedisCluster(startup_nodes=redis_config)
self.num_parts = num_parts
self._name = name
self._num_nodes = None
self._num_edges = None
self._node_feat_info = None
self._edge_feat_info = None
self._node_feat_dtype = None
self._edge_feat_dtype = None
self._node_feat_shape = None
self._edge_feat_shape = None
@property
def num_nodes(self):
"""num_nodes"""
if self._num_nodes is None:
self._num_nodes = int(self._rs.get("num_nodes"))
return self._num_nodes
@property
def num_edges(self):
"""num_edges"""
if self._num_edges is None:
self._num_edges = int(self._rs.get("num_edges"))
return self._num_edges
def node_feat_info(self):
"""node_feat_info"""
if self._node_feat_info is None:
buff = self._rs.get("nf:infos")
self._node_feat_info = json.loads(buff.decode())
return self._node_feat_info
def node_feat_dtype(self, key):
"""node_feat_dtype"""
if self._node_feat_dtype is None:
self._node_feat_dtype = {}
for key, _, dtype in self.node_feat_info():
self._node_feat_dtype[key] = dtype
return self._node_feat_dtype[key]
def node_feat_shape(self, key):
"""node_feat_shape"""
if self._node_feat_shape is None:
self._node_feat_shape = {}
for key, shape, _ in self.node_feat_info():
self._node_feat_shape[key] = shape
return self._node_feat_shape[key]
def edge_feat_shape(self, key):
"""edge_feat_shape"""
if self._edge_feat_shape is None:
self._edge_feat_shape = {}
for key, shape, _ in self.edge_feat_info():
self._edge_feat_shape[key] = shape
return self._edge_feat_shape[key]
def edge_feat_dtype(self, key):
"""edge_feat_dtype"""
if self._edge_feat_dtype is None:
self._edge_feat_dtype = {}
for key, _, dtype in self.edge_feat_info():
self._edge_feat_dtype[key] = dtype
return self._edge_feat_dtype[key]
def edge_feat_info(self):
"""edge_feat_info"""
if self._edge_feat_info is None:
buff = self._rs.get("ef:infos")
self._edge_feat_info = json.loads(buff.decode())
return self._edge_feat_info
def sample_predecessor(self, nodes, max_degree, return_eids=False):
"""sample_predecessor"""
query = ["d:%s" % n for n in nodes]
rets = hmget_sample_helper(self._rs, query, self.num_parts, max_degree)
v = []
eid = []
for buff in rets:
if buff is None:
v.append(np.array([], dtype="int64"))
eid.append(np.array([], dtype="int64"))
else:
npret = np.frombuffer(
buff, dtype="int64").reshape([-1, 2]).astype("int64")
v.append(npret[:, 0])
eid.append(npret[:, 1])
if return_eids:
return np.array(v), np.array(eid)
else:
return np.array(v)
def sample_successor(self, nodes, max_degree, return_eids=False):
"""sample_successor"""
query = ["s:%s" % n for n in nodes]
rets = hmget_sample_helper(self._rs, query, self.num_parts, max_degree)
v = []
eid = []
for buff in rets:
if buff is None:
v.append(np.array([], dtype="int64"))
eid.append(np.array([], dtype="int64"))
else:
npret = np.frombuffer(
buff, dtype="int64").reshape([-1, 2]).astype("int64")
v.append(npret[:, 0])
eid.append(npret[:, 1])
if return_eids:
return np.array(v), np.array(eid)
else:
return np.array(v)
def predecessor(self, nodes, return_eids=False):
"""predecessor"""
query = ["d:%s" % n for n in nodes]
ret = hmget_helper(self._rs, query, self.num_parts)
v = []
eid = []
for buff in ret:
if buff is not None:
npret = np.frombuffer(
buff, dtype="int64").reshape([-1, 2]).astype("int64")
v.append(npret[:, 0])
eid.append(npret[:, 1])
else:
v.append(np.array([], dtype="int64"))
eid.append(np.array([], dtype="int64"))
if return_eids:
return np.array(v), np.array(eid)
else:
return np.array(v)
def successor(self, nodes, return_eids=False):
"""successor"""
query = ["s:%s" % n for n in nodes]
ret = hmget_helper(self._rs, query, self.num_parts)
v = []
eid = []
for buff in ret:
if buff is not None:
npret = np.frombuffer(
buff, dtype="int64").reshape([-1, 2]).astype("int64")
v.append(npret[:, 0])
eid.append(npret[:, 1])
else:
v.append(np.array([], dtype="int64"))
eid.append(np.array([], dtype="int64"))
if return_eids:
return np.array(v), np.array(eid)
else:
return np.array(v)
def get_edges_by_id(self, eids):
"""get_edges_by_id"""
queries = ["e:%s" % e for e in eids]
ret = hmget_helper(self._rs, queries, self.num_parts)
o = np.asarray(ret, dtype="int64")
dst = o % self.num_nodes
src = o // self.num_nodes
data = np.hstack(
[src.reshape([-1, 1]), dst.reshape([-1, 1])]).astype("int64")
return data
def get_node_feat_by_id(self, key, nodes):
"""get_node_feat_by_id"""
queries = ["nf:%s:%i" % (key, nid) for nid in nodes]
ret = hmget_helper(self._rs, queries, self.num_parts)
ret = b"".join(ret)
data = np.frombuffer(ret, dtype=self.node_feat_dtype(key))
data = data.reshape(self.node_feat_shape(key))
return data
def get_edge_feat_by_id(self, key, eids):
"""get_edge_feat_by_id"""
queries = ["ef:%s:%i" % (key, e) for e in eids]
ret = hmget_helper(self._rs, queries, self.num_parts)
ret = b"".join(ret)
data = np.frombuffer(ret, dtype=self.edge_feat_dtype(key))
data = data.reshape(self.edge_feat_shape(key))
return data
def subgraph(self, nodes, eid, edges=None):
"""Generate subgraph with nodes and edge ids.
This function will generate a :code:`pgl.graph.Subgraph` object and
copy all corresponding node and edge features. Nodes and edges will
be reindexed from 0.
WARNING: ALL NODES IN EID MUST BE INCLUDED BY NODES
Args:
nodes: Node ids which will be included in the subgraph.
eid: Edge ids which will be included in the subgraph.
Return:
A :code:`pgl.graph.Subgraph` object.
"""
reindex = {}
for ind, node in enumerate(nodes):
reindex[node] = ind
if edges is None:
edges = self.get_edges_by_id(eid)
else:
edges = np.array(edges, dtype="int64")
sub_edges = graph_kernel.map_edges(
np.arange(
len(edges), dtype="int64"), edges, reindex)
sub_edge_feat = {}
for key, _, _ in self.edge_feat_info():
sub_edge_feat[key] = self.get_edge_feat_by_id(key, eid)
sub_node_feat = {}
for key, _, _ in self.node_feat_info():
sub_node_feat[key] = self.get_node_feat_by_id(key, nodes)
subgraph = pgraph.SubGraph(
num_nodes=len(nodes),
edges=sub_edges,
node_feat=sub_node_feat,
edge_feat=sub_edge_feat,
reindex=reindex)
return subgraph
def node_batch_iter(self, batch_size, shuffle=True):
"""Node batch iterator
Iterate over all nodes by batch.
Args:
batch_size: The batch size of each batch of nodes.
shuffle: Whether to shuffle the nodes.
Return:
Batch iterator
"""
perm = np.arange(self.num_nodes, dtype="int64")
if shuffle:
np.random.shuffle(perm)
start = 0
while start < self._num_nodes:
yield perm[start:start + batch_size]
start += batch_size
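# A numpy sketch of the edge encoding that ``get_edges_by_id`` decodes above:
# each edge is stored as the single integer src * num_nodes + dst, so the pair
# is recovered with integer division and modulo. The toy values and the helper
# name are illustrative only.
def _edge_id_encoding_sketch():
    import numpy as np
    num_nodes = 100
    edges = np.array([(3, 7), (42, 99)], dtype="int64")
    encoded = edges[:, 0] * num_nodes + edges[:, 1]   # what would be stored
    src = encoded // num_nodes                        # what get_edges_by_id computes
    dst = encoded % num_nodes
    assert (np.stack([src, dst], axis=1) == edges).all()
    return encoded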
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""redis_hetergraph"""
import pgl
import redis
from redis import BlockingConnectionPool, StrictRedis
from redis._compat import b, unicode, bytes, long, basestring
from rediscluster.nodemanager import NodeManager
from rediscluster.crc import crc16
from collections import OrderedDict
import threading
import numpy as np
import time
import json
import pgl.graph as pgraph
import pickle as pkl
from pgl.utils.logger import log
import pgl.graph_kernel as graph_kernel
from pgl import heter_graph
import pgl.redis_graph as rg
class RedisHeterGraph(rg.RedisGraph):
"""Redis Heterogeneous Graph"""
def __init__(self, name, edge_types, redis_config, num_parts):
super(RedisHeterGraph, self).__init__(name, redis_config, num_parts)
self._num_edges = {}
self.edge_types = edge_types
self.e_type = None
self._edge_feat_info = {}
self._edge_feat_dtype = {}
self._edge_feat_shape = {}
def num_edges_by_type(self, e_type):
"""get edge number by specified edge type"""
if e_type not in self._num_edges:
self._num_edges[e_type] = int(
self._rs.get("%s:num_edges" % e_type))
return self._num_edges[e_type]
def num_edges(self):
"""num_edges"""
num_edges = {}
for e_type in self.edge_types:
num_edges[e_type] = self.num_edges_by_type(e_type)
return num_edges
def edge_feat_info_by_type(self, e_type):
"""get edge features information by specified edge type"""
if e_type not in self._edge_feat_info:
buff = self._rs.get("%s:ef:infos" % e_type)
if buff is not None:
self._edge_feat_info[e_type] = json.loads(buff.decode())
else:
self._edge_feat_info[e_type] = []
return self._edge_feat_info[e_type]
def edge_feat_info(self):
"""edge_feat_info"""
edge_feat_info = {}
for e_type in self.edge_types:
edge_feat_info[e_type] = self.edge_feat_info_by_type(e_type)
return edge_feat_info
def edge_feat_shape(self, e_type, key):
"""edge_feat_shape"""
if e_type not in self._edge_feat_shape:
e_feat_shape = {}
for k, shape, _ in self.edge_feat_info()[e_type]:
e_feat_shape[k] = shape
self._edge_feat_shape[e_type] = e_feat_shape
return self._edge_feat_shape[e_type][key]
def edge_feat_dtype(self, e_type, key):
"""edge_feat_dtype"""
if e_type not in self._edge_feat_dtype:
e_feat_dtype = {}
for k, _, dtype in self.edge_feat_info()[e_type]:
e_feat_dtype[k] = dtype
self._edge_feat_dtype[e_type] = e_feat_dtype
return self._edge_feat_dtype[e_type][key]
def sample_predecessor(self, e_type, nodes, max_degree, return_eids=False):
"""sample predecessor with the specified edge type"""
query = ["%s:d:%s" % (e_type, n) for n in nodes]
rets = rg.hmget_sample_helper(self._rs, query, self.num_parts,
max_degree)
v = []
eid = []
for buff in rets:
if buff is None:
v.append(np.array([], dtype="int64"))
eid.append(np.array([], dtype="int64"))
else:
npret = np.frombuffer(
buff, dtype="int64").reshape([-1, 2]).astype("int64")
v.append(npret[:, 0])
eid.append(npret[:, 1])
if return_eids:
return np.array(v), np.array(eid)
else:
return np.array(v)
def sample_successor(self, e_type, nodes, max_degree, return_eids=False):
"""sample successor with the specified edge type"""
query = ["%s:s:%s" % (e_type, n) for n in nodes]
rets = rg.hmget_sample_helper(self._rs, query, self.num_parts,
max_degree)
v = []
eid = []
for buff in rets:
if buff is None:
v.append(np.array([], dtype="int64"))
eid.append(np.array([], dtype="int64"))
else:
npret = np.frombuffer(
buff, dtype="int64").reshape([-1, 2]).astype("int64")
v.append(npret[:, 0])
eid.append(npret[:, 1])
if return_eids:
return np.array(v), np.array(eid)
else:
return np.array(v)
def predecessor(self, e_type, nodes, return_eids=False):
"""predecessor with the specified edge type"""
query = ["%s:d:%s" % (e_type, n) for n in nodes]
ret = rg.hmget_helper(self._rs, query, self.num_parts)
v = []
eid = []
for buff in ret:
if buff is not None:
npret = np.frombuffer(
buff, dtype="int64").reshape([-1, 2]).astype("int64")
v.append(npret[:, 0])
eid.append(npret[:, 1])
else:
v.append(np.array([], dtype="int64"))
eid.append(np.array([], dtype="int64"))
if return_eids:
return np.array(v), np.array(eid)
else:
return np.array(v)
def successor(self, e_type, nodes, return_eids=False):
"""successor with the specified edge type"""
query = ["%s:s:%s" % (e_type, n) for n in nodes]
ret = rg.hmget_helper(self._rs, query, self.num_parts)
v = []
eid = []
for buff in ret:
if buff is not None:
npret = np.frombuffer(
buff, dtype="int64").reshape([-1, 2]).astype("int64")
v.append(npret[:, 0])
eid.append(npret[:, 1])
else:
v.append(np.array([], dtype="int64"))
eid.append(np.array([], dtype="int64"))
if return_eids:
return np.array(v), np.array(eid)
else:
return np.array(v)
def get_edges_by_id(self, e_type, eids):
"""get_edges_by_id"""
queries = ["%s:e:%s" % (e_type, e) for e in eids]
ret = rg.hmget_helper(self._rs, queries, self.num_parts)
o = np.asarray(ret, dtype="int64")
dst = o % self.num_nodes
src = o // self.num_nodes
data = np.hstack(
[src.reshape([-1, 1]), dst.reshape([-1, 1])]).astype("int64")
return data
def get_edge_feat_by_id(self, e_type, key, eids):
"""get_edge_feat_by_id"""
queries = ["%s:ef:%s:%i" % (e_type, key, e) for e in eids]
ret = rg.hmget_helper(self._rs, queries, self.num_parts)
if ret is None:
return None
else:
ret = b"".join(ret)
data = np.frombuffer(ret, dtype=self.edge_feat_dtype(e_type, key))
data = data.reshape(self.edge_feat_shape(e_type, key))
return data
def get_node_types(self, nodes):
"""get_node_types """
queries = ["nt:%i" % n for n in nodes]
ret = rg.hmget_helper(self._rs, queries, self.num_parts)
node_types = []
for buff in ret:
if buff:
node_types.append(buff.decode())
else:
node_types = None
return node_types
def subgraph(self, nodes, eid, edges=None):
"""Generate heterogeneous subgraph with nodes and edge ids.
WARNING: ALL NODES IN EID MUST BE INCLUDED BY NODES
Args:
nodes: Node ids which will be included in the subgraph.
eid: Edge ids which will be included in the subgraph.
Return:
A :code:`pgl.heter_graph.Subgraph` object.
"""
reindex = {}
for ind, node in enumerate(nodes):
reindex[node] = ind
_node_types = self.get_node_types(nodes)
if _node_types is None:
node_types = None
else:
node_types = []
for idx, t in zip(nodes, _node_types):
node_types.append([reindex[idx], t])
if edges is None:
edges = {}
for e_type, eid_list in eid.items():
edges[e_type] = self.get_edges_by_id(e_type, eid_list)
sub_edges = {}
for e_type, edges_list in edges.items():
sub_edges[e_type] = graph_kernel.map_edges(
np.arange(
len(edges_list), dtype="int64"), edges_list, reindex)
sub_edge_feat = {}
for e_type, edge_feat_info in self.edge_feat_info().items():
type_edge_feat = {}
for key, _, _ in edge_feat_info:
type_edge_feat[key] = self.get_edge_feat_by_id(e_type, key,
eid)
sub_edge_feat[e_type] = type_edge_feat
sub_node_feat = {}
for key, _, _ in self.node_feat_info():
sub_node_feat[key] = self.get_node_feat_by_id(key, nodes)
subgraph = heter_graph.SubHeterGraph(
num_nodes=len(nodes),
edges=sub_edges,
node_types=node_types,
node_feat=sub_node_feat,
edge_feat=sub_edge_feat,
reindex=reindex)
return subgraph
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This package implement graph sampling algorithm.
"""
import time
import copy
import numpy as np
import pgl
from pgl.utils.logger import log
from pgl import graph_kernel
__all__ = [
'graphsage_sample', 'node2vec_sample', 'deepwalk_sample',
'metapath_randomwalk', 'pinsage_sample'
]
def traverse(item):
"""traverse the list or numpy"""
if isinstance(item, list) or isinstance(item, np.ndarray):
for i in iter(item):
for j in traverse(i):
yield j
else:
yield item
def flat_node_and_edge(nodes, eids, weights=None):
"""flatten the sub-lists to one list"""
nodes = list(set(traverse(nodes)))
eids = list(traverse(eids))
if weights is not None:
weights = list(traverse(weights))
return nodes, eids, weights
def edge_hash(src, dst):
"""edge_hash
"""
return src * 100000007 + dst
def graphsage_sample(graph, nodes, samples, ignore_edges=[]):
"""Implement of graphsage sample.
Reference paper: https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf.
Args:
graph: A pgl graph instance
nodes: Sample starting from nodes
samples: A list, number of neighbors in each layer
ignore_edges: list of edges (src, dst) to be ignored.
Return:
A list of subgraphs
"""
start = time.time()
num_layers = len(samples)
start_nodes = nodes
nodes = list(start_nodes)
eids, edges = [], []
nodes_set = set(nodes)
layer_nodes, layer_eids, layer_edges = [], [], []
ignore_edge_set = set([edge_hash(src, dst) for src, dst in ignore_edges])
for layer_idx in reversed(range(num_layers)):
if len(start_nodes) == 0:
layer_nodes = [nodes] + layer_nodes
layer_eids = [eids] + layer_eids
layer_edges = [edges] + layer_edges
continue
batch_pred_nodes, batch_pred_eids = graph.sample_predecessor(
start_nodes, samples[layer_idx], return_eids=True)
start = time.time()
last_nodes_set = nodes_set
nodes, eids = copy.copy(nodes), copy.copy(eids)
edges = copy.copy(edges)
nodes_set, eids_set = set(nodes), set(eids)
for srcs, dst, pred_eids in zip(batch_pred_nodes, start_nodes,
batch_pred_eids):
for src, eid in zip(srcs, pred_eids):
if edge_hash(src, dst) in ignore_edge_set:
continue
if eid not in eids_set:
eids.append(eid)
edges.append([src, dst])
eids_set.add(eid)
if src not in nodes_set:
nodes.append(src)
nodes_set.add(src)
layer_edges = [edges] + layer_edges
start_nodes = list(nodes_set - last_nodes_set)
layer_nodes = [nodes] + layer_nodes
layer_eids = [eids] + layer_eids
start = time.time()
# Find new nodes
feed_dict = {}
subgraphs = []
for i in range(num_layers):
subgraphs.append(
graph.subgraph(
nodes=layer_nodes[0], eid=layer_eids[i], edges=layer_edges[i]))
# only for this task
subgraphs[i].node_feat["index"] = np.array(
layer_nodes[0], dtype="int64")
return subgraphs
def alias_sample(size, alias, events):
"""Implement of alias sample.
Args:
size: Output shape.
alias: The alias table built by `alias_sample_build_table`.
events: The events table built by `alias_sample_build_table`.
Return:
samples: The generated random samples.
"""
rand_num = np.random.uniform(0.0, len(alias), size)
idx = rand_num.astype("int64")
uni = rand_num - idx
flags = (uni >= alias[idx])
idx[flags] = events[idx][flags]
return idx
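# A pure-python sketch of how an alias table can be built, to show what the
# ``alias`` / ``events`` arrays consumed by ``alias_sample`` above look like.
# It illustrates the standard alias method under that interface and is not the
# actual ``graph_kernel.alias_sample_build_table`` implementation.
def _build_alias_table_sketch(probs):
    """Return (alias, events) such that bucket i keeps event i with prob alias[i]."""
    import numpy as np
    n = len(probs)
    scaled = np.array(probs, dtype="float64") * n
    alias = np.ones(n, dtype="float64")    # acceptance threshold per bucket
    events = np.zeros(n, dtype="int64")    # fallback event per bucket
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = scaled[s]
        events[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Remaining buckets always keep their own index (alias stays 1.0).
    # Usage with ``alias_sample`` above:
    #   alias, events = _build_alias_table_sketch([0.1, 0.2, 0.7])
    #   samples = alias_sample([1000], alias, events)
    return alias, events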
def graph_alias_sample_table(graph, edge_weight_name):
"""Build alias sample table for weighted deepwalk.
Args:
graph: The input graph
edge_weight_name: The name of edge weight in edge_feat.
Return:
Alias sample tables for each node.
"""
edge_weight = graph.edge_feat[edge_weight_name]
_, eids_array = graph.successor(return_eids=True)
alias_array, events_array = [], []
for eids in eids_array:
probs = edge_weight[eids]
probs /= np.sum(probs)
alias, events = graph_kernel.alias_sample_build_table(probs)
alias_array.append(alias), events_array.append(events)
alias_array, events_array = np.array(alias_array), np.array(events_array)
return alias_array, events_array
def deepwalk_sample(graph, nodes, max_depth, alias_name=None,
events_name=None):
"""Implement of random walk.
This function get random walks path for given nodes and depth.
Args:
nodes: Walk starting from nodes
max_depth: Max walking depth
Return:
A list of walks.
"""
walk = []
# init
for node in nodes:
walk.append([node])
cur_walk_ids = np.arange(0, len(nodes))
cur_nodes = np.array(nodes)
for l in range(max_depth):
# select the walks not end
cur_succs = graph.successor(cur_nodes)
mask = [len(succ) > 0 for succ in cur_succs]
if np.any(mask):
cur_walk_ids = cur_walk_ids[mask]
cur_nodes = cur_nodes[mask]
cur_succs = cur_succs[mask]
else:
# stop when all nodes have no successor
break
if alias_name is not None and events_name is not None:
sample_index = [
alias_sample([1], graph.node_feat[alias_name][node],
graph.node_feat[events_name][node])[0]
for node in cur_nodes
]
else:
outdegree = [len(cur_succ) for cur_succ in cur_succs]
sample_index = np.floor(
np.random.rand(cur_succs.shape[0]) * outdegree).astype("int64")
nxt_cur_nodes = []
for s, ind, walk_id in zip(cur_succs, sample_index, cur_walk_ids):
walk[walk_id].append(s[ind])
nxt_cur_nodes.append(s[ind])
cur_nodes = np.array(nxt_cur_nodes)
return walk
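# A minimal usage sketch for ``deepwalk_sample``: uniform random walks on a
# small toy graph. It assumes ``pgl.graph.Graph`` can be built from just
# ``num_nodes`` and ``edges``, as elsewhere in this repository; the toy edges
# and the helper name are illustrative only.
def _deepwalk_usage_sketch():
    import pgl.graph as pgraph
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 1)]
    g = pgraph.Graph(num_nodes=4, edges=edges)
    # One walk per start node, each with at most 3 sampled steps.
    walks = deepwalk_sample(g, nodes=[0, 1, 2, 3], max_depth=3)
    return walks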
def node2vec_sample(graph, nodes, max_depth, p=1.0, q=1.0):
"""Implement of node2vec random walk.
Reference paper: https://cs.stanford.edu/~jure/pubs/node2vec-kdd16.pdf.
Args:
graph: A pgl graph instance
nodes: Walk starting from nodes
max_depth: Max walking depth
p: Return parameter
q: In-out parameter
Return:
A list of walks.
"""
if p == 1.0 and q == 1.0:
return deepwalk_sample(graph, nodes, max_depth)
walk = []
# init
for node in nodes:
walk.append([node])
cur_walk_ids = np.arange(0, len(nodes))
cur_nodes = np.array(nodes)
prev_nodes = np.array([-1] * len(nodes), dtype="int64")
prev_succs = np.array([[]] * len(nodes), dtype="int64")
for l in range(max_depth):
# select the walks not end
cur_succs = graph.successor(cur_nodes)
mask = [len(succ) > 0 for succ in cur_succs]
if np.any(mask):
cur_walk_ids = cur_walk_ids[mask]
cur_nodes = cur_nodes[mask]
prev_nodes = prev_nodes[mask]
prev_succs = prev_succs[mask]
cur_succs = cur_succs[mask]
else:
# stop when all nodes have no successor
break
num_nodes = cur_nodes.shape[0]
nxt_nodes = np.zeros(num_nodes, dtype="int64")
for idx, (
succ, prev_succ, walk_id, prev_node
) in enumerate(zip(cur_succs, prev_succs, cur_walk_ids, prev_nodes)):
sampled_succ = graph_kernel.node2vec_sample(succ, prev_succ,
prev_node, p, q)
walk[walk_id].append(sampled_succ)
nxt_nodes[idx] = sampled_succ
prev_nodes, prev_succs = cur_nodes, cur_succs
cur_nodes = nxt_nodes
return walk
def metapath_randomwalk(graph,
start_nodes,
metapath,
walk_length,
alias_name=None,
events_name=None):
"""Implementation of metapath random walk in heterogeneous graph.
Args:
graph: instance of pgl heterogeneous graph
start_nodes: start nodes to generate walks
metapath: meta path for sampling nodes.
e.g: "c2p-p2a-a2p-p2c"
walk_length: the walk length
Return:
a list of metapath walks.
"""
edge_types = metapath.split('-')
walk = []
for node in start_nodes:
walk.append([node])
cur_walk_ids = np.arange(0, len(start_nodes))
cur_nodes = np.array(start_nodes)
mp_len = len(edge_types)
for i in range(0, walk_length - 1):
g = graph[edge_types[i % mp_len]]
cur_succs = g.successor(cur_nodes)
mask = [len(succ) > 0 for succ in cur_succs]
if np.any(mask):
cur_walk_ids = cur_walk_ids[mask]
cur_nodes = cur_nodes[mask]
cur_succs = cur_succs[mask]
else:
# stop when all nodes have no successor
break
if alias_name is not None and events_name is not None:
sample_index = [
alias_sample([1], g.node_feat[alias_name][node],
g.node_feat[events_name][node])[0]
for node in cur_nodes
]
else:
outdegree = [len(cur_succ) for cur_succ in cur_succs]
sample_index = np.floor(
np.random.rand(cur_succs.shape[0]) * outdegree).astype("int64")
nxt_cur_nodes = []
for s, ind, walk_id in zip(cur_succs, sample_index, cur_walk_ids):
walk[walk_id].append(s[ind])
nxt_cur_nodes.append(s[ind])
cur_nodes = np.array(nxt_cur_nodes)
return walk
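# A minimal usage sketch for ``metapath_randomwalk`` on a toy heterogeneous
# graph built with ``pgl.heter_graph.HeterGraph``. The edge type names, the
# metapath string and the helper name below are illustrative only.
def _metapath_randomwalk_usage_sketch():
    from pgl import heter_graph
    node_types = [(0, 'user'), (1, 'item'), (2, 'item'), (3, 'user')]
    edges = {
        'u2i': [(0, 1), (3, 2)],
        'i2u': [(1, 3), (2, 0)],
    }
    g = heter_graph.HeterGraph(
        num_nodes=4, edges=edges, node_types=node_types)
    # Walk user -> item -> user -> item ... following the metapath.
    walks = metapath_randomwalk(
        g, start_nodes=[0, 3], metapath='u2i-i2u', walk_length=5)
    return walks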
def random_walk_with_start_prob(graph, nodes, max_depth, proba=0.5):
"""Implement of random walk with the probability of returning the origin node.
This function get random walks path for given nodes and depth.
Args:
nodes: Walk starting from nodes
max_depth: Max walking depth
proba: the probability of returning to the origin node
Return:
A list of walks.
"""
walk = []
# init
for node in nodes:
walk.append([node])
walk_ids = np.arange(0, len(nodes))
cur_nodes = np.array(nodes)
nodes = np.array(nodes)
for l in range(max_depth):
# select the walks not end
if l >= 1:
return_proba = np.random.rand(cur_nodes.shape[0])
proba_mask = (return_proba < proba)
cur_nodes[proba_mask] = nodes[proba_mask]
outdegree = graph.outdegree(cur_nodes)
mask = (outdegree != 0)
if np.any(mask):
cur_walk_ids = walk_ids[mask]
outdegree = outdegree[mask]
else:
# no node has a successor at this step; skip it and possibly return to the origin node in the next loop
continue
succ = graph.successor(cur_nodes[mask])
sample_index = np.floor(
np.random.rand(outdegree.shape[0]) * outdegree).astype("int64")
nxt_cur_nodes = cur_nodes
for s, ind, walk_id in zip(succ, sample_index, cur_walk_ids):
walk[walk_id].append(s[ind])
nxt_cur_nodes[walk_id] = s[ind]
cur_nodes = np.array(nxt_cur_nodes)
return walk
def pinsage_sample(graph,
nodes,
samples,
top_k=10,
proba=0.5,
norm_bais=1.0,
ignore_edges=set()):
"""Implement of graphsage sample.
Reference paper: .
Args:
graph: A pgl graph instance
nodes: Sample starting from nodes
samples: A list, number of neighbors in each layer
top_k: select the top_k nodes by visit count to construct the edges
proba: the probability of returning to the origin node
norm_bais: the normalization bias added to the visit count
ignore_edges: list of edges (src, dst) to be ignored.
Return:
A list of subgraphs
"""
start = time.time()
num_layers = len(samples)
start_nodes = nodes
edges, weights = [], []
layer_nodes, layer_edges, layer_weights = [], [], []
ignore_edge_set = set([edge_hash(src, dst) for src, dst in ignore_edges])
for layer_idx in reversed(range(num_layers)):
if len(start_nodes) == 0:
layer_nodes = [nodes] + layer_nodes
layer_edges = [edges] + layer_edges
layer_weights = [weights] + layer_weights
continue
walks = random_walk_with_start_prob(
graph, start_nodes, samples[layer_idx], proba=proba)
walks = [walk[1:] for walk in walks]
pred_edges = []
pred_weights = []
pred_nodes = []
for node, walk in zip(start_nodes, walks):
walk_nodes = []
walk_weights = []
count_sum = 0
for random_walk_node in walk:
if len(ignore_edge_set) > 0 and random_walk_node != node and \
edge_hash(random_walk_node, node) in ignore_edge_set:
continue
walk_nodes.append(random_walk_node)
unique, counts = np.unique(walk_nodes, return_counts=True)
frequencies = np.asarray((unique, counts)).T
frequencies = frequencies[np.argsort(frequencies[:, 1])]
frequencies = frequencies[-1 * top_k:, :]
for random_walk_node, random_count in zip(
frequencies[:, 0].tolist(), frequencies[:, 1].tolist()):
pred_nodes.append(random_walk_node)
pred_edges.append((random_walk_node, node))
walk_weights.append(random_count)
count_sum += random_count
count_sum += len(walk_weights) * norm_bais
walk_weights = (np.array(walk_weights) + norm_bais) / (count_sum)
pred_weights.extend(walk_weights.tolist())
last_node_set = set(nodes)
nodes, edges, weights = flat_node_and_edge([nodes, pred_nodes], \
[edges, pred_edges], [weights, pred_weights])
layer_edges = [edges] + layer_edges
layer_weights = [weights] + layer_weights
layer_nodes = [nodes] + layer_nodes
start_nodes = list(set(nodes) - last_node_set)
start = time.time()
feed_dict = {}
subgraphs = []
for i in range(num_layers):
edge_feat_dict = {
"weight": np.array(
layer_weights[i], dtype='float32')
}
subgraphs.append(
graph.subgraph(
nodes=layer_nodes[0],
edges=layer_edges[i],
edge_feats=edge_feat_dict))
subgraphs[i].node_feat["index"] = np.array(
layer_nodes[0], dtype="int64")
return subgraphs
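# A numpy sketch of how ``pinsage_sample`` turns random-walk visit counts into
# top_k neighbors and normalized edge weights, mirroring the ``np.unique`` /
# ``argsort`` / ``norm_bais`` logic above. The toy walk and the helper name
# are illustrative only.
def _pinsage_weight_sketch():
    import numpy as np
    walk_nodes = [5, 7, 5, 9, 5, 7]   # nodes visited while walking from one start node
    top_k, norm_bais = 2, 1.0
    unique, counts = np.unique(walk_nodes, return_counts=True)
    order = np.argsort(counts)
    unique, counts = unique[order][-top_k:], counts[order][-top_k:]
    count_sum = counts.sum() + len(counts) * norm_bais
    weights = (counts + norm_bais) / count_sum   # normalized visit frequencies
    return list(zip(unique.tolist(), weights.tolist()))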
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""test_alias_sample"""
import argparse
import time
import unittest
from collections import Counter
import numpy as np
from pgl.graph_kernel import alias_sample_build_table
from pgl.sample import alias_sample
class AliasSampleTest(unittest.TestCase):
"""AliasSampleTest
"""
def setUp(self):
pass
def test_speed(self):
"""test_speed
"""
num = 1000
size = [10240, 1, 5]
probs = np.random.uniform(0.0, 1.0, [num])
probs /= np.sum(probs)
start = time.time()
alias, events = alias_sample_build_table(probs)
for i in range(100):
alias_sample(size, alias, events)
alias_sample_time = time.time() - start
start = time.time()
for i in range(100):
np.random.choice(num, size, p=probs)
np_sample_time = time.time() - start
self.assertTrue(alias_sample_time < np_sample_time)
    def test_result(self):
"""test_result
"""
size = [450000]
num = 10
probs = np.arange(1, num).astype(np.float64)
probs /= np.sum(probs)
alias, events = alias_sample_build_table(probs)
ret = alias_sample(size, alias, events)
cnt = Counter(ret)
sort_cnt_keys = [x[1] for x in sorted(zip(cnt.values(), cnt.keys()))]
self.assertEqual(sort_cnt_keys, np.arange(0, num - 1).tolist())
if __name__ == '__main__':
unittest.main()
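# A hedged sketch of the alias-method API exercised by the test above: build
# the alias/event tables once from a probability vector, then draw arbitrarily
# shaped batches in O(1) per draw. The probability values are illustrative.
def _alias_sample_example():
    """Draw 1000 samples from a 4-way categorical distribution."""
    import numpy as np
    from pgl.graph_kernel import alias_sample_build_table
    from pgl.sample import alias_sample
    probs = np.array([0.1, 0.2, 0.3, 0.4])
    alias, events = alias_sample_build_table(probs)
    draws = alias_sample([1000], alias, events)
    # empirical frequencies should approach probs as the sample size grows
    return np.bincount(np.asarray(draws, dtype="int64"),
                       minlength=len(probs)) / 1000.0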
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file is for testing gin layer.
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import unittest
import numpy as np
import paddle.fluid as F
import paddle.fluid.layers as L
from pgl.layers.conv import gin
from pgl import graph
from pgl import graph_wrapper
class GinTest(unittest.TestCase):
"""GinTest
"""
def test_gin(self):
"""test_gin
"""
np.random.seed(1)
hidden_size = 8
num_nodes = 10
edges = [(1, 4), (0, 5), (1, 9), (1, 8), (2, 8), (2, 5), (3, 6),
(3, 7), (3, 4), (3, 8)]
inver_edges = [(v, u) for u, v in edges]
edges.extend(inver_edges)
node_feat = {"feature": np.random.rand(10, 4).astype("float32")}
g = graph.Graph(num_nodes=num_nodes, edges=edges, node_feat=node_feat)
use_cuda = False
place = F.CUDAPlace(0) if use_cuda else F.CPUPlace()
prog = F.Program()
startup_prog = F.Program()
with F.program_guard(prog, startup_prog):
gw = graph_wrapper.GraphWrapper(
name='graph',
place=place,
node_feat=g.node_feat_info(),
edge_feat=g.edge_feat_info())
output = gin(gw,
gw.node_feat['feature'],
hidden_size=hidden_size,
activation='relu',
name='gin',
init_eps=1,
train_eps=True)
exe = F.Executor(place)
exe.run(startup_prog)
ret = exe.run(prog, feed=gw.to_feed(g), fetch_list=[output])
self.assertEqual(ret[0].shape[0], num_nodes)
self.assertEqual(ret[0].shape[1], hidden_size)
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""test_hetergraph"""
import time
import unittest
import json
import os
import numpy as np
from pgl.sample import metapath_randomwalk
from pgl.graph import Graph
from pgl import heter_graph
class HeterGraphTest(unittest.TestCase):
"""HeterGraph test
"""
@classmethod
def setUpClass(cls):
np.random.seed(1)
edges = {}
# for test no successor
edges['c2p'] = [(1, 4), (0, 5), (1, 9), (1, 8), (2, 8), (2, 5), (3, 6),
(3, 7), (3, 4), (3, 8)]
edges['p2c'] = [(v, u) for u, v in edges['c2p']]
edges['p2a'] = [(4, 10), (4, 11), (4, 12), (4, 14), (4, 13), (6, 12),
(6, 11), (6, 14), (7, 12), (7, 11), (8, 14), (9, 10)]
edges['a2p'] = [(v, u) for u, v in edges['p2a']]
# for test speed
# edges['c2p'] = [(0, 4), (0, 5), (1, 9), (1,8), (2,8), (2,5), (3,6), (3,7), (3,4), (3,8)]
# edges['p2c'] = [(v,u) for u, v in edges['c2p']]
# edges['p2a'] = [(4,10), (4,11), (4,12), (4,14), (5,13), (6,13), (6,11), (6,14), (7,12), (7,11), (8,14), (9,13)]
# edges['a2p'] = [(v,u) for u, v in edges['p2a']]
node_types = ['c' for _ in range(4)] + ['p' for _ in range(6)
] + ['a' for _ in range(5)]
node_types = [(i, t) for i, t in enumerate(node_types)]
cls.graph = heter_graph.HeterGraph(
num_nodes=len(node_types), edges=edges, node_types=node_types)
def test_num_nodes_by_type(self):
print()
n_types = {'c': 4, 'p': 6, 'a': 5}
for nt in n_types:
num_nodes = self.graph.num_nodes_by_type(nt)
self.assertEqual(num_nodes, n_types[nt])
def test_node_batch_iter(self):
print()
batch_size = 2
ground = [[4, 5], [6, 7], [8, 9]]
for idx, nodes in enumerate(
self.graph.node_batch_iter(
batch_size=batch_size, shuffle=False, n_type='p')):
self.assertEqual(len(nodes), batch_size)
self.assertListEqual(list(nodes), ground[idx])
def test_sample_successor(self):
print()
nodes = [4, 5, 8]
md = 2
succes = self.graph.sample_successor(
edge_type='p2a', nodes=nodes, max_degree=md, return_eids=False)
self.assertIsInstance(succes, list)
ground = [[10, 11, 12, 14, 13], [], [14]]
for succ, g in zip(succes, ground):
self.assertIsInstance(succ, np.ndarray)
for i in succ:
self.assertIn(i, g)
nodes = [4]
succes = self.graph.sample_successor(
edge_type='p2a', nodes=nodes, max_degree=md, return_eids=False)
self.assertIsInstance(succes, list)
ground = [[10, 11, 12, 14, 13]]
for succ, g in zip(succes, ground):
self.assertIsInstance(succ, np.ndarray)
for i in succ:
self.assertIn(i, g)
def test_successor(self):
print()
nodes = [4, 5, 8]
e_type = 'p2a'
succes = self.graph.successor(
edge_type=e_type,
nodes=nodes, )
self.assertIsInstance(succes, np.ndarray)
ground = [[10, 11, 12, 14, 13], [], [14]]
for succ, g in zip(succes, ground):
self.assertIsInstance(succ, np.ndarray)
self.assertCountEqual(succ, g)
nodes = [4]
e_type = 'p2a'
succes = self.graph.successor(
edge_type=e_type,
nodes=nodes, )
self.assertIsInstance(succes, np.ndarray)
ground = [[10, 11, 12, 14, 13]]
for succ, g in zip(succes, ground):
self.assertIsInstance(succ, np.ndarray)
self.assertCountEqual(succ, g)
def test_sample_nodes(self):
print()
p_ground = [4, 5, 6, 7, 8, 9]
sample_num = 10
nodes = self.graph.sample_nodes(sample_num=sample_num, n_type='p')
self.assertEqual(len(nodes), sample_num)
for n in nodes:
self.assertIn(n, p_ground)
# test n_type == None
ground = [i for i in range(15)]
nodes = self.graph.sample_nodes(sample_num=sample_num, n_type=None)
self.assertEqual(len(nodes), sample_num)
for n in nodes:
self.assertIn(n, ground)
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""test_metapath_randomwalk"""
import time
import unittest
import json
import os
import numpy as np
from pgl.sample import metapath_randomwalk
from pgl.graph import Graph
from pgl import heter_graph
np.random.seed(1)
class MetapathRandomwalkTest(unittest.TestCase):
"""metapath_randomwalk test
"""
def setUp(self):
edges = {}
# for test no successor
edges['c2p'] = [(1, 4), (0, 5), (1, 9), (1, 8), (2, 8), (2, 5), (3, 6),
(3, 7), (3, 4), (3, 8)]
edges['p2c'] = [(v, u) for u, v in edges['c2p']]
edges['p2a'] = [(4, 10), (4, 11), (4, 12), (4, 14), (4, 13), (6, 12),
(6, 11), (6, 14), (7, 12), (7, 11), (8, 14), (9, 10)]
edges['a2p'] = [(v, u) for u, v in edges['p2a']]
# for test speed
# edges['c2p'] = [(0, 4), (0, 5), (1, 9), (1,8), (2,8), (2,5), (3,6), (3,7), (3,4), (3,8)]
# edges['p2c'] = [(v,u) for u, v in edges['c2p']]
# edges['p2a'] = [(4,10), (4,11), (4,12), (4,14), (5,13), (6,13), (6,11), (6,14), (7,12), (7,11), (8,14), (9,13)]
# edges['a2p'] = [(v,u) for u, v in edges['p2a']]
self.node_types = ['c' for _ in range(4)] + [
'p' for _ in range(6)
] + ['a' for _ in range(5)]
node_types = [(i, t) for i, t in enumerate(self.node_types)]
self.graph = heter_graph.HeterGraph(
num_nodes=len(node_types), edges=edges, node_types=node_types)
def test_metapath_randomwalk(self):
meta_path = 'c2p-p2a-a2p-p2c'
path = ['c', 'p', 'a', 'p', 'c']
start_nodes = [0, 1, 2, 3]
walk_len = 10
walks = metapath_randomwalk(
graph=self.graph,
start_nodes=start_nodes,
metapath=meta_path,
walk_length=walk_len)
self.assertEqual(len(walks), 4)
for walk in walks:
for i in range(len(walk)):
idx = i % (len(path) - 1)
self.assertEqual(self.node_types[walk[i]], path[idx])
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""test_redis_graph"""
import time
import unittest
import json
import os
import numpy as np
from pgl.redis_graph import RedisGraph
class RedisGraphTest(unittest.TestCase):
"""RedisGraphTest
"""
def setUp(self):
config_path = os.path.join(
os.path.abspath(os.path.dirname(__file__)),
'test_redis_graph_conf.json')
with open(config_path) as inf:
config = json.load(inf)
redis_configs = [config["redis"], ]
self.graph = RedisGraph(
"reddit-graph", redis_configs, num_parts=config["num_parts"])
def test_random_seed(self):
"""test_random_seed
"""
np.random.seed(1)
data1 = self.graph.sample_predecessor(range(1000), max_degree=5)
data1 = [nid for nodes in data1 for nid in nodes]
np.random.seed(1)
data2 = self.graph.sample_predecessor(range(1000), max_degree=5)
data2 = [nid for nodes in data2 for nid in nodes]
np.random.seed(3)
data3 = self.graph.sample_predecessor(range(1000), max_degree=5)
data3 = [nid for nodes in data3 for nid in nodes]
self.assertEqual(data1, data2)
self.assertNotEqual(data2, data3)
if __name__ == '__main__':
unittest.main()
{
"redis":
{
"host": "10.86.54.13",
"port": "7003"
},
"num_parts": 64
}
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This package implement graph sampling algorithm.
"""
import unittest
import os
import json
import numpy as np
from pgl.redis_graph import RedisGraph
from pgl.sample import graphsage_sample
from pgl.sample import node2vec_sample
class SampleTest(unittest.TestCase):
"""SampleTest
"""
def setUp(self):
config_path = os.path.join(
os.path.abspath(os.path.dirname(__file__)),
'test_redis_graph_conf.json')
with open(config_path) as inf:
config = json.load(inf)
redis_configs = [config["redis"], ]
self.graph = RedisGraph(
"reddit-graph", redis_configs, num_parts=config["num_parts"])
def test_graphsage_sample(self):
"""test_graphsage_sample
"""
eids = np.random.choice(self.graph.num_edges, 1000)
edges = self.graph.get_edges_by_id(eids)
nodes = [n for edge in edges for n in edge]
ignore_edges = edges.tolist() + edges[:, [1, 0]].tolist()
np.random.seed(1)
subgraphs = graphsage_sample(self.graph, nodes, [10, 10], [])
np.random.seed(1)
subgraphs_ignored = graphsage_sample(self.graph, nodes, [10, 10],
ignore_edges)
self.assertEqual(subgraphs[0].num_nodes,
subgraphs_ignored[0].num_nodes)
self.assertGreaterEqual(subgraphs[0].num_edges,
subgraphs_ignored[0].num_edges)
def test_node2vec_sample(self):
"""test_node2vec_sample
"""
walks = node2vec_sample(self.graph, range(10), 3)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This file is for testing the Set2Set layer.
"""
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import unittest
import paddle.fluid as F
import paddle.fluid.layers as L
from pgl.layers.set2set import Set2Set
def paddle_easy_run(model_func, data):
prog = F.Program()
startup_prog = F.Program()
with F.program_guard(prog, startup_prog):
ret = model_func()
place = F.CUDAPlace(0)
exe = F.Executor(place)
exe.run(startup_prog)
return exe.run(prog, fetch_list=ret, feed=data)
class Set2SetTest(unittest.TestCase):
"""Set2SetTest
"""
    def test_set2set(self):
        """test_set2set
        """
import numpy as np
def model_func():
s2s = Set2Set(5, 1, 3)
h0 = L.data(
name='h0',
shape=[2, 10, 5],
dtype='float32',
append_batch_size=False)
h1 = s2s.forward(h0)
return h1,
data = {"h0": np.random.rand(2, 10, 5).astype("float32")}
h1, = paddle_easy_run(model_func, data)
self.assertEqual(h1.shape[0], 2)
self.assertEqual(h1.shape[1], 10)
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Optimized Multiprocessing Reader for PaddlePaddle
"""
import logging
log = logging.getLogger(__name__)
import multiprocessing
import copy
try:
import ujson as json
except ImportError:
    log.info("ujson is not installed, falling back to the standard json module")
    import json
import numpy as np
import time
import paddle.fluid as fluid
from queue import Queue
import threading
def serialize_data(data):
"""serialize_data"""
if data is None:
return None
    return numpy_serialize_data(data)
def numpy_serialize_data(data):
"""serialize_data"""
ret_data = {}
for key in data:
if isinstance(data[key], np.ndarray):
ret_data[key] = (data[key].tobytes(), list(data[key].shape),
"%s" % data[key].dtype)
else:
ret_data[key] = data[key]
return ret_data
def numpy_deserialize_data(data):
"""deserialize_data"""
if data is None:
return None
for key in data:
if isinstance(data[key], tuple):
value = np.frombuffer(
data[key][0], dtype=data[key][2]).reshape(data[key][1])
data[key] = value
return data
def deserialize_data(data):
"""deserialize_data"""
return numpy_deserialize_data(data)
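# A round-trip sketch for the two helpers above (illustrative only): ndarray
# values are flattened to (bytes, shape, dtype) tuples so samples pickle
# cheaply across processes, and deserialize_data rebuilds them via np.frombuffer.
def _serialize_roundtrip_example():
    """Serialize a toy sample dict and restore it."""
    sample = {"feat": np.arange(6, dtype="float32").reshape(2, 3), "label": 1}
    restored = deserialize_data(serialize_data(sample))
    assert restored["label"] == 1
    assert restored["feat"].shape == (2, 3)
    return restored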
def multiprocess_reader(readers, use_pipe=True, queue_size=1000, pipe_size=10):
"""
multiprocess_reader use python multi process to read data from readers
and then use multiprocess.Queue or multiprocess.Pipe to merge all
data. The process number is equal to the number of input readers, each
process call one reader.
Multiprocess.Queue require the rw access right to /dev/shm, some
platform does not support.
you need to create multiple readers first, these readers should be independent
to each other so that each process can work independently.
An example:
.. code-block:: python
reader0 = reader(["file01", "file02"])
reader1 = reader(["file11", "file12"])
reader1 = reader(["file21", "file22"])
reader = multiprocess_reader([reader0, reader1, reader2],
queue_size=100, use_pipe=False)
"""
assert type(readers) is list and len(readers) > 0
def _read_into_queue(reader, queue):
"""read_into_queue"""
for sample in reader():
if sample is None:
raise ValueError("sample has None")
queue.put(serialize_data(sample))
queue.put(serialize_data(None))
def queue_reader():
"""queue_reader"""
queue = multiprocessing.Queue(queue_size)
for reader in readers:
p = multiprocessing.Process(
target=_read_into_queue, args=(reader, queue))
p.start()
reader_num = len(readers)
finish_num = 0
while finish_num < reader_num:
sample = deserialize_data(queue.get())
if sample is None:
finish_num += 1
else:
yield sample
def _read_into_pipe(reader, conn, max_pipe_size):
"""read_into_pipe"""
for sample in reader():
if sample is None:
raise ValueError("sample has None!")
conn.send(serialize_data(sample))
conn.send(serialize_data(None))
conn.close()
def pipe_reader():
"""pipe_reader"""
conns = []
for reader in readers:
parent_conn, child_conn = multiprocessing.Pipe()
conns.append(parent_conn)
p = multiprocessing.Process(
target=_read_into_pipe, args=(reader, child_conn, pipe_size))
p.start()
reader_num = len(readers)
conn_to_remove = []
finish_flag = np.zeros(len(conns), dtype="int32")
start = time.time()
def queue_worker(sub_conn, que):
while True:
buff = sub_conn.recv()
sample = deserialize_data(buff)
if sample is None:
que.put(None)
sub_conn.close()
break
que.put(sample)
thread_pool = []
output_queue = Queue(maxsize=reader_num)
for i in range(reader_num):
t = threading.Thread(
target=queue_worker, args=(conns[i], output_queue))
t.daemon = True
t.start()
thread_pool.append(t)
finish_num = 0
while finish_num < reader_num:
sample = output_queue.get()
if sample is None:
finish_num += 1
else:
yield sample
for thread in thread_pool:
thread.join()
if use_pipe:
return pipe_reader
else:
return queue_reader
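# A hedged usage sketch for multiprocess_reader: two toy readers producing
# dict-of-ndarray samples are merged through pipes. The sample schema is
# illustrative; the __main__ guard matters because the workers run in
# separate processes.
if __name__ == "__main__":

    def _make_toy_reader(seed):
        """Build a reader that yields a few random feature dicts."""

        def _reader():
            rng = np.random.RandomState(seed)
            for _ in range(3):
                yield {"feat": rng.rand(4).astype("float32")}

        return _reader

    merged = multiprocess_reader(
        [_make_toy_reader(0), _make_toy_reader(1)], use_pipe=True)
    for item in merged():
        print(item["feat"].shape)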
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Optimized Multithreading Reader for PaddlePaddle
"""
import logging
log = logging.getLogger(__name__)
import threading
import queue
import copy
import numpy as np
import time
import paddle.fluid as fluid
def multithreading_reader(readers, queue_size=1000):
"""
multithreading_reader use python multi thread to read data from readers
and then use queue to merge all
data. The process number is equal to the number of input readers, each
process call one reader.
CPU usage rate won't go over 100% with GIL.
you need to create multiple readers first, these readers should be independent
to each other so that each process can work independently.
An example:
.. code-block:: python
reader0 = reader(["file01", "file02"])
reader1 = reader(["file11", "file12"])
reader1 = reader(["file21", "file22"])
reader = multithreading_reader([reader0, reader1, reader2],
queue_size=100)
"""
assert type(readers) is list and len(readers) > 0
def _read_into_queue(reader, queue):
"""read_into_queue"""
for sample in reader():
if sample is None:
raise ValueError("sample has None")
queue.put(sample)
queue.put(None)
def queue_reader():
"""queue_reader"""
output_queue = queue.Queue(queue_size)
thread_pool = []
thread_num = 0
for reader in readers:
p = threading.Thread(
target=_read_into_queue, args=(reader, output_queue))
p.daemon = True
p.start()
thread_pool.append(p)
thread_num += 1
while True:
ret = output_queue.get()
if ret is not None:
yield ret
else:
thread_num -= 1
if thread_num == 0:
break
for thread in thread_pool:
thread.join()
return queue_reader
......@@ -225,3 +225,23 @@ def scatter_add(input, index, updates):
output = fluid.layers.scatter(input, index, updates, overwrite=False)
return output
def scatter_max(input, index, updates):
"""Scatter max updates to input by given index.
Adds sparse updates to input variables.
Args:
input: Input tensor to be updated
index: Slice index
updates: Must have same type as input.
Return:
Same type and shape as input.
"""
output = fluid.layers.scatter(input, index, updates, mode='max')
return output
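# A hedged dygraph sketch for scatter_max (illustrative only): it mirrors the
# scatter_add test elsewhere in this repo and assumes the patched
# fluid.layers.scatter that accepts mode='max'; stock PaddlePaddle releases
# may not expose that flag.
def _scatter_max_example():
    """Reduce two sparse updates into row 1 of a 2x2 tensor with max."""
    import numpy as np
    with fluid.dygraph.guard(fluid.CPUPlace()):
        base = fluid.dygraph.to_variable(
            np.array([[1, 2], [5, 6]], dtype='float32'))
        idx = fluid.dygraph.to_variable(np.array([1, 1], dtype=np.int32))
        upd = fluid.dygraph.to_variable(
            np.array([[3, 4], [7, 4]], dtype='float32'))
        return scatter_max(base, idx, upd).numpy()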
......@@ -16,10 +16,35 @@ import os
import sys
import re
import codecs
import numpy as np
from setuptools import setup, find_packages
from setuptools.extension import Extension
from Cython.Build import cythonize
from setuptools import Extension
from setuptools import dist
from setuptools.command.build_ext import build_ext as _build_ext
try:
from Cython.Build import cythonize
except ImportError:
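    # Cython is not importable when setup.py is first loaded; defer the real
    # import until build_ext actually runs, by which time setup_requires has
    # pulled Cython in (same idea as the numpy handling in CustomBuildExt).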
def cythonize(*args, **kwargs):
"""cythonize"""
from Cython.Build import cythonize
return cythonize(*args, **kwargs)
class CustomBuildExt(_build_ext):
"""CustomBuildExt"""
def finalize_options(self):
_build_ext.finalize_options(self)
# Prevent numpy from thinking it is still in its setup process:
__builtins__.__NUMPY_SETUP__ = False
import numpy
self.include_dirs.append(numpy.get_include())
workdir = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(workdir, './requirements.txt')) as f:
requirements = f.read().splitlines()
cur_dir = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(cur_dir, 'README.md'), 'rb') as f:
......@@ -58,7 +83,6 @@ extensions = [
"pgl.graph_kernel",
["pgl/graph_kernel.pyx"],
language="c++",
include_dirs=[np.get_include()],
extra_compile_args=compile_extra_args,
extra_link_args=link_extra_args, ),
]
......@@ -66,7 +90,6 @@ extensions = [
def get_package_data(path):
files = []
print(path)
for root, dirnames, filenames in os.walk(path):
for filename in filenames:
files.append(os.path.join(root, filename))
......@@ -83,9 +106,16 @@ setup(
long_description_content_type='text/markdown',
url="https://github.com/PaddlePaddle/PGL",
package_data=package_data,
setup_requires=[
'setuptools>=18.0',
'numpy>=1.16.4',
],
install_requires=requirements,
cmdclass={'build_ext': CustomBuildExt},
packages=find_packages(),
include_package_data=True,
ext_modules=cythonize(extensions),
#ext_modules=cythonize(extensions),
ext_modules=extensions,
classifiers=[
'Intended Audience :: Developers',
'License :: OSI Approved :: Apache Software License',
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""scatter test cases"""
import unittest
import numpy as np
import paddle.fluid as fluid
class ScatterAddTest(unittest.TestCase):
"""ScatterAddTest"""
def test_scatter_add(self):
"""test_scatter_add"""
with fluid.dygraph.guard(fluid.CPUPlace()):
input = fluid.dygraph.to_variable(
np.array(
[[1, 2], [5, 6]], dtype='float32'), )
index = fluid.dygraph.to_variable(np.array([1, 1], dtype=np.int32))
updates = fluid.dygraph.to_variable(
np.array(
[[3, 4], [3, 4]], dtype='float32'), )
output = fluid.layers.scatter(input, index, updates, mode='add')
assert output.numpy().tolist() == [[1, 2], [11, 14]]
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""unique with counts test"""
import unittest
import numpy as np
import paddle.fluid as fluid
class UniqueWithCountTest(unittest.TestCase):
"""UniqueWithCountTest"""
def _test_unique_with_counts_helper(self, input, output):
place = fluid.CPUPlace()
exe = fluid.Executor(place)
main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program, startup_program):
x = fluid.layers.data(
name='input',
dtype='int64',
shape=[-1],
append_batch_size=False)
#x = fluid.assign(np.array([2, 3, 3, 1, 5, 3], dtype='int32'))
out, index, count = fluid.layers.unique_with_counts(x)
out, index, count = exe.run(
main_program,
feed={'input': np.array(
input, dtype='int64'), },
fetch_list=[out, index, count],
return_numpy=True, )
out, index, count = out.tolist(), index.tolist(), count.tolist()
assert [out, index, count] == output
def test_unique_with_counts(self):
"""test_unique_with_counts"""
self._test_unique_with_counts_helper(
input=[1, 1, 2, 4, 4, 4, 7, 8, 8],
output=[
[1, 2, 4, 7, 8],
[0, 0, 1, 2, 2, 2, 3, 4, 4],
[2, 1, 3, 1, 2],
], )
self._test_unique_with_counts_helper(
input=[1],
output=[
[1],
[0],
[1],
], )
self._test_unique_with_counts_helper(
input=[1, 1],
output=[
[1],
[0, 0],
[2],
], )
if __name__ == '__main__':
unittest.main()
......@@ -145,7 +145,7 @@
"source": [
"import paddle.fluid as fluid\n",
"use_cuda = False \n",
"place = fluid.GPUPlace(0) if use_cuda else fluid.CPUPlace()\n",
"place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()\n",
"\n",
"gw = pgl.graph_wrapper.GraphWrapper(name='graph',\n",
" place = place,\n",
......