Commit dad7ec3a authored by yelrose

init commit

Parent cd3076a6
......@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Copyright 2019 PaddlePaddle Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
......
# PGL
Paddle Graph Learning
# PGL ReadMe
# PGL README.md
# Paddle Graph Learning (PGL)
[API](https://xx) | [Tutorials](https://xx)
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
<div />
<div align=center><img src="framework_of_pgl.png" width="700"></div>
<center>The Framework of Paddle Graph Learning (PGL)</center>
<div />
We provide Python interfaces for storing, reading, and querying graph-structured data, together with two fundamental computational interfaces, a walk-based paradigm and a message-passing-based paradigm (as shown in the framework above), for building cutting-edge graph learning algorithms. Combined with the PaddlePaddle deep learning framework, we are able to support both graph representation learning models and graph neural networks, giving our framework a wide range of graph-based applications.
## Highlight: Efficient and Flexible Message Passing Paradigm
One of the most important benefits of graph neural networks compared to other models is the ability to use node-to-node connectivity information, but coding the communication between nodes is very cumbersome. In PGL we adopt a **message passing paradigm**, similar to [DGL](https://github.com/dmlc/dgl), to help users build customized graph neural networks easily. Users only need to write ```send``` and ```recv``` functions to implement a simple GCN. As shown in the following figure, in the first step the send function is defined on the edges of the graph, and the user can customize the send function $\phi^e$ to send messages from source to target nodes. In the second step, the recv function $\phi^v$ is responsible for aggregating ($\oplus$) the messages from different sources; a minimal sketch follows the figure below.
<div />
<div align=center><img src="MPP.png" width="700"></div>
<center>The basic idea of message passing paradigm</center>
<div />
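Putting the two steps together, the following is a minimal sketch of a GCN-style layer written only with ```send``` and ```recv``` (here `simple_gcn_layer` is an illustrative name and `gw` is assumed to be a `pgl.graph_wrapper.GraphWrapper` holding the graph; the full examples are given later in this repository):

```python
import paddle.fluid as fluid

def simple_gcn_layer(gw, node_feature, hidden_size):
    # step 1 (send): copy the source node feature onto each edge
    def send_func(src_feat, dst_feat, edge_feat):
        return src_feat["h"]

    # step 2 (recv): sum the messages arriving at each target node
    def recv_func(msg):
        return fluid.layers.sequence_pool(msg, "sum")

    msg = gw.send(send_func, nfeat_list=[("h", node_feature)])
    output = gw.recv(msg, recv_func)
    return fluid.layers.fc(output, size=hidden_size, act="relu")
```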
As shown on the left of the following figure, to adapt to general user-defined message aggregation functions, DGL uses a degree-bucketing method that combines nodes with the same degree into a batch and then applies an aggregation function $\oplus$ to each batch serially. For PGL's user-defined aggregation functions, we organize the messages as a [LodTensor](http://www.paddlepaddle.org/documentation/docs/en/1.4/user_guides/howto/basic_concept/lod_tensor_en.html) in [PaddlePaddle](https://github.com/PaddlePaddle/Paddle), treating the messages as variable-length sequences, and we **utilize the features of LodTensor in Paddle to obtain fast parallel aggregation**.
<div />
<div align=center><img src="parallel_degree_bucketing.png" width="750"></div>
<center>The parallel degree bucketing of PGL</center>
<div />
Users only need to call the ```sequence_ops``` functions provided by Paddle to implement efficient message aggregation. For example, use ```sequence_pool``` to sum the neighbor messages:
```python
import paddle.fluid as fluid

def recv(msg):
    # sum the variable-length neighbor messages in parallel
    return fluid.layers.sequence_pool(msg, "sum")
```
Although DGL applies kernel-fusion optimizations with scatter-gather for general aggregation functions such as sum and max, for **complex user-defined functions** the degree-bucketing algorithm executes each degree bucket serially and cannot take full advantage of the GPU. In contrast, operations on PGL's LodTensor-based messages are performed in parallel, which fully utilizes GPU parallelism. Even without scatter-gather optimization, PGL still delivers excellent performance. Of course, we also provide built-in scatter-optimized message aggregation functions.
## Performance
We test all the GNN algorithms on a Tesla V100-SXM2-16G, running each for 200 epochs to obtain average speeds, and we report the accuracy on the test dataset without early stopping.
| Dataset | Model | PGL Accuracy | PGL speed (epoch time) | DGL speed (epoch time) |
| -------- | ----- | ----------------- | ------------ | ------------------------------------ |
| Cora | GCN |81.75% | 0.0047s | **0.0045s** |
| Cora | GAT | 83.5% | **0.0119s** | 0.0141s |
| Pubmed | GCN |79.2% |**0.0049s** |0.0051s |
| Pubmed | GAT | 77% |0.0193s|**0.0144s**|
| Citeseer | GCN |70.2%| **0.0045s** |0.0046s|
| Citeseer | GAT |68.8%| **0.0124s** |0.0139s|
## System requirements
PGL requires:
* paddle >= 1.5
* networkx
PGL supports both Python 2 & 3
## Installation
```sh
pip install pgl
```
## The Team
PGL is developed and maintained by the NLP and Paddle teams at Baidu.
## License
PGL uses Apache License 2.0.
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
This diff is collapsed.
sphinx==2.1.0
sphinx_rtd_theme
pgl
===
.. toctree::
:maxdepth: 1
pgl
pgl.data\_loader module: Some benchmark datasets.
=================================================
.. automodule:: pgl.data_loader
:members:
:undoc-members:
:show-inheritance:
pgl.graph module: Graph Storage
===============================
.. automodule:: pgl.graph
:members:
:undoc-members:
:show-inheritance:
pgl.graph\_wrapper module: Graph data holders for Paddle GNN.
=================================================================
.. automodule:: pgl.graph_wrapper
:members:
:undoc-members:
:show-inheritance:
pgl.layers: Predefined graph neural network layers.
=======================================================
.. automodule:: pgl.layers
:members:
:undoc-members:
:show-inheritance:
API Reference
=============
.. toctree::
pgl.graph
pgl.graph_wrapper
pgl.layers
pgl.data_loader
pgl.utils.paddle_helper
pgl.utils.paddle\_helper module: Some helper functions for Paddle.
======================================================================
.. automodule:: pgl.utils.paddle_helper
:members:
:undoc-members:
:show-inheritance:
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
"""
conf.py
"""
import os
import sys
sys.path.append(os.path.abspath('../../pgl/'))
sys.path.append(os.path.abspath('..'))
import sphinx_rtd_theme
# -- Project information -----------------------------------------------------
project = 'pgl'
copyright = '2019, PaddlePaddle'
author = 'PaddlePaddle'
# The full version, including alpha/beta/rc tags
release = '0.1.0.beta'
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.todo', 'sphinx.ext.viewcode', 'sphinx.ext.mathjax',
'sphinx.ext.autodoc', 'sphinx.ext.napoleon', "markdown2rst"
]
# Support Inline mathjax
m2r_disable_inline_math = False
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
source_suffix = ['.rst', '.md']
exclude_patterns = ['pgl.graph_kernel', 'pgl.layers.conv']
language = "zh_cn"
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
html_show_sourcelink = False
#html_logo = 'pgl_logo.png'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
'''
html_theme_options = {
'canonical_url': '',
'analytics_id': 'UA-XXXXXXX-1', # Provided by Google in your dashboard
'logo_only': False,
'display_version': True,
'prev_next_buttons_location': 'bottom',
'style_external_links': False,
'vcs_pageview_mode': '',
'style_nav_header_background': 'white',
# Toc options
'collapse_navigation': True,
'sticky_navigation': True,
'navigation_depth': 4,
'includehidden': True,
'titles_only': False
}
'''
.. mdinclude:: md/gat_examples.md
View the Code
=============
examples/gat/train.py
.. literalinclude:: ../../../examples/gat/train.py
:language: python
:linenos:
.. mdinclude:: md/gcn_examples.md
View the Code
=============
examples/gcn/train.py
.. literalinclude:: ../../../examples/gcn/train.py
:language: python
:linenos:
.. mdinclude:: md/graphsage_examples.md
View the Code
=============
examples/graphsage/train.py
.. literalinclude:: ../../../examples/graphsage/train.py
:language: python
:linenos:
examples/graphsage/reader.py
.. literalinclude:: ../../../examples/graphsage/reader.py
:language: python
:linenos:
examples/graphsage/model.py
.. literalinclude:: ../../../examples/graphsage/model.py
:language: python
:linenos:
# Building Graph Attention Networks
[Graph Attention Networks \(GAT\)](https://arxiv.org/abs/1710.10903) is a novel architecture that operates on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. Based on PGL, we reproduce the GAT algorithm and match the results reported in the paper on citation network benchmarks.
### Simple example to build single head GAT
To build a GAT layer, one can use our predefined ```pgl.layers.gat``` or simply write a GAT layer with the message passing interface.
```python
import paddle.fluid as fluid

def gat_layer(graph_wrapper, node_feature, hidden_size):
    def send_func(src_feat, dst_feat, edge_feat):
        # attention logits computed from the source and target projections
        logits = src_feat["a1"] + dst_feat["a2"]
        logits = fluid.layers.leaky_relu(logits, alpha=0.2)
        return {"logits": logits, "h": src_feat["h"]}

    def recv_func(msg):
        # softmax-normalize the logits over each node's incoming edges,
        # then sum the weighted neighbor features
        norm = fluid.layers.sequence_softmax(msg["logits"])
        output = msg["h"] * norm
        output = fluid.layers.sequence_pool(output, "sum")
        return output

    h = fluid.layers.fc(node_feature, hidden_size, bias_attr=False, name="hidden")
    a1 = fluid.layers.fc(node_feature, 1, name="a1_weight")
    a2 = fluid.layers.fc(node_feature, 1, name="a2_weight")
    message = graph_wrapper.send(send_func,
                                 nfeat_list=[("h", h), ("a1", a1), ("a2", a2)])
    output = graph_wrapper.recv(message, recv_func)
    return output
```
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)|
| --- | --- | --- |---|
| Cora | ~83% | 0.0188s | 0.0175s |
| Pubmed | ~78% | 0.0449s | 0.0295s |
| Citeseer | ~70% | 0.0275s | 0.0253s |
### How to run
For example, to train GAT on the Cora dataset with a GPU:
```
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset to use: "cora", "citeseer", or "pubmed".
- use_cuda: Use GPU if specified.
### View the Code
See the code [here](gat_examples_code.html)
# Building Graph Convolutional Network
[Graph Convolutional Network \(GCN\)](https://arxiv.org/abs/1609.02907) is a powerful neural network designed for machine learning on graphs. Based on PGL, we reproduce the GCN algorithm and match the results reported in the paper on citation network benchmarks.
### Simple example to build GCN
To build a GCN layer, one can use our predefined ```pgl.layers.gcn``` or simply write a GCN layer with the message passing interface.
```python
import paddle.fluid as fluid

def gcn_layer(graph_wrapper, node_feature, hidden_size, act):
    def send_func(src_feat, dst_feat, edge_feat):
        # copy the source node feature as the message
        return src_feat["h"]

    def recv_func(msg):
        # sum the messages from all neighbors
        return fluid.layers.sequence_pool(msg, "sum")

    message = graph_wrapper.send(send_func, nfeat_list=[("h", node_feature)])
    output = graph_wrapper.recv(message, recv_func)
    output = fluid.layers.fc(output, size=hidden_size, act=act)
    return output
```
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)|
| --- | --- | --- |---|
| Cora | ~81% | 0.0106s | 0.0104s |
| Pubmed | ~79% | 0.0210s | 0.0154s |
| Citeseer | ~71% | 0.0175s | 0.0177s |
### How to run
For example, to train GCN on the Cora dataset with a GPU:
```
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset to use: "cora", "citeseer", or "pubmed".
- use_cuda: Use GPU if specified.
### View the Code
See the code [here](gcn_examples_code.html)
# GraphSage for Large Scale Networks
[GraphSAGE](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) is a general inductive framework that leverages node feature
information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, GraphSAGE learns a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Based on PGL, we reproduce the GraphSAGE algorithm and match the results reported in the paper on the Reddit dataset. This example also demonstrates subgraph sampling and training in PGL.
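The core of GraphSAGE is the neighbor aggregator. As a reference sketch, the mean aggregator below is written with PGL's send/recv interface and mirrors the `graphsage_mean` function in `model.py` shipped with this example:

```python
import paddle.fluid as fluid

def copy_send(src_feat, dst_feat, edge_feat):
    # send the source node feature along each edge
    return src_feat["h"]

def mean_recv(feat):
    # average the messages arriving at each node
    return fluid.layers.sequence_pool(feat, pool_type="average")

def graphsage_mean(gw, feature, hidden_size, act, name):
    msg = gw.send(copy_send, nfeat_list=[("h", feature)])
    neigh_feature = gw.recv(msg, mean_recv)
    # transform the node's own feature and the aggregated neighbor feature,
    # then concatenate and L2-normalize, as in the GraphSAGE paper
    self_feature = fluid.layers.fc(feature, hidden_size, act=act, name=name + '_l')
    neigh_feature = fluid.layers.fc(neigh_feature, hidden_size, act=act, name=name + '_r')
    output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
    return fluid.layers.l2_normalize(output, axis=1)
```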
### Datasets
The reddit dataset should be downloaded from the following links and placed in directory ```./data```. The details for Reddit Dataset can be found [here](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf).
- reddit.npz https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J
- reddit_adj.npz: https://drive.google.com/open?id=174vb0Ws7Vxk_QTUtxqTgDHSQ4El4qDHt
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### How to run
To train a GraphSAGE model on Reddit Dataset, you can just run
```
python train.py --use_cuda --epoch 10 --graphsage_type graphsage_mean --normalize --symmetry
```
#### Hyperparameters
- epoch: Number of epochs (default: 10).
- use_cuda: Use GPU if specified.
- graphsage_type: We support 4 aggregator types: "graphsage_mean", "graphsage_maxpool", "graphsage_meanpool" and "graphsage_lstm".
- normalize: Normalize the input features if specified.
- sample_workers: The number of workers for multiprocess subgraph sampling.
- lr: Learning rate.
- symmetry: Make the edges symmetric if specified.
- batch_size: Batch size.
- samples_1: The max neighbors for the first hop neighbor sampling. (default: 25)
- samples_2: The max neighbors for the second hop neighbor sampling. (default: 10)
- hidden_size: The hidden size of the GraphSAGE models.
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Aggregator | Accuracy | Reported in paper |
| --- | --- | --- |
| Mean | 95.70% | 95.0% |
| Meanpool | 95.60% | 94.8% |
| Maxpool | 94.95% | 94.8% |
| LSTM | 95.13% | 95.4% |
### View the Code
See the code [here](graphsage_examples_code.html).
# Graph Representation Learning: Node2vec
[Node2vec](https://cs.stanford.edu/~jure/pubs/node2vec-kdd16.pdf) is an algorithmic framework for representational learning on graphs. Given any graph, it can learn continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks. Based on PGL, we reproduce the node2vec algorithm and match the results reported in the paper.
## Datasets
The datasets contain two networks: [BlogCatalog](http://socialcomputing.asu.edu/datasets/BlogCatalog3) and [Arxiv](http://snap.stanford.edu/data/ca-AstroPh.html).
## Dependencies
- paddlepaddle>=1.4
- pgl
## How to run
For example, use a GPU to train node2vec on the BlogCatalog and ArXiv datasets:
```sh
# multiclass task example
python node2vec.py --use_cuda --dataset BlogCatalog --save_path ./tmp/node2vec_BlogCatalog/ --offline_learning --epoch 400
python multi_class.py --use_cuda --ckpt_path ./tmp/node2vec_BlogCatalog/paddle_model --epoch 1000
# link prediction task example
python node2vec.py --use_cuda --dataset ArXiv --save_path ./tmp/node2vec_ArXiv --offline_learning --epoch 10
python link_predict.py --use_cuda --ckpt_path ./tmp/node2vec_ArXiv/paddle_model --epoch 400
```
## Hyperparameters
- dataset: The dataset to use: "BlogCatalog" or "ArXiv".
- use_cuda: Use GPU if specified.
### Experiment results
| Dataset | Model | Task | Metric | PGL Result | Reported Result |
| --- | --- | --- | --- | --- | --- |
| BlogCatalog | deepwalk | multi-label classification | MacroF1 | 0.250 | 0.211 |
| BlogCatalog | node2vec | multi-label classification | MacroF1 | 0.262 | 0.258 |
| ArXiv | deepwalk | link prediction | AUC | 0.9538 | 0.9340 |
| ArXiv | node2vec | link prediction | AUC | 0.9541 | 0.9366 |
## View the Code
See the code [here](node2vec_examples_code.html)
# StaticGraphWrapper for GAT Speed Optimization
[Graph Attention Networks \(GAT\)](https://arxiv.org/abs/1710.10903) is a novel architecture that operates on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. Based on PGL, we reproduce the GAT algorithm and match the results reported in the paper on citation network benchmarks.
However, different from the reproduction in **examples/gat**, we use `pgl.graph_wrapper.StaticGraphWrapper` to preload the graph data into GPU or CPU memory, which yields better speed; a rough usage sketch follows.
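In the sketch below, the constructor arguments and the explicit `initialize(place)` call are assumptions about how a preloading wrapper is used, and `dataset` is an illustrative name; `train.py` in this example and the API reference are the authoritative usage.

```python
import pgl
import paddle.fluid as fluid

place = fluid.CUDAPlace(0)
# assumption: the concrete graph is passed in up front so its data can be
# preloaded into GPU/CPU memory instead of being fed at every step
gw = pgl.graph_wrapper.StaticGraphWrapper(name="graph",
                                          graph=dataset.graph,
                                          place=place)
gw.initialize(place)  # assumption: explicit one-off initialization
```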
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)| examples/gat | Improvement |
| --- | --- | --- |---| --- | --- |
| Cora | ~83% | 0.0145s | 0.0119s | 0.0175s | 1.47x |
| Pubmed | ~78% | 0.0352s | 0.0193s |0.0295s | 1.53x |
| Citeseer | ~70% | 0.0148s | 0.0124s |0.0253s | 2.04x |
### How to run
For example, to train GAT on the Cora dataset with a GPU:
```sh
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset to use: "cora", "citeseer", or "pubmed".
- use_cuda: Use GPU if specified.
### View the Code
See the code [here](static_gat_examples_code.html)
# StaticGraphWrapper for GCN Speed Optimization
[Graph Convolutional Network \(GCN\)](https://arxiv.org/abs/1609.02907) is a powerful neural network designed for machine learning on graphs. Based on PGL, we reproduce the GCN algorithm and match the results reported in the paper on citation network benchmarks.
However, different from the reproduction in **examples/gcn**, we use `pgl.graph_wrapper.StaticGraphWrapper` to preload the graph data into GPU or CPU memory, which yields better speed.
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)| examples/gcn | Improvement |
| --- | --- | --- |---| --- | --- |
| Cora | ~81% | 0.0053s | 0.0047s | 0.0104s | 2.21x |
| Pubmed | ~79% | 0.0105s | 0.0049s |0.0154s | 3.14x |
| Citeseer | ~71% | 0.0051s | 0.0045s |0.0177s | 3.93x |
### How to run
For example, to train GCN on the Cora dataset with a GPU:
```sh
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset to use: "cora", "citeseer", or "pubmed".
- use_cuda: Use GPU if specified.
### View the Code
See the code [here](static_gcn_examples_code.html)
.. mdinclude:: md/node2vec_examples.md
View the Code
=============
examples/node2vec/node2vec.py
.. literalinclude:: ../../../examples/node2vec/node2vec.py
:language: python
:linenos:
examples/node2vec/multi_class.py
.. literalinclude:: ../../../examples/node2vec/multi_class.py
:language: python
:linenos:
examples/node2vec/link_predict.py
.. literalinclude:: ../../../examples/node2vec/link_predict.py
:language: python
:linenos:
.. mdinclude:: md/static_gat_examples.md
View the Code
=============
examples/static_gat/train.py
.. literalinclude:: ../../../examples/static_gat/train.py
:language: python
:linenos:
.. mdinclude:: md/static_gcn_examples.md
View the Code
=============
examples/static_gcn/train.py
.. literalinclude:: ../../../examples/static_gcn/train.py
:language: python
:linenos:
Using StaticGraphWrapper for Speed Optimization
===============================================
.. toctree::
:maxdepth: 1
static_gcn_examples.rst
static_gat_examples.rst
:github_url: https://github.com/PaddlePaddle/PGL
.. toctree::
:maxdepth: 1
:caption: Introduction
:hidden:
introduction.rst
.. mdinclude:: md/introduction.md
Quick Start
===========
.. toctree::
:maxdepth: 1
:caption: Quick Start
:hidden:
instruction.rst
See instruction_ for quick start.
.. _instruction: instruction.html
.. toctree::
:maxdepth: 1
:caption: Examples
examples/gcn_examples.rst
examples/gat_examples.rst
examples/static_graph_wrapper.rst
examples/node2vec_examples.rst
examples/graphsage_examples.rst
.. toctree::
:maxdepth: 2
:caption: API Reference
api/pgl
The Team
========
.. toctree::
:maxdepth: 1
:caption: The Team
:hidden:
team.rst
PGL is developed and maintained by the NLP and Paddle teams at Baidu.
License
=======
PGL uses Apache License 2.0.
Quick Start Instructions
========================
Install PGL
-----------
To install Paddle Graph Learning, we need the following packages.
.. code-block:: sh
paddlepaddle >= 1.4 (Faster performance on 1.5)
networkx
cython
We can simply install pgl by pip.
.. code-block:: sh
pip install pgl
.. mdinclude:: md/quick_start.md
.. mdinclude:: md/introduction.md
# Paddle Graph Learning (PGL)
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
<div />
<div align=center><img src="_static/framework_of_pgl.png" width="700"></div>
<center>The Framework of Paddle Graph Learning (PGL)</center>
<div />
We provide Python interfaces for storing, reading, and querying graph-structured data, together with two fundamental computational interfaces, a walk-based paradigm and a message-passing-based paradigm (as shown in the framework above), for building cutting-edge graph learning algorithms. Combined with the PaddlePaddle deep learning framework, we are able to support both graph representation learning models and graph neural networks, giving our framework a wide range of graph-based applications.
## Highlight: Efficient and Flexible <br/> Message Passing Paradigm
<div />
<div align=center><img src="_static/message_passing_paradigm.png" width="700"></div>
<center>The basic idea of message passing paradigm</center>
<div />
As shown on the left of the following figure, to adapt to general user-defined message aggregation functions, DGL uses a degree-bucketing method that combines nodes with the same degree into a batch and then applies an aggregation function $\oplus$ to each batch serially. For PGL's user-defined aggregation functions, we organize the messages as a [LodTensor](http://www.paddlepaddle.org/documentation/docs/en/1.4/user_guides/howto/basic_concept/lod_tensor_en.html) in [PaddlePaddle](https://github.com/PaddlePaddle/Paddle), treating the messages as variable-length sequences, and we **utilize the features of LodTensor in Paddle to obtain fast parallel aggregation**.
<div/>
<div align=center><img src="_static/parallel_degree_bucketing.png" width="750"></div>
<center>The parallel degree bucketing of PGL</center>
<div/>
Users only need to call the ```sequence_ops``` functions provided by Paddle to implement efficient message aggregation. For example, use ```sequence_pool``` to sum the neighbor messages:
```python
import paddle.fluid as fluid

def recv(msg):
    # sum the variable-length neighbor messages in parallel
    return fluid.layers.sequence_pool(msg, "sum")
```
Although DGL applies kernel-fusion optimizations with scatter-gather for general aggregation functions such as sum and max, for **complex user-defined functions** the degree-bucketing algorithm executes each degree bucket serially and cannot take full advantage of the GPU. In contrast, operations on PGL's LodTensor-based messages are performed in parallel, which fully utilizes GPU parallelism. Even without scatter-gather optimization, PGL still delivers excellent performance. Of course, we also provide built-in scatter-optimized message aggregation functions.
## Performance
We test all the GNN algorithms on a Tesla V100-SXM2-16G, running each for 200 epochs to obtain average speeds, and we report the accuracy on the test dataset without early stopping.
| Dataset | Model | PGL Accuracy | PGL speed (epoch time) | DGL speed (epoch time) |
| -------- | ----- | ----------------- | ------------ | ------------------------------------ |
| Cora | GCN |81.75% | 0.0047s | **0.0045s** |
| Cora | GAT | 83.5% | **0.0119s** | 0.0141s |
| Pubmed | GCN |79.2% |**0.0049s** |0.0051s |
| Pubmed | GAT | 77% |0.0193s|**0.0144s**|
| Citeseer | GCN |70.2%| **0.0045s** |0.0046s|
| Citeseer | GAT |68.8%| **0.0124s** |0.0139s|
## Step 1: using PGL to create a graph
Suppose we have a graph with 10 nodes and 14 edges as shown in the following figure:
![A simple graph](_static/quick_start_graph.png)
Our purpose is to train a graph neural network to classify the yellow and green nodes, so we create this graph as follows:
```python
import pgl
from pgl import graph # import pgl module
import numpy as np
def build_graph():
    # define the number of nodes; each node is represented by an integer id
num_node = 10
# add edges, we represent all edges as a list of tuple (src, dst)
edge_list = [(2, 0), (2, 1), (3, 1),(4, 0), (5, 0),
(6, 0), (6, 4), (6, 5), (7, 0), (7, 1),
(7, 2), (7, 3), (8, 0), (9, 7)]
    # Each node can carry a d-dimensional feature vector; for simplicity, the feature vectors here are randomly generated.
d = 16
feature = np.random.randn(num_node, d).astype("float32")
# each edge also can be represented by a feature vector
edge_feature = np.random.randn(len(edge_list), d).astype("float32")
# create a graph
g = graph.Graph(num_nodes = num_node,
edges = edge_list,
node_feat = {'feature':feature},
edge_feat ={'edge_feature': edge_feature})
return g
# create a graph object for saving graph data
g = build_graph()
```
After creating a graph in PGL, we can print out some information in the graph.
```python
print('There are %d nodes in the graph.'%g.num_nodes)
print('There are %d edges in the graph.'%g.num_edges)
# Out:
# There are 10 nodes in the graph.
# There are 14 edges in the graph.
```
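The graph object also answers simple structural queries. For example, `indegree()` (used later by the GCN example script in this commit to normalize features) returns the in-degree of every node:

```python
# in-degree of each of the 10 nodes, returned as a numpy array
print(g.indegree())
```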
Currently PGL is built on the static computational mode of Paddle (dynamic computation will be supported later), so we need to build the model upon a virtual data holder. GraphWrapper provides a virtual graph structure on which users can build deep learning models; real graph data is then fed in to run the models.
```python
import paddle.fluid as fluid
use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
# use GraphWrapper as a container for graph data to construct a graph neural network
gw = pgl.graph_wrapper.GraphWrapper(name='graph',
place = place,
node_feat=g.node_feat_info())
```
## Step 2: create a simple Graph Convolutional Network (GCN)
In this tutorial, we use a simple Graph Convolutional Network (GCN) developed by [Kipf and Welling](https://arxiv.org/abs/1609.02907) to perform node classification. Here we use the simplest GCN structure; if you want to know more about GCN, refer to the original paper.
* In layer $l$, each node $u_i^l$ has a feature vector $h_i^l$;
* In every layer, the idea of GCN is that the feature vector $h_i^{l+1}$ of each node $u_i^{l+1}$ in the next layer is obtained by weighting the feature vectors of all its neighboring nodes and then applying a non-linear transformation, as sketched in the formula below.
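In the simplest form used in this tutorial (a sum over neighbors followed by a linear transformation and a non-linearity), this can be written roughly as $h_i^{l+1} = \sigma\big(W^l \sum_{j \in \mathcal{N}(i)} h_j^l\big)$, where $\mathcal{N}(i)$ denotes the neighbors of node $i$ and $\sigma$ is the activation function.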
In PGL, we can easily implement a GCN layer as follows:
```python
# define GCN layer function
def gcn_layer(gw, feature, hidden_size, name, activation):
    # gw is a GraphWrapper; feature holds the node feature vectors
# define message function
def send_func(src_feat, dst_feat, edge_feat):
# In this tutorial, we return the feature vector of the source node as message
return src_feat['h']
# define reduce function
def recv_func(feat):
# we sum the feature vector of the source node
return fluid.layers.sequence_pool(feat, pool_type='sum')
    # trigger message passing
msg = gw.send(send_func, nfeat_list=[('h', feature)])
    # the recv function receives the messages and triggers the reduce function to handle them
output = gw.recv(msg, recv_func)
output = fluid.layers.fc(output,
size=hidden_size,
bias_attr=False,
act=activation,
name=name)
return output
```
After defining the GCN layer, we can construct a deeper GCN model with two GCN layers.
```python
output = gcn_layer(gw, gw.node_feat['feature'],
hidden_size=8, name='gcn_layer_1', activation='relu')
output = gcn_layer(gw, output, hidden_size=2,
name='gcn_layer_2', activation=None)
```
## Step 3: data preprocessing
Since we are implementing a binary node classifier, we use 0 and 1 to represent the two classes.
```python
y = [0,1,1,1,0,0,0,1,0,1]
label = np.array(y, dtype="float32")
label = np.expand_dims(label, -1)
```
## Step 4: training program
The training process of GCN is the same as that of other paddle-based models.
- First we create a loss function.
- Then we create an optimizer.
- Finally, we create an executor and train the model.
```python
# create a label layer as a container
node_label = fluid.layers.data("node_label", shape=[None, 1],
dtype="float32", append_batch_size=False)
# using cross-entropy with sigmoid layer as the loss function
loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=output, label=node_label)
# calculate the mean loss
loss = fluid.layers.mean(loss)
# choose the Adam optimizer and set the learning rate to be 0.01
adam = fluid.optimizer.Adam(learning_rate=0.01)
adam.minimize(loss)
# create the executor
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
feed_dict = gw.to_feed(g) # gets graph data
for epoch in range(30):
feed_dict['node_label'] = label
train_loss = exe.run(fluid.default_main_program(),
feed=feed_dict,
fetch_list=[loss],
return_numpy=True)
print('Epoch %d | Loss: %f'%(epoch, train_loss[0]))
```
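As a rough follow-up sketch (not required for the tutorial), the learned logits can be fetched from the same program to inspect predictions. Note that running the main program here also executes one extra optimization step; for a clean inference pass you would clone the program with `for_test=True` before adding the optimizer, as the GCN and GAT example scripts do.

```python
# fetch the 2-class logits for all 10 nodes and take the argmax as the prediction
logits = exe.run(fluid.default_main_program(),
                 feed=feed_dict,
                 fetch_list=[output],
                 return_numpy=True)[0]
print(logits.argmax(axis=1))
```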
The Team
========
PGL is developed and maintained by the NLP and Paddle teams at Baidu.
# PGL Examples for GAT
[Graph Attention Networks \(GAT\)](https://arxiv.org/abs/1710.10903) is a novel architecture that operates on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. Based on PGL, we reproduce the GAT algorithm and match the results reported in the paper on citation network benchmarks.
### Simple example to build single head GAT
To build a GAT layer, one can use our predefined ```pgl.layers.gat``` or simply write a GAT layer with the message passing interface.
```python
import paddle.fluid as fluid

def gat_layer(graph_wrapper, node_feature, hidden_size):
    def send_func(src_feat, dst_feat, edge_feat):
        # attention logits computed from the source and target projections
        logits = src_feat["a1"] + dst_feat["a2"]
        logits = fluid.layers.leaky_relu(logits, alpha=0.2)
        return {"logits": logits, "h": src_feat["h"]}

    def recv_func(msg):
        # softmax-normalize the logits over each node's incoming edges,
        # then sum the weighted neighbor features
        norm = fluid.layers.sequence_softmax(msg["logits"])
        output = msg["h"] * norm
        output = fluid.layers.sequence_pool(output, "sum")
        return output

    h = fluid.layers.fc(node_feature, hidden_size, bias_attr=False, name="hidden")
    a1 = fluid.layers.fc(node_feature, 1, name="a1_weight")
    a2 = fluid.layers.fc(node_feature, 1, name="a2_weight")
    message = graph_wrapper.send(send_func,
                                 nfeat_list=[("h", h), ("a1", a1), ("a2", a2)])
    output = graph_wrapper.recv(message, recv_func)
    return output
```
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)|
| --- | --- | --- |---|
| Cora | ~83% | 0.0188s | 0.0175s |
| Pubmed | ~78% | 0.0449s | 0.0295s |
| Citeseer | ~70% | 0.0275s | 0.0253s |
### How to run
For example, to train GAT on the Cora dataset with a GPU:
```
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset to use: "cora", "citeseer", or "pubmed".
- use_cuda: Use GPU if specified.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#-*- coding: utf-8 -*-
import pgl
from pgl import data_loader
from pgl.utils.logger import log
import paddle.fluid as fluid
import numpy as np
import time
import argparse
def load(name):
if name == 'cora':
dataset = data_loader.CoraDataset()
elif name == "pubmed":
dataset = data_loader.CitationDataset("pubmed", symmetry_edges=False)
elif name == "citeseer":
dataset = data_loader.CitationDataset("citeseer", symmetry_edges=False)
else:
        raise ValueError(name + " dataset doesn't exist")
return dataset
def main(args):
dataset = load(args.dataset)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
test_program = fluid.Program()
hidden_size = 8
with fluid.program_guard(train_program, startup_program):
gw = pgl.graph_wrapper.GraphWrapper(
name="graph",
place=place,
node_feat=dataset.graph.node_feat_info())
output = pgl.layers.gat(gw,
gw.node_feat["words"],
hidden_size,
activation="elu",
name="gat_layer_1",
num_heads=8,
feat_drop=0.6,
attn_drop=0.6,
is_test=False)
output = pgl.layers.gat(gw,
output,
dataset.num_classes,
num_heads=1,
activation=None,
name="gat_layer_2",
feat_drop=0.6,
attn_drop=0.6,
is_test=False)
node_index = fluid.layers.data(
"node_index",
shape=[None, 1],
dtype="int32",
append_batch_size=False)
node_label = fluid.layers.data(
"node_label",
shape=[None, 1],
dtype="int64",
append_batch_size=False)
pred = fluid.layers.gather(output, node_index)
loss, pred = fluid.layers.softmax_with_cross_entropy(
logits=pred, label=node_label, return_softmax=True)
acc = fluid.layers.accuracy(input=pred, label=node_label, k=1)
loss = fluid.layers.mean(loss)
test_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
adam = fluid.optimizer.Adam(
learning_rate=0.005,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0005))
adam.minimize(loss)
exe = fluid.Executor(place)
exe.run(startup_program)
feed_dict = gw.to_feed(dataset.graph)
train_index = dataset.train_index
train_label = np.expand_dims(dataset.y[train_index], -1)
train_index = np.expand_dims(train_index, -1)
val_index = dataset.val_index
val_label = np.expand_dims(dataset.y[val_index], -1)
val_index = np.expand_dims(val_index, -1)
test_index = dataset.test_index
test_label = np.expand_dims(dataset.y[test_index], -1)
test_index = np.expand_dims(test_index, -1)
dur = []
for epoch in range(200):
if epoch >= 3:
t0 = time.time()
feed_dict["node_index"] = np.array(train_index, dtype="int32")
feed_dict["node_label"] = np.array(train_label, dtype="int64")
train_loss, train_acc = exe.run(train_program,
feed=feed_dict,
fetch_list=[loss, acc],
return_numpy=True)
if epoch >= 3:
time_per_epoch = 1.0 * (time.time() - t0)
dur.append(time_per_epoch)
feed_dict["node_index"] = np.array(val_index, dtype="int32")
feed_dict["node_label"] = np.array(val_label, dtype="int64")
val_loss, val_acc = exe.run(test_program,
feed=feed_dict,
fetch_list=[loss, acc],
return_numpy=True)
log.info("Epoch %d " % epoch + "(%.5lf sec) " % np.mean(dur) +
"Train Loss: %f " % train_loss + "Train Acc: %f " % train_acc
+ "Val Loss: %f " % val_loss + "Val Acc: %f " % val_acc)
feed_dict["node_index"] = np.array(test_index, dtype="int32")
feed_dict["node_label"] = np.array(test_label, dtype="int64")
test_loss, test_acc = exe.run(test_program,
feed=feed_dict,
fetch_list=[loss, acc],
return_numpy=True)
log.info("Accuracy: %f" % test_acc)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GAT')
parser.add_argument(
"--dataset", type=str, default="cora", help="dataset (cora, pubmed)")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
args = parser.parse_args()
log.info(args)
main(args)
# PGL Examples for GCN
[Graph Convolutional Network \(GCN\)](https://arxiv.org/abs/1609.02907) is a powerful neural network designed for machine learning on graphs. Based on PGL, we reproduce the GCN algorithm and match the results reported in the paper on citation network benchmarks.
### Simple example to build GCN
To build a GCN layer, one can use our predefined ```pgl.layers.gcn``` or simply write a GCN layer with the message passing interface.
```python
import paddle.fluid as fluid

def gcn_layer(graph_wrapper, node_feature, hidden_size, act):
    def send_func(src_feat, dst_feat, edge_feat):
        # copy the source node feature as the message
        return src_feat["h"]

    def recv_func(msg):
        # sum the messages from all neighbors
        return fluid.layers.sequence_pool(msg, "sum")

    message = graph_wrapper.send(send_func, nfeat_list=[("h", node_feature)])
    output = graph_wrapper.recv(message, recv_func)
    output = fluid.layers.fc(output, size=hidden_size, act=act)
    return output
```
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)|
| --- | --- | --- |---|
| Cora | ~81% | 0.0106s | 0.0104s |
| Pubmed | ~79% | 0.0210s | 0.0154s |
| Citeseer | ~71% | 0.0175s | 0.0177s |
### How to run
For example, to train GCN on the Cora dataset with a GPU:
```
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset to use: "cora", "citeseer", or "pubmed".
- use_cuda: Use GPU if specified.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import pgl
from pgl import data_loader
from pgl.utils.logger import log
import paddle.fluid as fluid
import numpy as np
import time
import argparse
def load(name):
if name == 'cora':
dataset = data_loader.CoraDataset()
elif name == "pubmed":
dataset = data_loader.CitationDataset("pubmed", symmetry_edges=False)
elif name == "citeseer":
dataset = data_loader.CitationDataset("citeseer", symmetry_edges=False)
else:
        raise ValueError(name + " dataset doesn't exist")
return dataset
def main(args):
dataset = load(args.dataset)
# normalize
indegree = dataset.graph.indegree()
norm = np.zeros_like(indegree, dtype="float32")
norm[indegree > 0] = np.power(indegree[indegree > 0], -0.5)
dataset.graph.node_feat["norm"] = np.expand_dims(norm, -1)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
test_program = fluid.Program()
hidden_size = 16
with fluid.program_guard(train_program, startup_program):
gw = pgl.graph_wrapper.GraphWrapper(
name="graph",
place=place,
node_feat=dataset.graph.node_feat_info())
output = pgl.layers.gcn(gw,
gw.node_feat["words"],
hidden_size,
activation="relu",
norm=gw.node_feat['norm'],
name="gcn_layer_1")
output = fluid.layers.dropout(
output, 0.5, dropout_implementation='upscale_in_train')
output = pgl.layers.gcn(gw,
output,
dataset.num_classes,
activation=None,
norm=gw.node_feat['norm'],
name="gcn_layer_2")
node_index = fluid.layers.data(
"node_index",
shape=[None, 1],
dtype="int32",
append_batch_size=False)
node_label = fluid.layers.data(
"node_label",
shape=[None, 1],
dtype="int64",
append_batch_size=False)
pred = fluid.layers.gather(output, node_index)
loss, pred = fluid.layers.softmax_with_cross_entropy(
logits=pred, label=node_label, return_softmax=True)
acc = fluid.layers.accuracy(input=pred, label=node_label, k=1)
loss = fluid.layers.mean(loss)
test_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
adam = fluid.optimizer.Adam(
learning_rate=1e-2,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0005))
adam.minimize(loss)
exe = fluid.Executor(place)
exe.run(startup_program)
feed_dict = gw.to_feed(dataset.graph)
train_index = dataset.train_index
train_label = np.expand_dims(dataset.y[train_index], -1)
train_index = np.expand_dims(train_index, -1)
val_index = dataset.val_index
val_label = np.expand_dims(dataset.y[val_index], -1)
val_index = np.expand_dims(val_index, -1)
test_index = dataset.test_index
test_label = np.expand_dims(dataset.y[test_index], -1)
test_index = np.expand_dims(test_index, -1)
dur = []
for epoch in range(200):
if epoch >= 3:
t0 = time.time()
feed_dict["node_index"] = np.array(train_index, dtype="int32")
feed_dict["node_label"] = np.array(train_label, dtype="int64")
train_loss, train_acc = exe.run(train_program,
feed=feed_dict,
fetch_list=[loss, acc],
return_numpy=True)
if epoch >= 3:
time_per_epoch = 1.0 * (time.time() - t0)
dur.append(time_per_epoch)
feed_dict["node_index"] = np.array(val_index, dtype="int32")
feed_dict["node_label"] = np.array(val_label, dtype="int64")
val_loss, val_acc = exe.run(test_program,
feed=feed_dict,
fetch_list=[loss, acc],
return_numpy=True)
log.info("Epoch %d " % epoch + "(%.5lf sec) " % np.mean(dur) +
"Train Loss: %f " % train_loss + "Train Acc: %f " % train_acc
+ "Val Loss: %f " % val_loss + "Val Acc: %f " % val_acc)
feed_dict["node_index"] = np.array(test_index, dtype="int32")
feed_dict["node_label"] = np.array(test_label, dtype="int64")
test_loss, test_acc = exe.run(test_program,
feed=feed_dict,
fetch_list=[loss, acc],
return_numpy=True)
log.info("Accuracy: %f" % test_acc)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
parser.add_argument(
"--dataset", type=str, default="cora", help="dataset (cora, pubmed)")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
args = parser.parse_args()
log.info(args)
main(args)
# GraphSAGE in PGL
[GraphSAGE](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) is a general inductive framework that leverages node feature
information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, GraphSAGE learns a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Based on PGL, we reproduce the GraphSAGE algorithm and match the results reported in the paper on the Reddit dataset. This example also demonstrates subgraph sampling and training in PGL.
### Datasets
The reddit dataset should be downloaded from the following links and placed in directory ```./data```. The details for Reddit Dataset can be found [here](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf).
- reddit.npz https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J
- reddit_adj.npz: https://drive.google.com/open?id=174vb0Ws7Vxk_QTUtxqTgDHSQ4El4qDHt
### Dependencies
- paddlepaddle>=1.4 (The speed can be faster in 1.5.)
- pgl
### How to run
To train a GraphSAGE model on Reddit Dataset, you can just run
```
python train.py --use_cuda --epoch 10 --graphsage_type graphsage_mean --normalize --symmetry
```
#### Hyperparameters
- epoch: Number of epochs (default: 10).
- use_cuda: Use GPU if specified.
- graphsage_type: We support 4 aggregator types: "graphsage_mean", "graphsage_maxpool", "graphsage_meanpool" and "graphsage_lstm".
- normalize: Normalize the input features if specified.
- sample_workers: The number of workers for multiprocess subgraph sampling.
- lr: Learning rate.
- symmetry: Make the edges symmetric if specified.
- batch_size: Batch size.
- samples_1: The max neighbors for the first hop neighbor sampling. (default: 25)
- samples_2: The max neighbors for the second hop neighbor sampling. (default: 10)
- hidden_size: The hidden size of the GraphSAGE models.
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Aggregator | Accuracy | Reported in paper |
| --- | --- | --- |
| Mean | 95.70% | 95.0% |
| Meanpool | 95.60% | 94.8% |
| Maxpool | 94.95% | 94.8% |
| LSTM | 95.13% | 95.4% |
**Download the dataset**
Data from https://github.com/matenure/FastGCN/issues/8
reddit_adj.npz: https://drive.google.com/open?id=174vb0Ws7Vxk_QTUtxqTgDHSQ4El4qDHt
reddit.npz: https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.fluid as fluid
def copy_send(src_feat, dst_feat, edge_feat):
return src_feat["h"]
def mean_recv(feat):
return fluid.layers.sequence_pool(feat, pool_type="average")
def sum_recv(feat):
return fluid.layers.sequence_pool(feat, pool_type="sum")
def max_recv(feat):
return fluid.layers.sequence_pool(feat, pool_type="max")
def lstm_recv(feat):
hidden_dim = 128
forward, _ = fluid.layers.dynamic_lstm(
input=feat, size=hidden_dim * 4, use_peepholes=False)
output = fluid.layers.sequence_last_step(forward)
return output
def graphsage_mean(gw, feature, hidden_size, act, name):
msg = gw.send(copy_send, nfeat_list=[("h", feature)])
neigh_feature = gw.recv(msg, mean_recv)
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
def graphsage_meanpool(gw,
feature,
hidden_size,
act,
name,
inner_hidden_size=512):
neigh_feature = fluid.layers.fc(feature, inner_hidden_size, act="relu")
msg = gw.send(copy_send, nfeat_list=[("h", neigh_feature)])
neigh_feature = gw.recv(msg, mean_recv)
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
def graphsage_maxpool(gw,
feature,
hidden_size,
act,
name,
inner_hidden_size=512):
neigh_feature = fluid.layers.fc(feature, inner_hidden_size, act="relu")
msg = gw.send(copy_send, nfeat_list=[("h", neigh_feature)])
neigh_feature = gw.recv(msg, max_recv)
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
def graphsage_lstm(gw, feature, hidden_size, act, name):
inner_hidden_size = 128
neigh_feature = fluid.layers.fc(feature, inner_hidden_size, act="relu")
hidden_dim = 128
forward_proj = fluid.layers.fc(input=neigh_feature,
size=hidden_dim * 4,
bias_attr=False,
name="lstm_proj")
msg = gw.send(copy_send, nfeat_list=[("h", forward_proj)])
neigh_feature = gw.recv(msg, lstm_recv)
neigh_feature = fluid.layers.fc(neigh_feature,
hidden_size,
act=act,
name=name + '_r')
self_feature = feature
self_feature = fluid.layers.fc(self_feature,
hidden_size,
act=act,
name=name + '_l')
output = fluid.layers.concat([self_feature, neigh_feature], axis=1)
output = fluid.layers.l2_normalize(output, axis=1)
return output
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import pickle as pkl
import paddle
import paddle.fluid as fluid
import pgl
import time
from pgl.utils.logger import log
import train
import time
def node_batch_iter(nodes, node_label, batch_size):
perm = np.arange(len(nodes))
np.random.shuffle(perm)
start = 0
while start < len(nodes):
index = perm[start:start + batch_size]
start += batch_size
yield nodes[index], node_label[index]
def traverse(item):
if isinstance(item, list) or isinstance(item, np.ndarray):
for i in iter(item):
for j in traverse(i):
yield j
else:
yield item
def flat_node_and_edge(nodes, eids):
nodes = list(set(traverse(nodes)))
eids = list(set(traverse(eids)))
return nodes, eids
def worker(batch_info, graph, samples):
def work():
for batch_train_samples, batch_train_labels in batch_info:
start_nodes = batch_train_samples
nodes = start_nodes
eids = []
for max_deg in samples:
pred, pred_eid = graph.sample_predecessor(
start_nodes, max_degree=max_deg, return_eids=True)
last_nodes = nodes
nodes = [nodes, pred]
eids = [eids, pred_eid]
nodes, eids = flat_node_and_edge(nodes, eids)
# Find new nodes
start_nodes = list(set(nodes) - set(last_nodes))
if len(start_nodes) == 0:
break
feed_dict = {}
feed_dict["nodes"] = [int(n) for n in nodes]
feed_dict["eids"] = [int(e) for e in eids]
feed_dict["node_label"] = [int(n) for n in batch_train_labels]
feed_dict["node_index"] = [int(n) for n in batch_train_samples]
yield feed_dict
return work
def multiprocess_graph_reader(graph,
graph_wrapper,
samples,
node_index,
batch_size,
node_label,
num_workers=4):
def parse_to_subgraph(rd):
def work():
for data in rd():
nodes = data["nodes"]
eids = data["eids"]
batch_train_labels = data["node_label"]
batch_train_samples = data["node_index"]
subgraph = graph.subgraph(nodes=nodes, eid=eids)
sub_node_index = subgraph.reindex_from_parrent_nodes(
batch_train_samples)
feed_dict = graph_wrapper.to_feed(subgraph)
feed_dict["node_label"] = np.expand_dims(
np.array(
batch_train_labels, dtype="int64"), -1)
feed_dict["node_index"] = sub_node_index
yield feed_dict
return work
def reader():
batch_info = list(
node_batch_iter(
node_index, node_label, batch_size=batch_size))
block_size = int(len(batch_info) / num_workers + 1)
reader_pool = []
for i in range(num_workers):
reader_pool.append(
worker(batch_info[block_size * i:block_size * (i + 1)], graph,
samples))
multi_process_sample = paddle.reader.multiprocess_reader(
reader_pool, use_pipe=False)
r = parse_to_subgraph(multi_process_sample)
return paddle.reader.buffered(r, 1000)
return reader()
def graph_reader(graph, graph_wrapper, samples, node_index, batch_size,
node_label):
def reader():
for batch_train_samples, batch_train_labels in node_batch_iter(
node_index, node_label, batch_size=batch_size):
start_nodes = batch_train_samples
nodes = start_nodes
eids = []
for max_deg in samples:
pred, pred_eid = graph.sample_predecessor(
start_nodes, max_degree=max_deg, return_eids=True)
last_nodes = nodes
nodes = [nodes, pred]
eids = [eids, pred_eid]
nodes, eids = flat_node_and_edge(nodes, eids)
# Find new nodes
start_nodes = list(set(nodes) - set(last_nodes))
if len(start_nodes) == 0:
break
subgraph = graph.subgraph(nodes=nodes, eid=eids)
feed_dict = graph_wrapper.to_feed(subgraph)
sub_node_index = subgraph.reindex_from_parrent_nodes(
batch_train_samples)
feed_dict["node_label"] = np.expand_dims(
np.array(
batch_train_labels, dtype="int64"), -1)
feed_dict["node_index"] = np.array(sub_node_index, dtype="int32")
yield feed_dict
return paddle.reader.buffered(reader, 1000)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import StandardScaler
import pgl
from pgl.utils.logger import log
from pgl.utils import paddle_helper
import paddle
import paddle.fluid as fluid
import reader
from model import graphsage_mean, graphsage_meanpool,\
graphsage_maxpool, graphsage_lstm
def load_data(normalize=True, symmetry=True):
"""
data from https://github.com/matenure/FastGCN/issues/8
reddit_adj.npz: https://drive.google.com/open?id=174vb0Ws7Vxk_QTUtxqTgDHSQ4El4qDHt
reddit.npz: https://drive.google.com/open?id=19SphVl_Oe8SJ1r87Hr5a6znx3nJu1F2J
"""
data = np.load("data/reddit.npz")
adj = sp.load_npz("data/reddit_adj.npz")
if symmetry:
adj = adj + adj.T
adj = adj.tocoo()
src = adj.row
dst = adj.col
num_class = 41
train_label = data['y_train']
val_label = data['y_val']
test_label = data['y_test']
train_index = data['train_index']
val_index = data['val_index']
test_index = data['test_index']
feature = data["feats"].astype("float32")
if normalize:
scaler = StandardScaler()
scaler.fit(feature[train_index])
feature = scaler.transform(feature)
log.info("Feature shape %s" % (repr(feature.shape)))
graph = pgl.graph.Graph(
num_nodes=feature.shape[0],
edges=list(zip(src, dst)),
node_feat={"index": np.arange(
0, len(feature), dtype="int32")})
return {
"graph": graph,
"train_index": train_index,
"train_label": train_label,
"val_label": val_label,
"val_index": val_index,
"test_index": test_index,
"test_label": test_label,
"feature": feature,
"num_class": 41
}
def build_graph_model(graph_wrapper, num_class, k_hop, graphsage_type,
hidden_size, feature):
node_index = fluid.layers.data(
"node_index", shape=[None], dtype="int32", append_batch_size=False)
node_label = fluid.layers.data(
"node_label", shape=[None, 1], dtype="int64", append_batch_size=False)
feature = fluid.layers.gather(feature, graph_wrapper.node_feat['index'])
feature.stop_gradient = True
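# Stack k_hop GraphSAGE layers of the chosen aggregator; each hop mixes a node's feature with its sampled neighbors' features.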
for i in range(k_hop):
if graphsage_type == 'graphsage_mean':
feature = graphsage_mean(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_mean_%s % i")
elif graphsage_type == 'graphsage_meanpool':
feature = graphsage_meanpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_meanpool_%s % i")
elif graphsage_type == 'graphsage_maxpool':
feature = graphsage_maxpool(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s % i")
elif graphsage_type == 'graphsage_lstm':
feature = graphsage_lstm(
graph_wrapper,
feature,
hidden_size,
act="relu",
name="graphsage_maxpool_%s % i")
else:
raise ValueError("graphsage type %s is not"
" implemented" % graphsage_type)
feature = fluid.layers.gather(feature, node_index)
logits = fluid.layers.fc(feature,
num_class,
act=None,
name='classification_layer')
proba = fluid.layers.softmax(logits)
loss = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=node_label)
loss = fluid.layers.mean(loss)
acc = fluid.layers.accuracy(input=proba, label=node_label, k=1)
return loss, acc
def run_epoch(batch_iter,
exe,
program,
prefix,
model_loss,
model_acc,
epoch,
log_per_step=100):
batch = 0
total_loss = 0.
total_acc = 0.
total_sample = 0
start = time.time()
for batch_feed_dict in batch_iter():
batch += 1
batch_loss, batch_acc = exe.run(program,
fetch_list=[model_loss, model_acc],
feed=batch_feed_dict)
if batch % log_per_step == 0:
log.info("Batch %s %s-Loss %s %s-Acc %s" %
(batch, prefix, batch_loss, prefix, batch_acc))
num_samples = len(batch_feed_dict["node_index"])
total_loss += batch_loss * num_samples
total_acc += batch_acc * num_samples
total_sample += num_samples
end = time.time()
log.info("%s Epoch %s Loss %.5lf Acc %.5lf Speed(per batch) %.5lf sec" %
(prefix, epoch, total_loss / total_sample,
total_acc / total_sample, (end - start) / batch))
def main(args):
data = load_data(args.normalize, args.symmetry)
log.info("preprocess finish")
log.info("Train Examples: %s" % len(data["train_index"]))
log.info("Val Examples: %s" % len(data["val_index"]))
log.info("Test Examples: %s" % len(data["test_index"]))
log.info("Num nodes %s" % data["graph"].num_nodes)
log.info("Num edges %s" % data["graph"].num_edges)
log.info("Average Degree %s" % np.mean(data["graph"].indegree()))
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
samples = []
if args.samples_1 > 0:
samples.append(args.samples_1)
if args.samples_2 > 0:
samples.append(args.samples_2)
with fluid.program_guard(train_program, startup_program):
feature, feature_init = paddle_helper.constant(
"feat",
dtype=data['feature'].dtype,
value=data['feature'],
hide_batch_size=False)
graph_wrapper = pgl.graph_wrapper.GraphWrapper(
"sub_graph", place, node_feat=data['graph'].node_feat_info())
model_loss, model_acc = build_graph_model(
graph_wrapper,
num_class=data["num_class"],
feature=feature,
hidden_size=args.hidden_size,
graphsage_type=args.graphsage_type,
k_hop=len(samples))
test_program = train_program.clone(for_test=True)
with fluid.program_guard(train_program, startup_program):
adam = fluid.optimizer.Adam(learning_rate=args.lr)
adam.minimize(model_loss)
exe = fluid.Executor(place)
exe.run(startup_program)
feature_init(place)
if args.sample_workers > 1:
train_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['train_index'],
node_label=data["train_label"])
else:
train_iter = reader.graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
batch_size=args.batch_size,
node_index=data['train_index'],
node_label=data["train_label"])
if args.sample_workers > 1:
val_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['val_index'],
node_label=data["val_label"])
else:
val_iter = reader.graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
batch_size=args.batch_size,
node_index=data['val_index'],
node_label=data["val_label"])
if args.sample_workers > 1:
test_iter = reader.multiprocess_graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
num_workers=args.sample_workers,
batch_size=args.batch_size,
node_index=data['test_index'],
node_label=data["test_label"])
else:
test_iter = reader.graph_reader(
data['graph'],
graph_wrapper,
samples=samples,
batch_size=args.batch_size,
node_index=data['test_index'],
node_label=data["test_label"])
for epoch in range(args.epoch):
run_epoch(
train_iter,
program=train_program,
exe=exe,
prefix="train",
model_loss=model_loss,
model_acc=model_acc,
epoch=epoch)
run_epoch(
val_iter,
program=test_program,
exe=exe,
prefix="val",
model_loss=model_loss,
model_acc=model_acc,
log_per_step=10000,
epoch=epoch)
run_epoch(
test_iter,
program=test_program,
prefix="test",
exe=exe,
model_loss=model_loss,
model_acc=model_acc,
log_per_step=10000,
epoch=epoch)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='graphsage')
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument(
"--normalize", action='store_true', help="normalize features")
parser.add_argument(
"--symmetry", action='store_true', help="undirect graph")
parser.add_argument("--graphsage_type", type=str, default="graphsage_mean")
parser.add_argument("--sample_workers", type=int, default=5)
parser.add_argument("--epoch", type=int, default=10)
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--samples_1", type=int, default=25)
parser.add_argument("--samples_2", type=int, default=10)
args = parser.parse_args()
log.info(args)
main(args)
# PGL Examples for node2vec
[Node2vec](https://cs.stanford.edu/~jure/pubs/node2vec-kdd16.pdf) is an algorithmic framework for representational learning on graphs. Given any graph, it can learn continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks. Based on PGL, we reproduce the node2vec algorithm and achieve results on par with those reported in the paper.
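For intuition, the heart of node2vec is a second-order random walk biased by a return parameter `p` and an in-out parameter `q`. Below is a minimal pure-Python sketch of a single walk step, assuming the graph is a plain dict mapping each node to a set of its neighbors; the function name is illustrative only, and PGL itself ships an optimized Cython implementation of this sampling.

```python
import random

def node2vec_step(adj, prev_node, cur_node, p=0.25, q=0.25):
    """Pick the next node of a node2vec walk, given the previous and current nodes.

    `adj` is assumed to be a dict: node -> set of neighbor nodes.
    """
    candidates = list(adj[cur_node])
    if not candidates:
        return None
    weights = []
    for nxt in candidates:
        if nxt == prev_node:           # returning to the previous node
            weights.append(1.0 / p)
        elif nxt in adj[prev_node]:    # staying within distance 1 of the previous node
            weights.append(1.0)
        else:                          # moving outward, distance 2 from the previous node
            weights.append(1.0 / q)
    return random.choices(candidates, weights=weights, k=1)[0]
```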
## Datasets
The datasets contain two networks: [BlogCatalog](http://socialcomputing.asu.edu/datasets/BlogCatalog3) and [Arxiv](http://snap.stanford.edu/data/ca-AstroPh.html).
## Dependencies
- paddlepaddle>=1.4
- pgl
## How to run
For example, use GPU to train node2vec on the BlogCatalog and ArXiv datasets and evaluate it on the downstream tasks.
```sh
# multiclass task example
python node2vec.py --use_cuda --dataset BlogCatalog --save_path ./tmp/node2vec_BlogCatalog/ --offline_learning --epoch 400
python multi_class.py --use_cuda --ckpt_path ./tmp/node2vec_BlogCatalog/paddle_model --epoch 1000
# link prediction task example
python node2vec.py --use_cuda --dataset ArXiv --save_path ./tmp/node2vec_ArXiv --offline_learning --epoch 10
python link_predict.py --use_cuda --ckpt_path ./tmp/node2vec_ArXiv/paddle_model --epoch 400
```
## Hyperparameters
- dataset: The citation dataset "BlogCatalog" and "ArXiv".
- use_cuda: Use gpu if assign use_cuda.
### Experiment results
Dataset|model|Task|Metric|PGL Result|Reported Result
--|--|--|--|--|--
BlogCatalog|deepwalk|multi-label classification|MacroF1|0.250|0.211
BlogCatalog|node2vec|multi-label classification|MacroF1|0.262|0.258
ArXiv|deepwalk|link prediction|AUC|0.9538|0.9340
ArXiv|node2vec|link prediction|AUC|0.9541|0.9366
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import math
import os
import numpy as np
from sklearn import metrics
import pgl
from pgl import data_loader
from pgl.utils import op
from pgl.utils.logger import log
import paddle.fluid as fluid
import paddle.fluid.layers as l
np.random.seed(123)
def load(name):
if name == 'BlogCatalog':
dataset = data_loader.BlogCatalogDataset()
elif name == "ArXiv":
dataset = data_loader.ArXivDataset()
else:
raise ValueError(name + " dataset doesn't exists")
return dataset
def binary_op(u_embed, v_embed, binary_op_type):
if binary_op_type == "Average":
edge_embed = (u_embed + v_embed) / 2
elif binary_op_type == "Hadamard":
edge_embed = u_embed * v_embed
elif binary_op_type == "Weighted-L1":
edge_embed = l.abs(u_embed - v_embed)
elif binary_op_type == "Weighted-L2":
edge_embed = (u_embed - v_embed) * (u_embed - v_embed)
else:
raise ValueError(binary_op_type + " binary_op_type doesn't exists")
return edge_embed
def link_predict_model(num_nodes,
hidden_size=16,
name='link_predict_task',
binary_op_type="Weighted-L2"):
pyreader = l.py_reader(
capacity=70,
shapes=[[-1, 1], [-1, 1], [-1, 1]],
dtypes=['int64', 'int64', 'int64'],
lod_levels=[0, 0, 0],
name=name + '_pyreader',
use_double_buffer=True)
u, v, label = l.read_file(pyreader)
u_embed = l.embedding(
input=u,
size=[num_nodes, hidden_size],
param_attr=fluid.ParamAttr(name='content'))
v_embed = l.embedding(
input=v,
size=[num_nodes, hidden_size],
param_attr=fluid.ParamAttr(name='content'))
u_embed.stop_gradient = True
v_embed.stop_gradient = True
edge_embed = binary_op(u_embed, v_embed, binary_op_type)
logit = l.fc(input=edge_embed, size=1)
loss = l.sigmoid_cross_entropy_with_logits(logit, l.cast(label, 'float32'))
loss = l.reduce_mean(loss)
prob = l.sigmoid(logit)
return pyreader, loss, prob, label
def link_predict_generator(pos_edges,
neg_edges,
batch_size=512,
epoch=2000,
shuffle=True):
all_edges = []
for (u, v) in pos_edges:
all_edges.append([u, v, 1])
for (u, v) in neg_edges:
all_edges.append([u, v, 0])
all_edges = np.array(all_edges, np.int64)
def batch_edges_generator(shuffle=shuffle):
perm = np.arange(len(all_edges), dtype=np.int64)
if shuffle:
np.random.shuffle(perm)
start = 0
while start < len(all_edges):
yield all_edges[perm[start:start + batch_size]]
start += batch_size
def wrapper():
for _ in range(epoch):
for batch_edges in batch_edges_generator():
yield batch_edges.T[0:1].T, batch_edges.T[
1:2].T, batch_edges.T[2:3].T
return wrapper
def main(args):
hidden_size = args.hidden_size
epoch = args.epoch
ckpt_path = args.ckpt_path
dataset = load(args.dataset)
num_edges = len(dataset.pos_edges) + len(dataset.neg_edges)
train_num_edges = int(len(dataset.pos_edges) * 0.5) + int(
len(dataset.neg_edges) * 0.5)
test_num_edges = num_edges - train_num_edges
train_steps = (train_num_edges // train_num_edges) * epoch  # full-batch training: one step per epoch (batch_size == train_num_edges)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_prog = fluid.Program()
test_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
train_pyreader, train_loss, train_probs, train_labels = link_predict_model(
dataset.graph.num_nodes, hidden_size=hidden_size, name='train')
lr = l.polynomial_decay(0.025, train_steps, 0.0001)
adam = fluid.optimizer.Adam(lr)
adam.minimize(train_loss)
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
test_pyreader, test_loss, test_probs, test_labels = link_predict_model(
dataset.graph.num_nodes, hidden_size=hidden_size, name='test')
test_prog = test_prog.clone(for_test=True)
train_pyreader.decorate_tensor_provider(
link_predict_generator(
dataset.pos_edges[:train_num_edges // 2],
dataset.neg_edges[:train_num_edges // 2],
batch_size=train_num_edges,
epoch=epoch))
test_pyreader.decorate_tensor_provider(
link_predict_generator(
dataset.pos_edges[train_num_edges // 2:],
dataset.neg_edges[train_num_edges // 2:],
batch_size=test_num_edges,
epoch=1))
exe = fluid.Executor(place)
exe.run(startup_prog)
train_pyreader.start()
def existed_params(var):
if not isinstance(var, fluid.framework.Parameter):
return False
return os.path.exists(os.path.join(ckpt_path, var.name))
fluid.io.load_vars(
exe, ckpt_path, main_program=train_prog, predicate=existed_params)
step = 0
prev_time = time.time()
while 1:
try:
train_loss_val, train_probs_val, train_labels_val = exe.run(
train_prog,
fetch_list=[train_loss, train_probs, train_labels],
return_numpy=True)
fpr, tpr, thresholds = metrics.roc_curve(train_labels_val,
train_probs_val)
train_auc = metrics.auc(fpr, tpr)
step += 1
log.info("Step %d " % step + "Train Loss: %f " % train_loss_val +
"Train AUC: %f " % train_auc)
except fluid.core.EOFException:
train_pyreader.reset()
break
test_pyreader.start()
test_probs_vals, test_labels_vals = [], []
while 1:
try:
test_loss_val, test_probs_val, test_labels_val = exe.run(
test_prog,
fetch_list=[test_loss, test_probs, test_labels],
return_numpy=True)
test_probs_vals.append(test_probs_val)
test_labels_vals.append(test_labels_val)
except fluid.core.EOFException:
test_pyreader.reset()
test_probs_array = np.concatenate(test_probs_vals)
test_labels_array = np.concatenate(test_labels_vals)
fpr, tpr, thresholds = metrics.roc_curve(test_labels_array,
test_probs_array)
test_auc = metrics.auc(fpr, tpr)
log.info("\t\tStep %d " % step + "Test Loss: %f " %
test_loss_val + "Test AUC: %f " % test_auc)
break
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='node2vec')
parser.add_argument(
"--dataset",
type=str,
default="ArXiv",
help="dataset (BlogCatalog, ArXiv)")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--epoch", type=int, default=400)
parser.add_argument("--batch_size", type=int, default=None)
parser.add_argument(
"--ckpt_path",
type=str,
default="./tmp/deepwalk_arxiv_e10/paddle_model")
args = parser.parse_args()
log.info(args)
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import math
import os
import numpy as np
import sklearn.metrics
from sklearn.metrics import f1_score
import pgl
from pgl import data_loader
from pgl.utils import op
from pgl.utils.logger import log
import paddle.fluid as fluid
import paddle.fluid.layers as l
np.random.seed(123)
def load(name):
if name == 'BlogCatalog':
dataset = data_loader.BlogCatalogDataset()
else:
raise ValueError(name + " dataset doesn't exists")
return dataset
def node_classify_model(graph,
num_labels,
hidden_size=16,
name='node_classify_task'):
pyreader = l.py_reader(
capacity=70,
shapes=[[-1, 1], [-1, num_labels]],
dtypes=['int64', 'float32'],
lod_levels=[0, 0],
name=name + '_pyreader',
use_double_buffer=True)
nodes, labels = l.read_file(pyreader)
embed_nodes = l.embedding(
input=nodes,
size=[graph.num_nodes, hidden_size],
param_attr=fluid.ParamAttr(name='content'))
embed_nodes.stop_gradient = True
logits = l.fc(input=embed_nodes, size=num_labels)
loss = l.sigmoid_cross_entropy_with_logits(logits, labels)
loss = l.reduce_mean(loss)
prob = l.sigmoid(logits)
topk = l.reduce_sum(labels, -1)
return pyreader, loss, prob, labels, topk
def node_classify_generator(graph,
all_nodes=None,
batch_size=512,
epoch=1,
shuffle=True):
if all_nodes is None:
all_nodes = np.arange(graph.num_nodes)
#labels = (np.random.rand(512, 39) > 0.95).astype(np.float32)
def batch_nodes_generator(shuffle=shuffle):
perm = np.arange(len(all_nodes), dtype=np.int64)
if shuffle:
np.random.shuffle(perm)
start = 0
while start < len(all_nodes):
yield all_nodes[perm[start:start + batch_size]]
start += batch_size
def wrapper():
for _ in range(epoch):
for batch_nodes in batch_nodes_generator():
batch_nodes_expanded = np.expand_dims(batch_nodes,
-1).astype(np.int64)
batch_labels = graph.node_feat['group_id'][batch_nodes].astype(
np.float32)
yield [batch_nodes_expanded, batch_labels]
return wrapper
def topk_f1_score(labels,
probs,
topk_list=None,
average="macro",
threshold=None):
assert topk_list is not None or threshold is not None, "at least one of topk_list and threshold must be provided"
if threshold is not None:
preds = probs > threshold
else:
preds = np.zeros_like(labels, dtype=np.int64)
for idx, (prob, topk) in enumerate(zip(np.argsort(probs), topk_list)):
preds[idx][prob[-int(topk):]] = 1
return f1_score(labels, preds, average=average)
def main(args):
hidden_size = args.hidden_size
epoch = args.epoch
ckpt_path = args.ckpt_path
threshold = args.threshold
dataset = load(args.dataset)
if args.batch_size is None:
batch_size = len(dataset.train_index)
else:
batch_size = args.batch_size
train_steps = (len(dataset.train_index) // batch_size) * epoch
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_prog = fluid.Program()
test_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
train_pyreader, train_loss, train_probs, train_labels, train_topk = node_classify_model(
dataset.graph,
dataset.num_groups,
hidden_size=hidden_size,
name='train')
lr = l.polynomial_decay(0.025, train_steps, 0.0001)
adam = fluid.optimizer.Adam(lr)
adam.minimize(train_loss)
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
test_pyreader, test_loss, test_probs, test_labels, test_topk = node_classify_model(
dataset.graph,
dataset.num_groups,
hidden_size=hidden_size,
name='test')
test_prog = test_prog.clone(for_test=True)
exe = fluid.Executor(place)
exe.run(startup_prog)
train_pyreader.decorate_tensor_provider(
node_classify_generator(
dataset.graph,
dataset.train_index,
batch_size=batch_size,
epoch=epoch))
test_pyreader.decorate_tensor_provider(
node_classify_generator(
dataset.graph, dataset.test_index, batch_size=batch_size, epoch=1))
def existed_params(var):
if not isinstance(var, fluid.framework.Parameter):
return False
return os.path.exists(os.path.join(ckpt_path, var.name))
fluid.io.load_vars(
exe, ckpt_path, main_program=train_prog, predicate=existed_params)
step = 0
prev_time = time.time()
train_pyreader.start()
while 1:
try:
train_loss_val, train_probs_val, train_labels_val, train_topk_val = exe.run(
train_prog,
fetch_list=[
train_loss, train_probs, train_labels, train_topk
],
return_numpy=True)
train_macro_f1 = topk_f1_score(train_labels_val, train_probs_val,
train_topk_val, "macro", threshold)
train_micro_f1 = topk_f1_score(train_labels_val, train_probs_val,
train_topk_val, "micro", threshold)
step += 1
log.info("Step %d " % step + "Train Loss: %f " % train_loss_val +
"Train Macro F1: %f " % train_macro_f1 +
"Train Micro F1: %f " % train_micro_f1)
except fluid.core.EOFException:
train_pyreader.reset()
break
test_pyreader.start()
test_probs_vals, test_labels_vals, test_topk_vals = [], [], []
while 1:
try:
test_loss_val, test_probs_val, test_labels_val, test_topk_val = exe.run(
test_prog,
fetch_list=[
test_loss, test_probs, test_labels, test_topk
],
return_numpy=True)
test_probs_vals.append(test_probs_val)
test_labels_vals.append(test_labels_val)
test_topk_vals.append(test_topk_val)
except fluid.core.EOFException:
test_pyreader.reset()
test_probs_array = np.concatenate(test_probs_vals)
test_labels_array = np.concatenate(test_labels_vals)
test_topk_array = np.concatenate(test_topk_vals)
test_macro_f1 = topk_f1_score(
test_labels_array, test_probs_array, test_topk_array,
"macro", threshold)
test_micro_f1 = topk_f1_score(
test_labels_array, test_probs_array, test_topk_array,
"micro", threshold)
log.info("\t\tStep %d " % step + "Test Loss: %f " %
test_loss_val + "Test Macro F1: %f " % test_macro_f1 +
"Test Micro F1: %f " % test_micro_f1)
break
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='node2vec')
parser.add_argument(
"--dataset",
type=str,
default="BlogCatalog",
help="dataset (BlogCatalog)")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--epoch", type=int, default=400)
parser.add_argument("--batch_size", type=int, default=None)
parser.add_argument("--threshold", type=float, default=0.3)
parser.add_argument(
"--ckpt_path",
type=str,
default="./tmp/baseline_node2vec/paddle_model")
args = parser.parse_args()
log.info(args)
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import math
import os
import io
from multiprocessing import Pool
import glob
import numpy as np
import sklearn.metrics
from sklearn.metrics import f1_score
import pgl
from pgl import data_loader
from pgl.utils import op
from pgl.utils.logger import log
import paddle.fluid as fluid
import paddle.fluid.layers as l
def load(name):
if name == "BlogCatalog":
dataset = data_loader.BlogCatalogDataset()
elif name == "ArXiv":
dataset = data_loader.ArXivDataset()
else:
raise ValueError(name + " dataset doesn't exists")
return dataset
def node2vec_model(graph, hidden_size=16, neg_num=5):
pyreader = l.py_reader(
capacity=70,
shapes=[[-1, 1, 1], [-1, 1, 1], [-1, neg_num, 1]],
dtypes=['int64', 'int64', 'int64'],
lod_levels=[0, 0, 0],
name='train',
use_double_buffer=True)
embed_init = fluid.initializer.UniformInitializer(low=-1.0, high=1.0)
weight_init = fluid.initializer.TruncatedNormal(scale=1.0 /
math.sqrt(hidden_size))
src, pos, negs = l.read_file(pyreader)
embed_src = l.embedding(
input=src,
size=[graph.num_nodes, hidden_size],
param_attr=fluid.ParamAttr(
name='content', initializer=embed_init))
weight_pos = l.embedding(
input=pos,
size=[graph.num_nodes, hidden_size],
param_attr=fluid.ParamAttr(
name='weight', initializer=weight_init))
weight_negs = l.embedding(
input=negs,
size=[graph.num_nodes, hidden_size],
param_attr=fluid.ParamAttr(
name='weight', initializer=weight_init))
pos_logits = l.matmul(
embed_src, weight_pos, transpose_y=True) # [batch_size, 1, 1]
neg_logits = l.matmul(
embed_src, weight_negs, transpose_y=True) # [batch_size, 1, neg_num]
ones_label = pos_logits * 0. + 1.
ones_label.stop_gradient = True
pos_loss = l.sigmoid_cross_entropy_with_logits(pos_logits, ones_label)
zeros_label = neg_logits * 0.
zeros_label.stop_gradient = True
neg_loss = l.sigmoid_cross_entropy_with_logits(neg_logits, zeros_label)
loss = (l.reduce_mean(pos_loss) + l.reduce_mean(neg_loss)) / 2
return pyreader, loss
def gen_pair(walks, left_win_size=2, right_win_size=2):
src = []
pos = []
for walk in walks:
for left_offset in range(1, left_win_size + 1):
src.extend(walk[left_offset:])
pos.extend(walk[:-left_offset])
for right_offset in range(1, right_win_size + 1):
src.extend(walk[:-right_offset])
pos.extend(walk[right_offset:])
src, pos = np.array(src, dtype=np.int64), np.array(pos, dtype=np.int64)
src, pos = np.expand_dims(src, -1), np.expand_dims(pos, -1)
src, pos = np.expand_dims(src, -1), np.expand_dims(pos, -1)
return src, pos
def node2vec_generator(graph,
batch_size=512,
walk_len=5,
p=0.25,
q=0.25,
win_size=2,
neg_num=5,
epoch=200,
filelist=None):
def walks_generator():
if filelist is not None:
bucket = []
for filename in filelist:
with io.open(filename) as inf:
for line in inf:
walk = [int(x) for x in line.strip('\n').split(' ')]
bucket.append(walk)
if len(bucket) == batch_size:
yield bucket
bucket = []
if len(bucket):
yield bucket
else:
for _ in range(epoch):
for nodes in graph.node_batch_iter(batch_size):
walks = graph.node2vec_random_walk(nodes, walk_len, p, q)
yield walks
def wrapper():
for walks in walks_generator():
src, pos = gen_pair(walks, win_size, win_size)
if src.shape[0] == 0:
continue
negs = graph.sample_nodes([len(src), neg_num, 1]).astype(np.int64)
yield [src, pos, negs]
return wrapper
def process(args):
idx, graph, save_path, epoch, batch_size, walk_len, p, q, seed = args
with open('%s/%s' % (save_path, idx), 'w') as outf:
for _ in range(epoch):
np.random.seed(seed)
for nodes in graph.node_batch_iter(batch_size):
walks = graph.node2vec_random_walk(nodes, walk_len, p, q)
for walk in walks:
outf.write(' '.join([str(token) for token in walk]) + '\n')
def main(args):
hidden_size = args.hidden_size
neg_num = args.neg_num
epoch = args.epoch
p = args.p
q = args.q
save_path = args.save_path
batch_size = args.batch_size
walk_len = args.walk_len
win_size = args.win_size
if not os.path.isdir(save_path):
os.makedirs(save_path)
dataset = load(args.dataset)
if args.offline_learning:
log.info("Start random walk on disk...")
walk_save_path = os.path.join(save_path, "walks")
if not os.path.isdir(walk_save_path):
os.makedirs(walk_save_path)
pool = Pool(args.processes)
args_list = [(x, dataset.graph, walk_save_path, 1, batch_size,
walk_len, p, q, np.random.randint(2**32))
for x in range(epoch)]
pool.map(process, args_list)
filelist = glob.glob(os.path.join(walk_save_path, "*"))
log.info("Random walk on disk Done.")
else:
filelist = None
train_steps = int(dataset.graph.num_nodes / batch_size) * epoch
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
node2vec_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(node2vec_prog, startup_prog):
with fluid.unique_name.guard():
node2vec_pyreader, node2vec_loss = node2vec_model(
dataset.graph, hidden_size=hidden_size, neg_num=neg_num)
lr = l.polynomial_decay(0.025, train_steps, 0.0001)
adam = fluid.optimizer.Adam(lr)
adam.minimize(node2vec_loss)
node2vec_pyreader.decorate_tensor_provider(
node2vec_generator(
dataset.graph,
batch_size=batch_size,
walk_len=walk_len,
win_size=win_size,
epoch=epoch,
neg_num=neg_num,
p=p,
q=q,
filelist=filelist))
node2vec_pyreader.start()
exe = fluid.Executor(place)
exe.run(startup_prog)
prev_time = time.time()
step = 0
while 1:
try:
node2vec_loss_val = exe.run(node2vec_prog,
fetch_list=[node2vec_loss],
return_numpy=True)[0]
cur_time = time.time()
use_time = cur_time - prev_time
prev_time = cur_time
step += 1
log.info("Step %d " % step + "Node2vec Loss: %f " %
node2vec_loss_val + " %f s/step." % use_time)
except fluid.core.EOFException:
node2vec_pyreader.reset()
break
fluid.io.save_persistables(exe,
os.path.join(save_path, "paddle_model"),
node2vec_prog)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='node2vec')
parser.add_argument(
"--dataset",
type=str,
default="BlogCatalog",
help="dataset (BlogCatalog, ArXiv)")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
parser.add_argument(
"--offline_learning", action='store_true', help="use_cuda")
parser.add_argument("--hidden_size", type=int, default=128)
parser.add_argument("--neg_num", type=int, default=20)
parser.add_argument("--epoch", type=int, default=100)
parser.add_argument("--batch_size", type=int, default=1024)
parser.add_argument("--walk_len", type=int, default=40)
parser.add_argument("--win_size", type=int, default=10)
parser.add_argument("--p", type=float, default=0.25)
parser.add_argument("--q", type=float, default=0.25)
parser.add_argument("--save_path", type=str, default="./tmp/node2vec")
parser.add_argument("--processes", type=int, default=10)
args = parser.parse_args()
log.info(args)
main(args)
# PGL Examples for GAT with StaticGraphWrapper
[Graph Attention Networks \(GAT\)](https://arxiv.org/abs/1710.10903) is a novel architecture that operates on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. Based on PGL, we reproduce the GAT algorithm and achieve results on par with those reported in the paper on the citation network benchmarks.
However, different from the reproduction in **examples/gat**, we use `pgl.graph_wrapper.StaticGraphWrapper` to preload the graph data into GPU or CPU memory once, which yields noticeably better training speed.
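A minimal sketch of what that means in code, mirroring the calls used in the training script below (with the dynamic `GraphWrapper`, the graph would instead be converted with `to_feed` and passed to every `exe.run` call):

```python
import pgl
import paddle.fluid as fluid
from pgl import data_loader

dataset = data_loader.CoraDataset()
place = fluid.CPUPlace()  # or fluid.CUDAPlace(0)

startup_program = fluid.Program()
train_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    # The whole graph is bound to the program as persistable data.
    gw = pgl.graph_wrapper.StaticGraphWrapper(
        name="graph", graph=dataset.graph, place=place)
    # ... build the GAT layers on top of `gw`, as in the script below ...

exe = fluid.Executor(place)
exe.run(startup_program)
gw.initialize(place)  # upload the graph to GPU/CPU memory exactly once

# Training steps can then run with an empty feed dict:
# exe.run(train_program, feed={}, fetch_list=[...])
```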
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (training is faster with 1.5)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)| examples/gat | Improvement |
| --- | --- | --- |---| --- | --- |
| Cora | ~83% | 0.0145s | 0.0119s | 0.0175s | 1.47x |
| Pubmed | ~78% | 0.0352s | 0.0193s |0.0295s | 1.53x |
| Citeseer | ~70% | 0.0148s | 0.0124s |0.0253s | 2.04x |
### How to run
For example, use GPU to train GAT on the Cora dataset.
```sh
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset "cora", "citeseer", "pubmed".
- use_cuda: Use gpu if assign use_cuda.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import pgl
from pgl import data_loader
from pgl.utils import paddle_helper
from pgl.utils.logger import log
import paddle.fluid as fluid
import numpy as np
import time
import argparse
def load(name):
if name == 'cora':
dataset = data_loader.CoraDataset()
elif name == "pubmed":
dataset = data_loader.CitationDataset("pubmed", symmetry_edges=False)
elif name == "citeseer":
dataset = data_loader.CitationDataset("citeseer", symmetry_edges=False)
else:
raise ValueError(name + " dataset doesn't exists")
return dataset
def main(args):
dataset = load(args.dataset)
train_index = dataset.train_index
train_label = np.expand_dims(dataset.y[train_index], -1)
train_index = np.expand_dims(train_index, -1)
val_index = dataset.val_index
val_label = np.expand_dims(dataset.y[val_index], -1)
val_index = np.expand_dims(val_index, -1)
test_index = dataset.test_index
test_label = np.expand_dims(dataset.y[test_index], -1)
test_index = np.expand_dims(test_index, -1)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
test_program = fluid.Program()
hidden_size = 16
with fluid.program_guard(train_program, startup_program):
gw = pgl.graph_wrapper.StaticGraphWrapper(
name="graph", graph=dataset.graph, place=place)
output = pgl.layers.gat(gw,
gw.node_feat["words"],
hidden_size,
activation="elu",
name="gat_layer_1",
num_heads=8,
feat_drop=0.6,
attn_drop=0.6,
is_test=False)
output = pgl.layers.gat(gw,
output,
dataset.num_classes,
num_heads=1,
activation=None,
name="gat_layer_2",
feat_drop=0.6,
attn_drop=0.6,
is_test=False)
val_program = train_program.clone(for_test=True)
test_program = train_program.clone(for_test=True)
initializer = []
with fluid.program_guard(train_program, startup_program):
train_node_index, init = paddle_helper.constant(
"train_node_index", dtype="int32", value=train_index)
initializer.append(init)
train_node_label, init = paddle_helper.constant(
"train_node_label", dtype="int64", value=train_label)
initializer.append(init)
pred = fluid.layers.gather(output, train_node_index)
train_loss_t = fluid.layers.softmax_with_cross_entropy(
logits=pred, label=train_node_label)
train_loss_t = fluid.layers.reduce_mean(train_loss_t)
adam = fluid.optimizer.Adam(
learning_rate=1e-2,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0005))
adam.minimize(train_loss_t)
with fluid.program_guard(val_program, startup_program):
val_node_index, init = paddle_helper.constant(
"val_node_index", dtype="int32", value=val_index)
initializer.append(init)
val_node_label, init = paddle_helper.constant(
"val_node_label", dtype="int64", value=val_label)
initializer.append(init)
pred = fluid.layers.gather(output, val_node_index)
val_loss_t, pred = fluid.layers.softmax_with_cross_entropy(
logits=pred, label=val_node_label, return_softmax=True)
val_acc_t = fluid.layers.accuracy(
input=pred, label=val_node_label, k=1)
val_loss_t = fluid.layers.reduce_mean(val_loss_t)
with fluid.program_guard(test_program, startup_program):
test_node_index, init = paddle_helper.constant(
"test_node_index", dtype="int32", value=test_index)
initializer.append(init)
test_node_label, init = paddle_helper.constant(
"test_node_label", dtype="int64", value=test_label)
initializer.append(init)
pred = fluid.layers.gather(output, test_node_index)
test_loss_t, pred = fluid.layers.softmax_with_cross_entropy(
logits=pred, label=test_node_label, return_softmax=True)
test_acc_t = fluid.layers.accuracy(
input=pred, label=test_node_label, k=1)
test_loss_t = fluid.layers.reduce_mean(test_loss_t)
exe = fluid.Executor(place)
exe.run(startup_program)
gw.initialize(place)
for init in initializer:
init(place)
dur = []
for epoch in range(200):
if epoch >= 3:
t0 = time.time()
train_loss = exe.run(train_program,
feed={},
fetch_list=[train_loss_t],
return_numpy=True)
train_loss = train_loss[0]
if epoch >= 3:
time_per_epoch = 1.0 * (time.time() - t0)
dur.append(time_per_epoch)
val_loss, val_acc = exe.run(val_program,
feed={},
fetch_list=[val_loss_t, val_acc_t],
return_numpy=True)
log.info("Epoch %d " % epoch + "(%.5lf sec) " % np.mean(
dur) + "Train Loss: %f " % train_loss + "Val Loss: %f " % val_loss
+ "Val Acc: %f " % val_acc)
test_loss, test_acc = exe.run(test_program,
feed={},
fetch_list=[test_loss_t, test_acc_t],
return_numpy=True)
log.info("Accuracy: %f" % test_acc)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GAT')
parser.add_argument(
"--dataset", type=str, default="cora", help="dataset (cora, pubmed)")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
args = parser.parse_args()
log.info(args)
main(args)
# PGL Examples for GCN with StaticGraphWrapper
[Graph Convolutional Network \(GCN\)](https://arxiv.org/abs/1609.02907) is a powerful neural network designed for machine learning on graphs. Based on PGL, we reproduce the GCN algorithm and achieve results on par with those reported in the paper on the citation network benchmarks.
However, different from the reproduction in **examples/gcn**, we use `pgl.graph_wrapper.StaticGraphWrapper` to preload the graph data into GPU or CPU memory once, which yields noticeably better training speed.
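One preprocessing detail worth knowing before reading the script: GCN relies on the symmetric normalization D^{-1/2} A D^{-1/2}, and the script precomputes the per-node factor d_i^{-1/2} once and attaches it to the graph as a node feature named `norm`. A sketch of just that step (the helper name is ours; the script inlines the same few lines):

```python
import numpy as np

def gcn_degree_norm(indegree):
    """Per-node factor d_i ** -0.5 for GCN's symmetric normalization.

    `indegree` is a 1-D array of node in-degrees (e.g. from `graph.indegree()`);
    isolated nodes keep a factor of 0 so nothing divides by zero.
    """
    norm = np.zeros_like(indegree, dtype="float32")
    norm[indegree > 0] = np.power(indegree[indegree > 0], -0.5)
    return np.expand_dims(norm, -1)  # shape [num_nodes, 1], stored as a node feature
```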
### Datasets
The datasets contain three citation networks: CORA, PUBMED, CITESEER. The details for these three datasets can be found in the [paper](https://arxiv.org/abs/1609.02907).
### Dependencies
- paddlepaddle>=1.4 (training is faster with 1.5)
- pgl
### Performance
We train our models for 200 epochs and report the accuracy on the test dataset.
| Dataset | Accuracy | Speed with paddle 1.4 <br> (epoch time) | Speed with paddle 1.5 <br> (epoch time)| examples/gcn | Improvement |
| --- | --- | --- |---| --- | --- |
| Cora | ~81% | 0.0053s | 0.0047s | 0.0104s | 2.21x |
| Pubmed | ~79% | 0.0105s | 0.0049s |0.0154s | 3.14x |
| Citeseer | ~71% | 0.0051s | 0.0045s |0.0177s | 3.93x |
### How to run
For example, use GPU to train GCN on the Cora dataset.
```sh
python train.py --dataset cora --use_cuda
```
#### Hyperparameters
- dataset: The citation dataset "cora", "citeseer", "pubmed".
- use_cuda: Use gpu if assign use_cuda.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import pgl
from pgl import data_loader
from pgl.utils import paddle_helper
from pgl.utils.logger import log
import paddle.fluid as fluid
import numpy as np
import time
import argparse
def load(name):
if name == 'cora':
dataset = data_loader.CoraDataset()
elif name == "pubmed":
dataset = data_loader.CitationDataset("pubmed", symmetry_edges=False)
elif name == "citeseer":
dataset = data_loader.CitationDataset("citeseer", symmetry_edges=False)
else:
raise ValueError(name + " dataset doesn't exists")
return dataset
def main(args):
dataset = load(args.dataset)
# normalize
indegree = dataset.graph.indegree()
norm = np.zeros_like(indegree, dtype="float32")
norm[indegree > 0] = np.power(indegree[indegree > 0], -0.5)
dataset.graph.node_feat["norm"] = np.expand_dims(norm, -1)
train_index = dataset.train_index
train_label = np.expand_dims(dataset.y[train_index], -1)
train_index = np.expand_dims(train_index, -1)
val_index = dataset.val_index
val_label = np.expand_dims(dataset.y[val_index], -1)
val_index = np.expand_dims(val_index, -1)
test_index = dataset.test_index
test_label = np.expand_dims(dataset.y[test_index], -1)
test_index = np.expand_dims(test_index, -1)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
train_program = fluid.Program()
startup_program = fluid.Program()
test_program = fluid.Program()
hidden_size = 16
with fluid.program_guard(train_program, startup_program):
gw = pgl.graph_wrapper.StaticGraphWrapper(
name="graph", graph=dataset.graph, place=place)
output = pgl.layers.gcn(gw,
gw.node_feat["words"],
hidden_size,
activation="relu",
norm=gw.node_feat['norm'],
name="gcn_layer_1")
output = fluid.layers.dropout(
output, 0.5, dropout_implementation='upscale_in_train')
output = pgl.layers.gcn(gw,
output,
dataset.num_classes,
activation=None,
norm=gw.node_feat['norm'],
name="gcn_layer_2")
val_program = train_program.clone(for_test=True)
test_program = train_program.clone(for_test=True)
initializer = []
with fluid.program_guard(train_program, startup_program):
train_node_index, init = paddle_helper.constant(
"train_node_index", dtype="int32", value=train_index)
initializer.append(init)
train_node_label, init = paddle_helper.constant(
"train_node_label", dtype="int64", value=train_label)
initializer.append(init)
pred = fluid.layers.gather(output, train_node_index)
train_loss_t = fluid.layers.softmax_with_cross_entropy(
logits=pred, label=train_node_label)
train_loss_t = fluid.layers.reduce_mean(train_loss_t)
adam = fluid.optimizer.Adam(
learning_rate=1e-2,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0005))
adam.minimize(train_loss_t)
with fluid.program_guard(val_program, startup_program):
val_node_index, init = paddle_helper.constant(
"val_node_index", dtype="int32", value=val_index)
initializer.append(init)
val_node_label, init = paddle_helper.constant(
"val_node_label", dtype="int64", value=val_label)
initializer.append(init)
pred = fluid.layers.gather(output, val_node_index)
val_loss_t, pred = fluid.layers.softmax_with_cross_entropy(
logits=pred, label=val_node_label, return_softmax=True)
val_acc_t = fluid.layers.accuracy(
input=pred, label=val_node_label, k=1)
val_loss_t = fluid.layers.reduce_mean(val_loss_t)
with fluid.program_guard(test_program, startup_program):
test_node_index, init = paddle_helper.constant(
"test_node_index", dtype="int32", value=test_index)
initializer.append(init)
test_node_label, init = paddle_helper.constant(
"test_node_label", dtype="int64", value=test_label)
initializer.append(init)
pred = fluid.layers.gather(output, test_node_index)
test_loss_t, pred = fluid.layers.softmax_with_cross_entropy(
logits=pred, label=test_node_label, return_softmax=True)
test_acc_t = fluid.layers.accuracy(
input=pred, label=test_node_label, k=1)
test_loss_t = fluid.layers.reduce_mean(test_loss_t)
exe = fluid.Executor(place)
exe.run(startup_program)
gw.initialize(place)
for init in initializer:
init(place)
dur = []
for epoch in range(200):
if epoch >= 3:
t0 = time.time()
train_loss = exe.run(train_program,
feed={},
fetch_list=[train_loss_t],
return_numpy=True)
train_loss = train_loss[0]
if epoch >= 3:
time_per_epoch = 1.0 * (time.time() - t0)
dur.append(time_per_epoch)
val_loss, val_acc = exe.run(val_program,
feed={},
fetch_list=[val_loss_t, val_acc_t],
return_numpy=True)
log.info("Epoch %d " % epoch + "(%.5lf sec) " % np.mean(
dur) + "Train Loss: %f " % train_loss + "Val Loss: %f " % val_loss
+ "Val Acc: %f " % val_acc)
test_loss, test_acc = exe.run(test_program,
feed={},
fetch_list=[test_loss_t, test_acc_t],
return_numpy=True)
log.info("Accuracy: %f" % test_acc)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
parser.add_argument(
"--dataset", type=str, default="cora", help="dataset (cora, pubmed)")
parser.add_argument("--use_cuda", action='store_true', help="use_cuda")
args = parser.parse_args()
log.info(args)
main(args)
graph_kernel.cpp
graph_kernel.*.so
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Generate pgl apis
"""
__version__ = "0.1.0.beta"
from pgl import layers
from pgl import graph_wrapper
from pgl import graph
from pgl import data_loader
This diff is collapsed.
This diff is collapsed.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This directory contains a selection of the Cora dataset (www.research.whizbang.com/data).
The Cora dataset consists of Machine Learning papers. These papers are classified into one of the following seven classes:
Case_Based
Genetic_Algorithms
Neural_Networks
Probabilistic_Methods
Reinforcement_Learning
Rule_Learning
Theory
The papers were selected in a way such that in the final corpus every paper cites or is cited by at least one other paper. There are 2708 papers in the whole corpus.
After stemming and removing stopwords we were left with a vocabulary of size 1433 unique words. All words with document frequency less than 10 were removed.
THE DIRECTORY CONTAINS TWO FILES:
The .content file contains descriptions of the papers in the following format:
<paper_id> <word_attributes>+ <class_label>
The first entry in each line contains the unique string ID of the paper followed by binary values indicating whether each word in the vocabulary is present (indicated by 1) or absent (indicated by 0) in the paper. Finally, the last entry in the line contains the class label of the paper.
The .cites file contains the citation graph of the corpus. Each line describes a link in the following format:
<ID of cited paper> <ID of citing paper>
Each line contains two paper IDs. The first entry is the ID of the paper being cited and the second ID stands for the paper which contains the citation. The direction of the link is from right to left. If a line is represented by "paper1 paper2" then the link is "paper2->paper1".
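A minimal sketch of reading these two files into a feature matrix and an edge list (file paths are illustrative; the repository's `CoraDataset` loader does the same job and additionally renumbers papers to contiguous integer ids):

```python
import numpy as np

def read_cora(content_path="cora.content", cites_path="cora.cites"):
    """Parse the .content and .cites files in the format described above."""
    paper_ids, features, labels = [], [], []
    with open(content_path) as f:
        for line in f:
            parts = line.strip().split()
            paper_ids.append(parts[0])                      # unique paper ID
            features.append([int(x) for x in parts[1:-1]])  # binary word indicators
            labels.append(parts[-1])                        # class label
    id2idx = {pid: i for i, pid in enumerate(paper_ids)}
    edges = []
    with open(cites_path) as f:
        for line in f:
            cited, citing = line.split()
            # the link goes from the citing paper to the cited paper
            edges.append((id2idx[citing], id2idx[cited]))
    return np.array(features, dtype="float32"), labels, edges
```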
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This package implements some benchmark dataset for graph network
and node representation learning.
"""
import os
import io
import sys
import numpy as np
import pickle as pkl
import networkx as nx
from pgl import graph
from pgl.utils.logger import log
__all__ = [
"CitationDataset",
"CoraDataset",
"ArXivDataset",
"BlogCatalogDataset",
]
def get_default_data_dir(name):
"""Get data path name"""
dir_path = os.path.abspath(os.path.dirname(__file__))
dir_path = os.path.join(dir_path, 'data')
filepath = os.path.join(dir_path, name)
return filepath
def _pickle_load(pkl_file):
"""Load pickle"""
if sys.version_info > (3, 0):
return pkl.load(pkl_file, encoding='latin1')
else:
return pkl.load(pkl_file)
def _parse_index_file(filename):
"""Parse index file."""
index = []
for line in open(filename):
index.append(int(line.strip()))
return index
class CitationDataset(object):
"""Citation dataset helps to create data for citation dataset (Pubmed and Citeseer)
Args:
name: The name for the dataset ("pubmed" or "citeseer")
symmetry_edges: Whether to create symmetry edges.
self_loop: Whether to contain self loop edges.
Attributes:
graph: The :code:`Graph` data object
y: Labels for each node
num_classes: Number of classes.
train_index: The index for nodes in training set.
val_index: The index for nodes in validation set.
test_index: The index for nodes in test set.
"""
def __init__(self, name, symmetry_edges=True, self_loop=True):
self.path = get_default_data_dir(name)
self.symmetry_edges = symmetry_edges
self.self_loop = self_loop
self.name = name
self._load_data()
def _load_data(self):
"""Load data
"""
objnames = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']
objects = []
for i in range(len(objnames)):
with open("{}/ind.{}.{}".format(self.path, self.name, objnames[i]),
'rb') as f:
objects.append(_pickle_load(f))
x, y, tx, ty, allx, ally, _graph = tuple(objects)
test_idx_reorder = _parse_index_file("{}/ind.{}.test.index".format(
self.path, self.name))
test_idx_range = np.sort(test_idx_reorder)
allx = allx.todense()
tx = tx.todense()
if self.name == 'citeseer':
# Fix citeseer dataset (there are some isolated nodes in the graph)
# Find isolated nodes, add them as zero-vecs into the right position
test_idx_range_full = range(
min(test_idx_reorder), max(test_idx_reorder) + 1)
tx_extended = np.zeros(
(len(test_idx_range_full), x.shape[1]), dtype="float32")
tx_extended[test_idx_range - min(test_idx_range), :] = tx
tx = tx_extended
ty_extended = np.zeros(
(len(test_idx_range_full), y.shape[1]), dtype="float32")
ty_extended[test_idx_range - min(test_idx_range), :] = ty
ty = ty_extended
features = np.vstack([allx, tx])
features[test_idx_reorder, :] = features[test_idx_range, :]
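# Row-normalize the bag-of-words features so each node's feature vector sums to (approximately) 1.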
features = features / (np.sum(features, axis=-1) + 1e-15)
features = np.array(features, dtype="float32")
_graph = nx.DiGraph(nx.from_dict_of_lists(_graph))
onehot_labels = np.vstack((ally, ty))
onehot_labels[test_idx_reorder, :] = onehot_labels[test_idx_range, :]
labels = np.argmax(onehot_labels, 1)
idx_test = test_idx_range.tolist()
idx_train = range(len(y))
idx_val = range(len(y), len(y) + 500)
all_edges = []
for i in _graph.edges():
u, v = tuple(i)
all_edges.append((u, v))
if self.symmetry_edges:
all_edges.append((v, u))
if self.self_loop:
for i in range(_graph.number_of_nodes()):
all_edges.append((i, i))
all_edges = list(set(all_edges))
self.graph = graph.Graph(
num_nodes=_graph.number_of_nodes(),
edges=all_edges,
node_feat={"words": features})
self.y = np.array(labels, dtype="int64")
self.num_classes = onehot_labels.shape[1]
self.train_index = np.array(idx_train, dtype="int32")
self.val_index = np.array(idx_val, dtype="int32")
self.test_index = np.array(idx_test, dtype="int32")
class CoraDataset(object):
"""Cora dataset implementation
Args:
symmetry_edges: Whether to create symmetry edges.
self_loop: Whether to contain self loop edges.
Attributes:
graph: The :code:`Graph` data object
y: Labels for each node
num_classes: Number of classes.
train_index: The index for nodes in training set.
val_index: The index for nodes in validation set.
test_index: The index for nodes in test set.
"""
def __init__(self, symmetry_edges=True, self_loop=True):
self.path = get_default_data_dir("cora")
self.symmetry_edges = symmetry_edges
self.self_loop = self_loop
self._load_data()
def _load_data(self):
"""Load data"""
content = os.path.join(self.path, 'cora.content')
cite = os.path.join(self.path, 'cora.cites')
node_feature = []
paper_ids = []
y = []
y_dict = {}
with open(content, 'r') as f:
for line in f:
line = line.strip().split()
paper_id = int(line[0])
paper_class = line[-1]
if paper_class not in y_dict:
y_dict[paper_class] = len(y_dict)
feature = [int(i) for i in line[1:-1]]
feature_array = np.array(feature, dtype="float32")
# Normalize
feature_array = feature_array / (np.sum(feature_array) + 1e-15)
node_feature.append(feature_array)
y.append(y_dict[paper_class])
paper_ids.append(paper_id)
paper2vid = dict([(v, k) for (k, v) in enumerate(paper_ids)])
num_nodes = len(paper_ids)
node_feature = np.array(node_feature, dtype="float32")
all_edges = []
with open(cite, 'r') as f:
for line in f:
u, v = line.split()
u = paper2vid[int(u)]
v = paper2vid[int(v)]
all_edges.append((u, v))
if self.symmetry_edges:
all_edges.append((v, u))
if self.self_loop:
for i in range(num_nodes):
all_edges.append((i, i))
all_edges = list(set(all_edges))
self.graph = graph.Graph(
num_nodes=num_nodes,
edges=all_edges,
node_feat={"words": node_feature})
perm = np.arange(0, num_nodes)
#np.random.shuffle(perm)
self.train_index = perm[:140]
self.val_index = perm[200:500]
self.test_index = perm[500:1500]
self.y = np.array(y, dtype="int64")
self.num_classes = len(y_dict)
class BlogCatalogDataset(object):
"""BlogCatalog dataset implementation
Args:
symmetry_edges: Whether to create symmetry edges.
self_loop: Whether to contain self loop edges.
Attributes:
graph: The :code:`Graph` data object.
num_groups: Number of classes.
train_index: The index for nodes in training set.
test_index: The index for nodes in test set.
"""
def __init__(self, symmetry_edges=True, self_loop=False):
self.path = get_default_data_dir("BlogCatalog")
self.num_groups = 39
self.symmetry_edges = symmetry_edges
self.self_loop = self_loop
self._load_data()
def _load_data(self):
edge_path = os.path.join(self.path, 'edges.csv')
node_path = os.path.join(self.path, 'nodes.csv')
group_edge_path = os.path.join(self.path, 'group-edges.csv')
all_edges = []
with io.open(node_path) as inf:
num_nodes = len(inf.readlines())
node_feature = np.zeros((num_nodes, self.num_groups))
with io.open(group_edge_path) as inf:
for line in inf:
node_id, group_id = line.strip('\n').split(',')
node_id, group_id = int(node_id) - 1, int(group_id) - 1
node_feature[node_id][group_id] = 1
with io.open(edge_path) as inf:
for line in inf:
u, v = line.strip('\n').split(',')
u, v = int(u) - 1, int(v) - 1
all_edges.append((u, v))
if self.symmetry_edges:
all_edges.append((v, u))
if self.self_loop:
for i in range(num_nodes):
all_edges.append((i, i))
all_edges = list(set(all_edges))
self.graph = graph.Graph(
num_nodes=num_nodes,
edges=all_edges,
node_feat={"group_id": node_feature})
perm = np.arange(0, num_nodes)
np.random.shuffle(perm)
train_num = int(num_nodes * 0.5)
self.train_index = perm[:train_num]
self.test_index = perm[train_num:]
class ArXivDataset(object):
"""ArXiv dataset implementation
Args:
np_random_seed: The random seed for numpy.
Attributes:
graph: The :code:`Graph` data object.
"""
def __init__(self, np_random_seed=123):
self.path = get_default_data_dir("arXiv")
self.np_random_seed = np_random_seed
self._load_data()
def _load_data(self):
np.random.seed(self.np_random_seed)
edge_path = os.path.join(self.path, 'ca-AstroPh.txt')
bi_edges = set()
self.neg_edges = []
self.pos_edges = []
self.node2id = dict()
def node_id(node):
if node not in self.node2id:
self.node2id[node] = len(self.node2id)
return self.node2id[node]
with io.open(edge_path) as inf:
for _ in range(4):
inf.readline()
for line in inf:
u, v = line.strip('\n').split('\t')
u, v = node_id(u), node_id(v)
if u < v:
bi_edges.add((u, v))
else:
bi_edges.add((v, u))
num_nodes = len(self.node2id)
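# Randomly sample node pairs as negative edges until there are half as many negatives as undirected positive edges.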
while len(self.neg_edges) < len(bi_edges) // 2:
random_edges = np.random.choice(num_nodes, [len(bi_edges), 2])
for (u, v) in random_edges:
if u != v and (u, v) not in bi_edges and (v, u
) not in bi_edges:
self.neg_edges.append((u, v))
if len(self.neg_edges) == len(bi_edges) // 2:
break
bi_edges = list(bi_edges)
np.random.shuffle(bi_edges)
self.pos_edges = bi_edges[:len(bi_edges) // 2]
bi_edges = bi_edges[len(bi_edges) // 2:]
all_edges = []
for edge in bi_edges:
u, v = edge
all_edges.append((u, v))
all_edges.append((v, u))
self.graph = graph.Graph(num_nodes=num_nodes, edges=all_edges)
This diff is collapsed.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Fast implementation for graph construction and sampling.
"""
import numpy as np
cimport numpy as np
cimport cython
from libcpp.map cimport map
from libcpp.set cimport set
from libcpp.unordered_set cimport unordered_set
from libcpp.unordered_map cimport unordered_map
from libcpp.vector cimport vector
from libc.stdlib cimport rand, RAND_MAX
@cython.boundscheck(False)
@cython.wraparound(False)
def build_index(np.ndarray[np.int32_t, ndim=1] u,
np.ndarray[np.int32_t, ndim=1] v,
int num_nodes):
"""Building Edge Index
"""
cdef int i
cdef int h=len(u)
cdef int n_size = num_nodes
cdef np.ndarray[np.int32_t, ndim=1] degree = np.zeros([n_size], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] count = np.zeros([n_size], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] _tmp_v = np.zeros([h], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] _tmp_u = np.zeros([h], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] _tmp_eid = np.zeros([h], dtype=np.int32)
cdef np.ndarray[np.int32_t, ndim=1] indptr = np.zeros([n_size + 1], dtype=np.int32)
with nogil:
for i in xrange(h):
degree[u[i]] += 1
for i in xrange(n_size):
indptr[i + 1] = indptr[i] + degree[i]
for i in xrange(h):
_tmp_v[indptr[u[i]] + count[u[i]]] = v[i]
_tmp_eid[indptr[u[i]] + count[u[i]]] = i
_tmp_u[indptr[u[i]] + count[u[i]]] = u[i]
count[u[i]] += 1
cdef list output_eid = []
cdef list output_v = []
for i in xrange(n_size):
output_eid.append(_tmp_eid[indptr[i]:indptr[i+1]])
output_v.append(_tmp_v[indptr[i]:indptr[i+1]])
return np.array(output_v), np.array(output_eid), degree, _tmp_u, _tmp_v, _tmp_eid
@cython.boundscheck(False)
@cython.wraparound(False)
def map_edges(np.ndarray[np.int32_t, ndim=1] eid,
np.ndarray[np.int32_t, ndim=2] edges,
reindex):
"""Mapping edges by given dictionary
"""
cdef unordered_map[int, int] m = reindex
cdef int i = 0
cdef int h = len(eid)
cdef np.ndarray[np.int32_t, ndim=2] r_edges = np.zeros([h, 2], dtype=np.int32)
cdef int j
with nogil:
for i in xrange(h):
j = eid[i]
r_edges[i, 0] = m[edges[j, 0]]
r_edges[i, 1] = m[edges[j, 1]]
return r_edges
@cython.boundscheck(False)
@cython.wraparound(False)
def map_nodes(nodes, reindex):
"""Mapping nodes by given dictionary
"""
cdef unordered_map[int, int] m = reindex
cdef int i = 0
cdef int h = len(nodes)
cdef np.ndarray[np.int32_t, ndim=1] new_nodes = np.zeros([h], dtype=np.int32)
cdef int j
for i in xrange(h):
j = nodes[i]
new_nodes[i] = m[j]
return new_nodes
@cython.boundscheck(False)
@cython.wraparound(False)
def node2vec_sample(np.ndarray[np.int32_t, ndim=1] succ,
np.ndarray[np.int32_t, ndim=1] prev_succ, int prev_node,
float p, float q):
"""Fast implement of node2vec sampling
"""
cdef int i
    cdef int succ_len = len(succ)
    cdef int prev_succ_len = len(prev_succ)
cdef vector[float] probs
cdef float prob_sum = 0
cdef unordered_set[int] prev_succ_set
for i in xrange(prev_succ_len):
prev_succ_set.insert(prev_succ[i])
cdef float prob
for i in xrange(succ_len):
if succ[i] == prev_node:
prob = 1. / p
elif prev_succ_set.find(succ[i]) != prev_succ_set.end():
prob = 1.
else:
prob = 1. / q
probs.push_back(prob)
prob_sum += prob
cdef float rand_num = float(rand())/RAND_MAX * prob_sum
cdef int sample_succ = 0
for i in xrange(succ_len):
rand_num -= probs[i]
if rand_num <= 0:
sample_succ = succ[i]
return sample_succ
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Generate layers api
"""
from pgl.layers import conv
from pgl.layers.conv import *
__all__ = []
__all__ += conv.__all__
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""This package implements common layers to help building
graph neural networks.
"""
import paddle.fluid as fluid
from pgl import graph_wrapper
from pgl.utils import paddle_helper
__all__ = ['gcn', 'gat']
def gcn(gw, feature, hidden_size, activation, name, norm=None):
"""Implementation of graph convolutional neural networks (GCN)
This is an implementation of the paper SEMI-SUPERVISED CLASSIFICATION
WITH GRAPH CONVOLUTIONAL NETWORKS (https://arxiv.org/pdf/1609.02907.pdf).
Args:
gw: Graph wrapper object (:code:`StaticGraphWrapper` or :code:`GraphWrapper`)
feature: A tensor with shape (num_nodes, feature_size).
hidden_size: The hidden size for gcn.
activation: The activation for the output.
        name: The name of the gcn layer.
        norm: If :code:`norm` is not None, then the feature will be normalized. Norm must
            be a tensor with shape (num_nodes,) and dtype float32.
Return:
A tensor with shape (num_nodes, hidden_size)
"""
def send_src_copy(src_feat, dst_feat, edge_feat):
return src_feat["h"]
size = feature.shape[-1]
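    # If the input feature is wider than the hidden size, project it down
    # before message passing so that smaller messages are sent along the
    # edges; otherwise aggregate first and apply the projection afterwards.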
if size > hidden_size:
feature = fluid.layers.fc(feature,
size=hidden_size,
bias_attr=False,
name=name)
if norm is not None:
feature = feature * norm
msg = gw.send(send_src_copy, nfeat_list=[("h", feature)])
    output = gw.recv(msg, "sum")
    if size <= hidden_size:
        output = fluid.layers.fc(output,
                                 size=hidden_size,
                                 bias_attr=False,
                                 name=name)
if norm is not None:
output = output * norm
bias = fluid.layers.create_parameter(
shape=[hidden_size],
dtype='float32',
is_bias=True,
name=name + '_bias')
output = fluid.layers.elementwise_add(output, bias, act=activation)
return output
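# A minimal usage sketch for `gcn` (illustrative only: it assumes a graph
# wrapper `gw` holding a node feature named "feature" has been built
# elsewhere; the layer name and sizes are arbitrary):
#
#     output = gcn(gw,
#                  gw.node_feat["feature"],
#                  hidden_size=16,
#                  activation="relu",
#                  name="gcn_layer_1")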
def gat(gw,
feature,
hidden_size,
activation,
name,
num_heads=8,
feat_drop=0.6,
attn_drop=0.6,
is_test=False):
"""Implementation of graph attention networks (GAT)
This is an implementation of the paper GRAPH ATTENTION NETWORKS
(https://arxiv.org/abs/1710.10903).
Args:
gw: Graph wrapper object (:code:`StaticGraphWrapper` or :code:`GraphWrapper`)
feature: A tensor with shape (num_nodes, feature_size).
hidden_size: The hidden size for gat.
activation: The activation for the output.
        name: The name of the gat layer.
num_heads: The head number in gat.
feat_drop: Dropout rate for feature.
attn_drop: Dropout rate for attention.
        is_test: Whether in test phase.
Return:
A tensor with shape (num_nodes, hidden_size * num_heads)
"""
def send_attention(src_feat, dst_feat, edge_feat):
output = src_feat["left_a"] + dst_feat["right_a"]
output = fluid.layers.leaky_relu(
output, alpha=0.2) # (num_edges, num_heads)
return {"alpha": output, "h": src_feat["h"]}
def reduce_attention(msg):
alpha = msg["alpha"] # lod-tensor (batch_size, seq_len, num_heads)
h = msg["h"]
alpha = paddle_helper.sequence_softmax(alpha)
old_h = h
h = fluid.layers.reshape(h, [-1, num_heads, hidden_size])
alpha = fluid.layers.reshape(alpha, [-1, num_heads, 1])
if attn_drop > 1e-15:
alpha = fluid.layers.dropout(
alpha,
dropout_prob=attn_drop,
is_test=is_test,
dropout_implementation="upscale_in_train")
h = h * alpha
h = fluid.layers.reshape(h, [-1, num_heads * hidden_size])
h = fluid.layers.lod_reset(h, old_h)
return fluid.layers.sequence_pool(h, "sum")
if feat_drop > 1e-15:
feature = fluid.layers.dropout(
feature,
dropout_prob=feat_drop,
is_test=is_test,
dropout_implementation='upscale_in_train')
ft = fluid.layers.fc(feature,
hidden_size * num_heads,
bias_attr=False,
name=name + '_weight')
left_a = fluid.layers.create_parameter(
shape=[num_heads, hidden_size],
dtype='float32',
name=name + '_gat_l_A')
right_a = fluid.layers.create_parameter(
shape=[num_heads, hidden_size],
dtype='float32',
name=name + '_gat_r_A')
reshape_ft = fluid.layers.reshape(ft, [-1, num_heads, hidden_size])
left_a_value = fluid.layers.reduce_sum(reshape_ft * left_a, -1)
right_a_value = fluid.layers.reduce_sum(reshape_ft * right_a, -1)
msg = gw.send(
send_attention,
nfeat_list=[("h", ft), ("left_a", left_a_value),
("right_a", right_a_value)])
output = gw.recv(msg, reduce_attention)
bias = fluid.layers.create_parameter(
shape=[hidden_size * num_heads],
dtype='float32',
is_bias=True,
name=name + '_bias')
bias.stop_gradient = True
output = fluid.layers.elementwise_add(output, bias, act=activation)
return output
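# A minimal usage sketch for `gat` (illustrative only: the graph wrapper `gw`
# and its node feature "feature" are assumed to be defined elsewhere):
#
#     output = gat(gw,
#                  gw.node_feat["feature"],
#                  hidden_size=8,
#                  activation="elu",
#                  name="gat_layer_1",
#                  num_heads=8)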
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import numpy as np
from pgl.graph import Graph
class GraphTest(unittest.TestCase):
def setUp(self):
num_nodes = 5
edges = [(0, 1), (1, 2), (3, 4)]
feature = np.random.randn(5, 100)
edge_feature = np.random.randn(3, 100)
self.graph = Graph(
num_nodes=num_nodes,
edges=edges,
node_feat={"feature": feature},
edge_feat={"edge_feature": edge_feature})
def test_subgraph_consistency(self):
node_index = [0, 2, 3, 4]
eid = [2]
subgraph = self.graph.subgraph(node_index, eid)
for key, value in subgraph.node_feat.items():
diff = value - self.graph.node_feat[key][node_index]
diff = np.sqrt(np.sum(diff * diff))
self.assertLessEqual(diff, 1e-6)
for key, value in subgraph.edge_feat.items():
diff = value - self.graph.edge_feat[key][eid]
diff = np.sqrt(np.sum(diff * diff))
self.assertLessEqual(diff, 1e-6)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
class ImportTest(unittest.TestCase):
def test_import_pgl_alone(self):
import pgl
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Logger setup
"""
import logging
log = logging.getLogger('pgl')
console = logging.StreamHandler()
log.addHandler(console)
formatter = logging.Formatter(
fmt='[%(levelname)s] %(asctime)s [%(filename)12s:%(lineno)5d]:\t%(message)s'
)
console.setFormatter(formatter)
log.setLevel(logging.DEBUG)
log.propagate = False
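# A minimal usage sketch (assuming this module is importable as
# `pgl.utils.logger`; adjust the import to wherever the file actually lives):
#
#     from pgl.utils.logger import log
#     log.info("loading graph data")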
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
The op package provides some common ops for building Paddle models.
"""
import numpy as np
import paddle.fluid as fluid
from pgl.utils import paddle_helper
def nested_lod_reset(data, reset_index):
"""Reset the lod information as the one in given reset_index.
This function apply :code:`fluid.layers.lod_reset` recursively
to all the tensor in nested data.
Args:
data: A dictionary of tensor or tensor.
reset_index: A variable which the target lod information comes from.
Return:
Return a dictionary of LodTensor of LodTensor.
"""
if data is None:
return None
elif isinstance(data, dict):
new_data = {}
for key, value in data.items():
new_data[key] = nested_lod_reset(value, reset_index)
return new_data
else:
return fluid.layers.lod_reset(data, reset_index)
def read_rows(data, index):
"""Slice tensor with given index from dictionary of tensor or tensor
This function helps to slice data from nested dictionary structure.
Args:
data: A dictionary of tensor or tensor
index: A tensor of slicing index
Returns:
Return a dictionary of tensor or tensor.
"""
if data is None:
return None
elif isinstance(data, dict):
new_data = {}
for key, value in data.items():
new_data[key] = read_rows(value, index)
return new_data
else:
return paddle_helper.gather(data, index)
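# A minimal usage sketch for `read_rows` (illustrative only: `node_feat` is a
# dictionary of node feature tensors and `node_index` an int tensor of row
# indices, both assumed to be defined elsewhere):
#
#     sub_feat = read_rows(node_feat, node_index)
#     # sub_feat keeps the same nested structure, with every tensor sliced to
#     # the selected rows.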