diff --git a/README.md b/README.md
index f52f4807252fc1f4e5da98aea2ec2824fcb0a9b3..3a58bf618ebcde862336f281b21ca5078dfb7ef0 100644
--- a/README.md
+++ b/README.md
@@ -1,36 +1,28 @@
-# PGL ReadMe
-# PGL README.md
-
 # Paddle Graph Learning (PGL)
-[API](https://xx) | [Tutorials](https://xx)
+[DOC](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/instruction.html) | [中文](./README.zh.md)
 Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
-
+
 
 We provide python interfaces for storing/reading/querying graph structured data and two fundamental computational interfaces, which are walk based paradigm and message-passing based paradigm as shown in the above framework of PGL, for building cutting-edge graph learning algorithms.  Combined with the PaddlePaddle deep learning framework, we are able to support both graph representation learning models and graph neural networks, and thus our framework has a wide range of graph-based applications.
 
 
 ## Highlight: Efficient and Flexible Message Passing Paradigm
 
-One of the most important benefits of graph neural networks compared to other models is the ability to use node-to-node connectivity information, but coding the communication between nodes is very cumbersome. At PGL we adopt **Message Passing Paradigm** similar to [DGL](https://github.com/dmlc/dgl) to help to build a customize graph neural network easily. Users only need to write ```send``` and ```recv``` functions to easily implement a simple GCN. As shown in the following figure, for the first step the send function is defined on the edges of the graph, and the user can customize the send function $\phi^e$ to send the message from the source to the target node. For the second step, the recv function $\phi^v$ is responsible for aggregating $\oplus$ messages together from different sources.
+One of the most important benefits of graph neural networks compared to other models is the ability to use node-to-node connectivity information, but coding the communication between nodes is very cumbersome. In PGL we adopt a **Message Passing Paradigm** similar to [DGL](https://github.com/dmlc/dgl) to help build customized graph neural networks easily. Users only need to write ```send``` and ```recv``` functions to implement a simple GCN. As shown in the following figure, in the first step the send function is defined on the edges of the graph, and the user can customize the send function to send the message from the source node to the target node. In the second step, the recv function is responsible for aggregating the messages from different sources.
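The two steps described above can be sketched in plain Python. This is only an illustration of the paradigm, not the PGL API: the edge list, the `node_feat` dictionary and the `message_passing` helper are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Toy directed graph: each edge is (source, target). Not a PGL data structure.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
node_feat = {0: 1.0, 1: 2.0, 2: 3.0}


def send(src_feat):
    """Edge-level function: build the message carried from source to target."""
    return src_feat


def recv(messages):
    """Node-level function: aggregate all messages that arrived at one node."""
    return sum(messages)


def message_passing(edges, node_feat, send, recv):
    # Step 1: apply send on every edge and deliver the message to its target.
    mailbox = defaultdict(list)
    for src, dst in edges:
        mailbox[dst].append(send(node_feat[src]))
    # Step 2: apply recv on every node that received at least one message.
    return {node: recv(msgs) for node, msgs in mailbox.items()}


new_feat = message_passing(edges, node_feat, send, recv)
print(new_feat)
```

Swapping in a different `send` or `recv` changes the model without touching the traversal logic, which is the flexibility the paradigm is meant to provide.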
+
+
 
-
+
 
 
-As shown in the left of the following figure, to adapt general user-defined message aggregate functions, DGL uses the degree bucketing method to combine nodes with the same degree into a batch and then apply an aggregate function $\oplus$ on each batch serially. For our PGL UDF aggregate function, we organize the message as a [LodTensor](http://www.paddlepaddle.org/documentation/docs/en/1.4/user_guides/howto/basic_concept/lod_tensor_en.html) in [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) taking the message as variable length sequences. And we **utilize the features of LodTensor in Paddle to obtain fast parallel aggregation**. 
+As shown on the left of the following figure, to support general user-defined message aggregation functions, DGL uses the degree bucketing method to combine nodes with the same degree into a batch and then applies an aggregate function on each batch serially. For our PGL UDF aggregate function, we organize the messages as a [LodTensor](http://www.paddlepaddle.org/documentation/docs/en/1.4/user_guides/howto/basic_concept/lod_tensor_en.html) in [PaddlePaddle](https://github.com/PaddlePaddle/Paddle), treating the messages as variable-length sequences, and we **utilize the features of LodTensor in Paddle to obtain fast parallel aggregation**.
 
 Users only need to call the ```sequence_ops``` functions provided by Paddle to easily implement efficient message aggregation. For example, ```sequence_pool``` can be used to sum the neighbor messages.
@@ -40,12 +32,15 @@ Users only need to call the ```sequence_ops``` functions provided by Paddle to e
         return fluid.layers.sequence_pool(msg, "sum")
 ```
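To make the contrast between the two layouts concrete, here is a minimal plain-Python sketch. It does not use Paddle or DGL and is not how either library is implemented; the flat `messages` buffer and the `offsets` list are assumed stand-ins for the variable-length sequence information that a LodTensor carries, and both function names are hypothetical.

```python
from collections import defaultdict

# Flat message buffer plus offsets: node i owns messages[offsets[i]:offsets[i + 1]].
messages = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
offsets = [0, 2, 3, 6]  # three nodes with in-degrees 2, 1 and 3


def sequence_sum(messages, offsets):
    """Sum each variable-length segment; segments are independent of each other."""
    return [sum(messages[lo:hi]) for lo, hi in zip(offsets, offsets[1:])]


def degree_bucketing_sum(messages, offsets):
    """Group nodes by in-degree, then process one bucket after another."""
    buckets = defaultdict(list)
    for node, (lo, hi) in enumerate(zip(offsets, offsets[1:])):
        buckets[hi - lo].append(node)
    out = [0.0] * (len(offsets) - 1)
    for degree in sorted(buckets):  # serial loop over the degree buckets
        for node in buckets[degree]:
            lo = offsets[node]
            out[node] = sum(messages[lo:lo + degree])
    return out


pooled = sequence_sum(messages, offsets)
bucketed = degree_bucketing_sum(messages, offsets)
print(pooled, bucketed)
```

Both functions return the same sums. The difference is structural: every segment in `sequence_sum` can be reduced independently, while `degree_bucketing_sum` has to walk the buckets one after another, so a graph with many distinct degrees needs many serial steps.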
 
-Although DGL does some kernel fusion optimization for general sum, max and other aggregate functions with scatter-gather. For **complex user-defined functions** with degree bucketing algorithm, the serial execution for each degree bucket cannot take full advantage of the performance improvement provided by GPU. However, operations on the PGL LodTensor-based message is performed in parallel, which can fully utilize GPU parallel optimization. Even without scatter-gather optimization, PGL still has excellent performance. Of course, we still provide build-in scatter-optimized message aggregation functions.
+
+Although DGL does some kernel fusion optimization for general sum, max and other aggregate functions with scatter-gather, **complex user-defined functions** fall back to the degree bucketing algorithm, where the serial execution of each degree bucket cannot take full advantage of the GPU. In contrast, operations on PGL's LodTensor-based messages are performed in parallel, which can fully utilize GPU parallelism. In our experiments, PGL reaches up to 13 times the speed of DGL with complex user-defined functions. Even without scatter-gather optimization, PGL still performs well. We also provide built-in scatter-optimized message aggregation functions.
 
 ## Performance
 
+
 We test all the GNN algorithms on a Tesla V100-SXM2-16G, running each for 200 epochs to get the average speed, and we report the accuracy on the test dataset without early stopping.
-| Dataset | Model |  PGL Accuracy | PGL speed (epoch time) | DGL speed (epoch time) |
+
+| Dataset | Model |  PGL Accuracy | PGL speed (epoch time) | DGL 0.3.0 speed (epoch time) |
 | -------- | ----- | ----------------- | ------------ | ------------------------------------ |
 | Cora | GCN |81.75% | 0.0047s | **0.0045s** |
 | Cora | GAT | 83.5% | **0.0119s** | 0.0141s |
@@ -54,12 +49,22 @@ We test all the GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs t
 | Citeseer | GCN |70.2%| **0.0045s** |0.0046s|
 | Citeseer | GAT |68.8%| **0.0124s** |0.0139s|
 
+If we use a complex user-defined aggregation like [GraphSAGE-LSTM](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf), which aggregates neighbor features with an LSTM while ignoring the order of received messages, the optimized message passing in DGL is forced to degenerate into the degree bucketing scheme, which is much slower than the PGL implementation. Performance varies with the scale of the graph; in our experiments, PGL reaches up to 13 times the speed of DGL.
+
+| Dataset |   PGL speed (epoch time) | DGL 0.3.0 speed (epoch time) | Speed up|
+| -------- |  ------------ | ------------------------------------ |----|
+| Cora | **0.0186s** | 0.1638s | 8.80x|
+| Pubmed | **0.0388s** |0.5275s | 13.59x|
+| Citeseer | **0.0150s** | 0.1278s | 8.52x |
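
The speed-up column is the DGL epoch time divided by the PGL epoch time. The short check below recomputes it from the values in the table; the `speed_up` helper is a hypothetical name, and truncating to two decimals is an assumption made because it reproduces the published figures (rounding would give 8.81x and 13.60x for Cora and Pubmed).

```python
from decimal import Decimal, ROUND_DOWN

# Epoch times in seconds, copied from the table above: (PGL, DGL 0.3.0).
epoch_times = {
    "Cora": ("0.0186", "0.1638"),
    "Pubmed": ("0.0388", "0.5275"),
    "Citeseer": ("0.0150", "0.1278"),
}


def speed_up(pgl_time, dgl_time):
    """Return DGL time / PGL time, truncated to two decimal places."""
    ratio = Decimal(dgl_time) / Decimal(pgl_time)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


speed_ups = {name: speed_up(pgl, dgl) for name, (pgl, dgl) in epoch_times.items()}
for name, ratio in speed_ups.items():
    print(f"{name}: {ratio}x")
```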
+
+
+
 ## System requirements
 
 PGL requires:
 
 * paddle >= 1.6
-* networkx 
+* cython
 
 
 PGL supports both Python 2 & 3
@@ -67,10 +72,11 @@ PGL supports both Python 2 & 3
 
 ## Installation
 
-pip install pgl
-
-
+The current version of PGL is 1.0.0. You can simply install it via pip.
 
+```sh
+pip install pgl
+```
 
 ## The Team
 
diff --git a/examples/distribute_deepwalk/README.md b/examples/distribute_deepwalk/README.md
index d3839f60a623bfc3f98afed619f13d394c7aa0ee..c0c71fb45935093a34d24431d5db01f9d2d16a8d 100644
--- a/examples/distribute_deepwalk/README.md
+++ b/examples/distribute_deepwalk/README.md
@@ -1,4 +1,4 @@
-# distributed deepwalk in PGL
+# Distributed Deepwalk in PGL
 [Deepwalk](https://arxiv.org/pdf/1403.6652.pdf) is an algorithmic framework for representation learning on graphs. Given any graph, it can learn continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks. Based on PGL, we reproduce the distributed deepwalk algorithm and reach metrics on par with those reported in the paper.
 
 ## Datasets
 
 