From bb2a4d993863866d4039f732095e178f595ff748 Mon Sep 17 00:00:00 2001
From: yelrose <270018958@qq.com>
Date: Tue, 25 Jun 2019 22:21:54 +0800
Subject: [PATCH] Add LSTM-Pool compared

---
 docs/source/md/introduction.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/source/md/introduction.md b/docs/source/md/introduction.md
index bd42565..6dd06fa 100644
--- a/docs/source/md/introduction.md
+++ b/docs/source/md/introduction.md
@@ -35,8 +35,8 @@ Users only need to call the ```sequence_ops``` functions provided by Paddle to e
     return fluid.layers.sequence_pool(msg, "sum")
 ```
 
-Although DGL does some kernel fusion optimization for general sum, max and other aggregate functions with scatter-gather. For **complex user-defined functions** with degree bucketing algorithm, the serial execution for each degree bucket cannot take full advantage of the performance improvement provided by GPU. However, operations on the PGL LodTensor-based message is performed in parallel, which can fully utilize GPU parallel optimization. Even without scatter-gather optimization, PGL still has excellent performance. Of course, we still provide build-in scatter-optimized message aggregation functions.
+Although DGL applies kernel fusion with scatter-gather to optimize general aggregate functions such as sum and max, **complex user-defined functions** fall back to its degree-bucketing algorithm, whose serial execution over each degree bucket cannot take full advantage of the GPU. Operations on PGL's LodTensor-based messages, in contrast, are performed in parallel and fully utilize the GPU. In our experiments, PGL reaches up to 13 times the speed of DGL with complex user-defined functions. Even without scatter-gather optimization, PGL still delivers excellent performance, and of course we also provide built-in scatter-optimized message aggregation functions.
 
 ## Performance
 
 We test all the GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs t
 | Pubmed | GAT | 77% |0.0193s|**0.0144s**|
 | Citeseer | GCN |70.2%| **0.0045** |0.0046s|
 | Citeseer | GAT |68.8%| **0.0124s** |0.0139s|
+
+If we use a complex user-defined aggregation such as [GraphSAGE-LSTM](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf), which aggregates neighbor features with an LSTM while ignoring the order of received messages, the optimized message passing in DGL is forced to degenerate into the degree-bucketing scheme, and its speed becomes much slower than the PGL implementation. Performance varies with the scale of the graph; in our experiments, PGL reaches up to 13 times the speed of DGL.
+
+| Dataset | PGL speed (epoch time) | DGL 0.3.0 speed (epoch time) | Speed up |
+| -------- | ---------------------- | ---------------------------- | -------- |
+| Cora | **0.0186s** | 0.1638s | 8.80x |
+| Pubmed | **0.0388s** | 0.5275s | 13.59x |
+| Citeseer | **0.0150s** | 0.1278s | 8.52x |
-- 
GitLab
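
For context on the two kinds of reduce functions this patch compares, below is a minimal sketch written against the Paddle fluid 1.x `sequence_ops` API. It is not part of the patch: the function names, the hidden size, and the use of `fluid.layers.fc` plus `fluid.layers.dynamic_lstm` for the LSTM-style aggregator are illustrative assumptions.

```python
import paddle.fluid as fluid

HIDDEN = 64  # assumed message/feature width, not taken from the patch


def sum_reducer(msg):
    # Built-in style aggregation: one sequence_pool reduces every node's
    # variable-length message sequence (a LodTensor) in a single parallel op.
    return fluid.layers.sequence_pool(msg, "sum")


def lstm_reducer(msg):
    # GraphSAGE-LSTM style aggregation: run an LSTM over each node's message
    # sequence and keep the last hidden state. dynamic_lstm expects its input
    # width to be 4 * hidden size, hence the projection in front.
    proj = fluid.layers.fc(input=msg, size=HIDDEN * 4)
    hidden, _ = fluid.layers.dynamic_lstm(input=proj, size=HIDDEN * 4)
    return fluid.layers.sequence_pool(hidden, "last")
```

Either function could serve as the body of a user-defined recv function like the one whose `sequence_pool` call appears in the first hunk. Because the messages arrive as one LodTensor, both variants operate on all nodes' sequences at once, whereas a degree-bucketing backend would have to execute the LSTM variant bucket by bucket.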