diff --git a/README.md b/README.md
index 682c2202ce43d6e61464621316cfc342df8431f8..97315d23e71ac3a0260a2ee9274db0a5f0b41eec 100644
--- a/README.md
+++ b/README.md
@@ -35,7 +35,7 @@ Users only need to call the ```sequence_ops``` functions provided by Paddle to e
 
 Although DGL does some kernel fusion optimization for general sum, max and other aggregate functions with scatter-gather. For **complex user-defined functions** with degree bucketing algorithm, the serial execution for each degree bucket cannot take full advantage of the performance improvement provided by GPU. However, operations on the PGL LodTensor-based message is performed in parallel, which can fully utilize GPU parallel optimization. In our experiments, PGL can reach up to 13 times the speed of DGL with complex user-defined functions. Even without scatter-gather optimization, PGL still has excellent performance. Of course, we still provide build-in scatter-optimized message aggregation functions.
 
-## Performance
+### Performance
 
 We test all the following GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs to get average speeds. And we report the accuracy on test dataset without early stoppping.
 
@@ -82,7 +82,7 @@ In most cases of large-scale graph learning, we need distributed graph storage a
 
 ## Highlight: Tons of Models
 
-The following are 13 graph learning models that have been implemented in the framework.
+The following are 13 graph learning models that have been implemented in the framework. See the details [here](https://pgl.readthedocs.io/en/latest/introduction.html#tons-of-models)
 
 |Model | feature |
 |---|---|
diff --git a/docs/source/md/introduction.md b/docs/source/md/introduction.md
index ec7a4bfe60604b5e7984843eb0c660e7ba391ede..e0474ef8cd814d2683e8af513afe72d79ff1fac2 100644
--- a/docs/source/md/introduction.md
+++ b/docs/source/md/introduction.md
@@ -41,7 +41,7 @@ Users only need to call the ``sequence_ops`` functions provided by Paddle to eas
 
 Although DGL does some kernel fusion optimization for general sum, max and other aggregate functions with scatter-gather. For **complex user-defined functions** with degree bucketing algorithm, the serial execution for each degree bucket cannot take full advantage of the performance improvement provided by GPU. However, operations on the PGL LodTensor-based message is performed in parallel, which can fully utilize GPU parallel optimization. In our experiments, PGL can reach up to 13 times the speed of DGL with complex user-defined functions. Even without scatter-gather optimization, PGL still has excellent performance. Of course, we still provide build-in scatter-optimized message aggregation functions.
 
-## Performance
+### Performance
 
 We test all the following GNN algorithms with Tesla V100-SXM2-16G running for 200 epochs to get average speeds. And we report the accuracy on test dataset without early stoppping.
 
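The scatter-optimized, LodTensor-based message aggregation described in the paragraphs touched by this diff is exposed through PGL's send/recv interface. Below is a minimal sketch of a sum-aggregation step, assuming the PGL 1.x `GraphWrapper` API with Paddle's `fluid.layers.sequence_pool` as the `sequence_ops` reducer; the function and variable names are illustrative, not taken from this diff.

```python
import paddle.fluid as fluid


def sum_aggregation_layer(gw, feature):
    """Sketch of one message-passing step on a pgl.graph_wrapper.GraphWrapper."""

    def send_func(src_feat, dst_feat, edge_feat):
        # Copy each source node's feature onto its outgoing edges as the message.
        return src_feat["h"]

    def recv_func(msg):
        # Messages arrive grouped per destination node as a LodTensor, so a
        # single Paddle sequence op reduces them in parallel rather than
        # looping over degree buckets.
        return fluid.layers.sequence_pool(msg, pool_type="sum")

    msg = gw.send(send_func, nfeat_list=[("h", feature)])
    return gw.recv(msg, recv_func)
```

Replacing `recv_func` with a user-defined reduction follows the same parallel path over the LodTensor messages, which is the case where the speedup over degree bucketing is claimed above.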