Unverified commit b505f7d4, authored by Huang Zhengjie, committed by GitHub

Merge pull request #3 from PaddlePaddle/master

Merge From paddle/PGL
<img src="./docs/source/_static/logo.png" alt="The logo of Paddle Graph Learning (PGL)" width="320">
[DOC](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/instruction.html) | [中文](./README.zh.md)
[DOC](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/quick_start/instruction.html) | [中文](./README.zh.md)
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
@@ -76,12 +76,12 @@ Because of the different node types on the heterogeneous graph, the message deli
## Large-Scale: Support distributed graph storage and distributed training algorithms
In most cases of large-scale graph learning, we need distributed graph storage and distributed training support. As shown in the following figure, PGL provides a general solution for large-scale training: we adopted [PaddleFleet](https://github.com/PaddlePaddle/Fleet) as our distributed parameter server, which supports large-scale distributed embeddings, together with a lightweight distributed storage engine, so tcan easily set up large-scale distributed training algorithms on MPI clusters.
In most cases of large-scale graph learning, we need distributed graph storage and distributed training support. As shown in the following figure, PGL provides a general solution for large-scale training: we adopted [PaddleFleet](https://github.com/PaddlePaddle/Fleet) as our distributed parameter server, which supports large-scale distributed embeddings, together with a lightweight distributed storage engine, so it can easily set up large-scale distributed training algorithms on MPI clusters.
<img src="./docs/source/_static/distributed_frame.png" alt="The distributed frame of PGL" width="800">
## Highlight: Tons of Models
## Model Zoo
The following are 13 graph learning models that have been implemented in the framework. See the details [here](https://pgl.readthedocs.io/en/latest/introduction.html#highlight-tons-of-models).
|Model | feature |
@@ -125,6 +125,8 @@ pip install pgl
PGL is developed and maintained by the NLP and Paddle teams at Baidu.
E-mail: nlp-gnn[at]baidu.com
## License
PGL uses Apache License 2.0.
<img src="./docs/source/_static/logo.png" alt="The logo of Paddle Graph Learning (PGL)" width="320">
[Documentation](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/instruction.html) | [English](./README.md)
[Documentation](https://pgl.readthedocs.io/en/latest/) | [Quick Start](https://pgl.readthedocs.io/en/latest/quick_start/instruction.html) | [English](./README.md)
Paddle Graph Learning (PGL) is an efficient and easy-to-use graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).
@@ -28,7 +28,7 @@ Paddle Graph Learning (PGL) is an efficient and easy-to-use graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Padd
    return fluid.layers.sequence_pool(msg, "sum")
```
Although DGL applies kernel fusion to optimize common aggregation functions such as sum and max with scatter-gather, for **complex user-defined functions** its Degree Bucketing algorithm processes the different buckets serially and does not fully exploit the GPU for acceleration. In PGL, by contrast, message passing based on LodTensor can make full use of GPU parallelism: with complex user-defined functions, PGL ran up to 13x faster than DGL in our experiments. Even without the scatter-gather optimization, PGL still performs efficiently. Of course, we also provide scatter-optimized aggregation functions.
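To make this concrete: a user-defined reduce function in PGL receives all messages arriving at one destination node as a single LoDTensor sequence, so one pooling op aggregates every node's inbox in parallel. A minimal sketch follows; the send-function signature and the feature name `h` mirror PGL's send/recv pattern, but treat the exact names as assumptions:

```python
import paddle.fluid as fluid

def copy_send(src_feat, dst_feat, edge_feat):
    # Message function: every edge forwards the source node's
    # feature "h" unchanged ("h" is an assumed feature name).
    return src_feat["h"]

def sum_recv(msg):
    # Reduce function: messages for the same destination node form
    # one LoDTensor sequence, pooled in parallel on the GPU.
    return fluid.layers.sequence_pool(msg, pool_type="sum")
```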
### Performance Tests
@@ -75,12 +75,12 @@ Paddle Graph Learning (PGL) is an efficient and easy-to-use graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Padd
<img src="./docs/source/_static/distributed_frame.png" alt="The distributed frame of PGL" width="800">
## Highlight: Richness - Covering Most Graph Learning Networks in the Industry
## Richness - Covering Most Graph Learning Networks in the Industry
The framework already ships with the following thirteen graph learning models
The framework already ships with the following thirteen graph learning models. See the details [here](https://pgl.readthedocs.io/en/latest/introduction.html#highlight-tons-of-models)
| Model | Feature |
|---|---|--- |
|---|---|
| GCN | Graph convolutional network |
| GAT | Attention-based graph convolutional network |
| GraphSage | Large-scale graph convolutional network with neighbor sampling |
@@ -121,6 +121,8 @@ pip install pgl
PGL is developed and maintained by the NLP and Paddle teams at Baidu.
Contact E-mail: nlp-gnn[at]baidu.com
## License
PGL uses Apache License 2.0.
docs/source/_static/logo.png (image changed: 50.4 KB -> 45.2 KB)
@@ -96,7 +96,7 @@ In most cases of large-scale graph learning, we need distributed graph storage a
<div/>
## Highlight: Tons of Models
## Model Zoo
The following are 13 graph learning models that have been implemented in the framework.
|Model | feature |
......
@@ -95,7 +95,7 @@ After defining the GCN layer, we can construct a deeper GCN model with two GCN l
```python
output = gcn_layer(gw, gw.node_feat['feature'],
                   hidden_size=8, name='gcn_layer_1', activation='relu')
output = gcn_layer(gw, output, hidden_size=2,
output = gcn_layer(gw, output, hidden_size=1,
                   name='gcn_layer_2', activation=None)
```
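For context, `gcn_layer` here is the helper defined earlier in the tutorial on top of the graph wrapper's send/recv interface. A minimal sketch of such a layer, modeled on the quick-start example (the exact signatures and names are assumptions, not a verbatim copy):

```python
import paddle.fluid as fluid

def gcn_layer(gw, feature, hidden_size, name, activation):
    # gw: a PGL GraphWrapper; feature: the node feature tensor.
    def send_func(src_feat, dst_feat, edge_feat):
        return src_feat['h']  # each edge forwards the source feature

    def recv_func(msg):
        # sum all messages arriving at each destination node
        return fluid.layers.sequence_pool(msg, pool_type='sum')

    msg = gw.send(send_func, nfeat_list=[('h', feature)])
    output = gw.recv(msg, recv_func)
    # linear transform plus optional nonlinearity
    return fluid.layers.fc(output, size=hidden_size,
                           bias_attr=False, act=activation, name=name)
```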
......
@@ -108,9 +108,7 @@ def build_complied_prog(train_program, model_loss):
    compiled_prog = F.compiler.CompiledProgram(
        train_program).with_data_parallel(
            loss_name=model_loss.name,
            build_strategy=build_strategy,
            exec_strategy=exec_strategy)
            loss_name=model_loss.name)
    return compiled_prog
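Put together, the helper after this change reduces to the following sketch (assuming the module imports `paddle.fluid as F`, as the `F.compiler` reference above suggests):

```python
import paddle.fluid as F

def build_complied_prog(train_program, model_loss):
    # The explicit build_strategy/exec_strategy arguments are dropped,
    # so with_data_parallel falls back to its default strategies.
    compiled_prog = F.compiler.CompiledProgram(
        train_program).with_data_parallel(loss_name=model_loss.name)
    return compiled_prog
```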
......
@@ -253,10 +253,22 @@ def sample_subset_with_eid(list nids, list eids, long long maxdegree, shuffle=Fa
@cython.boundscheck(False)
@cython.wraparound(False)
def skip_gram_gen_pair(vector[long long] walk, long win_size=5):
def skip_gram_gen_pair(vector[long long] walk_path, long win_size=5):
"""Return node paris generated by skip-gram algorithm.
This function will auto remove the pair which src node is the same
as dst node.
Args:
walk_path: List of nodes as a walk path.
win_size: the windows size used in skip-gram.
Return:
A tuple of (src node list, dst node list).
"""
    cdef vector[long long] src
    cdef vector[long long] dst
    cdef long long l = len(walk)
    cdef long long l = len(walk_path)
    cdef long long real_win_size, left, right, i
    cdef np.ndarray[np.int64_t, ndim=1] rnd = np.random.randint(1, win_size+1,
                                                                dtype=np.int64, size=l)
@@ -270,15 +282,23 @@ def skip_gram_gen_pair(vector[long long] walk, long win_size=5):
        if right >= l:
            right = l - 1
        for j in xrange(left, right+1):
            if walk[i] == walk[j]:
            if walk_path[i] == walk_path[j]:
                continue
            src.push_back(walk[i])
            dst.push_back(walk[j])
            src.push_back(walk_path[i])
            dst.push_back(walk_path[j])
    return src, dst
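A pure-Python sketch of the same pair generation may be easier to read than the Cython (the lower clamp of `left` at 0 is elided in the hunk above, so it is an assumption here):

```python
import numpy as np

def skip_gram_gen_pair_py(walk_path, win_size=5):
    # Illustrative Python equivalent of the Cython routine above:
    # each center i draws a window radius uniformly from [1, win_size],
    # and pairs with src == dst are skipped.
    src, dst = [], []
    l = len(walk_path)
    rnd = np.random.randint(1, win_size + 1, size=l)
    for i in range(l):
        left = max(i - rnd[i], 0)        # assumed lower clamp
        right = min(i + rnd[i], l - 1)   # upper clamp, as in the hunk
        for j in range(left, right + 1):
            if walk_path[i] == walk_path[j]:
                continue
            src.append(walk_path[i])
            dst.append(walk_path[j])
    return src, dst
```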
@cython.boundscheck(False)
@cython.wraparound(False)
def alias_sample_build_table(np.ndarray[np.float64_t, ndim=1] probs):
"""Return the alias table and event table for alias sampling.
Args:
porobs: A list of float numbers as the probability.
Return:
A tuple of (alias table, event table).
"""
    cdef long long l = len(probs)
    cdef np.ndarray[np.float64_t, ndim=1] alias = probs * l
    cdef np.ndarray[np.int64_t, ndim=1] events = np.zeros(l, dtype=np.int64)
......
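The visible setup lines match the standard alias method: probabilities are scaled by `l`, and entries above 1 donate mass to entries below 1. Since the construction loop is elided above, here is a pure-Python sketch of that scheme (Vose's method) plus an O(1) draw; treat it as a reconstruction, not the actual Cython body:

```python
import numpy as np

def alias_build_py(probs):
    # Illustrative alias-table construction (Vose's method).
    l = len(probs)
    alias = np.asarray(probs, dtype=np.float64) * l  # scaled probabilities
    events = np.zeros(l, dtype=np.int64)             # donor event per slot
    small = [i for i in range(l) if alias[i] < 1.0]
    large = [i for i in range(l) if alias[i] >= 1.0]
    while small and large:
        s, g = small.pop(), large.pop()
        events[s] = g                  # slot s is topped up by event g
        alias[g] -= 1.0 - alias[s]     # g donates exactly the shortfall
        (small if alias[g] < 1.0 else large).append(g)
    return alias, events

def alias_draw_py(alias, events):
    # O(1) sampling: pick a slot uniformly, then keep it or its donor.
    i = np.random.randint(len(alias))
    return i if np.random.random() < alias[i] else events[i]
```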