{
 "cells": [
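  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "PGL is a graph neural network (GNN) framework built on PaddlePaddle that lets users build their own GNN models quickly.\n",
    "\n",
    "To help you get started, this tutorial has two goals:\n",
    "* Understand how PGL performs computation on a graph.\n",
    "* Use PGL to implement a simple GNN model that performs binary classification of the nodes in a graph."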
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 1: Create a graph with PGL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pgl\n",
    "from pgl import graph  # import PGL's graph module\n",
    "import numpy as np\n",
    "\n",
    "def build_graph():\n",
    "    # Number of nodes; each node is identified by an integer from 0 to 9.\n",
    "    num_node = 10\n",
    "    # Edges between nodes; each edge is a tuple (src, dst).\n",
    "    edge_list = [(2, 0), (2, 1), (3, 1), (4, 0), (5, 0),\n",
    "                 (6, 0), (6, 4), (6, 5), (7, 0), (7, 1),\n",
    "                 (7, 2), (7, 3), (8, 0), (9, 7)]\n",
    "\n",
    "    # Each node can carry a d-dimensional feature vector; here the features are generated randomly.\n",
    "    # In PGL, node features are supplied as numpy arrays.\n",
    "    d = 16\n",
    "    feature = np.random.randn(num_node, d).astype(\"float32\")\n",
    "    #feature = np.array(feature,  dtype=\"float32\")\n",
    "    # Edges can carry features too; here each edge gets a scalar weight as its feature.\n",
    "    edge_feature = np.random.randn(len(edge_list), 1).astype(\"float32\")\n",
    "\n",
    "    # Build the full graph from the nodes, the edges, and their features.\n",
    "    # In PGL, node features and edge features are each stored in a dict.\n",
    "    g = graph.Graph(num_nodes = num_node,\n",
    "                    edges = edge_list, \n",
    "                    node_feat = {'feature':feature}, \n",
    "                    edge_feat ={'edge_feature': edge_feature})\n",
    "\n",
    "    return g"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a graph object, which holds all of the graph's data.\n",
    "g = build_graph()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 10 nodes in the graph.\n",
      "There are 14 edges in the graph.\n"
     ]
    }
   ],
   "source": [
    "# Print the number of nodes and the number of edges in the graph.\n",
    "print('There are %d nodes in the graph.'%g.num_nodes)\n",
    "print('There are %d edges in the graph.'%g.num_edges)"
   ]
  },
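  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because every edge is a directed `(src, dst)` pair, per-node degrees follow directly from the edge list. The sketch below recomputes them with NumPy on a copy of the tutorial's edges. It is an illustration only: it does not call PGL, and the variable names `edges`, `in_degree`, and `out_degree` are assumptions introduced here.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "num_node = 10\n",
    "# Same (src, dst) edge list as build_graph() in this tutorial.\n",
    "edges = np.array([(2, 0), (2, 1), (3, 1), (4, 0), (5, 0),\n",
    "                  (6, 0), (6, 4), (6, 5), (7, 0), (7, 1),\n",
    "                  (7, 2), (7, 3), (8, 0), (9, 7)])\n",
    "\n",
    "# Count how many edges arrive at, and leave from, each node.\n",
    "in_degree = np.bincount(edges[:, 1], minlength=num_node)\n",
    "out_degree = np.bincount(edges[:, 0], minlength=num_node)\n",
    "\n",
    "print(in_degree.tolist())   # [6, 3, 1, 1, 1, 1, 0, 1, 0, 0]\n",
    "print(out_degree.tolist())  # [0, 0, 2, 1, 1, 1, 3, 4, 1, 1]\n",
    "```\n",
    "\n",
    "Both arrays sum to 14, the number of edges reported above. Nodes 6, 8, and 9 have in-degree 0, which matters later: they receive no messages during message passing."
   ]
  },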
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides printing these counts, we can also visualize the whole graph. The cell below shows how to draw it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import networkx as nx  # networkx is a widely used Python package for working with and drawing graphs\n",
    "\n",
    "def display_graph(g):\n",
    "    nx_G = nx.Graph()\n",
    "    nx_G.add_nodes_from(range(g.num_nodes))\n",
    "    for line in g.edges:\n",
    "        nx_G.add_edge(*line)\n",
    "    nx.draw(nx_G, with_labels=True,\n",
    "            node_color=['y','g','g','g','y','y','y','g','y','g'], node_size=1000)\n",
    "    foo_fig = plt.gcf() # 'get current figure'\n",
    "    foo_fig.savefig('gcn.png', format='png', dpi=1000)\n",
    "    #foo_fig.savefig('./foo.pdf', format='pdf')  # alternatively, save as PDF\n",
    "\n",
    "    plt.show()\n",
    "\n",
    "display_graph(g)\n",
    "\n"
   ]
  },
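  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In PGL, a graph object stores the graph data. To build a graph neural network we also need a GraphWrapper, which serves as the container for the graph data inside the model."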
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import paddle.fluid as fluid\n",
    "use_cuda = False  \n",
    "place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()\n",
    "\n",
    "gw = pgl.graph_wrapper.GraphWrapper(name='graph',\n",
    "                place = place,\n",
    "                node_feat=g.node_feat_info(),\n",
    "                edge_feat=g.edge_feat_info())"
   ]
  },
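  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Build a graph convolutional network (GCN)\n",
    "\n",
    "In this tutorial we use a graph convolutional network ([Kipf and Welling](https://arxiv.org/abs/1609.02907)) as the node classifier. For simplicity we use the most basic GCN structure; readers who want more depth can consult the original paper.\n",
    "\n",
    "* In layer $l$, each node $u_i^l$ has a feature vector $h_i^l$;\n",
    "* In each layer, the idea of GCN is that the feature vector $h_i^{l+1}$ of node $u_i^{l+1}$ in the next layer is obtained by combining the weighted feature vectors of all of that node's neighbors and then applying a nonlinear transformation.\n",
    "\n",
    "GCN fits the message-passing paradigm: once all of a node's neighbors have sent their messages, the node can update its own feature vector according to the definition above.\n",
    "\n",
    "A GCN layer is easy to implement in PGL, as shown below:"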
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A custom GCN layer function\n",
    "def gcn_layer(gw, nfeat, efeat, hidden_size, name, activation):\n",
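    "    # gw is a GraphWrapper; nfeat holds the node features and efeat the edge features.\n",
    "    \n",
    "    # Define the message function.\n",
    "    def send_func(src_feat, dst_feat, edge_feat): \n",
    "        # Note: the three-parameter signature is fixed, even though dst_feat is unused here.\n",
    "        # In this tutorial the message is the source node's feature vector scaled by the edge feature. Users can define their own message function.\n",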
    "        return src_feat['h'] * edge_feat['e']\n",
    "\n",
    "    # Define the reduce function; its argument feat is the output of the message function.\n",
    "    def recv_func(feat):\n",
    "        # Sum the messages arriving from the source nodes.\n",
    "        # feat is a LoDTensor; see the Paddle documentation for details on LoDTensor.\n",
    "        return fluid.layers.sequence_pool(feat, pool_type='sum')\n",
    "\n",
    "    # send triggers the message function and returns the resulting messages.\n",
    "    msg = gw.send(send_func, nfeat_list=[('h', nfeat)], efeat_list=[('e', efeat)])\n",
    "    # recv receives the messages and triggers the reduce function to process them.\n",
    "    output = gw.recv(msg, recv_func) \n",
    "    # Fully connected output layer that uses activation as its activation function.\n",
    "    output = fluid.layers.fc(output, size=hidden_size, bias_attr=False, act=activation, name=name)\n",
    "    return output"
   ]
  },
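  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read as a formula, `gcn_layer` above computes roughly the following update. This is a sketch in notation introduced here, not taken from the paper: $\\mathcal{N}(i)$ is the set of nodes with an edge into node $i$, $e_{ji}$ is the scalar feature on edge $j \\to i$, $W^l$ is the weight of the fully connected layer, and $\\sigma$ is the activation. Unlike the original GCN, this simplified layer applies no degree normalization.\n",
    "\n",
    "$$h_i^{l+1} = \\sigma\\Big(W^l \\sum_{j \\in \\mathcal{N}(i)} e_{ji}\\, h_j^l\\Big)$$\n",
    "\n",
    "The send/recv part of that rule can be mimicked in plain NumPy on the tutorial's edge list. The variable names and the random features below are illustrative assumptions, and the sketch does not call PGL or Paddle:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Same (src, dst) edge list as build_graph() in this tutorial.\n",
    "edges = np.array([(2, 0), (2, 1), (3, 1), (4, 0), (5, 0),\n",
    "                  (6, 0), (6, 4), (6, 5), (7, 0), (7, 1),\n",
    "                  (7, 2), (7, 3), (8, 0), (9, 7)])\n",
    "src, dst = edges[:, 0], edges[:, 1]\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "h = rng.standard_normal((10, 16)).astype('float32')         # node features\n",
    "e = rng.standard_normal((len(edges), 1)).astype('float32')  # edge weights\n",
    "\n",
    "# send: one message per edge, the source feature scaled by the edge weight.\n",
    "msg = h[src] * e                  # shape (14, 16)\n",
    "\n",
    "# recv: sum the messages that arrive at each destination node.\n",
    "agg = np.zeros_like(h)\n",
    "np.add.at(agg, dst, msg)          # unbuffered scatter-add over dst\n",
    "```\n",
    "\n",
    "Nodes 6, 8, and 9 have no incoming edges, so their rows of `agg` stay all zeros. Node 3 receives a single message, from node 7."
   ]
  },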
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the GCN layer defined, we can stack layers into a deeper model. Here we define a two-layer GCN."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The first GCN layer maps the feature vectors from 16 dimensions to 8, with relu as the activation.\n",
    "output = gcn_layer(gw, gw.node_feat['feature'], gw.edge_feat['edge_feature'], \n",
    "                   hidden_size=8, name='gcn_layer_1', activation='relu')\n",
    "# The second GCN layer maps the feature vectors from 8 dimensions to 1: a single logit per node for the binary classification. No activation is used.\n",
    "output = gcn_layer(gw, output, gw.edge_feat['edge_feature'], \n",
    "                   hidden_size=1, name='gcn_layer_2', activation=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3: Data preprocessing\n",
    "\n",
    "Since we are building a binary node classifier, we use 0 and 1 to denote the two classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = [0,1,1,1,0,0,0,1,0,1]\n",
    "label = np.array(y, dtype=\"float32\")\n",
    "label = np.expand_dims(label, -1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 第四步:设置训练程序\n",
    "GCN的训练过程跟训练其它基于paddlepaddle的模型是一样的。\n",
    "* 首先我们构建损失函数;\n",
    "* 接着创建一个优化器;\n",
    "* 最后创建执行器并执行训练过程。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 0 | Loss: 0.629119\n",
      "Epoch 1 | Loss: 0.614591\n",
      "Epoch 2 | Loss: 0.602767\n",
      "Epoch 3 | Loss: 0.593824\n",
      "Epoch 4 | Loss: 0.587454\n",
      "Epoch 5 | Loss: 0.581866\n",
      "Epoch 6 | Loss: 0.576963\n",
      "Epoch 7 | Loss: 0.572337\n",
      "Epoch 8 | Loss: 0.567905\n",
      "Epoch 9 | Loss: 0.563806\n",
      "Epoch 10 | Loss: 0.559831\n",
      "Epoch 11 | Loss: 0.555969\n",
      "Epoch 12 | Loss: 0.552211\n",
      "Epoch 13 | Loss: 0.548553\n",
      "Epoch 14 | Loss: 0.544992\n",
      "Epoch 15 | Loss: 0.541524\n",
      "Epoch 16 | Loss: 0.538145\n",
      "Epoch 17 | Loss: 0.534852\n",
      "Epoch 18 | Loss: 0.531641\n",
      "Epoch 19 | Loss: 0.528505\n",
      "Epoch 20 | Loss: 0.525442\n",
      "Epoch 21 | Loss: 0.522446\n",
      "Epoch 22 | Loss: 0.519513\n",
      "Epoch 23 | Loss: 0.516638\n",
      "Epoch 24 | Loss: 0.513819\n",
      "Epoch 25 | Loss: 0.511053\n",
      "Epoch 26 | Loss: 0.508336\n",
      "Epoch 27 | Loss: 0.505668\n",
      "Epoch 28 | Loss: 0.503046\n",
      "Epoch 29 | Loss: 0.500472\n"
     ]
    }
   ],
   "source": [
    "# Create a data layer that holds the node class labels.\n",
    "node_label = fluid.layers.data(\"node_label\", shape=[None, 1], dtype=\"float32\", append_batch_size=False)\n",
    "# Use sigmoid cross-entropy as the loss function.\n",
    "loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=output, label=node_label)\n",
    "# Compute the mean loss.\n",
    "loss = fluid.layers.mean(loss)\n",
    "\n",
    "# Use the Adam optimizer with a learning rate of 0.01.\n",
    "adam = fluid.optimizer.Adam(learning_rate=0.01)\n",
    "adam.minimize(loss)\n",
    "\n",
    "# Create the executor.\n",
    "exe = fluid.Executor(place)\n",
    "exe.run(fluid.default_startup_program())\n",
    "feed_dict = gw.to_feed(g) # fetch the graph data as a feed dict\n",
    "\n",
    "for epoch in range(30):\n",
    "    feed_dict['node_label'] = label\n",
    "    \n",
    "    train_loss = exe.run(fluid.default_main_program(), feed=feed_dict, fetch_list=[loss], return_numpy=True)\n",
    "    print('Epoch %d | Loss: %f'%(epoch, train_loss[0]))"
   ]
  },
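  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the loss minimized above is the mean sigmoid cross-entropy. Writing $z_i$ for the logit of node $i$, $y_i$ for its label, and $N$ for the number of nodes (notation introduced here):\n",
    "\n",
    "$$L = -\\frac{1}{N} \\sum_{i=1}^{N} \\big[\\, y_i \\log \\sigma(z_i) + (1 - y_i) \\log(1 - \\sigma(z_i)) \\,\\big], \\qquad \\sigma(z) = \\frac{1}{1 + e^{-z}}$$\n",
    "\n",
    "A minimal NumPy sketch of the same quantity follows. The helper name `sigmoid_xent` is hypothetical, and the code does not call Paddle. It uses the standard numerically stable rewrite $\\max(z, 0) - z y + \\log(1 + e^{-|z|})$ of the per-node term:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid_xent(z, y):\n",
    "    # Per-element sigmoid cross-entropy on logits z and labels y (stable form).\n",
    "    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))\n",
    "\n",
    "# Same labels as in Step 3 of this tutorial.\n",
    "y = np.array([0, 1, 1, 1, 0, 0, 0, 1, 0, 1], dtype='float32').reshape(-1, 1)\n",
    "\n",
    "# A model that outputs logit 0 for every node predicts probability 0.5 everywhere.\n",
    "z = np.zeros_like(y)\n",
    "baseline = sigmoid_xent(z, y).mean()   # equals ln 2, about 0.693\n",
    "```\n",
    "\n",
    "The value ln 2 is a useful baseline: the losses printed above start near 0.63 and fall to about 0.50, so the trained model does better than a constant 0.5 guess."
   ]
  },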
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}