# Static Graph Module: nn.Graph

Deep learning frameworks currently run models in one of two main ways: as a **dynamic graph** or as a **static graph**. In OneFlow, these are conventionally called **Eager mode** and **Graph mode**.

Each approach has its own strengths and weaknesses. OneFlow supports both, with Eager mode as the default. If you have been reading this basics series in order, all of the code you have seen so far is Eager-mode code.

In general, dynamic graphs are easier to use, while static graphs offer better performance. OneFlow's [nn.Graph](todo_rst_nngraph.md) module lets users build static graphs and train models with an Eager-like programming style.

This article covers:

- A basic introduction to dynamic and static graphs
- An introduction to the `nn.Graph` interface
- An example of converting Eager code to Graph code
## Dynamic Graph vs. Static Graph

As we have seen, a user-defined neural network is converted by the deep learning framework into a computation graph, as in this example from [Automatic Gradient Computation](./05_autograd.md):
```python
import oneflow as flow

def loss(y_pred, y):
    return flow.sum(1 / 2 * (y_pred - y) ** 2)

x = flow.ones(1, 5)  # input
w = flow.randn(5, 3, requires_grad=True)
b = flow.randn(1, 3, requires_grad=True)
z = flow.matmul(x, w) + b

y = flow.zeros(1, 3)  # label
l = loss(z, y)
```
The corresponding computation graph is:

![computation graph](./imgs/compute_graph.png)
**Dynamic Graph**

A dynamic graph is built on the fly: the framework constructs the computation graph while the code is being executed.

The relationship between the code above and the graph it builds is illustrated below (note: some simple statements are merged in the animation).

![dynamic graph](./imgs/dynamic_graph.gif)

Because a dynamic graph is built as the code runs, it is very flexible: the graph structure can be modified at any time, each line of code immediately yields its result, and debugging is easy. However, since the framework never has the complete graph (it can change at any moment, and construction can never be assumed finished), it cannot perform thorough global optimization, so performance is comparatively limited.
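For instance, because each statement in Eager mode executes immediately and returns a concrete tensor, intermediate results can be inspected on the spot. A minimal sketch, reusing the tensors from the example above:

```python
import oneflow as flow

x = flow.ones(1, 5)
w = flow.randn(5, 3, requires_grad=True)

# Each line runs as soon as it is reached; z is a concrete tensor here,
# so it can be printed or checked immediately while debugging.
z = flow.matmul(x, w)
print(z.shape)    # oneflow.Size([1, 3])
print(z.numpy())  # the values are already available
```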
**Static Graph**

In contrast, a static graph is fully defined up front: the user declares all computation nodes first, and only then does the framework start computing. You can think of the framework as playing the role of a compiler between the user's code and the computation graph that actually runs.

![static graph](./imgs/static_graph.png)

In OneFlow, for example, the user's code is first converted into a complete computation graph, which is then executed by the OneFlow Runtime module.

Because a static graph obtains the complete network before compiling and running it, the framework can apply many optimizations that are impossible for a dynamic graph, giving static graphs a performance advantage. A compiled computation graph is also easier to deploy across platforms.

However, by the time the actual computation happens in a static graph, it no longer corresponds directly to the user's code, which makes static graphs harder to debug.
The two approaches are compared below:

|                | Dynamic Graph                      | Static Graph                      |
| -------------- | ---------------------------------- | --------------------------------- |
| Execution mode | Eager mode                         | Graph mode                        |
| Pros           | Flexible to write, easy to debug   | Fast, easy to optimize and deploy |
| Cons           | Weaker performance and portability | Harder to debug                   |
OneFlow's Eager mode is aligned with PyTorch, so users familiar with PyTorch can get started at no extra cost.

OneFlow's Graph mode is also based on an object-oriented programming style, so users familiar with Eager development can switch to the highly efficient static graph by changing only a small amount of code.
## OneFlow's Eager Mode

OneFlow runs in Eager mode by default.

The following script fits the sine function $y=\sin(x)$ with the polynomial $y=a+bx+cx^2+dx^3$, solving for a set of approximate coefficients $a$, $b$, $c$, $d$.

> Note: this example is adapted from the [official PyTorch tutorial](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#nn-module).
```python
import math
import numpy as np
import oneflow as flow

device = flow.device("cuda")
dtype = flow.float32

# Create Tensors to hold input and outputs.
x = flow.tensor(np.linspace(-math.pi, math.pi, 2000), device=device, dtype=dtype)
y = flow.sin(x)

# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
xx = flow.cat(
    [x.unsqueeze(-1).pow(1), x.unsqueeze(-1).pow(2), x.unsqueeze(-1).pow(3)], dim=1
)

# The Linear Module
model = flow.nn.Sequential(flow.nn.Linear(3, 1), flow.nn.Flatten(0, 1))
model.to(device)

# Loss Function
loss_fn = flow.nn.MSELoss(reduction="sum")
loss_fn.to(device)

# Optimizer
optimizer = flow.optim.SGD(model.parameters(), lr=1e-6)

for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(xx)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.numpy())

    # Use the optimizer object to zero all of the gradients for the variables
    # it will update (which are the learnable weights of the model).
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters.
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters.
    optimizer.step()

linear_layer = model[0]

print(
    f"Result: y = {linear_layer.bias.numpy()[0]} + {linear_layer.weight[:, 0].numpy()[0]}*x + {linear_layer.weight[:, 1].numpy()[0]}*x^2 + {linear_layer.weight[:, 2].numpy()[0]}*x^3"
)
```
Output:
```text
99 582.7045
...
1799 9.326502
1899 9.154123
1999 9.040091
Result: y = -0.0013652867637574673 + 0.8422811627388*x + 0.0002355352626182139*x^2 + -0.09127362817525864*x^3
```
We can plot the fitted polynomial against the original sine curve:

```python
import matplotlib.pyplot as plt

# Move everything back to the host and compute the fit in numpy.
x_np = x.numpy()
w = linear_layer.weight.numpy()[0]
b = linear_layer.bias.numpy()[0]
y_fit = b + w[0] * x_np + w[1] * x_np ** 2 + w[2] * x_np ** 3

plt.plot(x_np, y.numpy())
plt.plot(x_np, y_fit)
```
![poly_fit](./imgs/poly_fit.png)
## OneFlow's Graph Mode

### nn.Graph

OneFlow provides the [nn.Graph](todo_nn_graph_rst.md) base class, which can be loosely understood as "the static-graph counterpart of [nn.Module](https://oneflow.readthedocs.io/en/master/module.html?highlight=oneflow.nn.Module#oneflow.nn.Module)".

With OneFlow, developing with static graphs differs very little from developing with dynamic graphs. To use a static graph, first build the model just as you would in Eager mode, then define a subclass of `nn.Graph` and register the neural network (a subclass of `nn.Module`), the loss function, the optimizer, and so on with that class. The model can then run in Graph mode.

As the code below shows, Graph-mode code is almost identical to Eager-mode code; converting from Eager mode to Graph mode requires only a small amount of extra work:

- In the `__init__` method of the `nn.Graph` subclass, register the neural network, the loss function, and the optimizer
- In the `build` method of the `nn.Graph` subclass, describe the forward and backward passes (only `backward` needs to be called; the parameter update happens automatically)
```python
# The Linear Train Graph
class LinearTrainGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn
        self.add_optimizer(optimizer)

    def build(self, x, y):
        y_pred = self.model(x)
        loss = self.loss_fn(y_pred, y)
        loss.backward()
        return loss

linear_graph = LinearTrainGraph()
```
Like a Module object, the resulting Graph object accepts Tensor objects directly and can be called:
```python
loss = linear_graph(xx, y)
```
The model is built and configured in the `__init__` method of this class, and its forward and backward passes are specified explicitly in the `build` method (the gradient update happens implicitly and automatically). Each time an instance of this class is called, the `build` method is executed, completing one training iteration.
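Training is not the only use of `nn.Graph`: a forward-only graph follows the same pattern. Here is a minimal sketch (assuming the `model` and `xx` defined above; `LinearEvalGraph` is a name chosen for illustration) of an inference graph that registers only the module and returns the prediction from `build`:

```python
class LinearEvalGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model  # only the network; no loss or optimizer

    def build(self, x):
        # Forward pass only: no backward() call, so no gradients
        # or parameter updates are involved.
        return self.model(x)

eval_graph = LinearEvalGraph()
y_pred = eval_graph(xx)  # runs the compiled forward graph
```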
### Polynomial Fitting in Graph Mode

To convert the polynomial fitting example above from Eager mode to Graph mode, we subclass `nn.Graph` to build a `LinearTrainGraph` class.

Run the following commands in a console to try Graph mode:

```shell
wget https://docs.oneflow.org/master/code/basics/fit_graph_mode.py
python3 ./fit_graph_mode.py
```

The full code follows.
```python
import math
import numpy as np
import oneflow as flow

device = flow.device("cuda")
dtype = flow.float32

# Create Tensors to hold input and outputs.
x = flow.tensor(np.linspace(-math.pi, math.pi, 2000), device=device, dtype=dtype)
y = flow.sin(x)

# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
xx = flow.cat(
    [x.unsqueeze(-1).pow(1), x.unsqueeze(-1).pow(2), x.unsqueeze(-1).pow(3)], dim=1
)

# The Linear Module
model = flow.nn.Sequential(flow.nn.Linear(3, 1), flow.nn.Flatten(0, 1))
model.to(device)

# Loss Function
loss_fn = flow.nn.MSELoss(reduction="sum")
loss_fn.to(device)

# Optimizer
optimizer = flow.optim.SGD(model.parameters(), lr=1e-6)

# The Linear Train Graph
class LinearTrainGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn
        self.add_optimizer(optimizer)

    def build(self, x, y):
        y_pred = self.model(x)
        loss = self.loss_fn(y_pred, y)
        loss.backward()
        return loss

linear_graph = LinearTrainGraph()

for t in range(2000):
    # Print loss.
    loss = linear_graph(xx, y)
    if t % 100 == 99:
        print(t, loss.numpy())

linear_layer = model[0]
print(
    f"Result: y = {linear_layer.bias.numpy()} + {linear_layer.weight[:, 0].numpy()} x + {linear_layer.weight[:, 1].numpy()} x^2 + {linear_layer.weight[:, 2].numpy()} x^3"
)
```
### Debugging a Graph

Call the `debug` method on a Graph object, and OneFlow will print debugging information while it compiles the computation graph. Note that the graph is built lazily, so the output below appears when the graph is called for the first time:

```python
linear_graph = LinearTrainGraph()
linear_graph.debug()
loss = linear_graph(xx, y)  # the debug info is printed during this first call
```
Output:
```text
Note that nn.Graph.debug() only print debug info on rank 0.
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) start building forward graph.
(INPUT:_LinearTrainGraph_0-input_0:tensor(..., device='cuda:0', size=(20, 3), dtype=oneflow.float32))
(INPUT:_LinearTrainGraph_0-input_1:tensor(..., device='cuda:0', size=(20,), dtype=oneflow.float32))
(MODULE:model:Sequential())
(INPUT:_model-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 3),
dtype=oneflow.float32))
(MODULE:model.0:Linear(in_features=3, out_features=1, bias=True))
(INPUT:_model.0-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 3),
dtype=oneflow.float32))
(PARAMETER:model.0.weight:tensor(..., device='cuda:0', size=(1, 3), dtype=oneflow.float32,
requires_grad=True))
(PARAMETER:model.0.bias:tensor(..., device='cuda:0', size=(1,), dtype=oneflow.float32,
requires_grad=True))
(OUTPUT:_model.0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 1),
dtype=oneflow.float32))
(MODULE:model.1:Flatten(start_dim=0, end_dim=1))
(INPUT:_model.1-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 1),
dtype=oneflow.float32))
(OUTPUT:_model.1-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(OUTPUT:_model-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(MODULE:loss_fn:MSELoss())
(INPUT:_loss_fn-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(INPUT:_loss_fn-input_1:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(OUTPUT:_loss_fn-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
(OUTPUT:_LinearTrainGraph_0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) end building forward graph.
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) start compiling and init graph runtime.
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) end compiling and init graph rumtime.
```
The output shows information about each layer in the computation graph, including its name and its input and output tensors: shapes, devices, data types, and so on.
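Besides `debug`, printing the Graph object itself is another quick way to inspect its structure. A small sketch (the exact output depends on the OneFlow version, and is most informative once the graph has been built by its first call):

```python
linear_graph = LinearTrainGraph()
loss = linear_graph(xx, y)  # the graph is built on the first call
print(linear_graph)         # prints the graph's module hierarchy
```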
## Related Links

Building a neural network in OneFlow Eager mode: [Build a Neural Network](./04_build_network.md)

The PyTorch version of the polynomial fitting example: [PyTorch: nn](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#id19)