diff --git a/docs/1.0/nn.md b/docs/1.0/nn.md
index 64678d500e0fe9067b12c5c048f3a4a0b1f20799..6f5e2c9c78747d7efb24dfe6a94875308a21b4f9 100644
--- a/docs/1.0/nn.md
+++ b/docs/1.0/nn.md
@@ -1184,7 +1184,7 @@
class torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)
```

This module can be seen as the gradient of Conv1d with respect to its input. Transposed convolution is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

-* `stride` controls the stride of the transposed convolution. controls the stride for the cross-correlation.
+* `stride` controls the stride of the transposed convolution.

* `padding` controls the number of implicit zero-padding layers on each side of the input. Unlike in Conv1d, the number of layers actually added is `kernel_size - 1 - padding`; see the note below for details.

@@ -1249,61 +1249,63 @@
class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)
```

-Applies a 2D transposed convolution operator over an input image composed of several input planes.
+Applies a 2D transposed convolution over a multi-channel 2D input signal, using a kernel of the specified size (the underlying operation is still a cross-correlation).

-This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).
+This module can be seen as the gradient of Conv2d with respect to its input. Transposed convolution is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

-* `stride` controls the stride for the cross-correlation.
+* `stride` controls the stride of the transposed convolution.

-* `padding` controls the amount of implicit zero-paddings on both sides for `kernel_size - 1 - padding` number of points. See note below for details.
-
-* `output_padding` controls the additional size added to one side of the output shape. See note below for details.
+* `padding` controls the number of implicit zero-padding layers on each side of the input. Unlike in Conv2d, the number of layers actually added is `kernel_size - 1 - padding`; see the note below for details.
+
+* `output_padding` controls the additional size added to one side of each dimension of the output. Note that this does not pad the output with zeros; it directly determines the output size. See the note below for details.

-* `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.
+* `dilation` controls the spacing between the kernel points; this is also known as the à trous algorithm. It is harder to describe in words, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.

-* `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example,
-
-> * At groups=1, all inputs are convolved to all outputs.
-> * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
-> * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size ![](img/648a514da1dace3deacf3f078287e157.jpg)).
+* `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example (see the sketch after this list),
+
+> * At groups=1, all inputs are convolved to all outputs.
+> * At groups=2, this layer becomes equivalent to two transposed convolution layers side by side: the input channels are split in half, each layer processes one half and produces half of the output channels, and the two outputs are then concatenated.
+> * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size ![](img/648a514da1dace3deacf3f078287e157.jpg)).
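+For illustration, a minimal sketch of the `groups` semantics (channel counts chosen arbitrarily; with `groups > 1` the weight is stored with its second dimension equal to `out_channels // groups`):

```py
>>> import torch
>>> import torch.nn as nn
>>> # groups=2: each group maps 2 of the 4 input channels to 3 of the 6 output channels
>>> m = nn.ConvTranspose2d(in_channels=4, out_channels=6, kernel_size=3, groups=2)
>>> m.weight.shape    # (in_channels, out_channels // groups, kH, kW)
torch.Size([4, 3, 3, 3])
>>> m(torch.randn(1, 4, 8, 8)).shape   # (8 - 1)*1 + 3 = 10 in each spatial dimension
torch.Size([1, 6, 10, 10])
```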
-The parameters `kernel_size`, `stride`, `padding`, `output_padding` can either be:
+The parameters `kernel_size`, `stride`, `padding`, `output_padding` each accept either of the following forms:

-> * a single `int` – in which case the same value is used for the height and width dimensions
-> * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension
+> * a single `int` – the same value is used for both the height and the width dimension
+> * a `tuple` of two ints – the first `int` is used for the height dimension, the second `int` for the width dimension

Note

-Depending of the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation), and not a full [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation). It is up to the user to add proper padding.
+Depending on the size of your kernel, some of the last columns of the input may not take part in the computation (for example, when the input width leaves a remainder under the kernel and no padding is used), because PyTorch performs a valid [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation), not a full [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation). It is up to the user to choose proper padding.

Note

-The `padding` argument effectively adds `kernel_size - 1 - padding` amount of zero padding to both sizes of the input. This is set so that when a [`Conv2d`](#torch.nn.Conv2d "torch.nn.Conv2d") and a [`ConvTranspose2d`](#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when `stride > 1`, [`Conv2d`](#torch.nn.Conv2d "torch.nn.Conv2d") maps multiple input shapes to the same output shape. `output_padding` is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that `output_padding` is only used to find output shape, but does not actually add zero-padding to output.
+The `padding` argument effectively adds `kernel_size - 1 - padding` layers of zero padding to each side of the input. This is set so that when a [`Conv2d`](#torch.nn.Conv2d "torch.nn.Conv2d") and a [`ConvTranspose2d`](#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") are initialized with the same parameters, they are exact shape inverses of each other: the output size of the Conv2d is the input size of the corresponding ConvTranspose2d, and vice versa. However, when `stride > 1`, [`Conv2d`](#torch.nn.Conv2d "torch.nn.Conv2d") maps multiple input shapes to the same output shape. To keep the two modules shape-inverse in that case, the `output_padding` argument is provided; it increases the computed size of one side of the output. Note that `output_padding` does not pad the output with zeros; it only determines the output size.
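+To make the shape bookkeeping concrete, a small sketch (sizes chosen arbitrarily; the arithmetic in the comments follows the output-size formula from the Shape section below, with `dilation=1`):

```py
>>> import torch
>>> import torch.nn as nn
>>> # Conv2d with stride 2 maps 20 -> floor((20 + 2*1 - 3) / 2) + 1 = 10
>>> down = nn.Conv2d(16, 33, kernel_size=3, stride=2, padding=1)
>>> # ConvTranspose2d with the same parameters maps 10 -> (10 - 1)*2 - 2*1 + 3 + 1 = 20,
>>> # where the trailing + 1 is the output_padding that resolves the stride ambiguity
>>> up = nn.ConvTranspose2d(33, 16, kernel_size=3, stride=2, padding=1, output_padding=1)
>>> x = torch.randn(1, 16, 20, 20)
>>> up(down(x)).shape
torch.Size([1, 16, 20, 20])
```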
Note

-In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. Please see the notes on [Reproducibility](notes/randomness.html) for background.
+When the program runs on a CUDA device with CuDNN enabled, a nondeterministic algorithm may be selected to improve performance. If this is undesirable, you can make the computation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See the notes on [Reproducibility](notes/randomness.html) for background.

-Parameters:
-
-* **in_channels** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")) – Number of channels in the input image
-* **out_channels** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")) – Number of channels produced by the convolution
-* **kernel_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – Size of the convolving kernel
-* **stride** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – Stride of the convolution. Default: 1
-* **padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – `kernel_size - 1 - padding` zero-padding will be added to both sides of each dimension in the input. Default: 0
-* **output_padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – Additional size added to one side of each dimension in the output shape. Default: 0
-* **groups** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")_,_ _optional_) – Number of blocked connections from input channels to output channels. Default: 1
-* **bias** ([_bool_](https://docs.python.org/3/library/functions.html#bool "(in Python v3.7)")_,_ _optional_) – If `True`, adds a learnable bias to the output. Default: `True`
-* **dilation** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – Spacing between kernel elements. Default: 1
+Parameters:
+
+* **in_channels** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")) – number of channels in the input
+* **out_channels** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")) – number of channels produced by the convolution
+* **kernel_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – size of the convolving kernel
+* **stride** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – stride of the convolution. Default: 1
+* **padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – `kernel_size - 1 - padding` layers of zeros will be added to each side of the input. Default: 0
+* **output_padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – additional size added to one side of each dimension of the output shape. Default: 0
+* **groups** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")_,_ _optional_) – number of blocked connections from input channels to output channels. Default: 1
+* **bias** ([_bool_](https://docs.python.org/3/library/functions.html#bool "(in Python v3.7)")_,_ _optional_) – if `True`, adds a learnable bias to the output. Default: `True`
+* **dilation** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – spacing between kernel elements. Default: 1

```py
Shape:
```

* Input: ![](img/0385ad868fed790d36381b9e8788c18b.jpg)
* Output: ![](img/d3edfe8a9bbdd73ba5c4b566353777f0.jpg), where

![](img/a2616e3fb8e8e919b799c2e62921c374.jpg)

@@ -1311,12 +1313,12 @@ Shape:

| Variables: |

-* **weight** ([_Tensor_](tensors.html#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (in_channels, out_channels, kernel_size[0], kernel_size[1]) The values of these weights are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg) where ![](img/c12e2153347b696ebb784e5675cc566e.jpg)
-* **bias** ([_Tensor_](tensors.html#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels) If `bias` is `True`, then the values of these weights are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg) where ![](img/c12e2153347b696ebb784e5675cc566e.jpg)
+* **weight** ([_Tensor_](tensors.html#torch.Tensor "torch.Tensor")) – the learnable weights of the module, of shape (in_channels, out_channels, kernel_size[0], kernel_size[1]). The initial values of these weights are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg), where ![](img/c12e2153347b696ebb784e5675cc566e.jpg).
+* **bias** ([_Tensor_](tensors.html#torch.Tensor "torch.Tensor")) – the learnable bias of the module, of shape (out_channels). If `bias` in the constructor is `True`, the initial values are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg), where ![](img/c12e2153347b696ebb784e5675cc566e.jpg).

Examples:

```py
>>> # With square kernels and equal stride

@@ -1344,61 +1346,62 @@
torch.Size([1, 16, 12, 12])
```

class torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)
```
-Applies a 3D transposed convolution operator over an input image composed of several input planes. The transposed convolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes.
-
-This module can be seen as the gradient of Conv3d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).
+Applies a 3D transposed convolution over a multi-channel 3D input signal, using a kernel of the specified size (the underlying operation is still a cross-correlation). The transposed convolution operator multiplies each input value element-wise by a learnable kernel and sums the results over all input feature planes.
+
+This module can be seen as the gradient of Conv3d with respect to its input. Transposed convolution is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

-* `stride` controls the stride for the cross-correlation.
+* `stride` controls the stride of the transposed convolution.

-* `padding` controls the amount of implicit zero-paddings on both sides for `kernel_size - 1 - padding` number of points. See note below for details.
-
-* `output_padding` controls the additional size added to one side of the output shape. See note below for details.
+* `padding` controls the number of implicit zero-padding layers on each side of the input. Unlike in Conv3d, the number of layers actually added is `kernel_size - 1 - padding`; see the note below for details.
+
+* `output_padding` controls the additional size added to one side of each dimension of the output. Note that this does not pad the output with zeros; it directly determines the output size. See the note below for details.

-* `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.
+* `dilation` controls the spacing between the kernel points; this is also known as the à trous algorithm. It is harder to describe in words, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.

-* `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example,
-
-> * At groups=1, all inputs are convolved to all outputs.
-> * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
-> * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size ![](img/648a514da1dace3deacf3f078287e157.jpg)).
+* `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example,
+
+> * At groups=1, all inputs are convolved to all outputs.
+> * At groups=2, this layer becomes equivalent to two transposed convolution layers side by side: the input channels are split in half, each layer processes one half and produces half of the output channels, and the two outputs are then concatenated.
+> * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size ![](img/648a514da1dace3deacf3f078287e157.jpg)).

-The parameters `kernel_size`, `stride`, `padding`, `output_padding` can either be:
+The parameters `kernel_size`, `stride`, `padding`, `output_padding` each accept either of the following forms:

-> * a single `int` – in which case the same value is used for the depth, height and width dimensions
-> * a `tuple` of three ints – in which case, the first `int` is used for the depth dimension, the second `int` for the height dimension and the third `int` for the width dimension
+> * a single `int` – the same value is used for the depth, height and width dimensions
+> * a `tuple` of three ints – the first `int` is used for the depth dimension, the second `int` for the height dimension, and the third `int` for the width dimension

Note

-Depending of the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation), and not a full [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation). It is up to the user to add proper padding.
+Depending on the size of your kernel, some of the last columns of the input may not take part in the computation (for example, when the input width leaves a remainder under the kernel and no padding is used), because PyTorch performs a valid [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation), not a full [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation). It is up to the user to choose proper padding.

Note

-The `padding` argument effectively adds `kernel_size - 1 - padding` amount of zero padding to both sizes of the input. This is set so that when a [`Conv3d`](#torch.nn.Conv3d "torch.nn.Conv3d") and a [`ConvTranspose3d`](#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when `stride > 1`, [`Conv3d`](#torch.nn.Conv3d "torch.nn.Conv3d") maps multiple input shapes to the same output shape. `output_padding` is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that `output_padding` is only used to find output shape, but does not actually add zero-padding to output.
+The `padding` argument effectively adds `kernel_size - 1 - padding` layers of zero padding to each side of the input. This is set so that when a [`Conv3d`](#torch.nn.Conv3d "torch.nn.Conv3d") and a [`ConvTranspose3d`](#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") are initialized with the same parameters, they are exact shape inverses of each other: the output size of the Conv3d is the input size of the corresponding ConvTranspose3d, and vice versa. However, when `stride > 1`, [`Conv3d`](#torch.nn.Conv3d "torch.nn.Conv3d") maps multiple input shapes to the same output shape. To keep the two modules shape-inverse in that case, the `output_padding` argument is provided; it increases the computed size of one side of the output. Note that `output_padding` does not pad the output with zeros; it only determines the output size.

Note

-In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. Please see the notes on [Reproducibility](notes/randomness.html) for background.
+When the program runs on a CUDA device with CuDNN enabled, a nondeterministic algorithm may be selected to improve performance. If this is undesirable, you can make the computation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See the notes on [Reproducibility](notes/randomness.html) for background.
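+As in the 2D case, a small sketch of the shape inversion described in the note above (sizes chosen arbitrarily, `dilation=1`):

```py
>>> import torch
>>> import torch.nn as nn
>>> down = nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1)
>>> up = nn.ConvTranspose3d(16, 8, kernel_size=3, stride=2, padding=1, output_padding=1)
>>> x = torch.randn(1, 8, 10, 10, 10)
>>> down(x).shape      # floor((10 + 2 - 3) / 2) + 1 = 5 in each spatial dimension
torch.Size([1, 16, 5, 5, 5])
>>> up(down(x)).shape  # (5 - 1)*2 - 2 + 3 + 1 = 10: the original spatial size
torch.Size([1, 8, 10, 10, 10])
```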
-Parameters:
-
-* **in_channels** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")) – Number of channels in the input image
-* **out_channels** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")) – Number of channels produced by the convolution
-* **kernel_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – Size of the convolving kernel
-* **stride** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – Stride of the convolution. Default: 1
-* **padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – `kernel_size - 1 - padding` zero-padding will be added to both sides of each dimension in the input. Default: 0
-* **output_padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – Additional size added to one side of each dimension in the output shape. Default: 0
-* **groups** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")_,_ _optional_) – Number of blocked connections from input channels to output channels. Default: 1
-* **bias** ([_bool_](https://docs.python.org/3/library/functions.html#bool "(in Python v3.7)")_,_ _optional_) – If `True`, adds a learnable bias to the output. Default: `True`
-* **dilation** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – Spacing between kernel elements. Default: 1
+Parameters:
+
+* **in_channels** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")) – number of channels in the input
+* **out_channels** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")) – number of channels produced by the convolution
+* **kernel_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – size of the convolving kernel
+* **stride** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – stride of the convolution. Default: 1
+* **padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – `kernel_size - 1 - padding` layers of zeros will be added to each side of the input. Default: 0
+* **output_padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – additional size added to one side of each dimension of the output shape. Default: 0
+* **groups** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)")_,_ _optional_) – number of blocked connections from input channels to output channels. Default: 1
+* **bias** ([_bool_](https://docs.python.org/3/library/functions.html#bool "(in Python v3.7)")_,_ _optional_) – if `True`, adds a learnable bias to the output. Default: `True`
+* **dilation** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – spacing between kernel elements. Default: 1

```py
Shape:
```

* Input: ![](img/a8d71105bc4954eb54660bc5d37c23de.jpg)
* Output: ![](img/f05e8faaf90b4c16b23ca0165e8e09f4.jpg), where

![](img/35234de680c85870881b7f5d9e8de589.jpg)

@@ -1408,12 +1411,10 @@ Shape:

| Variables: |

-* **weight** ([_Tensor_](tensors.html#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (in_channels, out_channels, kernel_size[0], kernel_size[1], kernel_size[2]) The values of these weights are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg) where ![](img/378f5c5b47c36239b817ad23a612a9f7.jpg)
-* **bias** ([_Tensor_](tensors.html#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels) If `bias` is `True`, then the values of these weights are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg) where ![](img/378f5c5b47c36239b817ad23a612a9f7.jpg)
+* **weight** ([_Tensor_](tensors.html#torch.Tensor "torch.Tensor")) – the learnable weights of the module, of shape (in_channels, out_channels, kernel_size[0], kernel_size[1], kernel_size[2]). The initial values of these weights are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg), where ![](img/378f5c5b47c36239b817ad23a612a9f7.jpg).
+* **bias** ([_Tensor_](tensors.html#torch.Tensor "torch.Tensor")) – the learnable bias of the module, of shape (out_channels). If `bias` in the constructor is `True`, the initial values are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg), where ![](img/378f5c5b47c36239b817ad23a612a9f7.jpg).
Examples:

```py
>>> # With square kernels and equal stride

@@ -1431,48 +1432,51 @@
class torch.nn.Unfold(kernel_size, dilation=1, padding=0, stride=1)
```

Extracts sliding local blocks from a batched input tensor.

-Consider an batched `input` tensor of shape ![](img/2468b226c29a7e754a9c20f0214fa85f.jpg), where ![](img/9341d9048ac485106d2b2ee8de14876f.jpg) is the batch dimension, ![](img/6c8feca3b2da3d6cf371417edff4be4f.jpg) is the channel dimension, and ![](img/28ec51e742166ea3400be6e7343bbfa5.jpg) represent arbitrary spatial dimensions. This operation flattens each sliding `kernel_size`-sized block within the spatial dimensions of `input` into a column (i.e., last dimension) of a 3-D `output` tensor of shape ![](img/4e1cad10fa9480fa82adbe59a5ae81fa.jpg), where ![](img/a8846766f2e1b47021f1520993773ccb.jpg) is the total number of values with in each block (a block has ![](img/8c7a54ca7193bc3a6c5ace8c3b07d24c.jpg) spatial locations each containing a ![](img/6c8feca3b2da3d6cf371417edff4be4f.jpg)-channeled vector), and ![](img/db4a9fef02111450bf98261889de550c.jpg) is the total number of such blocks:
+Consider a batched `input` tensor of shape ![](img/2468b226c29a7e754a9c20f0214fa85f.jpg), where ![](img/9341d9048ac485106d2b2ee8de14876f.jpg) is the batch size, ![](img/6c8feca3b2da3d6cf371417edff4be4f.jpg) is the number of channels, and ![](img/28ec51e742166ea3400be6e7343bbfa5.jpg) represents arbitrary spatial dimensions. Unfold flattens each sliding `kernel_size`-sized block within the spatial dimensions of `input` into a column (i.e., the last dimension) of a 3-D `output` tensor of shape ![](img/4e1cad10fa9480fa82adbe59a5ae81fa.jpg), where ![](img/a8846766f2e1b47021f1520993773ccb.jpg) is the total number of values within each block (a block has ![](img/8c7a54ca7193bc3a6c5ace8c3b07d24c.jpg) spatial locations, each holding a ![](img/6c8feca3b2da3d6cf371417edff4be4f.jpg)-channel vector), and ![](img/db4a9fef02111450bf98261889de550c.jpg) is the total number of such blocks:

![](img/1d2c6a9103e2b33f725602aebf90364e.jpg)

-where ![](img/42a2dca8a9cb6104321cf29ae30fd56a.jpg) is formed by the spatial dimensions of `input` (![](img/28ec51e742166ea3400be6e7343bbfa5.jpg) above), and ![](img/9566974d45a96737f7e0ecf302d877b8.jpg) is over all spatial dimensions.
+where ![](img/42a2dca8a9cb6104321cf29ae30fd56a.jpg) is formed by the spatial dimensions of `input` (![](img/28ec51e742166ea3400be6e7343bbfa5.jpg) above), and ![](img/9566974d45a96737f7e0ecf302d877b8.jpg) ranges over all spatial dimensions.

-Therefore, indexing `output` at the last dimension (column dimension) gives all values within a certain block.
+Therefore, indexing the `output` of Unfold at the last dimension (the column dimension) is equivalent to indexing a particular block, and returns all values within that block.

-The `padding`, `stride` and `dilation` arguments specify how the sliding blocks are retrieved.
-
-* `stride` controls the stride for the sliding blocks.
-* `padding` controls the amount of implicit zero-paddings on both sides for `padding` number of points for each dimension before reshaping.
-* `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.
+The `padding`, `stride` and `dilation` arguments specify how the sliding blocks are retrieved (see the sketch after this list).
+
+* `stride` controls the stride of the sliding blocks.
+* `padding` controls the number of implicit zero-padding layers added to each side of the input, in each dimension, before reshaping.
+* `dilation` controls the spacing between the kernel points; this is also known as the à trous algorithm. It is harder to describe in words, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.
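+A minimal sketch of the resulting shape (sizes chosen arbitrarily): a 1×2×4×4 input unfolded with a 2×2 kernel at stride 2 yields four non-overlapping blocks of 2·2·2 = 8 values each:

```py
>>> import torch
>>> import torch.nn as nn
>>> unfold = nn.Unfold(kernel_size=(2, 2), stride=2)
>>> x = torch.randn(1, 2, 4, 4)
>>> # each column is one block of C * kH * kW = 2*2*2 = 8 values;
>>> # a 4x4 input tiled by 2x2 blocks at stride 2 gives L = 2*2 = 4 blocks
>>> unfold(x).shape
torch.Size([1, 8, 4])
```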
Parameters:

-* **kernel_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – the size of the sliding blocks
-* **stride** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – the stride of the sliding blocks in the input spatial dimensions. Default: 1
-* **padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – implicit zero padding to be added on both sides of input. Default: 0
-* **dilation** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – a parameter that controls the stride of elements within the neighborhood. Default: 1
+* **kernel_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – the size of the sliding blocks
+* **stride** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – the stride of the sliding blocks in the input spatial dimensions. Default: 1
+* **padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – implicit zero padding added to both sides of the input. Default: 0
+* **dilation** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – the spacing between the elements sampled within each block (i.e., the dilation of the kernel). Default: 1

-* If `kernel_size`, `dilation`, `padding` or `stride` is an int or a tuple of length 1, their values will be replicated across all spatial dimensions.
-* For the case of two input spatial dimensions this operation is sometimes called `im2col`.
+* If `kernel_size`, `dilation`, `padding` or `stride` is an int or a tuple of length 1, the same value is used for all spatial dimensions.
+* For the case of two input spatial dimensions, this operation is sometimes called `im2col`.

Note

-[`Fold`](#torch.nn.Fold "torch.nn.Fold") calculates each combined value in the resulting large tensor by summing all values from all containing blocks. [`Unfold`](#torch.nn.Unfold "torch.nn.Unfold") extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other.
+[`Fold`](#torch.nn.Fold "torch.nn.Fold") computes each value in the resulting large tensor by summing the values from all blocks that contain it, while [`Unfold`](#torch.nn.Unfold "torch.nn.Unfold") extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, the two operations are not inverses of each other.

Warning

Currently, only 4-D input tensors (batched image-like tensors) are supported.
```py
Shape:
```

* Input: ![](img/2468b226c29a7e754a9c20f0214fa85f.jpg)
* Output: ![](img/4e1cad10fa9480fa82adbe59a5ae81fa.jpg), as described above

Examples:

```py

tensor(1.9073e-06)
```

@@ -1504,51 +1508,54 @@
class torch.nn.Fold(output_size, kernel_size, dilation=1, padding=0, stride=1)
```

Combines an array of sliding local blocks into a large containing tensor.

-Consider a batched `input` tensor containing sliding local blocks, e.g., patches of images, of shape ![](img/9e56ff5e3827b936da5cfa3a5258b12e.jpg), where ![](img/9341d9048ac485106d2b2ee8de14876f.jpg) is batch dimension, ![](img/a8846766f2e1b47021f1520993773ccb.jpg) is the number of values with in a block (a block has ![](img/8c7a54ca7193bc3a6c5ace8c3b07d24c.jpg) spatial locations each containing a ![](img/6c8feca3b2da3d6cf371417edff4be4f.jpg)-channeled vector), and ![](img/db4a9fef02111450bf98261889de550c.jpg) is the total number of blocks. (This is exacly the same specification as the output shape of [`Unfold`](#torch.nn.Unfold "torch.nn.Unfold").) This operation combines these local blocks into the large `output` tensor of shape ![](img/c2176aae9e099eeee07cc00c4dc7b7e7.jpg) by summing the overlapping values. Similar to [`Unfold`](#torch.nn.Unfold "torch.nn.Unfold"), the arguments must satisfy
+Consider a batched `input` tensor containing sliding local blocks, e.g., patches of images, of shape ![](img/9e56ff5e3827b936da5cfa3a5258b12e.jpg), where ![](img/9341d9048ac485106d2b2ee8de14876f.jpg) is the batch size, ![](img/a8846766f2e1b47021f1520993773ccb.jpg) is the number of values within a block (a block has ![](img/8c7a54ca7193bc3a6c5ace8c3b07d24c.jpg) spatial locations, each holding a ![](img/6c8feca3b2da3d6cf371417edff4be4f.jpg)-channel vector), and ![](img/db4a9fef02111450bf98261889de550c.jpg) is the total number of blocks. (This is exactly the same specification as the output shape of [`Unfold`](#torch.nn.Unfold "torch.nn.Unfold").) Fold combines these local blocks into a large `output` tensor of shape ![](img/c2176aae9e099eeee07cc00c4dc7b7e7.jpg) by summing the overlapping values. Similar to [`Unfold`](#torch.nn.Unfold "torch.nn.Unfold"), the arguments must satisfy

![](img/465bba7070e80a7e5964f46f7f5ed8bb.jpg)

-where ![](img/9566974d45a96737f7e0ecf302d877b8.jpg) is over all spatial dimensions.
+where ![](img/9566974d45a96737f7e0ecf302d877b8.jpg) ranges over all spatial dimensions.

-* `output_size` describes the spatial shape of the large containing tensor of the sliding local blocks. It is useful to resolve the ambiguity when multiple input shapes map to same number of sliding blocks, e.g., with `stride > 0`.
+* `output_size` describes the spatial shape of the large containing tensor. It resolves the ambiguity that arises when several different input shapes map to the same number of sliding blocks, e.g., when `stride > 0`; in that case, specifying `output_size` is essential.

The `padding`, `stride` and `dilation` arguments specify how the sliding blocks are retrieved.

-* `stride` controls the stride for the sliding blocks.
-* `padding` controls the amount of implicit zero-paddings on both sides for `padding` number of points for each dimension before reshaping.
-* `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.
+* `stride` controls the stride of the sliding blocks.
+* `padding` controls the number of implicit zero-padding layers added to each side of the input, in each dimension, before reshaping.
+* `dilation` controls the spacing between the kernel points; this is also known as the à trous algorithm. It is harder to describe in words, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.

Parameters:

-* **output_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – the shape of the spatial dimensions of the output (i.e., `input.sizes()[2:]`)
-* **kernel_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – the size of the sliding blocks
-* **stride** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – the stride of the sliding blocks in the input spatial dimensions. Default: 1
-* **padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – implicit zero padding to be added on both sides of input. Default: 0
-* **dilation** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – a parameter that controls the stride of elements within the neighborhood. Default: 1
+* **output_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – the shape of the spatial dimensions of the output (i.e., `input.sizes()[2:]`)
+* **kernel_size** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")) – the size of the sliding blocks
+* **stride** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – the stride of the sliding blocks in the input spatial dimensions. Default: 1
+* **padding** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – implicit zero padding added to both sides of the input. Default: 0
+* **dilation** ([_int_](https://docs.python.org/3/library/functions.html#int "(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple "(in Python v3.7)")_,_ _optional_) – the spacing between the elements sampled within each block (i.e., the dilation of the kernel). Default: 1

-* If `output_size`, `kernel_size`, `dilation`, `padding` or `stride` is an int or a tuple of length 1 then their values will be replicated across all spatial dimensions.
-* For the case of two output spatial dimensions this operation is sometimes called `col2im`.
+* If `output_size`, `kernel_size`, `dilation`, `padding` or `stride` is an int or a tuple of length 1, the same value is used for all spatial dimensions.
+* For the case of two output spatial dimensions, this operation is sometimes called `col2im`.

Note

-[`Fold`](#torch.nn.Fold "torch.nn.Fold") calculates each combined value in the resulting large tensor by summing all values from all containing blocks. [`Unfold`](#torch.nn.Unfold "torch.nn.Unfold") extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other.
+[`Fold`](#torch.nn.Fold "torch.nn.Fold") computes each value in the resulting large tensor by summing the values from all blocks that contain it, while [`Unfold`](#torch.nn.Unfold "torch.nn.Unfold") extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, the two operations are not inverses of each other.
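+A minimal sketch of this non-inverse behaviour (values chosen so the overlap counting is visible): folding back a stride-1 unfold of an all-ones input counts how many blocks cover each position, rather than reproducing the input:

```py
>>> import torch
>>> import torch.nn as nn
>>> x = torch.ones(1, 1, 3, 3)
>>> unfold = nn.Unfold(kernel_size=(2, 2))               # stride 1: blocks overlap
>>> fold = nn.Fold(output_size=(3, 3), kernel_size=(2, 2))
>>> # fold sums overlapping values, so each entry counts how many
>>> # 2x2 blocks cover that position instead of reproducing x
>>> fold(unfold(x))
tensor([[[[1., 2., 1.],
          [2., 4., 2.],
          [1., 2., 1.]]]])
```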
Warning

Currently, only 4-D output tensors (batched image-like tensors) are supported.

```py
Shape:
```

* Input: ![](img/4e1cad10fa9480fa82adbe59a5ae81fa.jpg)
* Output: ![](img/c2176aae9e099eeee07cc00c4dc7b7e7.jpg), as described above

Examples:

```py
>>> fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))

```

+Note

+(Translator's note) Fold and Unfold are new in 1.0. They appear to expose col2im and im2col, the primitives behind implementing convolution as one large matrix multiplication, which is the approach most mainstream frameworks now take; [Implementing convolution as a matrix multiplication](https://buptldy.github.io/2016/10/01/2016-10-01-im2col/) is a good introduction to that idea.

## Pooling layers

### MaxPool1d