A kind of Tensor that is to be considered a module parameter.
Parameters are [`Tensor`](tensors.html#torch.Tensor"torch.Tensor") subclasses that have a very special property when used with [`Module`](#torch.nn.Module"torch.nn.Module")s: when they are assigned as Module attributes, they are automatically added to the module's list of parameters and will appear, e.g., in the [`parameters()`](#torch.nn.Module.parameters"torch.nn.Module.parameters") iterator. Assigning a plain Tensor does not have such an effect. This is because one might want to cache some temporary state, like the last hidden state of an RNN, in the model. If there were no such class as [`Parameter`](#torch.nn.Parameter"torch.nn.Parameter"), these temporaries would get registered too.
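A minimal sketch of this distinction (the module and attribute names here are illustrative):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigned as a Parameter: automatically registered.
        self.weight = nn.Parameter(torch.randn(3, 3))
        # Assigned as a plain Tensor: NOT registered as a parameter.
        self.cache = torch.zeros(3)

net = Net()
names = [name for name, _ in net.named_parameters()]
print(names)  # ['weight'] -- 'cache' does not appear
```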
* **requires_grad** ([_bool_](https://docs.python.org/3/library/functions.html#bool"(in Python v3.7)")_,_ _optional_) – if the parameter requires gradient. See [Excluding subgraphs from backward](notes/autograd.html#excluding-subgraphs) for more details. Default: `True`
Submodules assigned in this way will be registered, and will have their parameters converted too when you call [`to()`](#torch.nn.Module.to"torch.nn.Module.to"), etc.
Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. Typical use includes initializing the parameters of a model (see also `torch.nn.init`).
| Parameters: | **fn** ([`Module`](#torch.nn.Module"torch.nn.Module") -> None) – function to be applied to each submodule |
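A small sketch of the typical use, initializing weights with `apply()` (the `init_weights` helper is illustrative):

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Called once for every submodule and for the container itself;
    # only Linear layers are touched here.
    if isinstance(m, nn.Linear):
        m.weight.data.fill_(1.0)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)
print(net[0].weight)  # all ones
```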
| Parameters: | **recurse** ([_bool_](https://docs.python.org/3/library/functions.html#bool"(in Python v3.7)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. |
Returns an iterator over immediate children modules.
| Yields: | _Module_ – a child module |
| --- | --- |
```py
cpu()
```
Moves all model parameters and buffers to the CPU.
| Returns: | self |
| --- | --- |
...
...
```py
cuda(device=None)
```
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on the GPU while being optimized.
| Parameters: | **device** ([_int_](https://docs.python.org/3/library/functions.html#int"(in Python v3.7)")_,_ _optional_) – if specified, all parameters will be copied to that device |
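The ordering constraint above can be sketched as follows (guarded so the code also runs on a CPU-only host):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
if torch.cuda.is_available():
    model.cuda()  # move parameters to the GPU first...
# ...then build the optimizer, so it holds the (possibly moved) parameter objects
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```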
Casts all floating point parameters and buffers to `double` datatype.
```py
dump_patches=False
```
This allows better backward-compatibility (BC) support for [`load_state_dict()`](#torch.nn.Module.load_state_dict"torch.nn.Module.load_state_dict"). In [`state_dict()`](#torch.nn.Module.state_dict"torch.nn.Module.state_dict"), the version number will be saved in the attribute `_metadata` of the returned state dict, and thus pickled. `_metadata` is a dictionary with keys that follow the naming convention of the state dict. See `_load_from_state_dict` on how to use this information in loading.
If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s `_load_from_state_dict` method can compare the version number and do appropriate changes if the state dict is from before the change.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](#torch.nn.Dropout"torch.nn.Dropout"), `BatchNorm`, etc.
Casts all floating point parameters and buffers to `float` datatype.
```py
forward(*input)
```
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the [`Module`](#torch.nn.Module"torch.nn.Module") instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
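The difference is observable: hooks fire for `module(x)` but not for `module.forward(x)`. A minimal sketch (the `Doubler` module is illustrative):

```python
import torch
import torch.nn as nn

calls = []

class Doubler(nn.Module):
    def forward(self, x):
        return 2 * x

m = Doubler()
m.register_forward_hook(lambda mod, inp, out: calls.append(out))
y = m(torch.tensor([1.0]))          # runs the registered hook
_ = m.forward(torch.tensor([1.0]))  # silently skips the hook
print(len(calls))  # 1
```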
Casts all floating point parameters and buffers to `half` datatype.
```py
load_state_dict(state_dict, strict=True)
```
Copies parameters and buffers from [`state_dict`](#torch.nn.Module.state_dict"torch.nn.Module.state_dict") into this module and its descendants. If `strict` is `True`, then the keys of [`state_dict`](#torch.nn.Module.state_dict"torch.nn.Module.state_dict") must exactly match the keys returned by this module’s [`state_dict()`](#torch.nn.Module.state_dict"torch.nn.Module.state_dict") function.
* **state_dict** ([_dict_](https://docs.python.org/3/library/stdtypes.html#dict"(in Python v3.7)")) – a dict containing parameters and persistent buffers.
* **strict** ([_bool_](https://docs.python.org/3/library/functions.html#bool"(in Python v3.7)")_,_ _optional_) – whether to strictly enforce that the keys in [`state_dict`](#torch.nn.Module.state_dict"torch.nn.Module.state_dict") match the keys returned by this module's [`state_dict()`](#torch.nn.Module.state_dict"torch.nn.Module.state_dict") function. Default: `True`
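A minimal round-trip sketch, copying weights between two identically shaped modules:

```python
import torch
import torch.nn as nn

src = nn.Linear(2, 2)
dst = nn.Linear(2, 2)
# strict=True (the default): keys of src's state dict must exactly match dst's
dst.load_state_dict(src.state_dict())
print(torch.equal(dst.weight, src.weight))  # True
```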
Returns an iterator over all modules in the network.
| Yields: | _Module_ – a module in the network |
| --- | --- |
Note
Duplicate modules are returned only once. In the following example, `l` will be returned only once.
Example:
```py
>>> l = nn.Linear(2, 2)
...
...
```

```py
named_buffers(prefix='', recurse=True)
```
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
Parameters:
* **prefix** ([_str_](https://docs.python.org/3/library/stdtypes.html#str"(in Python v3.7)")) – prefix to prepend to all buffer names.
* **recurse** ([_bool_](https://docs.python.org/3/library/functions.html#bool"(in Python v3.7)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
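A short sketch of iterating over named buffers, using `BatchNorm1d` (whose running statistics are buffers, not parameters):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.BatchNorm1d(2))
for name, buf in net.named_buffers():
    print(name, tuple(buf.shape))
# yields '0.running_mean', '0.running_var', '0.num_batches_tracked'
```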
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
| Yields: | _(string, Module)_ – Tuple of name and module |
| --- | --- |
Note
Duplicate modules are returned only once. In the following example, `l` will be returned only once.
Example:
```py
>>> l = nn.Linear(2, 2)
...
...
```

```py
named_parameters(prefix='', recurse=True)
```
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
Parameters:
* **prefix** ([_str_](https://docs.python.org/3/library/stdtypes.html#str"(in Python v3.7)")) – prefix to prepend to all parameter names.
* **recurse** ([_bool_](https://docs.python.org/3/library/functions.html#bool"(in Python v3.7)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
| Parameters: | **recurse** ([_bool_](https://docs.python.org/3/library/functions.html#bool"(in Python v3.7)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. |
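The effect of `recurse` can be sketched as follows (a `Sequential` container owns no parameters of its own):

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 2))
all_names = [n for n, _ in net.named_parameters()]               # recurses into submodules
top_names = [n for n, _ in net.named_parameters(recurse=False)]  # direct members only
print(all_names)  # ['0.weight', '0.bias']
print(top_names)  # []
```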
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
```py
hook(module, grad_input, grad_output) -> Tensor or None
```
The `grad_input` and `grad_output` may be tuples if the module has multiple inputs or outputs. The hook should not modify its arguments, but it can optionally return a new gradient with respect to input that will be used in place of `grad_input` in subsequent computations.
The current implementation will not have the presented behavior for complex [`Module`](#torch.nn.Module"torch.nn.Module")s that perform many operations. In some failure cases, `grad_input` and `grad_output` will only contain the gradients for a subset of the inputs and outputs. For such [`Module`](#torch.nn.Module"torch.nn.Module")s, you should use [`torch.Tensor.register_hook()`](autograd.html#torch.Tensor.register_hook"torch.Tensor.register_hook") directly on a specific input or output to get the required gradients.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm's `running_mean` is not a parameter, but is part of the persistent state.
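A minimal sketch of registering a buffer (the `Norm` module and buffer name are illustrative):

```python
import torch
import torch.nn as nn

class Norm(nn.Module):
    def __init__(self):
        super().__init__()
        # persistent state that is not a parameter
        self.register_buffer('running_mean', torch.zeros(4))

m = Norm()
print('running_mean' in m.state_dict())  # True: saved and restored with the module
print(list(m.parameters()))              # []: not seen by the optimizer
```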
The hook will be called every time after [`forward()`](#torch.nn.Module.forward"torch.nn.Module.forward") has computed an output. It should have the following signature:
The hook will be called every time before [`forward()`](#torch.nn.Module.forward"torch.nn.Module.forward") is invoked. It should have the following signature:
```py
...
...
hook(module, input) -> None
```
The hook should not modify the input.
| Returns: | a handle that can be used to remove the added hook by calling `handle.remove()` |
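A sketch of registering a forward pre-hook and removing it through the returned handle:

```python
import torch
import torch.nn as nn

fired = []

def pre_hook(module, inp):
    # runs just before forward(); must not modify inp
    fired.append(True)

m = nn.Linear(2, 2)
handle = m.register_forward_pre_hook(pre_hook)
m(torch.randn(1, 2))  # hook fires
handle.remove()       # detach the hook via the handle
m(torch.randn(1, 2))  # hook no longer fires
print(len(fired))  # 1
```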
Its signature is similar to [`torch.Tensor.to()`](tensors.html#torch.Tensor.to"torch.Tensor.to"), but only accepts floating point desired `dtype`s. In addition, this method will only cast the floating point parameters and buffers to `dtype` (if given). The integral parameters and buffers will be moved to `device`, if that is given, but with dtypes unchanged. When `non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
Parameters:
* **device** (`torch.device`) – the desired device of the parameters and buffers in this module
* **dtype** (`torch.dtype`) – the desired floating point type of the floating point parameters and buffers in this module
* **tensor** ([_torch.Tensor_](tensors.html#torch.Tensor"torch.Tensor")) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
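A short sketch of the `to()` overloads on a module:

```python
import torch
import torch.nn as nn

m = nn.Linear(2, 2)
m.to(torch.float64)        # casts floating point parameters and buffers in place
print(m.weight.dtype)      # torch.float64
m.to(torch.device('cpu'), torch.float32)  # move and cast in a single call
print(m.weight.dtype)      # torch.float32
```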
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](#torch.nn.Dropout"torch.nn.Dropout"), `BatchNorm`, etc.
Casts all parameters and buffers to `dst_type`.

```py
zero_grad()
```
Sets gradients of all model parameters to zero.
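In a typical training loop, gradients are zeroed before each backward pass (either on the module or, equivalently here, on the optimizer):

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(2):
    opt.zero_grad()  # or model.zero_grad(): clear accumulated gradients
    loss = model(torch.randn(4, 2)).pow(2).mean()
    loss.backward()
    opt.step()
```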
### Sequential
...
...
```py
class torch.nn.Sequential(*args)
```
A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in.
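Both construction styles can be sketched as follows; the `OrderedDict` form also makes submodules addressable by name:

```python
from collections import OrderedDict
import torch.nn as nn

# positional: modules are added in the order passed
plain = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# ordered dict: same structure, but with named submodules
named = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(4, 8)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(8, 2)),
]))
print(named.fc1)  # submodule accessible by its key
```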
ModuleDict can be indexed like a regular Python dictionary, but modules it contains are properly registered, and will be visible by all Module methods.
| Parameters: | **modules** (_iterable_, _optional_) – a mapping (dictionary) of (string: module) or an iterable of key/value pairs of type (string, module) |
ParameterList can be indexed like a regular Python list, but parameters it contains are properly registered, and will be visible by all Module methods.
| Parameters: | **parameters** (_iterable_, _optional_) – an iterable of [`Parameter`](#torch.nn.Parameter"torch.nn.Parameter") to add |
ParameterDict can be indexed like a regular Python dictionary, but parameters it contains are properly registered, and will be visible by all Module methods.
| Parameters: | **parameters** (_iterable_, _optional_) – a mapping (dictionary) of (string : [`Parameter`](#torch.nn.Parameter"torch.nn.Parameter")) or an iterable of key/value pairs of type (string, [`Parameter`](#torch.nn.Parameter"torch.nn.Parameter")) |
Applies a 1D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size ![](img/1dad4f3ff614c986028f7100e0205f6d.jpg) and output ![](img/a03de8b18f61a493174a56530fb03f1d.jpg) can be precisely described as:
where ![](img/d5d3d32b4a35f91edb54c3c3f87d582e.jpg) is the valid [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation) operator, ![](img/9341d9048ac485106d2b2ee8de14876f.jpg) is a batch size, ![](img/6c8feca3b2da3d6cf371417edff4be4f.jpg) denotes a number of channels, ![](img/db4a9fef02111450bf98261889de550c.jpg) is a length of signal sequence.
Note that this is cross-correlation rather than true (flipped-kernel) convolution; most deep learning frameworks implement cross-correlation because it is simpler to implement and differs from convolution only by a flip of the kernel.
* `stride` controls the stride for the cross-correlation, a single number or a one-element tuple.
* `padding` controls the amount of implicit zero-padding on both sides for `padding` number of points.
* `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.
> * At groups=1, all inputs are convolved to all outputs.
> * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
> * At groups= `in_channels`, each input channel is convolved with its own set of filters, of size ![](img/19131f9f53448ae579b613bc7bc90158.jpg)
Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation), and not a full [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation). It is up to the user to add proper padding.
When `groups == in_channels` and `out_channels == K * in_channels`, where `K` is a positive integer, this operation is also termed in literature as depthwise convolution.
In other words, for an input of size ![](img/7db3e5e5d600c81e77756d5eee050505.jpg), a depthwise convolution with a depthwise multiplier `K`, can be constructed by arguments ![](img/eab8f2745761d762e48a59446243af90.jpg).
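A depthwise sketch with `K = 2`, so each of the 4 input channels gets its own 2 filters:

```python
import torch
import torch.nn as nn

# depthwise: groups == in_channels and out_channels == K * in_channels (K = 2 here)
dw = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3, groups=4)
y = dw(torch.randn(1, 4, 10))
print(y.shape)  # torch.Size([1, 8, 8])
```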
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. Please see the notes on [Reproducibility](notes/randomness.html) for background.
Parameters:
* **in_channels** ([_int_](https://docs.python.org/3/library/functions.html#int"(in Python v3.7)")) – Number of channels in the input image
* **out_channels** ([_int_](https://docs.python.org/3/library/functions.html#int"(in Python v3.7)")) – Number of channels produced by the convolution
* **kernel_size** ([_int_](https://docs.python.org/3/library/functions.html#int"(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple"(in Python v3.7)")) – Size of the convolving kernel
* **stride** ([_int_](https://docs.python.org/3/library/functions.html#int"(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple"(in Python v3.7)")_,_ _optional_) – Stride of the convolution. Default: 1
* **padding** ([_int_](https://docs.python.org/3/library/functions.html#int"(in Python v3.7)") _or_ [_tuple_](https://docs.python.org/3/library/stdtypes.html#tuple"(in Python v3.7)")_,_ _optional_) – Zero-padding added to both sides of the input. Default: 0
* **groups** ([_int_](https://docs.python.org/3/library/functions.html#int"(in Python v3.7)")_,_ _optional_) – Number of blocked connections from input channels to output channels. Default: 1
* **bias** ([_bool_](https://docs.python.org/3/library/functions.html#bool"(in Python v3.7)")_,_ _optional_) – If `True`, adds a learnable bias to the output. Default: `True`
* **weight** ([_Tensor_](tensors.html#torch.Tensor"torch.Tensor")) – the learnable weights of the module of shape (out_channels, in_channels, kernel_size). The values of these weights are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg) where ![](img/69aab1ce658aabc9a2d986ae8281e2ad.jpg)
* **bias** ([_Tensor_](tensors.html#torch.Tensor"torch.Tensor")) – the learnable bias of the module of shape (out_channels). If `bias` is `True`, then the values of these weights are sampled from ![](img/3d305f1c240ff844b6cb2c1c6660e0af.jpg) where ![](img/69aab1ce658aabc9a2d986ae8281e2ad.jpg)
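Putting the parameters together, a minimal `Conv1d` usage sketch; the output length follows `L_out = floor((L_in - kernel_size) / stride) + 1` when `padding=0` and `dilation=1`:

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=16, out_channels=33, kernel_size=3, stride=2)
x = torch.randn(20, 16, 50)  # (batch, channels, length)
y = conv(x)
print(y.shape)  # torch.Size([20, 33, 24]), since floor((50 - 3) / 2) + 1 == 24
```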
Applies a 2D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size ![](img/a6c3a4e9779c159b39576bee3400a00b.jpg) and output ![](img/4b354af142fb0f01680d390ef552829f.jpg) can be precisely described as:
where ![](img/d5d3d32b4a35f91edb54c3c3f87d582e.jpg) is the valid 2D [cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation) operator, ![](img/9341d9048ac485106d2b2ee8de14876f.jpg) is a batch size, ![](img/6c8feca3b2da3d6cf371417edff4be4f.jpg) denotes a number of channels, ![](img/9b7d9beafd65e2cf6493bdca741827a5.jpg) is a height of input planes in pixels, and ![](img/90490a34512e9bd1843ed4da713d0813.jpg) is width in pixels.
* `stride` controls the stride for the cross-correlation, a single number or a tuple.
* `padding` controls the amount of implicit zero-padding on both sides for `padding` number of points for each dimension.
* `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does.
* `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example,
> * At groups=1, all inputs are convolved to all outputs.
> * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
> * At groups= `in_channels`, each input channel is convolved with its own set of filters, of size: ![](img/19131f9f53448ae579b613bc7bc90158.jpg).
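A minimal `Conv2d` sketch; with a 3x3 kernel, `padding=1` preserves the spatial size (`H_out = H_in - 3 + 2*1 + 1 = H_in`):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
y = conv(torch.randn(1, 16, 28, 28))  # (batch, channels, height, width)
print(y.shape)  # torch.Size([1, 32, 28, 28])
```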