magicwindyyd / mindspore
Commit 2c2fe9be
Forked from MindSpore / mindspore (in sync with the fork source)
Authored on Aug 10, 2020 by mindspore-ci-bot; committed via Gitee on Aug 10, 2020
!4121 Third round of the enhancement of API comments
Merge pull request !4121 from Simson/enhancement-API
Parents: 05f405c0, dec2e4a8
Showing 13 changed files with 236 additions and 235 deletions (+236 -235)
mindspore/nn/cell.py (+20 -20)
mindspore/nn/layer/basic.py (+1 -1)
mindspore/nn/layer/container.py (+1 -1)
mindspore/nn/layer/conv.py (+42 -42)
mindspore/nn/layer/normalization.py (+8 -8)
mindspore/nn/layer/pooling.py (+20 -20)
mindspore/nn/layer/quant.py (+72 -72)
mindspore/nn/loss/loss.py (+6 -5)
mindspore/nn/metrics/accuracy.py (+5 -5)
mindspore/nn/optim/adam.py (+25 -25)
mindspore/nn/wrap/cell_wrapper.py (+2 -2)
mindspore/ops/operations/_grad_ops.py (+12 -12)
mindspore/ops/operations/nn_ops.py (+22 -22)
mindspore/nn/cell.py
@@ -33,7 +33,7 @@ from ..common.tensor import Tensor
 class Cell:
     """
-    Base class for all neural network.
+    Base class for all neural networks.

     A 'Cell' could be a single neural network cell, such as conv2d, relu, batch_norm, etc. or a composition of
     cells to constructing a network.

@@ -42,8 +42,8 @@ class Cell:
     In general, the autograd algorithm will automatically generate the implementation of the gradient function,
     but if bprop method is implemented, the gradient function
     will be replaced by the bprop. The bprop implementation will receive a Tensor `dout` containing the gradient
-    of the loss w.r.t. the output, and a Tensor `out` containing the forward result. The bprop need to compute the
-    gradient of the loss w.r.t. the inputs, gradient of the loss w.r.t. Parameter variables is not supported
+    of the loss w.r.t. the output, and a Tensor `out` containing the forward result. The bprop needs to compute the
+    gradient of the loss w.r.t. the inputs, gradient of the loss w.r.t. Parameter variables are not supported
     currently.

     Args:
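
As an illustration of the bprop contract this hunk describes, a minimal sketch (the doubling cell and its gradient are invented for the example, not part of the patch):

    import mindspore.nn as nn

    class DoubleCell(nn.Cell):
        """Computes 2 * x and supplies its own gradient function."""
        def construct(self, x):
            return 2 * x

        def bprop(self, x, out, dout):
            # out is the forward result; dout is the gradient of the loss
            # w.r.t. the output. Return one gradient per input, as a tuple;
            # gradients w.r.t. Parameter variables are not supported here.
            return (2 * dout,)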

@@ -138,7 +138,7 @@ class Cell:
         """
         Update the all child cells' self.param_prefix.

-        After invoked, can get all the cell's children's name prefix by '_param_prefix'.
+        After being invoked, it can get all the cell's children's name prefix by '_param_prefix'.
         """
         cells_name = self.cells_and_names()

@@ -147,9 +147,9 @@ class Cell:
     def update_cell_type(self, cell_type):
         """
-        Update current cell type mainly identify if quantization aware training network.
+        Update the current cell type mainly identify if quantization aware training network.

-        After invoked, can set the cell type to 'cell_type'.
+        After being invoked, it can set the cell type to 'cell_type'.
         """
         self.cell_type = cell_type

@@ -346,7 +346,7 @@ class Cell:
         Please refer to the usage in source code of `mindspore.common._Executor.compile`.

         Args:
-            params (dict): The parameters dictionary used for init data graph.
+            params (dict): The parameters dictionary used for initializing the data graph.
         """
         if params is None:
             params = self.parameters_dict()

@@ -499,7 +499,7 @@ class Cell:
         """
         Adds a child cell to the current cell.

-        Inserts a subcell with given name to current cell.
+        Inserts a subcell with a given name to the current cell.

         Args:
             child_name (str): Name of the child cell.

@@ -534,7 +534,7 @@ class Cell:
     def init_parameters_data(self, auto_parallel_mode=False):
         """
-        Init all parameters' data and replace the original saved parameters in cell.
+        Initialize all parameters and replace the original saved parameters in cell.

         Notes:
             trainable_params() and other similar interfaces may return different parameter instance after

@@ -655,7 +655,7 @@ class Cell:
         Yields parameters of this cell. If `expand` is True, yield parameters of this cell and all subcells.

         Args:
-            expand (bool): If True, yields parameters of this cell and all subcells. Otherwise, yields only parameters
+            expand (bool): If True, yields parameters of this cell and all subcells. Otherwise, only yield parameters
                 that are direct members of this cell. Default: True.

         Examples:

@@ -682,7 +682,7 @@ class Cell:
         Args:
             name_prefix (str): Namespace. Default: ''.
-            expand (bool): If True, yields parameters of this cell and all subcells. Otherwise, yields only parameters
+            expand (bool): If True, yields parameters of this cell and all subcells. Otherwise, only yield parameters
                 that are direct members of this cell. Default: True.

         Examples:

@@ -772,7 +772,7 @@ class Cell:
         return self._scope

     def generate_scope(self):
-        """Generate the scope for every cell object in the network."""
+        """Generate the scope for each cell object in the network."""
         for name, cell in self._children_scope_recursive():
             cell._set_scope(name)

@@ -819,14 +819,14 @@ class Cell:
         `mindspore.train.amp.build_train_network`.

         Note:
-            Call multiple times will overwrite the previous.
+            Multiple calls will overwrite.

         Args:
             dst_type (:class:`mindspore.dtype`): Transfer Cell to Run with dst_type.
                 dst_type can be `mindspore.dtype.float16` or `mindspore.dtype.float32`.

         Raises:
-            ValueError: If dst_type is not float32 or float16.
+            ValueError: If dst_type is not float32 nor float16.
         """
         if dst_type not in (mstype.float16, mstype.float32):
             raise ValueError("dst_type should inside float32 or float16.")
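
A brief usage sketch of the cast described above (the layer choice is illustrative, and the import path is the era-appropriate one as far as we can tell):

    import mindspore.nn as nn
    from mindspore.common import dtype as mstype

    net = nn.Conv2d(3, 16, 3)
    # Run this cell in float16; a later call with float32 would overwrite it.
    net.to_float(mstype.float16)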

@@ -871,8 +871,8 @@ class Cell:
         Set the cell to auto parallel mode.

         Note:
-            If a cell needs to use auto parallel or semi auto parallel mode for training, evaluation or prediction,
-            this interface needs to be called for the cell.
+            If a cell needs to use the auto parallel or semi auto parallel mode for training, evaluation or prediction,
+            this interface needs to be called by the cell.
         """
         self._auto_parallel_mode = True
         self.add_flags(auto_parallel=True)

@@ -890,9 +890,9 @@ class Cell:
         Set the cell backward hook function. Note that this function is only supported in Pynative Mode.

         Note:
-            fn should be defined as following code shows, `cell_name` is the name of registered cell,
-            `grad_input` is gradient passed to the cell, `grad_output` is the gradient computed and pass to
-            next cell or primitve, which may be modified and return.
+            fn should be defined as the following code. `cell_name` is the name of registered cell.
+            `grad_input` is gradient passed to the cell. `grad_output` is the gradient computed and passed to the
+            next cell or primitve, which may be modified and returned.

             >>> hook_fn(cell_name, grad_input, grad_output) -> Tensor or None

         Args:
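
A sketch of a conforming hook, following the signature shown in the docstring (the print is illustrative only):

    def hook_fn(cell_name, grad_input, grad_output):
        # Inspect gradients as they flow backward in PyNative mode;
        # returning None leaves the computed gradient unchanged, while
        # returning a Tensor replaces it.
        print(cell_name, grad_input, grad_output)
        return None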

@@ -907,7 +907,7 @@ class Cell:
         Set whether the trainable parameter is updated by parameter server.

         Note:
-            This only works when running task in parameter server mode.
+            It only works when a running task is in the parameter server mode.

         Args:
             recurse (bool): Whether sets the trainable parameters of subcells. Default: True.

mindspore/nn/layer/basic.py
@@ -172,7 +172,7 @@ class Dense(Cell):
         bias_init (Union[Tensor, str, Initializer, numbers.Number]): The trainable bias_init parameter. The dtype is
             same as input x. The values of str refer to the function `initializer`. Default: 'zeros'.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: True.
-        activation (str): activate function applied to the output of the fully connected layer, eg. 'relu'.
+        activation (str): activate function applied to the output of the fully connected layer, eg. 'ReLU'.
             Default: None.

     Raises:
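
A usage sketch of the layer whose docstring is touched here (shapes are illustrative; the lowercase 'relu' key is the commonly used form):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    # Fully connected layer with a fused activation on its output.
    net = nn.Dense(3, 4, activation='relu')
    x = Tensor(np.random.randint(0, 255, [2, 3]).astype(np.float32))
    y = net(x)  # shape (2, 4)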

mindspore/nn/layer/container.py
@@ -236,7 +236,7 @@ class CellList(_CellListBase, Cell):
         Appends cells from a Python iterable to the end of the list.

         Raises:
-            TypeError: If the cells is not a list of subcells.
+            TypeError: If the cells are not a list of subcells.
         """
         if not isinstance(cells, list):
             raise TypeError('Cells {} should be list of subcells'.format(cells))
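
A sketch of the method in use (layer choices are arbitrary):

    import mindspore.nn as nn

    cells = nn.CellList([nn.Conv2d(100, 20, 3)])
    # extend() accepts only a Python list of cells, as the check above shows;
    # anything else raises TypeError.
    cells.extend([nn.ReLU()])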

mindspore/nn/layer/conv.py
@@ -111,11 +111,11 @@ class Conv2d(_Conv):
         out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,

-    where :math:`ccor` is cross correlation operator, :math:`C_{in}` is the input channel number, :math:`j` ranges
-    from :math:`0` to :math:`C_{out} - 1`, :math:`W_{ij}` corresponds to :math:`i`-th channel of the :math:`j`-th
+    where :math:`ccor` is the cross correlation operator, :math:`C_{in}` is the input channel number, :math:`j` ranges
+    from :math:`0` to :math:`C_{out} - 1`, :math:`W_{ij}` corresponds to the :math:`i`-th channel of the :math:`j`-th
     filter and :math:`out_{j}` corresponds to the :math:`j`-th channel of the output. :math:`W_{ij}` is a slice
     of kernel and it has shape :math:`(\text{ks_h}, \text{ks_w})`, where :math:`\text{ks_h}` and
-    :math:`\text{ks_w}` are height and width of the convolution kernel. The full kernel has shape
+    :math:`\text{ks_w}` are the height and width of the convolution kernel. The full kernel has shape
     :math:`(C_{out}, C_{in} // \text{group}, \text{ks_h}, \text{ks_w})`, where group is the group number
     to split the input in the channel dimension.

@@ -132,7 +132,7 @@ class Conv2d(_Conv):
         in_channels (int): The number of input channel :math:`C_{in}`.
         out_channels (int): The number of output channel :math:`C_{out}`.
         kernel_size (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the height
-            and width of the 2D convolution window. Single int means the value if for both height and width of
+            and width of the 2D convolution window. Single int means the value is for both the height and the width of
             the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
             width of the kernel.
         stride (Union[int, tuple[int]]): The distance of kernel moving, an int number that represents

@@ -141,13 +141,13 @@ class Conv2d(_Conv):
         pad_mode (str): Specifies padding mode. The optional values are
             "same", "valid", "pad". Default: "same".

-            - same: Adopts the way of completion. Output height and width will be the same as the input.
-              Total number of padding will be calculated for horizontal and vertical
-              direction and evenly distributed to top and bottom, left and right if possible. Otherwise, the
+            - same: Adopts the way of completion. The height and width of the output will be the same as
+              the input. The total number of padding will be calculated in horizontal and vertical
+              directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the
              last extra padding will be done from the bottom and the right side. If this mode is set, `padding`
              must be 0.

-            - valid: Adopts the way of discarding. The possibly largest height and width of output will be return
+            - valid: Adopts the way of discarding. The possibly largest height and width of output will be returned
              without padding. Extra pixels will be discarded. If this mode is set, `padding`
              must be 0.

@@ -155,9 +155,9 @@ class Conv2d(_Conv):
             Tensor borders. `padding` should be greater than or equal to 0.
         padding (Union[int, tuple[int]]): Implicit paddings on both sides of the input. If `padding` is one integer,
-            the padding of top, bottom, left and right is same, equal to padding. If `padding` is tuple with
-            four integer, the padding of top, bottom, left and right equal to padding[0], padding[1], padding[2],
-            padding[3] with corresponding. Default: 0.
+            the padding of top, bottom, left and right is the same, equal to padding. If `padding` is a tuple
+            with four integers, the padding of top, bottom, left and right will be equal to padding[0], padding[1],
+            padding[2], and padding[3] accordingly. Default: 0.
         dilation (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the dilation rate
             to use for dilated convolution. If set to be :math:`k > 1`, there will
             be :math:`k - 1` pixels skipped for each sampling location. Its value should

@@ -167,7 +167,7 @@ class Conv2d(_Conv):
             divisible by the number of groups. Default: 1.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
         weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
+            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of
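
Tying the padding semantics above together, a usage sketch (shapes follow the class's own doctest style):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    # 'pad' mode with padding=2 adds two implicit pixels on every side.
    net = nn.Conv2d(120, 240, 4, pad_mode='pad', padding=2,
                    has_bias=False, weight_init='normal')
    x = Tensor(np.ones([1, 120, 1024, 640]).astype(np.float32))
    y = net(x)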

@@ -274,10 +274,10 @@ class Conv1d(_Conv):
         out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,

-    where :math:`ccor` is cross correlation operator, :math:`C_{in}` is the input channel number, :math:`j` ranges
-    from :math:`0` to :math:`C_{out} - 1`, :math:`W_{ij}` corresponds to :math:`i`-th channel of the :math:`j`-th
+    where :math:`ccor` is the cross correlation operator, :math:`C_{in}` is the input channel number, :math:`j` ranges
+    from :math:`0` to :math:`C_{out} - 1`, :math:`W_{ij}` corresponds to the :math:`i`-th channel of the :math:`j`-th
     filter and :math:`out_{j}` corresponds to the :math:`j`-th channel of the output. :math:`W_{ij}` is a slice
-    of kernel and it has shape :math:`(\text{ks_w})`, where :math:`\text{ks_w}` are width of the convolution kernel.
+    of kernel and it has shape :math:`(\text{ks_w})`, where :math:`\text{ks_w}` is the width of the convolution kernel.
     The full kernel has shape :math:`(C_{out}, C_{in} // \text{group}, \text{ks_w})`, where group is the group number
     to split the input in the channel dimension.

@@ -285,8 +285,8 @@ class Conv1d(_Conv):
     :math:`\left \lfloor{1 + \frac{W_{in} + 2 \times \text{padding} - \text{ks_w} -
     (\text{ks_w} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor` respectively.

-    The first introduction can be found in paper `Gradient Based Learning Applied to Document Recognition
-    <http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf>`_.
+    The first introduction of convolution layer can be found in paper `Gradient Based Learning Applied to Document
+    Recognition <http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf>`_.

     Args:
         in_channels (int): The number of input channel :math:`C_{in}`.

@@ -298,13 +298,13 @@ class Conv1d(_Conv):
         pad_mode (str): Specifies padding mode. The optional values are
             "same", "valid", "pad". Default: "same".

-            - same: Adopts the way of completion. Output width will be the same as the input.
-              Total number of padding will be calculated for horizontal
+            - same: Adopts the way of completion. The output width will be the same as the input.
+              The total number of padding will be calculated in the horizontal
              direction and evenly distributed to left and right if possible. Otherwise, the
              last extra padding will be done from the bottom and the right side. If this mode is set, `padding`
              must be 0.

-            - valid: Adopts the way of discarding. The possibly largest width of output will be return
+            - valid: Adopts the way of discarding. The possible largest width of the output will be returned
              without padding. Extra pixels will be discarded. If this mode is set, `padding`
              must be 0.

@@ -320,8 +320,8 @@ class Conv1d(_Conv):
         group (int): Split filter into groups, `in_ channels` and `out_channels` should be
             divisible by the number of groups. Default: 1.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
-        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
+        weight_init (Union[Tensor, str, Initializer, numbers.Number]): An initializer for the convolution kernel.
+            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of

@@ -443,8 +443,8 @@ class Conv2dTranspose(_Conv):
     r"""
     2D transposed convolution layer.

-    Compute a 2D transposed convolution, which is also know as a deconvolution
-    (although it is not actual deconvolution).
+    Compute a 2D transposed convolution, which is also known as a deconvolution
+    (although it is not an actual deconvolution).

     Input is typically of shape :math:`(N, C, H, W)`, where :math:`N` is batch size and :math:`C` is channel number.

@@ -452,7 +452,7 @@ class Conv2dTranspose(_Conv):
         in_channels (int): The number of channels in the input space.
         out_channels (int): The number of channels in the output space.
         kernel_size (Union[int, tuple]): int or tuple with 2 integers, which specifies the height
-            and width of the 2D convolution window. Single int means the value is for both height and width of
+            and width of the 2D convolution window. Single int means the value is for both the height and the width of
             the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
             width of the kernel.
         stride (Union[int, tuple[int]]): The distance of kernel moving, an int number that represents

@@ -467,19 +467,19 @@ class Conv2dTranspose(_Conv):
             - valid: Adopted the way of discarding.
         padding (Union[int, tuple[int]]): Implicit paddings on both sides of the input. If `padding` is one integer,
-            the padding of top, bottom, left and right is same, equal to padding. If `padding` is tuple with
-            four integer, the padding of top, bottom, left and right equal to padding[0], padding[1], padding[2],
-            padding[3] with corresponding. Default: 0.
-        dilation (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the dilation rate
+            the padding of top, bottom, left and right is the same, equal to padding. If `padding` is a tuple
+            with four integers, the padding of top, bottom, left and right will be equal to padding[0], padding[1],
+            padding[2], and padding[3] accordingly. Default: 0.
+        dilation (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the dilation rate
             to use for dilated convolution. If set to be :math:`k > 1`, there will
             be :math:`k - 1` pixels skipped for each sampling location. Its value should
-            be greater or equal to 1 and bounded by the height and width of the
+            be greater than or equal to 1 and bounded by the height and width of the
             input. Default: 1.
         group (int): Split filter into groups, `in_channels` and `out_channels` should be
-            divisible by the number of groups. This is not support for Davinci devices when group > 1. Default: 1.
+            divisible by the number of groups. This does not support for Davinci devices when group > 1. Default: 1.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
         weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
+            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of

@@ -614,8 +614,8 @@ class Conv1dTranspose(_Conv):
     r"""
     1D transposed convolution layer.

-    Compute a 1D transposed convolution, which is also know as a deconvolution
-    (although it is not actual deconvolution).
+    Compute a 1D transposed convolution, which is also known as a deconvolution
+    (although it is not an actual deconvolution).

     Input is typically of shape :math:`(N, C, W)`, where :math:`N` is batch size and :math:`C` is channel number.

@@ -805,11 +805,11 @@ class DepthwiseConv2d(Cell):
         out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,

-    where :math:`ccor` is cross correlation operator, :math:`C_{in}` is the input channel number, :math:`j` ranges
-    from :math:`0` to :math:`C_{out} - 1`, :math:`W_{ij}` corresponds to :math:`i`-th channel of the :math:`j`-th
+    where :math:`ccor` is the cross correlation operator, :math:`C_{in}` is the input channel number, :math:`j` ranges
+    from :math:`0` to :math:`C_{out} - 1`, :math:`W_{ij}` corresponds to the :math:`i`-th channel of the :math:`j`-th
     filter and :math:`out_{j}` corresponds to the :math:`j`-th channel of the output. :math:`W_{ij}` is a slice
     of kernel and it has shape :math:`(\text{ks_h}, \text{ks_w})`, where :math:`\text{ks_h}` and
-    :math:`\text{ks_w}` are height and width of the convolution kernel. The full kernel has shape
+    :math:`\text{ks_w}` are the height and width of the convolution kernel. The full kernel has shape
     :math:`(C_{out}, C_{in} // \text{group}, \text{ks_h}, \text{ks_w})`, where group is the group number
     to split the input in the channel dimension.

@@ -826,7 +826,7 @@ class DepthwiseConv2d(Cell):
         in_channels (int): The number of input channel :math:`C_{in}`.
         out_channels (int): The number of output channel :math:`C_{out}`.
         kernel_size (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the height
-            and width of the 2D convolution window. Single int means the value if for both height and width of
+            and width of the 2D convolution window. Single int means the value is for both the height and the width of
             the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
             width of the kernel.
         stride (Union[int, tuple[int]]): The distance of kernel moving, an int number that represents

@@ -835,13 +835,13 @@ class DepthwiseConv2d(Cell):
         pad_mode (str): Specifies padding mode. The optional values are
             "same", "valid", "pad". Default: "same".

-            - same: Adopts the way of completion. Output height and width will be the same as the input.
-              Total number of padding will be calculated for horizontal and vertical
-              direction and evenly distributed to top and bottom, left and right if possible. Otherwise, the
+            - same: Adopts the way of completion. The height and width of the output will be the same as
+              the input. The total number of padding will be calculated in horizontal and vertical
+              directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the
              last extra padding will be done from the bottom and the right side. If this mode is set, `padding`
              must be 0.

-            - valid: Adopts the way of discarding. The possibly largest height and width of output will be return
+            - valid: Adopts the way of discarding. The possibly largest height and width of output will be returned
              without padding. Extra pixels will be discarded. If this mode is set, `padding`
              must be 0.

mindspore/nn/layer/normalization.py
@@ -248,7 +248,7 @@ class BatchNorm1d(_BatchNorm):
         eps (float): A value added to the denominator for numerical stability. Default: 1e-5.
         momentum (float): A floating hyperparameter of the momentum for the
             running_mean and running_var computation. Default: 0.9.
-        affine (bool): A bool value when set to True, gamma and beta can be learnable. Default: True.
+        affine (bool): A bool value. When set to True, gamma and beta can be learned. Default: True.
         gamma_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the gamma weight.
             The values of str refer to the function `initializer` including 'zeros', 'ones', 'xavier_uniform',
             'he_uniform', etc. Default: 'ones'.

@@ -262,9 +262,9 @@ class BatchNorm1d(_BatchNorm):
             The values of str refer to the function `initializer` including 'zeros', 'ones', 'xavier_uniform',
             'he_uniform', etc. Default: 'ones'.
         use_batch_statistics (bool): If true, use the mean value and variance value of current batch data. If false,
-            use the mean value and variance value of specified value. If None, training process will use the mean and
-            variance of current batch data and track the running mean and variance, eval process will use the running
-            mean and variance. Default: None.
+            use the mean value and variance value of specified value. If None, the training process will use the mean
+            and variance of current batch data and track the running mean and variance, the evaluation process will use
+            the running mean and variance. Default: None.

     Inputs:
         - **input** (Tensor) - Tensor of shape :math:`(N, C_{in})`.

@@ -324,7 +324,7 @@ class BatchNorm2d(_BatchNorm):
         eps (float): A value added to the denominator for numerical stability. Default: 1e-5.
         momentum (float): A floating hyperparameter of the momentum for the
             running_mean and running_var computation. Default: 0.9.
-        affine (bool): A bool value when set to True, gamma and beta can be learnable. Default: True.
+        affine (bool): A bool value. When set to True, gamma and beta can be learned. Default: True.
         gamma_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the gamma weight.
             The values of str refer to the function `initializer` including 'zeros', 'ones', 'xavier_uniform',
             'he_uniform', etc. Default: 'ones'.

@@ -338,9 +338,9 @@ class BatchNorm2d(_BatchNorm):
             The values of str refer to the function `initializer` including 'zeros', 'ones', 'xavier_uniform',
             'he_uniform', etc. Default: 'ones'.
         use_batch_statistics (bool): If true, use the mean value and variance value of current batch data. If false,
-            use the mean value and variance value of specified value. If None, training process will use the mean and
-            variance of current batch data and track the running mean and variance, eval process will use the running
-            mean and variance. Default: None.
+            use the mean value and variance value of specified value. If None, the training process will use the mean
+            and variance of current batch data and track the running mean and variance, the evaluation process will use
+            the running mean and variance. Default: None.

     Inputs:
         - **input** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
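
A short sketch of the use_batch_statistics behavior described above (values are illustrative):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    # With the default use_batch_statistics=None, batch statistics are used
    # and the running statistics tracked while training; evaluation then uses
    # the tracked running mean and variance.
    bn = nn.BatchNorm2d(num_features=3)
    x = Tensor(np.ones([1, 3, 2, 2]).astype(np.float32))
    y = bn(x)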

mindspore/nn/layer/pooling.py
@@ -84,16 +84,16 @@ class MaxPool2d(_PoolNd):
         stride (Union[int, tuple[int]]): The distance of kernel moving, an int number that represents
             the height and width of movement are both strides, or a tuple of two int numbers that
             represent height and width of movement respectively. Default: 1.
-        pad_mode (str): The optional values for pad mode, is "same" or "valid", not case sensitive.
+        pad_mode (str): The optional value for pad mode, is "same" or "valid", not case sensitive.
             Default: "valid".

-            - same: Adopts the way of completion. Output height and width will be the same as
-              the input. Total number of padding will be calculated for horizontal and vertical
-              direction and evenly distributed to top and bottom, left and right if possible.
+            - same: Adopts the way of completion. The height and width of the output will be the same as
+              the input. The total number of padding will be calculated in horizontal and vertical
+              directions and evenly distributed to top and bottom, left and right if possible.
              Otherwise, the last extra padding will be done from the bottom and the right side.

-            - valid: Adopts the way of discarding. The possibly largest height and width of output
-              will be return without padding. Extra pixels will be discarded.
+            - valid: Adopts the way of discarding. The possible largest height and width of output
+              will be returned without padding. Extra pixels will be discarded.

     Inputs:
         - **input** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
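
A usage sketch of the 'valid' mode just described (shapes are illustrative):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    # 'valid' keeps only fully covered windows: a 4x4 input gives a 2x2 output
    # with a 3x3 kernel and stride 1; leftover border pixels are discarded.
    pool = nn.MaxPool2d(kernel_size=3, stride=1, pad_mode='valid')
    x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]).astype(np.float32))
    y = pool(x)  # shape (1, 2, 2, 2)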

@@ -158,23 +158,23 @@ class AvgPool2d(_PoolNd):
         pad_mode for training only supports "same" and "valid".

     Args:
-        kernel_size (Union[int, tuple[int]]): The size of kernel used to take the average value,
-            is an int number that represents height and width are both kernel_size,
+        kernel_size (Union[int, tuple[int]]): The size of kernel used to take the average value.
+            The data type of kernel_size should be int and the value represents the height and width,
             or a tuple of two int numbers that represent height and width respectively.
             Default: 1.
         stride (Union[int, tuple[int]]): The distance of kernel moving, an int number that represents
             the height and width of movement are both strides, or a tuple of two int numbers that
             represent height and width of movement respectively. Default: 1.
-        pad_mode (str): The optional values for pad mode, is "same" or "valid", not case sensitive.
+        pad_mode (str): The optional value for pad mode, is "same" or "valid", not case sensitive.
             Default: "valid".

-            - same: Adopts the way of completion. Output height and width will be the same as
-              the input. Total number of padding will be calculated for horizontal and vertical
-              direction and evenly distributed to top and bottom, left and right if possible.
+            - same: Adopts the way of completion. The height and width of the output will be the same as
+              the input. The total number of padding will be calculated in horizontal and vertical
+              directions and evenly distributed to top and bottom, left and right if possible.
              Otherwise, the last extra padding will be done from the bottom and the right side.

-            - valid: Adopts the way of discarding. The possibly largest height and width of output
-              will be return without padding. Extra pixels will be discarded.
+            - valid: Adopts the way of discarding. The possible largest height and width of output
+              will be returned without padding. Extra pixels will be discarded.

     Inputs:

@@ -238,16 +238,16 @@ class AvgPool1d(_PoolNd):
         kernel_size (int): The size of kernel window used to take the average value, Default: 1.
         stride (int): The distance of kernel moving, an int number that represents
             the width of movement is strides, Default: 1.
-        pad_mode (str): The optional values for pad mode, is "same" or "valid", not case sensitive.
+        pad_mode (str): The optional value for pad mode, is "same" or "valid", not case sensitive.
             Default: "valid".

-            - same: Adopts the way of completion. Output height and width will be the same as
-              the input. Total number of padding will be calculated for horizontal and vertical
-              direction and evenly distributed to top and bottom, left and right if possible.
+            - same: Adopts the way of completion. The height and width of the output will be the same as
+              the input. The total number of padding will be calculated in horizontal and vertical
+              directions and evenly distributed to top and bottom, left and right if possible.
              Otherwise, the last extra padding will be done from the bottom and the right side.

-            - valid: Adopts the way of discarding. The possibly largest height and width of output
-              will be return without padding. Extra pixels will be discarded.
+            - valid: Adopts the way of discarding. The possible largest height and width of output
+              will be returned without padding. Extra pixels will be discarded.

     Inputs:

mindspore/nn/layer/quant.py
@@ -56,27 +56,27 @@ class Conv2dBnAct(Cell):
     r"""
     A combination of convolution, Batchnorm, activation layer.

-    For a more Detailed overview of Conv2d op.
+    This part is a more detailed overview of Conv2d op.

     Args:
         in_channels (int): The number of input channel :math:`C_{in}`.
         out_channels (int): The number of output channel :math:`C_{out}`.
         kernel_size (Union[int, tuple]): The data type is int or tuple with 2 integers. Specifies the height
-            and width of the 2D convolution window. Single int means the value if for both height and width of
+            and width of the 2D convolution window. Single int means the value is for both height and width of
             the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
             width of the kernel.
-        stride (int): Specifies stride for all spatial dimensions with the same value. Value of stride should be
-            greater or equal to 1 but bounded by the height and width of the input. Default: 1.
+        stride (int): Specifies stride for all spatial dimensions with the same value. The value of stride should be
+            greater than or equal to 1 and lower than any one of the height and width of the input. Default: 1.
         pad_mode (str): Specifies padding mode. The optional values are "same", "valid", "pad". Default: "same".
         padding (int): Implicit paddings on both sides of the input. Default: 0.
         dilation (int): Specifying the dilation rate to use for dilated convolution. If set to be :math:`k > 1`,
-            there will be :math:`k - 1` pixels skipped for each sampling location. Its value should be greater
-            or equal to 1 and bounded by the height and width of the input. Default: 1.
+            there will be :math:`k - 1` pixels skipped for each sampling location. Its value should be greater than
+            or equal to 1 and lower than any one of the height and width of the input. Default: 1.
         group (int): Split filter into groups, `in_ channels` and `out_channels` should be
             divisible by the number of groups. Default: 1.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
         weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
+            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of
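
A hedged usage sketch of the fused cell (shapes are illustrative; the exact constructor defaults may differ across MindSpore versions of this era):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    # Convolution, then BatchNorm, then ReLU, in a single cell.
    net = nn.Conv2dBnAct(120, 240, 4, has_bn=True, activation='relu')
    x = Tensor(np.ones([1, 120, 1024, 640]).astype(np.float32))
    y = net(x)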

@@ -160,9 +160,9 @@ class DenseBnAct(Cell):
 class DenseBnAct(Cell):
     r"""
-    A combination of Dense, Batchnorm, activation layer.
+    A combination of Dense, Batchnorm, and the activation layer.

-    For a more Detailed overview of Dense op.
+    This part is a more detailed overview of Dense op.

     Args:
         in_channels (int): The number of channels in the input space.

@@ -172,11 +172,11 @@ class DenseBnAct(Cell):
         bias_init (Union[Tensor, str, Initializer, numbers.Number]): The trainable bias_init parameter. The dtype is
             same as input x. The values of str refer to the function `initializer`. Default: 'zeros'.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: True.
-        activation (Cell): Regularizer function applied to the output of the layer, eg. 'relu'. Default: None.
-        has_bn (bool): Specifies to used batchnorm or not. Default: False.
+        activation (Cell): The regularization function applied to the output of the layer, eg. 'ReLU'. Default: None.
+        has_bn (bool): Specifies to use batchnorm or not. Default: False.
         activation (string): Specifies activation type. The optional values are as following:
-            'softmax', 'logsoftmax', 'relu', 'relu6', 'tanh', 'gelu', 'sigmoid',
-            'prelu', 'leakyrelu', 'hswish', 'hsigmoid'. Default: None.
+            'Softmax', 'LogSoftmax', 'ReLU', 'ReLU6', 'Tanh', 'GELU', 'Sigmoid',
+            'PReLU', 'LeakyReLU', 'h-Swish', and 'h-Sigmoid'. Default: None.

     Inputs:
         - **input** (Tensor) - Tensor of shape :math:`(N, in\_channels)`.
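
Similarly, a hedged usage sketch (the lowercase 'relu' key reflects common usage in this era, despite the capitalized spellings the docstring now lists):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    net = nn.DenseBnAct(3, 4, has_bn=True, activation='relu')
    x = Tensor(np.random.randint(0, 255, [2, 3]).astype(np.float32))
    y = net(x)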

@@ -292,10 +292,10 @@ class FakeQuantWithMinMax(Cell):
         per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
         channel_axis (int): Quantization by channel axis. Default: 1.
         num_channels (int): declarate the min and max channel size, Default: 1.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

     Inputs:
         - **x** (Tensor) - The input of FakeQuantWithMinMax.

@@ -398,7 +398,7 @@ class Conv2dBnFoldQuant(Cell):
     r"""
     2D convolution with BatchNormal op folded layer.

-    For a more Detailed overview of Conv2d op.
+    This part is a more detailed overview of Conv2d op.

     Args:
         in_channels (int): The number of input channel :math:`C_{in}`.

@@ -419,13 +419,13 @@ class Conv2dBnFoldQuant(Cell):
             mean vector. Default: 'zeros'.
         var_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the
             variance vector. Default: 'ones'.
-        fake (bool): Conv2dBnFoldQuant Cell add FakeQuantWithMinMax op or not. Default: True.
+        fake (bool): Whether Conv2dBnFoldQuant Cell adds FakeQuantWithMinMax op. Default: True.
         per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
-        freeze_bn (int): Quantization freeze BatchNormal op according by global step. Default: 100000.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): The Quantization delay parameters according to the global step. Default: 0.
+        freeze_bn (int): The quantization freeze BatchNormal op is according to the global step. Default: 100000.

     Inputs:
         - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

@@ -585,7 +585,7 @@ class Conv2dBnWithoutFoldQuant(Cell):
     r"""
     2D convolution + batchnorm without fold with fake quant op layer.

-    For a more Detailed overview of Conv2d op.
+    This part is a more detailed overview of Conv2d op.

     Args:
         in_channels (int): The number of input channel :math:`C_{in}`.

@@ -605,10 +605,10 @@ class Conv2dBnWithoutFoldQuant(Cell):
             Default: 'normal'.
         bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Default: 'zeros'.
         per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

     Inputs:
         - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

@@ -711,7 +711,7 @@ class Conv2dQuant(Cell):
     r"""
     2D convolution with fake quant op layer.

-    For a more Detailed overview of Conv2d op.
+    This part is a more detailed overview of Conv2d op.

     Args:
         in_channels (int): The number of input channel :math:`C_{in}`.

@@ -728,10 +728,10 @@ class Conv2dQuant(Cell):
             Default: 'normal'.
         bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Default: 'zeros'.
         per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

     Inputs:
         - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

@@ -826,7 +826,7 @@ class DenseQuant(Cell):
     r"""
     The fully connected layer with fake quant op.

-    For a more Detailed overview of Dense op.
+    This part is a more detailed overview of Dense op.

     Args:
         in_channels (int): The dimension of the input space.

@@ -836,12 +836,12 @@ class DenseQuant(Cell):
         bias_init (Union[Tensor, str, Initializer, numbers.Number]): The trainable bias_init parameter. The dtype is
             same as input x. The values of str refer to the function `initializer`. Default: 'zeros'.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: True.
-        activation (str): Regularizer function applied to the output of the layer, eg. 'relu'. Default: None.
+        activation (str): The regularization function applied to the output of the layer, eg. 'relu'. Default: None.
         per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

     Inputs:
         - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

@@ -906,7 +906,7 @@ class DenseQuant(Cell):
             quant_delay=quant_delay)

     def construct(self, x):
-        """Use operators to construct to Dense layer."""
+        """Use operators to construct the Dense layer."""
         output = self.fake_quant_weight(self.weight)
         output = self.matmul(x, output)
         if self.has_bias:

@@ -942,16 +942,16 @@ class ActQuant(_QuantActivation):
     Add Fake Quant OP after activation. Not Recommand to used these cell for Fake Quant Op
     Will climp the max range of the activation and the relu6 do the same operation.

-    For a more Detailed overview of ReLU6 op.
+    This part is a more detailed overview of ReLU6 op.

     Args:
         activation (Cell): Activation cell class.
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global steps. Default: 0.

     Inputs:
         - **x** (Tensor) - The input of ReLU6Quant.

@@ -997,16 +997,16 @@ class LeakyReLUQuant(_QuantActivation):
     r"""
     LeakyReLUQuant activation function. Add Fake Quant OP after HSwish OP.

-    For a more Detailed overview of HSwish op.
+    This part is a more detailed overview of HSwish op.

     Args:
         activation (Cell): Activation cell class.
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

     Inputs:
         - **x** (Tensor) - The input of LeakyReLUQuant.

@@ -1067,16 +1067,16 @@ class HSwishQuant(_QuantActivation):
     r"""
     HSwishQuant activation function. Add Fake Quant OP after HSwish OP.

-    For a more Detailed overview of HSwish op.
+    This part is a more detailed overview of HSwish op.

     Args:
         activation (Cell): Activation cell class.
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

     Inputs:
         - **x** (Tensor) - The input of HSwishQuant.

@@ -1136,16 +1136,16 @@ class HSigmoidQuant(_QuantActivation):
     r"""
     HSigmoidQuant activation function. Add Fake Quant OP before and after HSigmoid OP.

-    For a more Detailed overview of HSigmoid op.
+    This part is a more detailed overview of HSigmoid op.

     Args:
         activation (Cell): Activation cell class.
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

     Inputs:
         - **x** (Tensor) - The input of HSigmoidQuant.

@@ -1205,15 +1205,15 @@ class TensorAddQuant(Cell):
     r"""
     Add Fake Quant OP after TensorAdd OP.

-    For a more Detailed overview of TensorAdd op.
+    This part is a more detailed overview of TensorAdd op.

     Args:
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

     Inputs:
         - **x** (Tensor) - The input of TensorAddQuant.

@@ -1257,15 +1257,15 @@ class MulQuant(Cell):
     r"""
     Add Fake Quant OP after Mul OP.

-    For a more Detailed overview of Mul op.
+    This part is a more detailed overview of Mul op.

     Args:
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
-        quant_delay (int): Quantization delay parameters according by global step. Default: 0.
+        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

     Inputs:
         - **x** (Tensor) - The input of MulQuant.

@@ -1317,7 +1317,7 @@ class QuantBlock(Cell):
         bias_init (Union[Tensor, str, Initializer, numbers.Number]): The trainable bias_init parameter. The dtype is
             same as input x. The values of str refer to the function `initializer`. Default: 'zeros'.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: True.
-        activation (str): Regularizer function applied to the output of the layer, eg. 'relu'. Default: None.
+        activation (str): The regularization function applied to the output of the layer, eg. 'relu'. Default: None.
         batchnorm (bool): Specifies to used batchnorm or not. Default: None.
         activation (string): Specifies activation type. The optional values are as following:
             'softmax', 'logsoftmax', 'relu', 'relu6', 'tanh', 'gelu', 'sigmoid',

mindspore/nn/loss/loss.py
@@ -353,17 +353,18 @@ class CosineEmbeddingLoss(_Loss):
     Args:
         margin (float): Should be in [-1.0, 1.0]. Default 0.0.
-        reduction (str): Specifies which reduction to apply to the output. It should be one of
-            "none", "mean", "sum", meaning no reduction, reduce mean or sum on output, respectively. Default "mean".
+        reduction (str): Specifies which reduction to be applied to the output. It should be one of
+            "none", "mean", and "sum", meaning no reduction, reduce mean and sum on output, respectively.
+            Default "mean".

     Inputs:
         - **input_x1** (Tensor) - Input tensor.
         - **input_x2** (Tensor) - Its shape and data type should be the same as `input_x1`'s shape and data type.
-        - **y** (Tensor) - Contains value 1 or -1. Suppose `input_x1` shape is :math:`(x_1, x_2, x_3,..., x_R)`,
-          then `target` shape should be :math:`(x_1, x_3, x_4, ..., x_R)`.
+        - **y** (Tensor) - Contains value 1 or -1. Suppose the shape of `input_x1` is :math:`(x_1, x_2, x_3,..., x_R)`,
+          then the shape of `target` should be :math:`(x_1, x_3, x_4, ..., x_R)`.

     Outputs:
-        - **loss** (Tensor) - If `reduction` is "none", its shape is the same as `y`'s shape, loss value otherwise.
+        - **loss** (Tensor) - If `reduction` is "none", its shape is the same as `y`'s shape, otherwise a scalar value
+          will be returned.

     Examples:
         >>> x1 = Tensor(np.array([[0.3, 0.8], [0.4, 0.3]]), mindspore.float32)

mindspore/nn/metrics/accuracy.py
@@ -21,9 +21,9 @@ class Accuracy(EvaluationBase):
     r"""
     Calculates the accuracy for classification and multilabel data.

-    The accuracy class creates two local variables, correct number and total number that are used to compute the
+    The accuracy class creates two local variables, the correct number and the total number that are used to compute
     frequency with which predictions matches labels. This frequency is ultimately returned as the accuracy: an
-    idempotent operation that simply divides correct number by total number.
+    idempotent operation that simply divides the correct number by the total number.

     .. math::
         \text{accuracy} =\frac{\text{true_positive} + \text{true_negative}}

@@ -58,17 +58,17 @@ class Accuracy(EvaluationBase):
         Args:
             inputs: Input `y_pred` and `y`. `y_pred` and `y` are a `Tensor`, a list or an array.
-                For 'classification' evaluation type, `y_pred` is in most cases (not strictly) a list
+                For the 'classification' evaluation type, `y_pred` is in most cases (not strictly) a list
                 of floating numbers in range :math:`[0, 1]`
                 and the shape is :math:`(N, C)`, where :math:`N` is the number of cases and :math:`C`
                 is the number of categories. Shape of `y` can be :math:`(N, C)` with values 0 and 1 if one-hot
                 encoding is used or the shape is :math:`(N,)` with integer values if index of category is used.
                 For 'multilabel' evaluation type, `y_pred` and `y` can only be one-hot encoding with
-                values 0 or 1. Indices with 1 indicate positive category. The shape of `y_pred` and `y`
+                values 0 or 1. Indices with 1 indicate the positive category. The shape of `y_pred` and `y`
                 are both :math:`(N, C)`.

         Raises:
-            ValueError: If the number of the input is not 2.
+            ValueError: If the number of the inputs is not 2.
         """
         if len(inputs) != 2:
             raise ValueError('Accuracy need 2 inputs (y_pred, y), but got {}'.format(len(inputs)))
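
A sketch of the update/eval cycle this method belongs to (data illustrative):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
    y = Tensor(np.array([1, 0, 1]))
    metric = nn.Accuracy('classification')
    metric.clear()
    metric.update(x, y)        # exactly two inputs, otherwise ValueError
    accuracy = metric.eval()   # 2/3 here: argmax matches y in two of three rows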

mindspore/nn/optim/adam.py
@@ -132,7 +132,7 @@ def _check_param_value(beta1, beta2, eps, prim_name):
 class Adam(Optimizer):
     r"""
-    Updates gradients by Adaptive Moment Estimation (Adam) algorithm.
+    Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.

     The Adam algorithm is proposed in `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.

@@ -157,9 +157,9 @@ class Adam(Optimizer):
     weight decay is positive. When not separating parameter groups, the `weight_decay` in the API will be applied
     on the parameters without 'beta' or 'gamma' in their names if `weight_decay` is positive.

-    To improve parameter groups performance, the customized order of parameters can be supported.
+    To improve parameter groups performance, the customized order of parameters is supported.

-    The sparse strategy is applied while the SparseGatherV2 operator being used for forward network.
+    The sparse strategy is applied while the SparseGatherV2 operator is used for forward network.
     The sparse feature is under continuous development. The sparse
     behavior is currently performed on the CPU.
...
...
@@ -170,36 +170,36 @@ class Adam(Optimizer):
- params: Required. The value should be a list of `Parameter`.
- lr: Optional. If "lr" i
n the keys, the value of
corresponding learning rate will be used.
- lr: Optional. If "lr" i
s in the keys, the value of the
corresponding learning rate will be used.
If not, the `learning_rate` in the API will be used.
- weight_decay: Optional. If "weight_decay" i
n the keys, the value of
corresponding weight decay
- weight_decay: Optional. If "weight_decay" i
s in the keys, the value of the
corresponding weight decay
will be used. If not, the `weight_decay` in the API will be used.
- order_params: Optional. If "order_params" in the keys, the value should be the order of parameters and
the order will be followed in
optimizer. There are no other keys in the `dict` and the parameters which
in the value of
'order_params' should be in one of group parameters.
- order_params: Optional. If "order_params" i
s i
n the keys, the value should be the order of parameters and
the order will be followed in
the optimizer. There are no other keys in the `dict` and the parameters
which in the
'order_params' should be in one of group parameters.
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
When the learning_rate is a Iterable or a Tensor with dimension of 1, use
the
dynamic learning rate, then
the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
use dynamic learning rate, the i-th learning rate will be calculated during the process of training
according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
Default: 1e-3.
beta1 (float): The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0).
Default: 0.9.
beta2 (float): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
Default: 0.999.
eps (float): Term added to the denominator to improve numerical stability. Should be greater than 0. Default:
1e-8.
use_locking (bool): Whether to enable a lock to protect updating variable tensors.
If True, updating of the var, m, and v tensors will be protected by a lock.
If False, the result is unpredictable. Default: False.
use_nesterov (bool): Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
If True, updates the gradients using NAG.
If False, updates the gradients without using NAG. Default: False.
weight_decay (float): Weight decay (L2 penalty). It should be equal to or greater than 0. Default: 0.0.
loss_scale (float): A floating point value for the loss scale. Should be greater than 0. Default: 1.0.
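A minimal construction sketch under the grouping rules above (the network and the name-based split are placeholders, not part of this diff):

import mindspore.nn as nn

net = nn.Dense(16, 10)   # placeholder network
params = net.trainable_params()
# 'order_params' pins the order in which the optimizer iterates the parameters.
decayed = [p for p in params if 'weight' in p.name]
others = [p for p in params if 'weight' not in p.name]
group_params = [{'params': decayed, 'weight_decay': 1e-4},
                {'params': others, 'lr': 1e-4},
                {'order_params': params}]
optim = nn.Adam(group_params, learning_rate=1e-3, beta1=0.9, beta2=0.999)

The group-level 'lr' and 'weight_decay' override the API-level values only for the parameters in that group, as described above.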
...
...
@@ -278,11 +278,11 @@ class Adam(Optimizer):
class AdamWeightDecay(Optimizer):
"""
Implements the Adam algorithm with weight decay.
Note:
When separating parameter groups, the weight decay in each group will be applied on the parameters if the
weight decay is positive. When not separating parameter groups, the `weight_decay` in the API will be applied
on the parameters without 'beta' or 'gamma' in their names if `weight_decay` is positive.
To improve parameter groups performance, the customized order of parameters can be supported.
...
...
@@ -294,27 +294,27 @@ class AdamWeightDecay(Optimizer):
- params: Required. The value should be a list of `Parameter`.
- lr: Optional. If "lr" is in the keys, the value of the corresponding learning rate will be used.
If not, the `learning_rate` in the API will be used.
- weight_decay: Optional. If "weight_decay" is in the keys, the value of the corresponding weight decay
will be used. If not, the `weight_decay` in the API will be used.
- order_params: Optional. If "order_params" is in the keys, the value should be the order of parameters and
the order will be followed in the optimizer. There are no other keys in the `dict`, and the parameters
in 'order_params' should also be in one of the group parameters.
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
When the learning_rate is an Iterable or a Tensor with dimension of 1, use the dynamic learning rate, then
the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
use dynamic learning rate, the i-th learning rate will be calculated during the process of training
according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
Default: 1e-3.
beta1 (float): The exponential decay rate for the 1st moment estimates. Default: 0.9.
Should be in range (0.0, 1.0).
beta2 (float): The exponential decay rate for the 2nd moment estimates. Default: 0.999.
Should be in range (0.0, 1.0).
eps (float): Term added to the denominator to improve numerical stability. Default: 1e-6.
Should be greater than 0.
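As with `Adam`, a hedged sketch of the 'beta'/'gamma' exclusion convention described in the Note (names and values are illustrative):

import mindspore.nn as nn

net = nn.Dense(16, 10)   # placeholder network
params = net.trainable_params()
decay_params = [p for p in params
                if 'beta' not in p.name and 'gamma' not in p.name]
other_params = [p for p in params
                if 'beta' in p.name or 'gamma' in p.name]
group_params = [{'params': decay_params, 'weight_decay': 0.01},
                {'params': other_params}]
optim = nn.AdamWeightDecay(group_params, learning_rate=1e-3,
                           beta1=0.9, beta2=0.999, eps=1e-6)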
...
...
mindspore/nn/wrap/cell_wrapper.py
...
...
@@ -201,8 +201,8 @@ class DataWrapper(Cell):
Args:
network (Cell): The training network for dataset.
dataset_types (list): The types of the dataset. The list contains the types of the inputs.
dataset_shapes (list): The shapes of the dataset. The list contains multiple sublists that describe
the shape of the inputs.
queue_name (str): The identification of dataset channel which specifies the dataset channel to supply
data for the network.
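A construction sketch based solely on the Args above; the import path, queue name, dtypes, and shapes are assumptions for illustration, since real values come from the dataset pipeline:

import mindspore as ms
import mindspore.nn as nn
from mindspore.nn.wrap.cell_wrapper import DataWrapper  # assumed path

net = nn.Dense(32, 10)                   # placeholder training network
dataset_types = [ms.float32, ms.int32]   # dtype of each network input
dataset_shapes = [[32, 32], [32]]        # one sublist per input
wrapper = DataWrapper(net, dataset_types, dataset_shapes, "demo_queue")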
...
...
mindspore/ops/operations/_grad_ops.py
...
...
@@ -663,16 +663,16 @@ class MaxPoolGradGrad(_PoolGrad):
strides (Union[int, tuple[int]]): The distance the kernel moves; an int number means the height and
width of movement are both strides, and a tuple of two int numbers gives the height and width of
movement respectively. Default: 1.
padding (str): The optional value for pad mode, is "same" or "valid", not case sensitive.
Default: "valid".
- same: Adopts the way of completion. The height and width of the output will be the same as
the input. The total number of padding will be calculated in horizontal and vertical
directions and evenly distributed to top and bottom, left and right if possible.
Otherwise, the last extra padding will be done from the bottom and the right side.
- valid: Adopts the way of discarding. The possible largest height and width of output
will be returned without padding. Extra pixels will be discarded.
Inputs:
- **origin_input** (Tensor) - Tensor with data format "NCHW", data type should be float16.
...
...
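For reference, the output sizes implied by the two modes follow the usual convention (a sketch, with stride :math:`s`, kernel size :math:`k`, and input height :math:`H_{in}`; width is analogous):

- same: :math:`H_{out} = \lceil H_{in} / s \rceil`
- valid: :math:`H_{out} = \lceil (H_{in} - k + 1) / s \rceil`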
@@ -736,16 +736,16 @@ class MaxPoolGradGradWithArgmax(_PoolGrad):
strides (Union[int, tuple[int]]): The distance the kernel moves; an int number means the height and
width of movement are both strides, and a tuple of two int numbers gives the height and width of
movement respectively. Default: 1.
padding (str): The optional value for pad mode, is "same" or "valid", not case sensitive.
Default: "valid".
- same: Adopts the way of completion. The height and width of the output will be the same as
the input. The total number of padding will be calculated in horizontal and vertical
directions and evenly distributed to top and bottom, left and right if possible.
Otherwise, the last extra padding will be done from the bottom and the right side.
- valid: Adopts the way of discarding. The possible largest height and width of output
will be returned without padding. Extra pixels will be discarded.
Inputs:
- **x** (Tensor) - Tensor with data format "NCHW", data type should be float16.
...
...
mindspore/ops/operations/nn_ops.py
...
...
@@ -756,11 +756,11 @@ class Conv2D(PrimitiveWithInfer):
out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,
where :math:`ccor` is the cross correlation operator, :math:`C_{in}` is the input channel number, :math:`j` ranges
from :math:`0` to :math:`C_{out} - 1`, :math:`W_{ij}` corresponds to the :math:`i`-th channel of the :math:`j`-th
filter and :math:`out_{j}` corresponds to the :math:`j`-th channel of the output. :math:`W_{ij}` is a slice
of kernel and it has shape :math:`(\text{ks_h}, \text{ks_w})`, where :math:`\text{ks_h}` and
:math:`\text{ks_w}` are the height and width of the convolution kernel. The full kernel has shape
:math:`(C_{out}, C_{in} // \text{group}, \text{ks_h}, \text{ks_w})`, where group is the group number
to split the input in the channel dimension.
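A small sketch of the primitive in action (shapes are arbitrary and only for illustration):

import numpy as np
import mindspore as ms
from mindspore import Tensor
from mindspore.ops import operations as P

conv2d = P.Conv2D(out_channel=32, kernel_size=3)   # defaults to "valid" padding
x = Tensor(np.ones([10, 3, 32, 32]), ms.float32)
w = Tensor(np.ones([32, 3, 3, 3]), ms.float32)     # (C_out, C_in, ks_h, ks_w)
out = conv2d(x, w)   # shape (10, 32, 30, 30)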
...
...
@@ -1029,7 +1029,7 @@ class _Pool(PrimitiveWithInfer):
of two `int` for height and width. Default: 1.
strides (Union[int, tuple[int]]): The stride of the window, which should be an int or
a tuple of two `int` for height and width. Default: 1.
padding (str): The optional value for pad mode, is "same" or "valid", not case sensitive.
Default: "valid".
"""
...
...
@@ -1104,16 +1104,16 @@ class MaxPool(_Pool):
strides (Union[int, tuple[int]]): The distance the kernel moves; an int number means the height and
width of movement are both strides, and a tuple of two int numbers gives the height and width of
movement respectively. Default: 1.
padding (str): The optional value for pad mode, is "same" or "valid", not case sensitive.
Default: "valid".
- same: Adopts the way of completion. The height and width of the output will be the same as
the input. The total number of padding will be calculated in horizontal and vertical
directions and evenly distributed to top and bottom, left and right if possible.
Otherwise, the last extra padding will be done from the bottom and the right side.
- valid: Adopts the way of discarding. The possible largest height and width of output
will be returned without padding. Extra pixels will be discarded.
Inputs:
- **input** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
...
...
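A hedged shape check of the two modes on a toy input (values are illustrative):

import numpy as np
from mindspore import Tensor
from mindspore.ops import operations as P

x = Tensor(np.random.rand(1, 3, 5, 5).astype(np.float32))
same_pool = P.MaxPool(ksize=2, strides=2, padding="same")
valid_pool = P.MaxPool(ksize=2, strides=2, padding="valid")
print(same_pool(x).shape)    # (1, 3, 3, 3): ceil(5 / 2) per spatial dim
print(valid_pool(x).shape)   # (1, 3, 2, 2): ceil((5 - 2 + 1) / 2)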
@@ -1151,16 +1151,16 @@ class MaxPoolWithArgmax(_Pool):
strides (Union[int, tuple[int]]): The distance the kernel moves; an int number means the height and
width of movement are both strides, and a tuple of two int numbers gives the height and width of
movement respectively. Default: 1.
padding (str): The optional value for pad mode, is "same" or "valid", not case sensitive.
Default: "valid".
- same: Adopts the way of completion. The height and width of the output will be the same as
the input. The total number of padding will be calculated in horizontal and vertical
directions and evenly distributed to top and bottom, left and right if possible.
Otherwise, the last extra padding will be done from the bottom and the right side.
- valid: Adopts the way of discarding. The possible largest height and width of output
will be returned without padding. Extra pixels will be discarded.
Inputs:
...
...
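This primitive returns both the pooled maxima and their indices; a sketch (the index layout is backend-specific):

import numpy as np
from mindspore import Tensor
from mindspore.ops import operations as P

x = Tensor(np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4))
pool = P.MaxPoolWithArgmax(ksize=2, strides=2, padding="valid")
output, argmax = pool(x)            # maxima and their positions
print(output.shape, argmax.shape)   # both (1, 1, 2, 2)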
@@ -1233,16 +1233,16 @@ class AvgPool(_Pool):
strides (Union[int, tuple[int]]): The distance the kernel moves; an int number means the height and
width of movement are both strides, and a tuple of two int numbers gives the height and width of
movement respectively. Default: 1.
padding (str): The optional value for pad mode, is "same" or "valid", not case sensitive.
Default: "valid".
- same: Adopts the way of completion. The height and width of the output will be the same as
the input. The total number of padding will be calculated in horizontal and vertical
directions and evenly distributed to top and bottom, left and right if possible.
Otherwise, the last extra padding will be done from the bottom and the right side.
- valid: Adopts the way of discarding. The possible largest height and width of output
will be returned without padding. Extra pixels will be discarded.
Inputs:
- **input** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
...
...
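And the averaging counterpart on the same kind of toy input:

import numpy as np
from mindspore import Tensor
from mindspore.ops import operations as P

x = Tensor(np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4))
avg_pool = P.AvgPool(ksize=2, strides=2, padding="valid")
print(avg_pool(x))   # each element is the mean of one 2x2 window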