Commit 483947c4
Project: PaddlePaddle / Paddle
Authored Nov 06, 2017 by dangqingqing

Update and Resolve conflicts

Parents: 4c9be1a9, f8a6bda8

Showing 140 changed files with 2966 additions and 1408 deletions (+2966 -1408)
paddle/framework/CMakeLists.txt  +2 -1
paddle/framework/executor.cc  +7 -1
paddle/framework/framework.proto  +8 -0
paddle/framework/lod_rank_table.cc  +48 -0
paddle/framework/lod_rank_table.h  +55 -0
paddle/framework/lod_tensor.cc  +38 -0
paddle/framework/lod_tensor.h  +6 -0
paddle/framework/lod_tensor_array.h  +23 -0
paddle/framework/lod_tensor_test.cc  +42 -0
paddle/framework/operator.h  +0 -2
paddle/framework/var_desc.cc  +22 -4
paddle/framework/var_desc.h  +1 -0
paddle/gserver/layers/MKLDNNAddtoLayer.cpp  +154 -0
paddle/gserver/layers/MKLDNNAddtoLayer.h  +110 -0
paddle/gserver/layers/MKLDNNLayer.cpp  +8 -6
paddle/gserver/layers/MKLDNNLayer.h  +5 -2
paddle/gserver/tests/MKLDNNTester.cpp  +3 -3
paddle/gserver/tests/test_MKLDNN.cpp  +38 -5
paddle/operators/CMakeLists.txt  +2 -0
paddle/operators/accuracy_op.cc  +13 -9
paddle/operators/activation_op.cc  +24 -24
paddle/operators/adadelta_op.cc  +16 -18
paddle/operators/adagrad_op.cc  +8 -4
paddle/operators/adam_op.cc  +13 -16
paddle/operators/adamax_op.cc  +9 -13
paddle/operators/auc_op.cc  +15 -16
paddle/operators/batch_norm_op.cc  +16 -9
paddle/operators/cast_op.cc  +9 -5
paddle/operators/clip_op.cc  +4 -1
paddle/operators/concat_op.cc  +17 -13
paddle/operators/cond_op.cc  +6 -5
paddle/operators/conv2d_op.cc  +18 -14
paddle/operators/conv2d_transpose_op.cc  +11 -7
paddle/operators/conv_cudnn_op.cc  +1 -1
paddle/operators/conv_shift_op.cc  +5 -6
paddle/operators/cos_sim_op.cc  +7 -6
paddle/operators/crf_decoding_op.cc  +136 -0
paddle/operators/crf_decoding_op.h  +127 -0
paddle/operators/crop_op.cc  +22 -21
paddle/operators/cross_entropy_op.cc  +10 -8
paddle/operators/decayed_adagrad_op.cc  +10 -3
paddle/operators/dropout_op.cc  +8 -6
paddle/operators/dynamic_recurrent_op.cc  +10 -4
paddle/operators/elementwise_add_op.cc  +1 -1
paddle/operators/elementwise_div_op.cc  +1 -1
paddle/operators/elementwise_mul_op.cc  +1 -1
paddle/operators/elementwise_op.h  +30 -25
paddle/operators/elementwise_sub_op.cc  +1 -1
paddle/operators/feed_op.cc  +7 -2
paddle/operators/fetch_op.cc  +7 -2
paddle/operators/fill_constant_batch_size_like_op.cc  +7 -2
paddle/operators/fill_constant_op.cc  +6 -1
paddle/operators/fill_zeros_like_op.cc  +5 -3
paddle/operators/gather_op.cc  +20 -3
paddle/operators/gaussian_random_op.cc  +24 -10
paddle/operators/gru_unit_op.cc  +22 -17
paddle/operators/huber_loss_op.cc  +4 -2
paddle/operators/increment_op.cc  +8 -4
paddle/operators/l1_norm_op.cc  +1 -1
paddle/operators/linear_chain_crf_op.cc  +45 -43
paddle/operators/linear_chain_crf_op.h  +2 -2
paddle/operators/load_op.cc  +8 -4
paddle/operators/lod_rank_table_op.cc  +80 -0
paddle/operators/lookup_table_op.cc  +16 -10
paddle/operators/lrn_op.cc  +41 -43
paddle/operators/lstm_op.cc  +31 -34
paddle/operators/lstm_unit_op.cc  +12 -7
paddle/operators/margin_rank_loss_op.cc  +11 -10
paddle/operators/matmul_op.cc  +6 -2
paddle/operators/mean_op.cc  +5 -1
paddle/operators/minus_op.cc  +5 -3
paddle/operators/modified_huber_loss_op.cc  +20 -12
paddle/operators/momentum_op.cc  +15 -9
paddle/operators/mul_op.cc  +8 -3
paddle/operators/multiplex_op.cc  +5 -3
paddle/operators/name_convention.md  +12 -8
paddle/operators/nccl_op.cc  +32 -13
paddle/operators/pad_op.cc  +22 -19
paddle/operators/pool_op.cc  +70 -57
paddle/operators/pool_with_index_op.cc  +75 -60
paddle/operators/precision_recall_op.cc  +31 -29
paddle/operators/prelu_op.cc  +13 -6
paddle/operators/proximal_adagrad_op.cc  +10 -6
paddle/operators/proximal_gd_op.cc  +9 -5
paddle/operators/rank_loss_op.cc  +14 -14
paddle/operators/recurrent_op.cc  +9 -7
paddle/operators/reduce_op.cc  +10 -7
paddle/operators/reshape_op.cc  +6 -3
paddle/operators/rmsprop_op.cc  +15 -14
paddle/operators/save_op.cc  +10 -5
paddle/operators/scale_op.cc  +8 -5
paddle/operators/seq_expand_op.cc  +3 -1
paddle/operators/sequence_concat_op.cc  +37 -33
paddle/operators/sequence_conv_op.cc  +13 -11
paddle/operators/sequence_pool_op.cc  +29 -26
paddle/operators/sequence_softmax_op.cc  +10 -6
paddle/operators/sgd_op.cc  +8 -6
paddle/operators/sigmoid_cross_entropy_with_logits_op.cc  +11 -9
paddle/operators/sign_op.cc  +3 -2
paddle/operators/smooth_l1_loss_op.cc  +9 -6
paddle/operators/softmax_op.cc  +10 -7
paddle/operators/softmax_with_cross_entropy_op.cc  +16 -14
paddle/operators/split_op.cc  +24 -16
paddle/operators/squared_l2_distance_op.cc  +16 -13
paddle/operators/squared_l2_norm_op.cc  +2 -2
paddle/operators/sum_op.cc  +7 -5
paddle/operators/top_k_op.cc  +12 -12
paddle/operators/transpose_op.cc  +7 -4
paddle/operators/uniform_random_op.cc  +21 -9
paddle/pybind/protobuf.cc  +3 -1
paddle/pybind/pybind.cc  +34 -0
paddle/scripts/docker/build.sh  +1 -0
python/paddle/trainer/config_parser.py  +12 -1
python/paddle/v2/framework/backward.py  +14 -2
python/paddle/v2/framework/framework.py  +8 -2
python/paddle/v2/framework/io.py  +35 -29
python/paddle/v2/framework/layer_helper.py  +19 -16
python/paddle/v2/framework/layers.py  +70 -50
python/paddle/v2/framework/net_drawer.py  +3 -3
python/paddle/v2/framework/nets.py  +22 -22
python/paddle/v2/framework/optimizer.py  +7 -5
python/paddle/v2/framework/tests/test_crf_decoding_op.py  +146 -0
python/paddle/v2/framework/tests/test_executor_and_mul.py  +2 -2
python/paddle/v2/framework/tests/test_fit_a_line.py  +20 -16
python/paddle/v2/framework/tests/test_image_classification_layer.py  +35 -31
python/paddle/v2/framework/tests/test_image_classification_train.py  +85 -80
python/paddle/v2/framework/tests/test_inference_model_io.py  +10 -10
python/paddle/v2/framework/tests/test_layers.py  +53 -36
python/paddle/v2/framework/tests/test_lod_rank_table.py  +29 -0
python/paddle/v2/framework/tests/test_lod_tensor_array.py  +38 -0
python/paddle/v2/framework/tests/test_operator_desc.py  +2 -2
python/paddle/v2/framework/tests/test_parameter.py  +2 -2
python/paddle/v2/framework/tests/test_program.py  +9 -9
python/paddle/v2/framework/tests/test_recognize_digits_conv.py  +25 -19
python/paddle/v2/framework/tests/test_recognize_digits_mlp.py  +28 -19
python/paddle/v2/framework/tests/test_recommender_system.py  +66 -64
python/paddle/v2/framework/tests/test_recurrent_op.py  +23 -14
python/paddle/v2/framework/tests/test_understand_sentiment_conv.py  +3 -3
python/paddle/v2/framework/tests/test_variable.py  +2 -2
python/paddle/v2/framework/tests/test_word2vec.py  +34 -33
paddle/framework/CMakeLists.txt
View file @ 483947c4
...
@@ -45,8 +45,9 @@ add_custom_command(TARGET framework_py_proto POST_BUILD
 cc_library(backward SRCS backward.cc DEPS net_op)
 cc_test(backward_test SRCS backward_test.cc DEPS backward recurrent_op device_context fill_constant_op)
+cc_library(lod_rank_table SRCS lod_rank_table.cc DEPS lod_tensor)
-cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto backward glog)
+cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto backward glog lod_rank_table)
 cc_library(prune SRCS prune.cc DEPS framework_proto)
 cc_test(prune_test SRCS prune_test.cc DEPS op_info prune recurrent_op device_context)
...
paddle/framework/executor.cc
View file @ 483947c4
...
@@ -21,7 +21,9 @@ limitations under the License. */
 #include <vector>
 #include "paddle/framework/feed_fetch_type.h"
+#include "paddle/framework/lod_rank_table.h"
 #include "paddle/framework/lod_tensor.h"
+#include "paddle/framework/lod_tensor_array.h"
 #include "paddle/framework/op_registry.h"
 #include "paddle/framework/scope.h"
...
@@ -70,10 +72,14 @@ static void CreateTensor(Variable* var, VarDesc::VarType var_type) {
     var->GetMutable<FeedFetchList>();
   } else if (var_type == VarDesc::STEP_SCOPES) {
     var->GetMutable<std::vector<framework::Scope>>();
+  } else if (var_type == VarDesc::LOD_RANK_TABLE) {
+    var->GetMutable<LoDRankTable>();
+  } else if (var_type == VarDesc::LOD_TENSOR_ARRAY) {
+    var->GetMutable<LoDTensorArray>();
   } else {
     PADDLE_THROW(
         "Variable type %d is not in "
-        "[LoDTensor, SelectedRows, FEED_MINIBATCH, FETCH_LIST]",
+        "[LoDTensor, SelectedRows, FEED_MINIBATCH, FETCH_LIST, LOD_RANK_TABLE]",
         var_type);
   }
 }
...
paddle/framework/framework.proto
View file @ 483947c4
...
@@ -109,6 +109,11 @@ message LoDTensorDesc {
   optional int32 lod_level = 2 [ default = 0 ];
 }
+message LoDTensorArrayDesc {
+  required TensorDesc tensor = 1;
+  optional int32 lod_level = 2 [ default = 0 ];
+}
 message VarDesc {
   enum VarType {
     LOD_TENSOR = 1;
...
@@ -116,11 +121,14 @@ message VarDesc {
     FEED_MINIBATCH = 3;
     FETCH_LIST = 4;
     STEP_SCOPES = 5;
+    LOD_RANK_TABLE = 6;
+    LOD_TENSOR_ARRAY = 7;
   }
   required string name = 1;
   required VarType type = 2;
   optional LoDTensorDesc lod_tensor = 3;
   optional TensorDesc selected_rows = 4;
+  optional LoDTensorArrayDesc tensor_array = 6;
   optional bool persistable = 5 [ default = false ];
 }
...
paddle/framework/lod_rank_table.cc
New file (mode 0 → 100644)
View file @ 483947c4
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/lod_rank_table.h"

namespace paddle {
namespace framework {
void LoDRankTable::Reset(const LoD& lod, size_t level) {
  this->coarse_lod_.clear();
  this->items_.clear();
  PADDLE_ENFORCE(level < lod.size(),
                 "Cannot rank lod since the level %d is less than lod size %d",
                 level, lod.size());
  coarse_lod_.reserve(level);
  for (size_t i = 0; i < level; ++i) {
    coarse_lod_.push_back(lod[i]);
  }
  auto& vec = lod[level];
  for (size_t i = 0; i < vec.size() - 1; ++i) {
    TableItem item;
    item.index = i;
    item.length = vec[i + 1] - vec[i];
    items_.emplace_back(item);
  }
  // NOTE(yuyang18):
  //
  // The time complexity of stable_sort is O(N*log(N)) if additional memory is
  // available. It is easy to debug and unit test when using `stable_sort`
  // instead of `sort`. Also, the items of a rank table will not be too large.
  std::stable_sort(items_.begin(), items_.end(),
                   [](const TableItem& a, const TableItem& b) {
                     return a.length > b.length;
                   });
}
}  // namespace framework
}  // namespace paddle
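
The ranking step of `LoDRankTable::Reset` can be tried outside Paddle with a minimal standalone sketch. It assumes a `LoD` is a vector of offset vectors, one per level; `RankItem` and `BuildRankItems` are hypothetical names introduced only for illustration, and `assert` stands in for `PADDLE_ENFORCE`. Unlike the committed loop, the sketch iterates with `i + 1 < size()` so that an empty offset vector does not underflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: a LoD is a list of offset vectors, one per level.
using LoD = std::vector<std::vector<size_t>>;

// Hypothetical stand-in for LoDRankTable::TableItem.
struct RankItem {
  size_t index;   // position of the sequence within the chosen level
  size_t length;  // number of entries the sequence spans
};

// Hypothetical helper mirroring the ranking part of Reset().
std::vector<RankItem> BuildRankItems(const LoD& lod, size_t level) {
  assert(level < lod.size());  // stand-in for PADDLE_ENFORCE
  const std::vector<size_t>& offsets = lod[level];
  std::vector<RankItem> items;
  // Adjacent offsets delimit one sequence; their difference is its length.
  for (size_t i = 0; i + 1 < offsets.size(); ++i) {
    items.push_back(RankItem{i, offsets[i + 1] - offsets[i]});
  }
  // Longest first; stable_sort keeps the input order among equal lengths.
  std::stable_sort(items.begin(), items.end(),
                   [](const RankItem& a, const RankItem& b) {
                     return a.length > b.length;
                   });
  return items;
}
```

For the offsets `{0, 2, 5, 6, 9}` the sequence lengths are 2, 3, 1, 3, so the ranked indices come out as 1, 3, 0, 2: the two length-3 sequences keep their original relative order because the sort is stable.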
paddle/framework/lod_rank_table.h
New file (mode 0 → 100644)
View file @ 483947c4
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once
#include "paddle/framework/lod_tensor.h"

namespace paddle {
namespace framework {

// LoD Rank Table stores the `level` of `lod` which is ordered by sequence
// length in descending order. It is useful when implement dynamic RNN and is
// shared by dynamic RNN memory, dynamic RNN slice input and dynamic RNN slice
// output operators.
//
// The table item contains two element. The length of sequence and the index of
// sequence in that level.
//
// LoDRankTable also stores the coarse_lod, which is the lod information whose
// level is less than input level, in order to restore the output LoD
// information.
class LoDRankTable {
 public:
  struct TableItem {
    size_t index;
    size_t length;
  };

  LoDRankTable() {}

  void Reset(const LoD& lod, size_t level);

  const std::vector<TableItem>& items() const { return this->items_; }

  const LoD& coarse_lod() const { return this->coarse_lod_; }

  size_t level() const { return coarse_lod_.size(); }

 private:
  LoD coarse_lod_;
  std::vector<TableItem> items_;
};

}  // namespace framework
}  // namespace paddle
paddle/framework/lod_tensor.cc
View file @ 483947c4
...
@@ -135,5 +135,43 @@ void LoDTensor::ShrinkInLevel(size_t level, size_t elem_begin,
   PADDLE_ENFORCE_LT(begin, end, "Cannot shrink, the result tensor is empty.");
   ShareDataWith(Slice(begin, end));
 }
+
+void GetFineGrainedLoDLength(const LoD& lod, size_t start_idx, size_t end_idx,
+                             std::vector<std::vector<size_t>>* lod_length,
+                             size_t* start_offset) {
+  lod_length->clear();
+  PADDLE_ENFORCE(start_idx < lod.size() - 1,
+                 "start_idx should be >= 0 and < lod.size() - 1.");
+  PADDLE_ENFORCE(end_idx < lod.size(),
+                 "end_idx should be >= 0 and < lod.size().");
+  PADDLE_ENFORCE_LE(start_idx, end_idx,
+                    "start_idx should be less than end_idx.");
+  for (size_t level_idx = 0; level_idx < lod.size(); ++level_idx) {
+    std::vector<size_t> level_lens;
+    for (size_t i = start_idx; i < end_idx; ++i) {
+      level_lens.push_back(lod[level_idx][i + 1] - lod[level_idx][i]);
+    }
+    lod_length->emplace_back(level_lens);
+    start_idx = lod[level_idx][start_idx];
+    end_idx = lod[level_idx][end_idx];
+  }
+  *start_offset = start_idx;
+}
+
+void AppendLoD(LoD* lod, const std::vector<std::vector<size_t>>& lod_length) {
+  PADDLE_ENFORCE_EQ(
+      lod->size(), lod_length.size(),
+      "The lod_length should has the same size with the appended lod.");
+  for (size_t i = 0; i < lod->size(); ++i) {
+    auto& level = (*lod)[i];
+    if (level.empty()) {
+      level.push_back(0);
+    }
+    for (size_t len : lod_length[i]) {
+      level.push_back(level.back() + len);
+    }
+  }
+}
+
 }  // namespace framework
 }  // namespace paddle
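
How `GetFineGrainedLoDLength` walks from a coarse level down to the finest one is easier to see in isolation. The following is a minimal sketch, not the committed code: `FineGrainedLengths` is a hypothetical name, `LoD` is assumed to be a vector of offset vectors, and `assert` replaces `PADDLE_ENFORCE`. The sketch also checks `end_idx` against each level's offset vector, which is a choice made here and differs from the checks in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: a LoD is a list of offset vectors, one per level.
using LoD = std::vector<std::vector<size_t>>;

// Hypothetical standalone version of the slicing logic: collects the
// sequence lengths covered by [start_idx, end_idx) at every level and
// reports where the slice begins in the underlying data.
void FineGrainedLengths(const LoD& lod, size_t start_idx, size_t end_idx,
                        std::vector<std::vector<size_t>>* lod_length,
                        size_t* start_offset) {
  lod_length->clear();
  assert(start_idx <= end_idx);
  for (size_t level = 0; level < lod.size(); ++level) {
    assert(end_idx < lod[level].size());  // bounds check chosen for the sketch
    std::vector<size_t> lens;
    for (size_t i = start_idx; i < end_idx; ++i) {
      lens.push_back(lod[level][i + 1] - lod[level][i]);
    }
    lod_length->push_back(lens);
    // Offsets stored at this level are indices into the next, finer level.
    start_idx = lod[level][start_idx];
    end_idx = lod[level][end_idx];
  }
  *start_offset = start_idx;
}
```

With the three-level LoD from the unit test in this commit and the slice `[1, 2)` at level 0, the index range becomes `[2, 4)` at level 1 and `[6, 10)` at level 2, which yields the lengths `{2}`, `{2, 2}`, `{2, 3, 4, 2}` and a start offset of 15.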
paddle/framework/lod_tensor.h
View file @ 483947c4
...
@@ -181,5 +181,11 @@ LoDTensor LodExpand(const LoDTensor& source, const LoD& lod, size_t level,
   return tensor;
 }
+
+void GetFineGrainedLoDLength(const LoD& lod, size_t start_idx, size_t end_idx,
+                             std::vector<std::vector<size_t>>* lod_length,
+                             size_t* start_offset);
+
+void AppendLoD(LoD* lod, const std::vector<std::vector<size_t>>& lod_length);
 }  // namespace framework
 }  // namespace paddle
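
`AppendLoD` is the inverse bookkeeping step: it turns per-level lengths back into cumulative offsets and appends them to an existing LoD. A minimal standalone sketch follows, assuming `LoD` is a vector of offset vectors; `AppendLengths` is a hypothetical name and `assert` stands in for `PADDLE_ENFORCE_EQ`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: a LoD is a list of offset vectors, one per level.
using LoD = std::vector<std::vector<size_t>>;

// Hypothetical standalone version of AppendLoD.
void AppendLengths(LoD* lod,
                   const std::vector<std::vector<size_t>>& lod_length) {
  assert(lod->size() == lod_length.size());  // one length list per level
  for (size_t i = 0; i < lod->size(); ++i) {
    std::vector<size_t>& level = (*lod)[i];
    if (level.empty()) {
      level.push_back(0);  // every offset vector starts at 0
    }
    for (size_t len : lod_length[i]) {
      level.push_back(level.back() + len);  // running sum of lengths
    }
  }
}
```

Appending the lengths `{2}`, `{2, 2}`, `{2, 3, 4, 2}` to the offsets `{0, 2}`, `{0, 1, 6}`, `{0, 2, 5, 7, 10, 12, 15}` extends each level by running sums, ending at 4, 10 and 26 respectively, which matches the expectation in the commit's `AppendLoD` test.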
paddle/framework/lod_tensor_array.h
New file (mode 0 → 100644)
View file @ 483947c4
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once
#include <vector>
#include "paddle/framework/lod_tensor.h"

namespace paddle {
namespace framework {
using LoDTensorArray = std::vector<LoDTensor>;
}
}  // namespace paddle
paddle/framework/lod_tensor_test.cc
View file @ 483947c4
...
@@ -144,5 +144,47 @@ TEST(LodExpand, test) {
   }
 }
+
+TEST(LoD, GetFineGrainedLoDLength) {
+  LoD lod;
+  lod.push_back(std::vector<size_t>{0, 2, 4, 5});
+  lod.push_back(std::vector<size_t>{0, 1, 6, 8, 10, 11});
+  lod.push_back(
+      std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20, 24, 26, 29});
+
+  std::vector<std::vector<size_t>> lod_length;
+  size_t start_offset;
+  paddle::framework::GetFineGrainedLoDLength(lod, 1, 2, &lod_length,
+                                             &start_offset);
+
+  std::vector<std::vector<size_t>> expected;
+  expected.push_back(std::vector<size_t>{2});
+  expected.push_back(std::vector<size_t>{2, 2});
+  expected.push_back(std::vector<size_t>{2, 3, 4, 2});
+  EXPECT_EQ(lod_length, expected);
+  EXPECT_EQ(start_offset, 15UL);
+}
+
+TEST(LoD, AppendLoD) {
+  std::vector<std::vector<size_t>> lod_lens;
+  lod_lens.push_back(std::vector<size_t>{2});
+  lod_lens.push_back(std::vector<size_t>{2, 2});
+  lod_lens.push_back(std::vector<size_t>{2, 3, 4, 2});
+
+  LoD origin;
+  origin.push_back(std::vector<size_t>{0, 2});
+  origin.push_back(std::vector<size_t>{0, 1, 6});
+  origin.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15});
+
+  paddle::framework::AppendLoD(&origin, lod_lens);
+
+  LoD expected;
+  expected.push_back(std::vector<size_t>{0, 2, 4});
+  expected.push_back(std::vector<size_t>{0, 1, 6, 8, 10});
+  expected.push_back(
+      std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20, 24, 26});
+  EXPECT_EQ(origin, expected);
+}
+
 }  // namespace framework
 }  // namespace paddle
paddle/framework/operator.h
View file @ 483947c4
...
@@ -408,7 +408,6 @@ class OperatorWithKernel : public OperatorBase {
   // indicate kernel DataType by input data. Defaultly all input data must be
   // same.
   virtual DataType IndicateDataType(const ExecutionContext& ctx) const {
-    VLOG(3) << "Default IndicateDataType " << this->Type();
     auto& scope = ctx.scope();
     int data_type = -1;
     for (auto& input : this->inputs_) {
...
@@ -425,7 +424,6 @@ class OperatorWithKernel : public OperatorBase {
         }
         if (t != nullptr) {
           int tmp = static_cast<int>(ToDataType(t->type()));
-          VLOG(3) << "Input " << ipt_name << " with data_type " << tmp;
           PADDLE_ENFORCE(tmp == data_type || data_type == -1,
                          "DataType of Paddle Op %s must be the same.",
                          Type());
...
paddle/framework/var_desc.cc
View file @ 483947c4
...
@@ -37,13 +37,27 @@ std::vector<int64_t> VarDescBind::Shape() const {
 DataType VarDescBind::GetDataType() const {
   return tensor_desc().data_type();
 }
 void VarDescBind::SetLoDLevel(int32_t lod_level) {
-  PADDLE_ENFORCE(desc_.type() == VarDesc::LOD_TENSOR);
-  desc_.mutable_lod_tensor()->set_lod_level(lod_level);
+  switch (desc_.type()) {
+    case VarDesc::LOD_TENSOR:
+      desc_.mutable_lod_tensor()->set_lod_level(lod_level);
+      break;
+    case VarDesc::LOD_TENSOR_ARRAY:
+      desc_.mutable_tensor_array()->set_lod_level(lod_level);
+      break;
+    default:
+      PADDLE_THROW("Tensor type=%d does not support LoDLevel", desc_.type());
+  }
 }
 int32_t VarDescBind::GetLodLevel() const {
-  PADDLE_ENFORCE(desc_.type() == VarDesc::LOD_TENSOR);
-  return desc_.lod_tensor().lod_level();
+  switch (desc_.type()) {
+    case VarDesc::LOD_TENSOR:
+      return desc_.lod_tensor().lod_level();
+    case VarDesc::LOD_TENSOR_ARRAY:
+      return desc_.tensor_array().lod_level();
+    default:
+      PADDLE_THROW("Tensor type=%d does not support LoDLevel", desc_.type());
+  }
 }
 const TensorDesc &VarDescBind::tensor_desc() const {
...
@@ -53,6 +67,8 @@ const TensorDesc &VarDescBind::tensor_desc() const {
       return desc_.selected_rows();
     case VarDesc::LOD_TENSOR:
       return desc_.lod_tensor().tensor();
+    case VarDesc::LOD_TENSOR_ARRAY:
+      return desc_.tensor_array().tensor();
     default:
       PADDLE_THROW("Unexpected branch.");
   }
...
@@ -66,6 +82,8 @@ TensorDesc *VarDescBind::mutable_tensor_desc() {
       return desc_.mutable_selected_rows();
     case VarDesc::LOD_TENSOR:
       return desc_.mutable_lod_tensor()->mutable_tensor();
+    case VarDesc::LOD_TENSOR_ARRAY:
+      return desc_.mutable_tensor_array()->mutable_tensor();
     default:
       PADDLE_THROW("Unexpected branch.");
   }
...
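
The change in `var_desc.cc` replaces a single-type precondition with a `switch` that dispatches on the variable type, so that both LoD tensors and LoD tensor arrays carry a LoD level. The pattern can be sketched without protobuf as below. Everything here is hypothetical and simplified: `VarType`, `VarDescSketch` and its fields are illustration-only names, and `std::invalid_argument` stands in for `PADDLE_THROW`.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical subset of the variable types named in the diff.
enum class VarType { kLoDTensor, kSelectedRows, kLoDTensorArray };

// Hypothetical, simplified stand-in for VarDescBind: each supported type
// stores its LoD level in its own sub-description.
struct VarDescSketch {
  VarType type = VarType::kLoDTensor;
  int32_t lod_tensor_level = 0;
  int32_t tensor_array_level = 0;

  void SetLoDLevel(int32_t lod_level) {
    switch (type) {
      case VarType::kLoDTensor:
        lod_tensor_level = lod_level;
        break;
      case VarType::kLoDTensorArray:
        tensor_array_level = lod_level;
        break;
      default:  // stand-in for PADDLE_THROW
        throw std::invalid_argument("type does not support LoDLevel");
    }
  }

  int32_t GetLoDLevel() const {
    switch (type) {
      case VarType::kLoDTensor:
        return lod_tensor_level;
      case VarType::kLoDTensorArray:
        return tensor_array_level;
      default:  // stand-in for PADDLE_THROW
        throw std::invalid_argument("type does not support LoDLevel");
    }
  }
};
```

Compared with a bare equality check, the `switch` keeps the unsupported case explicit and leaves room for further types that carry a LoD level.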
paddle/framework/var_desc.h
View file @ 483947c4
...
@@ -15,6 +15,7 @@ limitations under the License. */
 #pragma once
 #include <vector>
+#include "glog/logging.h"
 #include "paddle/framework/framework.pb.h"
 namespace paddle {
...
paddle/gserver/layers/MKLDNNAddtoLayer.cpp
New file (mode 0 → 100644)
View file @ 483947c4
/* Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "MKLDNNAddtoLayer.h"

using namespace mkldnn;  // NOLINT

namespace paddle {

REGISTER_LAYER(mkldnn_addto, MKLDNNAddtoLayer);

bool MKLDNNAddtoLayer::init(const LayerMap& layerMap,
                            const ParameterMap& parameterMap) {
  if (!MKLDNNLayer::init(layerMap, parameterMap)) {
    return false;
  }

  layerSize_ = getSize();
  for (size_t i = 0; i < inputLayers_.size(); i++) {
    CHECK_EQ(layerSize_, inputLayers_[i]->getSize()) << "input size must equal";
  }
  if (biasParameter_.get() != NULL) {
    biases_ =
        std::unique_ptr<Weight>(new Weight(1, layerSize_, biasParameter_, 0));
  }
  return true;
}

void MKLDNNAddtoLayer::reshape(
    int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) {
  CHECK_EQ(layerSize_, getSize()) << "this layer size can not be changed";
  reshapeInput(bs, ih, iw);
  ic = inputLayers_[0]->getSize() / ih / iw;
  CHECK_EQ((size_t)ic * ih * iw, inputLayers_[0]->getSize());
  CHECK_EQ(inputElemenCnt_, (size_t)bs * ic * ih * iw);
  for (size_t i = 0; i < inputLayers_.size(); i++) {
    CHECK_EQ(int64_t(bs), inputLayers_[i]->getOutput().getBatchSize());
    CHECK_EQ(layerSize_, inputLayers_[i]->getSize());
  }

  oc = ic;
  oh = ih;
  ow = iw;
  reshapeOutput(oh, ow);
  resizeOutput(bs, oc * oh * ow);
  printSizeInfo();
}

void MKLDNNAddtoLayer::resetFwd(std::vector<primitive>& pipeline,
                                MKLDNNMatrixPtr& in,
                                MKLDNNMatrixPtr& wgt,
                                MKLDNNMatrixPtr& bias,
                                MKLDNNMatrixPtr& out) {
  if (biases_) {
    LOG(FATAL) << "not implemented yet";
  }
  resetFwdBuffers(inVals_, out);
  in = inVals_[0];

  std::shared_ptr<sum::primitive_desc> fwdPD;
  resetFwdPD(fwdPD, inVals_, out);

  resetFwdPipeline(pipeline, fwdPD, inVals_, out);
}

void MKLDNNAddtoLayer::resetBwd(std::vector<primitive>& pipeline,
                                MKLDNNMatrixPtr& in,
                                MKLDNNMatrixPtr& wgt,
                                MKLDNNMatrixPtr& bias,
                                MKLDNNMatrixPtr& out) {
  resetBwdBuffers(inGrads_, out);
  in = inGrads_[0];

  // backward only need share output grad to input grad
  for (size_t i = 0; i < inGrads_.size(); i++) {
    if (inGrads_[i] != nullptr) {
      inGrads_[i] = out;
      inputLayers_[i]->getOutputGrad()->setData(inGrads_[i]->getData());
    }
  }
}

void MKLDNNAddtoLayer::updateWeights(const UpdateCallback& callback) {
  if (biases_ && biases_->getWGrad()) {
    biases_->getParameterPtr()->incUpdate(callback);
  }
}

void MKLDNNAddtoLayer::resetFwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
                                       MKLDNNMatrixPtr& out) {
  inputs.resize(inputLayers_.size());
  for (size_t i = 0; i < inputs.size(); i++) {
    resetInValue(inputs[i], nullptr, i);
    CHECK(inputs[i]);
    inputs[i]->downSpatial();
  }
  for (size_t i = 1; i < inputs.size(); i++) {
    CHECK_PRIMITIVE_DESC_EQ(inputs[i], inputs[0]->getPrimitiveDesc());
  }

  resetOutValue(out, inputs[0]->getPrimitiveDesc());
}

void MKLDNNAddtoLayer::resetFwdPD(std::shared_ptr<sum::primitive_desc>& pd,
                                  std::vector<MKLDNNMatrixPtr>& inputs,
                                  MKLDNNMatrixPtr out) {
  std::vector<double> scales(inputs.size(), 1.0);
  std::vector<memory::primitive_desc> srcPDs;
  for (size_t i = 0; i < inputs.size(); i++) {
    srcPDs.push_back(inputs[i]->getPrimitiveDesc());
  }
  CHECK(out);
  pd.reset(new sum::primitive_desc(out->getMemoryDesc(), scales, srcPDs));
  CHECK_PRIMITIVE_DESC_EQ(out, pd->dst_primitive_desc());
}

void MKLDNNAddtoLayer::resetFwdPipeline(
    std::vector<primitive>& pipeline,
    std::shared_ptr<sum::primitive_desc>& pd,
    std::vector<MKLDNNMatrixPtr>& inputs,
    MKLDNNMatrixPtr& out) {
  std::vector<primitive::at> srcs;
  for (size_t i = 0; i < inputs.size(); i++) {
    srcs.push_back(*(inputs[i]));
  }
  fwd_.reset(new sum(*pd, srcs, *out));
  pipeline.push_back(*fwd_);
}

void MKLDNNAddtoLayer::resetBwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
                                       MKLDNNMatrixPtr& out) {
  CHECK(outVal_);
  resetOutGrad(out, outVal_->getPrimitiveDesc());
  CHECK(out);

  inputs.resize(inputLayers_.size());
  for (size_t i = 0; i < inputs.size(); i++) {
    resetInGrad(inputs[i], inVal_->getPrimitiveDesc(), i);
    CHECK_PRIMITIVE_DESC_EQ(inputs[i], out->getPrimitiveDesc());
  }
}

}  // namespace paddle
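
Stripped of the MKL-DNN plumbing, the layer computes an elementwise sum of equally sized inputs with every scale set to 1.0, and its backward pass hands the output gradient to each input unchanged. The following plain C++ reference is only a sketch of that arithmetic: `AddtoForward` and `AddtoBackward` are hypothetical names, no MKL-DNN API is used, and the backward helper copies the gradient where the layer above shares the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference for the forward pass:
// out[j] = sum over i of scales[i] * inputs[i][j], with all scales 1.0.
std::vector<float> AddtoForward(
    const std::vector<std::vector<float>>& inputs) {
  assert(!inputs.empty());
  const std::vector<double> scales(inputs.size(), 1.0);  // as in resetFwdPD
  std::vector<float> out(inputs[0].size(), 0.0f);
  for (size_t i = 0; i < inputs.size(); ++i) {
    assert(inputs[i].size() == out.size());  // all inputs share one size
    for (size_t j = 0; j < out.size(); ++j) {
      out[j] += static_cast<float>(scales[i]) * inputs[i][j];
    }
  }
  return out;
}

// Hypothetical reference for the backward pass: the derivative of the sum
// with respect to each input is the identity, so every input gradient
// equals the output gradient (copied here, shared in the layer itself).
std::vector<std::vector<float>> AddtoBackward(
    const std::vector<float>& out_grad, size_t num_inputs) {
  return std::vector<std::vector<float>>(num_inputs, out_grad);
}
```

Summing the inputs `{1, 2}`, `{3, 4}` and `{0.5, 0.5}` gives `{4.5, 6.5}`, and a gradient of `{1, -1}` at the output is passed to all three inputs as is.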
paddle/gserver/layers/MKLDNNAddtoLayer.h
New file (mode 0 → 100644)
View file @ 483947c4
/* Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "MKLDNNLayer.h"
#include "mkldnn.hpp"
namespace
paddle
{
/**
* @brief A subclass of MKLDNNLayer Addto layer.
*
* The config file api is mkldnn_addto
*/
class
MKLDNNAddtoLayer
:
public
MKLDNNLayer
{
protected:
std
::
vector
<
MKLDNNMatrixPtr
>
inVals_
;
std
::
vector
<
MKLDNNMatrixPtr
>
inGrads_
;
// layer size == ic * ih * iw == oc * oh *ow, and can not be changed
size_t
layerSize_
;
// TODO(TJ): this part has not been optimized by MKL-DNN
std
::
unique_ptr
<
Weight
>
biases_
;
public:
explicit
MKLDNNAddtoLayer
(
const
LayerConfig
&
config
)
:
MKLDNNLayer
(
config
)
{}
~
MKLDNNAddtoLayer
()
{}
bool
init
(
const
LayerMap
&
layerMap
,
const
ParameterMap
&
parameterMap
)
override
;
void
reshape
(
int
&
bs
,
int
&
ic
,
int
&
ih
,
int
&
iw
,
int
oc
,
int
&
oh
,
int
&
ow
)
override
;
void
resetFwd
(
std
::
vector
<
mkldnn
::
primitive
>&
pipeline
,
MKLDNNMatrixPtr
&
in
,
MKLDNNMatrixPtr
&
wgt
,
MKLDNNMatrixPtr
&
bias
,
MKLDNNMatrixPtr
&
out
)
override
;
void
resetBwd
(
std
::
vector
<
mkldnn
::
primitive
>&
pipeline
,
MKLDNNMatrixPtr
&
in
,
MKLDNNMatrixPtr
&
wgt
,
MKLDNNMatrixPtr
&
bias
,
MKLDNNMatrixPtr
&
out
)
override
;
void
updateWeights
(
const
UpdateCallback
&
callback
)
override
;
void
printValueFormat
()
override
{
for
(
size_t
i
=
0
;
i
<
inVals_
.
size
();
++
i
)
{
VLOG
(
MKLDNN_FMTS
)
<<
i
<<
" input: "
<<
inVals_
[
i
]
->
getFormat
()
<<
" >>>"
;
}
if
(
outVal_
)
{
VLOG
(
MKLDNN_FMTS
)
<<
outVal_
->
getFormat
()
<<
" >>> "
;
}
if
(
extOutVal_
)
{
VLOG
(
MKLDNN_FMTS
)
<<
extOutVal_
->
getFormat
();
}
}
void
printGradFormat
()
override
{
if
(
extOutGrad_
)
{
VLOG
(
MKLDNN_FMTS
)
<<
extOutGrad_
->
getFormat
();
}
if
(
outGrad_
)
{
VLOG
(
MKLDNN_FMTS
)
<<
outGrad_
->
getFormat
()
<<
" <<< "
;
}
for
(
size_t
i
=
0
;
i
<
inGrads_
.
size
();
++
i
)
{
VLOG
(
MKLDNN_FMTS
)
<<
i
<<
" input: "
<<
inGrads_
[
i
]
->
getFormat
()
<<
"<<<"
;
}
}
protected:
/**
* Forward functions: reset buffers(inputs, output, bias),
* reset primitive descriptor,
* reset pipeline.
*/
void
resetFwdBuffers
(
std
::
vector
<
MKLDNNMatrixPtr
>&
inputs
,
MKLDNNMatrixPtr
&
out
);
void
resetFwdPD
(
std
::
shared_ptr
<
mkldnn
::
sum
::
primitive_desc
>&
pd
,
std
::
vector
<
MKLDNNMatrixPtr
>&
inputs
,
MKLDNNMatrixPtr
out
);
void
resetFwdPipeline
(
std
::
vector
<
mkldnn
::
primitive
>&
pipeline
,
std
::
shared_ptr
<
mkldnn
::
sum
::
primitive_desc
>&
pd
,
std
::
vector
<
MKLDNNMatrixPtr
>&
inputs
,
MKLDNNMatrixPtr
&
out
);
/**
* Backward functions: reset buffers(inputs, output, bias)
*/
void
resetBwdBuffers
(
std
::
vector
<
MKLDNNMatrixPtr
>&
inputs
,
MKLDNNMatrixPtr
&
out
);
};
}
// namespace paddle
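Note on MKLDNNAddtoLayer: the layer sums several equally sized inputs into one output (the header builds on `mkldnn::sum`). The sketch below shows only that arithmetic on plain vectors. `addtoForward` and `addtoBackward` are hypothetical helper names, not Paddle or MKL-DNN APIs, and memory formats, bias and activation are left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not a Paddle or MKL-DNN API: element-wise sum of
// equally sized inputs (addto forward without bias or activation).
std::vector<float> addtoForward(const std::vector<std::vector<float>>& inputs) {
  if (inputs.empty()) throw std::invalid_argument("need at least one input");
  std::vector<float> out(inputs[0].size(), 0.0f);
  for (const auto& in : inputs) {
    if (in.size() != out.size()) throw std::invalid_argument("input size mismatch");
    for (std::size_t i = 0; i < out.size(); ++i) out[i] += in[i];
  }
  return out;
}

// Backward: the derivative of a sum with respect to each input is the
// identity, so every input receives a copy of the output gradient.
std::vector<std::vector<float>> addtoBackward(const std::vector<float>& outGrad,
                                              std::size_t nInputs) {
  return std::vector<std::vector<float>>(nInputs, outGrad);
}
```

This matches the shape check in resetBwdBuffers, where each input gradient is required to have the same primitive descriptor as the output gradient.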
paddle/gserver/layers/MKLDNNLayer.cpp
...
...
@@ -77,7 +77,7 @@ void MKLDNNLayer::forward(PassType passType) {
needResetBwd_
=
true
;
}
if
(
inputLayers_
[
0
]
->
getType
()
==
"data"
)
{
if
(
inputLayers_
[
0
]
->
getType
()
==
"data"
&&
inputLayers_
.
size
()
==
1
)
{
// Update input value data when input layer is "data" type,
// since the input value data address might be changed.
CHECK
(
extInVal_
);
...
...
@@ -171,14 +171,16 @@ void MKLDNNLayer::resetWithMatrix(MKLDNNMatrixPtr& dnn,
}
void
MKLDNNLayer
::
resetInValue
(
MKLDNNMatrixPtr
&
in
,
const
std
::
shared_ptr
<
memory
::
primitive_desc
>&
intPD
)
{
MKLDNNMatrixPtr
&
in
,
const
std
::
shared_ptr
<
memory
::
primitive_desc
>&
intPD
,
size_t
inputIdx
)
{
cvtInVal_
=
nullptr
;
extInVal_
=
nullptr
;
in
=
nullptr
;
CHECK_GT
(
bs_
*
ic_
*
ih_
*
iw_
,
0
);
auto
extPD
=
MKLDNNMatrix
::
createPrimitiveDesc
(
{
bs_
,
ic_
,
ih_
,
iw_
},
format
::
nchw
,
engine_
);
const
MatrixPtr
&
inMat
=
inputLayers_
[
0
]
->
getOutputValue
();
const
MatrixPtr
&
inMat
=
inputLayers_
[
inputIdx
]
->
getOutputValue
();
in
=
std
::
dynamic_pointer_cast
<
MKLDNNMatrix
>
(
inMat
);
CHECK_EQ
(
inputIsOnlyMKLDNN
(),
in
!=
nullptr
);
if
(
in
==
nullptr
||
in
->
getFormat
()
==
format
::
nc
)
{
...
...
@@ -216,11 +218,12 @@ void MKLDNNLayer::resetOutValue(MKLDNNMatrixPtr& out,
}
void
MKLDNNLayer
::
resetInGrad
(
MKLDNNMatrixPtr
&
in
,
memory
::
primitive_desc
intPD
)
{
memory
::
primitive_desc
intPD
,
size_t
inputIdx
)
{
cvtInGrad_
=
nullptr
;
extInGrad_
=
nullptr
;
in
=
nullptr
;
LayerPtr
&
input
=
inputLayers_
[
0
];
LayerPtr
&
input
=
inputLayers_
[
inputIdx
];
if
(
input
->
getOutputGrad
()
==
nullptr
)
{
// no need input grad
return
;
...
...
@@ -245,7 +248,6 @@ void MKLDNNLayer::resetInGrad(MKLDNNMatrixPtr& in,
return
;
}
// need create reorder
// TODO(TJ): add macro definition to simplify it
CHECK
(
extInVal_
!=
nullptr
&&
isPaddleFormat
(
extInVal_
->
getFormat
()))
<<
"should have external input value and the format must be nchw(nc)"
;
extInGrad_
=
MKLDNNMatrix
::
create
(
extInVal_
->
getPrimitiveDesc
(),
inMat
);
...
...
paddle/gserver/layers/MKLDNNLayer.h
...
...
@@ -199,7 +199,8 @@ protected:
*/
void
resetInValue
(
MKLDNNMatrixPtr
&
in
,
const
std
::
shared_ptr
<
mkldnn
::
memory
::
primitive_desc
>&
intPD
=
nullptr
);
const
std
::
shared_ptr
<
mkldnn
::
memory
::
primitive_desc
>&
intPD
=
nullptr
,
size_t
inputIdx
=
0
);
/**
* reset output value from internal primitive desc.
...
...
@@ -212,7 +213,9 @@ protected:
* reset input grad from internal primitive desc.
* reset both internal and external buffer and create reorder if necessary.
*/
void
resetInGrad
(
MKLDNNMatrixPtr
&
in
,
mkldnn
::
memory
::
primitive_desc
intPD
);
void
resetInGrad
(
MKLDNNMatrixPtr
&
in
,
mkldnn
::
memory
::
primitive_desc
intPD
,
size_t
inputIdx
=
0
);
/**
* reset output grad from internal primitive desc.
...
...
paddle/gserver/tests/MKLDNNTester.cpp
...
...
@@ -132,7 +132,7 @@ void MKLDNNTester::checkForward() {
VLOG
(
MKLDNN_TESTS
)
<<
"Check Forward"
;
printTopDatas
();
double
delta
=
compareMatrix
(
dnnLayer_
->
getOutputValue
(),
refLayer_
->
getOutputValue
());
compareMatrix
(
refLayer_
->
getOutputValue
(),
dnnLayer_
->
getOutputValue
());
EXPECT_LE
(
fabs
(
delta
),
eps_
);
}
...
...
@@ -147,7 +147,7 @@ void MKLDNNTester::checkBackwardData() {
VLOG
(
MKLDNN_ALL
)
<<
"Reference Backward Result: InputGrad "
<<
i
;
printMatrix
(
refDiff
);
double
delta
=
compareMatrix
(
dnnDiff
,
refDiff
);
double
delta
=
compareMatrix
(
refDiff
,
dnnDiff
);
EXPECT_LE
(
fabs
(
delta
),
eps_
);
if
(
isBN
)
{
// the other two inputs in batch norm are for moving mean and var
...
...
@@ -177,7 +177,7 @@ void MKLDNNTester::checkBackwardWgts() {
<<
parameters_
[
REF
][
i
]
->
getName
();
printVector
(
ref
);
double
delta
=
compareVector
(
dnn
,
ref
);
double
delta
=
compareVector
(
ref
,
dnn
);
EXPECT_LE
(
fabs
(
delta
),
eps_
);
}
...
...
paddle/gserver/tests/test_MKLDNN.cpp
...
...
@@ -271,20 +271,53 @@ TEST(MKLDNNLayer, BatchNormLayer) {
testBatchNormLayer
({
16
,
32
,
16
,
16
});
}
struct
testActDesc
{
struct
testImageDesc
{
int
bs
,
ic
,
ih
,
iw
;
};
static
void
getAddtoConfig
(
TestConfig
&
cfg
,
const
testActDesc
&
pm
)
{
static
void
getAddtoConfig
(
TestConfig
&
cfg
,
const
testImageDesc
&
pm
,
const
size_t
nInputs
=
1
)
{
cfg
.
biasSize
=
0
;
cfg
.
layerConfig
.
set_type
(
"addto"
);
size_t
layerSize
=
pm
.
ic
*
pm
.
ih
*
pm
.
iw
;
cfg
.
layerConfig
.
set_size
(
layerSize
);
cfg
.
inputDefs
.
push_back
({
INPUT_DATA
,
"layer_0"
,
layerSize
,
0
});
cfg
.
layerConfig
.
add_inputs
();
cfg
.
layerConfig
.
set_active_type
(
"relu"
);
for
(
size_t
i
=
0
;
i
<
nInputs
;
++
i
)
{
std
::
stringstream
ss
;
ss
<<
"layer_"
<<
i
;
cfg
.
inputDefs
.
push_back
({
INPUT_DATA
,
ss
.
str
(),
layerSize
,
0
});
LayerInputConfig
*
input
=
cfg
.
layerConfig
.
add_inputs
();
ImageConfig
*
img_conf
=
input
->
mutable_image_conf
();
img_conf
->
set_channels
(
pm
.
ic
);
img_conf
->
set_img_size_y
(
pm
.
ih
);
img_conf
->
set_img_size
(
pm
.
iw
);
}
}
void
testAddtoLayer
(
const
testImageDesc
&
pm
,
const
size_t
nInputs
)
{
CHECK_GE
(
nInputs
,
1
);
TestConfig
dnnConfig
;
getAddtoConfig
(
dnnConfig
,
pm
,
nInputs
);
dnnConfig
.
layerConfig
.
set_type
(
"mkldnn_addto"
);
// TODO(TJ): test with bias
for
(
auto
withBias
:
{
false
})
{
if
(
withBias
)
{
dnnConfig
.
biasSize
=
pm
.
ic
*
pm
.
ih
*
pm
.
iw
;
}
else
{
dnnConfig
.
biasSize
=
0
;
}
RUN_MKLDNN_TEST_LAYER
(
dnnConfig
,
"addto"
,
pm
)
}
}
TEST
(
MKLDNNLayer
,
AddtoLayer
)
{
testAddtoLayer
({
16
,
5
,
14
,
14
},
1
);
testAddtoLayer
({
8
,
10
,
8
,
8
},
2
);
testAddtoLayer
({
4
,
12
,
1
,
1
},
3
);
}
void
testActivation
(
std
::
string
actType
,
const
testActDesc
&
pm
)
{
void
testActivation
(
std
::
string
actType
,
const
testImageDesc
&
pm
)
{
// TODO(TJ): remove me when paddle support elu activation
if
(
actType
==
"mkldnn_elu"
)
{
return
;
...
...
paddle/operators/CMakeLists.txt
...
...
@@ -142,6 +142,7 @@ set(DEPS_OPS
nccl_op
sequence_conv_op
sequence_pool_op
lod_rank_table_op
lstm_op
)
op_library
(
cond_op SRCS cond_op.cc DEPS framework_proto tensor operator net_op
)
...
...
@@ -150,6 +151,7 @@ op_library(softmax_with_cross_entropy_op DEPS cross_entropy softmax)
op_library
(
sum_op DEPS net_op selected_rows_functor
)
op_library
(
pool_op DEPS pooling
)
op_library
(
pool_with_index_op DEPS pooling
)
op_library
(
lod_rank_table_op SRCS lod_rank_table_op.cc DEPS lod_rank_table
)
if
(
WITH_GPU
)
op_library
(
nccl_op DEPS nccl_common
)
endif
()
...
...
paddle/operators/accuracy_op.cc
...
...
@@ -33,7 +33,7 @@ class AccuracyOp : public framework::OperatorWithKernel {
auto
inference_dim
=
ctx
->
GetInputDim
(
"Out"
);
auto
label_dim
=
ctx
->
GetInputDim
(
"Label"
);
// Assume indices has same shape with infernece, because
// Assume indices has same shape as inference, because
// it's the output of topk.
PADDLE_ENFORCE_EQ
(
label_dim
.
size
(),
2
,
"label's rank must be 2."
);
...
...
@@ -60,20 +60,24 @@ class AccuracyOpMaker : public framework::OpProtoAndCheckerMaker {
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
// TODO(typhoonzero): support both inference value and indices.
AddInput
(
"Out"
,
"topk (inferences) the network output"
);
AddInput
(
"Indices"
,
"topk (indices) the network output"
);
AddInput
(
"Out"
,
"The network output of topk (inferences)"
);
AddInput
(
"Indices"
,
"The network output of topk (indices)"
);
AddInput
(
"Label"
,
"Label of the training data"
);
// TODO(typhoonzero): AddInput("Weight", ...
AddOutput
(
"Accuracy"
,
"The accuracy of current batch"
);
AddComment
(
R"DOC(
Accuracy. It will print accuracy rate for classification.
The accuracy is:
.. math::
accuracy = \\frac{NumOfCorrectPredicts}{NumOfAllSamples})
Accuracy Operator.
It will print accuracy rate for classification.
The accuracy is calculated as follows:
$$accuracy = \frac{NumOfCorrectPredicts}{NumOfAllSamples}$$
Both the input Out and Label can carry the LoD (Level of Details)
information, or not. But the output only shares the LoD information
with the input Out(Inference).
Both the input `Out` and `Label` can carry the LoD (Level of Details)
information, or not. But the output only shares the LoD with the input `Out` (Inference).
)DOC"
);
}
};
...
...
paddle/operators/activation_op.cc
...
...
@@ -44,7 +44,7 @@ class SigmoidOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Sigmoid operator"
);
AddOutput
(
"Y"
,
"Output of Sigmoid operator"
);
AddComment
(
R"DOC(
Sigmoid activation operator.
Sigmoid Activation Operator.
$y = 1 / (1 + e^{-x})$
...
...
@@ -60,7 +60,7 @@ class LogSigmoidOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of LogSigmoid operator"
);
AddOutput
(
"Y"
,
"Output of LogSigmoid operator"
);
AddComment
(
R"DOC(
Logsigmoid activation operator.
Logsigmoid Activation Operator.
$y = \log(1 / (1 + e^{-x}))$
...
...
@@ -75,7 +75,7 @@ class ExpOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Exp operator"
);
AddOutput
(
"Y"
,
"Output of Exp operator"
);
AddComment
(
R"DOC(
Exp activation operator.
Exp Activation Operator.
$y = e^x$
...
...
@@ -90,7 +90,7 @@ class ReluOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Relu operator"
);
AddOutput
(
"Y"
,
"Output of Relu operator"
);
AddComment
(
R"DOC(
Relu activation operator.
Relu Activation Operator.
$y = \max(x, 0)$
...
...
@@ -109,7 +109,7 @@ class LeakyReluOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"alpha"
,
"The small negative slope"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
0.02
f
));
AddComment
(
R"DOC(
LeakyRelu activation operator.
LeakyRelu Activation Operator.
$y = \max(x, \alpha * x)$
...
...
@@ -128,7 +128,7 @@ class SoftShrinkOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"lambda"
,
"non-negative offset"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
0.5
f
));
AddComment
(
R"DOC(
Softshrink activation operator.
Softshrink Activation Operator.
$$
y = \begin{cases}
...
...
@@ -149,7 +149,7 @@ class TanhOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Tanh operator"
);
AddOutput
(
"Y"
,
"Output of Tanh operator"
);
AddComment
(
R"DOC(
Tanh activation operator.
Tanh Activation Operator.
$$y = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
...
...
@@ -165,7 +165,7 @@ class TanhShrinkOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of TanhShrink operator"
);
AddOutput
(
"Y"
,
"Output of TanhShrink operator"
);
AddComment
(
R"DOC(
TanhShrink activation operator.
TanhShrink Activation Operator.
$$y = x - \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
...
...
@@ -184,7 +184,7 @@ class HardShrinkOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"threshold"
,
"The value of threshold for HardShrink"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
0.5
));
AddComment
(
R"DOC(
HardShrink activation operator.
HardShrink Activation Operator.
$$
y = \begin{cases}
...
...
@@ -205,7 +205,7 @@ class SqrtOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Sqrt operator"
);
AddOutput
(
"Y"
,
"Output of Sqrt operator"
);
AddComment
(
R"DOC(
Sqrt activation operator.
Sqrt Activation Operator.
$y = \sqrt{x}$
...
...
@@ -220,7 +220,7 @@ class AbsOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Abs operator"
);
AddOutput
(
"Y"
,
"Output of Abs operator"
);
AddComment
(
R"DOC(
Abs activation operator.
Abs Activation Operator.
$y = |x|$
...
...
@@ -236,7 +236,7 @@ class ReciprocalOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Reciprocal operator"
);
AddOutput
(
"Y"
,
"Output of Reciprocal operator"
);
AddComment
(
R"DOC(
Reciprocal activation operator.
Reciprocal Activation Operator.
$$y = \frac{1}{x}$$
...
...
@@ -251,7 +251,7 @@ class LogOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Log operator"
);
AddOutput
(
"Y"
,
"Output of Log operator"
);
AddComment
(
R"DOC(
Log activation operator.
Log Activation Operator.
$y = \ln(x)$
...
...
@@ -268,7 +268,7 @@ class SquareOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Square operator"
);
AddOutput
(
"Y"
,
"Output of Square operator"
);
AddComment
(
R"DOC(
Square activation operator.
Square Activation Operator.
$y = x^2$
...
...
@@ -284,7 +284,7 @@ class SoftplusOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Softplus operator"
);
AddOutput
(
"Y"
,
"Output of Softplus operator"
);
AddComment
(
R"DOC(
Softplus activation operator.
Softplus Activation Operator.
$y = \ln(1 + e^{x})$
...
...
@@ -300,7 +300,7 @@ class SoftsignOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"Input of Softsign operator"
);
AddOutput
(
"Y"
,
"Output of Softsign operator"
);
AddComment
(
R"DOC(
Softsign activation operator.
Softsign Activation Operator.
$$y = \frac{x}{1 + |x|}$$
...
...
@@ -320,7 +320,7 @@ class BReluOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"t_max"
,
"The max marginal value of BRelu"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
24
));
AddComment
(
R"DOC(
BRelu activation operator.
BRelu Activation Operator.
$y = \min(\max(x, t_{min}), t_{max})$
...
...
@@ -339,7 +339,7 @@ class SoftReluOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"threshold"
,
"The threshold value of SoftRelu"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
40
));
AddComment
(
R"DOC(
SoftRelu activation operator.
SoftRelu Activation Operator.
$y = \ln(1 + \exp(\max(\min(x, threshold), -threshold)))$
...
...
@@ -357,7 +357,7 @@ class ELUOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"alpha"
,
"The alpha value of ELU"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
1.0
f
));
AddComment
(
R"DOC(
ELU activation operator.
ELU Activation Operator.
Applies the following element-wise computation on the input according to
https://arxiv.org/abs/1511.07289.
...
...
@@ -378,7 +378,7 @@ class Relu6OpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"threshold"
,
"The threshold value of Relu6"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
6
));
AddComment
(
R"DOC(
Relu6 activation operator.
Relu6 Activation Operator.
$y = \min(\max(0, x), 6)$
...
...
@@ -396,7 +396,7 @@ class PowOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"factor"
,
"The exponential factor of Pow"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
1
));
AddComment
(
R"DOC(
Pow activation operator.
Pow Activation Operator.
$y = x^{factor}$
...
...
@@ -416,7 +416,7 @@ class STanhOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"scale_b"
,
"The scale parameter of b for the input"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
1.7159
));
AddComment
(
R"DOC(
STanh activation operator.
STanh Activation Operator.
$$y = b * \frac{e^{a * x} - e^{-a * x}}{e^{a * x} + e^{-a * x}}$$
...
...
@@ -435,7 +435,7 @@ class ThresholdedReluOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"threshold"
,
"The threshold location of activation"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
1.0
));
AddComment
(
R"DOC(
ThresholdedRelu activation operator.
ThresholdedRelu Activation Operator.
$$
y = \begin{cases}
...
...
@@ -461,7 +461,7 @@ class HardSigmoidOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"offset"
,
"Offset for linear approximation of sigmoid"
)
.
SetDefault
(
static_cast
<
AttrType
>
(
0.5
));
AddComment
(
R"DOC(
HardSigmoid activation operator.
HardSigmoid Activation Operator.
Segment-wise linear approximation of sigmoid(https://arxiv.org/abs/1603.00391),
which is much faster than sigmoid.
...
...
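Note on the activation docstrings: a few of the documented formulas, written as scalar functions so they can be checked by hand. The function names are illustrative, not Paddle functors. `brelu` and `softRelu` are written as clamps to [t_min, t_max] and [-threshold, threshold], which is an assumption about the operators' intent. Defaults are taken from the SetDefault calls visible in the diff where available; other attributes are passed explicitly.

```cpp
#include <algorithm>
#include <cmath>

// y = max(x, alpha * x); alpha default 0.02 as in LeakyReluOpMaker.
inline float leakyRelu(float x, float alpha = 0.02f) {
  return std::max(x, alpha * x);
}

// y = min(max(0, x), threshold); threshold default 6 as in Relu6OpMaker.
inline float relu6(float x, float threshold = 6.0f) {
  return std::min(std::max(0.0f, x), threshold);
}

// y = x / (1 + |x|)
inline float softsign(float x) { return x / (1.0f + std::fabs(x)); }

// y = b * tanh(a * x), the closed form of the STanh docstring.
inline float stanh(float x, float a, float b) { return b * std::tanh(a * x); }

// Assumed reading of BRelu: clamp x to [tMin, tMax].
inline float brelu(float x, float tMin, float tMax) {
  return std::min(std::max(x, tMin), tMax);
}

// Assumed reading of SoftRelu: softplus of x clamped to [-threshold, threshold].
inline float softRelu(float x, float threshold = 40.0f) {
  float clamped = std::max(std::min(x, threshold), -threshold);
  return std::log(1.0f + std::exp(clamped));
}
```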
paddle/operators/adadelta_op.cc
...
...
@@ -64,16 +64,15 @@ class AdadeltaOpMaker : public framework::OpProtoAndCheckerMaker {
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"Param"
,
"(Tensor) Input parameter"
);
AddInput
(
"Grad"
,
"(Tensor) Input gradient"
);
AddInput
(
"AvgSquaredGrad"
,
"(Tensor) Input expectation of squared gradient"
);
AddInput
(
"AvgSquaredGrad"
,
"(Tensor) Input average of squared gradient"
);
AddInput
(
"AvgSquaredUpdate"
,
"(Tensor) Input expectation of squared parameter updates"
);
"(Tensor) Input average of squared parameter updates"
);
AddOutput
(
"ParamOut"
,
"(Tensor) Output parameter"
);
AddOutput
(
"AvgSquaredGradOut"
,
"(Tensor) Output expectation of squared gradient"
);
"(Tensor) Output average of squared gradient"
);
AddOutput
(
"AvgSquaredUpdateOut"
,
"(Tensor) Output expectation of squared parameter updates"
);
"(Tensor) Output average of squared parameter updates"
);
AddAttr
<
float
>
(
"rho"
,
"(float, default 0.95) Exponential decay rate "
...
...
@@ -84,22 +83,21 @@ class AdadeltaOpMaker : public framework::OpProtoAndCheckerMaker {
"numerical stability"
)
.
SetDefault
(
1.0e-6
f
);
AddComment
(
R"DOC(
Adadelta Updates Operator.
Adadelta Optimizer.
This implements the Adadelta optimizer[1]. Adadelta is a per-dimension
adaptive learning rate method for gradient descent.
Adadelta optimizer is implemented as explained in:
https://arxiv.org/abs/1212.5701
Adadelta is a per-dimension adaptive learning rate method used
for gradient descent.
Adadelta updates:
Adadelta updates are as follows:
avg_squared_grad_out = rho * avg_squared_grad + (1 - rho) * grad * grad
param_update = - sqrt((avg_squared_update + epsilon) /
(avg_squared_grad_out + epsilon)) * grad
avg_squared_update_out = rho * avg_squared_update + (1 - rho) * param_update**2
param_out = param + param_update
References:
[1] ADADELTA: An Adaptive Learning Rate Method
https://arxiv.org/abs/1212.5701
$$avgSquaredGradOut = \rho * avgSquaredGrad + (1 - \rho) * grad * grad \break
paramUpdate = - \sqrt{(avgSquaredUpdate + \epsilon) /
(avgSquaredGradOut + \epsilon)} * grad \break
avgSquaredUpdateOut = \rho * avgSquaredUpdate + (1 - \rho) *
{(paramUpdate)}^2 \break
paramOut = param + paramUpdate$$
)DOC"
);
}
...
...
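Note on the Adadelta update: the four equations in the docstring can be read as one in-place step per element. The sketch below follows them term by term. `AdadeltaState` and `adadeltaStep` are hypothetical names, the defaults for rho and epsilon are the ones shown in the AddAttr calls, and this is not the kernel from the operator itself.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the tensors the operator reads and writes.
struct AdadeltaState {
  std::vector<float> param;
  std::vector<float> avgSquaredGrad;
  std::vector<float> avgSquaredUpdate;
};

// One Adadelta step, element by element, following the docstring equations.
void adadeltaStep(AdadeltaState& s, const std::vector<float>& grad,
                  float rho = 0.95f, float epsilon = 1.0e-6f) {
  for (std::size_t i = 0; i < s.param.size(); ++i) {
    // avgSquaredGradOut = rho * avgSquaredGrad + (1 - rho) * grad^2
    s.avgSquaredGrad[i] = rho * s.avgSquaredGrad[i] + (1.0f - rho) * grad[i] * grad[i];
    // paramUpdate = -sqrt((avgSquaredUpdate + eps) / (avgSquaredGradOut + eps)) * grad
    float update = -std::sqrt((s.avgSquaredUpdate[i] + epsilon) /
                              (s.avgSquaredGrad[i] + epsilon)) * grad[i];
    // avgSquaredUpdateOut = rho * avgSquaredUpdate + (1 - rho) * paramUpdate^2
    s.avgSquaredUpdate[i] = rho * s.avgSquaredUpdate[i] + (1.0f - rho) * update * update;
    // paramOut = param + paramUpdate
    s.param[i] += update;
  }
}
```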
paddle/operators/adagrad_op.cc
...
...
@@ -73,12 +73,16 @@ class AdagradOpMaker : public framework::OpProtoAndCheckerMaker {
Adaptive Gradient Algorithm (Adagrad).
moment_out = moment + grad * grad
param_out = param - learning_rate * grad / (sqrt(moment_out) + epsilon)
The update is done as follows:
$$momentOut = moment + grad * grad \break
paramOut = param - learningRate * grad / (\sqrt{momentOut} + \epsilon) \break
$$
The original paper(http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
does not have the epsilon attribute. It is added here for numerical stability
by avoiding division by zero.
does not have the epsilon attribute. It is added here in our implementation,
as also proposed at http://cs231n.github.io/neural-networks-3/#ada,
for numerical stability, to avoid a division-by-zero error.
)DOC"
);
}
...
...
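Note on the Adagrad update: a minimal sketch of the two docstring equations, including the epsilon term added for numerical stability. `adagradStep` is a hypothetical name and epsilon is passed explicitly because its default is not visible in this hunk.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One Adagrad step in place:
//   momentOut = moment + grad * grad
//   paramOut  = param - learningRate * grad / (sqrt(momentOut) + epsilon)
void adagradStep(std::vector<float>& param, std::vector<float>& moment,
                 const std::vector<float>& grad, float learningRate,
                 float epsilon) {
  for (std::size_t i = 0; i < param.size(); ++i) {
    moment[i] += grad[i] * grad[i];
    param[i] -= learningRate * grad[i] / (std::sqrt(moment[i]) + epsilon);
  }
}
```

With a zero moment and a zero gradient the denominator would be 0 without epsilon, which is the case the docstring's stability remark refers to.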
paddle/operators/adam_op.cc
...
...
@@ -51,8 +51,8 @@ class AdamOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_EQ
(
framework
::
product
(
beta1_pow_dims
),
1
,
"Beta1 power accumulator should have 1 dimension"
);
auto
beta2_pow_dims
=
ctx
->
GetInputDim
(
"Beta2Pow"
);
PADDLE_ENFORCE_EQ
(
framework
::
product
(
beta1_pow_dims
),
1
,
"Beta1 power accumulator should have 1 dimension"
);
PADDLE_ENFORCE_EQ
(
framework
::
product
(
beta2_pow_dims
),
1
,
"Beta2 power accumulator should have 1 dimension"
);
auto
param_dims
=
ctx
->
GetInputDim
(
"Param"
);
PADDLE_ENFORCE_EQ
(
...
...
@@ -60,10 +60,10 @@ class AdamOp : public framework::OperatorWithKernel {
"Param and Grad input of AdamOp should have same dimension"
);
PADDLE_ENFORCE_EQ
(
param_dims
,
ctx
->
GetInputDim
(
"Moment1"
),
"Param and Moment input of AdamOp should have same dimension"
);
"Param and Moment1 input of AdamOp should have same dimension"
);
PADDLE_ENFORCE_EQ
(
param_dims
,
ctx
->
GetInputDim
(
"Moment2"
),
"Param and InfNorm input of AdamOp should have same dimension"
);
"Param and Moment2 input of AdamOp should have same dimension"
);
ctx
->
SetOutputDim
(
"ParamOut"
,
param_dims
);
ctx
->
SetOutputDim
(
"Moment1Out"
,
param_dims
);
...
...
@@ -103,23 +103,20 @@ class AdamOpMaker : public framework::OpProtoAndCheckerMaker {
.
SetDefault
(
1.0e-8
f
);
AddComment
(
R"DOC(
Adam Updates Operator.
Adam Optimizer.
This implements the Adam optimizer from Section 2 of the Adam
paper[1]. Adam is a first-order gradient-based optimization
method based on adaptive estimates of lower-order moments.
paper : https://arxiv.org/abs/1412.6980.
Adam is a first-order gradient-based optimization method based on
adaptive estimates of lower-order moments.
Adam updates:
moment1_out = beta1 * moment1 + (1 − beta1) * grad
moment2_out = beta2 * moment2 + (1 − beta2) * grad * grad
learning_rate_t = learning_rate_t *
sqrt(1 - beta2_pow) / (1 - beta1_pow)
param_out = param - learning_rate_t * moment1/ (sqrt(moment2) + epsilon)
References:
[1] Adam: A Method for Stochastic Optimization
(https://arxiv.org/abs/1412.6980)
$$moment_{1,out} = \beta_1 * moment_1 + (1 - \beta_1) * grad \break
moment_{2,out} = \beta_2 * moment_2 + (1 - \beta_2) * grad * grad \break
learningRate = learningRate *
\sqrt{1 - \beta_{2,pow}} / (1 - \beta_{1,pow}) \break
paramOut = param - learningRate * moment_1 / (\sqrt{moment_2} + \epsilon)$$
)DOC"
);
}
...
...
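Note on the Adam update: a sketch of one step under the docstring's equations. One assumption is made loudly here: the parameter step uses the freshly updated moments, as in the Adam paper, whereas the docstring writes moment_1 and moment_2 in that line. `AdamState` and `adamStep` are hypothetical names, and the epsilon default is the one shown in the SetDefault call.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the tensors the operator reads and writes.
struct AdamState {
  std::vector<float> param;
  std::vector<float> moment1;
  std::vector<float> moment2;
};

// One Adam step. beta1Pow and beta2Pow are the running powers beta1^t and
// beta2^t that the operator receives as the Beta1Pow and Beta2Pow inputs.
void adamStep(AdamState& s, const std::vector<float>& grad, float learningRate,
              float beta1, float beta2, float beta1Pow, float beta2Pow,
              float epsilon = 1.0e-8f) {
  // Bias-corrected step size.
  float lrT = learningRate * std::sqrt(1.0f - beta2Pow) / (1.0f - beta1Pow);
  for (std::size_t i = 0; i < s.param.size(); ++i) {
    s.moment1[i] = beta1 * s.moment1[i] + (1.0f - beta1) * grad[i];
    s.moment2[i] = beta2 * s.moment2[i] + (1.0f - beta2) * grad[i] * grad[i];
    // Assumption: updated moments are used in the parameter step.
    s.param[i] -= lrT * s.moment1[i] / (std::sqrt(s.moment2[i]) + epsilon);
  }
}
```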
paddle/operators/adamax_op.cc
...
...
@@ -99,26 +99,22 @@ class AdamaxOpMaker : public framework::OpProtoAndCheckerMaker {
"Constant for numerical stability"
)
.
SetDefault
(
1.0e-8
f
);
AddComment
(
R"DOC(
Adamax Updates Operator.
Adamax Optimizer.
This implements the Adamax optimizer from Section 7 of the Adam
paper[1]. Adamax is a variant of the
We implement the Adamax optimizer from Section 7 of the Adam
paper: https://arxiv.org/abs/1412.6980. Adamax is a variant of the
Adam algorithm based on the infinity norm.
Adamax updates:
moment_out = beta1 * moment + (1 - beta1) * grad
inf_norm_out = max(beta2 * inf_norm + epsilon, abs(grad))
learning_rate_t = learning_rate/(1 - beta1_pow)
param_out = param - learning_rate_t * moment_out/inf_norm_out
$$momentOut = \beta_1 * moment + (1 - \beta_1) * grad \break
infNormOut = max(\beta_2 * infNorm + \epsilon, |grad|) \break
learningRate = learningRate / (1 - \beta_{1,pow}) \break
paramOut = param - learningRate * momentOut / infNormOut$$
The original paper does not have an epsilon attribute.
However, it is added here for numerical stability
by preventing divide by 0.
References:
[1] Adam: A Method for Stochastic Optimization
(https://arxiv.org/abs/1412.6980)
However, it is added here for numerical stability to prevent the
division by 0 error.
)DOC"
);
}
...
...
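Note on the Adamax update: a sketch of one step following the docstring equations, with epsilon inside the max as documented. `AdamaxState` and `adamaxStep` are hypothetical names and the epsilon default is the one from the SetDefault call.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the tensors the operator reads and writes.
struct AdamaxState {
  std::vector<float> param;
  std::vector<float> moment;
  std::vector<float> infNorm;
};

// One Adamax step. beta1Pow is the running power beta1^t.
void adamaxStep(AdamaxState& s, const std::vector<float>& grad,
                float learningRate, float beta1, float beta2, float beta1Pow,
                float epsilon = 1.0e-8f) {
  float lrT = learningRate / (1.0f - beta1Pow);
  for (std::size_t i = 0; i < s.param.size(); ++i) {
    s.moment[i] = beta1 * s.moment[i] + (1.0f - beta1) * grad[i];
    // epsilon keeps the infinity norm strictly positive, so the division
    // below cannot hit zero.
    s.infNorm[i] = std::max(beta2 * s.infNorm[i] + epsilon, std::fabs(grad[i]));
    s.param[i] -= lrT * s.moment[i] / s.infNorm[i];
  }
}
```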
paddle/operators/auc_op.cc
...
...
@@ -23,11 +23,11 @@ class AucOp : public framework::OperatorWithKernel {
protected:
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Out"
),
"Input of Out must be initialized."
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Out"
),
"Input of Out should not be null."
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Indices"
),
"Input of Indices must be initialized."
);
"Input of Indices should not be null."
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Label"
),
"Input of Label must be initialized."
);
"Input of Label should not be null."
);
auto
inference_height
=
ctx
->
GetInputDim
(
"Out"
)[
0
];
auto
label_height
=
ctx
->
GetInputDim
(
"Label"
)[
0
];
...
...
@@ -52,20 +52,20 @@ class AucOpMaker : public framework::OpProtoAndCheckerMaker {
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"Out"
,
"A floating point 2D tensor, values are in the range [0, 1]."
"Each row is descend sorted. This input should be the"
"Each row is sorted in descending order. This input should be the"
"output of topk."
"Typically, this tensor indicates the probability of each label"
);
AddInput
(
"Indices"
,
"An int 2D tensor, indicating the indices of original"
"tensor before sort. Typically, this tensor indicates which label "
"the probability stands for."
);
"tensor before sorting. Typically, this tensor indicates which "
"label the probability stands for."
);
AddInput
(
"Label"
,
"A 2D int tensor indicating the label of the training data."
"The height is batch size and width is always 1."
);
// TODO(typhoonzero): support weight input
AddOutput
(
"AUC"
,
"A scalar representing the "
"current area-under-curve."
);
"current area-under-the-curve."
);
AddAttr
<
std
::
string
>
(
"curve"
,
"Curve type, can be 'ROC' or 'PR'."
)
.
SetDefault
(
"ROC"
);
...
...
@@ -74,19 +74,18 @@ class AucOpMaker : public framework::OpProtoAndCheckerMaker {
" roc curve."
)
.
SetDefault
(
200
);
AddComment
(
R"DOC(Computes the AUC according forward output and label.
Best to use for binary classification evaluations.
AddComment
(
R"DOC(
Area Under The Curve (AUC) Operator.
This implementation computes the AUC according to forward output and label.
It is used very widely in binary classification evaluation. As a note:
If input label contains values other than 0 and 1, it will be cast
to bool.
You can find the definations here:
to bool. You can find the relevant definitions here:
https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve
Possible curves are:
- ROC: Receiver operating characteristic
- PR: Precision Recall
There are two types of possible curves:
1. ROC: Receiver operating characteristic
2. PR: Precision Recall
)DOC"
);
}
};
...
...
paddle/operators/batch_norm_op.cc
...
...
@@ -51,6 +51,10 @@ class BatchNormOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"SavedMean"
),
""
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"SavedVariance"
),
""
);
const
float
epsilon
=
ctx
->
Attrs
().
Get
<
float
>
(
"epsilon"
);
PADDLE_ENFORCE_GE
(
epsilon
,
0.0
,
"epsilon should be no less than 0"
);
PADDLE_ENFORCE_LE
(
epsilon
,
0.001
,
"epsilon should not be too large"
);
// make sure Mean/MeanOut and Variance/VarianceOut share memory in Python
PADDLE_ENFORCE_EQ
(
ctx
->
Inputs
(
"Mean"
)[
0
],
ctx
->
Outputs
(
"MeanOut"
)[
0
],
"Mean and MeanOut should share the same memory"
);
...
...
@@ -66,7 +70,7 @@ class BatchNormOp : public framework::OperatorWithKernel {
:
x_dims
[
x_dims
.
size
()
-
1
]);
PADDLE_ENFORCE
(
x_dims
.
size
()
>=
3
&&
x_dims
.
size
()
<=
5
,
"Input x must have 3 to 5 dimensions."
);
"Input X must have 3 to 5 dimensions."
);
PADDLE_ENFORCE_EQ
(
ctx
->
GetInputDim
(
"Scale"
).
size
(),
1UL
);
PADDLE_ENFORCE_EQ
(
ctx
->
GetInputDim
(
"Scale"
)[
0
],
C
);
...
...
@@ -93,16 +97,16 @@ class BatchNormOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"X"
,
"The input tensor"
);
AddInput
(
"Scale"
,
"Scale is a 1-dimensional tensor of size C "
"to be applied to the output"
);
"that is applied to the output"
);
AddInput
(
"Bias"
,
"Bias is a 1-dimensional tensor of size C "
"to be applied to the output"
);
"that is applied to the output"
);
AddInput
(
"Mean"
,
"The global mean (for training) or the "
"The global mean (for training) or "
"estimated mean (for testing)"
);
AddInput
(
"Variance"
,
"The global variance (for training) "
"or the estimated Variance (for testing)"
);
"or estimated Variance (for testing)"
);
AddOutput
(
"Y"
,
"result after normalization"
);
AddOutput
(
"MeanOut"
,
"Share memory with Mean. "
...
...
@@ -119,10 +123,14 @@ class BatchNormOpMaker : public framework::OpProtoAndCheckerMaker {
"will apply to output when training"
)
.
AsIntermediate
();
AddComment
(
R"DOC(
https://arxiv.org/pdf/1502.03167.pdf
Batch Normalization.
NHWC `[batch, in_height, in_width, in_channels]`
NCHW `[batch, in_channels, in_height, in_width]`
Batch Norm has been implemented as discussed in the paper:
https://arxiv.org/pdf/1502.03167.pdf
Can be used as a normalizer function for conv2d and fully_connected operations.
The required data format for this layer is one of the following:
1. NHWC `[batch, in_height, in_width, in_channels]`
2. NCHW `[batch, in_channels, in_height, in_width]`
)DOC"
);
}
...
...
@@ -297,7 +305,6 @@ class BatchNormGradOp : public framework::OperatorWithKernel {
framework
::
DataType
IndicateDataType
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
VLOG
(
3
)
<<
"IndicateDataType "
<<
this
->
Type
();
const
auto
*
var
=
ctx
.
InputVar
(
framework
::
GradVarName
(
"Y"
));
if
(
var
==
nullptr
)
{
PADDLE_THROW
(
"can't find Y@GRAD"
);
...
...
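Note on the batch norm docstring: it names the NCHW layout and per-channel Scale, Bias, Mean and Variance tensors of size C. The sketch below applies the standard normalization from the referenced paper with fixed statistics (the testing case) to a flat NCHW buffer. `batchNormInferNCHW` is a hypothetical name; the training path, which also updates the running statistics, is not shown.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: y = scale[c] * (x - mean[c]) / sqrt(variance[c] + eps) + bias[c]
// for a flat NCHW buffer, where hw = in_height * in_width.
std::vector<float> batchNormInferNCHW(const std::vector<float>& x,
                                      std::size_t n, std::size_t c, std::size_t hw,
                                      const std::vector<float>& scale,
                                      const std::vector<float>& bias,
                                      const std::vector<float>& mean,
                                      const std::vector<float>& variance,
                                      float epsilon) {
  if (x.size() != n * c * hw || scale.size() != c || bias.size() != c ||
      mean.size() != c || variance.size() != c) {
    throw std::invalid_argument("shape mismatch");
  }
  std::vector<float> y(x.size());
  for (std::size_t ni = 0; ni < n; ++ni) {
    for (std::size_t ci = 0; ci < c; ++ci) {
      float invStd = 1.0f / std::sqrt(variance[ci] + epsilon);
      for (std::size_t k = 0; k < hw; ++k) {
        std::size_t idx = (ni * c + ci) * hw + k;  // NCHW offset
        y[idx] = scale[ci] * (x[idx] - mean[ci]) * invStd + bias[ci];
      }
    }
  }
  return y;
}
```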
paddle/operators/cast_op.cc
...
...
@@ -23,13 +23,17 @@ class CastOpProtoMaker : public framework::OpProtoAndCheckerMaker {
CastOpProtoMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"the input tensor of cast op"
);
AddOutput
(
"Out"
,
"the output tensor of cast op"
);
AddComment
(
R"DOC(Cast operator.
cast the input tensor to other data type.
)DOC"
);
AddInput
(
"X"
,
"The input tensor of cast op"
);
AddOutput
(
"Out"
,
"The output tensor of cast op"
);
AddAttr
<
int
>
(
"out_data_type"
,
"output data type"
);
AddAttr
<
int
>
(
"in_data_type"
,
"input data type"
);
AddComment
(
R"DOC(
Cast Operator.
This Operator casts the input tensor to another data type and
returns the Output Tensor.
)DOC"
);
}
};
...
...
paddle/operators/clip_op.cc
...
...
@@ -49,8 +49,11 @@ class ClipOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
AttrType
>
(
"max"
,
"(float)Maximum value, above which element is replaced by max"
);
AddComment
(
R"DOC(
Clip operator limits the given input within an interval. The interval is
Clip Operator.
The clip operator limits the value of given input within an interval. The interval is
specified with arguments 'min' and 'max'.
)DOC"
);
}
};
...
...
paddle/operators/concat_op.cc
...
...
@@ -56,20 +56,24 @@ class ConcatOpMaker : public framework::OpProtoAndCheckerMaker {
public:
ConcatOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"the input tensors of concat operator."
).
AsDuplicable
();
AddOutput
(
"Out"
,
"the output tensor of concat operator."
);
AddInput
(
"X"
,
"Input tensors of concat operator."
).
AsDuplicable
();
AddOutput
(
"Out"
,
"Output tensor of concat operator."
);
AddAttr
<
int
>
(
"axis"
,
"The axis along which the input tensors will be concatenated."
)
.
SetDefault
(
0
);
AddComment
(
R"DOC(
Join the input tensors along with the axis.
Examples:
Concat Operator.
Concatenate the input tensors along dimension axis.
Examples:
Input[0] = [[1,2],[3,4]]
Input[1] = [[5,6]]
axis = 0
Output = [[1,2],
[3,4],
[5,6]]
)DOC"
);
AddAttr
<
int
>
(
"axis"
,
"The axis which the inputs will be joined with."
)
.
SetDefault
(
0
);
)DOC"
);
}
};
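The Concat example in the comment above can be reproduced with a small standalone sketch. It is not the Paddle kernel: `Matrix` and `Concat2D` are hypothetical names, and the sketch only handles 2-D integer inputs with axis 0 or 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Concatenates 2-D matrices along axis 0 (stack rows) or axis 1 (extend rows).
// Hypothetical helper for illustration; not part of the Paddle API.
Matrix Concat2D(const std::vector<Matrix>& inputs, int axis) {
  assert(axis == 0 || axis == 1);
  Matrix out;
  if (axis == 0) {
    for (const Matrix& m : inputs) {
      out.insert(out.end(), m.begin(), m.end());
    }
    return out;
  }
  // axis == 1: every input must have the same number of rows.
  if (inputs.empty()) return out;
  out.resize(inputs[0].size());
  for (const Matrix& m : inputs) {
    assert(m.size() == out.size());
    for (std::size_t r = 0; r < m.size(); ++r) {
      out[r].insert(out[r].end(), m[r].begin(), m[r].end());
    }
  }
  return out;
}
```

Calling `Concat2D` on `[[1,2],[3,4]]` and `[[5,6]]` with `axis = 0` yields the three-row output shown in the operator comment.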
...
...
paddle/operators/cond_op.cc
View file @
483947c4
...
...
@@ -216,11 +216,12 @@ class CondOpProtoAndCheckerMaker : public framework::OpProtoAndCheckerMaker {
AddOutput
(
"IndexTensors"
,
"Index Tensors contains indices for true/false"
);
AddComment
(
R"DOC(
Sample dependent Cond Operator:
Given Cond[i] as a 1/0 vector to indicate true/false
The equation is:
Out[i] = subnet_t[i], if Cond[i] == true
Out[i] = subnet_t[i], if Cond[i] == false
Sample Dependent Conditional Operator.
Given Cond[i] as a 1/0 vector to indicate true/false:
Out[i] = subnet_true[i], if Cond[i] == true
Out[i] = subnet_false[i], if Cond[i] == false
)DOC"
);
}
};
...
...
paddle/operators/conv2d_op.cc
View file @
483947c4
...
...
@@ -56,17 +56,18 @@ Conv2DOpMaker::Conv2DOpMaker(framework::OpProto* proto,
AddInput
(
"Input"
,
"The input tensor of convolution operator. "
"The format of input tensor is NCHW. Where N is batch size, C is the "
"number of channels, H and W is the height and width of image."
);
"The format of input tensor is NCHW, where N is batch size, C is the "
"number of channels, H is the height of the image, "
"and W is the width of the image."
);
AddInput
(
"Filter"
,
"The filter tensor of convolution operator."
"The filter tensor of convolution operator.
"
"The format of the filter tensor is MCHW, where M is the number of "
"output image channels, C is the number of input image channels, "
"H
and W is height and width of
filter. "
"If the groups attribute is greater than 1, C equal the number of "
"H
is the height of the filter, and W is the width of the
filter. "
"If the groups attribute is greater than 1, C equal
s
the number of "
"input image channels divided by the groups."
);
AddOutput
(
"Output"
,
"The output tensor of convolution operator."
"The output tensor of convolution operator.
"
"The format of output tensor is also NCHW."
);
AddAttr
<
std
::
vector
<
int
>>
(
"strides"
,
"strides of convolution operator."
)
.
SetDefault
({
1
,
1
});
...
...
@@ -74,16 +75,19 @@ Conv2DOpMaker::Conv2DOpMaker(framework::OpProto* proto,
.
SetDefault
({
0
,
0
});
AddAttr
<
int
>
(
"groups"
,
"
g
roup size of convolution operator. "
"
Refer to grouped convolution in Alex Krizhevsky's
paper: "
"when group=2, the first half of the filters
are
only connected to the "
"first half of the input channels,
and the second half only connected
"
"
to the second half
."
)
"
G
roup size of convolution operator. "
"
According to grouped convolution in Alex Krizhevsky's Deep CNN
paper: "
"when group=2, the first half of the filters
is
only connected to the "
"first half of the input channels,
while the second half of the filters
"
"
is only connected to the second half of the input channels
."
)
.
SetDefault
(
1
);
AddComment
(
R"DOC(
The convolution operation calculates the output based on the input, filter
and strides, paddings, groups parameters. The size of each dimension of the
parameters is checked in the infer-shape.
Convolution Operator.
The convolution operation calculates the output based on the input, filter,
strides, paddings, and groups parameters. The size of each dimension of the
parameters is checked in the infer-shape method.
)DOC"
);
}
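The `groups` description above (first half of the filters sees the first half of the input channels when groups = 2) can be made concrete with a small index helper. This is a sketch under the assumption that both channel counts are divisible by `groups`; `InputChannelRange` is a hypothetical name, not a Paddle function.

```cpp
#include <cassert>
#include <utility>

// Returns the half-open range [first, last) of input channels connected to
// the filter that produces output channel `out_channel`.
// Hypothetical helper for illustration only.
std::pair<int, int> InputChannelRange(int in_channels, int out_channels,
                                      int groups, int out_channel) {
  assert(groups > 0);
  assert(in_channels % groups == 0 && out_channels % groups == 0);
  assert(out_channel >= 0 && out_channel < out_channels);
  const int in_per_group = in_channels / groups;  // the C of the MCHW filter
  const int out_per_group = out_channels / groups;
  const int group = out_channel / out_per_group;
  return {group * in_per_group, (group + 1) * in_per_group};
}
```

With 4 input channels, 6 output channels and groups = 2, output channels 0 to 2 map to input channels [0, 2) and output channels 3 to 5 map to [2, 4).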
...
...
paddle/operators/conv2d_transpose_op.cc
View file @
483947c4
...
...
@@ -54,15 +54,16 @@ Conv2DTransposeOpMaker::Conv2DTransposeOpMaker(
AddInput
(
"Input"
,
"(Tensor) The input tensor of convolution transpose operator. "
"The format of input tensor is NCHW. Where N is batch size, C is the "
"number of input channels, H and W is the height and width of image."
);
"The format of input tensor is NCHW, where N is batch size, C is the "
"number of input channels, H is the height of the image, and "
"W is the width of the image."
);
AddInput
(
"Filter"
,
"(Tensor) The filter tensor of convolution transpose operator."
"The format of the filter tensor is CMHW, where C is the number of "
"output image channels, M is the number of input image channels, "
"H
and W is height and width of
filter. "
"H
is the height of the filter, and W is the width of the
filter. "
"We enforce groups number == 1 and padding == 0 in "
"
convolution transpose S
cenario."
);
"
the convolution transpose s
cenario."
);
AddOutput
(
"Output"
,
"(Tensor) The output tensor of convolution transpose operator."
"The format of output tensor is also NCHW."
);
...
...
@@ -73,9 +74,12 @@ Conv2DTransposeOpMaker::Conv2DTransposeOpMaker(
"paddings of convolution transpose operator."
)
.
SetDefault
({
0
,
0
});
AddComment
(
R"DOC(
The convolution transpose operation calculates the output based on the input, filter
and strides, paddings, groups parameters. The size of each dimension of the
parameters is checked in the infer-shape.
Convolution Transpose Operator.
The convolution transpose operation calculates the output based on the input,
filter, strides, paddings, and groups parameters. The size of each dimension
of the parameters is checked in the infer-shape method.
)DOC"
);
}
...
...
paddle/operators/conv_cudnn_op.cc
View file @
483947c4
...
...
@@ -29,7 +29,7 @@ class CudnnConvOpMaker : public Conv2DOpMaker {
"workspace is a section of GPU memory which will be "
"allocated/freed each time the operator runs, larger "
"workspace size can increase performance but also requires "
"better hardwar
d. This size should be carefully setted
."
)
"better hardwar
e. This size should be chosen carefully
."
)
.
SetDefault
(
4096
);
}
};
...
...
paddle/operators/conv_shift_op.cc
View file @
483947c4
...
...
@@ -96,14 +96,13 @@ as used in the Neural Turing Machine: https://arxiv.org/abs/1410.5401
The equation is:
\f[
Out[i] = \sum_{j=-(N-1)/2}^{(N-1)/2} X_{i+j} * Y_{j}
\f]
$$Out[i] = \sum_{j=-(N-1)/2}^{(N-1)/2} X_{i+j} * Y_{j}$$
where X's index is computed modulo M, and b's index is computed modulo N.
where X's index is computed modulo M, and Y's index is computed modulo N.
Both inputs X and Y can carry LoD (Level of Details) information.
However, the output only shares the LoD information with input X.
Both of the input `X` and `Y` can carry LoD (Level of Details) information.
However, the output only shares the LoD information with input `X`.
)DOC"
);
}
};
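The circular convolution formula in the comment above can be sketched for a single row as follows. The sketch follows the documented equation literally (X indexed modulo M, Y indexed modulo N); the actual kernel may lay out Y differently, and `ConvShift` is a hypothetical name. It assumes N is odd.

```cpp
#include <cassert>
#include <vector>

// Out[i] = sum_{j=-(N-1)/2}^{(N-1)/2} X[(i+j) mod M] * Y[j mod N]
// Hypothetical single-row helper following the formula in the comment.
std::vector<double> ConvShift(const std::vector<double>& x,
                              const std::vector<double>& y) {
  const int m = static_cast<int>(x.size());
  const int n = static_cast<int>(y.size());
  assert(m > 0 && n > 0 && n % 2 == 1);
  const int half = (n - 1) / 2;
  std::vector<double> out(m, 0.0);
  for (int i = 0; i < m; ++i) {
    for (int j = -half; j <= half; ++j) {
      const int xi = ((i + j) % m + m) % m;  // wrap X's index into [0, M)
      const int yj = ((j % n) + n) % n;      // wrap Y's index into [0, N)
      out[i] += x[xi] * y[yj];
    }
  }
  return out;
}
```

A one-hot Y acts as a circular shift of X, which is how the operator is used for shift-based addressing.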
...
...
paddle/operators/cos_sim_op.cc
View file @
483947c4
...
...
@@ -79,15 +79,16 @@ class CosSimOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment
(
R"DOC(
Cosine Similarity Operator.
The equation is: Out = X^T * Y / (sqrt(X^T * X) * sqrt(Y^T * Y)).
$Out = X^T * Y / (\sqrt{X^T * X} * \sqrt{Y^T * Y})$
The input
`X` and `Y`
must have the same shape, except that the 1st dimension
of input
`Y` could be just 1 (different from input `X`
), which will be
broadcasted to match the shape of input
`X`
before computing their cosine
The input
X and Y
must have the same shape, except that the 1st dimension
of input
Y could be just 1 (different from input X
), which will be
broadcasted to match the shape of input
X
before computing their cosine
similarity.
Both the input `X` and `Y` can carry the LoD (Level of Details) information,
or not. But the output only shares the LoD with input `X`.
Both the input X and Y can carry the LoD (Level of Details) information,
or not. But the output only shares the LoD information with input X.
)DOC"
);
}
};
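The cosine similarity formula and the broadcasting of a one-row Y described above can be sketched as below. `Matrix` and `CosSim` are hypothetical names, and the sketch works row by row on plain vectors rather than on Paddle tensors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Row-wise cosine similarity. If y has exactly one row, that row is
// broadcast against every row of x. Hypothetical helper for illustration.
std::vector<double> CosSim(const Matrix& x, const Matrix& y) {
  assert(!y.empty());
  assert(y.size() == x.size() || y.size() == 1);
  std::vector<double> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const std::vector<double>& xr = x[i];
    const std::vector<double>& yr = (y.size() == 1) ? y[0] : y[i];
    assert(xr.size() == yr.size());
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t k = 0; k < xr.size(); ++k) {
      xy += xr[k] * yr[k];
      xx += xr[k] * xr[k];
      yy += yr[k] * yr[k];
    }
    out[i] = xy / (std::sqrt(xx) * std::sqrt(yy));
  }
  return out;
}
```

The sketch does not guard against all-zero rows, for which the formula itself is undefined.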
...
...
paddle/operators/crf_decoding_op.cc
0 → 100644
View file @
483947c4
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/crf_decoding_op.h"
namespace
paddle
{
namespace
operators
{
class
CRFDecodingOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
CRFDecodingOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"Emission"
,
"(LoDTensor, default: LoDTensor<float>). A LoDTensor with shape "
"[N x D] where N is the size of the mini-batch and D is the total "
"tag number. This input is the unscaled emission weight matrix of "
"the linear_chain_crf operator."
);
AddInput
(
"Transition"
,
"(Tensor, default: Tensor<float>). A Tensor with shape [(D + 2) x D]. "
"This input is the transition weights learned by the linear_chain_crf "
"operator, denoted as w. The 1st row of w are transition weights for "
"the start mask. The 2nd row of w are transition weights for the end "
"mask. Transition weights between other tags begin from the 3rd row of "
"w. See more details in comments of the linear_chain_crf operator."
);
AddInput
(
"Label"
,
"(LoDTensor, LoDTensor<int>). The ground truth with shape "
"[N x 1]. This input is optional. See more details in the operator's "
"comments."
)
.
AsDispensable
();
AddOutput
(
"ViterbiPath"
,
"(LoDTensor, LoDTensor<int>). The decoding results. What to "
"return changes depending on whether the Input(Label) (the ground "
"truth) is given. See more details in the operator's comment."
);
AddComment
(
R"DOC(
The crf_decoding operator reads the emission feature weights and the transition
feature weights learned by the linear_chain_crf operator. It implements the
Viterbi algorithm which is a dynamic programming algorithm for finding the most
likely sequence of hidden states, called the Viterbi path, that results in a
sequence of observed tags.
The output of this operator changes according to whether Input(Label) is given:
1. Input(Label) is given:
This happens in training. This operator is used to co-work with the chunk_eval
operator.
When Input(Label) is given, the crf_decoding operator returns a row vector
with shape [N x 1] whose values are fixed to be 0, indicating an incorrect
prediction, or 1 indicating a tag is correctly predicted. Such an output is the
input to chunk_eval operator.
2. Input(Label) is not given:
This is the standard decoding process.
The crf_decoding operator returns a row vector with shape [N x 1] whose values
range from 0 to maximum tag number - 1. Each element indicates an index of a
predicted tag.
)DOC"
);
}
};
class
CRFDecodingOp
:
public
framework
::
OperatorWithKernel
{
public:
using
framework
::
OperatorWithKernel
::
OperatorWithKernel
;
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Emission"
),
"Input(Emission) should be not null."
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Transition"
),
"Input(Transition) should be not null."
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"ViterbiPath"
),
"Output(ViterbiPath) should be not null."
);
auto
emission_dims
=
ctx
->
GetInputDim
(
"Emission"
);
PADDLE_ENFORCE_EQ
(
emission_dims
.
size
(),
2UL
,
"The Input(Emission) should be a 2-D tensor."
);
PADDLE_ENFORCE
(
emission_dims
[
0
],
"An empty mini-batch is not allowed."
);
auto
transition_dims
=
ctx
->
GetInputDim
(
"Transition"
);
PADDLE_ENFORCE_EQ
(
transition_dims
.
size
(),
2UL
,
"The Input(Transition) should be a 2-D tensor."
);
PADDLE_ENFORCE_EQ
(
transition_dims
[
0
]
-
2
,
transition_dims
[
1
],
"An invalid dimension for the Input(Transition), which should "
"be a 2-D tensor with shape [(D + 2) x D]."
);
PADDLE_ENFORCE_EQ
(
emission_dims
[
1
],
transition_dims
[
1
],
"The 2nd dimension of the Input(Emission) and the Input(Transition) "
"should be equal to the tag number."
);
if
(
ctx
->
HasInput
(
"Label"
))
{
auto
label_dims
=
ctx
->
GetInputDim
(
"Label"
);
PADDLE_ENFORCE
(
label_dims
.
size
()
==
2UL
&&
label_dims
[
1
]
==
1UL
,
"The Input(Label) should be a 2-D tensor with the 2nd "
"dimensions fixed to 1."
);
PADDLE_ENFORCE_EQ
(
emission_dims
[
0
],
label_dims
[
0
],
"The height of Input(Emission) and the height of Input(Label) "
"should be the same."
);
}
ctx
->
ShareLoD
(
"Emission"
,
/*->*/
"ViterbiPath"
);
ctx
->
SetOutputDim
(
"ViterbiPath"
,
{
emission_dims
[
0
],
1
});
}
protected:
framework
::
DataType
IndicateDataType
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
return
framework
::
ToDataType
(
ctx
.
Input
<
LoDTensor
>
(
"Emission"
)
->
type
());
}
};
}
// namespace operators
}
// namespace paddle
namespace
ops
=
paddle
::
operators
;
REGISTER_OP_WITHOUT_GRADIENT
(
crf_decoding
,
ops
::
CRFDecodingOp
,
ops
::
CRFDecodingOpMaker
);
REGISTER_OP_CPU_KERNEL
(
crf_decoding
,
ops
::
CRFDecodingOpKernel
<
paddle
::
platform
::
CPUPlace
,
float
>
,
ops
::
CRFDecodingOpKernel
<
paddle
::
platform
::
CPUPlace
,
double
>
);
paddle/operators/crf_decoding_op.h
0 → 100644
View file @
483947c4
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <limits>
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
#include "paddle/operators/math/math_function.h"
namespace
paddle
{
namespace
operators
{
using
framework
::
LoDTensor
;
using
framework
::
LoD
;
using
framework
::
Tensor
;
template
<
typename
Place
,
typename
T
>
class
CRFDecodingOpKernel
:
public
framework
::
OpKernel
<
T
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
PADDLE_ENFORCE
(
platform
::
is_cpu_place
(
ctx
.
GetPlace
()),
"The crf_decoding operator can only run on CPU."
);
auto
*
emission_weights
=
ctx
.
Input
<
LoDTensor
>
(
"Emission"
);
auto
*
transition_weights
=
ctx
.
Input
<
Tensor
>
(
"Transition"
);
auto
*
label
=
ctx
.
Input
<
LoDTensor
>
(
"Label"
);
auto
*
decoded_path
=
ctx
.
Output
<
Tensor
>
(
"ViterbiPath"
);
PADDLE_ENFORCE_EQ
(
emission_weights
->
NumLevels
(),
1UL
,
"The Input(Emission) should be a sequence."
);
auto
lod
=
emission_weights
->
lod
();
PADDLE_ENFORCE
(
lod
.
size
(),
"Input(Emission) must be a sequence."
);
const
size_t
level
=
0
;
const
size_t
seq_num
=
lod
[
level
].
size
()
-
1
;
int
*
path
=
decoded_path
->
mutable_data
<
int
>
(
platform
::
CPUPlace
());
math
::
SetConstant
<
platform
::
CPUPlace
,
int
>
()(
ctx
.
device_context
(),
decoded_path
,
0
);
for
(
size_t
i
=
0
;
i
<
seq_num
;
++
i
)
{
int
start_pos
=
static_cast
<
int
>
(
lod
[
level
][
i
]);
int
end_pos
=
static_cast
<
int
>
(
lod
[
level
][
i
+
1
]);
Tensor
decoded_path_one_seq
=
decoded_path
->
Slice
(
start_pos
,
end_pos
);
Decode
(
emission_weights
->
Slice
(
start_pos
,
end_pos
),
*
transition_weights
,
&
decoded_path_one_seq
);
}
if
(
label
)
{
PADDLE_ENFORCE_EQ
(
label
->
NumLevels
(),
1UL
,
"The Input(Label) should be a sequence."
);
const
int
*
label_value
=
label
->
data
<
int
>
();
size_t
batch_size
=
emission_weights
->
dims
()[
0
];
for
(
size_t
i
=
0
;
i
<
batch_size
;
++
i
)
{
path
[
i
]
=
label_value
[
i
]
==
path
[
i
]
?
1
:
0
;
}
}
}
private:
void
Decode
(
const
Tensor
&
emission_weights
,
const
Tensor
&
transition_weights
,
Tensor
*
decoded_path
)
const
{
auto
emission_dims
=
emission_weights
.
dims
();
const
size_t
seq_len
=
emission_dims
[
0
];
const
size_t
tag_num
=
emission_dims
[
1
];
const
size_t
state_trans_base_idx
=
2
;
const
T
*
x
=
emission_weights
.
data
<
T
>
();
const
T
*
w
=
transition_weights
.
data
<
T
>
();
int
*
path
=
decoded_path
->
data
<
int
>
();
// alpha is a memo table. An element alpha(k, v) records the score of the
// best sequence of tags from position 1 to position k with v being the end
// tag.
Tensor
alpha
;
T
*
alpha_value
=
alpha
.
mutable_data
<
T
>
(
emission_dims
,
platform
::
CPUPlace
());
Tensor
track
;
int
*
track_value
=
track
.
mutable_data
<
int
>
(
emission_dims
,
platform
::
CPUPlace
());
for
(
size_t
i
=
0
;
i
<
tag_num
;
++
i
)
alpha_value
[
i
]
=
w
[
i
]
+
x
[
i
];
for
(
size_t
k
=
1
;
k
<
seq_len
;
++
k
)
{
for
(
size_t
i
=
0
;
i
<
tag_num
;
++
i
)
{
T
max_score
=
-
std
::
numeric_limits
<
T
>::
max
();
int
max_j
=
0
;
for
(
size_t
j
=
0
;
j
<
tag_num
;
++
j
)
{
T
score
=
alpha_value
[(
k
-
1
)
*
tag_num
+
j
]
+
w
[(
j
+
state_trans_base_idx
)
*
tag_num
+
i
];
if
(
score
>
max_score
)
{
max_score
=
score
;
max_j
=
j
;
}
}
alpha_value
[
k
*
tag_num
+
i
]
=
max_score
+
x
[
k
*
tag_num
+
i
];
track_value
[
k
*
tag_num
+
i
]
=
max_j
;
}
}
T
max_score
=
-
std
::
numeric_limits
<
T
>::
max
();
int
max_i
=
0
;
for
(
size_t
i
=
0
;
i
<
tag_num
;
++
i
)
{
T
score
=
alpha_value
[(
seq_len
-
1
)
*
tag_num
+
i
]
+
w
[
tag_num
+
i
];
if
(
score
>
max_score
)
{
max_score
=
score
;
max_i
=
i
;
}
}
path
[
seq_len
-
1
]
=
max_i
;
for
(
int
k
=
seq_len
-
1
;
k
>=
1
;
--
k
)
{
path
[
k
-
1
]
=
max_i
=
track_value
[
k
*
tag_num
+
max_i
];
}
}
};
}
// namespace operators
}
// namespace paddle
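The `Decode` routine above can be restated as a standalone sketch on plain vectors, which may make the layout of the transition matrix easier to follow. It assumes the same convention as the operator comment (row 0 holds start weights, row 1 holds end weights, rows 2 and later hold tag-to-tag transitions); `Matrix` and `ViterbiDecode` are hypothetical names and the sketch is not the Paddle kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Viterbi decoding for one sequence.
// emission:   [seq_len x D]; transition: [(D + 2) x D].
// Hypothetical helper for illustration only.
std::vector<int> ViterbiDecode(const Matrix& emission,
                               const Matrix& transition) {
  const std::size_t seq_len = emission.size();
  if (seq_len == 0) return {};
  const std::size_t tag_num = emission[0].size();
  assert(transition.size() == tag_num + 2);
  const std::size_t base = 2;  // rows 0 and 1 are the start and end weights

  // alpha[k][v]: best score of any tag sequence ending with tag v at step k.
  Matrix alpha(seq_len, std::vector<double>(tag_num, 0.0));
  // track[k][v]: the tag at step k - 1 on that best sequence.
  std::vector<std::vector<int>> track(seq_len, std::vector<int>(tag_num, 0));

  for (std::size_t i = 0; i < tag_num; ++i) {
    alpha[0][i] = transition[0][i] + emission[0][i];
  }
  for (std::size_t k = 1; k < seq_len; ++k) {
    for (std::size_t i = 0; i < tag_num; ++i) {
      double best = -std::numeric_limits<double>::infinity();
      int best_j = 0;
      for (std::size_t j = 0; j < tag_num; ++j) {
        const double score = alpha[k - 1][j] + transition[j + base][i];
        if (score > best) {
          best = score;
          best_j = static_cast<int>(j);
        }
      }
      alpha[k][i] = best + emission[k][i];
      track[k][i] = best_j;
    }
  }

  // Pick the best final tag, including the end weights, then backtrack.
  double best = -std::numeric_limits<double>::infinity();
  int best_i = 0;
  for (std::size_t i = 0; i < tag_num; ++i) {
    const double score = alpha[seq_len - 1][i] + transition[1][i];
    if (score > best) {
      best = score;
      best_i = static_cast<int>(i);
    }
  }
  std::vector<int> path(seq_len);
  path[seq_len - 1] = best_i;
  for (std::size_t k = seq_len - 1; k >= 1; --k) {
    best_i = track[k][best_i];
    path[k - 1] = best_i;
  }
  return path;
}
```

With all-zero transitions the decoded path simply follows the largest emission at each step; a large penalty on switching tags makes the path stay on one tag.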
paddle/operators/crop_op.cc
View file @
483947c4
...
...
@@ -56,34 +56,35 @@ class CropOpMaker : public framework::OpProtoAndCheckerMaker {
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"The input of crop op. "
"The input should be a k-D tensor(k > 0 and k < 7)"
);
"The input should be a k-D tensor(k > 0 and k < 7)
.
"
);
AddInput
(
"Y"
,
"The input used as reference for cropping"
"
with the same dimension as X.
"
)
"The input used as reference for cropping
,
"
"
which is of the same dimensions as X.
"
)
.
AsDispensable
();
AddOutput
(
"Out"
,
"The output of crop op "
"w
ith the same dimension
as X."
);
"The output of crop op
,
"
"w
hich is of the same dimensions
as X."
);
AddAttr
<
std
::
vector
<
int
>>
(
"offsets"
,
"A list<int> describing offsets to be cropped."
"The size of offsets list should be
as
same as "
"
dimension size of
input X."
);
"A list<int> describing offsets to be cropped.
"
"The size of offsets list should be
the
same as "
"
the dimension size of
input X."
);
AddAttr
<
std
::
vector
<
int
>>
(
"shape"
,
"A list<int> describing the shape of output."
"The size of shape list should be
as
same as "
"
dimension size of
input X."
)
"A list<int> describing the shape of output.
"
"The size of shape list should be
the
same as "
"
the dimension size of
input X."
)
.
SetDefault
(
std
::
vector
<
int
>
());
AddComment
(
R"DOC(
Crop Operator.
Crop input into output, as specified by offsets and shape.
There are two ways to set shape:
1. referenc
input: crop input X as
shape as reference input.
1. referenc
e input: crop input X into the same
shape as reference input.
The dimension of reference input should
be
as same as
input X.
2. shape list: crop input X
by
shape described by a list<int>.
The size of shape list should be
as
same as
dimension size of
input X.
be
the same as the dimension of
input X.
2. shape list: crop input X
into the
shape described by a list<int>.
The size of shape list should be
the
same as
the dimension size of
input X.
The input should be a k-D tensor(k > 0 and k < 7). As an example:
...
...
@@ -91,20 +92,20 @@ Given:
X = [[0, 1, 2, 0, 0]
[0, 3, 4, 0, 0]
[0, 0, 0, 0, 0]]
[0, 0, 0, 0, 0]]
,
and
offsets = [0, 1]
offsets = [0, 1]
,
and
shape = [2, 2]
shape = [2, 2]
,
then we get
we get:
Out = [[1, 2],
[3, 4]]
[3, 4]]
.
)DOC"
);
}
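The crop example in the comment above can be checked with a minimal 2-D sketch. `Matrix` and `Crop2D` are hypothetical names; the real operator supports up to 6-D tensors and an optional reference input, neither of which is modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Copies the window of size `shape` starting at `offsets` out of x.
// Hypothetical 2-D helper for illustration only.
Matrix Crop2D(const Matrix& x, const std::vector<int>& offsets,
              const std::vector<int>& shape) {
  assert(offsets.size() == 2 && shape.size() == 2);
  assert(offsets[0] >= 0 && offsets[1] >= 0);
  assert(shape[0] >= 0 && shape[1] >= 0);
  assert(offsets[0] + shape[0] <= static_cast<int>(x.size()));
  Matrix out(shape[0], std::vector<int>(shape[1]));
  for (int r = 0; r < shape[0]; ++r) {
    const std::vector<int>& row = x[offsets[0] + r];
    assert(offsets[1] + shape[1] <= static_cast<int>(row.size()));
    for (int c = 0; c < shape[1]; ++c) {
      out[r][c] = row[offsets[1] + c];
    }
  }
  return out;
}
```

Using the X, offsets = [0, 1] and shape = [2, 2] from the comment gives [[1, 2], [3, 4]].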
...
...
paddle/operators/cross_entropy_op.cc
View file @
483947c4
...
...
@@ -49,7 +49,7 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
}
protected:
// Explicitly set that
data type of the output of the cross_entropy operator
// Explicitly set that
the data type of computation kernel of cross_entropy
// is determined by its input "X".
framework
::
DataType
IndicateDataType
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
...
...
@@ -96,7 +96,8 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
}
protected:
// CrossEntropy's data type just determined by "X"
// Explicitly set that the data type of computation kernel of cross_entropy
// is determined by its input "X".
framework
::
DataType
IndicateDataType
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
return
framework
::
ToDataType
(
ctx
.
Input
<
Tensor
>
(
"X"
)
->
type
());
...
...
@@ -117,9 +118,9 @@ class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker {
"Label"
,
"(Tensor, default Tensor<int>), the ground truth which is "
"a 2-D tensor. "
"When soft_label is set to false,
`Label`
is a Tensor<int> with shape "
"When soft_label is set to false,
Label
is a Tensor<int> with shape "
"[N x 1]. "
"When soft_label is set to true,
`Label`
is a Tensor<float/double> "
"When soft_label is set to true,
Label
is a Tensor<float/double> "
"with shape [N x K]."
);
AddOutput
(
"Y"
,
"(Tensor, default Tensor<float>), a 2-D tensor "
...
...
@@ -137,13 +138,13 @@ computation.
1) One-hot cross-entropy:
soft_label = false, Label[i, 0] indicates the class index for sample i:
Y[i] = -log(X[i, Label[i]])
$Y[i] = -\log(X[i, Label[i]])$
2) Soft-label cross-entropy:
soft_label = true, Label[i, j] indicates the soft label of class j
for sample i:
Y[i] = \sum_j{-Label[i, j] * log(X[i, j])}
$Y[i] = \sum_j{-Label[i, j] * log(X[i, j])}$
Please make sure that in this case the summation of each row of Label
equals one.
...
...
@@ -153,8 +154,9 @@ computation.
non-zero element (equals 1), soft-label cross-entropy degenerates to a
one-hot cross-entropy with one-hot label representation.
Both the input `X` and `Label` can carry the LoD (Level of Details) information,
or not. But the output only shares the LoD with input `X`.
Both the input X and Label can carry the LoD (Level of Details) information,
or not. But the output only shares the LoD information with input X.
)DOC"
);
}
};
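The two cross-entropy modes described above can be compared with a short sketch. `Matrix`, `HardLabelCrossEntropy` and `SoftLabelCrossEntropy` are hypothetical names; the sketch skips zero-weight soft labels to avoid evaluating 0 * log(0), which is a choice made here and not something the source states.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One-hot mode: Y[i] = -log(X[i, Label[i]]).
std::vector<double> HardLabelCrossEntropy(const Matrix& x,
                                          const std::vector<int>& label) {
  assert(x.size() == label.size());
  std::vector<double> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert(label[i] >= 0 && label[i] < static_cast<int>(x[i].size()));
    y[i] = -std::log(x[i][label[i]]);
  }
  return y;
}

// Soft-label mode: Y[i] = sum_j -Label[i, j] * log(X[i, j]).
std::vector<double> SoftLabelCrossEntropy(const Matrix& x,
                                          const Matrix& label) {
  assert(x.size() == label.size());
  std::vector<double> y(x.size(), 0.0);
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert(x[i].size() == label[i].size());
    for (std::size_t j = 0; j < x[i].size(); ++j) {
      if (label[i][j] == 0.0) continue;  // skip to avoid 0 * log(0)
      y[i] -= label[i][j] * std::log(x[i][j]);
    }
  }
  return y;
}
```

When each soft-label row is one-hot, both functions return the same values, which matches the degenerate case mentioned in the comment.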
...
...
paddle/operators/decayed_adagrad_op.cc
View file @
483947c4
...
...
@@ -75,11 +75,18 @@ class DecayedAdagradOpMaker : public framework::OpProtoAndCheckerMaker {
"Constant for numerical stability"
)
.
SetDefault
(
1.0e-6
f
);
AddComment
(
R"DOC(
Decayed Adagrad Optimizer.
Decayed Adagrad
The update is done as follows:
moment_out = decay * moment + (1 - decay) * grad * grad
param_out = param - learning_rate * grad / (sqrt(moment_out) + epsilon)
$$
moment\_out = decay * moment + (1 - decay) * grad * grad \\
param\_out = param - \frac{learning\_rate * grad}{\sqrt{moment\_out} + epsilon}
$$
The original paper(http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
does not have an epsilon attribute. It is added here for numerical
stability to avoid the division by zero error.
)DOC"
);
}
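The Decayed Adagrad update rule above amounts to two lines of arithmetic per parameter. The sketch below applies it to a single scalar; `AdagradState` and `DecayedAdagradStep` are hypothetical names, and the real operator works element-wise on tensors.

```cpp
#include <cassert>
#include <cmath>

// Scalar optimizer state. Hypothetical type for illustration only.
struct AdagradState {
  double param;
  double moment;
};

// moment_out = decay * moment + (1 - decay) * grad * grad
// param_out  = param - learning_rate * grad / (sqrt(moment_out) + epsilon)
AdagradState DecayedAdagradStep(AdagradState s, double grad,
                                double learning_rate, double decay,
                                double epsilon) {
  assert(epsilon > 0.0);
  AdagradState next;
  next.moment = decay * s.moment + (1.0 - decay) * grad * grad;
  next.param =
      s.param - learning_rate * grad / (std::sqrt(next.moment) + epsilon);
  return next;
}
```

Because epsilon is added to the denominator, the first step stays finite even when the moment starts at zero and the gradient is zero.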
...
...
paddle/operators/dropout_op.cc
View file @
483947c4
...
...
@@ -43,22 +43,24 @@ class DropoutOpMaker : public framework::OpProtoAndCheckerMaker {
DropoutOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddAttr
<
float
>
(
"dropout_prob"
,
"Probability of setting units to zero."
)
.
SetDefault
(
.5
f
);
AddAttr
<
bool
>
(
"is_training"
,
"Whether in training phase."
).
SetDefault
(
true
);
AddAttr
<
int
>
(
"seed"
,
"Dropout random seed."
).
SetDefault
(
0
);
AddInput
(
"X"
,
"The input of dropout op."
);
AddOutput
(
"Out"
,
"The output of dropout op."
);
AddOutput
(
"Mask"
,
"The random sampled dropout mask."
).
AsIntermediate
();
AddAttr
<
float
>
(
"dropout_prob"
,
"Probability of setting units to zero."
)
.
SetDefault
(
.5
f
);
AddAttr
<
bool
>
(
"is_training"
,
"True if in training phase."
).
SetDefault
(
true
);
AddAttr
<
int
>
(
"seed"
,
"Dropout random seed."
).
SetDefault
(
0
);
AddComment
(
R"DOC(
Dropout Operator.
'Dropout'
refers to randomly dropping out units in a nerual network. It is a
Dropout
refers to randomly dropping out units in a neural network. It is a
regularization technique for reducing overfitting by preventing neuron
co-adaptation during training. The dropout operator randomly sets (according to
the given dropout probability) the outputs of some units to zero, while others
being set to their inputs.
are set equal to their corresponding inputs.
)DOC"
);
}
};
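The training-time behavior described in the Dropout comment (some units set to zero, the rest passed through unchanged) can be sketched as below. `Dropout` here is a hypothetical free function using the standard `<random>` facilities; it does not model the `is_training = false` path or the exact random number generator used by the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Zeroes each element with probability dropout_prob and records the 0/1 mask.
// Hypothetical helper for illustration only.
std::vector<float> Dropout(const std::vector<float>& x, float dropout_prob,
                           unsigned seed, std::vector<float>* mask) {
  assert(dropout_prob >= 0.0f && dropout_prob <= 1.0f);
  assert(mask != nullptr);
  std::mt19937 engine(seed);
  std::bernoulli_distribution drop(dropout_prob);
  mask->assign(x.size(), 1.0f);
  std::vector<float> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (drop(engine)) (*mask)[i] = 0.0f;
    out[i] = x[i] * (*mask)[i];
  }
  return out;
}
```

Keeping the mask as a separate output mirrors the intermediate `Mask` output of the operator, which a backward pass can reuse.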
...
...
paddle/operators/dynamic_recurrent_op.cc
View file @
483947c4
...
...
@@ -386,12 +386,13 @@ class DynamicRecurrentOpProtoAndCheckerMaker
RNNAlgorithm
::
kArgNames
[
RNNAlgorithm
::
ComputeMode
::
kForward
];
// inputs and outputs stored in proto
AddInput
(
name
.
inlinks
,
"
t
he inputs that need to be segmented for each step."
)
"
T
he inputs that need to be segmented for each step."
)
.
AsDuplicable
();
AddInput
(
name
.
initial_states
,
"
variables to initializ
e states."
)
AddInput
(
name
.
initial_states
,
"
Variables to initialize th
e states."
)
.
AsDuplicable
();
AddOutput
(
name
.
outlinks
,
"the outputs that need to concated for all steps."
)
AddOutput
(
name
.
outlinks
,
"The outputs that need to be concatenated for all steps."
)
.
AsDuplicable
();
AddOutput
(
name
.
step_scopes
,
"step scopes"
);
...
...
@@ -399,7 +400,12 @@ class DynamicRecurrentOpProtoAndCheckerMaker
AddAttr
<
std
::
vector
<
std
::
string
>>
(
name
.
ex_states
,
"names of ex_states"
);
AddAttr
<
std
::
vector
<
std
::
string
>>
(
name
.
states
,
"names of states"
);
AddComment
(
"This is a RNN operator for varience-length sequences."
);
AddComment
(
R"DOC(
Dynamic Recurrent Operator.
This is an RNN operator for variable-length sequences.
)DOC"
);
}
};
...
...
paddle/operators/elementwise_add_op.cc
View file @
483947c4
...
...
@@ -22,7 +22,7 @@ class ElementwiseAddOpMaker : public ElementwiseOpMaker {
ElementwiseAddOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
ElementwiseOpMaker
(
proto
,
op_checker
)
{
SetComment
(
"
add"
,
"Out = X + Y
"
);
SetComment
(
"
Add"
,
"$Out = X + Y$
"
);
AddComment
(
comment_
);
}
};
...
...
paddle/operators/elementwise_div_op.cc
View file @
483947c4
...
...
@@ -22,7 +22,7 @@ class ElementwiseDivOpMaker : public ElementwiseOpMaker {
ElementwiseDivOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
ElementwiseOpMaker
(
proto
,
op_checker
)
{
SetComment
(
"Div"
,
"
Out = X / Y
"
);
SetComment
(
"Div"
,
"
$Out = X / Y$
"
);
AddComment
(
comment_
);
}
};
...
...
paddle/operators/elementwise_mul_op.cc
View file @
483947c4
...
...
@@ -23,7 +23,7 @@ class ElementwiseMulOpMaker : public ElementwiseOpMaker {
ElementwiseMulOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
ElementwiseOpMaker
(
proto
,
op_checker
)
{
SetComment
(
"Mul"
,
"
Out = X ⊙ Y
"
);
SetComment
(
"Mul"
,
"
$Out = X
\\
odot
\\
Y$
"
);
AddComment
(
comment_
);
}
};
...
...
paddle/operators/elementwise_op.h
View file @
483947c4
...
...
@@ -46,29 +46,33 @@ class ElementwiseOpMaker : public framework::OpProtoAndCheckerMaker {
ElementwiseOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
R"DOC(
The first input of elementwise op, it's a tensor of any dimensions.
)DOC"
);
AddInput
(
"Y"
,
R"DOC(
The sencond input of elementwise op, it's a tensor and it's dimensions
must be small or equal to X's dimensions.
)DOC"
);
AddInput
(
"X"
,
"(Tensor) The first input tensor of elementwise op"
);
AddInput
(
"Y"
,
"(Tensor) The second input tensor of elementwise op"
);
AddOutput
(
"Out"
,
"The output of elementwise op"
);
AddAttr
<
int
>
(
"axis"
,
R"DOC(
When the shape(Y) does not equal the shape(X),Y will be broadcasted
to match the shape of X and axis should be dimension index Y in X
)DOC"
)
"(int, default -1) The starting dimension index "
"for broadcasting Y onto X"
)
.
SetDefault
(
-
1
)
.
EqualGreaterThan
(
-
1
);
AddOutput
(
"Out"
,
"The output of elementwise op"
);
comment_
=
R"DOC(
Limited elementwise {name} operator.The equation is: Out = {equation}.
1. The shape of Y should be same with X or
2. Y's shape is a subset of X.
Y will be broadcasted to match the shape of X and axis should be dimension index Y in X.
Limited Elementwise {name} Operator.
The equation is:
{equation}
example:
X is a tensor of any dimension and the dimensions of tensor Y must be smaller than
or equal to the dimensions of X.
There are two cases for this operator:
1. The shape of Y is the same as the shape of X;
2. The shape of Y is a contiguous subsequence of the shape of X.
For case 2:
Y will be broadcasted to match the shape of X and axis should be
the starting dimension index for broadcasting Y onto X.
example:
shape(X) = (2, 3, 4, 5), shape(Y) = (,)
shape(X) = (2, 3, 4, 5), shape(Y) = (5,)
shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5)
...
...
@@ -76,7 +80,8 @@ Limited elementwise {name} operator.The equation is: Out = {equation}.
shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
Both the input X and Y can carry the LoD (Level of Details) information,
or not. But the output only shares the LoD with input X.
or not. But the output only shares the LoD information with input X.
)DOC"
;
AddComment
(
comment_
);
}
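The broadcasting rule in the comment above (Y's shape matches a run of X's dimensions starting at `axis`) can be sketched on flat row-major buffers. `BroadcastAdd` is a hypothetical name, and treating `axis = -1` as "align Y with the trailing dimensions of X" is an assumption inferred from the examples in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds y to x, broadcasting y over the dimensions of x outside
// [axis, axis + rank(y)). Buffers are row-major.
// Hypothetical helper for illustration only.
std::vector<float> BroadcastAdd(const std::vector<float>& x,
                                const std::vector<int>& x_shape,
                                const std::vector<float>& y,
                                const std::vector<int>& y_shape, int axis) {
  const int x_rank = static_cast<int>(x_shape.size());
  const int y_rank = static_cast<int>(y_shape.size());
  assert(y_rank <= x_rank);
  if (axis == -1) axis = x_rank - y_rank;  // assumed meaning of the default
  assert(axis >= 0 && axis + y_rank <= x_rank);

  std::size_t pre = 1, n = 1, post = 1;
  for (int d = 0; d < axis; ++d) pre *= x_shape[d];
  for (int d = 0; d < y_rank; ++d) {
    assert(x_shape[axis + d] == y_shape[d]);
    n *= y_shape[d];
  }
  for (int d = axis + y_rank; d < x_rank; ++d) post *= x_shape[d];
  assert(x.size() == pre * n * post && y.size() == n);

  std::vector<float> out(x.size());
  for (std::size_t i = 0; i < pre; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      for (std::size_t k = 0; k < post; ++k) {
        const std::size_t idx = (i * n + j) * post + k;
        out[idx] = x[idx] + y[j];
      }
    }
  }
  return out;
}
```

Splitting X's element count into `pre * n * post` reduces any valid axis to the same three nested loops, so the other elementwise operators would only need a different inner expression.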
...
...
paddle/operators/elementwise_sub_op.cc
View file @
483947c4
...
...
@@ -22,7 +22,7 @@ class ElementwiseSubOpMaker : public ElementwiseOpMaker {
ElementwiseSubOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
ElementwiseOpMaker
(
proto
,
op_checker
)
{
SetComment
(
"Sub"
,
"
Out = X - Y
"
);
SetComment
(
"Sub"
,
"
$Out = X - Y$
"
);
AddComment
(
comment_
);
}
};
...
...
paddle/operators/feed_op.cc
View file @
483947c4
...
...
@@ -59,8 +59,13 @@ class FeedOpInfoMaker : public framework::OpProtoAndCheckerMaker {
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"The input of feed op"
);
AddOutput
(
"Out"
,
"The output of feed op"
);
AddComment
(
"feed op, it should not be configured by users directly"
);
AddAttr
<
int
>
(
"col"
,
"column of feed"
);
AddAttr
<
int
>
(
"col"
,
"(int) The column of feed"
);
AddComment
(
R"DOC(
Feed Operator.
It should not be configured by users directly.
)DOC"
);
}
};
...
...
paddle/operators/fetch_op.cc
View file @
483947c4
...
...
@@ -66,8 +66,13 @@ class FetchOpInfoMaker : public framework::OpProtoAndCheckerMaker {
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"The input of fetch op"
);
AddOutput
(
"Out"
,
"The output of fetch op"
);
AddComment
(
"fetch op, it should not be configured by users directly"
);
AddAttr
<
int
>
(
"col"
,
"column of fetch"
);
AddAttr
<
int
>
(
"col"
,
"(int) The column of fetch"
);
AddComment
(
R"DOC(
Fetch Operator.
It should not be configured by users directly.
)DOC"
);
}
};
}
// namespace operators
...
...
paddle/operators/fill_constant_batch_size_like_op.cc
View file @
483947c4
...
...
@@ -70,11 +70,16 @@ class FillConstantBatchSizeLikeOpMaker
"with the specified value"
);
AddAttr
<
std
::
vector
<
int
>>
(
"shape"
,
"(vector<int>) The shape of the output"
);
AddAttr
<
int
>
(
"dim_idx"
,
"(int, default 0)
t
he index of batch size dimension"
)
"(int, default 0)
T
he index of batch size dimension"
)
.
SetDefault
(
0
);
AddAttr
<
float
>
(
"value"
,
"(float, default 0) The value to be filled"
)
.
SetDefault
(
0.0
f
);
AddComment
(
R"DOC(Fill up a variable with specified constant value.)DOC"
);
AddComment
(
R"DOC(
FillConstantBatchSizeLike Operator.
Fill up a variable with specified constant value.
)DOC"
);
}
};
}
// namespace operators
...
...
paddle/operators/fill_constant_op.cc
View file @
483947c4
...
...
@@ -54,7 +54,12 @@ class FillConstantOpMaker : public framework::OpProtoAndCheckerMaker {
    AddOutput("Out",
              "(Tensor) Tensor of specified shape will be filled "
              "with the specified value");
    AddComment(R"DOC(Fill up a variable with specified constant value.)DOC");
    AddComment(R"DOC(
FillConstantBatchSizeLike Operator.
Fill up a variable with specified constant value.
)DOC");
  }
};
}  // namespace operators
...
...
paddle/operators/fill_zeros_like_op.cc
View file @
483947c4
...
...
@@ -37,11 +37,13 @@ class FillZerosLikeOpMaker : public framework::OpProtoAndCheckerMaker {
                       framework::OpAttrChecker *op_checker)
      : framework::OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "The input of fill-zeros-like op.");
    AddOutput("Y", "The varibale will be filled up with zeros.");
    AddOutput("Y", "The variable will be filled up with zeros.");
    AddComment(R"DOC(
Fill up a vriable with zeros.
FillZerosLike Operator.
Fill up a variable with zeros.
The output will have the same size as the input.
The output will have the same size with input.
)DOC");
  }
};
...
...
paddle/operators/gather_op.cc
View file @
483947c4
...
...
@@ -67,11 +67,28 @@ class GatherOpMaker : public framework::OpProtoAndCheckerMaker {
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "The source input of gather op");
    AddInput("Index", "The index input of gather op");
    AddOutput("Out", "The output of add op");
    AddOutput("Out", "The output of gather op");
    AddComment(R"DOC(
Gather Operator by selecting from the first axis,
Gather Operator.
$Out = X[Index]$
Out is obtained by gathering entries of the outer-most dimension
of X indexed by Index and concatenate them together.
Example:
X = [[1, 2],
     [3, 4],
     [5, 6]]
Index = [[1, 2]]
Then:
Out = [[3, 4],
       [5, 6]]
Out = X[Index]
)DOC");
  }
};
...
...
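Editor's sketch (not part of the commit): the gather semantics documented above, $Out = X[Index]$ along the outermost dimension, can be illustrated with a small standalone helper. The name `GatherRows` and the `Matrix` alias are hypothetical, and the out-of-range check is an assumption about reasonable behavior rather than what the Paddle kernel does.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical illustration type: a 2-D integer matrix stored row by row.
using Matrix = std::vector<std::vector<int>>;

// Collects the rows of x named by index, in the order given by index.
Matrix GatherRows(const Matrix& x, const std::vector<std::size_t>& index) {
  Matrix out;
  out.reserve(index.size());
  for (std::size_t i : index) {
    if (i >= x.size()) {
      throw std::out_of_range("index exceeds the first dimension of x");
    }
    out.push_back(x[i]);  // copy the i-th row of x
  }
  return out;
}
```

With the X and Index values from the operator comment, `GatherRows(x, {1, 2})` yields the two rows listed under "Then:".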
paddle/operators/gaussian_random_op.cc
View file @
483947c4
...
...
@@ -68,21 +68,35 @@ class GaussianRandomOpMaker : public framework::OpProtoAndCheckerMaker {
  GaussianRandomOpMaker(framework::OpProto *proto,
                        framework::OpAttrChecker *op_checker)
      : framework::OpProtoAndCheckerMaker(proto, op_checker) {
    AddOutput("Out", "output matrix of random op");
    AddComment(R"DOC(
GaussianRandom operator.
Use to initialize tensor with gaussian random generator.
)DOC");
    AddOutput("Out", "Output matrix of gaussian random op");
    AddAttr<std::vector<int>>("shape", "The dimension of random tensor.");
    AddAttr<float>("mean", "mean of random tensor.").SetDefault(.0f);
    AddAttr<float>("std", "std of random tensor.").SetDefault(1.0f);
    AddAttr<std::vector<int>>("shape",
                              "(vector<int>) "
                              "The dimension of random tensor.");
    AddAttr<float>("mean",
                   "(float, default 0.0) "
                   "mean of random tensor.")
        .SetDefault(.0f);
    AddAttr<float>("std",
                   "(float, default 1.0) "
                   "std of random tensor.")
        .SetDefault(1.0f);
    AddAttr<int>("seed",
                 "(int, default 0) "
                 "Random seed of generator."
                 "0 means use system wide seed")
                 "0 means use system wide seed.")
        .SetDefault(0);
    AddAttr<int>("data_type", "output data type")
    AddAttr<int>("data_type",
                 "(int, default 5(FP32)) "
                 "Output data type.")
        .SetDefault(framework::DataType::FP32);
    AddComment(R"DOC(
GaussianRandom Operator.
Used to initialize tensors with gaussian random generator.
)DOC");
  }
};
...
...
paddle/operators/gru_unit_op.cc
View file @
483947c4
...
...
@@ -80,19 +80,21 @@ class GRUUnitOpMaker : public framework::OpProtoAndCheckerMaker {
    AddInput("HiddenPrev",
             "(Tensor) Matrix with shape [batch_size, frame_size] for the "
             "states of previous time step.");
    AddInput("Weight",
    AddInput("Weight",
             "(Tensor) Weight matrix with shape [frame_size, frame_size * 3]. "
             "The elements continuous in memory can be divided into two parts. "
             "The first part are weights of the update gate and reset gate "
             "with shape [frame_size, frame_size * 2], and the second part are "
             "weights of output candidate with shape [frame_size, frame_size]");
    AddInput("Bias",
             "(Tensor) Bias vector with shape [1, frame_size * 3] concating "
             "weights of output candidate with shape [frame_size, frame_size].");
    AddInput("Bias",
             "(Tensor) Bias vector with shape [1, frame_size * 3] concatenating "
             "bias of the update gate, reset gate and output candidate.")
        .AsDispensable();
    AddOutput("Gate",
              "(Tensor) Matrix with shape [batch_size, frame_size * 3] for the "
              "output of update gate, reset gate and output candidate")
              "output of update gate, reset gate and output candidate.")
        .AsIntermediate();
    AddOutput("ResetHiddenPrev",
              "(Tensor) Matrix with shape [batch_size, frame_size] for the "
...
...
@@ -112,16 +114,19 @@ class GRUUnitOpMaker : public framework::OpProtoAndCheckerMaker {
        .SetDefault(sigmoid)
        .InEnum({identity, sigmoid, tanh, relu});
    AddComment(R"DOC(
GRUUnitOp implements part calculations of the GRU unit as following:
GRUUnit Operator.
\f[
update \ gate: u_t = actGate(xu_t + W_u * hidden_prev + bias_u) \\
reset \ gate: r_t = actGate(xr_t + W_r * hidden_prev + bias_r) \\
output \ candidate: {h}_t = actNode(xc_t + W_c * dot(r_t, hidden_prev) + bias_c) \\
output: h_t = dot((1-u_t), {h}_t) + dot(u_t, hidden_prev)
\f]
This operator implements partial calculations of the GRU unit as follows:
$$
update \ gate: u_t = actGate(xu_t + W_u * hidden_{prev} + bias_u) \\
reset \ gate: r_t = actGate(xr_t + W_r * hidden_{prev} + bias_r) \\
output \ candidate: {h}_t = actNode({xc}_t + W_c * dot(r_t, hidden_{prev}) + bias_c) \\
output: h_t = dot((1-u_t), {h}_t) + dot(u_t, hidden_{prev})
$$
The rest of GRU unit can be completed by using FCOp's output as the input of GRUUnitOp.
)DOC");
  }
};
...
...
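Editor's sketch (not part of the commit): the four GRU unit equations in the comment above can be traced for a single sample as follows. Everything here is hypothetical illustration code: `GruUnitStep` is not a Paddle function, the weights are passed as three separate square matrices rather than one packed `[frame_size, frame_size * 3]` tensor, `actGate` is assumed to be sigmoid and `actNode` tanh, the matrix product is taken as row vector times matrix, and `dot` is read as the element-wise product.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: w[i][j]

// Row vector times square matrix: y[j] = sum_i h[i] * w[i][j].
Vec RowTimesMatrix(const Vec& h, const Mat& w) {
  Vec y(h.size(), 0.0);
  for (std::size_t i = 0; i < h.size(); ++i) {
    for (std::size_t j = 0; j < y.size(); ++j) y[j] += h[i] * w[i][j];
  }
  return y;
}

double Sigmoid(double v) { return 1.0 / (1.0 + std::exp(-v)); }

// One GRU step. xu, xr, xc are the already projected inputs for the update
// gate, reset gate and output candidate; h_prev is the previous hidden state.
Vec GruUnitStep(const Vec& xu, const Vec& xr, const Vec& xc, const Vec& h_prev,
                const Mat& w_u, const Mat& w_r, const Mat& w_c,
                const Vec& b_u, const Vec& b_r, const Vec& b_c) {
  const std::size_t n = h_prev.size();
  const Vec hu = RowTimesMatrix(h_prev, w_u);
  const Vec hr = RowTimesMatrix(h_prev, w_r);
  Vec u(n), r(n), gated(n);
  for (std::size_t i = 0; i < n; ++i) {
    u[i] = Sigmoid(xu[i] + hu[i] + b_u[i]);  // update gate
    r[i] = Sigmoid(xr[i] + hr[i] + b_r[i]);  // reset gate
    gated[i] = r[i] * h_prev[i];             // dot(r_t, hidden_prev)
  }
  const Vec hc = RowTimesMatrix(gated, w_c);
  Vec h(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double candidate = std::tanh(xc[i] + hc[i] + b_c[i]);
    h[i] = (1.0 - u[i]) * candidate + u[i] * h_prev[i];
  }
  return h;
}
```

With all inputs, weights and biases set to zero, both gates evaluate to 0.5 and the candidate to 0, so the output is half of the previous hidden state, which is a convenient sanity check.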
paddle/operators/huber_loss_op.cc
View file @
483947c4
...
...
@@ -59,10 +59,12 @@ class HuberLossOpMaker : public framework::OpProtoAndCheckerMaker {
"The shape is same as Input(X) and will be reused in backward."
)
.
AsIntermediate
();
    AddOutput("Out",
              "The output tensor with shape [batch_size, 1] which represents "
              "the huber loss.");
              "The output tensor with shape [batch_size, 1] "
              "which represents the huber loss.");
    AddAttr<AttrType>("delta", "Hyper parameter in huber loss.");
    AddComment(R"DOC(
HuberLoss Operator.
Huber loss is a loss function used in robust regression. We define X as the
input value and Y as the target value. Huber loss can evaluate the fitness of
X to Y. Different from MSE loss, Huber loss is more robust for outliers. The
...
...
paddle/operators/increment_op.cc
View file @
483947c4
...
...
@@ -39,14 +39,18 @@ class IncrementOpMaker : public framework::OpProtoAndCheckerMaker {
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "(Tensor) The input tensor of increment operator");
    AddOutput("Out", "(Tensor) The output tensor of increment operator.");
    AddComment(R"DOC(Increment operator
The equation is: Out = X + step
)DOC");
    AddAttr<AttrType>("step",
                      "(float, default 1.0) "
                      "The step size by which the "
                      "input tensor will be incremented.")
        .SetDefault(1.0);
    AddComment(R"DOC(
Increment Operator.
The equation is:
$$Out = X + step$$
)DOC");
  }
};
...
...
paddle/operators/l1_norm_op.cc
View file @
483947c4
...
...
@@ -57,7 +57,7 @@ L1 Norm Operator.
Computes the L1 norm of a tensor.
Out = sum (abs(X))
$$Out = \sum{|X|}$$
)DOC");
}
...
...
paddle/operators/linear_chain_crf_op.cc
View file @
483947c4
...
...
@@ -22,52 +22,55 @@ class LinearChainCRFOpMaker : public framework::OpProtoAndCheckerMaker {
  LinearChainCRFOpMaker(framework::OpProto *proto,
                        framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("Emission",
             "(LoDTensor, default: LoDTensor<float>). "
             "The unscaled emission weight matrix for the linear chain CRF. "
             "This input is a LoDTensor with shape [N x D] where N is the size of "
             "the mini-batch and D is the total tag number.");
    AddInput("Transition",
             "(Tensor, default: Tensor<float>). A Tensor with shape [(D + 2) x D]. "
             "The learnable parameter for the linear_chain_crf operator. "
             "See more details in the operator's comments.");
    AddInput("Label",
             "(LoDTensor, default: LoDTensor<int>). The ground truth which is a 2-D "
             "LoDTensor with shape [N x 1], where N is the total element number in "
             "a mini-batch.");
    AddInput("Emission",
             "(LoDTensor, default LoDTensor<float>) "
             "A 2-D LoDTensor with shape [N x D], where N is the size of the "
             "mini-batch and D is the total tag number. The unscaled emission "
             "weight matrix for the linear chain CRF. ");
    AddInput("Transition",
             "(Tensor, default Tensor<float>) A 2-D Tensor with shape "
             "[(D + 2) x D]. The learnable parameter for the linear_chain_crf "
             "operator. See more details in the operator's comments.");
    AddInput("Label",
             "(LoDTensor, default LoDTensor<int>) A LoDTensor with shape "
             "[N x 1], where N is the total element number in a mini-batch. "
             "The ground truth.");
    AddOutput("Alpha",
              "Tensor, default: Tensor<float>. The forward vectors for the entire "
              "batch. A two dimensional tensor with shape [N x D], "
              "denoted as \f$\alpha\f$. \f$\alpha$\f is a memo table used to "
              "calculate the normalization factor in CRF. \f$\alpha[k, v]$\f stores "
              "the unnormalized probabilites of all possible unfinished sequences of "
              "tags that end at position \f$k$\f with tag \f$v$\f. For each \f$k$\f, "
              "(Tensor, default Tensor<float>) A 2-D Tensor with shape [N x D]. "
              "The forward vectors for the entire batch. Denote it as \f$\alpha\f$. "
              "\f$\alpha$\f is a memo table used to calculate the normalization "
              "factor in CRF. \f$\alpha[k, v]$\f stores the unnormalized "
              "probabilites of all possible unfinished sequences of tags that end at "
              "position \f$k$\f with tag \f$v$\f. For each \f$k$\f, "
              "\f$\alpha[k, v]$\f is a vector of length \f$D$\f with a component for "
              "each tag value \f$v$\f. This vector is called a forward vecotr and "
              "will also be used in backward computations.")
        .AsIntermediate();
    AddOutput("EmissionExps",
    AddOutput("EmissionExps",
              "(Tensor, default Tensor<float>) A 2-D Tensor with shape [N x D]. "
              "The exponentials of Input(Emission). This is an intermediate "
              "computational result in forward computation, and will be reused "
              "in backward computation.")
              "computational result in forward computation, and will be reused in "
              "backward computation.")
        .AsIntermediate();
    AddOutput("TransitionExps",
              "The exponentials of Input(Transition). This is an intermediate "
              "computational result in forward computation, and will be reused "
              "in backward computation.")
    AddOutput("TransitionExps",
              "(Tensor, default Tensor<float>) A 2-D Tensor with shape "
              "[(D + 2) x D]. The exponentials of Input(Transition). This is an "
              "intermediate computational result in forward computation, and "
              "will be reused in backward computation.")
        .AsIntermediate();
    AddOutput("LogLikelihood",
              "(Tensor, default: Tensor<float>). The logarithm of the conditional "
              "(Tensor, default Tensor<float>) The logarithm of the conditional "
              "likelihood of each training sample in a mini-batch. This is a 2-D "
              "tensor with shape [S x 1], where S is the sequence number in a "
              "mini-batch. Note: S is equal to the sequence number in a mini-batch. "
              "The output is no longer a LoDTensor.");
    AddComment(R"DOC(
LinearChainCRF Operator.
Conditional Random Field defines an undirected probabilistic graph with nodes
denoting random variables and edges denoting dependencies between these
variables. CRF learns the conditional probability \f$P(Y|X)\f$, where
...
...
@@ -81,29 +84,28 @@ and output must be linear sequences. Thus, the graph of such a CRF is a simple
chain or a line, which results in the linear chain CRF.
This operator implements the Forward-Backward algorithm for the linear chain
CRF. Please see http://www.cs.columbia.edu/~mcollins/fb.pdf and
http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for reference.
CRF. Please refer to http://www.cs.columbia.edu/~mcollins/fb.pdf and
http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for details.
Equation:
- Denote Input(Emission) to this operator as \f$x\f$ here.
- The first D values of Input(Transition) to this operator are for starting
1. Denote Input(Emission) to this operator as \f$x\f$ here.
2. The first D values of Input(Transition) to this operator are for starting
weights, denoted as \f$a\f$ here.
- The next D values of Input(Transition) of this operator are for ending
3. The next D values of Input(Transition) of this operator are for ending
weights, denoted as \f$b\f$ here.
- The remaning values of Input(Transition) are for transition weights,
4. The remaning values of Input(Transition) are for transition weights,
denoted as \f$w\f$ here.
- Denote Input(Label) as \f$s\f$ here.
5. Denote Input(Label) as \f$s\f$ here.
The probability of a sequence \f$s\f$ of length \f$L\f$ is defined as:
\f$P(s) = (1/Z) exp(a_{s_1} + b_{s_L}
\f$P(s) = (1/Z) \exp(a_{s_1} + b_{s_L}
                + \sum_{l=1}^L x_{s_l}
                + \sum_{l=2}^L w_{s_{l-1},s_l})\f$
where \f$Z\f$ is a normalization value so that the sum of \f$P(s)\f$ over
all possible sequences is \f$1\f$, and \f$x\f$ is the emission feature weight
to the linear chain CRF.
Finaly, the linear chain CRF operator outputs the logarithm of the conditional
Finally, the linear chain CRF operator outputs the logarithm of the conditional
likelihood of each training sample in a mini-batch.
NOTE:
...
...
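Editor's sketch (not part of the commit): the sequence probability defined in the operator comment, $P(s) = (1/Z) \exp(a_{s_1} + b_{s_L} + \sum x_{s_l} + \sum w_{s_{l-1},s_l})$, can be evaluated in log space with the forward recursion. The function `CrfLogLikelihood` is hypothetical. It assumes the `[(D + 2) x D]` transition layout described above (row 0 holds the starting weights, row 1 the ending weights, rows 2 onward the tag-to-tag weights) and handles one sequence at a time; it is not the Paddle kernel, which works on exponentiated values and whole mini-batches.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Numerically stable log(sum(exp(v))).
double LogSumExp(const Vec& v) {
  const double m = *std::max_element(v.begin(), v.end());
  double s = 0.0;
  for (double e : v) s += std::exp(e - m);
  return m + std::log(s);
}

// x: [L x D] emission weights, trans: [(D + 2) x D], s: L tag indices.
// Returns log P(s) = score(s) - log Z.
double CrfLogLikelihood(const Mat& x, const Mat& trans,
                        const std::vector<int>& s) {
  if (x.empty() || s.size() != x.size()) {
    throw std::invalid_argument("emission and label lengths must match");
  }
  const std::size_t len = x.size();
  const std::size_t d = x[0].size();

  // Unnormalized log score of the labelled path.
  double score = trans[0][s[0]] + trans[1][s[len - 1]];
  for (std::size_t l = 0; l < len; ++l) score += x[l][s[l]];
  for (std::size_t l = 1; l < len; ++l) score += trans[2 + s[l - 1]][s[l]];

  // Forward recursion: alpha[v] is the log-sum of all prefixes ending in v.
  Vec alpha(d);
  for (std::size_t v = 0; v < d; ++v) alpha[v] = trans[0][v] + x[0][v];
  for (std::size_t l = 1; l < len; ++l) {
    Vec next(d);
    for (std::size_t v = 0; v < d; ++v) {
      Vec terms(d);
      for (std::size_t u = 0; u < d; ++u) terms[u] = alpha[u] + trans[2 + u][v];
      next[v] = LogSumExp(terms) + x[l][v];
    }
    alpha = next;
  }
  for (std::size_t v = 0; v < d; ++v) alpha[v] += trans[1][v];
  return score - LogSumExp(alpha);  // log Z is LogSumExp(alpha)
}
```

A useful check on any such implementation is that exponentiating the result and summing over every possible tag sequence of a given length gives 1.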
@@ -179,8 +181,8 @@ class LinearChainCRFOp : public framework::OperatorWithKernel {
  }
 protected:
  // Explicitly set that the data type of output of the linear_chain_crf
  // operator is determined by its input "Emission".
  // Explicitly set that the data type of computation kernel of linear_chain_crf
  // is determined by its input "Emission".
  framework::DataType IndicateDataType(
      const framework::ExecutionContext &ctx) const override {
    return framework::ToDataType(ctx.Input<LoDTensor>("Emission")->type());
...
...
paddle/operators/linear_chain_crf_op.h
View file @
483947c4
...
...
@@ -134,7 +134,7 @@ class LinearChainCRFOpKernel : public framework::OpKernel<T> {
    Tensor emission_row_max;
    emission_row_max.mutable_data<T>(
        framework::make_ddim({static_cast<int>(batch_size), 1}),
        framework::make_ddim({static_cast<int64_t>(batch_size), 1}),
        platform::CPUPlace());
    auto place = ctx.GetEigenDevice<platform::CPUPlace>();
...
...
@@ -273,7 +273,7 @@ class LinearChainCRFOpKernel : public framework::OpKernel<T> {
    const int *lbl = label.data<int>();
    PADDLE_ENFORCE_LT(
        *std::max_element(lbl, lbl + seq_length), tag_num,
        static_cast<size_t>(*std::max_element(lbl, lbl + seq_length)), tag_num,
        "An invalid tag label that execesses the largest tag number.");
// Calculate the nominator part, which depends on the label sequence.
...
...
paddle/operators/load_op.cc
View file @
483947c4
...
...
@@ -115,14 +115,18 @@ class LoadOpProtoMaker : public framework::OpProtoAndCheckerMaker {
  LoadOpProtoMaker(framework::OpProto *proto,
                   framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddOutput("Out", "The tensor need to be loaded");
    AddComment(R"DOC(Load Operator
Load operator will load a tensor variable from disk file.
)DOC");
    AddOutput("Out", "(Tensor) The tensor need to be loaded");
    AddAttr<std::string>("file_path",
                         "(string) "
                         "Variable will be loaded from \"file_path\".")
        .AddCustomChecker(
            [](const std::string &path) { return !path.empty(); });
    AddComment(R"DOC(
Load Operator.
Load operator will load a tensor variable from disk file.
)DOC");
  }
};
}  // namespace operators
...
...
paddle/operators/lod_rank_table_op.cc
0 → 100644
View file @
483947c4
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/lod_rank_table.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {

class LoDRankTableOp : public framework::OperatorBase {
 public:
  LoDRankTableOp(const std::string &type,
                 const framework::VariableNameMap &inputs,
                 const framework::VariableNameMap &outputs,
                 const framework::AttributeMap &attrs)
      : OperatorBase(type, inputs, outputs, attrs) {}
  void Run(const framework::Scope &scope,
           const platform::DeviceContext &dev_ctx) const override {
    auto x = scope.FindVar(Input("X"))->Get<framework::LoDTensor>();
    auto *out =
        scope.FindVar(Output("Out"))->GetMutable<framework::LoDRankTable>();
    out->Reset(x.lod(), static_cast<size_t>(Attr<int>("level")));
  }
};

class LoDRankTableOpProtoMaker : public framework::OpProtoAndCheckerMaker {
 public:
  LoDRankTableOpProtoMaker(framework::OpProto *proto,
                           framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X",
             "(LoDTensor) input lod tensor, must contain lod information.");
    AddOutput("Out", "(LoDRankTable) The rank table of specific level.");
    AddAttr<int>("level", "(int) the specific lod level to rank.")
        .SetDefault(0)
        .EqualGreaterThan(0);
    AddComment(R"DOC(Create LoDRanTable by LoDTensor
LoD Rank Table stores the `level` of `lod` which is ordered by sequence
length in descending order. It is useful when implement dynamic RNN and is
shared by dynamic RNN memory, dynamic RNN slice input and dynamic RNN slice
output operators.
)DOC");
  }
};

class LoDRankTableInferShape : public framework::InferShapeBase {
 public:
  void operator()(framework::InferShapeContext *context) const override {
    PADDLE_ENFORCE(context->HasInput("X"), "LoDRankTable must has input X");
  }
};

class LoDRankTableInferVarType : public framework::VarTypeInference {
 public:
  void operator()(const framework::OpDescBind &op_desc,
                  framework::BlockDescBind *block) const override {
    for (auto &o : op_desc.Output("Out")) {
      block->Var(o)->SetType(framework::VarDesc::LOD_RANK_TABLE);
    }
  }
};

}  // namespace operators
}  // namespace paddle
REGISTER_OPERATOR(lod_rank_table, paddle::operators::LoDRankTableOp,
                  paddle::operators::LoDRankTableOpProtoMaker,
                  paddle::operators::LoDRankTableInferShape,
                  paddle::operators::LoDRankTableInferVarType,
                  paddle::framework::EmptyGradOpMaker);
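Editor's sketch (not part of the commit): the operator comment says the rank table stores one LoD level ordered by sequence length in descending order. The idea can be illustrated as below. `BuildRankTable` and `RankItem` are hypothetical names, the LoD level is assumed to be a vector of cumulative offsets, and keeping ties in their original order via a stable sort is an assumption; the real logic lives in `paddle/framework/lod_rank_table.cc`, which is not shown in this section.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// (original sequence index, sequence length)
using RankItem = std::pair<std::size_t, std::size_t>;

// offsets holds cumulative sequence boundaries, e.g. {0, 2, 5, 6} describes
// three sequences of lengths 2, 3 and 1.
std::vector<RankItem> BuildRankTable(const std::vector<std::size_t>& offsets) {
  std::vector<RankItem> items;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    items.emplace_back(i, offsets[i + 1] - offsets[i]);
  }
  // Longest sequence first; equal lengths keep their original order.
  std::stable_sort(items.begin(), items.end(),
                   [](const RankItem& a, const RankItem& b) {
                     return a.second > b.second;
                   });
  return items;
}
```

Ordering sequences this way lets a dynamic RNN process a shrinking batch: at each time step only the leading entries of the table are still active.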
paddle/operators/lookup_table_op.cc
View file @
483947c4
...
...
@@ -53,21 +53,27 @@ class LookupTableOpMaker : public framework::OpProtoAndCheckerMaker {
                     framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("W",
             "An input represents embedding tensors,"
             " which is a learnable parameter.");
             "An input represents embedding tensors, "
             "which is a learnable parameter.");
    AddInput("Ids",
             "An input with type int32 or int64"
             "contains the ids to be looked up in W."
             "Ids must be a column vector with rank = 2."
             "The 2nd dimension size must be 1");
    AddOutput("Out", "The lookup results, which have the same type with W.");
    AddAttr<bool>("is_sparse", "Sparse update").SetDefault(false);
             "An input with type int32 or int64 "
             "contains the ids to be looked up in W. "
             "Ids must be a column vector with rank = 2. "
             "The 2nd dimension size must be 1.");
    AddOutput("Out", "The lookup results, which have the same type as W.");
    AddAttr<bool>("is_sparse",
                  "(boolean, default false) "
                  "Sparse update")
        .SetDefault(false);
    AddComment(R"DOC(
Lookup Table Operator.
This operator is used to perform lookups on the parameter W,
then concatenated into a dense tensor.
The input `Ids` can carry the LoD (Level of Details) information,
or not. And the output only shares the LoD with input `Ids`.
The input Ids can carry the LoD (Level of Details) information,
or not. And the output only shares the LoD information with input Ids.
)DOC");
  }
};
...
...
paddle/operators/lrn_op.cc
View file @
483947c4
...
...
@@ -45,72 +45,70 @@ class LRNOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  LRNOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", R"DOC(
(Tensor) The input of LRN operator. It must be a 4D tenor with NCHW format.
)DOC");
    AddInput("X",
             "(Tensor) The input of LRN operator. "
             "It must be a 4D tenor with NCHW format.");
    AddOutput("Out",
              "(Tensor) The output of LRN operator, which is also the 4D "
              "tensor with NCHW format.");
    AddOutput("MidOut", R"Doc(
(Tensor)Middle result of lrn op.It's computed in forward process
and also used in backward process.
)Doc");
    AddAttr<int>("n", R"DOC(
(int, default 5)n is “adjacent” kernel maps at the same spatial position.
)DOC")
    AddOutput("MidOut",
              "(Tensor) Middle result of LRN operator. It's computed in "
              "forward process and also used in backward process.");
    AddAttr<int>("n",
                 "(int default 5) "
                 "n is the \"adjacent\" kernel that maps "
                 "at the same spatial position.")
        .SetDefault(5)
        .GreaterThan(0);
    AddAttr<T>("k", R"DOC(
(float, default 2.0)k is the bias.
)DOC")
    AddAttr<T>("k",
               "(float, default 2.0) "
               "k is the bias.")
        .SetDefault(2.0)
        .GreaterThan(0.0);
    AddAttr<T>("alpha", R"DOC(
(float, default 0.0001)alpha is the scale number.
)DOC")
    AddAttr<T>("alpha",
               "(float, default 0.0001) "
               "alpha is the scale number.")
        .SetDefault(0.0001)
        .GreaterThan(0.0);
    AddAttr<T>("beta", R"DOC(
(float, default 0.75)beta is the power number.
)DOC")
    AddAttr<T>("beta",
               "(float, default 0.75) "
               "beta is the power number.")
        .SetDefault(0.75)
        .GreaterThan(0.0);
    AddComment(R"DOC(
Local Response Normalization.
Local Response Normalization Operator.
This Function comes from the paper
"ImageNet Classification with Deep Convolutional Neural Networks".
This operator comes from the paper
"ImageNet Classification with Deep Convolutional Neural Networks".
The original formula is:
The original formula is:
                                Input(i, x, y)
Output(i, x, y) = ----------------------------------------------
                      -- upper
                    (k + alpha * >  (Input(j, x, y))^2) ^ (beta)
                      -- j = lower
$$
Output(i, x, y) = Input(i, x, y) / \left(
k + \alpha \sum\limits^{\min(C, c + n/2)}_{j = \max(0, c - n/2)}
(Input(j, x, y))^2
\right)^{\beta}
$$
upper is `min(C, c + n/2)`
lower if `max(0, c - n/2)`
Function implementation:
Function implementation:
Inputs and outpus are in NCHW format, while input.shape.ndims() equals 4.
And dimensions 0 ~ 3 represent batch size, feature maps, rows,
and columns, respectively.
inputs and outpus is NCHW format, while input.shape.ndims() is equal 4.
And the meaning of each dimension(0-3) is respectively batch size,
feature maps, rows and columns.
Input and Output in the formula above is for each map(i) of one image, and
Input(i, x, y), Output(i, x, y) represents an element in an image.
Input and Output in the above formula is for each map(i) of one image, and
Input(i, x, y), Output(i, x, y) represents an element in an image.
C is the number of feature maps of one image. n is a hyper-parameter
configured when operator is initialized. The sum in the denominator
is the sum of the same positions in the neighboring maps.
C is the number of feature maps of one image, and n is a hyper-parameters
is configured when Function is initialized. The sum in the denominator
is the sum of the same position in the neighboring maps.
)DOC");
)DOC");
  }
};
...
...
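Editor's sketch (not part of the commit): the normalization formula in the comment above can be tried out for a single spatial position, where the input is just the vector of values across the C feature maps. `LrnAcrossChannels` is a hypothetical helper. It reads the channel index in the summation bounds as the output channel, and it clamps the upper bound to `C - 1` because the channels are 0-based here; that clamping is an assumption on top of the `min(C, c + n/2)` written in the comment.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// in[c] is Input(c, x, y) for one fixed (x, y); returns Output(c, x, y).
std::vector<double> LrnAcrossChannels(const std::vector<double>& in, int n,
                                      double k, double alpha, double beta) {
  const int channels = static_cast<int>(in.size());
  std::vector<double> out(in.size());
  for (int c = 0; c < channels; ++c) {
    const int lower = std::max(0, c - n / 2);
    const int upper = std::min(channels - 1, c + n / 2);
    double sum_sq = 0.0;
    for (int j = lower; j <= upper; ++j) sum_sq += in[j] * in[j];
    // Divide by (k + alpha * sum of squares over the window) ^ beta.
    out[c] = in[c] / std::pow(k + alpha * sum_sq, beta);
  }
  return out;
}
```

For a full NCHW tensor the same computation would be repeated for every batch item, row and column.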
paddle/operators/lstm_op.cc
View file @
483947c4
...
...
@@ -103,7 +103,7 @@ class LSTMOpMaker : public framework::OpProtoAndCheckerMaker {
    AddInput("H0",
             "(Tensor, optional) the initial hidden state is an optional "
             "input. This is a tensor with shape (N x D), where N is the "
             "batch size, D is the hidden size.")
             "batch size and D is the hidden size.")
        .AsDispensable();
    AddInput("C0",
             "(Tensor, optional) the initial cell state is an optional "
...
...
@@ -134,85 +134,82 @@ class LSTMOpMaker : public framework::OpProtoAndCheckerMaker {
    AddOutput("BatchGate",
              "(LoDTensor) This LoDTensor contains input gate, forget gate "
              "and output gate after the nonlinear computation. This "
              "LoDTensor has the same shape with the reorganized input, which "
              "LoDTensor has the same shape as the reorganized input, which "
              "is also be called batch input. The LoD size is 2. The first "
              "LoD is the batch offsets and the second LoD contains the "
              "indexes, which denote the position of reorganized sequence "
              "in the raw input.")
        .AsIntermediate();
    AddOutput("BatchCellPreAct",
              "(LoDTensor) This LoDTensor is got in the forward and used "
              "(LoDTensor) This LoDTensor is obtained in the forward and used "
              "in the backward.")
        .AsIntermediate();
    AddAttr<bool>("usePeepholes",
                  "(bool, defalut: True) "
                  "(bool, default True) "
                  "whether to enable diagonal/peephole connections.")
        .SetDefault(true);
    AddAttr<bool>("isReverse",
                  "(bool, defalut: False) "
                  "(bool, default False) "
                  "whether to compute reversed LSTM.")
        .SetDefault(false);
    AddAttr<std::string>("gateActivation",
                         "(string, default: sigmoid)"
                         "(string, default sigmoid)"
                         "The activation for input gate, forget gate and output "
                         "gate, `sigmoid` by default.")
        .SetDefault("sigmoid");
    AddAttr<std::string>("cellActivation",
                         "(string, default: tanh)"
                         "(string, default tanh)"
                         "The activation for cell output, `tanh` by defalut.")
        .SetDefault("tanh");
    AddAttr<std::string>("candidateActivation",
                         "(string, default: tanh)"
                         "(string, default tanh)"
                         "The activation for candidate hidden state, "
                         "`tanh` by default.")
        .SetDefault("tanh");
    AddComment(R"DOC(Long-Short Term Memory (LSTM) Operator
    AddComment(R"DOC(
Long-Short Term Memory (LSTM) Operator.
The defalut implementation is diagonal/peephole connection [1], the formula is
as follows
The defalut implementation is diagonal/peephole connection
(https://arxiv.org/pdf/1402.1128.pdf), the formula is as follows:
i_t = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i)
$$
i_t = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i) \\
f_t = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f)
f_t = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f) \\
\tilde{c_t} = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c)
\tilde{c_t} = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c) \\
o_t = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o)
o_t = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o) \\
c_t = f_t ⊙ c_{t-1} + i_t ⊙ \tilde{c_t}
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c_t} \\
h_t = o_t ⊙ act_h(c_t)
h_t = o_t \odot act_h(c_t)
$$
where the W terms denote weight matrices (e.g. \f$W_{xi}\f$ is the matrix
of weights from the input gate to the input), \f$W_{ic}, W_{fc}, W_{oc}\f$
are diagonal weight matrices for peephole connections. In our implenmention,
We use vectors to reprenset these diagonal weight matrices. The b terms
are diagonal weight matrices for peephole connections. In our implementation,
we use vectors to reprenset these diagonal weight matrices. The b terms
denote bias vectors (\f$b_i\f$ is the input gate bias vector), \f$\sigma\f$
is the non-line actications, such as logistic sigmoid function, and
\f$i, f, o\f$ and \f$c\f$ are respectively the input gate, forget gate,
output gate and cell activation vectors, all of which are the same size as
is the non-line activations, such as logistic sigmoid function, and
\f$i, f, o\f$ and \f$c\f$ are the input gate, forget gate, output gate,
and cell activation vectors, respectively, all of which have the same size as
the cell output activation vector \f$h\f$.
The ⊙ is the element-wise product of the vectors, \f$act_g\f$ and \f$act_h\f$
are the cell input and cell output activation functions, `tanh` is usually
The \f$\odot\f$ is the element-wise product of the vectors. \f$act_g\f$ and \f$act_h\f$
are the cell input and cell output activation functions and `tanh` is usually
used for them. \f$\tilde{c_t}\f$ is also called candidate hidden state,
which is computed based on the current input and the previous hidden state.
Set `usePeepholes` False to disable peephole connection [2]. The formula
Set usePeepholes False to disable peephole connection
(http://www.bioinf.jku.at/publications/older/2604.pdf). The formula
is omitted here.
@note These \f$W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}\f$
operations on the input x_{t} were NOT included in this operator.
Note that these \f$W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}\f$
operations on the input \f$x_{t}\f$ are NOT included in this operator.
Users can choose to use fully-connect operator before LSTM operator.
[1] Hasim Sak, Andrew Senior, and Francoise Beaufays. Long short-term memory
recurrent neural network architectures for large scale acoustic modeling.
INTERSPEECH, 2014.
[2] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory.
Neural Computation, 9(8):1735-1780, 1997.
)DOC");
  }
};
...
...
paddle/operators/lstm_unit_op.cc
View file @
483947c4
...
...
@@ -57,17 +57,22 @@ class LstmUnitOpMaker : public framework::OpProtoAndCheckerMaker {
"The cell state tensor of last time-step in the Lstm Unit operator."
);
    AddOutput("C", "The cell tensor of Lstm Unit operator.");
    AddOutput("H", "The hidden state tensor of Lstm Unit operator.");
    AddComment(R"DOC(Lstm-Unit Operator
    AddAttr<float>("forget_bias",
                   "(float, default 0.0) "
                   "The forget bias of Lstm Unit.")
        .SetDefault(0.0);
    AddComment(R"DOC(
Lstm Unit Operator
Equation:
i, f, o, j = split(X)
C = C_prev * sigm(f + forget_bias) + sigm(i) * tanh(j)
H = C * sigm(o)
$$
i, f, o, j = split(X) \\
C = C_{prev} * sigm(f + forget\_bias) + sigm(i) * tanh(j) \\
H = C * sigm(o)
$$
)DOC");
    AddAttr<float>("forget_bias", "The forget bias of Lstm Unit.")
        .SetDefault(0.0);
  }
};
...
...
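Editor's sketch (not part of the commit): the three Lstm Unit equations in the comment above translate almost line for line into code. `LstmUnit` and `LstmUnitOutput` are hypothetical names, and the sketch assumes that `split(X)` cuts a flat vector of length 4 * D into four contiguous blocks in the order i, f, o, j; the actual memory layout used by the Paddle kernel is not shown in this section.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct LstmUnitOutput {
  std::vector<double> c;  // new cell state C
  std::vector<double> h;  // new hidden state H
};

double Sigm(double v) { return 1.0 / (1.0 + std::exp(-v)); }

// x holds the pre-activations [i | f | o | j], each block of size D.
LstmUnitOutput LstmUnit(const std::vector<double>& x,
                        const std::vector<double>& c_prev,
                        double forget_bias) {
  const std::size_t d = c_prev.size();
  if (x.size() != 4 * d) {
    throw std::invalid_argument("x must hold four blocks of size D");
  }
  LstmUnitOutput out{std::vector<double>(d), std::vector<double>(d)};
  for (std::size_t t = 0; t < d; ++t) {
    const double i = x[t];
    const double f = x[d + t];
    const double o = x[2 * d + t];
    const double j = x[3 * d + t];
    // C = C_prev * sigm(f + forget_bias) + sigm(i) * tanh(j)
    out.c[t] = c_prev[t] * Sigm(f + forget_bias) + Sigm(i) * std::tanh(j);
    // H = C * sigm(o), exactly as written in the operator comment
    out.h[t] = out.c[t] * Sigm(o);
  }
  return out;
}
```

Note that the comment defines H as `C * sigm(o)` without an extra `tanh` on C, and the sketch follows the comment as written.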
paddle/operators/margin_rank_loss_op.cc
View file @
483947c4
...
...
@@ -55,8 +55,6 @@ class MarginRankLossOpMaker : public framework::OpProtoAndCheckerMaker {
"(2-D tensor with shape [batch_size x 1]) "
"The label indicating X1 ranked higher than X2 or not, "
"can only be +1 or -1."
);
    AddAttr<T>("margin", "(scalar, default 0) Margin for MarginRankLossOp.")
        .SetDefault(static_cast<T>(0));
    AddOutput("Activated",
              "(2-D tensor with shape [batch_size x 1]) Intermediate tensor "
              "to indicate whether each element of Output(Out) is activated.")
...
...
@@ -64,23 +62,26 @@ class MarginRankLossOpMaker : public framework::OpProtoAndCheckerMaker {
    AddOutput("Out",
              "(2-D tensor with shape [batch_size x 1]) "
              "The output loss of MarginRankLoss operator.");
    AddAttr<T>("margin", "(scalar, default 0) Margin for MarginRankLossOp.")
        .SetDefault(static_cast<T>(0));
    AddComment(R"DOC(
MarginRankLoss Operator.
MarginRankLoss operator measures the loss given a pair of training sample
This operator measures the loss given a pair of training sample
{`X1`, `X2`} and the `Label` with attribute `margin`, where `Label = +1`
indicating X1 is ranked higher than `X2`, otherwise `Label = -1`. The loss
turns out
indicating X1 is ranked higher than `X2` and `Label = -1` otherwise. The loss
is calculated as:
loss(X1, X2, Label) = max(0, -Label * (X1 - X2) + margin).
$loss(X1, X2, Label) = \max(0, -Label * (X1 - X2) + margin)$
The attribute `margin` involved here helps make the predictions more robust.
The attribute `margin` here helps make the predictions more robust.
Denote the item ranked higher as the positive sample, otherwise the negative
sample. If the score of the two samples satisfies
positive sample - negative sample < margin,
$positive sample - negative sample < margin$
the pair of samples will contribute to the final loss, which will backpropogate
and train the ranking model to enlarge the difference of the two score.
the pair of samples will contribute to the final loss, which will backpropagate
and train the ranking model to enlarge the difference between the two scores.
For batch input with size `batch_size`, `X1`, `X2` and `Label`
all have the same shape [batch_size x 1].
...
...
paddle/operators/matmul_op.cc
View file @
483947c4
...
...
@@ -144,7 +144,10 @@ class MatMulOpMaker : public framework::OpProtoAndCheckerMaker {
 )DOC")
         .SetDefault(false);
     AddComment(R"DOC(
-The MatMul operator is used to perform (batched) matrix multiplication
+MatMul Operator.
+This operator is used to perform (batched) matrix multiplication
over the last two dimensions of the input tensors `X` and `Y`.
If a transpose flag is specified, the last two dimensions of the
...
...
@@ -166,7 +169,8 @@ The differences are:
- We add `transpose_X` and `transpose_Y` flags.
Both the input `X` and `Y` can carry the LoD (Level of Details) information,
-or not. But the output only shares the LoD with input `X`.
+or not. But the output only shares the LoD information with input `X`.
 )DOC");
}
};
...
...
paddle/operators/mean_op.cc
View file @
483947c4
...
...
@@ -36,7 +36,11 @@ class MeanOpMaker : public framework::OpProtoAndCheckerMaker {
       : OpProtoAndCheckerMaker(proto, op_checker) {
     AddInput("X", "The input of mean op");
     AddOutput("Out", "The output of mean op");
-    AddComment(R"DOC( Mean Operator
+    AddComment(R"DOC(
+Mean Operator.
+Out is a scalar which is the mean of all elements in X.
 )DOC");
}
};
...
...
paddle/operators/minus_op.cc
View file @
483947c4
...
...
@@ -52,14 +52,16 @@ class MinusOpMaker : public framework::OpProtoAndCheckerMaker {
     AddInput("Y", "The right tensor of minus operator.");
     AddOutput("Out", "The output tensor of minus operator.");
-    AddComment(R"DOC(Minus Operator
+    AddComment(R"DOC(
+Minus Operator.
 Equation:
-    Out = X - Y
+    $Out = X - Y$
 Both the input `X` and `Y` can carry the LoD (Level of Details) information,
-or not. But the output only shares the LoD with input `X`.
+or not. But the output only shares the LoD information with input `X`.
 )DOC");
}
};
...
...
paddle/operators/modified_huber_loss_op.cc
View file @
483947c4
...
...
@@ -43,27 +43,35 @@ class ModifiedHuberLossOpMaker : public framework::OpProtoAndCheckerMaker {
                           framework::OpAttrChecker* op_checker)
       : OpProtoAndCheckerMaker(proto, op_checker) {
     AddInput("X",
-             "The input tensor of modified huber loss op."
+             "The input tensor of modified huber loss op. "
              "X is 2-D tensor with shape [batch_size, 1].");
     AddInput("Y",
-             "The target labels of modified huber loss op."
-             "The shape of Y is same as X. Values of Y must be 0 or 1.");
+             "The target labels of modified huber loss op. "
+             "The shape of Y is the same as X. Values of Y must be 0 or 1.");
     AddOutput("IntermediateVal",
               "Variable to save intermediate result which will be reused in "
               "backward processing.")
         .AsIntermediate();
     AddOutput("Out", "Classification loss for X.");
     AddComment(R"DOC(
-Modified huber loss is used in binary classification problem. The shape of
-input X and target Y are both [N, 1] and so is the shape of output loss.
-Since target Y is not differentiable, cacluating gradient for Y is illegal.
-The formulation of modified huber loss is:
-L(y, f(x)) = max(0, 1 - yf(x))^2 for yf(x) >= -1,
-             -4yf(x)             otherwise.
-Make sure the values of target label Y are in {0, 1} here. The operator will
+Modified Huber Loss Operator.
+This operator is used in binary classification problem. The shape of
+input X and target Y are both [N, 1] and so is the shape of the output loss.
+Since target Y is not differentiable, calculating gradient for Y is illegal.
+The formula of modified huber loss is:
+$$
+L(y, f(x)) =
+\begin{cases}
+(\max(0, 1 - yf(x)))^2, \text{if} \ yf(x) >= -1 \\
+-4yf(x), \quad \text{otherwise}
+\end{cases}
+$$
+Make sure the values of target label Y are in {0, 1} here. This operator will
 scale values of Y to {-1, +1} when computing losses and gradients.
 )DOC");
}
};
...
...
paddle/operators/momentum_op.cc
View file @
483947c4
...
...
@@ -75,17 +75,23 @@ class MomentumOpMaker : public framework::OpProtoAndCheckerMaker {
     AddOutput("VelocityOut", "(Tensor) Output updated velocity");
     AddAttr<float>("mu", "(float) Momentum coefficient");
-    AddAttr<bool>("useNesterov", "(bool) Use Nesterov Momentum")
+    AddAttr<bool>("useNesterov",
+                  "(bool, default false) "
+                  "Use Nesterov Momentum")
         .SetDefault(false);
     AddComment(R"DOC(
-Momentum Algorithm with a flag for Nestrov Moemntum (momentum).
-velocity = mu * velocity + gradient
-if (use_nesterov):
-   param = param - gradient * learning_rate + mu * velocity * learning_rate
-else:
-   param = param - learning_rate * velocity
+Momentum Optimizer.
+This optimizer has a flag for Nesterov Momentum.
+The update equations are as follows:
+$$
+velocity = mu * velocity + gradient \\
+if (use\_nesterov): \\
+  param = param - gradient * learning\_rate + mu * velocity * learning\_rate \\
+else: \\
+  param = param - learning\_rate * velocity. \\
+$$
 )DOC");
}
...
...
paddle/operators/mul_op.cc
View file @
483947c4
...
...
@@ -78,6 +78,7 @@ class MulOpMaker : public framework::OpProtoAndCheckerMaker {
     AddOutput("Out", "The output of mul op");
     AddAttr<int>(
         "x_num_col_dims",
+        "(int, default 1) "
R"DOC(mul_op can take tensors with more than two dimensions as input `X`,
in that case, tensors will be reshaped to a matrix. The matrix's first
dimension(column length) will be the product of tensor's last
...
...
@@ -88,20 +89,24 @@ class MulOpMaker : public framework::OpProtoAndCheckerMaker {
         .EqualGreaterThan(1);
     AddAttr<int>(
         "y_num_col_dims",
+        "(int, default 1) "
         R"DOC(mul_op can take tensors with more than two dimensions as input `Y`,
         in that case, tensors will be reshaped to a matrix. Just like input `X`.
         )DOC")
         .SetDefault(1)
         .EqualGreaterThan(1);
     AddComment(R"DOC(
-Mul operator is used to perform matrix multiplication for input X and Y.
+Mul Operator.
+This operator is used to perform matrix multiplication for input X and Y.
 The equation is:
-    Out = X * Y
+$$Out = X * Y$$
 Both the input `X` and `Y` can carry the LoD (Level of Details) information,
-or not. But the output only shares the LoD with input `X`.
+or not. But the output only shares the LoD information with input `X`.
 )DOC");
}
};
...
...
paddle/operators/multiplex_op.cc
View file @
483947c4
...
...
@@ -66,7 +66,8 @@ class MultiplexOpMaker : public framework::OpProtoAndCheckerMaker {
     AddInput("X", "The candidate tensors of multiplex operator.")
         .AsDuplicable();
     AddOutput("Out", "The output tensor of multiplex operator.");
-    AddComment(R"DOC(Multiplex operator
+    AddComment(R"DOC(
+Multiplex Operator.
Multiplex multiple tensors according to the index provided by the index tensor.
...
...
@@ -77,10 +78,11 @@ the (Ids[i])-th tensor.
For i-th row of the output tensor:
-y[i] = x_{k}[i]
+$$y[i] = x_{k}[i]$$
-where y is the output tensor. `x_{k}` is the k-th input tensor
+where `y` is the output tensor, `x_{k}` is the k-th input tensor,
 and `k = Ids[i]`.
 )DOC");
}
};
...
...
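The row-selection rule `y[i] = x_{k}[i]` with `k = Ids[i]` from the Multiplex comment can be sketched as follows. This is an illustrative stand-alone function, assuming each candidate is a row-major matrix stored as nested vectors; `Matrix` and `Multiplex` are hypothetical names, not Paddle types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// For each row i, copy row i of the candidate selected by ids[i].
Matrix Multiplex(const std::vector<int>& ids,
                 const std::vector<Matrix>& candidates) {
  Matrix out;
  out.reserve(ids.size());
  for (std::size_t i = 0; i < ids.size(); ++i) {
    const int k = ids[i];
    assert(k >= 0 && static_cast<std::size_t>(k) < candidates.size());
    assert(i < candidates[k].size());
    out.push_back(candidates[k][i]);  // y[i] = x_k[i]
  }
  return out;
}
```

Each output row is taken whole from exactly one candidate, so the output has as many rows as `ids` has entries.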
paddle/operators/name_convention.md
View file @
483947c4
...
...
@@ -44,17 +44,21 @@ public:
     AddOutput("Out", "(Tensor) Accumulated output tensor");
     AddAttr<float>("gamma", "(float, default 1.0) Accumulation multiplier").SetDefault(1.0f);
     AddComment(R"DOC(
-Accumulate operator accumulates the input tensor to the output tensor. If the
+Accumulate Operator.
+This operator accumulates the input tensor to the output tensor. If the
 output tensor already has the right size, we add to it; otherwise, we first
 initialize the output tensor to all zeros, and then do accumulation. Any
 further calls to the operator, given that no one else fiddles with the output
 in the interim, will do simple accumulations.
-Accumulation is done as shown:
+Accumulation is done as follows:
 Out = 1*X + gamma*Out
 where X is the input tensor, Out is the output tensor and gamma is the multiplier
 argument.
 )DOC");
}
};
...
...
paddle/operators/nccl_op.cc
View file @
483947c4
...
...
@@ -48,12 +48,17 @@ class NCCLInitOpMaker : public framework::OpProtoAndCheckerMaker {
       : OpProtoAndCheckerMaker(proto, op_checker) {
     AddOutput("Communicator",
               "Create Communicator for communicating between gpus");
-    AddAttr<std::vector<int>>("gpus", "gpu id lists");
-    AddAttr<int>("data_type", "output data type")
+    AddAttr<std::vector<int>>("gpus", "(vector<int>) GPU id lists");
+    AddAttr<int>("data_type",
+                 "(int, default 5 (FP32)) "
+                 "Output data type")
         .SetDefault(framework::DataType::FP32);
     AddComment(R"DOC(
-create communicator.
-)DOC");
+NCCLInit Operator.
+Create communicator.
+)DOC");
}
};
...
...
@@ -143,11 +148,15 @@ class NCCLAllReduceOpMaker : public framework::OpProtoAndCheckerMaker {
     AddInput("Communicator", "Communicator for communicating between gpus");
     AddOutput("Out", "The output of AllReduce op");
     AddAttr<std::string>("reduction",
+                         "(string, default 'ncclSum') "
                          "{'ncclMin', 'ncclMax', 'ncclProd', 'ncclSum'}.")
         .SetDefault("ncclSum");
     AddComment(R"DOC(
-AllReduce the input tensors.
-)DOC");
+NCCLAllReduce Operator.
+AllReduce the input tensors.
+)DOC");
}
};
...
...
@@ -161,14 +170,20 @@ class NCCLReduceOpMaker : public framework::OpProtoAndCheckerMaker {
     AddInput("Communicator", "Communicator for communicating between gpus");
     AddOutput("Out", "The output of Reduce op");
     AddAttr<std::string>("reduction",
+                         "(string, default 'ncclSum') "
                          "{'ncclMin', 'ncclMax', 'ncclProd', 'ncclSum'}.")
         .SetDefault("ncclSum");
     AddAttr<int>("root",
-                 "root gpu of the parameter. if not "
-                 "set(platform::kInvalidGPUId). hashed by name.")
+                 "(int, default kInvalidGPUId) "
+                 "Root gpu of the parameter. If not, "
+                 "set(platform::kInvalidGPUId). Hashed by name.")
         .SetDefault(platform::kInvalidGPUId);
     AddComment(R"DOC(
-Reduce the tensors)DOC");
+NCCLReduce Operator.
+Reduce the tensors.
+)DOC");
}
};
...
...
@@ -182,12 +197,16 @@ class NCCLBcastOpMaker : public framework::OpProtoAndCheckerMaker {
     AddInput("Communicator", "Communicator for communicating between gpus");
     AddOutput("Out", "The output of Bcast");
     AddAttr<int>("root",
-                 "root gpu of the parameter. if not "
-                 "set(platform::kInvalidGPUId). hashed by name.")
+                 "(int, default kInvalidGPUId) "
+                 "Root gpu of the parameter. If not, "
+                 "set(platform::kInvalidGPUId). Hashed by name.")
         .SetDefault(platform::kInvalidGPUId);
     AddComment(R"DOC(
-Bcast the tensors.
-)DOC");
+NCCLBcast Operator.
+Bcast the tensors.
+)DOC");
}
};
...
...
paddle/operators/pad_op.cc
View file @
483947c4
...
...
@@ -54,41 +54,44 @@ class PadOpMaker : public framework::OpProtoAndCheckerMaker {
"The input of pad op. "
"The input should be a k-D tensor(k > 0 and k < 7)"
);
     AddOutput("Out",
-              "The output of pad op."
+              "The output of pad op. "
               "A tensor with the same shape as X.");
+    AddAttr<std::vector<int>>(
+        "paddings",
+        "(vector<int>) "
+        "A list<int> to describe the padding rules for each dimension. "
+        "For 2-D image tensor, paddings=[0, 1, 2, 3] means "
+        "padding 0 row to top, 1 row to bottom, 2 columns to left "
+        "and 3 columns to right. Size of paddings should be equal to "
+        "2 * dimension size of the input tensor.");
+    AddAttr<float>("pad_value",
+                   "(float, default 0.0) "
+                   "The value to fill the padded areas.")
+        .SetDefault(0.0f);
     AddComment(R"DOC(
-Pad input into output, as specified by paddings and pad_value. The input should be a k-D tensor(k > 0 and k < 7). As an example:
+Pad Operator.
+Pad input into output, as specified by paddings and pad_value.
+The input should be a k-D tensor(k > 0 and k < 7). As an example:
 Given:
 X = [[1, 2],
-     [3, 4]]
-and
+     [3, 4]],
-paddings = [0, 1, 1, 2]
+paddings = [0, 1, 1, 2],
 and
-pad_value = 0
+pad_value = 0,
-then we get
+we have:
 Out = [[0, 1, 2, 0, 0]
        [0, 3, 4, 0, 0]
        [0, 0, 0, 0, 0]]
 )DOC");
-    AddAttr<std::vector<int>>(
-        "paddings",
-        "A list<int> to describes padding rules for each dimension."
-        " For 2-D image tensor, paddings=[0, 1, 2, 3] means"
-        " padding 0 row to top, 1 row to bottom, 2 columns to left"
-        " and 3 columns to right.Size of paddings should be equal to"
-        " 2 * dimension size of input tensor.");
-    AddAttr<float>("pad_value",
-                   "(float) default to 0; "
-                   "The value to fill padded areas.")
-        .SetDefault(0.0f);
}
};
...
...
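The worked example in the Pad comment (a 2-D input with `paddings = [0, 1, 1, 2]`) can be reproduced with the sketch below. It handles only the 2-D case and assumes the layout `[top, bottom, left, right]` described for the `paddings` attribute; `Matrix` and `Pad2D` are hypothetical names, not the operator's kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// paddings = {top, bottom, left, right}; padded cells are set to pad_value.
Matrix Pad2D(const Matrix& x, const std::vector<int>& paddings,
             float pad_value = 0.0f) {
  assert(paddings.size() == 4);
  for (int p : paddings) assert(p >= 0);
  const std::size_t rows = x.size();
  const std::size_t cols = rows == 0 ? 0 : x[0].size();
  const std::size_t top = static_cast<std::size_t>(paddings[0]);
  const std::size_t bottom = static_cast<std::size_t>(paddings[1]);
  const std::size_t left = static_cast<std::size_t>(paddings[2]);
  const std::size_t right = static_cast<std::size_t>(paddings[3]);

  // Allocate the enlarged output pre-filled with pad_value.
  Matrix out(rows + top + bottom,
             std::vector<float>(cols + left + right, pad_value));
  // Copy the input into its shifted position.
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[r + top][c + left] = x[r][c];
    }
  }
  return out;
}
```

Applied to `X = [[1, 2], [3, 4]]` with `paddings = [0, 1, 1, 2]` and `pad_value = 0`, this yields the 3 x 5 matrix shown in the comment.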
paddle/operators/pool_op.cc
View file @
483947c4
...
...
@@ -73,125 +73,138 @@ Pool2dOpMaker::Pool2dOpMaker(framework::OpProto *proto,
   AddInput(
       "X",
       "(Tensor) The input tensor of pooling operator. "
-      "The format of input tensor is NCHW. Where N is batch size, C is the "
-      "number of channels, H and W is the height and width of feature.");
+      "The format of input tensor is NCHW, where N is batch size, C is the "
+      "number of channels, H is the height of the feature, "
+      "and W is the width of the feature.");
   AddOutput("Out",
-            "(Tensor) The output tensor of pooling operator."
-            "The format of output tensor is also NCHW."
-            "Where N is batch size, C is "
-            "the number of channels, H and W is the height and "
-            "width of feature.");
+            "(Tensor) The output tensor of pooling operator. "
+            "The format of output tensor is also NCHW, "
+            "where N is batch size, C is the number of channels, "
+            "H is the height of the feature, "
+            "and W is the width of the feature.");
   AddAttr<std::string>("poolingType",
                        "(string), pooling type, can be \"max\" for max-pooling "
                        "and \"avg\" for average-pooling.")
       .InEnum({"max", "avg"});
   AddAttr<std::vector<int>>("ksize",
-                            "(vector ), the pooling window size(height, width) "
-                            "of pooling operator."
+                            "(vector<int>) The pooling window "
+                            "size(height, width) of the pooling operator. "
                             "If globalPooling = true, ksize and paddings will "
                             "be ignored.");  // TODO(Chengduo): Add checker.
                                              // (Currently,
   // TypedAttrChecker don't support vector type.)
   AddAttr<bool>("globalPooling",
-                "(bool default: false), whether to use the global pooling."
+                "(bool, default false) Whether to use the global pooling. "
                 "If globalPooling = true, ksize and paddings will be ignored.")
       .SetDefault(false);
-  AddAttr<std::vector<int>>(
-      "strides",
-      "(vector, default:{1, 1}), strides(height, width) of pooling operator.")
+  AddAttr<std::vector<int>>("strides",
+                            "(vector<int>, default {1, 1}), strides(height, "
+                            "width) of pooling operator.")
       .SetDefault({1, 1});  // TODO(Chengduo): Add checker. (Currently,
   // TypedAttrChecker don't support vector type.)
   AddAttr<std::vector<int>>(
       "paddings",
-      "(vector defalut:{0,0}), paddings(height, width) of pooling operator."
+      "(vector<int>, default {0,0}), paddings(height, width) of pooling "
+      "operator."
       "If globalPooling = true, paddings and ksize will be ignored.")
       .SetDefault({0, 0});  // TODO(Chengduo): Add checker. (Currently,
   // TypedAttrChecker don't support vector type.)
   AddComment(R"DOC(
+Pool2d Operator.
 The pooling2d operation calculates the output based on
 the input, poolingType and ksize, strides, paddings parameters.
-Input(X) and output(Out) are in NCHW format. Where N is batch size, C is the
-number of channels, H and W is the height and width of feature.
+Input(X) and output(Out) are in NCHW format, where N is batch size, C is the
+number of channels, H is the height of the feature, and W is the width of the feature.
 Parameters(ksize, strides, paddings) are two elements.
 These two elements represent height and width, respectively.
 The input(X) size and output(Out) size may be different.
 Example:
   Input:
-       X shape: (N, C, H_in, W_in)
+       X shape: $(N, C, H_{in}, W_{in})$
   Output:
-       Out shape: (N, C, H_out, W_out)
+       Out shape: $(N, C, H_{out}, W_{out})$
   where
-       H_out = (H_in - ksize[0] + 2 * paddings[0]) / strides[0] + 1;
-       W_out = (W_in - ksize[1] + 2 * paddings[1]) / strides[1] + 1;
+       $$
+       H_{out} = (H_{in} - ksize[0] + 2 * paddings[0]) / strides[0] + 1 \\
+       W_{out} = (W_{in} - ksize[1] + 2 * paddings[1]) / strides[1] + 1
+       $$
 )DOC");
 }
 Pool3dOpMaker::Pool3dOpMaker(framework::OpProto *proto,
                              framework::OpAttrChecker *op_checker)
     : OpProtoAndCheckerMaker(proto, op_checker) {
-  AddInput(
-      "X",
+  AddInput("X",
            "(Tensor) The input tensor of pooling operator. "
-           "The format of input tensor is NCDHW. Where N is batch size, C is "
-           "the number of channels, D, H and W is the depth, height and width of "
-           "feature.");
+           "The format of input tensor is NCDHW, where N is batch size, C is "
+           "the number of channels, and D, H and W is the depth, height and "
+           "width of "
+           "the feature, respectively.");
   AddOutput("Out",
             "(Tensor) The output tensor of pooling operator."
-            "The format of output tensor is also NCDHW."
-            "Where N is batch size, C is "
-            "the number of channels, D, H and W is the depth, height and "
-            "width of feature.");
+            "The format of output tensor is also NCDHW, "
+            "where N is batch size, C is "
+            "the number of channels, and D, H and W is the depth, height and "
+            "width of the feature, respectively.");
   AddAttr<std::string>("poolingType",
-                       "(string), pooling type, can be \"max\" for max-pooling "
+                       "(string) Pooling type, can be \"max\" for max-pooling "
                        "and \"avg\" for average-pooling.")
       .InEnum({"max", "avg"});
-  AddAttr<std::vector<int>>("ksize",
-                            "(vector ), the pooling window size(depth, height, "
-                            "width) of pooling "
-                            "operator."
-                            "If globalPooling = true, ksize and paddings wille "
+  AddAttr<std::vector<int>>(
+      "ksize",
+      "(vector<int>) The pooling window size(depth, height, "
+      "width) of pooling operator. "
+      "If globalPooling = true, ksize and paddings will "
       "be ignored.");  // TODO(Chengduo): Add checker.
                        // (Currently,
   // TypedAttrChecker don't support vector type.)
   AddAttr<bool>("globalPooling",
-                "(bool default: false), whether to use the global pooling."
+                "(bool, default false) Whether to use the global pooling. "
                 "If globalPooling = true, ksize and paddings will be ignored.")
       .SetDefault(false);
-  AddAttr<std::vector<int>>("strides",
-                            "(vector, default:{1,1,1}), strides(depth, height, "
-                            "width) of pooling operator.")
+  AddAttr<std::vector<int>>(
+      "strides",
+      "(vector<int>, default {1,1,1}) Strides(depth, height, "
+      "width) of the pooling operator.")
       .SetDefault({1, 1, 1});  // TODO(Chengduo): Add checker. (Currently,
   // TypedAttrChecker don't support vector type.)
   AddAttr<std::vector<int>>(
       "paddings",
-      "(vector defalut:{0,0,0}), paddings(depth, height, "
-      "width) of pooling operator."
-      "If globalPooling = true, ksize and paddings wille be ignored.")
+      "(vector<int>, default {0,0,0}), paddings(depth, height, "
+      "width) of pooling operator. "
+      "If globalPooling = true, ksize and paddings will be ignored.")
       .SetDefault({0, 0, 0});  // TODO(Chengduo): Add checker. (Currently,
   // TypedAttrChecker don't support vector type.)
   AddComment(R"DOC(
+Pool3d Operator.
 The pooling3d operation calculates the output based on
-the input, poolingType and ksize, strides, paddings parameters.
-Input(X) and output(Out) are in NCDHW format. Where N is batch
-size, C is the number of channels, D, H and W is the depth, height and
-width of feature. Parameters(ksize, strides, paddings) are three elements.
-These three elements represent depth, height and width, respectively.
-The input(X) size and output(Out) size may be different.
+the input, poolingType, ksize, strides, and paddings parameters.
+Input(X) and output(Out) are in NCDHW format, where N is batch
+size, C is the number of channels, and D, H and W are the depth, height and
+width of the feature, respectively. Parameters(ksize, strides, paddings)
+are three elements. These three elements represent depth, height and
+width, respectively.
+The input(X) size and output(Out) size may be different.
 Example:
   Input:
-       X shape: (N, C, D_in, H_in, W_in)
+       X shape: $(N, C, D_{in}, H_{in}, W_{in})$
   Output:
-       Out shape: (N, C, D_out, H_out, W_out)
+       Out shape: $(N, C, D_{out}, H_{out}, W_{out})$
   where
-       D_out = (D_in - ksize[0] + 2 * paddings[0]) / strides[0] + 1;
-       H_out = (H_in - ksize[1] + 2 * paddings[1]) / strides[1] + 1;
-       W_out = (W_in - ksize[2] + 2 * paddings[2]) / strides[2] + 1;
+  $$
+  D_{out} = (D_{in} - ksize[0] + 2 * paddings[0]) / strides[0] + 1 \\
+  H_{out} = (H_{in} - ksize[1] + 2 * paddings[1]) / strides[1] + 1 \\
+  W_{out} = (W_{in} - ksize[2] + 2 * paddings[2]) / strides[2] + 1
+  $$
 )DOC");
 }
 }  // namespace operators
...
...
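The output-size formula shared by the Pool2d and Pool3d comments applies independently to each spatial dimension, which the sketch below makes explicit. `PoolOutputSize` is a hypothetical helper, and the use of truncating integer division is an assumption: the comments write a plain `/` without stating how non-divisible sizes are rounded.

```cpp
#include <cassert>

// out = (in - ksize + 2 * padding) / stride + 1, for one spatial dimension.
// Assumes integer (truncating) division, which the docstring does not specify.
int PoolOutputSize(int input_size, int ksize, int padding, int stride) {
  assert(stride > 0);
  assert(input_size + 2 * padding >= ksize);
  return (input_size - ksize + 2 * padding) / stride + 1;
}
```

For a 2-D pool, one would call it once with `ksize[0]`, `paddings[0]`, `strides[0]` for the height and once with index 1 for the width; a 3-D pool adds a third call for the depth.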
paddle/operators/pool_with_index_op.cc
View file @
483947c4
...
...
@@ -89,64 +89,73 @@ class MaxPool2dWithIndexOpMaker : public framework::OpProtoAndCheckerMaker {
       : OpProtoAndCheckerMaker(proto, op_checker) {
     AddInput(
         "X",
-        "(Tensor), the input tensor of pooling operator. "
-        "The format of input tensor is NCHW. Where N is batch size, C is the "
-        "number of channels, H and W is the height and width of image.");
+        "(Tensor) The input tensor of pooling operator. "
+        "The format of input tensor is NCHW, where N is batch size, C is the "
+        "number of channels, H is the height of the image, "
+        "and W is the width of the image.");
     AddOutput("Out",
-              "(Tensor), the output tensor of pooling operator."
-              "The format of output tensor is also NCHW."
-              "Where N is batch size, C is "
-              "the number of channels, H and W is the height and "
-              "width of image.");
+              "(Tensor) The output tensor of pooling operator. "
+              "The format of output tensor is also NCHW, "
+              "where N is batch size, C is "
+              "the number of channels, H is the height of the image "
+              "and W is the width of the image.");
     AddOutput("Mask",
-              "(Tensor), the Mask tensor of pooling operator."
-              "The format of output tensor is also NCHW."
-              "Where N is batch size, C is the number of channels, H and W "
-              "is the height and width of image."
-              "The value in it is the index in current feature map");
+              "(Tensor) The Mask tensor of pooling operator."
+              "The format of output tensor is also NCHW, "
+              "where N is batch size, C is the number of channels, "
+              "H is the height of the image, "
+              "and W is the width of the image. "
+              "It represents the index in the current feature map.");
     AddAttr<std::vector<int>>("ksize",
-                              "(vector), the pooling window size(height, "
-                              "width) of pooling operator."
+                              "(vector<int>) The pooling window size(height, "
+                              "width) of pooling operator. "
                               "If globalPooling = true, ksize and paddings "
                               "will be ignored.");  // TODO(Chengduo): Add
                                                     // checker. (Currently,
     // TypedAttrChecker don't support vector type.)
     AddAttr<bool>("globalPooling",
-                  "(bool default: false), whether to use the global pooling."
+                  "(bool, default false) Whether to use the global pooling. "
                   "If globalPooling = true, ksize and paddings will be ignored.")
         .SetDefault(false);
-    AddAttr<std::vector<int>>(
-        "strides",
-        "(vector, default:{1, 1}), strides(height, width) of pooling operator.")
+    AddAttr<std::vector<int>>("strides",
+                              "(vector<int>, default {1, 1}), strides(height, "
+                              "width) of pooling operator.")
         .SetDefault({1, 1});  // TODO(Chengduo): Add checker. (Currently,
     // TypedAttrChecker don't support vector type.)
     AddAttr<std::vector<int>>(
         "paddings",
-        "(vector defalut:{0, 0}), paddings(height, width) of pooling operator."
+        "(vector<int>, default {0, 0}), paddings(height, width) of pooling "
+        "operator. "
         "If globalPooling = true, paddings and ksize will be ignored.")
         .SetDefault({0, 0});  // TODO(Chengduo): Add checker. (Currently,
     // TypedAttrChecker don't support vector type.)
     AddComment(R"DOC(
+MaxPool2d Operator.
 The maxPooling2d with index operation calculates the output and the mask
-based on the input and ksize, strides, paddings parameters. Input(X) and
-output(Out, Mask) are in NCHW format. Where N is batch size, C is the
-number of channels, H and W is the height and width of feature.
+based on the input, ksize, strides, and paddings parameters. Input(X) and
+output(Out, Mask) are in NCHW format, where N is batch size, C is the
+number of channels, H is the height of the feature,
+and W is the width of the feature.
 Parameters(ksize, strides, paddings) are two elements.
 These two elements represent height and width, respectively.
 The input(X) size and output(Out, Mask) size may be different.
 Example:
   Input:
-       X shape: (N, C, H_in, W_in)
+       X shape: $(N, C, H_{in}, W_{in})$
   Output:
-       Out shape: (N, C, H_out, W_out)
-       Mask shape: (N, C, H_out, W_out)
+       Out shape: $(N, C, H_{out}, W_{out})$
+       Mask shape: $(N, C, H_{out}, W_{out})$
   where
-       H_out = (H_in - ksize[0] + 2 * paddings[0]) / strides[0] + 1;
-       W_out = (W_in - ksize[1] + 2 * paddings[1]) / strides[1] + 1;
+       $$
+       H_{out} = (H_{in} - ksize[0] + 2 * paddings[0]) / strides[0] + 1 \\
+       W_{out} = (W_{in} - ksize[1] + 2 * paddings[1]) / strides[1] + 1
+       $$
 )DOC");
}
};
...
...
@@ -156,70 +165,76 @@ class MaxPool3dWithIndexOpMaker : public framework::OpProtoAndCheckerMaker {
   MaxPool3dWithIndexOpMaker(framework::OpProto *proto,
                             framework::OpAttrChecker *op_checker)
       : OpProtoAndCheckerMaker(proto, op_checker) {
-    AddInput(
-        "X",
-        "(Tensor), the input tensor of pooling operator. "
-        "The format of input tensor is NCDHW. Where N is batch size, C is "
-        "the number of channels, D, H and W is the depth, height and width of "
-        "image.");
+    AddInput("X",
+             "(Tensor) The input tensor of pooling operator. "
+             "The format of input tensor is NCDHW, where N is batch size, C is "
+             "the number of channels, and D, H and W are the depth, height and "
+             "width of "
+             "the image, respectively");
     AddOutput("Out",
-              "(Tensor), the output tensor of pooling operator."
-              "The format of output tensor is also NCDHW."
-              "Where N is batch size, C is "
-              "the number of channels, D, H and W is the depth, height and "
-              "width of image.");
+              "(Tensor) The output tensor of pooling operator. "
+              "The format of output tensor is also NCDHW, "
+              "where N is the batch size, C is the number of channels, "
+              "and D, H and W are the depth, height and "
+              "width of the image, respectively.");
     AddOutput("Mask",
-              "(Tensor), the Mask tensor of pooling operator."
-              "The format of output tensor is also NCDHW."
-              "Where N is batch size, C is the number of channels, D, H and W "
-              "is the depth, height and width of image."
-              "The value in it is the index in current feature map");
+              "(Tensor) The Mask tensor of pooling operator. "
+              "The format of output tensor is also NCDHW, "
+              "where N is the batch size, C is the number of channels, and "
+              "D, H and W are the depth, height and width "
+              "of the image, respectively. "
+              "It represents the index in the current feature map.");
     AddAttr<std::vector<int>>("ksize",
-                              "(vector), the pooling window size(depth, "
-                              "height, width) of pooling "
-                              "operator."
+                              "(vector<int>) The pooling window size(depth, "
+                              "height, width) of pooling operator. "
                               "If globalPooling = true, ksize and paddings "
                               "will be ignored.");  // TODO(Chengduo): Add
                                                     // checker. (Currently,
     // TypedAttrChecker don't support vector type.)
     AddAttr<bool>("globalPooling",
-                  "(bool default: false), whether to use the global pooling."
+                  "(bool, default false) Whether to use the global pooling. "
                   "If globalPooling = true, ksize and paddings will be ignored.")
         .SetDefault(false);
     AddAttr<std::vector<int>>("strides",
-                              "(vector, default:{1,1,1}), strides(depth, "
+                              "(vector<int>, default {1,1,1}), strides(depth, "
                               "height, width) of pooling operator.")
         .SetDefault({1, 1, 1});  // TODO(Chengduo): Add checker. (Currently,
     // TypedAttrChecker don't support vector type.)
     AddAttr<std::vector<int>>(
         "paddings",
-        "(vector defalut:{0,0,0}), paddings(depth, "
-        "height, width) of pooling operator."
+        "(vector, default {0,0,0}), paddings(depth, "
+        "height, width) of pooling operator. "
         "If globalPooling = true, paddings and ksize will be ignored.")
         .SetDefault({0, 0, 0});  // TODO(Chengduo): Add checker. (Currently,
     // TypedAttrChecker don't support vector type.)
     AddComment(R"DOC(
+MaxPool3d Operator.
 The maxpooling3d with index operation calculates the output and the mask
 based on the input and ksize, strides, paddings parameters.
-Input(X) and output(Out, Mask) are in NCDHW format. Where N is batch
-size, C is the number of channels, D, H and W is the depth, height and
-width of feature. Parameters(ksize, strides, paddings) are three elements.
+Input(X) and output(Out, Mask) are in NCDHW format, where N is batch
+size, C is the number of channels, and D, H and W are the depth, height and
+width of the feature, respectively.
+Parameters(ksize, strides, paddings) are three elements.
 These three elements represent depth, height and width, respectively.
 The input(X) size and output(Out, Mask) size may be different.
 Example:
   Input:
-       X shape: (N, C, D_in, H_in, W_in)
+       X shape: $(N, C, D_{in}, H_{in}, W_{in})$
   Output:
-       Out shape: (N, C, D_out, H_out, W_out)
-       Mask shape: (N, C, D_out, H_out, W_out)
+       Out shape: $(N, C, D_{out}, H_{out}, W_{out})$
+       Mask shape: $(N, C, D_{out}, H_{out}, W_{out})$
   where
-       D_out = (D_in - ksize[0] + 2 * paddings[0]) / strides[0] + 1;
-       H_out = (H_in - ksize[1] + 2 * paddings[1]) / strides[1] + 1;
-       W_out = (W_in - ksize[2] + 2 * paddings[2]) / strides[2] + 1;
+       $$
+       D_{out} = (D_{in} - ksize[0] + 2 * paddings[0]) / strides[0] + 1 \\
+       H_{out} = (H_{in} - ksize[1] + 2 * paddings[1]) / strides[1] + 1 \\
+       W_{out} = (W_{in} - ksize[2] + 2 * paddings[2]) / strides[2] + 1
+       $$
 )DOC");
}
};
...
...
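To make the relationship between `Out` and `Mask` in the max-pool-with-index comments concrete, the sketch below pools one H x W feature map without padding and records, for each output cell, the position of the selected maximum. It is an illustration under stated assumptions: `PoolResult` and `MaxPool2dWithIndex` are hypothetical names, and storing the mask as the flat offset `h * W + w` within the current feature map is an assumed encoding that the comment itself does not spell out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result holder: pooled values, their flat indices, and the shape.
struct PoolResult {
  std::vector<float> out;
  std::vector<int> mask;
  int out_h;
  int out_w;
};

// Max-pools a single row-major h x w feature map, with no padding.
PoolResult MaxPool2dWithIndex(const std::vector<float>& x, int h, int w,
                              int ksize_h, int ksize_w, int stride_h,
                              int stride_w) {
  assert(static_cast<int>(x.size()) == h * w);
  assert(ksize_h > 0 && ksize_w > 0 && stride_h > 0 && stride_w > 0);
  assert(h >= ksize_h && w >= ksize_w);
  PoolResult r;
  r.out_h = (h - ksize_h) / stride_h + 1;
  r.out_w = (w - ksize_w) / stride_w + 1;
  for (int oh = 0; oh < r.out_h; ++oh) {
    for (int ow = 0; ow < r.out_w; ++ow) {
      // Start from the window's top-left element, then scan the window.
      int best = (oh * stride_h) * w + (ow * stride_w);
      for (int kh = 0; kh < ksize_h; ++kh) {
        for (int kw = 0; kw < ksize_w; ++kw) {
          const int idx = (oh * stride_h + kh) * w + (ow * stride_w + kw);
          if (x[idx] > x[best]) best = idx;  // first maximum wins on ties
        }
      }
      r.out.push_back(x[best]);
      r.mask.push_back(best);
    }
  }
  return r;
}
```

Keeping the index next to the value is what lets a backward pass route each output gradient to the single input cell that produced the maximum.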
paddle/operators/precision_recall_op.cc
View file @
483947c4
...
...
@@ -92,76 +92,78 @@ class PrecisionRecallOpMaker : public framework::OpProtoAndCheckerMaker {
                          framework::OpAttrChecker *op_checker)
       : OpProtoAndCheckerMaker(proto, op_checker) {
     AddInput("MaxProbs",
-             "(Tensor, default Tensor<float>), a 2-D tensor with shape N x 1, "
+             "(Tensor, default Tensor<float>) A 2-D tensor with shape N x 1, "
              "where N is the batch size. Each row contains the max probability "
              "of an instance which computed by the previous top_k (k=1) "
              "operator.");
     AddInput("Indices",
-             "(Tensor, default Tensor<int>), a 2-D tensor with shape N x 1, "
+             "(Tensor, default Tensor<int>) A 2-D tensor with shape N x 1, "
              "where N is the batch size. Each row contains the corresponding "
              "index which computed by the previous top_k (k=1) operator.");
     AddInput("Labels",
-             "(Tensor, default Tensor<int>), a 2-D tensor with shape N x 1, "
+             "(Tensor, default Tensor<int>) A 2-D tensor with shape N x 1, "
              "where N is the batch size. Each element is a label and the "
              "value should be in [0, class_number - 1].");
     AddInput("Weights",
-             "(Tensor, default Tensor<float>), a 2-D tensor with shape N x 1, "
+             "(Tensor, default Tensor<float>) A 2-D tensor with shape N x 1, "
              "where N is the batch size. This input is optional. If provided, "
              "weight of instance would be considered when computing metrics.")
         .AsDispensable();
     AddInput("StatesInfo",
-             "(Tensor, default Tensor<int>), a 2-D tensor with shape D x 4, "
+             "(Tensor, default Tensor<int>) A 2-D tensor with shape D x 4, "
              "where D is the number of classes. This input is optional. If "
              "provided, current state will be accumulated to this state and "
-             "the accumulation state will be as the output state.")
+             "the accumulation state will be the output state.")
         .AsDispensable();
     AddOutput("BatchMetrics",
-              "(Tensor, default Tensor<float>), a 1-D tensor with shape {6}."
-              "This output tensor contains metrics for current batch data."
+              "(Tensor, default Tensor<float>) A 1-D tensor with shape {6}. "
+              "This output tensor contains metrics for current batch data. "
               "The layout is [macro average precision, macro average recall, "
               "macro f1 score, micro average precision, micro average recall, "
-              "micro f1 score]");
+              "micro f1 score].");
     AddOutput("AccumMetrics",
-              "(Tensor, default Tensor<float>), a 1-D tensor with shape {6}."
-              "This output tensor contains metrics for accumulated data."
+              "(Tensor, default Tensor<float>) A 1-D tensor with shape {6}. "
+              "This output tensor contains metrics for accumulated data. "
               "The layout is [macro average precision, macro average recall, "
               "macro f1 score, micro average precision, micro average recall, "
-              "micro f1 score]");
+              "micro f1 score].");
     AddOutput("AccumStatesInfo",
-              "(Tensor, default Tensor<float>), a 2-D tensor with shape D x 4, "
+              "(Tensor, default Tensor<float>) A 2-D tensor with shape D x 4, "
               "where D is equal to class number. This output tensor contains "
               "accumulated state variables used to compute metrics. The layout "
               "for each class is [true positives, false positives, "
               "true negatives, false negatives].");
-    AddAttr<int>("class_number", "Number of classes to be evaluated.");
+    AddAttr<int>("class_number", "(int) Number of classes to be evaluated.");
     AddComment(R"DOC(
-When given 'Input(Indices)' and 'Input(Labels)', this operator can be used
+Precision Recall Operator.
+When given Input(Indices) and Input(Labels), this operator can be used
 to compute various metrics including:
-  - macro average precision
-  - macro average recall
-  - macro f1 score
-  - micro average precision
-  - micro average recall
-  - micro f1 score
+1. macro average precision
+2. macro average recall
+3. macro f1 score
+4. micro average precision
+5. micro average recall
+6. micro f1 score
 To compute the above metrics, we need to do statistics for true positives,
-false positives and false negatives. Here count of true negatives is not
+false positives and false negatives. Here the count of true negatives is not
 necessary, but counting it may provide potential usage and the cost is
-trivial, so the operator also provides count of true negatives.
+trivial, so the operator also provides the count of true negatives.
 We define state as a 2-D tensor with shape [class_number, 4]. Each row of a
 state contains statistic variables for corresponding class. Layout of each row
 is: TP(true positives), FP(false positives), TN(true negatives),
-FN(false negatives). If 'Input(Weights)' provided, TP, FP, TN, FN will be
-calculated by given weight instead of instance count.
+FN(false negatives). If Input(Weights) is provided, TP, FP, TN, FN will be
+calculated by given weight instead of the instance count.
This operator also supports metrics computing for cross-batch situation. To
achieve this,
'Input(StatesInfo)'
should be provided. State of current batch
data will be accumulated to
'Input(StatesInfo)' and 'Output(AccumStatesInfo)'
achieve this,
Input(StatesInfo)
should be provided. State of current batch
data will be accumulated to
Input(StatesInfo) and Output(AccumStatesInfo)
is the accumulation state.
'Output(BatchMetrics)'
is metrics of current batch data while
'Output(AccumStatesInfo)'
is metrics of accumulation data.
Output(BatchMetrics)
is metrics of current batch data while
Output(AccumMetrics)
is metrics of accumulation data.
)DOC"
);
}
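The six metrics follow directly from the per-class state rows [TP, FP, TN, FN]. The following is a minimal C++ sketch of that derivation, not the operator's kernel. `ClassState`, `SafeDivide`, `F1`, and `ComputeMetrics` are hypothetical names. Returning 1.0 when a denominator is zero, and taking macro F1 as the F1 of the macro-averaged precision and recall, are assumptions of this sketch.

```cpp
#include <array>
#include <cassert>
#include <vector>

// One row of the state tensor: [TP, FP, TN, FN] for a single class.
using ClassState = std::array<double, 4>;

// num / den, or 1.0 when den is zero (assumed convention for empty classes).
double SafeDivide(double num, double den) {
  return den > 0.0 ? num / den : 1.0;
}

// Harmonic mean of precision and recall; 0 when both are 0.
double F1(double precision, double recall) {
  const double sum = precision + recall;
  return sum > 0.0 ? 2.0 * precision * recall / sum : 0.0;
}

// Returns [macro P, macro R, macro F1, micro P, micro R, micro F1].
std::array<double, 6> ComputeMetrics(const std::vector<ClassState>& states) {
  assert(!states.empty());
  double macro_p = 0.0, macro_r = 0.0;
  double total_tp = 0.0, total_fp = 0.0, total_fn = 0.0;
  for (const ClassState& s : states) {
    const double tp = s[0], fp = s[1], fn = s[3];
    macro_p += SafeDivide(tp, tp + fp);  // per-class precision
    macro_r += SafeDivide(tp, tp + fn);  // per-class recall
    total_tp += tp;
    total_fp += fp;
    total_fn += fn;
  }
  macro_p /= static_cast<double>(states.size());
  macro_r /= static_cast<double>(states.size());
  const double micro_p = SafeDivide(total_tp, total_tp + total_fp);
  const double micro_r = SafeDivide(total_tp, total_tp + total_fn);
  return {macro_p, macro_r, F1(macro_p, macro_r),
          micro_p, micro_r, F1(micro_p, micro_r)};
}
```

True negatives are carried in the state but are not needed for any of the six values, which matches the remark in the operator comment.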
...
...
paddle/operators/prelu_op.cc
...
...
@@ -41,17 +41,24 @@ class PReluOpMaker : public framework::OpProtoAndCheckerMaker {
PReluOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"The input tensor of prelu operator."
);
AddInput
(
"Alpha"
,
"The alpha weight of PRelu operator."
);
AddOutput
(
"Out"
,
"The output tensor of PRelu operator."
);
AddComment
(
R"DOC(PRelu operator
AddInput
(
"Alpha"
,
"The alpha weight of prelu operator."
);
AddOutput
(
"Out"
,
"The output tensor of prelu operator."
);
AddComment
(
R"DOC(
PRelu Operator.
The equation is:
f(x) = alpha * x , for x < 0
f(x) = x , for x >= 0
$$
f(x) =
\begin{cases}
\alpha * x, \quad \text{if} \ x < 0 \\
x, \qquad \text{if} \ x >= 0
\end{cases}
$$
The input `X` can carry the LoD (Level of Details) information,
or not. And the output shares the LoD with input `X`.
or not. And the output shares the LoD information with input `X`.
)DOC"
);
}
};
...
...
paddle/operators/proximal_adagrad_op.cc
...
...
@@ -83,22 +83,26 @@ class ProximalAdagradOpMaker : public framework::OpProtoAndCheckerMaker {
"L1 regularization strength."
)
.
SetDefault
(
0.0
f
);
AddAttr
<
float
>
(
"l2"
,
"(float, default 0.0)"
"(float, default 0.0)
"
"L2 regularization strength."
)
.
SetDefault
(
0.0
f
);
AddComment
(
R"DOC(
Proximal Adagrad Optimizer.
Optimizer that implements the proximal adagrad algorithm
.
Optimizer that implements the proximal adagrad algorithm
:
moment = moment + grad * grad
prox_param = param - learning_rate * grad * (1 / sqrt(moment))
param = sign(prox_param) / (1 + learning_rate * l2) *
max { |prox_param| - learning_rate * l1 , 0 }
$$
moment = moment + grad * grad \\
prox\_param = param - learning\_rate * grad * (1 / \sqrt{moment}) \\
param = sign(prox\_param) / (1 + learning\_rate * l2) *
\max(|prox\_param| - learning\_rate * l1 , 0)
$$
The paper that proposed Proximal GD:
(http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf)
Here, we use the adagrad learning rate as specified here:
(http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
)DOC"
);
}
};
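The proximal Adagrad update in the comment above can be traced for a single scalar weight. This is a hedged sketch only: `ProximalAdagradState` and `ProximalAdagradStep` are hypothetical names, and the real operator works on tensors rather than scalars.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Parameter and accumulated squared gradient for one scalar weight.
struct ProximalAdagradState {
  float param;
  float moment;
};

ProximalAdagradState ProximalAdagradStep(ProximalAdagradState s, float grad,
                                         float learning_rate, float l1,
                                         float l2) {
  // moment = moment + grad * grad
  s.moment += grad * grad;
  assert(s.moment > 0.0f);  // avoids 0/0 before any gradient has been seen
  // prox_param = param - learning_rate * grad * (1 / sqrt(moment))
  const float prox = s.param - learning_rate * grad / std::sqrt(s.moment);
  // param = sign(prox_param) / (1 + learning_rate * l2)
  //         * max(|prox_param| - learning_rate * l1, 0)
  const float sign = static_cast<float>((prox > 0.0f) - (prox < 0.0f));
  const float shrunk = std::max(std::fabs(prox) - learning_rate * l1, 0.0f);
  s.param = sign * shrunk / (1.0f + learning_rate * l2);
  return s;
}
```

With `l1 = l2 = 0` the step reduces to plain Adagrad, which is a quick sanity check for the formula.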
...
...
paddle/operators/proximal_gd_op.cc
...
...
@@ -67,19 +67,23 @@ class ProximalGDOpMaker : public framework::OpProtoAndCheckerMaker {
"L1 regularization strength."
)
.
SetDefault
(
0.0
f
);
AddAttr
<
float
>
(
"l2"
,
"(float, default 0.0)"
"(float, default 0.0)
"
"L2 regularization strength."
)
.
SetDefault
(
0.0
f
);
AddComment
(
R"DOC(
ProximalGD Operator.
Optimizer that implements the proximal gradient descent algorithm
.
Optimizer that implements the proximal gradient descent algorithm
:
prox_param = param - learning_rate * grad
param = sign(prox_param) / (1 + learning_rate * l2) *
max { |prox_param| - learning_rate * l1 , 0 }
$$
prox\_param = param - learning\_rate * grad \\
param = sign(prox\_param) / (1 + learning\_rate * l2) *
\max(|prox\_param| - learning\_rate * l1, 0)
$$
The paper that proposed Proximal Gradient Descent:
(http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf)
)DOC"
);
}
};
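The second line of the ProximalGD update is a soft-thresholding step followed by an L2 shrink. A minimal scalar sketch, assuming nothing beyond the formula in the comment (`SoftThreshold` and `ProximalGDStep` are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// sign(v) * max(|v| - threshold, 0): moves v toward zero by `threshold`
// and clamps it to exactly zero once it gets that close.
float SoftThreshold(float v, float threshold) {
  const float sign = static_cast<float>((v > 0.0f) - (v < 0.0f));
  return sign * std::max(std::fabs(v) - threshold, 0.0f);
}

float ProximalGDStep(float param, float grad, float learning_rate, float l1,
                     float l2) {
  assert(learning_rate > 0.0f && l1 >= 0.0f && l2 >= 0.0f);
  // prox_param = param - learning_rate * grad
  const float prox = param - learning_rate * grad;
  // param = soft_threshold(prox_param, lr * l1) / (1 + lr * l2)
  return SoftThreshold(prox, learning_rate * l1) /
         (1.0f + learning_rate * l2);
}
```

The clamp to zero is what lets the L1 term produce exactly sparse parameters, which plain gradient descent on an L1 penalty does not.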
...
...
paddle/operators/rank_loss_op.cc
...
...
@@ -26,9 +26,9 @@ class RankLossOp : public framework::OperatorWithKernel {
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
// input check
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Label"
),
"Input(Label) shouldn't be null"
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Left"
),
"Input(Left) shouldn't be null"
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Right"
),
"Input(Right) shouldn't be null"
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Label"
),
"Input(Label) shouldn't be null
.
"
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Left"
),
"Input(Left) shouldn't be null
.
"
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Right"
),
"Input(Right) shouldn't be null
.
"
);
auto
label_dims
=
ctx
->
GetInputDim
(
"Label"
);
auto
left_dims
=
ctx
->
GetInputDim
(
"Left"
);
...
...
@@ -50,32 +50,32 @@ class RankLossOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"Label"
,
"The label indicating A ranked higher than B or not, row vector."
);
AddInput
(
"Left"
,
"The output of RankNet for doc A, vector."
);
AddInput
(
"Right"
,
"The output of RankNet for doc B, vector"
);
AddInput
(
"Right"
,
"The output of RankNet for doc B, vector
.
"
);
AddOutput
(
"Out"
,
"The output loss of RankLoss operator, vector."
);
AddComment
(
R"DOC(RankLoss operator
AddComment
(
R"DOC(
RankLoss Operator.
Rank loss operator for RankNet[1]. RankNet is a pairwise ranking model with
RankLoss operator for RankNet
(http://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf).
RankNet is a pairwise ranking model with
one training sample consisting of a pair of doc A and B, and the label P
indicating that A is ranked higher than B or not:
P = {0, 1} or {0, 0.5, 1}, where 0.5 means no information about the rank of
the input pair.
The RankLoss operator
contain
s three inputs: Left (o_i), Right (o_j) and Label
(P_{i,j}), which represent the output of RankNet for t
wo docs and the label
respectively, and yields the rank loss C_{i,j}
by following the expression
The RankLoss operator
take
s three inputs: Left (o_i), Right (o_j) and Label
(P_{i,j}), which represent the output of RankNet for t
he two docs and the label,
respectively, and yields the rank loss C_{i,j}
using the following equation:
\f
[
\f
$$
C_{i,j} = -\tilde{P_{i,j}} * o_{i,j} + \log(1 + e^{o_{i,j}}) \\
o_{i,j} = o_i - o_j \\
\tilde{P_{i,j}} = \left \{0, 0.5, 1 \right \} \ or \ \left \{0, 1 \right \}
\f
]
\f
$$
The operator can take inputs of one sample or in batch.
[1]. Chris Burges, Tal Shaked, Erin Renshaw, et al. Learning to
Rank using Gradient Descent.
http://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf
)DOC"
);
}
};
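The rank loss for one document pair can be evaluated as below. This is an illustrative sketch, not the operator's kernel: `RankLoss` is a hypothetical name, and rewriting log(1 + e^o) as max(o, 0) + log1p(e^{-|o|}) is a numerical-stability choice made here, mathematically equal to the expression in the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// C = -P * o + log(1 + exp(o)), with o = left - right.
double RankLoss(double label, double left, double right) {
  assert(label >= 0.0 && label <= 1.0);  // P in {0, 0.5, 1} or {0, 1}
  const double o = left - right;
  // Stable form of log(1 + exp(o)) that never exponentiates a large value.
  const double softplus = std::max(o, 0.0) + std::log1p(std::exp(-std::fabs(o)));
  return -label * o + softplus;
}
```

When the two scores are equal the loss is log(2) regardless of the label, and it approaches zero as the score gap grows in the direction the label indicates.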
...
...
paddle/operators/recurrent_op.cc
...
...
@@ -509,14 +509,14 @@ class RecurrentOpProtoMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
kInitialStates
,
"rnn initial states"
).
AsDuplicable
();
AddInput
(
kParameters
,
"Parameters are used by step block as its input. However, the "
"input
s
is not a sequence tensor. Every time step, each operator "
"in step block just use the parameter directly"
)
"input is not a sequence tensor. Every time step, each operator "
"in step block just uses the parameter directly
.
"
)
.
AsDuplicable
();
AddOutput
(
kOutputs
,
"The output sequence of RNN. The sequence length must be same"
)
"The output sequence of RNN. The sequence length must be the same
.
"
)
.
AsDuplicable
();
AddOutput
(
kStepScopes
,
"StepScopes contain
s
all local variables in each time step."
);
"StepScopes contain all local variables in each time step."
);
AddAttr
<
std
::
vector
<
std
::
string
>>
(
kExStates
,
string
::
Sprintf
(
R"DOC(The ex-state variable names.
...
...
@@ -556,10 +556,12 @@ if reverse is True
o o o o
)DOC"
).
SetDefault
(
false
);
AddAttr
<
bool
>
(
kIsTrain
,
""
).
SetDefault
(
true
);
AddComment
(
R"DOC(Static Length Recurrent Operator
AddComment
(
R"DOC(
Static Length Recurrent Operator.
The static length recurrent operator can only operate on fixed size sequence
data, i.e. in each mini-batch, the sequence length of all inputs are the same.
The static length recurrent operator can only operate on fixed-size sequence
data, i.e. in each mini-batch, the sequence lengths of all inputs are the same.
)DOC"
);
}
};
...
...
paddle/operators/reduce_op.cc
...
...
@@ -80,24 +80,27 @@ class ReduceOpMaker : public framework::OpProtoAndCheckerMaker {
public:
ReduceOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"(Tensor) The input tensor. Tensors with rank at most 6 are supported
"
);
AddInput
(
"X"
,
"(Tensor) The input tensor. Tensors with rank at most 6 are "
"supported.
"
);
AddOutput
(
"Out"
,
"(Tensor) The result tensor."
);
AddAttr
<
int
>
(
"dim"
,
"(int, default
1
) The dimension to reduce. "
"(int, default
0
) The dimension to reduce. "
"Must be in the range [-rank(input), rank(input)). "
"If `dim < 0`, the dim to reduce is `rank + dim`. "
"Not
ing
that reducing on the first dim will make the LoD info lost."
)
"Not
e
that reducing on the first dim will make the LoD info lost."
)
.
SetDefault
(
0
);
AddAttr
<
bool
>
(
"keep_dim"
,
"(bool, default false) "
"If true, retain the reduced dimension with length 1."
)
.
SetDefault
(
false
);
comment_
=
R"DOC(
{ReduceOP} operator computes the {reduce} of input tensor along the given dimension.
The result tensor has 1 fewer dimension than the input unless `keep_dim` is true.
{ReduceOp} Operator.
This operator computes the {reduce} of input tensor along the given dimension.
The result tensor has 1 fewer dimension than the input unless keep_dim is true.
)DOC"
;
AddComment
(
comment_
);
}
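The rules for `dim` and `keep_dim` determine the output shape. The sketch below shows only that shape logic. `ReducedShape` is a hypothetical helper, and returning an empty shape when a rank-1 input is reduced without `keep_dim` is an assumption of this sketch. The real operator may report a different shape in that case.

```cpp
#include <cassert>
#include <vector>

// Shape of the result after reducing `in_shape` along `dim`.
std::vector<int> ReducedShape(const std::vector<int>& in_shape, int dim,
                              bool keep_dim) {
  const int rank = static_cast<int>(in_shape.size());
  assert(dim >= -rank && dim < rank);  // must be in [-rank, rank)
  if (dim < 0) dim += rank;            // negative dim counts from the end
  std::vector<int> out;
  for (int i = 0; i < rank; ++i) {
    if (i != dim) {
      out.push_back(in_shape[i]);  // untouched dimension
    } else if (keep_dim) {
      out.push_back(1);  // reduced dimension retained with length 1
    }
  }
  return out;
}
```

For an input of shape [2, 3, 4], `dim = -1` is the same as `dim = 2`, giving [2, 3], or [2, 3, 1] with `keep_dim` set.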
...
...
paddle/operators/reshape_op.cc
...
...
@@ -71,8 +71,11 @@ class ReshapeOpMaker : public framework::OpProtoAndCheckerMaker {
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"The input tensor of reshape operator."
);
AddOutput
(
"Out"
,
"The output tensor of reshape operator."
);
AddAttr
<
std
::
vector
<
int
>>
(
"shape"
,
"Target shape of reshape operator."
);
AddComment
(
R"DOC(Reshape operator
AddAttr
<
std
::
vector
<
int
>>
(
"shape"
,
"(vector<int>) "
"Target shape of reshape operator."
);
AddComment
(
R"DOC(
Reshape Operator.
Reshape Input(X) into the shape specified by Attr(shape).
...
...
@@ -81,7 +84,7 @@ Given a 2-D tensor X with 2 rows and 2 columns
[[1, 2], [3, 4]]
with
target shape = [1, 4], the reshape operator will transform
and
target shape = [1, 4], the reshape operator will transform
the tensor X into a 1-D tensor:
[1, 2, 3, 4]
...
...
paddle/operators/rmsprop_op.cc
...
...
@@ -68,22 +68,22 @@ class RmspropOpMaker : public framework::OpProtoAndCheckerMaker {
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"Param"
,
"(Tensor, default Tensor<float>) "
"Input parameter value that has to be updated"
);
"Input parameter value that has to be updated
.
"
);
AddInput
(
"MeanSquare"
,
"(Tensor, default Tensor<float>)"
" The mean square value that gets updated"
);
" The mean square value that gets updated
.
"
);
AddInput
(
"LearningRate"
,
"(Tensor, default Tensor<float>) "
"The learning rate should be a tensor of size 1"
);
"The learning rate should be a tensor of size 1
.
"
);
AddInput
(
"Grad"
,
"(Tensor, default Tensor<float>) "
"Input gradient of the parameter"
);
"Input gradient of the parameter
.
"
);
AddInput
(
"Moment"
,
"(Tensor, default Tensor<float>) The moment that gets updated"
);
"(Tensor, default Tensor<float>) The moment that gets updated
.
"
);
AddOutput
(
"ParamOut"
,
"(Tensor) Output updated parameter value"
);
AddOutput
(
"MomentOut"
,
"(Tensor) Output updated moment"
);
AddOutput
(
"MeanSquareOut"
,
"(Tensor) Output Mean squared updated value"
);
AddOutput
(
"ParamOut"
,
"(Tensor) Output updated parameter value
.
"
);
AddOutput
(
"MomentOut"
,
"(Tensor) Output updated moment
.
"
);
AddOutput
(
"MeanSquareOut"
,
"(Tensor) Output Mean squared updated value
.
"
);
AddAttr
<
float
>
(
"epsilon"
,
"(float, default 1e-10) Constant "
...
...
@@ -93,18 +93,19 @@ class RmspropOpMaker : public framework::OpProtoAndCheckerMaker {
"(float, default 0.9) "
"Discounting factor for coming gradient."
)
.
SetDefault
(
0.9
f
);
AddAttr
<
float
>
(
"momentum"
,
"(float, default 0.0) Constant value"
)
AddAttr
<
float
>
(
"momentum"
,
"(float, default 0.0) Constant value
.
"
)
.
SetDefault
(
0.0
f
);
AddComment
(
R"DOC(
Rmsprop Optimizer.
RMSprop
MeanSquareOut = decay * MeanSquare + (1 - decay) * Grad * Grad
$$
MeanSquareOut = decay * MeanSquare + (1 - decay) * Grad * Grad \\
MomentOut = momentum * Moment +
LearningRate * Grad / sqrt(MeanSquareOut + epsilon)
\frac{LearningRate * Grad}{\sqrt{MeanSquareOut + epsilon}} \\
ParamOut = Param - MomentOut
$$
The original slides that proposed R
MS
prop: Slide 29 of
The original slides that proposed R
ms
prop: Slide 29 of
(http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
)DOC"
);
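The three RMSprop equations can be applied in order to a single scalar weight. A minimal sketch under that simplification (`RmspropState` and `RmspropStep` are hypothetical names; the operator itself updates whole tensors):

```cpp
#include <cassert>
#include <cmath>

// Per-weight state carried between steps.
struct RmspropState {
  float param;
  float mean_square;
  float moment;
};

RmspropState RmspropStep(RmspropState s, float grad, float learning_rate,
                         float decay, float momentum, float epsilon) {
  assert(epsilon > 0.0f);  // keeps the square root strictly positive
  // MeanSquareOut = decay * MeanSquare + (1 - decay) * Grad * Grad
  s.mean_square = decay * s.mean_square + (1.0f - decay) * grad * grad;
  // MomentOut = momentum * Moment
  //             + LearningRate * Grad / sqrt(MeanSquareOut + epsilon)
  s.moment = momentum * s.moment +
             learning_rate * grad / std::sqrt(s.mean_square + epsilon);
  // ParamOut = Param - MomentOut
  s.param -= s.moment;
  return s;
}
```

Note that `epsilon` sits inside the square root, as in the equation above. Some other formulations add it outside.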
...
...
paddle/operators/save_op.cc
...
...
@@ -163,14 +163,19 @@ class SaveOpProtoMaker : public framework::OpProtoAndCheckerMaker {
SaveOpProtoMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"The tensor need to be saved"
);
AddComment
(
R"DOC(Save operator
Save operator will serialize and write a tensor variable to disk file.
AddInput
(
"X"
,
"(Tensor) Input tensor to be saved"
);
AddComment
(
R"DOC(
Save operator
This operator will serialize and write a tensor variable to file on disk.
)DOC"
);
AddAttr
<
bool
>
(
"overwrite"
,
"Overwrite the output file if exist"
)
AddAttr
<
bool
>
(
"overwrite"
,
"(boolean, default true) "
"Overwrite the output file if it exists"
)
.
SetDefault
(
true
);
AddAttr
<
std
::
string
>
(
"file_path"
,
"Variable will be saved to
\"
file_path
\"
."
)
"(string) "
"The
\"
file_path
\"
where the variable will be saved."
)
.
AddCustomChecker
(
[](
const
std
::
string
&
path
)
{
return
!
path
.
empty
();
});
}
...
...
paddle/operators/scale_op.cc
...
...
@@ -40,13 +40,16 @@ class ScaleOpMaker : public framework::OpProtoAndCheckerMaker {
public:
ScaleOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"The input tensor of scale operator."
);
AddOutput
(
"Out"
,
"The output tensor of scale operator."
);
AddComment
(
R"DOC(Scale operator
AddInput
(
"X"
,
"(Tensor) Input tensor of scale operator."
);
AddOutput
(
"Out"
,
"(Tensor) Output tensor of scale operator."
);
AddComment
(
R"DOC(
Scale operator
The equation is: Out = scale*X
$$Out = scale*X$$
)DOC"
);
AddAttr
<
AttrType
>
(
"scale"
,
"The scaling factor of the scale operator."
)
AddAttr
<
AttrType
>
(
"scale"
,
"(float, default 1.0) "
"The scaling factor of the scale operator."
)
.
SetDefault
(
1.0
);
}
};
...
...
paddle/operators/seq_expand_op.cc
...
...
@@ -53,8 +53,10 @@ class SeqExpandOpMaker : public framework::OpProtoAndCheckerMaker {
"(LoDTensor) The output of seq_expand op. "
"The LoD of the output will be the same as input(Y)'s LoD."
);
AddComment
(
R"DOC(
Expand input(X) according to LOD of input(Y)
.
Seq Expand Operator
.
This operator expands input(X) according to LOD of input(Y).
Following are cases to better explain how this works:
Case 1:
Given a 2-level LoDTensor input(X)
...
...
paddle/operators/sequence_concat_op.cc
...
...
@@ -47,19 +47,19 @@ class SequenceConcatOpMaker : public framework::OpProtoAndCheckerMaker {
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"(
A vector of LoDTensor), the i
nput is a vector of LoDTensor, "
"(
vector<LoDTensor>) I
nput is a vector of LoDTensor, "
"each of which is a variable-length sequence or nested sequence."
)
.
AsDuplicable
();
AddOutput
(
"Out"
,
"(
A LoDTensor), the v
ariable-length output of "
"(
LoDTensor), V
ariable-length output of "
"sequence_concat Op."
);
AddAttr
<
int
>
(
"axis"
,
"(int, default 0)"
"The axis
which the inputs will be joined with
. "
"(int, default 0)
"
"The axis
along which the inputs will be joined
. "
"If axis is 0, the inputs will be joined with LoD index."
)
.
SetDefault
(
0
);
AddAttr
<
int
>
(
"level"
,
"(int, default 0)"
"(int, default 0)
"
"The level at which the inputs will be joined. "
"If the level is 0, the inputs will be joined at the nested "
"sequence level. "
...
...
@@ -68,10 +68,13 @@ class SequenceConcatOpMaker : public framework::OpProtoAndCheckerMaker {
"The level should be less than the level number of inputs."
)
.
SetDefault
(
0
);
AddComment
(
R"DOC(
The sequence_concat operator concatenates multiple LoDTensors.
It only supports sequence (LoD Tensor with level number is 1)
or a nested sequence (LoD tensor with level number is 2) as its input.
- Case1:
Sequence Concat Operator.
The sequence_concat operator concatenates multiple LoDTensors.
It supports a sequence (a LoDTensor with level number 1)
or a nested sequence (a LoDTensor with level number 2) as its input.
The following examples explain how the operator works:
- Case1:
If the axis is other than 0(here, axis is 1 and level is 1),
each input should have the same LoD information and the LoD
information of the output keeps the same as the input.
...
...
@@ -80,7 +83,7 @@ class SequenceConcatOpMaker : public framework::OpProtoAndCheckerMaker {
LoD(x1) = {{0,2,4}, {0,1,2,3,4}}; Dims(x1) = (4,4,4)
LoD(Out) = {{0,2,4}, {0,1,2,3,4}}; Dims(Out) = (4,7,4)
- Case2:
- Case2:
If the axis is 0 (here, level is 0), the inputs are concatenated along
time steps, and the LoD information of the output needs to be recomputed.
...
...
@@ -88,14 +91,15 @@ class SequenceConcatOpMaker : public framework::OpProtoAndCheckerMaker {
LoD(x1) = {{0,3,5}, {0,1,2,3,5}}; Dims(x1) = (5,3,4)
LoD(Out) = {{0,5,9}, {0,1,2,3,4,5,6,7,9}}; Dims(Out) = (9,3,4)
- Case3:
- Case3:
If the axis is 0(here, level is 1).
LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)
LoD(x1) = {{0,3,5}, {0,1,3,4,5}}; Dims(x1) = (5,3,4)
LoD(Out) = {{0,5,9}, {0,2,5,7,9}}; Dims(Out) = (9,3,4)
NOTE: The levels of all the inputs should be the same.
NOTE: The levels of all the inputs should be the same.
)DOC"
);
}
};
...
...
paddle/operators/sequence_conv_op.cc
...
...
@@ -105,10 +105,10 @@ class SequenceConvOpMaker : public framework::OpProtoAndCheckerMaker {
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"(LoDTensor) the input(X) is a LodTensor, which support "
"(LoDTensor) the input(X) is a LodTensor, which support
s
"
"variable-time length input sequence. The underlying tensor in "
"this LoDTensor is a matrix with shape (T, N), where
,
T is the "
"total time steps in this mini-batch
,
N is the input_hidden_size."
);
"this LoDTensor is a matrix with shape (T, N), where T is the "
"total time steps in this mini-batch
and
N is the input_hidden_size."
);
AddInput
(
"PaddingData"
,
"(Tensor, optional) the input(PaddingData) is an optional "
"parameter, and it is learnable. "
...
...
@@ -157,14 +157,16 @@ class SequenceConvOpMaker : public framework::OpProtoAndCheckerMaker {
.
GreaterThan
(
0
);
AddComment
(
R"DOC(
SequenceConvOp performs convolution operation on features of
contextLength time-steps of each instance.
The convolution operation calculates the output based on the input, filter
and strides, paddings parameters. The size of each dimension of the
parameters is checked in the infer-shape. In order to ensure the equal
length of sequence before and after convolution, it is necessary to fill
the top and bottom of each sequence according to context_length,
context_stride and context_start.
Sequence Conv Operator.
SequenceConvOp performs convolution operation on features of contextLength
time-steps of each instance. The convolution operation calculates the output
based on the input, filter, strides and paddings parameters.
The size of each dimension of the parameters is checked during infer-shape.
In order to ensure the equal length of sequence before and after convolution,
it is necessary to fill the top and bottom of each sequence based on
context_length, context_stride and context_start.
)DOC"
);
}
};
...
...
paddle/operators/sequence_pool_op.cc
...
...
@@ -54,33 +54,36 @@ class SequencePoolOpMaker : public framework::OpProtoAndCheckerMaker {
.
SetDefault
(
"AVERAGE"
)
.
InEnum
({
"AVERAGE"
,
"SUM"
,
"SQRT"
,
"LAST"
,
"FIRST"
,
"MAX"
});
AddComment
(
R"DOC(
SequencePoolOp pools features of all time-steps of each instance.
It supports six pooling pooltype:
- AVERAGE: Out[i] = average_{for each instance in i-th sequence}{X[i]}
- SUM: Out[i] = sum_{for each instance in i-th sequence}{X[i]}
- SQRT: Out[i] = sum_{for each instance in i-th sequence}{X[i]}
/ sqrt(i-th sequence length)
- LAST: Out[i] = last instance in i-th sequence X[i]
- FIRST: Out[i] = first instance in i-th sequence X[i]
- MAX: Out[i] = max_{for each instance in i-th sequence}{X[i]}
For a mini-batch of 3 variable-length sentences, containing 2, 3, and 2 time-steps:
Assume X is a [7,M,N] LoDTensor, and X->lod()[0] = [0, 2, 5, 7], 7=2+3+2.
Besides, for the sake of simplicity, we assume M=1 and N=1,
and the value of X = [[1, 3], [2, 4, 6], [5, 1]].
Thus, Out is a [3,1,1] Tensor without LoD information.
And for different pooltype, the value of Out is as follows:
- AVERAGE: [2, 4, 3], where 2=(1+3)/2, 4=(2+4+6)/3, 3=(5+1)/2
- SUM: [4, 12, 6], where 4=1+3, 12=2+4+6, 6=5+1
- SQRT: [2.82, 6.93, 4.24], where 2.82=(1+3)/sqrt(2),
Sequence Pool Operator.
The SequencePoolOp pools features of all time-steps of each instance.
It supports six pooling types:
1. AVERAGE: Out[i] = $$avg(X_i)$$
2. SUM: Out[i] = $$\sum_jX_{ij}$$
3. SQRT: Out[i] = $$\frac{\sum_jX_{ij}}{\sqrt{len(X_i)}}$$
4. LAST: Out[i] = last instance in i-th sequence X[i]
5. FIRST: Out[i] = first instance in i-th sequence X[i]
6. MAX: Out[i] = $$max(X_i)$$
The following example explains how this works:
For a mini-batch of 3 variable-length sentences,
containing 2, 3, and 2 time-steps:
Assume X is a [7,M,N] LoDTensor, and X->lod()[0] = [0, 2, 5, 7], 7=2+3+2.
Besides, for the sake of simplicity, we assume M=1 and N=1,
and the value of X = [[1, 3], [2, 4, 6], [5, 1]].
Thus, Out is a [3,1,1] Tensor without LoD information.
And for different pooltype, the value of Out is as follows:
- AVERAGE: [2, 4, 3], where 2=(1+3)/2, 4=(2+4+6)/3, 3=(5+1)/2
- SUM: [4, 12, 6], where 4=1+3, 12=2+4+6, 6=5+1
- SQRT: [2.82, 6.93, 4.24], where 2.82=(1+3)/sqrt(2),
6.93=(2+4+6)/sqrt(3), 4.24=(5+1)/sqrt(2)
- MAX: [3, 6, 5], where 3=max(1,3), 6=max(2,4,6), 5=max(5,1)
- LAST: [3, 6, 1], where 3=last(1,3), 6=last(2,4,6), 1=last(5,1)
- FIRST: [1, 2, 5], where 1=first(1,3), 2=first(2,4,6), 5=first(5,1)
- MAX: [3, 6, 5], where 3=max(1,3), 6=max(2,4,6), 5=max(5,1)
- LAST: [3, 6, 1], where 3=last(1,3), 6=last(2,4,6), 1=last(5,1)
- FIRST: [1, 2, 5], where 1=first(1,3), 2=first(2,4,6), 5=first(5,1)
)DOC"
);
}
};
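The worked example in the comment (X = [[1, 3], [2, 4, 6], [5, 1]] with LoD [0, 2, 5, 7]) can be reproduced with a small sketch. It assumes M = N = 1, so each time-step is a single value, and it treats the LoD as a flat offset vector. `SequencePool` is a hypothetical name, not the operator's kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Pools each sequence x[lod[i] : lod[i+1]] into one value.
std::vector<double> SequencePool(const std::vector<double>& x,
                                 const std::vector<std::size_t>& lod,
                                 const std::string& pooltype) {
  assert(lod.size() >= 2 && lod.front() == 0 && lod.back() == x.size());
  std::vector<double> out;
  for (std::size_t i = 0; i + 1 < lod.size(); ++i) {
    assert(lod[i] < lod[i + 1]);  // this sketch requires non-empty sequences
    const auto first = x.begin() + static_cast<std::ptrdiff_t>(lod[i]);
    const auto last = x.begin() + static_cast<std::ptrdiff_t>(lod[i + 1]);
    const double len = static_cast<double>(lod[i + 1] - lod[i]);
    const double sum = std::accumulate(first, last, 0.0);
    double value = 0.0;
    if (pooltype == "AVERAGE") {
      value = sum / len;
    } else if (pooltype == "SUM") {
      value = sum;
    } else if (pooltype == "SQRT") {
      value = sum / std::sqrt(len);
    } else if (pooltype == "MAX") {
      value = *std::max_element(first, last);
    } else if (pooltype == "LAST") {
      value = *(last - 1);
    } else if (pooltype == "FIRST") {
      value = *first;
    } else {
      assert(false && "unknown pooltype");
    }
    out.push_back(value);
  }
  return out;
}
```

The output has one entry per sequence, which is why Out carries no LoD information.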
...
...
paddle/operators/sequence_softmax_op.cc
...
...
@@ -43,20 +43,24 @@ class SequenceSoftmaxOpMaker : public framework::OpProtoAndCheckerMaker {
"(LoDTensor) 1-D or 2-D output LoDTensor with the 2-nd dimension "
"of length 1."
);
AddComment
(
R"DOC(
SequenceSoftmaxOp computes softmax activation among all time-steps for each
Sequence Softmax Operator.
SequenceSoftmaxOp computes the softmax activation among all time-steps for each
sequence. The dimension of each time-step should be 1. Thus, the shape of
input Tensor can be either [N, 1] or [N], where N is the sum of
all sequences'
length
s.
input Tensor can be either [N, 1] or [N], where N is the sum of
the length
of all sequence
s.
Equation
:
The algorithm works as follows
:
for i-th sequence in a mini-batch:
Out(X[lod[i]:lod[i+1]], :) =
exp(X[lod[i]:lod[i+1], :]) / sum(exp(X[lod[i]:lod[i+1], :]))
$$Out(X[lod[i]:lod[i+1], :]) =
\frac{\exp(X[lod[i]:lod[i+1], :])}
{\sum(\exp(X[lod[i]:lod[i+1], :]))}$$
For example, for a mini-batch of 3 sequences with variable-length,
each containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],
then softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]
and N turns out to be 7.
)DOC"
);
}
};
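The per-sequence softmax can be sketched over a flat value vector and a LoD offset vector. `SequenceSoftmax` is a hypothetical name. Subtracting the per-sequence maximum before exponentiating is a stability measure added in this sketch and does not change the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Applies softmax independently to each slice x[lod[s] : lod[s+1]].
std::vector<double> SequenceSoftmax(const std::vector<double>& x,
                                    const std::vector<std::size_t>& lod) {
  assert(lod.size() >= 2 && lod.front() == 0 && lod.back() == x.size());
  std::vector<double> out(x.size());
  for (std::size_t s = 0; s + 1 < lod.size(); ++s) {
    const std::size_t begin = lod[s];
    const std::size_t end = lod[s + 1];
    assert(begin < end);  // this sketch requires non-empty sequences
    double max_val = x[begin];
    for (std::size_t i = begin; i < end; ++i) max_val = std::max(max_val, x[i]);
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
      out[i] = std::exp(x[i] - max_val);
      sum += out[i];
    }
    for (std::size_t i = begin; i < end; ++i) out[i] /= sum;
  }
  return out;
}
```

With LoD [0, 2, 5, 7], the three slices are normalized separately, so each slice of the output sums to 1.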
...
...
paddle/operators/sgd_op.cc
...
...
@@ -45,15 +45,17 @@ class SGDOpMaker : public framework::OpProtoAndCheckerMaker {
public:
SGDOpMaker
(
framework
::
OpProto
*
proto
,
framework
::
OpAttrChecker
*
op_checker
)
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"Param"
,
"Input parameter"
);
AddInput
(
"LearningRate"
,
"Learning rate of SGD"
);
AddInput
(
"Grad"
,
"Input gradient"
);
AddOutput
(
"ParamOut"
,
"
o
utput parameter"
);
AddInput
(
"Param"
,
"
(Tensor)
Input parameter"
);
AddInput
(
"LearningRate"
,
"
(Tensor)
Learning rate of SGD"
);
AddInput
(
"Grad"
,
"
(Tensor)
Input gradient"
);
AddOutput
(
"ParamOut"
,
"
(Tensor) O
utput parameter"
);
AddComment
(
R"DOC(
S
implest sgd algorithm.
S
GD operator
param_out = param - learning_rate * grad;
This operator implements one step of the stochastic gradient descent algorithm.
$$param\_out = param - learning\_rate * grad$$
)DOC"
);
}
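One SGD step over a flat parameter vector looks like this. It is a minimal sketch with the hypothetical name `SgdStep`. In the operator the learning rate arrives as a tensor input, while here it is a plain scalar for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// param_out = param - learning_rate * grad, element by element.
std::vector<float> SgdStep(const std::vector<float>& param,
                           const std::vector<float>& grad,
                           float learning_rate) {
  assert(param.size() == grad.size());
  std::vector<float> param_out(param.size());
  for (std::size_t i = 0; i < param.size(); ++i) {
    param_out[i] = param[i] - learning_rate * grad[i];
  }
  return param_out;
}
```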
...
...
paddle/operators/sigmoid_cross_entropy_with_logits_op.cc
...
...
@@ -107,26 +107,28 @@ class SigmoidCrossEntropyWithLogitsOpMaker
AddComment
(
R"DOC(
SigmoidCrossEntropyWithLogits Operator.
This measures the element
wise probability error in discrete
classification tasks
This measures the element
-wise probability error in
classification tasks
in which each class is independent. This can be thought of as predicting labels
for a data-point that are not mutually exclusive. For example, a news article
can be about politics, technology or sports at the same time or none of these.
for a data-point, where labels are not mutually exclusive.
For example, a news article can be about politics, technology or sports
at the same time or none of these.
The logistic loss is given as follows:
loss = -Labels * log(sigmoid(X)) - (1 - Labels) * log(1 - sigmoid(X))
$$loss = -Labels * \log(\sigma(X)) - (1 - Labels) * \log(1 - \sigma(X))$$
We know that
sigmoid(X) = (1 / (1 + exp(-X))). By substituting this we get
We know that
$$\sigma(X) = (1 / (1 + \exp(-X)))$$. By substituting this we get:
loss = X - X * Labels + log(1 + exp(-X))
$$loss = X - X * Labels + \log(1 + \exp(-X))$$
For stability and to prevent overflow of
exp(-X)
when X < 0,
we
can
reformulate the loss as follows:
For stability and to prevent overflow of
$$\exp(-X)$$
when X < 0,
we reformulate the loss as follows:
loss = max(X, 0) - X * Labels + log(1 + exp(-abs(X)))
$$loss = \max(X, 0) - X * Labels + \log(1 + \exp(-|X|))$$
Both the input `X` and `Labels` can carry the LoD (Level of Details) information.
However the output only shares the LoD with input `X`.
)DOC"
);
}
};
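The reformulated loss can be checked on single values. The sketch below uses the hypothetical name `SigmoidCrossEntropyWithLogits` and calls `std::log1p`, which computes log(1 + v) and is equivalent to the expression in the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// loss = max(x, 0) - x * label + log(1 + exp(-|x|))
double SigmoidCrossEntropyWithLogits(double x, double label) {
  assert(label >= 0.0 && label <= 1.0);
  return std::max(x, 0.0) - x * label + std::log1p(std::exp(-std::fabs(x)));
}
```

Because the exponent is never positive, `exp` cannot overflow even for logits with very large magnitude, which is the point of the reformulation.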
...
...
paddle/operators/sign_op.cc
...
...
@@ -38,9 +38,10 @@ class SignOpMaker : public framework::OpProtoAndCheckerMaker {
:
OpProtoAndCheckerMaker
(
proto
,
op_checker
)
{
AddInput
(
"X"
,
"(Tensor) Input tensor of sign operator."
);
AddOutput
(
"Out"
,
"(Tensor) Output tensor of sign operator."
);
AddComment
(
R"DOC(Sign operator
AddComment
(
R"DOC(
Sign operator
The equation is: Out = X.sign()
$$Out = X.sign()$$
)DOC"
);
}
};
...
...
paddle/operators/smooth_l1_loss_op.cc
...
...
@@ -77,14 +77,17 @@ class SmoothL1LossOpMaker : public framework::OpProtoAndCheckerMaker {
"A float scalar with default value 3.0."
)
.
SetDefault
(
3.0
);
AddComment
(
R"DOC(
Compute smooth l1 loss for input and target. The operator take the 1st
dimension of input as batch size. For each instance, it will compute
smooth l1 loss element by element first and sum all losses to one value.
So the output shape is [batch_size, 1].
Smooth L1 Loss Operator.
This operator computes the smooth l1 loss for input and target.
The operator takes the first dimension of input as the batch size.
For each instance, it computes the smooth l1 loss element by element first
and then sums all the losses. So the resulting output shape
is [batch_size, 1].
The equation is:
loss =
0.5 * (sigma * (x-y))^2 if abs(x - y) < 1 / sigma^2
abs(x - y) - 0.5 / sigma^2
otherwise
loss =
$$0.5 * (\sigma * (x-y))^2$$ if $$|x - y| < 1 /({\sigma}^2)$$
$$|x - y| - \frac{0.5}{{\sigma}^2}$$
otherwise
)DOC"
);
}
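The piecewise definition can be written out for one instance as follows. This sketch follows the plain-text form of the equation, 0.5 * (sigma * (x - y))^2 when |x - y| < 1 / sigma^2 and |x - y| - 0.5 / sigma^2 otherwise. `SmoothL1` and `SmoothL1Loss` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Smooth L1 loss for a single element.
double SmoothL1(double x, double y, double sigma) {
  assert(sigma > 0.0);
  const double sigma2 = sigma * sigma;
  const double diff = std::fabs(x - y);
  if (diff < 1.0 / sigma2) {
    return 0.5 * sigma2 * diff * diff;  // quadratic region near zero
  }
  return diff - 0.5 / sigma2;  // linear region for large errors
}

// Sums the element-wise losses of one instance into a single value.
double SmoothL1Loss(const std::vector<double>& x, const std::vector<double>& y,
                    double sigma) {
  assert(x.size() == y.size());
  double total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) total += SmoothL1(x[i], y[i], sigma);
  return total;
}
```

At the switch point |x - y| = 1 / sigma^2 both branches give 0.5 / sigma^2, so the loss is continuous there.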
...
...
paddle/operators/softmax_op.cc
...
...
@@ -44,20 +44,23 @@ class SoftmaxOpMaker : public framework::OpProtoAndCheckerMaker {
"2-D with shape [batch_size, input_feature_dimensions]."
);
AddOutput
(
"Y"
,
"The normalized values with the same shape as X."
);
AddComment
(
R"DOC(
The input of softmax operator is a 2-D tensor with shape N x K (N is the
Softmax Operator.
The input of the softmax operator is a 2-D tensor with shape N x K (N is the
batch_size, K is the dimension of input feature). The output tensor has the
same shape as the input tensor.
For each row of the input tensor, the softmax operator squashes the
K-dimensional vector of arbitrary real values to a K-dimensional vector of real
values in the range [0, 1] that add up to 1. Specifically, it computes the
exponential of the given dimension and the sum of exponential values of all
the other dimensions in the K-dimensional vector input. Then the ratio of the
exponential of the given dimension and the sum of exponential values of all
the other dimensions is the output of the softmax operator.
values in the range [0, 1] that add up to 1.
It computes the exponential of the given dimension and the sum of the
exponential values of all dimensions in the K-dimensional vector input.
The ratio of the exponential of the given dimension to that sum is the
output of the softmax operator.
For each row `i` and each column `j` in input X, we have:
Y[i, j] = exp(X[i, j]) / sum_j(exp(X[i, j]))
$$Y[i, j] = \frac{\exp(X[i, j])}{\sum_j \exp(X[i, j])}$$
)DOC"
);
}
...
...
paddle/operators/softmax_with_cross_entropy_op.cc
...
...
@@ -51,32 +51,34 @@ class SoftmaxWithCrossEntropyOpMaker
"the given labels as soft labels."
)
.
SetDefault
(
false
);
     AddComment(R"DOC(
-Cross entropy loss with softmax are used as the output layer extensively. This
+Softmax With Cross Entropy Operator.
+Cross entropy loss with softmax is used as the output layer extensively. This
 operator computes the softmax normalized values for each row of the input
-tensor, after which cross-entropy loss is then computed. This provides a more
+tensor, after which cross-entropy loss is computed. This provides a more
 numerically stable gradient.
-Because this operators performs a softmax on logits internally, it expects
-unscaled logits. Please do not call this op with the output of softmax operator,
-which will produce incorrect results.
+Because this operator performs a softmax on logits internally, it expects
+unscaled logits. This operator should not be used with the output of
+softmax operator since that would produce incorrect results.
 When the attribute softLabel is set false, this operator expects mutually
-exclusive hard labels, each sample in a batch is in exactly one class with
-probabilities 1. Each sample in the batch with one and only one label.
+exclusive hard labels, each sample in a batch is in exactly one class with a
+probability of 1.0. Each sample in the batch will have a single label.
-Equation:
+The equation is as follows:
-1) hard label (one-hot label)
+1) Hard label (one-hot label, so every sample has exactly one class)
-Loss_j = \f$ -\text{Logit}_{Label_j} +
+$$Loss_j = -\text{Logit}_{Label_j} +
 \log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right),
-j = 1, ..., K $\f
+j = 1, ..., K$$
-2) soft label (a distribution over all classes)
+2) Soft label (each sample can have a distribution over all classes)
-Loss_j = \f$ -\sum_{i=0}^{K}\text{Label}_i\left(\text{Logit}_i -
+$$Loss_j = -\sum_{i=0}^{K}\text{Label}_i\left(\text{Logit}_i -
 \log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right)\right),
-j = 1,...,K $\f
+j = 1,...,K$$
 )DOC");
   }
...
...
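The two loss formulas in the softmax_with_cross_entropy_op.cc comment above can be checked against each other with a small sketch. The helper names (`log_sum_exp`, `hard_label_loss`, `soft_label_loss`) are hypothetical, and the max-shift inside `log_sum_exp` is an assumption added for stability. The code works on one sample's logits rather than a batch.

```python
import math


def log_sum_exp(logits):
    # log(sum_i exp(Logit_i)), shifted by the maximum for stability.
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))


def hard_label_loss(logits, label):
    # Loss = -Logit[label] + log(sum_i exp(Logit_i))
    return -logits[label] + log_sum_exp(logits)


def soft_label_loss(logits, label_dist):
    # Loss = -sum_i Label_i * (Logit_i - log(sum_k exp(Logit_k)))
    lse = log_sum_exp(logits)
    return -sum(p * (z - lse) for p, z in zip(label_dist, logits))


logits = [2.0, 1.0, 0.1]
hard = hard_label_loss(logits, 0)
soft = soft_label_loss(logits, [1.0, 0.0, 0.0])
```

With a one-hot distribution the soft-label formula reduces to the hard-label one, which is why `hard` and `soft` agree here.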
paddle/operators/split_op.cc
View file @ 483947c4
...
...
@@ -67,11 +67,15 @@ class SplitOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
   SplitOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
       : OpProtoAndCheckerMaker(proto, op_checker) {
-    AddInput("X", "the input tensor of split operator.");
-    AddOutput("Out", "the output tensors of split operator.").AsDuplicable();
+    AddInput("X", "(Tensor) Input tensor of the split operator.");
+    AddOutput("Out", "(Tensor) Output tensors of the split operator.")
+        .AsDuplicable();
     AddComment(R"DOC(
-Split the input tensor into multiple sub-tensors.
-Example:
+Split operator
+This operator splits the input tensor into multiple sub-tensors.
+Example:
 Input = [[1,2],
          [3,4],
          [5,6]]
...
...
@@ -83,14 +87,18 @@ class SplitOpMaker : public framework::OpProtoAndCheckerMaker {
 )DOC");
     AddAttr<std::vector<int>>("sections",
-                              "the length for each"
-                              "output along with the specify axis.")
+                              "(vector<int>) "
+                              "the length of each output along the "
+                              "specified axis.")
         .SetDefault(std::vector<int>{});
     AddAttr<int>("num",
-                 "number of the sub-tensors, it must evenly divide "
+                 "(int, default 0)"
+                 "Number of sub-tensors. This must evenly divide "
                  "Input.dims()[axis]")
         .SetDefault(0);
-    AddAttr<int>("axis", "The axis which the input will be splited on.")
+    AddAttr<int>("axis",
+                 "(int, default 0) "
+                 "The axis which the input will be split on.")
         .SetDefault(0);
   }
 };
...
...
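The `sections` and `num` attributes of split_op.cc above describe two ways to choose the piece lengths. A minimal sketch along axis 0 of a nested list follows. The function `split_rows` is hypothetical, and the rule that a non-empty `sections` takes precedence over `num` is an assumption made for the example, not something the diff states.

```python
def split_rows(rows, num=0, sections=None):
    """Split a list of rows along axis 0 into sub-lists."""
    if sections:
        # Explicit lengths: they must cover the axis exactly.
        if sum(sections) != len(rows):
            raise ValueError("sections must add up to the axis length")
    else:
        # Equal pieces: num must evenly divide the axis length.
        if num <= 0 or len(rows) % num != 0:
            raise ValueError("num must evenly divide the axis length")
        sections = [len(rows) // num] * num
    out, start = [], 0
    for length in sections:
        out.append(rows[start:start + length])
        start += length
    return out


x = [[1, 2], [3, 4], [5, 6]]
by_num = split_rows(x, num=3)
by_sections = split_rows(x, sections=[2, 1])
```

Here `by_num` holds three one-row pieces and `by_sections` holds a two-row piece followed by a one-row piece.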
paddle/operators/squared_l2_distance_op.cc
View file @ 483947c4
...
...
@@ -59,23 +59,26 @@ class SquaredL2DistanceOpMaker : public framework::OpProtoAndCheckerMaker {
   SquaredL2DistanceOpMaker(framework::OpProto *proto,
                            framework::OpAttrChecker *op_checker)
       : OpProtoAndCheckerMaker(proto, op_checker) {
-    AddInput("X", "Input of SquaredL2DistanceOp.");
-    AddInput("Y", "Target of SquaredL2DistanceOp.");
+    AddInput("X", "(Tensor) Input of SquaredL2DistanceOp.");
+    AddInput("Y", "(Tensor) Target of SquaredL2DistanceOp.");
     AddOutput("sub_result",
-              "Buffering substraction result which "
+              "(Tensor) Buffering subtraction result which "
               "will be reused in backward.")
         .AsIntermediate();
-    AddOutput("Out", "Squared l2 distance between input and target.");
+    AddOutput("Out", "(Tensor) Squared l2 distance between input and target.");
     AddComment(R"DOC(
-SquaredL2DistanceOp will cacluate the squared L2 distance for
-input and target. Number of distance value equals to the
-first dimension of input. First dimension of target could be equal to
-input or to 1. If the first dimension of target is 1, SquaredL2DistanceOp
-will broadcast target's first dimension to input's first dimension.
-You can decide whether calculate the gradient of input and target.
-Both the input X and Y can carry the LoD (Level of Details) information,
-or not. But the output only shares the LoD with input X.
+SquaredL2Distance operator
+This operator will calculate the squared L2 distance for the input and
+the target. The number of distance values will be equal to the first dimension
+of input. First dimension of the target could be equal to the input or to 1.
+If the first dimension of target is 1, the operator will broadcast target's
+first dimension to input's first dimension. During backward propagation,
+the user can decide whether to calculate the gradient of the input or
+the target or both.
+Both the input X and Y can carry the LoD (Level of Details) information.
+However, the output only shares the LoD information with input X.
 )DOC");
   }
 };
...
...
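The broadcasting rule in the squared_l2_distance_op.cc comment above (the target's first dimension is either equal to the input's or 1) can be sketched as follows. The function name `squared_l2_distance` is hypothetical, and the inputs are plain 2-D lists rather than tensors.

```python
def squared_l2_distance(x, y):
    """One squared L2 distance per row of x; y has len(x) rows or 1 row."""
    if len(y) not in (1, len(x)):
        raise ValueError("first dimension of y must be 1 or match x")
    out = []
    for i, row in enumerate(x):
        # A single target row is reused (broadcast) for every input row.
        target = y[0] if len(y) == 1 else y[i]
        out.append(sum((a - b) ** 2 for a, b in zip(row, target)))
    return out


d = squared_l2_distance([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0]])
```

The result has as many entries as the input has rows, matching the statement that the number of distance values equals the first dimension of the input.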
paddle/operators/squared_l2_norm_op.cc
View file @ 483947c4
...
...
@@ -52,13 +52,13 @@ class SquaredL2NormOpMaker : public framework::OpProtoAndCheckerMaker {
                        framework::OpAttrChecker *op_checker)
       : framework::OpProtoAndCheckerMaker(proto, op_checker) {
     AddInput("X", "(Tensor) The input of squared_l2_norm op.");
-    AddOutput("Out", "(Float) The output of squared_l2_norm op.");
+    AddOutput("Out", "(Scalar) The output of squared_l2_norm op.");
     AddComment(R"DOC(
 SquaredL2Norm Operator.
 Computes the squared L2 norm of a tensor.
-Out = sum (X ** 2)
+$$Out = \sum_{i} X_{i}^2$$
 )DOC");
   }
...
...
paddle/operators/sum_op.cc
View file @ 483947c4
...
...
@@ -45,13 +45,15 @@ class SumOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
   SumOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
       : OpProtoAndCheckerMaker(proto, op_checker) {
-    AddInput("X", "the input tensors of sum operator.").AsDuplicable();
-    AddOutput("Out", "the output tensor of sum operator.");
+    AddInput("X", "(vector<Tensor>) The input tensors of sum operator.")
+        .AsDuplicable();
+    AddOutput("Out", "(Tensor) The output tensor of sum operator.");
     AddComment(R"DOC(
-Sum the input tensors.
+Sum operator.
-All the inputs can carry the LoD (Level of Details) information,
-or not. But the output only shares the LoD with the first input.
+This operator sums the input tensors. All the inputs can carry the
+LoD (Level of Details) information. However, the output only shares
+the LoD information with the first input.
 )DOC");
   }
 };
...
...
paddle/operators/top_k_op.cc
View file @ 483947c4
...
...
@@ -48,20 +48,20 @@ class TopkOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
   TopkOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
       : OpProtoAndCheckerMaker(proto, op_checker) {
-    AddInput("X", "The input of Topk op");
-    AddOutput("Out", "The output tensor of Topk op");
-    AddOutput("Indices", "The indices of Topk elements of input");
-    AddComment(
-        R"DOC(If the input is a vector (1d tensor),
-finds the k largest entries in the vector
-and outputs their values and indices as vectors.
-Thus values[j] is the j-th largest entry in input,
-and its index is indices[j].
+    AddInput("X", "(Tensor) The input of Topk op");
+    AddOutput("Out", "(Tensor) The output tensor of Topk op");
+    AddOutput("Indices", "(Tensor) The indices of Topk elements of input");
+    AddComment(R"DOC(
+Top K operator
-For matrices, computes the top k entries in each row. )DOC");
+If the input is a vector (1d tensor), this operator finds the k largest
+entries in the vector and outputs their values and indices as vectors.
+Thus values[j] is the j-th largest entry in input, and its index is indices[j].
+For matrices, this operator computes the top k entries in each row. )DOC");
     AddAttr<int>("k",
-                 "Number of top elements to look for along the last "
-                 "dimension (along each row for matrices).")
+                 "(int, default 1) Number of top elements to look for along "
+                 "the last dimension (along each row for matrices).")
         .SetDefault(1);
   }
 };
...
...
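The behaviour described in the top_k_op.cc comment above (values[j] is the j-th largest entry and indices[j] is its position) can be illustrated with a short sketch. The function `top_k` is hypothetical, and how the real operator orders tied values is not stated in the diff, so the tie behaviour here is simply whatever Python's stable sort gives.

```python
def top_k(row, k=1):
    """Return the k largest values of a 1-D list and their indices."""
    # Sort positions by value, largest first, then keep the first k.
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
    return [row[j] for j in order], order


values, indices = top_k([1.0, 5.0, 3.0, 4.0], k=2)

# For a matrix, apply the same function to each row.
per_row = [top_k(r, k=1) for r in [[1, 9, 2], [7, 3, 5]]]
```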
paddle/operators/transpose_op.cc
View file @ 483947c4
...
...
@@ -32,7 +32,7 @@ class TransposeOp : public framework::OperatorWithKernel {
     size_t axis_size = axis.size();
     PADDLE_ENFORCE_EQ(x_rank, axis_size,
-                      "the input tensor's rank(%d) "
+                      "The input tensor's rank(%d) "
                       "should be equal to the axis's size(%d)",
                       x_rank, axis_size);
...
...
@@ -64,12 +64,14 @@ class TransposeOpMaker : public framework::OpProtoAndCheckerMaker {
     AddOutput("Out", "(Tensor)The output tensor");
     AddAttr<std::vector<int>>(
         "axis",
-        "(vector<int>)a list of values, and the size of the list should be "
+        "(vector<int>)A list of values, and the size of the list should be "
         "the same with the input tensor rank, the tensor will "
         "permute the axes according to the values given");
     AddComment(R"DOC(
-The Tensor will be permuted according to the axis values given.
-The op is very much like the numpy.transpose function in python
+Transpose Operator.
+The input tensor will be permuted according to the axis values given.
+The op functions similar to how numpy.transpose works in python.
 For example:
 >> input = numpy.arange(6).reshape((2,3))
 >> input
...
...
@@ -83,6 +85,7 @@ For example:
[2, 5]])
So, given an input tensor of shape(N, C, H, W) and the axis is {0, 2, 3, 1},
the output tensor shape will be (N, H, W, C)
)DOC"
);
}
};
...
...
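The shape rule stated in the transpose_op.cc comment above (output dimension `i` takes input dimension `axis[i]`) can be sketched without any tensor library. Both helper names are hypothetical, and `transpose_2d` only covers the 2-D case that the numpy example in the comment uses.

```python
def transposed_shape(shape, axis):
    """Shape after permuting dimensions: out[i] = shape[axis[i]]."""
    if sorted(axis) != list(range(len(shape))):
        raise ValueError("axis must be a permutation of the input dimensions")
    return tuple(shape[a] for a in axis)


def transpose_2d(matrix):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*matrix)]


nhwc = transposed_shape((2, 3, 4, 5), (0, 2, 3, 1))  # (N, C, H, W) input
flipped = transpose_2d([[0, 1, 2], [3, 4, 5]])
```

The first call reproduces the (N, C, H, W) to (N, H, W, C) example, and the second matches the 2 x 3 matrix from the docstring.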
paddle/operators/uniform_random_op.cc
View file @ 483947c4
...
...
@@ -74,18 +74,30 @@ class UniformRandomOpMaker : public framework::OpProtoAndCheckerMaker {
   UniformRandomOpMaker(framework::OpProto *proto,
                        framework::OpAttrChecker *op_checker)
       : framework::OpProtoAndCheckerMaker(proto, op_checker) {
-    AddOutput("Out", "The output tensor of uniform random op");
-    AddComment(R"DOC(Uniform random operator.
-Used to initialize tensor with uniform random generator.
+    AddOutput("Out", "(Tensor) The output tensor of uniform random op");
+    AddComment(R"DOC(
+Uniform random operator.
+This operator initializes a tensor with random values sampled from a
+uniform distribution.
 )DOC");
-    AddAttr<std::vector<int>>("shape", "the dimension of random tensor");
-    AddAttr<float>("min", "Minimum value of uniform random").SetDefault(-1.0f);
-    AddAttr<float>("max", "Maximun value of uniform random").SetDefault(1.0f);
+    AddAttr<std::vector<int>>("shape",
+                              "(vector<int>) The shape of the output tensor");
+    AddAttr<float>("min",
+                   "(float, default -1.0) "
+                   "Minimum value of uniform random")
+        .SetDefault(-1.0f);
+    AddAttr<float>("max",
+                   "(float, default 1.0) "
+                   "Maximum value of uniform random")
+        .SetDefault(1.0f);
     AddAttr<int>("seed",
-                 "Random seed of uniform random. "
-                 "0 means generate a seed by system")
+                 "(int, default 0) "
+                 "Random seed used for generating samples. "
+                 "0 means use a seed generated by the system.")
         .SetDefault(0);
-    AddAttr<int>("data_type", "output tensor data type")
+    AddAttr<int>("data_type", "(int, default 5(FP32)) Output tensor data type")
         .SetDefault(framework::DataType::FP32);
   }
 };
...
...
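The `shape`, `min`, `max` and `seed` attributes of uniform_random_op.cc above can be mimicked with Python's standard `random` module. This is a sketch only: the function `uniform_random` and its parameter names `min_value` and `max_value` are hypothetical, the output is a flat list rather than a tensor, and treating seed 0 as "let the system pick" follows the attribute description.

```python
import random


def uniform_random(shape, min_value=-1.0, max_value=1.0, seed=0):
    """Flat list of prod(shape) samples drawn uniformly from [min, max]."""
    # seed == 0 means: use a seed chosen by the system.
    rng = random.Random(seed) if seed != 0 else random.Random()
    count = 1
    for dim in shape:
        count *= dim
    return [rng.uniform(min_value, max_value) for _ in range(count)]


a = uniform_random([2, 3], seed=7)
b = uniform_random([2, 3], seed=7)
```

Two calls with the same non-zero seed give the same samples, which is the usual reason to expose a seed attribute.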
paddle/pybind/protobuf.cc
View file @ 483947c4
...
...
@@ -238,7 +238,9 @@ void BindVarDsec(py::module &m) {
       .value("SELECTED_ROWS", VarDesc::SELECTED_ROWS)
       .value("FEED_MINIBATCH", VarDesc::FEED_MINIBATCH)
       .value("FETCH_LIST", VarDesc::FETCH_LIST)
-      .value("STEP_SCOPES", VarDesc::STEP_SCOPES);
+      .value("STEP_SCOPES", VarDesc::STEP_SCOPES)
+      .value("LOD_RANK_TABLE", VarDesc::LOD_RANK_TABLE)
+      .value("LOD_TENSOR_ARRAY", VarDesc::LOD_TENSOR_ARRAY);
 }
 void BindOpDesc(py::module &m) {
...
...
paddle/pybind/pybind.cc
View file @ 483947c4
...
...
@@ -21,7 +21,9 @@ limitations under the License. */
#include "paddle/framework/executor.h"
#include "paddle/framework/feed_fetch_method.h"
#include "paddle/framework/framework.pb.h"
#include "paddle/framework/lod_rank_table.h"
#include "paddle/framework/lod_tensor.h"
#include "paddle/framework/lod_tensor_array.h"
#include "paddle/framework/prune.h"
#include "paddle/framework/selected_rows.h"
#include "paddle/framework/tensor_array.h"
...
...
@@ -224,11 +226,17 @@ All parameter, weight, gradient are variables in Paddle.
              return self.GetMutable<LoDTensor>();
            },
            py::return_value_policy::reference)
+      .def("get_lod_rank_table",
+           [](Variable &self) { return self.GetMutable<LoDRankTable>(); },
+           py::return_value_policy::reference)
       .def("get_selected_rows",
            [](Variable &self) -> SelectedRows * {
              return self.GetMutable<SelectedRows>();
            },
            py::return_value_policy::reference)
+      .def("get_lod_tensor_array",
+           [](Variable &self) { return self.GetMutable<LoDTensorArray>(); },
+           py::return_value_policy::reference)
 #ifdef PADDLE_WITH_CUDA
       .def("get_communicator",
            [](Variable &self) -> platform::Communicator * {
...
...
@@ -492,6 +500,32 @@ All parameter, weight, gradient are variables in Paddle.
   BindVarDsec(m);
   BindOpDesc(m);
+  py::class_<framework::LoDRankTable>(m, "LodRankTable")
+      .def("items", [](framework::LoDRankTable &table) {
+        std::vector<std::pair<size_t, size_t>> res;
+        for (auto &item : table.items()) {
+          res.push_back({item.index, item.length});
+        }
+        return res;
+      });
+  py::class_<LoDTensorArray>(m, "LoDTensorArray")
+      .def("__getitem__",
+           [](LoDTensorArray &self, size_t i) { return &self.at(i); },
+           py::return_value_policy::reference)
+      .def("__len__", [](LoDTensorArray &self) { return self.size(); })
+      .def("__setitem__",
+           [](LoDTensorArray &self, size_t i, const LoDTensor &t) {
+             PADDLE_ENFORCE_LT(i, self.size());
+             self[i].ShareDataWith(t);
+             self[i].set_lod(t.lod());
+           })
+      .def("append", [](LoDTensorArray &self, const LoDTensor &t) {
+        self.emplace_back();
+        self.back().ShareDataWith(t);
+        self.back().set_lod(t.lod());
+      });
   m.def("op_support_gpu", OpSupportGPU);
 #ifdef PADDLE_WITH_CUDA
   m.def("get_cuda_device_count", platform::GetCUDADeviceCount);
...
...
paddle/scripts/docker/build.sh
View file @ 483947c4
...
...
@@ -162,6 +162,7 @@ ${DOCKERFILE_CUDNN_DSO}
 ${DOCKERFILE_GPU_ENV}
 ADD go/cmd/pserver/pserver /usr/bin/
 ADD go/cmd/master/master /usr/bin/
+ADD paddle/pybind/print_operators_doc /usr/bin/
 # default command shows the paddle version and exit
 CMD ["paddle", "version"]
 EOF
...
...
python/paddle/trainer/config_parser.py
View file @ 483947c4
...
...
@@ -2775,9 +2775,15 @@ class NCELayer(LayerBase):
 @config_layer('addto')
 class AddToLayer(LayerBase):
+    layer_type = 'addto'
     def __init__(self, name, inputs, bias=True, **xargs):
+        use_mkldnn = bool(int(g_command_config_args.get("use_mkldnn", 0)))
+        if self.layer_type == "mkldnn_addto":
+            config_assert(use_mkldnn, "mkldnn_addto only support MKLDNN")
+        self.layer_type = 'mkldnn_addto' if use_mkldnn else 'addto'
         super(AddToLayer, self).__init__(
-            name, 'addto', 0, inputs=inputs, **xargs)
+            name, self.layer_type, 0, inputs=inputs, **xargs)
         config_assert(len(inputs) > 0, 'inputs cannot be empty for AddToLayer')
         if len(self.inputs) > 1:
...
...
@@ -2796,6 +2802,11 @@ class AddToLayer(LayerBase):
         self.create_bias_parameter(bias, self.config.size)
+@config_layer('mkldnn_addto')
+class MKLDNNAddtoLayer(AddToLayer):
+    layer_type = 'mkldnn_addto'
 @config_layer('agent')
 class AgentLayer(LayerBase):
     def __init__(self, name, size, device=None):
...
...
python/paddle/v2/framework/backward.py
View file @ 483947c4
...
...
@@ -19,8 +19,20 @@ def append_backward_ops(loss, parameter_list=None, no_grad_set=None):
:rtype: list[Variable]
"""
     assert isinstance(loss, framework.Variable)
-    param_grad_map = loss.block.program.append_backward(loss, no_grad_set or set())
+    if no_grad_set is None:
+        program = loss.block.program
+        assert isinstance(program, framework.Program)
+        no_grad_set = list()
+        for block in program.blocks:
+            assert isinstance(block, framework.Block)
+            for var in block.vars.itervalues():
+                assert isinstance(var, framework.Variable)
+                if var.stop_gradient:
+                    no_grad_set.append(var.name)
+        no_grad_set = set(no_grad_set)
+    param_grad_map = loss.block.program.append_backward(loss, no_grad_set)
     if parameter_list is not None:
         parameters = parameter_list
     else:
...
...
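The new default in backward.py above collects the names of all variables whose `stop_gradient` flag is set when the caller passes no `no_grad_set`. A self-contained sketch of that loop follows. The classes `Var`, `Block` and `Prog` are hypothetical stand-ins for the framework's `Variable`, `Block` and `Program`, reduced to the attributes the loop reads.

```python
class Var(object):
    """Hypothetical stand-in for framework.Variable."""

    def __init__(self, name, stop_gradient=False):
        self.name = name
        self.stop_gradient = stop_gradient


class Block(object):
    """Hypothetical stand-in for framework.Block: maps names to variables."""

    def __init__(self, variables):
        self.vars = dict((v.name, v) for v in variables)


class Prog(object):
    """Hypothetical stand-in for framework.Program: a list of blocks."""

    def __init__(self, blocks):
        self.blocks = blocks


def default_no_grad_set(program):
    # Visit every variable in every block and keep the flagged names.
    names = set()
    for block in program.blocks:
        for var in block.vars.values():
            if var.stop_gradient:
                names.add(var.name)
    return names


prog = Prog([
    Block([Var("image", stop_gradient=True), Var("fc.w")]),
    Block([Var("label", stop_gradient=True)]),
])
no_grad = default_no_grad_set(prog)
```

This pairs with the layers.py change in the same commit, where `data` creates its variable with `stop_gradient=True`.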
python/paddle/v2/framework/framework.py
View file @ 483947c4
...
...
@@ -21,6 +21,7 @@ class Variable(object):
                  dtype=None,
                  lod_level=None,
                  persistable=None,
+                 stop_gradient=False,
                  **kwargs):
         self.block = block
...
...
@@ -89,6 +90,7 @@ class Variable(object):
         self.block.vars[name] = self
         self.op = None
+        self.stop_gradient = stop_gradient
     def __str__(self):
         protostr = self.desc.serialize_to_string()
...
...
@@ -101,6 +103,10 @@ class Variable(object):
     def persistable(self):
         return self.desc.persistable()
+    @persistable.setter
+    def persistable(self, p):
+        self.desc.set_persistable(p)
     @property
     def name(self):
         return self.desc.name()
...
...
@@ -546,5 +552,5 @@ class Parameter(Variable):
 # program is a global instance.
-g_program = Program()
-g_init_program = Program()
+g_main_program = Program()
+g_startup_program = Program()
python/paddle/v2/framework/io.py
View file @ 483947c4
 import os
 import cPickle as pickle
-from paddle.v2.framework.framework import Program, Parameter, g_program, \
+from paddle.v2.framework.framework import Program, Parameter, g_main_program, \
     Variable
 __all__ = [
...
...
@@ -29,13 +29,13 @@ def _clone_var_in_block_(block, var):
         persistable=True)
-def save_vars(executor, dirname, program=None, vars=None, predicate=None):
+def save_vars(executor, dirname, main_program=None, vars=None, predicate=None):
     """
     Save variables to directory by executor.
     :param executor: executor that save variable
     :param dirname: directory path
-    :param program: program. If vars is None, then filter all variables in this
+    :param main_program: program. If vars is None, then filter all variables in this
     program which fit `predicate`. Default g_program.
     :param predicate: The Predicate describes a callable that returns a variable
     as a bool. If it returns true, the variables will be saved.
...
...
@@ -44,15 +44,15 @@ def save_vars(executor, dirname, program=None, vars=None, predicate=None):
     :return: None
     """
     if vars is None:
-        if program is None:
-            program = g_program
-        if not isinstance(program, Program):
+        if main_program is None:
+            main_program = g_main_program
+        if not isinstance(main_program, Program):
             raise TypeError("program should be as Program type or None")
         save_vars(
             executor,
             dirname=dirname,
-            vars=filter(predicate, program.list_vars()))
+            vars=filter(predicate, main_program.list_vars()))
     else:
         save_program = Program()
         save_block = save_program.global_block()
...
...
@@ -66,37 +66,37 @@ def save_vars(executor, dirname, program=None, vars=None, predicate=None):
     executor.run(save_program)
-def save_params(executor, dirname, program=None):
+def save_params(executor, dirname, main_program=None):
     """
     Save all parameters to directory with executor.
     """
     save_vars(
         executor,
         dirname=dirname,
-        program=program,
+        main_program=main_program,
         vars=None,
         predicate=is_parameter)
-def save_persistables(executor, dirname, program=None):
+def save_persistables(executor, dirname, main_program=None):
     """
     Save all persistables to directory with executor.
     """
     save_vars(
         executor,
         dirname=dirname,
-        program=program,
+        main_program=main_program,
         vars=None,
         predicate=is_persistable)
-def load_vars(executor, dirname, program=None, vars=None, predicate=None):
+def load_vars(executor, dirname, main_program=None, vars=None, predicate=None):
     """
     Load variables from directory by executor.
     :param executor: executor that save variable
     :param dirname: directory path
-    :param program: program. If vars is None, then filter all variables in this
+    :param main_program: program. If vars is None, then filter all variables in this
     program which fit `predicate`. Default g_program.
     :param predicate: The Predicate describes a callable that returns a variable
     as a bool. If it returns true, the variables will be loaded.
...
...
@@ -105,15 +105,15 @@ def load_vars(executor, dirname, program=None, vars=None, predicate=None):
     :return: None
     """
     if vars is None:
-        if program is None:
-            program = g_program
-        if not isinstance(program, Program):
+        if main_program is None:
+            main_program = g_main_program
+        if not isinstance(main_program, Program):
             raise TypeError("program's type should be Program")
         load_vars(
             executor,
             dirname=dirname,
-            vars=filter(predicate, program.list_vars()))
+            vars=filter(predicate, main_program.list_vars()))
     else:
         load_prog = Program()
         load_block = load_prog.global_block()
...
...
@@ -129,27 +129,33 @@ def load_vars(executor, dirname, program=None, vars=None, predicate=None):
     executor.run(load_prog)
-def load_params(executor, dirname, program=None):
+def load_params(executor, dirname, main_program=None):
     """
     load all parameters from directory by executor.
     """
     load_vars(
-        executor, dirname=dirname, program=program, predicate=is_parameter)
+        executor,
+        dirname=dirname,
+        main_program=main_program,
+        predicate=is_parameter)
-def load_persistables(executor, dirname, program=None):
+def load_persistables(executor, dirname, main_program=None):
     """
     load all persistables from directory by executor.
     """
     load_vars(
-        executor, dirname=dirname, program=program, predicate=is_persistable)
+        executor,
+        dirname=dirname,
+        main_program=main_program,
+        predicate=is_persistable)
 def save_inference_model(dirname,
                          feeded_var_names,
                          target_vars,
                          executor,
-                         program=None):
+                         main_program=None):
     """
     Build a model especially for inference,
     and save it to directory by the executor.
...
@@ -158,20 +164,20 @@ def save_inference_model(dirname,
     :param feeded_var_names: Names of variables that need to be feeded data during inference
     :param target_vars: Variables from which we can get inference results.
     :param executor: executor that save inference model
-    :param program: original program, which will be pruned to build the inference model.
+    :param main_program: original program, which will be pruned to build the inference model.
     Default g_program.
     :return: None
     """
-    if program is None:
-        program = g_program
+    if main_program is None:
+        main_program = g_main_program
     if not isinstance(target_vars, list):
         target_vars = [target_vars]
     if not os.path.isdir(dirname):
         os.makedirs(dirname)
-    pruned_program = program.prune(target_vars)
+    pruned_program = main_program.prune(target_vars)
     fetch_var_names = [v.name for v in target_vars]
     model_file_name = dirname + "/__model__"
...
...
@@ -182,10 +188,10 @@ def save_inference_model(dirname,
"fetch_var_names"
:
fetch_var_names
},
f
,
-
1
)
-    save_params(executor, dirname, program)
+    save_params(executor, dirname, main_program)
-def load_persistables_if_exist(executor, dirname, program=None):
+def load_persistables_if_exist(executor, dirname, main_program=None):
     filenames = next(os.walk(dirname))[2]
     filenames = set(filenames)
...
...
@@ -198,7 +204,7 @@ def load_persistables_if_exist(executor, dirname, program=None):
     load_vars(
         executor,
         dirname,
-        program=program,
+        main_program=main_program,
         vars=None,
         predicate=_is_presistable_and_exist_)
...
...
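In io.py above, `save_vars` and `load_vars` use the explicit `vars` list when one is given and otherwise filter the program's variables through `predicate`. A sketch of just that selection step follows. The class `Var`, the function `select_vars`, and the simplified predicates are hypothetical stand-ins. The real `is_parameter` and `is_persistable` operate on framework variables.

```python
class Var(object):
    """Hypothetical stand-in carrying only the flags the predicates read."""

    def __init__(self, name, persistable=False, is_param=False):
        self.name = name
        self.persistable = persistable
        self.is_param = is_param


def is_parameter(var):
    return var.is_param


def is_persistable(var):
    return var.persistable


def select_vars(program_vars, vars=None, predicate=None):
    # An explicit list wins; otherwise keep what the predicate accepts.
    if vars is None:
        return [v for v in program_vars if predicate(v)]
    return list(vars)


all_vars = [Var("fc.w", True, True), Var("bn.mean", True), Var("tmp")]
params = [v.name for v in select_vars(all_vars, predicate=is_parameter)]
persist = [v.name for v in select_vars(all_vars, predicate=is_persistable)]
```

In this sketch every parameter is also persistable, so the persistable selection is a superset of the parameter selection.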
python/paddle/v2/framework/layer_helper.py
View file @ 483947c4
 import copy
 import itertools
-from paddle.v2.framework.framework import Variable, g_program, \
-    g_init_program, unique_name, Program
+from paddle.v2.framework.framework import Variable, g_main_program, \
+    g_startup_program, unique_name, Program
 from paddle.v2.framework.initializer import ConstantInitializer, \
     UniformInitializer
...
...
@@ -20,23 +20,23 @@ class LayerHelper(object):
         return self.kwargs['name']
     @property
-    def program(self):
-        prog = self.kwargs.get('program', None)
+    def main_program(self):
+        prog = self.kwargs.get('main_program', None)
         if prog is None:
-            return g_program
+            return g_main_program
         else:
             return prog
     @property
-    def init_program(self):
-        prog = self.kwargs.get('init_program', None)
+    def startup_program(self):
+        prog = self.kwargs.get('startup_program', None)
         if prog is None:
-            return g_init_program
+            return g_startup_program
         else:
             return prog
     def append_op(self, *args, **kwargs):
-        return self.program.current_block().append_op(*args, **kwargs)
+        return self.main_program.current_block().append_op(*args, **kwargs)
     def multiple_input(self, input_param_name='input'):
         inputs = self.kwargs.get(input_param_name, [])
...
...
@@ -112,32 +112,35 @@ class LayerHelper(object):
                 raise ValueError("Data Type mismatch")
         return dtype
-    def create_parameter(self, attr, shape, dtype, suffix='w'):
+    def create_parameter(self, attr, shape, dtype, suffix='w',
+                         initializer=None):
         # Deepcopy the attr so that parameters can be shared in program
         attr_copy = copy.deepcopy(attr)
+        if initializer is not None:
+            attr_copy['initializer'] = initializer
         if attr_copy['name'] is None:
             attr_copy['name'] = unique_name(".".join([self.name, suffix]))
-        self.init_program.global_block().create_parameter(
+        self.startup_program.global_block().create_parameter(
             dtype=dtype, shape=shape, **attr_copy)
-        return self.program.global_block().create_parameter(
+        return self.main_program.global_block().create_parameter(
             name=attr_copy['name'], dtype=dtype, shape=shape)
     def create_tmp_variable(self, dtype):
-        return self.program.current_block().create_var(
+        return self.main_program.current_block().create_var(
             name=unique_name(".".join([self.name, 'tmp'])),
             dtype=dtype,
             persistable=False)
     def create_variable(self, *args, **kwargs):
-        return self.program.current_block().create_var(*args, **kwargs)
+        return self.main_program.current_block().create_var(*args, **kwargs)
     def create_global_variable(self, persistable=False, *args, **kwargs):
-        return self.program.global_block().create_var(
+        return self.main_program.global_block().create_var(
             *args, persistable=persistable, **kwargs)
     def set_variable_initializer(self, var, initializer):
         assert isinstance(var, Variable)
-        self.init_program.global_block().create_var(
+        self.startup_program.global_block().create_var(
             name=var.name,
             type=var.type,
             dtype=var.data_type,
...
...
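The renamed `main_program` and `startup_program` properties in layer_helper.py above share one pattern: use the program passed through keyword arguments, and fall back to a module-level default when none is given. A minimal sketch of that pattern follows. `Program` here is an empty hypothetical stand-in and `LayerHelperSketch` is not the real `LayerHelper` class.

```python
class Program(object):
    """Hypothetical empty stand-in for framework.Program."""


# Module-level defaults, named as in the new framework.py.
g_main_program = Program()
g_startup_program = Program()


class LayerHelperSketch(object):
    def __init__(self, **kwargs):
        self.kwargs = kwargs

    @property
    def main_program(self):
        prog = self.kwargs.get('main_program', None)
        return g_main_program if prog is None else prog

    @property
    def startup_program(self):
        prog = self.kwargs.get('startup_program', None)
        return g_startup_program if prog is None else prog


custom = Program()
default_helper = LayerHelperSketch()
custom_helper = LayerHelperSketch(main_program=custom)
```

Because layer functions build their helper with `LayerHelper(..., **locals())`, the keyword names in each layer signature have to match the keys the properties look up, which is why the commit renames both sides together.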
python/paddle/v2/framework/layers.py
View file @ 483947c4
-from paddle.v2.framework.layer_helper import LayerHelper, unique_name
 import paddle.v2.framework.core as core
-from paddle.v2.framework.framework import OpProtoHolder, Variable, Program, \
-    Operator
-from paddle.v2.framework.initializer import ConstantInitializer
+from paddle.v2.framework.framework import OpProtoHolder, Variable, Program, Operator
+from paddle.v2.framework.initializer import ConstantInitializer, NormalInitializer
+from paddle.v2.framework.layer_helper import LayerHelper, unique_name
 import re
 __all__ = [
...
...
@@ -19,8 +18,8 @@ def fc(input,
        name=None,
        act=None,
        num_flatten_dims=1,
-       program=None,
-       init_program=None):
+       main_program=None,
+       startup_program=None):
     # create helper
     helper = LayerHelper('fc', **locals())
...
...
@@ -65,8 +64,8 @@ def embedding(input,
               data_type='float32',
               is_sparse=False,
               param_attr=None,
-              program=None,
-              init_program=None):
+              main_program=None,
+              startup_program=None):
     helper = LayerHelper('embedding', **locals())
     w = helper.create_parameter(
         attr=helper.param_attr, shape=size, dtype=data_type)
...
...
@@ -85,8 +84,8 @@ def data(name,
          data_type='float32',
          type=core.VarDesc.VarType.LOD_TENSOR,
          append_batch_size=True,
-         program=None,
-         init_program=None):
+         main_program=None,
+         startup_program=None):
     helper = LayerHelper('data', **locals())
     shape = list(shape)
     for i in xrange(len(shape)):
...
...
@@ -100,7 +99,7 @@ def data(name,
         shape = [-1] + shape  # append batch size as -1
     return helper.create_global_variable(
-        name=name, shape=shape, dtype=data_type, type=type)
+        name=name, shape=shape, dtype=data_type, type=type, stop_gradient=True)
 def _convert_(name):
...
...
@@ -179,7 +178,7 @@ _create_op_func_('sigmoid')
 _create_op_func_('scale')
-def cast(x, data_type, program=None):
+def cast(x, data_type, main_program=None):
     helper = LayerHelper('cast', **locals())
     out = helper.create_tmp_variable(dtype=data_type)
     helper.append_op(
...
...
@@ -191,7 +190,7 @@ def cast(x, data_type, program=None):
     return out
-def concat(input, axis, program=None, init_program=None):
+def concat(input, axis, main_program=None, startup_program=None):
     helper = LayerHelper('concat', **locals())
     out = helper.create_tmp_variable(dtype=helper.input_dtype())
     helper.append_op(
...
...
@@ -202,7 +201,7 @@ def concat(input, axis, program=None, init_program=None):
     return out
-def sums(input, program=None, init_program=None):
+def sums(input, main_program=None, startup_program=None):
     helper = LayerHelper('sum', **locals())
     out = helper.create_tmp_variable(dtype=helper.input_dtype())
     helper.append_op(type='sum', inputs={'X': input}, outputs={'Out': out})
...
...
@@ -282,8 +281,8 @@ def sequence_conv(input,
                   padding=None,
                   bias_attr=None,
                   param_attr=None,
-                  program=None,
-                  init_program=None):
+                  main_program=None,
+                  startup_program=None):
     # FIXME(dzh) : want to unify the argument of python layer
     # function. So we ignore some unnecessary attributes.
     # such as, padding_trainable, context_start.
...
...
@@ -322,8 +321,8 @@ def conv2d(input,
            padding=None,
            bias_attr=None,
            param_attr=None,
-           program=None,
-           init_program=None):
+           main_program=None,
+           startup_program=None):
     helper = LayerHelper('conv2d', **locals())
     dtype = helper.input_dtype()
...
...
@@ -344,8 +343,13 @@ def conv2d(input,
     input_shape = input.shape
     filter_shape = [num_filters, num_filter_channels] + filter_size
+    std = (2.0 / (filter_size[0]**2 * num_channels))**0.5
     filter = helper.create_parameter(
-        attr=helper.param_attr, shape=filter_shape, dtype=dtype)
+        attr=helper.param_attr,
+        shape=filter_shape,
+        dtype=dtype,
+        initializer=NormalInitializer(0.0, std, 0))
     pre_bias = helper.create_tmp_variable(dtype)
     helper.append_op(
...
...
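The conv2d hunk above sets the filter's initial standard deviation to sqrt(2 / (k^2 * C)), where k is `filter_size[0]` and C is `num_channels`. A small worked example of that arithmetic follows. The function name `conv2d_filter_std` is hypothetical and the 3 x 3 filter with 16 channels is an arbitrary illustration.

```python
def conv2d_filter_std(filter_size, num_channels):
    """Standard deviation used for the normal filter initializer."""
    # Same expression as in the diff: (2.0 / (k**2 * C)) ** 0.5
    return (2.0 / (filter_size[0] ** 2 * num_channels)) ** 0.5


# 3 x 3 filter over 16 input channels: sqrt(2 / 144) = sqrt(2) / 12
std = conv2d_filter_std([3, 3], 16)
```

Note that the expression only reads `filter_size[0]`, so it treats the filter as square.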
@@ -384,8 +388,8 @@ def pool2d(input,
            pool_stride=[1, 1],
            pool_padding=[0, 0],
            global_pooling=False,
-           program=None,
-           init_program=None):
+           main_program=None,
+           startup_program=None):
     if pool_type not in ["max", "avg"]:
         raise ValueError(
             "Unknown pool_type: '%s'. It can only be 'max' or 'avg'.",
...
...
@@ -420,12 +424,12 @@ def batch_norm(input,
                act=None,
                is_test=False,
                momentum=0.9,
-               epsilon=1e05,
+               epsilon=1e-05,
                param_attr=None,
                bias_attr=None,
                data_layout='NCHW',
-               program=None,
-               init_program=None):
+               main_program=None,
+               startup_program=None):
     helper = LayerHelper('batch_norm', **locals())
     dtype = helper.input_dtype()
...
...
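The `epsilon` change in the hunk above (`1e05` to `1e-05`) is more than cosmetic: `1e05` is 100000. A small sketch of why that matters, assuming the usual batch-normalization form `(x - mean) / sqrt(variance + epsilon)`; the `normalize` function is illustrative and not part of the diff.

```python
import math


def normalize(x, mean, variance, epsilon):
    # Illustrative only: the standard batch-norm normalization step.
    return (x - mean) / math.sqrt(variance + epsilon)


# Same input, old default (1e05 == 100000.0) versus the corrected default.
old_default = normalize(3.0, 1.0, 4.0, 1e05)
new_default = normalize(3.0, 1.0, 4.0, 1e-05)
print(old_default, new_default)
```

With the old default the epsilon term dominates the variance, so the normalized value collapses toward zero; with the corrected default the result stays close to `(3 - 1) / 2 = 1`.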
@@ -438,27 +442,29 @@ def batch_norm(input,
     else:
         raise ValueError("unsupported data layout:" + data_layout)
-    def create_persistable_var(dtype, shape, initializer=None):
-        name = unique_name(".".join([helper.name, "xxxx"]))
-        var = init_program.global_block().create_var(
-            dtype=dtype, shape=shape, name=name, persistable=True)
-        if initializer is not None:
-            initializer(var, var.block)
-        return program.global_block().create_var(
-            name=name, dtype=dtype, shape=shape, persistable=True)
     param_shape = [channel_num]
     # create parameter
     scale = helper.create_parameter(
-        attr=helper.param_attr, shape=param_shape, dtype=dtype)
+        attr=helper.param_attr,
+        shape=param_shape,
+        dtype=dtype,
+        initializer=ConstantInitializer(1.0))
     bias = helper.create_parameter(
-        attr=helper.param_attr, shape=param_shape, dtype=dtype)
+        attr=helper.param_attr,
+        shape=param_shape,
+        dtype=dtype,
+        initializer=ConstantInitializer(0.0))
     # create input
-    mean = create_persistable_var(dtype, param_shape, ConstantInitializer(0.0))
-    variance = create_persistable_var(dtype, param_shape,
-                                      ConstantInitializer(1.0))
+    mean = helper.create_global_variable(
+        dtype=input.data_type, shape=param_shape, persistable=True)
+    helper.set_variable_initializer(
+        var=mean, initializer=ConstantInitializer(0.0))
+    variance = helper.create_global_variable(
+        dtype=input.data_type, shape=param_shape, persistable=True)
+    helper.set_variable_initializer(
+        var=variance, initializer=ConstantInitializer(1.0))
     # create output
     # mean and mean_out share the same memory
...
...
@@ -499,16 +505,16 @@ class BlockGuard(object):
     keyword.
     """
-    def __init__(self, program):
-        if not isinstance(program, Program):
+    def __init__(self, main_program):
+        if not isinstance(main_program, Program):
             raise TypeError("BlockGuard takes a program")
-        self.program = program
+        self.main_program = main_program
     def __enter__(self):
-        self.program.create_block()
+        self.main_program.create_block()
     def __exit__(self, exc_type, exc_val, exc_tb):
-        self.program.rollback()
+        self.main_program.rollback()
         if exc_type is not None:
             return False  # re-raise exception
         return True
...
...
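The `BlockGuard` hunk above relies on two context-manager rules: `__exit__` always runs, and returning `False` from it lets an in-flight exception propagate. A minimal sketch of that behaviour follows; `FakeProgram` and `Guard` are hypothetical stand-ins written for this illustration, not Paddle classes.

```python
class FakeProgram(object):
    """Hypothetical stand-in for Program; it only tracks block nesting depth."""

    def __init__(self):
        self.depth = 0

    def create_block(self):
        self.depth += 1

    def rollback(self):
        self.depth -= 1


class Guard(object):
    """Same shape as the BlockGuard in the hunk, minus the type check."""

    def __init__(self, main_program):
        self.main_program = main_program

    def __enter__(self):
        self.main_program.create_block()

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.main_program.rollback()
        if exc_type is not None:
            return False  # a falsy return re-raises the exception
        return True


prog = FakeProgram()
with Guard(prog):
    inside_depth = prog.depth  # one block deeper while inside the guard
after_depth = prog.depth       # rolled back on normal exit

try:
    with Guard(prog):
        raise ValueError("boom")
except ValueError:
    propagated = True          # the guard did not swallow the error
else:
    propagated = False
```

Even on the error path the rollback still happens, so the program's current block is restored before the exception reaches the caller.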
@@ -518,7 +524,7 @@ class StaticRNNGuard(BlockGuard):
     def __init__(self, rnn):
         if not isinstance(rnn, StaticRNN):
             raise TypeError("StaticRNNGuard takes an StaticRNN")
-        super(StaticRNNGuard, self).__init__(rnn.helper.program)
+        super(StaticRNNGuard, self).__init__(rnn.helper.main_program)
         self.rnn = rnn
     def __enter__(self):
...
...
@@ -554,8 +560,9 @@ class StaticRNN(object):
     IN_RNN_BLOCK = 1
     AFTER_RNN_BLOCK = 2
-    def __init__(self, name=None, program=None):
-        self.helper = LayerHelper("static_rnn", name=name, program=program)
+    def __init__(self, name=None, main_program=None):
+        self.helper = LayerHelper(
+            "static_rnn", name=name, main_program=main_program)
         self.memories = {}  # memory map, from pre_mem.name --> MemoryLink
         self.inputs = []  # input variable list in current block
         self.outputs = []  # output variable list in parent block
...
...
@@ -647,7 +654,7 @@ class StaticRNN(object):
         self.memories[mem.name].mem = var
     def parent_block(self):
-        prog = self.helper.program
+        prog = self.helper.main_program
         parent_idx = prog.current_block().parent_idx
         assert parent_idx >= 0
         parent_block = prog.block(parent_idx)
...
...
@@ -664,8 +671,8 @@ class StaticRNN(object):
         return self.outputs
     def complete_rnn_op(self):
-        program = self.helper.program
-        rnn_block = program.current_block()
+        main_program = self.helper.main_program
+        rnn_block = main_program.current_block()
         parent_block = self.parent_block()
         local_inputs = set()
...
...
@@ -729,3 +736,16 @@ class StaticRNN(object):
                 'states': memories,
                 'step_block': rnn_block
             })
+def lod_rank_table(x, level=0, main_program=None):
+    helper = LayerHelper("lod_rank_table", **locals())
+    table = helper.create_variable(
+        type=core.VarDesc.VarType.LOD_RANK_TABLE,
+        name=unique_name("lod_rank_table"))
+    helper.append_op(
+        type='lod_rank_table',
+        inputs={'X': x},
+        outputs={'Out': table},
+        attrs={'level': level})
+    return table
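The new `lod_rank_table` layer only wires an operator into the program; what the operator computes is defined in `paddle/framework/lod_rank_table.cc`, which is not shown in this part of the diff. As a rough sketch, assuming the table pairs each sequence index with its length at the chosen LoD level and orders the pairs by length, longest first (an assumption about the C++ side, not something this hunk confirms), a pure-Python equivalent might look like this. The function name `rank_table` is hypothetical.

```python
def rank_table(level_offsets):
    # level_offsets: offset-style LoD for one level, e.g. [0, 1, 3, 6, 10]
    # means four sequences covering rows [0,1), [1,3), [3,6), [6,10).
    items = [(i, level_offsets[i + 1] - level_offsets[i])
             for i in range(len(level_offsets) - 1)]
    # sorted() is stable, so sequences of equal length keep their order.
    return sorted(items, key=lambda item: -item[1])


table = rank_table([0, 1, 3, 6, 10])
print(table)
```

Each entry is `(sequence index, sequence length)`; a consumer such as a dynamic RNN could use the ordering to batch the longest sequences first.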
python/paddle/v2/framework/net_drawer.py
...
...
@@ -80,7 +80,7 @@ def parse_graph(program, graph, var_dict, **kwargs):
                 graph.edge(**draw_edge(var_dict, op, e, arg))
-def draw_graph(init_program, program, **kwargs):
+def draw_graph(startup_program, main_program, **kwargs):
     if kwargs.has_key("graph_attr"):
         GRAPH_STYLE.update(kwargs[graph_attr])
     if kwargs.has_key("node_attr"):
...
...
@@ -101,8 +101,8 @@ def draw_graph(init_program, program, **kwargs):
         **kwargs)
     var_dict = {}
-    parse_graph(init_program, g, var_dict)
-    parse_graph(program, g, var_dict)
+    parse_graph(startup_program, g, var_dict)
+    parse_graph(main_program, g, var_dict)
     if filename != None:
         g.save()
...
...
python/paddle/v2/framework/nets.py
...
...
@@ -10,23 +10,23 @@ def simple_img_conv_pool(input,
                          pool_stride,
                          act,
                          pool_type='max',
-                         program=None,
-                         init_program=None):
+                         main_program=None,
+                         startup_program=None):
     conv_out = layers.conv2d(
         input=input,
         num_filters=num_filters,
         filter_size=filter_size,
         act=act,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     pool_out = layers.pool2d(
         input=conv_out,
         pool_size=pool_size,
         pool_type=pool_type,
         pool_stride=pool_stride,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     return pool_out
...
...
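Because the rename in these hunks changes keyword names rather than parameter positions, callers are affected differently depending on how they pass the programs. The sketch below uses a hypothetical `pool_layer` function (not a Paddle API) with the new signature to show which call styles keep working.

```python
def pool_layer(input, main_program=None, startup_program=None):
    """Hypothetical layer function using the renamed keyword arguments."""
    return (input, main_program, startup_program)


# Keyword callers must switch to the new names.
by_keyword = pool_layer("x", main_program="main", startup_program="startup")

# Positional callers are unaffected by the rename.
by_position = pool_layer("x", "main", "startup")

# Callers still using the old keyword names fail with a TypeError.
try:
    pool_layer("x", program="main", init_program="startup")
    old_keywords_accepted = True
except TypeError:
    old_keywords_accepted = False
```

This is why the diff has to touch every keyword call site in `nets.py` and the tests, while purely positional calls only need their local variable names updated.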
@@ -40,8 +40,8 @@ def img_conv_group(input,
                    conv_batchnorm_drop_rate=None,
                    pool_stride=1,
                    pool_type=None,
-                   program=None,
-                   init_program=None):
+                   main_program=None,
+                   startup_program=None):
     """
     Image Convolution Group, Used for vgg net.
     """
...
...
@@ -71,30 +71,30 @@ def img_conv_group(input,
             filter_size=conv_filter_size[i],
             padding=conv_padding[i],
             act=local_conv_act,
-            program=program,
-            init_program=init_program)
+            main_program=main_program,
+            startup_program=startup_program)
         if conv_with_batchnorm[i]:
             tmp = layers.batch_norm(
                 input=tmp,
                 act=conv_act,
-                program=program,
-                init_program=init_program)
+                main_program=main_program,
+                startup_program=startup_program)
             drop_rate = conv_batchnorm_drop_rate[i]
             if abs(drop_rate) > 1e-5:
                 tmp = layers.dropout(
                     x=tmp,
                     dropout_prob=drop_rate,
-                    program=program,
-                    init_program=init_program)
+                    main_program=main_program,
+                    startup_program=startup_program)
     pool_out = layers.pool2d(
         input=tmp,
         pool_size=pool_size,
         pool_type=pool_type,
         pool_stride=pool_stride,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     return pool_out
...
...
@@ -103,19 +103,19 @@ def sequence_conv_pool(input,
                        filter_size,
                        act="sigmoid",
                        pool_type="max",
-                       program=None,
-                       init_program=None):
+                       main_program=None,
+                       startup_program=None):
     conv_out = layers.sequence_conv(
         input=input,
         num_filters=num_filters,
         filter_size=filter_size,
         act=act,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     pool_out = layers.sequence_pool(
         input=conv_out,
         pool_type=pool_type,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     return pool_out
python/paddle/v2/framework/optimizer.py
...
...
@@ -132,7 +132,7 @@ class Optimizer(object):
     def create_optimization_pass(self,
                                  parameters_and_grads,
                                  loss,
-                                 init_program=None):
+                                 startup_program=None):
         """Add optimization operators to update gradients to variables.
         Args:
...
...
@@ -144,7 +144,7 @@ class Optimizer(object):
           optimization. This will include parameter update ops, global step
           update ops and any other custom ops required by subclasses to manage
           their internal state.
-          :param init_program:
+          :param startup_program:
         """
         # This is a default implementation of create_optimization_pass that
         # can be shared by most optimizers. This implementation assumes that
...
...
@@ -156,7 +156,9 @@ class Optimizer(object):
         # Create any accumulators
         program = loss.block.program
         self.helper = LayerHelper(
-            self.__class__.__name__, program=program, init_program=init_program)
+            self.__class__.__name__,
+            main_program=program,
+            startup_program=startup_program)
         self._create_accumulators(loss.block,
                                   [p[0] for p in parameters_and_grads])
         # Create any necessary tensors
...
...
@@ -185,7 +187,7 @@ class Optimizer(object):
     def minimize(self,
                  loss,
-                 init_program=None,
+                 startup_program=None,
                  parameter_list=None,
                  no_grad_set=None):
         """Add operations to minimize `loss` by updating `parameter_list`.
...
...
@@ -198,7 +200,7 @@ class Optimizer(object):
         # Add regularization if any
         params_grads = append_regularization_ops(params_grads)
         optimize_ops = self.create_optimization_pass(params_grads, loss,
-                                                     init_program)
+                                                     startup_program)
         return optimize_ops
...
...
python/paddle/v2/framework/tests/test_crf_decoding_op.py
0 → 100644
import unittest
import random
import numpy as np

from op_test import OpTest


class CRFDecoding(object):
    def __init__(self, emission_weights, transition_weights,
                 seq_start_positions):
        assert (emission_weights.shape[0] == seq_start_positions[-1])
        self.tag_num = emission_weights.shape[1]
        self.seq_num = len(seq_start_positions) - 1

        self.seq_start_positions = seq_start_positions
        self.x = emission_weights

        self.a = transition_weights[0, :]
        self.b = transition_weights[1, :]
        self.w = transition_weights[2:, :]

        self.track = np.zeros(
            (seq_start_positions[-1], self.tag_num), dtype="int32")
        self.decoded_path = np.zeros(
            (seq_start_positions[-1], 1), dtype="int32")

    def _decode_one_sequence(self, decoded_path, x):
        seq_len, tag_num = x.shape
        alpha = np.zeros((seq_len, tag_num), dtype="float64")
        track = np.zeros((seq_len, tag_num), dtype="int32")

        for i in range(tag_num):
            alpha[0, i] = self.a[i] + x[0, i]

        for k in range(1, seq_len):
            for i in range(tag_num):
                max_score = -np.finfo("float64").max
                max_idx = 0
                for j in range(tag_num):
                    score = alpha[k - 1, j] + self.w[j, i]
                    if score > max_score:
                        max_score = score
                        max_idx = j
                alpha[k, i] = max_score + x[k, i]
                track[k, i] = max_idx

        max_score = -np.finfo("float64").max
        max_idx = 0
        for i in range(tag_num):
            score = alpha[seq_len - 1, i] + self.b[i]
            if score > max_score:
                max_score = score
                max_idx = i

        decoded_path[-1] = max_idx
        for i in range(seq_len - 1, 0, -1):
            decoded_path[i - 1] = max_idx = track[i, max_idx]

    def decode(self):
        for i in range(self.seq_num):
            start = self.seq_start_positions[i]
            end = self.seq_start_positions[i + 1]
            self._decode_one_sequence(self.decoded_path[start:end, :],
                                      self.x[start:end, :])
        return self.decoded_path


class TestCRFDecodingOp1(OpTest):
    """
    Compare the dynamic program with random generated parameters and inputs
    with grouth truth not being given.
    """

    def set_test_data(self):
        SEQ_NUM = 3
        TAG_NUM = 17
        MAX_SEQ_LEN = 10

        lod = [[0]]
        for i in range(SEQ_NUM):
            lod[-1].append(lod[-1][-1] + random.randint(1, MAX_SEQ_LEN))
        emission = np.random.uniform(-1, 1,
                                     [lod[-1][-1], TAG_NUM]).astype("float64")
        transition = np.random.uniform(-0.5, 0.5,
                                       [TAG_NUM + 2, TAG_NUM]).astype("float64")

        self.inputs = {
            "Emission": (emission, lod),
            "Transition": transition,
        }

        decoder = CRFDecoding(emission, transition, lod[0])
        decoded_path = decoder.decode()

        self.outputs = {"ViterbiPath": decoded_path}

    def setUp(self):
        self.op_type = "crf_decoding"
        self.set_test_data()

    def test_check_output(self):
        self.check_output()


class TestCRFDecodingOp2(OpTest):
    """
    Compare the dynamic program with brute force computation with
    ground truth being given.
    """

    def setUp(self):
        self.op_type = "crf_decoding"
        TAG_NUM = 5

        lod = [[0, 1, 3, 6, 10]]
        transition = np.repeat(
            np.arange(
                TAG_NUM, dtype="float64").reshape(1, TAG_NUM),
            TAG_NUM + 2,
            axis=0)
        emission = np.repeat(
            np.arange(
                TAG_NUM, dtype="float64").reshape(1, TAG_NUM),
            lod[-1][-1],
            axis=0)

        labels = np.random.randint(
            low=0, high=TAG_NUM, size=(lod[-1][-1], 1), dtype="int32")
        predicted_labels = np.ones(
            (lod[-1][-1], 1), dtype="int32") * (TAG_NUM - 1)
        expected_output = (labels == predicted_labels).astype("int32")

        self.inputs = {
            "Emission": (emission, lod),
            "Transition": transition,
            "Label": (labels, lod)
        }

        self.outputs = {"ViterbiPath": expected_output}

    def test_check_output(self):
        self.check_output()


if __name__ == "__main__":
    unittest.main()
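The reference decoder in `test_crf_decoding_op.py` above spells out Viterbi decoding with three nested loops. As a sketch of the same recurrence in vectorized NumPy (the function name `viterbi_decode` is an assumption for illustration and is not part of the commit), using the test's convention that row 0 of the transition matrix holds start weights, row 1 holds end weights, and the remaining rows hold tag-to-tag weights:

```python
import numpy as np


def viterbi_decode(x, transition):
    # x: (seq_len, tag_num) emission scores for one sequence.
    # transition: (tag_num + 2, tag_num); rows 0 and 1 are start and end
    # weights, rows 2.. are tag-to-tag weights w[j, i] (from tag j to tag i).
    a, b, w = transition[0], transition[1], transition[2:]
    seq_len, tag_num = x.shape
    alpha = a + x[0]
    track = np.zeros((seq_len, tag_num), dtype="int64")
    for k in range(1, seq_len):
        scores = alpha[:, None] + w       # scores[j, i] = alpha[j] + w[j, i]
        track[k] = scores.argmax(axis=0)  # best previous tag for each tag i
        alpha = scores.max(axis=0) + x[k]
    path = np.zeros(seq_len, dtype="int64")
    path[-1] = int(np.argmax(alpha + b))
    for k in range(seq_len - 1, 0, -1):
        path[k - 1] = track[k, path[k]]   # follow back-pointers
    return path


# Same construction as TestCRFDecodingOp2: every row is [0, 1, 2, 3, 4],
# so the highest-index tag wins at every position.
TAG_NUM = 5
row = np.arange(TAG_NUM, dtype="float64").reshape(1, TAG_NUM)
transition = np.repeat(row, TAG_NUM + 2, axis=0)
emission = np.repeat(row, 3, axis=0)
path = viterbi_decode(emission, transition)
print(path)
```

`argmax` returns the first maximum, which matches the strict `>` comparison in the loop-based reference when scores tie.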
python/paddle/v2/framework/tests/test_executor_and_mul.py
...
...
@@ -2,7 +2,7 @@ import unittest
 from paddle.v2.framework.layers import mul, data
 import paddle.v2.framework.core as core
 from paddle.v2.framework.executor import Executor
-from paddle.v2.framework.framework import g_program
+from paddle.v2.framework.framework import g_main_program
 import numpy
...
...
@@ -23,7 +23,7 @@ class TestExecutor(unittest.TestCase):
         tensor_b = core.LoDTensor()
         tensor_b.set(b_np, place)
         exe = Executor(place)
-        outs = exe.run(g_program,
+        outs = exe.run(g_main_program,
                        feed={'a': tensor_a,
                              'b': tensor_b},
                        fetch_list=[out])
...
...
python/paddle/v2/framework/tests/test_fit_a_line.py
...
...
@@ -3,40 +3,44 @@ import paddle.v2.framework.layers as layers
 import paddle.v2.framework.core as core
 import paddle.v2.framework.optimizer as optimizer
-from paddle.v2.framework.framework import Program, g_program
+from paddle.v2.framework.framework import Program, g_main_program
 from paddle.v2.framework.io import save_persistables, load_persistables
 from paddle.v2.framework.executor import Executor
 import numpy as np
-init_program = Program()
-program = Program()
+startup_program = Program()
+main_program = Program()
 x = layers.data(
     name='x',
     shape=[13],
     data_type='float32',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 y_predict = layers.fc(input=x,
                       size=1,
                       act=None,
-                      program=program,
-                      init_program=init_program)
+                      main_program=main_program,
+                      startup_program=startup_program)
 y = layers.data(
     name='y',
     shape=[1],
     data_type='float32',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 cost = layers.square_error_cost(
-    input=y_predict, label=y, program=program, init_program=init_program)
-avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
+    input=y_predict,
+    label=y,
+    main_program=main_program,
+    startup_program=startup_program)
+avg_cost = layers.mean(
+    x=cost, main_program=main_program, startup_program=startup_program)
 sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
-opts = sgd_optimizer.minimize(avg_cost, init_program)
+opts = sgd_optimizer.minimize(avg_cost, startup_program)
 BATCH_SIZE = 20
...
...
@@ -48,12 +52,12 @@ train_reader = paddle.batch(
 place = core.CPUPlace()
 exe = Executor(place)
-exe.run(init_program, feed={}, fetch_list=[])
+exe.run(startup_program, feed={}, fetch_list=[])
 PASS_NUM = 100
 for pass_id in range(PASS_NUM):
-    save_persistables(exe, "./fit_a_line.model/", program=program)
-    load_persistables(exe, "./fit_a_line.model/", program=program)
+    save_persistables(exe, "./fit_a_line.model/", main_program=main_program)
+    load_persistables(exe, "./fit_a_line.model/", main_program=main_program)
     for data in train_reader():
         x_data = np.array(map(lambda x: x[0], data)).astype("float32")
         y_data = np.array(map(lambda x: x[1], data)).astype("float32")
...
...
@@ -65,7 +69,7 @@ for pass_id in range(PASS_NUM):
         tensor_y = core.LoDTensor()
         tensor_y.set(y_data, place)
         # print tensor_y.get_dims()
-        outs = exe.run(program,
+        outs = exe.run(main_program,
                        feed={'x': tensor_x,
                              'y': tensor_y},
                        fetch_list=[avg_cost])
...
...
python/paddle/v2/framework/tests/test_image_classification_layer.py
...
...
@@ -9,8 +9,8 @@ def conv_block(input,
                num_filter,
                groups,
                dropouts,
-               program=None,
-               init_program=None):
+               main_program=None,
+               startup_program=None):
     return nets.img_conv_group(
         input=input,
         pool_size=2,
...
...
@@ -21,77 +21,81 @@ def conv_block(input,
         conv_with_batchnorm=True,
         conv_batchnorm_drop_rate=dropouts,
         pool_type='max',
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
 class TestLayer(unittest.TestCase):
     def test_batch_norm_layer(self):
-        program = Program()
-        init_program = Program()
+        main_program = Program()
+        startup_program = Program()
         images = layers.data(
             name='pixel',
             shape=[3, 48, 48],
             data_type='float32',
-            program=program)
+            main_program=main_program)
         layers.batch_norm(
-            input=images, program=program, init_program=init_program)
+            input=images,
+            main_program=main_program,
+            startup_program=startup_program)
-        # print str(program)
+        # print str(main_program)
     def test_dropout_layer(self):
-        program = Program()
-        init_program = Program()
+        main_program = Program()
+        startup_program = Program()
         images = layers.data(
             name='pixel',
             shape=[3, 48, 48],
             data_type='float32',
-            program=program)
+            main_program=main_program)
         layers.dropout(
             x=images,
             dropout_prob=0.5,
-            program=program,
-            init_program=init_program)
+            main_program=main_program,
+            startup_program=startup_program)
-        # print str(program)
+        # print str(main_program)
     def test_img_conv_group(self):
-        program = Program()
-        init_program = Program()
+        main_program = Program()
+        startup_program = Program()
         images = layers.data(
             name='pixel',
             shape=[3, 48, 48],
             data_type='float32',
-            program=program,
-            init_program=init_program)
-        conv1 = conv_block(images, 64, 2, [0.3, 0], program, init_program)
-        conv2 = conv_block(conv1, 256, 3, [0.4, 0.4, 0], program, init_program)
+            main_program=main_program,
+            startup_program=startup_program)
+        conv1 = conv_block(images, 64, 2, [0.3, 0], main_program,
+                           startup_program)
+        conv2 = conv_block(conv1, 256, 3, [0.4, 0.4, 0], main_program,
+                           startup_program)
-        # print str(program)
+        # print str(main_program)
     def test_elementwise_add_with_act(self):
-        program = Program()
-        init_program = Program()
+        main_program = Program()
+        startup_program = Program()
         image1 = layers.data(
             name='pixel1',
             shape=[3, 48, 48],
             data_type='float32',
-            program=program,
-            init_program=init_program)
+            main_program=main_program,
+            startup_program=startup_program)
         image2 = layers.data(
             name='pixel2',
             shape=[3, 48, 48],
             data_type='float32',
-            program=program,
-            init_program=init_program)
+            main_program=main_program,
+            startup_program=startup_program)
         out = layers.elementwise_add(
             x=image1,
             y=image2,
             act='relu',
-            program=program,
-            init_program=init_program)
-        # print(program)
+            main_program=main_program,
+            startup_program=startup_program)
+        # print(main_program)
 if __name__ == '__main__':
...
...
python/paddle/v2/framework/tests/test_image_classification_train.py
-import numpy as np
 import paddle.v2 as paddle
-import paddle.v2.framework.core as core
 import paddle.v2.framework.layers as layers
 import paddle.v2.framework.nets as nets
+import paddle.v2.framework.core as core
 import paddle.v2.framework.optimizer as optimizer
-from paddle.v2.framework.framework import Program, g_program
 from paddle.v2.framework.executor import Executor
+from paddle.v2.framework.framework import g_startup_program, g_main_program
+from paddle.v2.framework.initializer import XavierInitializer
+import numpy as np
-def resnet_cifar10(input, depth=32, program=None, init_program=None):
+def resnet_cifar10(input, depth=32, main_program=None, startup_program=None):
     def conv_bn_layer(input,
                       ch_out,
                       filter_size,
                       stride,
                       padding,
                       act='relu',
-                      program=None,
-                      init_program=None):
+                      main_program=None,
+                      startup_program=None):
         tmp = layers.conv2d(
             input=input,
             filter_size=filter_size,
...
...
@@ -27,10 +26,13 @@ def resnet_cifar10(input, depth=32, program=None, init_program=None):
             padding=padding,
             act=None,
             bias_attr=False,
-            program=program,
-            init_program=init_program)
+            main_program=main_program,
+            startup_program=startup_program)
         return layers.batch_norm(
-            input=tmp, act=act, program=program, init_program=init_program)
+            input=tmp,
+            act=act,
+            main_program=main_program,
+            startup_program=startup_program)
     def shortcut(input, ch_in, ch_out, stride, program, init_program):
         if ch_in != ch_out:
...
...
@@ -43,16 +45,16 @@ def resnet_cifar10(input, depth=32, program=None, init_program=None):
                    ch_in,
                    ch_out,
                    stride,
-                   program=program,
-                   init_program=init_program):
+                   main_program=main_program,
+                   startup_program=startup_program):
         tmp = conv_bn_layer(
             input,
             ch_out,
             3,
             stride,
             1,
-            program=program,
-            init_program=init_program)
+            main_program=main_program,
+            startup_program=startup_program)
         tmp = conv_bn_layer(
             tmp,
             ch_out,
...
...
@@ -60,21 +62,22 @@ def resnet_cifar10(input, depth=32, program=None, init_program=None):
             1,
             1,
             act=None,
-            program=program,
-            init_program=init_program)
-        short = shortcut(input, ch_in, ch_out, stride, program, init_program)
+            main_program=main_program,
+            startup_program=startup_program)
+        short = shortcut(input, ch_in, ch_out, stride, main_program,
+                         startup_program)
         return layers.elementwise_add(
             x=tmp,
             y=short,
             act='relu',
-            program=program,
-            init_program=init_program)
+            main_program=main_program,
+            startup_program=startup_program)
     def layer_warp(block_func, input, ch_in, ch_out, count, stride, program,
-                   init_program):
-        tmp = block_func(input, ch_in, ch_out, stride, program, init_program)
+                   startup_program):
+        tmp = block_func(input, ch_in, ch_out, stride, program, startup_program)
         for i in range(1, count):
-            tmp = block_func(tmp, ch_out, ch_out, 1, program, init_program)
+            tmp = block_func(tmp, ch_out, ch_out, 1, program, startup_program)
         return tmp
     assert (depth - 2) % 6 == 0
...
...
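The `assert (depth - 2) % 6 == 0` line above constrains which depths `resnet_cifar10` accepts. A brief sketch of the arithmetic follows, assuming the usual CIFAR ResNet layout in which the `n` passed to `layer_warp` equals `(depth - 2) / 6`. That definition of `n` falls in lines this diff does not show, so treat it as an assumption; `blocks_per_stage` is a hypothetical name.

```python
def blocks_per_stage(depth):
    # Assumed layout: 3 stages of n basic blocks, 2 conv layers per block,
    # plus the first conv and the final classifier layer: depth = 6n + 2.
    if (depth - 2) % 6 != 0:
        raise ValueError("depth must have the form 6n + 2, got %d" % depth)
    return (depth - 2) // 6


n_default = blocks_per_stage(32)  # the default depth in the diff
n_small = blocks_per_stage(20)
print(n_default, n_small)
```

Under this assumption the default `depth=32` gives five basic blocks in each of the three `layer_warp` stages.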
@@ -85,8 +88,8 @@ def resnet_cifar10(input, depth=32, program=None, init_program=None):
         filter_size=3,
         stride=1,
         padding=1,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     res1 = layer_warp(
         basicblock,
         conv1,
...
...
@@ -94,8 +97,8 @@ def resnet_cifar10(input, depth=32, program=None, init_program=None):
         16,
         n,
         1,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     res2 = layer_warp(
         basicblock,
         res1,
...
...
@@ -103,8 +106,8 @@ def resnet_cifar10(input, depth=32, program=None, init_program=None):
         32,
         n,
         2,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     res3 = layer_warp(
         basicblock,
         res2,
...
...
@@ -112,25 +115,25 @@ def resnet_cifar10(input, depth=32, program=None, init_program=None):
         64,
         n,
         2,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     pool = layers.pool2d(
         input=res3,
         pool_size=8,
         pool_type='avg',
         pool_stride=1,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     return pool
-def vgg16_bn_drop(input, program, init_program):
+def vgg16_bn_drop(input, main_program=None, startup_program=None):
     def conv_block(input,
                    num_filter,
                    groups,
                    dropouts,
-                   program=None,
-                   init_program=None):
+                   main_program=None,
+                   startup_program=None):
         return nets.img_conv_group(
             input=input,
             pool_size=2,
...
...
@@ -141,74 +144,75 @@ def vgg16_bn_drop(input, program, init_program):
             conv_with_batchnorm=True,
             conv_batchnorm_drop_rate=dropouts,
             pool_type='max',
-            program=program,
-            init_program=init_program)
+            main_program=main_program,
+            startup_program=startup_program)
-    conv1 = conv_block(input, 64, 2, [0.3, 0], program, init_program)
-    conv2 = conv_block(conv1, 128, 2, [0.4, 0], program, init_program)
-    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0], program, init_program)
-    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0], program, init_program)
-    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0], program, init_program)
+    conv1 = conv_block(input, 64, 2, [0.3, 0], main_program, startup_program)
+    conv2 = conv_block(conv1, 128, 2, [0.4, 0], main_program, startup_program)
+    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0], main_program,
+                       startup_program)
+    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0], main_program,
+                       startup_program)
+    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0], main_program,
+                       startup_program)
     drop = layers.dropout(
-        x=conv5, dropout_prob=0.5, program=program, init_program=init_program)
+        x=conv5,
+        dropout_prob=0.5,
+        main_program=main_program,
+        startup_program=startup_program)
     fc1 = layers.fc(input=drop,
                     size=512,
                     act=None,
-                    program=program,
-                    init_program=init_program)
+                    param_attr={"initializer": XavierInitializer()},
+                    main_program=main_program,
+                    startup_program=startup_program)
     reshape1 = layers.reshape(
         x=fc1,
         shape=list(fc1.shape + (1, 1)),
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)
     bn = layers.batch_norm(
-        input=reshape1, act='relu', program=program, init_program=init_program)
+        input=reshape1,
+        act='relu',
+        main_program=main_program,
+        startup_program=startup_program)
     drop2 = layers.dropout(
-        x=bn, dropout_prob=0.5, program=program, init_program=init_program)
+        x=bn,
+        dropout_prob=0.5,
+        main_program=main_program,
+        startup_program=startup_program)
     fc2 = layers.fc(input=drop2,
                     size=512,
                     act=None,
-                    program=program,
-                    init_program=init_program)
+                    param_attr={"initializer": XavierInitializer()},
+                    main_program=main_program,
+                    startup_program=startup_program)
     return fc2
-init_program = Program()
-program = Program()
 classdim = 10
 data_shape = [3, 32, 32]
-images = layers.data(
-    name='pixel', shape=data_shape, data_type='float32', program=program)
-label = layers.data(
-    name='label',
-    shape=[1],
-    data_type='int64',
-    program=program,
-    init_program=init_program)
+images = layers.data(name='pixel', shape=data_shape, data_type='float32')
+label = layers.data(name='label', shape=[1], data_type='int64')
 # Add neural network config
 # option 1. resnet
-net = resnet_cifar10(images, 32, program, init_program)
+# net = resnet_cifar10(images, 32)
 # option 2. vgg
-# net = vgg16_bn_drop(images, program, init_program)
+net = vgg16_bn_drop(images)
 # print(program)
-predict = layers.fc(input=net,
-                    size=classdim,
-                    act='softmax',
-                    program=program,
-                    init_program=init_program)
-cost = layers.cross_entropy(
-    input=predict, label=label, program=program, init_program=init_program)
-avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
+predict = layers.fc(input=net, size=classdim, act='softmax')
+cost = layers.cross_entropy(input=predict, label=label)
+avg_cost = layers.mean(x=cost)
+accuracy = layers.accuracy(input=predict, label=label)
-sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
-opts = sgd_optimizer.minimize(avg_cost, init_program)
+# optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
+optimizer = optimizer.AdamOptimizer(learning_rate=0.001)
+opts = optimizer.minimize(avg_cost)
 BATCH_SIZE = 128
 PASS_NUM = 1
...
...
@@ -221,7 +225,7 @@ train_reader = paddle.batch(
 place = core.CPUPlace()
 exe = Executor(place)
-exe.run(init_program, feed={}, fetch_list=[])
+exe.run(g_startup_program, feed={}, fetch_list=[])
 for pass_id in range(PASS_NUM):
     batch_id = 0
...
...
@@ -239,14 +243,15 @@ for pass_id in range(PASS_NUM):
         tensor_img.set(img_data, place)
         tensor_y.set(y_data, place)
-        outs = exe.run(program,
+        outs = exe.run(g_main_program,
                        feed={"pixel": tensor_img,
                              "label": tensor_y},
-                       fetch_list=[avg_cost])
+                       fetch_list=[avg_cost, accuracy])
         loss = np.array(outs[0])
+        acc = np.array(outs[1])
         print("pass_id:" + str(pass_id) + " batch_id:" + str(batch_id) +
-              " loss:" + str(loss))
+              " loss:" + str(loss) + " acc:" + str(acc))
         batch_id = batch_id + 1
         if batch_id > 1:
...
...
python/paddle/v2/framework/tests/test_inference_model_io.py
...
...
@@ -3,7 +3,7 @@ import paddle.v2.framework.layers as layers
 import paddle.v2.framework.core as core
 import paddle.v2.framework.optimizer as optimizer
-from paddle.v2.framework.framework import Program, g_program
+from paddle.v2.framework.framework import Program, g_main_program
 from paddle.v2.framework.io import save_inference_model, load_inference_model
 import paddle.v2.framework.executor as executor
 import unittest
...
...
@@ -20,28 +20,28 @@ class TestBook(unittest.TestCase):
             name='x',
             shape=[2],
             data_type='float32',
-            program=program,
-            init_program=init_program)
+            main_program=program,
+            startup_program=init_program)
         y = layers.data(
             name='y',
             shape=[1],
             data_type='float32',
-            program=program,
-            init_program=init_program)
+            main_program=program,
+            startup_program=init_program)
         y_predict = layers.fc(input=x,
                               size=1,
                               act=None,
-                              program=program,
-                              init_program=init_program)
+                              main_program=program,
+                              startup_program=init_program)
         cost = layers.square_error_cost(
             input=y_predict,
             label=y,
-            program=program,
-            init_program=init_program)
+            main_program=program,
+            startup_program=init_program)
         avg_cost = layers.mean(
-            x=cost, program=program, init_program=init_program)
+            x=cost, main_program=program, startup_program=init_program)
         sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
         opts = sgd_optimizer.minimize(avg_cost, init_program)
...
...
python/paddle/v2/framework/tests/test_layers.py
 import paddle.v2.framework.layers as layers
 import paddle.v2.framework.nets as nets
-from paddle.v2.framework.framework import Program, g_program
+from paddle.v2.framework.framework import Program, g_main_program
 import paddle.v2.framework.core as core
 import unittest
...
...
@@ -9,15 +9,15 @@ class TestBook(unittest.TestCase):
     def test_fit_a_line(self):
         program = Program()
         x = layers.data(
-            name='x', shape=[13], data_type='float32', program=program)
-        y_predict = layers.fc(input=x, size=1, act=None, program=program)
+            name='x', shape=[13], data_type='float32', main_program=program)
+        y_predict = layers.fc(input=x, size=1, act=None, main_program=program)
         y = layers.data(
-            name='y', shape=[1], data_type='float32', program=program)
+            name='y', shape=[1], data_type='float32', main_program=program)
         cost = layers.square_error_cost(
-            input=y_predict, label=y, program=program)
+            input=y_predict, label=y, main_program=program)
-        avg_cost = layers.mean(x=cost, program=program)
+        avg_cost = layers.mean(x=cost, main_program=program)
         self.assertIsNotNone(avg_cost)
         program.append_backward(avg_cost)
         print str(program)
...
...
@@ -27,26 +27,42 @@ class TestBook(unittest.TestCase):
         # Change g_program, so the rest layers use `g_program`
         images = layers.data(
-            name='pixel', shape=[784], data_type='float32', program=program)
+            name='pixel',
+            shape=[784],
+            data_type='float32',
+            main_program=program)
         label = layers.data(
-            name='label', shape=[1], data_type='int32', program=program)
-        hidden1 = layers.fc(input=images, size=128, act='relu', program=program)
-        hidden2 = layers.fc(input=hidden1, size=64, act='relu', program=program)
+            name='label', shape=[1], data_type='int32', main_program=program)
+        hidden1 = layers.fc(input=images,
+                            size=128,
+                            act='relu',
+                            main_program=program)
+        hidden2 = layers.fc(input=hidden1,
+                            size=64,
+                            act='relu',
+                            main_program=program)
         predict = layers.fc(input=hidden2,
                             size=10,
                             act='softmax',
-                            program=program)
-        cost = layers.cross_entropy(input=predict, label=label, program=program)
-        avg_cost = layers.mean(x=cost, program=program)
+                            main_program=program)
+        cost = layers.cross_entropy(
+            input=predict, label=label, main_program=program)
+        avg_cost = layers.mean(x=cost, main_program=program)
         self.assertIsNotNone(avg_cost)
         print str(program)

     def test_simple_conv2d(self):
         program = Program()
         images = layers.data(
-            name='pixel', shape=[3, 48, 48], data_type='int32', program=program)
+            name='pixel',
+            shape=[3, 48, 48],
+            data_type='int32',
+            main_program=program)
         layers.conv2d(
-            input=images, num_filters=3, filter_size=[4, 4], program=program)
+            input=images,
+            num_filters=3,
+            filter_size=[4, 4],
+            main_program=program)

         print str(program)
...
...
@@ -57,9 +73,9 @@ class TestBook(unittest.TestCase):
             name='pixel',
             shape=[1, 28, 28],
             data_type='float32',
-            program=program)
+            main_program=program)
         label = layers.data(
-            name='label', shape=[1], data_type='int32', program=program)
+            name='label', shape=[1], data_type='int32', main_program=program)
         conv_pool_1 = nets.simple_img_conv_pool(
             input=images,
             filter_size=5,
...
...
@@ -67,7 +83,7 @@ class TestBook(unittest.TestCase):
             pool_size=2,
             pool_stride=2,
             act="relu",
-            program=program)
+            main_program=program)
         conv_pool_2 = nets.simple_img_conv_pool(
             input=conv_pool_1,
             filter_size=5,
...
...
@@ -75,14 +91,15 @@ class TestBook(unittest.TestCase):
             pool_size=2,
             pool_stride=2,
             act="relu",
-            program=program)
+            main_program=program)

         predict = layers.fc(input=conv_pool_2,
                             size=10,
                             act="softmax",
-                            program=program)
-        cost = layers.cross_entropy(input=predict, label=label, program=program)
-        avg_cost = layers.mean(x=cost, program=program)
+                            main_program=program)
+        cost = layers.cross_entropy(
+            input=predict, label=label, main_program=program)
+        avg_cost = layers.mean(x=cost, main_program=program)

         program.append_backward(avg_cost)
...
...
@@ -93,58 +110,58 @@ class TestBook(unittest.TestCase):
         dict_size = 10000
         embed_size = 32
         first_word = layers.data(
-            name='firstw', shape=[1], data_type='int64', program=program)
+            name='firstw', shape=[1], data_type='int64', main_program=program)
         second_word = layers.data(
-            name='secondw', shape=[1], data_type='int64', program=program)
+            name='secondw', shape=[1], data_type='int64', main_program=program)
         third_word = layers.data(
-            name='thirdw', shape=[1], data_type='int64', program=program)
+            name='thirdw', shape=[1], data_type='int64', main_program=program)
         forth_word = layers.data(
-            name='forthw', shape=[1], data_type='int64', program=program)
+            name='forthw', shape=[1], data_type='int64', main_program=program)
         next_word = layers.data(
-            name='nextw', shape=[1], data_type='int64', program=program)
+            name='nextw', shape=[1], data_type='int64', main_program=program)

         embed_first = layers.embedding(
             input=first_word,
             size=[dict_size, embed_size],
             data_type='float32',
             param_attr={'name': 'shared_w'},
-            program=program)
+            main_program=program)
         embed_second = layers.embedding(
             input=second_word,
             size=[dict_size, embed_size],
             data_type='float32',
             param_attr={'name': 'shared_w'},
-            program=program)
+            main_program=program)

         embed_third = layers.embedding(
             input=third_word,
             size=[dict_size, embed_size],
             data_type='float32',
             param_attr={'name': 'shared_w'},
-            program=program)
+            main_program=program)
         embed_forth = layers.embedding(
             input=forth_word,
             size=[dict_size, embed_size],
             data_type='float32',
             param_attr={'name': 'shared_w'},
-            program=program)
+            main_program=program)

         concat_embed = layers.concat(
             input=[embed_first, embed_second, embed_third, embed_forth],
             axis=1,
-            program=program)
+            main_program=program)

         hidden1 = layers.fc(input=concat_embed,
                             size=256,
                             act='sigmoid',
-                            program=program)
+                            main_program=program)
         predict_word = layers.fc(input=hidden1,
                                  size=dict_size,
                                  act='softmax',
-                                 program=program)
+                                 main_program=program)
         cost = layers.cross_entropy(
-            input=predict_word, label=next_word, program=program)
-        avg_cost = layers.mean(x=cost, program=program)
+            input=predict_word, label=next_word, main_program=program)
+        avg_cost = layers.mean(x=cost, main_program=program)
         self.assertIsNotNone(avg_cost)

         print str(program)
...
...
python/paddle/v2/framework/tests/test_lod_rank_table.py
0 → 100644
View file @ 483947c4
+from paddle.v2.framework.layers import lod_rank_table, data
+from paddle.v2.framework.executor import Executor
+from paddle.v2.framework.framework import g_main_program
+import paddle.v2.framework.core as core
+import numpy
+import unittest
+
+
+class TestLoDRankTable(unittest.TestCase):
+    def test_lod_rank_table(self):
+        x = data(name='x', shape=[100])
+        cpu = core.CPUPlace()
+        rank_table = lod_rank_table(x=x, level=1)
+        rank_table.persistable = True
+        exe = Executor(cpu)
+        scope = core.Scope()
+
+        tensor = core.LoDTensor()
+        tensor.set(numpy.random.random(size=(17, 100)), cpu)
+        tensor.set_lod([[0, 1, 3], [0, 5, 6, 7], [0, 3, 4, 9, 10, 13, 16, 17]])
+
+        exe.run(g_main_program, scope=scope, feed={'x': tensor})
+        var = scope.find_var(rank_table.name)
+        table = var.get_lod_rank_table()
+        self.assertEqual([(0, 5), (1, 1), (2, 1)], table.items())
+
+
+if __name__ == '__main__':
+    unittest.main()
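The expected value asserted in this test, `[(0, 5), (1, 1), (2, 1)]`, is consistent with reading the level-1 offsets `[0, 5, 6, 7]` as sequence boundaries, converting them to lengths, and ordering the (index, length) pairs by descending length. The sketch below illustrates that reading in plain Python. `rank_table_items` is a hypothetical helper, not a Paddle function, and the stable tie-breaking is an assumption inferred from the asserted result rather than from the C++ implementation, which this excerpt does not show.

```python
def rank_table_items(lod, level):
    """Return (sequence index, sequence length) pairs, longest first.

    `lod` is a list of offset lists, as passed to set_lod in the test.
    Hypothetical helper for illustration only.
    """
    offsets = lod[level]
    # Consecutive offsets delimit one sequence each.
    lengths = [end - start for start, end in zip(offsets, offsets[1:])]
    items = list(enumerate(lengths))
    # sorted() is stable even with reverse=True, so sequences of equal
    # length keep their original relative order (assumed behaviour).
    return sorted(items, key=lambda item: item[1], reverse=True)


lod = [[0, 1, 3], [0, 5, 6, 7], [0, 3, 4, 9, 10, 13, 16, 17]]
items = rank_table_items(lod, level=1)
```

With the LoD from the test, `items` equals the list the test compares against `table.items()`.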
python/paddle/v2/framework/tests/test_lod_tensor_array.py
0 → 100644
View file @ 483947c4
+import unittest
+import paddle.v2.framework.core as core
+import numpy
+
+
+class TestLoDTensorArray(unittest.TestCase):
+    def test_get_set(self):
+        scope = core.Scope()
+        arr = scope.var('tmp_lod_tensor_array')
+        tensor_array = arr.get_lod_tensor_array()
+        self.assertEqual(0, len(tensor_array))
+        cpu = core.CPUPlace()
+        for i in xrange(10):
+            t = core.LoDTensor()
+            t.set(numpy.array([i], dtype='float32'), cpu)
+            t.set_lod([[0, 1]])
+            tensor_array.append(t)
+
+        self.assertEqual(10, len(tensor_array))
+
+        for i in xrange(10):
+            t = tensor_array[i]
+            self.assertEqual(numpy.array(t), numpy.array([i], dtype='float32'))
+            self.assertEqual([[0, 1]], t.lod())
+
+            t = core.LoDTensor()
+            t.set(numpy.array([i + 10], dtype='float32'), cpu)
+            t.set_lod([[0, 2]])
+            tensor_array[i] = t
+            t = tensor_array[i]
+            self.assertEqual(
+                numpy.array(t), numpy.array(
+                    [i + 10], dtype='float32'))
+            self.assertEqual([[0, 2]], t.lod())
+
+
+if __name__ == '__main__':
+    unittest.main()
python/paddle/v2/framework/tests/test_operator_desc.py
View file @ 483947c4
 import unittest
-from paddle.v2.framework.framework import Variable, Program, g_program
+from paddle.v2.framework.framework import Variable, Program, g_main_program
 import paddle.v2.framework.core as core


 class TestOperator(unittest.TestCase):
     def test_error_type(self):
-        block = g_program.create_block()
+        block = g_main_program.create_block()
         try:
             block.append_op()
             self.assertFail()
...
...
python/paddle/v2/framework/tests/test_parameter.py
View file @ 483947c4
 import unittest
-from paddle.v2.framework.framework import g_program
+from paddle.v2.framework.framework import g_main_program
 import paddle.v2.framework.core as core


 class TestParameter(unittest.TestCase):
     def test_param(self):
-        b = g_program.create_block()
+        b = g_main_program.create_block()
         param = b.create_parameter(
             name='fc.w',
             shape=[784, 100],
...
...
python/paddle/v2/framework/tests/test_program.py
View file @ 483947c4
...
...
@@ -2,35 +2,35 @@ import unittest
 import paddle.v2.framework.core as core
 from paddle.v2.framework.framework import Program
-from paddle.v2.framework.framework import g_program
+from paddle.v2.framework.framework import g_main_program


 class TestProgram(unittest.TestCase):
     def test_program(self):
-        b = g_program.current_block()
+        b = g_main_program.current_block()
         self.assertEqual(-1, b.parent_idx)
         self.assertEqual(0, b.idx)

-        b = g_program.create_block()
+        b = g_main_program.create_block()
         self.assertEqual(1, b.idx)
         self.assertEqual(0, b.parent_idx)

-        b = g_program.create_block()
+        b = g_main_program.create_block()
         self.assertEqual(2, b.idx)
         self.assertEqual(1, b.parent_idx)

-        g_program.rollback()
+        g_main_program.rollback()

-        b = g_program.current_block()
+        b = g_main_program.current_block()
         self.assertEqual(1, b.idx)
         self.assertEqual(0, b.parent_idx)

-        b = g_program.create_block()
+        b = g_main_program.create_block()
         self.assertEqual(3, b.idx)
         self.assertEqual(1, b.parent_idx)

-        g_program.rollback()
-        b = g_program.current_block()
+        g_main_program.rollback()
+        b = g_main_program.current_block()
         self.assertEqual(1, b.idx)
         self.assertEqual(0, b.parent_idx)
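The index values asserted in this test are consistent with a program that keeps a flat, append-only list of blocks plus a pointer to the current block, where `create_block` parents the new block under the current one and `rollback` moves the pointer back to the parent. The following toy model is only a sketch of that reading: `ToyProgram` and `ToyBlock` are hypothetical names, and the real `Program` class is not shown in this diff.

```python
class ToyBlock(object):
    """Minimal stand-in for a block: just its index and its parent's index."""

    def __init__(self, idx, parent_idx):
        self.idx = idx
        self.parent_idx = parent_idx


class ToyProgram(object):
    """Hypothetical model of the block bookkeeping exercised by the test."""

    def __init__(self):
        # The root block has index 0 and no parent (-1).
        self.blocks = [ToyBlock(0, -1)]
        self.current_idx = 0

    def current_block(self):
        return self.blocks[self.current_idx]

    def create_block(self):
        # New blocks always get the next free index, even after a rollback.
        new_idx = len(self.blocks)
        self.blocks.append(ToyBlock(new_idx, self.current_idx))
        self.current_idx = new_idx
        return self.current_block()

    def rollback(self):
        # Step back to the parent; the child block stays in the list.
        self.current_idx = self.current_block().parent_idx


prog = ToyProgram()
first = prog.create_block()    # idx 1, parent 0
second = prog.create_block()   # idx 2, parent 1
prog.rollback()                # current block is idx 1 again
third = prog.create_block()    # idx 3, parent 1
prog.rollback()
```

Under this model the block indices never get reused, which is why the fourth block in the test has `idx == 3` even though it is created after a rollback.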
...
...
python/paddle/v2/framework/tests/test_recognize_digits_conv.py
View file @ 483947c4
...
...
@@ -4,26 +4,26 @@ import paddle.v2.framework.nets as nets
 import paddle.v2.framework.core as core
 import paddle.v2.framework.optimizer as optimizer

-from paddle.v2.framework.framework import Program, g_program
+from paddle.v2.framework.framework import Program, g_main_program
 from paddle.v2.framework.executor import Executor

 import numpy as np

-init_program = Program()
-program = Program()
+startup_program = Program()
+main_program = Program()

 images = layers.data(
     name='pixel',
     shape=[1, 28, 28],
     data_type='float32',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 label = layers.data(
     name='label',
     shape=[1],
     data_type='int64',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 conv_pool_1 = nets.simple_img_conv_pool(
     input=images,
     filter_size=5,
...
...
@@ -31,8 +31,8 @@ conv_pool_1 = nets.simple_img_conv_pool(
     pool_size=2,
     pool_stride=2,
     act="relu",
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 conv_pool_2 = nets.simple_img_conv_pool(
     input=conv_pool_1,
     filter_size=5,
...
...
@@ -40,24 +40,30 @@ conv_pool_2 = nets.simple_img_conv_pool(
     pool_size=2,
     pool_stride=2,
     act="relu",
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)

 predict = layers.fc(input=conv_pool_2,
                     size=10,
                     act="softmax",
-                    program=program,
-                    init_program=init_program)
+                    main_program=main_program,
+                    startup_program=startup_program)
 cost = layers.cross_entropy(
-    input=predict, label=label, program=program, init_program=init_program)
-avg_cost = layers.mean(x=cost, program=program)
+    input=predict,
+    label=label,
+    main_program=main_program,
+    startup_program=startup_program)
+avg_cost = layers.mean(x=cost, main_program=main_program)
 accuracy = layers.accuracy(
-    input=predict, label=label, program=program, init_program=init_program)
+    input=predict,
+    label=label,
+    main_program=main_program,
+    startup_program=startup_program)

 # optimizer = optimizer.MomentumOptimizer(learning_rate=0.1 / 128.0,
 # momentum=0.9)
 optimizer = optimizer.AdamOptimizer(learning_rate=0.01, beta1=0.9, beta2=0.999)
-opts = optimizer.minimize(avg_cost, init_program)
+opts = optimizer.minimize(avg_cost, startup_program)

 BATCH_SIZE = 50
 PASS_NUM = 3
...
...
@@ -69,7 +75,7 @@ train_reader = paddle.batch(
 place = core.CPUPlace()
 exe = Executor(place)

-exe.run(init_program, feed={}, fetch_list=[])
+exe.run(startup_program, feed={}, fetch_list=[])

 for pass_id in range(PASS_NUM):
     count = 0
...
...
@@ -84,7 +90,7 @@ for pass_id in range(PASS_NUM):
         tensor_img.set(img_data, place)
         tensor_y.set(y_data, place)

-        outs = exe.run(program,
+        outs = exe.run(main_program,
                        feed={"pixel": tensor_img,
                              "label": tensor_y},
                        fetch_list=[avg_cost, accuracy])
...
...
python/paddle/v2/framework/tests/test_recognize_digits_mlp.py
View file @ 483947c4
...
...
@@ -11,14 +11,14 @@ from paddle.v2.framework.initializer import UniformInitializer
 import numpy as np

 BATCH_SIZE = 128
-init_program = Program()
-program = Program()
+startup_program = Program()
+main_program = Program()
 image = layers.data(
     name='x',
     shape=[784],
     data_type='float32',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)

 param_attr = {
     'name': None,
...
...
@@ -30,36 +30,45 @@ param_attr = {
 hidden1 = layers.fc(input=image,
                     size=128,
                     act='relu',
-                    program=program,
-                    init_program=init_program,
+                    main_program=main_program,
+                    startup_program=startup_program,
                     param_attr=param_attr)
 hidden2 = layers.fc(input=hidden1,
                     size=64,
                     act='relu',
-                    program=program,
-                    init_program=init_program,
+                    main_program=main_program,
+                    startup_program=startup_program,
                     param_attr=param_attr)

 predict = layers.fc(input=hidden2,
                     size=10,
                     act='softmax',
-                    program=program,
-                    init_program=init_program,
+                    main_program=main_program,
+                    startup_program=startup_program,
                     param_attr=param_attr)

 label = layers.data(
     name='y',
     shape=[1],
     data_type='int64',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)

 cost = layers.cross_entropy(
-    input=predict, label=label, program=program, init_program=init_program)
-avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
+    input=predict,
+    label=label,
+    main_program=main_program,
+    startup_program=startup_program)
+avg_cost = layers.mean(
+    x=cost, main_program=main_program, startup_program=startup_program)
+accuracy = layers.accuracy(
+    input=predict,
+    label=label,
+    main_program=main_program,
+    startup_program=startup_program)

 optimizer = optimizer.MomentumOptimizer(learning_rate=0.001, momentum=0.9)
-opts = optimizer.minimize(avg_cost, init_program)
+opts = optimizer.minimize(avg_cost, startup_program)

 train_reader = paddle.batch(
     paddle.reader.shuffle(
...
...
@@ -69,7 +78,7 @@ train_reader = paddle.batch(
 place = core.CPUPlace()
 exe = Executor(place)

-exe.run(init_program, feed={}, fetch_list=[])
+exe.run(startup_program, feed={}, fetch_list=[])

 PASS_NUM = 100
 for pass_id in range(PASS_NUM):
...
...
@@ -84,12 +93,12 @@ for pass_id in range(PASS_NUM):
         tensor_y = core.LoDTensor()
         tensor_y.set(y_data, place)

-        outs = exe.run(program,
+        outs = exe.run(main_program,
                        feed={'x': tensor_x,
                              'y': tensor_y},
-                       fetch_list=[avg_cost])
+                       fetch_list=[avg_cost, accuracy])
         out = np.array(outs[0])
+        acc = np.array(outs[1])
         if out[0] < 5.0:
             exit(0)  # if avg cost less than 5.0, we think our code is good.
 exit(1)
python/paddle/v2/framework/tests/test_recommender_system.py
View file @ 483947c4
...
...
@@ -4,13 +4,13 @@ import paddle.v2.framework.nets as nets
 import paddle.v2.framework.core as core
 import paddle.v2.framework.optimizer as optimizer

-from paddle.v2.framework.framework import Program, g_program
+from paddle.v2.framework.framework import Program, g_main_program
 from paddle.v2.framework.executor import Executor

 import numpy as np

-init_program = Program()
-program = Program()
+startup_program = Program()
+main_program = Program()
 is_sparse = True
 use_gpu = False
 BATCH_SIZE = 256
...
...
@@ -26,8 +26,8 @@ def get_usr_combined_features():
         name='user_id',
         shape=[1],
         data_type='int64',
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     usr_emb = layers.embedding(
         input=uid,
...
...
@@ -35,13 +35,13 @@ def get_usr_combined_features():
         size=[USR_DICT_SIZE, 32],
         param_attr={'name': 'user_table'},
         is_sparse=is_sparse,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     usr_fc = layers.fc(input=usr_emb,
                        size=32,
-                       program=program,
-                       init_program=init_program)
+                       main_program=main_program,
+                       startup_program=startup_program)

     USR_GENDER_DICT_SIZE = 2
...
...
@@ -49,75 +49,75 @@ def get_usr_combined_features():
         name='gender_id',
         shape=[1],
         data_type='int64',
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     usr_gender_emb = layers.embedding(
         input=usr_gender_id,
         size=[USR_GENDER_DICT_SIZE, 16],
         param_attr={'name': 'gender_table'},
         is_sparse=is_sparse,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     usr_gender_fc = layers.fc(input=usr_gender_emb,
                               size=16,
-                              program=program,
-                              init_program=init_program)
+                              main_program=main_program,
+                              startup_program=startup_program)

     USR_AGE_DICT_SIZE = len(paddle.dataset.movielens.age_table)
     usr_age_id = layers.data(
         name='age_id',
         shape=[1],
         data_type="int64",
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     usr_age_emb = layers.embedding(
         input=usr_age_id,
         size=[USR_AGE_DICT_SIZE, 16],
         is_sparse=is_sparse,
         param_attr={'name': 'age_table'},
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     usr_age_fc = layers.fc(input=usr_age_emb,
                            size=16,
-                           program=program,
-                           init_program=init_program)
+                           main_program=main_program,
+                           startup_program=startup_program)

     USR_JOB_DICT_SIZE = paddle.dataset.movielens.max_job_id() + 1
     usr_job_id = layers.data(
         name='job_id',
         shape=[1],
         data_type="int64",
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     usr_job_emb = layers.embedding(
         input=usr_job_id,
         size=[USR_JOB_DICT_SIZE, 16],
         param_attr={'name': 'job_table'},
         is_sparse=is_sparse,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     usr_job_fc = layers.fc(input=usr_job_emb,
                            size=16,
-                           program=program,
-                           init_program=init_program)
+                           main_program=main_program,
+                           startup_program=startup_program)

     concat_embed = layers.concat(
         input=[usr_fc, usr_gender_fc, usr_age_fc, usr_job_fc],
         axis=1,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     usr_combined_features = layers.fc(input=concat_embed,
                                       size=200,
                                       act="tanh",
-                                      program=program,
-                                      init_program=init_program)
+                                      main_program=main_program,
+                                      startup_program=startup_program)

     return usr_combined_features
...
...
@@ -130,8 +130,8 @@ def get_mov_combined_features():
         name='movie_id',
         shape=[1],
         data_type='int64',
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     mov_emb = layers.embedding(
         input=mov_id,
...
...
@@ -139,13 +139,13 @@ def get_mov_combined_features():
         size=[MOV_DICT_SIZE, 32],
         param_attr={'name': 'movie_table'},
         is_sparse=is_sparse,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     mov_fc = layers.fc(input=mov_emb,
                        size=32,
-                       program=program,
-                       init_program=init_program)
+                       main_program=main_program,
+                       startup_program=startup_program)

     CATEGORY_DICT_SIZE = len(paddle.dataset.movielens.movie_categories())
...
...
@@ -153,21 +153,21 @@ def get_mov_combined_features():
         name='category_id',
         shape=[1],
         data_type='int64',
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     mov_categories_emb = layers.embedding(
         input=category_id,
         size=[CATEGORY_DICT_SIZE, 32],
         is_sparse=is_sparse,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     mov_categories_hidden = layers.sequence_pool(
         input=mov_categories_emb,
         pool_type="sum",
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     MOV_TITLE_DICT_SIZE = len(paddle.dataset.movielens.get_movie_title_dict())
...
...
@@ -175,15 +175,15 @@ def get_mov_combined_features():
         name='movie_title',
         shape=[1],
         data_type='int64',
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     mov_title_emb = layers.embedding(
         input=mov_title_id,
         size=[MOV_TITLE_DICT_SIZE, 32],
         is_sparse=is_sparse,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     mov_title_conv = nets.sequence_conv_pool(
         input=mov_title_emb,
...
...
@@ -191,21 +191,21 @@ def get_mov_combined_features():
         filter_size=3,
         act="tanh",
         pool_type="sum",
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     concat_embed = layers.concat(
         input=[mov_fc, mov_categories_hidden, mov_title_conv],
         axis=1,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     # FIXME(dzh) : need tanh operator
     mov_combined_features = layers.fc(input=concat_embed,
                                       size=200,
                                       act="tanh",
-                                      program=program,
-                                      init_program=init_program)
+                                      main_program=main_program,
+                                      startup_program=startup_program)

     return mov_combined_features
...
...
@@ -218,24 +218,26 @@ def model():
     inference = layers.cos_sim(
         X=usr_combined_features,
         Y=mov_combined_features,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     label = layers.data(
         name='score',
         shape=[1],
         data_type='float32',
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     square_cost = layers.square_error_cost(
         input=inference,
         label=label,
-        program=program,
-        init_program=init_program)
+        main_program=main_program,
+        startup_program=startup_program)

     avg_cost = layers.mean(
-        x=square_cost, program=program, init_program=init_program)
+        x=square_cost,
+        main_program=main_program,
+        startup_program=startup_program)

     return avg_cost
...
...
@@ -243,8 +245,8 @@ def model():
 def main():
     cost = model()
     sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.2)
-    opts = sgd_optimizer.minimize(cost, init_program=init_program)
-    block = program.block(0)
+    opts = sgd_optimizer.minimize(cost, startup_program=startup_program)
+    block = main_program.block(0)

     if use_gpu:
         place = core.GPUPlace(0)
...
...
@@ -252,7 +254,7 @@ def main():
         place = core.CPUPlace()

     exe = Executor(place)
-    exe.run(init_program, feed={}, fetch_list=[])
+    exe.run(startup_program, feed={}, fetch_list=[])

     train_reader = paddle.batch(
         paddle.reader.shuffle(
...
...
@@ -301,7 +303,7 @@ def main():
     PASS_NUM = 100
     for pass_id in range(PASS_NUM):
         for data in train_reader():
-            outs = exe.run(program,
+            outs = exe.run(main_program,
                            feed=func_feed(feeding, data),
                            fetch_list=[cost])
             out = np.array(outs[0])
...
...
python/paddle/v2/framework/tests/test_recurrent_op.py
View file @ 483947c4
...
...
@@ -99,17 +99,17 @@ class RecurrentOpTest1(unittest.TestCase):
     batch_size = 1
     sent_len = 1

-    def init_program(self):
-        self.program = Program()
-        self.init_program = Program()
+    def setup_program(self):
+        self.main_program = Program()
+        self.startup_program = Program()
         self.p_info = {
-            "program": self.program,
-            "init_program": self.init_program
+            "main_program": self.main_program,
+            "startup_program": self.startup_program
         }
         self.place = core.CPUPlace()

     def setUp(self):
-        self.init_program()
+        self.setup_program()
         self.data_field = {"x", "h_boot"}

         self.input_shape = (self.sent_len, self.batch_size, self.input_dim)
...
...
@@ -125,13 +125,15 @@ class RecurrentOpTest1(unittest.TestCase):
             name='x',
             append_batch_size=False,
             **self.p_info)
+        x.stop_gradient = False
         h_boot = data(
             shape=[self.input_dim],
             data_type='float32',
             name='h_boot',
             **self.p_info)
+        h_boot.stop_gradient = False

-        rnn = StaticRNN(program=self.program)
+        rnn = StaticRNN(main_program=self.main_program)
         with rnn.step():
             h_pre = rnn.memory(init=h_boot)
             x_t = rnn.step_input(x)
...
...
@@ -153,7 +155,7 @@ class RecurrentOpTest1(unittest.TestCase):
             for x in self.data_field
         }
         exe = Executor(self.place)
-        out = exe.run(self.program,
+        out = exe.run(self.main_program,
                       feed=self.feed_map,
                       fetch_list=[self.output])
...
...
@@ -165,12 +167,14 @@ class RecurrentOpTest1(unittest.TestCase):
             for x in self.data_field
         }
         fetch_list = [
-            self.program.global_block().var(x + "@GRAD")
+            self.main_program.global_block().var(x + "@GRAD")
             for x in self.data_field
         ]

         exe = Executor(self.place)
-        return exe.run(self.program, feed=self.feed_map, fetch_list=fetch_list)
+        return exe.run(self.main_program,
+                       feed=self.feed_map,
+                       fetch_list=fetch_list)

     def test_backward(self):
         self.check_forward()
...
...
@@ -237,7 +241,7 @@ class RecurrentOpTest2(RecurrentOpTest1):
     sent_len = 2

     def setUp(self):
-        self.init_program()
+        self.setup_program()

         self.data_field = {"x", "h_boot", "W", "U"}
...
...
@@ -254,13 +258,15 @@ class RecurrentOpTest2(RecurrentOpTest1):
             name='x',
             append_batch_size=False,
             **self.p_info)
+        x.stop_gradient = False
         h_boot = data(
             shape=[self.input_dim],
             data_type='float32',
             name='h_boot',
             **self.p_info)
+        h_boot.stop_gradient = False

-        rnn = StaticRNN(program=self.program)
+        rnn = StaticRNN(main_program=self.main_program)
         with rnn.step():
             h_pre = rnn.memory(init=h_boot)
             x_t = rnn.step_input(x)
...
...
@@ -333,7 +339,7 @@ class RecurrentOpTest3(RecurrentOpTest1):
     sent_len = 2

     def setUp(self):
-        self.init_program()
+        self.setup_program()

         self.data_field = {"x", "h_boot1", "h_boot2"}
...
...
@@ -351,20 +357,23 @@ class RecurrentOpTest3(RecurrentOpTest1):
             name='x',
             append_batch_size=False,
             **self.p_info)
+        x.stop_gradient = False
         h_boot1 = data(
             shape=[self.batch_size, self.input_dim],
             data_type='float32',
             name='h_boot1',
             append_batch_size=False,
             **self.p_info)
+        h_boot1.stop_gradient = False
         h_boot2 = data(
             shape=[self.batch_size, self.input_dim],
             data_type='float32',
             name='h_boot2',
             append_batch_size=False,
             **self.p_info)
+        h_boot2.stop_gradient = False

-        rnn = StaticRNN(program=self.program)
+        rnn = StaticRNN(main_program=self.main_program)
         with rnn.step():
             h_pre1 = rnn.memory(init=h_boot1)
             h_pre2 = rnn.memory(init=h_boot2)
...
...
python/paddle/v2/framework/tests/test_understand_sentiment_conv.py
View file @ 483947c4
...
...
@@ -4,7 +4,7 @@ import paddle.v2.framework.nets as nets
 import paddle.v2.framework.core as core
 import paddle.v2.framework.optimizer as optimizer

-from paddle.v2.framework.framework import Program, g_program, g_init_program
+from paddle.v2.framework.framework import Program, g_main_program, g_startup_program
 from paddle.v2.framework.executor import Executor

 import numpy as np
...
...
@@ -70,7 +70,7 @@ def main():
     place = core.CPUPlace()
     exe = Executor(place)

-    exe.run(g_init_program)
+    exe.run(g_startup_program)

     for pass_id in xrange(PASS_NUM):
         for data in train_data():
...
...
@@ -82,7 +82,7 @@ def main():
             tensor_label = core.LoDTensor()
             tensor_label.set(label, place)

-            outs = exe.run(g_program,
+            outs = exe.run(g_main_program,
                            feed={"words": tensor_words,
                                  "label": tensor_label},
                            fetch_list=[cost, acc])
...
...
python/paddle/v2/framework/tests/test_variable.py
View file @ 483947c4
 import unittest
-from paddle.v2.framework.framework import Variable, g_program, Program
+from paddle.v2.framework.framework import Variable, g_main_program, Program
 import paddle.v2.framework.core as core
 import numpy as np
...
...
@@ -18,7 +18,7 @@ class TestVariable(unittest.TestCase):
         self.assertRaises(ValueError, lambda: convert("int8"))

     def test_var(self):
-        b = g_program.current_block()
+        b = g_main_program.current_block()
         w = b.create_var(
             dtype="float64", shape=[784, 100], lod_level=0, name="fc.w")
         self.assertNotEqual(str(w), "")
...
...
python/paddle/v2/framework/tests/test_word2vec.py
View file @ 483947c4
...
...
@@ -3,13 +3,13 @@ import paddle.v2.framework.layers as layers
 import paddle.v2.framework.core as core
 import paddle.v2.framework.optimizer as optimizer

-from paddle.v2.framework.framework import Program, g_program
+from paddle.v2.framework.framework import Program, g_main_program
 from paddle.v2.framework.executor import Executor

 import numpy as np

-init_program = Program()
-program = Program()
+startup_program = Program()
+main_program = Program()

 embed_size = 32
 hidden_size = 256
...
...
@@ -24,32 +24,32 @@ first_word = layers.data(
     name='firstw',
     shape=[1],
     data_type='int64',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 second_word = layers.data(
     name='secondw',
     shape=[1],
     data_type='int64',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 third_word = layers.data(
     name='thirdw',
     shape=[1],
     data_type='int64',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 forth_word = layers.data(
     name='forthw',
     shape=[1],
     data_type='int64',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 next_word = layers.data(
     name='nextw',
     shape=[1],
     data_type='int64',
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)

 embed_first = layers.embedding(
     input=first_word,
...
...
@@ -57,16 +57,16 @@ embed_first = layers.embedding(
     data_type='float32',
     is_sparse=is_sparse,
     param_attr={'name': 'shared_w'},
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 embed_second = layers.embedding(
     input=second_word,
     size=[dict_size, embed_size],
     data_type='float32',
     is_sparse=is_sparse,
     param_attr={'name': 'shared_w'},
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 embed_third = layers.embedding(
     input=third_word,
...
...
@@ -74,42 +74,43 @@ embed_third = layers.embedding(
     data_type='float32',
     is_sparse=is_sparse,
     param_attr={'name': 'shared_w'},
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 embed_forth = layers.embedding(
     input=forth_word,
     size=[dict_size, embed_size],
     data_type='float32',
     is_sparse=is_sparse,
     param_attr={'name': 'shared_w'},
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 concat_embed = layers.concat(
     input=[embed_first, embed_second, embed_third, embed_forth],
     axis=1,
-    program=program,
-    init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
 hidden1 = layers.fc(input=concat_embed,
                     size=hidden_size,
                     act='sigmoid',
-                    program=program,
-                    init_program=init_program)
+                    main_program=main_program,
+                    startup_program=startup_program)
 predict_word = layers.fc(input=hidden1,
                          size=dict_size,
                          act='softmax',
-                         program=program,
-                         init_program=init_program)
+                         main_program=main_program,
+                         startup_program=startup_program)
 cost = layers.cross_entropy(
     input=predict_word,
     label=next_word,
-    program=program,
-    init_program=init_program)
-avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
+    main_program=main_program,
+    startup_program=startup_program)
+avg_cost = layers.mean(
+    x=cost, main_program=main_program, startup_program=startup_program)
 sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
-opts = sgd_optimizer.minimize(avg_cost, init_program)
+opts = sgd_optimizer.minimize(avg_cost, startup_program)
 train_reader = paddle.batch(
     paddle.dataset.imikolov.train(word_dict, N), batch_size)
...
...
@@ -117,7 +118,7 @@ train_reader = paddle.batch(
 place = core.CPUPlace()
 exe = Executor(place)
-exe.run(init_program, feed={}, fetch_list=[])
+exe.run(startup_program, feed={}, fetch_list=[])
 PASS_NUM = 100
 for pass_id in range(PASS_NUM):
     for data in train_reader():
...
...
@@ -145,7 +146,7 @@ for pass_id in range(PASS_NUM):
         next_tensor = core.LoDTensor()
         next_tensor.set(next_data, place)
-        outs = exe.run(program,
+        outs = exe.run(main_program,
                        feed={'firstw': first_tensor,
                              'secondw': second_tensor,
...
...