PaddlePaddle / PaddleDetection — Commit 4cc782db
Authored Sep 30, 2018 by sneaxiy
Commit message: test=release/1.0.0
Parents: baa19fea, acde6e3c
Showing 91 changed files, with 3,050 additions and 4,357 deletions (+3050 −4357).
paddle/fluid/API.spec  +33 −33
paddle/fluid/framework/CMakeLists.txt  +0 −7
paddle/fluid/framework/channel.h  +0 −291
paddle/fluid/framework/channel_impl.h  +0 −369
paddle/fluid/framework/channel_test.cc  +0 −1008
paddle/fluid/framework/concurrency_test.cc  +0 −292
paddle/fluid/framework/executor.cc  +1 −4
paddle/fluid/framework/framework.proto  +0 −7
paddle/fluid/framework/selected_rows_test.cc  +8 −1
paddle/fluid/framework/tuple.h  +0 −1
paddle/fluid/framework/var_desc.cc  +2 −52
paddle/fluid/framework/var_desc.h  +0 −4
paddle/fluid/framework/var_type.h  +0 −6
paddle/fluid/inference/analysis/analysis_pass.h  +0 −6
paddle/fluid/inference/analysis/analyzer_tester.cc  +7 −3
paddle/fluid/inference/analysis/data_flow_graph_to_fluid_pass.cc  +7 −2
paddle/fluid/inference/analysis/subgraph_splitter.cc  +2 −0
paddle/fluid/inference/analysis/subgraph_splitter.h  +7 −2
paddle/fluid/inference/analysis/subgraph_splitter_tester.cc  +3 −1
paddle/fluid/inference/analysis/tensorrt_subgraph_pass.cc  +1 −1
paddle/fluid/inference/analysis/tensorrt_subgraph_pass.h  +5 −1
paddle/fluid/inference/analysis/tensorrt_subgraph_pass_tester.cc  +4 −0
paddle/fluid/inference/api/api_impl_tester.cc  +11 −5
paddle/fluid/inference/api/api_tensorrt_subgraph_engine.cc  +8 −2
paddle/fluid/inference/api/paddle_inference_api.h  +8 −0
paddle/fluid/inference/tests/api/CMakeLists.txt  +10 −0
paddle/fluid/inference/tests/api/trt_models_tester.cc  +106 −0
paddle/fluid/operators/CMakeLists.txt  +1 −5
paddle/fluid/operators/auc_op.cc  +15 −8
paddle/fluid/operators/auc_op.h  +70 −19
paddle/fluid/operators/channel_close_op.cc  +0 −70
paddle/fluid/operators/channel_create_op.cc  +0 −113
paddle/fluid/operators/channel_recv_op.cc  +0 −98
paddle/fluid/operators/channel_send_op.cc  +0 −76
paddle/fluid/operators/concurrency/CMakeLists.txt  +0 −1
paddle/fluid/operators/concurrency/channel_util.cc  +0 −111
paddle/fluid/operators/concurrency/channel_util.h  +0 −38
paddle/fluid/operators/conv_op.h  +4 −3
paddle/fluid/operators/conv_transpose_op.h  +4 −3
paddle/fluid/operators/cub_reduce.h  +322 −0
paddle/fluid/operators/distributed/grpc_client.h  +1 −0
paddle/fluid/operators/distributed/request_handler.h  +1 −0
paddle/fluid/operators/distributed/rpc_server.h  +1 −0
paddle/fluid/operators/elementwise_op.h  +1 −1
paddle/fluid/operators/math/depthwise_conv.cu  +323 −156
paddle/fluid/operators/math/depthwise_conv.h  +4 −1
paddle/fluid/operators/reduce_mean_op.cu  +56 −9
paddle/fluid/operators/reduce_sum_op.cu  +51 −9
paddle/fluid/operators/scale_op.cc  +4 −2
paddle/fluid/operators/select_op.cc  +0 −419
paddle/fluid/operators/sum_op.h  +7 −6
paddle/fluid/operators/tensorrt_engine_op.cc  +2 −2
paddle/fluid/operators/tensorrt_engine_op.h  +6 −7
paddle/fluid/operators/tensorrt_engine_op_test.cc  +4 −6
paddle/fluid/pybind/protobuf.cc  +0 −2
paddle/fluid/pybind/pybind.cc  +0 −1
paddle/legacy/trainer/tests/CMakeLists.txt  +5 −1
paddle/scripts/paddle_build.sh  +15 −4
python/paddle/dataset/common.py  +6 −4
python/paddle/fluid/clip.py  +3 −1
python/paddle/fluid/concurrency.py  +0 −454
python/paddle/fluid/framework.py  +1 −2
python/paddle/fluid/layers/control_flow.py  +1 −1
python/paddle/fluid/layers/detection.py  +95 −8
python/paddle/fluid/layers/metric_op.py  +51 −8
python/paddle/fluid/layers/nn.py  +476 −114
python/paddle/fluid/layers/ops.py  +1 −12
python/paddle/fluid/nets.py  +6 −16
python/paddle/fluid/tests/book/high-level-api/recognize_digits/CMakeLists.txt  +13 −3
python/paddle/fluid/tests/no_test_concurrency.py  +0 −260
python/paddle/fluid/tests/unittests/CMakeLists.txt  +3 −0
python/paddle/fluid/tests/unittests/dist_ctr.py  +109 −0
python/paddle/fluid/tests/unittests/dist_ctr_reader.py  +172 −0
python/paddle/fluid/tests/unittests/dist_mnist.py  +3 −3
python/paddle/fluid/tests/unittests/dist_se_resnext.py  +1 −1
python/paddle/fluid/tests/unittests/dist_simnet_bow.py  +238 −0
python/paddle/fluid/tests/unittests/dist_text_classification.py  +231 −0
python/paddle/fluid/tests/unittests/dist_transformer.py  +8 −4
python/paddle/fluid/tests/unittests/dist_word2vec.py  +3 −0
python/paddle/fluid/tests/unittests/test_auc_op.py  +5 −2
python/paddle/fluid/tests/unittests/test_conv2d_op.py  +52 −7
python/paddle/fluid/tests/unittests/test_dist_base.py  +181 −124
python/paddle/fluid/tests/unittests/test_dist_ctr.py  +9 −18
python/paddle/fluid/tests/unittests/test_dist_mnist.py  +2 −2
python/paddle/fluid/tests/unittests/test_dist_se_resnext.py  +9 −8
python/paddle/fluid/tests/unittests/test_dist_simnet_bow.py  +79 −0
python/paddle/fluid/tests/unittests/test_dist_text_classification.py  +40 −0
python/paddle/fluid/tests/unittests/test_dist_word2vec.py  +1 −1
python/paddle/fluid/tests/unittests/test_layers.py  +9 −0
python/paddle/fluid/transpiler/distribute_transpiler.py  +21 −24
python/paddle/fluid/transpiler/memory_optimization_transpiler.py  +101 −11
paddle/fluid/API.spec (view file @ 4cc782db)
This diff is collapsed: +33 −33.
paddle/fluid/framework/CMakeLists.txt (view file @ 4cc782db)
...
@@ -167,15 +167,8 @@ cc_test(selected_rows_test SRCS selected_rows_test.cc DEPS selected_rows)
 cc_test(op_kernel_type_test SRCS op_kernel_type_test.cc DEPS place device_context framework_proto)
 cc_test(cow_ptr_tests SRCS details/cow_ptr_test.cc)
-# cc_test(channel_test SRCS channel_test.cc)
 cc_test(tuple_test SRCS tuple_test.cc)
 if (NOT WIN32)
   cc_test(rw_lock_test SRCS rw_lock_test.cc)
 endif (NOT WIN32)
-# disable test temporarily.
-# TODO https://github.com/PaddlePaddle/Paddle/issues/11971
-# cc_test(concurrency_test SRCS concurrency_test.cc DEPS go_op channel_close_op channel_create_op
-#    channel_send_op channel_recv_op sum_op select_op elementwise_add_op compare_op
-#    conditional_block_op while_op assign_op print_op executor proto_desc)
paddle/fluid/framework/channel.h (deleted, 100644 → 0, view @ baa19fea)
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <stddef.h>            // for size_t
#include <condition_variable>  // NOLINT
#include <typeindex>
#include "paddle/fluid/platform/enforce.h"

namespace paddle {
namespace framework {

enum class ChannelAction {
  SEND = 0,
  RECEIVE = 1,
  CLOSE = 2,
};

// Channel is the abstract class of buffered and un-buffered channels.
template <typename T>
class Channel {
 public:
  virtual bool CanSend() = 0;
  virtual bool CanReceive() = 0;
  virtual void Send(T*) = 0;
  virtual bool Receive(T*) = 0;
  virtual size_t Cap() = 0;
  virtual void Lock() = 0;
  virtual void Unlock() = 0;
  virtual bool IsClosed() = 0;
  virtual void Close() = 0;
  virtual ~Channel() {}

  virtual void AddToSendQ(const void* referrer, T* data,
                          std::shared_ptr<std::condition_variable_any> cond,
                          std::function<bool(ChannelAction)> cb) = 0;
  virtual void AddToReceiveQ(const void* referrer, T* data,
                             std::shared_ptr<std::condition_variable_any> cond,
                             std::function<bool(ChannelAction)> cb) = 0;
  virtual void RemoveFromSendQ(const void* referrer) = 0;
  virtual void RemoveFromReceiveQ(const void* referrer) = 0;
};

// Forward declaration of channel implementations.
template <typename T>
class ChannelImpl;

template <typename T>
Channel<T>* MakeChannel(size_t buffer_size) {
  return new ChannelImpl<T>(buffer_size);
}

template <typename T>
void CloseChannel(Channel<T>* ch) {
  ch->Close();
}

/*
 * The ChannelHolder class serves two main purposes:
 * 1. It acts as a unified wrapper for the different kinds of
 *    channels, i.e. Buffered and Unbuffered channels. This is
 *    similar to the ReaderHolder class.
 * 2. It also helps us in TypeHiding. This is similar to the
 *    PlaceHolder implementations in variable.h and tensor.h.
 */
class ChannelHolder {
 public:
  template <typename T>
  void Reset(size_t buffer_size) {
    holder_.reset(new PlaceholderImpl<T>(buffer_size));
  }

  template <typename T>
  void Send(T* data) {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    PADDLE_ENFORCE_EQ(
        holder_->Type(), std::type_index(typeid(T)),
        "Channel type is not same as the type of the data being sent");
    // Static cast should be safe because we have ensured that types are same
    Channel<T>* channel = static_cast<Channel<T>*>(holder_->Ptr());
    PADDLE_ENFORCE_EQ(channel != nullptr, true, "Channel should not be null.");
    channel->Send(data);
  }

  template <typename T>
  bool Receive(T* data) {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    PADDLE_ENFORCE_EQ(
        holder_->Type(), std::type_index(typeid(T)),
        "Channel type is not same as the type of the data being sent");
    Channel<T>* channel = static_cast<Channel<T>*>(holder_->Ptr());
    PADDLE_ENFORCE_EQ(channel != nullptr, true, "Channel should not be null.");
    return channel->Receive(data);
  }

  bool IsClosed() {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    return holder_->IsClosed();
  }

  bool CanSend() {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    return holder_->CanSend();
  }

  bool CanReceive() {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    return holder_->CanReceive();
  }

  void close() {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    holder_->Close();
  }

  size_t Cap() {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    return holder_->Cap();
  }

  void Lock() {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    holder_->Lock();
  }

  void Unlock() {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    holder_->Unlock();
  }

  template <typename T>
  void AddToSendQ(const void* referrer, T* data,
                  std::shared_ptr<std::condition_variable_any> cond,
                  std::function<bool(ChannelAction)> cb) {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    Channel<T>* channel = static_cast<Channel<T>*>(holder_->Ptr());
    if (channel != nullptr) {
      channel->AddToSendQ(referrer, data, cond, cb);
    }
  }

  template <typename T>
  void AddToReceiveQ(const void* referrer, T* data,
                     std::shared_ptr<std::condition_variable_any> cond,
                     std::function<bool(ChannelAction)> cb) {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    Channel<T>* channel = static_cast<Channel<T>*>(holder_->Ptr());
    if (channel != nullptr) {
      channel->AddToReceiveQ(referrer, data, cond, cb);
    }
  }

  void RemoveFromSendQ(const void* referrer) {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    holder_->RemoveFromSendQ(referrer);
  }

  void RemoveFromReceiveQ(const void* referrer) {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    holder_->RemoveFromReceiveQ(referrer);
  }

  inline bool IsInitialized() const { return holder_ != nullptr; }

  inline const std::type_index Type() {
    PADDLE_ENFORCE_EQ(IsInitialized(), true,
                      "The Channel hasn't been initialized");
    return holder_->Type();
  }

 private:
  /**
   * @note    Placeholder hides type T, so it doesn't appear as a template
   *          parameter of ChannelHolder.
   */
  struct Placeholder {
    virtual ~Placeholder() {}
    virtual const std::type_index Type() const = 0;
    virtual void* Ptr() const = 0;
    virtual bool IsClosed() = 0;
    virtual bool CanSend() = 0;
    virtual bool CanReceive() = 0;
    virtual void RemoveFromSendQ(const void* referrer) = 0;
    virtual void RemoveFromReceiveQ(const void* referrer) = 0;
    virtual void Close() = 0;
    virtual void Lock() = 0;
    virtual void Unlock() = 0;
    virtual size_t Cap() = 0;
  };

  template <typename T>
  struct PlaceholderImpl : public Placeholder {
    explicit PlaceholderImpl(size_t buffer_size)
        : type_(std::type_index(typeid(T))) {
      channel_.reset(MakeChannel<T>(buffer_size));
    }

    virtual const std::type_index Type() const { return type_; }

    virtual void* Ptr() const { return static_cast<void*>(channel_.get()); }

    virtual bool IsClosed() {
      if (channel_) {
        return channel_->IsClosed();
      }
      return false;
    }

    virtual bool CanSend() {
      if (channel_) {
        return channel_->CanSend();
      }
      return false;
    }

    virtual bool CanReceive() {
      if (channel_) {
        return channel_->CanReceive();
      }
      return false;
    }

    virtual void RemoveFromSendQ(const void* referrer) {
      if (channel_) {
        channel_->RemoveFromSendQ(referrer);
      }
    }

    virtual void RemoveFromReceiveQ(const void* referrer) {
      if (channel_) {
        channel_->RemoveFromReceiveQ(referrer);
      }
    }

    virtual void Close() {
      if (channel_) channel_->Close();
    }

    virtual size_t Cap() {
      if (channel_)
        return channel_->Cap();
      else
        return -1;
    }

    virtual void Lock() {
      if (channel_) channel_->Lock();
    }

    virtual void Unlock() {
      if (channel_) channel_->Unlock();
    }

    std::unique_ptr<Channel<T>> channel_;
    const std::type_index type_;
  };

  // Pointer to a PlaceholderImpl object
  std::unique_ptr<Placeholder> holder_;
};

}  // namespace framework
}  // namespace paddle

#include "paddle/fluid/framework/channel_impl.h"
paddle/fluid/framework/channel_impl.h (deleted, 100644 → 0, view @ baa19fea)
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once
#include <stddef.h>            // for size_t
#include <atomic>
#include <condition_variable>  // NOLINT
#include <deque>
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/platform/enforce.h"

namespace paddle {
namespace framework {

template <typename T>
class ChannelImpl : public paddle::framework::Channel<T> {
  friend Channel<T> *paddle::framework::MakeChannel<T>(size_t);
  friend void paddle::framework::CloseChannel<T>(Channel<T> *);

 public:
  virtual bool CanSend();
  virtual bool CanReceive();
  virtual void Send(T *);
  virtual bool Receive(T *);
  virtual size_t Cap() { return cap_; }
  virtual void Lock();
  virtual void Unlock();
  virtual bool IsClosed();
  virtual void Close();
  explicit ChannelImpl(size_t);
  virtual ~ChannelImpl();

  virtual void AddToSendQ(const void *referrer, T *data,
                          std::shared_ptr<std::condition_variable_any> cond,
                          std::function<bool(ChannelAction)> cb);
  virtual void AddToReceiveQ(const void *referrer, T *data,
                             std::shared_ptr<std::condition_variable_any> cond,
                             std::function<bool(ChannelAction)> cb);

  virtual void RemoveFromSendQ(const void *referrer);
  virtual void RemoveFromReceiveQ(const void *referrer);

 private:
  struct QueueMessage {
    T *data;
    std::shared_ptr<std::condition_variable_any> cond;
    bool chan_closed = false;
    bool completed = false;
    const void *referrer;
    // TODO(thuan): figure out better way to do this
    std::function<bool(ChannelAction)> callback;

    explicit QueueMessage(T *item)
        : data(item), cond(std::make_shared<std::condition_variable_any>()) {}

    QueueMessage(T *item, std::shared_ptr<std::condition_variable_any> cond)
        : data(item), cond(cond) {}

    void Wait(std::unique_lock<std::recursive_mutex> &lock) {
      cond->wait(lock, [this]() { return completed; });
    }

    void Notify() {
      completed = true;
      cond->notify_all();
    }
  };

  void send_return() {
    send_ctr--;
    destructor_cond_.notify_all();
  }

  bool recv_return(bool value) {
    recv_ctr--;
    destructor_cond_.notify_all();
    return value;
  }

  std::shared_ptr<QueueMessage> get_first_message(
      std::deque<std::shared_ptr<QueueMessage>> *queue, ChannelAction action) {
    while (!queue->empty()) {
      // Check whether this message was added by Select
      // If this was added by Select then execute the callback
      // to check if you can execute this message. The callback
      // can return false if some other case was executed in Select.
      // In that case just discard this QueueMessage and process next.
      std::shared_ptr<QueueMessage> m = queue->front();
      queue->pop_front();
      if (m->callback == nullptr || m->callback(action)) return m;
    }
    return nullptr;
  }

  size_t cap_;
  std::recursive_mutex mu_;
  bool closed_;
  std::deque<T> buf_;
  std::deque<std::shared_ptr<QueueMessage>> recvq;
  std::deque<std::shared_ptr<QueueMessage>> sendq;
  std::atomic<unsigned> send_ctr{0};
  std::atomic<unsigned> recv_ctr{0};
  std::condition_variable_any destructor_cond_;
};

template <typename T>
ChannelImpl<T>::ChannelImpl(size_t capacity)
    : cap_(capacity), closed_(false), send_ctr(0), recv_ctr(0) {
  PADDLE_ENFORCE_GE(capacity, 0);
}

template <typename T>
bool ChannelImpl<T>::CanSend() {
  std::lock_guard<std::recursive_mutex> lock{mu_};
  return !closed_ && (!recvq.empty() || buf_.size() < cap_);
}

template <typename T>
bool ChannelImpl<T>::CanReceive() {
  std::lock_guard<std::recursive_mutex> lock{mu_};
  return !(closed_ && buf_.empty()) && (!sendq.empty() || buf_.size() > 0);
}

template <typename T>
void ChannelImpl<T>::Send(T *item) {
  send_ctr++;
  std::unique_lock<std::recursive_mutex> lock{mu_};

  // If channel is closed, throw exception
  if (closed_) {
    send_return();
    lock.unlock();
    PADDLE_THROW("Cannot send on closed channel");
  }

  // If there is a receiver, directly pass the value we want
  // to send to the receiver, bypassing the channel buffer if any
  if (!recvq.empty()) {
    std::shared_ptr<QueueMessage> m =
        get_first_message(&recvq, ChannelAction::SEND);
    if (m != nullptr) {
      *(m->data) = std::move(*item);
      m->Notify();
      send_return();
      return;
    } else {
      Send(item);
      send_return();
      return;
    }
  }

  // Unbuffered channel will always bypass this
  // If buffered channel has space in buffer,
  // write the element to the buffer.
  if (buf_.size() < cap_) {
    // Copy to buffer
    buf_.push_back(std::move(*item));
    send_return();
    return;
  }

  // Block on channel, because some receiver will complete
  // the operation for us
  auto m = std::make_shared<QueueMessage>(item);
  sendq.push_back(m);
  m->Wait(lock);
  if (m->chan_closed) {
    send_return();
    lock.unlock();
    PADDLE_THROW("Cannot send on closed channel");
  }
  send_return();
}

template <typename T>
bool ChannelImpl<T>::Receive(T *item) {
  recv_ctr++;
  std::unique_lock<std::recursive_mutex> lock{mu_};

  // If channel is closed and buffer is empty or
  // channel is unbuffered
  if (closed_ && buf_.empty()) return recv_return(false);

  // If there is a sender, directly receive the value we want
  // from the sender. In case of a buffered channel, read from
  // buffer and move front of send queue to the buffer
  if (!sendq.empty()) {
    std::shared_ptr<QueueMessage> m =
        get_first_message(&sendq, ChannelAction::RECEIVE);
    if (buf_.size() > 0) {
      // Case 1 : Channel is Buffered
      // Do Data transfer from front of buffer
      // and add a QueueMessage to the buffer
      *item = std::move(buf_.front());
      buf_.pop_front();
      // If first message from sendq is not null
      // add it to the buffer and notify it
      if (m != nullptr) {
        // Copy to buffer
        buf_.push_back(std::move(*(m->data)));
        m->Notify();
      }  // Ignore if there is no first message
    } else {
      // Case 2: Channel is Unbuffered
      // Do data transfer from front of SendQ
      // If front is nullptr, then recursively call itself
      if (m != nullptr) {
        *item = std::move(*(m->data));
        m->Notify();
      } else {
        return recv_return(Receive(item));
      }
    }
    return recv_return(true);
  }

  // If this is a buffered channel and there are items in buffer
  if (buf_.size() > 0) {
    // Directly read from buffer
    *item = std::move(buf_.front());
    buf_.pop_front();
    // return true
    return recv_return(true);
  }

  // No sender available, block on this channel
  // Some receiver will complete the option for us
  auto m = std::make_shared<QueueMessage>(item);
  recvq.push_back(m);
  m->Wait(lock);

  return recv_return(!m->chan_closed);
}

template <typename T>
void ChannelImpl<T>::Lock() {
  mu_.lock();
}

template <typename T>
void ChannelImpl<T>::Unlock() {
  mu_.unlock();
}

template <typename T>
bool ChannelImpl<T>::IsClosed() {
  std::lock_guard<std::recursive_mutex> lock{mu_};
  return closed_;
}

template <typename T>
void ChannelImpl<T>::Close() {
  std::unique_lock<std::recursive_mutex> lock{mu_};

  if (closed_) {
    // TODO(abhinavarora): closing an already closed channel should panic
    lock.unlock();
    return;
  }

  closed_ = true;

  // Empty the readers
  while (!recvq.empty()) {
    std::shared_ptr<QueueMessage> m = recvq.front();
    recvq.pop_front();
    m->chan_closed = true;

    // Execute callback function (if any)
    if (m->callback != nullptr) {
      m->callback(ChannelAction::CLOSE);
    }

    m->Notify();
  }

  // Empty the senders
  while (!sendq.empty()) {
    std::shared_ptr<QueueMessage> m = sendq.front();
    sendq.pop_front();
    m->chan_closed = true;

    // Execute callback function (if any)
    if (m->callback != nullptr) {
      m->callback(ChannelAction::CLOSE);
    }

    m->Notify();
  }
}

template <typename T>
void ChannelImpl<T>::AddToSendQ(
    const void *referrer, T *data,
    std::shared_ptr<std::condition_variable_any> cond,
    std::function<bool(ChannelAction)> cb) {
  std::lock_guard<std::recursive_mutex> lock{mu_};
  auto m = std::make_shared<QueueMessage>(data, cond);
  m->referrer = referrer;
  m->callback = cb;
  sendq.push_back(m);
}

template <typename T>
void ChannelImpl<T>::AddToReceiveQ(
    const void *referrer, T *data,
    std::shared_ptr<std::condition_variable_any> cond,
    std::function<bool(ChannelAction)> cb) {
  std::lock_guard<std::recursive_mutex> lock{mu_};
  auto m = std::make_shared<QueueMessage>(data, cond);
  m->referrer = referrer;
  m->callback = cb;
  recvq.push_back(m);
}

template <typename T>
void ChannelImpl<T>::RemoveFromSendQ(const void *referrer) {
  std::lock_guard<std::recursive_mutex> lock{mu_};

  for (auto it = sendq.begin(); it != sendq.end();) {
    std::shared_ptr<QueueMessage> sendMsg = (std::shared_ptr<QueueMessage>)*it;
    if (sendMsg->referrer == referrer) {
      it = sendq.erase(it);
    } else {
      ++it;
    }
  }
}

template <typename T>
void ChannelImpl<T>::RemoveFromReceiveQ(const void *referrer) {
  std::lock_guard<std::recursive_mutex> lock{mu_};

  for (auto it = recvq.begin(); it != recvq.end();) {
    std::shared_ptr<QueueMessage> recvMsg = (std::shared_ptr<QueueMessage>)*it;
    if (recvMsg->referrer == referrer) {
      it = recvq.erase(it);
    } else {
      ++it;
    }
  }
}

template <typename T>
ChannelImpl<T>::~ChannelImpl() {
  Close();
  // The destructor must wait for all readers and writers to complete their task
  // The channel has been closed, so we will not accept new readers and writers
  std::unique_lock<std::recursive_mutex> lock{mu_};
  destructor_cond_.wait(lock,
                        [this]() { return send_ctr == 0 && recv_ctr == 0; });
}

}  // namespace framework
}  // namespace paddle
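The Send/Receive paths above implement a classic monitor pattern: a blocked sender parks a QueueMessage holding a pointer to its payload, and the matching receiver completes the transfer and calls Notify(). For readers unfamiliar with the idiom, here is a self-contained sketch of the same park-and-notify idea in plain C++, independent of Paddle; all names are illustrative, it omits the Select machinery, and unlike ChannelImpl it requires cap >= 1 (ChannelImpl additionally handles the unbuffered cap-0 case by direct sender-receiver handoff):

#include <condition_variable>
#include <deque>
#include <mutex>

// Minimal bounded queue using the same blocking rendezvous idea as
// ChannelImpl. Illustrative only; not part of this commit.
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(size_t cap) : cap_(cap) {}

  void Push(T v) {
    std::unique_lock<std::mutex> lock(mu_);
    // Park until a consumer makes room, mirroring Send() blocking on sendq.
    not_full_.wait(lock, [this] { return buf_.size() < cap_; });
    buf_.push_back(std::move(v));
    not_empty_.notify_one();  // wake one parked consumer, like Notify()
  }

  T Pop() {
    std::unique_lock<std::mutex> lock(mu_);
    // Park until a producer supplies an item, mirroring Receive() on recvq.
    not_empty_.wait(lock, [this] { return !buf_.empty(); });
    T v = std::move(buf_.front());
    buf_.pop_front();
    not_full_.notify_one();
    return v;
  }

 private:
  size_t cap_;
  std::mutex mu_;
  std::condition_variable not_full_, not_empty_;
  std::deque<T> buf_;
};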
paddle/fluid/framework/channel_test.cc (deleted, 100644 → 0, view @ baa19fea)
This diff is collapsed: +0 −1008.
paddle/fluid/framework/concurrency_test.cc (deleted, 100644 → 0, view @ baa19fea)
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include <thread>  // NOLINT

#include "gtest/gtest.h"
#include "paddle/fluid/framework/block_desc.h"
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/op_registry.h"

USE_NO_KERNEL_OP(go);
USE_NO_KERNEL_OP(channel_close);
USE_NO_KERNEL_OP(channel_create);
USE_NO_KERNEL_OP(channel_recv);
USE_NO_KERNEL_OP(channel_send);
USE_NO_KERNEL_OP(elementwise_add);
USE_NO_KERNEL_OP(select);
USE_NO_KERNEL_OP(conditional_block);
USE_NO_KERNEL_OP(equal);
USE_NO_KERNEL_OP(assign);
USE_NO_KERNEL_OP(while);
USE_NO_KERNEL_OP(print);

namespace f = paddle::framework;
namespace p = paddle::platform;

namespace paddle {
namespace framework {

template <typename T>
LoDTensor *CreateVariable(Scope *scope, const p::CPUPlace &place,
                          std::string name, T value) {
  // Create LoDTensor<int> of dim [1]
  auto var = scope->Var(name);
  auto tensor = var->GetMutable<LoDTensor>();
  tensor->Resize({1});
  T *expect = tensor->mutable_data<T>(place);
  expect[0] = value;
  return tensor;
}

void AddOp(const std::string &type, const VariableNameMap &inputs,
           const VariableNameMap &outputs, AttributeMap attrs,
           BlockDesc *block) {
  // insert op
  auto op = block->AppendOp();
  op->SetType(type);
  for (auto &kv : inputs) {
    op->SetInput(kv.first, kv.second);
  }
  for (auto &kv : outputs) {
    op->SetOutput(kv.first, kv.second);
  }
  op->SetAttrMap(attrs);
}

void AddCase(ProgramDesc *program, Scope *scope, p::CPUPlace *place,
             BlockDesc *casesBlock, int caseId, int caseType,
             std::string caseChannel, std::string caseVarName,
             std::function<void(BlockDesc *, Scope *)> func) {
  std::string caseCondName = std::string("caseCond") + std::to_string(caseId);
  std::string caseCondXVarName =
      std::string("caseCondX") + std::to_string(caseId);

  BlockDesc *caseBlock = program->AppendBlock(*casesBlock);
  func(caseBlock, scope);

  CreateVariable(scope, *place, caseCondName, false);
  CreateVariable(scope, *place, caseCondXVarName, caseId);
  CreateVariable(scope, *place, caseVarName, caseId);

  scope->Var("step_scope");

  AddOp("equal", {{"X", {caseCondXVarName}}, {"Y", {"caseToExecute"}}},
        {{"Out", {caseCondName}}}, {}, casesBlock);

  AddOp("conditional_block", {{"X", {caseCondName}}, {"Params", {}}},
        {{"Out", {}}, {"Scope", {"step_scope"}}},
        {{"sub_block", caseBlock}, {"is_scalar_condition", true}}, casesBlock);
}

void AddFibonacciSelect(Scope *scope, p::CPUPlace *place, ProgramDesc *program,
                        BlockDesc *parentBlock, std::string dataChanName,
                        std::string quitChanName) {
  BlockDesc *whileBlock = program->AppendBlock(*parentBlock);

  CreateVariable(scope, *place, "whileExitCond", true);
  CreateVariable(scope, *place, "caseToExecute", -1);
  CreateVariable(scope, *place, "case1var", 0);

  CreateVariable(scope, *place, "xtemp", 0);

  // TODO(thuan): Need to create fibXToSend, since channel send moves the actual
  // data,
  // which causes the data to be no longer accessible to do the fib calculation
  // TODO(abhinav): Change channel send to do a copy instead of a move!
  CreateVariable(scope, *place, "fibXToSend", 0);

  CreateVariable(scope, *place, "fibX", 0);
  CreateVariable(scope, *place, "fibY", 1);
  CreateVariable(scope, *place, "quitVar", 0);

  BlockDesc *casesBlock = program->AppendBlock(*whileBlock);
  std::function<void(BlockDesc * caseBlock)> f = [](BlockDesc *caseBlock) {};

  // TODO(thuan): Remove this once we change channel send to do a copy instead
  // of move
  AddOp("assign", {{"X", {"fibX"}}}, {{"Out", {"fibXToSend"}}}, {}, whileBlock);

  // Case 0: Send to dataChanName
  std::function<void(BlockDesc * caseBlock, Scope * scope)> case0Func =
      [&](BlockDesc *caseBlock, Scope *scope) {
        AddOp("assign", {{"X", {"fibX"}}}, {{"Out", {"xtemp"}}}, {}, caseBlock);
        AddOp("assign", {{"X", {"fibY"}}}, {{"Out", {"fibX"}}}, {}, caseBlock);
        AddOp("elementwise_add", {{"X", {"xtemp"}}, {"Y", {"fibY"}}},
              {{"Out", {"fibY"}}}, {}, caseBlock);
      };
  AddCase(program, scope, place, casesBlock, 0, 1, dataChanName, "fibXToSend",
          case0Func);
  std::string case0Config =
      std::string("0,1,") + dataChanName + std::string(",fibXToSend");

  // Case 1: Receive from quitChanName
  std::function<void(BlockDesc * caseBlock, Scope * scope)> case2Func =
      [&](BlockDesc *caseBlock, Scope *scope) {
        // Exit the while loop after we receive from quit channel.
        // We assign a false to "whileExitCond" variable, which will
        // break out of while_op loop
        CreateVariable(scope, *place, "whileFalse", false);
        AddOp("assign", {{"X", {"whileFalse"}}}, {{"Out", {"whileExitCond"}}},
              {}, caseBlock);
      };
  AddCase(program, scope, place, casesBlock, 1, 2, quitChanName, "quitVar",
          case2Func);
  std::string case1Config =
      std::string("1,2,") + quitChanName + std::string(",quitVar");

  // Select block
  AddOp("select", {{"X", {dataChanName, quitChanName}},
                   {"case_to_execute", {"caseToExecute"}}},
        {{"Out", {}}},
        {{"sub_block", casesBlock},
         {"cases", std::vector<std::string>{case0Config, case1Config}}},
        whileBlock);

  scope->Var("stepScopes");
  AddOp("while",
        {{"X", {dataChanName, quitChanName}}, {"Condition", {"whileExitCond"}}},
        {{"Out", {}}, {"StepScopes", {"stepScopes"}}},
        {{"sub_block", whileBlock}}, parentBlock);
}

TEST(Concurrency, Go_Op) {
  Scope scope;
  p::CPUPlace place;

  // Initialize scope variables
  p::CPUDeviceContext ctx(place);

  // Create channel variable
  scope.Var("Channel");

  // Create Variables, x0 will be put into channel,
  // result will be pulled from channel
  CreateVariable(&scope, place, "Status", false);
  CreateVariable(&scope, place, "x0", 99);
  CreateVariable(&scope, place, "result", 0);

  framework::Executor executor(place);
  ProgramDesc program;
  BlockDesc *block = program.MutableBlock(0);

  // Create channel OP
  AddOp("channel_create", {}, {{"Out", {"Channel"}}},
        {{"capacity", 10}, {"data_type", f::proto::VarType::LOD_TENSOR}},
        block);

  // Create Go Op routine
  BlockDesc *goOpBlock = program.AppendBlock(program.Block(0));
  AddOp("channel_send", {{"Channel", {"Channel"}}, {"X", {"x0"}}},
        {{"Status", {"Status"}}}, {}, goOpBlock);

  // Create Go Op
  AddOp("go", {{"X", {"Channel", "x0"}}}, {}, {{"sub_block", goOpBlock}},
        block);

  // Create Channel Receive Op
  AddOp("channel_recv", {{"Channel", {"Channel"}}},
        {{"Status", {"Status"}}, {"Out", {"result"}}}, {}, block);

  // Create Channel Close Op
  AddOp("channel_close", {{"Channel", {"Channel"}}}, {}, {}, block);

  // Check the result tensor to make sure it is set to 0
  const LoDTensor &tensor = (scope.FindVar("result"))->Get<LoDTensor>();
  auto *initialData = tensor.data<int>();
  EXPECT_EQ(initialData[0], 0);

  executor.Run(program, &scope, 0, true, true);

  // After we call executor.run, the Go operator should do a channel_send to
  // set the "result" variable to 99.
  auto *finalData = tensor.data<int>();
  EXPECT_EQ(finalData[0], 99);
}

/**
 * This test implements the fibonacci function using go_op and select_op
 */
TEST(Concurrency, Select) {
  Scope scope;
  p::CPUPlace place;

  // Initialize scope variables
  p::CPUDeviceContext ctx(place);

  CreateVariable(&scope, place, "Status", false);
  CreateVariable(&scope, place, "result", 0);
  CreateVariable(&scope, place, "currentXFib", 0);

  framework::Executor executor(place);
  ProgramDesc program;
  BlockDesc *block = program.MutableBlock(0);

  // Create channel OP
  std::string dataChanName = "Channel";
  scope.Var(dataChanName);
  AddOp("channel_create", {}, {{"Out", {dataChanName}}},
        {{"capacity", 0}, {"data_type", f::proto::VarType::LOD_TENSOR}}, block);

  std::string quitChanName = "Quit";
  scope.Var(quitChanName);
  AddOp("channel_create", {}, {{"Out", {quitChanName}}},
        {{"capacity", 0}, {"data_type", f::proto::VarType::LOD_TENSOR}}, block);

  // Create Go Op routine, which loops 10 times over fibonacci sequence
  CreateVariable(&scope, place, "xReceiveVar", 0);

  BlockDesc *goOpBlock = program.AppendBlock(program.Block(0));
  for (int i = 0; i < 10; ++i) {
    AddOp("channel_recv", {{"Channel", {dataChanName}}},
          {{"Status", {"Status"}}, {"Out", {"currentXFib"}}}, {}, goOpBlock);
    AddOp("print", {{"In", {"currentXFib"}}}, {{"Out", {"currentXFib"}}},
          {{"first_n", 100},
           {"summarize", -1},
           {"print_tensor_name", false},
           {"print_tensor_type", true},
           {"print_tensor_shape", false},
           {"print_tensor_lod", false},
           {"print_phase", std::string("FORWARD")},
           {"message", std::string("X: ")}},
          goOpBlock);
  }

  CreateVariable(&scope, place, "quitSignal", 0);
  AddOp("channel_send", {{"Channel", {quitChanName}}, {"X", {"quitSignal"}}},
        {{"Status", {"Status"}}}, {}, goOpBlock);

  // Create Go Op
  AddOp("go", {{"X", {dataChanName, quitChanName}}}, {},
        {{"sub_block", goOpBlock}}, block);

  AddFibonacciSelect(&scope, &place, &program, block, dataChanName,
                     quitChanName);

  // Create Channel Close Op
  AddOp("channel_close", {{"Channel", {dataChanName}}}, {}, {}, block);
  AddOp("channel_close", {{"Channel", {quitChanName}}}, {}, {}, block);

  executor.Run(program, &scope, 0, true, true);

  // After we call executor.run, "result" variable should be equal to 34
  // (which is 10 loops through fibonacci sequence)
  const LoDTensor &tensor = (scope.FindVar("currentXFib"))->Get<LoDTensor>();
  auto *finalData = tensor.data<int>();
  EXPECT_EQ(finalData[0], 34);
}

}  // namespace framework
}  // namespace paddle
paddle/fluid/framework/executor.cc (view file @ 4cc782db)
...
@@ -14,7 +14,6 @@ limitations under the License. */
 #include "paddle/fluid/framework/executor.h"
-#include "paddle/fluid/framework/channel.h"
 #include "paddle/fluid/framework/feed_fetch_method.h"
 #include "paddle/fluid/framework/lod_rank_table.h"
 #include "paddle/fluid/framework/lod_tensor_array.h"
...
@@ -76,15 +75,13 @@ void InitializeVariable(Variable* var, proto::VarType::Type var_type) {
     var->GetMutable<platform::PlaceList>();
   } else if (var_type == proto::VarType::READER) {
     var->GetMutable<ReaderHolder>();
-  } else if (var_type == proto::VarType::CHANNEL) {
-    var->GetMutable<ChannelHolder>();
   } else if (var_type == proto::VarType::RAW) {
     // GetMutable will be called in operator
   } else {
     PADDLE_THROW(
         "Variable type %d is not in "
         "[LOD_TENSOR, SELECTED_ROWS, FEED_MINIBATCH, FETCH_LIST, "
-        "LOD_RANK_TABLE, PLACE_LIST, READER, CHANNEL, RAW]",
+        "LOD_RANK_TABLE, PLACE_LIST, READER, RAW]",
         var_type);
   }
 }
...
paddle/fluid/framework/framework.proto (view file @ 4cc782db)
...
@@ -126,7 +126,6 @@ message VarType {
     LOD_TENSOR_ARRAY = 13;
     PLACE_LIST = 14;
     READER = 15;
-    CHANNEL = 16;
     // Any runtime decided variable type is raw
     // raw variables should manage their own allocations
     // in operators like nccl_op
...
@@ -158,12 +157,6 @@ message VarType {
   message ReaderDesc { repeated LoDTensorDesc lod_tensor = 1; }
   optional ReaderDesc reader = 5;

-  message ChannelDesc {
-    required Type data_type = 1;
-    required int64 capacity = 2;
-  }
-  optional ChannelDesc channel = 6;

   message Tuple { repeated Type element_type = 1; }
   optional Tuple tuple = 7;
 }
...
paddle/fluid/framework/selected_rows_test.cc (view file @ 4cc782db)
...
@@ -27,8 +27,11 @@ class SelectedRowsTester : public ::testing::Test {
     selected_rows_.reset(new SelectedRows(rows, height));

     Tensor* value = selected_rows_->mutable_value();
-    value->mutable_data<float>(
+    auto* data = value->mutable_data<float>(
         make_ddim({static_cast<int64_t>(rows.size()), row_numel}), place_);
+    for (int64_t i = 0; i < value->numel(); ++i) {
+      data[i] = static_cast<float>(i);
+    }
   }

  protected:
...
@@ -60,6 +63,10 @@ TEST_F(SelectedRowsTester, SerializeAndDeseralize) {
   ASSERT_EQ(selected_rows_->height(), dst_tensor.height());
   ASSERT_EQ(selected_rows_->value().dims(), dst_tensor.value().dims());
   ASSERT_EQ(selected_rows_->GetCompleteDims(), dst_tensor.GetCompleteDims());
+  auto* dst_data = dst_tensor.value().data<float>();
+  for (int64_t i = 0; i < dst_tensor.value().numel(); ++i) {
+    ASSERT_EQ(dst_data[i], static_cast<float>(i));
+  }
 }

 TEST(SelectedRows, SparseTable) {
...
paddle/fluid/framework/tuple.h (view file @ 4cc782db)
...
@@ -17,7 +17,6 @@ limitations under the License. */
 #include <stdexcept>
 #include <string>
 #include <vector>
-#include "paddle/fluid/framework/channel.h"
 #include "paddle/fluid/framework/lod_tensor.h"
 #include "paddle/fluid/framework/tensor.h"
 #include "paddle/fluid/framework/var_desc.h"
...
paddle/fluid/framework/var_desc.cc (view file @ 4cc782db)
...
@@ -88,13 +88,7 @@ std::vector<std::vector<int64_t>> VarDesc::GetShapes() const {
 }

 void VarDesc::SetDataType(proto::VarType::Type data_type) {
-  switch (desc_.type().type()) {
-    case proto::VarType::CHANNEL:
-      mutable_channel_desc()->set_data_type(data_type);
-      break;
-    default:
-      mutable_tensor_desc()->set_data_type(data_type);
-  }
+  mutable_tensor_desc()->set_data_type(data_type);
 }

 void VarDesc::SetDataTypes(
...
@@ -115,13 +109,7 @@ void VarDesc::SetDataTypes(
 }

 proto::VarType::Type VarDesc::GetDataType() const {
-  switch (desc_.type().type()) {
-    case proto::VarType::CHANNEL:
-      return channel_desc().data_type();
-      break;
-    default:
-      return tensor_desc().data_type();
-  }
+  return tensor_desc().data_type();
 }

 std::vector<proto::VarType::Type> VarDesc::GetDataTypes() const {
...
@@ -134,17 +122,6 @@ std::vector<proto::VarType::Type> VarDesc::GetDataTypes() const {
   return res;
 }

-void VarDesc::SetCapacity(int64_t capacity) {
-  switch (desc_.type().type()) {
-    case proto::VarType::CHANNEL:
-      desc_.mutable_type()->mutable_channel()->set_capacity(capacity);
-      break;
-    default:
-      PADDLE_THROW("Setting 'capacity' is not supported by the type of var %s.",
-                   this->Name());
-  }
-}
-
 void VarDesc::SetLoDLevel(int32_t lod_level) {
   switch (desc_.type().type()) {
     case proto::VarType::LOD_TENSOR:
...
@@ -214,19 +191,6 @@ std::vector<int32_t> VarDesc::GetLoDLevels() const {
   }
 }

-const proto::VarType::ChannelDesc &VarDesc::channel_desc() const {
-  PADDLE_ENFORCE(desc_.has_type(), "The var's type hasn't been set.");
-  PADDLE_ENFORCE(desc_.type().has_type(), "The var type hasn't been set.");
-  switch (desc_.type().type()) {
-    case proto::VarType::CHANNEL:
-      return desc_.type().channel();
-    default:
-      PADDLE_THROW(
-          "Getting 'channel_desc' is not supported by the type of var %s.",
-          this->Name());
-  }
-}
-
 const proto::VarType::TensorDesc &VarDesc::tensor_desc() const {
   PADDLE_ENFORCE(desc_.has_type(), "The var's type hasn't been set.");
   PADDLE_ENFORCE(desc_.type().has_type(), "The var type hasn't been set.");
...
@@ -262,20 +226,6 @@ std::vector<proto::VarType::TensorDesc> VarDesc::tensor_descs() const {
   }
 }

-proto::VarType::ChannelDesc *VarDesc::mutable_channel_desc() {
-  PADDLE_ENFORCE(desc_.has_type(), "The var type hasn't been set.");
-  PADDLE_ENFORCE(desc_.type().has_type(), "The var type hasn't been set.");
-  switch (desc_.type().type()) {
-    case proto::VarType::CHANNEL:
-      return desc_.mutable_type()->mutable_channel();
-    default:
-      PADDLE_THROW(
-          "Getting 'mutable_channel_desc' is not supported by the type of var "
-          "%s.",
-          this->Name());
-  }
-}
-
 proto::VarType::TensorDesc *VarDesc::mutable_tensor_desc() {
   PADDLE_ENFORCE(desc_.has_type(), "The var type hasn't been set.");
   PADDLE_ENFORCE(desc_.type().has_type(), "The var type hasn't been set.");
...
paddle/fluid/framework/var_desc.h (view file @ 4cc782db)
...
@@ -87,8 +87,6 @@ class VarDesc {
   void SetDataTypes(
       const std::vector<proto::VarType::Type> &multiple_data_type);

-  void SetCapacity(int64_t capacity);
-
   proto::VarType::Type GetDataType() const;

   std::vector<proto::VarType::Type> GetDataTypes() const;
...
@@ -110,10 +108,8 @@ class VarDesc {
   void SetPersistable(bool persistable) { desc_.set_persistable(persistable); }

  private:
-  const proto::VarType::ChannelDesc &channel_desc() const;
   const proto::VarType::TensorDesc &tensor_desc() const;
   std::vector<proto::VarType::TensorDesc> tensor_descs() const;
-  proto::VarType::ChannelDesc *mutable_channel_desc();
   proto::VarType::TensorDesc *mutable_tensor_desc();
   std::vector<proto::VarType::TensorDesc *> mutable_tensor_descs();
...
paddle/fluid/framework/var_type.h (view file @ 4cc782db)
...
@@ -13,7 +13,6 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #pragma once
-#include "paddle/fluid/framework/channel.h"
 #include "paddle/fluid/framework/framework.pb.h"
 #include "paddle/fluid/framework/lod_rank_table.h"
 #include "paddle/fluid/framework/lod_tensor.h"
...
@@ -41,8 +40,6 @@ inline proto::VarType::Type ToVarType(std::type_index type) {
     return proto::VarType_Type_SELECTED_ROWS;
   } else if (IsType<ReaderHolder>(type)) {
     return proto::VarType_Type_READER;
-  } else if (IsType<ChannelHolder>(type)) {
-    return proto::VarType_Type_CHANNEL;
   } else {
     PADDLE_THROW("ToVarType:Unsupported type %s", type.name());
   }
...
@@ -66,9 +63,6 @@ inline void VisitVarType(const framework::Variable& var, Visitor visitor) {
     case proto::VarType_Type_READER:
       visitor(var.Get<ReaderHolder>());
       return;
-    case proto::VarType_Type_CHANNEL:
-      visitor(var.Get<ChannelHolder>());
-      return;
     default:
       PADDLE_THROW("Not supported visit type, %d", ToVarType(var.Type()));
   }
...
paddle/fluid/inference/analysis/analysis_pass.h (view file @ 4cc782db)
...
@@ -41,12 +41,6 @@ class AnalysisPass {
   // all passes have run.
   virtual bool Finalize() { return false; }

-  // Get a Pass appropriate to print the Node this pass operates on.
-  virtual AnalysisPass *CreatePrinterPass(std::ostream &os,
-                                          const std::string &banner) const {
-    return nullptr;
-  }
-
   // Create a debugger Pass that draw the DFG by graphviz toolkit.
   virtual AnalysisPass *CreateGraphvizDebugerPass() const { return nullptr; }
...
paddle/fluid/inference/analysis/analyzer_tester.cc (view file @ 4cc782db)
...
@@ -37,12 +37,16 @@ TEST(Analyzer, analysis_without_tensorrt) {
 TEST(Analyzer, analysis_with_tensorrt) {
   FLAGS_IA_enable_tensorrt_subgraph_engine = true;
   Argument argument;
+  argument.Set<int>("minimum_subgraph_size", new int(0));
+  argument.Set<int>("max_batch_size", new int(3));
+  argument.Set<int>("workspace_size", new int(1 << 20));
+  argument.Set<std::string>("precision_mode", new std::string("FP32"));
   argument.fluid_model_dir.reset(new std::string(FLAGS_inference_model_dir));
   Analyzer analyser;
   analyser.Run(&argument);
 }

 void TestWord2vecPrediction(const std::string &model_path) {
   NativeConfig config;
   config.model_dir = model_path;
   config.use_gpu = false;
...
@@ -73,8 +77,8 @@ void TestWord2vecPrediction(const std::string &model_path) {
   // The outputs' buffers are in CPU memory.
   for (size_t i = 0; i < std::min(5UL, num_elements); i++) {
     LOG(INFO) << "data: "
               << static_cast<float *>(outputs.front().data.data())[i];
     PADDLE_ENFORCE(static_cast<float *>(outputs.front().data.data())[i],
                    result[i]);
   }
 }
...
paddle/fluid/inference/analysis/data_flow_graph_to_fluid_pass.cc (view file @ 4cc782db)
...
@@ -97,8 +97,10 @@ void DataFlowGraphToFluidPass::AddFluidOp(Node *node) {
   }
 }

-void CreateTrtEngineOp(Node *node, const DataFlowGraph &graph,
+void CreateTrtEngineOp(Node *node, Argument *argument,
                        framework::proto::BlockDesc *block) {
+  PADDLE_ENFORCE(argument->main_dfg.get());
+  const DataFlowGraph &graph = *(argument->main_dfg);
   static int counter{0};
   PADDLE_ENFORCE(node->IsFunctionBlock());
   framework::OpDesc desc;
...
@@ -204,7 +206,10 @@ void CreateTrtEngineOp(Node *node, const DataFlowGraph &graph,
   PADDLE_ENFORCE(!block->vars().empty(), "the block has no var-desc");

   // Set attrs
   SetAttr(desc.Proto(), "subgraph", block->SerializeAsString());
+  SetAttr(desc.Proto(), "max_batch_size", argument->Get<int>("max_batch_size"));
+  SetAttr(desc.Proto(), "workspace_size", argument->Get<int>("workspace_size"));
   SetAttr(desc.Proto(), "engine_uniq_key", "trt-" + std::to_string(counter++));
   SetAttr(desc.Proto(), "parameters", ExtractParameters(graph.nodes.nodes()));
   SetAttr(desc.Proto(), "output_name_mapping", output_mapping);
...
@@ -248,7 +253,7 @@ void DataFlowGraphToFluidPass::AddEngineOp(Node *node) {
   *block_desc.Proto()->mutable_vars() =
       argument_->origin_program_desc->blocks(0).vars();
   PADDLE_ENFORCE(!block_desc.Proto()->vars().empty());
-  CreateTrtEngineOp(node, *argument_->main_dfg, block_desc.Proto());
+  CreateTrtEngineOp(node, argument_, block_desc.Proto());
   auto *main_block = desc_->mutable_blocks(framework::kRootBlockIndex);
   auto *op = main_block->add_ops();
   PADDLE_ENFORCE(!node->pb_msg().empty(), "failed to set desc for block");
...
paddle/fluid/inference/analysis/subgraph_splitter.cc (view file @ 4cc782db)
...
@@ -309,6 +309,8 @@ void SubGraphFuse::operator()() { ReplaceNodesWithSubGraphs(); }
 void SubGraphFuse::ReplaceNodesWithSubGraphs() {
   auto subgraphs = SubGraphSplitter(graph_, node_inside_subgraph_teller_)();
   for (auto &subgraph : subgraphs) {
+    if (subgraph.size() <= argument_->Get<int>("minimum_subgraph_size"))
+      continue;
     std::unordered_set<Node *> subgraph_uniq(subgraph.begin(), subgraph.end());
     // replace this sub-graph with the first node. Two steps: 1. Create a Block
     // Node that contains this subgraph 2. Mark the nodes inside the sub-graph
...
paddle/fluid/inference/analysis/subgraph_splitter.h (view file @ 4cc782db)
...
@@ -20,6 +20,7 @@ limitations under the License. */
 #include <vector>

+#include "paddle/fluid/inference/analysis/argument.h"
 #include "paddle/fluid/inference/analysis/data_flow_graph.h"
 #include "paddle/fluid/inference/analysis/node.h"
...
@@ -63,8 +64,11 @@ class SubGraphFuse {
  public:
   using NodeInsideSubgraphTeller = SubGraphSplitter::NodeInsideSubgraphTeller;

-  SubGraphFuse(DataFlowGraph *graph, const NodeInsideSubgraphTeller &teller)
-      : graph_(graph), node_inside_subgraph_teller_(teller) {}
+  SubGraphFuse(DataFlowGraph *graph, const NodeInsideSubgraphTeller &teller,
+               Argument *argument)
+      : graph_(graph),
+        node_inside_subgraph_teller_(teller),
+        argument_(argument) {}

   // The main method which run all the logic.
   void operator()();
...
@@ -76,6 +80,7 @@ class SubGraphFuse {
  private:
   DataFlowGraph *graph_;
   NodeInsideSubgraphTeller node_inside_subgraph_teller_;
+  Argument *argument_;
 };

 }  // namespace analysis
...
paddle/fluid/inference/analysis/subgraph_splitter_tester.cc (view file @ 4cc782db)
...
@@ -66,10 +66,12 @@ TEST(SubGraphSplitter, Split) {
 TEST(SubGraphSplitter, Fuse) {
   auto desc = LoadProgramDesc(FLAGS_inference_model_dir + "/__model__");
   auto dfg = ProgramDescToDFG(desc);
+  Argument argument;
+  argument.Set<int>("minimum_subgraph_size", new int(3));

   size_t count0 = dfg.nodes.size();

-  SubGraphFuse fuse(&dfg, teller);
+  SubGraphFuse fuse(&dfg, teller, &argument);
   fuse();

   int count1 = 0;
...
paddle/fluid/inference/analysis/tensorrt_subgraph_pass.cc (view file @ 4cc782db)
...
@@ -24,7 +24,7 @@ TensorRTSubGraphPass::TensorRTSubGraphPass(
     : node_inside_subgraph_teller_(teller) {}

 void TensorRTSubGraphPass::Run(DataFlowGraph *graph) {
-  SubGraphFuse(graph, node_inside_subgraph_teller_)();
+  SubGraphFuse(graph, node_inside_subgraph_teller_, argument_)();
   VLOG(4) << "debug info "
           << graph->HumanReadableInfo(false /*show_values*/,
                                       true /*show_functions*/);
...
paddle/fluid/inference/analysis/tensorrt_subgraph_pass.h (view file @ 4cc782db)
...
@@ -33,7 +33,10 @@ class TensorRTSubGraphPass : public DataFlowGraphPass {
   explicit TensorRTSubGraphPass(const NodeInsideSubgraphTeller &teller);

-  bool Initialize(Argument *argument) override { return true; }
+  bool Initialize(Argument *argument) override {
+    argument_ = argument;
+    return true;
+  }

   // This class get a sub-graph as input and determine whether to transform this
   // sub-graph into TensorRT.
...
@@ -46,6 +49,7 @@ class TensorRTSubGraphPass : public DataFlowGraphPass {
  private:
   NodeInsideSubgraphTeller node_inside_subgraph_teller_;
+  Argument *argument_;
 };

 }  // namespace analysis
...
paddle/fluid/inference/analysis/tensorrt_subgraph_pass_tester.cc (view file @ 4cc782db)
...
@@ -36,6 +36,10 @@ TEST(TensorRTSubGraphPass, main) {
   };

   Argument argument(FLAGS_inference_model_dir);
+  argument.Set<int>("minimum_subgraph_size", new int(0));
+  argument.Set<int>("max_batch_size", new int(3));
+  argument.Set<int>("workspace_size", new int(1 << 20));
+  argument.Set<std::string>("precision_mode", new std::string("FP32"));

   DFG_GraphvizDrawPass::Config config{FLAGS_dot_dir, "origin"};
   DFG_GraphvizDrawPass::Config config1{FLAGS_dot_dir, "fusion"};
...
paddle/fluid/inference/api/api_impl_tester.cc (view file @ 4cc782db)
...
...
@@ -21,6 +21,12 @@ limitations under the License. */
#include "paddle/fluid/inference/api/api_impl.h"
#include "paddle/fluid/inference/tests/test_helper.h"
#ifdef __clang__
#define ACC_DIFF 4e-3
#else
#define ACC_DIFF 1e-3
#endif
DEFINE_string
(
dirname
,
""
,
"Directory of the inference model."
);
namespace
paddle
{
...
...
@@ -99,8 +105,8 @@ void MainWord2Vec(bool use_gpu) {
float
*
lod_data
=
output1
.
data
<
float
>
();
for
(
int
i
=
0
;
i
<
output1
.
numel
();
++
i
)
{
EXPECT_LT
(
lod_data
[
i
]
-
data
[
i
],
1e-3
);
EXPECT_GT
(
lod_data
[
i
]
-
data
[
i
],
-
1e-3
);
EXPECT_LT
(
lod_data
[
i
]
-
data
[
i
],
ACC_DIFF
);
EXPECT_GT
(
lod_data
[
i
]
-
data
[
i
],
-
ACC_DIFF
);
}
}
...
...
@@ -144,7 +150,7 @@ void MainImageClassification(bool use_gpu) {
   float* data = static_cast<float*>(outputs[0].data.data());
   float* lod_data = output1.data<float>();
   for (size_t j = 0; j < len / sizeof(float); ++j) {
-    EXPECT_NEAR(lod_data[j], data[j], 1e-3);
+    EXPECT_NEAR(lod_data[j], data[j], ACC_DIFF);
   }
 }
...
...
@@ -199,7 +205,7 @@ void MainThreadsWord2Vec(bool use_gpu) {
       float* ref_data = refs[tid].data<float>();
       EXPECT_EQ(refs[tid].numel(), static_cast<int64_t>(len / sizeof(float)));
       for (int i = 0; i < refs[tid].numel(); ++i) {
-        EXPECT_NEAR(ref_data[i], data[i], 1e-3);
+        EXPECT_NEAR(ref_data[i], data[i], ACC_DIFF);
       }
     });
...
...
@@ -251,7 +257,7 @@ void MainThreadsImageClassification(bool use_gpu) {
       float* ref_data = refs[tid].data<float>();
       EXPECT_EQ((size_t)refs[tid].numel(), len / sizeof(float));
       for (int i = 0; i < refs[tid].numel(); ++i) {
-        EXPECT_NEAR(ref_data[i], data[i], 1e-3);
+        EXPECT_NEAR(ref_data[i], data[i], ACC_DIFF);
       }
     });
...
...
paddle/fluid/inference/api/api_tensorrt_subgraph_engine.cc
...
...
@@ -35,8 +35,6 @@ class TensorRTSubgraphPredictor : public NativePaddlePredictor {
   bool Init(const std::shared_ptr<framework::Scope>& parent_scope) {
     FLAGS_IA_enable_tensorrt_subgraph_engine = true;
     VLOG(3) << "Predictor::init()";
-    FLAGS_tensorrt_max_batch_size = config_.max_batch_size;
-    FLAGS_tensorrt_workspace_size = config_.workspace_size;
     if (config_.use_gpu) {
       place_ = paddle::platform::CUDAPlace(config_.device);
     } else {
...
...
@@ -92,6 +90,14 @@ class TensorRTSubgraphPredictor : public NativePaddlePredictor {
   void OptimizeInferenceProgram() {
     // Analyze inference_program
     Argument argument;
+    argument.Set<int>("minimum_subgraph_size",
+                      new int(config_.minimum_subgraph_size));
+    argument.Set<int>("max_batch_size",
+                      new int(config_.max_batch_size));
+    argument.Set<int>("workspace_size",
+                      new int(config_.workspace_size));
+    argument.Set<std::string>("precision_mode",
+                              new std::string(config_.precision_mode));
     if (!config_.model_dir.empty()) {
       argument.fluid_model_dir.reset(new std::string(config_.model_dir));
     } else {
...
...
paddle/fluid/inference/api/paddle_inference_api.h
...
...
@@ -194,6 +194,14 @@ struct MixedRTConfig : public NativeConfig {
   // For workspace_size, refer it from here:
   // https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#troubleshooting
   int workspace_size{1 << 30};
+  // Ops that can be converted into TRT layers are aggregated into subgraphs
+  // for TRT execution. This variable controls the minimum number of nodes
+  // such a subgraph must have; the default is 3.
+  int minimum_subgraph_size = 3;
+  // Reserved configuration.
+  // Only "FP32" is supported now; "FP16" and "INT8" will be supported later.
+  std::string precision_mode = "FP32";
 };

 // NOTE WIP, not stable yet.
...
...
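To see how the new fields are consumed end to end, a minimal configuration sketch assembled only from identifiers visible in this commit (the model path is a placeholder):

#include "paddle/fluid/inference/api/paddle_inference_api.h"

int main() {
  paddle::contrib::MixedRTConfig config;
  config.model_dir = "/path/to/model";  // placeholder path
  config.use_gpu = true;
  config.device = 0;
  config.max_batch_size = 4;
  config.workspace_size = 1 << 20;   // TensorRT scratch space, here 1 MiB
  config.minimum_subgraph_size = 3;  // skip fusing subgraphs under 3 nodes
  config.precision_mode = "FP32";    // "FP16"/"INT8" not yet supported

  auto predictor = paddle::CreatePaddlePredictor<
      paddle::contrib::MixedRTConfig,
      paddle::PaddleEngineKind::kAutoMixedTensorRT>(config);
  return predictor != nullptr ? 0 : 1;
}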
paddle/fluid/inference/tests/api/CMakeLists.txt
...
...
@@ -85,3 +85,13 @@ if (WITH_ANAKIN AND WITH_MKL) # only needed in CI
         DEPS inference_anakin_api_shared dynload_cuda SERIAL)
   endif()
 endif()
+
+if(WITH_GPU AND TENSORRT_FOUND)
+  set(TRT_MODEL_INSTALL_DIR "${INFERENCE_DEMO_INSTALL_DIR}/trt")
+  if(NOT EXISTS ${TRT_MODEL_INSTALL_DIR})
+    inference_download_and_uncompress(${TRT_MODEL_INSTALL_DIR}
+        ${INFERENCE_URL}/tensorrt_test "trt_test_models.tar.gz")
+  endif()
+  cc_test(test_trt_models SRCS trt_models_tester.cc
+    ARGS --dirname=${TRT_MODEL_INSTALL_DIR}/trt_test_models
+    DEPS paddle_inference_tensorrt_subgraph_engine)
+endif()
paddle/fluid/inference/tests/api/trt_models_tester.cc
new file (0 → 100644)
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <gtest/gtest.h>
#include "paddle/fluid/inference/analysis/analyzer.h"
#include "paddle/fluid/inference/api/paddle_inference_api.h"
namespace paddle {
using paddle::contrib::MixedRTConfig;

DEFINE_string(dirname, "", "Directory of the inference model.");

NativeConfig GetConfigNative() {
  NativeConfig config;
  config.model_dir = FLAGS_dirname;
  // LOG(INFO) << "dirname  " << config.model_dir;
  config.fraction_of_gpu_memory = 0.45;
  config.use_gpu = true;
  config.device = 0;
  return config;
}

MixedRTConfig GetConfigTRT() {
  MixedRTConfig config;
  config.model_dir = FLAGS_dirname;
  config.use_gpu = true;
  config.fraction_of_gpu_memory = 0.2;
  config.device = 0;
  config.max_batch_size = 3;
  return config;
}

void CompareTensorRTWithFluid(int batch_size, std::string model_dirname) {
  NativeConfig config0 = GetConfigNative();
  config0.model_dir = model_dirname;

  MixedRTConfig config1 = GetConfigTRT();
  config1.model_dir = model_dirname;
  config1.max_batch_size = batch_size;

  auto predictor0 =
      CreatePaddlePredictor<NativeConfig, PaddleEngineKind::kNative>(config0);
  auto predictor1 =
      CreatePaddlePredictor<MixedRTConfig,
                            PaddleEngineKind::kAutoMixedTensorRT>(config1);

  // Prepare inputs
  int height = 224;
  int width = 224;
  float *data = new float[batch_size * 3 * height * width];
  memset(data, 0, sizeof(float) * (batch_size * 3 * height * width));
  data[0] = 1.0f;

  // Prepare inputs
  PaddleTensor tensor;
  tensor.name = "input_0";
  tensor.shape = std::vector<int>({batch_size, 3, height, width});
  tensor.data = PaddleBuf(static_cast<void *>(data),
                          sizeof(float) * (batch_size * 3 * height * width));
  tensor.dtype = PaddleDType::FLOAT32;
  std::vector<PaddleTensor> paddle_tensor_feeds(1, tensor);

  // Prepare outputs
  std::vector<PaddleTensor> outputs0;
  std::vector<PaddleTensor> outputs1;
  CHECK(predictor0->Run(paddle_tensor_feeds, &outputs0));
  CHECK(predictor1->Run(paddle_tensor_feeds, &outputs1, batch_size));

  // Get output.
  ASSERT_EQ(outputs0.size(), 1UL);
  ASSERT_EQ(outputs1.size(), 1UL);

  const size_t num_elements = outputs0.front().data.length() / sizeof(float);
  const size_t num_elements1 = outputs1.front().data.length() / sizeof(float);
  EXPECT_EQ(num_elements, num_elements1);

  auto *data0 = static_cast<float *>(outputs0.front().data.data());
  auto *data1 = static_cast<float *>(outputs1.front().data.data());

  ASSERT_GT(num_elements, 0UL);
  for (size_t i = 0; i < std::min(num_elements, num_elements1); i++) {
    EXPECT_NEAR(data0[i], data1[i], 1e-3);
  }
}

TEST(trt_models_test, main) {
  std::vector<std::string> infer_models = {"mobilenet", "resnet50",
                                           "resnext50"};
  for (auto &model_dir : infer_models) {
    CompareTensorRTWithFluid(1, FLAGS_dirname + "/" + model_dir);
  }
}
}  // namespace paddle
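Given the ARGS wiring in the CMakeLists hunk above, the resulting gtest binary is pointed at the downloaded models via the gflags flag declared in this file, along these lines (the path is whatever TRT_MODEL_INSTALL_DIR resolved to):

./test_trt_models --dirname=/path/to/inference_demo/trt/trt_test_models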
paddle/fluid/operators/CMakeLists.txt
...
...
@@ -301,6 +301,7 @@ op_library(fusion_lstm_op DEPS cpu_lstm_compute)
if
(
WITH_GPU
)
op_library
(
conv_op DEPS vol2col depthwise_conv im2col
)
op_library
(
layer_norm_op DEPS cub
)
op_library
(
reduce_mean_op DEPS cub
)
else
()
op_library
(
conv_op DEPS vol2col im2col
)
endif
()
...
...
@@ -313,11 +314,6 @@ op_library(save_combine_op DEPS lod_tensor)
 op_library(load_combine_op DEPS lod_tensor)
 op_library(concat_op DEPS concat)

-# FIXME(thuan): Move CSP operators to paddle/fluid/framework/operators/concurrency
-add_subdirectory(concurrency)
-op_library(channel_send_op DEPS concurrency)
-op_library(channel_recv_op DEPS concurrency)

 list(REMOVE_ITEM GENERAL_OPS ${DEPS_OPS})
 foreach(src ${GENERAL_OPS})
...
...
paddle/fluid/operators/auc_op.cc
...
...
@@ -36,11 +36,16 @@ class AucOp : public framework::OperatorWithKernel {
"Out and Label should have same height."
);
int
num_pred_buckets
=
ctx
->
Attrs
().
Get
<
int
>
(
"num_thresholds"
)
+
1
;
int
slide_steps
=
ctx
->
Attrs
().
Get
<
int
>
(
"slide_steps"
);
PADDLE_ENFORCE_GE
(
num_pred_buckets
,
1
,
"num_thresholds must larger than 1"
);
PADDLE_ENFORCE_GE
(
slide_steps
,
0
,
"slide_steps must be natural number"
);
ctx
->
SetOutputDim
(
"AUC"
,
{
1
});
ctx
->
SetOutputDim
(
"BatchAUC"
,
{
1
});
ctx
->
SetOutputDim
(
"StatPosOut"
,
{
num_pred_buckets
});
ctx
->
SetOutputDim
(
"StatNegOut"
,
{
num_pred_buckets
});
slide_steps
=
slide_steps
==
0
?
1
:
slide_steps
;
ctx
->
SetOutputDim
(
"StatPosOut"
,
{
slide_steps
,
num_pred_buckets
});
ctx
->
SetOutputDim
(
"StatNegOut"
,
{
slide_steps
,
num_pred_buckets
});
}
protected:
...
...
@@ -62,6 +67,7 @@ class AucOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput
(
"Label"
,
"A 2D int tensor indicating the label of the training data. "
"shape: [batch_size, 1]"
);
// TODO(typhoonzero): support weight input
AddInput
(
"StatPos"
,
"Statistic value when label = 1"
);
AddInput
(
"StatNeg"
,
"Statistic value when label = 0"
);
...
...
@@ -69,18 +75,19 @@ class AucOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput
(
"AUC"
,
"A scalar representing the "
"current area-under-the-curve."
);
AddOutput
(
"BatchAUC"
,
"The AUC for current batch"
);
AddOutput
(
"StatPosOut"
,
"Statistic value when label = 1"
);
AddOutput
(
"StatNegOut"
,
"Statistic value when label = 0"
);
AddAttr
<
std
::
string
>
(
"curve"
,
"Curve type, can be 'ROC' or 'PR'."
)
.
SetDefault
(
"ROC"
);
AddAttr
<
int
>
(
"num_thresholds"
,
"The number of thresholds to use when discretizing the"
"
roc curve."
)
AddAttr
<
int
>
(
"num_thresholds"
,
"The number of thresholds to use when discretizing the
roc curve."
)
.
SetDefault
((
2
<<
12
)
-
1
);
AddAttr
<
int
>
(
"slide_steps"
,
"Use slide steps to calc batch auc."
)
.
SetDefault
(
1
);
AddComment
(
R"DOC(
Area Under The Curve (AUC) Operator.
...
...
paddle/fluid/operators/auc_op.h
...
...
@@ -32,7 +32,9 @@ class AucKernel : public framework::OpKernel<T> {
std
::
string
curve
=
ctx
.
Attr
<
std
::
string
>
(
"curve"
);
int
num_thresholds
=
ctx
.
Attr
<
int
>
(
"num_thresholds"
);
// buckets contain numbers from 0 to num_thresholds
int
num_pred_buckets
=
num_thresholds
+
1
;
int
slide_steps
=
ctx
.
Attr
<
int
>
(
"slide_steps"
);
// Only use output var for now, make sure it's persistable and
// not cleaned up for each batch.
...
...
@@ -40,16 +42,19 @@ class AucKernel : public framework::OpKernel<T> {
auto
*
stat_pos
=
ctx
.
Output
<
Tensor
>
(
"StatPosOut"
);
auto
*
stat_neg
=
ctx
.
Output
<
Tensor
>
(
"StatNegOut"
);
auto
*
stat_pos_data
=
stat_pos
->
mutable_data
<
int64_t
>
(
ctx
.
GetPlace
());
auto
*
stat_neg_data
=
stat_neg
->
mutable_data
<
int64_t
>
(
ctx
.
GetPlace
());
calcAuc
(
ctx
,
label
,
predict
,
stat_pos_data
,
stat_neg_data
,
num_thresholds
,
auc
);
auto
*
origin_stat_pos
=
stat_pos
->
mutable_data
<
int64_t
>
(
ctx
.
GetPlace
());
auto
*
origin_stat_neg
=
stat_neg
->
mutable_data
<
int64_t
>
(
ctx
.
GetPlace
());
auto
*
batch_auc
=
ctx
.
Output
<
Tensor
>
(
"BatchAUC"
);
std
::
vector
<
int64_t
>
stat_pos_batch
(
num_pred_buckets
,
0
);
std
::
vector
<
int64_t
>
stat_neg_batch
(
num_pred_buckets
,
0
);
calcAuc
(
ctx
,
label
,
predict
,
stat_pos_batch
.
data
(),
stat_neg_batch
.
data
(),
num_thresholds
,
batch_auc
);
std
::
vector
<
int64_t
>
stat_pos_data
(
num_pred_buckets
,
0
);
std
::
vector
<
int64_t
>
stat_neg_data
(
num_pred_buckets
,
0
);
auto
stat_pos_calc
=
stat_pos_data
.
data
();
auto
stat_neg_calc
=
stat_neg_data
.
data
();
statAuc
(
label
,
predict
,
num_pred_buckets
,
num_thresholds
,
slide_steps
,
origin_stat_pos
,
origin_stat_neg
,
&
stat_pos_calc
,
&
stat_neg_calc
);
calcAuc
(
ctx
,
stat_pos_calc
,
stat_neg_calc
,
num_thresholds
,
auc
);
}
private:
...
...
@@ -58,29 +63,76 @@ class AucKernel : public framework::OpKernel<T> {
return
(
X1
>
X2
?
(
X1
-
X2
)
:
(
X2
-
X1
))
*
(
Y1
+
Y2
)
/
2.0
;
}
inline
static
void
calcAuc
(
const
framework
::
ExecutionContext
&
ctx
,
const
framework
::
Tensor
*
label
,
inline
static
void
statAuc
(
const
framework
::
Tensor
*
label
,
const
framework
::
Tensor
*
predict
,
int64_t
*
stat_pos
,
int64_t
*
stat_neg
,
int
num_thresholds
,
framework
::
Tensor
*
auc_tensor
)
{
const
int
num_pred_buckets
,
const
int
num_thresholds
,
const
int
slide_steps
,
int64_t
*
origin_stat_pos
,
int64_t
*
origin_stat_neg
,
int64_t
**
stat_pos
,
int64_t
**
stat_neg
)
{
size_t
batch_size
=
predict
->
dims
()[
0
];
size_t
inference_width
=
predict
->
dims
()[
1
];
const
T
*
inference_data
=
predict
->
data
<
T
>
();
const
auto
*
label_data
=
label
->
data
<
int64_t
>
();
auto
*
auc
=
auc_tensor
->
mutable_data
<
double
>
(
ctx
.
GetPlace
());
for
(
size_t
i
=
0
;
i
<
batch_size
;
i
++
)
{
uint32_t
binIdx
=
static_cast
<
uint32_t
>
(
inference_data
[
i
*
inference_width
+
1
]
*
num_thresholds
);
if
(
label_data
[
i
])
{
stat_pos
[
binIdx
]
+=
1.0
;
(
*
stat_pos
)
[
binIdx
]
+=
1.0
;
}
else
{
stat_neg
[
binIdx
]
+=
1.0
;
(
*
stat_neg
)
[
binIdx
]
+=
1.0
;
}
}
int
bucket_length
=
num_pred_buckets
*
sizeof
(
int64_t
);
// will stat auc unlimited.
if
(
slide_steps
==
0
)
{
for
(
int
slide
=
0
;
slide
<
num_pred_buckets
;
++
slide
)
{
origin_stat_pos
[
slide
]
+=
(
*
stat_pos
)[
slide
];
origin_stat_neg
[
slide
]
+=
(
*
stat_neg
)[
slide
];
}
*
stat_pos
=
origin_stat_pos
;
*
stat_neg
=
origin_stat_neg
;
}
else
{
for
(
int
slide
=
1
;
slide
<
slide_steps
;
++
slide
)
{
int
dst_idx
=
(
slide
-
1
)
*
num_pred_buckets
;
int
src_inx
=
slide
*
num_pred_buckets
;
std
::
memcpy
(
origin_stat_pos
+
dst_idx
,
origin_stat_pos
+
src_inx
,
bucket_length
);
std
::
memcpy
(
origin_stat_neg
+
dst_idx
,
origin_stat_neg
+
src_inx
,
bucket_length
);
}
std
::
memcpy
(
origin_stat_pos
+
(
slide_steps
-
1
)
*
num_pred_buckets
,
*
stat_pos
,
bucket_length
);
std
::
memcpy
(
origin_stat_neg
+
(
slide_steps
-
1
)
*
num_pred_buckets
,
*
stat_neg
,
bucket_length
);
std
::
memset
(
*
stat_pos
,
0
,
bucket_length
);
std
::
memset
(
*
stat_neg
,
0
,
bucket_length
);
for
(
int
slide
=
0
;
slide
<
num_pred_buckets
;
++
slide
)
{
int
stat_pos_steps
=
0
;
int
stat_neg_steps
=
0
;
for
(
int
step
=
0
;
step
<
slide_steps
;
++
step
)
{
stat_pos_steps
+=
origin_stat_pos
[
slide
+
step
*
num_pred_buckets
];
stat_neg_steps
+=
origin_stat_neg
[
slide
+
step
*
num_pred_buckets
];
}
(
*
stat_pos
)[
slide
]
+=
stat_pos_steps
;
(
*
stat_neg
)[
slide
]
+=
stat_neg_steps
;
}
}
}
inline
static
void
calcAuc
(
const
framework
::
ExecutionContext
&
ctx
,
int64_t
*
stat_pos
,
int64_t
*
stat_neg
,
int
num_thresholds
,
framework
::
Tensor
*
auc_tensor
)
{
auto
*
auc
=
auc_tensor
->
mutable_data
<
double
>
(
ctx
.
GetPlace
());
*
auc
=
0.0
f
;
double
totPos
=
0.0
;
...
...
@@ -96,7 +148,6 @@ class AucKernel : public framework::OpKernel<T> {
totPos
+=
stat_pos
[
idx
];
totNeg
+=
stat_neg
[
idx
];
*
auc
+=
trapezoidArea
(
totNeg
,
totNegPrev
,
totPos
,
totPosPrev
);
--
idx
;
}
...
...
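Two things in this hunk are easier to see with the math spelled out. statAuc treats StatPosOut/StatNegOut (now shaped {slide_steps, num_pred_buckets}) as a sliding window of per-batch histograms: each new batch shifts the oldest row out, writes itself into the last row, and the per-bucket window sums become the statistics calcAuc consumes. calcAuc then walks buckets from the highest score down, accumulating cumulative positives and negatives as ROC coordinates and summing trapezoids of area |x1 - x2| * (y1 + y2) / 2. A standalone sketch of the trapezoid phase (plain arrays, hypothetical bucket counts; the final division by totPos * totNeg follows the standard ROC-AUC normalization, since the tail of calcAuc is elided from this diff):

#include <cstdio>
#include <vector>

// Trapezoid area between consecutive ROC points, matching trapezoidArea above.
static double TrapezoidArea(double x1, double x2, double y1, double y2) {
  return (x1 > x2 ? x1 - x2 : x2 - x1) * (y1 + y2) / 2.0;
}

// AUC over per-bucket positive/negative counts, highest threshold first,
// mirroring the calcAuc loop in auc_op.h.
static double CalcAuc(const std::vector<long long> &pos,
                      const std::vector<long long> &neg) {
  double tot_pos = 0, tot_neg = 0, auc = 0;
  int idx = static_cast<int>(pos.size()) - 1;
  while (idx >= 0) {
    double tot_pos_prev = tot_pos, tot_neg_prev = tot_neg;
    tot_pos += pos[idx];
    tot_neg += neg[idx];
    auc += TrapezoidArea(tot_neg, tot_neg_prev, tot_pos, tot_pos_prev);
    --idx;
  }
  if (tot_pos == 0 || tot_neg == 0) return 0.0;
  return auc / (tot_pos * tot_neg);  // normalize to [0, 1]
}

int main() {
  // Hypothetical 4-bucket histograms: positives skew toward high scores.
  std::vector<long long> pos = {0, 1, 2, 7}, neg = {6, 2, 1, 1};
  std::printf("AUC = %f\n", CalcAuc(pos, neg));  // prints 0.905000
}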
paddle/fluid/operators/channel_close_op.cc
deleted (100644 → 0)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/op_registry.h"
namespace pf = paddle::framework;
static constexpr char kChannel[] = "Channel";

namespace paddle {
namespace operators {

class ChannelCloseOp : public framework::OperatorBase {
 public:
  ChannelCloseOp(const std::string &type,
                 const framework::VariableNameMap &inputs,
                 const framework::VariableNameMap &outputs,
                 const framework::AttributeMap &attrs)
      : framework::OperatorBase(type, inputs, outputs, attrs) {}

 private:
  void RunImpl(const framework::Scope &scope,
               const platform::Place &dev_place) const override {
    auto &inp = *scope.FindVar(Input(kChannel));

    // Get the mutable version of the channel variable and closes it.
    pf::ChannelHolder *ch = inp.GetMutable<framework::ChannelHolder>();
    ch->close();
  }
};

class ChannelCloseOpOpInferShape : public framework::InferShapeBase {
 public:
  void operator()(framework::InferShapeContext *context) const override {
    PADDLE_ENFORCE(context->HasInput("Channel"),
                   "The input of ChannelClose op must be set");
  }
};

class ChannelCloseOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() override {
    AddInput(kChannel,
             "The Channel Variable that should be closed by"
             " the ChannelClose Op.");
    AddComment(R"DOC(
Channel Close Operator.

This operator closes an open channel.
)DOC");
  }
};

}  // namespace operators
}  // namespace paddle

REGISTER_OPERATOR(channel_close, paddle::operators::ChannelCloseOp,
                  paddle::framework::EmptyGradOpMaker,
                  paddle::operators::ChannelCloseOpMaker);
paddle/fluid/operators/channel_create_op.cc
deleted (100644 → 0)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/lod_rank_table.h"
#include "paddle/fluid/framework/lod_tensor_array.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/reader.h"
namespace pf = paddle::framework;

static constexpr char kOutput[] = "Out";

namespace paddle {
namespace operators {

class ChannelCreateOp : public framework::OperatorBase {
 public:
  ChannelCreateOp(const std::string &type,
                  const framework::VariableNameMap &inputs,
                  const framework::VariableNameMap &outputs,
                  const framework::AttributeMap &attrs)
      : framework::OperatorBase(type, inputs, outputs, attrs) {}

 private:
  void RunImpl(const framework::Scope &scope,
               const platform::Place &dev_place) const override {
    auto &out = *scope.FindVar(Output(kOutput));

    // Determine the datatype and capacity of the channel to be created
    // from the attributes provided.
    auto dtype =
        static_cast<framework::proto::VarType::Type>(Attr<int>("data_type"));
    auto capacity = Attr<int>("capacity");

    // Based on the datatype, create a new channel holder initialized with
    // the given capacity. When capacity is 0, an unbuffered channel is
    // created.
    pf::ChannelHolder *ch = out.GetMutable<framework::ChannelHolder>();
    if (dtype == framework::proto::VarType::LOD_TENSOR) {
      ch->Reset<pf::LoDTensor>(capacity);
    } else if (dtype == framework::proto::VarType::SELECTED_ROWS) {
      ch->Reset<pf::SelectedRows>(capacity);
    } else if (dtype == framework::proto::VarType::LOD_RANK_TABLE) {
      ch->Reset<pf::LoDRankTable>(capacity);
    } else if (dtype == framework::proto::VarType::LOD_TENSOR_ARRAY) {
      ch->Reset<pf::LoDTensorArray>(capacity);
    } else if (dtype == framework::proto::VarType::READER) {
      ch->Reset<pf::ReaderHolder>(capacity);
    } else if (dtype == framework::proto::VarType::CHANNEL) {
      ch->Reset<pf::ChannelHolder>(capacity);
    } else if (dtype == framework::proto::VarType::BOOL) {
      ch->Reset<bool>(capacity);
    } else if (dtype == framework::proto::VarType::INT32) {
      ch->Reset<int>(capacity);
    } else if (dtype == framework::proto::VarType::INT64) {
      ch->Reset<int64_t>(capacity);
    } else if (dtype == framework::proto::VarType::FP32) {
      ch->Reset<float>(capacity);
    } else if (dtype == framework::proto::VarType::FP64) {
      ch->Reset<double>(capacity);
    } else {
      PADDLE_THROW(
          "Data type %d is not in "
          "[LOD_TENSOR, SELECTED_ROWS, LOD_RANK_TABLE, LOD_TENSOR_ARRAY, "
          "READER, CHANNEL, BOOL, INT32, INT64, FP32, FP64]",
          dtype);
    }
  }
};

class ChannelCreateOpOpInferShape : public framework::InferShapeBase {
 public:
  void operator()(framework::InferShapeContext *context) const override {
    PADDLE_ENFORCE(context->HasOutput(kOutput),
                   "The output of ChannelCreate op must be set");
    context->SetOutputDim(kOutput, {1});
  }
};

class ChannelCreateOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() override {
    AddOutput(kOutput,
              "The object of a Channel type created by ChannelCreate Op.");
    AddAttr<int>("capacity", "The size of the buffer of Channel.")
        .SetDefault(0);
    AddAttr<int>("data_type", "The data type of elements inside the Channel.");
    AddComment(R"DOC(
Channel Create Operator.

This operator creates an object of the VarType Channel and returns it.
)DOC");
  }
};

}  // namespace operators
}  // namespace paddle

REGISTER_OPERATOR(channel_create, paddle::operators::ChannelCreateOp,
                  paddle::framework::EmptyGradOpMaker,
                  paddle::operators::ChannelCreateOpMaker);
paddle/fluid/operators/channel_recv_op.cc
deleted (100644 → 0)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/channel.h"
#include <paddle/fluid/framework/lod_rank_table.h>
#include <paddle/fluid/framework/lod_tensor_array.h>
#include <paddle/fluid/framework/reader.h>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/var_type.h"
#include "paddle/fluid/operators/concurrency/channel_util.h"
#include "paddle/fluid/operators/math/math_function.h"
static constexpr char Channel[] = "Channel";
static constexpr char Status[] = "Status";
static constexpr char Out[] = "Out";

namespace paddle {
namespace operators {

void SetReceiveStatus(const platform::Place &dev_place,
                      framework::Variable *status_var, bool status) {
  auto cpu = platform::CPUPlace();
  auto status_tensor =
      status_var->GetMutable<framework::LoDTensor>()->mutable_data<bool>(
          {1}, cpu);
  status_tensor[0] = status;
}

class ChannelRecvOp : public framework::OperatorBase {
 public:
  ChannelRecvOp(const std::string &type,
                const framework::VariableNameMap &inputs,
                const framework::VariableNameMap &outputs,
                const framework::AttributeMap &attrs)
      : framework::OperatorBase(type, inputs, outputs, attrs) {}

  void InferShape(framework::InferShapeContext *ctx) const {
    PADDLE_ENFORCE(ctx->HasInput(Channel),
                   "Input(Channel) of ChannelRecvOp should not be null.");
    PADDLE_ENFORCE(ctx->HasOutput(Out),
                   "Input(Channel) of ChannelRecvOp should not be null.");
    PADDLE_ENFORCE(ctx->HasOutput(Status),
                   "Output(Status) of ChannelRecvOp should not be null.");
    ctx->SetOutputDim("Status", {1});
  }

 private:
  void RunImpl(const framework::Scope &scope,
               const platform::Place &dev_place) const override {
    // Get the channel holder created by channel_create op, passed as input.
    framework::ChannelHolder *ch =
        scope.FindVar(Input(Channel))->GetMutable<framework::ChannelHolder>();
    auto output_var = scope.FindVar(Output(Out));
    // Receive the data from the channel.
    bool ok = concurrency::ChannelReceive(ch, output_var);

    // Set the status output of the `ChannelReceive` call.
    SetReceiveStatus(dev_place, scope.FindVar(Output(Status)), ok);
  }
};

class ChannelRecvOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() override {
    AddInput(Channel,
             "(Channel) A variable which \"receives\" the a value sent"
             "to it by a channel_send op.")
        .AsDuplicable();
    AddOutput(Out,
              "(Variable) Output Variable that will hold the data received"
              " from the Channel")
        .AsDuplicable();
    AddOutput(Status,
              "(Tensor) An LoD Tensor that returns a boolean status of the"
              "result of the receive operation.")
        .AsDuplicable();
    AddComment(R"DOC(
)DOC");
  }
};

}  // namespace operators
}  // namespace paddle

REGISTER_OPERATOR(channel_recv, paddle::operators::ChannelRecvOp,
                  paddle::framework::EmptyGradOpMaker,
                  paddle::operators::ChannelRecvOpMaker);
paddle/fluid/operators/channel_send_op.cc
deleted (100644 → 0)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/channel.h"
#include <paddle/fluid/framework/lod_rank_table.h>
#include <paddle/fluid/framework/lod_tensor_array.h>
#include <paddle/fluid/framework/reader.h>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/var_type.h"
#include "paddle/fluid/operators/concurrency/channel_util.h"
#include "paddle/fluid/operators/math/math_function.h"
static constexpr char Channel[] = "Channel";
static constexpr char X[] = "X";

namespace paddle {
namespace operators {

class ChannelSendOp : public framework::OperatorBase {
 public:
  ChannelSendOp(const std::string &type,
                const framework::VariableNameMap &inputs,
                const framework::VariableNameMap &outputs,
                const framework::AttributeMap &attrs)
      : framework::OperatorBase(type, inputs, outputs, attrs) {}

  void InferShape(framework::InferShapeContext *ctx) const {
    PADDLE_ENFORCE(ctx->HasInput(Channel),
                   "Input(Channel) of ChannelSendOp should not be null.");
    PADDLE_ENFORCE(ctx->HasInput(X),
                   "Input(X) of ChannelSendOp should not be null.");
  }

 private:
  void RunImpl(const framework::Scope &scope,
               const platform::Place &dev_place) const override {
    // Get the channel holder created by channel_create op, passed as input.
    framework::ChannelHolder *ch =
        scope.FindVar(Input(Channel))->GetMutable<framework::ChannelHolder>();
    auto input_var = scope.FindVar(Input(X));

    // Send the input data through the channel.
    concurrency::ChannelSend(ch, input_var);
  }
};

class ChannelSendOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() override {
    AddInput(Channel,
             "(Channel) A variable which \"sends\" the passed in value to "
             "a listening receiver.")
        .AsDuplicable();
    AddInput(X, "(Variable) The value which gets sent by the channel.")
        .AsDuplicable();
    AddComment(R"DOC(
)DOC");
  }
};

}  // namespace operators
}  // namespace paddle

REGISTER_OPERATOR(channel_send, paddle::operators::ChannelSendOp,
                  paddle::framework::EmptyGradOpMaker,
                  paddle::operators::ChannelSendOpMaker);
paddle/fluid/operators/concurrency/CMakeLists.txt
deleted (100644 → 0)
cc_library(concurrency SRCS channel_util.cc DEPS device_context framework_proto boost eigen3)
paddle/fluid/operators/concurrency/channel_util.cc
deleted (100644 → 0)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/concurrency/channel_util.h"
#include "paddle/fluid/framework/var_type.h"
namespace poc = paddle::operators::concurrency;

void poc::ChannelSend(framework::ChannelHolder *ch, framework::Variable *var) {
  auto type = framework::ToVarType(var->Type());
  if (type == framework::proto::VarType_Type_LOD_TENSOR)
    ch->Send(var->GetMutable<framework::LoDTensor>());
  else if (type == framework::proto::VarType_Type_LOD_RANK_TABLE)
    ch->Send(var->GetMutable<framework::LoDRankTable>());
  else if (type == framework::proto::VarType_Type_LOD_TENSOR_ARRAY)
    ch->Send(var->GetMutable<framework::LoDTensorArray>());
  else if (type == framework::proto::VarType_Type_SELECTED_ROWS)
    ch->Send(var->GetMutable<framework::SelectedRows>());
  else if (type == framework::proto::VarType_Type_READER)
    ch->Send(var->GetMutable<framework::ReaderHolder>());
  else if (type == framework::proto::VarType_Type_CHANNEL)
    ch->Send(var->GetMutable<framework::ChannelHolder>());
  else
    PADDLE_THROW("ChannelSend:Unsupported type");
}

bool poc::ChannelReceive(framework::ChannelHolder *ch,
                         framework::Variable *var) {
  // Get type of channel and use that to call mutable data for Variable
  auto type = framework::ToVarType(ch->Type());
  if (type == framework::proto::VarType_Type_LOD_TENSOR)
    return ch->Receive(var->GetMutable<framework::LoDTensor>());
  else if (type == framework::proto::VarType_Type_LOD_RANK_TABLE)
    return ch->Receive(var->GetMutable<framework::LoDRankTable>());
  else if (type == framework::proto::VarType_Type_LOD_TENSOR_ARRAY)
    return ch->Receive(var->GetMutable<framework::LoDTensorArray>());
  else if (type == framework::proto::VarType_Type_SELECTED_ROWS)
    return ch->Receive(var->GetMutable<framework::SelectedRows>());
  else if (type == framework::proto::VarType_Type_READER)
    return ch->Receive(var->GetMutable<framework::ReaderHolder>());
  else if (type == framework::proto::VarType_Type_CHANNEL)
    return ch->Receive(var->GetMutable<framework::ChannelHolder>());
  else
    PADDLE_THROW("ChannelReceive:Unsupported type");
}

void poc::ChannelAddToSendQ(framework::ChannelHolder *ch, const void *referrer,
                            framework::Variable *var,
                            std::shared_ptr<std::condition_variable_any> cond,
                            std::function<bool(framework::ChannelAction)> cb) {
  auto type = framework::ToVarType(var->Type());
  if (type == framework::proto::VarType_Type_LOD_TENSOR) {
    ch->AddToSendQ(referrer, var->GetMutable<framework::LoDTensor>(), cond, cb);
  } else if (type == framework::proto::VarType_Type_LOD_RANK_TABLE) {
    ch->AddToSendQ(referrer, var->GetMutable<framework::LoDRankTable>(), cond,
                   cb);
  } else if (type == framework::proto::VarType_Type_LOD_TENSOR_ARRAY) {
    ch->AddToSendQ(referrer, var->GetMutable<framework::LoDTensorArray>(), cond,
                   cb);
  } else if (type == framework::proto::VarType_Type_SELECTED_ROWS) {
    ch->AddToSendQ(referrer, var->GetMutable<framework::SelectedRows>(), cond,
                   cb);
  } else if (type == framework::proto::VarType_Type_READER) {
    ch->AddToSendQ(referrer, var->GetMutable<framework::ReaderHolder>(), cond,
                   cb);
  } else if (type == framework::proto::VarType_Type_CHANNEL) {
    ch->AddToSendQ(referrer, var->GetMutable<framework::ChannelHolder>(), cond,
                   cb);
  } else {
    PADDLE_THROW("ChannelAddToSendQ:Unsupported type");
  }
}

void poc::ChannelAddToReceiveQ(
    framework::ChannelHolder *ch, const void *referrer,
    framework::Variable *var,
    std::shared_ptr<std::condition_variable_any> cond,
    std::function<bool(framework::ChannelAction)> cb) {
  auto type = framework::ToVarType(var->Type());
  if (type == framework::proto::VarType_Type_LOD_TENSOR) {
    ch->AddToReceiveQ(referrer, var->GetMutable<framework::LoDTensor>(), cond,
                      cb);
  } else if (type == framework::proto::VarType_Type_LOD_RANK_TABLE) {
    ch->AddToReceiveQ(referrer, var->GetMutable<framework::LoDRankTable>(),
                      cond, cb);
  } else if (type == framework::proto::VarType_Type_LOD_TENSOR_ARRAY) {
    ch->AddToReceiveQ(referrer, var->GetMutable<framework::LoDTensorArray>(),
                      cond, cb);
  } else if (type == framework::proto::VarType_Type_SELECTED_ROWS) {
    ch->AddToReceiveQ(referrer, var->GetMutable<framework::SelectedRows>(),
                      cond, cb);
  } else if (type == framework::proto::VarType_Type_READER) {
    ch->AddToReceiveQ(referrer, var->GetMutable<framework::ReaderHolder>(),
                      cond, cb);
  } else if (type == framework::proto::VarType_Type_CHANNEL) {
    ch->AddToReceiveQ(referrer, var->GetMutable<framework::ChannelHolder>(),
                      cond, cb);
  } else {
    PADDLE_THROW("ChannelAddToReceiveQ:Unsupported type");
  }
}
paddle/fluid/operators/concurrency/channel_util.h
deleted (100644 → 0)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/variable.h"
namespace paddle {
namespace operators {
namespace concurrency {

void ChannelSend(framework::ChannelHolder *ch, framework::Variable *var);
bool ChannelReceive(framework::ChannelHolder *ch, framework::Variable *var);

void ChannelAddToSendQ(framework::ChannelHolder *ch, const void *referrer,
                       framework::Variable *var,
                       std::shared_ptr<std::condition_variable_any> cond,
                       std::function<bool(framework::ChannelAction)> cb);
void ChannelAddToReceiveQ(framework::ChannelHolder *ch, const void *referrer,
                          framework::Variable *var,
                          std::shared_ptr<std::condition_variable_any> cond,
                          std::function<bool(framework::ChannelAction)> cb);

}  // namespace concurrency
}  // namespace operators
}  // namespace paddle
paddle/fluid/operators/conv_op.h
...
...
@@ -380,7 +380,8 @@ class DepthwiseConvKernel : public framework::OpKernel<T> {
     math::DepthwiseConvFunctor<DeviceContext, T> depthwiseConv;

     auto &dev_ctx = context.template device_context<DeviceContext>();
-    depthwiseConv(dev_ctx, *input, filter, strides, paddings, output);
+    depthwiseConv(dev_ctx, *input, filter, strides, paddings, dilations,
+                  output);
   }
 };
...
...
@@ -415,14 +416,14 @@ class DepthwiseConvGradKernel : public framework::OpKernel<T> {
       input_grad->mutable_data<T>(context.GetPlace());
       set_zero(dev_ctx, input_grad, static_cast<T>(0));
       depthwiseConvInputGrad(dev_ctx, *input, filter, *output_grad, strides,
-                             paddings, input_grad);
+                             paddings, dilations, input_grad);
     }

     if (filter_grad) {
       filter_grad->mutable_data<T>(context.GetPlace());
       set_zero(dev_ctx, filter_grad, static_cast<T>(0));
       depthwiseConvFilterGrad(dev_ctx, *input, *output_grad, strides,
-                              paddings, filter_grad);
+                              paddings, dilations, filter_grad);
     }
   }
 };
...
...
paddle/fluid/operators/conv_transpose_op.h
...
...
@@ -345,7 +345,7 @@ class DepthwiseConvTransposeKernel : public framework::OpKernel<T> {
     math::DepthwiseConvInputGradFunctor<DeviceContext, T>
         depthwiseConvInputGrad;
     depthwiseConvInputGrad(dev_ctx, *output, filter, *input, strides,
-                           paddings, output);
+                           paddings, dilations, output);
   }
 };
...
...
@@ -367,10 +367,11 @@ class DepthwiseConvTransposeGradKernel : public framework::OpKernel<T> {
     auto &dev_ctx = context.template device_context<DeviceContext>();
     std::vector<int> strides = context.Attr<std::vector<int>>("strides");
     std::vector<int> paddings = context.Attr<std::vector<int>>("paddings");
+    std::vector<int> dilations = context.Attr<std::vector<int>>("dilations");

     if (input_grad) {
       math::DepthwiseConvFunctor<DeviceContext, T> depthwiseConv;
-      depthwiseConv(dev_ctx, *output_grad, filter, strides, paddings,
+      depthwiseConv(dev_ctx, *output_grad, filter, strides, paddings, dilations,
                     input_grad);
     }
...
...
@@ -382,7 +383,7 @@ class DepthwiseConvTransposeGradKernel : public framework::OpKernel<T> {
       math::DepthwiseConvFilterGradFunctor<DeviceContext, T>
           depthwiseConvFilterGrad;
       depthwiseConvFilterGrad(dev_ctx, *output_grad, *input, strides,
-                              paddings, filter_grad);
+                              paddings, dilations, filter_grad);
     }
   }
 };
...
...
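With dilations now threaded through every depthwise functor, recall the geometry: a kernel of size k with dilation d spans an effective extent of d*(k-1)+1 input elements, and the output size follows the usual convolution formula. A small illustrative helper (not part of the diff):

#include <cstdio>

// Output length of one spatial dimension for a dilated convolution.
// Effective kernel extent: dilation * (kernel - 1) + 1.
int ConvOutSize(int input, int kernel, int stride, int padding, int dilation) {
  int effective_kernel = dilation * (kernel - 1) + 1;
  return (input + 2 * padding - effective_kernel) / stride + 1;
}

int main() {
  // 224-wide input, 3x3 kernel, stride 1, padding 1: dilation 1 keeps 224,
  // dilation 2 shrinks it to 222 unless the padding grows to 2.
  std::printf("%d %d\n", ConvOutSize(224, 3, 1, 1, 1),   // 224
              ConvOutSize(224, 3, 1, 1, 2));             // 222
}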
paddle/fluid/operators/cub_reduce.h
new file (0 → 100644)
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <algorithm>
#include <cmath>
#include <numeric>
#include <set>
#include <vector>
#include <cub/cub.cuh> // NOLINT
#include "paddle/fluid/framework/tensor.h"
namespace paddle {
namespace operators {
namespace detail {

template <typename T, size_t ElementCount>
struct Array {
 public:
  HOSTDEVICE inline Array() {}

  HOSTDEVICE inline T &operator[](size_t index) { return data_[index]; }
  HOSTDEVICE inline const T &operator[](size_t index) const {
    return data_[index];
  }

  HOSTDEVICE constexpr inline size_t size() const { return ElementCount; }

  template <typename VectorLikeType>
  static inline Array<T, ElementCount> From(const VectorLikeType &vec) {
    PADDLE_ENFORCE_EQ(vec.size(), ElementCount, "size not match");
    size_t n = static_cast<size_t>(vec.size());
    Array<T, ElementCount> ret;
    for (size_t i = 0; i < n; ++i) ret[i] = vec[i];
    return ret;
  }

 private:
  T data_[ElementCount];
};

// reduce the last axis of 2d array
template <typename Tx, typename Ty, typename ReduceOp, typename TransformOp,
          int BlockDim>
__global__ void ReduceKernel2D(const Tx *x, Ty *y, ReduceOp reducer,
                               TransformOp transformer, Ty init,
                               int reduce_num) {
  __shared__ typename cub::BlockReduce<Ty, BlockDim>::TempStorage temp_storage;
  int idx_x = blockIdx.x * reduce_num;
  int idx_y = threadIdx.x;
  Ty reduce_var = init;
  for (int idx_y = threadIdx.x; idx_y < reduce_num; idx_y += BlockDim)
    reduce_var = reducer(reduce_var, transformer(x[idx_x + idx_y]));

  reduce_var =
      cub::BlockReduce<Ty, BlockDim>(temp_storage).Reduce(reduce_var, reducer);

  if (threadIdx.x == 0) {
    y[blockIdx.x] = reduce_var;
  }
}

template <typename Tx, typename Ty, typename ReduceOp, typename TransformOp,
          int BlockDim, int Rank, int ReduceRank>
__global__ void ReduceKernel(const Tx *x, Ty *y, ReduceOp reducer,
                             TransformOp transformer, Ty init, int reduce_num,
                             Array<int, Rank> x_strides,
                             Array<int, ReduceRank> reduce_dim,
                             Array<int, ReduceRank> reduce_strides,
                             Array<int, Rank - ReduceRank> left_dim,
                             Array<int, Rank - ReduceRank> left_strides) {
  __shared__ typename cub::BlockReduce<Ty, BlockDim>::TempStorage temp_storage;
  Array<int, Rank> sub_index;
  int left_idx = blockIdx.x;
  for (int i = 0; i < Rank - ReduceRank; ++i) {
    sub_index[left_dim[i]] = left_idx / left_strides[i];
    left_idx %= left_strides[i];
  }

  int reduce_idx = threadIdx.x;
  for (int j = 0; j < ReduceRank; ++j) {
    sub_index[reduce_dim[j]] = reduce_idx / reduce_strides[j];
    reduce_idx %= reduce_strides[j];
  }

  int idx_x = 0;
  for (int k = 0; k < Rank; ++k) idx_x += (sub_index[k] * x_strides[k]);
  Ty reduce_var = static_cast<Ty>(transformer(x[idx_x]));

  for (int i = threadIdx.x + BlockDim; i < reduce_num; i += BlockDim) {
    int reduce_idx = i;
    for (int j = 0; j < ReduceRank; ++j) {
      sub_index[reduce_dim[j]] = reduce_idx / reduce_strides[j];
      reduce_idx %= reduce_strides[j];
    }

    int idx_x = 0;
    for (int k = 0; k < Rank; ++k) idx_x += (sub_index[k] * x_strides[k]);
    reduce_var = static_cast<Ty>(reducer(reduce_var, transformer(x[idx_x])));
  }

  reduce_var =
      cub::BlockReduce<Ty, BlockDim>(temp_storage).Reduce(reduce_var, reducer);

  if (threadIdx.x == 0) {
    y[blockIdx.x] = reduce_var;
  }
}

static inline std::vector<int> GetStrides(const std::vector<int> &dims) {
  int n = static_cast<int>(dims.size());
  if (n == 0) return std::vector<int>();

  std::vector<int> strides(n);
  strides.back() = 1;
  for (int i = n - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * dims[i + 1];
  }
  return strides;
}

static inline std::vector<int> GetStrides(const std::vector<int> &dims,
                                          const std::vector<int> &idx) {
  int n = static_cast<int>(idx.size());
  if (n == 0) return std::vector<int>();

  std::vector<int> strides(n);
  strides.back() = 1;
  for (int i = n - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * dims[idx[i + 1]];
  }
  return strides;
}

constexpr int kMaxBlockDim = 512;

static inline int GetDesiredBlockDim(int block_dim) {
  return block_dim >= kMaxBlockDim
             ? kMaxBlockDim
             : (1 << static_cast<int>(std::log2(block_dim)));
}

template <typename Tx, typename Ty, int BlockDim, typename ReduceOp,
          typename TransformOp>
static void TensorReduceImpl(
    const Tx *x_data, Ty *y_data, const platform::Place &place,
    const ReduceOp &reducer, const TransformOp &transformer, const Ty &init,
    int left_num, int reduce_num, const std::vector<int> &x_strides,
    const std::vector<int> &reduce_dim, const std::vector<int> &reduce_strides,
    const std::vector<int> &left_dim, const std::vector<int> &left_strides,
    cudaStream_t stream) {
#define CUB_RANK_CASE(i, ...)             \
  case i: {                               \
    constexpr auto kRank = i;             \
    switch (reduce_rank) { __VA_ARGS__; } \
  } break

#define CUB_REDUCE_RANK_CASE(i, ...)                              \
  case i: {                                                       \
    constexpr auto kReduceRank = i;                               \
    ReduceKernel<Tx, Ty, ReduceOp, TransformOp, BlockDim, kRank,  \
                 kReduceRank><<<left_num, BlockDim, 0, stream>>>( \
        x_data, y_data, reducer, transformer, init, reduce_num,   \
        Array<int, kRank>::From(x_strides),                       \
        Array<int, kReduceRank>::From(reduce_dim),                \
        Array<int, kReduceRank>::From(reduce_strides),            \
        Array<int, kRank - kReduceRank>::From(left_dim),          \
        Array<int, kRank - kReduceRank>::From(left_strides));     \
  } break

  int rank = x_strides.size();
  int reduce_rank = reduce_strides.size();
  if (rank == reduce_rank) {
    cub::TransformInputIterator<Ty, TransformOp, const Tx *> trans_x(
        x_data, transformer);
    size_t temp_storage_bytes = 0;
    cub::DeviceReduce::Reduce(nullptr, temp_storage_bytes, trans_x, y_data,
                              reduce_num, reducer, init, stream);
    framework::Tensor tmp;
    auto *temp_storage = tmp.mutable_data<uint8_t>(
        framework::make_ddim({static_cast<int64_t>(temp_storage_bytes)}),
        place);
    cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, trans_x,
                              y_data, reduce_num, reducer, init, stream);
    return;
  }
  if (rank == 2 && reduce_rank == 1 && reduce_dim[0] == 1) {
    ReduceKernel2D<Tx, Ty, ReduceOp, TransformOp,
                   BlockDim><<<left_num, BlockDim, 0, stream>>>(
        x_data, y_data, reducer, transformer, init, reduce_num);
    return;
  }
  /*
  if (rank == 3 && reduce_rank == 1 && reduce_dim[0] == 1) {
    // TODO(liangdun): we can optimize 3d case which the 2nd axis is reduced.
    // Currently, it is handled by code below, but inefficient
    return;
  }
  */

  switch (rank) {
    CUB_RANK_CASE(2, CUB_REDUCE_RANK_CASE(1););

    CUB_RANK_CASE(3, CUB_REDUCE_RANK_CASE(1); CUB_REDUCE_RANK_CASE(2););

    CUB_RANK_CASE(4, CUB_REDUCE_RANK_CASE(1); CUB_REDUCE_RANK_CASE(2);
                  CUB_REDUCE_RANK_CASE(3););

    CUB_RANK_CASE(5, CUB_REDUCE_RANK_CASE(1); CUB_REDUCE_RANK_CASE(2);
                  CUB_REDUCE_RANK_CASE(3); CUB_REDUCE_RANK_CASE(4););

    CUB_RANK_CASE(6, CUB_REDUCE_RANK_CASE(1); CUB_REDUCE_RANK_CASE(2);
                  CUB_REDUCE_RANK_CASE(3); CUB_REDUCE_RANK_CASE(4);
                  CUB_REDUCE_RANK_CASE(5););

    CUB_RANK_CASE(7, CUB_REDUCE_RANK_CASE(1); CUB_REDUCE_RANK_CASE(2);
                  CUB_REDUCE_RANK_CASE(3); CUB_REDUCE_RANK_CASE(4);
                  CUB_REDUCE_RANK_CASE(5); CUB_REDUCE_RANK_CASE(6););

    CUB_RANK_CASE(8, CUB_REDUCE_RANK_CASE(1); CUB_REDUCE_RANK_CASE(2);
                  CUB_REDUCE_RANK_CASE(3); CUB_REDUCE_RANK_CASE(4);
                  CUB_REDUCE_RANK_CASE(5); CUB_REDUCE_RANK_CASE(6););

    CUB_RANK_CASE(9, CUB_REDUCE_RANK_CASE(1); CUB_REDUCE_RANK_CASE(2);
                  CUB_REDUCE_RANK_CASE(3); CUB_REDUCE_RANK_CASE(4);
                  CUB_REDUCE_RANK_CASE(5); CUB_REDUCE_RANK_CASE(6);
                  CUB_REDUCE_RANK_CASE(7); CUB_REDUCE_RANK_CASE(8););
  }

#undef CUB_REDUCE_RANK_CASE
#undef CUB_RANK_CASE
}

}  // namespace detail

template <typename Tx, typename Ty, typename ReduceOp, typename TransformOp>
void TensorReduce(const framework::Tensor &x, framework::Tensor *y,
                  std::vector<int> origin_reduce_dims, const Ty &init,
                  const ReduceOp &reducer, const TransformOp &transformer,
                  cudaStream_t stream) {
  auto x_dim = framework::vectorize2int(x.dims());
  std::vector<int> new_x_dim, new_reduce_dims;
  int is_reduced = 0;
  for (auto e : origin_reduce_dims) {
    auto pos = e >= 0 ? e : e + x_dim.size();
    is_reduced |= 1 << e;
  }
  for (int i = 0; i < x_dim.size(); i++) {
    if ((i == 0) || (((is_reduced >> i) ^ (is_reduced >> (i - 1))) & 1)) {
      new_x_dim.push_back(x_dim[i]);
      if ((is_reduced >> i) & 1)
        new_reduce_dims.push_back(new_x_dim.size() - 1);
    } else {
      new_x_dim[new_x_dim.size() - 1] *= x_dim[i];
    }
  }
  x_dim = new_x_dim;
  origin_reduce_dims = new_reduce_dims;
  int x_rank = static_cast<int>(x_dim.size());
  std::set<int> left_set, reduce_set;
  for (int i = 0; i < x_rank; ++i) left_set.insert(i);

  for (auto e : origin_reduce_dims) {
    left_set.erase(e);
    reduce_set.insert(e);
  }

  std::vector<int> reduce_dim(reduce_set.begin(), reduce_set.end());
  std::vector<int> left_dim(left_set.begin(), left_set.end());

  std::vector<int> x_strides = detail::GetStrides(x_dim);
  std::vector<int> reduce_strides = detail::GetStrides(x_dim, reduce_dim);
  std::vector<int> left_strides = detail::GetStrides(x_dim, left_dim);
  int reduce_num = reduce_strides[0] * x_dim[reduce_dim[0]];
  int left_num = 1;
  if (left_dim.size()) left_num = left_strides[0] * x_dim[left_dim[0]];

  std::vector<int> y_dim(left_dim.size());
  for (int i = 0; i < left_dim.size(); ++i) {
    y_dim[i] = x_dim[left_dim[i]];
  }
  auto x_data = x.data<Tx>();
  auto y_data = y->mutable_data<Ty>(x.place());
  if (reduce_num == 1) return;

#define CUB_BLOCK_DIM_CASE(block_dim)                                    \
  case block_dim: {                                                      \
    constexpr auto kBlockDim = block_dim;                                \
    detail::TensorReduceImpl<Tx, Ty, block_dim, ReduceOp, TransformOp>(  \
        x_data, y_data, x.place(), reducer, transformer, init, left_num, \
        reduce_num, x_strides, reduce_dim, reduce_strides, left_dim,     \
        left_strides, stream);                                           \
  } break

  switch (detail::GetDesiredBlockDim(reduce_num)) {
    CUB_BLOCK_DIM_CASE(512);
    CUB_BLOCK_DIM_CASE(256);
    CUB_BLOCK_DIM_CASE(128);
    CUB_BLOCK_DIM_CASE(64);
    CUB_BLOCK_DIM_CASE(32);
    CUB_BLOCK_DIM_CASE(16);
    CUB_BLOCK_DIM_CASE(8);
    CUB_BLOCK_DIM_CASE(4);
    CUB_BLOCK_DIM_CASE(2);
  }
#undef CUB_BLOCK_DIM_CASE
}

}  // namespace operators
}  // namespace paddle
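The bitmask loop at the top of TensorReduce merges adjacent dimensions that are either all reduced or all kept, which is why a rank-9 template ceiling covers far larger real tensors. A CPU-only sketch of just that merging step, with a worked example in main (the function name MergeDims is ours, not the header's):

#include <cstdio>
#include <vector>

// Merge adjacent dims sharing reduce/keep status, mirroring the bitmask
// trick at the top of TensorReduce.
void MergeDims(std::vector<int> *dims, std::vector<int> *reduce_dims) {
  int is_reduced = 0;
  for (int e : *reduce_dims) is_reduced |= 1 << e;

  std::vector<int> new_dims, new_reduce;
  for (size_t i = 0; i < dims->size(); ++i) {
    bool boundary =
        (i == 0) || (((is_reduced >> i) ^ (is_reduced >> (i - 1))) & 1);
    if (boundary) {
      new_dims.push_back((*dims)[i]);
      if ((is_reduced >> i) & 1)
        new_reduce.push_back(static_cast<int>(new_dims.size()) - 1);
    } else {
      new_dims.back() *= (*dims)[i];  // same status as predecessor: fold in
    }
  }
  *dims = new_dims;
  *reduce_dims = new_reduce;
}

int main() {
  // Reducing dims {1, 2} of a [8, 4, 5, 6] tensor folds 4*5 into 20:
  // the result is dims [8, 20, 6] with reduce dim {1}.
  std::vector<int> dims = {8, 4, 5, 6}, reduce = {1, 2};
  MergeDims(&dims, &reduce);
  for (int d : dims) std::printf("%d ", d);
  std::printf("| reduce: %d\n", reduce[0]);
}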
paddle/fluid/operators/distributed/grpc_client.h
...
...
@@ -15,6 +15,7 @@ limitations under the License. */
#pragma once
#include <time.h>
#include <atomic>
#include <chrono> // NOLINT
#include <condition_variable> // NOLINT
...
...
paddle/fluid/operators/distributed/request_handler.h
...
...
@@ -15,6 +15,7 @@
#pragma once
#include <time.h>
#include <condition_variable> // NOLINT
#include <functional>
#include <string>
...
...
paddle/fluid/operators/distributed/rpc_server.h
...
...
@@ -14,6 +14,7 @@
#pragma once
#include <atomic>
#include <set>
#include <string>
#include <thread> // NOLINT
...
...
paddle/fluid/operators/elementwise_op.h
...
...
@@ -89,7 +89,7 @@ class ElementwiseOpMaker : public framework::OpProtoAndCheckerMaker {
     AddAttr<bool>("use_mkldnn", "(bool, default false). Used by MKLDNN.")
         .SetDefault(false);
     AddComment(string::Sprintf(R"DOC(
-Limited Elementwise %s Operator
+Elementwise %s Operator

The equation is:
...
...
paddle/fluid/operators/math/depthwise_conv.cu
(Diff collapsed; content not shown.)
paddle/fluid/operators/math/depthwise_conv.h
...
...
@@ -32,7 +32,8 @@ class DepthwiseConvFunctor {
   void operator()(const DeviceContext &context, const framework::Tensor &input,
                   const framework::Tensor &filter,
                   const std::vector<int> &strides,
-                  const std::vector<int> &paddings, framework::Tensor *output);
+                  const std::vector<int> &paddings,
+                  const std::vector<int> &dilations, framework::Tensor *output);
 };

 template <typename DeviceContext, typename T>
...
...
@@ -43,6 +44,7 @@ class DepthwiseConvInputGradFunctor {
                   const framework::Tensor &output_grad,
                   const std::vector<int> &strides,
                   const std::vector<int> &paddings,
+                  const std::vector<int> &dilations,
                   framework::Tensor *input_grad);
 };
...
...
@@ -53,6 +55,7 @@ class DepthwiseConvFilterGradFunctor {
                   const framework::Tensor &output_grad,
                   const std::vector<int> &strides,
                   const std::vector<int> &paddings,
+                  const std::vector<int> &dilations,
                   framework::Tensor *filter_grad);
 };
...
...
paddle/fluid/operators/reduce_mean_op.cu
...
...
@@ -12,17 +12,64 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.

+#include <vector>
+#include "paddle/fluid/operators/cub_reduce.h"
 #include "paddle/fluid/operators/reduce_mean_op.h"

-REGISTER_OP_CUDA_KERNEL(reduce_mean,
-                        ops::ReduceKernel<paddle::platform::CUDADeviceContext,
-                                          float, ops::MeanFunctor>,
-                        ops::ReduceKernel<paddle::platform::CUDADeviceContext,
-                                          double, ops::MeanFunctor>,
-                        ops::ReduceKernel<paddle::platform::CUDADeviceContext,
-                                          int, ops::MeanFunctor>,
-                        ops::ReduceKernel<paddle::platform::CUDADeviceContext,
-                                          int64_t, ops::MeanFunctor>);
+namespace paddle {
+namespace operators {
+
+template <typename T>
+struct DivideFunctor {
+  HOSTDEVICE explicit inline DivideFunctor(int n) : n_inv((T)(1.0 / n)) {}
+
+  HOSTDEVICE inline T operator()(const T &x) const { return x * n_inv; }
+
+ private:
+  T n_inv;
+};
+
+template <typename T>
+class ReduceMeanKernel : public framework::OpKernel<T> {
+ public:
+  void Compute(const framework::ExecutionContext &context) const override {
+    bool reduce_all = context.Attr<bool>("reduce_all");
+    auto *input = context.Input<Tensor>("X");
+    auto *output = context.Output<Tensor>("Out");
+
+    auto dims = context.Attr<std::vector<int>>("dim");
+    bool keep_dim = context.Attr<bool>("keep_dim");
+
+    std::vector<int> reduce_dims;
+    if (reduce_all) {
+      reduce_dims.resize(input->dims().size());
+      for (int i = 0; i < reduce_dims.size(); ++i) reduce_dims[i] = i;
+    } else {
+      for (auto e : dims) {
+        reduce_dims.push_back(e >= 0 ? e : e + input->dims().size());
+      }
+    }
+
+    int reduce_num = 1;
+    for (int i = 0; i < reduce_dims.size(); ++i) {
+      reduce_num *= input->dims()[reduce_dims[i]];
+    }
+
+    auto stream = context.cuda_device_context().stream();
+    TensorReduce<T, T, cub::Sum, DivideFunctor<T>>(
+        *input, output, reduce_dims, static_cast<T>(0), cub::Sum(),
+        DivideFunctor<T>(reduce_num), stream);
+  }
+};
+
+}  // namespace operators
+}  // namespace paddle
+
+REGISTER_OP_CUDA_KERNEL(reduce_mean, ops::ReduceMeanKernel<float>,
+                        ops::ReduceMeanKernel<double>,
+                        ops::ReduceMeanKernel<int>,
+                        ops::ReduceMeanKernel<int64_t>);

 REGISTER_OP_CUDA_KERNEL(
     reduce_mean_grad, ops::ReduceGradKernel<paddle::platform::CUDADeviceContext,
                                             float, ops::MeanGradFunctor>,
...
...
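The mean kernel never materializes a separate sum-then-divide step: DivideFunctor pre-scales each element by 1/n so a single cub::Sum pass yields the mean directly, i.e. mean = sum_i x_i * (1/n). A tiny CPU check of the same transform-then-reduce order (illustrative only):

#include <cstdio>

int main() {
  // mean({2, 4, 6, 8}) computed the DivideFunctor way: scale each element
  // by 1/n first, then sum -- same result as summing and then dividing.
  const double x[] = {2, 4, 6, 8};
  const int n = 4;
  const double n_inv = 1.0 / n;
  double acc = 0.0;
  for (int i = 0; i < n; ++i) acc += x[i] * n_inv;  // transform, then reduce
  std::printf("%f\n", acc);  // 5.000000
}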
paddle/fluid/operators/reduce_sum_op.cu
...
...
@@ -12,17 +12,59 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.

+#include "paddle/fluid/operators/cub_reduce.h"
 #include "paddle/fluid/operators/reduce_sum_op.h"

-REGISTER_OP_CUDA_KERNEL(reduce_sum,
-                        ops::ReduceKernel<paddle::platform::CUDADeviceContext,
-                                          float, ops::SumFunctor>,
-                        ops::ReduceKernel<paddle::platform::CUDADeviceContext,
-                                          double, ops::SumFunctor>,
-                        ops::ReduceKernel<paddle::platform::CUDADeviceContext,
-                                          int, ops::SumFunctor>,
-                        ops::ReduceKernel<paddle::platform::CUDADeviceContext,
-                                          int64_t, ops::SumFunctor>);
+namespace paddle {
+namespace operators {
+
+template <typename T>
+struct IdentityFunctor {
+  HOSTDEVICE explicit inline IdentityFunctor() {}
+
+  HOSTDEVICE inline T operator()(const T &x) const { return x; }
+};
+
+template <typename T>
+class ReduceSumKernel : public framework::OpKernel<T> {
+ public:
+  void Compute(const framework::ExecutionContext &context) const override {
+    bool reduce_all = context.Attr<bool>("reduce_all");
+    auto *input = context.Input<Tensor>("X");
+    auto *output = context.Output<Tensor>("Out");
+
+    auto dims = context.Attr<std::vector<int>>("dim");
+    bool keep_dim = context.Attr<bool>("keep_dim");
+
+    std::vector<int> reduce_dims;
+    if (reduce_all) {
+      reduce_dims.resize(input->dims().size());
+      for (int i = 0; i < reduce_dims.size(); ++i) reduce_dims[i] = i;
+    } else {
+      for (auto e : dims) {
+        reduce_dims.push_back(e >= 0 ? e : e + input->dims().size());
+      }
+    }
+
+    int reduce_num = 1;
+    for (int i = 0; i < reduce_dims.size(); ++i) {
+      reduce_num *= input->dims()[reduce_dims[i]];
+    }
+
+    auto stream = context.cuda_device_context().stream();
+    TensorReduce<T, T, cub::Sum, IdentityFunctor<T>>(
+        *input, output, reduce_dims, static_cast<T>(0), cub::Sum(),
+        IdentityFunctor<T>(), stream);
+  }
+};
+
+}  // namespace operators
+}  // namespace paddle
+
+REGISTER_OP_CUDA_KERNEL(reduce_sum, ops::ReduceSumKernel<float>,
+                        ops::ReduceSumKernel<double>,
+                        ops::ReduceSumKernel<int>,
+                        ops::ReduceSumKernel<int64_t>);

 REGISTER_OP_CUDA_KERNEL(
     reduce_sum_grad, ops::ReduceGradKernel<paddle::platform::CUDADeviceContext,
                                            float, ops::SumGradFunctor>,
...
...
paddle/fluid/operators/scale_op.cc
...
...
@@ -77,8 +77,10 @@ class ScaleOpVarTypeInference : public framework::VarTypeInference {
     auto out_var_name = op_desc.Output("Out").front();
     auto *out_var = block->FindVarRecursive(out_var_name);
-    out_var->SetType(in_var.GetType());
-    out_var->SetDataType(in_var.GetDataType());
+    if (in_var_name != out_var_name) {
+      out_var->SetType(in_var.GetType());
+      out_var->SetDataType(in_var.GetDataType());
+    }
   }
 };
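
A plausible reading of this hunk: when scale runs in place, "Out" and "X" name
the same variable, so copying the type and dtype onto it is redundant, and the
guard restricts the copy to the genuine two-variable case. A tiny standalone
sketch of the guarded-copy pattern (the map stands in for the block's variable
table; all names here are illustrative, not Paddle APIs):

#include <cassert>
#include <map>
#include <string>

int main() {
  std::map<std::string, std::string> var_type = {{"x", "LOD_TENSOR"}};
  std::string in = "x", out = "x";  // in-place: Out aliases X
  if (in != out) var_type[out] = var_type[in];  // copy only across variables
  assert(var_type.size() == 1);  // no redundant self-assignment happened
  return 0;
}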

paddle/fluid/operators/select_op.cc (deleted; file mode 100644 → 0)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <thread> // NOLINT
#include <vector>
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/concurrency/channel_util.h"
#include <boost/tokenizer.hpp>

namespace paddle {
namespace operators {

static constexpr char kX[] = "X";
static constexpr char kCaseToExecute[] = "case_to_execute";
static constexpr char kOutputs[] = "Out";
static constexpr char kCases[] = "cases";
static constexpr char kCasesBlock[] = "sub_block";

class SelectOp : public framework::OperatorBase {
 public:
  SelectOp(const std::string &type, const framework::VariableNameMap &inputs,
           const framework::VariableNameMap &outputs,
           const framework::AttributeMap &attrs)
      : framework::OperatorBase(type, inputs, outputs, attrs) {}

 private:
  enum class SelectOpCaseType {
    DEFAULT = 0,
    SEND = 1,
    RECEIVE = 2,
  };

  struct SelectOpCase {
    int caseIndex;
    SelectOpCaseType caseType;
    std::string channelName;
    std::string varName;

    SelectOpCase() {}

    SelectOpCase(int caseIndex, SelectOpCaseType caseType,
                 std::string channelName, std::string varName)
        : caseIndex(caseIndex),
          caseType(caseType),
          channelName(channelName),
          varName(varName) {}
  };

  void RunImpl(const framework::Scope &scope,
               const platform::Place &dev_place) const override {
    std::vector<std::string> casesConfigs =
        Attr<std::vector<std::string>>(kCases);

    framework::BlockDesc *casesBlock =
        Attr<framework::BlockDesc *>(kCasesBlock);

    framework::Scope &casesBlockScope = scope.NewScope();

    std::string caseToExecuteVarName = Input(kCaseToExecute);
    framework::Variable *caseToExecuteVar =
        casesBlockScope.FindVar(caseToExecuteVarName);

    // Construct cases from "conditional_block_op"(s) in the casesBlock
    std::vector<std::shared_ptr<SelectOpCase>> cases =
        ParseAndShuffleCases(&casesConfigs);

    // Get all unique channels involved in select
    std::set<framework::ChannelHolder *> channelsSet;
    for (auto c : cases) {
      if (!c->channelName.empty()) {
        auto channelVar = scope.FindVar(c->channelName);
        framework::ChannelHolder *ch =
            channelVar->GetMutable<framework::ChannelHolder>();

        if (channelsSet.find(ch) == channelsSet.end()) {
          channelsSet.insert(ch);
        }
      }
    }

    // Order all channels by their pointer address
    std::vector<framework::ChannelHolder *> channels(channelsSet.begin(),
                                                     channelsSet.end());
    std::sort(channels.begin(), channels.end());

    // Poll all cases
    int32_t caseToExecute = pollCases(&scope, &cases, channels);

    // At this point, the case to execute has already been determined,
    // so we can proceed with executing the cases block
    framework::LoDTensor *caseToExecuteTensor =
        caseToExecuteVar->GetMutable<framework::LoDTensor>();
    caseToExecuteTensor->data<int32_t>()[0] = caseToExecute;

    // Execute the cases block, only one case will be executed since we set the
    // case_to_execute value to the index of the case we want to execute
    framework::Executor executor(dev_place);
    framework::ProgramDesc *program = casesBlock->Program();
    executor.Run(*program, &casesBlockScope, casesBlock->ID(),
                 false /*create_local_scope*/);
  }
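
RunImpl sorts the channels by pointer address before pollCases locks them.
Sorting fixes one global acquisition order for every select op, which is the
standard way to keep two ops that share channels from deadlocking by locking
in opposite orders. A standalone sketch of the idea using plain std::mutex
(illustrative, not the ChannelHolder API):

#include <algorithm>
#include <mutex>
#include <vector>

// Lock every mutex in ascending address order; unlock in reverse.
void LockAllInAddressOrder(std::vector<std::mutex *> ms) {
  std::sort(ms.begin(), ms.end());
  for (auto *m : ms) m->lock();
}

void UnlockAllInReverse(std::vector<std::mutex *> ms) {
  std::sort(ms.begin(), ms.end());
  for (auto it = ms.rbegin(); it != ms.rend(); ++it) (*it)->unlock();
}

int main() {
  std::mutex a, b;
  // Whatever order the caller passes, the acquisition order is the same.
  LockAllInAddressOrder({&b, &a});
  UnlockAllInReverse({&b, &a});
  return 0;
}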
  /**
   * Goes through all operators in the casesConfigs and processes
   * "conditional_block" operators. These operators are mapped to our
   * SelectOpCase objects. We randomize the case order, and set the
   * default case (if any exists) as the last case.
   * @param casesBlock
   * @return
   */
  std::vector<std::shared_ptr<SelectOpCase>> ParseAndShuffleCases(
      std::vector<std::string> *casesConfigs) const {
    std::vector<std::shared_ptr<SelectOpCase>> cases;
    std::shared_ptr<SelectOpCase> defaultCase;

    if (casesConfigs != nullptr) {
      boost::char_delimiters_separator<char> sep(false, ",", "");
      for (std::vector<std::string>::iterator itr = casesConfigs->begin();
           itr < casesConfigs->end(); ++itr) {
        std::string caseConfig = *itr;
        boost::tokenizer<> tokens(caseConfig, sep);

        boost::tokenizer<>::iterator tok_iter = tokens.begin();
        PADDLE_ENFORCE(tok_iter != tokens.end(), "Cannot get case index");
        std::string caseIndexString = *tok_iter;
        int caseIndex = std::stoi(caseIndexString);

        ++tok_iter;
        PADDLE_ENFORCE(tok_iter != tokens.end(), "Cannot get case type");
        std::string caseTypeString = *tok_iter;
        SelectOpCaseType caseType =
            (SelectOpCaseType)std::stoi(caseTypeString);

        std::string caseChannel;
        std::string caseChannelVar;

        ++tok_iter;
        if (caseType != SelectOpCaseType::DEFAULT) {
          PADDLE_ENFORCE(tok_iter != tokens.end(), "Cannot get case channel");
          caseChannel = *tok_iter;

          ++tok_iter;
          PADDLE_ENFORCE(tok_iter != tokens.end(),
                         "Cannot get case channel variable");
          caseChannelVar = *tok_iter;
        }

        auto c = std::make_shared<SelectOpCase>(caseIndex, caseType,
                                                caseChannel, caseChannelVar);

        if (caseType == SelectOpCaseType::DEFAULT) {
          PADDLE_ENFORCE(defaultCase == nullptr,
                         "Select can only contain one default case.");
          defaultCase = c;
        } else {
          cases.push_back(c);
        }
      }
    }

    // Randomly sort cases, with the default case being last
    std::random_shuffle(cases.begin(), cases.end());
    if (defaultCase != nullptr) {
      cases.push_back(defaultCase);
    }

    return cases;
  }
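
Each case arrives serialized as '<index>,<type>,<channel>,<value>' (see the
kCases attribute in SelectOpMaker below). A standalone sketch of the same
split, with std::getline standing in for boost::tokenizer (the helper name is
illustrative):

#include <cassert>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> SplitCase(const std::string &config) {
  std::vector<std::string> fields;
  std::stringstream ss(config);
  std::string field;
  while (std::getline(ss, field, ',')) fields.push_back(field);
  return fields;
}

int main() {
  auto f = SplitCase("0,1,my_channel,my_var");  // case 0, type 1 == SEND
  assert(f.size() == 4);
  assert(std::stoi(f[1]) == 1);
  return 0;
}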
  /**
   * This method recursively polls the cases and determines if any case
   * condition is true.
   * If none of the case conditions are true (and there is no default case),
   * the thread blocks. The thread may be woken up by a channel operation, at
   * which point we execute the case.
   * @param scope
   * @param cases
   * @param channels
   * @return
   */
  int32_t pollCases(const framework::Scope *scope,
                    std::vector<std::shared_ptr<SelectOpCase>> *cases,
                    std::vector<framework::ChannelHolder *> channels) const {
    // Lock all involved channels
    lockChannels(channels);

    std::atomic<int> caseToExecute(-1);

    std::vector<std::shared_ptr<SelectOpCase>>::iterator it = cases->begin();
    while (it != cases->end()) {
      std::shared_ptr<SelectOpCase> c = *it;

      auto chVar = scope->FindVar(c->channelName);
      framework::ChannelHolder *ch =
          chVar->GetMutable<framework::ChannelHolder>();

      switch (c->caseType) {
        case SelectOpCaseType::SEND:
          PADDLE_ENFORCE(!ch->IsClosed(), "Cannot send to a closed channel");
          if (ch->CanSend()) {
            // We can send to the channel directly: send the data to the
            // channel and execute the case
            auto chVar = scope->FindVar(c->varName);
            concurrency::ChannelSend(ch, chVar);
            caseToExecute = c->caseIndex;
          }
          break;
        case SelectOpCaseType::RECEIVE:
          if (ch->CanReceive()) {
            // We can receive from the channel directly: receive the data
            // from the channel and execute the case
            auto chVar = scope->FindVar(c->varName);
            concurrency::ChannelReceive(ch, chVar);
            caseToExecute = c->caseIndex;
          }
          break;
        case SelectOpCaseType::DEFAULT:
          caseToExecute = c->caseIndex;
          break;
      }

      if (caseToExecute != -1) {
        // We found a case to execute, stop looking at other case statements
        break;
      }

      ++it;
    }

    if (caseToExecute == -1) {
      // None of the cases are eligible to execute; enqueue the current thread
      // into the sending/receiving queue of each involved channel
      std::atomic<bool> completed(false);
      std::recursive_mutex mutex;
      std::unique_lock<std::recursive_mutex> lock{mutex};
      auto selectCond = std::make_shared<std::condition_variable_any>();

      std::recursive_mutex callbackMutex;
      pushThreadOnChannelQueues(scope, cases, selectCond, &caseToExecute,
                                &completed, &callbackMutex);

      // TODO(thuan): Atomically unlock all channels and sleep current thread
      unlockChannels(channels);
      selectCond->wait(lock, [&completed]() { return completed.load(); });

      // Select has been woken up by a case operation
      lockChannels(channels);
      removeThreadOnChannelQueues(scope, cases);

      if (caseToExecute == -1) {
        // Recursively poll cases, since we were woken up by a channel close
        // TODO(thuan): Need to test if this is a valid case
        unlockChannels(channels);
        return pollCases(scope, cases, channels);
      }
    }

    // At this point, caseToExecute != -1, and we can proceed with executing
    // the case block
    unlockChannels(channels);

    return caseToExecute;
  }
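
The blocking path above relies on the predicate overload of
condition_variable_any::wait, so a callback that fires early, or a spurious
wakeup, cannot strand the thread. A standalone sketch of that wakeup protocol
(compile with -pthread; the recursive_mutex mirrors pollCases):

#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

int main() {
  std::atomic<bool> completed(false);
  std::recursive_mutex mutex;
  std::condition_variable_any cond;

  std::thread waker([&] {
    // Stands in for the channel callback: flip the flag, then notify.
    std::lock_guard<std::recursive_mutex> g(mutex);
    completed = true;
    cond.notify_all();
  });

  std::unique_lock<std::recursive_mutex> lock{mutex};
  // The predicate is re-checked on every wakeup, so no notification is lost.
  cond.wait(lock, [&] { return completed.load(); });
  waker.join();
  return 0;
}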
  void lockChannels(std::vector<framework::ChannelHolder *> chs) const {
    std::vector<framework::ChannelHolder *>::iterator it = chs.begin();
    while (it != chs.end()) {
      framework::ChannelHolder *ch = *it;
      ch->Lock();
      ++it;
    }
  }

  void unlockChannels(std::vector<framework::ChannelHolder *> chs) const {
    std::vector<framework::ChannelHolder *>::reverse_iterator it =
        chs.rbegin();
    while (it != chs.rend()) {
      framework::ChannelHolder *ch = *it;
      ch->Unlock();
      ++it;
    }
  }
  void pushThreadOnChannelQueues(
      const framework::Scope *scope,
      std::vector<std::shared_ptr<SelectOpCase>> *cases,
      std::shared_ptr<std::condition_variable_any> rCond,
      std::atomic<int> *caseToExecute, std::atomic<bool> *completed,
      std::recursive_mutex *callbackMutex) const {
    std::vector<std::shared_ptr<SelectOpCase>>::iterator it = cases->begin();
    while (it != cases->end()) {
      std::shared_ptr<SelectOpCase> c = *it;

      auto chVar = scope->FindVar(c->channelName);
      framework::ChannelHolder *ch =
          chVar->GetMutable<framework::ChannelHolder>();

      std::function<bool(framework::ChannelAction channelAction)> cb =
          [&caseToExecute, &completed, &callbackMutex,
           c](framework::ChannelAction channelAction) {
            std::lock_guard<std::recursive_mutex> lock{*callbackMutex};

            bool canProcess = false;
            if (!(*completed)) {
              // If the channel wasn't closed, we set the caseToExecute index
              // as this current case
              if (channelAction != framework::ChannelAction::CLOSE) {
                *caseToExecute = c->caseIndex;
              }
              // This will allow our condition variable to break out of wait
              *completed = true;
              canProcess = true;
            }

            return canProcess;
          };

      switch (c->caseType) {
        case SelectOpCaseType::SEND: {
          auto chOutputVar = scope->FindVar(c->varName);
          concurrency::ChannelAddToSendQ(ch, this, chOutputVar, rCond, cb);
          break;
        }
        case SelectOpCaseType::RECEIVE: {
          auto chOutputVar = scope->FindVar(c->varName);
          concurrency::ChannelAddToReceiveQ(ch, this, chOutputVar, rCond, cb);
          break;
        }
        default:
          break;
      }

      ++it;
    }
  }
  void removeThreadOnChannelQueues(
      const framework::Scope *scope,
      std::vector<std::shared_ptr<SelectOpCase>> *cases) const {
    std::vector<std::shared_ptr<SelectOpCase>>::iterator it = cases->begin();
    while (it != cases->end()) {
      std::shared_ptr<SelectOpCase> c = *it;

      auto chVar = scope->FindVar(c->channelName);
      framework::ChannelHolder *ch =
          chVar->GetMutable<framework::ChannelHolder>();
      switch (c->caseType) {
        case SelectOpCaseType::SEND: {
          ch->RemoveFromSendQ(this);
          break;
        }
        case SelectOpCaseType::RECEIVE: {
          ch->RemoveFromReceiveQ(this);
          break;
        }
        default:
          break;
      }
      ++it;
    }
  }
};
class SelectOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() override {
    AddInput(kX,
             "A set of variables, which are required by operators inside the "
             "cases of Select Op")
        .AsDuplicable();
    AddInput(kCaseToExecute,
             "(Int) The variable that sets the index of the case to execute, "
             "after evaluating the channels being sent to and received from")
        .AsDuplicable();
    AddOutput(kOutputs,
              "A set of variables, which will be assigned with values "
              "generated by the operators inside the cases of Select Op.")
        .AsDuplicable();
    AddAttr<std::vector<std::string>>(
        kCases,
        "(String vector) Serialized list of all cases in the select op. "
        "Each case is serialized as '<index>,<type>,<channel>,<value>', "
        "where type is 0 for default, 1 for send, and 2 for receive. "
        "No channel and value are needed for default cases.");
    AddAttr<framework::BlockDesc *>(kCasesBlock,
                                    "The cases block inside select_op");
    AddComment(R"DOC(
)DOC");
  }
};

// TODO(thuan): Implement Gradient Operator for SELECT_OP

}  // namespace operators
}  // namespace paddle
REGISTER_OPERATOR(select, paddle::operators::SelectOp,
                  paddle::framework::EmptyGradOpMaker,
                  paddle::operators::SelectOpMaker);

paddle/fluid/operators/sum_op.h
@@ -32,7 +32,7 @@ class SumKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext &context) const override {
     auto in_vars = context.MultiInputVar("X");
-    int N = in_vars.size();
+    size_t in_num = in_vars.size();
     auto out_var = context.OutputVar("Out");
     bool in_place = out_var == in_vars[0];
@@ -53,7 +53,7 @@ class SumKernel : public framework::OpKernel<T> {
     auto &place =
         *context.template device_context<DeviceContext>().eigen_device();
     // If in_place, just skip the first tensor
-    for (int i = in_place ? 1 : 0; i < N; i++) {
+    for (size_t i = in_place ? 1 : 0; i < in_num; i++) {
       if (in_vars[i]->IsType<framework::LoDTensor>()) {
         auto &in_t = in_vars[i]->Get<framework::LoDTensor>();
         if (in_t.numel() == 0) {
@@ -101,13 +101,13 @@ class SumKernel : public framework::OpKernel<T> {
       // Runtime InferShape
       size_t first_dim = 0;
-      for (int i = 0; i < N; i++) {
+      for (size_t i = 0; i < in_num; i++) {
         auto &sel_row = get_selected_row(i);
         first_dim += sel_row.rows().size();
       }
       std::vector<int64_t> in_dim;
-      for (int i = 0; i < N; i++) {
+      for (size_t i = 0; i < in_num; i++) {
         auto &sel_row = get_selected_row(i);
         if (sel_row.rows().size() > 0) {
           in_dim = framework::vectorize(sel_row.value().dims());
@@ -116,7 +116,8 @@ class SumKernel : public framework::OpKernel<T> {
       }
       if (in_dim.empty()) {
         VLOG(3) << "WARNING: all the inputs are empty";
-        in_dim = framework::vectorize(get_selected_row(N - 1).value().dims());
+        in_dim =
+            framework::vectorize(get_selected_row(in_num - 1).value().dims());
       } else {
         in_dim[0] = static_cast<int64_t>(first_dim);
       }
@@ -133,7 +134,7 @@ class SumKernel : public framework::OpKernel<T> {
       math::SelectedRowsAddTo<DeviceContext, T> functor;
       int64_t offset = 0;
-      for (int i = 0; i < N; i++) {
+      for (size_t i = 0; i < in_num; i++) {
         auto &sel_row = get_selected_row(i);
         if (sel_row.rows().size() == 0) {
           continue;
 ...
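
A plausible motive for the int N -> size_t in_num change across these hunks:
std::vector::size() returns an unsigned size_t, and driving the index loops
with a signed int draws -Wsign-compare warnings. A minimal illustration of
the pattern the new code settles on:

#include <cstddef>
#include <vector>

int main() {
  std::vector<int> v = {1, 2, 3};
  // size_t index against size_t bound: no signed/unsigned comparison.
  for (std::size_t i = 0; i < v.size(); ++i) {
    (void)v[i];
  }
  return 0;
}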

paddle/fluid/operators/tensorrt_engine_op.cc
@@ -22,8 +22,6 @@
 namespace paddle {

 DEFINE_int32(tensorrt_engine_batch_size, 1, "the batch_size of TensorRT");
-DEFINE_int32(tensorrt_max_batch_size, 1, "TensorRT maximum batch size");
-DEFINE_int32(tensorrt_workspace_size, 16 << 20, "TensorRT workspace size");

 namespace operators {
@@ -34,6 +32,8 @@ class TensorRTEngineOpMaker : public framework::OpProtoAndCheckerMaker {
     AddOutput("Ys", "A list of outputs").AsDuplicable();
     AddAttr<std::string>("subgraph", "the subgraph.");
     AddAttr<std::string>("engine_uniq_key", "unique key for the TRT engine.");
+    AddAttr<int>("max_batch_size", "the maximum batch size.");
+    AddAttr<int>("workspace_size", "the workspace size.");
     AddComment("TensorRT engine operator.");
   }
 };
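
Taken together, the two hunks move the batch-size and workspace limits from
process-wide gflags onto the operator itself. A standalone sketch of why a
per-op attribute is the more flexible shape (the struct and values are
illustrative only): two engine ops in one process can now carry different
limits, which a single global flag cannot express.

#include <cassert>

struct EngineCfg {
  int max_batch_size;
  int workspace_size;
};

int main() {
  EngineCfg detector{/*max_batch_size=*/1, /*workspace_size=*/16 << 20};
  EngineCfg classifier{/*max_batch_size=*/8, /*workspace_size=*/1 << 30};
  assert(detector.max_batch_size != classifier.max_batch_size);
  return 0;
}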

paddle/fluid/operators/tensorrt_engine_op.h (diff collapsed)
paddle/fluid/operators/tensorrt_engine_op_test.cc (diff collapsed)

paddle/fluid/pybind/protobuf.cc
@@ -214,7 +214,6 @@ void BindVarDsec(pybind11::module *m) {
       .def("set_shapes", &pd::VarDesc::SetShapes)
       .def("set_dtype", &pd::VarDesc::SetDataType)
       .def("set_dtypes", &pd::VarDesc::SetDataTypes)
-      .def("set_capacity", &pd::VarDesc::SetCapacity)
       .def("shape", &pd::VarDesc::GetShape,
            pybind11::return_value_policy::reference)
       .def("shapes", &pd::VarDesc::GetShapes,
@@ -251,7 +250,6 @@ void BindVarDsec(pybind11::module *m) {
       .value("STEP_SCOPES", pd::proto::VarType::STEP_SCOPES)
       .value("LOD_RANK_TABLE", pd::proto::VarType::LOD_RANK_TABLE)
       .value("LOD_TENSOR_ARRAY", pd::proto::VarType::LOD_TENSOR_ARRAY)
-      .value("CHANNEL", pd::proto::VarType::CHANNEL)
       .value("PLACE_LIST", pd::proto::VarType::PLACE_LIST)
       .value("READER", pd::proto::VarType::READER)
       .value("RAW", pd::proto::VarType::RAW);
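
Both hunks drop channel-related bindings (set_capacity and VarType.CHANNEL),
consistent with the removal of the channel and select machinery elsewhere in
this commit. For reference, a minimal self-contained sketch of the py::enum_
chaining pattern the surviving lines use (the module and enum here are
illustrative, not Paddle's):

#include <pybind11/pybind11.h>

namespace py = pybind11;

enum class VarType { LOD_TENSOR = 7, READER = 15, RAW = 17 };

// Builds a tiny Python module exposing the enum, mirroring the .value(...)
// chaining style seen in the hunk above.
PYBIND11_MODULE(var_type_demo, m) {
  py::enum_<VarType>(m, "VarType")
      .value("LOD_TENSOR", VarType::LOD_TENSOR)
      .value("READER", VarType::READER)
      .value("RAW", VarType::RAW);
}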

paddle/fluid/pybind/pybind.cc (diff collapsed)
paddle/legacy/trainer/tests/CMakeLists.txt (diff collapsed)
paddle/scripts/paddle_build.sh (diff collapsed)
python/paddle/dataset/common.py (diff collapsed)
python/paddle/fluid/clip.py (diff collapsed)
python/paddle/fluid/concurrency.py (deleted; diff collapsed)
python/paddle/fluid/framework.py (diff collapsed)
python/paddle/fluid/layers/control_flow.py (diff collapsed)
python/paddle/fluid/layers/detection.py (diff collapsed)
python/paddle/fluid/layers/metric_op.py (diff collapsed)
python/paddle/fluid/layers/nn.py (diff collapsed)
python/paddle/fluid/layers/ops.py (diff collapsed)
python/paddle/fluid/nets.py (diff collapsed)
python/paddle/fluid/tests/book/high-level-api/recognize_digits/CMakeLists.txt (diff collapsed)
python/paddle/fluid/tests/no_test_concurrency.py (deleted; diff collapsed)
python/paddle/fluid/tests/unittests/CMakeLists.txt (diff collapsed)
python/paddle/fluid/tests/unittests/dist_ctr.py (new file; diff collapsed)
python/paddle/fluid/tests/unittests/dist_ctr_reader.py (new file; diff collapsed)
python/paddle/fluid/tests/unittests/dist_mnist.py (diff collapsed)
python/paddle/fluid/tests/unittests/dist_se_resnext.py (diff collapsed)
python/paddle/fluid/tests/unittests/dist_simnet_bow.py (new file; diff collapsed)
python/paddle/fluid/tests/unittests/dist_text_classification.py (new file; diff collapsed)
python/paddle/fluid/tests/unittests/dist_transformer.py (diff collapsed)
python/paddle/fluid/tests/unittests/dist_word2vec.py (diff collapsed)
python/paddle/fluid/tests/unittests/test_auc_op.py (diff collapsed)
python/paddle/fluid/tests/unittests/test_conv2d_op.py (diff collapsed)
python/paddle/fluid/tests/unittests/test_dist_base.py (diff collapsed)
python/paddle/fluid/tests/notest_concurrency.py → python/paddle/fluid/tests/unittests/test_dist_ctr.py (renamed; diff collapsed)
python/paddle/fluid/tests/unittests/test_dist_mnist.py (diff collapsed)
python/paddle/fluid/tests/unittests/test_dist_se_resnext.py (diff collapsed)
python/paddle/fluid/tests/unittests/test_dist_simnet_bow.py (new file; diff collapsed)
python/paddle/fluid/tests/unittests/test_dist_text_classification.py (new file; diff collapsed)
python/paddle/fluid/tests/unittests/test_dist_word2vec.py (diff collapsed)
python/paddle/fluid/tests/unittests/test_layers.py (diff collapsed)
python/paddle/fluid/transpiler/distribute_transpiler.py (diff collapsed)
python/paddle/fluid/transpiler/memory_optimization_transpiler.py (diff collapsed)