Commit 88d3dc94
Authored Feb 14, 2019 by Yancey1989

Merge branch 'develop' of github.com:PaddlePaddle/Paddle into refine_pg
test=develop

Parents: f3463ecb, 3a5d6e5e

Showing 167 changed files with 4890 additions and 1175 deletions
CMakeLists.txt +6 -0
README.md +1 -84
README_cn.md +88 -0
cmake/configure.cmake +6 -1
cmake/cuda.cmake +19 -18
cmake/external/glog.cmake +3 -1
cmake/external/mkldnn.cmake +2 -1
cmake/external/snappy.cmake +7 -1
cmake/flags.cmake +4 -10
cmake/version.cmake +17 -2
paddle/fluid/API.spec +12 -12
paddle/fluid/framework/CMakeLists.txt +13 -11
paddle/fluid/framework/async_executor.cc +1 -0
paddle/fluid/framework/details/CMakeLists.txt +5 -7
paddle/fluid/framework/details/build_strategy.cc +31 -11
paddle/fluid/framework/details/build_strategy.h +4 -2
paddle/fluid/framework/details/computation_op_handle.h +1 -1
paddle/fluid/framework/details/fused_broadcast_op_handle_test.cc +16 -15
paddle/fluid/framework/details/graph_test_base.h +80 -0
paddle/fluid/framework/details/inplace_op_pass.cc +432 -0
paddle/fluid/framework/details/inplace_op_pass.h +94 -0
paddle/fluid/framework/details/memory_early_delete_pass.cc +0 -117
paddle/fluid/framework/details/memory_optimize_helper.cc +474 -0
paddle/fluid/framework/details/memory_optimize_helper.h +182 -0
paddle/fluid/framework/details/memory_optimize_helper_test.cc +78 -50
paddle/fluid/framework/details/memory_optimize_pass.cc +327 -0
paddle/fluid/framework/details/memory_optimize_pass.h +71 -0
paddle/fluid/framework/details/memory_reuse_types.cc +0 -155
paddle/fluid/framework/details/memory_reuse_types_test.cc +0 -99
paddle/fluid/framework/details/op_registry.h +18 -3
paddle/fluid/framework/details/parallel_ssa_graph_executor.cc +2 -2
paddle/fluid/framework/details/sequential_execution_pass.cc +1 -0
paddle/fluid/framework/details/sequential_execution_pass.h +0 -2
paddle/fluid/framework/feed_fetch_method.cc +1 -0
paddle/fluid/framework/inplace_op_inference.h +115 -0
paddle/fluid/framework/inplace_op_inference_test.cc +288 -0
paddle/fluid/framework/ir/graph.cc +1 -1
paddle/fluid/framework/ir/graph.h +2 -1
paddle/fluid/framework/ir/graph_helper.cc +25 -6
paddle/fluid/framework/ir/graph_helper.h +5 -0
paddle/fluid/framework/ir/graph_helper_test.cc +11 -0
paddle/fluid/framework/ir/graph_pattern_detector.cc +0 -5
paddle/fluid/framework/ir/infer_clean_graph_pass.cc +1 -0
paddle/fluid/framework/ir/seqpool_concat_fuse_pass_tester.cc +1 -1
paddle/fluid/framework/op_info.h +1 -0
paddle/fluid/framework/operator.cc +10 -8
paddle/fluid/framework/operator.h +1 -6
paddle/fluid/framework/parallel_executor.cc +2 -9
paddle/fluid/framework/type_defs.h +3 -0
paddle/fluid/imperative/CMakeLists.txt +2 -2
paddle/fluid/inference/CMakeLists.txt +2 -1
paddle/fluid/inference/analysis/ir_pass_manager.cc +1 -1
paddle/fluid/inference/analysis/ir_passes/CMakeLists.txt +3 -0
paddle/fluid/inference/analysis/passes/memory_optimize_pass.cc +6 -1
paddle/fluid/inference/api/CMakeLists.txt +2 -2
paddle/fluid/inference/api/analysis_predictor.cc +1 -1
paddle/fluid/inference/api/paddle_api.h +47 -15
paddle/fluid/inference/api/paddle_pass_builder.cc +46 -0
paddle/fluid/inference/api/paddle_pass_builder.h +2 -45
paddle/fluid/inference/utils/benchmark_tester.cc +2 -2
paddle/fluid/memory/allocation/allocator_facade.cc +1 -1
paddle/fluid/memory/allocation/best_fit_allocator.cc +2 -0
paddle/fluid/memory/allocation/legacy_allocator.cc +17 -16
paddle/fluid/memory/allocation/pinned_allocator.cc +1 -1
paddle/fluid/memory/allocation/pinned_allocator.h +1 -1
paddle/fluid/memory/detail/system_allocator.cc +2 -2
paddle/fluid/operators/activation_op.cc +13 -11
paddle/fluid/operators/batch_norm_op.cc +37 -2
paddle/fluid/operators/conv_op.cc +2 -2
paddle/fluid/operators/detection/box_coder_op.cc +6 -14
paddle/fluid/operators/detection/box_coder_op.cu +2 -8
paddle/fluid/operators/detection/box_coder_op.h +44 -33
paddle/fluid/operators/detection/density_prior_box_op.h +38 -26
paddle/fluid/operators/elementwise/elementwise_add_op.cc +1 -0
paddle/fluid/operators/elementwise/elementwise_op.h +35 -2
paddle/fluid/operators/expand_op.cc +6 -2
paddle/fluid/operators/expand_op.cu +6 -2
paddle/fluid/operators/flatten_op.cc +36 -4
paddle/fluid/operators/jit/gen/act.h +2 -3
paddle/fluid/operators/jit/gen/blas.h +2 -2
paddle/fluid/operators/jit/gen/gru.h +2 -2
paddle/fluid/operators/jit/gen/hopv.h +2 -2
paddle/fluid/operators/jit/gen/jitcode.h +2 -2
paddle/fluid/operators/jit/gen/lstm.h +2 -2
paddle/fluid/operators/jit/gen/matmul.h +2 -2
paddle/fluid/operators/jit/gen/seqpool.h +2 -2
paddle/fluid/operators/jit/gen_base.cc +17 -0
paddle/fluid/operators/jit/gen_base.h +7 -1
paddle/fluid/operators/lookup_table_op.h +10 -4
paddle/fluid/operators/math/CMakeLists.txt +1 -1
paddle/fluid/operators/mkldnn/fc_mkldnn_op.cc +1 -1
paddle/fluid/operators/ngraph/ngraph_bridge.cc +7 -0
paddle/fluid/operators/ngraph/ngraph_engine_op.h +1 -1
paddle/fluid/operators/ngraph/ngraph_ops.h +5 -1
paddle/fluid/operators/ngraph/ops/accuracy_op.h +65 -0
paddle/fluid/operators/ngraph/ops/activation_op.h +52 -0
paddle/fluid/operators/ngraph/ops/batch_norm_op.h +150 -0
paddle/fluid/operators/ngraph/ops/binary_unary_op.h +0 -0
paddle/fluid/operators/ngraph/ops/sum_op.h +55 -0
paddle/fluid/operators/ngraph/ops/top_k_op.h +0 -5
paddle/fluid/operators/norm_op.h +2 -3
paddle/fluid/operators/pool_op.cc +4 -4
paddle/fluid/operators/random_crop_op.h +1 -1
paddle/fluid/operators/reader/buffered_reader.cc +55 -1
paddle/fluid/operators/reader/buffered_reader.h +8 -0
paddle/fluid/operators/reader/ctr_reader.cc +2 -2
paddle/fluid/operators/reader/ctr_reader_test.cc +1 -1
paddle/fluid/operators/reduce_ops/CMakeLists.txt +5 -1
paddle/fluid/operators/reshape_op.cc +36 -4
paddle/fluid/operators/scale_op.cc +2 -1
paddle/fluid/operators/softmax_op.cc +15 -0
paddle/fluid/platform/CMakeLists.txt +2 -2
paddle/fluid/platform/cuda_device_function.h +6 -4
paddle/fluid/platform/ngraph_helper.h +44 -13
paddle/fluid/pybind/CMakeLists.txt +1 -1
paddle/fluid/pybind/inference_api.cc +2 -2
paddle/fluid/pybind/pybind.cc +10 -3
paddle/scripts/fast_install.sh +923 -0
python/CMakeLists.txt +1 -1
python/paddle/__init__.py +1 -0
python/paddle/distributed/__init__.py +13 -0
python/paddle/distributed/launch.py +22 -16
python/paddle/fluid/__init__.py +2 -2
python/paddle/fluid/compiler.py +5 -0
python/paddle/fluid/contrib/decoder/beam_search_decoder.py +3 -3
python/paddle/fluid/contrib/inferencer.py +2 -2
python/paddle/fluid/contrib/trainer.py +2 -2
python/paddle/fluid/executor.py +2 -2
python/paddle/fluid/framework.py +21 -8
python/paddle/fluid/imperative/base.py +2 -2
python/paddle/fluid/initializer.py +2 -2
python/paddle/fluid/io.py +8 -0
python/paddle/fluid/layer_helper.py +2 -1
python/paddle/fluid/layers/control_flow.py +2 -2
python/paddle/fluid/layers/detection.py +4 -4
python/paddle/fluid/layers/io.py +2 -2
python/paddle/fluid/layers/nn.py +1 -0
python/paddle/fluid/optimizer.py +2 -2
python/paddle/fluid/parallel_executor.py +4 -0
python/paddle/fluid/profiler.py +3 -3
python/paddle/fluid/recordio_writer.py +2 -2
python/paddle/fluid/tests/unittests/CMakeLists.txt +5 -0
python/paddle/fluid/tests/unittests/ngraph/test_accuracy_ngraph_op.py +53 -0
python/paddle/fluid/tests/unittests/ngraph/test_activation_ngraph_op.py +1 -11
python/paddle/fluid/tests/unittests/ngraph/test_batch_norm_ngraph_op.py +37 -0
python/paddle/fluid/tests/unittests/ngraph/test_conv2d_ngraph_op.py +25 -1
python/paddle/fluid/tests/unittests/ngraph/test_elementwise_add_ngraph_op.py +5 -62
python/paddle/fluid/tests/unittests/ngraph/test_mean_ngraph_op.py +2 -6
python/paddle/fluid/tests/unittests/ngraph/test_mul_ngraph_op.py +25 -14
python/paddle/fluid/tests/unittests/ngraph/test_pool2d_ngraph_op.py +25 -1
python/paddle/fluid/tests/unittests/ngraph/test_scale_ngraph_op.py +8 -10
python/paddle/fluid/tests/unittests/ngraph/test_sum_ngraph_op.py +19 -0
python/paddle/fluid/tests/unittests/ngraph/test_top_k_ngraph_op.py +4 -0
python/paddle/fluid/tests/unittests/parallel_executor_test_base.py +58 -56
python/paddle/fluid/tests/unittests/test_box_coder_op.py +4 -29
python/paddle/fluid/tests/unittests/test_eager_deletion_transformer.py +3 -5
python/paddle/fluid/tests/unittests/test_expand_op.py +27 -0
python/paddle/fluid/tests/unittests/test_inference_model_io.py +27 -0
python/paddle/fluid/tests/unittests/test_ir_inplace_pass.py +76 -0
python/paddle/fluid/tests/unittests/test_parallel_executor_seresnext.py +7 -7
python/paddle/fluid/tests/unittests/test_parallel_executor_transformer.py +1 -1
python/paddle/fluid/tests/unittests/transformer_model.py +2 -1
python/paddle/fluid/transpiler/memory_optimization_transpiler.py +2 -0
python/paddle/fluid/unique_name.py +2 -2
python/paddle/fluid/wrapped_decorator.py +30 -0
python/requirements.txt +1 -0
python/setup.py.in +1 -0
CMakeLists.txt
@@ -25,12 +25,18 @@ message(STATUS "CXX compiler: ${CMAKE_CXX_COMPILER}, version: "
message(STATUS "C compiler: ${CMAKE_C_COMPILER}, version: "
        "${CMAKE_C_COMPILER_ID} ${CMAKE_C_COMPILER_VERSION}")
if(WIN32)
    set(CMAKE_SUPPRESS_REGENERATION ON)
    set(CMAKE_STATIC_LIBRARY_PREFIX lib)
    add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
    set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
    set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
    set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
    set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
    add_compile_options(/wd4068 /wd4129 /wd4244 /wd4267 /wd4297 /wd4530 /wd4577 /wd4819 /wd4838)
    set(PADDLE_LINK_FLAGS "/IGNORE:4006 /IGNORE:4098 /IGNORE:4217 /IGNORE:4221")
    set(CMAKE_STATIC_LINKER_FLAGS "${CMAKE_STATIC_LINKER_FLAGS} ${PADDLE_LINK_FLAGS}")
    set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} ${PADDLE_LINK_FLAGS}")
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${PADDLE_LINK_FLAGS}")
endif(WIN32)
find_package(CUDA QUIET)
README.md
# PaddlePaddle
English | [简体中文](./README_cn.md)
[](https://travis-ci.org/PaddlePaddle/Paddle)
[](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)
@@ -7,7 +8,6 @@
[](https://github.com/PaddlePaddle/Paddle/releases)
[](LICENSE)
Welcome to the PaddlePaddle GitHub.
PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use,
@@ -18,16 +18,6 @@ learning to many products at Baidu.
Our vision is to enable deep learning for everyone via PaddlePaddle.
Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest feature of PaddlePaddle.
欢迎来到 PaddlePaddle GitHub
PaddlePaddle (PArallel Distributed Deep LEarning) 是一个简单易用、高效灵活、可扩展的深度学习平台,最初由百度科学家和工程师共同开发,目的是将深度学习技术应用到百度的众多产品中。
我们的愿景是让每个人都能通过PaddlePaddle接触深度学习
跟进PaddlePaddle最新特性请参考我们的[版本说明](https://github.com/PaddlePaddle/Paddle/releases)
### Latest PaddlePaddle Release: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2)
### Install Latest Stable Release:
```
@@ -43,23 +33,6 @@ pip install paddlepaddle-gpu==1.2.0.post85
# For installation on other platform, refer to http://paddlepaddle.org/
```
### PaddlePaddle最新版本: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2)
### 安装最新稳定版本:
```
# Linux CPU
pip install paddlepaddle
# Linux GPU cuda9cudnn7
pip install paddlepaddle-gpu
# Linux GPU cuda8cudnn7
pip install paddlepaddle-gpu==1.2.0.post87
# Linux GPU cuda8cudnn5
pip install paddlepaddle-gpu==1.2.0.post85
# 其他平台上的安装指引请参考 http://paddlepaddle.org/
```
## Features
- **Flexibility**
@@ -100,38 +73,10 @@ pip install paddlepaddle-gpu==1.2.0.post85
Baidu and it has achieved a significant impact. We hope you can also explore
the capability of PaddlePaddle to make an impact on your product.
## 特点
- **灵活性**
    PaddlePaddle支持丰富的神经网络架构和优化算法。易于配置复杂模型,例如带有注意力机制或复杂记忆连接的神经网络机器翻译模型。
- **高效性**
    为了高效使用异步计算资源,PaddlePaddle对框架的不同层进行优化,包括计算、存储、架构和通信。下面是一些样例:
    - 通过SSE/AVX 内置函数、BLAS库(例如MKL、OpenBLAS、cuBLAS)或定制的CPU/GPU内核优化数学操作。
    - 通过MKL-DNN库优化CNN网络
    - 高度优化循环网络,无需执行 `padding` 操作即可处理 **变长** 序列
    - 针对高维稀疏数据模型,优化了局部和分布式训练。
- **稳定性**
    有了 PaddlePaddle,使得利用各种CPU/GPU和机器来加速训练变得简单。PaddlePaddle 通过优化通信可以实现巨大吞吐量和快速执行。
- **连接产品**
    另外,PaddlePaddle 的设计也易于部署。在百度,PaddlePaddle 已经部署到含有巨大用户量的产品和服务上,包括广告点击率(CTR)预测、大规模图像分类、光学字符识别(OCR)、搜索排序,计算机病毒检测、推荐系统等等。PaddlePaddle广泛应用于百度产品中,产生了非常重要的影响。我们希望您也能探索 PaddlePaddle 的能力,为您的产品创造新的影响力和效果。
## Installation
It is recommended to read [this doc](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html) on our website.
## 安装
推荐阅读官网上的[安装说明](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html)
## Documentation
We provide [English](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html) and
@@ -153,37 +98,9 @@ We provide [English](http://paddlepaddle.org/documentation/docs/en/1.2/getstarte
We appreciate your contributions!
## 文档
我们提供[英文](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)和[中文](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html) 文档
- [深度学习101](https://github.com/PaddlePaddle/book)
    或许您想从这个在线交互式书籍开始,可以在Jupyter Notebook中运行
- [分布式训练](http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/training/cluster_howto.html)
    可以在MPI集群上运行分布式训练任务
- [Python API](http://paddlepaddle.org/documentation/docs/zh/1.2/api_cn/index_cn.html)
    新的API支持代码更少更简洁的程序
- [贡献方式](http://paddlepaddle.org/documentation/docs/zh/1.2/advanced_usage/development/contribute_to_paddle/index_cn.html)
    欢迎您的贡献!
## Ask Questions
You are welcome to submit questions and bug reports as [Github Issues](https://github.com/PaddlePaddle/Paddle/issues).
## 答疑
欢迎您将问题和bug报告以[Github Issues](https://github.com/PaddlePaddle/Paddle/issues)的形式提交
## Copyright and License
PaddlePaddle is provided under the [Apache-2.0 license](LICENSE).
## 版权和许可证
PaddlePaddle由[Apache-2.0 license](LICENSE)提供
README_cn.md
new file mode 100644
# PaddlePaddle
[English](./README.md) | 简体中文
[](https://travis-ci.org/PaddlePaddle/Paddle)
[](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)
[](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html)
[](https://github.com/PaddlePaddle/Paddle/releases)
[](LICENSE)
欢迎来到 PaddlePaddle GitHub
PaddlePaddle (PArallel Distributed Deep LEarning) 是一个简单易用、高效灵活、可扩展的深度学习平台,最初由百度科学家和工程师共同开发,目的是将深度学习技术应用到百度的众多产品中。
我们的愿景是让每个人都能通过PaddlePaddle接触深度学习
跟进PaddlePaddle最新特性请参考我们的[版本说明](https://github.com/PaddlePaddle/Paddle/releases)
### PaddlePaddle最新版本: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2)
### 安装最新稳定版本:
```
# Linux CPU
pip install paddlepaddle
# Linux GPU cuda9cudnn7
pip install paddlepaddle-gpu
# Linux GPU cuda8cudnn7
pip install paddlepaddle-gpu==1.2.0.post87
# Linux GPU cuda8cudnn5
pip install paddlepaddle-gpu==1.2.0.post85
# 其他平台上的安装指引请参考 http://paddlepaddle.org/
```
## 特性
- **灵活性**
    PaddlePaddle支持丰富的神经网络架构和优化算法。易于配置复杂模型,例如带有注意力机制或复杂记忆连接的神经网络机器翻译模型。
- **高效性**
    为了高效使用异步计算资源,PaddlePaddle对框架的不同层进行优化,包括计算、存储、架构和通信。下面是一些样例:
    - 通过SSE/AVX 内置函数、BLAS库(例如MKL、OpenBLAS、cuBLAS)或定制的CPU/GPU内核优化数学操作。
    - 通过MKL-DNN库优化CNN网络
    - 高度优化循环网络,无需执行 `padding` 操作即可处理 **变长** 序列
    - 针对高维稀疏数据模型,优化了局部和分布式训练。
- **稳定性**
    有了 PaddlePaddle,使得利用各种CPU/GPU和机器来加速训练变得简单。PaddlePaddle 通过优化通信可以实现巨大吞吐量和快速执行。
- **与产品相连**
    另外,PaddlePaddle 的设计也易于部署。在百度,PaddlePaddle 已经部署到含有巨大用户量的产品和服务上,包括广告点击率(CTR)预测、大规模图像分类、光学字符识别(OCR)、搜索排序,计算机病毒检测、推荐系统等等。PaddlePaddle广泛应用于百度产品中,产生了非常重要的影响。我们希望您也能探索 PaddlePaddle 的能力,为您的产品创造新的影响力和效果。
## 安装
推荐阅读官网上的[安装说明](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html)
## 文档
我们提供[英文](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)和[中文](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html) 文档
- [深度学习101](https://github.com/PaddlePaddle/book)
    或许您想从这个在线交互式书籍开始,可以在Jupyter Notebook中运行
- [分布式训练](http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/training/cluster_howto.html)
    可以在MPI集群上运行分布式训练任务
- [Python API](http://paddlepaddle.org/documentation/docs/zh/1.2/api_cn/index_cn.html)
    新的API支持代码更少更简洁的程序
- [贡献方式](http://paddlepaddle.org/documentation/docs/zh/1.2/advanced_usage/development/contribute_to_paddle/index_cn.html)
    欢迎您的贡献!
## 答疑
欢迎您将问题和bug报告以[Github Issues](https://github.com/PaddlePaddle/Paddle/issues)的形式提交
## 版权和许可证
PaddlePaddle由[Apache-2.0 license](LICENSE)提供
cmake/configure.cmake
@@ -152,7 +152,12 @@ endif()
if (WITH_MKLML AND MKLML_IOMP_LIB)
    message(STATUS "Enable Intel OpenMP with ${MKLML_IOMP_LIB}")
    if(WIN32)
        # openmp not support well for now on windows
        set(OPENMP_FLAGS "")
    else(WIN32)
        set(OPENMP_FLAGS "-fopenmp")
    endif(WIN32)
    set(CMAKE_C_CREATE_SHARED_LIBRARY_FORBIDDEN_FLAGS ${OPENMP_FLAGS})
    set(CMAKE_CXX_CREATE_SHARED_LIBRARY_FORBIDDEN_FLAGS ${OPENMP_FLAGS})
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OPENMP_FLAGS}")
cmake/cuda.cmake
@@ -203,25 +203,26 @@ list(APPEND CUDA_NVCC_FLAGS "-w")
list(APPEND CUDA_NVCC_FLAGS "--expt-relaxed-constexpr")
if (NOT WIN32)
  if(CMAKE_BUILD_TYPE STREQUAL "Debug")
  if(CMAKE_BUILD_TYPE STREQUAL "Debug")
    list(APPEND CUDA_NVCC_FLAGS ${CMAKE_CXX_FLAGS_DEBUG})
  elseif(CMAKE_BUILD_TYPE STREQUAL "Release")
  elseif(CMAKE_BUILD_TYPE STREQUAL "Release")
    list(APPEND CUDA_NVCC_FLAGS ${CMAKE_CXX_FLAGS_RELEASE})
  elseif(CMAKE_BUILD_TYPE STREQUAL "RelWithDebInfo")
  elseif(CMAKE_BUILD_TYPE STREQUAL "RelWithDebInfo")
    list(APPEND CUDA_NVCC_FLAGS ${CMAKE_CXX_FLAGS_RELWITHDEBINFO})
  elseif(CMAKE_BUILD_TYPE STREQUAL "MinSizeRel")
  elseif(CMAKE_BUILD_TYPE STREQUAL "MinSizeRel")
    # nvcc 9 does not support -Os. Use Release flags instead
    list(APPEND CUDA_NVCC_FLAGS ${CMAKE_CXX_FLAGS_RELEASE})
  endif()
  endif()
else(NOT WIN32)
  list(APPEND CUDA_NVCC_FLAGS "--compiler-options;/bigobj")
  if(CMAKE_BUILD_TYPE STREQUAL "Debug")
  list(APPEND CUDA_NVCC_FLAGS "-Xcompiler \"/wd 4244 /wd 4267 /wd 4819\"")
  list(APPEND CUDA_NVCC_FLAGS "--compiler-options;/bigobj")
  if(CMAKE_BUILD_TYPE STREQUAL "Debug")
    list(APPEND CUDA_NVCC_FLAGS "-g -G")
    # match the cl's _ITERATOR_DEBUG_LEVEL
    list(APPEND CUDA_NVCC_FLAGS "-D_DEBUG")
  elseif(CMAKE_BUILD_TYPE STREQUAL "Release")
  elseif(CMAKE_BUILD_TYPE STREQUAL "Release")
    list(APPEND CUDA_NVCC_FLAGS "-O3 -DNDEBUG")
  else()
  else()
    message(FATAL "Windows only support Release or Debug build now. Please set visual studio build type to Release/Debug, x64 build.")
  endif()
endif(NOT WIN32)
cmake/external/glog.cmake
@@ -20,8 +20,10 @@ SET(GLOG_INCLUDE_DIR "${GLOG_INSTALL_DIR}/include" CACHE PATH "glog include dire
IF(WIN32)
  SET(GLOG_LIBRARIES "${GLOG_INSTALL_DIR}/lib/libglog.lib" CACHE FILEPATH "glog library." FORCE)
  SET(GLOG_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4267 /wd4530")
ELSE(WIN32)
  SET(GLOG_LIBRARIES "${GLOG_INSTALL_DIR}/lib/libglog.a" CACHE FILEPATH "glog library." FORCE)
  SET(GLOG_CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
ENDIF(WIN32)
INCLUDE_DIRECTORIES(${GLOG_INCLUDE_DIR})
@@ -39,7 +41,7 @@ ExternalProject_Add(
    UPDATE_COMMAND  ""
    CMAKE_ARGS      -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
                    -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
                    -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
                    -DCMAKE_CXX_FLAGS=${GLOG_CMAKE_CXX_FLAGS}
                    -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
                    -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
                    -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
cmake/external/mkldnn.cmake
@@ -49,6 +49,8 @@ IF(NOT WIN32)
    SET(MKLDNN_FLAG "${MKLDNN_FLAG} -Wno-unused-result -Wno-unused-value")
    SET(MKLDNN_CFLAG "${CMAKE_C_FLAGS} ${MKLDNN_FLAG}")
    SET(MKLDNN_CXXFLAG "${CMAKE_CXX_FLAGS} ${MKLDNN_FLAG}")
ELSE()
    SET(MKLDNN_CXXFLAG "${CMAKE_CXX_FLAGS} /EHsc")
ENDIF(NOT WIN32)
ExternalProject_Add(
@@ -61,7 +63,6 @@ ExternalProject_Add(
    UPDATE_COMMAND      ""
    CMAKE_ARGS          -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
    CMAKE_ARGS          -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
    CMAKE_ARGS          -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
    CMAKE_ARGS          -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
    CMAKE_ARGS          -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
    CMAKE_ARGS          -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
cmake/external/snappy.cmake
@@ -20,6 +20,12 @@ set(SNAPPY_SOURCES_DIR ${THIRD_PARTY_PATH}/snappy)
set(SNAPPY_INSTALL_DIR ${THIRD_PARTY_PATH}/install/snappy)
set(SNAPPY_INCLUDE_DIR "${SNAPPY_INSTALL_DIR}/include" CACHE PATH "snappy include directory." FORCE)
if(WIN32)
    SET(SNAPPY_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4244 /wd4267")
else()
    SET(SNAPPY_CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
endif()
ExternalProject_Add(
    extern_snappy
    GIT_REPOSITORY "https://github.com/google/snappy"
@@ -31,7 +37,7 @@ ExternalProject_Add(
                    -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
                    -DCMAKE_C_FLAGS_DEBUG=${CMAKE_C_FLAGS_DEBUG}
                    -DCMAKE_C_FLAGS_RELEASE=${CMAKE_C_FLAGS_RELEASE}
                    -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
                    -DCMAKE_CXX_FLAGS=${SNAPPY_CMAKE_CXX_FLAGS}
                    -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
                    -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
                    -DCMAKE_INSTALL_PREFIX=${SNAPPY_INSTALL_DIR}
cmake/flags.cmake
@@ -147,12 +147,7 @@ set(GPU_COMMON_FLAGS
    -Wno-error=unused-function  # Warnings in Numpy Header.
    -Wno-error=array-bounds # Warnings in Eigen::array
)
else(NOT WIN32)
set(COMMON_FLAGS "/w") #disable all warnings.
set(GPU_COMMON_FLAGS "/w") #disable all warnings
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -m64")
endif(NOT WIN32)
if (APPLE)
@@ -193,8 +188,7 @@ safe_set_static_flag()
        CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO
        CMAKE_C_FLAGS CMAKE_C_FLAGS_DEBUG CMAKE_C_FLAGS_RELEASE
        CMAKE_C_FLAGS_MINSIZEREL CMAKE_C_FLAGS_RELWITHDEBINFO)
      if(${flag_var} MATCHES "/W3")
        string(REGEX REPLACE "/W3" "/w" ${flag_var} "${${flag_var}}")
      endif(${flag_var} MATCHES "/W3")
      string(REGEX REPLACE "(^| )/W[0-9]( |$)" " " ${flag_var} "${${flag_var}}")
      set(flag_var "${flag_var} /w")
    endforeach(flag_var)
endif(WIN32)
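For readers less familiar with CMake's `string(REGEX REPLACE ...)`, the warning-level substitution in the second cmake/flags.cmake hunk can be mirrored in Python. This is an illustration only and not part of the commit: the helper name `replace_msvc_warning_level` is hypothetical, and the sketch assumes the intent is to drop any standalone `/W0` to `/W9` switch from a flag string and then append `/w`.

```python
import re

# Same pattern as in the hunk: a /W<digit> switch delimited by a space
# or by the start/end of the string.
WARNING_LEVEL = re.compile(r"(^| )/W[0-9]( |$)")


def replace_msvc_warning_level(flags: str) -> str:
    """Hypothetical helper: drop /W<digit> switches, then append /w."""
    # Each match (switch plus its delimiters) collapses to a single space.
    cleaned = WARNING_LEVEL.sub(" ", flags)
    return cleaned + " /w"


result = replace_msvc_warning_level("/DWIN32 /W3 /GR")
print(result)
```

Unlike the `/W3`-only replacement shown just above it in the hunk, this pattern covers every numeric warning level, and the delimiters keep it from touching switches that merely contain `/W`, such as `/Wall`.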
cmake/version.cmake
@@ -30,10 +30,25 @@ while ("${PADDLE_VERSION}" STREQUAL "")
      else()
        # otherwise, get the previous git tag name.
        set(tmp_version "${GIT_TAG_NAME}~1")
      endif()
    else()
      execute_process(
        COMMAND ${GIT_EXECUTABLE} describe --exact-match --tags ${tmp_version}
        WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}
        OUTPUT_VARIABLE GIT_EXACT_TAG_NAME
        RESULT_VARIABLE GIT_EXACT_TAG_RESULT
        ERROR_QUIET OUTPUT_STRIP_TRAILING_WHITESPACE)
      if (NOT ${GIT_EXACT_TAG_NAME})
        # Check if current branch is tag branch
        if (${GIT_EXACT_TAG_NAME} MATCHES "v${TAG_VERSION_REGEX}")
          string(REPLACE "v" "" PADDLE_VERSION ${GIT_EXACT_TAG_NAME})
        else()
          set(PADDLE_VERSION "0.0.0")
        endif()
      else()
        # otherwise, we always set PADDLE_VERSION to 0.0.0 to represent latest
        set(PADDLE_VERSION "0.0.0")
      endif()
    endif()
  else()
    set(PADDLE_VERSION "0.0.0")
    message(WARNING "Cannot add paddle version from git tag")
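The tag-to-version mapping added in cmake/version.cmake can be sketched in Python as follows. This is a simplified model rather than the commit's logic verbatim: the function name `version_from_exact_tag` is hypothetical, the value of `TAG_VERSION_REGEX` is not shown in this diff (the pattern below is a placeholder assumption), and the `git describe --exact-match --tags` call is replaced by a plain string argument so the example runs without a repository.

```python
import re

# Placeholder assumption: the real TAG_VERSION_REGEX is defined elsewhere
# in cmake/version.cmake and is not visible in this hunk.
ASSUMED_TAG_VERSION_REGEX = r"[0-9]+\.[0-9]+\.[0-9]+"


def version_from_exact_tag(exact_tag: str) -> str:
    """Map the output of `git describe --exact-match --tags` to a version.

    A tag that matches v<version> yields the tag with every "v" removed,
    as CMake's string(REPLACE "v" "" ...) would do. Anything else,
    including an empty string when no exact tag exists, yields "0.0.0".
    """
    # CMake's MATCHES is an unanchored search, hence re.search.
    if exact_tag and re.search("v" + ASSUMED_TAG_VERSION_REGEX, exact_tag):
        return exact_tag.replace("v", "")
    return "0.0.0"


print(version_from_exact_tag("v1.2.0"))
print(version_from_exact_tag(""))
```

In the hunk, `0.0.0` is described as representing the latest (untagged) code, which is why both fallback branches return it.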
paddle/fluid/API.spec
浏览文件 @
88d3dc94
...
...
@@ -8,13 +8,13 @@ paddle.fluid.Program.parse_from_string ArgSpec(args=['binary_str'], varargs=None
paddle.fluid.Program.to_string ArgSpec(args=['self', 'throw_on_error', 'with_details'], varargs=None, keywords=None, defaults=(False,))
paddle.fluid.default_startup_program ArgSpec(args=[], varargs=None, keywords=None, defaults=None)
paddle.fluid.default_main_program ArgSpec(args=[], varargs=None, keywords=None, defaults=None)
-paddle.fluid.program_guard ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
-paddle.fluid.name_scope ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.program_guard ArgSpec(args=['main_program', 'startup_program'], varargs=None, keywords=None, defaults=(None,))
+paddle.fluid.name_scope ArgSpec(args=['prefix'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.Executor.__init__ ArgSpec(args=['self', 'place'], varargs=None, keywords=None, defaults=None)
paddle.fluid.Executor.close ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.Executor.run ArgSpec(args=['self', 'program', 'feed', 'fetch_list', 'feed_var_name', 'fetch_var_name', 'scope', 'return_numpy', 'use_program_cache'], varargs=None, keywords=None, defaults=(None, None, None, 'feed', 'fetch', None, True, False))
paddle.fluid.global_scope ArgSpec(args=[], varargs=None, keywords=None, defaults=None)
-paddle.fluid.scope_guard ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.scope_guard ArgSpec(args=['scope'], varargs=None, keywords=None, defaults=None)
paddle.fluid.DistributeTranspiler.__init__ ArgSpec(args=['self', 'config'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.DistributeTranspiler.get_pserver_program ArgSpec(args=['self', 'endpoint'], varargs=None, keywords=None, defaults=None)
paddle.fluid.DistributeTranspiler.get_pserver_programs ArgSpec(args=['self', 'endpoint'], varargs=None, keywords=None, defaults=None)
...
...
@@ -66,7 +66,7 @@ paddle.fluid.initializer.XavierInitializer.__init__ ArgSpec(args=['self', 'unifo
paddle.fluid.initializer.BilinearInitializer.__init__ ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.initializer.MSRAInitializer.__init__ ArgSpec(args=['self', 'uniform', 'fan_in', 'seed'], varargs=None, keywords=None, defaults=(True, None, 0))
paddle.fluid.initializer.force_init_on_cpu ArgSpec(args=[], varargs=None, keywords=None, defaults=None)
-paddle.fluid.initializer.init_on_cpu ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.initializer.init_on_cpu ArgSpec(args=[], varargs=None, keywords=None, defaults=None)
paddle.fluid.initializer.NumpyArrayInitializer.__init__ ArgSpec(args=['self', 'value'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.fc ArgSpec(args=['input', 'size', 'num_flatten_dims', 'param_attr', 'bias_attr', 'act', 'is_test', 'name'], varargs=None, keywords=None, defaults=(1, None, None, None, False, None))
paddle.fluid.layers.embedding ArgSpec(args=['input', 'size', 'is_sparse', 'is_distributed', 'padding_idx', 'param_attr', 'dtype'], varargs=None, keywords=None, defaults=(False, False, None, None, 'float32'))
...
...
@@ -229,7 +229,7 @@ paddle.fluid.layers.random_data_generator ArgSpec(args=['low', 'high', 'shapes',
paddle.fluid.layers.py_reader ArgSpec(args=['capacity', 'shapes', 'dtypes', 'lod_levels', 'name', 'use_double_buffer'], varargs=None, keywords=None, defaults=(None, None, True))
paddle.fluid.layers.create_py_reader_by_data ArgSpec(args=['capacity', 'feed_list', 'name', 'use_double_buffer'], varargs=None, keywords=None, defaults=(None, True))
paddle.fluid.layers.Preprocessor.__init__ ArgSpec(args=['self', 'reader', 'name'], varargs=None, keywords=None, defaults=(None,))
-paddle.fluid.layers.Preprocessor.block ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.layers.Preprocessor.block ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.Preprocessor.inputs ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.Preprocessor.outputs ArgSpec(args=['self'], varargs='outs', keywords=None, defaults=None)
paddle.fluid.layers.load ArgSpec(args=['out', 'file_path', 'load_as_fp16'], varargs=None, keywords=None, defaults=(None,))
...
...
@@ -270,7 +270,7 @@ paddle.fluid.layers.IfElse.input ArgSpec(args=['self', 'x'], varargs=None, keywo
paddle.fluid.layers.IfElse.output ArgSpec(args=['self'], varargs='outs', keywords=None, defaults=None)
paddle.fluid.layers.IfElse.true_block ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.DynamicRNN.__init__ ArgSpec(args=['self', 'name'], varargs=None, keywords=None, defaults=(None,))
-paddle.fluid.layers.DynamicRNN.block ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.layers.DynamicRNN.block ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.DynamicRNN.memory ArgSpec(args=['self', 'init', 'shape', 'value', 'need_reorder', 'dtype'], varargs=None, keywords=None, defaults=(None, None, 0.0, False, 'float32'))
paddle.fluid.layers.DynamicRNN.output ArgSpec(args=['self'], varargs='outputs', keywords=None, defaults=None)
paddle.fluid.layers.DynamicRNN.static_input ArgSpec(args=['self', 'x'], varargs=None, keywords=None, defaults=None)
...
...
@@ -346,12 +346,12 @@ paddle.fluid.contrib.StateCell.set_state ArgSpec(args=['self', 'state_name', 'st
paddle.fluid.contrib.StateCell.state_updater ArgSpec(args=['self', 'updater'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.StateCell.update_states ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.TrainingDecoder.__init__ ArgSpec(args=['self', 'state_cell', 'name'], varargs=None, keywords=None, defaults=(None,))
-paddle.fluid.contrib.TrainingDecoder.block ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.contrib.TrainingDecoder.block ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.TrainingDecoder.output ArgSpec(args=['self'], varargs='outputs', keywords=None, defaults=None)
paddle.fluid.contrib.TrainingDecoder.static_input ArgSpec(args=['self', 'x'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.TrainingDecoder.step_input ArgSpec(args=['self', 'x'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.BeamSearchDecoder.__init__ ArgSpec(args=['self', 'state_cell', 'init_ids', 'init_scores', 'target_dict_dim', 'word_dim', 'input_var_dict', 'topk_size', 'sparse_emb', 'max_len', 'beam_size', 'end_id', 'name'], varargs=None, keywords=None, defaults=({}, 50, True, 100, 1, 1, None))
-paddle.fluid.contrib.BeamSearchDecoder.block ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.contrib.BeamSearchDecoder.block ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.BeamSearchDecoder.decode ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.BeamSearchDecoder.early_stop ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.BeamSearchDecoder.read_array ArgSpec(args=['self', 'init', 'is_ids', 'is_scores'], varargs=None, keywords=None, defaults=(False, False))
...
...
@@ -456,7 +456,7 @@ paddle.fluid.optimizer.AdadeltaOptimizer.apply_gradients ArgSpec(args=['self', '
paddle.fluid.optimizer.AdadeltaOptimizer.backward ArgSpec(args=['self', 'loss', 'startup_program', 'parameter_list', 'no_grad_set', 'callbacks'], varargs=None, keywords=None, defaults=(None, None, None, None))
paddle.fluid.optimizer.AdadeltaOptimizer.minimize ArgSpec(args=['self', 'loss', 'startup_program', 'parameter_list', 'no_grad_set'], varargs=None, keywords=None, defaults=(None, None, None))
paddle.fluid.optimizer.ModelAverage.__init__ ArgSpec(args=['self', 'average_window_rate', 'min_average_window', 'max_average_window', 'regularization', 'name'], varargs=None, keywords=None, defaults=(10000, 10000, None, None))
-paddle.fluid.optimizer.ModelAverage.apply ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.optimizer.ModelAverage.apply ArgSpec(args=['self', 'executor', 'need_restore'], varargs=None, keywords=None, defaults=(True,))
paddle.fluid.optimizer.ModelAverage.apply_gradients ArgSpec(args=['self', 'params_grads'], varargs=None, keywords=None, defaults=None)
paddle.fluid.optimizer.ModelAverage.backward ArgSpec(args=['self', 'loss', 'startup_program', 'parameter_list', 'no_grad_set', 'callbacks'], varargs=None, keywords=None, defaults=(None, None, None, None))
paddle.fluid.optimizer.ModelAverage.minimize ArgSpec(args=['self', 'loss', 'startup_program', 'parameter_list', 'no_grad_set'], varargs=None, keywords=None, defaults=(None, None, None))
...
...
@@ -491,14 +491,14 @@ paddle.fluid.clip.ErrorClipByValue.__init__ ArgSpec(args=['self', 'max', 'min'],
paddle.fluid.clip.GradientClipByValue.__init__ ArgSpec(args=['self', 'max', 'min'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.clip.GradientClipByNorm.__init__ ArgSpec(args=['self', 'clip_norm'], varargs=None, keywords=None, defaults=None)
paddle.fluid.clip.GradientClipByGlobalNorm.__init__ ArgSpec(args=['self', 'clip_norm', 'group_name'], varargs=None, keywords=None, defaults=('default_group',))
-paddle.fluid.profiler.cuda_profiler ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.profiler.cuda_profiler ArgSpec(args=['output_file', 'output_mode', 'config'], varargs=None, keywords=None, defaults=(None, None))
paddle.fluid.profiler.reset_profiler ArgSpec(args=[], varargs=None, keywords=None, defaults=None)
-paddle.fluid.profiler.profiler ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.profiler.profiler ArgSpec(args=['state', 'sorted_key', 'profile_path'], varargs=None, keywords=None, defaults=(None, '/tmp/profile'))
paddle.fluid.profiler.start_profiler ArgSpec(args=['state'], varargs=None, keywords=None, defaults=None)
paddle.fluid.profiler.stop_profiler ArgSpec(args=['sorted_key', 'profile_path'], varargs=None, keywords=None, defaults=(None, '/tmp/profile'))
paddle.fluid.unique_name.generate ArgSpec(args=['key'], varargs=None, keywords=None, defaults=None)
paddle.fluid.unique_name.switch ArgSpec(args=['new_generator'], varargs=None, keywords=None, defaults=(None,))
-paddle.fluid.unique_name.guard ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
+paddle.fluid.unique_name.guard ArgSpec(args=['new_generator'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.recordio_writer.convert_reader_to_recordio_file ArgSpec(args=['filename', 'reader_creator', 'feeder', 'compressor', 'max_num_records', 'feed_order'], varargs=None, keywords=None, defaults=(Compressor.Snappy, 1000, None))
paddle.fluid.recordio_writer.convert_reader_to_recordio_files ArgSpec(args=['filename', 'batch_per_file', 'reader_creator', 'feeder', 'compressor', 'max_num_records', 'feed_order'], varargs=None, keywords=None, defaults=(Compressor.Snappy, 1000, None))
paddle.fluid.Scope Scope() -> paddle.fluid.core._Scope
...
...
paddle/fluid/framework/CMakeLists.txt
View file @
88d3dc94
...
...
@@ -128,7 +128,7 @@ cc_test(version_test SRCS version_test.cc DEPS version)
 cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS shape_inference op_info operator glog version)
-cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog proto_desc)
+cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog proto_desc memory_optimize_helper)
 nv_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry)
 py_proto_compile(framework_py_proto SRCS framework.proto data_feed.proto)
...
...
@@ -158,18 +158,19 @@ cc_library(variable_helper SRCS variable_helper.cc DEPS lod_tensor)
 cc_library(naive_executor SRCS naive_executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass variable_helper)
+if(WITH_NGRAPH)
+  set(NGRAPH_EXE_DEPS ngraph_engine)
+else()
+  set(NGRAPH_EXE_DEPS)
+endif()
 if(WITH_DISTRIBUTE)
   cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog
-    lod_rank_table feed_fetch_method sendrecvop_rpc ${GLOB_DISTRIBUTE_DEPS} graph_to_program_pass variable_helper)
+    lod_rank_table feed_fetch_method sendrecvop_rpc ${GLOB_DISTRIBUTE_DEPS} graph_to_program_pass variable_helper ${NGRAPH_EXE_DEPS})
   set(DISTRIBUTE_COMPILE_FLAGS "-Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor")
   set_source_files_properties(executor.cc PROPERTIES COMPILE_FLAGS ${DISTRIBUTE_COMPILE_FLAGS})
 else()
-  if(WITH_NGRAPH)
-    cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass variable_helper ngraph_engine)
-  else()
-    cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass variable_helper)
-  endif()
+  cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass variable_helper ${NGRAPH_EXE_DEPS})
   cc_test(test_naive_executor SRCS naive_executor_test.cc DEPS naive_executor elementwise_add_op)
 endif()
...
...
@@ -192,6 +193,7 @@ cc_library(prune SRCS prune.cc DEPS framework_proto)
 cc_test(prune_test SRCS prune_test.cc DEPS op_info prune recurrent_op device_context)
 cc_test(var_type_inference_test SRCS var_type_inference_test.cc DEPS op_registry
         proto_desc)
+cc_test(inplace_op_inference_test SRCS inplace_op_inference_test.cc DEPS op_registry proto_desc op_info memory_optimize_helper)
 cc_library(selected_rows SRCS selected_rows.cc DEPS tensor)
 cc_test(selected_rows_test SRCS selected_rows_test.cc DEPS selected_rows)
...
...
paddle/fluid/framework/async_executor.cc
View file @
88d3dc94
...
...
@@ -244,6 +244,7 @@ void AsyncExecutor::RunFromFile(const ProgramDesc& main_program,
   auto& block = main_program.Block(0);
   for (auto var_name : fetch_var_names) {
     auto var_desc = block.FindVar(var_name);
+    PADDLE_ENFORCE_NOT_NULL(var_desc, "%s is not found.", var_name);
     auto shapes = var_desc->GetShape();
     PADDLE_ENFORCE(shapes[shapes.size() - 1] == 1,
                    "var %s: Fetched var has wrong shape, "
...
...
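The added `PADDLE_ENFORCE_NOT_NULL` line guards the result of `FindVar` before `GetShape()` is called on it. A minimal sketch of the same check-before-dereference pattern follows, assuming simplified stand-in types: `VarDescSketch`, `BlockSketch`, and `LastDimOrThrow` are hypothetical and are not Paddle APIs, and a plain exception stands in for the enforce macro.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for VarDesc and BlockDesc; not Paddle types.
struct VarDescSketch {
  std::vector<std::int64_t> shape;
};

class BlockSketch {
 public:
  void AddVar(const std::string& name, std::vector<std::int64_t> shape) {
    vars_[name] = VarDescSketch{std::move(shape)};
  }
  // Returns nullptr when the variable is unknown.
  const VarDescSketch* FindVar(const std::string& name) const {
    auto it = vars_.find(name);
    return it == vars_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, VarDescSketch> vars_;
};

// Checks the pointer before dereferencing it, then returns the last dimension.
std::int64_t LastDimOrThrow(const BlockSketch& block, const std::string& name) {
  const VarDescSketch* var_desc = block.FindVar(name);
  if (var_desc == nullptr) {
    throw std::runtime_error(name + " is not found.");
  }
  if (var_desc->shape.empty()) {
    throw std::runtime_error(name + " has an empty shape.");
  }
  return var_desc->shape.back();
}
```

Without the null check, a misspelled fetch variable name would dereference a null pointer instead of producing a readable error.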
paddle/fluid/framework/details/CMakeLists.txt
View file @
88d3dc94
...
...
@@ -50,10 +50,10 @@ cc_library(data_balance_op_handle SRCS data_balance_op_handle.cc DEPS op_handle_
 cc_library(gather_op_handle SRCS gather_op_handle.cc DEPS op_handle_base scope ddim memory variable_visitor)
 cc_library(fuse_vars_op_handle SRCS fuse_vars_op_handle.cc DEPS op_handle_base scope)
-cc_library(memory_optimize_pass SRCS analysis_var_pass.cc memory_reuse_types.cc DEPS graph graph_helper pass)
+cc_library(memory_optimize_helper SRCS memory_optimize_helper.cc DEPS graph graph_helper)
+cc_library(memory_optimize_pass SRCS memory_optimize_pass.cc DEPS memory_optimize_helper pass)
+cc_library(inplace_op_pass SRCS inplace_op_pass.cc DEPS memory_optimize_pass op_info)
 cc_library(modify_op_lock_and_record_event_pass SRCS modify_op_lock_and_record_event_pass.cc DEPS computation_op_handle op_graph_view multi_devices_helper)
-cc_library(memory_early_delete_pass SRCS memory_early_delete_pass.cc DEPS memory_optimize_pass computation_op_handle scale_loss_grad_op_handle rpc_op_handle
-        all_reduce_op_handle reduce_op_handle broadcast_op_handle data_balance_op_handle graph graph_helper pass)
 cc_library(reference_count_pass_helper SRCS reference_count_pass_helper.cc DEPS garbage_collector computation_op_handle)
 cc_library(eager_deletion_op_handle SRCS eager_deletion_op_handle.cc DEPS lod_tensor selected_rows reference_count_pass_helper)
 cc_library(eager_deletion_pass SRCS eager_deletion_pass.cc DEPS computation_op_handle eager_deletion_op_handle graph graph_helper pass)
...
...
@@ -65,13 +65,11 @@ cc_library(all_reduce_deps_pass SRCS all_reduce_deps_pass.cc DEPS graph graph_he
 cc_library(multi_devices_graph_pass SRCS multi_devices_graph_pass.cc DEPS multi_devices_helper computation_op_handle
         scale_loss_grad_op_handle rpc_op_handle all_reduce_op_handle reduce_op_handle broadcast_op_handle data_balance_op_handle fused_broadcast_op_handle)
-set(SSA_GRAPH_EXECUTOR_DEPS graph framework_proto sequential_execution_pass modify_op_lock_and_record_event_pass all_reduce_deps_pass reference_count_pass eager_deletion_pass memory_optimize_pass memory_early_delete_pass)
+set(SSA_GRAPH_EXECUTOR_DEPS graph framework_proto sequential_execution_pass modify_op_lock_and_record_event_pass all_reduce_deps_pass reference_count_pass eager_deletion_pass memory_optimize_pass inplace_op_pass)
 if(WITH_GPU)
   list(APPEND SSA_GRAPH_EXECUTOR_DEPS reference_count_pass)
 endif()
-cc_test(memory_reuse_types_test SRCS memory_reuse_types_test.cc memory_reuse_types.cc DEPS framework_proto graph)
-cc_test(analysis_var_pass_test SRCS analysis_var_pass_test.cc analysis_var_pass.cc memory_reuse_types.cc DEPS framework_proto graph graph_helper op_registry pass)
+cc_test(memory_optimize_helper_test SRCS memory_optimize_helper_test.cc memory_optimize_helper.cc DEPS framework_proto graph graph_helper op_registry)
 cc_library(ssa_graph_executor SRCS ssa_graph_executor.cc DEPS ${SSA_GRAPH_EXECUTOR_DEPS})
 cc_library(threaded_ssa_graph_executor SRCS threaded_ssa_graph_executor.cc DEPS fetch_op_handle ssa_graph_executor scope
...
...
paddle/fluid/framework/details/build_strategy.cc
View file @
88d3dc94
...
...
@@ -17,7 +17,7 @@ limitations under the License. */
#include <glog/logging.h>
#include <memory>
-#include "paddle/fluid/framework/details/memory_reuse_types.h"
+#include "paddle/fluid/framework/details/memory_optimize_helper.h"
#include "paddle/fluid/framework/details/multi_devices_graph_pass.h"
#include "paddle/fluid/framework/details/multi_devices_graph_print_pass.h"
#include "paddle/fluid/framework/details/reduce_op_handle.h"
...
...
@@ -47,6 +47,22 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
       AppendPass("sequential_execution_pass");
     }
+    // Add op fusion.
+    if (strategy.fuse_relu_depthwise_conv_) {
+      AppendPass("fuse_relu_depthwise_conv_pass");
+    }
+    // NOTE(dzhwinter): A note for automatic inplace.
+    // 1. passes that modify the program desc should be put
+    // before the inplace pass.
+    // 2. manually configured inplace should be put
+    // before inplace_pass
+    // Add automatic inplace.
+    if (strategy_.enable_inplace_) {
+      AppendPass("inplace_pass");
+    }
     // Add a graph viz pass to record a graph.
     if (!strategy_.debug_graphviz_path_.empty()) {
       auto viz_pass = AppendPass("graph_viz_pass");
...
...
@@ -55,10 +71,6 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
       viz_pass->Set<std::string>("graph_viz_path", new std::string(graph_path));
     }
-    // Add op fusion.
-    if (strategy.fuse_relu_depthwise_conv_) {
-      AppendPass("fuse_relu_depthwise_conv_pass");
-    }
     if (strategy.fuse_elewise_add_act_ops_) {
       auto fuse_elewise_add_act_pass = AppendPass("fuse_elewise_add_act_pass");
       // Add a graph viz pass to record a graph.
...
...
@@ -88,7 +100,7 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
     // As a side effect, memory optimize cannot foresee the fetched vars,
     // so the fetch list should be set persistable before calling the Run interface.
     if (strategy.memory_optimize_) {
-      auto analysis_var_pass = AppendPass("analysis_var_pass");
+      auto memory_optimize_pass = AppendPass("memory_optimize_pass");
     }
     AppendMultiDevPass(strategy);
...
...
@@ -190,14 +202,14 @@ std::unique_ptr<ir::Graph> BuildStrategy::Apply(
      pass->Erase("nccl_ctxs");
      pass->SetNotOwned<platform::NCCLContextMap>("nccl_ctxs", nctx);
#endif
-    } else if (pass->Type() == "analysis_var_pass") {
+    } else if (pass->Type() == "memory_optimize_pass") {
      if (graph->Has(kAllOpDescs)) {
        graph->Erase(kAllOpDescs);
      }
      const std::vector<OpDesc *> *all_op_descs =
          new std::vector<OpDesc *>(main_program.Block(0).AllOps());
      graph->Set<const std::vector<OpDesc *>>(kAllOpDescs,
                                              all_op_descs);  // take ownership
      graph->Set<GraphNodePool>(kGraphNodePool,
                                new GraphNodePool);  // take ownership
      pass->Erase(kAllOpDescs);
      pass->SetNotOwned<const std::vector<OpDesc *>>(kAllOpDescs, all_op_descs);
...
...
@@ -218,6 +230,13 @@ std::unique_ptr<ir::Graph> BuildStrategy::Apply(
       pass->Set<const std::vector<OpDesc *>>(
           kAllOpDescs,
           new std::vector<OpDesc *>(main_program.Block(0).AllOps()));
+    } else if (pass->Type() == "inplace_pass") {
+      if (graph->Has(kAllOpDescs)) {
+        graph->Erase(kAllOpDescs);
+      }
+      graph->Set<const std::vector<OpDesc *>>(
+          kAllOpDescs,
+          new std::vector<OpDesc *>(main_program.Block(0).AllOps()));
     } else if (pass->Type() == "fuse_relu_depthwise_conv_pass") {
       if (!use_cuda) {
         LOG(WARNING) << "fuse_relu_depthwise_conv_pass is only supported on "
...
...
@@ -243,9 +262,10 @@ USE_PASS(allreduce_mode_multi_devices_pass);
 USE_PASS(dist_multi_devices_pass);
 USE_PASS(multi_devices_check_pass);
 USE_PASS(multi_devices_print_pass);
-USE_PASS(analysis_var_pass);
+USE_PASS(memory_optimize_pass);
 USE_PASS(sequential_execution_pass);
 USE_PASS(all_reduce_deps_pass);
 USE_PASS(modify_op_lock_and_record_event_pass);
+USE_PASS(inplace_pass);
 USE_PASS(lock_free_optimize_pass);
 USE_PASS(graph_to_program_pass);
paddle/fluid/framework/details/build_strategy.h
View file @
88d3dc94
...
...
@@ -77,8 +77,10 @@ struct BuildStrategy {
  bool fuse_relu_depthwise_conv_{false};
  bool memory_optimize_{false};
  bool memory_early_delete_{false};
  // TODO(dzhwinter):
  // make enable_inplace, memory_optimize_
  // memory_early_delete_ true by default
  bool enable_inplace_{false};
  bool enable_sequential_execution_{false};
...
...
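Taken together, the `build_strategy.cc` and `build_strategy.h` hunks suggest that the pass list is assembled from boolean strategy flags, with op fusion appended before the inplace pass and memory optimization appended later. The sketch below illustrates only that ordering. `StrategySketch` and `BuildPassList` are hypothetical names, and the real builder appends other passes that are omitted here.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of three BuildStrategy flags seen in the diff.
struct StrategySketch {
  bool fuse_relu_depthwise_conv = false;
  bool enable_inplace = false;
  bool memory_optimize = false;
};

// Appends pass names in the order the hunks show them being appended.
std::vector<std::string> BuildPassList(const StrategySketch& strategy) {
  std::vector<std::string> passes;
  if (strategy.fuse_relu_depthwise_conv) {
    // Modifies the program desc, so it goes before the inplace pass.
    passes.push_back("fuse_relu_depthwise_conv_pass");
  }
  if (strategy.enable_inplace) {
    passes.push_back("inplace_pass");
  }
  if (strategy.memory_optimize) {
    passes.push_back("memory_optimize_pass");
  }
  return passes;
}
```

With all flags left at their `false` defaults the list is empty, which matches the defaults shown in `build_strategy.h`.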
paddle/fluid/framework/details/computation_op_handle.h
View file @
88d3dc94
...
...
@@ -26,7 +26,7 @@
 namespace paddle {
 namespace framework {
 namespace details {
-struct ComputationOpHandle : public OpHandleBase {
+class ComputationOpHandle : public OpHandleBase {
  public:
   ComputationOpHandle(ir::Node *node, Scope *scope, platform::Place place,
                       size_t scope_idx);
...
...
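The `struct` to `class` change above keeps the explicit `public:` label, which matters because members of a `class` are private by default. A small sketch of that rule, using hypothetical types that are not Paddle's:

```cpp
#include <string>

// Hypothetical base type; struct members are public by default.
struct OpHandleBaseSketch {
  virtual ~OpHandleBaseSketch() = default;
  virtual std::string Name() const = 0;
};

class ComputationOpHandleSketch : public OpHandleBaseSketch {
 public:  // without this label, Name() would be private in a class
  std::string Name() const override { return "computation"; }
};
```

Because the label is present, calling `Name()` directly on a `ComputationOpHandleSketch` object compiles exactly as it did when the type was a `struct`.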
paddle/fluid/framework/details/fused_broadcast_op_handle_test.cc
View file @
88d3dc94
...
...
@@ -34,8 +34,8 @@ struct TestFusedBroadcastOpHandle : TestBroadcastOpHandle {
           ->Var(details::kLocalExecScopeName)
           ->GetMutable<Scope *>() = &local_scope;
       for (size_t j = 0; j < input_scope_idxes.size(); ++j) {
-        local_scope.Var("out_var" + j);
-        if (i == j) local_scope.Var("in_var" + j);
+        local_scope.Var("out_var" + std::to_string(j));
+        if (i == j) local_scope.Var("in_var" + std::to_string(j));
       }
       param_scopes_.emplace_back(&local_scope);
     }
...
...
@@ -62,20 +62,21 @@ struct TestFusedBroadcastOpHandle : TestBroadcastOpHandle {
     for (size_t i = 0; i < input_scope_idxes.size(); ++i) {
       // add input var handle
-      nodes_.emplace_back(
-          ir::CreateNodeForTest("in_node" + i, ir::Node::Type::kVariable));
-      VarHandle *in_var_handle =
-          new VarHandle(nodes_.back().get(), 1, input_scope_idxes[i],
-                        "in_var" + i, place_list_[input_scope_idxes[i]]);
+      nodes_.emplace_back(ir::CreateNodeForTest("in_node" + std::to_string(i),
+                                                ir::Node::Type::kVariable));
+      VarHandle *in_var_handle = new VarHandle(
+          nodes_.back().get(), 1, input_scope_idxes[i],
+          "in_var" + std::to_string(i), place_list_[input_scope_idxes[i]]);
       vars_.emplace_back(in_var_handle);
       op_handle_->AddInput(in_var_handle);
       // add output var handle
       for (size_t j = 0; j < place_list_.size(); ++j) {
-        nodes_.emplace_back(
-            ir::CreateNodeForTest("out_node" + i, ir::Node::Type::kVariable));
-        VarHandle *out_var_handle = new VarHandle(
-            nodes_.back().get(), 2, j, "out_var" + i, place_list_[j]);
+        nodes_.emplace_back(ir::CreateNodeForTest(
+            "out_node" + std::to_string(i), ir::Node::Type::kVariable));
+        VarHandle *out_var_handle =
+            new VarHandle(nodes_.back().get(), 2, j,
+                          "out_var" + std::to_string(i), place_list_[j]);
         vars_.emplace_back(out_var_handle);
         op_handle_->AddOutput(out_var_handle);
       }
...
...
@@ -86,7 +87,7 @@ struct TestFusedBroadcastOpHandle : TestBroadcastOpHandle {
     std::vector<std::vector<float>> send_vec;
     f::LoD lod{{0, 10, 20}};
     for (size_t i = 0; i < input_scope_idxes.size(); ++i) {
-      const std::string varname("in_var" + i);
+      const std::string varname("in_var" + std::to_string(i));
       float val_scalar = static_cast<float>(i);
       send_vec.push_back(
           InitLoDTensor(varname, input_scope_idxes[i], lod, val_scalar));
...
...
@@ -96,7 +97,7 @@ struct TestFusedBroadcastOpHandle : TestBroadcastOpHandle {
     WaitAll();
     for (size_t i = 0; i < input_scope_idxes.size(); ++i) {
-      const std::string &varname("out_var" + i);
+      const std::string &varname("out_var" + std::to_string(i));
       for (size_t j = 0; j < place_list_.size(); ++j) {
         LoDTensorEqual(varname, send_vec[i], lod, param_scopes_[j]);
       }
...
...
@@ -109,7 +110,7 @@ struct TestFusedBroadcastOpHandle : TestBroadcastOpHandle {
                                    2, 4, 6, 3, 1, 1, 1, 1, 3, 7};
     int height = static_cast<int>(kDims[0] * 2);
     for (size_t i = 0; i < input_scope_idxes.size(); ++i) {
-      const std::string varname("in_var" + i);
+      const std::string varname("in_var" + std::to_string(i));
       float val_scalar = static_cast<float>(i);
       send_vector.push_back(InitSelectedRows(varname, input_scope_idxes[i],
                                              rows, height, val_scalar));
...
...
@@ -119,7 +120,7 @@ struct TestFusedBroadcastOpHandle : TestBroadcastOpHandle {
     WaitAll();
     for (size_t i = 0; i < input_scope_idxes.size(); ++i) {
-      const std::string &varname("out_var" + i);
+      const std::string &varname("out_var" + std::to_string(i));
       for (size_t j = 0; j < place_list_.size(); ++j) {
         SelectedRowsEqual(varname, input_scope_idxes[i], send_vector[i], rows,
                           height);
...
...
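Every change in this test file replaces `"name" + i` with `"name" + std::to_string(i)`. Adding an integer to a string literal is pointer arithmetic on a `const char*`, so the old expressions skipped leading characters instead of appending a number. The sketch below contrasts the two behaviours; `IndexedName` and `OffsetIntoLiteral` are hypothetical helpers written for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds "<prefix><index>", the behaviour the updated test relies on.
std::string IndexedName(const std::string& prefix, std::size_t index) {
  return prefix + std::to_string(index);
}

// Reproduces what `"literal" + i` means for a string literal: pointer
// arithmetic that skips the first `offset` characters.
std::string OffsetIntoLiteral(const char* literal, std::size_t offset) {
  // Past the terminating null the behaviour would be undefined.
  assert(offset <= std::strlen(literal));
  return std::string(literal + offset);
}
```

For example, `IndexedName("in_var", 2)` gives `in_var2`, whereas `OffsetIntoLiteral("in_var", 2)` gives `_var`, which is why the old test names did not carry the index.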
paddle/fluid/framework/details/memory_early_delete_pass.h → paddle/fluid/framework/details/graph_test_base.h
View file @
88d3dc94
-// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
...
...
@@ -13,20 +13,68 @@
// limitations under the License.
#pragma once
#include "paddle/fluid/framework/details/early_delete_op_handle.h"
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include "glog/logging.h"
#include "gtest/gtest.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/pass.h"
#include "paddle/fluid/framework/ir/graph_helper.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/program_desc.h"
 namespace paddle {
 namespace framework {
 namespace details {
-class MemoryEarlyDeletePass : public ir::Pass {
- protected:
-  std::unique_ptr<ir::Graph> ApplyImpl(
-      std::unique_ptr<ir::Graph> graph) const override;
+class DummyOp : public OperatorBase {
+ public:
+  DummyOp(const std::string& type, const VariableNameMap& inputs,
+          const VariableNameMap& outputs, const AttributeMap& attrs)
+      : OperatorBase(type, inputs, outputs, attrs) {}
+ private:
+  void RunImpl(const Scope& scope,
+               const platform::Place& place) const override {}
+};
+class SumOpMaker : public OpProtoAndCheckerMaker {
+ public:
+  void Make() {
+    AddInput("X", "").AsDuplicable();
+    AddOutput("Out", "");
+    AddComment("");
+  }
+};
+class AssignOpMaker : public OpProtoAndCheckerMaker {
+ public:
+  void Make() {
+    AddInput("X", "").AsDuplicable();
+    AddOutput("Out", "");
+    AddComment("");
+  }
+};
+class SplitOpMaker : public OpProtoAndCheckerMaker {
+ public:
+  void Make() {
+    AddInput("X", "");
+    AddOutput("Out", "").AsDuplicable();
+    AddComment("");
+  }
+};
+class DummyVarTypeInference : public VarTypeInference {
+ public:
+  void operator()(const OpDesc& op_desc, BlockDesc* block) const override {
+    auto& inputs = op_desc.Input("X");
+    auto type = block->Var(inputs.front())->GetType();
+    auto out_var_name = op_desc.Output("Out").front();
+    block->Var(out_var_name)->SetType(type);
+  }
 };
 }  // namespace details
 }  // namespace framework
 }  // namespace paddle
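`DummyVarTypeInference` in the header above copies the type of the first `X` input onto the first `Out` output. A minimal sketch of that propagation rule follows, assuming a plain name-to-type table in place of `BlockDesc`. `VarTypeSketch`, `VarTypeTable`, and `PropagateFirstInputType` are hypothetical names, not Paddle APIs.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the variable type stored on a VarDesc.
enum class VarTypeSketch { kLoDTensor, kSelectedRows };

// Hypothetical stand-in for a block: variable name -> variable type.
using VarTypeTable = std::unordered_map<std::string, VarTypeSketch>;

// Copies the type of the first input onto the first output.
void PropagateFirstInputType(const std::vector<std::string>& inputs,
                             const std::vector<std::string>& outputs,
                             VarTypeTable* vars) {
  if (inputs.empty() || outputs.empty()) {
    throw std::invalid_argument("need at least one input and one output");
  }
  // at() throws std::out_of_range if the input variable is unknown.
  const VarTypeSketch type = vars->at(inputs.front());
  (*vars)[outputs.front()] = type;
}
```

This keeps test graphs type-consistent without registering a real inference rule for each dummy op.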
paddle/fluid/framework/details/inplace_op_pass.cc
0 → 100644
View file @
88d3dc94
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/details/inplace_op_pass.h"
#include <algorithm>
#include <deque>
#include <iterator>
#include <stack>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include "paddle/fluid/framework/details/memory_optimize_pass.h"
#include "paddle/fluid/framework/ir/graph_helper.h"
#include "paddle/fluid/framework/op_info.h"
// NOTE(dzhwinter): inplace means that an op's output variable reuses the
// memory of one of its input variables.
// By our design, an operator can only read its inputs (const Variable) and
// write its outputs (non-const Variable). If an operator is inplaced, the
// shared space may be written before it has been read,
// especially when certain optimized coding styles are used.
//
//
// /* wrong case in operator */
// /* In this case, a larger allocation is made and the input content is lost. */
// const Tensor* in = ctx.Input<Tensor>("In");
// Tensor* out = ctx.Output<Tensor>("Out");
// auto* out_ptr = out->mutable_data<T>(ctx.GetPlace());
// out_ptr[0] = 0;  // input content is overwritten.
// NOTE(dzhwinter):
// Only for backward compatibility and stability. If enable_inplace_whitelist
// is turned on, only the ops in the whitelist will use the inplace strategy.
// If not, every op registered with InplaceClass will be inplaced.
DEFINE_bool(
    enable_inplace_whitelist, false,
    "If this option turns on, only these op in whitelist can be inplaced."
    "If it turns off, all of the running op can be candidate of inplaced op."
    "Such as scale, elementwise_add"
    "By default, it's turned on");
DECLARE_string(memory_optimize_debug);
// clang-format off
const std::string kInplacedOpWhiteList[] = {  // NOLINT
    "sigmoid",
    "exp",
    "relu",
    "tanh",
    "sqrt",
    "ceil",
    "floor",
    "reciprocal",
    "relu6",
    "soft_relu",
    "hard_sigmoid",
    "batch_norm",
    "batch_norm_grad",
    "sum",
    "sum_grad",
    "scale",
    "reshape",
    "elementwise_add",
    "elementwise_add_grad",
};
// clang-format on
namespace paddle {
namespace framework {
namespace details {
static inline ir::Node* GetNextCascadeInplacedVar(ir::Node* var) {
  // if next op is inplaced, then return the output var
  // otherwise return nullptr
  PADDLE_ENFORCE(var && var->IsVar() && !var->IsCtrlVar());
  ir::Node* inplaced_var = nullptr;
  for (auto* next_op : var->outputs) {
    for (auto* output : next_op->outputs) {
      if (output->IsVar() && !output->IsCtrlVar() &&
          output->Name() == var->Name()) {
        inplaced_var = output;
      }
    }
  }
  return inplaced_var;
}
static inline ir::Node* GetPrevCascadeInplacedVar(ir::Node* var) {
  PADDLE_ENFORCE(var && var->IsVar() && !var->IsCtrlVar());
  if (var->inputs.empty()) return nullptr;
  auto* prev_op = var->inputs.at(0);
  auto input_it = std::find_if(prev_op->inputs.begin(), prev_op->inputs.end(),
                               [&](ir::Node* node) {
                                 if (node->IsVar() && !node->IsCtrlVar() &&
                                     node->Name() == var->Name()) {
                                   return true;
                                 } else {
                                   return false;
                                 }
                               });
  return input_it == prev_op->inputs.end() ? nullptr : *input_it;
}
InplacePass::InplacePass() : Pass() {
  if (FLAGS_enable_inplace_whitelist) {
    for (auto& s : kInplacedOpWhiteList) {
      whitelist_.emplace(s);
    }
  }
}
void InplacePass::InitSSAGraphNodes() const {
  std::unordered_map<std::string, std::unordered_set<ir::Node*>> all_vars;
  for (auto* op : view_.AllOps()) {
    for (auto* node : op->inputs) {
      if (!node->IsVar() || node->IsCtrlVar()) continue;
      if (all_vars[node->Name()].count(node) == 0) {
        all_vars[node->Name()].emplace(node);
        var_nodes_[node->Name()].emplace_back(node);
      }
    }
    for (auto* node : op->outputs) {
      if (!node->IsVar() || node->IsCtrlVar()) continue;
      if (all_vars[node->Name()].count(node) == 0) {
        all_vars[node->Name()].emplace(node);
        var_nodes_[node->Name()].emplace_back(node);
      }
    }
  }
}
std
::
unique_ptr
<
ir
::
Graph
>
InplacePass
::
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
{
var_nodes_
.
clear
();
view_
.
Build
(
graph
.
get
());
InitSSAGraphNodes
();
for
(
auto
*
op
:
view_
.
AllOps
())
{
if
(
FLAGS_enable_inplace_whitelist
&&
!
whitelist_
.
count
(
op
->
Name
()))
continue
;
TryInplaceOpInputOutput
(
op
,
graph
.
get
());
}
graph
->
ResolveHazard
(
var_nodes_
);
return
graph
;
}
void
InplacePass
::
InplaceModifyDesc
(
const
std
::
string
&
var
,
const
std
::
string
&
cache_var
,
const
size_t
&
idx
)
const
{
for
(
size_t
i
=
idx
;
i
<
view_
.
AllOps
().
size
();
++
i
)
{
ir
::
Node
*
op
=
view_
.
AllOps
()[
i
];
PADDLE_ENFORCE
(
op
->
IsOp
()
&&
op
->
Op
());
auto
*
op_desc
=
op
->
Op
();
op_desc
->
RenameInput
(
var
,
cache_var
);
op_desc
->
RenameOutput
(
var
,
cache_var
);
if
(
op_desc
->
Block
()
->
HasVar
(
var
))
op_desc
->
Block
()
->
RemoveVar
(
var
);
op_desc
->
Flush
();
}
}
const
NodeSwapQueue
InplacePass
::
TryInplaceModifyVar
(
const
std
::
string
&
var
,
const
std
::
string
&
cache_var
,
const
size_t
&
idx
,
ir
::
Graph
*
graph
)
const
{
PADDLE_ENFORCE
(
var_nodes_
[
var
].
size
()
>=
1
&&
var_nodes_
[
var
].
at
(
0
)
->
Var
()
!=
nullptr
);
std
::
unique_ptr
<
VarDesc
>
var_desc
(
new
VarDesc
(
*
var_nodes_
[
var
].
at
(
0
)
->
Var
()));
var_desc
->
SetName
(
cache_var
);
NodeSwapQueue
swap_nodes
;
for
(
size_t
i
=
idx
;
i
<
view_
.
AllOps
().
size
();
++
i
)
{
auto
*
op
=
view_
.
AllOps
()[
i
];
// redirect the input to the latest version of cache_var
for
(
auto
*
node
:
op
->
inputs
)
{
if
(
node
->
Name
()
==
var
)
{
ir
::
Node
*
cache_node
=
graph
->
CreateVarNode
(
var_desc
.
get
());
// swap node to cache_node
cache_node
->
outputs
.
insert
(
cache_node
->
outputs
.
end
(),
node
->
outputs
.
begin
(),
node
->
outputs
.
end
());
PADDLE_ENFORCE
(
node
->
inputs
.
size
()
==
1
&&
node
->
inputs
[
0
]
->
IsOp
());
auto
*
prev_op
=
node
->
inputs
[
0
];
std
::
replace
(
prev_op
->
outputs
.
begin
(),
prev_op
->
outputs
.
end
(),
node
,
cache_node
);
cache_node
->
inputs
.
emplace_back
(
prev_op
);
for
(
auto
*
next_op
:
node
->
outputs
)
{
std
::
replace
(
next_op
->
inputs
.
begin
(),
next_op
->
inputs
.
end
(),
node
,
cache_node
);
}
swap_nodes
.
emplace_back
(
std
::
make_pair
(
node
,
cache_node
));
}
}
// if we need to rename the output,
// always create a newer version of cache_var
for
(
auto
*
node
:
op
->
outputs
)
{
if
(
node
->
Name
()
==
var
)
{
ir
::
Node
*
cache_node
=
graph
->
CreateVarNode
(
var_desc
.
get
());
// swap node to cache node
cache_node
->
outputs
.
insert
(
cache_node
->
outputs
.
end
(),
node
->
outputs
.
begin
(),
node
->
outputs
.
end
());
cache_node
->
inputs
.
emplace_back
(
op
);
std
::
replace
(
op
->
outputs
.
begin
(),
op
->
outputs
.
end
(),
node
,
cache_node
);
for
(
auto
*
next_op
:
node
->
outputs
)
{
std
::
replace
(
next_op
->
inputs
.
begin
(),
next_op
->
inputs
.
end
(),
node
,
cache_node
);
}
swap_nodes
.
emplace_back
(
std
::
make_pair
(
node
,
cache_node
));
}
}
}
return
swap_nodes
;
}
void
InplacePass
::
CommitModify
(
const
NodeSwapQueue
&
swap_nodes
,
ir
::
Graph
*
graph
)
const
{
for
(
auto
&
pair
:
swap_nodes
)
{
auto
*
node
=
pair
.
first
,
*
cache_node
=
pair
.
second
;
const
std
::
string
var
=
node
->
Name
(),
cache_var
=
cache_node
->
Name
();
var_nodes_
[
cache_var
].
emplace_back
(
cache_node
);
graph
->
RemoveNode
(
node
);
auto
&
nodes
=
var_nodes_
.
at
(
var
);
// Release the unused var node from the graph. Because the Python-side memory
// optimizer may have reused a var with the same name, only clear the var
// nodes after the current inplaced index.
nodes
.
erase
(
std
::
remove
(
nodes
.
begin
(),
nodes
.
end
(),
node
),
nodes
.
end
());
}
}
void
InplacePass
::
WithdrawModify
(
const
NodeSwapQueue
&
nodes
,
ir
::
Graph
*
graph
)
const
{
for
(
auto
&
pair
:
nodes
)
{
auto
*
node
=
pair
.
first
,
*
cache_node
=
pair
.
second
;
const
std
::
string
var
=
node
->
Name
(),
cache_var
=
cache_node
->
Name
();
auto
*
prev_op
=
node
->
inputs
[
0
];
std
::
replace
(
prev_op
->
outputs
.
begin
(),
prev_op
->
outputs
.
end
(),
cache_node
,
node
);
for
(
auto
*
next_op
:
node
->
outputs
)
{
std
::
replace
(
next_op
->
inputs
.
begin
(),
next_op
->
inputs
.
end
(),
cache_node
,
node
);
}
graph
->
RemoveNode
(
cache_node
);
}
}
void
InplacePass
::
TryInplaceOpInputOutput
(
ir
::
Node
*
op
,
ir
::
Graph
*
graph
)
const
{
VLOG
(
4
)
<<
"Try to inplace op "
<<
op
->
Name
();
PADDLE_ENFORCE
(
op
->
Op
()
!=
nullptr
&&
op
->
Op
()
->
Block
()
!=
nullptr
,
"op_desc is nullptr"
);
// Some prerequisites must be met before the op can be inplaced.
auto
*
op_desc
=
op
->
Op
();
auto
&
infer_inplace
=
OpInfoMap
::
Instance
().
Get
(
op_desc
->
Type
()).
infer_inplace_
;
// 1. infer_inplace_ is registered.
if
(
!
static_cast
<
bool
>
(
infer_inplace
))
return
;
PADDLE_ENFORCE
(
static_cast
<
bool
>
(
infer_inplace
),
"%s's infer_inplace has not been registered"
,
op_desc
->
Type
());
auto
*
block
=
op_desc
->
Block
();
auto
in_to_outs
=
infer_inplace
(
*
op_desc
,
block
);
auto
&
all_ops
=
view_
.
AllOps
();
auto
cursor
=
std
::
find
(
all_ops
.
begin
(),
all_ops
.
end
(),
op
);
size_t
idx
=
std
::
distance
(
all_ops
.
begin
(),
cursor
);
for
(
auto
&
pair
:
in_to_outs
)
{
auto
&
in_var_name
=
pair
.
first
;
auto
&
out_var_name
=
pair
.
second
;
auto
*
in_node
=
view_
.
GetNodeByName
(
in_var_name
,
op
->
inputs
);
auto
*
out_node
=
view_
.
GetNodeByName
(
out_var_name
,
op
->
outputs
);
// 2. there is no external pending op on the input node
if
(
view_
.
PendingOpsOnVar
(
in_node
).
size
()
>
1
)
{
VLOG
(
4
)
<<
string
::
Sprintf
(
"Skipped pair %s => %s. %s input has an external dependency; "
"inplacing such a pair would overwrite the memory."
,
out_var_name
,
in_var_name
,
op
->
Name
());
continue
;
}
// 3. if the output has already been memory-optimized by Python (fluid.memory_optimize()),
// this candidate cannot be inplaced. Will be deprecated in the future.
if
(
view_
.
InSkipSet
(
out_node
->
Name
()))
{
VLOG
(
4
)
<<
string
::
Sprintf
(
"Skipped %s => %s: the output reuses a previous memory block from Python "
"memory optimize, "
"so inplacing it in op %s may generate a cycle"
,
out_var_name
,
in_var_name
,
op
->
Name
());
continue
;
}
// Debug interface: the var named by FLAGS_memory_optimize_debug is skipped by the pass.
if
(
out_node
->
Name
()
==
FLAGS_memory_optimize_debug
)
{
VLOG
(
3
)
<<
"Skipped var by force. FLAGS_memory_optimize_debug="
<<
out_node
->
Name
();
continue
;
}
// NOTE(dzhwinter):
// Two-stage commit of the inplace process: if applying the inplace generates
// a cycle in the graph,
// withdraw the changes. Otherwise, safely commit the new nodes.
auto
swap_nodes
=
TryInplaceModifyVar
(
out_var_name
,
in_var_name
,
idx
,
graph
);
if
(
!
ir
::
HasCircle
(
*
graph
))
{
VLOG
(
3
)
<<
string
::
Sprintf
(
"!!! %s, %s => %s inplaced"
,
op
->
Name
(),
out_var_name
,
in_var_name
);
InplaceModifyDesc
(
out_var_name
,
in_var_name
,
idx
);
CommitModify
(
swap_nodes
,
graph
);
}
else
{
VLOG
(
3
)
<<
string
::
Sprintf
(
"Skipped pair %s => %s, inplace will generate a cycle. Withdraw %s"
,
out_var_name
,
in_var_name
,
op
->
Name
());
WithdrawModify
(
swap_nodes
,
graph
);
}
}
}
ir
::
Node
*
GraphView
::
GetNodeByName
(
const
std
::
string
&
name
,
const
std
::
vector
<
ir
::
Node
*>&
nodes
)
const
{
// nodes should be op->inputs/outputs
// nodes within the same op must have different names.
std
::
unordered_set
<
std
::
string
>
nodes_in_op
;
bool
has_dup_node
=
std
::
all_of
(
nodes
.
begin
(),
nodes
.
end
(),
[
&
nodes_in_op
](
ir
::
Node
*
node
)
{
if
(
!
node
->
IsVar
()
||
node
->
IsCtrlVar
()
||
node
->
Var
()
==
nullptr
)
{
if
(
nodes_in_op
.
count
(
node
->
Name
()))
return
true
;
nodes_in_op
.
emplace
(
node
->
Name
());
}
return
false
;
});
PADDLE_ENFORCE
(
has_dup_node
==
false
,
"nodes have the same name!"
);
ir
::
Node
*
node
=
nullptr
;
for
(
auto
*
it
:
nodes
)
{
if
(
!
it
->
IsVar
()
||
it
->
IsCtrlVar
()
||
it
->
Var
()
==
nullptr
)
continue
;
if
(
it
->
Name
()
==
name
)
{
node
=
it
;
break
;
}
}
PADDLE_ENFORCE
(
node
!=
nullptr
,
string
::
Sprintf
(
"Not found var %s in nodes!"
,
name
));
return
node
;
}
std
::
vector
<
ir
::
Node
*>
GraphView
::
PendingOpsOnVar
(
ir
::
Node
*
node
)
{
// Get the pending ops that depend on the same var node.
// Because the node may itself be an inplaced variable, backtrack through all
// the previous inplaced vars.
std
::
vector
<
ir
::
Node
*>
pending_ops
;
ir
::
Node
*
p
=
node
;
while
(
p
!=
nullptr
)
{
pending_ops
.
insert
(
pending_ops
.
end
(),
p
->
outputs
.
begin
(),
p
->
outputs
.
end
());
p
=
GetPrevCascadeInplacedVar
(
p
);
}
return
pending_ops
;
}
void
GraphView
::
Build
(
ir
::
Graph
*
g
)
{
// Track the var nodes in the correct order,
// because we insert newly created nodes, which may introduce data races between
// nodes.
// Resolving data hazards depends on the var nodes being in the right order.
ops_
=
SortOpLikeDescOrder
(
*
g
);
// 1. track the nodes that reused a previous node in Python memory optimize.
// These nodes cannot be inplaced; otherwise a cycle may be generated in the graph.
std
::
unordered_set
<
std
::
string
>
all_vars
;
for
(
auto
&
node
:
g
->
Nodes
())
{
if
(
node
->
IsVar
())
continue
;
for
(
auto
&
out
:
node
->
outputs
)
{
if
(
out
->
IsCtrlVar
()
||
out
->
Var
()
==
nullptr
)
continue
;
if
(
all_vars
.
count
(
out
->
Name
()))
{
dup_nodes_
.
emplace
(
out
->
Name
());
}
else
{
all_vars
.
emplace
(
out
->
Name
());
}
}
}
// 2. track the nodes used by the parameter server.
// These nodes cannot be inplaced; otherwise the trainer and
// pserver cannot find each other's variable names.
auto
update_skip_set
=
[
&
](
ir
::
Node
*
node
)
{
for
(
auto
&
in
:
node
->
inputs
)
{
if
(
in
->
IsVar
()
&&
in
->
Var
()
!=
nullptr
)
dup_nodes_
.
emplace
(
in
->
Name
());
}
for
(
auto
&
out
:
node
->
outputs
)
{
if
(
out
->
IsVar
()
&&
out
->
Var
()
!=
nullptr
)
dup_nodes_
.
emplace
(
out
->
Name
());
}
};
for
(
auto
&
node
:
g
->
Nodes
())
{
if
(
!
node
->
IsOp
())
continue
;
if
(
node
->
Name
()
==
"send"
)
update_skip_set
(
node
);
if
(
node
->
Name
()
==
"recv"
)
update_skip_set
(
node
);
if
(
node
->
Name
()
==
"prefetch"
)
update_skip_set
(
node
);
}
}
const
std
::
vector
<
ir
::
Node
*>&
GraphView
::
AllOps
()
{
return
ops_
;
}
bool
GraphView
::
InSkipSet
(
const
std
::
string
&
var
)
const
{
return
dup_nodes_
.
count
(
var
);
}
}
// namespace details
}
// namespace framework
}
// namespace paddle
REGISTER_PASS
(
inplace_pass
,
paddle
::
framework
::
details
::
InplacePass
);
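The two-stage commit used by `TryInplaceOpInputOutput` (apply the change tentatively, check for a cycle, then either commit or withdraw) can be sketched on a toy graph. This is a minimal illustration only: `ToyGraph`, `HasCycle`, and `TryAddEdge` are hypothetical names that stand in for `ir::Graph`, `ir::HasCircle`, and the `TryInplaceModifyVar` / `CommitModify` / `WithdrawModify` trio; none of this is Paddle's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy directed graph over integer node ids (hypothetical, not Paddle's ir::Graph).
struct ToyGraph {
  std::vector<std::vector<int>> adj;
  explicit ToyGraph(std::size_t n) : adj(n) {}
};

// Depth-first search with three colors: 0 = unvisited, 1 = on stack, 2 = done.
// Reaching a node that is still on the stack means a back edge, i.e. a cycle.
static bool DfsFindsCycle(const ToyGraph& g, int u, std::vector<int>* color) {
  (*color)[u] = 1;
  for (int v : g.adj[u]) {
    if ((*color)[v] == 1) return true;
    if ((*color)[v] == 0 && DfsFindsCycle(g, v, color)) return true;
  }
  (*color)[u] = 2;
  return false;
}

bool HasCycle(const ToyGraph& g) {
  std::vector<int> color(g.adj.size(), 0);
  for (std::size_t u = 0; u < g.adj.size(); ++u) {
    if (color[u] == 0 && DfsFindsCycle(g, static_cast<int>(u), &color)) {
      return true;
    }
  }
  return false;
}

// Stage 1: apply the edit tentatively.
// Stage 2: keep it only if the graph is still acyclic, otherwise undo it.
bool TryAddEdge(ToyGraph* g, int from, int to) {
  g->adj[from].push_back(to);  // tentative modification
  if (HasCycle(*g)) {
    g->adj[from].pop_back();   // withdraw: restore the previous state
    return false;
  }
  return true;                 // commit: the edit stays in place
}
```

In this sketch the undo step is a single `pop_back`; the pass above needs a queue of swapped node pairs because one rename can touch many nodes.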
paddle/fluid/framework/details/inplace_op_pass.h
0 → 100644
View file @
88d3dc94
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
#include "paddle/fluid/framework/details/memory_optimize_helper.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/pass.h"
namespace
paddle
{
namespace
framework
{
namespace
details
{
class
GraphView
{
public:
GraphView
()
=
default
;
void
Build
(
ir
::
Graph
*
g
);
const
std
::
vector
<
ir
::
Node
*>&
AllOps
();
ir
::
Node
*
GetNodeByName
(
const
std
::
string
&
name
,
const
std
::
vector
<
ir
::
Node
*>&
nodes
)
const
;
std
::
vector
<
ir
::
Node
*>
PendingOpsOnVar
(
ir
::
Node
*
var
);
// Will be deprecated in the future.
// NOTE(dzhwinter) :
// 1. Python memory optimize reuses
// memory based on var names, so different op outputs may
// have the same variable name. Enabling inplace on such nodes
// would generate a cycle in the SSA graph.
// 2. DistributeTranspiler uses unique names to
// map parameters to gradients, so those vars must be skipped.
bool
InSkipSet
(
const
std
::
string
&
var
)
const
;
private:
std
::
vector
<
ir
::
Node
*>
ops_
;
std
::
unordered_set
<
std
::
string
>
dup_nodes_
;
// nodes affected by memory optimization
std
::
map
<
ir
::
Node
*
,
std
::
unordered_set
<
ir
::
Node
*>>
adj_list_
;
};
// swap pairs in sequence
typedef
std
::
vector
<
std
::
pair
<
ir
::
Node
*
,
ir
::
Node
*>>
NodeSwapQueue
;
class
InplacePass
:
public
ir
::
Pass
{
public:
InplacePass
();
protected:
std
::
unique_ptr
<
ir
::
Graph
>
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
override
;
void
InitSSAGraphNodes
()
const
;
private:
const
NodeSwapQueue
TryInplaceModifyVar
(
const
std
::
string
&
var
,
const
std
::
string
&
cache_var
,
const
size_t
&
idx
,
ir
::
Graph
*
graph
)
const
;
void
CommitModify
(
const
NodeSwapQueue
&
,
ir
::
Graph
*
graph
)
const
;
void
WithdrawModify
(
const
NodeSwapQueue
&
nodes
,
ir
::
Graph
*
graph
)
const
;
void
InplaceModifyDesc
(
const
std
::
string
&
in_var
,
const
std
::
string
&
out_var
,
const
size_t
&
idx
)
const
;
void
TryInplaceOpInputOutput
(
ir
::
Node
*
op
,
ir
::
Graph
*
graph
)
const
;
mutable
std
::
map
<
std
::
string
,
std
::
vector
<
ir
::
Node
*>>
var_nodes_
;
mutable
std
::
unordered_set
<
std
::
string
>
whitelist_
;
mutable
GraphView
view_
;
};
}
// namespace details
}
// namespace framework
}
// namespace paddle
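The header above declares `GraphView::PendingOpsOnVar`, which the pass uses for its second precondition: an input is skipped when more than one op is still pending on it, counting the ops pending on earlier inplaced versions of the same variable. A minimal stand-alone sketch of that backtracking follows. `CascadeVar`, `PendingOps`, and `SafeToInplace` are hypothetical names chosen for illustration, and the explicit `prev_inplaced` pointer is an assumption that replaces the name-based lookup done by `GetPrevCascadeInplacedVar`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One version of a variable (hypothetical stand-in for an ir::Node var node).
struct CascadeVar {
  std::vector<std::string> consumers;       // ops that read this version
  const CascadeVar* prev_inplaced = nullptr;  // earlier version sharing the buffer
};

// Collect every op that still depends on this buffer by walking back
// through the chain of previously inplaced versions.
std::vector<std::string> PendingOps(const CascadeVar* v) {
  std::vector<std::string> ops;
  for (const CascadeVar* p = v; p != nullptr; p = p->prev_inplaced) {
    ops.insert(ops.end(), p->consumers.begin(), p->consumers.end());
  }
  return ops;
}

// Mirrors the size() > 1 check in the pass: more than one pending op means
// some other op still needs the buffer, so overwriting it is unsafe.
bool SafeToInplace(const CascadeVar* input) {
  return PendingOps(input).size() <= 1;
}
```

For example, a variable read only by `relu` is safe to inplace, while a later version read by `scale` whose previous version is still read by `relu` is not.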
paddle/fluid/framework/details/memory_early_delete_pass.cc
Deleted
100644 → 0
View file @
f3463ecb
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/details/memory_early_delete_pass.h"
#include <queue>
#include <string>
#include <vector>
#include "paddle/fluid/framework/details/memory_reuse_types.h"
#include "paddle/fluid/framework/details/multi_devices_helper.h"
#include "paddle/fluid/framework/details/reference_count_pass_helper.h"
#include "paddle/fluid/framework/ir/graph_helper.h"
namespace
paddle
{
namespace
framework
{
namespace
details
{
static
ComputationOpHandle
*
FindNextComputationOpHandle
(
VarHandle
*
var_in
)
{
std
::
queue
<
VarHandleBase
*>
queue
;
queue
.
push
(
var_in
);
do
{
auto
*
var
=
queue
.
front
();
queue
.
pop
();
for
(
auto
*
op
:
var
->
PendingOps
())
{
auto
*
compute_op
=
dynamic_cast
<
ComputationOpHandle
*>
(
op
);
if
(
compute_op
!=
nullptr
&&
compute_op
->
GetPlace
()
==
var_in
->
place
())
{
return
compute_op
;
}
for
(
auto
*
out_var
:
op
->
Outputs
())
{
queue
.
push
(
out_var
);
}
}
}
while
(
!
queue
.
empty
());
return
nullptr
;
}
std
::
unique_ptr
<
ir
::
Graph
>
MemoryEarlyDeletePass
::
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
{
auto
&
graph_pool
=
Get
<
GraphNodePool
>
(
kGraphNodePool
);
auto
&
gcs
=
Get
<
GarbageCollectorMap
>
(
kGarbageCollector
);
std
::
unordered_map
<
std
::
string
,
std
::
unordered_set
<
OpDesc
*>>
unlived_vars
;
unlived_vars
.
reserve
(
graph_pool
.
size
());
for
(
auto
&
pair
:
graph_pool
)
{
unlived_vars
.
insert
(
std
::
make_pair
(
pair
.
first
,
pair
.
second
));
}
auto
compare_and_insert_early_delete_op
=
[
&
](
OpHandleBase
*
op
,
const
std
::
vector
<
VarHandleBase
*>&
vars
)
{
if
(
unlived_vars
.
empty
())
return
;
// unlived vars can be deleted after the last used op has finished.
auto
*
compute_op
=
dynamic_cast
<
ComputationOpHandle
*>
(
op
);
const
auto
&
places
=
Get
<
std
::
vector
<
platform
::
Place
>>
(
kAllPlaces
);
for
(
auto
&
var
:
vars
)
{
auto
*
var_handle
=
dynamic_cast
<
VarHandle
*>
(
var
);
auto
var_name
=
var
->
Node
()
->
Name
();
auto
&
var_place
=
var_handle
->
place
();
if
(
unlived_vars
.
count
(
var_name
)
==
0
)
continue
;
if
(
!
unlived_vars
[
var_name
].
empty
())
{
if
(
compute_op
!=
nullptr
&&
unlived_vars
[
var_name
].
count
(
compute_op
->
Node
()
->
Op
())
!=
0
)
{
unlived_vars
[
var_name
].
erase
(
compute_op
->
Node
()
->
Op
());
}
continue
;
}
if
(
var_handle
==
nullptr
||
!
var_handle
->
Node
()
->
IsVar
()
||
var_handle
->
Node
()
->
IsCtrlVar
())
continue
;
// shamelessly copied from the reference count pass.
if
(
compute_op
==
nullptr
)
{
// use next computation op scope
compute_op
=
FindNextComputationOpHandle
(
var_handle
);
}
auto
*
early_delete_node
=
graph
->
CreateEmptyNode
(
"early_delete"
,
ir
::
Node
::
Type
::
kOperation
);
GarbageCollector
*
gc
=
gcs
.
at
(
places
[
compute_op
->
GetScopeIdx
()]).
get
();
auto
*
early_delete_handle
=
new
EarlyDeleteOpHandle
(
early_delete_node
,
compute_op
->
GetScope
(),
var_place
,
{
var_name
},
gc
);
if
(
compute_op
->
Outputs
().
empty
())
{
auto
*
dep_var
=
new
DummyVarHandle
(
graph
->
CreateControlDepVar
());
compute_op
->
AddOutput
(
dep_var
);
graph
->
Get
<
GraphDepVars
>
(
kGraphDepVars
).
emplace
(
dep_var
);
}
early_delete_handle
->
AddInput
(
compute_op
->
Outputs
().
front
());
VLOG
(
5
)
<<
"Add early delete op "
<<
var_name
<<
" to Operator "
<<
compute_op
->
Name
();
}
};
auto
all_ops
=
ir
::
FilterByNodeWrapper
<
OpHandleBase
>
(
*
graph
);
for
(
auto
&
op
:
all_ops
)
{
compare_and_insert_early_delete_op
(
op
,
op
->
Inputs
());
compare_and_insert_early_delete_op
(
op
,
op
->
Outputs
());
}
return
graph
;
}
}
// namespace details
}
// namespace framework
}
// namespace paddle
REGISTER_PASS
(
memory_early_delete_pass
,
paddle
::
framework
::
details
::
MemoryEarlyDeletePass
)
.
RequireGraphAttr
(
paddle
::
framework
::
details
::
kGraphNodePool
)
.
RequireGraphAttr
(
paddle
::
framework
::
details
::
kGarbageCollector
);
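`FindNextComputationOpHandle` in the deleted file above is a breadth-first search: starting from a variable, it follows pending ops and their outputs until it meets a computation op on the same place. A minimal sketch under simplified assumptions is shown here. `HandleVar`, `HandleOp`, and `FindNextComputeOp` are hypothetical names, an `int` stands in for `platform::Place`, and an `is_compute` flag stands in for the `dynamic_cast<ComputationOpHandle*>` test.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct HandleOp;

// Hypothetical stand-in for VarHandleBase: a variable and the ops waiting on it.
struct HandleVar {
  std::vector<HandleOp*> pending_ops;
};

// Hypothetical stand-in for OpHandleBase.
struct HandleOp {
  bool is_compute;                  // true for a computation op
  int place;                        // device id, stands in for platform::Place
  std::vector<HandleVar*> outputs;  // variables produced by this op
};

// Breadth-first search for the nearest downstream computation op that runs
// on the requested place. Returns nullptr when no such op is reachable.
// Assumes the graph is acyclic, so the search terminates.
HandleOp* FindNextComputeOp(HandleVar* start, int place) {
  std::queue<HandleVar*> queue;
  queue.push(start);
  while (!queue.empty()) {
    HandleVar* var = queue.front();
    queue.pop();
    for (HandleOp* op : var->pending_ops) {
      if (op->is_compute && op->place == place) return op;
      for (HandleVar* out : op->outputs) queue.push(out);
    }
  }
  return nullptr;
}
```

The sketch uses a `while` loop instead of the original `do ... while`; the two behave the same here because the queue always starts with one element.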
paddle/fluid/framework/details/analysis_var_pass.cc → paddle/fluid/framework/details/memory_optimize_helper.cc
View file @
88d3dc94
...
...
@@ -12,384 +12,19 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/details/analysis_var_pass.h"
#include <algorithm>
#include <atomic>
#include "paddle/fluid/framework/details/memory_optimize_helper.h"
#include <deque>
#include <fstream>
#include <functional>
#include <iostream>
#include <iterator>
#include <memory>
#include <queue>
#include <numeric>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>
#include "gflags/gflags.h"
#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/graph_helper.h"
DEFINE_bool
(
enable_subgraph_optimize
,
false
,
"Subgraphs also reuse global graph variables. This reduces the "
"memory occupation "
"but carries a higher risk of memory reuse errors. Disabled by default."
);
DEFINE_string
(
memory_optimize_debug
,
""
,
"Debug the output variable of an operator during variable reuse "
"in the memory reuse pass. "
"For debugging only; disabled by default."
);
#include "paddle/fluid/framework/var_desc.h"
namespace
paddle
{
namespace
framework
{
namespace
details
{
static
inline
bool
IsSameDesc
(
OpDesc
*
op1
,
OpDesc
*
op2
)
{
return
op1
->
Type
()
==
op2
->
Type
()
&&
op1
->
Inputs
()
==
op2
->
Inputs
()
&&
op1
->
Outputs
()
==
op2
->
Outputs
();
}
template
<
typename
Container
,
typename
Callback
>
class
FilterVariableImpl
{
public:
void
operator
()(
const
Container
&
nodes
,
Callback
callback
)
{
for
(
auto
*
node
:
nodes
)
{
callback
(
node
);
}
}
};
// filter var node for op->inputs/outputs
template
<
typename
Callback
>
class
FilterVariableImpl
<
std
::
vector
<
ir
::
Node
*>
,
Callback
>
{
public:
void
operator
()(
const
std
::
vector
<
ir
::
Node
*>&
nodes
,
Callback
callback
)
{
for
(
auto
*
var
:
nodes
)
{
if
(
var
->
IsVar
()
&&
!
var
->
IsCtrlVar
())
{
callback
(
var
);
}
}
}
};
template
<
typename
Container
,
typename
Callback
>
void
FilterVariables
(
const
Container
&
nodes
,
Callback
callback
)
{
FilterVariableImpl
<
Container
,
Callback
>
()(
nodes
,
callback
);
}
std
::
unique_ptr
<
ir
::
Graph
>
AnalysisVarPass
::
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
{
auto
nodes
=
graph
->
Nodes
();
auto
subblock_vars
=
GetSubBlockVars
(
nodes
);
skip_set_
.
insert
(
subblock_vars
.
begin
(),
subblock_vars
.
end
());
cfg_
.
reset
(
new
details
::
ControlFlowGraph
(
*
graph
));
cfg_
->
LiveVariableAnalysis
();
InitSSAGraphNodes
();
int
reuse_id
=
0
;
for
(
size_t
idx
=
0
;
idx
<
cfg_
->
Ops
().
size
();
++
idx
)
{
auto
&
op
=
cfg_
->
Ops
()[
idx
];
auto
*
op_desc
=
op
->
Op
();
// some op in graph has no op desc
if
(
op_desc
==
nullptr
)
continue
;
if
(
OpHasSubBlock
(
op_desc
))
{
if
(
FLAGS_enable_subgraph_optimize
)
{
SubGraphOptimize
(
op_desc
);
}
else
{
VLOG
(
3
)
<<
op
->
Name
()
<<
" has subblock, but disable subgraph optimize. skipped."
;
continue
;
}
}
for
(
auto
&
var
:
op
->
outputs
)
{
if
(
NodeCanReused
(
var
)
&&
cfg_
->
Use
(
op
).
count
(
var
->
Name
())
==
0
)
{
ir
::
Node
*
cache
=
pool_
.
NodeMatch
(
var
);
if
(
var
->
Name
()
==
FLAGS_memory_optimize_debug
)
{
VLOG
(
3
)
<<
"start match var "
<<
DebugString
(
var
)
<<
" of op "
<<
op
->
Name
();
VLOG
(
3
)
<<
pool_
.
ToString
();
VLOG
(
3
)
<<
"matched in pool : "
<<
((
cache
==
nullptr
)
?
"False"
:
"True"
);
}
if
(
cache
!=
nullptr
)
{
if
(
var
->
Name
()
==
cache
->
Name
())
{
VLOG
(
3
)
<<
"The same cache variable is cascade reused. "
<<
var
->
Name
()
<<
" is re-filled to the pool after "
<<
"the reused op is finished. Current op can not "
<<
"replace it again. Skip this candidate."
;
continue
;
}
int
node_idx_in_pool
=
pool_
.
GetIndex
(
cache
);
VLOG
(
3
)
<<
string
::
Sprintf
(
"!!! %s, %s => %s, cache idx %d, pool size %d"
,
std
::
to_string
(
reuse_id
++
),
DebugString
(
var
),
DebugString
(
cache
),
node_idx_in_pool
,
static_cast
<
int
>
(
pool_
.
size
()));
// update CFG Graph on the fly.
// reused var maybe re-fill into the pool
cfg_
->
RenameVarInCFGGraph
(
var
->
Name
(),
cache
->
Name
(),
idx
);
// NOTE(dzhwinter): we need to both update the ProgramDesc
// and IR Graph. because op_desc/var_desc is used in CreateOp,
// CreateVar when running happens. But IR Graph
// define the dependence relationship between nodes.
RenameVarInGraphDesc
(
var
->
Name
(),
cache
->
Name
(),
idx
);
RenameVarInGraphNode
(
var
->
Name
(),
cache
->
Name
(),
idx
,
graph
.
get
());
pool_
.
Erase
(
cache
);
}
}
}
// fill the pool
for
(
auto
var
:
cfg_
->
LiveIn
(
op
))
{
if
(
cfg_
->
LiveOut
(
op
).
count
(
var
)
==
0
)
{
ir
::
Node
*
var_node
=
cfg_
->
GetNodeFromVarName
(
var
,
op
);
if
(
var_node
==
nullptr
)
continue
;
if
(
NodeCanReused
(
var_node
)
&&
!
pool_
.
Has
(
var_node
))
{
pool_
.
Insert
(
var_node
,
op
);
}
}
}
}
graph
->
ResolveHazard
(
var_nodes_
);
// For early delete pass. use GraphNodePool load the unlived vars.
// 1. find all deps op for each unlived var in memory pool.
for
(
auto
&
op
:
graph
->
Nodes
())
{
for
(
auto
&
var
:
op
->
inputs
)
{
if
(
pool_
.
Has
(
var
))
{
pool_
.
Insert
(
var
,
op
);
}
}
}
// 2. convert the ir-node-based memory pool to graph nodes,
// because Node* may be released between passes.
auto
&
graph_pool
=
graph
->
Get
<
GraphNodePool
>
(
kGraphNodePool
);
for
(
auto
it
=
pool_
.
begin
();
it
!=
pool_
.
end
();
++
it
)
{
std
::
unordered_set
<
OpDesc
*>
descs
;
for
(
auto
&
op
:
it
->
second
)
{
PADDLE_ENFORCE
(
op
->
IsOp
());
descs
.
insert
(
op
->
Op
());
}
graph_pool
.
push_back
(
std
::
make_pair
(
it
->
first
->
Name
(),
descs
));
}
return
graph
;
}
void
AnalysisVarPass
::
SubGraphOptimize
(
OpDesc
*
op_desc
)
const
{
// conditional block, while op and their grad op
auto
*
sub_block_desc
=
AttrReader
(
op_desc
->
GetAttrMap
()).
Get
<
BlockDesc
*>
(
"sub_block"
);
// create a mirror block to construct an IR Graph.
ProgramDesc
prog
;
auto
*
copy_block
=
prog
.
MutableBlock
(
0
);
for
(
auto
*
op
:
sub_block_desc
->
AllOps
())
{
auto
*
copy_op
=
copy_block
->
AppendOp
();
copy_op
->
CopyFrom
(
*
op
);
copy_op
->
Flush
();
}
for
(
auto
*
var
:
sub_block_desc
->
AllVars
())
{
auto
*
copy_var
=
copy_block
->
Var
(
var
->
Name
());
copy_var
->
SetDataType
(
var
->
GetDataType
());
// only lod tensor can be reused. So ignore the multiple dims case.
copy_var
->
SetType
(
var
->
GetType
());
copy_var
->
SetShape
(
var
->
GetShape
());
copy_var
->
SetPersistable
(
var
->
Persistable
());
}
ir
::
Graph
sub_graph
(
prog
);
std
::
unordered_set
<
ir
::
Node
*>
sub_graph_all_ops
;
FilterVariables
(
sub_graph
.
Nodes
(),
[
&
](
ir
::
Node
*
var
)
{
// sub_graph_all_ops.emplace(var);
if
(
var
->
IsVar
()
&&
!
var
->
IsCtrlVar
())
{
sub_graph_all_ops
.
emplace
(
var
);
}
});
int
sub_reuse_id
=
0
;
// subgraph nodes are unordered; reuse needs to follow the desc order.
// find the right op node through the descs
for
(
auto
*
sub_op_desc
:
sub_block_desc
->
AllOps
())
{
ir
::
Node
*
sub_op
=
nullptr
;
for
(
auto
*
node
:
sub_graph_all_ops
)
{
if
(
node
->
Op
()
==
sub_op_desc
)
{
sub_op
=
node
;
break
;
}
}
PADDLE_ENFORCE
(
sub_op
!=
nullptr
);
for
(
auto
*
var
:
sub_op
->
outputs
)
{
if
(
NodeCanReused
(
var
))
{
ir
::
Node
*
cache
=
pool_
.
NodeMatch
(
var
);
if
(
cache
!=
nullptr
)
{
if
(
var
->
Var
()
->
GetDataType
()
!=
cache
->
Var
()
->
GetDataType
())
{
continue
;
}
int
node_idx_in_pool
=
pool_
.
GetIndex
(
cache
);
VLOG
(
3
)
<<
string
::
Sprintf
(
"!!! %s, %s => %s, cache idx %d, pool size %d"
,
std
::
to_string
(
sub_reuse_id
++
),
DebugString
(
var
),
DebugString
(
cache
),
node_idx_in_pool
,
static_cast
<
int
>
(
pool_
.
size
()));
// NOTE(dzh): subblock is not in IR graph. Modify the block_desc
// immediately to make the subblock variable reuse strategy take
// effect. Because it is a single op in graph. No need to
// update the ir nodes.
sub_op_desc
->
Rename
(
var
->
Name
(),
cache
->
Name
());
if
(
sub_op_desc
->
Block
()
->
HasVar
(
var
->
Name
()))
{
sub_op_desc
->
Block
()
->
RemoveVar
(
var
->
Name
());
}
}
}
}
}
}
std
::
unordered_set
<
std
::
string
>
AnalysisVarPass
::
GetSubBlockVars
(
const
std
::
unordered_set
<
ir
::
Node
*>&
nodes
)
const
{
std
::
unordered_set
<
std
::
string
>
vars
;
for
(
auto
&
op
:
nodes
)
{
if
(
!
op
->
IsOp
()
||
op
->
Op
()
==
nullptr
)
continue
;
auto
*
op_desc
=
op
->
Op
();
if
(
OpHasSubBlock
(
op_desc
))
{
auto
inputs
=
op_desc
->
InputArgumentNames
();
auto
outputs
=
op_desc
->
OutputArgumentNames
();
vars
.
insert
(
inputs
.
begin
(),
inputs
.
end
());
vars
.
insert
(
outputs
.
begin
(),
outputs
.
end
());
}
}
return
vars
;
}
void
AnalysisVarPass
::
RenameVarInGraphDesc
(
const
std
::
string
&
var
,
const
std
::
string
&
cache_var
,
size_t
idx
)
const
{
for
(
size_t
i
=
idx
;
i
<
cfg_
->
Ops
().
size
();
++
i
)
{
auto
*
op
=
cfg_
->
Ops
()[
i
];
PADDLE_ENFORCE
(
op
->
IsOp
()
&&
op
->
Op
());
auto
*
op_desc
=
op
->
Op
();
op_desc
->
RenameInput
(
var
,
cache_var
);
op_desc
->
RenameOutput
(
var
,
cache_var
);
if
(
op_desc
->
Block
()
->
HasVar
(
var
))
op_desc
->
Block
()
->
RemoveVar
(
var
);
op_desc
->
Flush
();
}
}
void
AnalysisVarPass
::
InitSSAGraphNodes
()
const
{
std
::
unordered_map
<
std
::
string
,
std
::
unordered_set
<
ir
::
Node
*>>
all_vars
;
if
(
var_nodes_
.
empty
())
{
for
(
auto
*
op
:
cfg_
->
Ops
())
{
for
(
auto
*
node
:
op
->
inputs
)
{
if
(
all_vars
[
node
->
Name
()].
count
(
node
)
==
0
)
{
all_vars
[
node
->
Name
()].
emplace
(
node
);
var_nodes_
[
node
->
Name
()].
emplace_back
(
node
);
}
}
for
(
auto
*
node
:
op
->
outputs
)
{
if
(
all_vars
[
node
->
Name
()].
count
(
node
)
==
0
)
{
all_vars
[
node
->
Name
()].
emplace
(
node
);
var_nodes_
[
node
->
Name
()].
emplace_back
(
node
);
}
}
}
}
}
void
AnalysisVarPass
::
RenameVarInGraphNode
(
const
std
::
string
&
var
,
const
std
::
string
&
cache_var
,
size_t
idx
,
ir
::
Graph
*
graph
)
const
{
// if a replacement happens, we need to create a newer version of cache_var
// that uses the same dims/data_type as var.
PADDLE_ENFORCE
(
var_nodes_
[
var
].
size
()
>=
1
&&
var_nodes_
[
var
].
at
(
0
)
->
Var
()
!=
nullptr
);
std
::
unique_ptr
<
VarDesc
>
var_desc
(
new
VarDesc
(
*
var_nodes_
[
var
].
at
(
0
)
->
Var
()));
var_desc
->
SetName
(
cache_var
);
for
(
size_t
i
=
idx
;
i
<
cfg_
->
Ops
().
size
();
++
i
)
{
auto
*
op
=
cfg_
->
Ops
()[
i
];
// redirect the input to the latest version of cache_var
for
(
auto
*
node
:
op
->
inputs
)
{
if
(
node
->
Name
()
==
var
)
{
ir
::
Node
*
cache_node
=
graph
->
CreateVarNode
(
var_desc
.
get
());
var_nodes_
[
cache_var
].
emplace_back
(
cache_node
);
// swap node to cache_node
cache_node
->
outputs
.
insert
(
cache_node
->
outputs
.
end
(),
node
->
outputs
.
begin
(),
node
->
outputs
.
end
());
PADDLE_ENFORCE
(
node
->
inputs
.
size
()
==
1
&&
node
->
inputs
[
0
]
->
IsOp
());
auto
*
prev_op
=
node
->
inputs
[
0
];
std
::
replace
(
prev_op
->
outputs
.
begin
(),
prev_op
->
outputs
.
end
(),
node
,
cache_node
);
cache_node
->
inputs
.
emplace_back
(
prev_op
);
for
(
auto
*
next_op
:
node
->
outputs
)
{
std
::
replace
(
next_op
->
inputs
.
begin
(),
next_op
->
inputs
.
end
(),
node
,
cache_node
);
}
}
}
// if we need to rename the output,
// always create a newer version of cache_var
for
(
auto
*
node
:
op
->
outputs
)
{
if
(
node
->
Name
()
==
var
)
{
ir
::
Node
*
cache_node
=
graph
->
CreateVarNode
(
var_desc
.
get
());
var_nodes_
[
cache_var
].
emplace_back
(
cache_node
);
// swap node to cache node
cache_node
->
outputs
.
insert
(
cache_node
->
outputs
.
end
(),
node
->
outputs
.
begin
(),
node
->
outputs
.
end
());
cache_node
->
inputs
.
emplace_back
(
op
);
std
::
replace
(
op
->
outputs
.
begin
(),
op
->
outputs
.
end
(),
node
,
cache_node
);
for
(
auto
*
next_op
:
node
->
outputs
)
{
std
::
replace
(
next_op
->
inputs
.
begin
(),
next_op
->
inputs
.
end
(),
node
,
cache_node
);
}
}
}
}
// release node of unused var in graph
for
(
auto
*
node
:
var_nodes_
[
var
])
{
graph
->
RemoveNode
(
node
);
}
var_nodes_
.
at
(
var
).
clear
();
}
bool
AnalysisVarPass
::
NodeCanReused
(
ir
::
Node
*
node
)
const
{
if
(
!
node
->
IsVar
()
||
node
->
IsCtrlVar
())
return
false
;
auto
*
desc
=
node
->
Var
();
auto
type
=
desc
->
GetType
();
if
(
desc
->
Persistable
()
||
type
!=
proto
::
VarType
::
LOD_TENSOR
||
desc
->
GetShape
().
empty
())
{
return
false
;
}
// vars can be @EMPTY@, @LR_DECAY_REUSE_ID@. For example, while_grad
std
::
string
name
=
node
->
Name
();
if
(
!
name
.
empty
()
&&
name
[
0
]
==
'@'
&&
name
[
name
.
size
()
-
1
]
==
'@'
)
return
false
;
if
(
skip_set_
.
count
(
name
))
return
false
;
for
(
auto
*
op
:
node
->
inputs
)
{
if
(
op
->
Op
()
->
HasAttr
(
"force_cpu"
))
{
// the op output is forced to be generated on CPU and cannot be reused.
return
framework
::
AttrReader
(
op
->
Op
()
->
GetAttrMap
())
.
Get
<
bool
>
(
"force_cpu"
)
==
0
;
}
}
return
true
;
}
bool
AnalysisVarPass
::
OpHasSubBlock
(
OpDesc
*
desc
)
const
{
const
AttributeMap
&
attrs
=
desc
->
GetAttrMap
();
for
(
auto
&
attr
:
attrs
)
{
if
(
attr
.
second
.
type
()
==
typeid
(
BlockDesc
*
)
||
// NOLINT
attr
.
second
.
type
()
==
typeid
(
std
::
vector
<
BlockDesc
*>
))
// NOLINT
return
true
;
}
return
false
;
}
using
paddle
::
framework
::
VarDesc
;
std
::
vector
<
ir
::
Node
*>
SortOpLikeDescOrder
(
const
ir
::
Graph
&
graph
)
{
PADDLE_ENFORCE
(
graph
.
Has
(
kAllOpDescs
),
...
...
@@ -479,6 +114,193 @@ std::vector<ir::Node*> SortOpLikeDescOrder(const ir::Graph& graph) {
return
ret
;
}
size_t NodeSize(const VarDesc& node) {
  auto shape = node.GetShape();
  int size =
      std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int>());
  size_t type_size = SizeOfType(node.GetDataType());
  return type_size * std::abs(size);
}
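The size rule used by `NodeSize` can be tried outside Paddle. The following is a minimal standalone sketch, assuming a shape is a plain `std::vector<int64_t>` and the element size is passed in directly. `TensorBytes` is a hypothetical name, not a Paddle function, and it accumulates in 64 bits rather than the `int` used in the diff. A leading `-1` (batch size known only at runtime) makes the product negative, so the absolute value gives the per-sample byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper mirroring NodeSize:
// bytes = |product of all dims| * element size.
// A -1 batch dimension flips the sign of the product, hence the absolute value.
inline std::size_t TensorBytes(const std::vector<int64_t>& shape,
                               std::size_t type_size) {
  assert(!shape.empty());  // the pass never sizes a var with an empty shape
  const int64_t numel = std::accumulate(shape.begin(), shape.end(), int64_t{1},
                                        std::multiplies<int64_t>());
  return type_size * static_cast<std::size_t>(std::llabs(numel));
}
```

Under these assumptions, a `[-1, 10]` tensor of 4-byte elements and a `[1, 10]` tensor report the same size, which is why the pool compares batch-size classes separately before it compares sizes.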
size_t
NodeSize
(
ir
::
Node
*
n
)
{
auto
*
desc
=
FindVarDescInBlock
(
n
);
return
NodeSize
(
*
desc
);
}
std
::
string
DebugStringImpl
(
VarDesc
*
var
)
{
std
::
stringstream
ss
;
ss
<<
var
->
Name
();
ss
<<
"["
;
try
{
auto
shape
=
var
->
GetShape
();
for
(
size_t
i
=
0
;
i
<
shape
.
size
();
++
i
)
{
if
(
i
!=
shape
.
size
()
-
1
)
{
ss
<<
shape
[
i
]
<<
","
;
}
else
{
ss
<<
shape
[
i
];
}
}
ss
<<
"]"
;
}
catch
(...)
{
ss
<<
"Var has no VarDesc !!! Name:"
<<
var
->
Name
();
}
return
ss
.
str
();
}
std
::
string
DebugString
(
ir
::
Node
*
var
)
{
return
DebugStringImpl
(
FindVarDescInBlock
(
var
));
}
// NOTE(dzh): judging by the ir node alone, if a large node has been reused
// by a smaller node, then the next time it appears in the pool it will
// have the smaller size. Find the original node shape from the BlockDesc.
VarDesc* FindVarDescInBlock(ir::Node* n) {
  PADDLE_ENFORCE(n->IsVar() && !n->IsCtrlVar() && n->inputs.size() == 1);
  BlockDesc* block = n->inputs[0]->Op()->Block();
  PADDLE_ENFORCE(block->HasVar(n->Name()),
                 string::Sprintf("Block does not have var %s", n->Name()));
  return block->FindVar(n->Name());
}
struct NodeComparator {
  bool operator()(ir::Node* lhs, ir::Node* rhs) const {
    auto* lhs_desc = FindVarDescInBlock(lhs);
    auto* rhs_desc = FindVarDescInBlock(rhs);
    auto lhs_shape = lhs_desc->GetShape();
    auto rhs_shape = rhs_desc->GetShape();
    if ((lhs_shape[0] == -1 && rhs_shape[0] == -1) ||
        (lhs_shape[0] != -1 && rhs_shape[0] != -1)) {
      return NodeSize(lhs) <= NodeSize(rhs);
    } else {
      return false;
    }
  }
};
void OrderedSet::Insert(ir::Node* var) {
  PADDLE_ENFORCE(var->IsVar() && !var->IsCtrlVar());
  if (mark_table_.count(var->Name()) != 0) {
    mark_table_[var->Name()]->emplace_back(var);
    return;
  }
  auto* var_desc = FindVarDescInBlock(var);
  auto var_shape = var_desc->GetShape();
  int batch_size = static_cast<int>(var_shape[0]);
  NodeComparator functor;
  Iter it = nodes_.begin();
  while (it != nodes_.end()) {
    auto& prev = it->front();
    auto* cache_desc = FindVarDescInBlock(prev);
    int cache_batch_size = cache_desc->GetShape()[0];
    if ((cache_batch_size == -1 && batch_size == -1) ||
        (cache_batch_size != -1 && batch_size != -1)) {
      if (functor(prev, var)) {
        ++it;
      } else {
        break;
      }
    } else if (cache_batch_size == -1 && batch_size != -1) {
      ++it;
    } else if (cache_batch_size != -1 && batch_size == -1) {
      break;
    }
  }
  it = nodes_.insert(it, {var});
  mark_table_[var->Name()] = it;
}
int
OrderedSet
::
GetNodeIndexInPool
(
ir
::
Node
*
var
)
{
return
std
::
distance
(
nodes_
.
begin
(),
mark_table_
[
var
->
Name
()]);
}
ir::Node* OrderedSet::FindBestFitNode(ir::Node* var) const {
  ir::Node* found_node = nullptr;
  NodeComparator functor;
  for (auto it = nodes_.begin(); it != nodes_.end(); ++it) {
    auto& candidate = it->front();
    if (functor(var, candidate)) {
      found_node = candidate;
      break;
    }
  }
  return found_node;
}
bool
OrderedSet
::
Has
(
ir
::
Node
*
var
)
const
{
if
(
mark_table_
.
count
(
var
->
Name
()))
{
auto
&
node_in_samename
=
mark_table_
.
at
(
var
->
Name
());
auto
iter
=
std
::
find_if
(
node_in_samename
->
begin
(),
node_in_samename
->
end
(),
[
&
](
ir
::
Node
*
n
)
{
return
n
->
Name
()
==
var
->
Name
();
});
return
iter
!=
node_in_samename
->
end
();
}
return
false
;
}
void
OrderedSet
::
Erase
(
ir
::
Node
*
var
)
{
PADDLE_ENFORCE
(
mark_table_
.
count
(
var
->
Name
()));
nodes_
.
erase
(
mark_table_
[
var
->
Name
()]);
mark_table_
.
erase
(
var
->
Name
());
}
std
::
string
OrderedSet
::
ToString
()
const
{
std
::
stringstream
ss
;
for
(
auto
it
=
nodes_
.
begin
();
it
!=
nodes_
.
end
();
++
it
)
{
for
(
auto
&
node
:
*
it
)
{
ss
<<
DebugString
(
node
)
<<
" "
;
}
}
return
ss
.
str
();
}
bool
NodeCanReused
(
ir
::
Node
*
node
)
{
// valid the node is a var node
if
(
node
==
nullptr
||
!
node
->
IsVar
()
||
node
->
IsCtrlVar
())
return
false
;
bool
flag
=
true
;
// op output force generated in cpu, can not be reused.
for
(
auto
*
op
:
node
->
inputs
)
{
if
(
op
->
Op
()
->
HasAttr
(
"force_cpu"
))
{
flag
&=
framework
::
AttrReader
(
op
->
Op
()
->
GetAttrMap
())
.
Get
<
bool
>
(
"force_cpu"
)
==
0
;
}
}
// var desc validation.
flag
&=
NodeCanReused
(
*
node
->
Var
());
return
flag
;
}
bool NodeCanReused(const VarDesc& node) {
  auto type = node.GetType();
  if (!(type == proto::VarType::LOD_TENSOR ||
        type == proto::VarType::SELECTED_ROWS ||
        type == proto::VarType::LOD_TENSOR_ARRAY)) {
    return false;
  }
  if (node.Persistable() || node.GetShape().empty()) {
    return false;
  }
  // vars can be @EMPTY@, @LR_DECAY_REUSE_ID@. For example, while_grad
  std::string name = node.Name();
  if (!name.empty() && name[0] == '@' && name[name.size() - 1] == '@')
    return false;
  return true;
}
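The descriptor-level checks in `NodeCanReused(const VarDesc&)` can be sketched without Paddle types. This is only an illustration: `VarInfo`, `IsReservedVarName` and `CanReuse` are hypothetical names, the struct holds just the fields the checks read, and the variable-type test (LOD_TENSOR, SELECTED_ROWS, LOD_TENSOR_ARRAY) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the few VarDesc fields the checks read.
struct VarInfo {
  std::string name;
  std::vector<int64_t> shape;
  bool persistable;
};

// True for names wrapped in '@', such as "@EMPTY@" or "@LR_DECAY_REUSE_ID@".
inline bool IsReservedVarName(const std::string& name) {
  return !name.empty() && name.front() == '@' && name.back() == '@';
}

// A variable is a reuse candidate only if it is not persistable,
// has a known (non-empty) shape, and does not carry a reserved name.
inline bool CanReuse(const VarInfo& v) {
  if (v.persistable || v.shape.empty()) return false;
  return !IsReservedVarName(v.name);
}
```

Keeping the name test in its own function makes it easy to check the edge cases separately: an empty name is not reserved, while a name consisting of a single `@` is, exactly as in the index-based test in the diff.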
bool
OpHasSubBlock
(
OpDesc
*
desc
)
{
const
AttributeMap
&
attrs
=
desc
->
GetAttrMap
();
for
(
auto
&
attr
:
attrs
)
{
if
(
attr
.
second
.
type
()
==
typeid
(
BlockDesc
*
)
||
// NOLINT
attr
.
second
.
type
()
==
typeid
(
std
::
vector
<
BlockDesc
*>
))
// NOLINT
return
true
;
}
return
false
;
}
ControlFlowGraph
::
ControlFlowGraph
(
const
ir
::
Graph
&
graph
)
{
ops_
=
SortOpLikeDescOrder
(
graph
);
ConnectNodes
();
...
...
@@ -630,7 +452,7 @@ const std::vector<ir::Node*> ControlFlowGraph::Ops() const { return ops_; }
std
::
vector
<
ir
::
Node
*>&
ControlFlowGraph
::
Ops
()
{
return
ops_
;
}
-ir::Node* ControlFlowGraph::GetNodeFromVarName(const std::string& name,
+ir::Node* ControlFlowGraph::GetNodeByName(const std::string& name,
                                           ir::Node* op) const {
// in ssa-graph, different version nodes have same name,
// this function get the latest version var before target op
...
...
@@ -650,7 +472,3 @@ ir::Node* ControlFlowGraph::GetNodeFromVarName(const std::string& name,
}
// namespace details
}
// namespace framework
}
// namespace paddle
REGISTER_PASS
(
analysis_var_pass
,
paddle
::
framework
::
details
::
AnalysisVarPass
)
.
RequireGraphAttr
(
paddle
::
framework
::
details
::
kGraphNodePool
)
.
RequireGraphAttr
(
paddle
::
framework
::
details
::
kAllOpDescs
);
paddle/fluid/framework/details/memory_reuse_types.h → paddle/fluid/framework/details/memory_optimize_helper.h
View file @ 88d3dc94
...
...
@@ -17,6 +17,8 @@
#include <iostream>
#include <iterator>
#include <list>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>
...
...
@@ -27,37 +29,41 @@ namespace paddle {
namespace
framework
{
namespace
details
{
constexpr
char
kFetchedVars
[]
=
"fetched_vars"
;
constexpr
char
kGraphNodePool
[]
=
"graph_node_pool"
;
constexpr
char
kAllOpDescs
[]
=
"all_op_descs"
;
// NOTE(dzh): Variable and the operators use the var.
// for early delete pass.
// Because analysis var pass build base on ir::Node, which maybe released
// or modified between passes, so we use OpDesc* to mark ops.
using
GraphNodePool
=
std
::
vector
<
std
::
pair
<
std
::
string
/*var node*/
,
std
::
unordered_set
<
OpDesc
*>
/* ops */
>>
;
std
::
vector
<
ir
::
Node
*>
SortOpLikeDescOrder
(
const
ir
::
Graph
&
graph
);
// NOTE(dzh): by default, it sort node in ascend order(by node bytes size).
// in fluid, -1 means the batch_size is determined in runtime.
// the node batch_size equal -1 always ranking in the front than the node not.
// NOTE(dzh): An ordered set for node reuse in memory optimization.
// The OrderedSet sorts nodes in ascending order (by node size in bytes).
// In fluid, -1 denotes the batch_size, which is determined at runtime.
// So reuse only happens between nodes whose batch_size is -1 in both,
// or in neither.
//
// sort rule:
// rule 0 : smaller nodes rank in front.
// rule 1 : nodes with batch_size equal to -1 rank in front of nodes without.
//
// For example,
// node0[-1, 1] node1[-1, 1, 1], node2[1,1], node3[1,1024], ..
// O(1) delete; insert scans the list to find its position.
class
OrderedNodePairPool
{
public:
using
NodePair
=
std
::
pair
<
ir
::
Node
*
,
std
::
unordered_set
<
ir
::
Node
*>>
;
using
Iter
=
typename
std
::
list
<
NodePair
>::
iterator
;
using
ConstIter
=
typename
std
::
list
<
NodePair
>::
const_iterator
;
void
Insert
(
ir
::
Node
*
var
,
ir
::
Node
*
op
);
class
OrderedSet
{
public:
// nodes with same name exists in pool.
using
NodeVector
=
std
::
vector
<
ir
::
Node
*>
;
using
Iter
=
typename
std
::
list
<
NodeVector
>::
iterator
;
using
ConstIter
=
typename
std
::
list
<
NodeVector
>::
const_iterator
;
void
Insert
(
ir
::
Node
*
var
);
void
Erase
(
ir
::
Node
*
var
);
bool
Has
(
ir
::
Node
*
var
)
{
return
mark_table_
.
count
(
var
->
Name
());
}
ir
::
Node
*
NodeMatch
(
ir
::
Node
*
var
)
const
;
bool
Has
(
ir
::
Node
*
var
)
const
;
void
Clear
()
{
mark_table_
.
clear
();
nodes_
.
clear
();
}
// find the bestfit shape node block with var.
ir
::
Node
*
FindBestFitNode
(
ir
::
Node
*
var
)
const
;
 // the map stores non-const iterators, so this method can not be const
-int GetIndex(ir::Node* var);
+int GetNodeIndexInPool(ir::Node* var);
// pool all node to string
std
::
string
ToString
()
const
;
...
...
@@ -65,23 +71,112 @@ class OrderedNodePairPool {
Iter
end
()
{
return
nodes_
.
end
();
}
ConstIter
begin
()
const
{
return
nodes_
.
begin
();
}
ConstIter
end
()
const
{
return
nodes_
.
end
();
}
size_t
size
()
const
{
return
nodes_
.
size
();
}
private:
// for searching.
std
::
unordered_map
<
std
::
string
,
Iter
>
mark_table_
;
-// node swap pairs. var -> ops dep var
-std::list<NodePair> nodes_;
+// node pool
+std::list<NodeVector> nodes_;
};
class
ControlFlowGraph
{
public:
ControlFlowGraph
()
=
default
;
// IR Graph
explicit
ControlFlowGraph
(
const
ir
::
Graph
&
graph
);
void
LiveVariableAnalysis
();
void
RenameVarInCFGGraph
(
const
std
::
string
&
old_node
,
const
std
::
string
&
new_node
,
int
begin_idx
);
const
std
::
set
<
std
::
string
>
LiveIn
(
ir
::
Node
*
op
)
const
;
const
std
::
set
<
std
::
string
>
LiveOut
(
ir
::
Node
*
op
)
const
;
const
std
::
set
<
std
::
string
>
Use
(
ir
::
Node
*
op
)
const
;
const
std
::
vector
<
ir
::
Node
*>
Ops
()
const
;
std
::
vector
<
ir
::
Node
*>&
Ops
();
// for ssa-graph nodes
ir
::
Node
*
GetNodeByName
(
const
std
::
string
&
name
,
ir
::
Node
*
op
)
const
;
private:
void
BuildCFGGraph
();
void
ConnectNodes
();
using
NodeListMap
=
std
::
unordered_map
<
ir
::
Node
*
,
std
::
set
<
ir
::
Node
*>>
;
using
VarSetMap
=
std
::
map
<
ir
::
Node
*
,
std
::
set
<
std
::
string
>>
;
// successors ops use the output variables.
NodeListMap
successors_
;
// predecessors ops generated input variables.
NodeListMap
predecessors_
;
// variables lived before run current op.
VarSetMap
live_in_
;
// variables lived after run current op.
VarSetMap
live_out_
;
VarSetMap
uses_
;
// op inputs
VarSetMap
defs_
;
// op outputs
std
::
vector
<
ir
::
Node
*>
ops_
;
// op sequence by topology sort
};
// valid a tensor can be reuse or not
bool
NodeCanReused
(
ir
::
Node
*
node
);
// valid a tensor can be reuse or not.
bool
NodeCanReused
(
const
VarDesc
&
node
);
// check op has subblock or not
bool
OpHasSubBlock
(
OpDesc
*
desc
);
// node memory size in bytes
size_t
NodeSize
(
ir
::
Node
*
n
);
// node memory size in bytes
size_t
NodeSize
InBytes
(
ir
::
Node
*
n
);
size_t
NodeSize
(
const
VarDesc
&
);
std
::
string
DebugString
(
ir
::
Node
*
var
);
// std::string DebugString(VarDesc* var);
// NOTE(dzhwinter)
// after node reuse, the replaced node shape is
// different with its VarDesc. So need to find the
// correct VarDesc in Block.
VarDesc
*
FindVarDescInBlock
(
ir
::
Node
*
n
);
static
inline
bool
IsSameDesc
(
OpDesc
*
op1
,
OpDesc
*
op2
)
{
return
op1
->
Type
()
==
op2
->
Type
()
&&
op1
->
Inputs
()
==
op2
->
Inputs
()
&&
op1
->
Outputs
()
==
op2
->
Outputs
();
}
template
<
typename
Container
,
typename
Callback
>
class
FilterVariableImpl
{
public:
void
operator
()(
const
Container
&
nodes
,
Callback
callback
)
{
for
(
auto
*
node
:
nodes
)
{
callback
(
node
);
}
}
};
// filter var node for op->inputs/outputs
template
<
typename
Callback
>
class
FilterVariableImpl
<
std
::
vector
<
ir
::
Node
*>
,
Callback
>
{
public:
void
operator
()(
const
std
::
vector
<
ir
::
Node
*>&
nodes
,
Callback
callback
)
{
for
(
auto
*
var
:
nodes
)
{
if
(
var
->
IsVar
()
&&
!
var
->
IsCtrlVar
())
{
callback
(
var
);
}
}
}
};
template
<
typename
Container
,
typename
Callback
>
void
FilterVariables
(
const
Container
&
nodes
,
Callback
callback
)
{
FilterVariableImpl
<
Container
,
Callback
>
()(
nodes
,
callback
);
}
}
// namespace details
}
// namespace framework
}
// namespace paddle
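The two sort rules and the best-fit lookup of `OrderedSet` can be illustrated with a much smaller structure. The sketch below is not the Paddle class: `Block` and `SimplePool` are hypothetical names, sizes are given directly instead of being read from a `VarDesc`, and the handling of several nodes sharing one name is omitted. It keeps dynamic-batch entries in front, sorts each class by ascending size, and returns the first entry of the same class that is large enough.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for a reusable variable node.
struct Block {
  std::string name;
  bool dynamic_batch;  // true when shape[0] == -1
  std::size_t bytes;   // per-sample size when dynamic_batch is true
};

class SimplePool {
 public:
  // Linear scan: dynamic-batch blocks stay in front,
  // and each class is kept in ascending size order.
  void Insert(const Block& b) {
    assert(index_.count(b.name) == 0);
    auto it = blocks_.begin();
    while (it != blocks_.end()) {
      if (it->dynamic_batch == b.dynamic_batch) {
        if (it->bytes <= b.bytes) {
          ++it;
        } else {
          break;
        }
      } else if (it->dynamic_batch) {
        ++it;  // skip past the dynamic-batch class
      } else {
        break;  // b is dynamic, *it is static: insert before it
      }
    }
    index_[b.name] = blocks_.insert(it, b);
  }

  // The name index stores the list iterator, so no scan is needed here.
  void Erase(const std::string& name) {
    auto found = index_.find(name);
    assert(found != index_.end());
    blocks_.erase(found->second);
    index_.erase(found);
  }

  // First block of the same batch class that is at least as large as `want`.
  const Block* FindBestFit(const Block& want) const {
    for (const auto& b : blocks_) {
      if (b.dynamic_batch == want.dynamic_batch && b.bytes >= want.bytes) {
        return &b;
      }
    }
    return nullptr;
  }

  int IndexOf(const std::string& name) const {
    std::list<Block>::const_iterator pos = index_.at(name);
    return static_cast<int>(std::distance(blocks_.begin(), pos));
  }

  std::size_t size() const { return blocks_.size(); }

 private:
  std::list<Block> blocks_;
  std::unordered_map<std::string, std::list<Block>::iterator> index_;
};
```

A `std::list` is used for the same reason it appears in the diff: inserting or erasing one element leaves the iterators of all other elements valid, so the name index never needs to be rebuilt.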
paddle/fluid/framework/details/analysis_var_pass_test.cc → paddle/fluid/framework/details/memory_optimize_helper_test.cc
View file @ 88d3dc94
...
...
@@ -12,12 +12,18 @@
// See the License for the specific language governing permissions and
// limitations under the License.
-#include "paddle/fluid/framework/details/analysis_var_pass.h"
+#include "paddle/fluid/framework/details/memory_optimize_helper.h"
#include <algorithm>
#include <iostream>
#include <iterator>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
#include "glog/logging.h"
#include "gtest/gtest.h"
#include "paddle/fluid/framework/details/graph_test_base.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/graph_helper.h"
#include "paddle/fluid/framework/op_registry.h"
...
...
@@ -26,46 +32,82 @@
namespace
paddle
{
namespace
framework
{
namespace
details
{
TEST
(
OrderedSet
,
Normal
)
{
OrderedSet
pool
;
std
::
vector
<
std
::
unique_ptr
<
ir
::
Node
>>
nodes
;
// clang-format off
std
::
vector
<
std
::
vector
<
int64_t
>>
shapes
=
{{
-
1
,
10
},
{
-
1
,
20
},
{
1
,
2
},
{
5
,
2
},
{
10
,
20
},
{
-
1
,
2
,
5
},
{
-
1
,
1
,
5
},
{
-
1
,
1
}};
// clang-format on
const
int
COUNT
=
shapes
.
size
();
ProgramDesc
prog
;
BlockDesc
*
block_desc
=
prog
.
MutableBlock
(
0
);
auto
*
op_desc
=
block_desc
->
AppendOp
();
op_desc
->
SetType
(
"dummy"
);
std
::
unique_ptr
<
ir
::
Node
>
op
=
ir
::
CreateNodeForTest
(
op_desc
);
for
(
int
i
=
0
;
i
<
COUNT
;
++
i
)
{
auto
desc
=
block_desc
->
Var
(
std
::
to_string
(
i
));
desc
->
SetShape
(
shapes
[
i
]);
std
::
unique_ptr
<
ir
::
Node
>
node
=
ir
::
CreateNodeForTest
(
desc
);
node
->
inputs
.
emplace_back
(
op
.
get
());
nodes
.
emplace_back
(
std
::
move
(
node
));
}
// Insert
for
(
auto
&
node
:
nodes
)
{
pool
.
Insert
(
node
.
get
());
}
// Has/size
ASSERT_EQ
(
pool
.
size
(),
shapes
.
size
());
for
(
auto
&
node
:
nodes
)
{
ASSERT_TRUE
(
pool
.
Has
(
node
.
get
()));
}
class
DummyOp
:
public
OperatorBase
{
public:
DummyOp
(
const
std
::
string
&
type
,
const
VariableNameMap
&
inputs
,
const
VariableNameMap
&
outputs
,
const
AttributeMap
&
attrs
)
:
OperatorBase
(
type
,
inputs
,
outputs
,
attrs
)
{}
private:
void
RunImpl
(
const
Scope
&
scope
,
const
platform
::
Place
&
place
)
const
override
{}
};
class
SumOpMaker
:
public
OpProtoAndCheckerMaker
{
public:
void
Make
()
{
AddInput
(
"X"
,
""
).
AsDuplicable
();
AddOutput
(
"Out"
,
""
);
AddComment
(
""
);
}
};
class
AssignOpMaker
:
public
OpProtoAndCheckerMaker
{
public:
void
Make
()
{
AddInput
(
"X"
,
""
).
AsDuplicable
();
AddOutput
(
"Out"
,
""
);
AddComment
(
""
);
}
};
class
DummyVarTypeInference
:
public
VarTypeInference
{
public:
void
operator
()(
const
OpDesc
&
op_desc
,
BlockDesc
*
block
)
const
override
{
auto
&
inputs
=
op_desc
.
Input
(
"X"
);
auto
type
=
block
->
Var
(
inputs
.
front
())
->
GetType
();
auto
out_var_name
=
op_desc
.
Output
(
"Out"
).
front
();
block
->
Var
(
out_var_name
)
->
SetType
(
type
);
}
};
// assert its order and interface.
std
::
cout
<<
pool
.
ToString
()
<<
std
::
endl
;
pool
.
Erase
(
nodes
.
front
().
get
());
std
::
cout
<<
pool
.
ToString
()
<<
std
::
endl
;
ASSERT_EQ
(
pool
.
size
(),
static_cast
<
size_t
>
(
COUNT
-
1
));
ASSERT_EQ
(
pool
.
GetNodeIndexInPool
(
nodes
.
back
().
get
()),
0
);
{
auto
v1
=
block_desc
->
Var
(
"11"
);
v1
->
SetShape
({
-
1
,
256
,
56
,
56
});
std
::
unique_ptr
<
ir
::
Node
>
node1
=
ir
::
CreateNodeForTest
(
v1
);
node1
->
inputs
.
emplace_back
(
op
.
get
());
auto
*
cache
=
pool
.
FindBestFitNode
(
node1
.
get
());
ASSERT_EQ
(
cache
,
nullptr
);
}
{
auto
v2
=
block_desc
->
Var
(
"12"
);
v2
->
SetShape
({
-
1
,
2
,
5
});
std
::
unique_ptr
<
ir
::
Node
>
node1
=
ir
::
CreateNodeForTest
(
v2
);
node1
->
inputs
.
emplace_back
(
op
.
get
());
auto
*
cache
=
pool
.
FindBestFitNode
(
node1
.
get
());
ASSERT_EQ
(
pool
.
GetNodeIndexInPool
(
cache
),
2
);
// match 6:[-1,2,5]
}
{
auto
v3
=
block_desc
->
Var
(
"13"
);
v3
->
SetShape
({
2
,
5
});
std
::
unique_ptr
<
ir
::
Node
>
node1
=
ir
::
CreateNodeForTest
(
v3
);
node1
->
inputs
.
emplace_back
(
op
.
get
());
auto
*
cache
=
pool
.
FindBestFitNode
(
node1
.
get
());
ASSERT_EQ
(
pool
.
GetNodeIndexInPool
(
cache
),
5
);
// match 4:[5,2]
}
}
}
// namespace details
}
// namespace framework
}
// namespace paddle
...
...
@@ -102,11 +144,6 @@ namespace paddle {
namespace
framework
{
namespace
details
{
static
inline
bool
IsSameDesc
(
OpDesc
*
op1
,
OpDesc
*
op2
)
{
return
op1
->
Type
()
==
op2
->
Type
()
&&
op1
->
Inputs
()
==
op2
->
Inputs
()
&&
op1
->
Outputs
()
==
op2
->
Outputs
();
}
inline
static
ProgramDesc
FillProgramDesc
()
{
ProgramDesc
prog
;
prog
.
MutableBlock
(
0
)
->
Var
(
"a"
)
->
SetType
(
proto
::
VarType
::
LOD_TENSOR
);
...
...
@@ -141,15 +178,6 @@ inline static ProgramDesc FillProgramDesc() {
return
prog
;
}
template
<
typename
Container
>
inline
static
std
::
string
DebugString
(
const
Container
&
c
)
{
std
::
stringstream
ss
;
for
(
auto
&
item
:
c
)
{
ss
<<
item
<<
" "
;
}
return
ss
.
str
();
}
TEST
(
CFGGraph
,
IRGraph
)
{
// prepare ir graph
auto
prog
=
FillProgramDesc
();
...
...
paddle/fluid/framework/details/memory_optimize_pass.cc
0 → 100644
View file @ 88d3dc94
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/details/memory_optimize_pass.h"
#include <algorithm>
#include <atomic>
#include <deque>
#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <queue>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>
#include "gflags/gflags.h"
#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/graph_helper.h"
DEFINE_bool(enable_subgraph_optimize, false,
            "SubGraph also reuses global graph variables. It will reduce the "
            "memory occupation, "
            "but with a higher risk of memory reuse errors. Default disabled.");
DEFINE_string(memory_optimize_debug, "",
              "Debug the operator output variable when doing variable reuse "
              "in the memory reuse pass. "
              "Only for debug, default disabled.");
namespace
paddle
{
namespace
framework
{
namespace
details
{
std
::
unique_ptr
<
ir
::
Graph
>
MemoryOptimizePass
::
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
{
auto
nodes
=
graph
->
Nodes
();
CollectSkipVarsSet
(
nodes
);
cfg_
.
reset
(
new
details
::
ControlFlowGraph
(
*
graph
));
cfg_
->
LiveVariableAnalysis
();
InitSSAGraphNodes
();
int
reuse_id
=
0
;
for
(
size_t
idx
=
0
;
idx
<
cfg_
->
Ops
().
size
();
++
idx
)
{
auto
&
op
=
cfg_
->
Ops
()[
idx
];
auto
*
op_desc
=
op
->
Op
();
// some op in graph has no op desc
if
(
op_desc
==
nullptr
)
continue
;
if
(
OpHasSubBlock
(
op_desc
))
{
if
(
FLAGS_enable_subgraph_optimize
)
{
SubGraphOptimize
(
op_desc
);
}
else
{
VLOG
(
3
)
<<
op
->
Name
()
<<
" has subblock, but disable subgraph optimize. skipped."
;
continue
;
}
}
for
(
auto
&
var
:
op
->
outputs
)
{
if
(
!
NodeCanReused
(
var
)
||
cfg_
->
Use
(
op
).
count
(
var
->
Name
())
==
0
||
skip_set_
.
count
(
var
->
Name
()))
continue
;
ir
::
Node
*
cache
=
pool_
.
FindBestFitNode
(
var
);
if
(
var
->
Name
()
==
FLAGS_memory_optimize_debug
)
{
VLOG
(
3
)
<<
"start match var "
<<
DebugString
(
var
)
<<
" of op "
<<
op
->
Name
();
VLOG
(
3
)
<<
pool_
.
ToString
();
VLOG
(
3
)
<<
"matched in pool : "
<<
((
cache
==
nullptr
)
?
"False"
:
"True"
);
}
if
(
cache
==
nullptr
)
continue
;
if
(
var
->
Name
()
==
cache
->
Name
())
{
VLOG(3) << "The same cache variable is cascade reused. "
        << var->Name() << " is re-filled to the pool after "
        << "the reused op is finished. Current op can not "
        << "replace it again. Skip this candidate.";
continue
;
int
node_idx_in_pool
=
pool_
.
GetNodeIndexInPool
(
cache
);
VLOG
(
3
)
<<
string
::
Sprintf
(
"!!! %s, %s => %s, cache idx %d, pool size %d"
,
std
::
to_string
(
reuse_id
++
),
DebugString
(
var
),
DebugString
(
cache
),
node_idx_in_pool
,
static_cast
<
int
>
(
pool_
.
size
()));
// update CFG Graph on the fly.
// reused var maybe re-fill into the pool
cfg_
->
RenameVarInCFGGraph
(
var
->
Name
(),
cache
->
Name
(),
idx
);
// NOTE(dzhwinter): we need to both update the ProgramDesc
// and IR Graph. because op_desc/var_desc is used in CreateOp,
// CreateVar when running happens. But IR Graph
// define the dependence relationship between nodes.
RenameVarInGraphDesc
(
var
->
Name
(),
cache
->
Name
(),
idx
);
RenameVarInGraphNode
(
var
->
Name
(),
cache
->
Name
(),
idx
,
graph
.
get
());
pool_
.
Erase
(
cache
);
}
// fill the pool
std
::
unordered_set
<
std
::
string
>
unlived_vars
;
for
(
auto
var
:
cfg_
->
LiveIn
(
op
))
{
if
(
cfg_
->
LiveOut
(
op
).
count
(
var
)
==
0
)
{
unlived_vars
.
emplace
(
var
);
}
}
for
(
auto
var
:
unlived_vars
)
{
ir
::
Node
*
var_node
=
cfg_
->
GetNodeByName
(
var
,
op
);
if
(
NodeCanReused
(
var_node
)
&&
!
pool_
.
Has
(
var_node
))
{
pool_
.
Insert
(
var_node
);
}
}
}
}
graph
->
ResolveHazard
(
var_nodes_
);
return
graph
;
}
void
MemoryOptimizePass
::
SubGraphOptimize
(
OpDesc
*
op_desc
)
const
{
// conditional block, while op and their grad op
auto
*
sub_block_desc
=
AttrReader
(
op_desc
->
GetAttrMap
()).
Get
<
BlockDesc
*>
(
"sub_block"
);
// create a mirror block to construct an IR Graph.
ProgramDesc
prog
;
auto
*
copy_block
=
prog
.
MutableBlock
(
0
);
for
(
auto
*
op
:
sub_block_desc
->
AllOps
())
{
auto
*
copy_op
=
copy_block
->
AppendOp
();
copy_op
->
CopyFrom
(
*
op
);
copy_op
->
Flush
();
}
for
(
auto
*
var
:
sub_block_desc
->
AllVars
())
{
auto
*
copy_var
=
copy_block
->
Var
(
var
->
Name
());
copy_var
->
SetDataType
(
var
->
GetDataType
());
// only lod tensor can be reused. So ignore the multiple dims case.
copy_var
->
SetType
(
var
->
GetType
());
copy_var
->
SetShape
(
var
->
GetShape
());
copy_var
->
SetPersistable
(
var
->
Persistable
());
}
ir
::
Graph
sub_graph
(
prog
);
std
::
unordered_set
<
ir
::
Node
*>
sub_graph_all_ops
;
FilterVariables
(
sub_graph
.
Nodes
(),
[
&
](
ir
::
Node
*
var
)
{
// sub_graph_all_ops.emplace(var);
if
(
var
->
IsVar
()
&&
!
var
->
IsCtrlVar
())
{
sub_graph_all_ops
.
emplace
(
var
);
}
});
int
sub_reuse_id
=
0
;
// subgraph nodes is unordered, reuse need to follow the desc order.
// find the right op node through the descs
for
(
auto
*
sub_op_desc
:
sub_block_desc
->
AllOps
())
{
ir
::
Node
*
sub_op
=
nullptr
;
for
(
auto
*
node
:
sub_graph_all_ops
)
{
if
(
node
->
Op
()
==
sub_op_desc
)
{
sub_op
=
node
;
break
;
}
}
PADDLE_ENFORCE
(
sub_op
!=
nullptr
);
for
(
auto
*
var
:
sub_op
->
outputs
)
{
if
(
NodeCanReused
(
var
))
{
ir
::
Node
*
cache
=
pool_
.
FindBestFitNode
(
var
);
if
(
cache
!=
nullptr
)
{
if
(
var
->
Var
()
->
GetDataType
()
!=
cache
->
Var
()
->
GetDataType
())
{
continue
;
}
int
node_idx_in_pool
=
pool_
.
GetNodeIndexInPool
(
cache
);
VLOG
(
3
)
<<
string
::
Sprintf
(
"!!! %s, %s => %s, cache idx %d, pool size %d"
,
std
::
to_string
(
sub_reuse_id
++
),
DebugString
(
var
),
DebugString
(
cache
),
node_idx_in_pool
,
static_cast
<
int
>
(
pool_
.
size
()));
// NOTE(dzh): subblock is not in IR graph. Modify the block_desc
// immediately to make the subblock variable reuse strategy take
// effect. Because it is a single op in graph. No need to
// update the ir nodes.
sub_op_desc
->
Rename
(
var
->
Name
(),
cache
->
Name
());
if
(
sub_op_desc
->
Block
()
->
HasVar
(
var
->
Name
()))
{
sub_op_desc
->
Block
()
->
RemoveVar
(
var
->
Name
());
}
}
}
}
}
}
void MemoryOptimizePass::CollectSkipVarsSet(
    const std::unordered_set<ir::Node*>& nodes) const {
  auto update_skip_set = [&](OpDesc* op_desc) {
    auto inputs = op_desc->InputArgumentNames();
    auto outputs = op_desc->OutputArgumentNames();
    skip_set_.insert(inputs.begin(), inputs.end());
    skip_set_.insert(outputs.begin(), outputs.end());
  };
  for (auto& op : nodes) {
    if (!op->IsOp() || op->Op() == nullptr) continue;
    auto* op_desc = op->Op();
    // NOTE(dzhwinter):
    // the current block can not reuse vars of the next-level block.
    if (OpHasSubBlock(op_desc)) update_skip_set(op_desc);
    // NOTE(dzhwinter):
    // the input/output names of distributed ops need to
    // stay the same between trainer and pserver.
    if (op_desc->Type() == "send") update_skip_set(op_desc);
    if (op_desc->Type() == "recv") update_skip_set(op_desc);
    if (op_desc->Type() == "prefetch") update_skip_set(op_desc);
  }
}
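The effect of `CollectSkipVarsSet` can be reproduced on plain data. The sketch below assumes a hypothetical `OpInfo` struct in place of `OpDesc` and a free function `CollectSkipVars` in place of the member function; only the selection rule (ops with a sub-block, plus `send`, `recv` and `prefetch`) is taken from the diff.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified op description.
struct OpInfo {
  std::string type;
  bool has_sub_block;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Collects every input/output name of ops whose variables must keep
// their names: ops owning a sub-block and the distributed ops.
inline std::unordered_set<std::string> CollectSkipVars(
    const std::vector<OpInfo>& ops) {
  std::unordered_set<std::string> skip;
  for (const auto& op : ops) {
    const bool distributed =
        op.type == "send" || op.type == "recv" || op.type == "prefetch";
    if (!op.has_sub_block && !distributed) continue;
    skip.insert(op.inputs.begin(), op.inputs.end());
    skip.insert(op.outputs.begin(), op.outputs.end());
  }
  return skip;
}
```

A variable that is touched by both an ordinary op and a `send` op ends up in the set, so it is excluded from reuse everywhere in the block, not only at the distributed op.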
void
MemoryOptimizePass
::
RenameVarInGraphDesc
(
const
std
::
string
&
var
,
const
std
::
string
&
cache_var
,
size_t
idx
)
const
{
for
(
size_t
i
=
idx
;
i
<
cfg_
->
Ops
().
size
();
++
i
)
{
auto
*
op
=
cfg_
->
Ops
()[
i
];
PADDLE_ENFORCE
(
op
->
IsOp
()
&&
op
->
Op
());
auto
*
op_desc
=
op
->
Op
();
op_desc
->
RenameInput
(
var
,
cache_var
);
op_desc
->
RenameOutput
(
var
,
cache_var
);
if
(
op_desc
->
Block
()
->
HasVar
(
var
))
op_desc
->
Block
()
->
RemoveVar
(
var
);
op_desc
->
Flush
();
}
}
void
MemoryOptimizePass
::
InitSSAGraphNodes
()
const
{
std
::
unordered_map
<
std
::
string
,
std
::
unordered_set
<
ir
::
Node
*>>
all_vars
;
if
(
var_nodes_
.
empty
())
{
for
(
auto
*
op
:
cfg_
->
Ops
())
{
for
(
auto
*
node
:
op
->
inputs
)
{
if
(
all_vars
[
node
->
Name
()].
count
(
node
)
==
0
)
{
all_vars
[
node
->
Name
()].
emplace
(
node
);
var_nodes_
[
node
->
Name
()].
emplace_back
(
node
);
}
}
for
(
auto
*
node
:
op
->
outputs
)
{
if
(
all_vars
[
node
->
Name
()].
count
(
node
)
==
0
)
{
all_vars
[
node
->
Name
()].
emplace
(
node
);
var_nodes_
[
node
->
Name
()].
emplace_back
(
node
);
}
}
}
}
}
void
MemoryOptimizePass
::
RenameVarInGraphNode
(
const
std
::
string
&
var
,
const
std
::
string
&
cache_var
,
size_t
idx
,
ir
::
Graph
*
graph
)
const
{
// if replace happens, we need to create a newer version cache_var
// but use the same dims/data_type with var.
PADDLE_ENFORCE
(
var_nodes_
[
var
].
size
()
>=
1
&&
var_nodes_
[
var
].
at
(
0
)
->
Var
()
!=
nullptr
);
std
::
unique_ptr
<
VarDesc
>
var_desc
(
new
VarDesc
(
*
var_nodes_
[
var
].
at
(
0
)
->
Var
()));
var_desc
->
SetName
(
cache_var
);
for
(
size_t
i
=
idx
;
i
<
cfg_
->
Ops
().
size
();
++
i
)
{
auto
*
op
=
cfg_
->
Ops
()[
i
];
// redirect the input to the latest version of cache_var
for
(
auto
*
node
:
op
->
inputs
)
{
if
(
node
->
Name
()
==
var
)
{
ir
::
Node
*
cache_node
=
graph
->
CreateVarNode
(
var_desc
.
get
());
var_nodes_
[
cache_var
].
emplace_back
(
cache_node
);
// swap node to cache_node
cache_node
->
outputs
.
insert
(
cache_node
->
outputs
.
end
(),
node
->
outputs
.
begin
(),
node
->
outputs
.
end
());
PADDLE_ENFORCE
(
node
->
inputs
.
size
()
==
1
&&
node
->
inputs
[
0
]
->
IsOp
());
auto
*
prev_op
=
node
->
inputs
[
0
];
std
::
replace
(
prev_op
->
outputs
.
begin
(),
prev_op
->
outputs
.
end
(),
node
,
cache_node
);
cache_node
->
inputs
.
emplace_back
(
prev_op
);
for
(
auto
*
next_op
:
node
->
outputs
)
{
std
::
replace
(
next_op
->
inputs
.
begin
(),
next_op
->
inputs
.
end
(),
node
,
cache_node
);
}
}
}
// if we need to rename the output,
// always create a newer version of cache_var
for
(
auto
*
node
:
op
->
outputs
)
{
if
(
node
->
Name
()
==
var
)
{
ir
::
Node
*
cache_node
=
graph
->
CreateVarNode
(
var_desc
.
get
());
var_nodes_
[
cache_var
].
emplace_back
(
cache_node
);
// swap node to cache node
cache_node
->
outputs
.
insert
(
cache_node
->
outputs
.
end
(),
node
->
outputs
.
begin
(),
node
->
outputs
.
end
());
cache_node
->
inputs
.
emplace_back
(
op
);
std
::
replace
(
op
->
outputs
.
begin
(),
op
->
outputs
.
end
(),
node
,
cache_node
);
for
(
auto
*
next_op
:
node
->
outputs
)
{
std
::
replace
(
next_op
->
inputs
.
begin
(),
next_op
->
inputs
.
end
(),
node
,
cache_node
);
}
}
}
}
// release node of unused var in graph
for
(
auto
*
node
:
var_nodes_
[
var
])
{
graph
->
RemoveNode
(
node
);
}
var_nodes_
.
at
(
var
).
clear
();
}
}
// namespace details
}
// namespace framework
}
// namespace paddle
REGISTER_PASS
(
memory_optimize_pass
,
paddle
::
framework
::
details
::
MemoryOptimizePass
)
.
RequireGraphAttr
(
paddle
::
framework
::
details
::
kAllOpDescs
);
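The `// fill the pool` step in `ApplyImpl` returns to the pool every variable that is in `LiveIn(op)` but not in `LiveOut(op)`. The body of `LiveVariableAnalysis` is not shown in this diff, so the sketch below uses the textbook backward liveness equations on a straight-line op sequence as an assumption, not as Paddle's implementation. `OpSets`, `Liveness`, `Analyze` and `DeadAfter` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using VarSet = std::set<std::string>;

// Hypothetical per-op sets: variables read (use) and written (def).
struct OpSets {
  VarSet use;
  VarSet def;
};

struct Liveness {
  std::vector<VarSet> live_in;
  std::vector<VarSet> live_out;
};

// Backward pass over a straight-line sequence (each op has one successor):
//   live_out[i] = live_in[i + 1]
//   live_in[i]  = use[i] + (live_out[i] - def[i])
inline Liveness Analyze(const std::vector<OpSets>& ops) {
  Liveness r;
  r.live_in.resize(ops.size());
  r.live_out.resize(ops.size());
  for (std::size_t i = ops.size(); i-- > 0;) {
    if (i + 1 < ops.size()) r.live_out[i] = r.live_in[i + 1];
    VarSet in = ops[i].use;
    for (const auto& v : r.live_out[i]) {
      if (ops[i].def.count(v) == 0) in.insert(v);
    }
    r.live_in[i] = in;
  }
  return r;
}

// Variables whose last use is op `idx`: live before it, not live after it.
inline VarSet DeadAfter(const Liveness& r, std::size_t idx) {
  assert(idx < r.live_in.size());
  VarSet dead;
  for (const auto& v : r.live_in[idx]) {
    if (r.live_out[idx].count(v) == 0) dead.insert(v);
  }
  return dead;
}
```

In the pass itself each name produced this way is still filtered through `NodeCanReused` and `pool_.Has` before it is inserted, so only a subset of the dead variables becomes reusable.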
paddle/fluid/framework/details/analysis_var_pass.h → paddle/fluid/framework/details/memory_optimize_pass.h
View file @ 88d3dc94
...
...
@@ -25,29 +25,22 @@
#include <vector>
#include "paddle/fluid/framework/data_type.h"
-#include "paddle/fluid/framework/details/memory_reuse_types.h"
+#include "paddle/fluid/framework/details/memory_optimize_helper.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/pass.h"
namespace
paddle
{
namespace
framework
{
namespace
details
{
constexpr
char
kAllOpDescs
[]
=
"all_op_descs"
;
std
::
vector
<
ir
::
Node
*>
SortOpLikeDescOrder
(
const
ir
::
Graph
&
graph
);
// sort op in bfs order
std
::
vector
<
ir
::
Node
*>
BFSSortGraphOps
(
const
ir
::
Graph
&
graph
);
class
ControlFlowGraph
;
-class AnalysisVarPass : public ir::Pass {
+class MemoryOptimizePass : public ir::Pass {
protected:
std
::
unique_ptr
<
ir
::
Graph
>
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
override
;
private:
// fill the variable map(var_nodes) by version.
void
InitSSAGraphNodes
()
const
;
private:
// update program descs
void
RenameVarInGraphDesc
(
const
std
::
string
&
var
,
const
std
::
string
&
cache_var
,
size_t
idx
)
const
;
...
...
@@ -57,17 +50,14 @@ class AnalysisVarPass : public ir::Pass {
ir
::
Graph
*
graph
)
const
;
void
SubGraphOptimize
(
OpDesc
*
op_desc
)
const
;
// valid a tensor can be reuse or not
bool
NodeCanReused
(
ir
::
Node
*
node
)
const
;
// scan subblock and collect the output/input variables.
std
::
unordered_set
<
std
::
string
>
GetSubBlockVars
(
const
std
::
unordered_set
<
ir
::
Node
*>&
)
const
;
// check op has subblock or not
bool
OpHasSubBlock
(
OpDesc
*
desc
)
const
;
// 1. scan op with subblock and collect the output/input vars.
// while, while_grad, conditional_block
// 2. scan distributed ops and collect the output/input vars
void
CollectSkipVarsSet
(
const
std
::
unordered_set
<
ir
::
Node
*>&
)
const
;
private:
 // Reuse Node Pool, Owned.
-mutable OrderedNodePairPool pool_;
+mutable OrderedSet pool_;
// controlflow Graph
mutable
std
::
unique_ptr
<
ControlFlowGraph
>
cfg_
;
// skip set
...
...
@@ -76,45 +66,6 @@ class AnalysisVarPass : public ir::Pass {
   mutable std::map<std::string, std::vector<ir::Node*>> var_nodes_;
 };
-class ControlFlowGraph {
- public:
-  ControlFlowGraph() = default;
-  // For IR Graph in parallelexecutor
-  explicit ControlFlowGraph(const ir::Graph& graph);
-  void LiveVariableAnalysis();
-  void RenameVarInCFGGraph(const std::string& old_node,
-                           const std::string& new_node, int begin_idx);
-  const std::set<std::string> LiveIn(ir::Node* op) const;
-  const std::set<std::string> LiveOut(ir::Node* op) const;
-  const std::set<std::string> Use(ir::Node* op) const;
-  const std::vector<ir::Node*> Ops() const;
-  std::vector<ir::Node*>& Ops();
-  // for ssa-graph nodes
-  ir::Node* GetNodeFromVarName(const std::string& name, ir::Node* op) const;
- private:
-  void BuildCFGGraph();
-  void ConnectNodes();
-  using NodeListMap = std::unordered_map<ir::Node*, std::set<ir::Node*>>;
-  using VarSetMap = std::map<ir::Node*, std::set<std::string>>;
-  // successors ops use the output variables.
-  NodeListMap successors_;
-  // predecessors ops generated input variables.
-  NodeListMap predecessors_;
-  // variables lived before run current op.
-  VarSetMap live_in_;
-  // variables lived after run current op.
-  VarSetMap live_out_;
-  VarSetMap uses_;  // op inputs
-  VarSetMap defs_;  // op outputs
-  std::vector<ir::Node*> ops_;  // op sequence by topology sort
-};
 }  // namespace details
 }  // namespace framework
 }  // namespace paddle
paddle/fluid/framework/details/memory_reuse_types.cc
Deleted
100644 → 0
View file @ f3463ecb
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/details/memory_reuse_types.h"
#include <iostream>
#include <sstream>
#include <string>
namespace paddle {
namespace framework {
namespace details {
size_t NodeSizeInBytes(ir::Node* n) {
  auto* desc = FindVarDescInBlock(n);
  auto shape = desc->GetShape();
  size_t type_size = SizeOfType(desc->GetDataType());
  int size = 1;
  for (auto& s : shape) {
    size *= s;
  }
  return type_size * std::abs(size);
}
std::string DebugStringImpl(VarDesc* var) {
  std::stringstream ss;
  ss << var->Name();
  ss << "[";
  try {
    auto shape = var->GetShape();
    for (size_t i = 0; i < shape.size(); ++i) {
      if (i != shape.size() - 1) {
        ss << shape[i] << ",";
      } else {
        ss << shape[i];
      }
    }
    ss << "]";
  } catch (...) {
    ss << "Var has no VarDesc !!! Name:" << var->Name();
  }
  return ss.str();
}
std::string DebugString(ir::Node* var) {
  return DebugStringImpl(FindVarDescInBlock(var));
}
// return DebugString(var->Var()); }
// NOTE(dzh): based on the ir node, if a large node has been reused
// by a small size node, then the next time it appears in the pool, it will
// have the small size. Find the original node shape from the blockdesc.
VarDesc* FindVarDescInBlock(ir::Node* n) {
  PADDLE_ENFORCE(n->IsVar() && !n->IsCtrlVar() && n->inputs.size() == 1);
  BlockDesc* block = n->inputs[0]->Op()->Block();
  PADDLE_ENFORCE(block->HasVar(n->Name()),
                 string::Sprintf("Block do not has var %s", n->Name()));
  return block->FindVar(n->Name());
}
struct NodeComparator {
  bool operator()(ir::Node* lhs, ir::Node* rhs) const {
    auto* lhs_desc = FindVarDescInBlock(lhs);
    auto* rhs_desc = FindVarDescInBlock(rhs);
    auto lhs_shape = lhs_desc->GetShape();
    auto rhs_shape = rhs_desc->GetShape();
    if ((lhs_shape[0] == -1 && rhs_shape[0] == -1) ||
        (lhs_shape[0] != -1 && rhs_shape[0] != -1)) {
      return NodeSizeInBytes(lhs) <= NodeSizeInBytes(rhs);
    } else {
      return false;
    }
  }
};
void OrderedNodePairPool::Insert(ir::Node* var, ir::Node* op) {
  PADDLE_ENFORCE(var->IsVar() && !var->IsCtrlVar());
  PADDLE_ENFORCE(op->IsOp());
  if (mark_table_.count(var->Name()) != 0) {
    mark_table_[var->Name()]->second.insert(op);
    return;
  }
  auto* var_desc = FindVarDescInBlock(var);
  auto var_shape = var_desc->GetShape();
  int batch_size = static_cast<int>(var_shape[0]);
  NodeComparator compare_node;
  Iter it = nodes_.begin();
  while (it != nodes_.end()) {
    auto* cache_desc = FindVarDescInBlock(it->first);
    int cache_batch_size = cache_desc->GetShape()[0];
    if ((cache_batch_size == -1 && batch_size == -1) ||
        (cache_batch_size != -1 && batch_size != -1)) {
      if (compare_node(it->first, var)) {
        ++it;
      } else {
        break;
      }
    } else if (cache_batch_size == -1 && batch_size != -1) {
      ++it;
    } else if (cache_batch_size != -1 && batch_size == -1) {
      break;
    }
  }
  it = nodes_.insert(
      it, std::make_pair(var, std::unordered_set<ir::Node*>{op}));
  mark_table_[var->Name()] = it;
}
int OrderedNodePairPool::GetIndex(ir::Node* var) {
  return std::distance(nodes_.begin(), mark_table_[var->Name()]);
}
ir::Node* OrderedNodePairPool::NodeMatch(ir::Node* var) const {
  ir::Node* found_node = nullptr;
  NodeComparator compare_node;
  for (auto it = nodes_.begin(); it != nodes_.end(); ++it) {
    if (compare_node(var, it->first)) {
      found_node = it->first;
      break;
    }
  }
  return found_node;
}
void OrderedNodePairPool::Erase(ir::Node* var) {
  PADDLE_ENFORCE(mark_table_.count(var->Name()));
  nodes_.erase(mark_table_[var->Name()]);
  mark_table_.erase(var->Name());
}
std::string OrderedNodePairPool::ToString() const {
  std::stringstream ss;
  for (auto it = nodes_.begin(); it != nodes_.end(); ++it) {
    ss << DebugString(it->first) << " ";
  }
  return ss.str();
}
}  // namespace details
}  // namespace framework
}  // namespace paddle
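The size rule used by NodeSizeInBytes above can be sketched in isolation. This is a minimal sketch, not the Paddle implementation: SizeInBytes is a hypothetical helper that takes the shape and element size directly instead of looking them up through a VarDesc, and it accumulates in int64_t rather than int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for NodeSizeInBytes: the shape and the element size
// are passed in directly (assumption for illustration only).
std::size_t SizeInBytes(const std::vector<int64_t>& shape,
                        std::size_t type_size) {
  int64_t numel = 1;
  for (int64_t dim : shape) {
    numel *= dim;  // a -1 batch dimension makes the product negative
  }
  // Taking the absolute value treats an unknown (-1) batch dimension as 1.
  return type_size * static_cast<std::size_t>(std::llabs(numel));
}
```

Under this rule a variable of shape {-1, 10} and one of shape {5, 2} have the same byte size, which is why the comparator above also checks whether both batch dimensions are -1 before comparing sizes.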
paddle/fluid/framework/details/memory_reuse_types_test.cc
Deleted
100644 → 0
View file @ f3463ecb
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/details/memory_reuse_types.h"
#include <algorithm>
#include <iostream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
#include "glog/logging.h"
#include "gtest/gtest.h"
namespace paddle {
namespace framework {
namespace details {
TEST(OrderedNodePairPool, Normal) {
  OrderedNodePairPool pool;
  std::vector<std::unique_ptr<ir::Node>> nodes;
  // clang-format off
  std::vector<std::vector<int64_t>> shapes = {{-1, 10},
                                              {-1, 20},
                                              {1, 2},
                                              {5, 2},
                                              {10, 20},
                                              {-1, 2, 5},
                                              {-1, 1, 5},
                                              {-1, 1}};
  // clang-format on
  const int COUNT = shapes.size();
  ProgramDesc prog;
  BlockDesc* block_desc = prog.MutableBlock(0);
  auto* op_desc = block_desc->AppendOp();
  op_desc->SetType("dummy");
  std::unique_ptr<ir::Node> op = ir::CreateNodeForTest(op_desc);
  for (int i = 0; i < COUNT; ++i) {
    auto desc = block_desc->Var(std::to_string(i));
    desc->SetShape(shapes[i]);
    std::unique_ptr<ir::Node> node = ir::CreateNodeForTest(desc);
    node->inputs.emplace_back(op.get());
    nodes.emplace_back(std::move(node));
  }
  for (auto& node : nodes) {
    pool.Insert(node.get(), op.get());
  }
  // assert its order and interface.
  std::cout << pool.ToString() << std::endl;
  pool.Erase(nodes.front().get());
  std::cout << pool.ToString() << std::endl;
  ASSERT_EQ(pool.size(), static_cast<size_t>(COUNT - 1));
  ASSERT_EQ(pool.GetIndex(nodes.back().get()), 0);
  {
    auto v1 = block_desc->Var("11");
    v1->SetShape({-1, 256, 56, 56});
    std::unique_ptr<ir::Node> node1 = ir::CreateNodeForTest(v1);
    node1->inputs.emplace_back(op.get());
    auto* cache = pool.NodeMatch(node1.get());
    ASSERT_EQ(cache, nullptr);
  }
  {
    auto v2 = block_desc->Var("12");
    v2->SetShape({-1, 2, 5});
    std::unique_ptr<ir::Node> node1 = ir::CreateNodeForTest(v2);
    node1->inputs.emplace_back(op.get());
    auto* cache = pool.NodeMatch(node1.get());
    ASSERT_EQ(pool.GetIndex(cache), 2);  // match 6:[-1,2,5]
  }
  {
    auto v3 = block_desc->Var("13");
    v3->SetShape({2, 5});
    std::unique_ptr<ir::Node> node1 = ir::CreateNodeForTest(v3);
    node1->inputs.emplace_back(op.get());
    auto* cache = pool.NodeMatch(node1.get());
    ASSERT_EQ(pool.GetIndex(cache), 5);  // match 4:[5,2]
  }
}
}  // namespace details
}  // namespace framework
}  // namespace paddle
paddle/fluid/framework/details/op_registry.h
View file @ 88d3dc94
...
...
@@ -18,6 +18,7 @@ limitations under the License. */
 #include <tuple>
 #include <vector>
 #include "paddle/fluid/framework/grad_op_desc_maker.h"
+#include "paddle/fluid/framework/inplace_op_inference.h"
 #include "paddle/fluid/framework/op_info.h"
 #include "paddle/fluid/framework/op_proto_maker.h"
 #include "paddle/fluid/framework/operator.h"
...
...
@@ -32,7 +33,8 @@ enum OpInfoFillType {
   kOpProtoAndCheckerMaker = 1,
   kGradOpDescMaker = 2,
   kVarTypeInference = 3,
-  kShapeInference = 4
+  kShapeInference = 4,
+  kInplaceOpInference = 5
 };
 template <typename T>
...
...
@@ -48,8 +50,11 @@ struct OpInfoFillTypeID {
?
kVarTypeInference
:
(
std
::
is_base_of
<
InferShapeBase
,
T
>::
value
?
kShapeInference
:
(
std
::
is_base_of
<
InplaceOpInference
,
T
>::
value
?
kInplaceOpInference
:
static_cast
<
OpInfoFillType
>
(
-
1
)))));
-
1
)
)))));
}
};
...
...
@@ -139,6 +144,16 @@ struct OpInfoFiller<T, kShapeInference> {
   }
 };
+template <typename T>
+struct OpInfoFiller<T, kInplaceOpInference> {
+  void operator()(const char* op_type, OpInfo* info) const {
+    info->infer_inplace_ = [](const OpDesc& op_desc, BlockDesc* block) {
+      T infer;
+      return infer(op_desc, block);
+    };
+  }
+};
 }  // namespace details
 }  // namespace framework
...
...
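The registry change above picks a filler by testing, at compile time, which base class a registered type derives from. The following is a minimal sketch of that dispatch idea only; ShapeBase, InplaceBase, FillKind and FillKindOf are hypothetical names for illustration, not the Paddle types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical base classes standing in for InferShapeBase and
// InplaceOpInference (assumption for illustration only).
struct ShapeBase {};
struct InplaceBase {};

enum FillKind { kUnknown = -1, kShape = 4, kInplace = 5 };

// Chooses a tag from the base class of T with nested conditionals,
// falling back to kUnknown when no known base matches.
template <typename T>
constexpr FillKind FillKindOf() {
  return std::is_base_of<ShapeBase, T>::value
             ? kShape
             : (std::is_base_of<InplaceBase, T>::value ? kInplace : kUnknown);
}

struct MyShapeInfer : ShapeBase {};
struct MyInplaceInfer : InplaceBase {};
struct Unrelated {};
```

Adding a new kind then means one more enum value, one more nested conditional, and one more specialization keyed on the tag, which mirrors the three edits this file receives in the diff.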
paddle/fluid/framework/details/parallel_ssa_graph_executor.cc
View file @ 88d3dc94
...
...
@@ -128,7 +128,7 @@ FeedFetchList ParallelSSAGraphExecutor::Run(
     if (pool_) {
       run_futures.emplace_back(pool_->enqueue(std::move(call)));
     } else {
-      fetch_data.emplace_back(std::move(call()));
+      fetch_data.emplace_back(call());
     }
   }
...
...
@@ -137,7 +137,7 @@ FeedFetchList ParallelSSAGraphExecutor::Run(
       if (exception_holder_.IsCaught()) {
         f.wait();
       } else {
-        fetch_data.emplace_back(std::move(f.get()));
+        fetch_data.emplace_back(f.get());
       }
     }
   }
...
...
paddle/fluid/framework/details/sequential_execution_pass.cc
View file @ 88d3dc94
...
...
@@ -17,6 +17,7 @@
 #include <unordered_map>
 #include <unordered_set>
 #include <vector>
+#include "paddle/fluid/framework/details/memory_optimize_helper.h"
 #include "paddle/fluid/framework/op_proto_maker.h"
 namespace paddle {
...
...
paddle/fluid/framework/details/sequential_execution_pass.h
View file @ 88d3dc94
...
...
@@ -21,8 +21,6 @@ namespace paddle {
 namespace framework {
 namespace details {
-constexpr char kAllOpDescs[] = "all_op_descs";
 class SequentialExecutionPass : public ir::Pass {
  protected:
   std::unique_ptr<ir::Graph> ApplyImpl(
...
...
paddle/fluid/framework/feed_fetch_method.cc
View file @ 88d3dc94
...
...
@@ -44,6 +44,7 @@ LoDTensor& GetFetchVariable(const Scope& scope, const std::string& var_name,
   // Since we want to fetch LodTensor from a variable, the variable must
   // be created already.
   Variable* g_fetch_value = scope.FindVar(var_name);
+  PADDLE_ENFORCE_NOT_NULL(g_fetch_value, "%s is not found.", var_name);
   PADDLE_ENFORCE(g_fetch_value->IsType<FeedFetchList>(),
                  "Only %s can be invoked by GetFetchVariable",
                  typeid(FeedFetchList).name());
...
...
paddle/fluid/framework/inplace_op_inference.h
0 → 100644
View file @ 88d3dc94
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <functional>
#include <numeric>
#include <string>
#include <unordered_map>
#include "glog/logging.h"
#include "paddle/fluid/framework/block_desc.h"
#include "paddle/fluid/framework/details/memory_optimize_helper.h"
#include "paddle/fluid/framework/op_desc.h"
#include "paddle/fluid/framework/type_defs.h"
namespace paddle {
namespace framework {
/*
  Inplace inference creates In->Out pairs for an in-place operator.
  If we specify a pair of corresponding names, for example X->Out,
  then Out will reuse X's memory in place. The base class validates
  both variables.
*/
class InplaceOpInference {
 public:
  virtual ~InplaceOpInference() {}
  virtual std::unordered_map<std::string, std::string> operator()(
      const OpDesc& op_desc, BlockDesc* block) const = 0;
};
class InplaceInToOut : public InplaceOpInference {
 public:
  std::unordered_map<std::string, std::string> operator()(
      const OpDesc& op_desc, BlockDesc* block) const {
    std::unordered_map<std::string, std::string> ret;
    auto in_out_var_names_pair = this->Apply(op_desc, block);
    for (auto& pair : in_out_var_names_pair) {
      PADDLE_ENFORCE(!op_desc.Input(pair.first).empty(),
                     string::Sprintf("op %s do not have input of %s!",
                                     op_desc.Type(), pair.first));
      PADDLE_ENFORCE(!op_desc.Output(pair.second).empty(),
                     string::Sprintf("op %s do not have output of %s!",
                                     op_desc.Type(), pair.second));
      auto& in_name = op_desc.Input(pair.first).at(0);
      auto& out_name = op_desc.Output(pair.second).at(0);
      auto in = block->FindRecursiveOrCreateVar(in_name);
      auto out = block->FindRecursiveOrCreateVar(out_name);
      if (TryInplaceInputOutput(in, out)) ret.insert({in_name, out_name});
    }
    return ret;
  }
 protected:
  virtual std::unordered_map<std::string, std::string> Apply(
      const OpDesc& op_desc, BlockDesc* block) const = 0;
  bool TryInplaceInputOutput(const VarDesc& in, const VarDesc& out) const {
    return in.Name() != out.Name() && details::NodeCanReused(in) &&
           details::NodeCanReused(out) &&
           details::NodeSize(out) <= details::NodeSize(in);
  }
};
/*
  In-place In and Out for an operator that has only one input and one
  output, for example an activation op.
*/
class SingleOpInplaceInToOut : public InplaceInToOut {
 protected:
  std::unordered_map<std::string, std::string> Apply(
      const OpDesc& op_desc, BlockDesc* block) const override {
    PADDLE_ENFORCE(!op_desc.InputNames().empty(),
                   "Op inputs must not be empty");
    PADDLE_ENFORCE(!op_desc.OutputNames().empty(),
                   "Op outputs must not be empty");
    auto x_name = op_desc.InputNames().at(0);
    auto out_name = op_desc.OutputNames().at(0);
    return std::unordered_map<std::string, std::string>{{x_name, out_name}};
  }
};
/*
  Gradient op. The in-place output uses its input.
  For example, the Input@Grad->Input reuse strategy.
*/
class GradOpInplaceInToOut : public InplaceInToOut {
 protected:
  std::unordered_map<std::string, std::string> Apply(
      const OpDesc& op_desc, BlockDesc* block) const override {
    std::unordered_map<std::string, std::string> ret;
    std::unordered_set<std::string> output_names(
        op_desc.OutputNames().begin(), op_desc.OutputNames().end());
    for (auto& input_name : op_desc.InputNames()) {
      if (output_names.count(GradVarName(input_name))) {
        ret.insert({input_name, GradVarName(input_name)});
      }
    }
    return ret;
  }
};
}  // namespace framework
}  // namespace paddle
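The name-matching step in GradOpInplaceInToOut::Apply can be sketched on plain strings. This is a minimal sketch under stated assumptions: GradName and PairInputsWithGrads are hypothetical helpers, and the "@GRAD" suffix is assumed here rather than taken from this diff.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Assumption: a gradient name is the forward name plus an "@GRAD" suffix.
std::string GradName(const std::string& name) { return name + "@GRAD"; }

// Pairs every input slot name with its gradient slot name, but only when
// that gradient name is actually among the op's output slot names.
std::unordered_map<std::string, std::string> PairInputsWithGrads(
    const std::vector<std::string>& input_names,
    const std::vector<std::string>& output_names) {
  std::unordered_set<std::string> outputs(output_names.begin(),
                                          output_names.end());
  std::unordered_map<std::string, std::string> pairs;
  for (const auto& in : input_names) {
    if (outputs.count(GradName(in))) {
      pairs.insert({in, GradName(in)});
    }
  }
  return pairs;
}
```

Copying the output names into a set first keeps each lookup constant time on average, the same choice the class above makes with output_names.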
paddle/fluid/framework/inplace_op_inference_test.cc
0 → 100644
View file @ 88d3dc94
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <iterator>
#include <string>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_info.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/var_type_inference.h"
namespace paddle {
namespace framework {
class NOP : public OperatorBase {
 public:
  NOP(const std::string& type, const VariableNameMap& inputs,
      const VariableNameMap& outputs, const AttributeMap& attrs)
      : OperatorBase(type, inputs, outputs, attrs) {}
 private:
  void RunImpl(const Scope& scope,
               const platform::Place& place) const override {}
};
class SingleOpMaker : public OpProtoAndCheckerMaker {
 public:
  void Make() {
    AddInput("X", "").AsDuplicable();
    AddOutput("Out", "");
    AddComment("");
  }
};
class SingleGradOpMaker : public framework::SingleGradOpDescMaker {
 public:
  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
 protected:
  std::unique_ptr<framework::OpDesc> Apply() const override {
    auto* op = new framework::OpDesc();
    op->SetType("single_op_grad");
    op->SetInput("Out", OutputGrad("Out"));
    op->SetOutput(framework::GradVarName("X"), InputGrad("X"));
    return std::unique_ptr<OpDesc>(op);
  }
};
class SingleOpShapeInference : public framework::InferShapeBase {
 public:
  void operator()(framework::InferShapeContext* ctx) const override {
    ctx->HasInput("X");
    ctx->HasOutput("Out");
    ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
  }
};
class SingleGradOpShapeInference : public framework::InferShapeBase {
 public:
  void operator()(framework::InferShapeContext* ctx) const override {
    ctx->HasInput(framework::GradVarName("Out"));
    ctx->HasOutput(framework::GradVarName("X"));
    ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("Out"));
  }
};
class MultiOutOpMaker : public OpProtoAndCheckerMaker {
 public:
  void Make() {
    AddInput("X", "").AsDuplicable();
    AddInput("Y", "").AsDuplicable();
    AddInput("Z", "").AsDuplicable();
    AddOutput("Out", "");
    AddOutput("YOut", "");
    AddOutput("ZOut", "");
    AddOutput("NotReuseOut", "");
    AddComment("");
  }
};
class MultiOutShapeInference : public framework::InferShapeBase {
 public:
  void operator()(framework::InferShapeContext* ctx) const override {
    ctx->ShareDim("X", "Out");
    ctx->ShareDim("Y", "YOut");
    ctx->ShareDim("Z", "ZOut");
  }
};
class MultiGradOpMaker : public framework::SingleGradOpDescMaker {
 public:
  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
 protected:
  std::unique_ptr<framework::OpDesc> Apply() const override {
    auto* op = new framework::OpDesc();
    op->SetType("multi_out_grad");
    op->SetInput("X", Input("X"));
    op->SetOutput(framework::GradVarName("Y"), OutputGrad("YOut"));
    op->SetOutput(framework::GradVarName("X"), OutputGrad("Out"));
    op->SetOutput(framework::GradVarName("Z"), OutputGrad("ZOut"));
    return std::unique_ptr<framework::OpDesc>(op);
  }
};
class MultiOutGradShapeInference : public framework::InferShapeBase {
 public:
  void operator()(framework::InferShapeContext* ctx) const override {
    ctx->SetOutputDim(framework::GradVarName("Y"),
                      ctx->GetInputDim(framework::GradVarName("YOut")));
    ctx->SetOutputDim(framework::GradVarName("X"),
                      ctx->GetInputDim(framework::GradVarName("Out")));
    ctx->SetOutputDim(framework::GradVarName("Z"),
                      ctx->GetInputDim(framework::GradVarName("ZOut")));
  }
};
class MultiOutInplaceInToOut : public framework::InplaceInToOut {
 public:
  using framework::InplaceInToOut::InplaceInToOut;
 protected:
  std::unordered_map<std::string, std::string> Apply(
      const OpDesc& op_desc, BlockDesc* block) const override {
    return std::unordered_map<std::string, std::string>{
        {"X", "Out"}, {"Y", "YOut"}, {"Z", "ZOut"},
    };
  }
};
class MultiOutGradInplaceInToOut : public framework::InplaceInToOut {
 public:
  using framework::InplaceInToOut::InplaceInToOut;
 protected:
  std::unordered_map<std::string, std::string> Apply(
      const OpDesc& op_desc, BlockDesc* block) const override {
    return std::unordered_map<std::string, std::string>{
        {framework::GradVarName("YOut"), framework::GradVarName("Y")},
        {framework::GradVarName("Out"), framework::GradVarName("X")},
        {framework::GradVarName("ZOut"), framework::GradVarName("Z")},
    };
  }
};
}  // namespace framework
}  // namespace paddle
namespace f = paddle::framework;
REGISTER_OPERATOR(single_op, f::NOP, f::SingleOpMaker, f::SingleGradOpMaker,
                  f::SingleOpInplaceInToOut, f::SingleOpShapeInference);
REGISTER_OPERATOR(single_op_grad, f::NOP, f::SingleOpInplaceInToOut,
                  f::SingleGradOpShapeInference);
REGISTER_OPERATOR(multi_out_op, f::NOP, f::MultiOutOpMaker,
                  f::MultiGradOpMaker, f::MultiOutInplaceInToOut,
                  f::MultiOutShapeInference);
REGISTER_OPERATOR(multi_out_grad, f::NOP, f::MultiOutGradInplaceInToOut,
                  f::MultiOutGradShapeInference);
namespace paddle {
namespace framework {
TEST(InferInplace, SingleOpInplaceInToOut) {
  ProgramDesc prog;
  auto* op = prog.MutableBlock(0)->AppendOp();
  op->SetType("single_op");
  op->SetInput("X", {"test2_a", "test2_b", "test2_c"});
  op->SetOutput("Out", {"test2_out"});
  prog.MutableBlock(0)->Var("test2_a")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("test2_a")->SetShape({32, 64});
  prog.MutableBlock(0)->Var("test2_b")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("test2_c")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("test2_out");
  prog.MutableBlock(0)->Var("test2_out")->SetShape({32, 16});
  auto& infer_inplace = OpInfoMap::Instance().Get(op->Type()).infer_inplace_;
  auto in_to_outs = infer_inplace(*op, op->Block());
  EXPECT_EQ(in_to_outs.size(), 1ul);
  auto it = in_to_outs.begin();
  EXPECT_EQ(it->first, "test2_a");
  EXPECT_EQ(it->second, "test2_out");
}
TEST(InferInplace, SingleGradOpInplaceInToOut) {
  ProgramDesc prog;
  auto* op = prog.MutableBlock(0)->AppendOp();
  op->SetType("single_op_grad");
  op->SetInput(GradVarName("Out"), {"test2_out"});
  op->SetOutput(GradVarName("X"), {"test2_a", "test2_b", "test2_c"});
  prog.MutableBlock(0)->Var("test2_a")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("test2_a")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("test2_b")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("test2_c")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("test2_out");
  prog.MutableBlock(0)->Var("test2_out")->SetShape({32, 16});
  auto& infer_inplace = OpInfoMap::Instance().Get(op->Type()).infer_inplace_;
  auto in_to_outs = infer_inplace(*op, op->Block());
  EXPECT_EQ(in_to_outs.size(), 1ul);
  auto it = in_to_outs.begin();
  EXPECT_EQ(it->first, "test2_out");
  EXPECT_EQ(it->second, "test2_a");
}
TEST(InferInplace, MultiOutInplaceInToOut) {
  ProgramDesc prog;
  auto* op = prog.MutableBlock(0)->AppendOp();
  op->SetType("multi_out_op");
  op->SetInput("X", {"a0", "a1"});
  op->SetInput("Y", {"b0"});
  op->SetInput("Z", {"c0", "c1"});
  op->SetOutput("Out", {"o0"});
  op->SetOutput("YOut", {"y0"});
  op->SetOutput("ZOut", {"z0"});
  prog.MutableBlock(0)->Var("a0")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("b0")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("c0")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("c1")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("o0");
  prog.MutableBlock(0)->Var("y0");
  prog.MutableBlock(0)->Var("z0");
  prog.MutableBlock(0)->Var("a0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("b0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("c0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("o0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("y0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("z0")->SetShape({32, 16});
  auto& infer_inplace = OpInfoMap::Instance().Get(op->Type()).infer_inplace_;
  auto in_to_outs = infer_inplace(*op, op->Block());
  EXPECT_EQ(in_to_outs.size(), 3ul);
  std::unordered_map<std::string, std::string> expects = {
      {"a0", "o0"}, {"b0", "y0"}, {"c0", "z0"},
  };
  EXPECT_TRUE(expects == in_to_outs);
}
TEST(InferInplace, MultiGradInplaceInToOut) {
  ProgramDesc prog;
  auto* op = prog.MutableBlock(0)->AppendOp();
  op->SetType("multi_out_grad");
  op->SetInput(GradVarName("Out"), {"o0"});
  op->SetInput(GradVarName("YOut"), {"y0"});
  op->SetInput(GradVarName("ZOut"), {"z0"});
  op->SetOutput(GradVarName("X"), {"a0", "a1"});
  op->SetOutput(GradVarName("Y"), {"b0"});
  op->SetOutput(GradVarName("Z"), {"c0", "c1"});
  prog.MutableBlock(0)->Var("a0")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("b0")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("c0")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("c1")->SetType(proto::VarType::LOD_TENSOR);
  prog.MutableBlock(0)->Var("o0");
  prog.MutableBlock(0)->Var("y0");
  prog.MutableBlock(0)->Var("z0");
  prog.MutableBlock(0)->Var("a0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("b0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("c0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("o0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("y0")->SetShape({32, 16});
  prog.MutableBlock(0)->Var("z0")->SetShape({32, 16});
  auto& infer_inplace = OpInfoMap::Instance().Get(op->Type()).infer_inplace_;
  auto in_to_outs = infer_inplace(*op, op->Block());
  EXPECT_EQ(in_to_outs.size(), 3ul);
  std::unordered_map<std::string, std::string> expects = {
      {"o0", "a0"}, {"y0", "b0"}, {"z0", "c0"},
  };
  EXPECT_TRUE(expects == in_to_outs);
}
}  // namespace framework
}  // namespace paddle
paddle/fluid/framework/ir/graph.cc
View file @ 88d3dc94
...
...
@@ -76,7 +76,7 @@ std::map<std::string, std::vector<ir::Node *>> Graph::InitFromProgram(
       var->inputs.push_back(node);
     }
   }
-  return std::move(var_nodes);
+  return var_nodes;
 }
 void Graph::ResolveHazard(
...
...
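The graph.cc change above drops std::move from a return of a local variable. A minimal sketch of why that is usually preferable, assuming a hypothetical Tracked type that counts move constructions (none of these names come from Paddle). ReturnWithStdMove deliberately keeps the old pattern for comparison, so compilers may warn about a pessimizing move there.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical type that counts how often it is move-constructed.
struct Tracked {
  static int moves;
  std::vector<int> data;
  Tracked() = default;
  Tracked(const Tracked&) = default;
  Tracked(Tracked&& other) noexcept : data(std::move(other.data)) { ++moves; }
};
int Tracked::moves = 0;

Tracked ReturnWithStdMove() {
  Tracked t;
  t.data.push_back(1);
  return std::move(t);  // old pattern: forces a move, rules out copy elision
}

Tracked ReturnPlain() {
  Tracked t;
  t.data.push_back(1);
  return t;  // eligible for copy elision; otherwise moved implicitly
}

// Runs one factory and reports how many move constructions it caused.
int MovesFor(Tracked (*make)()) {
  Tracked::moves = 0;
  Tracked result = make();
  (void)result;
  return Tracked::moves;
}
```

The plain return can never cost more moves than the explicit std::move version, and with copy elision it typically costs none; whether elision happens for a named local is left to the compiler.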
paddle/fluid/framework/ir/graph.h
View file @ 88d3dc94
...
...
@@ -141,7 +141,8 @@ class Graph {
   ir::Node *CreateControlDepVar() {
     // TODO(panyx0718): control var name should be really unique.
     const std::string name = string::Sprintf(
-        "%s@%llu", ir::Node::kControlDepVarName, node_set_.size());
+        "%s@%llu", static_cast<const char *>(ir::Node::kControlDepVarName),
+        num_node_created_);
     auto *x = AddNode(new ir::Node(name, ir::Node::Type::kVariable));
     x->SetId(num_node_created_++);
     return x;
...
...
paddle/fluid/framework/ir/graph_helper.cc
View file @ 88d3dc94
...
...
@@ -52,16 +52,29 @@ bool HasCircleHelper(
     ir::Node *node,
     const std::map<ir::Node *, std::unordered_set<ir::Node *>> &adj_list,
     std::unordered_set<ir::Node *> *visited,
-    std::unordered_set<ir::Node *> *in_trace) {
+    std::unordered_set<ir::Node *> *in_trace,
+    std::vector<std::vector<ir::Node *>> *circles) {
   if (visited->find(node) == visited->end()) {
     visited->insert(node);
     in_trace->insert(node);
     for (ir::Node *in : adj_list.at(node)) {
       if (visited->find(in) == visited->end() &&
-          HasCircleHelper(in, adj_list, visited, in_trace)) {
+          HasCircleHelper(in, adj_list, visited, in_trace, circles)) {
         return true;
       } else if (in_trace->find(in) != in_trace->end()) {
+        if (circles != nullptr) {
+          std::vector<ir::Node *> circle;
+          circle.emplace_back(in);
+          ir::Node *p = in;
+          for (auto &adj : adj_list.at(p)) {
+            if (in_trace->count(adj)) {
+              circle.emplace_back(adj);
+              p = adj;
+            }
+          }
+          circles->emplace_back(circle);
+        }
         return true;
       }
     }
...
...
@@ -71,11 +84,12 @@ bool HasCircleHelper(
 }
 bool HasCircleInternal(
-    const std::map<ir::Node *, std::unordered_set<ir::Node *>> &adj_list) {
+    const std::map<ir::Node *, std::unordered_set<ir::Node *>> &adj_list,
+    std::vector<std::vector<ir::Node *>> *circles) {
   std::unordered_set<ir::Node *> visited;
   std::unordered_set<ir::Node *> in_trace;
   for (auto &adj : adj_list) {
-    if (HasCircleHelper(adj.first, adj_list, &visited, &in_trace)) {
+    if (HasCircleHelper(adj.first, adj_list, &visited, &in_trace, circles)) {
       return true;
     }
   }
...
...
@@ -84,13 +98,18 @@ bool HasCircleInternal(
 }  // namespace
 bool HasCircle(const Graph &graph) {
-  return HasCircleInternal(BuildOperationAdjList(graph));
+  return HasCircleInternal(BuildOperationAdjList(graph), nullptr);
+}
+bool FindCircleSubGraph(const Graph &graph,
+                        std::vector<std::vector<ir::Node *>> *circles) {
+  return HasCircleInternal(BuildOperationAdjList(graph), circles);
 }
 std::vector<ir::Node *> TopologySortOperations(const Graph &graph) {
   std::map<ir::Node *, std::unordered_set<ir::Node *>> adj_list =
       BuildOperationAdjList(graph);
-  PADDLE_ENFORCE(!HasCircleInternal(adj_list));
+  PADDLE_ENFORCE(!HasCircleInternal(adj_list, nullptr));
   std::unordered_set<ir::Node *> visited;
   std::vector<ir::Node *> ret;
   for (auto adj : adj_list) {
...
...
paddle/fluid/framework/ir/graph_helper.h
View file @ 88d3dc94
...
...
@@ -28,6 +28,11 @@ namespace ir {
 // Test if the graph contains circle.
 bool HasCircle(const Graph &graph);
+// Find All Circles for debugging,
+// store all subgraph in circles.
+bool FindCircleSubGraph(const Graph &graph,
+                        std::vector<std::vector<ir::Node *>> *circles);
 size_t GraphNum(const Graph &graph);
 // Topology Sort the operations in the graph from inputs to outputs.
...
...
paddle/fluid/framework/ir/graph_helper_test.cc
View file @ 88d3dc94
...
...
@@ -195,6 +195,17 @@ void BuildTwoGraphs(Graph* g) {
   // v4->outputs.push_back(o5);
 }
+TEST(GraphHelperTest, Circles) {
+  ProgramDesc prog;
+  Graph g(prog);
+  BuildCircleGraph(&g);
+  std::vector<std::vector<ir::Node*>> circles;
+  ASSERT_TRUE(FindCircleSubGraph(g, &circles));
+  ASSERT_EQ(circles.size(), 1UL);
+}
 TEST(GraphHelperTest, GraphNum) {
   ProgramDesc prog;
...
...
paddle/fluid/framework/ir/graph_pattern_detector.cc
View file @
88d3dc94
...
...
@@ -117,11 +117,6 @@ bool GraphPatternDetector::MarkPDNodesInGraph(const ir::Graph &graph) {
// return false;
}
}
for
(
auto
&
item
:
pdnodes2nodes_
)
{
for
(
auto
&
n
:
item
.
second
)
{
GetMarkedNodes
(
const_cast
<
Graph
*>
(
&
graph
)).
insert
(
n
);
}
}
VLOG
(
3
)
<<
pdnodes2nodes_
.
size
()
<<
" nodes marked"
;
return
!
pdnodes2nodes_
.
empty
();
...
...
paddle/fluid/framework/ir/infer_clean_graph_pass.cc
View file @
88d3dc94
...
...
@@ -37,6 +37,7 @@ class InferCleanGraphPass : public FusePassBase {
std
::
unordered_set
<
const
Node
*>
invalid_nodes
;
int
valid_op
=
0
;
for
(
auto
*
node
:
graph
->
Nodes
())
{
PADDLE_ENFORCE_NOT_NULL
(
node
);
if
(
is_valid_node
(
node
))
{
invalid_nodes
.
insert
(
node
);
}
else
if
(
node
->
IsOp
())
{
...
...
paddle/fluid/framework/ir/seqpool_concat_fuse_pass_tester.cc
View file @
88d3dc94
...
...
@@ -164,7 +164,7 @@ ProgramDesc BuildProgramDesc(int num_inputs_of_concat) {
   };
   std::vector<std::string> concat_inputs;
   for (int i = 0; i < num_inputs_of_concat; ++i) {
-    std::string prefix = "seqpool_op_" + i;
+    std::string prefix = "seqpool_op_" + std::to_string(i);
     new_var(prefix + "in");
     new_var(prefix + "out");
     new_var(prefix + "out_unused");
...
...
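The one-line change above matters because adding an `int` to a string literal is pointer arithmetic on a `const char*`, not concatenation. The sketch below shows both behaviors; `MakePrefix` and `SkipsCharacters` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Appends the decimal form of `i`, as the fixed line does.
std::string MakePrefix(int i) {
  return "seqpool_op_" + std::to_string(i);
}

// Reproduces the old expression: `base + i` advances the pointer by `i`
// characters. This is only defined while `i` stays within the literal
// (0..11 here, counting the terminating '\0').
std::string SkipsCharacters(int i) {
  const char* base = "seqpool_op_";
  return base + i;
}
```

With `i == 3`, the first function yields `"seqpool_op_3"` while the second yields `"pool_op_"`, and larger indices would read past the end of the literal.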
paddle/fluid/framework/op_info.h
View file @
88d3dc94
...
...
@@ -38,6 +38,7 @@ struct OpInfo {
OpAttrChecker
*
checker_
{
nullptr
};
InferVarTypeFN
infer_var_type_
;
InferShapeFN
infer_shape_
;
InferInplaceOpFN
infer_inplace_
;
bool
HasOpProtoAndChecker
()
const
{
return
proto_
!=
nullptr
&&
checker_
!=
nullptr
;
...
...
paddle/fluid/framework/operator.cc
View file @
88d3dc94
...
...
@@ -188,14 +188,14 @@ void OperatorBase::Run(const Scope& scope, const platform::Place& place) {
       VLOG(3) << place << " " << DebugStringEx(&scope);
     } catch (platform::EnforceNotMet exception) {
       if (Attrs().count("sub_block") != 0) {
-        throw exception;
+        throw;
       }

       auto& callstack = Attr<std::vector<std::string>>(
           OpProtoAndCheckerMaker::OpCreationCallstackAttrName());

       if (callstack.empty()) {
-        throw exception;
+        throw;
       }
       std::ostringstream sout;
       sout << "Invoke operator " << Type() << " error.\n";
...
...
@@ -206,7 +206,7 @@ void OperatorBase::Run(const Scope& scope, const platform::Place& place) {
       sout << "C++ Callstacks:\n";
       sout << exception.err_str_;
       exception.err_str_ = sout.str();
-      throw exception;
+      throw;
     } catch (...) {
       std::rethrow_exception(std::current_exception());
     }
...
...
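A note on the `throw exception;` to `throw;` change in `OperatorBase::Run`: in standard C++, a bare `throw;` rethrows the exception object currently being handled, not the handler's local variable. When the handler catches by value, edits made to that local copy are therefore not visible to outer handlers. The sketch below demonstrates this with a hypothetical `Error` type; it illustrates the language rule only and makes no claim about how Paddle resolved it.

```cpp
#include <cassert>
#include <string>

// Hypothetical error type standing in for an exception with a message field.
struct Error {
  std::string msg;
};

// Catch by value, edit the copy, then rethrow with a bare `throw;`.
// The outer handler sees the original object, so the edit is lost.
std::string RethrowAfterEditingCopy() {
  try {
    try {
      throw Error{"original"};
    } catch (Error e) {
      e.msg = "edited";
      throw;
    }
  } catch (const Error& e) {
    return e.msg;
  }
  return "";
}

// Catch by reference, edit in place, then rethrow.
// The outer handler sees the edited message.
std::string RethrowAfterEditingReference() {
  try {
    try {
      throw Error{"original"};
    } catch (Error& e) {
      e.msg = "edited";
      throw;
    }
  } catch (const Error& e) {
    return e.msg;
  }
  return "";
}
```

The first function returns `"original"` and the second returns `"edited"`, so a handler that rewrites a message before a bare `throw;` generally wants to catch by reference.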
@@ -589,7 +589,7 @@ class RuntimeInferShapeContext : public InferShapeContext {
public:
RuntimeInferShapeContext
(
const
OperatorBase
&
op
,
const
Scope
&
scope
,
const
RuntimeContext
&
ctx
)
:
op_
(
op
),
scope_
(
scope
),
ctx_
(
ctx
)
{}
:
op_
(
op
),
ctx_
(
ctx
)
{}
bool
HasInput
(
const
std
::
string
&
name
)
const
override
{
// has only one input
...
...
@@ -881,7 +881,6 @@ class RuntimeInferShapeContext : public InferShapeContext {
}
const
OperatorBase
&
op_
;
const
Scope
&
scope_
;
const
RuntimeContext
&
ctx_
;
};
...
...
@@ -990,10 +989,13 @@ void OperatorWithKernel::TransferInplaceVarsBack(
const
Scope
&
transfer_scope
)
const
{
for
(
auto
&
var_name
:
inplace_vars
)
{
VLOG
(
3
)
<<
"share inplace var "
+
var_name
+
" back to it's original scope"
;
auto
*
origin_var
=
scope
.
FindVar
(
var_name
);
PADDLE_ENFORCE_NOT_NULL
(
origin_var
,
"The var[%s] should not be nullptr."
,
var_name
);
auto
*
original_tensor
=
GetMutableLoDTensorOrSelectedRowsValueFromVar
(
scope
.
FindVar
(
var_name
)
);
GetMutableLoDTensorOrSelectedRowsValueFromVar
(
origin_var
);
auto
*
var
=
transfer_scope
.
FindVar
(
var_name
);
PADDLE_ENFORCE
(
var
!=
nullptr
,
"The var[%s] should not be nullptr
"
,
PADDLE_ENFORCE
_NOT_NULL
(
var
,
"The var[%s] should not be nullptr.
"
,
var_name
);
auto
*
transformed_tensor
=
GetLoDTensorOrSelectedRowsValueFromVar
(
*
var
);
original_tensor
->
ShareDataWith
(
*
transformed_tensor
);
...
...
paddle/fluid/framework/operator.h
View file @
88d3dc94
...
...
@@ -222,12 +222,7 @@ class ExecutionContext {
if
(
it
==
ctx_
.
inputs
.
end
())
{
return
{};
}
std
::
vector
<
const
Variable
*>
res
;
res
.
reserve
(
it
->
second
.
size
());
std
::
transform
(
it
->
second
.
begin
(),
it
->
second
.
end
(),
std
::
back_inserter
(
res
),
[
this
](
Variable
*
var
)
{
return
var
;
});
return
res
;
return
{
it
->
second
.
begin
(),
it
->
second
.
end
()};
}
std
::
vector
<
Variable
*>
MultiOutputVar
(
const
std
::
string
&
name
)
const
{
...
...
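The hunk above replaces a `reserve` plus `std::transform` loop with a braced return built from an iterator pair. A small sketch of why that works, using a hypothetical `Variable` struct and a hypothetical `AsConst` helper: the vector's iterator-range constructor converts each `Variable*` to `const Variable*` implicitly while copying.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for framework::Variable.
struct Variable {
  int value;
};

// Builds a vector of pointers-to-const from a vector of mutable pointers.
// The braced return selects the (first, last) range constructor.
std::vector<const Variable*> AsConst(const std::vector<Variable*>& vars) {
  return {vars.begin(), vars.end()};
}
```

The range constructor also sizes the result from the iterator distance for random-access iterators, so the explicit `reserve` call is no longer needed.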
paddle/fluid/framework/parallel_executor.cc
View file @
88d3dc94
...
...
@@ -172,14 +172,6 @@ std::unique_ptr<ir::Graph> ParallelExecutorPrivate::PrepareGCAndRefCnts(
eager_deletion_pass
->
SetNotOwned
(
details
::
kAllPlaces
,
&
places_
);
graph
=
eager_deletion_pass
->
Apply
(
std
::
move
(
graph
));
VLOG
(
10
)
<<
"EagerDeletionPass Applied"
;
if
(
build_strategy_
.
memory_early_delete_
)
{
auto
early_delete_pass
=
ir
::
PassRegistry
::
Instance
().
Get
(
"memory_early_delete_pass"
);
early_delete_pass
->
SetNotOwned
(
details
::
kGarbageCollector
,
&
gcs_
);
graph
=
early_delete_pass
->
Apply
(
std
::
move
(
graph
));
}
VLOG
(
10
)
<<
"MemoryEarlyDeletePass Applied."
;
}
return
graph
;
...
...
@@ -277,6 +269,8 @@ ParallelExecutor::ParallelExecutor(
member_
->
use_cuda_
);
#endif
auto
max_memory_size
=
GetEagerDeletionThreshold
();
VLOG
(
10
)
<<
"Eager Deletion Threshold "
<<
static_cast
<
float
>
(
max_memory_size
)
/
(
1
<<
30
);
if
(
max_memory_size
>=
0
)
{
graph
=
member_
->
PrepareGCAndRefCnts
(
std
::
move
(
graph
),
static_cast
<
size_t
>
(
max_memory_size
));
...
...
@@ -503,6 +497,5 @@ ParallelExecutor::~ParallelExecutor() {
}
// namespace framework
}
// namespace paddle
USE_PASS
(
memory_early_delete_pass
);
USE_PASS
(
reference_count_pass
);
USE_PASS
(
eager_deletion_pass
);
paddle/fluid/framework/type_defs.h
View file @
88d3dc94
...
...
@@ -57,5 +57,8 @@ using InferVarTypeFN =
using
InferShapeFN
=
std
::
function
<
void
(
InferShapeContext
*
)
>
;
using
InplacePair
=
std
::
unordered_map
<
std
::
string
,
std
::
string
>
;
using
InferInplaceOpFN
=
std
::
function
<
InplacePair
(
const
OpDesc
&
,
BlockDesc
*
)
>
;
}
// namespace framework
}
// namespace paddle
paddle/fluid/imperative/CMakeLists.txt
View file @
88d3dc94
if
(
WITH_PYTHON
)
cc_library
(
layer SRCS layer.cc DEPS proto_desc operator device_context blas
)
cc_library
(
tracer SRCS tracer.cc DEPS proto_desc device_context
)
cc_library
(
layer SRCS layer.cc DEPS proto_desc operator device_context blas
pybind
)
cc_library
(
tracer SRCS tracer.cc DEPS proto_desc device_context
pybind
)
cc_library
(
engine SRCS engine.cc
)
endif
()
paddle/fluid/inference/CMakeLists.txt
View file @
88d3dc94
...
...
@@ -58,12 +58,13 @@ if(WIN32)
sep_library
(
paddle_fluid_shared SHARED SRCS
${
SHARED_INFERENCE_SRCS
}
DEPS
${
fluid_modules
}
paddle_fluid_api reset_tensor_array
analysis_config paddle_pass_builder
)
target_link_libraries
(
paddle_fluid_shared shlwapi
)
else
(
WIN32
)
cc_library
(
paddle_fluid_shared SHARED SRCS
${
SHARED_INFERENCE_SRCS
}
DEPS
${
fluid_modules
}
paddle_fluid_api reset_tensor_array
analysis_config paddle_pass_builder
)
endif
()
get_property
(
os_dependency_modules GLOBAL PROPERTY OS_DEPENDENCY_MODULES
)
target_link_libraries
(
paddle_fluid_shared
${
os_dependency_modules
}
)
set_target_properties
(
paddle_fluid_shared PROPERTIES OUTPUT_NAME paddle_fluid
)
if
(
NOT APPLE AND NOT WIN32
)
...
...
paddle/fluid/inference/analysis/ir_pass_manager.cc
View file @
88d3dc94
...
...
@@ -101,7 +101,7 @@ std::unique_ptr<Graph> IRPassManager::Apply(std::unique_ptr<Graph> graph) {
     }
     graph = pass->Apply(std::move(graph));
   }
-  return std::move(graph);
+  return graph;
 }

 framework::proto::ProgramDesc IRPassManager::AcquireProgram(
...
...
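Dropping `std::move` from the return statement above is safe because a by-value function parameter named in a `return` statement is treated as an rvalue automatically. A minimal sketch, with a hypothetical `Graph` struct and `ApplyPass` function standing in for the real types:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for ir::Graph.
struct Graph {
  int num_nodes;
};

// Takes ownership, mutates the graph, and hands ownership back.
// `return graph;` moves the unique_ptr out even though it is move-only.
std::unique_ptr<Graph> ApplyPass(std::unique_ptr<Graph> graph) {
  graph->num_nodes += 1;
  return graph;
}
```

Writing `return std::move(graph);` here would compile as well, but recent GCC and Clang versions can flag it as a redundant move.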
paddle/fluid/inference/analysis/ir_passes/CMakeLists.txt
View file @
88d3dc94
cc_library
(
subgraph_detector SRCS subgraph_detector.cc DEPS proto_desc
)
if
(
WITH_TESTING
)
add_dependencies
(
subgraph_detector gtest
)
endif
()
if
(
WITH_GPU AND TENSORRT_FOUND
)
cc_library
(
tensorrt_subgraph_pass SRCS tensorrt_subgraph_pass.cc DEPS subgraph_detector tensorrt_op_teller
)
...
...
paddle/fluid/inference/analysis/passes/memory_optimize_pass.cc
View file @
88d3dc94
...
...
@@ -18,6 +18,7 @@
#include <limits>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>
#include "paddle/fluid/framework/ir/graph_helper.h"
...
...
@@ -168,7 +169,11 @@ bool FindSuitableTensorToReuse(
       if (!cluster->count(candidate)) continue;
       size_t space = space_table.at(candidate);
-      size_t space_diff = std::abs<size_t>(space - space_required);
+      PADDLE_ENFORCE(
+          space <= std::numeric_limits<std::make_signed<size_t>::type>::max(),
+          "space overload");
+      size_t space_diff =
+          std::abs((std::make_signed<size_t>::type)space - space_required);
       if (space_diff < best_fit.second) {
         best_fit.first = candidate;
         best_fit.second = space_diff;
...
...
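The hunk above is concerned with taking an absolute difference of two `size_t` values, where plain subtraction wraps around instead of going negative. The sketch below shows the wraparound and one alternative that stays in unsigned arithmetic; `WrappedDiff` and `SpaceDiff` are hypothetical names, and the branch-based form is a swapped-in technique rather than the cast used in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Plain unsigned subtraction: wraps modulo 2^N when b > a.
std::size_t WrappedDiff(std::size_t a, std::size_t b) {
  return a - b;
}

// Absolute difference without leaving unsigned arithmetic: always subtract
// the smaller value from the larger one.
std::size_t SpaceDiff(std::size_t space, std::size_t required) {
  return space > required ? space - required : required - space;
}
```

For example, `WrappedDiff(3, 5)` yields the maximum `size_t` value minus one, whereas `SpaceDiff(3, 5)` yields 2.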
paddle/fluid/inference/api/CMakeLists.txt
View file @
88d3dc94
...
...
@@ -52,8 +52,8 @@ cc_test(test_analysis_predictor SRCS analysis_predictor_tester.cc DEPS analysis_
if
(
WITH_ANAKIN AND WITH_MKL
)
# only needed in CI
# compile the libinference_anakin_api.a and anakin.so.
cc_library
(
inference_anakin_api SRCS api.cc api_anakin_engine.cc DEPS anakin_shared anakin_saber mklml zero_copy_tensor_dummy
)
cc_library
(
inference_anakin_api_shared SHARED SRCS api.cc api_anakin_engine.cc DEPS anakin_shared anakin_saber zero_copy_tensor_dummy
)
cc_library
(
inference_anakin_api SRCS api.cc api_anakin_engine.cc DEPS anakin_shared anakin_saber mklml zero_copy_tensor_dummy
device_context
)
cc_library
(
inference_anakin_api_shared SHARED SRCS api.cc api_anakin_engine.cc DEPS anakin_shared anakin_saber zero_copy_tensor_dummy
device_context
)
function
(
anakin_target target_name
)
target_compile_options
(
${
target_name
}
BEFORE PUBLIC
${
ANAKIN_COMPILE_EXTRA_FLAGS
}
)
endfunction
()
...
...
paddle/fluid/inference/api/analysis_predictor.cc
View file @
88d3dc94
...
...
@@ -421,7 +421,7 @@ std::unique_ptr<PaddlePredictor> CreatePaddlePredictor<
   if (!dynamic_cast<AnalysisPredictor *>(predictor.get())->Init(nullptr)) {
     return nullptr;
   }
-  return std::move(predictor);
+  return predictor;
 }

 void AnalysisPredictor::PrepareFeedFetch() {
...
...
paddle/fluid/inference/api/paddle_api.h
View file @
88d3dc94
...
...
@@ -16,6 +16,12 @@
/*! \file paddle_api.h
*/
/*! \mainpage Paddle Inference APIs
* \section intro_sec Introduction
* The Paddle inference library aims to offer an high performance inference SDK
* for Paddle users.
*/
#include <cassert>
#include <memory>
#include <string>
...
...
@@ -34,26 +40,49 @@ enum PaddleDType {
};
 /**
- *\brief Memory menager for PaddleTensor.
+ * \brief Memory manager for `PaddleTensor`.
  *
- *The PaddleBuf holds a buffer for data input or output. The memory can be
- *allocated by user or by PaddleBuf itself, but in any case, the PaddleBuf
- *should be reused for better performance.
+ * The PaddleBuf holds a buffer for data input or output. The memory can be
+ * allocated by user or by PaddleBuf itself, but in any case, the PaddleBuf
+ * should be reused for better performance.
  *
- *For user allocated memory, the following API can be used:
- *- PaddleBuf(void* data, size_t length) to set an external memory by
- *specifying
- * the memory address and length.
- *- Reset(void* data, size_t length) to reset the PaddleBuf with an external
+ * For user allocated memory, the following API can be used:
+ * - PaddleBuf(void* data, size_t length) to set an external memory by
+ * specifying the memory address and length.
+ * - Reset(void* data, size_t length) to reset the PaddleBuf with an external
  *memory.
- *ATTENTION, for user allocated memory, deallocation should be done by users
+ * ATTENTION, for user allocated memory, deallocation should be done by users
  *externally after the program finished. The PaddleBuf won't do any allocation
  *or deallocation.
  *
- *To have the PaddleBuf allocate and manage the memory:
- *- PaddleBuf(size_t length) will allocate a memory of size `length`.
- *- Resize(size_t length) resize the memory to no less than `length`, ATTENTION
+ * To have the PaddleBuf allocate and manage the memory:
+ * - PaddleBuf(size_t length) will allocate a memory of size `length`.
+ * - Resize(size_t length) resize the memory to no less than `length`, ATTENTION
  * if the allocated memory is larger than `length`, nothing will done.
+ *
+ * Usage:
+ *
+ * Let PaddleBuf manage the memory internally.
+ * \code{cpp}
+ * const int num_elements = 128;
+ * PaddleBuf buf(num_elements * sizeof(float));
+ * \endcode
+ *
+ * Or
+ * \code{cpp}
+ * PaddleBuf buf;
+ * buf.Resize(num_elements * sizeof(float));
+ * \endcode
+ * Works the exactly the same.
+ *
+ * One can also make the `PaddleBuf` use the external memory.
+ * \code{cpp}
+ * PaddleBuf buf;
+ * void* external_memory = new float[num_elements];
+ * buf.Reset(external_memory, num_elements*sizeof(float));
+ * ...
+ * delete[] external_memory; // manage the memory lifetime outside.
+ * \endcode
  */
class
PaddleBuf
{
public:
...
...
@@ -78,7 +107,7 @@ class PaddleBuf {
/** Tell whether the buffer is empty.
*/
bool
empty
()
const
{
return
length_
==
0
;
}
/** Get the memory address.
/** Get the
data's
memory address.
*/
void
*
data
()
const
{
return
data_
;
}
/** Get the memory length.
...
...
@@ -110,7 +139,8 @@ struct PaddleTensor {
};
enum
class
PaddlePlace
{
kUNK
=
-
1
,
kCPU
,
kGPU
};
/** Tensor without copy, currently only supports AnalysisPredictor.
/** Tensor without copy, currently only supports `AnalysisPredictor`.
*/
class
ZeroCopyTensor
{
public:
...
...
@@ -269,9 +299,11 @@ struct NativeConfig : public PaddlePredictor::Config {
*
* Usage:
*
* \code{.cpp}
* NativeConfig config;
* ... // change the configs.
* auto native_predictor = CreatePaddlePredictor(config);
* \endcode
*
* FOR EXTENSION DEVELOPER:
* Different predictors are designated by config type. Similar configs can be
...
...
paddle/fluid/inference/api/paddle_pass_builder.cc
View file @
88d3dc94
...
...
@@ -66,8 +66,54 @@ void GpuPassStrategy::EnableMKLDNN() {
LOG
(
ERROR
)
<<
"GPU not support MKLDNN yet"
;
}
GpuPassStrategy
::
GpuPassStrategy
()
:
PassStrategy
({})
{
passes_
.
assign
({
"infer_clean_graph_pass"
,
//
"identity_scale_op_clean_pass"
,
//
"conv_affine_channel_fuse_pass"
,
//
"conv_eltwiseadd_affine_channel_fuse_pass"
,
//
"conv_bn_fuse_pass"
,
//
#if CUDNN_VERSION >= 7100 // To run conv_fusion, the version of cudnn must be
// guaranteed at least v7
"conv_elementwise_add_act_fuse_pass"
,
//
"conv_elementwise_add2_act_fuse_pass"
,
//
"conv_elementwise_add_fuse_pass"
,
//
#endif
});
for
(
int
i
=
6
;
i
>=
3
;
i
--
)
{
passes_
.
push_back
(
"transpose_flatten"
+
std
::
to_string
(
i
)
+
"_concat_fuse_pass"
);
}
use_gpu_
=
true
;
}
void
PaddlePassBuilder
::
AppendAnalysisPass
(
const
std
::
string
&
pass
)
{
analysis_passes_
.
push_back
(
pass
);
}
CpuPassStrategy
::
CpuPassStrategy
()
:
PassStrategy
({})
{
// NOTE the large fusions should be located in the front, so that they will
// not be damaged by smaller ones.
passes_
.
assign
({
"infer_clean_graph_pass"
,
//
"attention_lstm_fuse_pass"
,
//
"seqpool_concat_fuse_pass"
,
//
"seqconv_eltadd_relu_fuse_pass"
,
//
// "embedding_fc_lstm_fuse_pass", //
"fc_lstm_fuse_pass"
,
//
"mul_lstm_fuse_pass"
,
//
"fc_gru_fuse_pass"
,
//
"mul_gru_fuse_pass"
,
//
"seq_concat_fc_fuse_pass"
,
//
"fc_fuse_pass"
,
//
"repeated_fc_relu_fuse_pass"
,
//
"squared_mat_sub_fuse_pass"
,
//
"conv_bn_fuse_pass"
,
//
"conv_eltwiseadd_bn_fuse_pass"
,
//
"is_test_pass"
,
//
"identity_scale_op_clean_pass"
,
//
});
use_gpu_
=
false
;
}
}
// namespace paddle
paddle/fluid/inference/api/paddle_pass_builder.h
View file @
88d3dc94
...
...
@@ -97,30 +97,7 @@ class PassStrategy : public PaddlePassBuilder {
*/
class
CpuPassStrategy
:
public
PassStrategy
{
public:
CpuPassStrategy
()
:
PassStrategy
({})
{
// NOTE the large fusions should be located in the front, so that they will
// not be damaged by smaller ones.
passes_
.
assign
({
"infer_clean_graph_pass"
,
//
"attention_lstm_fuse_pass"
,
//
"seqpool_concat_fuse_pass"
,
//
"seqconv_eltadd_relu_fuse_pass"
,
//
// "embedding_fc_lstm_fuse_pass", //
"fc_lstm_fuse_pass"
,
//
"mul_lstm_fuse_pass"
,
//
"fc_gru_fuse_pass"
,
//
"mul_gru_fuse_pass"
,
//
"seq_concat_fc_fuse_pass"
,
//
"fc_fuse_pass"
,
//
"repeated_fc_relu_fuse_pass"
,
//
"squared_mat_sub_fuse_pass"
,
//
"conv_bn_fuse_pass"
,
//
"conv_eltwiseadd_bn_fuse_pass"
,
//
"is_test_pass"
,
//
"identity_scale_op_clean_pass"
,
//
});
use_gpu_
=
false
;
}
CpuPassStrategy
();
explicit
CpuPassStrategy
(
const
CpuPassStrategy
&
other
)
:
PassStrategy
(
other
.
AllPasses
())
{}
...
...
@@ -153,27 +130,7 @@ class CpuPassStrategy : public PassStrategy {
*/
class
GpuPassStrategy
:
public
PassStrategy
{
public:
GpuPassStrategy
()
:
PassStrategy
({})
{
passes_
.
assign
({
"infer_clean_graph_pass"
,
//
"identity_scale_op_clean_pass"
,
//
"conv_affine_channel_fuse_pass"
,
//
"conv_eltwiseadd_affine_channel_fuse_pass"
,
//
"conv_bn_fuse_pass"
,
//
#if CUDNN_VERSION >= 7100 // To run conv_fusion, the version of cudnn must be
// guaranteed at least v7
"conv_elementwise_add_act_fuse_pass"
,
//
"conv_elementwise_add2_act_fuse_pass"
,
//
"conv_elementwise_add_fuse_pass"
,
//
#endif
});
for
(
int
i
=
6
;
i
>=
3
;
i
--
)
{
passes_
.
push_back
(
"transpose_flatten"
+
std
::
to_string
(
i
)
+
"_concat_fuse_pass"
);
}
use_gpu_
=
true
;
}
GpuPassStrategy
();
explicit
GpuPassStrategy
(
const
GpuPassStrategy
&
other
)
:
PassStrategy
(
other
.
AllPasses
())
{
...
...
paddle/fluid/inference/utils/benchmark_tester.cc
View file @
88d3dc94
...
...
@@ -34,6 +34,6 @@ TEST(Benchmark, PersistToFile) {
   benchmark.SetLatency(220);
   benchmark.PersistToFile("1.log");
-  benchmark.PersistToFile("1.log");
-  benchmark.PersistToFile("1.log");
+  benchmark.PersistToFile("2.log");
+  benchmark.PersistToFile("3.log");
 }
paddle/fluid/memory/allocation/allocator_facade.cc
View file @
88d3dc94
...
...
@@ -83,7 +83,7 @@ class ChunkedAllocator : public Allocator {
VLOG
(
1
)
<<
"Create AutoIncrementAllocator with chunk_size "
<<
max_chunk_size_
<<
" and capacity "
<<
capacity
;
default_allocator_
=
std
::
make_shared
<
AutoIncrementAllocator
>
(
[
this
]
{
return
std
::
move
(
CreateAllocatorWithChunk
()
);
},
capacity
);
[
this
]
{
return
CreateAllocatorWithChunk
(
);
},
capacity
);
}
}
...
...
paddle/fluid/memory/allocation/best_fit_allocator.cc
View file @
88d3dc94
...
...
@@ -111,6 +111,8 @@ size_t BestFitAllocator::NumFreeChunks() const {
}
void
BestFitAllocator
::
Free
(
Allocation
*
allocation
)
{
auto
*
bf_allocation
=
dynamic_cast
<
BestFitAllocation
*>
(
allocation
);
PADDLE_ENFORCE_NOT_NULL
(
bf_allocation
,
"The input allocation is not BestFitAllocation."
);
auto
chunk_it
=
bf_allocation
->
ChunkIterator
();
PADDLE_ENFORCE
(
!
chunk_it
->
is_free
);
chunk_it
->
is_free
=
true
;
...
...
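The added `PADDLE_ENFORCE_NOT_NULL` in `BestFitAllocator::Free` guards against a failed `dynamic_cast`, which returns a null pointer when the object is not of the requested type. A minimal sketch of the same pattern, with hypothetical class names and a standard exception in place of Paddle's enforce macro:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class hierarchy; the base needs a virtual member so that
// dynamic_cast can inspect the dynamic type.
struct Allocation {
  virtual ~Allocation() = default;
};
struct BestFitAllocation : Allocation {
  bool is_free = false;
};
struct OtherAllocation : Allocation {};

// Marks a best-fit allocation as free, rejecting any other allocation type
// before the pointer is dereferenced.
void FreeChunk(Allocation* allocation) {
  auto* bf_allocation = dynamic_cast<BestFitAllocation*>(allocation);
  if (bf_allocation == nullptr) {
    throw std::invalid_argument(
        "The input allocation is not BestFitAllocation.");
  }
  bf_allocation->is_free = true;
}
```

Without the check, passing an `OtherAllocation` would dereference a null pointer, which is undefined behavior rather than a clear error message.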
paddle/fluid/memory/allocation/legacy_allocator.cc
View file @
88d3dc94
...
...
@@ -35,8 +35,8 @@ DEFINE_bool(init_allocated_mem, false,
"To find this error in time, we use init_allocated_mem to indicate "
"that initializing the allocated memory with a small value "
"during unit testing."
);
DECLARE_bool
(
benchmark
);
DECLARE_double
(
fraction_of_gpu_memory_to_use
);
DECLARE_bool
(
benchmark
);
namespace
paddle
{
namespace
memory
{
...
...
@@ -188,21 +188,20 @@ void *Alloc<platform::CUDAPlace>(const platform::CUDAPlace &place,
platform
::
SetDeviceId
(
place
.
device
);
size_t
avail
,
total
;
platform
::
GpuMemoryUsage
(
&
avail
,
&
total
);
LOG
(
WARNING
)
<<
"Cannot allocate "
<<
string
::
HumanReadableSize
(
size
)
LOG
(
FATAL
)
<<
"Cannot allocate "
<<
string
::
HumanReadableSize
(
size
)
<<
" in GPU "
<<
place
.
device
<<
", available "
<<
string
::
HumanReadableSize
(
avail
);
LOG
(
WARNING
)
<<
"total "
<<
total
;
LOG
(
WARNING
)
<<
"GpuMinChunkSize "
<<
string
::
HumanReadableSize
(
buddy_allocator
->
GetMinChunkSize
());
LOG
(
WARNING
)
<<
"GpuMaxChunkSize "
<<
string
::
HumanReadableSize
(
buddy_allocator
->
GetMaxChunkSize
());
LOG
(
WARNING
)
<<
"GPU memory used: "
<<
string
::
HumanReadableSize
(
avail
)
<<
"total "
<<
total
<<
"GpuMinChunkSize "
<<
string
::
HumanReadableSize
(
buddy_allocator
->
GetMinChunkSize
())
<<
"GpuMaxChunkSize "
<<
string
::
HumanReadableSize
(
buddy_allocator
->
GetMaxChunkSize
())
<<
"GPU memory used: "
<<
string
::
HumanReadableSize
(
Used
<
platform
::
CUDAPlace
>
(
place
));
platform
::
SetDeviceId
(
cur_dev
);
}
else
{
if
(
FLAGS_benchmark
)
allocation
::
GPUMemMonitor
.
Add
(
place
.
device
,
size
);
if
(
FLAGS_benchmark
)
{
allocation
::
GPUMemMonitor
.
Add
(
place
.
device
,
size
);
}
if
(
FLAGS_init_allocated_mem
)
{
cudaMemset
(
ptr
,
0xEF
,
size
);
}
...
...
@@ -218,7 +217,9 @@ void Free<platform::CUDAPlace>(const platform::CUDAPlace &place, void *p,
size_t
size
)
{
#ifdef PADDLE_WITH_CUDA
GetGPUBuddyAllocator
(
place
.
device
)
->
Free
(
p
);
if
(
FLAGS_benchmark
)
allocation
::
GPUMemMonitor
.
Minus
(
place
.
device
,
size
);
if
(
FLAGS_benchmark
)
{
allocation
::
GPUMemMonitor
.
Minus
(
place
.
device
,
size
);
}
#else
PADDLE_THROW
(
"'CUDAPlace' is not supported in CPU only device."
);
#endif
...
...
@@ -257,7 +258,7 @@ void *Alloc<platform::CUDAPinnedPlace>(const platform::CUDAPinnedPlace &place,
void
*
ptr
=
buddy_allocator
->
Alloc
(
size
);
if
(
ptr
==
nullptr
)
{
LOG
(
WARNING
)
<<
"cuda
MallocHost
Cannot allocate "
<<
size
LOG
(
WARNING
)
<<
"cuda
HostAlloc
Cannot allocate "
<<
size
<<
" bytes in CUDAPinnedPlace"
;
}
if
(
FLAGS_init_allocated_mem
)
{
...
...
paddle/fluid/memory/allocation/pinned_allocator.cc
View file @
88d3dc94
...
...
@@ -32,7 +32,7 @@ Allocation *CPUPinnedAllocator::AllocateImpl(size_t size,
   // "CPUPinnedAllocator should be used for Cross-Device Communication");
   void *ptr;
-  PADDLE_ENFORCE(cudaMallocHost(&ptr, size));
+  PADDLE_ENFORCE(cudaHostAlloc(&ptr, size, cudaHostAllocPortable));
   return new CPUPinnedAllocation(ptr, size);
 }
 }  // namespace allocation
...
...
paddle/fluid/memory/allocation/pinned_allocator.h
View file @
88d3dc94
...
...
@@ -19,7 +19,7 @@ namespace paddle {
 namespace memory {
 namespace allocation {
-// Allocator uses `cudaMallocHost`
+// Allocator uses `cudaHostAlloc`
 class CPUPinnedAllocation : public Allocation {
  public:
   CPUPinnedAllocation(void *ptr, size_t size)
...
...
paddle/fluid/memory/detail/system_allocator.cc
View file @
88d3dc94
...
...
@@ -173,14 +173,14 @@ void* CUDAPinnedAllocator::Alloc(size_t* index, size_t size) {
   void* p;
   // PINNED memory is visible to all CUDA contexts.
-  cudaError_t result = cudaMallocHost(&p, size);
+  cudaError_t result = cudaHostAlloc(&p, size, cudaHostAllocPortable);

   if (result == cudaSuccess) {
     *index = 1;  // PINNED memory
     cuda_pinnd_alloc_size_ += size;
     return p;
   } else {
-    LOG(WARNING) << "cudaMallocHost failed.";
+    LOG(WARNING) << "cudaHostAlloc failed.";
     return nullptr;
   }
...
...
paddle/fluid/operators/activation_op.cc
View file @
88d3dc94
...
...
@@ -37,7 +37,7 @@ using paddle::framework::Tensor;
"(bool, default false) Set to true for inference only, false " \
"for training. Some layers may run faster when this is true.") \
.SetDefault(false); \
-      AddComment(#OP_COMMENT);                                           \
+      AddComment(OP_COMMENT);                                            \
} \
}
...
...
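The `AddComment(#OP_COMMENT)` to `AddComment(OP_COMMENT)` change above hinges on the preprocessor's `#` operator, which turns a macro argument into a string literal of its spelling instead of evaluating it. A small sketch with hypothetical macro names (`STRINGIZED`, `EXPANDED`) and a shortened doc string:

```cpp
#include <cassert>
#include <string>

// `#x` produces the argument's source text as a string literal.
#define STRINGIZED(x) std::string(#x)
// Plain `x` expands to the argument itself, here the char array's contents.
#define EXPANDED(x) std::string(x)

// Shortened stand-in for one of the operator doc strings.
constexpr char SigmoidDoc[] = "Sigmoid Activation Operator";

std::string CommentFromName() { return STRINGIZED(SigmoidDoc); }
std::string CommentFromValue() { return EXPANDED(SigmoidDoc); }
```

`CommentFromName()` returns the identifier text `"SigmoidDoc"`, while `CommentFromValue()` returns the documentation string itself, which is presumably what an operator comment should contain.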
@@ -124,7 +124,7 @@ class ActivationOpGrad : public framework::OperatorWithKernel {
 UNUSED constexpr char SigmoidDoc[] = R"DOC(
 Sigmoid Activation Operator

-$$out = \frac{1}{1 + e^{-x}}$$
+$$out = \\frac{1}{1 + e^{-x}}$$

 )DOC";
...
...
@@ -187,14 +187,14 @@ $out = |x|$
 UNUSED constexpr char CeilDoc[] = R"DOC(
 Ceil Activation Operator.

-$out = ceil(x)$
+$out = \left \lceil x \right \rceil$

 )DOC";

 UNUSED constexpr char FloorDoc[] = R"DOC(
 Floor Activation Operator.

-$out = floor(x)$
+$out = \left \lfloor x \right \rfloor$

 )DOC";
...
...
@@ -252,7 +252,7 @@ $out = \ln(1 + e^{x})$
 UNUSED constexpr char SoftsignDoc[] = R"DOC(
 Softsign Activation Operator.

-$$out = \frac{x}{1 + |x|}$$
+$$out = \\frac{x}{1 + \|x\|}$$

 )DOC";
...
...
@@ -551,8 +551,10 @@ namespace ops = paddle::operators;
REGISTER_OPERATOR(KERNEL_TYPE, ::paddle::operators::ActivationOp, \
::paddle::operators::OP_NAME##OpMaker, \
::paddle::operators::ActivationOpInferVarType, \
::paddle::operators::OP_NAME##GradMaker); \
REGISTER_OPERATOR(KERNEL_TYPE##_grad, ::paddle::operators::ActivationOpGrad)
::paddle::operators::OP_NAME##GradMaker, \
::paddle::framework::SingleOpInplaceInToOut); \
REGISTER_OPERATOR(KERNEL_TYPE##_grad, ::paddle::operators::ActivationOpGrad, \
::paddle::framework::SingleOpInplaceInToOut)
#define REGISTER_ACTIVATION_OP(OP_NAME, KERNEL_TYPE) \
REGISTER_OPERATOR(KERNEL_TYPE, ::paddle::operators::ActivationOp, \
...
...
paddle/fluid/operators/batch_norm_op.cc
View file @
88d3dc94
...
...
@@ -604,13 +604,48 @@ class BatchNormGradMaker : public framework::SingleGradOpDescMaker {
}
};
class
BatchNormInplaceInToOut
:
public
framework
::
InplaceInToOut
{
public:
using
InplaceInToOut
::
InplaceInToOut
;
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
Apply
(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
override
{
std
::
unordered_map
<
std
::
string
,
std
::
string
>
inplace_in_to_out
=
{
{
"Mean"
,
"MeanOut"
},
{
"Variance"
,
"VarianceOut"
},
{
"X"
,
"Y"
},
};
return
inplace_in_to_out
;
}
};
class
BatchNormGradInplaceInToOut
:
public
framework
::
InplaceInToOut
{
public:
using
InplaceInToOut
::
InplaceInToOut
;
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
Apply
(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
override
{
std
::
unordered_map
<
std
::
string
,
std
::
string
>
inplace_in_to_out
=
{
// Scale, Bias, SavedMean, SavedVariance shape is [batch_size, C]
{
framework
::
GradVarName
(
"Y"
),
framework
::
GradVarName
(
"X"
)},
{
"SavedMean"
,
framework
::
GradVarName
(
"Scale"
)},
{
"SavedVariance"
,
framework
::
GradVarName
(
"Bias"
)},
};
return
inplace_in_to_out
;
}
};
}
// namespace operators
}
// namespace paddle
namespace
ops
=
paddle
::
operators
;
REGISTER_OPERATOR
(
batch_norm
,
ops
::
BatchNormOp
,
ops
::
BatchNormOpMaker
,
ops
::
BatchNormOpInferVarType
,
ops
::
BatchNormGradMaker
);
REGISTER_OPERATOR
(
batch_norm_grad
,
ops
::
BatchNormGradOp
);
ops
::
BatchNormOpInferVarType
,
ops
::
BatchNormGradMaker
,
ops
::
BatchNormInplaceInToOut
);
REGISTER_OPERATOR
(
batch_norm_grad
,
ops
::
BatchNormGradOp
,
ops
::
BatchNormGradInplaceInToOut
);
REGISTER_OP_CPU_KERNEL
(
batch_norm
,
ops
::
BatchNormKernel
<
paddle
::
platform
::
CPUDeviceContext
,
float
>
,
...
...
paddle/fluid/operators/conv_op.cc
View file @
88d3dc94
...
...
@@ -222,7 +222,7 @@ void Conv2DOpMaker::Make() {
.
SetDefault
(
4096
);
AddAttr
<
bool
>
(
"exhaustive_search"
,
"(bool, default false) cuDNN has many algorithm to calculation "
"convolution, whether enable exhaustive search "
,
"convolution, whether enable exhaustive search "
"for cuDNN convolution or not, defalut is False."
)
.
SetDefault
(
false
);
AddComment
(
R"DOC(
...
...
@@ -341,7 +341,7 @@ void Conv3DOpMaker::Make() {
.
SetDefault
(
4096
);
AddAttr
<
bool
>
(
"exhaustive_search"
,
"(bool, default false) cuDNN has many algorithm to calculation "
"convolution, whether enable exhaustive search "
,
"convolution, whether enable exhaustive search "
"for cuDNN convolution or not, defalut is False."
)
.
SetDefault
(
false
);
AddComment
(
R"DOC(
...
...
paddle/fluid/operators/detection/box_coder_op.cc
View file @
88d3dc94
...
...
@@ -38,20 +38,12 @@ class BoxCoderOp : public framework::OperatorWithKernel {
"The shape of PriorBox is [N, 4]"
);
if
(
ctx
->
HasInput
(
"PriorBoxVar"
))
{
auto
prior_box_var_dims
=
ctx
->
GetInputDim
(
"PriorBoxVar"
);
PADDLE_ENFORCE
(
prior_box_var_dims
.
size
()
==
1
||
prior_box_var_dims
.
size
()
==
2
,
"Input(PriorBoxVar) of BoxCoderOp should be 1 or 2."
);
if
(
prior_box_var_dims
.
size
()
==
1
)
{
PADDLE_ENFORCE_EQ
(
prior_box_var_dims
[
0
],
4
,
"The 1st dimension of Input(PriorBoxVar) should be 4"
"when the rank is 1."
);
}
else
{
PADDLE_ENFORCE
(
prior_box_var_dims
.
size
()
==
2
,
"Input(PriorBoxVar) of BoxCoderOp should be 2."
);
PADDLE_ENFORCE_EQ
(
prior_box_dims
,
prior_box_var_dims
,
"The dimension of Input(PriorBoxVar) should be equal to"
"the dimension of Input(PriorBox when the rank is 2.)"
);
}
"the dimension of Input(PriorBox) when the rank is 2."
);
}
}
...
...
paddle/fluid/operators/detection/box_coder_op.cu
View file @
88d3dc94
...
...
@@ -56,10 +56,7 @@ __global__ void EncodeCenterSizeKernel(
output
[
idx
*
len
+
2
]
=
log
(
fabs
(
target_box_width
/
prior_box_width
));
output
[
idx
*
len
+
3
]
=
log
(
fabs
(
target_box_height
/
prior_box_height
));
if
(
prior_box_var_data
)
{
int
prior_var_offset
=
0
;
if
(
prior_box_var_size
==
2
)
{
prior_var_offset
=
col_idx
*
len
;
}
int
prior_var_offset
=
col_idx
*
len
;
output
[
idx
*
len
]
/=
prior_box_var_data
[
prior_var_offset
];
output
[
idx
*
len
+
1
]
/=
prior_box_var_data
[
prior_var_offset
+
1
];
output
[
idx
*
len
+
2
]
/=
prior_box_var_data
[
prior_var_offset
+
2
];
...
...
@@ -99,10 +96,7 @@ __global__ void DecodeCenterSizeKernel(
T
box_var_x
=
T
(
1
),
box_var_y
=
T
(
1
);
T
box_var_w
=
T
(
1
),
box_var_h
=
T
(
1
);
if
(
prior_box_var_data
)
{
int
prior_var_offset
=
0
;
if
(
prior_box_var_size
==
2
)
{
prior_var_offset
=
axis
==
0
?
col_idx
*
len
:
row_idx
*
len
;
}
int
prior_var_offset
=
axis
==
0
?
col_idx
*
len
:
row_idx
*
len
;
box_var_x
=
prior_box_var_data
[
prior_var_offset
];
box_var_y
=
prior_box_var_data
[
prior_var_offset
+
1
];
box_var_w
=
prior_box_var_data
[
prior_var_offset
+
2
];
...
...
paddle/fluid/operators/detection/box_coder_op.h
View file @
88d3dc94
...
...
@@ -79,10 +79,7 @@ class BoxCoderKernel : public framework::OpKernel<T> {
output
[
offset
+
3
]
=
std
::
log
(
std
::
fabs
(
target_box_height
/
prior_box_height
));
if
(
prior_box_var
)
{
int
prior_var_offset
=
0
;
if
(
prior_box_var
->
dims
().
size
()
==
2
)
{
prior_var_offset
=
j
*
len
;
}
int
prior_var_offset
=
j
*
len
;
output
[
offset
]
/=
prior_box_var_data
[
prior_var_offset
];
output
[
offset
+
1
]
/=
prior_box_var_data
[
prior_var_offset
+
1
];
output
[
offset
+
2
]
/=
prior_box_var_data
[
prior_var_offset
+
2
];
...
...
@@ -95,11 +92,12 @@ class BoxCoderKernel : public framework::OpKernel<T> {
}
}
}
template
<
int
axis
,
int
var_size
>
void
DecodeCenterSize
(
const
framework
::
Tensor
*
target_box
,
const
framework
::
Tensor
*
prior_box
,
const
framework
::
Tensor
*
prior_box_var
,
const
bool
normalized
,
const
int
axis
,
const
std
::
vector
<
float
>
variance
,
T
*
output
)
const
{
const
bool
normalized
,
std
::
vector
<
float
>
variance
,
T
*
output
)
const
{
int64_t
row
=
target_box
->
dims
()[
0
];
int64_t
col
=
target_box
->
dims
()[
1
];
int64_t
len
=
target_box
->
dims
()[
2
];
...
...
@@ -107,19 +105,17 @@ class BoxCoderKernel : public framework::OpKernel<T> {
auto
*
target_box_data
=
target_box
->
data
<
T
>
();
auto
*
prior_box_data
=
prior_box
->
data
<
T
>
();
const
T
*
prior_box_var_data
=
nullptr
;
if
(
prior_box_var
)
prior_box_var_data
=
prior_box_var
->
data
<
T
>
();
if
(
var_size
==
2
)
prior_box_var_data
=
prior_box_var
->
data
<
T
>
();
int
prior_box_offset
=
0
;
T
var_data
[
4
]
=
{
1.
,
1.
,
1.
,
1.
};
T
*
var_ptr
=
var_data
;
#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for collapse(2)
#endif
for
(
int64_t
i
=
0
;
i
<
row
;
++
i
)
{
for
(
int64_t
j
=
0
;
j
<
col
;
++
j
)
{
size_t
offset
=
i
*
col
*
len
+
j
*
len
;
if
(
axis
==
0
)
{
prior_box_offset
=
j
*
len
;
}
else
if
(
axis
==
1
)
{
prior_box_offset
=
i
*
len
;
}
prior_box_offset
=
axis
==
0
?
j
*
len
:
i
*
len
;
T
prior_box_width
=
prior_box_data
[
prior_box_offset
+
2
]
-
prior_box_data
[
prior_box_offset
]
+
(
normalized
==
false
);
...
...
@@ -133,26 +129,18 @@ class BoxCoderKernel : public framework::OpKernel<T> {
T
target_box_center_x
=
0
,
target_box_center_y
=
0
;
T
target_box_width
=
0
,
target_box_height
=
0
;
T
box_var_x
=
T
(
1
),
box_var_y
=
T
(
1
);
T
box_var_w
=
T
(
1
),
box_var_h
=
T
(
1
);
if
(
prior_box_var
)
{
int
prior_var_offset
=
0
;
if
(
prior_box_var
->
dims
().
size
()
==
2
)
{
if
(
axis
==
0
)
prior_var_offset
=
j
*
len
;
else
if
(
axis
==
1
)
prior_var_offset
=
i
*
len
;
}
box_var_x
=
prior_box_var_data
[
prior_var_offset
];
box_var_y
=
prior_box_var_data
[
prior_var_offset
+
1
];
box_var_w
=
prior_box_var_data
[
prior_var_offset
+
2
];
box_var_h
=
prior_box_var_data
[
prior_var_offset
+
3
];
}
else
if
(
!
(
variance
.
empty
()))
{
box_var_x
=
static_cast
<
T
>
(
variance
[
0
]);
box_var_y
=
static_cast
<
T
>
(
variance
[
1
]);
box_var_w
=
static_cast
<
T
>
(
variance
[
2
]);
box_var_h
=
static_cast
<
T
>
(
variance
[
3
]);
}
int
prior_var_offset
=
axis
==
0
?
j
*
len
:
i
*
len
;
if
(
var_size
==
2
)
{
std
::
memcpy
(
var_ptr
,
prior_box_var_data
+
prior_var_offset
,
4
*
sizeof
(
T
));
}
else
if
(
var_size
==
1
)
{
var_ptr
=
reinterpret_cast
<
T
*>
(
variance
.
data
());
}
T
box_var_x
=
*
var_ptr
;
T
box_var_y
=
*
(
var_ptr
+
1
);
T
box_var_w
=
*
(
var_ptr
+
2
);
T
box_var_h
=
*
(
var_ptr
+
3
);
target_box_center_x
=
box_var_x
*
target_box_data
[
offset
]
*
prior_box_width
+
prior_box_center_x
;
...
...
@@ -211,8 +199,31 @@ class BoxCoderKernel : public framework::OpKernel<T> {
EncodeCenterSize
(
target_box
,
prior_box
,
prior_box_var
,
normalized
,
variance
,
output
);
}
else
if
(
code_type
==
BoxCodeType
::
kDecodeCenterSize
)
{
DecodeCenterSize
(
target_box
,
prior_box
,
prior_box_var
,
normalized
,
axis
,
variance
,
output
);
if
(
prior_box_var
)
{
if
(
axis
==
0
)
{
DecodeCenterSize
<
0
,
2
>
(
target_box
,
prior_box
,
prior_box_var
,
normalized
,
variance
,
output
);
}
else
{
DecodeCenterSize
<
1
,
2
>
(
target_box
,
prior_box
,
prior_box_var
,
normalized
,
variance
,
output
);
}
}
else
if
(
!
(
variance
.
empty
()))
{
if
(
axis
==
0
)
{
DecodeCenterSize
<
0
,
1
>
(
target_box
,
prior_box
,
prior_box_var
,
normalized
,
variance
,
output
);
}
else
{
DecodeCenterSize
<
1
,
1
>
(
target_box
,
prior_box
,
prior_box_var
,
normalized
,
variance
,
output
);
}
}
else
{
if
(
axis
==
0
)
{
DecodeCenterSize
<
0
,
0
>
(
target_box
,
prior_box
,
prior_box_var
,
normalized
,
variance
,
output
);
}
else
{
DecodeCenterSize
<
1
,
0
>
(
target_box
,
prior_box
,
prior_box_var
,
normalized
,
variance
,
output
);
}
}
}
}
};
...
...
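The `DecodeCenterSize` change above moves `axis` and the variance source into template parameters, so the per-element branches can be resolved at compile time and the caller picks one of six instantiations. A minimal sketch of that dispatch pattern follows; `VarianceAt` and `Dispatch` are hypothetical names for illustration, not Paddle functions, and the body is reduced to a single lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the decode kernel. Axis and VarSize are
// compile-time constants, so each instantiation keeps only one path.
template <int Axis, int VarSize>
float VarianceAt(std::size_t i, std::size_t j,
                 const std::vector<float>& var_tensor,
                 const std::vector<float>& var_attr) {
  static_assert(Axis == 0 || Axis == 1, "axis must be 0 or 1");
  const std::size_t box = (Axis == 0) ? j : i;  // which prior box applies
  if (VarSize == 2) {
    assert(box < var_tensor.size());
    return var_tensor[box];  // per-box variance tensor
  }
  if (VarSize == 1) {
    assert(!var_attr.empty());
    return var_attr[0];  // variance passed as an attribute
  }
  return 1.0f;  // no variance supplied
}

// Maps the runtime (axis, variance source) pair onto one instantiation.
float Dispatch(int axis, bool has_var_tensor, std::size_t i, std::size_t j,
               const std::vector<float>& var_tensor,
               const std::vector<float>& var_attr) {
  if (has_var_tensor) {
    return axis == 0 ? VarianceAt<0, 2>(i, j, var_tensor, var_attr)
                     : VarianceAt<1, 2>(i, j, var_tensor, var_attr);
  }
  if (!var_attr.empty()) {
    return axis == 0 ? VarianceAt<0, 1>(i, j, var_tensor, var_attr)
                     : VarianceAt<1, 1>(i, j, var_tensor, var_attr);
  }
  return axis == 0 ? VarianceAt<0, 0>(i, j, var_tensor, var_attr)
                   : VarianceAt<1, 0>(i, j, var_tensor, var_attr);
}
```

The runtime checks happen once per call in `Dispatch` rather than once per box inside the loop, which is presumably the motivation for the template form.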
paddle/fluid/operators/detection/density_prior_box_op.h
View file @
88d3dc94
...
...
@@ -52,6 +52,10 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
step_height
=
step_h
;
}
int
num_priors
=
0
;
#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for reduction(+ : num_priors)
#endif
for
(
size_t
i
=
0
;
i
<
densities
.
size
();
++
i
)
{
num_priors
+=
(
fixed_ratios
.
size
())
*
(
pow
(
densities
[
i
],
2
));
}
...
...
@@ -64,6 +68,17 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
auto
e_boxes
=
framework
::
EigenTensor
<
T
,
4
>::
From
(
*
boxes
).
setConstant
(
0.0
);
int
step_average
=
static_cast
<
int
>
((
step_width
+
step_height
)
*
0.5
);
std
::
vector
<
float
>
sqrt_fixed_ratios
;
#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for
#endif
for
(
int
i
=
0
;
i
<
fixed_ratios
.
size
();
i
++
)
{
sqrt_fixed_ratios
.
push_back
(
sqrt
(
fixed_ratios
[
i
]));
}
#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for collapse(2)
#endif
for
(
int
h
=
0
;
h
<
feature_height
;
++
h
)
{
for
(
int
w
=
0
;
w
<
feature_width
;
++
w
)
{
T
center_x
=
(
w
+
offset
)
*
step_width
;
...
...
@@ -73,34 +88,25 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
for
(
size_t
s
=
0
;
s
<
fixed_sizes
.
size
();
++
s
)
{
auto
fixed_size
=
fixed_sizes
[
s
];
int
density
=
densities
[
s
];
int
shift
=
step_average
/
density
;
// Generate density prior boxes with fixed ratios.
for
(
size_t
r
=
0
;
r
<
fixed_ratios
.
size
();
++
r
)
{
float
ar
=
fixed_ratios
[
r
];
int
shift
=
step_average
/
density
;
float
box_width_ratio
=
fixed_size
*
sqrt
(
ar
)
;
float
box_height_ratio
=
fixed_size
/
sqrt
(
ar
)
;
float
box_width_ratio
=
fixed_size
*
sqrt_fixed_ratios
[
r
];
float
box_height_ratio
=
fixed_size
/
sqrt_fixed_ratios
[
r
]
;
float
density_center_x
=
center_x
-
step_average
/
2.
+
shift
/
2.
;
float
density_center_y
=
center_y
-
step_average
/
2.
+
shift
/
2.
;
for
(
int
di
=
0
;
di
<
density
;
++
di
)
{
for
(
int
dj
=
0
;
dj
<
density
;
++
dj
)
{
float
center_x_temp
=
center_x
-
step_average
/
2.
+
shift
/
2.
+
dj
*
shift
;
float
center_y_temp
=
center_y
-
step_average
/
2.
+
shift
/
2.
+
di
*
shift
;
e_boxes
(
h
,
w
,
idx
,
0
)
=
(
center_x_temp
-
box_width_ratio
/
2.
)
/
img_width
>=
0
?
(
center_x_temp
-
box_width_ratio
/
2.
)
/
img_width
:
0
;
e_boxes
(
h
,
w
,
idx
,
1
)
=
(
center_y_temp
-
box_height_ratio
/
2.
)
/
img_height
>=
0
?
(
center_y_temp
-
box_height_ratio
/
2.
)
/
img_height
:
0
;
e_boxes
(
h
,
w
,
idx
,
2
)
=
(
center_x_temp
+
box_width_ratio
/
2.
)
/
img_width
<=
1
?
(
center_x_temp
+
box_width_ratio
/
2.
)
/
img_width
:
1
;
e_boxes
(
h
,
w
,
idx
,
3
)
=
(
center_y_temp
+
box_height_ratio
/
2.
)
/
img_height
<=
1
?
(
center_y_temp
+
box_height_ratio
/
2.
)
/
img_height
:
1
;
float
center_x_temp
=
density_center_x
+
dj
*
shift
;
float
center_y_temp
=
density_center_y
+
di
*
shift
;
e_boxes
(
h
,
w
,
idx
,
0
)
=
std
::
max
(
(
center_x_temp
-
box_width_ratio
/
2.
)
/
img_width
,
0.
);
e_boxes
(
h
,
w
,
idx
,
1
)
=
std
::
max
(
(
center_y_temp
-
box_height_ratio
/
2.
)
/
img_height
,
0.
);
e_boxes
(
h
,
w
,
idx
,
2
)
=
std
::
min
(
(
center_x_temp
+
box_width_ratio
/
2.
)
/
img_width
,
1.
);
e_boxes
(
h
,
w
,
idx
,
3
)
=
std
::
min
(
(
center_y_temp
+
box_height_ratio
/
2.
)
/
img_height
,
1.
);
idx
++
;
}
}
...
...
@@ -131,8 +137,14 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
vars
->
Resize
({
box_num
,
static_cast
<
int
>
(
variances
.
size
())});
auto
e_vars
=
framework
::
EigenMatrix
<
T
,
Eigen
::
RowMajor
>::
From
(
*
vars
);
e_vars
=
var_et
.
broadcast
(
Eigen
::
DSizes
<
int
,
2
>
(
box_num
,
1
));
#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for collapse(2)
#endif
for
(
int
i
=
0
;
i
<
box_num
;
++
i
)
{
for
(
int
j
=
0
;
j
<
variances
.
size
();
++
j
)
{
e_vars
(
i
,
j
)
=
variances
[
j
];
}
}
vars
->
Resize
(
var_dim
);
boxes
->
Resize
(
box_dim
);
...
...
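The `density_prior_box_op.h` change above precomputes `sqrt(fixed_ratios[r])` once and replaces the ternary clamps with `std::max` / `std::min`. A minimal sketch of both ideas follows, assuming hypothetical helper names (`SqrtRatios`, `ClampedBox`) that do not exist in Paddle. The sketch sizes the output vector up front and writes by index, which stays well defined if the loop is later parallelized; concurrent `push_back` on a shared `std::vector` would be a data race.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Precompute sqrt(ratio) once per ratio instead of once per box.
std::vector<float> SqrtRatios(const std::vector<float>& ratios) {
  std::vector<float> out(ratios.size());
  for (std::size_t i = 0; i < ratios.size(); ++i) {
    assert(ratios[i] > 0.f);
    out[i] = std::sqrt(ratios[i]);
  }
  return out;
}

// Returns {xmin, ymin, xmax, ymax}, normalized by the image size and
// clamped to the range [0, 1].
std::array<float, 4> ClampedBox(float cx, float cy, float fixed_size,
                                float sqrt_ratio, float img_w, float img_h) {
  assert(sqrt_ratio > 0.f && img_w > 0.f && img_h > 0.f);
  const float bw = fixed_size * sqrt_ratio;  // box width
  const float bh = fixed_size / sqrt_ratio;  // box height
  return std::array<float, 4>{{
      std::max((cx - bw / 2.f) / img_w, 0.f),
      std::max((cy - bh / 2.f) / img_h, 0.f),
      std::min((cx + bw / 2.f) / img_w, 1.f),
      std::min((cy + bh / 2.f) / img_h, 1.f)}};
}
```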
paddle/fluid/operators/elementwise/elementwise_add_op.cc
View file @
88d3dc94
...
...
@@ -18,6 +18,7 @@ namespace ops = paddle::operators;
REGISTER_ELEMWISE_GRAD_MAKER
(
elementwise_add
,
Add
);
REGISTER_ELEMWISE_EXPLICIT_OP
(
elementwise_add
,
"Add"
,
"Out = X + Y"
,
"Out"
,
"X"
);
REGISTER_OP_CPU_KERNEL
(
elementwise_add
,
ops
::
ElementwiseAddKernel
<
paddle
::
platform
::
CPUDeviceContext
,
float
>
,
...
...
paddle/fluid/operators/elementwise/elementwise_op.h
View file @
88d3dc94
...
...
@@ -250,6 +250,37 @@ class ElemwiseGradKernel : public framework::OpKernel<T> {
}
};
class
ElementwiseOpInplace
:
public
framework
::
InplaceInToOut
{
public:
using
framework
::
InplaceInToOut
::
InplaceInToOut
;
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
Apply
(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
override
{
return
std
::
unordered_map
<
std
::
string
,
std
::
string
>
{
{
"X"
,
"Out"
},
};
}
};
class
ElementwiseGradOpInplace
:
public
framework
::
InplaceInToOut
{
public:
using
framework
::
InplaceInToOut
::
InplaceInToOut
;
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
Apply
(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
override
{
std
::
unordered_map
<
std
::
string
,
std
::
string
>
ret
;
if
(
block
->
HasVar
(
framework
::
GradVarName
(
"X"
))
&&
block
->
HasVar
(
framework
::
GradVarName
(
"Out"
)))
{
ret
[
framework
::
GradVarName
(
"Out"
)]
=
framework
::
GradVarName
(
"X"
);
}
return
ret
;
}
};
}
// namespace operators
}
// namespace paddle
...
...
@@ -299,6 +330,8 @@ class ElemwiseGradKernel : public framework::OpKernel<T> {
REGISTER_OPERATOR(op_type, ::paddle::operators::ElementwiseOp, \
__ElemwiseOp##op_type##Maker__, \
::paddle::operators::ElementwiseOpInferVarType, \
op_type##GradMaker); \
op_type##GradMaker, \
::paddle::operators::ElementwiseOpInplace); \
REGISTER_OPERATOR(op_type##_grad, \
::paddle::operators::ElementwiseOpExplicitGrad)
::paddle::operators::ElementwiseOpExplicitGrad, \
::paddle::operators::ElementwiseGradOpInplace)
paddle/fluid/operators/expand_op.cc
View file @
88d3dc94
...
...
@@ -146,7 +146,11 @@ REGISTER_OPERATOR(expand, ops::ExpandOp, ops::ExpandOpMaker,
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
REGISTER_OPERATOR
(
expand_grad
,
ops
::
ExpandGradOp
);
REGISTER_OP_CPU_KERNEL
(
expand
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CPUDeviceContext
,
float
>
);
expand
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CPUDeviceContext
,
float
>
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CPUDeviceContext
,
double
>
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CPUDeviceContext
,
int
>
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CPUDeviceContext
,
bool
>
);
REGISTER_OP_CPU_KERNEL
(
expand_grad
,
ops
::
ExpandGradKernel
<
paddle
::
platform
::
CPUDeviceContext
,
float
>
);
ops
::
ExpandGradKernel
<
paddle
::
platform
::
CPUDeviceContext
,
float
>
,
ops
::
ExpandGradKernel
<
paddle
::
platform
::
CPUDeviceContext
,
double
>
);
paddle/fluid/operators/expand_op.cu
View file @
88d3dc94
...
...
@@ -15,7 +15,11 @@ limitations under the License. */
namespace
ops
=
paddle
::
operators
;
REGISTER_OP_CUDA_KERNEL
(
expand
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
>
);
expand
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
>
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CUDADeviceContext
,
double
>
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int
>
,
ops
::
ExpandKernel
<
paddle
::
platform
::
CUDADeviceContext
,
bool
>
);
REGISTER_OP_CUDA_KERNEL
(
expand_grad
,
ops
::
ExpandGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
>
);
ops
::
ExpandGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
>
,
ops
::
ExpandGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
double
>
);
paddle/fluid/operators/flatten_op.cc
View file @
88d3dc94
...
...
@@ -267,6 +267,35 @@ class Flatten2GradOp : public framework::OperatorBase {
}
};
class
FlattenOpInplaceInToOut
:
public
framework
::
InplaceInToOut
{
public:
using
InplaceInToOut
::
InplaceInToOut
;
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
Apply
(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
override
{
std
::
unordered_map
<
std
::
string
,
std
::
string
>
inplace_in_to_out
=
{
{
"X"
,
"Out"
},
};
return
inplace_in_to_out
;
}
};
class
FlattenGradInplaceinToOut
:
public
framework
::
InplaceInToOut
{
using
InplaceInToOut
::
InplaceInToOut
;
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
Apply
(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
override
{
std
::
unordered_map
<
std
::
string
,
std
::
string
>
inplace_in_to_out
=
{
{
framework
::
GradVarName
(
"Out"
),
framework
::
GradVarName
(
"X"
)},
};
return
inplace_in_to_out
;
}
};
}
// namespace operators
}
// namespace paddle
...
...
@@ -275,10 +304,13 @@ USE_OP(reshape);
namespace
ops
=
paddle
::
operators
;
REGISTER_OPERATOR
(
flatten
,
ops
::
FlattenOp
,
ops
::
FlattenOpMaker
,
ops
::
FlattenOpInferShape
,
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
REGISTER_OPERATOR
(
flatten_grad
,
ops
::
FlattenGradOp
,
ops
::
FlattenGradInferShape
);
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
,
ops
::
FlattenOpInplaceInToOut
);
REGISTER_OPERATOR
(
flatten_grad
,
ops
::
FlattenGradOp
,
ops
::
FlattenGradInferShape
,
ops
::
FlattenGradInplaceinToOut
);
REGISTER_OPERATOR
(
flatten2
,
ops
::
Flatten2Op
,
ops
::
Flatten2OpMaker
,
ops
::
Flatten2OpInferShape
,
ops
::
Flatten2GradOpMaker
);
ops
::
Flatten2OpInferShape
,
ops
::
Flatten2GradOpMaker
,
ops
::
FlattenOpInplaceInToOut
);
REGISTER_OPERATOR
(
flatten2_grad
,
ops
::
Flatten2GradOp
,
ops
::
Flatten2GradInferShape
);
ops
::
Flatten2GradInferShape
,
ops
::
FlattenGradInplaceinToOut
);
paddle/fluid/operators/jit/gen/act.h
View file @
88d3dc94
...
...
@@ -63,7 +63,6 @@ class VActFunc : public JitCode {
public:
explicit
VActFunc
(
size_t
code_size
,
void
*
code_ptr
)
:
JitCode
(
code_size
,
code_ptr
)
{}
virtual
const
char
*
name
()
const
=
0
;
virtual
void
genCode
()
=
0
;
protected:
...
...
@@ -269,7 +268,7 @@ class VActJitCode : public VActFunc {
this
->
genCode
();
}
const
char
*
name
()
const
override
{
std
::
string
name
()
const
override
{
std
::
string
base
=
"VActJitCode"
;
switch
(
type_
)
{
case
operand_type
::
RELU
:
...
...
@@ -293,7 +292,7 @@ class VActJitCode : public VActFunc {
default:
break
;
}
return
base
.
c_str
()
;
return
base
;
}
void
genCode
()
override
;
...
...
paddle/fluid/operators/jit/gen/blas.h
View file @
88d3dc94
...
...
@@ -41,7 +41,7 @@ class VXXJitCode : public JitCode {
this
->
genCode
();
}
virtual
const
char
*
name
()
const
{
std
::
string
name
()
const
override
{
std
::
string
base
=
"VXXJitCode"
;
if
(
scalar_index_
==
1
)
{
base
+=
"_Scalar"
;
...
...
@@ -62,7 +62,7 @@ class VXXJitCode : public JitCode {
}
base
+=
(
with_relu_
?
"_Relu"
:
""
);
base
+=
"_D"
+
std
::
to_string
(
num_
);
return
base
.
c_str
()
;
return
base
;
}
void
genCode
()
override
;
...
...
paddle/fluid/operators/jit/gen/gru.h
View file @
88d3dc94
...
...
@@ -49,7 +49,7 @@ class GRUJitCode : public VActFunc {
this
->
genCode
();
}
const
char
*
name
()
const
override
{
std
::
string
name
()
const
override
{
std
::
string
base
=
"GRUJitCode"
;
if
(
id_
==
0
)
{
base
+=
"_H1"
;
...
...
@@ -81,7 +81,7 @@ class GRUJitCode : public VActFunc {
};
AddTypeStr
(
act_gate_
);
AddTypeStr
(
act_cand_
);
return
base
.
c_str
()
;
return
base
;
}
void
genCode
()
override
;
...
...
paddle/fluid/operators/jit/gen/hopv.h
View file @
88d3dc94
...
...
@@ -35,14 +35,14 @@ class HOPVJitCode : public JitCode {
this
->
genCode
();
}
virtual
const
char
*
name
()
const
{
std
::
string
name
()
const
override
{
std
::
string
base
=
"VXXJitCode"
;
if
(
type_
==
operand_type
::
MAX
)
{
base
+=
"_MAX"
;
}
else
{
base
+=
"_SUM"
;
}
return
base
.
c_str
()
;
return
base
;
}
void
genCode
()
override
;
...
...
paddle/fluid/operators/jit/gen/jitcode.h
View file @
88d3dc94
...
...
@@ -14,6 +14,7 @@
#pragma once
#include <string>
#include <type_traits>
#include "paddle/fluid/operators/jit/gen_base.h"
#include "paddle/fluid/platform/cpu_info.h"
...
...
@@ -59,7 +60,7 @@ typedef enum {
}
operand_type
;
#define DECLARE_JIT_CODE(codename) \
const char*
name() const override { return #codename; }
std::string
name() const override { return #codename; }
class
JitCode
:
public
GenBase
,
public
Xbyak
::
CodeGenerator
{
public:
...
...
@@ -68,7 +69,6 @@ class JitCode : public GenBase, public Xbyak::CodeGenerator {
(
code_size
%
4096
!=
0
?
(
code_size
/
4096
+
1
)
*
4096
:
code_size
),
code_ptr
)
{}
virtual
const
char
*
name
()
const
=
0
;
virtual
void
genCode
()
=
0
;
size_t
getSize
()
const
override
{
return
CodeGenerator
::
getSize
();
}
...
...
paddle/fluid/operators/jit/gen/lstm.h
View file @
88d3dc94
...
...
@@ -53,7 +53,7 @@ class LSTMJitCode : public VActFunc {
this
->
genCode
();
}
const
char
*
name
()
const
override
{
std
::
string
name
()
const
override
{
std
::
string
base
=
"LSTMJitCode"
;
if
(
use_peephole_
)
{
base
+=
"_Peephole"
;
...
...
@@ -85,7 +85,7 @@ class LSTMJitCode : public VActFunc {
AddTypeStr
(
act_gate_
);
AddTypeStr
(
act_cand_
);
AddTypeStr
(
act_cell_
);
return
base
.
c_str
()
;
return
base
;
}
void
genCode
()
override
;
...
...
paddle/fluid/operators/jit/gen/matmul.h
View file @
88d3dc94
...
...
@@ -36,11 +36,11 @@ class MatMulJitCode : public JitCode {
this
->
genCode
();
}
virtual
const
char
*
name
()
const
{
std
::
string
name
()
const
override
{
std
::
string
base
=
"MatMulJitCode"
;
base
=
base
+
"_M"
+
std
::
to_string
(
m_
)
+
"_N"
+
std
::
to_string
(
n_
)
+
"_K"
+
std
::
to_string
(
k_
);
return
base
.
c_str
()
;
return
base
;
}
void
genCode
()
override
;
...
...
paddle/fluid/operators/jit/gen/seqpool.h
View file @
88d3dc94
...
...
@@ -38,7 +38,7 @@ class SeqPoolJitCode : public JitCode {
this
->
genCode
();
}
virtual
const
char
*
name
()
const
{
std
::
string
name
()
const
override
{
std
::
string
base
=
"SeqPoolJitCode"
;
if
(
type_
==
SeqPoolType
::
kSum
)
{
base
+=
"_Sum"
;
...
...
@@ -48,7 +48,7 @@ class SeqPoolJitCode : public JitCode {
base
+=
"_Sqrt"
;
}
base
+=
(
"_W"
+
std
::
to_string
(
w_
));
return
base
.
c_str
()
;
return
base
;
}
void
genCode
()
override
;
...
...
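The `name()` overrides in the `jit/gen` headers above all change from `const char*` to `std::string`. Each one builds the name in a local `std::string base`, so the old `return base.c_str();` handed back a pointer into a buffer that is destroyed when the function returns. A minimal sketch of the by-value form follows; `KernelName` and its parameters are hypothetical and only mirror the suffix-building style of the diffs.

```cpp
#include <cassert>
#include <string>

// Builds a kernel name from a base plus suffixes, as the name() overrides do.
// Returning std::string by value gives the caller its own copy of the text.
std::string KernelName(const std::string& base, int width, bool with_relu) {
  assert(!base.empty());
  std::string name = base;
  name += (with_relu ? "_Relu" : "");
  name += "_W" + std::to_string(width);
  return name;  // safe: the string is moved or copied out
}

// By contrast, a function declared to return `const char*` that ends with
// `return name.c_str();` would return a pointer into `name`'s buffer, which
// no longer exists once the function has returned.
```

A caller that needs a C string can hold the returned `std::string` in a variable and call `c_str()` on it while that variable is alive.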
paddle/fluid/operators/jit/gen_base.cc
View file @
88d3dc94
...
...
@@ -17,7 +17,13 @@
#include <iostream>
#include <sstream>
#include <vector>
#include "paddle/fluid/memory/allocation/cpu_allocator.h" // for posix_memalign
#include "paddle/fluid/platform/cpu_info.h"
#include "paddle/fluid/platform/enforce.h"
#ifndef _WIN32
#define posix_memalign_free free
#endif
DEFINE_bool
(
dump_jitcode
,
false
,
"Whether to dump the jitcode to file"
);
...
...
@@ -40,6 +46,17 @@ void GenBase::dumpCode(const unsigned char* code) const {
}
}
void
*
GenBase
::
operator
new
(
size_t
size
)
{
void
*
ptr
;
constexpr
size_t
alignment
=
32ul
;
PADDLE_ENFORCE_EQ
(
posix_memalign
(
&
ptr
,
alignment
,
size
),
0
,
"GenBase Alloc %ld error!"
,
size
);
PADDLE_ENFORCE
(
ptr
,
"Fail to allocate GenBase CPU memory: size = %d ."
,
size
);
return
ptr
;
}
void
GenBase
::
operator
delete
(
void
*
ptr
)
{
posix_memalign_free
(
ptr
);
}
std
::
vector
<
int
>
packed_groups
(
int
n
,
int
k
,
int
*
block_out
,
int
*
rest_out
)
{
int
block
;
int
max_num_regs
;
...
...
paddle/fluid/operators/jit/gen_base.h
View file @
88d3dc94
...
...
@@ -16,6 +16,7 @@
#include <gflags/gflags.h>
#include <memory> // for unique_ptr
#include <string>
#include <vector>
#include "paddle/fluid/operators/jit/kernel_base.h"
...
...
@@ -28,7 +29,7 @@ namespace jit {
class
GenBase
:
public
Kernel
{
public:
virtual
~
GenBase
()
=
default
;
virtual
const
char
*
name
()
const
=
0
;
virtual
std
::
string
name
()
const
=
0
;
virtual
size_t
getSize
()
const
=
0
;
virtual
const
unsigned
char
*
getCodeInternal
()
=
0
;
template
<
typename
Func
>
...
...
@@ -42,6 +43,11 @@ class GenBase : public Kernel {
return
reinterpret_cast
<
Func
>
(
const_cast
<
unsigned
char
*>
(
code
));
}
void
*
operator
new
(
size_t
size
);
void
operator
delete
(
void
*
ptr
);
void
*
operator
new
[](
size_t
size
)
{
return
operator
new
(
size
);
}
void
operator
delete
[](
void
*
ptr
)
{
operator
delete
(
ptr
);
}
protected:
void
dumpCode
(
const
unsigned
char
*
code
)
const
;
};
...
...
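`GenBase` above gains class-specific `operator new` / `operator delete` that allocate with a 32-byte alignment through `posix_memalign`. A minimal sketch of that pattern on a POSIX system follows; `Aligned32` and `IsAligned32` are hypothetical names, and the sketch throws `std::bad_alloc` where the diff uses `PADDLE_ENFORCE`.

```cpp
#include <stdlib.h>  // posix_memalign, free (POSIX)

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical type whose heap instances must start on a 32-byte boundary.
struct Aligned32 {
  static void* operator new(std::size_t size) {
    assert(size > 0);
    void* ptr = nullptr;
    // posix_memalign returns 0 on success and an error code otherwise.
    if (posix_memalign(&ptr, 32, size) != 0) throw std::bad_alloc();
    return ptr;
  }
  // Memory from posix_memalign is released with free().
  static void operator delete(void* ptr) { free(ptr); }

  float data[8];
};

// Checks that an address is a multiple of 32.
bool IsAligned32(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}
```

Because the operators are class members, a plain `new Aligned32` and `delete` pick them up automatically, and derived classes inherit them.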
paddle/fluid/operators/lookup_table_op.h
View file @
88d3dc94
...
...
@@ -129,6 +129,7 @@ class LookupTableGradKernel : public framework::OpKernel<T> {
"must be either LoDTensor or SelectedRows"
);
}
int64_t
padding_idx
=
context
.
Attr
<
int64_t
>
(
"padding_idx"
);
bool
is_sparse
=
context
.
Attr
<
bool
>
(
"is_sparse"
);
// Since paddings are not trainable and fixed in forward, the gradient of
// paddings makes no sense and we don't deal with it in backward.
...
...
@@ -187,6 +188,10 @@ class LookupTableGradKernel : public framework::OpKernel<T> {
memset
(
d_table_data
,
0
,
d_table
->
numel
()
*
sizeof
(
T
));
for
(
int64_t
i
=
0
;
i
<
ids
->
numel
();
++
i
)
{
if
(
padding_idx
!=
kNoPadding
&&
ids_data
[
i
]
==
padding_idx
)
{
// the gradient of padding_idx should be 0, already done by memset, so
// do nothing.
}
else
{
PADDLE_ENFORCE_LT
(
ids_data
[
i
],
N
);
PADDLE_ENFORCE_GE
(
ids_data
[
i
],
0
);
for
(
int
j
=
0
;
j
<
D
;
++
j
)
{
...
...
@@ -195,6 +200,7 @@ class LookupTableGradKernel : public framework::OpKernel<T> {
}
}
}
}
};
}
// namespace operators
...
...
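The `lookup_table_op.h` change above skips rows whose id equals `padding_idx` when accumulating the dense gradient, relying on the earlier `memset` to leave those rows at zero. A minimal sketch of that accumulation follows; `EmbeddingGrad` and the sentinel `kNoPaddingSketch` are hypothetical, and the sentinel value of -1 is an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel meaning "no padding index configured".
constexpr std::int64_t kNoPaddingSketch = -1;

// Accumulates d_out rows into a zero-initialized [rows, dim] gradient table.
// Rows addressed by padding_idx are skipped and therefore stay zero.
std::vector<float> EmbeddingGrad(const std::vector<std::int64_t>& ids,
                                 const std::vector<float>& d_out,
                                 std::int64_t rows, std::int64_t dim,
                                 std::int64_t padding_idx) {
  assert(rows > 0 && dim > 0);
  const std::size_t d = static_cast<std::size_t>(dim);
  assert(d_out.size() == ids.size() * d);
  std::vector<float> d_table(static_cast<std::size_t>(rows) * d, 0.f);
  for (std::size_t i = 0; i < ids.size(); ++i) {
    const std::int64_t id = ids[i];
    if (padding_idx != kNoPaddingSketch && id == padding_idx) continue;
    assert(id >= 0 && id < rows);
    for (std::size_t j = 0; j < d; ++j) {
      d_table[static_cast<std::size_t>(id) * d + j] += d_out[i * d + j];
    }
  }
  return d_table;
}
```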
paddle/fluid/operators/math/CMakeLists.txt
View file @
88d3dc94
...
...
@@ -37,7 +37,7 @@ math_library(concat_and_split)
math_library
(
context_project DEPS im2col math_function
)
math_library
(
cross_entropy
)
math_library
(
cos_sim_functor
)
math_library
(
depthwise_conv
)
math_library
(
depthwise_conv
DEPS cub
)
math_library
(
im2col
)
math_library
(
sampler
)
...
...
paddle/fluid/operators/mkldnn/fc_mkldnn_op.cc
View file @
88d3dc94
...
...
@@ -282,7 +282,7 @@ class FCMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
?
mkldnn
::
inner_product_backward_weights
::
desc
(
src
,
diff_weights
,
bias
,
diff_dst
)
:
mkldnn
::
inner_product_backward_weights
::
desc
(
src
,
diff_weights
,
bias
,
diff_dst
);
src
,
diff_weights
,
diff_dst
);
return
mkldnn
::
inner_product_backward_weights
::
primitive_desc
(
bwd_weight_desc
,
engine
,
pd
);
...
...
paddle/fluid/operators/ngraph/ngraph_bridge.cc
View file @
88d3dc94
...
...
@@ -31,8 +31,11 @@ std::map<std::string,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
)
>>
NgraphBridge
::
NG_NODE_MAP
=
{
{
"accuracy"
,
NG_OPS
::
BuildAccuracyNode
},
{
"conv2d"
,
NG_OPS
::
BuildConv2dNode
},
{
"conv2d_grad"
,
NG_OPS
::
BuildConv2dGradNode
},
{
"batch_norm"
,
NG_OPS
::
BuildBatchNormNode
},
{
"batch_norm_grad"
,
NG_OPS
::
BuildBatchNormGradNode
},
{
"elementwise_add"
,
NG_OPS
::
BuildElementwiseAddNode
},
{
"elementwise_add_grad"
,
NG_OPS
::
BuildElementwiseAddGradNode
},
{
"fill_constant"
,
NG_OPS
::
BuildFillConstantNode
},
...
...
@@ -45,8 +48,12 @@ std::map<std::string,
{
"softmax"
,
NG_OPS
::
BuildSoftmaxNode
},
{
"softmax_grad"
,
NG_OPS
::
BuildSoftmaxGradNode
},
{
"scale"
,
NG_OPS
::
BuildScaleNode
},
{
"sigmoid"
,
NG_OPS
::
BuildUnaryNode
<
ngraph
::
op
::
Sigmoid
>
},
{
"sum"
,
NG_OPS
::
BuildSumNode
},
{
"relu"
,
NG_OPS
::
BuildUnaryNode
<
ngraph
::
op
::
Relu
>
},
{
"relu_grad"
,
NG_OPS
::
BuildReluGradNode
},
{
"tanh"
,
NG_OPS
::
BuildUnaryNode
<
ngraph
::
op
::
Tanh
>
},
{
"tanh_grad"
,
NG_OPS
::
BuildTanhGradNode
},
{
"top_k"
,
NG_OPS
::
BuildTopKNode
}};
void
NgraphBridge
::
BuildNgNode
(
...
...
paddle/fluid/operators/ngraph/ngraph_engine_op.h
View file @
88d3dc94
...
...
@@ -35,7 +35,7 @@ class NgraphEngineOp : public framework::OperatorWithKernel {
framework
::
OpKernelType
GetExpectedKernelType
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
framework
::
OpKernelType
kt
=
framework
::
OpKernelType
(
framework
::
proto
::
VarType
::
FP32
,
ctx
.
GetPlace
());
framework
::
proto
::
VarType
::
FP32
,
platform
::
CPUPlace
());
return
kt
;
}
};
...
...
paddle/fluid/operators/ngraph/ngraph_ops.h
View file @
88d3dc94
...
...
@@ -21,7 +21,10 @@ limitations under the License. */
#pragma once
#include "ops/binary_unnary_op.h"
#include "ops/accuracy_op.h"
#include "ops/activation_op.h"
#include "ops/batch_norm_op.h"
#include "ops/binary_unary_op.h"
#include "ops/conv2d_op.h"
#include "ops/elementwise_add_op.h"
#include "ops/fill_constant_op.h"
...
...
@@ -30,4 +33,5 @@ limitations under the License. */
#include "ops/pool2d_op.h"
#include "ops/scale_op.h"
#include "ops/softmax_op.h"
#include "ops/sum_op.h"
#include "ops/top_k_op.h"
paddle/fluid/operators/ngraph/ops/accuracy_op.h
0 → 100644
View file @
88d3dc94
/*Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <string>
#include <vector>
#include "ngraph/ngraph.hpp"
#include "paddle/fluid/platform/ngraph_helper.h"
namespace
paddle
{
namespace
operators
{
namespace
ngraphs
{
void
BuildAccuracyNode
(
const
std
::
shared_ptr
<
framework
::
OperatorBase
>&
op
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
auto
indices
=
platform
::
GetInputNode
(
op
,
"Indices"
,
ngb_node_map
);
auto
label
=
platform
::
GetInputNode
(
op
,
"Label"
,
ngb_node_map
);
auto
inference
=
platform
::
GetInputNode
(
op
,
"Out"
,
ngb_node_map
);
auto
inference_shape
=
inference
->
get_shape
();
size_t
num_samples
=
inference_shape
.
at
(
0
);
size_t
k
=
inference_shape
.
at
(
1
);
std
::
shared_ptr
<
ngraph
::
Node
>
label_k
=
label
;
if
(
k
>
1
)
{
auto
label_1d
=
std
::
make_shared
<
ngraph
::
op
::
Reshape
>
(
label
,
ngraph
::
AxisVector
{
0
,
1
},
ngraph
::
Shape
{
num_samples
});
label_k
=
std
::
make_shared
<
ngraph
::
op
::
Broadcast
>
(
label_1d
,
inference_shape
,
ngraph
::
AxisSet
{
1
});
}
auto
node_equal
=
std
::
make_shared
<
ngraph
::
op
::
Equal
>
(
indices
,
label_k
);
auto
node_eq_int
=
std
::
make_shared
<
ngraph
::
op
::
Convert
>
(
node_equal
,
ngraph
::
element
::
i64
);
auto
num_correct_0d
=
std
::
make_shared
<
ngraph
::
op
::
Sum
>
(
node_eq_int
,
ngraph
::
AxisSet
{
0
,
1
});
std
::
shared_ptr
<
ngraph
::
Node
>
num_correct
=
platform
::
NgReshaper
(
num_correct_0d
,
ngraph
::
Shape
{
1
});
std
::
shared_ptr
<
ngraph
::
Node
>
n_samples
=
ngraph
::
op
::
Constant
::
create
(
ngraph
::
element
::
i64
,
ngraph
::
Shape
{
1
},
{
num_samples
});
std
::
shared_ptr
<
ngraph
::
Node
>
accuracy
=
std
::
make_shared
<
ngraph
::
op
::
Divide
>
(
std
::
make_shared
<
ngraph
::
op
::
Convert
>
(
num_correct
,
ngraph
::
element
::
f32
),
std
::
make_shared
<
ngraph
::
op
::
Convert
>
(
n_samples
,
ngraph
::
element
::
f32
));
platform
::
SetOutputNode
(
op
,
"Accuracy"
,
accuracy
,
ngb_node_map
);
platform
::
SetOutputNode
(
op
,
"Correct"
,
num_correct
,
ngb_node_map
);
platform
::
SetOutputNode
(
op
,
"Total"
,
n_samples
,
ngb_node_map
);
}
}
// namespace ngraphs
}
// namespace operators
}
// namespace paddle
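`BuildAccuracyNode` above expresses top-k accuracy as graph nodes: broadcast the label across the k prediction columns, compare element-wise, sum the matches, and divide by the number of samples. The same arithmetic can be sketched on plain vectors; `TopKAccuracy` and `AccuracyResult` are hypothetical names, and the row-major layout is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct AccuracyResult {
  float accuracy;        // correct / total
  std::int64_t correct;  // number of matching (sample, column) pairs
  std::int64_t total;    // number of samples
};

// indices: row-major [num_samples, k] top-k predictions.
// labels:  [num_samples] ground-truth class ids.
AccuracyResult TopKAccuracy(const std::vector<std::int64_t>& indices,
                            const std::vector<std::int64_t>& labels,
                            std::size_t k) {
  assert(k > 0 && !labels.empty());
  assert(indices.size() == labels.size() * k);
  std::int64_t correct = 0;
  for (std::size_t n = 0; n < labels.size(); ++n) {
    for (std::size_t j = 0; j < k; ++j) {
      // The label is compared against each of the k columns in turn.
      if (indices[n * k + j] == labels[n]) ++correct;
    }
  }
  const std::int64_t total = static_cast<std::int64_t>(labels.size());
  return AccuracyResult{
      static_cast<float>(correct) / static_cast<float>(total), correct, total};
}
```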
paddle/fluid/operators/ngraph/ops/activation_op.h
0 → 100644
View file @
88d3dc94
/*Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <string>
#include "ngraph/ngraph.hpp"
#include "paddle/fluid/platform/ngraph_helper.h"
namespace
paddle
{
namespace
operators
{
namespace
ngraphs
{
void
BuildReluGradNode
(
const
std
::
shared_ptr
<
framework
::
OperatorBase
>&
op
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
auto
out
=
platform
::
GetInputNode
(
op
,
"Out"
,
ngb_node_map
);
auto
dout
=
platform
::
GetInputNode
(
op
,
"Out@GRAD"
,
ngb_node_map
);
auto
relu_grad
=
std
::
make_shared
<
ngraph
::
op
::
ReluBackprop
>
(
out
,
dout
);
platform
::
SetOutputNode
(
op
,
"X@GRAD"
,
relu_grad
,
ngb_node_map
);
}
void
BuildTanhGradNode
(
const
std
::
shared_ptr
<
framework
::
OperatorBase
>&
op
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
auto
out
=
platform
::
GetInputNode
(
op
,
"Out"
,
ngb_node_map
);
auto
dout
=
platform
::
GetInputNode
(
op
,
"Out@GRAD"
,
ngb_node_map
);
auto
shape
=
out
->
get_shape
();
auto
node_const
=
ngraph
::
op
::
Constant
::
create
(
ngraph
::
element
::
f32
,
shape
,
{
1
});
auto
result
=
dout
*
(
node_const
-
out
*
out
);
platform
::
SetOutputNode
(
op
,
"X@GRAD"
,
result
,
ngb_node_map
);
}
}
// namespace ngraphs
}
// namespace operators
}
// namespace paddle
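The two builders in `activation_op.h` above compute gradients from the forward output rather than the input: the tanh gradient is `dout * (1 - out * out)`, and the ReLU gradient passes `dout` through where the output is positive. A scalar sketch of both formulas follows; the function names are hypothetical and the ReLU rule reflects the usual definition rather than a checked reading of `ngraph::op::ReluBackprop`.

```cpp
#include <cassert>

// ReLU gradient from the forward output: pass dout where out > 0, else 0.
float ReluGrad(float out, float dout) { return out > 0.f ? dout : 0.f; }

// Tanh gradient from the forward output: d/dx tanh(x) = 1 - tanh(x)^2.
float TanhGrad(float out, float dout) {
  assert(out >= -1.f && out <= 1.f);  // tanh outputs lie in [-1, 1]
  return dout * (1.f - out * out);
}
```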
paddle/fluid/operators/ngraph/ops/batch_norm_op.h
0 → 100644
View file @
88d3dc94
/*Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <string>
#include <vector>
#include "ngraph/ngraph.hpp"
#include "paddle/fluid/operators/ngraph/ops/elementwise_node.h"
#include "paddle/fluid/operators/ngraph/ops/elementwise_scalar_op.h"
#include "paddle/fluid/platform/ngraph_helper.h"
namespace
paddle
{
namespace
operators
{
namespace
ngraphs
{
void
BuildBatchNormNode
(
const
std
::
shared_ptr
<
paddle
::
framework
::
OperatorBase
>&
op
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
auto
op_attrs
=
paddle
::
framework
::
AttrReader
(
op
->
Attrs
());
auto
&
data_layout
=
op_attrs
.
Get
<
std
::
string
>
(
"data_layout"
);
auto
bias
=
paddle
::
platform
::
GetInputNode
(
op
,
"Bias"
,
ngb_node_map
);
auto
mean
=
paddle
::
platform
::
GetInputNode
(
op
,
"Mean"
,
ngb_node_map
);
auto
variance
=
paddle
::
platform
::
GetInputNode
(
op
,
"Variance"
,
ngb_node_map
);
auto
scale
=
paddle
::
platform
::
GetInputNode
(
op
,
"Scale"
,
ngb_node_map
);
auto
x
=
paddle
::
platform
::
GetInputNode
(
op
,
"X"
,
ngb_node_map
);
const
bool
is_test
=
op_attrs
.
Get
<
bool
>
(
"is_test"
);
const
float
epsilon
=
op_attrs
.
Get
<
float
>
(
"epsilon"
);
const
float
momentum
=
op_attrs
.
Get
<
float
>
(
"momentum"
);
if
(
data_layout
==
"NHWC"
)
{
x
=
paddle
::
platform
::
Nhwc2Nchw
(
x
);
}
std
::
shared_ptr
<
ngraph
::
Node
>
mean_out
,
saved_mean
,
saved_variance
,
variance_out
,
y
;
if
(
!
is_test
)
{
auto
BN
=
std
::
make_shared
<
ngraph
::
op
::
BatchNormTraining
>
(
epsilon
,
scale
,
bias
,
x
);
y
=
std
::
make_shared
<
ngraph
::
op
::
GetOutputElement
>
(
BN
,
0
);
saved_mean
=
std
::
make_shared
<
ngraph
::
op
::
GetOutputElement
>
(
BN
,
1
);
saved_variance
=
std
::
make_shared
<
ngraph
::
op
::
GetOutputElement
>
(
BN
,
2
);
mean_out
=
std
::
make_shared
<
ngraph
::
op
::
Add
>
(
paddle
::
operators
::
ngraphs
::
ElementwiseScalar
<
ngraph
::
op
::
Multiply
>
(
momentum
,
mean
),
paddle
::
operators
::
ngraphs
::
ElementwiseScalar
<
ngraph
::
op
::
Multiply
>
(
1.
-
momentum
,
saved_mean
));
variance_out
=
std
::
make_shared
<
ngraph
::
op
::
Add
>
(
paddle
::
operators
::
ngraphs
::
ElementwiseScalar
<
ngraph
::
op
::
Multiply
>
(
momentum
,
variance
),
paddle
::
operators
::
ngraphs
::
ElementwiseScalar
<
ngraph
::
op
::
Multiply
>
(
1.
-
momentum
,
saved_variance
));
if
(
data_layout
==
"NHWC"
)
{
y
=
paddle
::
platform
::
Nchw2Nhwc
(
y
);
}
paddle
::
platform
::
SetOutputNode
(
op
,
"MeanOut"
,
mean_out
,
ngb_node_map
);
paddle
::
platform
::
SetOutputNode
(
op
,
"VarianceOut"
,
variance_out
,
ngb_node_map
);
paddle
::
platform
::
SetOutputNode
(
op
,
"SavedMean"
,
saved_mean
,
ngb_node_map
);
paddle
::
platform
::
SetOutputNode
(
op
,
"SavedVariance"
,
saved_variance
,
ngb_node_map
);
paddle
::
platform
::
SetOutputNode
(
op
,
"Y"
,
y
,
ngb_node_map
);
}
else
{
y
=
std
::
make_shared
<
ngraph
::
op
::
BatchNormInference
>
(
epsilon
,
scale
,
bias
,
x
,
mean
,
variance
);
paddle
::
platform
::
SetOutputNode
(
op
,
"Y"
,
y
,
ngb_node_map
);
}
}
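In the training branch of `BuildBatchNormNode` above, `MeanOut` and `VarianceOut` are exponential moving averages: `momentum * running + (1 - momentum) * batch`. A minimal element-wise sketch of that update follows; `UpdateRunningStat` is a hypothetical helper, not part of Paddle or nGraph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blends the stored running statistic with the statistic of the current
// batch, one channel at a time.
std::vector<float> UpdateRunningStat(const std::vector<float>& running,
                                     const std::vector<float>& batch,
                                     float momentum) {
  assert(running.size() == batch.size());
  assert(momentum >= 0.f && momentum <= 1.f);
  std::vector<float> out(running.size());
  for (std::size_t c = 0; c < running.size(); ++c) {
    out[c] = momentum * running[c] + (1.f - momentum) * batch[c];
  }
  return out;
}
```

A momentum close to 1 keeps most of the history, while a momentum of 0 replaces the running value with the current batch statistic.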
void
BuildBatchNormGradNode
(
const
std
::
shared_ptr
<
paddle
::
framework
::
OperatorBase
>&
op
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
auto
op_attrs
=
paddle
::
framework
::
AttrReader
(
op
->
Attrs
());
auto
&
data_layout
=
op_attrs
.
Get
<
std
::
string
>
(
"data_layout"
);
auto
bias
=
paddle
::
platform
::
GetInputNode
(
op
,
"Bias"
,
ngb_node_map
);
auto
saved_mean
=
paddle
::
platform
::
GetInputNode
(
op
,
"SavedMean"
,
ngb_node_map
);
auto
saved_variance
=
paddle
::
platform
::
GetInputNode
(
op
,
"SavedVariance"
,
ngb_node_map
);
auto
scale
=
paddle
::
platform
::
GetInputNode
(
op
,
"Scale"
,
ngb_node_map
);
auto
x
=
paddle
::
platform
::
GetInputNode
(
op
,
"X"
,
ngb_node_map
);
auto
dy
=
paddle
::
platform
::
GetInputNode
(
op
,
"Y@GRAD"
,
ngb_node_map
);
auto
x_shape
=
x
->
get_shape
();
auto
dy_shape
=
dy
->
get_shape
();
PADDLE_ENFORCE
(
x_shape
.
size
()
==
2
||
x_shape
.
size
()
==
4
,
"BN grad input size needs to be 2 or 4"
);
PADDLE_ENFORCE_EQ
(
x_shape
.
size
(),
dy_shape
.
size
(),
"BN grad input and delta size needs to be equal"
);
if
(
x_shape
.
size
()
==
2
)
{
x
=
std
::
make_shared
<
ngraph
::
op
::
Reshape
>
(
x
,
ngraph
::
AxisVector
{
0
,
1
},
ngraph
::
Shape
{
x_shape
.
at
(
0
),
x_shape
.
at
(
1
),
1
,
1
});
dy
=
std
::
make_shared
<
ngraph
::
op
::
Reshape
>
(
dy
,
ngraph
::
AxisVector
{
0
,
1
},
ngraph
::
Shape
{
dy_shape
.
at
(
0
),
dy_shape
.
at
(
1
),
1
,
1
});
}
if
(
data_layout
==
"NHWC"
)
{
x
=
paddle
::
platform
::
Nhwc2Nchw
(
x
);
dy
=
paddle
::
platform
::
Nhwc2Nchw
(
dy
);
}
const
float
epsilon
=
op_attrs
.
Get
<
float
>
(
"epsilon"
);
auto
bn_bprop
=
std
::
make_shared
<
ngraph
::
op
::
BatchNormTrainingBackprop
>
(
epsilon
,
scale
,
bias
,
x
,
saved_mean
,
saved_variance
,
dy
);
std
::
shared_ptr
<
ngraph
::
Node
>
dx
=
std
::
make_shared
<
ngraph
::
op
::
GetOutputElement
>
(
bn_bprop
,
0
);
auto
dscale
=
std
::
make_shared
<
ngraph
::
op
::
GetOutputElement
>
(
bn_bprop
,
1
);
auto
dbias
=
std
::
make_shared
<
ngraph
::
op
::
GetOutputElement
>
(
bn_bprop
,
2
);
paddle
::
platform
::
SetOutputNode
(
op
,
"Bias@GRAD"
,
dbias
,
ngb_node_map
);
paddle
::
platform
::
SetOutputNode
(
op
,
"Scale@GRAD"
,
dscale
,
ngb_node_map
);
if
(
x_shape
.
size
()
==
2
)
{
paddle
::
platform
::
SetOutputNode
(
op
,
"X@GRAD"
,
paddle
::
platform
::
NgReshaper
(
dx
,
x_shape
),
ngb_node_map
);
}
else
{
if
(
data_layout
==
"NHWC"
)
{
dx
=
paddle
::
platform
::
Nchw2Nhwc
(
dx
);
}
paddle
::
platform
::
SetOutputNode
(
op
,
"X@GRAD"
,
dx
,
ngb_node_map
);
}
}
}
// namespace ngraphs
}
// namespace operators
}
// namespace paddle
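The training branch of the forward batch-norm builder earlier in this file blends the incoming running statistic with the statistic of the current batch: `mean_out = momentum * mean + (1 - momentum) * saved_mean`, and likewise for the variance. A minimal sketch of that update, assuming plain `float` vectors in place of nGraph nodes (`UpdateRunningStat` is a hypothetical name, not a Paddle or nGraph function):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Paddle or nGraph): per-channel blend of
// the previous running statistic with the statistic of the current batch.
std::vector<float> UpdateRunningStat(const std::vector<float>& running,
                                     const std::vector<float>& batch,
                                     float momentum) {
  assert(running.size() == batch.size());
  std::vector<float> out(running.size());
  for (std::size_t c = 0; c < running.size(); ++c) {
    // Same arithmetic as the two ElementwiseScalar<Multiply> terms plus Add.
    out[c] = momentum * running[c] + (1.0f - momentum) * batch[c];
  }
  return out;
}
```

With `momentum` close to 1 the running value changes slowly; with `momentum = 0` it is replaced by the batch statistic.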
paddle/fluid/operators/ngraph/ops/binary_unnary_op.h → paddle/fluid/operators/ngraph/ops/binary_unary_op.h
File moved
paddle/fluid/operators/ngraph/ops/sum_op.h
0 → 100644
/*Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <string>
#include <vector>
#include "ngraph/ngraph.hpp"
#include "paddle/fluid/platform/ngraph_helper.h"
namespace
paddle
{
namespace
operators
{
namespace
ngraphs
{
void
BuildSumNode
(
const
std
::
shared_ptr
<
framework
::
OperatorBase
>&
op
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
std
::
vector
<
std
::
string
>
op_inputs
;
for
(
auto
&
var_name_item
:
op
->
Inputs
())
{
for
(
auto
&
var_name
:
var_name_item
.
second
)
{
op_inputs
.
push_back
(
var_name
);
if
(
ngb_node_map
->
find
(
var_name
)
==
ngb_node_map
->
end
())
{
PADDLE_THROW
(
"op %s input varname %s is not found in var_node_map"
,
op
->
Type
(),
var_name
);
}
}
}
std
::
shared_ptr
<
ngraph
::
Node
>&
sum
=
ngb_node_map
->
at
(
op_inputs
[
0
]);
for
(
size_t
k
=
1
;
k
<
op_inputs
.
size
();
++
k
)
{
std
::
shared_ptr
<
ngraph
::
Node
>&
nodek
=
ngb_node_map
->
at
(
op_inputs
[
k
]);
if
(
nodek
->
get_element_type
()
!=
sum
->
get_element_type
())
{
nodek
=
std
::
make_shared
<
ngraph
::
op
::
Convert
>
(
nodek
,
sum
->
get_element_type
());
}
sum
=
sum
+
nodek
;
}
platform
::
SetOutputNode
(
op
,
"Out"
,
sum
,
ngb_node_map
);
}
}
// namespace ngraphs
}
// namespace operators
}
// namespace paddle
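`BuildSumNode` first collects every input variable name, fails if any of them is missing from the node map, and then folds the inputs together with `+`. A minimal sketch of that validate-then-accumulate pattern, assuming scalar `double` values in place of nGraph nodes (`SumNamedInputs` and the value map are hypothetical stand-ins):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the node map: variable name -> scalar value.
double SumNamedInputs(const std::vector<std::string>& names,
                      const std::unordered_map<std::string, double>& values) {
  if (names.empty()) throw std::invalid_argument("sum needs at least one input");
  double sum = 0.0;
  for (const auto& name : names) {
    auto it = values.find(name);
    if (it == values.end()) {
      throw std::out_of_range("input varname " + name + " is not found");
    }
    sum += it->second;  // accumulate into a local, not into the map entry
  }
  return sum;
}
```

Accumulating into a local value rather than into a reference to a map entry leaves the stored inputs untouched.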
paddle/fluid/operators/ngraph/ops/top_k_op.h
...
...
@@ -36,11 +36,6 @@ void BuildTopKNode(
std
::
make_shared
<
ngraph
::
op
::
GetOutputElement
>
(
top_k
,
0
);
std
::
shared_ptr
<
ngraph
::
Node
>
out
=
std
::
make_shared
<
ngraph
::
op
::
GetOutputElement
>
(
top_k
,
1
);
auto
dummy_out
=
paddle
::
platform
::
GetOutputNode
(
op
,
"Out"
,
ngb_node_map
);
if
(
dummy_out
&&
dummy_out
->
get_element_type
()
!=
out
->
get_element_type
())
{
out
=
std
::
make_shared
<
ngraph
::
op
::
Convert
>
(
out
,
dummy_out
->
get_element_type
());
}
paddle
::
platform
::
SetOutputNode
(
op
,
"Indices"
,
indices
,
ngb_node_map
);
paddle
::
platform
::
SetOutputNode
(
op
,
"Out"
,
out
,
ngb_node_map
);
}
...
...
paddle/fluid/operators/norm_op.h
...
...
@@ -99,10 +99,10 @@ class NormGradKernel : public framework::OpKernel<T> {
auto
dx_e
=
framework
::
EigenVector
<
T
>::
Flatten
(
*
out_dx
);
Eigen
::
DSizes
<
int
,
3
>
shape
(
pre
,
n
,
post
);
Eigen
::
DSizes
<
int
,
2
>
norm_shape
(
pre
,
post
);
Eigen
::
DSizes
<
int
,
3
>
rshape
(
pre
,
1
,
post
);
auto
x
=
x_e
.
reshape
(
shape
);
auto
dy
=
dy_e
.
reshape
(
shape
);
auto
norm
=
norm_e
.
reshape
(
norm_shape
);
auto
norm
=
norm_e
.
reshape
(
rshape
);
auto
dx
=
dx_e
.
reshape
(
shape
);
framework
::
Tensor
rsum
;
...
...
@@ -111,7 +111,6 @@ class NormGradKernel : public framework::OpKernel<T> {
Eigen
::
DSizes
<
int
,
1
>
rdim
(
1
);
Eigen
::
DSizes
<
int
,
3
>
bcast
(
1
,
n
,
1
);
Eigen
::
DSizes
<
int
,
3
>
rshape
(
pre
,
1
,
post
);
// dx = ( dy/sqrt(sum(x*x)) ) * [1 - x*sum(x) / (sum(x*x) + e)]
// = [dy - dy * x * sum(x) / (sum(x*x) + e)] / sqrt(sum(x*x))
...
...
paddle/fluid/operators/pool_op.cc
...
...
@@ -259,7 +259,7 @@ Example:
W_{out} = \\frac{(W_{in} - ksize[1] + 2 * paddings[1] + strides[1] - 1)}{strides[1]} + 1
$$
For exclusive = true:
For exclusive = false:
$$
hstart = i * strides[0] - paddings[0]
hend = hstart + ksize[0]
...
...
@@ -267,7 +267,7 @@ Example:
wend = wstart + ksize[1]
Output(i ,j) = \\frac{sum(Input[hstart:hend, wstart:wend])}{ksize[0] * ksize[1]}
$$
For exclusive = false:
For exclusive = true:
$$
hstart = max(0, i * strides[0] - paddings[0])
hend = min(H, hstart + ksize[0])
...
...
@@ -403,7 +403,7 @@ Example:
H_{out} = \frac{(H_{in} - ksize[1] + 2 * paddings[1] + strides[1] -1)}{strides[1]} + 1 \\
W_{out} = \frac{(W_{in} - ksize[2] + 2 * paddings[2] + strides[2] -1)}{strides[2]} + 1
$$
For exclusive = true:
For exclusive = false:
$$
dstart = i * strides[0] - paddings[0]
dend = dstart + ksize[0]
...
...
@@ -413,7 +413,7 @@ Example:
wend = wstart + ksize[2]
Output(i ,j, k) = \\frac{sum(Input[dstart:dend, hstart:hend, wstart:wend])}{ksize[0] * ksize[1] * ksize[2]}
$$
For exclusive = false:
For exclusive = true:
$$
dstart = max(0, i * strides[0] - paddings[0])
dend = min(D, dstart + ksize[0])
...
...
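The corrected `pool_op.cc` docstring distinguishes two averaging modes: with `exclusive = false` the window sum is divided by the full kernel size `ksize[0] * ksize[1]`, while with `exclusive = true` the window is clamped to the input. A minimal 2-D sketch of both modes follows. It rests on two assumptions that the visible hunks do not confirm: the window end is computed from the unclamped start before clamping, and the exclusive divisor is the number of in-bounds cells. `AvgPoolAt` is a hypothetical name, not Paddle's kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: one output element (i, j) of 2-D average pooling over
// a row-major H x W input. Only the index arithmetic is modelled here.
float AvgPoolAt(const std::vector<float>& in, int H, int W, int i, int j,
                int ksize_h, int ksize_w, int stride_h, int stride_w,
                int pad_h, int pad_w, bool exclusive) {
  int hstart = i * stride_h - pad_h;
  int wstart = j * stride_w - pad_w;
  int hend = hstart + ksize_h;
  int wend = wstart + ksize_w;
  // Clamp to the valid region; padded cells contribute zero to the sum.
  int h0 = std::max(0, hstart), h1 = std::min(H, hend);
  int w0 = std::max(0, wstart), w1 = std::min(W, wend);
  float sum = 0.0f;
  for (int h = h0; h < h1; ++h) {
    for (int w = w0; w < w1; ++w) sum += in[h * W + w];
  }
  // Assumption: exclusive mode divides by the count of in-bounds cells.
  int divisor = exclusive ? (h1 - h0) * (w1 - w0) : ksize_h * ksize_w;
  return sum / static_cast<float>(divisor);
}
```

The two modes differ only at the borders, where part of the window falls into the padding; for interior windows both divisors are equal.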
paddle/fluid/operators/random_crop_op.h
...
...
@@ -121,7 +121,7 @@ struct RandomCropFunctor {
HOSTDEVICE
void
operator
()(
size_t
ins_idx
)
{
typename
Random
<
DeviceContext
>::
Engine
engine
(
seed_
);
engine
.
discard
(
ins_idx
*
(
rank_
-
num_batchsize_dims_
));
size_t
offsets
[
9
];
size_t
offsets
[
9
]
=
{}
;
for
(
int
i
=
num_batchsize_dims_
;
i
<
rank_
;
++
i
)
{
typename
Random
<
DeviceContext
>::
template
UniformIntDist
<
size_t
>
dist
(
0
,
x_dims_
[
i
]
-
out_dims_
[
i
]);
...
...
paddle/fluid/operators/reader/buffered_reader.cc
...
...
@@ -14,6 +14,7 @@
#include "paddle/fluid/operators/reader/buffered_reader.h"
#include <vector>
#include "paddle/fluid/framework/data_type.h"
namespace
paddle
{
namespace
operators
{
...
...
@@ -24,6 +25,13 @@ BufferedReader::~BufferedReader() {
position_
.
front
().
wait
();
position_
.
pop
();
}
#ifdef PADDLE_WITH_CUDA
if
(
platform
::
is_gpu_place
(
place_
))
{
platform
::
SetDeviceId
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
).
device
);
PADDLE_ENFORCE
(
cudaStreamDestroy
(
stream
));
for
(
auto
&
event
:
events
)
PADDLE_ENFORCE
(
cudaEventDestroy
(
event
));
}
#endif
}
BufferedReader
::
BufferedReader
(
...
...
@@ -33,6 +41,19 @@ BufferedReader::BufferedReader(
thread_pool_
(
1
),
place_
(
place
),
buffer_size_
(
buffer_size
)
{
#ifdef PADDLE_WITH_CUDA
if
(
platform
::
is_gpu_place
(
place_
))
{
platform
::
SetDeviceId
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
).
device
);
compute_stream
=
((
platform
::
CUDADeviceContext
*
)(
platform
::
DeviceContextPool
::
Instance
()
.
Get
(
place_
)))
->
stream
();
events
.
resize
(
buffer_size
);
for
(
auto
&
event
:
events
)
PADDLE_ENFORCE
(
cudaEventCreateWithFlags
(
&
event
,
cudaEventDisableTiming
));
PADDLE_ENFORCE
(
cudaStreamCreateWithFlags
(
&
stream
,
cudaStreamNonBlocking
));
}
#endif
cpu_buffer_
.
resize
(
buffer_size
);
gpu_buffer_
.
resize
(
buffer_size
);
ReadTillBufferFullAsync
();
...
...
@@ -46,6 +67,12 @@ void BufferedReader::ReadTillBufferFullAsync() {
}
void
BufferedReader
::
ReadAsync
(
size_t
i
)
{
#ifdef PADDLE_WITH_CUDA
if
(
platform
::
is_gpu_place
(
place_
))
{
platform
::
SetDeviceId
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
).
device
);
PADDLE_ENFORCE
(
cudaEventRecord
(
events
[
i
],
compute_stream
));
}
#endif
position_
.
emplace
(
thread_pool_
.
enqueue
([
this
,
i
]()
->
size_t
{
TensorVec
&
cpu
=
cpu_buffer_
[
i
];
reader_
->
ReadNext
(
&
cpu
);
...
...
@@ -54,14 +81,41 @@ void BufferedReader::ReadAsync(size_t i) {
return
-
1UL
;
}
#ifdef PADDLE_WITH_CUDA
// NOTE(liangdun): using async copy instead of TensorCopySync
// TensorCopySync would block other stream
if
(
platform
::
is_gpu_place
(
place_
))
{
platform
::
SetDeviceId
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
).
device
);
PADDLE_ENFORCE
(
cudaStreamWaitEvent
(
stream
,
events
[
i
],
0
));
TensorVec
&
gpu
=
gpu_buffer_
[
i
];
gpu
.
resize
(
cpu
.
size
());
for
(
size_t
i
=
0
;
i
<
cpu
.
size
();
++
i
)
{
framework
::
TensorCopySync
(
cpu
[
i
],
place_
,
&
gpu
[
i
]);
gpu
[
i
].
Resize
(
cpu
[
i
].
dims
());
gpu
[
i
].
set_layout
(
cpu
[
i
].
layout
());
auto
cpu_place
=
cpu
[
i
].
place
();
auto
cpu_ptr
=
cpu
[
i
].
data
<
void
>
();
auto
gpu_ptr
=
gpu
[
i
].
mutable_data
(
place_
,
cpu
[
i
].
type
());
auto
size
=
cpu
[
i
].
numel
()
*
paddle
::
framework
::
SizeOfType
(
cpu
[
i
].
type
());
if
(
platform
::
is_cuda_pinned_place
(
cpu_place
))
memory
::
Copy
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
),
gpu_ptr
,
boost
::
get
<
platform
::
CUDAPinnedPlace
>
(
cpu_place
),
cpu_ptr
,
size
,
stream
);
else
if
((
platform
::
is_gpu_place
(
cpu_place
)))
memory
::
Copy
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
),
gpu_ptr
,
boost
::
get
<
platform
::
CUDAPlace
>
(
cpu_place
),
cpu_ptr
,
size
,
stream
);
else
// if cpu place is not pinned, async copy is slower than sync copy,
// so we use sync copy instead.
memory
::
Copy
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
),
gpu_ptr
,
boost
::
get
<
platform
::
CPUPlace
>
(
cpu_place
),
cpu_ptr
,
size
,
0
);
gpu
[
i
].
set_lod
(
cpu
[
i
].
lod
());
}
PADDLE_ENFORCE
(
cudaStreamSynchronize
(
stream
));
}
#endif
return
i
;
}));
}
...
...
paddle/fluid/operators/reader/buffered_reader.h
...
...
@@ -19,6 +19,9 @@
#include <vector>
#include "ThreadPool.h"
#include "paddle/fluid/framework/reader.h"
#ifdef PADDLE_WITH_CUDA
#include "paddle/fluid/platform/gpu_info.h"
#endif
namespace
paddle
{
namespace
operators
{
...
...
@@ -59,6 +62,11 @@ class BufferedReader : public framework::DecoratedReader {
std
::
vector
<
TensorVec
>
cpu_buffer_
;
std
::
vector
<
TensorVec
>
gpu_buffer_
;
size_t
prev_pos_
{
-
1UL
};
#ifdef PADDLE_WITH_CUDA
cudaStream_t
stream
;
cudaStream_t
compute_stream
;
std
::
vector
<
cudaEvent_t
>
events
;
#endif
};
}
// namespace reader
...
...
paddle/fluid/operators/reader/ctr_reader.cc
...
...
@@ -213,7 +213,7 @@ void ReadSvmData(const DataDesc& data_desc, std::shared_ptr<Reader> reader,
framework
::
LoD
lod
{
lod_data
};
lod_tensor
.
set_lod
(
lod
);
int64_t
*
tensor_data
=
lod_tensor
.
mutable_data
<
int64_t
>
(
framework
::
make_ddim
({
1
,
static_cast
<
int64_t
>
(
batch_feasign
.
size
())
}),
framework
::
make_ddim
({
static_cast
<
int64_t
>
(
batch_feasign
.
size
()),
1
}),
platform
::
CPUPlace
());
memcpy
(
tensor_data
,
batch_feasign
.
data
(),
batch_feasign
.
size
()
*
sizeof
(
int64_t
));
...
...
@@ -223,7 +223,7 @@ void ReadSvmData(const DataDesc& data_desc, std::shared_ptr<Reader> reader,
// insert label tensor
framework
::
LoDTensor
label_tensor
;
auto
*
label_tensor_data
=
label_tensor
.
mutable_data
<
int64_t
>
(
framework
::
make_ddim
({
1
,
static_cast
<
int64_t
>
(
batch_label
.
size
())
}),
framework
::
make_ddim
({
static_cast
<
int64_t
>
(
batch_label
.
size
()),
1
}),
platform
::
CPUPlace
());
memcpy
(
label_tensor_data
,
batch_label
.
data
(),
batch_label
.
size
()
*
sizeof
(
int64_t
));
...
...
paddle/fluid/operators/reader/ctr_reader_test.cc
...
...
@@ -123,7 +123,7 @@ TEST(CTR_READER, read_data) {
std
::
vector
<
std
::
tuple
<
LoD
,
std
::
vector
<
int64_t
>>>
data_slot_6003
{
b1
,
b2
,
b3
,
b4
};
std
::
vector
<
DDim
>
label_dims
=
{{
1
,
3
},
{
1
,
3
},
{
1
,
3
},
{
1
,
1
}};
std
::
vector
<
DDim
>
label_dims
=
{{
3
,
1
},
{
3
,
1
},
{
3
,
1
},
{
1
,
1
}};
LoDTensorBlockingQueueHolder
queue_holder
;
int
capacity
=
64
;
...
...
paddle/fluid/operators/reduce_ops/CMakeLists.txt
include
(
operators
)
register_operators
()
if
(
WITH_GPU
)
register_operators
(
DEPS cub
)
else
()
register_operators
()
endif
()
if
(
WITH_GPU
)
file
(
GLOB OPS RELATIVE
"
${
CMAKE_CURRENT_SOURCE_DIR
}
"
"*.part.cu"
)
...
...
paddle/fluid/operators/reshape_op.cc
...
...
@@ -327,14 +327,45 @@ class Reshape2GradOp : public framework::OperatorWithKernel {
}
};
class
ReshapeOpInplaceInToOut
:
public
framework
::
InplaceInToOut
{
public:
using
InplaceInToOut
::
InplaceInToOut
;
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
Apply
(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
override
{
std
::
unordered_map
<
std
::
string
,
std
::
string
>
inplace_in_to_out
=
{
{
"X"
,
"Out"
},
};
return
inplace_in_to_out
;
}
};
class
ReshapeGradInplaceInToOut
:
public
framework
::
InplaceInToOut
{
using
InplaceInToOut
::
InplaceInToOut
;
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
Apply
(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
override
{
std
::
unordered_map
<
std
::
string
,
std
::
string
>
inplace_in_to_out
=
{
{
framework
::
GradVarName
(
"Out"
),
framework
::
GradVarName
(
"X"
)},
};
return
inplace_in_to_out
;
}
};
}
// namespace operators
}
// namespace paddle
namespace
ops
=
paddle
::
operators
;
namespace
plat
=
paddle
::
platform
;
REGISTER_OPERATOR
(
reshape
,
ops
::
ReshapeOp
,
ops
::
ReshapeOpMaker
,
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
REGISTER_OPERATOR
(
reshape_grad
,
ops
::
ReshapeGradOp
);
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
,
ops
::
ReshapeOpInplaceInToOut
);
REGISTER_OPERATOR
(
reshape_grad
,
ops
::
ReshapeGradOp
,
ops
::
ReshapeGradInplaceInToOut
);
REGISTER_OP_CPU_KERNEL_FUNCTOR
(
reshape
,
float
,
ops
::
ReshapeKernel
,
double
,
ops
::
ReshapeKernel
,
int
,
ops
::
ReshapeKernel
,
int64_t
,
ops
::
ReshapeKernel
);
...
...
@@ -344,8 +375,9 @@ REGISTER_OP_CPU_KERNEL_FUNCTOR(reshape_grad, float, ops::ReshapeGradKernel,
ops
::
ReshapeGradKernel
);
REGISTER_OPERATOR
(
reshape2
,
ops
::
Reshape2Op
,
ops
::
Reshape2OpMaker
,
ops
::
Reshape2GradMaker
);
REGISTER_OPERATOR
(
reshape2_grad
,
ops
::
Reshape2GradOp
);
ops
::
Reshape2GradMaker
,
ops
::
ReshapeOpInplaceInToOut
);
REGISTER_OPERATOR
(
reshape2_grad
,
ops
::
Reshape2GradOp
,
ops
::
ReshapeGradInplaceInToOut
);
REGISTER_OP_CPU_KERNEL_FUNCTOR
(
reshape2
,
float
,
ops
::
ReshapeKernel
,
double
,
ops
::
ReshapeKernel
,
int
,
ops
::
ReshapeKernel
,
int64_t
,
ops
::
ReshapeKernel
);
...
...
paddle/fluid/operators/scale_op.cc
...
...
@@ -100,13 +100,14 @@ class ScaleGradMaker : public framework::SingleGradOpDescMaker {
}
};
using
ScaleOpInplace
=
framework
::
SingleOpInplaceInToOut
;
}
// namespace operators
}
// namespace paddle
namespace
ops
=
paddle
::
operators
;
REGISTER_OPERATOR
(
scale
,
ops
::
ScaleOp
,
ops
::
ScaleOpMaker
,
ops
::
ScaleGradMaker
,
ops
::
ScaleOpVarTypeInference
);
ops
::
ScaleOpVarTypeInference
,
ops
::
ScaleOpInplace
);
REGISTER_OP_CPU_KERNEL
(
scale
,
ops
::
ScaleKernel
<
paddle
::
platform
::
CPUDeviceContext
,
float
>
,
ops
::
ScaleKernel
<
paddle
::
platform
::
CPUDeviceContext
,
double
>
,
...
...
paddle/fluid/operators/softmax_op.cc
...
...
@@ -198,6 +198,21 @@ class SoftmaxOpGradMaker : public framework::SingleGradOpDescMaker {
return
std
::
unique_ptr
<
framework
::
OpDesc
>
(
op
);
}
};
class
SoftmaxInplaceInToOut
:
public
framework
::
InplaceInToOut
{
public:
using
framework
::
InplaceInToOut
::
InplaceInToOut
;
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
Apply
(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
override
{
return
std
::
unordered_map
<
std
::
string
,
std
::
string
>
{
{
"X"
,
"Out"
},
};
}
};
}
// namespace operators
}
// namespace paddle
...
...
paddle/fluid/platform/CMakeLists.txt
proto_library
(
profiler_proto SRCS profiler.proto DEPS framework_proto
)
proto_library
(
profiler_proto SRCS profiler.proto DEPS framework_proto
simple_threadpool
)
py_proto_compile
(
profiler_py_proto SRCS profiler.proto
)
add_custom_target
(
profiler_py_proto_init ALL COMMAND
${
CMAKE_COMMAND
}
-E touch __init__.py
)
...
...
@@ -36,7 +36,7 @@ cc_test(cpu_info_test SRCS cpu_info_test.cc DEPS cpu_info)
nv_library
(
gpu_info SRCS gpu_info.cc DEPS gflags glog enforce
)
cc_library
(
place SRCS place.cc DEPS enforce boost
)
cc_library
(
place SRCS place.cc DEPS enforce boost
lib_any
)
cc_test
(
place_test SRCS place_test.cc DEPS place glog gflags
)
add_subdirectory
(
dynload
)
...
...
paddle/fluid/platform/cuda_device_function.h
...
...
@@ -54,6 +54,8 @@ inline static int RoundToPowerOfTwo(int dim) {
} break
#define CUDA_LAUNCH_KERNEL_HELPER(...) \
CUDA_LAUNCH_KERNEL_BASE(1024, ##__VA_ARGS__); \
CUDA_LAUNCH_KERNEL_BASE(512, ##__VA_ARGS__); \
CUDA_LAUNCH_KERNEL_BASE(256, ##__VA_ARGS__); \
CUDA_LAUNCH_KERNEL_BASE(128, ##__VA_ARGS__); \
CUDA_LAUNCH_KERNEL_BASE(64, ##__VA_ARGS__); \
...
...
paddle/fluid/platform/ngraph_helper.h
...
...
@@ -23,6 +23,26 @@ limitations under the License. */
namespace
paddle
{
namespace
platform
{
std
::
shared_ptr
<
ngraph
::
Node
>
Nhwc2Nchw
(
std
::
shared_ptr
<
ngraph
::
Node
>
in
)
{
auto
in_shape
=
in
->
get_shape
();
in_shape
[
0
]
=
in
->
get_shape
()[
0
];
in_shape
[
1
]
=
in
->
get_shape
()[
3
];
in_shape
[
2
]
=
in
->
get_shape
()[
1
];
in_shape
[
3
]
=
in
->
get_shape
()[
2
];
ngraph
::
AxisVector
axis_vec
=
{
0
,
3
,
1
,
2
};
return
std
::
make_shared
<
ngraph
::
op
::
Reshape
>
(
in
,
axis_vec
,
in_shape
);
}
std
::
shared_ptr
<
ngraph
::
Node
>
Nchw2Nhwc
(
std
::
shared_ptr
<
ngraph
::
Node
>
in
)
{
auto
in_shape
=
in
->
get_shape
();
in_shape
[
0
]
=
in
->
get_shape
()[
0
];
in_shape
[
1
]
=
in
->
get_shape
()[
2
];
in_shape
[
2
]
=
in
->
get_shape
()[
3
];
in_shape
[
3
]
=
in
->
get_shape
()[
1
];
ngraph
::
AxisVector
axis_vec
=
{
0
,
2
,
3
,
1
};
return
std
::
make_shared
<
ngraph
::
op
::
Reshape
>
(
in
,
axis_vec
,
in_shape
);
}
ngraph
::
Shape
FlattenTo2d
(
ngraph
::
Shape
sh
,
int
num
)
{
auto
x1
=
std
::
accumulate
(
std
::
begin
(
sh
),
std
::
begin
(
sh
)
+
num
,
1
,
std
::
multiplies
<
size_t
>
());
...
...
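The new `Nhwc2Nchw` and `Nchw2Nhwc` helpers compute the output shape by picking input dimensions in the order given by their axis vectors, `{0, 3, 1, 2}` and `{0, 2, 3, 1}` respectively. A minimal sketch of just that shape arithmetic, assuming a fixed-size `std::array` in place of `ngraph::Shape` (`PermuteShape` and the two wrappers are hypothetical names):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Shape4 = std::array<std::size_t, 4>;

// Hypothetical helper: out[k] = in[order[k]], the same rule the helpers
// apply when filling in_shape[0..3] by hand.
Shape4 PermuteShape(const Shape4& in, const Shape4& order) {
  Shape4 out{};
  for (std::size_t k = 0; k < 4; ++k) out[k] = in[order[k]];
  return out;
}

Shape4 Nhwc2NchwShape(const Shape4& s) {
  return PermuteShape(s, Shape4{{0, 3, 1, 2}});
}

Shape4 Nchw2NhwcShape(const Shape4& s) {
  return PermuteShape(s, Shape4{{0, 2, 3, 1}});
}
```

The two orders are inverse permutations, so converting a shape to NCHW and back returns the original NHWC shape.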
@@ -43,13 +63,14 @@ std::shared_ptr<ngraph::Node> NgReshaper(std::shared_ptr<ngraph::Node> input,
std
::
shared_ptr
<
ngraph
::
Node
>
GetNode
(
const
std
::
shared_ptr
<
paddle
::
framework
::
OperatorBase
>&
op
,
const
std
::
string
prm
,
const
paddle
::
framework
::
VariableNameMap
&
var_map
,
const
std
::
string
name
,
const
paddle
::
framework
::
VariableNameMap
&
var_map
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
auto
&
var_names
=
var_map
.
at
(
prm
);
auto
&
var_names
=
var_map
.
at
(
name
);
PADDLE_ENFORCE_EQ
(
var_names
.
size
(),
1
,
"op %s prm %s expects one associated var"
,
op
->
Type
(),
prm
);
"op %s name %s expects one associated var"
,
op
->
Type
(),
name
);
if
(
ngb_node_map
->
find
(
var_names
[
0
])
!=
ngb_node_map
->
end
())
{
return
(
*
ngb_node_map
)[
var_names
[
0
]];
}
else
{
...
...
@@ -59,43 +80,53 @@ std::shared_ptr<ngraph::Node> GetNode(
std
::
shared_ptr
<
ngraph
::
Node
>
GetInputNode
(
const
std
::
shared_ptr
<
paddle
::
framework
::
OperatorBase
>&
op
,
const
std
::
string
prm
,
const
std
::
string
name
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
return
GetNode
(
op
,
prm
,
op
->
Inputs
(),
ngb_node_map
);
return
GetNode
(
op
,
name
,
op
->
Inputs
(),
ngb_node_map
);
}
std
::
shared_ptr
<
ngraph
::
Node
>
GetOutputNode
(
const
std
::
shared_ptr
<
paddle
::
framework
::
OperatorBase
>&
op
,
const
std
::
string
prm
,
const
std
::
string
name
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
return
GetNode
(
op
,
prm
,
op
->
Outputs
(),
ngb_node_map
);
return
GetNode
(
op
,
name
,
op
->
Outputs
(),
ngb_node_map
);
}
void
SetOutputNode
(
const
std
::
shared_ptr
<
paddle
::
framework
::
OperatorBase
>&
op
,
const
std
::
string
prm
,
std
::
shared_ptr
<
ngraph
::
Node
>
node
,
const
std
::
string
name
,
std
::
shared_ptr
<
ngraph
::
Node
>
node
,
std
::
shared_ptr
<
std
::
unordered_map
<
std
::
string
,
std
::
shared_ptr
<
ngraph
::
Node
>>>
ngb_node_map
)
{
auto
&
var_names
=
op
->
Outputs
().
at
(
prm
);
auto
&
var_names
=
op
->
Outputs
().
at
(
name
);
if
(
var_names
.
size
()
==
1
)
{
/* */
auto
dummy_out
=
GetOutputNode
(
op
,
name
,
ngb_node_map
);
if
(
dummy_out
&&
dummy_out
->
get_shape
()
!=
node
->
get_shape
())
{
node
=
NgReshaper
(
node
,
dummy_out
->
get_shape
());
}
if
(
dummy_out
&&
dummy_out
->
get_element_type
()
!=
node
->
get_element_type
())
{
node
=
std
::
make_shared
<
ngraph
::
op
::
Convert
>
(
node
,
dummy_out
->
get_element_type
());
}
(
*
ngb_node_map
)[
var_names
[
0
]]
=
node
;
}
else
if
(
var_names
.
size
()
==
0
)
{
(
*
ngb_node_map
)[
""
]
=
node
;
}
else
{
PADDLE_THROW
(
"prm %s has more than 1 var_names."
,
prm
);
PADDLE_THROW
(
"name %s has more than 1 var_names."
,
name
);
}
}
bool
HasOutput
(
const
std
::
shared_ptr
<
paddle
::
framework
::
OperatorBase
>&
op
,
const
std
::
string
prm
)
{
const
std
::
string
name
)
{
auto
&
outputs
=
op
->
Outputs
();
if
(
outputs
.
find
(
prm
)
==
outputs
.
end
())
return
false
;
return
outputs
.
at
(
prm
).
size
()
>
0
;
if
(
outputs
.
find
(
name
)
==
outputs
.
end
())
return
false
;
return
outputs
.
at
(
name
).
size
()
>
0
;
}
inline
void
GetMidDims
(
const
ngraph
::
Shape
&
x_shape
,
...
...
paddle/fluid/pybind/CMakeLists.txt
...
...
@@ -26,5 +26,5 @@ if(WITH_PYTHON)
get_property
(
os_dependency_modules GLOBAL PROPERTY OS_DEPENDENCY_MODULES
)
target_link_libraries
(
paddle_pybind
${
os_dependency_modules
}
)
cc_test
(
tensor_py_test SRCS tensor_py_test.cc DEPS python
)
cc_test
(
tensor_py_test SRCS tensor_py_test.cc DEPS python
pybind
)
endif
(
WITH_PYTHON
)
paddle/fluid/pybind/inference_api.cc
...
...
@@ -74,12 +74,12 @@ void BindPaddleBuf(py::module *m) {
.
def
(
py
::
init
([](
std
::
vector
<
float
>
&
data
)
{
auto
buf
=
PaddleBuf
(
data
.
size
()
*
sizeof
(
float
));
std
::
memcpy
(
buf
.
data
(),
static_cast
<
void
*>
(
data
.
data
()),
buf
.
length
());
return
std
::
move
(
buf
)
;
return
buf
;
}))
.
def
(
py
::
init
([](
std
::
vector
<
int64_t
>
&
data
)
{
auto
buf
=
PaddleBuf
(
data
.
size
()
*
sizeof
(
int64_t
));
std
::
memcpy
(
buf
.
data
(),
static_cast
<
void
*>
(
data
.
data
()),
buf
.
length
());
return
std
::
move
(
buf
)
;
return
buf
;
}))
.
def
(
"resize"
,
&
PaddleBuf
::
Resize
)
.
def
(
"reset"
,
...
...
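The hunk above replaces `return std::move(buf);` with `return buf;`. Returning a local object by name is already treated as an rvalue, so the compiler may elide the copy entirely or fall back to the move constructor, whereas wrapping the name in `std::move` rules out elision. A minimal sketch, assuming a hypothetical `CountingBuf` type in place of `PaddleBuf` (whose constructors are not shown in this diff):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical buffer type that counts how often it is copied.
struct CountingBuf {
  static int copies;
  std::vector<char> bytes;
  explicit CountingBuf(std::size_t n) : bytes(n) {}
  CountingBuf(const CountingBuf& o) : bytes(o.bytes) { ++copies; }
  CountingBuf(CountingBuf&& o) noexcept : bytes(std::move(o.bytes)) {}
};
int CountingBuf::copies = 0;

CountingBuf MakeBuf(const std::vector<float>& data) {
  CountingBuf buf(data.size() * sizeof(float));
  if (!data.empty()) {
    std::memcpy(buf.bytes.data(), data.data(), buf.bytes.size());
  }
  return buf;  // elided or moved, never copied; no std::move needed
}
```

Calling `MakeBuf` leaves `CountingBuf::copies` at zero whether or not the compiler applies the elision.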
paddle/fluid/pybind/pybind.cc
...
...
@@ -295,6 +295,7 @@ PYBIND11_MODULE(core, m) {
.
def
(
"_get_float_element"
,
TensorGetElement
<
float
>
)
.
def
(
"_set_double_element"
,
TensorSetElement
<
double
>
)
.
def
(
"_get_double_element"
,
TensorGetElement
<
double
>
)
.
def
(
"_place"
,
[](
Tensor
&
self
)
{
return
self
.
place
();
})
.
def
(
"_dtype"
,
[](
Tensor
&
self
)
{
return
self
.
type
();
});
py
::
class_
<
LoDTensor
,
Tensor
>
(
m
,
"LoDTensor"
,
R"DOC(
...
...
@@ -673,6 +674,12 @@ All parameter, weight, gradient are variables in Paddle.
py
::
class_
<
platform
::
Place
>
(
m
,
"Place"
)
.
def
(
py
::
init
<>
())
.
def
(
"is_gpu_place"
,
[](
platform
::
Place
&
self
)
{
return
platform
::
is_gpu_place
(
self
);
})
.
def
(
"gpu_device_id"
,
[](
platform
::
Place
&
self
)
{
return
boost
::
get
<
platform
::
CUDAPlace
>
(
self
).
device
;
})
.
def
(
"set_place"
,
[](
platform
::
Place
&
self
,
const
platform
::
CPUPlace
&
cpu_place
)
{
self
=
cpu_place
;
...
...
@@ -1093,9 +1100,9 @@ All parameter, weight, gradient are variables in Paddle.
[](
const
BuildStrategy
&
self
)
{
return
self
.
is_distribution_
;
},
[](
BuildStrategy
&
self
,
bool
b
)
{
self
.
is_distribution_
=
b
;
})
.
def_property
(
"memory_early_delete"
,
[](
const
BuildStrategy
&
self
)
{
return
self
.
memory_early_delete_
;
},
[](
BuildStrategy
&
self
,
bool
b
)
{
self
.
memory_early_delete_
=
b
;
})
"enable_inplace"
,
[](
const
BuildStrategy
&
self
)
{
return
self
.
enable_inplace_
;
},
[](
BuildStrategy
&
self
,
bool
b
)
{
self
.
enable_inplace_
=
b
;
})
.
def
(
"_finalize_strategy_and_create_passes"
,
[](
BuildStrategy
&
self
)
->
std
::
shared_ptr
<
ir
::
PassBuilder
>
{
return
self
.
CreatePassesFromStrategy
(
true
);
...
...
paddle/scripts/fast_install.sh
0 → 100644
#!/bin/bash
path
=
'http://paddlepaddle.org/download?url='
#release_version=`curl -s https://pypi.org/project/paddlepaddle/|grep -E "/project/paddlepaddle/"|grep "release"|awk -F '/' '{print $(NF-1)}'|head -1`
release_version
=
1.2.0
python_list
=(
"27"
"35"
"36"
"37"
)
function
use_cpu
(){
while
true
do
read
-p
"Install the CPU version of PaddlePaddle? (y/n)"
cpu_option
cpu_option
=
`
echo
$cpu_option
|
tr
'A-Z'
'a-z'
`
if
[[
"
$cpu_option
"
==
""
||
"
$cpu_option
"
==
"n"
]]
;
then
echo
"Exiting the installer..."
exit
else
GPU
=
'cpu'
echo
"The CPU version of PaddlePaddle will be installed"
break
fi
done
}
function
checkLinuxCUDNN
(){
echo
read
-n1
-p
"Press Enter to continue to the next step..."
echo
while
true
do
version_file
=
'/usr/local/cuda/include/cudnn.h'
if
[
-f
"
$version_file
"
]
;
then
CUDNN
=
`
cat
$version_file
|
grep
CUDNN_MAJOR |awk
'NR==1{print $NF}'
`
fi
if
[
"
$CUDNN
"
==
""
]
;
then
version_file
=
`
sudo
find /usr
-name
"cudnn.h"
|head
-1
`
if
[
"
$version_file
"
!=
""
]
;
then
CUDNN
=
`
cat
${
version_file
}
|
grep
CUDNN_MAJOR
-A
2|awk
'NR==1{print $NF}'
`
else
echo
"Result: cuda/include/cudnn.h was not found in the usual locations"
while
true
do
read
-p
"Please locate cudnn.h and enter its path here (the path must include the file name cudnn.h itself): "
cudnn_version
echo
if
[
"
$cudnn_version
"
==
""
]
||
[
!
-f
"
$cudnn_version
"
]
;
then
read
-p
"cuDNN was still not found. Enter y to install the CPU version of PaddlePaddle, or n to re-enter the cuDNN path (y/n)"
cpu_option
echo
cpu_option
=
`
echo
$cpu_option
|
tr
'A-Z'
'a-z'
`
if
[
"
$cpu_option
"
==
"y"
-o
"
$cpu_option
"
==
""
]
;
then
GPU
=
'cpu'
break
else
echo
"Please try again"
echo
fi
else
CUDNN
=
`
cat
$cudnn_version
|
grep
CUDNN_MAJOR |awk
'NR==1{print $NF}'
`
echo
"Result: found cudnn.h"
break
fi
done
if
[
"
$GPU
"
==
"cpu"
]
;
then
break
fi
fi
fi
if
[
"
$CUDA
"
==
"9"
-a
"
$CUDNN
"
!=
"7"
]
;
then
echo
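echo
"With CUDA 9 only cuDNN 7 is supported, not the cuDNN
${
CUDNN
}
found on this machine. You can download a suitable cuDNN version from the NVIDIA website (press Ctrl+C to quit the installer). Press Enter to install the CPU version of PaddlePaddle instead"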
echo
use_cpu
if
[
"
$GPU
"
==
"cpu"
]
;
then
break
fi
fi
if
[
"
$CUDNN
"
==
5
]
||
[
"
$CUDNN
"
==
7
]
;
then
echo
echo
"Your cuDNN version is: cuDNN
$CUDNN
"
break
else
echo
read
-n1
-p
"Only cuDNN 5 and 7 are currently supported, not the cuDNN
${
CUDNN
}
found on this machine. The CPU version of PaddlePaddle will be installed; press Enter to start"
echo
use_cpu
if
[
"
$GPU
"
==
"cpu"
]
;
then
break
fi
fi
done
}
function
checkLinuxCUDA
(){
while
true
do
CUDA
=
`
echo
${
CUDA_VERSION
}
|awk
-F
"[ .]"
'{print $1}'
`
if
[
"
$CUDA
"
==
""
]
;
then
if
[
-f
"/usr/local/cuda/version.txt"
]
;
then
CUDA
=
`
cat
/usr/local/cuda/version.txt |
grep
'CUDA Version'
|awk
-F
'[ .]'
'{print $3}'
`
tmp_cuda
=
$CUDA
fi
if
[
-f
"/usr/local/cuda8/version.txt"
]
;
then
CUDA
=
`
cat
/usr/local/cuda8/version.txt |
grep
'CUDA Version'
|awk
-F
'[ .]'
'{print $3}'
`
tmp_cuda8
=
$CUDA
fi
if
[
-f
"/usr/local/cuda9/version.txt"
]
;
then
CUDA
=
`
cat
/usr/local/cuda9/version.txt |
grep
'CUDA Version'
|awk
-F
'[ .]'
'{print $3}'
`
tmp_cuda9
=
$CUDA
fi
fi
if
[
"
$tmp_cuda
"
!=
""
]
;
then
echo
"Result: found CUDA
$tmp_cuda
"
fi
if
[
"
$tmp_cuda8
"
!=
""
]
;
then
echo
"Result: found CUDA
$tmp_cuda8
"
fi
if
[
"
$tmp_cuda9
"
!=
""
]
;
then
echo
"Result: found CUDA
$tmp_cuda9
"
fi
if
[
"
$CUDA
"
==
""
]
;
then
echo
"Result: cuda/version.txt was not found in the usual locations"
while
true
do
read
-p
"Please enter the path to cuda/version.txt: "
cuda_version
if
[
"
$cuda_version
"
==
""
]
||
[
!
-f
"
$cuda_version
"
]
;
then
read
-p
"CUDA was still not found. Enter y to install the CPU version of PaddlePaddle, or n to re-enter the CUDA path (y/n)"
cpu_option
cpu_option
=
`
echo
$cpu_option
|
tr
'A-Z'
'a-z'
`
if
[
"
$cpu_option
"
==
"y"
]
||
[
"
$cpu_option
"
==
""
]
;
then
GPU
=
'cpu'
break
else
echo
"Please re-enter..."
fi
else
CUDA
=
`
cat
$cuda_version
|
grep
'CUDA Version'
|awk
-F
'[ .]'
'{print $3}'
`
if
[
"
$CUDA
"
==
""
]
;
then
echo
"No CUDA version information was found in version.txt"
else
break
fi
fi
done
if
[
"
$GPU
"
==
"cpu"
]
;
then
break
fi
fi
if
[
"
$CUDA
"
==
"8"
]
||
[
"
$CUDA
"
==
"9"
]
;
then
echo
"Your CUDA version is
${
CUDA
}
"
break
else
echo
"Only CUDA 8 and 9 are currently supported, not your CUDA
${
CUDA
}
. The CPU version of PaddlePaddle will be installed"
echo
use_cpu
fi
if
[
"
$GPU
"
==
"cpu"
]
;
then
break
fi
done
}
function
checkLinuxMathLibrary
(){
while
true
do
if
[
"
$AVX
"
==
""
]
;
then
echo
"Checking whether the AVX instruction set is available..."
echo
echo
"Result: this machine has no AVX instruction set. For environments without AVX we only provide PaddlePaddle built with the MKL math library, so that build will be installed"
math
=
'mkl'
break
elif
[
"
$GPU
"
==
"gpu"
]
;
then
math
=
'mkl'
echo
"A GPU was detected on this machine; the MKL math library is recommended"
break
else
read
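-p
"Please choose the math library to use:
1: openblas, a high-performance multi-core BLAS library
2: mkl (recommended), the Intel Math Kernel Library
=> Enter 1 or 2. Any other input, or just pressing Enter, selects the default [2. mkl]. Type your choice and press Enter: "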
math
if
[
"
$math
"
==
""
]
;
then
math
=
"mkl"
echo
"You selected [2]"
break
fi
if
[
"
$math
"
==
"1"
]
;
then
math
=
openblas
echo
"You selected [1]"
break
elif
[
"
$math
"
==
"2"
]
;
then
math
=
mkl
echo
"You selected [2]"
break
fi
echo
"Invalid input, please try again"
fi
done
}
function
checkLinuxPaddleVersion
(){
read
-n1
-p
"Press Enter to continue..."
while
true
do
read
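-p
"
1. Development version: tracks the develop branch on GitHub; choose this if you are developing or want the latest PaddlePaddle features
2. Stable version (recommended): use this unless you have special development needs; the latest release is
${
release_version
}
=> Enter 1 or 2. Any other input, or just pressing Enter, selects the default [2. Stable version]. Type your choice and press Enter: "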
paddle_version
if
[
"
$paddle_version
"
==
""
]
;
then
paddle_version
=
"release-
${
release_version
}
"
echo
"You selected [2]; installing release-
${
release_version
}
"
break
fi
if
[
"
$paddle_version
"
==
"1"
]
;
then
echo
"You selected [1]; the development version will be installed"
break
elif
[
"
$paddle_version
"
==
"2"
]
;
then
echo
"You selected [2]; installing release-
${
release_version
}
"
break
fi
echo
"Invalid input, please try again"
done
}
function
checkLinuxPip
(){
while
true
do
echo
"Enter the path of the pip you want to use (you can open another terminal and run 'which pip' to find it):"
read
-p
""
pip_path
if
[
"
$pip_path
"
==
""
-o
!
-f
"
$pip_path
"
]
;
then
echo
"Result: pip not found, please try again"
continue
fi
python_version
=
`
$pip_path
--version
|awk
-F
"[ |)]"
'{print $6}'
|sed
's#\.##g'
`
if
[
"
$python_version
"
==
"27"
]
;
then
uncode
=
`
python
-c
"import pip._internal;print(pip._internal.pep425tags.get_supported())"
|grep
"cp27mu"
`
if
[[
"
$uncode
"
==
""
]]
;
then
uncode
=
else
uncode
=
u
fi
fi
if
[
"
$python_version
"
==
""
]
;
then
echo
"Result: pip not found, please try again"
else
version_list
=
`
echo
"
${
python_list
[@]
}
"
|
grep
"
$python_version
"
`
if
[
"
$version_list
"
!=
""
]
;
then
echo
"Result: found Python
${
python_version
}
"
break
else
echo
"Result: no usable pip found. Only Python 27/35/36/37 and their matching pip are supported. Please try again, or press Ctrl+C to quit"
fi
fi
done
}
function
checkLinuxAVX
(){
while
true
do
if
[[
"
$AVX
"
!=
""
]]
;
then
AVX
=
"avx"
break
else
if
[
"
$CUDA
"
==
"8"
-a
"
$CUDNN
"
==
"7"
]
||
[
"
$GPU
"
==
"cpu"
]
;
then
AVX
=
"noavx"
break
else
echo
"Step 6. Check for AVX support"
echo
echo
"Detection result: AVX not found. Without AVX, only the CPU package or the GPU package built for CUDA8 cuDNN7 is available"
break
fi
fi
done
}
function
PipLinuxInstall
(){
wheel_cpu_release
=
"http://paddle-wheel.bj.bcebos.com/
${
release_version
}
-
${
GPU
}
-
${
AVX
}
-
${
math
}
/paddlepaddle-
${
release_version
}
-cp
${
python_version
}
-cp
${
python_version
}
m
${
uncode
}
-linux_x86_64.whl"
wheel_gpu_release
=
"http://paddle-wheel.bj.bcebos.com/
${
release_version
}
-gpu-cuda
${
CUDA
}
-cudnn
${
CUDNN
}
-
${
AVX
}
-
${
math
}
/paddlepaddle_gpu-
${
release_version
}
.post
${
CUDA
}${
CUDNN
}
-cp
${
python_version
}
-cp
${
python_version
}
m
${
uncode
}
-linux_x86_64.whl"
wheel_gpu_release_noavx
=
"http://paddle-wheel.bj.bcebos.com/
${
release_version
}
-gpu-cuda
${
CUDA
}
-cudnn
${
CUDNN
}
-
${
AVX
}
-
${
math
}
/paddlepaddle_gpu-
${
release_version
}
-cp
${
python_version
}
-cp
${
python_version
}
m
${
uncode
}
-linux_x86_64.whl"
wheel_cpu_develop
=
"http://paddle-wheel.bj.bcebos.com/latest-cpu-
${
AVX
}
-
${
math
}
/paddlepaddle-latest-cp
${
python_version
}
-cp
${
python_version
}
m
${
uncode
}
-linux_x86_64.whl"
wheel_gpu_develop
=
"http://paddle-wheel.bj.bcebos.com/latest-gpu-cuda
${
CUDA
}
-cudnn
${
CUDNN
}
-
${
AVX
}
-
${
math
}
/paddlepaddle_gpu-latest-cp
${
python_version
}
-cp
${
python_version
}
m
${
uncode
}
-linux_x86_64.whl"
if
[[
"
$paddle_version
"
==
"2"
]]
;
then
if
[[
"
$GPU
"
==
"gpu"
]]
;
then
if
[[
${
AVX
}
==
"avx"
]]
;
then
rm
-rf
`
echo
$wheel_gpu_release
|awk
-F
'/'
'{print $NF}'
`
wget
-q
$wheel_gpu_release
if
[
"
$?
"
==
"0"
]
;
then
$pip_path
install
--user
-i
https://mirrors.aliyun.com/pypi/simple
--trusted-host
=
mirrors.aliyun.com
$wheel_gpu_release
else
echo
"Failed to download the paddlepaddle whl package"
exit
1
fi
else
rm
-rf
`
echo
$wheel_gpu_release_noavx
|awk
-F
'/'
'{print $NF}'
`
wget
-q
$wheel_gpu_release_noavx
if
[
"
$?
"
==
"0"
]
;
then
$pip_path
install
--user
-i
https://mirrors.aliyun.com/pypi/simple
--trusted-host
=
mirrors.aliyun.com
$wheel_gpu_release_noavx
else
echo
"Failed to download the paddlepaddle whl package"
exit
1
fi
fi
else
rm
-rf
`
echo
$wheel_cpu_release
|awk
-F
'/'
'{print $NF}'
`
wget
-q
$wheel_cpu_release
if
[
"
$?
"
==
"0"
]
;
then
$pip_path
install
--user
-i
https://mirrors.aliyun.com/pypi/simple
--trusted-host
=
mirrors.aliyun.com
$wheel_cpu_release
else
echo
"Failed to download the paddlepaddle whl package"
exit
1
fi
fi
else
if
[[
"
$GPU
"
==
"gpu"
]]
;
then
rm
-rf
`
echo
$wheel_gpu_develop
|awk
-F
'/'
'{print $NF}'
`
wget
-q
$wheel_gpu_develop
if
[
"
$?
"
==
"0"
]
;
then
$pip_path
install
--user
-i
https://mirrors.aliyun.com/pypi/simple
--trusted-host
=
mirrors.aliyun.com
$wheel_gpu_develop
else
echo
"Failed to download the paddlepaddle whl package"
exit
1
fi
else
rm
-rf
`
echo
$wheel_cpu_develop
|awk
-F
'/'
'{print $NF}'
`
wget
-q
$wheel_cpu_develop
if
[
"
$?
"
==
"0"
]
;
then
$pip_path
install
--user
-i
https://mirrors.aliyun.com/pypi/simple
--trusted-host
=
mirrors.aliyun.com
$wheel_cpu_develop
else
echo
"Failed to download the paddlepaddle whl package"
exit
1
fi
fi
fi
}
function
checkLinuxGPU
(){
read
-n1
-p
"About to check whether your machine has a GPU, press Enter to continue..."
echo
AVX
=
`
cat
/proc/cpuinfo |grep avx|tail
-1
|grep avx
`
which nvidia-smi
>
/dev/null 2>&1
if
[
"
$?
"
!=
"0"
]
;
then
GPU
=
'cpu'
echo
"No GPU was found on this machine, or PaddlePaddle does not yet support this GPU model"
else
GPU
=
'gpu'
echo
"A GPU was found on your machine; checking the CUDA and cuDNN versions next..."
echo
fi
if
[
"
$GPU
"
==
'gpu'
]
;
then
checkLinuxCUDA
checkLinuxCUDNN
fi
}
function
linux
(){
gpu_list
=(
"GeForce 410M"
"GeForce 610M"
"GeForce 705M"
"GeForce 710M"
"GeForce 800M"
"GeForce 820M"
"GeForce 830M"
"GeForce 840M"
"GeForce 910M"
"GeForce 920M"
"GeForce 930M"
"GeForce 940M"
"GeForce GT 415M"
"GeForce GT 420M"
"GeForce GT 430"
"GeForce GT 435M"
"GeForce GT 440"
"GeForce GT 445M"
"GeForce GT 520"
"GeForce GT 520M"
"GeForce GT 520MX"
"GeForce GT 525M"
"GeForce GT 540M"
"GeForce GT 550M"
"GeForce GT 555M"
"GeForce GT 610"
"GeForce GT 620"
"GeForce GT 620M"
"GeForce GT 625M"
"GeForce GT 630"
"GeForce GT 630M"
"GeForce GT 635M"
"GeForce GT 640"
"GeForce GT 640 (GDDR5)"
"GeForce GT 640M"
"GeForce GT 640M LE"
"GeForce GT 645M"
"GeForce GT 650M"
"GeForce GT 705"
"GeForce GT 720"
"GeForce GT 720M"
"GeForce GT 730"
"GeForce GT 730M"
"GeForce GT 735M"
"GeForce GT 740"
"GeForce GT 740M"
"GeForce GT 745M"
"GeForce GT 750M"
"GeForce GTS 450"
"GeForce GTX 1050"
"GeForce GTX 1060"
"GeForce GTX 1070"
"GeForce GTX 1080"
"GeForce GTX 1080 Ti"
"GeForce GTX 460"
"GeForce GTX 460M"
"GeForce GTX 465"
"GeForce GTX 470"
"GeForce GTX 470M"
"GeForce GTX 480"
"GeForce GTX 480M"
"GeForce GTX 485M"
"GeForce GTX 550 Ti"
"GeForce GTX 560M"
"GeForce GTX 560 Ti"
"GeForce GTX 570"
"GeForce GTX 570M"
"GeForce GTX 580"
"GeForce GTX 580M"
"GeForce GTX 590"
"GeForce GTX 650"
"GeForce GTX 650 Ti"
"GeForce GTX 650 Ti BOOST"
"GeForce GTX 660"
"GeForce GTX 660M"
"GeForce GTX 660 Ti"
"GeForce GTX 670"
"GeForce GTX 670M"
"GeForce GTX 670MX"
"GeForce GTX 675M"
"GeForce GTX 675MX"
"GeForce GTX 680"
"GeForce GTX 680M"
"GeForce GTX 680MX"
"GeForce GTX 690"
"GeForce GTX 750"
"GeForce GTX 750 Ti"
"GeForce GTX 760"
"GeForce GTX 760M"
"GeForce GTX 765M"
"GeForce GTX 770"
"GeForce GTX 770M"
"GeForce GTX 780"
"GeForce GTX 780M"
"GeForce GTX 780 Ti"
"GeForce GTX 850M"
"GeForce GTX 860M"
"GeForce GTX 870M"
"GeForce GTX 880M"
"GeForce GTX 950"
"GeForce GTX 950M"
"GeForce GTX 960"
"GeForce GTX 960M"
"GeForce GTX 965M"
"GeForce GTX 970"
"GeForce GTX 970M"
"GeForce GTX 980"
"GeForce GTX 980M"
"GeForce GTX 980 Ti"
"GeForce GTX TITAN"
"GeForce GTX TITAN Black"
"GeForce GTX TITAN X"
"GeForce GTX TITAN Z"
"Jetson TK1"
"Jetson TX1"
"Jetson TX2"
"Mobile Products"
"NVIDIA NVS 310"
"NVIDIA NVS 315"
"NVIDIA NVS 510"
"NVIDIA NVS 810"
"NVIDIA TITAN V"
"NVIDIA TITAN X"
"NVIDIA TITAN Xp"
"NVS 4200M"
"NVS 5200M"
"NVS 5400M"
"Quadro 410"
"Quadro GP100"
"Quadro K1100M"
"Quadro K1200"
"Quadro K2000"
"Quadro K2000D"
"Quadro K2100M"
"Quadro K2200"
"Quadro K2200M"
"Quadro K3100M"
"Quadro K4000"
"Quadro K4100M"
"Quadro K420"
"Quadro K4200"
"Quadro K4200M"
"Quadro K5000"
"Quadro K500M"
"Quadro K5100M"
"Quadro K510M"
"Quadro K5200"
"Quadro K5200M"
"Quadro K600"
"Quadro K6000"
"Quadro K6000M"
"Quadro K610M"
"Quadro K620"
"Quadro K620M"
"Quadro M1000M"
"Quadro M1200"
"Quadro M2000"
"Quadro M2000M"
"Quadro M2200"
"Quadro M3000M"
"Quadro M4000"
"Quadro M4000M"
"Quadro M5000"
"Quadro M5000M"
"Quadro M500M"
"Quadro M520"
"Quadro M5500M"
"Quadro M6000"
"Quadro M6000 24GB"
"Quadro M600M"
"Quadro M620"
"Quadro Mobile Products"
"Quadro P1000"
"Quadro P2000"
"Quadro P3000"
"Quadro P400"
"Quadro P4000"
"Quadro P5000"
"Quadro P600"
"Quadro P6000"
"Quadro Plex 7000"
"Tegra K1"
"Tegra X1"
"Tesla C2050/C2070"
"Tesla C2075"
"Tesla Data Center Products"
"Tesla K10"
"Tesla K20"
"Tesla K40"
"Tesla K80"
"Tesla M40"
"Tesla M60"
"Tesla P100"
"Tesla P4"
"Tesla P40"
"Tesla V100"
)
echo
"Step 2. Detect the GPU model and CUDA/cuDNN versions"
echo
checkLinuxGPU
echo
echo
"Step 3. Detect the math library"
echo
checkLinuxMathLibrary
echo
echo
"Step 4. Choose the PaddlePaddle version to install"
echo
checkLinuxPaddleVersion
echo
echo
"Step 5. Detect the pip version"
echo
checkLinuxPip
echo
checkLinuxAVX
echo
"*********************2. Start installation*****************************"
PipLinuxInstall
}
function
checkMacPython2
(){
while
true
do
read
-p
"
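=> Python2 was not found in the usual locations. Press ctrl+c to exit the installer, then install Python2 with brew or from pypi.org (the Python version must not be lower than 2.7.15)
To use a custom Python path, enter the path:"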
python_root
echo
python_version
=
`
$python_root
--version
2>&1 1>&1
`
if
[
$?
==
"0"
]
;
then
:
else
python_version
=
""
fi
check_python
=
`
echo
$python_version
|
grep
"Python 2"
`
if
[
"
$python_version
"
==
""
]
||
[
"
$python_root
"
==
"/usr/bin/python"
-a
"
$python_version
"
==
"Python 2.7.10"
]
;
then
python_version
=
""
elif
[
-n
"
$check_python
"
]
;
then
while
true
do
read
-p
"
=> Found
$python_version
in your environment. Enter y to use this version, or n to specify a custom Python path. Type (y/n) here and press Enter: "
use_python
echo
use_python
=
`
echo
$use_python
|
tr
'A-Z'
'a-z'
`
if
[
"
$use_python
"
==
"y"
]||[
"
$use_python
"
==
""
]
;
then
use_python
=
"y"
break
elif
[
"
$use_python
"
==
"n"
]
;
then
python_root
=
""
break
else
echo
"Invalid input, please enter y or n"
fi
done
if
[
"
$use_python
"
==
"y"
]
;
then
break
fi
else
echo
"The Python you entered is not Python2"
python_version
=
""
fi
done
}
function
checkMacPython3
(){
while
true
do
read
-p
"
=> Python3 was not found in the usual locations. Press ctrl+c to exit the installer, then install Python3 with brew or from pypi.org
To use a custom Python path, enter the path:"
python_root
python_version
=
`
$python_root
--version
2>&1 1>&1
`
if
[
$?
==
"0"
]
;
then
:
else
python_version
=
""
fi
check_python
=
`
echo
$python_version
|
grep
"Python 3"
`
if
[
"
$python_version
"
==
""
]
||
[
"
$python_root
"
==
"/usr/bin/python"
-a
"
$python_version
"
==
"Python 2.7.10"
]
;
then
python_version
=
""
elif
[
-n
"
$check_python
"
]
;
then
while
true
do
read
-p
"
=> Found
$python_version
in your environment. Enter y to use this version, or n to specify a custom Python path. Type (y/n) here and press Enter: "
use_python
echo
use_python
=
`
echo
$use_python
|
tr
'A-Z'
'a-z'
`
if
[
"
$use_python
"
==
"y"
]||[
"
$use_python
"
==
""
]
;
then
use_python
=
"y"
break
elif
[
"
$use_python
"
==
"n"
]
;
then
python_root
=
""
break
else
echo
"Invalid input, please enter y or n"
fi
done
if
[
"
$use_python
"
==
"y"
]
;
then
break
fi
else
echo
"The Python you entered is not Python3"
python_version
=
""
fi
done
}
function
checkMacPaddleVersion
(){
while
true
do
read
-n1
-p
"Step 2. Choose the PaddlePaddle version, press Enter to continue..."
echo
read
-p
"
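1. Development version: tracks the develop branch on GitHub. Choose this if you develop PaddlePaddle or want its latest features
2. Stable version (recommended): choose this if you have no special development needs. The latest version is
${
release_version
}
=> Enter 1 or 2. Any other input, or pressing Enter directly, selects the default [2. Stable version]. Type your choice here and press Enter:"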
paddle_version
if
[
"
$paddle_version
"
==
"1"
]||[
"
$paddle_version
"
==
"2"
]
;
then
echo
echo
"You selected ["
$paddle_version
"]"
echo
break
else
paddle_version
=
"2"
echo
echo
"You selected [2]"
echo
break
fi
done
}
function
checkMacPythonVersion
(){
while
true
do
read
-n1
-p
"Step 3. Choose the Python version, press Enter to continue..."
read
-p
"
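2. Use python 2.x
3. Use python 3.x
=> Enter 2 or 3. Pressing Enter without input selects the default [Python 2]; any other input is rejected and you will be asked again. Type your choice here and press Enter:"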
python_V
echo
if
[
"
$python_V
"
==
""
]
;
then
python_V
=
"2"
fi
echo
"You selected ["
$python_V
"]; looking for a matching Python version, press Enter to continue..."
echo
if
[
"
$python_V
"
==
"2"
]
;
then
python_root
=
`
which python2.7
`
if
[
"
$python_root
"
==
""
]
;
then
python_root
=
`
which python
`
fi
python_version
=
`
$python_root
--version
2>&1 1>&1
`
if
[
$?
==
"0"
]
;
then
:
else
python_version
=
""
fi
if
[
"
$python_root
"
==
""
]||[
"
$python_root
"
==
"/usr/bin/python"
-a
"
$python_version
"
==
"Python 2.7.10"
]||[
"
$python_root
"
==
"/usr/bin/python2.7"
-a
"
$python_version
"
==
"Python 2.7.10"
]
;
then
checkMacPython2
fi
while
true
do
read
-p
"
=> Found
$python_version
in your environment. Enter y to use this version, or n to specify a custom Python path. Type (y/n) here and press Enter:"
use_python
echo
use_python
=
`
echo
$use_python
|
tr
'A-Z'
'a-z'
`
if
[
"
$use_python
"
==
"y"
]||[
"
$use_python
"
==
""
]
;
then
break
elif
[
"
$use_python
"
==
"n"
]
;
then
python_root
=
""
checkMacPython2
break
else
echo
"Invalid input, please enter y or n"
fi
done
elif
[
"
$python_V
"
==
"3"
]
;
then
python_root
=
`
which python3
`
python_version
=
`
$python_root
--version
2>&1 1>&1
`
if
[
$?
==
"0"
]
;
then
:
else
python_version
=
""
fi
if
[
"
$python_root
"
==
""
]||[
"
$python_root
"
==
"/usr/bin/python"
-a
"
$python_version
"
==
"Python 2.7.10"
]
;
then
checkMacPython3
fi
while
true
do
read
-p
"
=> Found
$python_version
in your environment. Enter y to use this version, or n to specify a custom Python path. Type (y/n) here and press Enter:"
use_python
echo
use_python
=
`
echo
$use_python
|
tr
'A-Z'
'a-z'
`
if
[
"
$use_python
"
==
"y"
]||[
"
$use_python
"
==
""
]
;
then
break
elif
[
"
$use_python
"
==
"n"
]
;
then
checkMacPython3
break
else
echo
"Invalid input, please enter y or n"
fi
done
else
:
fi
if
[
"
$python_V
"
==
"2"
]||[
"
$python_V
"
==
"3"
]
;
then
python_brief_version
=
`
$python_root
-m
pip
-V
|awk
-F
"[ |)]"
'{print $6}'
|sed
's#\.##g'
`
if
[[
$python_brief_version
==
"27"
]]
;
then
uncode
=
`
python
-c
"import pip._internal;print(pip._internal.pep425tags.get_supported())"
|grep
"cp27"
`
if
[[
$uncode
==
""
]]
;
then
uncode
=
"mu"
else
uncode
=
"m"
fi
fi
version_list
=
`
echo
"
${
python_list
[@]
}
"
|
grep
"
$python_brief_version
"
`
if
[
"
$version_list
"
!=
""
]
;
then
break
else
echo
"No usable pip or pip3 was found. PaddlePaddle currently supports Python2.7/3.5/3.6/3.7 and their matching pip. Please try again, or press ctrl + c to exit"
fi
else
echo
"Invalid input, please try again"
fi
done
}
function
checkMacAVX
(){
read
-n1
-p
"Step 4. Check whether your Mac supports the AVX instruction set, press Enter to continue..."
echo
if
[[
$AVX
!=
""
]]
;
then
AVX
=
"avx"
echo
"Detection result: supported"
else
read
-n1
-p
"Detection result: not supported. Sorry, PaddlePaddle does not yet provide no_avx packages for Mac. You can install the no_avx build of PaddlePaddle on Linux instead. Press Enter to exit..."
exit
fi
echo
}
function
checkMacGPU
(){
read
-n1
-p
"Step 5. Choose the CPU/GPU version, press Enter to continue..."
echo
if
[[
$GPU
!=
""
]]
;
then
echo
"No GPU build of PaddlePaddle is available for MacOS yet; the CPU build will be installed"
else
echo
"No GPU build of PaddlePaddle is available for MacOS yet; the CPU build will be installed"
GPU
=
cpu
fi
echo
}
function
macos
()
{
path
=
'http://paddlepaddle.org/download?url='
AVX
=
`
sysctl
-a
|
grep
cpu |
grep
AVX1.0 |
tail
-1
|
grep
AVX
`
while
true
do
checkMacPaddleVersion
checkMacPythonVersion
checkMacAVX
checkMacGPU
echo
"*********************2. Start installation*****************************"
echo
read
-n1
-p
"PaddlePaddle is about to be downloaded and installed, press Enter to continue..."
echo
if
[[
$paddle_version
==
"2"
]]
;
then
$python_root
-m
pip
install
paddlepaddle
if
[
$?
==
"0"
]
;
then
echo
"Installation succeeded. You can run:
${
python_root
}
to start the Python interpreter that has PaddlePaddle installed"
break
else
rm
$whl_cpu_release
echo
"PaddlePaddle could not be installed. Try a different python path, or press ctrl + c to exit and check that the pip or pip index for your python is usable"
echo
""
echo
"=========================================================================================="
echo
""
exit
1
fi
else
if
[
-f
$whl_cpu_develop
]
;
then
$python_root
-m
pip
install
$whl_cpu_develop
if
[
$?
==
"0"
]
;
then
rm
-rf
$whl_cpu_develop
echo
"Installation succeeded! Tip: you can run:
${
python_root
}
to start the Python interpreter that has PaddlePaddle installed"
break
else
echo
"PaddlePaddle could not be installed. Try a different python path, or press ctrl + c to exit and check that the pip or pip index for your python is usable"
echo
""
echo
"=========================================================================================="
echo
""
exit
1
fi
else
wget
${
path
}
$whl_cpu_develop
-O
$whl_cpu_develop
if
[
$?
==
"0"
]
;
then
$python_root
-m
pip
install
$whl_cpu_develop
if
[
$?
==
"0"
]
;
then
rm
$whl_cpu_develop
echo
"Installation succeeded. You can run:
${
python_root
}
to start the Python interpreter that has PaddlePaddle installed"
break
else
rm
$whl_cpu_release
echo
"PaddlePaddle could not be installed. Try a different python path, or press ctrl + c to exit and check that the pip or pip index for your python is usable"
echo
""
echo
"=========================================================================================="
echo
""
exit
1
fi
else
rm
$whl_cpu_develop
echo
"PaddlePaddle could not be installed. Check your network and confirm that wget is installed, or press ctrl + c to exit and report the problem at https://github.com/PaddlePaddle/Paddle/issues"
echo
""
echo
"=========================================================================================="
echo
""
exit
1
fi
fi
fi
done
}
function
main
()
{
echo
"*********************************"
echo
"Welcome to the PaddlePaddle quick install script"
echo
"*********************************"
echo
echo
"If you run into any problem during installation, please report it at https://github.com/PaddlePaddle/Paddle/issues and our staff will help you"
echo
echo
"This installer helps you install PaddlePaddle on Linux or Mac in two parts: 1) preparation before installation and 2) installation"
echo
read
-n1
-p
"Press Enter to go to the next step..."
echo
echo
echo
"*********************1. Preparation before installation*****************************"
echo
echo
"Step 1. Detecting your operating system..."
echo
SYSTEM
=
`
uname
-s
`
if
[
"
$SYSTEM
"
==
"Darwin"
]
;
then
echo
"Your system is: MAC OSX"
echo
macos
else
echo
"Your system is: Linux"
echo
OS
=
`
cat
/etc/issue|awk
'NR==1 {print $1}'
`
if
[
$OS
==
"
\S
"
]
||
[
"
$OS
"
==
"CentOS"
]
||
[
$OS
==
"Ubuntu"
]
;
then
linux
else
echo
"Your system is not supported by this installer. To install PaddlePaddle on Windows, please refer to the Windows installation guide on the PaddlePaddle website"
fi
fi
}
main
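The Linux branch of the script above derives a Python tag from `pip --version` and then interpolates it, together with the detected AVX and math-library settings, into a wheel URL. A minimal Python sketch of those two steps for the CPU release wheel follows. The function names, the sample `pip --version` line, and the version `1.2.0` are illustrative assumptions, not values taken from the script, and the regular expression only approximates the awk/sed pipeline.

```python
import re

WHEEL_HOST = "http://paddle-wheel.bj.bcebos.com"  # host used by the script


def python_tag_from_pip(pip_version_output):
    """Return e.g. '27' from a `pip --version` line, or '' if not found.

    Approximates the script's awk/sed pipeline: take the version that
    follows 'python' and drop the dot.
    """
    match = re.search(r"\(python (\d+)\.(\d+)\)", pip_version_output)
    return match.group(1) + match.group(2) if match else ""


def cpu_release_wheel_url(release_version, avx, math, python_tag, uncode=""):
    """Build a URL following the wheel_cpu_release pattern in the script."""
    abi = "cp{tag}-cp{tag}m{uncode}".format(tag=python_tag, uncode=uncode)
    directory = "{rel}-cpu-{avx}-{math}".format(
        rel=release_version, avx=avx, math=math)
    filename = "paddlepaddle-{rel}-{abi}-linux_x86_64.whl".format(
        rel=release_version, abi=abi)
    return "/".join([WHEEL_HOST, directory, filename])


# Assumed sample output; real pip output varies by installation.
sample = "pip 19.0.1 from /usr/lib/python2.7/site-packages/pip (python 2.7)"
tag = python_tag_from_pip(sample)
url = cpu_release_wheel_url("1.2.0", "avx", "mkl", tag, uncode="u")
print(url)
```

An empty tag signals that the path did not point at a usable pip, which is the case the script reports before asking for the path again.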
python/CMakeLists.txt
...
...
@@ -54,7 +54,7 @@ ELSE(WIN32)
DEPENDS copy_paddle_pybind
${
FLUID_CORE
}
framework_py_proto profiler_py_proto
${
PY_FILES
}
${
external_project_dependencies
}
${
COPY_PADDLE_MASTER
}
)
ENDIF
()
set
(
paddle_python_deps
${
PADDLE_PYTHON_BUILD_DIR
}
/.timestamp
${
MKL_DEPENDS
}
)
set
(
paddle_python_deps
${
PADDLE_PYTHON_BUILD_DIR
}
/.timestamp
${
MKL_DEPENDS
}
${
external_project_dependencies
}
)
add_custom_target
(
paddle_python ALL DEPENDS
${
paddle_python_deps
}
)
set
(
PADDLE_PYTHON_PACKAGE_DIR
${
CMAKE_CURRENT_BINARY_DIR
}
/dist/
)
...
...
python/paddle/__init__.py
...
...
@@ -25,4 +25,5 @@ import paddle.reader
import
paddle.dataset
import
paddle.batch
import
paddle.compat
import
paddle.distributed
batch
=
batch
.
batch
python/paddle/distributed/__init__.py
0 → 100644
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
tools/run_mp
.py
→
python/paddle/distributed/launch
.py
...
...
@@ -37,7 +37,7 @@ default_envs = {
GPUS
=
8
def
start_procs
(
gpus
,
cmd
,
log_dir
):
def
start_procs
(
gpus
,
entrypoint
,
entrypoint_args
,
log_dir
):
procs
=
[]
log_fns
=
[]
os
.
system
(
"mkdir -p %s"
%
log_dir
)
...
...
@@ -73,12 +73,11 @@ def start_procs(gpus, cmd, log_dir):
"PADDLE_TRAINER_ENDPOINTS"
:
all_nodes_devices_endpoints
})
print
(
"starting process "
,
i
,
cmd
,
curr_env
)
print
(
"starting process "
,
i
,
entrypoint
,
entrypoint_args
,
curr_env
)
fn
=
open
(
"%s/workerlog.%d"
%
(
log_dir
,
i
),
"w"
)
log_fns
.
append
(
fn
)
procs
.
append
(
subprocess
.
Popen
(
cmd
.
strip
().
split
(
" "
),
stdout
=
fn
,
stderr
=
fn
,
env
=
curr_env
))
cmd
=
[
sys
.
executable
,
"-u"
,
entrypoint
]
+
entrypoint_args
procs
.
append
(
subprocess
.
Popen
(
cmd
,
stdout
=
fn
,
stderr
=
fn
,
env
=
curr_env
))
for
i
in
range
(
gpus
):
try
:
...
...
@@ -89,7 +88,8 @@ def start_procs(gpus, cmd, log_dir):
pass
def
main
():
def
parse_args
():
parser
=
argparse
.
ArgumentParser
(
description
=
'''start paddle training using multi-process mode.
NOTE: your train program ***must*** run as distributed nccl2 mode,
...
...
@@ -108,21 +108,27 @@ POD_IP (current node ip address, not needed for local training)
type
=
int
,
default
=
8
,
help
=
'start number of processes for every gpu'
)
parser
.
add_argument
(
'--cmd'
,
type
=
str
,
default
=
""
,
help
=
'command to run for each process, e.g. python train.py --lr 0.1'
)
parser
.
add_argument
(
'--log_dir'
,
type
=
str
,
default
=
"mylog"
,
help
=
'directory to put logs per process.'
)
args
=
parser
.
parse_args
()
if
args
.
cmd
==
""
:
parser
.
print_help
()
exit
(
0
)
start_procs
(
args
.
gpus
,
args
.
cmd
,
args
.
log_dir
)
parser
.
add_argument
(
'entrypoint_script'
,
type
=
str
,
help
=
"The entrypoint script to be launched in parallel, "
"followed by all the arguments for each process, "
"e.g. train.py --lr 0.1"
)
parser
.
add_argument
(
'entrypoint_args'
,
nargs
=
argparse
.
REMAINDER
)
return
parser
.
parse_args
()
def
main
():
args
=
parse_args
()
# launch multiple training processes
start_procs
(
args
.
gpus
,
args
.
entrypoint_script
,
args
.
entrypoint_args
,
args
.
log_dir
)
if
__name__
==
"__main__"
:
...
...
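The change to `launch.py` above replaces the single `--cmd` string with a positional script name plus `argparse.REMAINDER`, and builds the child command as a list headed by `sys.executable`. The sketch below shows that parsing pattern in isolation. It is a simplified illustration under stated assumptions: `parse_launch_args` and `build_cmd` are hypothetical helper names, and the environment setup and `subprocess.Popen` call from the real file are left out.

```python
import argparse
import sys


def parse_launch_args(argv):
    """Parse launcher options, then the script and everything after it."""
    parser = argparse.ArgumentParser(description="launcher argument sketch")
    parser.add_argument("--gpus", type=int, default=8)
    parser.add_argument("--log_dir", type=str, default="mylog")
    # First positional: the training script to run in every process.
    parser.add_argument("entrypoint_script", type=str)
    # REMAINDER collects all following tokens untouched, including
    # options such as --lr that belong to the training script.
    parser.add_argument("entrypoint_args", nargs=argparse.REMAINDER)
    return parser.parse_args(argv)


def build_cmd(args):
    """Build the argv list for one worker, using the current interpreter."""
    return [sys.executable, "-u", args.entrypoint_script] + args.entrypoint_args


args = parse_launch_args(["--gpus", "2", "train.py", "--lr", "0.1"])
cmd = build_cmd(args)
print(cmd)
```

Passing a list to the child process avoids the earlier `cmd.strip().split(" ")` step, so arguments no longer depend on being separated by single spaces.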
python/paddle/fluid/__init__.py
...
...
@@ -158,9 +158,9 @@ def __bootstrap__():
'enable_cublas_tensor_op_math'
,
'conv_workspace_size_limit'
,
'cudnn_exhaustive_search'
,
'memory_optimize_debug'
,
'selected_gpus'
,
'sync_nccl_allreduce'
,
'limit_of_tmp_allocation'
,
'times_excess_than_required_tmp_allocation'
'times_excess_than_required_tmp_allocation'
,
'enable_inplace_whitelist'
]
core
.
init_gflags
([
sys
.
argv
[
0
]]
+
[
"--tryfromenv="
+
","
.
join
(
read_env_flags
)])
core
.
init_glog
(
sys
.
argv
[
0
])
...
...
python/paddle/fluid/compiler.py
...
...
@@ -174,6 +174,11 @@ class CompiledProgram(object):
self
.
_exec_strategy
.
num_threads
=
cpu_num
*
2
trainers_endpoints
=
self
.
_program
.
_trainers_endpoints
# FIXME(dzhwinter): enable_inplace should run after memory_optimize;
# if the Python memory optimizer is turned on, turn off the inplace_pass.
self
.
_build_strategy
.
enable_inplace
=
False
if
self
.
_program
.
_is_mem_optimized
else
True
if
self
.
_build_strategy
.
num_trainers
>
1
and
trainers_endpoints
:
assert
self
.
_build_strategy
.
num_trainers
==
len
(
trainers_endpoints
),
"num_trainers == len(end_points)"
...
...
python/paddle/fluid/contrib/decoder/beam_search_decoder.py
...
...
@@ -22,7 +22,7 @@ This API is still under active development and may change drastically.
from
__future__
import
print_function
import
contextlib
from
...wrapped_decorator
import
signature_safe_contextmanager
import
numpy
as
np
import
six
...
...
@@ -419,7 +419,7 @@ class TrainingDecoder(object):
self
.
_state_cell
=
state_cell
self
.
_state_cell
.
_enter_decoder
(
self
)
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
block
(
self
):
"""
Define the behavior of the decoder for each RNN time step.
...
...
@@ -613,7 +613,7 @@ class BeamSearchDecoder(object):
self
.
_word_dim
=
word_dim
self
.
_input_var_dict
=
input_var_dict
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
block
(
self
):
"""
Define the behavior of the decoder for each RNN time step.
...
...
python/paddle/fluid/contrib/inferencer.py
...
...
@@ -14,7 +14,7 @@
from
__future__
import
print_function
import
contextlib
from
..wrapped_decorator
import
signature_safe_contextmanager
from
..
import
core
...
...
@@ -105,7 +105,7 @@ class Inferencer(object):
return
results
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
_prog_and_scope_guard
(
self
):
with
framework
.
program_guard
(
main_program
=
self
.
inference_program
):
with
executor
.
scope_guard
(
self
.
scope
):
...
...
python/paddle/fluid/contrib/trainer.py
...
...
@@ -14,7 +14,7 @@
from
__future__
import
print_function
import
contextlib
from
..wrapped_decorator
import
signature_safe_contextmanager
import
os
import
errno
import
shutil
...
...
@@ -453,7 +453,7 @@ class Trainer(object):
io
.
save_inference_model
(
param_path
,
feeded_var_names
,
target_vars
,
exe
)
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
_prog_and_scope_guard
(
self
):
with
framework
.
program_guard
(
main_program
=
self
.
train_program
,
...
...
python/paddle/fluid/executor.py
...
...
@@ -17,7 +17,7 @@ from __future__ import print_function
import
os
import
multiprocessing
import
numpy
as
np
import
contextlib
from
.wrapped_decorator
import
signature_safe_contextmanager
import
six
from
.framework
import
Program
,
default_main_program
,
Variable
from
.
import
core
...
...
@@ -49,7 +49,7 @@ def _switch_scope(scope):
return
ex
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
scope_guard
(
scope
):
"""
Change the global/default scope instance by Python `with` statement. All
...
...
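Throughout this commit `contextlib.contextmanager` is swapped for `signature_safe_contextmanager`, whose definition is not part of this section. As a rough illustration of what a signature-preserving wrapper could look like, here is a hypothetical stand-in built only from the standard library. It is an assumption for illustration, not Paddle's implementation, and the `scope_guard` below only mimics the shape of the real function.

```python
import contextlib
import functools
import inspect


def signature_preserving_contextmanager(func):
    """Hypothetical stand-in; not Paddle's signature_safe_contextmanager."""
    manager = contextlib.contextmanager(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return manager(*args, **kwargs)

    # Expose the original parameter list to inspect.signature explicitly.
    wrapper.__signature__ = inspect.signature(func)
    return wrapper


_state = {"scope": "global"}  # toy replacement for the global scope


@signature_preserving_contextmanager
def scope_guard(scope):
    """Temporarily switch the current scope, restoring it on exit."""
    previous = _state["scope"]
    _state["scope"] = scope
    try:
        yield
    finally:
        _state["scope"] = previous


with scope_guard("local"):
    inside = _state["scope"]
after = _state["scope"]
print(inspect.signature(scope_guard))
```

Keeping the visible signature equal to the wrapped function's is what lets tooling that inspects argument lists see `scope` rather than a generic `*args, **kwargs`.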
python/paddle/fluid/framework.py
...
...
@@ -16,7 +16,7 @@ from __future__ import print_function
import
collections
from
collections
import
defaultdict
import
contextlib
from
.wrapped_decorator
import
signature_safe_contextmanager
import
os
import
re
import
traceback
...
...
@@ -111,7 +111,7 @@ class NameScope(object):
_name_scope
=
NameScope
()
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
name_scope
(
prefix
=
None
):
"""
Generate hierarchical name prefix for the operators.
...
...
@@ -1725,6 +1725,19 @@ class Program(object):
self
.
_trainers_endpoints
=
[]
# the distributed lookup table names
self
.
_distributed_lookup_table
=
None
# @deprecated(the python memory optimize transpiler is deprecated)
# whether the program is optimized by memory_optimize_transpiler
self
.
__is_mem_optimized
=
False
@
property
def
_is_mem_optimized
(
self
):
# if the program is optimized, an operator's inputs and outputs
# may be the same, which conflicts with save_inference_model.
return
self
.
__is_mem_optimized
@
_is_mem_optimized
.
setter
def
_is_mem_optimized
(
self
,
target
):
self
.
__is_mem_optimized
=
target
@
property
def
op_role
(
self
):
...
...
@@ -1744,7 +1757,7 @@ class Program(object):
return
self
.
_current_role
@
op_role
.
setter
def
set_
op_role
(
self
,
role
):
def
op_role
(
self
,
role
):
self
.
_current_role
=
role
@
property
...
...
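The hunk above renames the setter from `set_op_role` to `op_role`. With Python properties the function under `@op_role.setter` has to carry the property's own name, otherwise the setter is bound to a second attribute and the original property stays read-only. The sketch below shows both variants; the class names are hypothetical and the role values are plain strings chosen for illustration.

```python
class FixedRoleHolder(object):
    """Setter shares the property's name, so assignment works."""

    def __init__(self):
        self._current_role = "forward"

    @property
    def op_role(self):
        return self._current_role

    @op_role.setter
    def op_role(self, role):
        self._current_role = role


class MisnamedRoleHolder(object):
    """Setter has a different name, so op_role itself stays read-only."""

    def __init__(self):
        self._current_role = "forward"

    @property
    def op_role(self):
        return self._current_role

    @op_role.setter
    def set_op_role(self, role):
        self._current_role = role


fixed = FixedRoleHolder()
fixed.op_role = "backward"

misnamed = MisnamedRoleHolder()
try:
    misnamed.op_role = "backward"
    assignment_failed = False
except AttributeError:
    # op_role has no setter here; only set_op_role received one.
    assignment_failed = True
```

In the misnamed variant, writing through `misnamed.set_op_role = ...` still works, which is why such a bug can go unnoticed until someone assigns to `op_role` directly.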
@@ -1762,7 +1775,7 @@ class Program(object):
def
set_op_role_var
(
self
,
var_name
):
self
.
_op_role_var
=
[
var_name
]
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
_optimized_guard
(
self
,
param_and_grads
):
"""
A with guard to set :code:`Optimization` :code:`OpRole` and
...
...
@@ -1792,7 +1805,7 @@ class Program(object):
self
.
_op_role_var
=
tmp_var
self
.
_current_role
=
tmp_role
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
_lr_schedule_guard
(
self
,
is_with_opt
=
False
):
"""
A with guard to set :code:`LRSched` :code:`OpRole` and
...
...
@@ -2446,7 +2459,7 @@ def switch_startup_program(program):
return
prev_program
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
program_guard
(
main_program
,
startup_program
=
None
):
"""
Change the global main program and startup program with `with` statement.
...
...
@@ -2511,7 +2524,7 @@ def _get_var(name, program=None):
return
program
.
global_block
().
var
(
name
)
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
_imperative_guard
(
tracer
):
global
_imperative_tracer_
tmp_trace
=
_imperative_tracer_
...
...
@@ -2522,7 +2535,7 @@ def _imperative_guard(tracer):
_imperative_tracer_
=
tmp_trace
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
_imperative_place_guard
(
place
):
global
_imperative_current_expected_place_
tmp_place
=
_imperative_current_expected_place_
...
...
python/paddle/fluid/imperative/base.py
...
...
@@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import
contextlib
from
..wrapped_decorator
import
signature_safe_contextmanager
import
numpy
as
np
from
paddle.fluid
import
core
...
...
@@ -24,7 +24,7 @@ def enabled():
return
framework
.
_in_imperative_mode
()
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
guard
(
place
=
None
):
train
=
framework
.
Program
()
startup
=
framework
.
Program
()
...
...
python/paddle/fluid/initializer.py
...
...
@@ -16,7 +16,7 @@ from __future__ import print_function
from
.
import
framework
import
numpy
as
np
import
contextlib
from
.wrapped_decorator
import
signature_safe_contextmanager
from
.core
import
VarDesc
from
.
import
unique_name
...
...
@@ -49,7 +49,7 @@ def force_init_on_cpu():
return
_force_init_on_cpu_
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
init_on_cpu
():
"""
Force the variable to be inited on CPU.
...
...
python/paddle/fluid/io.py
...
...
@@ -16,6 +16,7 @@ from __future__ import print_function
import
os
import
errno
import
warnings
import
time
import
shutil
import
six
...
...
@@ -931,6 +932,13 @@ def save_inference_model(dirname,
if
main_program
is
None
:
main_program
=
default_main_program
()
if
main_program
.
_is_mem_optimized
:
warnings
.
warn
(
"save_inference_model must be called before memory_optimize.
\
memory_optimize modifies the original program,
\
which makes it unsuitable for saving as an inference model;
\
we save the original program as the inference model."
,
RuntimeWarning
)
# fix the bug that the activation op's output as target will be pruned.
# will affect the inference performance.
...
...
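The `save_inference_model` hunk above checks the new `_is_mem_optimized` flag and emits a `RuntimeWarning` instead of failing. A small self-contained sketch of that check is given below. `ProgramStub` and `warn_if_mem_optimized` are hypothetical names used for illustration, and the message text is shortened.

```python
import warnings


class ProgramStub(object):
    """Hypothetical stand-in for a program object carrying the flag."""

    def __init__(self, is_mem_optimized=False):
        self._is_mem_optimized = is_mem_optimized


def warn_if_mem_optimized(program):
    """Warn (without raising) when the program was memory-optimized."""
    if program._is_mem_optimized:
        warnings.warn(
            "save_inference_model should be called before memory_optimize",
            RuntimeWarning)
        return True
    return False


# Record warnings so the behaviour can be inspected programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_mem_optimized(ProgramStub(is_mem_optimized=True))
    clean = warn_if_mem_optimized(ProgramStub())
```

Using a warning rather than an exception keeps existing training scripts running while still telling the user that the call order is wrong.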
python/paddle/fluid/layer_helper.py
...
...
@@ -302,7 +302,8 @@ class LayerHelper(object):
if
default_initializer
is
None
and
attr
.
initializer
is
None
:
if
isinstance
(
dtype
,
core
.
VarDesc
.
VarType
):
if
dtype
!=
core
.
VarDesc
.
VarType
.
FP32
and
\
dtype
!=
core
.
VarDesc
.
VarType
.
FP64
:
dtype
!=
core
.
VarDesc
.
VarType
.
FP64
and
\
dtype
!=
core
.
VarDesc
.
VarType
.
FP16
:
raise
TypeError
(
"Can not create parameter with default initializer when dtype is not float type. Set default_initializer to fit the parameter dtype!"
)
...
...
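The `layer_helper.py` hunk above extends the set of dtypes accepted by the default initializer from FP32 and FP64 to include FP16. The same rule can be written as a set-membership test, as in the sketch below. `VarType` here is a hypothetical enum standing in for `core.VarDesc.VarType`, and the function name is an assumption.

```python
from enum import Enum


class VarType(Enum):
    """Hypothetical stand-in for core.VarDesc.VarType."""
    FP16 = "fp16"
    FP32 = "fp32"
    FP64 = "fp64"
    INT64 = "int64"


# Float types that may use the default initializer after the change.
FLOAT_TYPES = frozenset([VarType.FP16, VarType.FP32, VarType.FP64])


def check_default_initializer_dtype(dtype):
    """Raise TypeError for non-float VarType values, as the hunk does."""
    if isinstance(dtype, VarType) and dtype not in FLOAT_TYPES:
        raise TypeError(
            "Can not create parameter with default initializer "
            "when dtype is not float type.")


check_default_initializer_dtype(VarType.FP16)  # accepted after the change
try:
    check_default_initializer_dtype(VarType.INT64)
    rejected = False
except TypeError:
    rejected = True
```

A set keeps the accepted types in one place, so adding another float type later would not require extending a chain of `!=` comparisons.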
python/paddle/fluid/layers/control_flow.py
...
...
@@ -13,7 +13,7 @@
# limitations under the License.
from
__future__
import
print_function
import
contextlib
from
..wrapped_decorator
import
signature_safe_contextmanager
from
.layer_function_generator
import
autodoc
,
templatedoc
from
.tensor
import
assign
,
fill_constant
...
...
@@ -1532,7 +1532,7 @@ class DynamicRNN(object):
outputs
=
{
'Out'
:
[
x_reordered
]})
return
shrink_memory
(
x_reordered
,
self
.
step_idx
,
self
.
lod_rank_table
)
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
block
(
self
):
"""
The block for user to define operators in RNN. See the class docstring
...
...
python/paddle/fluid/layers/detection.py
...
...
@@ -397,10 +397,10 @@ def box_coder(prior_box,
input is image feature map, they are close to
the origin of the coordinate system. [xmax, ymax]
is the right bottom coordinate of the anchor box.
prior_box_var(Variable|list
): prior_box_var supports two types of input.
One is variable with shape [M, 4] holds M group.
The other one is list consist of 4 elements
shared by all boxes.
prior_box_var(Variable|list
|None): prior_box_var supports two types
of input. One is a variable with shape [M, 4]
holding M groups. The other is a list of
4 elements
shared by all boxes.
target_box(Variable): This input can be a 2-D LoDTensor with shape
[N, 4] when code_type is 'encode_center_size'.
This input also can be a 3-D Tensor with shape
...
...
python/paddle/fluid/layers/io.py
...
...
@@ -13,7 +13,7 @@
# limitations under the License.
from
__future__
import
print_function
import
contextlib
from
..wrapped_decorator
import
signature_safe_contextmanager
import
multiprocessing
import
os
import
six
...
...
@@ -1116,7 +1116,7 @@ class Preprocessor(object):
def
_is_completed
(
self
):
return
self
.
sub_block
and
self
.
source_var_names
and
self
.
sink_var_names
@
contextlib
.
contextmanager
@
signature_safe_
contextmanager
def
block
(
self
):
self
.
status
=
Preprocessor
.
IN_SUB_BLOCK
self
.
sub_block
=
self
.
main_prog
.
_create_block
()
...
...
python/paddle/fluid/layers/nn.py
...
...
@@ -2930,6 +2930,7 @@ def batch_norm(input,
"momentum"
:
momentum
,
"epsilon"
:
epsilon
,
"is_test"
:
is_test
,
"data_layout"
:
data_layout
,
"use_mkldnn"
:
False
,
"fuse_with_relu"
:
fuse_with_relu
,
"use_global_stats"
:
use_global_stats
...
...
python/paddle/fluid/optimizer.py
...
...
@@ -15,7 +15,7 @@
from
__future__
import
print_function
from
collections
import
defaultdict
from
contextlib
import
contextmanager
from
.wrapped_decorator
import
signature_safe_
contextmanager
from
paddle.fluid.framework
import
Program
,
Variable
,
name_scope
,
default_main_program
from
paddle.fluid.distribute_lookup_table
import
find_distributed_lookup_table
...
...
@@ -1610,7 +1610,7 @@ class ModelAverage(Optimizer):
},
stop_gradient
=
True
)
@
contextmanager
@
signature_safe_
contextmanager
def
apply
(
self
,
executor
,
need_restore
=
True
):
"""Apply average values to parameters of current model.
"""
...
...
python/paddle/fluid/parallel_executor.py
...
...
@@ -146,6 +146,10 @@ class ParallelExecutor(object):
# step4: get main_program, scope, local_scopes
main
=
main_program
if
main_program
\
else
framework
.
default_main_program
()
# FIXME(dzhwinter): enable_inplace should run after memory_optimize;
# if the Python memory optimizer is turned on, turn off the inplace_pass.
if
build_strategy
.
enable_inplace
is
None
:
build_strategy
.
enable_inplace
=
False
if
main
.
_is_mem_optimized
else
True
scope
=
scope
if
scope
is
not
None
else
executor
.
global_scope
()
if
share_vars_from
and
not
isinstance
(
share_vars_from
,
...
...
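In the `ParallelExecutor` hunk above, `enable_inplace` is filled in only when the caller left it as `None`, and the default is the opposite of the program's `_is_mem_optimized` flag. That decision reduces to a small pure function, sketched below with a hypothetical name. Note that the `compiler.py` hunk earlier in this diff assigns the value unconditionally, which this sketch does not model.

```python
def resolve_enable_inplace(user_value, is_mem_optimized):
    """Return the effective enable_inplace setting.

    An explicit True/False from the caller is kept; None means
    "decide for me", in which case in-place reuse is enabled only
    if the Python memory optimizer has not already run.
    """
    if user_value is None:
        return not is_mem_optimized
    return user_value


default_after_mem_opt = resolve_enable_inplace(None, True)
default_plain = resolve_enable_inplace(None, False)
explicit = resolve_enable_inplace(True, True)
```

Writing `not is_mem_optimized` gives the same result as the `False if ... else True` expression in the hunk.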
python/paddle/fluid/profiler.py
...
...
@@ -15,7 +15,7 @@
 from __future__ import print_function
 from . import core
-from contextlib import contextmanager
+from .wrapped_decorator import signature_safe_contextmanager
 import os
 import six
...
...
@@ -35,7 +35,7 @@ NVPROF_CONFIG = [
 ]
-@contextmanager
+@signature_safe_contextmanager
 def cuda_profiler(output_file, output_mode=None, config=None):
     """The CUDA profiler.
     This fuctions is used to profile CUDA program by CUDA runtime application
...
...
@@ -217,7 +217,7 @@ def stop_profiler(sorted_key=None, profile_path='/tmp/profile'):
     core.disable_profiler(key_map[sorted_key], profile_path)
-@contextmanager
+@signature_safe_contextmanager
 def profiler(state, sorted_key=None, profile_path='/tmp/profile'):
     """The profiler interface.
     Different from cuda_profiler, this profiler can be used to profile both CPU
...
...
python/paddle/fluid/recordio_writer.py
...
...
@@ -15,14 +15,14 @@
 from __future__ import print_function
 import os
-import contextlib
+from .wrapped_decorator import signature_safe_contextmanager
 from . import core
 __all__ = [
     'convert_reader_to_recordio_file', 'convert_reader_to_recordio_files'
 ]
-@contextlib.contextmanager
+@signature_safe_contextmanager
 def create_recordio_writer(filename,
                            compressor=core.RecordIOWriter.Compressor.Snappy,
                            max_num_records=1000):
...
...
python/paddle/fluid/tests/unittests/CMakeLists.txt
...
...
@@ -109,8 +109,13 @@ set_tests_properties(test_parallel_executor_fetch_feed PROPERTIES TIMEOUT 450)
 py_test_modules(test_parallel_executor_transformer MODULES test_parallel_executor_transformer SERIAL)
 if(NOT APPLE)
     py_test_modules(test_image_classification_resnet MODULES test_image_classification_resnet SERIAL)
+    if(CMAKE_BUILD_TYPE STREQUAL "Debug")
+        # change the timeout from 600 to 1200, because in debug mode, this test need more time.
+        set_tests_properties(test_image_classification_resnet PROPERTIES TIMEOUT 1200)
+    endif()
 endif()
 if (WITH_NGRAPH)
     add_subdirectory(ngraph)
 endif()
...
...
python/paddle/fluid/tests/unittests/ngraph/test_accuracy_ngraph_op.py
0 → 100644
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function

import unittest
import numpy as np
from paddle.fluid.tests.unittests.op_test import OpTest


class TestNGRAPHAccuracyOp(OpTest):
    def setUp(self):
        self.op_type = "accuracy"
        self.dtype = np.float32
        self.init_dtype()
        n = 128
        infer = np.random.random((n, 1)).astype(self.dtype)
        indices = np.random.randint(0, 2, (n, 1))
        label = np.random.randint(0, 2, (n, 1))
        self.inputs = {'Out': infer, 'Indices': indices, "Label": label}
        num_correct = 0
        for rowid in range(n):
            for ele in indices[rowid]:
                if ele == label[rowid]:
                    num_correct += 1
                    break
        self.outputs = {
            'Accuracy': np.array([num_correct / float(n)]).astype(self.dtype),
            'Correct': np.array([num_correct]).astype("int64"),
            'Total': np.array([n]).astype("int64")
        }
        self._cpu_only = True

    def init_dtype(self):
        pass

    def test_check_output(self):
        self.check_output()


if __name__ == '__main__':
    unittest.main()
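The reference values that `setUp` builds for the accuracy op can be reproduced without NumPy. The sketch below is a plain-Python restatement of that counting loop, not the operator's implementation; `reference_accuracy` and the sample data are made up for illustration.

```python
def reference_accuracy(indices, labels):
    """Return (accuracy, correct, total) for per-row candidate indices.

    A row counts as correct when its label appears among that row's
    candidate indices, which mirrors the nested loop in the test above.
    """
    num_correct = 0
    for row, label in zip(indices, labels):
        if label in row:
            num_correct += 1
    total = len(labels)
    return num_correct / float(total), num_correct, total


# Four rows with one candidate each; rows 0 and 2 match their labels.
sample_indices = [[0], [1], [1], [0]]
sample_labels = [0, 0, 1, 1]
accuracy, correct, total = reference_accuracy(sample_indices, sample_labels)
```

With one candidate per row, as in the test's `(n, 1)` arrays, this reduces to ordinary top-1 accuracy.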
python/paddle/fluid/tests/unittests/ngraph/test_activation_ngraph_op.py
...
...
@@ -18,17 +18,7 @@ import unittest
 import numpy as np
 import paddle.fluid.core as core
 from paddle.fluid.tests.unittests.op_test import OpTest
-from paddle.fluid.tests.unittests.test_activation_op import TestRelu, TestTanh
-class TestNGRAPHReluDim2(TestRelu):
-    def setUp(self):
-        super(TestNGRAPHReluDim2, self).setUp()
-class TestNGRAPHTanhDim2(TestTanh):
-    def setUp(self):
-        super(TestNGRAPHTanhDim2, self).setUp()
+from paddle.fluid.tests.unittests.test_activation_op import TestSigmoid, TestRelu, TestTanh
 class TestNGRAPHReluDim4(TestRelu):
...
...
python/paddle/fluid/tests/unittests/ngraph/test_batch_norm_ngraph_op.py
0 → 100644
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function

import unittest
from paddle.fluid.tests.unittests.test_batch_norm_op import TestBatchNormOpTraining, TestBatchNormOpInference


class TestNGRAPHBatchNormOpTraining(TestBatchNormOpTraining):
    def init_kernel_type(self):
        super(TestNGRAPHBatchNormOpTraining, self).init_kernel_type()


class TestNGRAPHBatchNormOpInference(TestBatchNormOpInference):
    def init_kernel_type(self):
        super(TestNGRAPHBatchNormOpInference, self).init_kernel_type()


class TestNGRAPHBatchNormOpWithReluInference(TestBatchNormOpInference):
    def init_kernel_type(self):
        super(TestNGRAPHBatchNormOpWithReluInference, self).init_kernel_type()


if __name__ == '__main__':
    unittest.main()
python/paddle/fluid/tests/unittests/ngraph/test_conv2d_ngraph_op.py
...
...
@@ -15,35 +15,59 @@
 from __future__ import print_function
 import unittest
-from paddle.fluid.tests.unittests.test_conv2d_op import *
+from paddle.fluid.tests.unittests.test_conv2d_op import TestConv2dOp, TestWithPad, TestWithStride, TestWithGroup, TestWith1x1, TestWithInput1x1Filter1x1
 class TestNGRAPH(TestConv2dOp):
+    def setUp(self):
+        super(TestNGRAPH, self).setUp()
+        self._cpu_only = True
     def init_kernel_type(self):
         super(TestNGRAPH, self).init_kernel_type()
 class TestNGRAPHWithPad(TestWithPad):
+    def setUp(self):
+        super(TestNGRAPHWithPad, self).setUp()
+        self._cpu_only = True
     def init_kernel_type(self):
         super(TestNGRAPHWithPad, self).init_kernel_type()
 class TestNGRAPHWithStride(TestWithStride):
+    def setUp(self):
+        super(TestNGRAPHWithStride, self).setUp()
+        self._cpu_only = True
     def init_kernel_type(self):
         super(TestNGRAPHWithStride, self).init_kernel_type()
 class TestNGRAPHWithGroup(TestWithGroup):
+    def setUp(self):
+        super(TestNGRAPHWithGroup, self).setUp()
+        self._cpu_only = True
     def init_kernel_type(self):
         super(TestNGRAPHWithGroup, self).init_kernel_type()
 class TestNGRAPHWith1x1(TestWith1x1):
+    def setUp(self):
+        super(TestNGRAPHWith1x1, self).setUp()
+        self._cpu_only = True
     def init_kernel_type(self):
         super(TestNGRAPHWith1x1, self).init_kernel_type()
 class TestNGRAPHWithInput1x1Filter1x1(TestWithInput1x1Filter1x1):
+    def setUp(self):
+        super(TestNGRAPHWithInput1x1Filter1x1, self).setUp()
+        self._cpu_only = True
     def init_kernel_type(self):
         super(TestNGRAPHWithInput1x1Filter1x1, self).init_kernel_type()
...
...
python/paddle/fluid/tests/unittests/ngraph/test_elementwise_add_ngraph_op.py
...
...
@@ -14,73 +14,16 @@
 from __future__ import print_function
 import unittest
-from paddle.fluid.tests.unittests.test_elementwise_add_op import *
+from paddle.fluid.tests.unittests.test_elementwise_add_op import TestElementwiseAddOp
 class TestNGRAPHElementwiseAddOp(TestElementwiseAddOp):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp, self).init_input_output()
-class TestNGRAPHElementwiseAddOp_scalar(TestElementwiseAddOp_scalar):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_scalar, self).init_input_output()
-class TestNGRAPHElementwiseAddOp_scalar2(TestElementwiseAddOp_scalar2):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_scalar2, self).init_input_output()
-class TestNGRAPHElementwiseAddOp_Vector(TestElementwiseAddOp_Vector):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_Vector, self).init_input_output()
-class TesNGRAPHtElementwiseAddOp_broadcast_0(TestElementwiseAddOp_broadcast_0):
-    def init_input_output(self):
-        super(TesNGRAPHtElementwiseAddOp_broadcast_0, self).init_input_output()
-class TestNGRAPHElementwiseAddOp_broadcast_1(TestElementwiseAddOp_broadcast_1):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_broadcast_1, self).init_input_output()
+    def setUp(self):
+        super(TestNGRAPHElementwiseAddOp, self).setUp()
+        self._cpu_only = True
-class TestNGRAPHElementwiseAddOp_broadcast_2(TestElementwiseAddOp_broadcast_2):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_broadcast_2, self).init_input_output()
-class TestNGRAPHElementwiseAddOp_broadcast_3(TestElementwiseAddOp_broadcast_3):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_broadcast_3, self).init_input_output()
-class TestNGRAPHElementwiseAddOp_broadcast_4(TestElementwiseAddOp_broadcast_4):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_broadcast_4, self).init_input_output()
-class TestNGRAPHElementwiseAddOp_rowwise_add_0(TestElementwiseAddOp_rowwise_add_0):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_rowwise_add_0, self).init_input_output()
-class TestNGRAPHElementwiseAddOp_rowwise_add_1(TestElementwiseAddOp_rowwise_add_1):
-    def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_rowwise_add_1, self).init_input_output()
-class TestNGRAPHElementwiseAddOp_channelwise_add(TestElementwiseAddOp_channelwise_add):
     def init_input_output(self):
-        super(TestNGRAPHElementwiseAddOp_channelwise_add, self).init_input_output()
+        super(TestNGRAPHElementwiseAddOp, self).init_input_output()
 if __name__ == '__main__':
...
...
python/paddle/fluid/tests/unittests/ngraph/test_mean_ngraph_op.py
...
...
@@ -14,17 +14,13 @@
 from __future__ import print_function
 import unittest
-from paddle.fluid.tests.unittests.test_mean_op import TestMeanOp, TestFP16MeanOp
+from paddle.fluid.tests.unittests.test_mean_op import TestMeanOp
 class TestNGRAPHMeanOp(TestMeanOp):
     def setUp(self):
         super(TestNGRAPHMeanOp, self).setUp()
-class TestNGRAPHFP16MeanOp(TestFP16MeanOp):
-    def setUp(self):
-        super(TestNGRAPHFP16MeanOp, self).setUp()
+        self._cpu_only = True
 if __name__ == "__main__":
...
...
python/paddle/fluid/tests/unittests/ngraph/test_mul_ngraph_op.py
...
...
@@ -15,27 +15,38 @@
 from __future__ import print_function
 import unittest
-from paddle.fluid.tests.unittests.test_mul_op import TestMulOp, TestMulOp2, TestFP16MulOp1, TestFP16MulOp2
+import numpy as np
+from paddle.fluid.tests.unittests.op_test import OpTest
+class TestNGRAPHMulOp(OpTest):
+    def setUp(self):
+        self.op_type = "mul"
+        self.dtype = np.float32
+        self.init_dtype_type()
+        self.inputs = {
+            'X': np.random.random((2, 4)).astype(self.dtype),
+            'Y': np.random.random((4, 4)).astype(self.dtype)
+        }
+        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
+        self._cpu_only = True
-class TestNGRAPHMulOp(TestMulOp):
     def init_dtype_type(self):
         pass
+    def test_check_output(self):
+        self.check_output()
-class TestNGRAPHMulOp2(TestMulOp2):
-    def init_dtype_type(self):
-        pass
+    def test_check_grad_normal(self):
+        self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)
+    def test_check_grad_ingore_x(self):
+        self.check_grad(
+            ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X"))
-class TestNGRAPHFP16MulOp1(TestFP16MulOp1):
-    def init_dtype_type(self):
-        pass
-class TestNGRAPHFP16MulOp2(TestFP16MulOp2):
-    def init_dtype_type(self):
-        pass
+    def test_check_grad_ingore_y(self):
+        self.check_grad(
+            ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y'))
 if __name__ == "__main__":
...
...
python/paddle/fluid/tests/unittests/ngraph/test_pool2d_ngraph_op.py
...
...
@@ -14,35 +14,59 @@
 from __future__ import print_function
-from paddle.fluid.tests.unittests.test_pool2d_op import *
+from paddle.fluid.tests.unittests.test_pool2d_op import TestPool2D_Op, TestCase1, TestCase2, TestCase3, TestCase4, TestCase5
 class TestNGRAPHPool2D_Op(TestPool2D_Op):
+    def setUp(self):
+        super(TestNGRAPHPool2D_Op, self).setUp()
+        self._cpu_only = True
     def init_test_case(self):
         super(TestNGRAPHPool2D_Op, self).init_test_case()
 class TestNGRAPHCase1(TestCase1):
+    def setUp(self):
+        super(TestNGRAPHCase1, self).setUp()
+        self._cpu_only = True
     def init_test_case(self):
         super(TestNGRAPHCase1, self).init_test_case()
 class TestNGRAPHCase2(TestCase2):
+    def setUp(self):
+        super(TestNGRAPHCase2, self).setUp()
+        self._cpu_only = True
     def init_test_case(self):
         super(TestNGRAPHCase2, self).init_test_case()
 class TestNGRAPHCase3(TestCase3):
+    def setUp(self):
+        super(TestNGRAPHCase3, self).setUp()
+        self._cpu_only = True
     def init_pool_type(self):
         super(TestNGRAPHCase3, self).init_pool_type()
 class TestNGRAPHCase4(TestCase4):
+    def setUp(self):
+        super(TestNGRAPHCase4, self).setUp()
+        self._cpu_only = True
     def init_pool_type(self):
         super(TestNGRAPHCase4, self).init_pool_type()
 class TestNGRAPHCase5(TestCase5):
+    def setUp(self):
+        super(TestNGRAPHCase5, self).setUp()
+        self._cpu_only = True
     def init_pool_type(self):
         super(TestNGRAPHCase5, self).init_pool_type()
...
...
python/paddle/fluid/tests/unittests/ngraph/test_scale_ngraph_op.py
...
...
@@ -13,25 +13,23 @@
 # limitations under the License.
 from __future__ import print_function
 import unittest
-from paddle.fluid.tests.unittests.test_scale_op import TestScaleOp, TestScaleOpSelectedRows, TestScaleFp16Op, TestScaleFp16OpSelectedRows
+from paddle.fluid.tests.unittests.test_scale_op import TestScaleOp, TestScaleOpSelectedRows
 class TestNGRAPHScaleOp(TestScaleOp):
-    def init_dtype_type(self):
-        pass
+    def setUp(self):
+        super(TestNGRAPHScaleOp, self).setUp()
+        self._cpu_only = True
-class TestNGRAPHScaleOpSelectedRows(TestScaleOpSelectedRows):
-    def init_dtype_type(self):
-        pass
-class TestNGRAPHScaleFp16Op(TestScaleFp16Op):
-    def init_dtype_type(self):
-        pass
+class TestNGRAPHScaleOpSelectedRows(TestScaleOpSelectedRows):
+    def setUp(self):
+        super(TestNGRAPHScaleOpSelectedRows, self).setUp()
+        self._cpu_only = True
-class TestNGRAPHScaleFp16OpSelectedRows(TestScaleFp16OpSelectedRows):
-    def init_dtype_type(self):
-        pass
...
...
python/paddle/fluid/tests/unittests/ngraph/test_sum_ngraph_op.py
0 → 100644
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function

import unittest
from paddle.fluid.tests.unittests.test_sum_op import TestSumOp, TestSelectedRowsSumOp, TestLoDTensorAndSelectedRowsOp

if __name__ == "__main__":
    unittest.main()
python/paddle/fluid/tests/unittests/ngraph/test_top_k_ngraph_op.py
...
...
@@ -20,21 +20,25 @@ from paddle.fluid.tests.unittests.test_top_k_op import TestTopkOp, TestTopkOp3d,
 class TestNGRAPHTopkOp(TestTopkOp):
     def setUp(self):
         super(TestNGRAPHTopkOp, self).setUp()
+        self._cpu_only = True
 class TestNGRAPHTopkOp2(TestTopkOp2):
     def setUp(self):
         super(TestNGRAPHTopkOp2, self).setUp()
+        self._cpu_only = True
 class TestNGRAPHTopkOp3(TestTopkOp3):
     def setUp(self):
         super(TestNGRAPHTopkOp3, self).setUp()
+        self._cpu_only = True
 class TestNGRAPHTopkOp4(TestTopkOp4):
     def setUp(self):
         super(TestNGRAPHTopkOp4, self).setUp()
+        self._cpu_only = True
 if __name__ == "__main__":
...
...
python/paddle/fluid/tests/unittests/parallel_executor_test_base.py
...
...
@@ -40,7 +40,8 @@ class TestParallelExecutorBase(unittest.TestCase):
                                   seed=None,
                                   use_parallel_executor=True,
                                   use_reduce=False,
-                                  use_ir_memory_optimize=False,
+                                  use_ir_memory_optimize=True,
+                                  enable_inplace=True,
                                   fuse_elewise_add_act_ops=False,
                                   fuse_relu_depthwise_conv=False,
                                   optimizer=fluid.optimizer.Adam,
...
...
@@ -60,7 +61,6 @@ class TestParallelExecutorBase(unittest.TestCase):
                 main.random_seed = seed
             loss = method(use_feed=feed_dict is not None)
             if optimizer:
                 optimizer().minimize(loss)
...
...
@@ -72,7 +72,6 @@ class TestParallelExecutorBase(unittest.TestCase):
             exe.run(startup)
             exec_strategy = fluid.ExecutionStrategy()
             exec_strategy.allow_op_delay = allow_op_delay
-            exec_strategy.num_threads = 1
             if use_fast_executor:
                 exec_strategy.use_experimental_executor = True
             build_strategy = fluid.BuildStrategy()
...
...
@@ -81,7 +80,11 @@ class TestParallelExecutorBase(unittest.TestCase):
             build_strategy.fuse_elewise_add_act_ops = fuse_elewise_add_act_ops
             build_strategy.fuse_relu_depthwise_conv = fuse_relu_depthwise_conv
             build_strategy.memory_optimize = use_ir_memory_optimize
+            # python memory optimization is conflict with inplace pass.
+            # Use ir graph memory optimization after inplace pass is the correct way.
+            build_strategy.enable_inplace = False if memory_opt else enable_inplace
             build_strategy.enable_sequential_execution = enable_sequential_execution
             if use_cuda and core.is_compiled_with_cuda():
                 build_strategy.remove_unnecessary_lock = True
             if use_parallel_executor:
...
...
@@ -100,9 +103,8 @@ class TestParallelExecutorBase(unittest.TestCase):
             first_loss, = run_executor(
                 exe=exe, binary=binary, feed=feed_dict, fetch_list=[loss.name])
-            for _ in range(iter):
-                run_executor(
-                    exe=exe, binary=binary, feed=feed_dict, fetch_list=[])
+            for i in range(iter):
+                run_executor(exe=exe, binary=binary, feed=feed_dict, fetch_list=[])
             last_loss, = run_executor(
                 exe=exe, binary=binary, feed=feed_dict, fetch_list=[loss.name])
...
...
python/paddle/fluid/tests/unittests/test_box_coder_op.py
...
...
@@ -34,7 +34,9 @@ def box_decoder(t_box, p_box, pb_v, output_box, norm, axis=0):
     pb_y = pb_y.reshape(shape)
     if pb_v.ndim == 2:
-        pb_v = pb_v.reshape(1, pb_v.shape[0], pb_v.shape[1])
+        var_shape = (1, pb_v.shape[0], pb_v.shape[1]
+                     ) if axis == 0 else (pb_v.shape[0], 1, pb_v.shape[1])
+        pb_v = pb_v.reshape(var_shape)
     if pb_v.ndim == 1:
         tb_x = pb_v[0] * t_box[:, :, 0] * pb_w + pb_x
         tb_y = pb_v[1] * t_box[:, :, 1] * pb_h + pb_y
...
...
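The reshape in this hunk picks where the singleton dimension goes so that a 2-D variance array lines up with a 3-D target box array. A small sketch of that shape choice follows; `variance_broadcast_shape` is a hypothetical helper for illustration and works on shape tuples only, so it needs no NumPy.

```python
def variance_broadcast_shape(var_shape_2d, axis):
    """Return the 3-D shape a (rows, cols) variance array is reshaped to.

    axis == 0: variances vary along the second dimension of the targets.
    otherwise: variances vary along the first dimension of the targets.
    """
    rows, cols = var_shape_2d
    return (1, rows, cols) if axis == 0 else (rows, 1, cols)


# Shapes taken from the tests in this file: the variance array matches the
# prior boxes, and the targets are (20, 81, 4) or (30, 81, 4).
shape_axis0 = variance_broadcast_shape((81, 4), axis=0)
shape_axis1 = variance_broadcast_shape((30, 4), axis=1)
```

Under ordinary broadcasting rules, `(1, 81, 4)` is compatible with targets of shape `(20, 81, 4)`, and `(30, 1, 4)` with targets of shape `(30, 81, 4)`.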
@@ -125,33 +127,6 @@ class TestBoxCoderOp(OpTest):
         self.outputs = {'OutputBox': output_box}
-class TestBoxCoderOpWithOneRankVar(OpTest):
-    def test_check_output(self):
-        self.check_output()
-    def setUp(self):
-        self.op_type = "box_coder"
-        lod = [[1, 1, 1, 1, 1]]
-        prior_box = np.random.random((81, 4)).astype('float32')
-        prior_box_var = np.random.random((4)).astype('float32')
-        target_box = np.random.random((20, 81, 4)).astype('float32')
-        code_type = "DecodeCenterSize"
-        box_normalized = False
-        output_box = batch_box_coder(prior_box, prior_box_var, target_box,
-                                     lod[0], code_type, box_normalized)
-        self.inputs = {
-            'PriorBox': prior_box,
-            'PriorBoxVar': prior_box_var,
-            'TargetBox': target_box,
-        }
-        self.attrs = {
-            'code_type': 'decode_center_size',
-            'box_normalized': False
-        }
-        self.outputs = {'OutputBox': output_box}
 class TestBoxCoderOpWithoutBoxVar(OpTest):
     def test_check_output(self):
         self.check_output()
...
...
@@ -210,7 +185,7 @@ class TestBoxCoderOpWithAxis(OpTest):
         self.op_type = "box_coder"
         lod = [[1, 1, 1, 1, 1]]
         prior_box = np.random.random((30, 4)).astype('float32')
-        prior_box_var = np.random.random((4)).astype('float32')
+        prior_box_var = np.random.random((30, 4)).astype('float32')
         target_box = np.random.random((30, 81, 4)).astype('float32')
         code_type = "DecodeCenterSize"
         box_normalized = False
...
...
python/paddle/fluid/tests/unittests/test_eager_deletion_transformer.py
...
...
@@ -16,12 +16,10 @@ import os
 import unittest
 os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
-from test_parallel_executor_transformer import TestTransformer
-class EagerDeletionTestTransformer(TestTransformer):
-    pass
+os.environ['RECORDIO_FILENAME'] = '/tmp/eager_deletion_transformer.wmt16.recordio'
+from test_parallel_executor_transformer import TestTransformer
 if __name__ == '__main__':
     unittest.main()
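The ordering in this file matters: the environment variable is set before the import, presumably because the imported module reads `RECORDIO_FILENAME` once at import time. A minimal sketch of the lookup-with-default pattern is shown below; `recordio_path` is a hypothetical helper, and plain dictionaries stand in for the process environment so the example has no side effects.

```python
import os


def recordio_path(environ=os.environ):
    """Return the RecordIO file path, honouring an optional override."""
    # Fall back to the shared default when no override is present.
    return environ.get('RECORDIO_FILENAME', '/tmp/wmt16.recordio')


# No override: the shared default path is used.
default_path = recordio_path({})

# With an override, as the eager-deletion test sets it before importing.
override_path = recordio_path(
    {'RECORDIO_FILENAME': '/tmp/eager_deletion_transformer.wmt16.recordio'})
```

Giving each test its own file name would keep two test processes from writing to the same RecordIO file; that motivation is an inference from the change, not something the commit states.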
python/paddle/fluid/tests/unittests/test_expand_op.py
...
...
@@ -109,5 +109,32 @@ class TestExpandOpRank4(OpTest):
         self.check_grad(['X'], 'Out')
+class TestExpandOpInteger(OpTest):
+    def setUp(self):
+        self.op_type = "expand"
+        self.inputs = {
+            'X': np.random.randint(
+                10, size=(2, 4, 5)).astype("int32")
+        }
+        self.attrs = {'expand_times': [2, 1, 4]}
+        output = np.tile(self.inputs['X'], (2, 1, 4))
+        self.outputs = {'Out': output}
+    def test_check_output(self):
+        self.check_output()
+class TestExpandOpBoolean(OpTest):
+    def setUp(self):
+        self.op_type = "expand"
+        self.inputs = {'X': np.random.randint(2, size=(2, 4, 5)).astype("bool")}
+        self.attrs = {'expand_times': [2, 1, 4]}
+        output = np.tile(self.inputs['X'], (2, 1, 4))
+        self.outputs = {'Out': output}
+    def test_check_output(self):
+        self.check_output()
 if __name__ == "__main__":
     unittest.main()
python/paddle/fluid/tests/unittests/test_inference_model_io.py
...
...
@@ -25,6 +25,7 @@ import paddle.fluid.layers as layers
 import paddle.fluid.optimizer as optimizer
 from paddle.fluid.framework import Program, program_guard
 from paddle.fluid.io import save_inference_model, load_inference_model
+from paddle.fluid.transpiler import memory_optimize
 class TestBook(unittest.TestCase):
...
...
@@ -87,5 +88,31 @@ class TestBook(unittest.TestCase):
         self.assertEqual(expected, actual)
+class TestSaveInferenceModel(unittest.TestCase):
+    def test_save_inference_model(self):
+        MODEL_DIR = "./tmp/inference_model2"
+        init_program = Program()
+        program = Program()
+        # fake program without feed/fetch
+        with program_guard(program, init_program):
+            x = layers.data(name='x', shape=[2], dtype='float32')
+            y = layers.data(name='y', shape=[1], dtype='float32')
+            y_predict = layers.fc(input=x, size=1, act=None)
+            cost = layers.square_error_cost(input=y_predict, label=y)
+            avg_cost = layers.mean(cost)
+        place = core.CPUPlace()
+        exe = executor.Executor(place)
+        exe.run(init_program, feed={}, fetch_list=[])
+        memory_optimize(program, print_log=True)
+        self.assertEqual(program._is_mem_optimized, True)
+        # will print warning message
+        save_inference_model(MODEL_DIR, ["x", "y"], [avg_cost], exe, program)
 if __name__ == '__main__':
     unittest.main()
python/paddle/fluid/tests/unittests/test_ir_inplace_pass.py
0 → 100644
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function

import os
import unittest
import numpy as np
import paddle.fluid.core as core
import paddle.fluid as fluid
from parallel_executor_test_base import TestParallelExecutorBase


def fc_with_batchnorm(use_feed):
    img = fluid.layers.data(name='image', shape=[784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    hidden = img
    for _ in range(3):
        hidden = fluid.layers.fc(
            hidden,
            size=200,
            act='tanh',
            bias_attr=fluid.ParamAttr(
                initializer=fluid.initializer.Constant(value=1.0)))

        hidden = fluid.layers.batch_norm(input=hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=prediction, label=label)
    loss = fluid.layers.mean(loss)
    return loss


class TestIrInplace(TestParallelExecutorBase):
    @classmethod
    def setUpClass(cls):
        os.environ['CPU_NUM'] = str(4)

    def _fc_with_batchnorm(self, ir_memory_optimize, enable_inplace,
                           memory_opt=False):
        if not core.is_compiled_with_cuda():
            return

        np.random.seed(5)
        img = np.random.random(size=[32, 784]).astype(np.float32)
        label = np.ones(shape=[32, 1], dtype='int64')
        self.check_network_convergence(
            fc_with_batchnorm,
            feed_dict={"image": img,
                       "label": label},
            use_cuda=True,
            memory_opt=memory_opt,
            use_ir_memory_optimize=ir_memory_optimize,
            enable_inplace=enable_inplace)

    def test_fc_with_batchnorm(self, delta=1e-3):
        loss00 = self._fc_with_batchnorm(False, False)
        loss10 = self._fc_with_batchnorm(True, False)
        loss01 = self._fc_with_batchnorm(False, True)
        loss11 = self._fc_with_batchnorm(True, True)
        self.assertAlmostEqual(loss00, loss10, delta=delta)
        self.assertAlmostEqual(loss00, loss01, delta=delta)
        self.assertAlmostEqual(loss00, loss11, delta=delta)
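One detail of the listing above is worth checking when reading it: `_fc_with_batchnorm` has no `return` for the result of `check_network_convergence`, so as shown it yields `None` on every path. If that reading is right, the three `assertAlmostEqual` calls compare `None` with `None`. The standard-library sketch below shows how `unittest` treats that case; the class name `VacuousComparison` is invented for the illustration.

```python
import unittest


class VacuousComparison(unittest.TestCase):
    def runTest(self):
        # assertAlmostEqual returns early when first == second, and
        # None == None is true, so delta is never consulted here.
        self.assertAlmostEqual(None, None, delta=1e-3)


result = unittest.TestResult()
VacuousComparison().run(result)
passed = result.wasSuccessful()
```

A comparison like this passes without exercising any numeric check, so returning the losses from the helper would be needed before the `delta` tolerance has any effect.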
python/paddle/fluid/tests/unittests/test_parallel_executor_seresnext.py
...
...
@@ -200,7 +200,7 @@ class TestResnet(TestParallelExecutorBase):
                                   model,
                                   use_cuda,
                                   iter=20,
-                                  delta2=1e-6):
+                                  delta2=1e-5):
         if use_cuda and not core.is_compiled_with_cuda():
             return
...
...
@@ -228,7 +228,7 @@ class TestResnet(TestParallelExecutorBase):
             optimizer=optimizer)
         for loss in zip(all_reduce_first_loss, reduce_first_loss):
-            self.assertAlmostEquals(loss[0], loss[1], delta=1e-6)
+            self.assertAlmostEquals(loss[0], loss[1], delta=1e-5)
         for loss in zip(all_reduce_last_loss, reduce_last_loss):
             self.assertAlmostEquals(loss[0], loss[1], delta=delta2)
...
...
@@ -258,17 +258,17 @@ class TestResnet(TestParallelExecutorBase):
             enable_sequential_execution=True)
         for loss in zip(all_reduce_first_loss, all_reduce_first_loss_seq):
-            self.assertAlmostEquals(loss[0], loss[1], delta=1e-6)
+            self.assertAlmostEquals(loss[0], loss[1], delta=1e-5)
         for loss in zip(all_reduce_last_loss, all_reduce_last_loss_seq):
             self.assertAlmostEquals(loss[0], loss[1], delta=delta2)
         for loss in zip(reduce_first_loss, reduce_first_loss_seq):
-            self.assertAlmostEquals(loss[0], loss[1], delta=1e-6)
+            self.assertAlmostEquals(loss[0], loss[1], delta=1e-5)
         for loss in zip(reduce_last_loss, reduce_last_loss_seq):
             self.assertAlmostEquals(loss[0], loss[1], delta=delta2)
         for loss in zip(all_reduce_first_loss_seq, reduce_first_loss_seq):
-            self.assertAlmostEquals(loss[0], loss[1], delta=1e-6)
+            self.assertAlmostEquals(loss[0], loss[1], delta=1e-5)
         for loss in zip(all_reduce_last_loss_seq, reduce_last_loss_seq):
             self.assertAlmostEquals(loss[0], loss[1], delta=delta2)
...
...
@@ -277,7 +277,7 @@ class TestResnet(TestParallelExecutorBase):
                                    use_cuda=True,
                                    use_reduce=False,
                                    iter=20,
-                                   delta2=1e-6):
+                                   delta2=1e-5):
         if use_cuda and not core.is_compiled_with_cuda():
             return
...
...
@@ -308,7 +308,7 @@ class TestResnet(TestParallelExecutorBase):
             optimizer=optimizer)
         self.assertAlmostEquals(
-            np.mean(parallel_first_loss), single_first_loss[0], delta=1e-6)
+            np.mean(parallel_first_loss), single_first_loss[0], delta=1e-5)
         self.assertAlmostEquals(
             np.mean(parallel_last_loss), single_last_loss[0], delta=delta2)
...
...
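The hunks in this file only loosen the absolute tolerance from `1e-6` to `1e-5`. As a quick illustration of what `delta` means in `unittest`, the sketch below wraps `assertAlmostEqual` in a hypothetical helper, `within_delta`; the sample loss values are invented and do not come from the tests.

```python
import unittest


def within_delta(first, second, delta):
    """Return True when assertAlmostEqual accepts the pair for this delta."""
    case = unittest.TestCase()
    try:
        # With delta given, the check is abs(first - second) <= delta.
        case.assertAlmostEqual(first, second, delta=delta)
        return True
    except AssertionError:
        return False


# Two made-up losses that differ by roughly 4e-6.
loss_a, loss_b = 0.5, 0.500004
accepted_old = within_delta(loss_a, loss_b, 1e-6)
accepted_new = within_delta(loss_a, loss_b, 1e-5)
```

A difference of this size fails under the old tolerance and passes under the new one.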
python/paddle/fluid/tests/unittests/test_parallel_executor_transformer.py
...
...
@@ -24,7 +24,7 @@ import paddle.fluid.core as core
 import paddle.dataset.wmt16 as wmt16
 import os
-WMT16_RECORDIO_FILE = "/tmp/wmt16.recordio"
+WMT16_RECORDIO_FILE = os.environ.get('RECORDIO_FILENAME', '/tmp/wmt16.recordio')
 class ModelHyperParams(object):
...
...
python/paddle/fluid/tests/unittests/transformer_model.py
...
...
@@ -17,6 +17,7 @@ from __future__ import print_function
 from functools import partial
 import numpy as np
+import os
 import paddle.fluid as fluid
 import paddle.fluid.layers as layers
 from paddle.fluid.layers.io import open_recordio_file
...
...
@@ -408,7 +409,7 @@ def transformer(
         trg_pad_idx,
         pos_pad_idx, ):
     file_obj = open_recordio_file(
-        filename='/tmp/wmt16.recordio',
+        filename=os.environ.get('RECORDIO_FILENAME', '/tmp/wmt16.recordio'),
         shapes=[
             [batch_size * max_length, 1],
             [batch_size * max_length, 1],
...
...
python/paddle/fluid/transpiler/memory_optimization_transpiler.py
...
...
@@ -540,6 +540,7 @@ def memory_optimize(input_program,
     if skip_opt_set is not None:
         skip_opt_set = set(map(to_name_str, skip_opt_set))
     cfgs = _get_cfgs(input_program)
+    input_program._is_mem_optimized = True
     for cfg in cfgs:
         cfg.memory_optimize(skip_opt_set=skip_opt_set, level=level)
...
...
@@ -559,5 +560,6 @@ def release_memory(input_program, skip_opt_set=None):
         None
     """
     cfgs = _get_cfgs(input_program)
+    input_program._is_mem_optimized = True
     for cfg in cfgs:
         cfg.release_memory(skip_opt_set=skip_opt_set)
python/paddle/fluid/unique_name.py
...
...
@@ -15,7 +15,7 @@
 from __future__ import print_function
 import collections
-import contextlib
+from .wrapped_decorator import signature_safe_contextmanager
 import six
 import sys
...
...
@@ -68,7 +68,7 @@ def switch(new_generator=None):
     return old
-@contextlib.contextmanager
+@signature_safe_contextmanager
 def guard(new_generator=None):
     if isinstance(new_generator, six.string_types):
         new_generator = UniqueNameGenerator(new_generator)
...
...
python/paddle/fluid/wrapped_decorator.py
0 → 100644
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import decorator
import contextlib

__all__ = ['wrap_decorator', 'signature_safe_contextmanager']


def wrap_decorator(decorator_func):
    @decorator.decorator
    def __impl__(func, *args, **kwargs):
        wrapped_func = decorator_func(func)
        return wrapped_func(*args, **kwargs)

    return __impl__


signature_safe_contextmanager = wrap_decorator(contextlib.contextmanager)
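The name `signature_safe_contextmanager` suggests the wrapper exists because plain `contextlib.contextmanager` hides the decorated function's argument list from some introspection tools; that motive is an assumption here, since the commit does not spell it out. The standard-library sketch below shows the effect on an undecorated-versus-decorated pair. It does not use the third-party `decorator` package, and the function `guard` is a toy stand-in.

```python
import contextlib
import inspect


@contextlib.contextmanager
def guard(new_generator=None):
    # Toy context manager; it simply hands back its argument.
    yield new_generator


# getfullargspec does not follow __wrapped__, so it describes the generic
# (*args, **kwds) helper that contextmanager returns.
wrapper_spec = inspect.getfullargspec(guard)

# The original function, kept on __wrapped__ by functools.wraps, still
# carries the real parameter list.
original_spec = inspect.getfullargspec(guard.__wrapped__)

with guard('prefix') as value:
    seen = value
```

Here `wrapper_spec.args` is empty while `original_spec.args` is `['new_generator']`. Tools that read the argument spec directly, such as API signature dumps, therefore see different results depending on which decorator was used.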
python/requirements.txt
...
...
@@ -11,3 +11,4 @@ graphviz
 six
 funcsigs
 pyyaml
+decorator
python/setup.py.in
...
...
@@ -100,6 +100,7 @@ packages=['paddle',
           'paddle.utils',
           'paddle.dataset',
           'paddle.reader',
+          'paddle.distributed',
           'paddle.fluid',
           'paddle.fluid.imperative',
           'paddle.fluid.proto',
...
...