提交 9ca8db23 编写于 作者: C chenweihang

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into squeeze_op

......@@ -18,6 +18,8 @@ learning to many products at Baidu.
Our vision is to enable deep learning for everyone via PaddlePaddle.
Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest feature of PaddlePaddle.
### Lastest PaddlePaddle Version: [Fluid](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/fluid)
## Features
- **Flexibility**
......
......@@ -257,8 +257,8 @@ function(cc_test TARGET_NAME)
set(multiValueArgs SRCS DEPS ARGS)
cmake_parse_arguments(cc_test "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
add_executable(${TARGET_NAME} ${cc_test_SRCS})
target_link_libraries(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main memory gtest gflags glog)
add_dependencies(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main memory gtest gflags glog)
target_link_libraries(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main lod_tensor memory gtest gflags glog)
add_dependencies(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main lod_tensor memory gtest gflags glog)
add_test(NAME ${TARGET_NAME}
COMMAND ${TARGET_NAME} ${cc_test_ARGS}
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
......@@ -324,8 +324,8 @@ function(nv_test TARGET_NAME)
set(multiValueArgs SRCS DEPS)
cmake_parse_arguments(nv_test "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
cuda_add_executable(${TARGET_NAME} ${nv_test_SRCS})
target_link_libraries(${TARGET_NAME} ${nv_test_DEPS} paddle_gtest_main memory gtest gflags glog)
add_dependencies(${TARGET_NAME} ${nv_test_DEPS} paddle_gtest_main memory gtest gflags glog)
target_link_libraries(${TARGET_NAME} ${nv_test_DEPS} paddle_gtest_main lod_tensor memory gtest gflags glog)
add_dependencies(${TARGET_NAME} ${nv_test_DEPS} paddle_gtest_main lod_tensor memory gtest gflags glog)
add_test(${TARGET_NAME} ${TARGET_NAME})
if (nv_test_SERIAL)
set_property(TEST ${TARGET_NAME} PROPERTY SERIAL 1)
......
# Distributed Training with NCCL2
We design a pattern that can enable training with `ParallelExecutor` and
using [NCCL2](https://developer.nvidia.com/nccl) as it's collective
communication library.
In `ParallelExecutor` we can use `AllReduce` or `Reduce` and `Broadcast`
to do multi GPU training. And if we initialize NCCL2 communicators as
ranks in a distributed environment, we can simply run the `ParallelExecutor`
as a distributed program! The only thing that may be different than in
the single node version is that we need to broadcast the NCCL unique ID
to all the nodes, and initialize communicators using that ID, so NCCL2
will know each other as ranks.
To achieve this feature, we introduce a new operator: `gen_nccl_id` op,
so we are ***not*** "bind to" running NCCL2 with MPI, we can run it in
what ever platform you like.
It have two running modes:
1. Generate and broadcast mode, which should be used on trainer 0;
1. Listen and fetch mode, which should be used on trainers other than 0.
In both two modes, this op can save the NCCL ID into current scope as a
persistable variable, Then we can insert this op at the end of
"startup program" of fluid, so that all workers can get the same ID to
initialize NCCL communicator objects.
<img src="src/ncc2_design.png">
The above figure indicates the general process when training with NCCL2
distributed. Each trainer have the number of communicators equal to the
number of GPUs, but the ranks should match the global ranks number: here
we have total 8 GPUs, so `nranks==8`, for each trainer, the ranks should
be from 0 ~ 3 on trainer 0 and 4 ~ 7 on trainer 1.
......@@ -119,6 +119,32 @@ optimization algorithm $f$ runs on the storage service.
- Con: the storage service needs to be able to run the optimization
algorithm.
## Distributed Sparse Table in Fluid
For another design, we can implement a distributed sparse table in Fluid,
and don't need to maintain an external storage component while training.
You may need to read Fluid [Distributed Training Architecture](./distributed_architecture.md)
and [Parameter Server](./parameter_server.md) before going on.
![fluid lookup remote table](./src/fluid_lookup_remote_table.png)
Partition a large table into multiple pserver instances
1. `DistributeTranspiler` would split the table partitioned into some small
table blocks with some partitioned algorithms such as
[RoundRobin](https://en.wikipedia.org/wiki/Round-robin_scheduling),
[Hash](https://en.wikipedia.org/wiki/Hash) and etc...
1. For some cases, the range of input `Ids` is very wide and unpredictable, so the sparse
table would be able to fill a new value for the id that didn't appear before with
zero, uniform random or Gaussian distribution.
For each Trainer's training process:
1. In the forward pass, we use `pre-fetch` op to pre-fetch parameter blocks according to the
input `Ids` from PServers instead of the local `lookup_table` op, and then merge the blocks
into a parameter `W`.
1. Compute `GRAD@W'` in the backward pass using the pre-fetched `W` and send it to PServer to
execute the optimize pass.
## Conclusion
Let us do the "storage service does not optimize" solution first, as a
......
......@@ -52,7 +52,7 @@ In `trainer_internal.cpp:L93 trainOneBatch`:
When doing actual network forward and backward, at the beginning of each batch, the trainer will try to download one row of data from pserver.
In `trainer/RemoteParameterUpdater.cpp`: `parameterUpdater_->getParametersRemote();`:
In `legacy/trainer/RemoteParameterUpdater.cpp`: `parameterUpdater_->getParametersRemote();`:
```c++
if (fullSize) {
......
......@@ -18,20 +18,20 @@ Figure 1. PaddlePaddle on IA
具体的完成状态可以参见[这里](https://github.com/PaddlePaddle/Paddle/projects/21)
## Contents
- [Overview](#overview)
- [Actions](#actions)
- [CMake](#cmake)
- [Matrix](#matrix)
- [Layers](#layers)
- [Activations](#activations)
- [Parameters](#parameters)
- [Gradients](#gradients)
- [Unit Tests](#unit-tests)
- [Python API](#python-api)
- [Benchmarking](#benchmarking)
- [Others](#others)
- [Design Concerns](#design-concerns)
- [Overview](#overview)
- [Actions](#actions)
- [CMake](#cmake)
- [Matrix](#matrix)
- [Layers](#layers)
- [Activations](#activations)
- [Parameters](#parameters)
- [Gradients](#gradients)
- [Unit Tests](#unit-tests)
- [Python API](#python-api)
- [Benchmarking](#benchmarking)
- [Others](#others)
- [Design Concerns](#design-concerns)
## Overview
......@@ -218,20 +218,20 @@ if use_mkldnn
我们总结出一些特别需要注意的点:
1. 使用**deviceId_**。为了尽可能少的在父类Layer中添加变量或者函数,
我们决定使用已有的`deviceId_`变量来区分layer的属性,定义`-2``MKLDNNLayer`特有的设备ID。
2. 重写父类Layer的**init**函数,修改`deviceId_``-2`,代表这个layer是用于跑在MKL-DNN的环境下。
我们决定使用已有的`deviceId_`变量来区分layer的属性,定义`-2``MKLDNNLayer`特有的设备ID。
2. 重写父类Layer的**init**函数,修改`deviceId_``-2`,代表这个layer是用于跑在MKL-DNN的环境下。
3. 创建`MKLDNNBase`,定义一些除了layer和memory相关的类和函数。
包括MKL-DNN会用到`MKLDNNStream``CPUEngine`,和未来可能还会用到`FPGAEngine`等。
包括MKL-DNN会用到`MKLDNNStream``CPUEngine`,和未来可能还会用到`FPGAEngine`等。
4. 如果MKL-DNN layer的后面接有cpu device,那么就会使`output_.value``extOutVal_`共享内存,
同时数据格式就是`NCHW`,这样下一个cpu device就能拿到正确的数据。
在有普通的CPU layer时, `extOutVal_``extOutGrad_`的格式始终是`NCHW`或者`NC`
## References
1. [MKL small library](https://github.com/01org/mkl-dnn#linking-your-application)[Intel MKL](https://software.intel.com/en-us/mkl)的一个子集。
主要包括了深度学习相关的数学原语与操作,一般由MKL-DNN在发布[新版本](https://github.com/01org/mkl-dnn/releases)时一起更新。
主要包括了深度学习相关的数学原语与操作,一般由MKL-DNN在发布[新版本](https://github.com/01org/mkl-dnn/releases)时一起更新。
2. [MKL-DNN System Requirements](https://github.com/01org/mkl-dnn#system-requirements)
目前在PaddlePaddle中,仅会在支持AVX2指令集及以上的机器才使用MKL-DNN。
3. [原来的方案](https://github.com/PaddlePaddle/Paddle/pull/3096)会引入**nextLayer**的信息。
但是在PaddlePaddle中,无论是重构前的layer还是重构后的op,都不会想要知道next layer/op的信息。
但是在PaddlePaddle中,无论是重构前的layer还是重构后的op,都不会想要知道next layer/op的信息。
4. MKL-DNN的高性能格式与PaddlePaddle原有的`NCHW`不同(PaddlePaddle中的cuDNN部分使用的也是`NCHW`,所以不存在这个问题)。
所以需要引入一个转换方法,并且只需要在必要的时候转换这种格式,才能更好的发挥MKL-DNN的性能。
所以需要引入一个转换方法,并且只需要在必要的时候转换这种格式,才能更好的发挥MKL-DNN的性能。
......@@ -339,7 +339,7 @@ If you are creating a new file for the test, such as :code:`paddle/legacy/gserve
Implement Python Wrapper
========================
Implementing Python wrapper allows us to use the added layer in configuration files. All the Python wrappers are in file :code:`python/paddle/trainer/config_parser.py`. An example of the Python wrapper for fully connected layer is listed below. It has the following steps:
Implementing Python wrapper allows us to use the added layer in configuration files. All the Python wrappers are in file :code:`python/paddle/legacy/trainer/config_parser.py`. An example of the Python wrapper for fully connected layer is listed below. It has the following steps:
- Use :code:`@config_layer('fc')` at the decorator for all the Python wrapper class. :code:`fc` is the identifier of the layer.
- Implements :code:`__init__` constructor function.
......
......@@ -18,7 +18,7 @@
</tr>
<tr>
<td>cpu_avx_openblas</td>
<td>暂无</td>
<td><a href="https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuAvxOpenblas/.lastSuccessful/paddle.tgz" rel="nofollow">paddle.tgz</a></td>
</tr>
<tr>
<td>cpu_noavx_openblas</td>
......@@ -35,7 +35,12 @@
<tr>
<td>cuda8.0_cudnn7_avx_mkl</td>
<td><a href="https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddle.tgz" rel="nofollow">paddle.tgz</a></td>
</tr></tbody></table>
</tr>
<tr>
<td>cuda9.0_cudnn7_avx_mkl</td>
<td><a href="https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddle.tgz" rel="nofollow">paddle.tgz</a></td>
</tr>
</tbody></table>
### 从源码编译
......
......@@ -17,7 +17,7 @@
</tr>
<tr>
<td>cpu_avx_openblas</td>
<td>-</td>
<td><a href="https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuAvxOpenblas/.lastSuccessful/paddle.tgz" rel="nofollow">paddle.tgz</a></td>
</tr>
<tr>
<td>cpu_noavx_openblas</td>
......@@ -34,7 +34,12 @@
<tr>
<td>cuda8.0_cudnn7_avx_mkl</td>
<td><a href="https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/paddle.tgz" rel="nofollow">paddle.tgz</a></td>
</tr></tbody></table>
</tr>
<tr>
<td>cuda9.0_cudnn7_avx_mkl</td>
<td><a href="https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/paddle.tgz" rel="nofollow">paddle.tgz</a></td>
</tr>
</tbody></table>
### From source
......
if(NOT WITH_FLUID_ONLY)
add_subdirectory(legacy/cuda)
add_subdirectory(legacy/function)
add_subdirectory(utils)
add_subdirectory(legacy/utils)
add_subdirectory(legacy/math)
add_subdirectory(legacy/gserver)
add_subdirectory(legacy/parameter)
......@@ -10,7 +10,7 @@ if(NOT WITH_FLUID_ONLY)
add_subdirectory(legacy/capi)
else()
add_subdirectory(legacy/pserver)
add_subdirectory(trainer)
add_subdirectory(legacy/trainer)
add_subdirectory(scripts)
if(WITH_C_API)
......
......@@ -22,9 +22,9 @@
#include "paddle/contrib/inference/paddle_inference_api.h"
#include "paddle/fluid/framework/ddim.h"
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/inference/io.h"
#include "paddle/fluid/platform/init.h"
#include "paddle/fluid/platform/profiler.h"
namespace paddle {
......
......@@ -21,10 +21,10 @@ endif()
cc_test(eigen_test SRCS eigen_test.cc DEPS tensor)
nv_test(mixed_vector_test SRCS mixed_vector_test.cu DEPS place memory device_context init)
nv_test(mixed_vector_test SRCS mixed_vector_test.cu DEPS place memory device_context tensor)
cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor framework_proto recordio)
cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor memory)
nv_test(lod_tensor_gpu_test SRCS lod_tensor_test.cu DEPS lod_tensor init)
nv_test(lod_tensor_gpu_test SRCS lod_tensor_test.cu DEPS lod_tensor)
cc_library(reader SRCS reader.cc DEPS lod_tensor ddim)
......@@ -38,7 +38,7 @@ cc_test(scope_test SRCS scope_test.cc DEPS scope)
cc_library(data_device_transform SRCS data_device_transform.cc DEPS tensor)
nv_test(data_device_transform_test SRCS data_device_transform_test.cu
DEPS operator op_registry init math_function)
DEPS operator op_registry device_context math_function)
if(WITH_GPU)
nv_library(data_type_transform SRCS data_type_transform.cu DEPS tensor)
......@@ -63,7 +63,7 @@ cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto)
cc_library(shape_inference SRCS shape_inference.cc DEPS ddim attribute device_context)
cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope glog
shape_inference data_transform lod_tensor profiler)
cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry init)
cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry device_context)
cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS shape_inference op_info operator glog)
cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog proto_desc)
......@@ -101,14 +101,14 @@ cc_test(var_type_inference_test SRCS var_type_inference_test.cc DEPS op_registry
cc_library(selected_rows SRCS selected_rows.cc DEPS tensor)
cc_test(selected_rows_test SRCS selected_rows_test.cc DEPS selected_rows)
cc_library(init SRCS init.cc DEPS gflags device_context place stringpiece operator)
cc_test(init_test SRCS init_test.cc DEPS init)
cc_test(op_kernel_type_test SRCS op_kernel_type_test.cc DEPS place device_context framework_proto)
cc_test(cow_ptr_tests SRCS details/cow_ptr_test.cc)
# cc_test(channel_test SRCS channel_test.cc)
cc_test(tuple_test SRCS tuple_test.cc )
cc_test(concurrency_test SRCS concurrency_test.cc DEPS go_op channel_close_op channel_create_op
channel_send_op channel_recv_op sum_op select_op elementwise_add_op compare_op
conditional_block_op while_op assign_op print_op executor proto_desc)
# disable test temporarily.
# TODO https://github.com/PaddlePaddle/Paddle/issues/11971
# cc_test(concurrency_test SRCS concurrency_test.cc DEPS go_op channel_close_op channel_create_op
# channel_send_op channel_recv_op sum_op select_op elementwise_add_op compare_op
# conditional_block_op while_op assign_op print_op executor proto_desc)
......@@ -14,13 +14,13 @@ limitations under the License. */
#include "gtest/gtest.h"
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_info.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/elementwise_op_function.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/init.h"
namespace paddle {
namespace framework {
......
......@@ -62,7 +62,7 @@ std::vector<std::array<int, 3>> DataBalanceOpHandle::GetBalancePlan(
}
if (total_size < device_num) {
// No enough data.
PADDLE_THROW("There is no next data.");
PADDLE_THROW_EOF();
}
std::sort(size_device_vec.begin(), size_device_vec.end(),
[](const std::array<int, 2> &a, const std::array<int, 2> &b) {
......
......@@ -124,16 +124,10 @@ void OpHandleBase::RunAndRecordEvent(const std::function<void()> &callback) {
#ifdef PADDLE_WITH_CUDA
if (!events_.empty()) { // Use event
std::function<void()> method = callback;
// NOTE(zcd): device context must be ordered here because RecordEvent
// will use a mutex to ensure the safe of multi-threads.
std::map<platform::DeviceContext *, platform::Place> ordered_ctxes;
for (auto &p : dev_ctxes_) {
ordered_ctxes.emplace(p.second, p.first);
}
for (auto &p : ordered_ctxes) {
method = [method, p, this]() {
static_cast<platform::CUDADeviceContext *>(p.first)->RecordEvent(
events_.at(boost::get<platform::CUDAPlace>(p.second).device),
static_cast<platform::CUDADeviceContext *>(p.second)->RecordEvent(
events_.at(boost::get<platform::CUDAPlace>(p.first).device),
method);
};
}
......
......@@ -13,9 +13,9 @@
// limitations under the License.
#pragma once
#include <map>
#include <string>
#include <vector>
#include "paddle/fluid/framework/details/var_handle.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/macros.h"
......@@ -92,9 +92,7 @@ class OpHandleBase {
std::vector<VarHandleBase *> inputs_;
std::vector<VarHandleBase *> outputs_;
std::unordered_map<platform::Place, platform::DeviceContext *,
platform::PlaceHash>
dev_ctxes_;
std::map<platform::Place, platform::DeviceContext *> dev_ctxes_;
#ifdef PADDLE_WITH_CUDA
std::unordered_map<int, cudaEvent_t> events_;
......
......@@ -54,8 +54,7 @@ struct ReduceLoDTensor {
inline void GatherSelectedRows(
const std::vector<const SelectedRows *> &src_selecte_rows_,
const std::vector<platform::Place> &in_places,
const std::unordered_map<platform::Place, platform::DeviceContext *,
platform::PlaceHash> &dev_ctxes,
const std::map<platform::Place, platform::DeviceContext *> &dev_ctxes,
const platform::Place &out_place, SelectedRows *dst_selecte_rows) {
PADDLE_ENFORCE(!src_selecte_rows_.empty());
......
......@@ -98,9 +98,18 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
if (timeout) {
std::lock_guard<std::mutex> l(exception_mu_);
if (exception_) {
auto exp = *exception_;
exception_.reset();
throw exp;
std::exception *exp = exception_.get();
if (dynamic_cast<platform::EOFException *>(exp)) {
auto e = *static_cast<platform::EOFException *>(exp);
exception_.reset();
throw e;
} else if (dynamic_cast<platform::EnforceNotMet *>(exp)) {
auto e = *static_cast<platform::EnforceNotMet *>(exp);
exception_.reset();
throw e;
} else {
LOG(FATAL) << "Unknown exception.";
}
} else {
continue;
}
......@@ -199,6 +208,12 @@ void ThreadedSSAGraphExecutor::RunOp(
running_ops_--;
ready_var_q->Extend(op->Outputs());
VLOG(10) << op << " " << op->Name() << "Signal posted";
} catch (platform::EOFException ex) {
std::lock_guard<std::mutex> l(exception_mu_);
// EOFException will not cover up existing EnforceNotMet.
if (exception_.get() == nullptr) {
exception_.reset(new platform::EOFException(ex));
}
} catch (platform::EnforceNotMet ex) {
std::lock_guard<std::mutex> l(exception_mu_);
exception_.reset(new platform::EnforceNotMet(ex));
......
......@@ -57,7 +57,7 @@ class ThreadedSSAGraphExecutor : public SSAGraphExecutor {
std::vector<platform::Place> places_;
platform::DeviceContextPool fetch_ctxs_;
std::mutex exception_mu_;
std::unique_ptr<platform::EnforceNotMet> exception_;
std::unique_ptr<std::exception> exception_;
std::atomic<int> running_ops_;
void InsertPendingOp(std::unordered_map<OpHandleBase *, size_t> *pending_ops,
......
......@@ -17,9 +17,9 @@
#include <stdio.h>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/platform/assert.h"
#include "paddle/fluid/platform/init.h"
#include "paddle/fluid/platform/place.h"
__global__ void test(size_t* a, int size) {
......
......@@ -13,10 +13,10 @@ See the License for the specific language governing permissions and
limitations under the License. */
#include "gtest/gtest.h"
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/op_info.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/platform/init.h"
namespace paddle {
namespace framework {
......
set(FLUID_CORE_MODULES proto_desc memory lod_tensor executor init)
set(FLUID_CORE_MODULES proto_desc memory lod_tensor executor )
# TODO(panyx0718): Should this be called paddle_fluid_inference_api_internal?
cc_library(paddle_fluid_api
......
......@@ -54,4 +54,5 @@ It can be used as a helper class that draws the modified graph after each pass.
There is some helper legacy/function/class for analysis.
- [dot.h](./dot.h) give a easy to use interface for generating `DOT` codes,
- [graph_traits.h](./graph_traits.h) contains the graph traversal algorithms, it uses `iterator` to make the algorithms easy to share across different passes.
- [graph_traits.h](./graph_traits.h) contains the interfaces of the graph traversal algorithms, it uses `iterator`to make the algorithms easy to share across different passes,
there are some implementations in [data_flow_graph.cc](./data_flow_graph.cc) , such as BFS and DFS..
......@@ -32,19 +32,6 @@ class Pass {
public:
Pass() = default;
virtual ~Pass() = default;
// Virtual method overridden by subclasses to do only necessary initialization
// before any pass is run.
// virtual bool Initialize() { return false; }
// There is some passes such as FlowToDataFlowGraphPass that needs a
// ProgramDesc. Here use the native ProgramDesc ProtoBuf message, so that it
// only couple with the proto file.
// virtual bool Initialize(const framework::proto::ProgramDesc &desc) { return
// false; }
// There are some Passes such as DataFlowGraphToFluidPass that will output a
// ProgramDesc.
// virtual bool Initialize(framework::proto::ProgramDesc *desc) { return
// false; }
// Mutable Pass.
virtual bool Initialize(Argument *argument) { return false; }
// Readonly Pass.
......
......@@ -20,7 +20,7 @@ limitations under the License. */
#include "paddle/fluid/framework/block_desc.h"
#include "paddle/fluid/framework/feed_fetch_type.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/platform/cpu_helper.h"
#include "paddle/fluid/pybind/pybind.h"
DEFINE_string(devices, "", "The devices to be used which is joined by comma.");
......@@ -33,7 +33,7 @@ namespace inference {
void Init(const std::vector<std::string> argv) {
framework::InitGflags(argv);
operators::math::SetNumThreads(FLAGS_math_num_threads);
platform::SetNumThreads(FLAGS_math_num_threads);
// init devices
std::vector<int> devices;
std::string token;
......
......@@ -18,9 +18,9 @@ limitations under the License. */
#include <string>
#include <vector>
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/platform/init.h"
namespace paddle {
namespace inference {
......
......@@ -19,7 +19,7 @@ limitations under the License. */
#include "gflags/gflags.h"
#include "gtest/gtest.h"
#include "paddle/fluid/inference/tests/test_helper.h"
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/platform/cpu_helper.h"
#ifdef PADDLE_WITH_MKLML
#include <omp.h>
#endif
......@@ -164,7 +164,7 @@ TEST(inference, nlp) {
// only use 1 thread number per std::thread
omp_set_dynamic(0);
omp_set_num_threads(1);
paddle::operators::math::SetNumThreads(1);
paddle::platform::SetNumThreads(1);
#endif
double start_ms = 0, stop_ms = 0;
......
......@@ -5,7 +5,7 @@ if(WITH_GRPC)
set(DISTRIBUTE_COMPILE_FLAGS "-Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor")
set_source_files_properties(grpc_serde_test.cc rpc_server_test.cc PROPERTIES COMPILE_FLAGS ${DISTRIBUTE_COMPILE_FLAGS})
cc_test(serde_test SRCS grpc_serde_test.cc variable_response.cc DEPS grpc++_unsecure grpc_unsecure gpr
cares zlib protobuf sendrecvop_grpc SERIAL)
cares zlib protobuf sendrecvop_grpc scope profiler math_function SERIAL)
cc_test(grpc_server_test SRCS rpc_server_test.cc DEPS sendrecvop_grpc
grpc++_unsecure grpc_unsecure gpr cares zlib protobuf executor
proto_desc lookup_table_op SERIAL)
......
......@@ -54,13 +54,13 @@ math_library(softmax DEPS math_function)
math_library(unpooling)
math_library(vol2col)
cc_test(math_function_test SRCS math_function_test.cc)
cc_test(math_function_test SRCS math_function_test.cc DEPS math_function)
cc_test(selected_rows_functor_test SRCS selected_rows_functor_test.cc DEPS selected_rows_functor)
cc_test(im2col_test SRCS im2col_test.cc DEPS im2col)
cc_test(vol2col_test SRCS vol2col_test.cc DEPS vol2col)
cc_test(sequence_padding_test SRCS sequence_padding_test.cc DEPS sequence_padding)
if(WITH_GPU)
nv_test(math_function_gpu_test SRCS math_function_test.cu)
nv_test(selected_rows_functor_gpu_test SRCS selected_rows_functor_test.cu DEPS selected_rows_functor)
nv_test(math_function_gpu_test SRCS math_function_test.cu DEPS math_function)
nv_test(selected_rows_functor_gpu_test SRCS selected_rows_functor_test.cu DEPS selected_rows_functor math_function)
endif()
cc_test(concat_test SRCS concat_test.cc DEPS concat)
......@@ -23,41 +23,12 @@
#ifdef PADDLE_USE_OPENBLAS
#include <cblas.h>
#ifdef LAPACK_FOUND
#include <lapacke.h>
#endif
#endif
#ifndef LAPACK_FOUND
extern "C" {
#include <cblas.h> // NOLINT
int LAPACKE_sgetrf(int matrix_layout, int m, int n, float* a, int lda,
int* ipiv);
int LAPACKE_dgetrf(int matrix_layout, int m, int n, double* a, int lda,
int* ipiv);
int LAPACKE_sgetri(int matrix_layout, int n, float* a, int lda,
const int* ipiv);
int LAPACKE_dgetri(int matrix_layout, int n, double* a, int lda,
const int* ipiv);
}
#endif
namespace paddle {
namespace operators {
namespace math {
static void SetNumThreads(int num_threads) {
#ifdef PADDLE_USE_OPENBLAS
int real_num_threads = num_threads > 1 ? num_threads : 1;
openblas_set_num_threads(real_num_threads);
#elif defined(PADDLE_WITH_MKLML)
int real_num_threads = num_threads > 1 ? num_threads : 1;
platform::dynload::MKL_Set_Num_Threads(real_num_threads);
#else
PADDLE_ENFORCE(false, "To be implemented.");
#endif
}
/**
* Matrix Descriptor of a memory buffer.
*
......
......@@ -19,23 +19,6 @@ limitations under the License. */
#ifdef PADDLE_USE_OPENBLAS
#include <cblas.h>
#ifdef LAPACK_FOUND
#include <lapacke.h>
#endif
#endif
#ifndef LAPACK_FOUND
extern "C" {
#include <cblas.h> // NOLINT
int LAPACKE_sgetrf(int matrix_layout, int m, int n, float* a, int lda,
int* ipiv);
int LAPACKE_dgetrf(int matrix_layout, int m, int n, double* a, int lda,
int* ipiv);
int LAPACKE_sgetri(int matrix_layout, int n, float* a, int lda,
const int* ipiv);
int LAPACKE_dgetri(int matrix_layout, int n, double* a, int lda,
const int* ipiv);
}
#endif
#include <cmath>
......
......@@ -19,7 +19,6 @@ limitations under the License. */
#include <thread> // NOLINT
#include <vector>
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/op_desc.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/program_desc.h"
......@@ -27,6 +26,7 @@ limitations under the License. */
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/platform/gpu_info.h"
#include "paddle/fluid/platform/init.h"
#include "paddle/fluid/platform/place.h"
USE_NO_KERNEL_OP(ncclInit);
......
......@@ -14,7 +14,7 @@ limitations under the License. */
#include <vector>
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
namespace paddle {
namespace operators {
......
......@@ -68,7 +68,7 @@ class ReadOp : public framework::OperatorBase {
reader->ReadNext(&ins);
if (ins.empty()) {
if (Attr<bool>("throw_eof_exp")) {
PADDLE_THROW("There is no next data.");
PADDLE_THROW_EOF();
} else {
ins.resize(out_arg_names.size());
for (auto& tensor : ins) {
......
......@@ -28,6 +28,9 @@ cc_test(place_test SRCS place_test.cc DEPS place glog gflags)
add_subdirectory(dynload)
cc_library(cpu_helper SRCS cpu_helper.cc DEPS cblas enforce)
cc_test(cpu_helper_test SRCS cpu_helper_test.cc DEPS cpu_helper)
IF(WITH_GPU)
set(GPU_CTX_DEPS dynload_cuda dynamic_loader)
ELSE()
......@@ -42,10 +45,12 @@ ENDIF()
# memcpy depends on device_context, here add deps individually for
# avoiding cycle dependencies
cc_library(device_context SRCS device_context.cc DEPS malloc
place eigen3 ${GPU_CTX_DEPS} ${MKLDNN_CTX_DEPS})
cc_library(device_context SRCS device_context.cc init.cc DEPS malloc
place eigen3 stringpiece cpu_helper ${GPU_CTX_DEPS} ${MKLDNN_CTX_DEPS})
nv_test(device_context_test SRCS device_context_test.cu DEPS device_context gpu_info)
cc_test(init_test SRCS init_test.cc DEPS device_context)
nv_test(cudnn_helper_test SRCS cudnn_helper_test.cc DEPS dynload_cuda)
nv_test(transform_test SRCS transform_test.cu DEPS memory place device_context)
......@@ -53,5 +58,5 @@ cc_library(device_tracer SRCS device_tracer.cc DEPS boost profiler_proto framewo
cc_library(profiler SRCS profiler.cc DEPS device_context device_tracer)
cc_test(profiler_test SRCS profiler_test.cc DEPS profiler)
nv_test(float16_gpu_test SRCS float16_test.cu)
cc_test(float16_test SRCS float16_test.cc)
nv_test(float16_gpu_test SRCS float16_test.cu DEPS lod_tensor)
cc_test(float16_test SRCS float16_test.cc DEPS lod_tensor)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/platform/cpu_helper.h"
#include "paddle/fluid/platform/enforce.h"
#ifdef PADDLE_WITH_MKLML
#include "paddle/fluid/platform/dynload/mklml.h"
#endif
#ifdef PADDLE_USE_OPENBLAS
#include <cblas.h>
#endif
namespace paddle {
namespace platform {
void SetNumThreads(int num_threads) {
#ifdef PADDLE_USE_OPENBLAS
int real_num_threads = num_threads > 1 ? num_threads : 1;
openblas_set_num_threads(real_num_threads);
#elif defined(PADDLE_WITH_MKLML)
int real_num_threads = num_threads > 1 ? num_threads : 1;
platform::dynload::MKL_Set_Num_Threads(real_num_threads);
#else
PADDLE_ENFORCE(false, "To be implemented.");
#endif
}
} // namespace platform
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <stddef.h>
namespace paddle {
namespace platform {
//! Set the number of threads in use.
void SetNumThreads(int num_threads);
} // namespace platform
} // namespace paddle
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/platform/cpu_helper.h"
#include "gtest/gtest.h"
TEST(CpuHelper, SetNumThread) {
paddle::platform::SetNumThreads(1);
paddle::platform::SetNumThreads(4);
}
......@@ -10,6 +10,7 @@ See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/platform/device_context.h"
#include <set>
#include <string>
#include <unordered_set>
#include <vector>
......@@ -35,7 +36,7 @@ DeviceContextPool::DeviceContextPool(
const std::vector<platform::Place>& places) {
PADDLE_ENFORCE_GT(places.size(), 0);
using PtrType = std::unique_ptr<DeviceContext>;
std::unordered_set<Place, PlaceHash> set;
std::set<Place> set;
for (auto& p : places) {
set.insert(p);
}
......
......@@ -27,12 +27,12 @@ limitations under the License. */
#include <mkldnn.hpp>
#endif
#include <map>
#include "glog/logging.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/platform/place.h"
#include "unsupported/Eigen/CXX11/Tensor"
#include "glog/logging.h"
namespace paddle {
namespace platform {
......@@ -201,9 +201,7 @@ class DeviceContextPool {
private:
static DeviceContextPool* pool;
std::unordered_map<const platform::Place,
std::unique_ptr<platform::DeviceContext>, PlaceHash>
device_contexts_;
std::map<Place, std::unique_ptr<DeviceContext>> device_contexts_;
DISABLE_COPY_AND_ASSIGN(DeviceContextPool);
};
......
......@@ -69,19 +69,3 @@ TEST(Device, DeviceContextPool) {
ASSERT_NE(dev_ctx, nullptr);
}
}
int main(int argc, char** argv) {
std::vector<paddle::platform::Place> places;
places.emplace_back(paddle::platform::CPUPlace());
int count = paddle::platform::GetCUDADeviceCount();
for (int i = 0; i < count; ++i) {
places.emplace_back(paddle::platform::CUDAPlace(i));
}
VLOG(0) << " DeviceCount " << count;
paddle::platform::DeviceContextPool::Init(places);
testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();
}
......@@ -36,8 +36,6 @@ DEFINE_string(cuda_dir, "",
DEFINE_string(warpctc_dir, "", "Specify path for loading libwarpctc.so.");
DEFINE_string(lapack_dir, "", "Specify path for loading liblapack.so.");
DEFINE_string(nccl_dir, "",
"Specify path for loading nccl library, such as libcublas, "
"libcurand. For instance, /usr/local/cuda/lib64. If default, "
......@@ -189,14 +187,6 @@ void* GetWarpCTCDsoHandle() {
#endif
}
void* GetLapackDsoHandle() {
#if defined(__APPLE__) || defined(__OSX__)
return GetDsoHandleFromSearchPath(FLAGS_lapack_dir, "liblapacke.dylib");
#else
return GetDsoHandleFromSearchPath(FLAGS_lapack_dir, "liblapacke.so");
#endif
}
void* GetNCCLDsoHandle() {
#if defined(__APPLE__) || defined(__OSX__)
return GetDsoHandleFromSearchPath(FLAGS_nccl_dir, "libnccl.dylib");
......
......@@ -23,7 +23,6 @@ void* GetCUDNNDsoHandle();
void* GetCUPTIDsoHandle();
void* GetCurandDsoHandle();
void* GetWarpCTCDsoHandle();
void* GetLapackDsoHandle();
void* GetNCCLDsoHandle();
void* GetTensorRtDsoHandle();
void* GetMKLMLDsoHandle();
......
......@@ -102,6 +102,15 @@ struct EnforceNotMet : public std::exception {
const char* what() const noexcept { return err_str_.c_str(); }
};
struct EOFException : public std::exception {
std::string err_str_;
EOFException(const char* err_msg, const char* f, int l) {
err_str_ = string::Sprintf("%s at [%s:%d]", err_msg, f, l);
}
const char* what() const noexcept { return err_str_.c_str(); }
};
// Because most enforce conditions would evaluate to true, we can use
// __builtin_expect to instruct the C++ compiler to generate code that
// always forces branch prediction of true.
......@@ -242,6 +251,11 @@ inline void throw_on_error(T e) {
#define PADDLE_ENFORCE(...) ::paddle::platform::throw_on_error(__VA_ARGS__);
#endif
#define PADDLE_THROW_EOF() \
do { \
throw ::paddle::platform::EOFException("There is no next data.", __FILE__, \
__LINE__); \
} while (false)
/*
* Some enforce helpers here, usage:
* int a = 1;
......
......@@ -210,3 +210,14 @@ TEST(ENFORCE_USER_DEFINED_CLASS, NE) {
Dims a{{1, 2, 3, 4}}, b{{5, 6, 7, 8}};
ASSERT_THROW(PADDLE_ENFORCE_EQ(a, b), paddle::platform::EnforceNotMet);
}
TEST(EOF_EXCEPTION, THROW_EOF) {
bool caught_eof = false;
try {
PADDLE_THROW_EOF();
} catch (paddle::platform::EOFException error) {
caught_eof = true;
EXPECT_TRUE(HasPrefix(StringPiece(error.what()), "There is no next data."));
}
EXPECT_TRUE(caught_eof);
}
......@@ -13,8 +13,8 @@ limitations under the License. */
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/platform/init.h"
namespace paddle {
namespace platform {
......
......@@ -15,7 +15,7 @@ limitations under the License. */
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
#define ARITHMETIC_KERNEL(op_type, sign) \
__global__ void op_type(const half* in1, const half* in2, half* out) { \
......
......@@ -16,10 +16,10 @@ limitations under the License. */
#include <stdexcept>
#include <string>
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/platform/cpu_helper.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/init.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/fluid/string/piece.h"
......@@ -115,7 +115,7 @@ void InitDevices(bool init_p2p, const std::vector<int> devices) {
places.emplace_back(platform::CPUPlace());
platform::DeviceContextPool::Init(places);
#ifndef PADDLE_WITH_MKLDNN
operators::math::SetNumThreads(1);
platform::SetNumThreads(1);
#endif
}
......
......@@ -13,8 +13,8 @@ See the License for the specific language governing permissions and
limitations under the License. */
#include "gtest/gtest.h"
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/init.h"
TEST(InitDevices, CPU) {
using paddle::framework::InitDevices;
......
......@@ -30,6 +30,7 @@ struct CPUPlace {
// needed for variant equality comparison
inline bool operator==(const CPUPlace &) const { return true; }
inline bool operator!=(const CPUPlace &) const { return false; }
inline bool operator<(const CPUPlace &) const { return false; }
};
struct CUDAPlace {
......@@ -42,6 +43,7 @@ struct CUDAPlace {
return device == o.device;
}
inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); }
inline bool operator<(const CUDAPlace &o) const { return device < o.device; }
int device;
};
......@@ -52,6 +54,7 @@ struct CUDAPinnedPlace {
// needed for variant equality comparison
inline bool operator==(const CUDAPinnedPlace &) const { return true; }
inline bool operator!=(const CUDAPinnedPlace &) const { return false; }
inline bool operator<(const CUDAPinnedPlace &) const { return false; }
};
struct IsCUDAPlace : public boost::static_visitor<bool> {
......@@ -89,18 +92,6 @@ bool is_cuda_pinned_place(const Place &);
bool places_are_same_class(const Place &, const Place &);
bool is_same_place(const Place &, const Place &);
struct PlaceHash {
std::size_t operator()(const Place &p) const {
constexpr size_t num_dev_bits = 4;
std::hash<int> ihash;
size_t dev_id = 0;
if (is_gpu_place(p)) {
dev_id = boost::get<CUDAPlace>(p).device;
}
return ihash(dev_id << num_dev_bits | p.which());
}
};
std::ostream &operator<<(std::ostream &, const Place &);
template <typename Visitor>
......
......@@ -2,13 +2,13 @@ if(WITH_PYTHON)
if(WITH_AMD_GPU)
hip_library(paddle_pybind SHARED
SRCS pybind.cc exception.cc protobuf.cc const_value.cc recordio.cc
DEPS pybind python proto_desc memory executor prune init profiler feed_fetch_method
DEPS pybind python proto_desc memory executor prune profiler feed_fetch_method
parallel_executor
${GLOB_OP_LIB})
else()
cc_library(paddle_pybind SHARED
SRCS pybind.cc exception.cc protobuf.cc const_value.cc recordio.cc
DEPS pybind python proto_desc memory executor prune init profiler feed_fetch_method
DEPS pybind python proto_desc memory executor prune profiler feed_fetch_method
parallel_executor
${GLOB_OP_LIB})
if(NOT APPLE AND NOT ANDROID)
......
......@@ -18,10 +18,13 @@ namespace paddle {
namespace pybind {
void BindException(pybind11::module* m) {
static pybind11::exception<platform::EOFException> eof(*m, "EOFException");
static pybind11::exception<platform::EnforceNotMet> exc(*m, "EnforceNotMet");
pybind11::register_exception_translator([](std::exception_ptr p) {
try {
if (p) std::rethrow_exception(p);
} catch (const platform::EOFException& e) {
eof(e.what());
} catch (const platform::EnforceNotMet& e) {
exc(e.what());
}
......
......@@ -24,7 +24,6 @@ limitations under the License. */
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/feed_fetch_method.h"
#include "paddle/fluid/framework/framework.pb.h"
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/lod_rank_table.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/lod_tensor_array.h"
......@@ -36,6 +35,7 @@ limitations under the License. */
#include "paddle/fluid/operators/activation_op.h"
#include "paddle/fluid/operators/reader/lod_tensor_blocking_queue.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/platform/init.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/fluid/platform/profiler.h"
#include "paddle/fluid/pybind/const_value.h"
......
......@@ -83,6 +83,13 @@ void Fprintf(std::ostream& out, const char* fmt, const Args&... args) {
tinyformat::vformat(out, fmt, tinyformat::makeFormatList(args...));
}
template <typename... Args>
std::string Sprintf(const Args&... args) {
std::ostringstream oss;
Fprintf(oss, "");
return oss.str();
}
template <typename... Args>
std::string Sprintf(const char* fmt, const Args&... args) {
std::ostringstream oss;
......
......@@ -27,4 +27,5 @@ TEST(StringPrintf, StringPrintf) {
EXPECT_EQ(std::string("Wednesday, July 27, 14:44"),
paddle::string::Sprintf("%s, %s %d, %.2d:%.2d", weekday, month, day,
hour, min));
EXPECT_EQ(std::string(""), paddle::string::Sprintf());
}
......@@ -15,11 +15,11 @@
#include <fstream>
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/init.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/init.h"
#include "paddle/fluid/platform/place.h"
namespace paddle {
......
......@@ -14,7 +14,7 @@ limitations under the License. */
#include "PaddleAPI.h"
#include "PaddleAPIPrivate.h"
#include "paddle/trainer/Trainer.h"
#include "paddle/legacy/trainer/Trainer.h"
struct ParameterConfigPrivate {
paddle::ParameterPtr parameter;
......
......@@ -198,5 +198,5 @@ namespace std {
%ignore ParameterConfigPrivate;
%ignore OptimizationConfigPrivate;
%ignore ParameterTraverseCallbackPrivate;
%include "utils/GlobalConstants.h"
%include "legacy/utils/GlobalConstants.h"
%include "legacy/api/PaddleAPI.h"
......@@ -20,8 +20,8 @@ limitations under the License. */
#include <string>
#include <vector>
#include "paddle/legacy/gserver/gradientmachines/GradientMachine.h"
#include "paddle/utils/Common.h"
#include "paddle/utils/GlobalConstants.h"
#include "paddle/legacy/utils/Common.h"
#include "paddle/legacy/utils/GlobalConstants.h"
/// Import PaddlePaddle's enumeration into global namespace.
using namespace paddle::enumeration_wrapper; // NOLINT
......
......@@ -17,7 +17,7 @@ limitations under the License. */
#include "paddle/legacy/gserver/evaluators/Evaluator.h"
#include "paddle/legacy/gserver/gradientmachines/GradientMachine.h"
#include "paddle/legacy/parameter/ParameterUpdaterBase.h"
#include "paddle/trainer/TrainerConfigHelper.h"
#include "paddle/legacy/trainer/TrainerConfigHelper.h"
struct GradientMachinePrivate {
std::shared_ptr<paddle::GradientMachine> machine;
......
......@@ -16,10 +16,10 @@ limitations under the License. */
#include "PaddleAPIPrivate.h"
#ifndef PADDLE_WITHOUT_GOLANG
#include "paddle/trainer/NewRemoteParameterUpdater.h"
#include "paddle/legacy/trainer/NewRemoteParameterUpdater.h"
#endif
#include "paddle/trainer/RemoteParameterUpdater.h"
#include "paddle/trainer/ThreadParameterUpdater.h"
#include "paddle/legacy/trainer/RemoteParameterUpdater.h"
#include "paddle/legacy/trainer/ThreadParameterUpdater.h"
ParameterUpdater::ParameterUpdater() : m(new ParameterUpdaterPrivate()) {}
......
......@@ -19,7 +19,7 @@ limitations under the License. */
#include "PaddleAPI.h"
#include "paddle/legacy/gserver/gradientmachines/GradientMachine.h"
#include "paddle/legacy/parameter/Argument.h"
#include "paddle/utils/Flags.h"
#include "paddle/legacy/utils/Flags.h"
// used to represent partial sequence
struct Path {
......
......@@ -20,10 +20,10 @@ limitations under the License. */
#include <memory>
#include "paddle/legacy/gserver/gradientmachines/NeuralNetwork.h"
#include "paddle/trainer/ParamUtil.h"
#include "paddle/trainer/Trainer.h"
#include "paddle/trainer/TrainerInternal.h"
#include "paddle/utils/Flags.h"
#include "paddle/legacy/trainer/ParamUtil.h"
#include "paddle/legacy/trainer/Trainer.h"
#include "paddle/legacy/trainer/TrainerInternal.h"
#include "paddle/legacy/utils/Flags.h"
using paddle::real;
......
......@@ -15,10 +15,10 @@ limitations under the License. */
#include "PaddleAPI.h"
#include "paddle/legacy/parameter/Parameter.h"
#include "paddle/utils/Common.h"
#include "paddle/utils/Flags.h"
#include "paddle/utils/PythonUtil.h"
#include "paddle/utils/Util.h"
#include "paddle/legacy/utils/Common.h"
#include "paddle/legacy/utils/Flags.h"
#include "paddle/legacy/utils/PythonUtil.h"
#include "paddle/legacy/utils/Util.h"
#include <algorithm>
#include <iostream>
......
......@@ -18,9 +18,9 @@ limitations under the License. */
#include <vector>
#include "capi_private.h"
#include "main.h"
#include "paddle/trainer/TrainerConfigHelper.h"
#include "paddle/utils/Excepts.h"
#include "paddle/utils/PythonUtil.h"
#include "paddle/legacy/trainer/TrainerConfigHelper.h"
#include "paddle/legacy/utils/Excepts.h"
#include "paddle/legacy/utils/PythonUtil.h"
static void initPaddle(int argc, char** argv) {
paddle::initMain(argc, argv);
......
......@@ -15,7 +15,7 @@ limitations under the License. */
#include <functional>
#include "capi.h"
#include "gtest/gtest.h"
#include "paddle/utils/ThreadLocal.h"
#include "paddle/legacy/utils/ThreadLocal.h"
static std::vector<paddle_real> randomBuffer(size_t bufSize) {
auto& eng = paddle::ThreadLocalRandomEngine::get();
......
......@@ -14,12 +14,12 @@ limitations under the License. */
#include <gtest/gtest.h>
#include <paddle/legacy/gserver/gradientmachines/GradientMachine.h>
#include <paddle/trainer/TrainerConfigHelper.h>
#include <paddle/legacy/trainer/TrainerConfigHelper.h>
#include <stdlib.h>
#include <string.h>
#include <type_traits>
#include "capi.h"
#include "paddle/utils/ThreadLocal.h"
#include "paddle/legacy/utils/ThreadLocal.h"
static std::vector<paddle_real> randomBuffer(size_t bufSize) {
auto& eng = paddle::ThreadLocalRandomEngine::get();
......
......@@ -208,7 +208,7 @@ typedef struct {
#include <cuda_runtime.h>
#include "paddle/legacy/cuda/include/hl_cuda.h"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
extern __thread bool g_sync_flag;
extern __thread cudaStream_t default_stream;
......
......@@ -18,7 +18,7 @@ limitations under the License. */
#ifdef __NVCC__
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
/*
* threads(framePerBlock, batchPerBlock)
......
......@@ -18,7 +18,7 @@ limitations under the License. */
#ifdef __NVCC__
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
#include "hl_device_functions.cuh"
/*
......
......@@ -18,7 +18,7 @@ limitations under the License. */
#define HL_GPU_MATRIX_KERNEL_CUH_
#include <algorithm>
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
#include "hl_base.h"
#ifdef __NVCC__
......
......@@ -18,7 +18,7 @@ limitations under the License. */
#include "hl_cuda.ph"
#include "hl_matrix_base.cuh"
#include "hl_thread.ph"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
/**
* @brief matrix row operator.
......
......@@ -16,8 +16,8 @@ limitations under the License. */
#include <sys/time.h>
#include "hl_cuda.h"
#include "hl_thread.ph"
#include "paddle/utils/DynamicLoader.h"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/DynamicLoader.h"
#include "paddle/legacy/utils/Logging.h"
namespace dynload {
......
......@@ -17,8 +17,8 @@ limitations under the License. */
#include <gflags/gflags.h>
#include "hl_cuda_cudnn.ph"
#include "hl_thread.ph"
#include "paddle/utils/DynamicLoader.h"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/DynamicLoader.h"
#include "paddle/legacy/utils/Logging.h"
DEFINE_int32(cudnn_conv_workspace_limit_in_mb,
4096,
......
......@@ -23,8 +23,8 @@ limitations under the License. */
#include <unistd.h>
#include "hl_cuda.ph"
#include "hl_thread.ph"
#include "paddle/utils/Logging.h"
#include "paddle/utils/DynamicLoader.h"
#include "paddle/legacy/utils/Logging.h"
#include "paddle/legacy/utils/DynamicLoader.h"
// clang-format on
namespace dynload {
......
......@@ -16,7 +16,7 @@ limitations under the License. */
#include "hl_base.h"
#include "hl_cuda_cublas.h"
#include "hl_device_functions.cuh"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
typedef hppl::Active<real>::forward t_forward;
typedef hppl::Active<real>::backward t_backward;
......
......@@ -20,7 +20,7 @@ limitations under the License. */
#include "hl_matrix_ops.cuh"
#include "hl_sequence.h"
#include "hl_sparse.ph"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
DEFINE_MATRIX_UNARY_OP(Zero, a = 0);
DEFINE_MATRIX_TERNARY_PARAMETER_OP(_add, TWO_PARAMETER, c = p1 * a + p2 * b);
......
......@@ -14,7 +14,7 @@ limitations under the License. */
#include "hl_base.h"
#include "hl_device_functions.cuh"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
__global__ void KeMaxSequenceForward(real* input,
const int* sequence,
......
......@@ -18,7 +18,7 @@ limitations under the License. */
#include "hl_matrix_ops.cuh"
#include "hl_sparse.h"
#include "hl_sparse.ph"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
DEFINE_MATRIX_UNARY_PARAMETER_OP(mul_scalar, ONE_PARAMETER, a = a * p);
DEFINE_MATRIX_UNARY_OP(Zero, a = 0);
......
......@@ -15,7 +15,7 @@ limitations under the License. */
#include "hl_base.h"
#include "hl_cuda.h"
#include "hl_device_functions.cuh"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
template <int blockDimX, int blockDimY, int gridDimX, bool AddRow>
__global__ void KeMatrixAddRows(real* output,
......
......@@ -15,7 +15,7 @@ limitations under the License. */
#include "paddle/legacy/cuda/include/hl_base.h"
#include "paddle/legacy/cuda/include/hl_sparse.ph"
#include "paddle/legacy/cuda/include/hl_top_k.h"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/Logging.h"
// using namespace hppl;
......
......@@ -14,8 +14,8 @@ limitations under the License. */
#include "hl_warpctc_wrap.h"
#include <mutex>
#include "paddle/utils/DynamicLoader.h"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/DynamicLoader.h"
#include "paddle/legacy/utils/Logging.h"
namespace dynload {
......
......@@ -18,9 +18,9 @@ limitations under the License. */
#include <vector>
#include "BufferArg.h"
#include "paddle/legacy/math/Matrix.h"
#include "paddle/utils/Any.h"
#include "paddle/utils/ClassRegistrar.h"
#include "paddle/utils/Error.h"
#include "paddle/legacy/utils/Any.h"
#include "paddle/legacy/utils/ClassRegistrar.h"
#include "paddle/legacy/utils/Error.h"
namespace paddle {
......
......@@ -15,7 +15,7 @@ limitations under the License. */
#include "MulOp.h"
#include "GemmFunctor.h"
#include "paddle/legacy/math/SIMDFunctions.h"
#include "paddle/utils/ThreadLocal.h"
#include "paddle/legacy/utils/ThreadLocal.h"
namespace {
inline void vecAddTo(real* a, const real* b, real scaleB, size_t len) {
......
......@@ -21,8 +21,8 @@ limitations under the License. */
#include <thread>
#include <type_traits>
#include "paddle/legacy/parameter/Argument.h"
#include "paddle/utils/ClassRegistrar.h"
#include "paddle/utils/Logging.h"
#include "paddle/legacy/utils/ClassRegistrar.h"
#include "paddle/legacy/utils/Logging.h"
#ifdef PADDLE_WITH_MKLDNN
#include "MKLDNNActivation.h"
......
......@@ -15,7 +15,7 @@ limitations under the License. */
#pragma once
#include <string>
#include <vector>
#include "paddle/utils/Error.h"
#include "paddle/legacy/utils/Error.h"
namespace paddle {
......
......@@ -14,7 +14,7 @@ limitations under the License. */
#include "MKLDNNActivation.h"
#include "mkldnn.hpp"
#include "paddle/utils/ClassRegistrar.h"
#include "paddle/legacy/utils/ClassRegistrar.h"
namespace paddle {
......
......@@ -16,10 +16,10 @@ limitations under the License. */
#include <unistd.h>
#include <algorithm>
#include "paddle/utils/Logging.h"
#include "paddle/utils/Stat.h"
#include "paddle/utils/StringUtil.h"
#include "paddle/utils/Util.h"
#include "paddle/legacy/utils/Logging.h"
#include "paddle/legacy/utils/Stat.h"
#include "paddle/legacy/utils/StringUtil.h"
#include "paddle/legacy/utils/Util.h"
namespace paddle {
......
......@@ -29,13 +29,13 @@ limitations under the License. */
#include "paddle/legacy/math/SparseMatrix.h"
#include "paddle/legacy/math/Vector.h"
#include "paddle/legacy/parameter/Argument.h"
#include "paddle/utils/ClassRegistrar.h"
#include "paddle/utils/Common.h"
#include "paddle/utils/Locks.h"
#include "paddle/utils/Logging.h"
#include "paddle/utils/Queue.h"
#include "paddle/utils/ThreadLocal.h"
#include "paddle/utils/Util.h"
#include "paddle/legacy/utils/ClassRegistrar.h"
#include "paddle/legacy/utils/Common.h"
#include "paddle/legacy/utils/Locks.h"
#include "paddle/legacy/utils/Logging.h"
#include "paddle/legacy/utils/Queue.h"
#include "paddle/legacy/utils/ThreadLocal.h"
#include "paddle/legacy/utils/Util.h"
namespace paddle {
/**
......
......@@ -14,8 +14,8 @@ limitations under the License. */
#include "MultiDataProvider.h"
#include <algorithm>
#include "paddle/utils/Logging.h"
#include "paddle/utils/Util.h"
#include "paddle/legacy/utils/Logging.h"
#include "paddle/legacy/utils/Util.h"
namespace paddle {
......
......@@ -13,9 +13,9 @@ See the License for the specific language governing permissions and
limitations under the License. */
#include "PyDataProvider.h"
#include "paddle/utils/Common.h"
#include "paddle/utils/PythonUtil.h"
#include "paddle/utils/Util.h"
#include "paddle/legacy/utils/Common.h"
#include "paddle/legacy/utils/PythonUtil.h"
#include "paddle/legacy/utils/Util.h"
namespace paddle {
......
......@@ -14,7 +14,7 @@ limitations under the License. */
#pragma once
#include <paddle/utils/PythonUtil.h>
#include <paddle/legacy/utils/PythonUtil.h>
#include "DataFormat.pb.h"
#include "DataProvider.h"
......
......@@ -25,9 +25,9 @@ limitations under the License. */
#include "DataProvider.h"
#include "paddle/utils/Locks.h"
#include "paddle/utils/PythonUtil.h"
#include "paddle/utils/Stat.h"
#include "paddle/legacy/utils/Locks.h"
#include "paddle/legacy/utils/PythonUtil.h"
#include "paddle/legacy/utils/Stat.h"
namespace paddle {
......
......@@ -14,7 +14,7 @@ limitations under the License. */
#include "Evaluator.h"
#include "paddle/legacy/gserver/gradientmachines/NeuralNetwork.h"
#include "paddle/utils/StringUtil.h"
#include "paddle/legacy/utils/StringUtil.h"
namespace paddle {
......
......@@ -16,7 +16,7 @@ limitations under the License. */
#include <vector>
#include "paddle/legacy/math/Vector.h"
#include "paddle/utils/StringUtil.h"
#include "paddle/legacy/utils/StringUtil.h"
#include "Evaluator.h"
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册