diff --git a/doc/howto/optimization/cpu_profiling.md b/doc/howto/optimization/cpu_profiling.md
new file mode 100644
index 0000000000000000000000000000000000000000..32d89a7c183d57e0e69039dfb2c78703d9866f7c
--- /dev/null
+++ b/doc/howto/optimization/cpu_profiling.md
@@ -0,0 +1,163 @@
+This tutorial explains how to profile and tune program performance using Python's cProfile package together with the Python library yep and google-perftools.
+
+Profiling lets developers optimize a program scientifically and methodically, and it is the foundation of performance tuning: in a real run, the true bottleneck can be far from the one the programmer imagined during development.
+
+Performance optimization usually iterates the loop "profile --> find the bottleneck --> tune the bottleneck --> profile again to confirm the effect" several times. Profiling supplies the quantitative evidence that drives each iteration.
+
+Paddle provides Python bindings: users program, train, and test neural networks in Python, and the Python interpreter calls Paddle's dynamic libraries through `pybind` and `swig`, which in turn invoke Paddle's C++ code. Profiling and tuning Paddle therefore splits into two parts:
+
+* profiling the Python code
+* profiling the mixed Python and C++ code
+
+
+## Profiling the Python code
+
+### Generating the profile
+
+The Python standard library ships a profiler, [cProfile](https://docs.python.org/2/library/profile.html). A Python profile is generated with:
+
+```bash
+python -m cProfile -o profile.out main.py
+```
+
+Here `-o` names the output file that stores the profiling result. Without it, `cProfile` prints its statistics to `stdout`, which is inconvenient for later processing (`sort`, `split`, `cut`, and so on). When only part of a program needs to be measured, `cProfile` can also be driven from Python code, as sketched below.
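+The following is a minimal sketch of profiling just one region programmatically; `train()` is a hypothetical stand-in for the code you actually want to measure, and the dumped file has the same format as the one produced by `python -m cProfile -o`:
+
+```python
+import cProfile
+import pstats
+
+
+def train():
+    # hypothetical workload; replace with the code you want to measure
+    return sum(i * i for i in range(1000000))
+
+
+profiler = cProfile.Profile()
+profiler.enable()
+train()  # only this region is profiled
+profiler.disable()
+
+profiler.dump_stats('profile.out')  # same format as `python -m cProfile -o`
+stats = pstats.Stats('profile.out')
+stats.sort_stats('time').print_stats(10)  # top 10 functions by tottime
+```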
+### Viewing the profile
+
+Once `main.py` finishes, the result file `profile.out` is available. We can inspect it with [cprofilev](https://github.com/ymichael/cprofilev), a third-party Python package that starts an HTTP service and renders the profile as a web page.
+
+Install it with `pip install cprofilev`. After installation, start the HTTP service with:
+
+```bash
+cprofilev -a 0.0.0.0 -p 3214 -f profile.out main.py
+```
+
+Here `-a` is the IP address the HTTP service binds to (`0.0.0.0` allows access from outside the machine), `-p` is the port, `-f` is the profiling result file, and `main.py` is the source file that was profiled.
+
+Open the corresponding URL to view the profile, which is formatted like this:
+
+```text
+   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
+        1    0.284    0.284   29.514   29.514 main.py:1()
+     4696    0.128    0.000   15.748    0.003 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/executor.py:20(run)
+     4696   12.040    0.003   12.040    0.003 {built-in method run}
+        1    0.144    0.144    6.534    6.534 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/__init__.py:14()
+```
+
+The columns mean:
+
+| column | meaning |
+| --- | --- |
+| ncalls | number of calls to the function |
+| tottime | total time spent in the function itself, excluding time spent in the functions it calls |
+| percall | tottime divided by ncalls |
+| cumtime | cumulative time of the function, including time spent in the functions it calls |
+| percall | cumtime divided by ncalls |
+| filename:lineno(function) | file name, line number, and function name |
+
+
+### Finding the bottleneck
+
+`tottime` and `cumtime` are usually the key indicators when hunting for bottlenecks; together they describe how much time a function really consumes.
+
+Sorted by `tottime`, the profile looks like this:
+
+```text
+     4696   12.040    0.003   12.040    0.003 {built-in method run}
+   300005    0.874    0.000    1.681    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/dataset/mnist.py:38(reader)
+   107991    0.676    0.000    1.519    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:219(__init__)
+     4697    0.626    0.000    2.291    0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:428(sync_with_cpp)
+        1    0.618    0.618    0.618    0.618 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/__init__.py:1()
+```
+
+The most time-consuming function is the C++-side `run`, which must be tuned with the techniques of the second section, `Profiling the mixed Python and C++ code`. The `sync_with_cpp` function also has a long total time and a long per-call time, so we can click through to its details to inspect its call relationships:
+
+```text
+Called By:
+
+   Ordered by: internal time
+   List reduced from 4497 to 2 due to restriction <'sync_with_cpp'>
+
+Function                                                                                                 was called by...
+                                                                                                             ncalls  tottime  cumtime
+/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:428(sync_with_cpp) <-    4697    0.626    2.291  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:562(sync_with_cpp)
+/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:562(sync_with_cpp) <-    4696    0.019    2.316  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:487(clone)
+                                                                                                                  1    0.000    0.001  /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:534(append_backward)
+
+
+Called:
+
+   Ordered by: internal time
+   List reduced from 4497 to 2 due to restriction <'sync_with_cpp'>
+```
+
+Looking at the call relationships between hot functions, together with the corresponding lines of code, usually reveals where the problem is. After applying a fix, profile again to check whether the change actually improves performance.
+
+
+
+## Profiling the mixed Python and C++ code
+
+### Generating the profile
+
+There are many C++ profilers; common ones include `gprof`, `valgrind`, and `google-perftools`. However, profiling a dynamic library loaded from Python adds a lot of complexity compared with profiling a plain binary. Fortunately, the third-party Python library `yep` offers a convenient bridge to `google-perftools`, so we use `yep` here to profile the mixed Python and C++ code.
+
+Using `yep` requires the `google-perftools` and `yep` packages. On Ubuntu they are installed with:
+
+```bash
+apt install libgoogle-perftools-dev
+pip install yep
+```
+
+After installation, the profile is generated with:
+
+```bash
+python -m yep -v main.py
+```
+
+The resulting profile is written to `main.py.prof`.
+
+The `-v` flag additionally prints the analysis on the command line once the file is written, so the output can be sanity-checked right away. Unlike Python, C++ may be compiled without debug information, and multithreaded execution can produce a chaotic, unreadable profile. To obtain a more readable profile, take the following measures:
+
+1. Compile with `-g` to generate debug information. With cmake, set CMAKE_BUILD_TYPE to `RelWithDebInfo`.
+2. Always compile with optimization enabled. A plain `Debug` build performs very differently from `-O2` or `-O3`, so performance measured in `Debug` mode is meaningless.
+3. Start profiling with a single thread, then move to multiple threads, and only then to multiple machines; a single thread is far easier to debug. Setting the environment variable `OMP_NUM_THREADS=1` disables OpenMP parallelism.
+
+Like cProfile, `yep` can also be started and stopped from Python code to profile only a region of the program, as sketched below.
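+The following is a minimal sketch of region-scoped profiling, assuming `yep`'s `start`/`stop` API; `train()` is again a hypothetical stand-in for the code under measurement:
+
+```python
+import yep
+
+
+def train():
+    # hypothetical workload; replace with the code you want to measure
+    return sum(i * i for i in range(1000000))
+
+
+yep.start('main.py.prof')  # begin google-perftools CPU sampling
+train()                    # only this region is recorded
+yep.stop()                 # flush the profile to main.py.prof
+```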
+### Viewing the profile
+
+After the profiling run, the result file can be displayed with [pprof](https://github.com/google/pprof). Note that we use the `pprof` rewritten in `Go`, because it offers a web interface and much better visualization.
+
+`pprof` is installed like any other `Go` program:
+
+```bash
+go get github.com/google/pprof
+```
+
+We can then start an HTTP service with:
+
+```bash
+pprof -http=0.0.0.0:3213 `which python`  ./main.py.prof
+```
+
+Here `-http` starts the HTTP service; `` `which python` `` expands to the full path of the current Python binary, telling `pprof` which executable was profiled; and `./main.py.prof` supplies the profiling result.
+
+Open the corresponding URL to view the profile, as shown below:
+
+![result](./pprof_1.png)
+
+
+### Finding the bottleneck
+
+As with pure Python code, finding bottlenecks in mixed Python and C++ code comes down to `tottime` and `cumtime`. The call graph that `pprof` renders also helps expose performance problems.
+
+For example, in the figure below,
+
+![kernel_perf](./pprof_2.png)
+
+within one training run the multiplication and multiplication-gradient computations each take roughly 2%-4% of the compute time, while `MomentumOp` takes about 17%. Clearly, `MomentumOp` has a performance problem.
+
+`pprof` marks the performance-critical path in red. Checking the critical path first and the remaining parts afterwards makes the optimization work more orderly.
+
+## Summary
+
+This covers both profiling approaches. With them, Paddle developers and users can locate and resolve performance problems methodically and scientifically.
diff --git a/doc/howto/optimization/pprof_1.png b/doc/howto/optimization/pprof_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..8e9edbf377672d0ef40f2fc7bd39e746923550cb
Binary files /dev/null and b/doc/howto/optimization/pprof_1.png differ
diff --git a/doc/howto/optimization/pprof_2.png b/doc/howto/optimization/pprof_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..172ba20399ba974d27f4c072425277b69b02520b
Binary files /dev/null and b/doc/howto/optimization/pprof_2.png differ
diff --git a/paddle/gserver/layers/ROIPoolLayer.cpp b/paddle/gserver/layers/ROIPoolLayer.cpp
index 02402894d3354a6af221948a3360ef830881bf39..2c8256b91c97b513ce7237b8174c522430094926 100644
--- a/paddle/gserver/layers/ROIPoolLayer.cpp
+++ b/paddle/gserver/layers/ROIPoolLayer.cpp
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 
 #include "ROIPoolLayer.h"
+#include <cfloat>
 
 namespace paddle {
 
@@ -126,10 +127,8 @@ void ROIPoolLayer::forward(PassType passType) {
           bool isEmpty = (hend <= hstart) || (wend <= wstart);
 
           size_t poolIndex = ph * pooledWidth_ + pw;
-          if (isEmpty) {
-            outputData[poolIndex] = 0;
-            argmaxData[poolIndex] = -1;
-          }
+          outputData[poolIndex] = isEmpty ? 0 : -FLT_MAX;
+          argmaxData[poolIndex] = -1;
 
           for (size_t h = hstart; h < hend; ++h) {
             for (size_t w = wstart; w < wend; ++w) {
diff --git a/paddle/operators/CMakeLists.txt b/paddle/operators/CMakeLists.txt
index 7ab09b6c6541b5bb1ec7503da85bacfda3a0112f..05d4ea26067e04d2ebcd1a434451a26f6e82f41d 100644
--- a/paddle/operators/CMakeLists.txt
+++ b/paddle/operators/CMakeLists.txt
@@ -73,6 +73,13 @@ function(op_library TARGET)
     file(APPEND ${pybind_file} "USE_OP(conv2d);\n")
   endif()
 
+  # conv_cudnn_op contains several operators
+  if ("${TARGET}" STREQUAL "conv_cudnn_op")
+    set(pybind_flag 1)
+    # It's enough to just add one operator to pybind
+    file(APPEND ${pybind_file} "USE_OP(conv2d_cudnn);\n")
+  endif()
+
   # pool_op contains several operators
   if ("${TARGET}" STREQUAL "pool_op")
     set(pybind_flag 1)
diff --git a/paddle/operators/conv_cudnn_op.cc b/paddle/operators/conv_cudnn_op.cc
index c03dc3e4fb07ac6ecde42be93a1138d91778edf4..0dd8c13b2ad6ff206066ccb98a4c009e4c3b4fd0 100644
--- a/paddle/operators/conv_cudnn_op.cc
+++ b/paddle/operators/conv_cudnn_op.cc
@@ -17,10 +17,10 @@ namespace paddle {
 namespace operators {
 
-class CudnnConvOpMaker : public Conv2DOpMaker {
+class CudnnConv2DOpMaker : public Conv2DOpMaker {
  public:
-  CudnnConvOpMaker(framework::OpProto* proto,
-                   framework::OpAttrChecker* op_checker)
+  CudnnConv2DOpMaker(framework::OpProto* proto,
+                     framework::OpAttrChecker* op_checker)
       : Conv2DOpMaker(proto, op_checker) {
     AddAttr<int>("workspace_size_MB",
                  "workspace size for cudnn, in MB, "
                  "workspace is a section of GPU memory which will be "
                  "allocated/freed each time the operator runs, larger "
                  "workspace size can increase performance but also requires "
                  "better hardware. This size should be chosen carefully.")
         .SetDefault(4096);
   }
 };
 
+class CudnnConv3DOpMaker : public Conv3DOpMaker {
+ public:
+  CudnnConv3DOpMaker(framework::OpProto* proto,
+                     framework::OpAttrChecker* op_checker)
+      : Conv3DOpMaker(proto, op_checker) {
+    AddAttr<int>("workspace_size_MB",
+                 "workspace size for cudnn, in MB, "
+                 "workspace is a section of GPU memory which will be "
+                 "allocated/freed each time the operator runs, larger "
+                 "workspace size can increase performance but also requires "
+                 "better hardware. 
This size should be chosen carefully.") + .SetDefault(4096); + } +}; + } // namespace operators } // namespace paddle namespace ops = paddle::operators; -REGISTER_OP(conv_cudnn, ops::ConvOp, ops::CudnnConvOpMaker, conv_cudnn_grad, - ops::ConvOpGrad); +REGISTER_OP(conv2d_cudnn, ops::ConvOp, ops::CudnnConv2DOpMaker, + conv2d_cudnn_grad, ops::ConvOpGrad); + +REGISTER_OP(conv3d_cudnn, ops::ConvOp, ops::CudnnConv3DOpMaker, + conv3d_cudnn_grad, ops::ConvOpGrad); + +REGISTER_OP_CPU_KERNEL(conv2d_cudnn, + ops::GemmConvKernel, + ops::GemmConvKernel); +REGISTER_OP_CPU_KERNEL( + conv2d_cudnn_grad, + ops::GemmConvGradKernel, + ops::GemmConvGradKernel); -REGISTER_OP_CPU_KERNEL(conv_cudnn, +REGISTER_OP_CPU_KERNEL(conv3d_cudnn, ops::GemmConvKernel, ops::GemmConvKernel); REGISTER_OP_CPU_KERNEL( - conv_cudnn_grad, ops::GemmConvGradKernel, + conv3d_cudnn_grad, + ops::GemmConvGradKernel, ops::GemmConvGradKernel); diff --git a/paddle/operators/conv_cudnn_op.cu.cc b/paddle/operators/conv_cudnn_op.cu.cc index 5eaf6b33704eb371fff4b949c6cc32a7a5dbc812..a9763d424801cfced5fe4c4718a335a24b81cfdc 100644 --- a/paddle/operators/conv_cudnn_op.cu.cc +++ b/paddle/operators/conv_cudnn_op.cu.cc @@ -56,6 +56,21 @@ class CudnnConvOpKernel : public framework::OpKernel { ScopedFilterDescriptor filter_desc; ScopedConvolutionDescriptor conv_desc; DataLayout layout = DataLayout::kNCHW; + if (input->dims().size() == 5) { + layout = DataLayout::kNCDHW; + } + + cudnnConvolutionDescriptor_t cudnn_conv_desc = + conv_desc.descriptor(paddings, strides, dilations); + +#if CUDNN_VERSION_MIN(7, 0, 0) + // cudnn 7 can support groups, no need to do it mannually + // FIXME(typhoonzero): find a better way to disable groups + // rather than setting it to 1. + PADDLE_ENFORCE(platform::dynload::cudnnSetConvolutionGroupCount( + cudnn_conv_desc, groups)); + groups = 1; +#endif cudnnTensorDescriptor_t cudnn_input_desc = input_desc.descriptor( layout, framework::vectorize2int(input->dims()), groups); @@ -63,19 +78,34 @@ class CudnnConvOpKernel : public framework::OpKernel { layout, framework::vectorize2int(output->dims()), groups); cudnnFilterDescriptor_t cudnn_filter_desc = filter_desc.descriptor( layout, framework::vectorize2int(filter->dims()), groups); - cudnnConvolutionDescriptor_t cudnn_conv_desc = - conv_desc.descriptor(paddings, strides, dilations); int input_channels = input->dims()[1]; - int input_height = input->dims()[2]; - int input_width = input->dims()[3]; - int output_channels = output->dims()[1]; - int output_height = output->dims()[2]; - int output_width = output->dims()[3]; + int input_height, input_width, input_depth; + if (input->dims().size() == 5) { + input_depth = input->dims()[2]; + input_height = input->dims()[3]; + input_width = input->dims()[4]; + } else { // dim size is enforced in InferShape + input_depth = 1; + input_height = input->dims()[2]; + input_width = input->dims()[3]; + } + int output_channels = filter->dims()[0]; + int output_height, output_width, output_depth; + if (output->dims().size() == 5) { + output_depth = output->dims()[2]; + output_height = output->dims()[3]; + output_width = output->dims()[4]; + } else { + output_depth = 1; + output_height = output->dims()[2]; + output_width = output->dims()[3]; + } - int group_offset_in = input_channels / groups * input_height * input_width; + int group_offset_in = + input_channels / groups * input_height * input_width * input_depth; int group_offset_out = - output_channels / groups * output_height * output_width; + output_channels / groups * output_height * 
output_width * output_depth; int group_offset_filter = filter->numel() / groups; // ------------------- cudnn conv workspace --------------------- void* cudnn_workspace = nullptr; @@ -138,12 +168,26 @@ class CudnnConvGradOpKernel : public framework::OpKernel { // ------------------- cudnn descriptors --------------------- ScopedTensorDescriptor input_desc; ScopedTensorDescriptor output_grad_desc; - ScopedTensorDescriptor input_grad_desc; ScopedFilterDescriptor filter_desc; ScopedFilterDescriptor filter_grad_desc; ScopedConvolutionDescriptor conv_desc; DataLayout layout = DataLayout::kNCHW; + if (input->dims().size() == 5) { + layout = DataLayout::kNCDHW; + } + + cudnnConvolutionDescriptor_t cudnn_conv_desc = + conv_desc.descriptor(paddings, strides, dilations); + +#if CUDNN_VERSION_MIN(7, 0, 0) + // cudnn 7 can support groups, no need to do it mannually + // FIXME(typhoonzero): find a better way to disable groups + // rather than setting it to 1. + PADDLE_ENFORCE(platform::dynload::cudnnSetConvolutionGroupCount( + cudnn_conv_desc, groups)); + groups = 1; +#endif cudnnTensorDescriptor_t cudnn_input_desc = input_desc.descriptor( layout, framework::vectorize2int(input->dims()), groups); @@ -152,22 +196,35 @@ class CudnnConvGradOpKernel : public framework::OpKernel { layout, framework::vectorize2int(output_grad->dims()), groups); cudnnFilterDescriptor_t cudnn_filter_desc = filter_desc.descriptor( layout, framework::vectorize2int(filter->dims()), groups); - cudnnTensorDescriptor_t cudnn_input_grad_desc = nullptr; - cudnnFilterDescriptor_t cudnn_filter_grad_desc = nullptr; - - cudnnConvolutionDescriptor_t cudnn_conv_desc = - conv_desc.descriptor(paddings, strides, dilations); int input_channels = input->dims()[1]; - int input_height = input->dims()[2]; - int input_width = input->dims()[3]; + int input_height, input_width, input_depth; + if (input->dims().size() == 5) { + input_depth = input->dims()[2]; + input_height = input->dims()[3]; + input_width = input->dims()[4]; + } else { // dim size is enforced in InferShape + input_depth = 1; + input_height = input->dims()[2]; + input_width = input->dims()[3]; + } + int output_grad_channels = filter->dims()[0]; - int output_grad_height = output_grad->dims()[2]; - int output_grad_width = output_grad->dims()[3]; + int output_grad_height, output_grad_width, output_grad_depth; + if (input->dims().size() == 5) { + output_grad_depth = output_grad->dims()[2]; + output_grad_height = output_grad->dims()[3]; + output_grad_width = output_grad->dims()[4]; + } else { + output_grad_depth = 1; + output_grad_height = output_grad->dims()[2]; + output_grad_width = output_grad->dims()[3]; + } - int group_offset_in = input_channels / groups * input_height * input_width; - int group_offset_out = - output_grad_channels / groups * output_grad_height * output_grad_width; + int group_offset_in = + input_channels / groups * input_height * input_width * input_depth; + int group_offset_out = output_grad_channels / groups * output_grad_height * + output_grad_width * output_grad_depth; int group_offset_filter = filter->numel() / groups; // ------------------- cudnn backward algorithm --------------------- cudnnConvolutionBwdDataAlgo_t data_algo; @@ -180,8 +237,6 @@ class CudnnConvGradOpKernel : public framework::OpKernel { auto handle = ctx.cuda_device_context().cudnn_handle(); if (input_grad) { - cudnn_input_grad_desc = input_grad_desc.descriptor( - layout, framework::vectorize2int(input_grad->dims()), groups); PADDLE_ENFORCE( 
platform::dynload::cudnnGetConvolutionBackwardDataAlgorithm( handle, cudnn_filter_desc, @@ -190,19 +245,17 @@ class CudnnConvGradOpKernel : public framework::OpKernel { cudnn_output_grad_desc, cudnn_conv_desc, // dxDesc: Handle to the previously initialized output tensor // descriptor. - cudnn_input_grad_desc, + cudnn_input_desc, CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT, workspace_size_limit, &data_algo)); PADDLE_ENFORCE( platform::dynload::cudnnGetConvolutionBackwardDataWorkspaceSize( handle, cudnn_filter_desc, cudnn_output_grad_desc, - cudnn_conv_desc, cudnn_input_grad_desc, data_algo, &tmp_size)); + cudnn_conv_desc, cudnn_input_desc, data_algo, &tmp_size)); workspace_size_in_bytes = std::max(workspace_size_in_bytes, tmp_size); } if (filter_grad) { - cudnn_filter_grad_desc = filter_grad_desc.descriptor( - layout, framework::vectorize2int(filter_grad->dims()), groups); PADDLE_ENFORCE( platform::dynload::cudnnGetConvolutionBackwardFilterAlgorithm( handle, cudnn_input_desc, cudnn_output_grad_desc, cudnn_conv_desc, @@ -222,7 +275,6 @@ class CudnnConvGradOpKernel : public framework::OpKernel { platform::GPUPlace gpu = boost::get(ctx.GetPlace()); cudnn_workspace = paddle::memory::Alloc(gpu, workspace_size_in_bytes); // ------------------- cudnn conv backward data --------------------- - // FIXME(typhoonzero): template type T may not be the same as cudnn call. T alpha = 1.0f, beta = 0.0f; if (input_grad) { T* input_grad_data = input_grad->mutable_data(ctx.GetPlace()); @@ -233,21 +285,20 @@ class CudnnConvGradOpKernel : public framework::OpKernel { handle, &alpha, cudnn_filter_desc, filter_data + i * group_offset_filter, cudnn_output_grad_desc, output_grad_data + i * group_offset_out, cudnn_conv_desc, data_algo, - cudnn_workspace, workspace_size_in_bytes, &beta, - cudnn_input_grad_desc, input_grad_data + i * group_offset_in)); + cudnn_workspace, workspace_size_in_bytes, &beta, cudnn_input_desc, + input_grad_data + i * group_offset_in)); } } // ------------------- cudnn conv backward filter --------------------- if (filter_grad) { T* filter_grad_data = filter_grad->mutable_data(ctx.GetPlace()); // Because beta is zero, it is unnecessary to reset filter_grad. 
- for (int i = 0; i < groups; i++) { PADDLE_ENFORCE(platform::dynload::cudnnConvolutionBackwardFilter( handle, &alpha, cudnn_input_desc, input_data + i * group_offset_in, cudnn_output_grad_desc, output_grad_data + i * group_offset_out, cudnn_conv_desc, filter_algo, cudnn_workspace, - workspace_size_in_bytes, &beta, cudnn_filter_grad_desc, + workspace_size_in_bytes, &beta, cudnn_filter_desc, filter_grad_data + i * group_offset_filter)); } } @@ -259,8 +310,16 @@ class CudnnConvGradOpKernel : public framework::OpKernel { } // namespace operators } // namespace paddle -REGISTER_OP_GPU_KERNEL(conv_cudnn, paddle::operators::CudnnConvOpKernel, +REGISTER_OP_GPU_KERNEL(conv2d_cudnn, + paddle::operators::CudnnConvOpKernel, + paddle::operators::CudnnConvOpKernel); +REGISTER_OP_GPU_KERNEL(conv2d_cudnn_grad, + paddle::operators::CudnnConvGradOpKernel, + paddle::operators::CudnnConvGradOpKernel); + +REGISTER_OP_GPU_KERNEL(conv3d_cudnn, + paddle::operators::CudnnConvOpKernel, paddle::operators::CudnnConvOpKernel); -REGISTER_OP_GPU_KERNEL(conv_cudnn_grad, +REGISTER_OP_GPU_KERNEL(conv3d_cudnn_grad, paddle::operators::CudnnConvGradOpKernel, paddle::operators::CudnnConvGradOpKernel); diff --git a/paddle/platform/cudnn_helper.h b/paddle/platform/cudnn_helper.h index c5d8a6066ef3becb601344590f977a38c2af0a63..80a4c9bb4bbcd03cf849d86118db4e502382f031 100644 --- a/paddle/platform/cudnn_helper.h +++ b/paddle/platform/cudnn_helper.h @@ -116,7 +116,7 @@ inline cudnnTensorFormat_t GetCudnnTensorFormat( case DataLayout::kNCHW: return CUDNN_TENSOR_NCHW; case DataLayout::kNCDHW: - return CUDNN_TENSOR_NCHW; // TODO(chengduoZH) : add CUDNN_TENSOR_NCDHW + return CUDNN_TENSOR_NCHW; // NOTE: cudnn treat NdTensor as the same default: PADDLE_THROW("Unknown cudnn equivalent for order"); } @@ -143,7 +143,7 @@ class ScopedTensorDescriptor { strides[i] = dims[i + 1] * strides[i + 1]; } // Update tensor descriptor dims setting if groups > 1 - // FIXME(typhoonzero): Assume using NCHW or NCDHW order + // NOTE: Assume using NCHW or NCDHW order std::vector dims_with_group(dims.begin(), dims.end()); // copy if (groups > 1) { dims_with_group[1] = dims_with_group[1] / groups; @@ -186,7 +186,6 @@ class ScopedFilterDescriptor { // width of the filter. std::vector kernel_with_group(kernel.begin(), kernel.end()); if (groups > 1) { - // M /= groups kernel_with_group[0] /= groups; // NOTE: input filter(C) of the filter is already asserted to be C/groups. 
} diff --git a/python/paddle/v2/fluid/executor.py b/python/paddle/v2/fluid/executor.py index ed1c2c06daa7ede97e138049a1f7044d071c31e8..bd98d6b154cca720148aec3ec8b025a81ad545f4 100644 --- a/python/paddle/v2/fluid/executor.py +++ b/python/paddle/v2/fluid/executor.py @@ -1,9 +1,38 @@ +import numpy as np import paddle.v2.fluid.core as core from paddle.v2.fluid.framework import Block, Program, g_main_program g_scope = core.Scope() +def as_numpy(tensor): + if isinstance(tensor, list): + return [as_numpy(t) for t in tensor] + assert isinstance(tensor, core.LoDTensor) + lod = tensor.lod() + tensor_data = np.array(tensor) + if len(lod) == 0: + ans = tensor_data + else: + raise RuntimeError("LoD Calculate lacks unit tests and buggy") + # elif len(lod) == 1: + # ans = [] + # idx = 0 + # while idx < len(lod) - 1: + # ans.append(tensor_data[lod[idx]:lod[idx + 1]]) + # idx += 1 + # else: + # for l in reversed(lod): + # ans = [] + # idx = 0 + # while idx < len(l) - 1: + # ans.append(tensor_data[l[idx]:l[idx + 1]]) + # idx += 1 + # tensor_data = ans + # ans = tensor_data + return ans + + class Executor(object): def __init__(self, places): if not isinstance(places, list) and not isinstance(places, tuple): @@ -16,6 +45,47 @@ class Executor(object): act_places.append(p) self.executor = core.Executor(act_places) + self.places = places + + def aslodtensor(self, data): + def accumulate(data): + if not isinstance(data, list): + return 1 + return sum([accumulate(sub) for sub in data]) + + def parselod(data): + seq_lens = [accumulate(seq) for seq in data] + cur_len = 0 + lod = [cur_len] + for l in seq_lens: + cur_len += l + lod.append(cur_len) + return lod + + assert len(self.places) != 0 + if not isinstance(data, list): + # pure tensor case + tensor = core.LoDTensor() + tensor.set(data, self.places[0]) + return tensor + else: + raise RuntimeError("Current implementation lacks unittests") + # lodtensor case + lod = [] + if not isinstance(data[0], list): + lod.append(parselod(data)) + flattened_data = np.concatenate(data, axis=0).astype("int64") + else: + while isinstance(data[0], list): + lod.append(parselod(seq)) + flattened_data = [item for seq in data for item in seq] + data = flattened_data + flattened_data = np.concatenate(data, axis=0).astype("int64") + flattened_data = flattened_data.reshape([len(flattened_data), 1]) + tensor = core.LoDTensor() + tensor.set(flattened_data, self.places[0]) + tensor.set_lod(lod) + return tensor def run(self, program=None, @@ -23,7 +93,8 @@ class Executor(object): fetch_list=None, feed_var_name='feed', fetch_var_name='fetch', - scope=None): + scope=None, + return_numpy=True): if feed is None: feed = {} if fetch_list is None: @@ -52,7 +123,10 @@ class Executor(object): inputs={'X': [feed_var]}, outputs={'Out': [out]}, attrs={'col': i}) - core.set_feed_variable(scope, feed[name], feed_var.name, i) + cur_feed = feed[name] + if not isinstance(cur_feed, core.LoDTensor): + cur_feed = self.aslodtensor(cur_feed) + core.set_feed_variable(scope, cur_feed, feed_var.name, i) fetch_var = global_block.create_var( name=fetch_var_name, @@ -66,7 +140,11 @@ class Executor(object): attrs={'col': i}) self.executor.run(program.desc, scope, 0, True) - return [ + outs = [ core.get_fetch_variable(scope, fetch_var_name, i) for i in xrange(len(fetch_list)) ] + + if return_numpy: + outs = as_numpy(outs) + return outs diff --git a/python/paddle/v2/fluid/tests/.gitignore b/python/paddle/v2/fluid/tests/.gitignore index fcc52c04886865d96c1bfe1597a9dc99c181de1f..a648f2b387c2c7b9422eea6749e43e7b8871f60f 
100644 --- a/python/paddle/v2/fluid/tests/.gitignore +++ b/python/paddle/v2/fluid/tests/.gitignore @@ -1,2 +1,3 @@ image/ fit_a_line.model/ +tmp diff --git a/python/paddle/v2/fluid/tests/op_test.py b/python/paddle/v2/fluid/tests/op_test.py index 51023bd19a8326152335eabc9e96600427527f26..e83c4a0622013cbfebdf39434ef252412697acb1 100644 --- a/python/paddle/v2/fluid/tests/op_test.py +++ b/python/paddle/v2/fluid/tests/op_test.py @@ -261,7 +261,10 @@ class OpTest(unittest.TestCase): feed_map = self.feed_var(inputs, place) exe = Executor(place) - outs = exe.run(program, feed=feed_map, fetch_list=fetch_list) + outs = exe.run(program, + feed=feed_map, + fetch_list=fetch_list, + return_numpy=False) for out_name, out_dup in Operator.get_op_outputs(self.op_type): if out_name not in self.outputs: @@ -500,5 +503,6 @@ class OpTest(unittest.TestCase): fetch_list = [g for p, g in param_grad_list] executor = Executor(place) - result = executor.run(prog, feed_dict, fetch_list) - return map(np.array, result) + return map( + np.array, + executor.run(prog, feed_dict, fetch_list, return_numpy=False)) diff --git a/python/paddle/v2/fluid/tests/test_array_read_write_op.py b/python/paddle/v2/fluid/tests/test_array_read_write_op.py index e019a4e15f0e25deaedf30911b44e576c8f89013..b7790b01062d480cbd6c9e1a626d318385b4f61e 100644 --- a/python/paddle/v2/fluid/tests/test_array_read_write_op.py +++ b/python/paddle/v2/fluid/tests/test_array_read_write_op.py @@ -52,15 +52,13 @@ class TestArrayReadWrite(unittest.TestCase): exe = Executor(cpu) - tensor = core.LoDTensor() - tensor.set(numpy.random.random(size=(100, 100)).astype('float32'), cpu) - - outs = map(numpy.array, - exe.run(feed={'x0': tensor, - 'x1': tensor, - 'x2': tensor}, - fetch_list=[a_sum, x_sum], - scope=scope)) + tensor = numpy.random.random(size=(100, 100)).astype('float32') + + outs = exe.run(feed={'x0': tensor, + 'x1': tensor, + 'x2': tensor}, + fetch_list=[a_sum, x_sum], + scope=scope) self.assertEqual(outs[0], outs[1]) total_sum = layers.sums(input=[a_sum, x_sum]) @@ -72,12 +70,11 @@ class TestArrayReadWrite(unittest.TestCase): [each_x.name + "@GRAD" for each_x in x]) g_out = [ item.sum() - for item in map( - numpy.array, - exe.run(feed={'x0': tensor, - 'x1': tensor, - 'x2': tensor}, - fetch_list=g_vars)) + for item in exe.run( + feed={'x0': tensor, + 'x1': tensor, + 'x2': tensor}, + fetch_list=g_vars) ] g_out_sum = numpy.array(g_out).sum() diff --git a/python/paddle/v2/fluid/tests/test_conditional_block.py b/python/paddle/v2/fluid/tests/test_conditional_block.py index 2a30fd107968ce0fa188bda44e731ad760dce1f5..d953ee7ddc37d150d87cbd680379410a4d16f6b1 100644 --- a/python/paddle/v2/fluid/tests/test_conditional_block.py +++ b/python/paddle/v2/fluid/tests/test_conditional_block.py @@ -21,18 +21,15 @@ class ConditionalBlock(unittest.TestCase): exe = Executor(cpu) exe.run(g_startup_program) - x = core.LoDTensor() - x.set(numpy.random.random(size=(10, 1)).astype('float32'), cpu) + x = numpy.random.random(size=(10, 1)).astype('float32') - outs = map(numpy.array, exe.run(feed={'X': x}, fetch_list=[out]))[0] + outs = exe.run(feed={'X': x}, fetch_list=[out])[0] print outs loss = layers.mean(x=out) append_backward_ops(loss=loss) - outs = map(numpy.array, - exe.run(feed={'X': x}, - fetch_list=[ - g_main_program.block(0).var(data.name + "@GRAD") - ]))[0] + outs = exe.run( + feed={'X': x}, + fetch_list=[g_main_program.block(0).var(data.name + "@GRAD")])[0] print outs diff --git a/python/paddle/v2/fluid/tests/test_conv2d_op.py 
b/python/paddle/v2/fluid/tests/test_conv2d_op.py index 2240dc73cdd31f320fed174dd811e93c6640137f..e82e3ab0c9c0bc75a13a8948fda925bc4f0b6512 100644 --- a/python/paddle/v2/fluid/tests/test_conv2d_op.py +++ b/python/paddle/v2/fluid/tests/test_conv2d_op.py @@ -16,8 +16,8 @@ def conv2d_forward_naive(input, filter, group, conv_param): out_w = 1 + (in_w + 2 * pad[1] - (dilation[1] * (f_w - 1) + 1)) / stride[1] out = np.zeros((in_n, out_c, out_h, out_w)) - d_bolck_w = (dilation[0] * (f_h - 1) + 1) - d_bolck_h = (dilation[1] * (f_w - 1) + 1) + d_bolck_h = (dilation[0] * (f_h - 1) + 1) + d_bolck_w = (dilation[1] * (f_w - 1) + 1) input_pad = np.pad(input, ((0, ), (0, ), (pad[0], ), (pad[1], )), mode='constant', @@ -167,27 +167,27 @@ class TestWithDilation(TestConv2dOp): #----------------Conv2dCudnn---------------- class TestCudnn(TestConv2dOp): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" class TestCudnnWithPad(TestWithPad): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" class TestCudnnWithStride(TestWithStride): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" class TestCudnnWithGroup(TestWithGroup): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" class TestCudnnWith1x1(TestWith1x1): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" # cudnn v5 does not support dilation conv. diff --git a/python/paddle/v2/fluid/tests/test_conv3d_op.py b/python/paddle/v2/fluid/tests/test_conv3d_op.py index 934ea46437d67b78309a86a2779e0c6577399136..8593dff20b5c283d5862206dfb0c0d2501039d07 100644 --- a/python/paddle/v2/fluid/tests/test_conv3d_op.py +++ b/python/paddle/v2/fluid/tests/test_conv3d_op.py @@ -169,5 +169,31 @@ class TestWithDilation(TestConv3dOp): self.groups = 3 +class TestCudnn(TestConv3dOp): + def init_op_type(self): + self.op_type = "conv3d_cudnn" + + +class TestWithGroup1Cudnn(TestWithGroup1): + def init_op_type(self): + self.op_type = "conv3d_cudnn" + + +class TestWithGroup2Cudnn(TestWithGroup2): + def init_op_type(self): + self.op_type = "conv3d_cudnn" + + +class TestWith1x1Cudnn(TestWith1x1): + def init_op_type(self): + self.op_type = "conv3d_cudnn" + + +# FIXME(typhoonzero): find a way to determine if +# using cudnn > 6 in python +# class TestWithDilationCudnn(TestWithDilation): +# def init_op_type(self): +# self.op_type = "conv3d_cudnn" + if __name__ == '__main__': unittest.main() diff --git a/python/paddle/v2/fluid/tests/test_executor_and_mul.py b/python/paddle/v2/fluid/tests/test_executor_and_mul.py index da64739de5eb4eca8db8ac8370276c41692a7242..558273e30dff7fb74f78751f4fe569f79a453d0d 100644 --- a/python/paddle/v2/fluid/tests/test_executor_and_mul.py +++ b/python/paddle/v2/fluid/tests/test_executor_and_mul.py @@ -1,5 +1,5 @@ import unittest -from paddle.v2.fluid.layers import mul, data +from paddle.v2.fluid.layers import mul, data, sequence_pool import paddle.v2.fluid.core as core from paddle.v2.fluid.executor import Executor from paddle.v2.fluid.framework import g_main_program @@ -17,17 +17,13 @@ class TestExecutor(unittest.TestCase): out = mul(x=a, y=b) place = core.CPUPlace() a_np = numpy.random.random((100, 784)).astype('float32') - tensor_a = core.LoDTensor() - tensor_a.set(a_np, place) b_np = numpy.random.random((784, 100)).astype('float32') - tensor_b = core.LoDTensor() - tensor_b.set(b_np, place) exe = Executor(place) outs = exe.run(g_main_program, - feed={'a': tensor_a, - 'b': tensor_b}, + feed={'a': 
a_np, + 'b': b_np}, fetch_list=[out]) - out = numpy.array(outs[0]) + out = outs[0] self.assertEqual((100, 100), out.shape) self.assertTrue(numpy.allclose(out, numpy.dot(a_np, b_np))) diff --git a/python/paddle/v2/fluid/tests/test_inference_model_io.py b/python/paddle/v2/fluid/tests/test_inference_model_io.py index 74f1ce23262bbc969f9544885a7390534c76cdf6..60aed62ead83dedbeb9438c431ec292558d88ce5 100644 --- a/python/paddle/v2/fluid/tests/test_inference_model_io.py +++ b/python/paddle/v2/fluid/tests/test_inference_model_io.py @@ -1,13 +1,13 @@ -import paddle.v2 as paddle -import paddle.v2.fluid.layers as layers +import unittest + +import numpy as np import paddle.v2.fluid.core as core -import paddle.v2.fluid.optimizer as optimizer +import paddle.v2.fluid.executor as executor +import paddle.v2.fluid.layers as layers +import paddle.v2.fluid.optimizer as optimizer from paddle.v2.fluid.framework import Program from paddle.v2.fluid.io import save_inference_model, load_inference_model -import paddle.v2.fluid.executor as executor -import unittest -import numpy as np class TestBook(unittest.TestCase): @@ -44,7 +44,7 @@ class TestBook(unittest.TestCase): x=cost, main_program=program, startup_program=init_program) sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001) - opts = sgd_optimizer.minimize(avg_cost, init_program) + sgd_optimizer.minimize(avg_cost, init_program) place = core.CPUPlace() exe = executor.Executor(place) @@ -52,25 +52,20 @@ class TestBook(unittest.TestCase): exe.run(init_program, feed={}, fetch_list=[]) for i in xrange(100): - x_data = np.array( + tensor_x = np.array( [[1, 1], [1, 2], [3, 4], [5, 2]]).astype("float32") - y_data = np.array([[-2], [-3], [-7], [-7]]).astype("float32") + tensor_y = np.array([[-2], [-3], [-7], [-7]]).astype("float32") - tensor_x = core.LoDTensor() - tensor_x.set(x_data, place) - tensor_y = core.LoDTensor() - tensor_y.set(y_data, place) exe.run(program, feed={'x': tensor_x, 'y': tensor_y}, fetch_list=[avg_cost]) save_inference_model(MODEL_DIR, ["x", "y"], [avg_cost], exe, program) - outs = exe.run(program, - feed={'x': tensor_x, - 'y': tensor_y}, - fetch_list=[avg_cost]) - expected = np.array(outs[0]) + expected = exe.run(program, + feed={'x': tensor_x, + 'y': tensor_y}, + fetch_list=[avg_cost])[0] reload(executor) # reload to build a new scope exe = executor.Executor(place) @@ -83,7 +78,7 @@ class TestBook(unittest.TestCase): feed={feed_var_names[0]: tensor_x, feed_var_names[1]: tensor_y}, fetch_list=fetch_vars) - actual = np.array(outs[0]) + actual = outs[0] self.assertEqual(feed_var_names, ["x", "y"]) self.assertEqual(len(fetch_vars), 1) diff --git a/python/paddle/v2/fluid/tests/test_lod_array_length_op.py b/python/paddle/v2/fluid/tests/test_lod_array_length_op.py index a01ae83772185df218b8c453557dc0cac719673b..8a4be545eda841dbda33b7c8cae9f91a4199f2f8 100644 --- a/python/paddle/v2/fluid/tests/test_lod_array_length_op.py +++ b/python/paddle/v2/fluid/tests/test_lod_array_length_op.py @@ -13,7 +13,7 @@ class TestLoDArrayLength(unittest.TestCase): arr_len = layers.array_length(arr) cpu = core.CPUPlace() exe = Executor(cpu) - result = numpy.array(exe.run(fetch_list=[arr_len])[0]) + result = exe.run(fetch_list=[arr_len])[0] self.assertEqual(11, result[0]) diff --git a/python/paddle/v2/fluid/tests/test_lod_tensor_array_ops.py b/python/paddle/v2/fluid/tests/test_lod_tensor_array_ops.py index 16e64b8cd52d72a3bbc84e43d772b843dad0129a..032922a08a2960b48c308ad642b3228b766d725a 100644 --- a/python/paddle/v2/fluid/tests/test_lod_tensor_array_ops.py +++ 
b/python/paddle/v2/fluid/tests/test_lod_tensor_array_ops.py @@ -151,10 +151,11 @@ class TestCPULoDTensorArrayOpGrad(unittest.TestCase): exe = Executor(place) g_out = [ - item.sum() - for item in map( - numpy.array, - exe.run(program, feed={'x': tensor}, fetch_list=[g_vars])) + numpy.array(item).sum() + for item in exe.run(program, + feed={'x': tensor}, + fetch_list=[g_vars], + return_numpy=False) ] g_out_sum = numpy.array(g_out).sum() diff --git a/python/paddle/v2/fluid/tests/test_mnist_if_else_op.py b/python/paddle/v2/fluid/tests/test_mnist_if_else_op.py index e76357a5be07d79eafee4c3a27911efe8a3eaef4..50fcc4a72ddbd6d7a3d3b73434c6ac8de5a006e2 100644 --- a/python/paddle/v2/fluid/tests/test_mnist_if_else_op.py +++ b/python/paddle/v2/fluid/tests/test_mnist_if_else_op.py @@ -65,17 +65,10 @@ class TestMNISTIfElseOp(unittest.TestCase): y_data = np.array(map(lambda x: x[1], data)).astype("int64") y_data = np.expand_dims(y_data, axis=1) - tensor_x = core.LoDTensor() - tensor_x.set(x_data, place) - - tensor_y = core.LoDTensor() - tensor_y.set(y_data, place) - - outs = map(np.array, - exe.run(kwargs['main_program'], - feed={'x': tensor_x, - 'y': tensor_y}, - fetch_list=[avg_loss])) + outs = exe.run(kwargs['main_program'], + feed={'x': x_data, + 'y': y_data}, + fetch_list=[avg_loss]) print outs[0] if outs[0] < 1.0: return @@ -129,19 +122,12 @@ class TestMNISTIfElseOp(unittest.TestCase): for data in train_reader(): x_data = np.array(map(lambda x: x[0], data)).astype("float32") y_data = np.array(map(lambda x: x[1], data)).astype("int64") - y_data = np.expand_dims(y_data, axis=1) - - tensor_x = core.LoDTensor() - tensor_x.set(x_data, place) - - tensor_y = core.LoDTensor() - tensor_y.set(y_data, place) + y_data = y_data.reshape((y_data.shape[0], 1)) - outs = map(np.array, - exe.run(kwargs['main_program'], - feed={'x': tensor_x, - 'y': tensor_y}, - fetch_list=[avg_loss])) + outs = exe.run(kwargs['main_program'], + feed={'x': x_data, + 'y': y_data}, + fetch_list=[avg_loss]) print outs[0] if outs[0] < 1.0: return diff --git a/python/paddle/v2/fluid/tests/test_parameter.py b/python/paddle/v2/fluid/tests/test_parameter.py index d467e4bbb79b291c442c643158ef6c0d630920dd..13f6278ad8b7244e7980b32463f29d7a824b4572 100644 --- a/python/paddle/v2/fluid/tests/test_parameter.py +++ b/python/paddle/v2/fluid/tests/test_parameter.py @@ -24,7 +24,7 @@ class TestParameter(unittest.TestCase): self.assertEqual(0, param.block.idx) exe = Executor(core.CPUPlace()) p = exe.run(g_main_program, fetch_list=[param])[0] - self.assertTrue(np.allclose(np.array(p), np.ones(shape) * val)) + self.assertTrue(np.allclose(p, np.ones(shape) * val)) p = io.get_parameter_value_by_name('fc.w', exe, g_main_program) self.assertTrue(np.allclose(np.array(p), np.ones(shape) * val)) diff --git a/python/paddle/v2/fluid/tests/test_recurrent_op.py b/python/paddle/v2/fluid/tests/test_recurrent_op.py index 88bcdc3e6a21881ace2be53c22a62d78df30a974..84548847f76c6315da000e1b3d062deafe55a05e 100644 --- a/python/paddle/v2/fluid/tests/test_recurrent_op.py +++ b/python/paddle/v2/fluid/tests/test_recurrent_op.py @@ -156,7 +156,7 @@ class RecurrentOpTest1(unittest.TestCase): feed=self.feed_map, fetch_list=[self.output]) - return np.array(out[0]) + return out[0] def backward(self): self.feed_map = { @@ -171,7 +171,8 @@ class RecurrentOpTest1(unittest.TestCase): exe = Executor(self.place) return exe.run(self.main_program, feed=self.feed_map, - fetch_list=fetch_list) + fetch_list=fetch_list, + return_numpy=False) def test_backward(self): self.check_forward() diff --git 
a/python/paddle/v2/fluid/tests/test_rnn_memory_helper_op.py b/python/paddle/v2/fluid/tests/test_rnn_memory_helper_op.py index a3cba92504a28590083df57e69f7662a887d94a6..9999165ed509aa40f31f26aa676f381561bd0016 100644 --- a/python/paddle/v2/fluid/tests/test_rnn_memory_helper_op.py +++ b/python/paddle/v2/fluid/tests/test_rnn_memory_helper_op.py @@ -7,12 +7,6 @@ import numpy as np import paddle.v2.fluid.core as core -def create_tensor(np_data, place): - tensor = core.LoDTensor() - tensor.set(np_data, place) - return tensor - - class RNNMemoryHelperOpTest(unittest.TestCase): def setUp(self): self.program = Program() @@ -30,13 +24,13 @@ class RNNMemoryHelperOpTest(unittest.TestCase): def test_forward(self): x_np = np.random.normal(size=(2, 3)).astype("float32") - self.feed_map = {'X': create_tensor(x_np, self.place)} + self.feed_map = {'X': x_np} self.fetch_list = [self.Out] exe = Executor(self.place) out = exe.run(self.program, feed=self.feed_map, fetch_list=self.fetch_list) - np.isclose(np.array(out[0]), x_np, rtol=1e-5) + self.assertTrue(np.allclose(out[0], x_np, rtol=1e-5)) class RNNMemoryHelperGradOpTest(unittest.TestCase): @@ -66,8 +60,7 @@ class RNNMemoryHelperGradOpTest(unittest.TestCase): def test_backward(self): self.feed_map = { - name: create_tensor( - np.random.normal(size=(2, 3)).astype("float32"), self.place) + name: np.random.normal(size=(2, 3)).astype("float32") for name in self.input_names } self.fetch_list = [self.output_vars['X@GRAD']] @@ -76,7 +69,7 @@ class RNNMemoryHelperGradOpTest(unittest.TestCase): out = exe.run(self.program, feed=self.feed_map, fetch_list=self.fetch_list) - np.isclose(np.array(out[0]), self.feed_map['Out@GRAD'], rtol=1e-5) + np.isclose(out[0], self.feed_map['Out@GRAD'], rtol=1e-5) class RNNMemoryHelperGradOpWithoutInputTest(unittest.TestCase): @@ -110,8 +103,7 @@ class RNNMemoryHelperGradOpWithoutInputTest(unittest.TestCase): def test_backward(self): self.feed_map = { - name: create_tensor( - np.random.normal(size=(2, 3)).astype("float32"), self.place) + name: np.random.normal(size=(2, 3)).astype("float32") for name in ['X', 'Out'] } self.fetch_list = [self.output_vars['X@GRAD']] @@ -120,10 +112,9 @@ class RNNMemoryHelperGradOpWithoutInputTest(unittest.TestCase): out = exe.run(self.program, feed=self.feed_map, fetch_list=self.fetch_list) - np.isclose( - np.array(out[0]), - np.zeros(shape=(2, 3)).astype("float32"), - rtol=1e-5) + self.assertTrue( + np.allclose( + out[0], np.zeros(shape=(2, 3)).astype("float32"), rtol=1e-5)) if __name__ == '__main__': diff --git a/python/paddle/v2/fluid/tests/test_shrink_rnn_memory.py b/python/paddle/v2/fluid/tests/test_shrink_rnn_memory.py index 953629d610e183cdddf97081f94a77951fe979d8..05f6a560644f18da6ff2e015911901cd73cc36c9 100644 --- a/python/paddle/v2/fluid/tests/test_shrink_rnn_memory.py +++ b/python/paddle/v2/fluid/tests/test_shrink_rnn_memory.py @@ -27,19 +27,16 @@ class TestShrinkRNNMemory(unittest.TestCase): tensor_np = numpy.random.random(size=(3, 100)).astype('float32') tensor.set(tensor_np, cpu) exe = Executor(cpu) - outs = map(numpy.array, - exe.run(feed={'x': tensor}, fetch_list=[mem1, mem2, mem3])) + outs = exe.run(feed={'x': tensor}, fetch_list=[mem1, mem2, mem3]) self.assertTrue(numpy.allclose(tensor_np[0:3], outs[0])) self.assertTrue(numpy.allclose(tensor_np[0:2], outs[1])) self.assertTrue(numpy.allclose(tensor_np[0:1], outs[2])) mem3_mean = layers.mean(x=mem3) append_backward_ops(loss=mem3_mean) - x_grad = map(numpy.array, - exe.run(feed={'x': tensor}, - fetch_list=[ - 
g_main_program.global_block().var('x@GRAD') - ]))[0] + x_grad = exe.run( + feed={'x': tensor}, + fetch_list=[g_main_program.global_block().var('x@GRAD')])[0] self.assertAlmostEqual(1.0, x_grad.sum(), delta=0.1) diff --git a/python/paddle/v2/fluid/tests/test_split_and_merge_lod_tensor_op.py b/python/paddle/v2/fluid/tests/test_split_and_merge_lod_tensor_op.py index a98cb3bbab8442886206b59a2b591fee96deeb9f..f5da4e408f0a83dbf6da530b478e91bbf9cd5ab2 100644 --- a/python/paddle/v2/fluid/tests/test_split_and_merge_lod_tensor_op.py +++ b/python/paddle/v2/fluid/tests/test_split_and_merge_lod_tensor_op.py @@ -98,7 +98,11 @@ class TestCPULoDTensorArrayOps(unittest.TestCase): exe = Executor(place) scope = core.Scope() - exe.run(program, feed={'x': tensor, 'y': mask}, scope=scope) + exe.run(program, + feed={'x': tensor, + 'y': mask}, + scope=scope, + return_numpy=False) var_true = scope.find_var(out_true.name).get_tensor() @@ -169,7 +173,8 @@ class TestCPUSplitMergeLoDTensorGrad(unittest.TestCase): feed={'x': tensor, 'y': mask}, fetch_list=[g_vars], - scope=scope)) + scope=scope, + return_numpy=False)) ] g_out_sum = np.array(g_out).sum() diff --git a/python/paddle/v2/fluid/tests/test_while_op.py b/python/paddle/v2/fluid/tests/test_while_op.py index fca0cdcc319ff661ced33b6bcd242c894941576c..033b03a4957131e1155c61e8ed2f10eefb23fda4 100644 --- a/python/paddle/v2/fluid/tests/test_while_op.py +++ b/python/paddle/v2/fluid/tests/test_while_op.py @@ -55,19 +55,10 @@ class TestWhileOp(unittest.TestCase): for i in xrange(3): d.append(numpy.random.random(size=[10]).astype('float32')) - d_tensor = [] - for item in d: - t = core.LoDTensor() - t.set(item, cpu) - d_tensor.append(t) - - outs = map(numpy.array, - exe.run(feed={ - 'd0': d_tensor[0], - 'd1': d_tensor[1], - 'd2': d_tensor[2] - }, - fetch_list=[sum_result])) + outs = exe.run(feed={'d0': d[0], + 'd1': d[1], + 'd2': d[2]}, + fetch_list=[sum_result]) self.assertAlmostEqual(numpy.sum(d), numpy.sum(outs[0]), delta=0.01) diff --git a/python/paddle/v2/fluid/tests/tmp/inference_model/__model__ b/python/paddle/v2/fluid/tests/tmp/inference_model/__model__ deleted file mode 100644 index e333d10da94943372b0fe4dedd9d857817ec9ca6..0000000000000000000000000000000000000000 Binary files a/python/paddle/v2/fluid/tests/tmp/inference_model/__model__ and /dev/null differ diff --git a/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.b_0 b/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.b_0 deleted file mode 100644 index b1e5fad056e58f23c2cf917a3f4c4d4632ae7d58..0000000000000000000000000000000000000000 Binary files a/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.b_0 and /dev/null differ diff --git a/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.w_0 b/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.w_0 deleted file mode 100644 index 2f41796c0495570941c236c8c3f422b3cbd5edd2..0000000000000000000000000000000000000000 Binary files a/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.w_0 and /dev/null differ diff --git a/python/paddle/v2/framework/tests/test_elementwise_mod_op.py b/python/paddle/v2/framework/tests/test_elementwise_mod_op.py deleted file mode 100644 index 35c38147a24fb237b4607836a86cffa81b2d8904..0000000000000000000000000000000000000000 --- a/python/paddle/v2/framework/tests/test_elementwise_mod_op.py +++ /dev/null @@ -1,36 +0,0 @@ -import unittest -import numpy as np -from op_test import OpTest - - -class ElementwiseModOp(OpTest): - def setUp(self): - self.op_type = "elementwise_mod" - """ Warning - CPU gradient check error! 
- 'X': np.random.randint((32,84)).astype("int32"), - 'Y': np.random.randint((32,84)).astype("int32") - """ - self.inputs = { - 'X': np.random.randint(1, 10, [13, 17]).astype("int32"), - 'Y': np.random.randint(1, 10, [13, 17]).astype("int32") - } - self.outputs = {'Out': np.mod(self.inputs['X'], self.inputs['Y'])} - - def test_check_output(self): - self.check_output() - - def test_check_grad_normal(self): - self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.05) - - def test_check_grad_ingore_x(self): - self.check_grad( - ['Y'], 'Out', max_relative_error=0.05, no_grad_set=set("X")) - - def test_check_grad_ingore_y(self): - self.check_grad( - ['X'], 'Out', max_relative_error=0.05, no_grad_set=set('Y')) - - -if __name__ == '__main__': - unittest.main()