- 05 Dec, 2019 1 commit
Committed by lilong12
* fix the computation for dx (grad for x) for prelu operation. (#20949)
* set the default value of alpha for prelu to 0.25, test=develop
* add the call to __syncthreads(), test=develop
* fix the implementation of cpu prelu, test=develop
* repair the implementation of element mode prelu, test=develop
* modify test_prelu_op.py, test=develop
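For readers unfamiliar with the operator, the forward pass and the dx gradient that this commit touches can be sketched in plain Python. This is only an illustration of the PReLU definition with a single shared alpha (0.25, the default named in the commit); the function names `prelu_forward` and `prelu_grad_x` are hypothetical and this is not the Paddle kernel code.

```python
def prelu_forward(x, alpha=0.25):
    """PReLU over a list of floats: x where x > 0, alpha * x otherwise."""
    return [v if v > 0 else alpha * v for v in x]


def prelu_grad_x(x, dout, alpha=0.25):
    """Gradient w.r.t. x: dout where x > 0, alpha * dout otherwise."""
    return [g if v > 0 else alpha * g for v, g in zip(x, dout)]


# Hypothetical sample data, using one shared alpha for every element.
x = [-2.0, -0.5, 0.0, 1.5]
dout = [1.0, 1.0, 1.0, 2.0]
y = prelu_forward(x)
dx = prelu_grad_x(x, dout)
```

The channel and element modes mentioned in the commit would use a different alpha per channel or per element; the sketch above does not cover them.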
-
- 02 Dec, 2019 1 commit
Committed by zhaoyuchen2018
seems shuffle_sync cannot handle small size
test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 25 Nov, 2019 1 commit
Committed by Zhang Ting
* [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756)
* All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview
* fix the bug that attr(offsets) should be initialized, test=develop
* [cherry-pick] maxout supports channel_last input (#20846)
* maxout support channel_last input, test=develop
* modified details of Input(X) and Attr(groups, axis) in doc, test=develop
* [cherry-pick] lrn supports channel_last input, test=develop (#20954)
-
- 21 Nov, 2019 1 commit
Committed by Chen Weihang
* delete paddle infershape enforce macro (#20832)
* Polish and arrange code in enforce.h (#20901)
* Enrich the type of error and declare the error type interfaces (#21024)
* Enrich the type of error and declare the error type interfaces, test=develop
* adjust tests to adapt new form, test=develop
* add inference deps with error_codes.pb.h, test=develop
* restore stack iter start pos, test=develop
* polish code based review comments, test=develop
* Add dependency for error_codes.proto (#21084)
* fix activation_functions deps, test=develop, test=document_fix
* add error_codes_proto deps, test=develop, test=document_fix
* try delete enforce.h, test=develop, test=document_fix
* change cuda enforce & add example (#21142)
test=release/1.6
-
- 31 Oct, 2019 2 commits
Committed by Zhang Ting
[cherry-pick] fix the bug of conv_transpose: compatible with AnyLayout setting, test=release/1.6 #(20897) (#20918)
-
Committed by Pei Yang
Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and "num" attribute in split op converter (#20733) (#20902)
* fix pool2d trt converter, test=develop
* add fix for split op converter, test=develop
-
- 16 Oct, 2019 1 commit
Committed by qingqing01
* Support fp16 in fused_elemwise_activation_op.
* Fix unit testing in CPU-only mode.
-
- 14 Oct, 2019 1 commit
Committed by Zhang Ting
fix conv_transpose's bug: compatible with AnyLayout setting: cherry-pick 20589
-
- 10 Oct, 2019 1 commit
Committed by liym27
* Delete PadFuntion, include padding.h instead.
* Move function IsSymmetricPadding from conv_cudnn_op.cu/conv_transpose_cudnn_op.cu to padding.h.
-
- 08 Oct, 2019 1 commit
Committed by Zhang Ting
-
- 03 Oct, 2019 1 commit
Committed by liym27
1. support asymmetric padding;
2. support padding algorithm: "SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. change doc of python API and c++;
test=release/1.6
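To make the "SAME" and "VALID" padding algorithms concrete, here is a minimal sketch for one spatial dimension. It follows the widely used convention (as in TensorFlow) where "SAME" pads so that the output size is ceil(in_size / stride) and "VALID" adds no padding; that this matches the exact behaviour introduced by the commit is an assumption, dilation is assumed to be 1, and the name `resolve_padding` is hypothetical.

```python
import math


def resolve_padding(in_size, kernel, stride, algorithm, explicit=(0, 0)):
    """Return (pad_before, pad_after, out_size) for one spatial dimension."""
    if algorithm == "VALID":
        # No padding at all; the window must fit entirely inside the input.
        pad_before, pad_after = 0, 0
    elif algorithm == "SAME":
        # Pad so that out_size == ceil(in_size / stride).
        target = math.ceil(in_size / stride)
        total = max((target - 1) * stride + kernel - in_size, 0)
        pad_before = total // 2
        pad_after = total - pad_before  # may differ from pad_before (asymmetric)
    else:
        # Any other value: use the explicit, possibly asymmetric, pair.
        pad_before, pad_after = explicit
    out_size = (in_size + pad_before + pad_after - kernel) // stride + 1
    return pad_before, pad_after, out_size


same_odd = resolve_padding(7, 3, 2, "SAME")
same_even = resolve_padding(8, 3, 2, "SAME")
valid = resolve_padding(7, 3, 2, "VALID")
```

The second call shows why asymmetric padding support matters: an input of size 8 with kernel 3 and stride 2 needs one padded element in total, which cannot be split evenly between the two sides.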
-
- 01 Oct, 2019 1 commit
Committed by danleifeng
-
- 28 Sep, 2019 1 commit
Committed by liym27
* fix pool2d pool3d:
1. support asymmetric padding;
2. support padding algorithm: "SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. support inferring shape when input with negative dims in compile time;
5. change doc of python API and c++;
6. fix bug in cuda kernel when Attr(adaptive) is true.
test=develop,test=document_preview
* fix 'tensors' to 'Tensors'. test=develop,test=document_preview
* add test for coverage ValueError. test=develop,test=document_preview
* resolve conflict in test_pool2d. test=develop
-
- 27 Sep, 2019 1 commit
Committed by chengduo
* make pad and split support fp16 test=develop
-
- 25 Sep, 2019 1 commit
Committed by Bob Zhu
* add support of matmul with multiple head even different width and height
The original multi-head matmul only supports mat_a.width == mat_b.height, in which case mat_b is split horizontally. This patch extends support to the case where mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height, in which case mat_b is split vertically.
For example, A is [3, 8], B is [2, 16], and head_number is 4. In this case, A is split into blocks of [3, 2] and B is (vertically) split into blocks of [2, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].
test=develop
* refactor the code of matmul with multiple head even different width and height
test=develop
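The extended case described in this commit (mat_a.width / head_number == mat_b.height, with mat_b split vertically) can be sketched with nested Python lists. The helper names `matmul`, `col_block` and `multihead_matmul_vsplit` are hypothetical, and this is only an illustration of the shape arithmetic, not the operator's actual implementation.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def col_block(m, index, width):
    """Columns [index * width, (index + 1) * width) of matrix m."""
    return [row[index * width:(index + 1) * width] for row in m]


def multihead_matmul_vsplit(a, b, head_number):
    """Split both A and B by columns, multiply per head, concatenate by columns."""
    a_w = len(a[0]) // head_number
    b_w = len(b[0]) // head_number
    if a_w != len(b):
        raise ValueError("mat_a.width / head_number must equal mat_b.height")
    out = [[] for _ in a]
    for h in range(head_number):
        part = matmul(col_block(a, h, a_w), col_block(b, h, b_w))
        for out_row, part_row in zip(out, part):
            out_row.extend(part_row)
    return out


# The shapes from the commit message: A is [3, 8], B is [2, 16], head_number is 4.
a = [[1] * 8 for _ in range(3)]
b = [[1] * 16 for _ in range(2)]
result = multihead_matmul_vsplit(a, b, 4)  # shape [3, 16]
```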
-
- 23 Sep, 2019 1 commit
Committed by Kaipeng Deng
* fix softmax ce time limit check failed. test=develop
* refine softmax calc. test=develop
-
- 20 Sep, 2019 1 commit
Committed by Aurelius84
* support 2-level lod of input in sequence_pool test=develop
* fix lod level bug in .cu test=develop
-
- 16 Sep, 2019 1 commit
Committed by Kaipeng Deng
-
- 11 Sep, 2019 2 commits
Committed by Huihuang Zheng
TemporaryAllocator is a singleton used for allocating memory for cuDNN. Since it is a singleton, we can delete it for better memory performance. We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to delete the memory allocated for the stream, avoiding the singleton. Also added data_feed_proto to operator to fix CI in CPU compilation.
-
Committed by Yiqun Liu
* Refine the codes related to fc op.
* Add GPU implementation for fc functor.
* Apply fc_fuse_pass in GPU inference. test=develop
* Change the cmake for fc op.
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
* Add an attribute to set the activation type in fc_op.
* Enhance the unittest of fc_op. test=develop
* Remove the declaration of FCOpGrad back to the header file. test=develop
* Set default value for newly added arguments in test_fc_op. test=develop
-
- 05 Sep, 2019 3 commits
- 04 Sep, 2019 1 commit
Committed by Tao Luo
test=develop
-
- 03 Sep, 2019 2 commits
- 02 Sep, 2019 1 commit
Committed by zhouwei25
-
- 29 Aug, 2019 1 commit
Committed by Zeng Jinle
-
- 20 Aug, 2019 1 commit
Committed by Yihua Xu
* Implement the operator with sparse matrix multiply
* Update the URL of mklml library. test=develop
* Disable MKLML implementation when using non-Linux. test=develop
* Ignore the deprecated status for windows test=develop
-
- 19 Aug, 2019 1 commit
Committed by silingtong123
* print error code if cuda related API fails
-
- 01 Aug, 2019 1 commit
Committed by LielinJiang
* fix depthwise conv gpu kernel bug, test=develop
* add more depthwise conv test, test=develop
-
- 24 Jul, 2019 1 commit
Committed by Bob Zhu
* extend matmul op to support multiple head multiplication
With the support of multiple head, the multiplication of two big matrices is split into multiplication of several (head_number) small matrices. E.g. if Mat A is [3, 24] and Mat B is [24, 4], when multiplying A and B with head_number as 4, Mat A will be split as 4 matrices of [3, 6] and Mat B will be 4 matrices of [6, 4]. The final result will be 4 matrices of [3, 4], i.e. [3, 16].
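The splitting scheme in this commit message can be illustrated with a short pure-Python sketch: A is cut into head_number column blocks, B into head_number row blocks, each pair is multiplied, and the partial results are concatenated along the columns. The function names `matmul` and `multihead_matmul` are hypothetical and the code only mirrors the shape arithmetic described in the message, not the operator's implementation.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def multihead_matmul(a, b, head_number):
    """Multiply column block h of A by row block h of B and concatenate results."""
    if len(a[0]) != len(b) or len(b) % head_number != 0:
        raise ValueError("need mat_a.width == mat_b.height, divisible by head_number")
    k = len(b) // head_number
    out = [[] for _ in a]
    for h in range(head_number):
        a_h = [row[h * k:(h + 1) * k] for row in a]  # [rows of A, k]
        b_h = b[h * k:(h + 1) * k]                   # [k, columns of B]
        for out_row, part_row in zip(out, matmul(a_h, b_h)):
            out_row.extend(part_row)
    return out


# The shapes from the commit message: A is [3, 24], B is [24, 4], head_number is 4.
a = [[1] * 24 for _ in range(3)]
b = [[1] * 4 for _ in range(24)]
result = multihead_matmul(a, b, 4)  # 4 blocks of [3, 4], i.e. [3, 16]
```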
-
- 28 Jun, 2019 1 commit
Committed by Zeng Jinle
* add_elementwise_add_inplace_test,test=develop
* rename file, test=develop
-
- 25 Jun, 2019 1 commit
Committed by Hongyu Liu
* sequence mask support max length tensor input; test=develop
* add rnn_impl.py; test=develop
* add basic gru lstm unittest; test=develop
* fix api spec; test=develop
* fix sequence_mask op bug; test=develop test=document_preview
* change +-*x to elementwise_op; test=develop
* add mkl flag; test=develop
* fix rnn impl bug; test=develop
* update api spec; test=develop
* fix doc bug; test=develop
* fix lstm bugs; test=develop
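As a reminder of what a sequence mask computes, the following minimal sketch builds a 0/1 mask from a list of sequence lengths. The commit lets the maximum length come from a tensor at run time; here it is an ordinary Python integer, and both that simplification and the function name `sequence_mask` used this way are illustrative assumptions rather than the Paddle API.

```python
def sequence_mask(lengths, maxlen=None):
    """Row i has ones in its first lengths[i] positions and zeros elsewhere."""
    if maxlen is None:
        # Assumed fallback: use the longest sequence (requires a non-empty list).
        maxlen = max(lengths)
    return [[1 if j < n else 0 for j in range(maxlen)] for n in lengths]


mask_default = sequence_mask([3, 1, 0])
mask_fixed = sequence_mask([2, 4], maxlen=4)
```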
-
- 14 Jun, 2019 1 commit
Committed by Yiqun Liu
test=develop
-
- 12 Jun, 2019 1 commit
Committed by Yiqun Liu
Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979) test=develop
-
- 10 Jun, 2019 1 commit
Committed by Yibing Liu
* Enable seq_pool op to accept len 0 input test=develop
* Update sequence_pool's api test=develop
* Add more unittest cases for seq_pool op test=develop
* Remove legacy comments test=develop
* Don't use template in op maker test=develop
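One way to picture sequence pooling with zero-length sequences is the sketch below, which sums a flat list of values per sequence using level-of-detail (LoD) offsets. Filling the output for an empty sequence with a `pad_value` is an assumption made for illustration, as is the function name `sequence_pool_sum`; the commit message itself does not spell out how empty inputs are handled.

```python
def sequence_pool_sum(values, lod, pad_value=0.0):
    """Sum each sequence; lod holds offsets such as [0, 2, 2, 5]."""
    out = []
    for start, end in zip(lod, lod[1:]):
        if start == end:
            # Zero-length sequence: nothing to pool, emit the assumed pad value.
            out.append(pad_value)
        else:
            out.append(sum(values[start:end]))
    return out


# Three sequences of lengths 2, 0 and 3 packed into one flat list.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
pooled = sequence_pool_sum(values, [0, 2, 2, 5])
```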
-
- 30 May, 2019 1 commit
Committed by Yiqun Liu
* Enhance fused_elementwise_activation op. test=develop
* Move the api fused_elementwise_activation to contrib. test=develop
* Add including files. test=develop
* Add the support of sigmoid in fused_elementwise_activation op.
* Update API.spec. test=develop
-
- 29 May, 2019 1 commit
Committed by Yiqun Liu
Optimize the concat and split kernel for special cases when the number of inputs/outputs is 2 (#17415)
* Optimize the concat and split kernel for special cases that the number of inputs/outputs is 2. test=develop
* Refine codes. test=develop
* Correct the condition. test=develop
* Move the define of tmp_data outside the if statement.
* Print the cudnn minor version. test=develop
* Fix the case when in_num/o_num is 1 in concat/split op. test=develop
* Remove const_cast. test=develop
-
- 24 May, 2019 1 commit
Committed by tensor-tang
* refine softmax fwd test=develop
* refine cpu softmax bwd test=develop
* fix batch size test=develop
* fix compile issue with gpu test=develop
* add value clip
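The "value clip" bullet can be read as clamping the shifted logits before exponentiation. The sketch below shows a numerically stable softmax over one row with such a clamp; the threshold of -64 and the function name `softmax_row` are assumptions for illustration, not values confirmed by the commit message.

```python
import math


def softmax_row(row, clip_min=-64.0):
    """Stable softmax: subtract the row max, clamp from below, then normalize."""
    m = max(row)
    # After subtracting the max every entry is <= 0; clamping keeps exp() above
    # zero so very negative logits do not underflow to exactly 0.
    shifted = [max(v - m, clip_min) for v in row]
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]


uniform = softmax_row([0.0, 0.0])
extreme = softmax_row([1000.0, 0.0, -1000.0])
```

Without the max subtraction, math.exp(1000.0) would raise an OverflowError; without the clamp, the smallest entries in `extreme` would underflow to exactly zero.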
-