- 08 Nov, 2019 1 commit
-
-
Committed by Chen Weihang
* fix activation_functions deps, test=develop, test=document_fix
* add error_codes_proto deps, test=develop, test=document_fix
* try delete enforce.h, test=develop, test=document_fix
-
- 05 Nov, 2019 2 commits
-
-
Committed by zhaoyuchen2018
ocr_recognition fails, so add a path to handle small frame_size. test=develop
-
Committed by Tao Luo
test=develop
-
- 01 Nov, 2019 1 commit
-
-
Committed by zhaoyuchen2018
seems shuffle_sync cannot handle small size test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 31 Oct, 2019 2 commits
-
-
Committed by Zhang Ting
* maxout support channel_last input, test=develop
* modified details of Input(X) and Attr(groups, axis) in doc, test=develop
-
Committed by Zhang Ting
-
- 30 Oct, 2019 1 commit
-
-
Committed by zhang wenhui
-
- 28 Oct, 2019 1 commit
-
-
Committed by Aurelius84
-
- 23 Oct, 2019 1 commit
-
-
Committed by Pei Yang
Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and "num" attribute in split op converter (#20733)
* fix pool2d trt converter, test=develop
* add fix for split op converter, test=develop
-
- 16 Oct, 2019 1 commit
-
-
Committed by qingqing01
* Support fp16 in fused_elemwise_activation_op.
* Fix unit testing in ONLY-CPU mode.
-
- 13 Oct, 2019 1 commit
-
-
Committed by Zhang Ting
-
- 09 Oct, 2019 1 commit
-
-
Committed by liym27
* Delete PadFuntion, include padding.h instead. test=develop
* move function(IsSymmetricPadding) from conv_cudnn_op.cu/conv_transpose_cudnn_op.cu to padding.h, test=develop
-
- 07 Oct, 2019 1 commit
-
-
Committed by Zhang Ting
-
- 30 Sep, 2019 1 commit
-
-
Committed by danleifeng
Improve elementwise operators performance in same dimensions
-
- 29 Sep, 2019 1 commit
-
-
Committed by liym27
1. support asymmetric padding;
2. support padding algorithm: "SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. change doc of python API and c++;
test=develop, test=document_preview
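The "SAME" and "VALID" padding algorithms named in this commit message can be illustrated with a small sketch. It uses the widely used TensorFlow-style convention for one spatial dimension and ignores dilation; whether the committed code follows exactly this rule is an assumption, and the function names `resolve_padding` and `output_size` are hypothetical, not taken from the repository.

```python
import math


def resolve_padding(in_size, kernel, stride, algorithm, explicit=(0, 0)):
    """Return (pad_before, pad_after) for one spatial dimension.

    Assumed convention (not confirmed by the commit message):
    - "VALID": no padding at all.
    - "SAME": pad so that out_size == ceil(in_size / stride); any odd
      leftover goes to the trailing side, which is why the result can
      be asymmetric.
    - anything else: use the explicitly given padding.
    """
    if algorithm == "VALID":
        return (0, 0)
    if algorithm == "SAME":
        out_size = math.ceil(in_size / stride)
        total = max((out_size - 1) * stride + kernel - in_size, 0)
        before = total // 2
        return (before, total - before)
    return tuple(explicit)


def output_size(in_size, kernel, stride, pad):
    """Output length of a convolution or pooling window without dilation."""
    return (in_size + pad[0] + pad[1] - kernel) // stride + 1


if __name__ == "__main__":
    pad = resolve_padding(8, 3, 2, "SAME")
    print(pad, output_size(8, 3, 2, pad))
```

With an input of length 8, kernel 3 and stride 2, "SAME" needs one padding element in total, so the two sides differ. That is the kind of case asymmetric padding support is meant to cover.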
-
- 28 Sep, 2019 1 commit
-
-
Committed by liym27
* fix pool2d pool3d:
  1. support asymmetric padding;
  2. support padding algorithm: "SAME" and "VALID";
  3. support channel_last: data_format NHWC and NDHWC;
  4. support inferring shape when input with negative dims in compile time;
  5. change doc of python API and c++;
  6. fix bug in cuda kernel when Attr(adaptive) is true.
  test=develop,test=document_preview
* fix 'tensors' to 'Tensors'. test=develop,test=document_preview
* add test for coverage ValueError. test=develop,test=document_preview
* resolve conflict in test_pool2d. test=develop
-
- 27 Sep, 2019 1 commit
-
-
Committed by chengduo
* make pad and split support fp16 test=develop
-
- 25 Sep, 2019 1 commit
-
-
Committed by Bob Zhu
* add support of matmul with multiple head even different width and height
  Original matmul with multiple head supports only the case mat_a.width == mat_b.height; in that case, mat_b will be horizontally split. In this patch, we extend the support to the case where mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height; in this case, mat_b will be vertically split. One example: A is [3, 8], B is [2, 16], head_number is 4. In this case, A will be split as [3, 2], B will be (vertically) split as [2, 4]. The final result will be 4 matrices of [3, 4], i.e. [3, 16]. test=develop
* refactor the code of matmul with multiple head even different width and height test=develop
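The shape arithmetic in this commit message can be checked with a short pure-Python sketch. It follows only the description above: A is cut into head_number column slices and B into head_number column slices ("vertically split"). It is not the committed kernel; the function names are hypothetical, and placing the per-head results side by side is an assumption inferred from the stated [3, 16] result.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def multi_head_matmul_split_b_columns(a, b, head_number):
    """Multi-head product for the case a.width / head_number == b.height.

    a: [M, K], b: [K / head_number, N].
    Head h multiplies columns [h*k_step, (h+1)*k_step) of a
    with columns [h*n_step, (h+1)*n_step) of b.
    """
    k, n = len(a[0]), len(b[0])
    assert k % head_number == 0 and n % head_number == 0
    k_step, n_step = k // head_number, n // head_number
    assert len(b) == k_step, "requires a.width / head_number == b.height"

    out = [[] for _ in a]
    for h in range(head_number):
        a_h = [row[h * k_step:(h + 1) * k_step] for row in a]  # [M, k_step]
        b_h = [row[h * n_step:(h + 1) * n_step] for row in b]  # [k_step, n_step]
        c_h = matmul(a_h, b_h)                                 # [M, n_step]
        for out_row, c_row in zip(out, c_h):
            out_row.extend(c_row)  # assumed: heads are laid out side by side
    return out


if __name__ == "__main__":
    a = [[1] * 8 for _ in range(3)]    # A is [3, 8]
    b = [[1] * 16 for _ in range(2)]   # B is [2, 16]
    c = multi_head_matmul_split_b_columns(a, b, 4)
    print(len(c), len(c[0]))           # shape of the result
```

For the example in the message (A [3, 8], B [2, 16], head_number 4), each head multiplies a [3, 2] slice by a [2, 4] slice, and the four [3, 4] results together form a [3, 16] matrix.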
-
- 23 Sep, 2019 1 commit
-
-
Committed by Kaipeng Deng
* fix softmax ce time limit check failed. test=develop
* refine softmax calc. test=develop
-
- 20 Sep, 2019 1 commit
-
-
Committed by Aurelius84
* support 2-level lod of input in sequence_pool test=develop
* fix lod level bug in .cu test=develop
-
- 16 Sep, 2019 1 commit
-
-
Committed by Kaipeng Deng
-
- 11 Sep, 2019 2 commits
-
-
Committed by Huihuang Zheng
TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation
-
Committed by Yiqun Liu
* Refine the codes related to fc op.
* Add GPU implementation for fc functor.
* Apply fc_fuse_pass in GPU inference. test=develop
* Change the cmake for fc op.
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
* Add an attribute to set the activation type in fc_op.
* Enhance the unittest of fc_op. test=develop
* Remove the declaration of FCOpGrad back to the header file. test=develop
* Set default value for newly added arguments in test_fc_op. test=develop
-
- 05 Sep, 2019 3 commits
- 04 Sep, 2019 1 commit
-
-
Committed by Tao Luo
test=develop
-
- 03 Sep, 2019 2 commits
- 02 Sep, 2019 1 commit
-
-
Committed by zhouwei25
-
- 29 Aug, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 20 Aug, 2019 1 commit
-
-
Committed by Yihua Xu
* Implement the operator with sparse matrix multiply
* Update the URL of mklml library. test=develop
* Disable MKLML implementation on non-Linux platforms. test=develop
* Ignore the deprecated status for windows test=develop
-
- 19 Aug, 2019 1 commit
-
-
Committed by silingtong123
* print error code if cuda related API fails
-
- 01 Aug, 2019 1 commit
-
-
Committed by LielinJiang
* fix depthwise conv gpu kernel bug, test=develop
* add more depthwise conv test, test=develop
-
- 24 Jul, 2019 1 commit
-
-
Committed by Bob Zhu
* extend matmul op to support multiple head multiplication
  With the support of multiple head, the multiplication of two big matrices is split into multiplications of several (head_number) small matrices. E.g. if Mat A is [3, 24] and Mat B is [24, 4], when multiplying A and B with head_number as 4, Mat A will be split into 4 matrices of [3, 6] and Mat B into 4 matrices of [6, 4]. The final result will be 4 matrices of [3, 4], i.e. [3, 16].
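The split described in this commit message can be sketched in pure Python. The sketch only mirrors the shapes given above: A is cut into head_number column slices and B into head_number row slices. It is not the committed implementation; the function names are hypothetical, and concatenating the per-head results along the column axis is an assumption inferred from the stated [3, 16] result.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def multi_head_matmul(a, b, head_number):
    """Multi-head product for the case a.width == b.height.

    a: [M, K], b: [K, N]. The shared dimension K is cut into
    head_number equal slices; head h multiplies the h-th column
    slice of a with the h-th row slice of b.
    """
    k = len(b)
    assert len(a[0]) == k, "requires a.width == b.height"
    assert k % head_number == 0
    step = k // head_number

    out = [[] for _ in a]
    for h in range(head_number):
        a_h = [row[h * step:(h + 1) * step] for row in a]  # [M, step]
        b_h = b[h * step:(h + 1) * step]                   # [step, N]
        c_h = matmul(a_h, b_h)                             # [M, N]
        for out_row, c_row in zip(out, c_h):
            out_row.extend(c_row)  # assumed: heads are laid out side by side
    return out


if __name__ == "__main__":
    a = [[1] * 24 for _ in range(3)]   # Mat A is [3, 24]
    b = [[1] * 4 for _ in range(24)]   # Mat B is [24, 4]
    c = multi_head_matmul(a, b, 4)
    print(len(c), len(c[0]))           # shape of the result
```

Note that the result differs from an ordinary product of A and B, which would be [3, 4]: each head contributes its own [3, 4] block instead of being summed with the others.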
-
- 28 Jun, 2019 1 commit
-
-
Committed by Zeng Jinle
* add_elementwise_add_inplace_test,test=develop
* rename file, test=develop
-
- 25 Jun, 2019 1 commit
-
-
Committed by Hongyu Liu
* sequence mask support max length tensor input; test=develop
* add rnn_impl.py; test=develop
* add basic gru lstm unittest; test=develop
* fix api spec; test=develop
* fix sequence_mask op bug; test=develop test=document_preview
* change +-*x to elementwise_op; test=develop
* add mkl flag; test=develop
* fix rnn impl bug; test=develop
* update api spec; test=develop
* fix doc bug; test=develop
* fix lstm bugs; test=develop
-
- 14 Jun, 2019 1 commit
-
-
Committed by Yiqun Liu
test=develop
-
- 12 Jun, 2019 1 commit
-
-
Committed by Yiqun Liu
Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979) test=develop
-
- 10 Jun, 2019 1 commit
-
-
Committed by Yibing Liu
* Enable seq_pool op to accept len 0 input test=develop
* Update sequence_pool's api test=develop
* Add more unittest cases for seq_pool op test=develop
* Remove legacy comments test=develop
* Don't use template in op maker test=develop
-