- 10 Jan, 2021 1 commit
-
-
Committed by wangchaochaohu
Reduce the memory footprint of the fused pattern of the elementwise_add op and an activation op (the relu op, for example) (#29885)
-
- 05 Aug, 2020 1 commit
-
-
Committed by Zhaolong Xing
test=develop
-
- 16 Jun, 2020 1 commit
-
-
Committed by Leo Chen
-
- 12 May, 2020 1 commit
-
-
Committed by wawltor
* Remove the error in the elementwise op, use the backup mode to calculate
-
- 11 May, 2020 1 commit
-
-
Committed by Chen Weihang
* add new macro BOOST_GET_SAFELY & unittests, test=develop
* add different macro type, test=develop
* fix get macro type in executor, test=develop
* four macro part change backup
* using one macro for all case, test=develop
* revert attribute change, test=develop
* change to three func to solve gcc4.8 bug, test=develop
* polish some details, test=develop
-
- 13 Apr, 2020 1 commit
-
-
Committed by LutaoChu
Enhance the kernel messages of the following ops:
paddle.fluid.layers.elementwise_add
paddle.fluid.layers.elementwise_div
paddle.fluid.layers.elementwise_floordiv
paddle.fluid.layers.elementwise_max
paddle.fluid.layers.elementwise_min
paddle.fluid.layers.elementwise_mod
paddle.fluid.layers.elementwise_mul
paddle.fluid.layers.elementwise_pow
paddle.fluid.layers.elementwise_sub
-
- 03 Apr, 2020 2 commits
-
-
Committed by zhaoyuchen2018
An elementwise function was used before its definition, which failed under CUDA 8; move the definition ahead of its use.
-
Committed by zhaoyuchen2018
* improve elementwise performance.
* Add contiguous check, test=develop
-
- 29 Mar, 2020 1 commit
-
-
Committed by zhaoyuchen2018
* Improve elementwise performance. Elementwise performance is poor when execution falls into CommonGradBroadcastCUDA; add some new kernels for different data patterns.
* Add some CUDA kernels to speed up common broadcast cases. test=develop
* Add more test cases and fix a CUDA kernel bug. test=develop
* Remove tests as CPU precision fails. test=develop
* Refine SplitDims, test=develop
* Change file mode, test=develop
-
- 25 Mar, 2020 1 commit
-
-
Committed by Zeng Jinle
-
- 17 Jan, 2020 1 commit
-
-
Committed by qingqing01
-
- 19 Nov, 2019 1 commit
-
-
Committed by danleifeng
-
- 10 Oct, 2019 1 commit
-
-
Committed by danleifeng
-
- 04 Sep, 2019 1 commit
-
-
Committed by danleifeng
elementwise broadcast function enhancement
-
- 20 Aug, 2019 1 commit
-
-
Committed by zhaoyuchen2018
For small cases, a 1D block performs better than a 2D block. Refer to issue #19275.
-
- 14 Jun, 2019 1 commit
-
-
Committed by Yiqun Liu
test=develop
-
- 20 May, 2019 1 commit
-
-
Committed by lvmengsi
* double backward, elementwise_div
* fix dx empty. test=develop
* bug fix (#17392) fix secure bug
* Enable stack operator for a Ngraph, test=develop (#17406)
* fix sqrt_grad_grad unittest. test=develop (#17410)
* fix sqrt_grad_grad unittest. test=develop
* disable sqrt_grad_grad unittest. test=develop
* test=develop, fix unittest
* test=develop, fix unittest
* test=develop, fix unittest
* test=develop, fix bug
* fix unittest. test=develop
* fix unittest dx. test=develop
* tmp fix! for test... test=develop
* reduce tmp, test=develop
* test=develop, reduce tmp
* fix broadcast unittest. test=develop
* fix format. test=develop
* refine code. test=develop
* refine code. test=develop
* refine GetDoubleGradSafeTensor. test=develop
* fix format. test=develop
-
- 13 May, 2019 1 commit
-
-
Committed by Kaipeng Deng
* add double grad for elementwise_mul. test=develop
* remove comment. test=develop
* fix grad sum. test=develop
* fix for axis expand. test=develop
* add test for axis expand. test=develop
-
- 08 May, 2019 1 commit
-
-
Committed by zhaoyuchen2018
* Refine elementwise kernel. Add a simple cuda kernel if grad x and y both exist. Use 2D block cuda kernel to do broadcast. test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
* refine code. test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
* refine code. test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 24 Jan, 2019 1 commit
-
-
Committed by chengduo
test=develop
-
- 16 Nov, 2018 1 commit
-
-
Committed by Wu Yi
* wip simplify operator framework
* wip
* wip
* done test=develop
* clean test=develop
* fix test=develop
* fix deps test=develop
* fix cpu build test=develop
* fix tensorrt build test=develop
* fix tests test=develop
* fix test=develop
* fix cpu build test=develop
-
- 14 Nov, 2018 1 commit
-
-
Committed by peizhilin
test=develop
-
- 08 Nov, 2018 1 commit
-
-
Committed by Zhaolong Xing
-
- 07 Nov, 2018 1 commit
-
-
Committed by chengduo
* add fp16 backward support test=develop
* add sum_op fp16 test
* disable test_dist_save_load test=develop
* add check_grad for sum
* add unit test for softmax_grad fp16 test=develop
* add scale_op unit test
* add mul_grad_op unit test for fp16
* add cross_entropy_grad and eman_grad unit test for fp16 test=develop
* fix cross_entropy unit test
* add pool2d fp16 unit test
* refine conv2d fp16 unit test test=develop
* refine activation unit test test=develop
* fix ci test=develop
* follow zhihong's comment, copy from https://github.com/PaddlePaddle/Paddle/pull/12796 test=develop
-
- 05 Nov, 2018 1 commit
-
-
Committed by peizhilin
-
- 14 Oct, 2018 1 commit
-
-
Committed by wanghaoshuang
-
- 20 Sep, 2018 1 commit
-
-
Committed by chengduo
* Add Preface
* Add demo code
* Save file
* Refine code
* seems can work
* use elementwise strategy
* Use ElementwiseComputeEx
* Add comments
* extract functions from operator
* Refine code
* Follow comment
* code refine
* add op_fuse pass
* add backward
* code refine
* use TopologySortOperations
* follow comments
* refine IsFusible
* code enhance
* fix op_fusion_pass
* refine code
* refine fuse_elemwise_act_op
* adjust the input and output
* refine logic
* add intermediate_edge
* disable inplace
* follow comments
* refine logic
* follow comments
* Remove the removable IntermediateOut
* change strategy
* code refine
* enable fuse backward
* code refine
* code refine
* rename unit test
* follow comments
-
- 12 Sep, 2018 1 commit
-
-
Committed by dzhwinter
-
- 03 Sep, 2018 1 commit
-
-
Committed by dzhwinter
-
- 30 Aug, 2018 1 commit
-
-
Committed by chengduo
* Enhance the function of fused_elementwise_activation_op
* enhance unit test
* Clean Code And Add Doc
* Add compound functors
* Fix doc and enhance unit test
* define Dx and Dy for d_binary_func
* add mul_scale
* add mul_scale
* add elementwise_mul
* code refine
* code refine
* add doc
* add AsIntermediate
-
- 27 Aug, 2018 1 commit
-
-
Committed by dzhwinter
-
- 20 Aug, 2018 1 commit
-
-
Committed by tensor-tang
-
- 17 Aug, 2018 1 commit
-
- 16 Aug, 2018 1 commit
-
-
Committed by dzhwinter
* "cherry picked operators changes"
* "remove duplicated code"
* "add constant setter"
* "add get expected kernel"
* "fix ci"
* "add fill constant"
-
- 10 Aug, 2018 1 commit
-
-
Committed by dzhwinter
-
- 01 Aug, 2018 1 commit
-
-
Committed by dzhwinter
* "add gradient register"
* "make some enhance"
* "better format"
* "fix typo"
* "fix reuse"
* "fix get expected kernel"
* "change the mkldnn code"
* "fix mkldnn"
* "fix mkldnn failed test"
* "add comment"
-
- 03 May, 2018 1 commit
-
-
Committed by chengduo
* fix __shfl_down_sync_ of cross_entropy
* use reduceSum
* "fix ci"
-
- 30 Apr, 2018 1 commit
-
-
Committed by dzhwinter
* "re-commit "
* "picked up"
* "fix ci"
* "fix pdb hang up issue in cuda 9"
-
- 24 Apr, 2018 1 commit
-
-
Committed by chengduoZH
-
- 10 Apr, 2018 1 commit
-
-
Committed by chengduo
* add cuda_device_functions.h
* move reduceSum to elementwise_op_function.h
-