- 07 Nov, 2018 1 commit
Committed by chengduo
* add fp16 backward support test=develop
* add sum_op fp16 test
* disable test_dist_save_load test=develop
* add check_grad for sum
* add unit test for softmax_grad fp16 test=develop
* add scale_op unit test
* add mul_grad_op unit test for fp16
* add cross_entropy_grad and eman_grad unit test for fp16 test=develop
* fix cross_entropy unit test
* add pool2d fp16 unit test
* refine conv2d fp16 unit test test=develop
* refine activation unit test test=develop
* fix ci test=develop
* follow zhihong's comment, copy from https://github.com/PaddlePaddle/Paddle/pull/12796 test=develop
- 14 Oct, 2018 1 commit
Committed by wanghaoshuang
- 20 Sep, 2018 1 commit
Committed by chengduo
* Add Preface
* Add demo code
* Save file
* Refine code
* seems to work
* use elementwise strategy
* Use ElementwiseComputeEx
* Add comments
* extract functions from operator
* Refine code
* Follow comment
* code refine
* add op_fuse pass
* add backward
* code refine
* use TopologySortOperations
* follow comments
* refine IsFusible
* code enhance
* fix op_fusion_pass
* refine code
* refine fuse_elemwise_act_op
* adjust the input and output
* refine logic
* add intermediate_edge
* disable inplace
* follow comments
* refine logic
* follow comments
* Remove the removable IntermediateOut
* change strategy
* code refine
* enable fuse backward
* code refine
* code refine
* rename unit test
* follow comments
- 12 Sep, 2018 1 commit
Committed by dzhwinter
- 03 Sep, 2018 1 commit
Committed by dzhwinter
- 30 Aug, 2018 1 commit
Committed by chengduo
* Enhance the function of fused_elementwise_activation_op
* enhance unit test
* Clean Code And Add Doc
* Add compound functors
* Fix doc and enhance unit test
* define Dx and Dy for d_binary_func
* add mul_scale
* add mul_scale
* add elementwise_mul
* code refine
* code refine
* add doc
* add AsIntermediate
- 27 Aug, 2018 1 commit
Committed by dzhwinter
- 20 Aug, 2018 1 commit
Committed by tensor-tang
- 17 Aug, 2018 1 commit
- 16 Aug, 2018 1 commit
Committed by dzhwinter
* "cherry picked operators changes"
* "remove duplicated code"
* "add constant setter"
* "add get expected kernel"
* "fix ci"
* "add fill constant"
- 10 Aug, 2018 1 commit
Committed by dzhwinter
- 01 Aug, 2018 1 commit
Committed by dzhwinter
* "add gradient register"
* "make some enhance"
* "better format"
* "fix typo"
* "fix reuse"
* "fix get expected kernel"
* "change the mkldnn code"
* "fix mkldnn"
* "fix mkldnn failed test"
* "add comment"
- 03 May, 2018 1 commit
Committed by chengduo
* fix __shfl_down_sync_ of cross_entropy
* use reduceSum
* "fix ci"
- 30 Apr, 2018 1 commit
Committed by dzhwinter
* "re-commit"
* "picked up"
* "fix ci"
* "fix pdb hang up issue in cuda 9"
- 24 Apr, 2018 1 commit
Committed by chengduoZH
- 10 Apr, 2018 1 commit
Committed by chengduo
* add cuda_device_functions.h
* move reduceSum to elementwise_op_function.h
- 06 Mar, 2018 1 commit
Committed by chengduoZH
- 28 Feb, 2018 1 commit
Committed by xuwei06
When the second argument contains a batch dimension, the axis should be 0. Also makes elementwise ops more tolerant of tensors with trailing singleton dimensions.
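The axis rule in the commit message above can be illustrated with a minimal C++ sketch. It is not PaddlePaddle's implementation: `TrimTrailingOnes` and `BroadcastAdd` are hypothetical names, and the sketch assumes row-major storage and that the second operand's shape, once trailing size-1 dimensions are dropped, must match a contiguous run of the first operand's dimensions starting at `axis`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: drop trailing dimensions of size 1, e.g. {2, 1, 1} -> {2}.
std::vector<std::size_t> TrimTrailingOnes(std::vector<std::size_t> dims) {
  while (dims.size() > 1 && dims.back() == 1) dims.pop_back();
  return dims;
}

// Hypothetical sketch of out = x + y with y broadcast into x.
// Assumption: y's trimmed shape equals x_dims[axis .. axis + rank(y)).
std::vector<float> BroadcastAdd(const std::vector<float>& x,
                                const std::vector<std::size_t>& x_dims,
                                const std::vector<float>& y,
                                const std::vector<std::size_t>& y_dims_in,
                                std::size_t axis) {
  const std::vector<std::size_t> y_dims = TrimTrailingOnes(y_dims_in);
  if (axis + y_dims.size() > x_dims.size()) {
    throw std::invalid_argument("y does not fit into x at this axis");
  }

  // Split x into [pre, n, post]: dims before axis, dims matched by y, dims after.
  std::size_t pre = 1, n = 1, post = 1;
  for (std::size_t i = 0; i < axis; ++i) pre *= x_dims[i];
  for (std::size_t i = 0; i < y_dims.size(); ++i) {
    if (x_dims[axis + i] != y_dims[i]) {
      throw std::invalid_argument("shape mismatch between x and y");
    }
    n *= y_dims[i];
  }
  for (std::size_t i = axis + y_dims.size(); i < x_dims.size(); ++i) {
    post *= x_dims[i];
  }
  if (x.size() != pre * n * post || y.size() != n) {
    throw std::invalid_argument("buffer size does not match shape");
  }

  std::vector<float> out(x.size());
  for (std::size_t i = 0; i < pre; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      for (std::size_t k = 0; k < post; ++k) {
        const std::size_t idx = (i * n + j) * post + k;
        out[idx] = x[idx] + y[j];  // y[j] is reused across pre and post
      }
    }
  }
  return out;
}
```

For example, adding a `y` of shape `{2, 1}` to an `x` of shape `{2, 3}` with `axis = 0` adds `y[0]` to every element of the first row and `y[1]` to every element of the second, because the trailing size-1 dimension of `y` is ignored before matching.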
- 26 Feb, 2018 1 commit
Committed by chengduoZH
- 24 Feb, 2018 2 commits
Committed by chengduoZH
Committed by chengduoZH
- 23 Feb, 2018 2 commits
- 13 Feb, 2018 1 commit
Committed by xuwei06
And some minor fixes on comments.
- 12 Feb, 2018 1 commit
Committed by qingqing01
- 10 Feb, 2018 2 commits
- 03 Feb, 2018 1 commit
Committed by chengduoZH
- 02 Feb, 2018 1 commit
Committed by chengduoZH
- 22 Jan, 2018 1 commit
Committed by Yang Yu
- 19 Jan, 2018 1 commit
Committed by Yang Yu
- 17 Jan, 2018 1 commit
Committed by fengjiayi
- 16 Jan, 2018 1 commit
Committed by fengjiayi
- 15 Jan, 2018 1 commit
Committed by fengjiayi
- 26 Dec, 2017 1 commit
Committed by Luo Tao
- 25 Dec, 2017 1 commit
Committed by chengduoZH
- 19 Dec, 2017 2 commits
Committed by chengduoZH
Committed by chengduoZH
- 16 Dec, 2017 1 commit
Committed by chengduoZH
- 12 Dec, 2017 1 commit
Committed by QI JUN
The main changes are:
  - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
  - remove the `eigen_device` interface in base class `DeviceContext`
  - remove the `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
  - remove unused `platform::EigenDeviceConverter`
  - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
  - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
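The first change in the list above, templating functors and kernels on a device context type rather than on a place, can be pictured with a small C++ sketch. Everything here is a stand-in invented for illustration: `ToyCPUContext`, `ScaleFunctor`, and `ScaleKernel` are hypothetical names, not PaddlePaddle classes, and the real signatures may differ.

```cpp
#include <vector>

// Hypothetical stand-in for a device context. A real context would own
// device resources (streams, handles); this one is deliberately empty.
struct ToyCPUContext {};

// Primary template: the functor is selected by the context type.
template <typename DeviceContext, typename T>
struct ScaleFunctor;

// Specialization for the toy CPU context.
template <typename T>
struct ScaleFunctor<ToyCPUContext, T> {
  void operator()(const ToyCPUContext& ctx, T factor,
                  std::vector<T>* data) const {
    (void)ctx;  // unused in this sketch
    for (T& v : *data) v *= factor;
  }
};

// A kernel templated on the same context type forwards the context it was
// given straight to the functor, so no place-to-device conversion is needed.
template <typename DeviceContext, typename T>
class ScaleKernel {
 public:
  void Compute(const DeviceContext& ctx, T factor,
               std::vector<T>* data) const {
    ScaleFunctor<DeviceContext, T>()(ctx, factor, data);
  }
};
```

Under these assumptions, supporting another device means adding one more context type and one more functor specialization, while the kernel template stays unchanged.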