- 06 Jan, 2022 1 commit
Committed by YuanRisheng
* move gpu_impl of elementwise kernel
* change copyright to 2022
-
- 05 Jan, 2022 1 commit
Committed by crystal
* add elementwise div
* move mul and div grad functor
* Combine multiple CUDA kernels
* Update the reduce interface call
* add multi-output
* add multi-output div
* add branch judge
* Package branch
* Combine the x and y functions into one
-
- 04 Jan, 2022 1 commit
Committed by YuanRisheng
* change 'math' to 'math_kernel'
* fix compile bugs
* merge develop
* fix compile bugs
* move cpu_impl of elementwise kernel to new directory
-
- 18 Dec, 2021 1 commit
Committed by Feiyu Chan
* add complex op and `paddle.complex`.
-
- 15 Dec, 2021 1 commit
Committed by Yiqun Liu
test=document_fix
-
- 09 Dec, 2021 1 commit
Committed by Chen Weihang
-
- 03 Dec, 2021 1 commit
Committed by ronnywang
* refine structure for cuda and rocm
* update
* update
* update
* update
-
- 12 Nov, 2021 1 commit
Committed by YuanRisheng
* elementwise_add kernel refactor
* fix compile bugs in elementwise_add refactor
* fix compile bugs when run in npu/xpu
* fix bugs when run unit test
* fix bugs when run ci-windows
* modify code as recommended
* code format adjust
* fix bugs when run ci
* fix compile bug when run in ci-windows
-
- 21 Oct, 2021 1 commit
Committed by Jack Zhou
* add viterbi decode cpu kernel
* add viterbi decoder api in paddle.text
* add a data buffer once to avoid creating many small pieces of data buffer frequently
* fix viterbi max_seq_length bug
* fix seq_len=1 bug
* fix device context
* move split out of for loop
* remove INVERSE_SUB
* remove 2 GET_CAST_MASK
* remove 1 loop
* remove Functor
* add to_static deploy code
* use MAX_FUNC instead of ELE_MAX
* add MaxFunctor
* impl max_func
* remove MaxFunctor
* remove cast op
* use REGISTER_OP_WITHOUT_GRADIENT
* add viterbi cuda kernel
* add FIX_BLOCKDIM_CASE macro
* add MKL add, mul; add get data mask
* add arange mkl impl
* add CPU Argmax
* add cpu gather
* use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL
* use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP
* use SAME_DIMS_ELEMENT_BINARY_OP
* add SimpleBroadcastBinaryOP
* use int instead of int64_t to accelerate
* optimize SimpleBroadcastBinaryOP
* optimize SimpleBroadcastBinaryOP
* optimize performance in both single thread and multithread situation
* remove useless line
* remove useless code
* add CREATE_TENSOR_BUFFER macro
* add INIT_REQUIRED_TENSOR macro
* add comment
* fix windows ci
* add viterbi unittest
* remove cuda add functor
* remove cuda equal
* remove a template function
* fix windows ci
* fix windows dtype
* remove some template instance
* remove useless header file
* remove some blockdim
* remove transpose impl
* accelerate cpu performance on single thread situation
* viterbi_decode->crf_decode
* rename crf params name
* add viterbi api test
* remove useless import
* add enable_static
* use viterbi decoder
* fix viterbi len=1
* fix viterbi unittest
* remove useless comments
* reconstruct viterbi decode
* remove ADD,SUB,MUL structure
* fix coverage
* remove CREATE_TENSOR
* add name args
* crf.py->ops.py; with_start_stop_tag->include_start_end_tag
* update crf_decode en docs
* fix viterbi decode en docs
* fix some review comments
* add FIXED_BLOCK_DIM_CASE in cuda
* push_back->emplace_back
* crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag
* paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode
* fix viterbi_decode en docs
-
- 15 Sep, 2021 1 commit
Committed by Yiqun Liu
-
- 14 Sep, 2021 1 commit
Committed by Yiqun Liu
Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35688)
-
- 13 Sep, 2021 2 commits
- 22 Aug, 2021 1 commit
Committed by Zhang Zheng
-
- 05 Jul, 2021 2 commits
- 04 Jun, 2021 1 commit
Committed by limingshu
-
- 02 Jun, 2021 2 commits
- 12 Apr, 2021 1 commit
Committed by ronnywang
* [ROCM] fix test_gru_rnn_op
* [ROCM] fix test_expand_op
* [ROCM] fix test_cross_entropy_loss
* [ROCM] fix test_conv_nn_grad
* [ROCM] fix test_bilinear_tensor_product_op
* [ROCM] fix elementwise_op_function
* [ROCM] fix test_lstm_cudnn_op
* [ROCM] fix test_gpu_package_without_gpu_device
* [ROCM] fix test_gru_unit_op
* [ROCM] fix test_imperative_optimizer
* [ROCM] fix rnn
* [ROCM] fix group_norm_op
* [ROCM] fix test_pool3d_api
* [ROCM] fix test_pool3d_op
-
- 10 Mar, 2021 1 commit
Committed by JamesLim
-
- 03 Mar, 2021 1 commit
Committed by Qi Li
* [ROCM] update fluid elementwise op for rocm (part10), test=develop
* update, test=develop
* address review comments, test=develop
-
- 03 Feb, 2021 1 commit
Committed by wawltor
fix the broadcast for the large second input
-
- 10 Jan, 2021 1 commit
Committed by wangchaochaohu
Reduce the memory occupied by the fused pattern of elementwise_add Op and activation Op (relu Op, for example) (#29885)
-
- 05 Aug, 2020 1 commit
Committed by Zhaolong Xing
test=develop
-
- 16 Jun, 2020 1 commit
Committed by Leo Chen
-
- 12 May, 2020 1 commit
Committed by wawltor
* Remove the error in the elementwise op, use the backup mode to calculate
-
- 11 May, 2020 1 commit
Committed by Chen Weihang
* add new macro BOOST_GET_SAFELY & unittests, test=develop
* add different macro type, test=develop
* fix get macro type in executor, test=develop
* four macro part change backup
* using one macro for all cases, test=develop
* revert attribute change, test=develop
* change to three func to solve gcc4.8 bug, test=develop
* polish some details, test=develop
-
- 13 Apr, 2020 1 commit
Committed by LutaoChu
The following ops get the kernel message enhancement:
paddle.fluid.layers.elementwise_add
paddle.fluid.layers.elementwise_div
paddle.fluid.layers.elementwise_floordiv
paddle.fluid.layers.elementwise_max
paddle.fluid.layers.elementwise_min
paddle.fluid.layers.elementwise_mod
paddle.fluid.layers.elementwise_mul
paddle.fluid.layers.elementwise_pow
paddle.fluid.layers.elementwise_sub
-
- 03 Apr, 2020 2 commits
Committed by zhaoyuchen2018
The elementwise function was used before its definition, which failed in CUDA 8; move it ahead.
-
Committed by zhaoyuchen2018
* improve elementwise performance.
* Add contiguous check, test=develop
-
- 29 Mar, 2020 1 commit
Committed by zhaoyuchen2018
* Improve elementwise performance. Elementwise performance is poor when execution walks into CommonGradBroadcastCUDA; add some new kernels for different data patterns.
* Add some cuda kernel to speedup common broadcast cases. test=develop
* Add more test cases and fix cuda kernel bug. test=develop
* Remove tests as cpu precision fails. test=develop
* Refine SplitDims, test=develop
* Change file mode, test=develop
-
- 25 Mar, 2020 1 commit
Committed by Zeng Jinle
-
- 17 Jan, 2020 1 commit
Committed by qingqing01
-
- 19 Nov, 2019 1 commit
Committed by danleifeng
-
- 10 Oct, 2019 1 commit
Committed by danleifeng
-
- 04 Sep, 2019 1 commit
Committed by danleifeng
elementwise broadcast function enhancement
-
- 20 Aug, 2019 1 commit
Committed by zhaoyuchen2018
For small cases, using a 1D block is better than a 2D block. Refer to this issue: #19275
-
- 14 Jun, 2019 1 commit
Committed by Yiqun Liu
test=develop
-
- 20 May, 2019 1 commit
Committed by lvmengsi
* double backward, elementwise_div
* fix dx empty. test=develop
* bug fix (#17392) fix secure bug
* Enable stack operator for a Ngraph, test=develop (#17406)
* fix sqrt_grad_grad unittest. test=develop (#17410)
* fix sqrt_grad_grad unittest. test=develop
* disable sqrt_grad_grad unittest. test=develop
* test=develop, fix unittest
* test=develop, fix unittest
* test=develop, fix unittest
* test=develop, fix bug
* fix unittest. test=develop
* fix unittest dx. test=develop
* tmp fix! for test... test=develop
* reduce tmp, test=develop
* test=develop, reduce tmp
* fix broadcast unittest. test=develop
* fix format. test=develop
* refine code. test=develop
* refine code. test=develop
* refine GetDoubleGradSafeTensor. test=develop
* fix format. test=develop
-