- 25 Oct, 2021 1 commit
Committed by Li Min
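Purpose: this PR aims to improve the computational performance of the attention module. To reduce the framework-level overhead of scheduling individual ops, the PR implements the attention module by hand in C++ and exposes it as a single large attention op. To reduce memory-access overhead, the PR applies two optimizations: (1) when computing q, k and v, the input X is shared, so the gemm, transpose and bias add at that point are reduced from three calls to one; (2) kernel fusion is used so that data is passed between different CUDA kernels through registers.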
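Optimization (1) above can be illustrated with a minimal plain-Python sketch. It is not the PR's C++/CUDA implementation: the helper names (`matmul_bias`, `qkv_separate`, `qkv_fused`) are hypothetical, nested lists stand in for tensors, the transpose step is omitted, and the three projection weights are assumed to have the same number of columns.

```python
# Illustrative sketch only; all names here are hypothetical stand-ins.

def matmul_bias(x, w, b):
    """Return x @ w + b for row-major nested lists (one GEMM plus bias add)."""
    rows, inner, cols = len(x), len(w), len(w[0])
    return [
        [sum(x[i][k] * w[k][j] for k in range(inner)) + b[j] for j in range(cols)]
        for i in range(rows)
    ]

def qkv_separate(x, wq, bq, wk, bk, wv, bv):
    """Baseline: three independent projections of the same input X."""
    return matmul_bias(x, wq, bq), matmul_bias(x, wk, bk), matmul_bias(x, wv, bv)

def qkv_fused(x, wq, bq, wk, bk, wv, bv):
    """Fused: concatenate the weights column-wise, project once, then split."""
    w_all = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]
    b_all = bq + bk + bv
    out = matmul_bias(x, w_all, b_all)  # a single call instead of three
    d = len(wq[0])                      # assumes q, k, v share the same width
    q = [row[:d] for row in out]
    k = [row[d:2 * d] for row in out]
    v = [row[2 * d:] for row in out]
    return q, k, v
```

Both functions return the same values; the fused variant reads X once and issues one projection call, which is the effect the commit message describes.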
- 24 Oct, 2021 1 commit
Committed by Jack Zhou
* add viterbi decode cpu kernel * add viterbi decoder api in paddle.text * add a data buffer once to avoid create many small pieces of data buffer frequently * fix viterbi max_seq_length bug * fix seq_len=1 bug * fix device context * move split out of for loop * remove INVERSE_SUB * remove 2 GET_CAST_MASK * remove 1 loop * remove Functor * add to_static deploy code * use MAX_FUNC instead of ELE_MAX * add MaxFunctor * impl max_func * remove MaxFunctor * remove cast op * use REGISTER_OP_WITHOUT_GRADIENT * add viterbi cuda kernel * add FIX_BLOCKDIM_CASE macro * add MKL add, mul; add get data mask * add arange mkl impl * add CPU Argmax * add cpu gather * use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL * use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP * use SAME_DIMS_ELEMENT_BINARY_OP * add SimpleBroadcastBinaryOP * use int instead of int64_t to accelerate * optimize SimpleBroadcastBinaryOP * optimize SimpleBroadcastBinaryOP * optimize performance in both single thread and multithread situation * remove useless line * remove useless code * add CREATE_TENSOR_BUFFER macro * add INIT_REQUIRED_TENSOR macro * add comment * fix windows ci * add viterbi unittest * remove cuda add functor * remove cuda equal * remove a template function * fix windows ci * fix windows dtype * remove some template instance * remove useless header file * remove some blockdim * remove transpose impl * accelerate cpu performance on single thread situation * viterbi_decode->crf_decode * rename crf params name * add viterbi api test * remove useless import * add enable_static * use viterbi decoder * fix viterbi len=1 * fix viterbi unittest * remove useless comments * reconstruct viterbi decode * remove ADD,SUB,MUL structure * fix coverage * remove CREATE_TENSOR * add name args * crf.py->ops.py; with_start_stop_tag->include_start_end_tag * update crf_decode en docs * fix viterbi decode en docs * fix some review comments * add FIXED_BLOCK_DIM_CASE in cuda * 
push_back->emplace_back * crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag * paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode * fix viterbi_decode en docs
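For readers unfamiliar with the algorithm this commit adds kernels for, here is a minimal plain-Python sketch of textbook Viterbi decoding over emission and transition scores. It is not the Paddle kernel and does not model `paddle.text.viterbi_decode` or its `include_bos_eos_tag` option; the function signature and argument layout are assumptions made for illustration.

```python
# Textbook Viterbi decoding; signature and data layout are assumptions.

def viterbi_decode(emissions, transitions):
    """emissions[t][j]: score of tag j at step t.
    transitions[i][j]: score of moving from tag i to tag j.
    Returns (best_score, best_path)."""
    num_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for j in range(num_tags):
            # Best previous tag for reaching tag j at this step.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            prev_best.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backpointers.append(prev_best)
        score = new_score
    # Backtrack from the best final tag.
    last = max(range(num_tags), key=lambda j: score[j])
    path = [last]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])
    path.reverse()
    return score[last], path
```

A sequence of length 1 skips the loop entirely and returns the highest-scoring emission, which is the edge case the "fix seq_len=1 bug" items in the message refer to.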
- 22 Oct, 2021 1 commit
Committed by niuliling123
* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 * Update the implement of reduceAnyKernel according to kernel primitive api
- 30 Sep, 2021 1 commit
Committed by Guoxia Wang
- 22 Sep, 2021 1 commit
Committed by Yiqun Liu
[Cherry-pick 2.2] Correct the return type of elementwise kernel to avoid many compiling warnings. (#35839) (#35868) Cherry-pick #35839
- 18 Sep, 2021 1 commit
Committed by Jacek Czaja
* - REorder disabling caching * - compilation fix * - another compilation fix * - another compilation fix * - compilation fix * - Fix * - yet another compilation fix * - suppresingly another compilation fix * - lint * - fix after review * - fix
- 15 Sep, 2021 1 commit
Committed by Yiqun Liu
- 14 Sep, 2021 2 commits
- 13 Sep, 2021 2 commits
- 08 Sep, 2021 1 commit
Committed by will-jl944
multiply supports bool
- 07 Sep, 2021 1 commit
Committed by niuliling123
- 06 Sep, 2021 1 commit
Committed by wawltor
* Add the extra flag for the some ops * fix the compile problem in matmul extra
- 03 Sep, 2021 2 commits
- 02 Sep, 2021 1 commit
Committed by wangxinxin08
add axis check for elementwise op while the dimension of x is equal to the dimension of tensor (#35340)
- 31 Aug, 2021 1 commit
Committed by Aganlengzi
- 27 Aug, 2021 1 commit
Committed by baoachun
* add elementwise max grad op for npu * add elementwise max grad op for npu * add elementwise max grad op for npu * add elementwise max grad op for npu * add elementwise max grad op for npu
- 26 Aug, 2021 1 commit
Committed by Jacek Czaja
[oneDNN] disable caching oneDNN primitives in matmul v2, Reduce grad and elementwise_add grad, expand_v2 (#35132) * - grad caching disabled of matmul_v1 - compilation fix - compilation fix * - reduction removed * - Matmul v2 disabled caching * Draft of further changes * - workaround for reducegrad * - fixes to UT * - fix to compilation * - another fix * - fix
- 25 Aug, 2021 2 commits
Committed by ronnywang
Committed by taixiurong
- 22 Aug, 2021 1 commit
Committed by Zhang Zheng
- 16 Aug, 2021 1 commit
Committed by Jacek Czaja
* - Added softmax without caching * - Binary is no longer manually cached * - Activation onednn caching removed * - Removed manual caching of activation * - modified UT * - fix * - fix * - fixes to building * - fix * - fix * - fix to UT * - Faulty UT workaround * - approval workaround * - Fixes after review * - compilation fixes * - more lint fixes * - more fixes after review * - fixes after another round of review * - hopefully compilation fix - compilation fix
- 12 Aug, 2021 1 commit
Committed by Chen Weihang
This reverts commit 0a5c99e8.
- 11 Aug, 2021 2 commits
Committed by Jacek Czaja
* - Added softmax without caching * - Binary is no longer manually cached * - Activation onednn caching removed * - Removed manual caching of activation * - modified UT * - fix * - fix * - fixes to building * - fix * - fix * - fix to UT * - Faulty UT workaround * - approval workaround * - Fixes after review * - compilation fixes * - more lint fixes * - more fixes after review * - fixes after another round of review
Committed by andyjpaddle
- 09 Aug, 2021 1 commit
Committed by ronnywang
* add broadcast supporting for elementwise_add * add broadcast supporting for elementwise_add * add more tests * remove the redundant code * update * fix place error in unittest * remove skip.If
- 05 Aug, 2021 1 commit
Committed by limingshu
- 07 Jul, 2021 1 commit
Committed by taixiurong
- 05 Jul, 2021 2 commits
- 24 Jun, 2021 1 commit
Committed by Jacek Czaja
* - fix to #33282 * - Increased threshold for elementwise_mul_bf16 grad * -disabled faulty UT * - fix to approval
- 23 Jun, 2021 1 commit
Committed by limingshu
- 12 Jun, 2021 1 commit
Committed by limingshu
- 04 Jun, 2021 1 commit
Committed by limingshu
- 02 Jun, 2021 2 commits
- 26 May, 2021 1 commit
Committed by Leo Chen
* refine ~npuOpRunner * implement destructor and forbid copy * use reference to avoid copy * use const reference * relax adam precision * fix top_k
- 25 May, 2021 1 commit
Committed by chentianyu03
* modify complex template for elementwise ops * modify mul, div grad struct * add complex template for CudaShuffleDownSync CudaShuffleXorSync funcs and fix the bug when delete cuda<9000 * fix shuffle func args bug * fix shuffle func args bug * fix shuffle func args bug