- 16 Nov, 2021 1 commit
-
-
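Committed by Li Min
The fused_attention_op implementation uses bias_add, which is itself implemented with kernel primitives. The kernel primitive WriteData API and its internal implementation were later changed so that the out-of-bounds check moved into a template parameter. As a result the wrong branch was called, producing out-of-bounds writes that corrupted unrelated GPU memory. Symptom: a single run of test_fused_attention_op_api.py usually passes, but running it repeatedly in a loop with inputs of different shapes gives incorrect results intermittently, so the bug is hard to notice.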
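The failure mode described above can be illustrated with a minimal host-side C++ sketch. It is not Paddle's kernel primitive code: `WriteBlock`, `CountClobbered`, the block width `N`, and the sentinel padding are all hypothetical names and choices made for illustration. The sketch only assumes what the commit message states, namely that the bounds check is selected by a template parameter, so picking the wrong instantiation for the tail block writes past the valid range.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a block-wise write helper (NOT Paddle's WriteData).
// IsBoundary == true : stop after `remaining` valid elements.
// IsBoundary == false: assume a full block of N elements is valid.
template <typename T, int N, bool IsBoundary>
void WriteBlock(T* dst, const T* src, int remaining) {
  for (int i = 0; i < N; ++i) {
    if (IsBoundary && i >= remaining) {
      return;  // bounds check exists only in the IsBoundary instantiation
    }
    dst[i] = src[i];
  }
}

// Copies `size` elements in blocks of N and returns how many elements
// beyond `size` were overwritten. The buffers are padded by N sentinel
// slots so the illustration itself never touches unowned memory.
template <int N>
int CountClobbered(int size, bool check_tail) {
  const float kSentinel = -1.0f;
  std::vector<float> src(size + N, 1.0f);
  std::vector<float> dst(size + N, kSentinel);

  int offset = 0;
  for (; offset + N <= size; offset += N) {
    // Full blocks never need the bounds check.
    WriteBlock<float, N, false>(dst.data() + offset, src.data() + offset, N);
  }

  const int remaining = size - offset;
  if (remaining > 0) {
    if (check_tail) {
      // Intended branch: the tail block is bounds-checked.
      WriteBlock<float, N, true>(dst.data() + offset, src.data() + offset, remaining);
    } else {
      // Wrong branch: the tail block is written as if it were full.
      WriteBlock<float, N, false>(dst.data() + offset, src.data() + offset, remaining);
    }
  }

  int clobbered = 0;
  for (int i = size; i < size + N; ++i) {
    if (dst[i] != kSentinel) {
      ++clobbered;
    }
  }
  return clobbered;
}
```

Calling `CountClobbered<4>(10, false)` reports overwritten padding slots, while `CountClobbered<4>(8, false)` reports none because the size is a multiple of the block width. This is consistent with the symptom in the commit message: only some shapes trigger the stray write, so a single run can pass.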
-
- 15 Nov, 2021 10 commits
-
-
Committed by Chen Weihang
* move extension into pten [no-verify] * append tensor methods by ext_tensor [no-verify] * append other tensor methods [no-verify] * ext related files tidy [no-verify] * include relation tidy [no-verify] * add pten tensor test [no-verify] * replace tensor in custom op & compile success * refine tensor constructor for unittest * custom relu jit run success * fix all custom op unittests * add inference cmake adapt [no-verify] * fix failed unittests * fix windows failed unittests * try to fix kunlun and inference failed * fix test_elementwise_api error * try to fix win compile failed * fix kunlun fp16 type error * remove useless handle error macro * add custom linear op test * fix compile failed & add win symbols * fix non pten kernel cast failed * add dll decl for api * polish several details * polish details by review comment * add dll_decl for register
-
Committed by baoachun
* remove input dim check of activation in op_teller * remove input dim check of concat in op_teller * remove input dim check of clip in op_teller * remove input dim check of scale in op_teller * remove input dim check in op_teller * update attr check of slice in op_teller
-
Committed by wanghuancoder
* fix 3 bug, test=develop * refine, test=develop
-
Committed by arlesniak
* Added BF16 to mean op * fix for CI * fix for CI * fix for CI
-
Committed by Weilong Wu
* Add elementwise_mul triple grad kernel * Removed InplaceInferer and polished code
-
Committed by Zeng Jinle
* add split_program * make ut faster * increase ut timeout * make result deterministic * add fuse_all_reduce pass * add ut framework, update * fix ut framework * remove useless code * add coverage support * update * fix CI * fix some bugs and fix ci coverage * fix conflict
-
Committed by zyfncg
-
Committed by jiangcheng
-
Committed by Liu-xiandong
* modify sparse_attention docs, test=develop * add warning * add warning ,test=document_fix
-
Committed by zmx
* fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop
-
- 12 Nov, 2021 7 commits
-
-
Committed by zhangkaihuo
* fix bug: 1. atten: set the default value of attn_dropout_rate to None 2. ffn: add activation parameter
-
Committed by Chen Weihang
-
Committed by Yuang Liu
-
Committed by Leo Chen
* split declaration and implementation * remove initdevices * refine VariableMetaInfo * add ut * fix compile
-
Committed by Fan Zhang
[CPU-PSLIB] Fix bug for consistency inspection of op's embedding name and sparse table name in config_fleet.py (#36753) * [CPU-PSLIB] Fix bug for consistency inspection of op's embedding name and sparse table name in config_fleet.py * [CPU-PSLIB] Fix bug for consistency inspection of op's embedding name and sparse table name in config_fleet.py
-
Committed by Aganlengzi
-
Committed by zhaoyingli
* add AutoConvert * add unittest * amend merge&slice * amend default dist_attr * update doc&improve coverage * add interface dist_context * tiny modify
-
- 11 Nov, 2021 11 commits
-
-
Committed by zhouweiwei2014
-
Committed by Weilong Wu
* Add default arg to enhance varbase ClearGradient func * Removed default arg, use a Flag to enhance varbase ClearGradient func * Renamed Flags to FLAGS_real_release * Use default arg to enhance varbase ClearGradient func and expose two func to set/get gradient isEmpty * Removed DECLARE_bool statement * Polished Code
-
Committed by TTerror
* add where/where_index/masked_select for kunlun * fix where/where_index * update where/masked_select
-
Committed by jakpiase
* added softplus + activation fuse pass * minor change * implemented reviewer suggestion * minor fix * minor fix * added scale_out parameter * minor fix * fix for iScan CI * conditionally disabled logs * refactored pass builder
-
Committed by xiayanming
* fleet support elastic train * fleet support elastic train * support elastic * add unittest * fix unittest bug * fix unittest bug * fix unittest bug * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix elastic bug * fix ci fail * fix ci fail * fix elastic bug * fix elastic bug * fix joint debugging bug * fix joint debugging bug * fix windows ci failed * fix windows ci failed
-
Committed by zmx
* change username * fix * fix * fix * fix * fix * update * update * update unittests * fix * update * fix * update * fix * fix * fix * update * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update send_and_recv op. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix unit. notest,test=coverage * fix ut. notest, test=coverage * update. notest,test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix. notest, test=coverage * fix. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * add func. notest, test=coverage * fix ut. notest, test=coverage * fix. test=develop * fix. test=develop
-
Committed by Weilong Wu
* Expose func for varbase * Expose func for varbase and enhance varbase init func * Change func name and add test case for _CopyGradientWith * Rename func * Add test cases to increase coverage * Refine the logic of _to func * Replace numel() with _numel(), Add test code
-
Committed by LiYuRio
-
Committed by Wilber
-
Committed by wanghuancoder
* fix 2 bug: 1.skip lodtensorarray; 2.delete feed op, test=develop * program clone, test=develop
-
Committed by Nyakku Shigure
* add wide resnet * update pretrained weights link
-
- 10 Nov, 2021 5 commits
-
-
Committed by jakpiase
* added stack oneDNN FP32 op * minor change * CI fix * added skipping for gpus * fix for stack op * CI fix * CI fix * Added comment * CI fix
-
Committed by Aurelius84
-
Committed by Li Min
att, bug fix
-
Committed by baoachun
-
Committed by Jack Zhou
* fix rnn grad bug when num_layers is set 2 and dropout_prob is set 0 * add more test for rnn
-
- 09 Nov, 2021 5 commits
-
-
Committed by zhangbo9674
* refine layer to * delete comment * refine logic * refine code * refine pure_fp16_init * refine comment
-
Committed by Aurelius84
-
Committed by wanghuancoder
* delete profiler.cuda_profiler, test=develop * delete nvprof, test=develop * add required: gpu, test=develop * remove cuda_profiler, test=develop
-
Committed by Zeng Jinle
* try to fix CUDA Graph H2D copy bug * remove useless code * fix ci * fix ROCM CI * fix CUDA_VERSION * improve CI coverage
-
Committed by TTerror
-
- 08 Nov, 2021 1 commit
-
-
Committed by wanghuancoder
* Use cuda virtual memory management and merge blocks, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * window dll, test=develop * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop * use autogrowthv2 for system allocator, test=develop * remove ~CUDAVirtualMemAllocator(), test=develop * refine, test=develop * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop * fix bug, test=develop * revert system allocator, test=develop * revert multiprocessing, test=develop * fix AutoGrowthBestFitAllocatorV2 mutex, test=develop * catch cudaErrorInitializationError when create allocator, test=develop * fix cuMemSetAccess use, test=develop * refine cuda api use, test=develop * refine, test=develop * for test, test=develop * for test, test=develop * switch to v2, test=develop * refine virtual allocator, test=develop * Record cuMemCreate and cuMemRelease, test=develop * refine, test=develop * avoid out of bounds, test=develop * rename allocator, test=develop * refine, test=develop * use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop * for test,test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop
-