- 15 Nov, 2021 (7 commits)
- Committed by arlesniak: * Added BF16 to mean op * fix for CI * fix for CI * fix for CI
- Committed by Weilong Wu: * Add elementwise_mul triple grad kernel * Removed InplaceInferer and polished code
- Committed by Zeng Jinle: * add split_program * make ut faster * increase ut timeout * make result deterministic * add fuse_all_reduce pass * add ut framework, update * fix ut framework * remove useless code * add coverage support * update * fix CI * fix some bugs and fix ci coverage * fix conflict
- Committed by zyfncg
- Committed by jiangcheng
- Committed by Liu-xiandong: * modify sparse_attention docs, test=develop * add warning * add warning, test=document_fix
- Committed by zmx: * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop
 
- 12 Nov, 2021 (7 commits)
- Committed by zhangkaihuo: * fix bug: 1. atten: set the default value of attn_dropout_rate to None 2. ffn: add activation parameter
- Committed by Chen Weihang
- Committed by Yuang Liu
- Committed by Leo Chen: * split declaration and implementation * remove initdevices * refine VariableMetaInfo * add ut * fix compile
- Committed by Fan Zhang: [CPU-PSLIB] Fix bug for consistency inspection of op's embedding name and sparse table name in config_fleet.py (#36753) * [CPU-PSLIB] Fix bug for consistency inspection of op's embedding name and sparse table name in config_fleet.py * [CPU-PSLIB] Fix bug for consistency inspection of op's embedding name and sparse table name in config_fleet.py
- Committed by Aganlengzi
- Committed by zhaoyingli: * add AutoConvert * add unittest * amend merge&slice * amend default dist_attr * update doc&improve coverage * add interface dist_context * tiny modify
 
- 11 Nov, 2021 (11 commits)
- Committed by zhouweiwei2014
- Committed by Weilong Wu: * Add default arg to enhance varbase ClearGradient func * Removed default arg, use a Flag to enhance varbase ClearGradient func * Renamed Flags to FLAGS_real_release * Use default arg to enhance varbase ClearGradient func and expose two func to set/get gradient isEmpty * Removed DECLARE_bool statement * Polished Code
- Committed by TTerror: * add where/where_index/masked_select for kunlun * fix where/where_index * update where/masked_select
- Committed by jakpiase: * added softplus + activation fuse pass * minor change * implemented reviewer suggestion * minor fix * minor fix * added scale_out parameter * minor fix * fix for iScan CI * conditionally disabled logs * refactored pass builder
- Committed by xiayanming: * fleet support elastic train * fleet support elastic train * support elastic * add unittest * fix unittest bug * fix unittest bug * fix unittest bug * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix elastic bug * fix ci fail * fix ci fail * fix elastic bug * fix elastic bug * fix joint debugging bug * fix joint debugging bug * fix windows ci failed * fix windows ci failed
- Committed by zmx: * change username * fix * fix * fix * fix * fix * update * update * update unittests * fix * update * fix * update * fix * fix * fix * update * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update send_and_recv op. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix unit. notest,test=coverage * fix ut. notest, test=coverage * update. notest,test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix. notest, test=coverage * fix. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * add func. notest, test=coverage * fix ut. notest, test=coverage * fix. test=develop * fix. test=develop
- Committed by Weilong Wu: * Expose func for varbase * Expose func for varbase and enhance varbase init func * Change func name and add test case for _CopyGradientWith * Rename func * Add test cases to increase coverage * Refine the logic of _to func * Replace numel() with _numel(), Add test code
- Committed by LiYuRio
- Committed by Wilber
- Committed by wanghuancoder: * fix 2 bug: 1.skip lodtensorarray; 2.delete feed op, test=develop * program clone, test=develop
- Committed by Nyakku Shigure: * add wide resnet * update pretrained weights link
 
- 10 Nov, 2021 (6 commits)
- Committed by jakpiase: * added stack oneDNN FP32 op * minor change * CI fix * added skipping for gpus * fix for stack op * CI fix * CI fix * Added comment * CI fix
- Committed by Aurelius84
- Committed by Huihuang Zheng: Add libcinnapi.so to setup.py.in
- Committed by Li Min: att, bug fix
- Committed by baoachun
- Committed by Jack Zhou: * fix rnn grad bug when num_layers is set 2 and dropout_prob is set 0 * add more test for rnn
 
- 09 Nov, 2021 (5 commits)
- Committed by zhangbo9674: * refine layer to * delete comment * refine logic * refine code * refine pure_fp16_init * refine comment
- Committed by Aurelius84
- Committed by wanghuancoder: * delete profiler.cuda_profiler, test=develop * delete nvprof, test=develop * add required: gpu, test=develop * remove cuda_profiler, test=develop
- Committed by Zeng Jinle: * try to fix CUDA Graph H2D copy bug * remove useless code * fix ci * fix ROCM CI * fix CUDA_VERSION * improve CI coverage
- Committed by TTerror
 
- 08 Nov, 2021 (4 commits)
- Committed by wanghuancoder: * Use cuda virtual memory management and merge blocks, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * window dll, test=develop * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop * use autogrowthv2 for system allocator, test=develop * remove ~CUDAVirtualMemAllocator(), test=develop * refine, test=develop * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop * fix bug, test=develop * revert system allocator, test=develop * revert multiprocessing, test=develop * fix AutoGrowthBestFitAllocatorV2 mutex, test=develop * catch cudaErrorInitializationError when create allocator, test=develop * fix cuMemSetAccess use, test=develop * refine cuda api use, test=develop * refine, test=develop * for test, test=develop * for test, test=develop * switch to v2, test=develop * refine virtual allocator, test=develop * Record cuMemCreate and cuMemRelease, test=develop * refine, test=develop * avoid out of bounds, test=develop * rename allocator, test=develop * refine, test=develop * use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop * for test,test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop
- Committed by Li Min: fused_attention_op currently does not support attn_mask=None as input; this PR adds that support, together with the corresponding unit-test logic.
- Committed by Wilber
- Committed by kuizhiqing
 
 
