- 24 Nov, 2021 5 commits
-
-
Committed by Jiawei Wang
-
Committed by Wangzheee
* matmul_convert_int8 * matmul_convert_int8 * matmulconvert_int8 * Matmul_int8_convert: tensor*tensor * Matmul_int8_convert: tensor*tensor * Matmul_int8_convert: tensor*tensor
-
Committed by zhaoyingli
* adapt auto search * adapt auto search * fix matmulv2 compatible * del debug
-
Committed by Yulong Ao
* [Auto Parallel] Add the unified cluster representation * Add the local id for devices * Add some comments
-
Committed by 0x45f
* run dy2stat pure fp16 in Linear model * no use self._pure_fp16_inputs * add test and fix Adam error in dy2stat pure fp16 training * use paddle.optimizer.Adam * run test in gpu * change test time for CI * enlarge atol for test_resnet_pure_fp16 * refine code and enlarge atol * make custom_white_list and custom_black_list take effect for AMP and pure fp16 * check tracer is not None * use default atol * change filter_size * change atol and add some NOTE
-
- 23 Nov, 2021 8 commits
-
-
Committed by pangyoki
* fix inplace bug * fix custom grad input error * add unittest * fix inplace bug
-
Committed by Li Min
Add support for a bias of None in the fused_attention op.
-
Committed by CtfGo
`paddle.utils.download`: change to call `extractall` on tar/zip compressed files to speed up decompression when the archive contains many files. Decompression time, before vs. after: 1. dataset https://paddlenlp.bj.bcebos.com/datasets/cnn_dailymail/cnn_stories.tgz: 5m50s vs 20s; 2. dataset https://paddlenlp.bj.bcebos.com/datasets/cnn_dailymail/dailymail_stories.tgz: 33m20s vs 47s
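The change described in this commit message can be illustrated with the standard-library `tarfile` module. This is only a minimal sketch, not the actual `paddle.utils.download` code: the per-member loop is an assumed stand-in for the slower earlier behaviour, and every function name below is hypothetical.

```python
import io
import os
import tarfile
import tempfile


def make_demo_archive(archive_path, n_files):
    """Build a small .tgz with n_files tiny text members (demo data only)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for i in range(n_files):
            data = ("story %d\n" % i).encode("utf-8")
            info = tarfile.TarInfo(name="story_%d.txt" % i)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def extract_member_by_member(archive_path, dest_dir):
    """Assumed 'before' pattern: one extract() call per archive member."""
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            tar.extract(member, path=dest_dir)


def extract_all_at_once(archive_path, dest_dir):
    """Pattern named in the commit message: a single extractall() call."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(path=dest_dir)


def compare_on_demo_archive(n_files):
    """Extract the same archive both ways and return the two sorted listings."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "demo.tgz")
        make_demo_archive(archive, n_files)
        dir_a = os.path.join(tmp, "a")
        dir_b = os.path.join(tmp, "b")
        os.makedirs(dir_a)
        os.makedirs(dir_b)
        extract_member_by_member(archive, dir_a)
        extract_all_at_once(archive, dir_b)
        return sorted(os.listdir(dir_a)), sorted(os.listdir(dir_b))
```

Both helpers should produce the same files; the timings quoted in the commit message come from the author's datasets and are not reproduced by this toy archive.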
-
Committed by Leo Chen
* skip compiled program with places > 1 * fix corner case and add ut
-
Committed by Wangzheee
* fix_nearest * fix_nearest * fix_nearest * fix_nearest
-
Committed by ronnywang
* Added HCCL backend support in dynamic graph mode * fix segmentation fault * add ut
-
Committed by Zhanlue Yang
* Bug fix for snapshotting VariableWrapper with initialized tensor but empty allocation * Added unittest for inplace&clear_gradient
-
Committed by Aurelius84
* Add transfer_layout/dtype op * clean useless codes * fix unused var * add optest in white.txt * split into data_transfer.cc * fix cmake * modify according to reviewer comment * replace cast_op with transfer_dtype_op
-
- 22 Nov, 2021 13 commits
-
-
Committed by zhaoyingli
* fix autoconvert * fix merge parameter
-
Committed by andyjpaddle
* add isclose op, test=develop * add isclose op, test=develop * add isclose api, test=develop * rm useless code * rm useless code * update python api of isclose * add some unittest of isclose op, test=develop
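As a rough illustration of what an `isclose` operator computes, the sketch below applies the NumPy-style tolerance rule `|x - y| <= atol + rtol * |y|` element by element. It assumes that rule and those default tolerances; it is plain Python over lists, not the Paddle kernel, and `isclose_elementwise` is a hypothetical name.

```python
import math


def isclose_elementwise(xs, ys, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Return a list of booleans, one per pair, using the assumed rule
    |x - y| <= atol + rtol * |y|."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    result = []
    for x, y in zip(xs, ys):
        if math.isnan(x) or math.isnan(y):
            # NaN only matches NaN, and only when equal_nan is requested.
            result.append(equal_nan and math.isnan(x) and math.isnan(y))
        elif x == y:
            # Covers exact matches, including equal infinities.
            result.append(True)
        else:
            result.append(abs(x - y) <= atol + rtol * abs(y))
    return result
```

Note that the rule is asymmetric in `x` and `y`, because only `|y|` scales the relative tolerance.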
-
Committed by 0x45f
[Dy2stat]Allow users to switch eval/train mode when using @to_static to decorate a function (#37383) * Allow users to switch eval/train mode when using @to_static to decorate a function * refine code for train() and eval()
-
Committed by zhupengyang
-
Committed by zyfncg
* support zero dim for slice op * support zero dim Tensor in set_value op * polish some debug log
-
Committed by zyfncg
-
Committed by Zhanlue Yang
-
Committed by Wilber
* shape api should not backward * fix stop_gradient * update * update doc
-
Committed by Jason
-
Committed by Weilong Wu
* Removed one ENFORCE statement * Changed func name to _share_buffer_to * Improve error reporting information * Updated the logic of _is_share_buffer_to func
-
Committed by Webbley
-
Committed by zmx
* fix api. test=develop * fix api. test=develop
-
Committed by Leo Chen
-
- 19 Nov, 2021 8 commits
-
-
Committed by Weilong Wu
-
Committed by lilong12
-
Committed by zhouweiwei2014
* add new API paddle.nn.initializer.Orthogonal and calculate_gain * fix comment * fix comment
-
Committed by wuhuanzhou
* GeneratePass support attr condition and mapping, test=develop * fix coverage, test=develop * Add fuse_resnet_unit pass, test=develop * fix CI errors, test=develop * fix CI errors, test=develop * fix unittest error when compiling without CUDA, test=develop * fix static ci error, test=develop * limit kernel size must equal 1, test=develop
-
Committed by wangguanqun
-
Committed by Siming Dai
* add cpu version, using set: sum, min, max * add cpu version: mean * improve cpu code and fix dynamic memory allocation problem * fix arg error, add index judge, delete fp16 * fix bug in CudaAtomicMax and CudaAtomicMin * add CUDA version * fix grad_op bug for index * add op test, add correct cpu grad op * Add correct CUDA Mean grad * [Add] Successful MEAN and SUM * [Add] Successful MIN and MAX in CPU * [Add] Successful MIN and MAX in CUDA * fix windows dtype ci * fix ROCM ci by adding HIP flag * rename fused_gather_scatter to send_recv * unify name as send and recv * change zero index return time * add send_recv incubate api * fix index data type, add unittest case for API * delete redundant input tensor * fix en example and docs, add default value in pool_type * add shape judge and max grid judge * fix comment * fix index type bug * add const & * fix en docs * delete numpy in examples * add unittest for int input * fix send_recv comment * change send_recv to graph_send_recv
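The gather-then-scatter idea behind this operator (named `fused_gather_scatter`, then `send_recv`, then `graph_send_recv` in the message above) can be sketched in plain Python. This is an illustrative approximation only: the function name, the list-of-rows input, and the choice to fill rows that receive no message with zeros are assumptions, not the operator's confirmed behaviour.

```python
def send_recv_sketch(x, src_index, dst_index, pool_type="sum"):
    """Gather rows x[src] and reduce them into row dst for each (src, dst) pair.

    pool_type is one of "sum", "mean", "min", "max", the reductions listed
    in the commit message. Rows that receive nothing are zero-filled (assumed).
    """
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")
    if pool_type not in ("sum", "mean", "min", "max"):
        raise ValueError("unsupported pool_type: %r" % (pool_type,))

    n_cols = len(x[0]) if x else 0
    # Collect, for every destination row, the list of rows sent to it.
    received = [[] for _ in range(len(x))]
    for src, dst in zip(src_index, dst_index):
        received[dst].append(x[src])

    out = []
    for messages in received:
        if not messages:
            out.append([0] * n_cols)
            continue
        columns = list(zip(*messages))  # reduce column by column
        if pool_type == "sum":
            out.append([sum(col) for col in columns])
        elif pool_type == "mean":
            out.append([sum(col) / len(col) for col in columns])
        elif pool_type == "min":
            out.append([min(col) for col in columns])
        else:
            out.append([max(col) for col in columns])
    return out
```

For example, with `x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]`, `src_index = [0, 1, 2, 0]` and `dst_index = [1, 2, 1, 0]`, the `"sum"` reduction gives `[[0, 2, 3], [2, 8, 10], [1, 4, 5]]`: row 1 receives rows 0 and 2, row 2 receives row 1, and row 0 receives itself.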
-
Committed by Yuang Liu
-
Committed by 0x45f
* support `for i in [1,2,3]` statements in dy2stat * add test case * fix ci * remove wrong code
-
- 18 Nov, 2021 6 commits
-
-
Committed by zmx
* fix pslib. test=develop * add device to train_from_dataset. test=develop * refine fleet.stop_worker. test=develop * fix ut. test=develop * fix ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop
-
Committed by xiayanming
* fleet support elastic train * fleet support elastic train * support elastic * add unittest * fix unittest bug * fix unittest bug * fix unittest bug * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix unittest coverage * fix elastic bug * fix ci fail * fix ci fail * fix elastic bug * fix elastic bug * fix joint debugging bug * fix joint debugging bug * fix windows ci failed * fix windows ci failed * Optimize fleet elastic scale in/out * elastic support pre hook * add prehook unittest
-
Committed by zhangbo9674
-
Committed by Shang Zhizhou
-
Committed by zmx
-
Committed by Yuang Liu
-