- 08 Jan, 2021 8 commits
-
-
Committed by liym27
[cherry-pick 2.0] Fix bug: in dynamic mode, if start or end is negative, __getitem__ returns a wrong result (#30003) (#30146) 1. When slice_item is a slice: 1) the start of __getitem__ should be std::max(start, 0); 2) the end of __getitem__ should be std::min(end, dim). 2. When slice_item is an integer, it should be in [-dim_len, dim_len). 3. Fix the error message to use accurate data.
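The bounds rules listed in this commit message can be sketched in plain Python. This is only an illustration of the stated rules, assuming a step of 1 and ordinary Python-style negative indexing; the helper names `normalize_slice` and `normalize_index` are hypothetical and are not taken from the Paddle source.

```python
def normalize_slice(start, end, dim_len):
    """Resolve a step-1 slice [start:end) against an axis of length dim_len."""
    # Negative bounds count from the end of the axis.
    if start < 0:
        start += dim_len
    if end < 0:
        end += dim_len
    # Clamp as the commit message describes: start >= 0 and end <= dim_len.
    start = max(start, 0)
    end = min(end, dim_len)
    return start, end


def normalize_index(index, dim_len):
    """Resolve an integer index that must lie in [-dim_len, dim_len)."""
    if not -dim_len <= index < dim_len:
        raise IndexError(
            f"index {index} is out of bounds for an axis of length {dim_len}"
        )
    return index + dim_len if index < 0 else index
```

For an axis of length 5, `normalize_slice(-10, 100, 5)` clamps both bounds and covers the whole axis, which matches how `list(range(5))[-10:100]` behaves in Python.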
-
Committed by liym27
1. Type of index: int, slice (step must be 1). 2. Type of value: (1) int32, int64, float32, bool; (2) numpy.array (int32, int64, float32, bool) <Note: float64 is not supported>; (3) paddle.Tensor (int32, int64, float32, float64, bool).
-
Committed by Jiaqi Liu
* fix beam search bug * add dygraph unittest * update dynamic_decode argument doc * add warning info for state which has no lengths attribute
-
Committed by Chen Weihang
* simplify prepared op impl to improve performance * fix kunlun compile error * continue fix kunlun compile error * only transform diff place when dtype diff * fix failed unittests * remove useless file * polish impl by review comment
-
Committed by 123malin
* Add Lookahead and ModelAverage Optimizer (#30004) * test=develop, add model_average and lookahead * Improve Index select cuda kernel (#30139) * test=develop, add index_select_cuda kernel
-
Committed by LutaoChu
-
Committed by ceci3
* fix syncbn convert * add unittest
-
Committed by Chen Weihang
* Simplify the options of spawn based on fleetrun (#30144) * Simplify the options of spawn based on fleetrun * polish details * polish doc details * cleanup enum test=develop (#29294) Co-authored-by: gongweibao <weibao.gong@gmail.com>
-
- 07 Jan, 2021 5 commits
-
-
Committed by WeiXin
* Support storage of large parameters (#29988) * Support storage of large parameters * Reduce the complexity of the unittest * Reduce the complexity of the unittest,commented out unittest for * add unittest for static.save/load * Increase the timeout threshold of 'test_static_save_load' * Increase the timeout threshold of 'test_static_save_load' * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load' * Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load' * Extend the timeout for the (#30151)
-
Committed by Leo Chen
* Improve performance of elementwise_add grad op (#29187) * pass stop_gradient for cast op * improve performance of elementwise_add grad * use tensor copy async * dygraph branch * fix dygraph branch * add ut * make gelu fp16 computing more robust (#29484) * Add fast path for dropout when p == 0 (#29553) * add fast path for p == 0 in dropout * add ut
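The "fast path for dropout when p == 0" item can be illustrated with a small pure-Python sketch. It assumes the usual inverted-dropout formulation (kept values scaled by 1 / (1 - p)) and operates on a list of floats; it is not Paddle's kernel, and the function name `dropout` and its `rng` parameter are chosen here for illustration only.

```python
import random


def dropout(values, p, rng=random):
    """Inverted dropout over a list of floats, with a fast path for p == 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p == 0.0:
        # Fast path: nothing is dropped, so skip mask generation and scaling.
        return list(values)
    if p == 1.0:
        # Everything is dropped; avoid dividing by (1 - p) == 0.
        return [0.0 for _ in values]
    scale = 1.0 / (1.0 - p)
    # Keep each element with probability (1 - p) and rescale the survivors.
    return [v * scale if rng.random() >= p else 0.0 for v in values]
```

The point of the fast path is that the common inference-like case `p == 0` returns the input unchanged without drawing any random numbers.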
-
Committed by furnace
* Layer norm fp16 (#29169) * add fp16 for layer_norm op * revert layernorm api * fix forward * fix forward * fix backward for layernorm with fp16 * fix unit test for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U> * fix with_mkldnn compile error for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 Co-authored-by: zhiqiu <chenqiuliang@baidu.com> * fix layer_norm accuracy (#29434) * Layernorm opt (#29522) * layernorm fw opt * layernorm bw opt * fix typo, test=develop * remove const dim3 for windows CI compatibility * merge develop Co-authored-by: zlsh80826 <zlsh80826@gmail.com> * Fix compile problem when cuda_arch < 6000 (#29576) * fix compile problem when cuda_arch < 6000 * refine code * refine code Co-authored-by: zhiqiu <chenqiuliang@baidu.com> Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
-
Committed by tangwei12
Change-Id: Ia5279b0cbb6a5b3970aff66e9510e0d85efa70ce
-
Committed by ceci3
* fix bn docs (#30096) * add attribute for batch_norm (#29950) * add attribute for batch_norm
-
- 06 Jan, 2021 3 commits
-
-
Committed by gongweibao
* fix log test=release/2.0 * fix ut test=develop
-
Committed by huangxu96
* add fp16 check into max and avg pool (#29479) * Add ReserveSpace in dygraph batch_norm. (#29221) * Add ReserveSpace in dygraph batch_norm. * Add unittest for reservespace * add float16 into adaptive_avg_pool2d check list. (#29547)
-
Committed by liym27
4 APIs: array_length, array_read, array_write, create_array; cherry-pick #29565
-
- 05 Jan, 2021 6 commits
-
-
Committed by Thunderbrook
* add topo aware * resource.h * topo aware * format
-
Committed by liym27
* [cherry-pick 2.0] Fix unittest test_slice (#29740) Before this commit, test_slice used the old API `dygraph_to_static_func` for Dynamic-to-Static and used Executor explicitly, which is not recommended to users. After the fix, it uses the recommended API `paddle.jit.to_static` in place of `dygraph_to_static_func`, which won't trigger the random exception on coverage CI. * [cherry-pick 2.0][Dy2Stat] Support grammar: for ele in var[idx] (#29541) Support transforming `for ele in var` statements in which var is a slice of a Tensor. * [cherry-pick 2.0][Dy2Stat] Fix bug for loop: a variable is used and created in loop, but used before created (#29769)
-
Committed by cc
* fix infinite scale values (#29386) * Support dygraph quant model (#29927) * Avoid the scale to be infinity in quant2_int8_mkldnn_pass, test=develop * support quantized model for paddle2.0 dygraph, test=develop Co-authored-by: Wojciech Uss <wojciech.uss@intel.com>
-
Committed by gongweibao
-
Committed by Chen Weihang
Set FLAGS_selected_gpus for spawn. When the child process starts, it inherits the configuration of the main process and sets the FLAGS once, but the environment variable has not been set at that point, so FLAGS_selected_gpus stays the same as in the main process (usually empty). The flags are therefore updated manually here. Note: a unit test was added and then removed; its output showed that nvidia-smi on the CI machine reports only two cards, and more than two cards are needed to test this issue.
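The ordering problem described in this commit message can be mimicked with a toy model. The sketch below uses plain dictionaries in place of real process environments and flag registries; `read_flags` and `start_child` are hypothetical helpers for illustration and do not correspond to functions in Paddle.

```python
def read_flags(env):
    """Snapshot the flag from an environment mapping, as a one-time init would."""
    return {"FLAGS_selected_gpus": env.get("FLAGS_selected_gpus", "")}


def start_child(parent_env, gpu_id):
    """Return (stale_flags, fixed_flags) for a child assigned to gpu_id."""
    child_env = dict(parent_env)  # the child inherits the parent's settings
    stale = read_flags(child_env)  # flags are initialised once, too early
    child_env["FLAGS_selected_gpus"] = str(gpu_id)  # env var is set afterwards
    fixed = dict(stale)
    # Manual update, mirroring the fix the commit message describes.
    fixed["FLAGS_selected_gpus"] = child_env["FLAGS_selected_gpus"]
    return stale, fixed
```

With an empty parent environment, the stale snapshot keeps an empty `FLAGS_selected_gpus`, while the manually updated copy carries the GPU id assigned to the child.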
-
Committed by cc
-
- 04 Jan, 2021 1 commit
-
-
Committed by Zhou Wei
* support deepcopy for Layer/Tensor/ParamBase * fix some code
-
- 31 Dec, 2020 5 commits
-
-
Committed by lilong12
* add distributed.split, test=develop
-
Committed by lilong12
* update, test=develop
-
Committed by lilong12
* update, test=develop (#29559) * Disable gloo by default (#29805) * update, test=develop * update, test=develop
-
Committed by zhupengyang
test=develop
-
Committed by xiaoting
* add alias for upsample, test=develop * add alias for upsample * fix example
-
- 30 Dec, 2020 3 commits
-
-
Committed by wawltor
-
Committed by Chen Long
* fix doc bugs test=document_fix * fix code bugs test=document_fix * fix code bugs test=document_fix * fix doc bugs test=document_fix * fix doc bugs test=document_fix * fix doc bugs test=document_fix
-
Committed by LielinJiang
* fix cv2 rotation
-
- 29 Dec, 2020 6 commits
-
-
Committed by liuyuhui
* [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337) * [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574) * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926) * add bkcl.so in whl for kunlun (#29947) * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29961) Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>
-
Committed by Chen Weihang
* [Complex] Add support for complex grad accumulated (#29889) * add support for complex grad accumulated * add unittest for coverage * update test dtype * remove useless blank line * [Complex] Handle complex to real after type promotion (#29855) * try to add fwd op input dtypes * refactor base impl * return tmp_ins after dygraph prepare data * fix typo found in debug * polish comment & add complex net test * revert detail change * fix unittest failed * add complex kernel condition control * fix xpu test failed & polish comment * polish details by review comments * Complex op test (#29753) * delete no need to calculate inputs in dygraph op_test * delete no need to calculate inputs in dygraph op_test * change grad elementwise_mul for complex types (#29757) * add conj op for complex types * add conj for complex types * add more test case * add conj_op test * modify conj api and impl * add complex type for fill_constant_op xpu * add setConstant for complex type * remove complex conj test file * user define grad for test_conj_op * add test case for static mode of conj api * modify conj doc * change input args name to x * remove useless codes * conj support real types * add conj test case for real number * delete no need to calculate inputs in dygraph op_test * delete no need to calculate inputs in dygraph op_test * modify grad of mul for complex types * fix the grads of inputs args order not match bug * change the grad of div when complex types (#29804) * change the grad of div when complex types * fix the grads of inputs args order not match bug Co-authored-by: Nchentianyu03 <chentianyu03@baidu.com>
-
Committed by Wilber
-
Committed by Thunderbrook
* cherry pick heter ps * CMakeList
-
Committed by LielinJiang
* fix conv_transpose bug when padding=same
-
Committed by XiaoguangHu
* [cherry-pick] cherry-pick of PR#29928 * delete paddle.metric.chunk_eval and paddle.metric.mean_iou * delete paddle.nn.clip and paddle.nn.clip_by_norm * delete paddle.nn.functional.activation.hard_sigmoid and paddle.nn.functional.activation.hard_swish * [cherry-pick] cherry-pick of PR#29928 * fix extension import error
-
- 28 Dec, 2020 2 commits
-
-
Committed by liym27
[Cherry-Pick 2.0][Dy2Stat] 1. Fix bug of for-range stmts. 2. Support that step value is negative in for-range stmts (#29519) (#29874) 1. Fix error in _build_cond_stmt of for-range stmts. 2. Support that step value is negative in for-range stmts 3. Fix code because of the diff between Py2 and Py3
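Supporting a negative step in a for-range statement comes down to choosing the loop condition by the sign of the step. The sketch below shows that idea by rewriting `for i in range(start, end, step)` as an explicit while loop; it is a plain-Python illustration, not the Dy2Stat transformer's `_build_cond_stmt`, and the function name `for_range_as_while` is hypothetical.

```python
def for_range_as_while(start, end, step):
    """Emulate `for i in range(start, end, step)` with an explicit while loop."""
    if step == 0:
        raise ValueError("step must not be zero")
    visited = []
    i = start
    # A positive step counts up towards end; a negative step counts down.
    while (i < end) if step > 0 else (i > end):
        visited.append(i)
        i += step
    return visited
```

A condition fixed to `i < end` would never enter the loop for a call such as `for_range_as_while(5, 0, -2)`, which is why the sign of the step has to be taken into account.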
-
Committed by Huihuang Zheng
* [Dy2stat] Enable jit.save to Save Without Running (#29579) Enable jit.save to Save Without Running. * Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617) Modify CublasHandleHolder from using PADDLE_ENFORCE_CUDA_SUCCESS to PADDLE_RETRY_CUDA_SUCCESS to fix random unittest failure. We checked that the unittest log showed a CUDA allocation error at this file, which may be due to insufficient GPU resources. We fixed a similar failure in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.
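The change from an enforce-style check to a retry-style check can be pictured with a generic retry helper. The sketch below is a Python analogy only: PADDLE_RETRY_CUDA_SUCCESS is a C++ macro whose exact retry count and back-off are not given in this log, so the name `call_with_retry` and its parameters are assumptions made for illustration.

```python
import time


def call_with_retry(fn, max_attempts=3, delay_seconds=0.0, retry_on=(RuntimeError,)):
    """Call fn(), retrying on transient errors up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            time.sleep(delay_seconds)
```

An enforce-style check corresponds to `max_attempts=1`: the first failure is reported immediately, whereas the retry variant gives a transient allocation failure a chance to clear.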
-
- 25 Dec, 2020 1 commit
-
-
Committed by LielinJiang
* update to_tensor en docs
-