- 24 Aug, 2021 4 commits
-
-
Committed by Jacek Czaja
* - concat refactoring draft * - compilation fixes * - yet another compilation fix * - fix * - compilation fix * - fixes to compilation * - another compilation fix * - fix * - Added overloaded AcquirePrimitiveDesc for concat * - fix * - reserve introduced * - UT fixes * - test concat int8 improved * - fixes * - fix to crash * - lint fixes * - fixes after review * - some other fixes from review
-
Committed by ronnywang
* add conv_op_npu and test * add more tests * clean headers & support fp16 * update
-
Committed by ronnywang
* add pool2d_op_npu and test * update * update pool2d_backward_navie * clean headers
-
Committed by Yulong Ao
* add auto_parallel dir * mv to paddle.distributed * add shard_xx api * add distributed attrs for var * add ut, test=develop * add dist * update * update * update * update * update * update, test=develop * update, test=develop * update, test=develop * update, test=develop * update, test=develop * update, test=develop * update, test=develop * update * update * update * update * update * update, test=develop * update, test=develop * update * update * delete unused proto * restore op_desc * restore type_defs * update var_desc * remove dimss_mapping for proto_pybind * update interface.py * update framework.py * update * update * add auto_parallel dir * mv to paddle.distributed * add shard_xx api * add distributed attrs for var * add ut, test=develop * [WIP] Add the auto completion feature and related codes * [WIP] Improve the auto completion and related codes * [WIP] Make the auto completion to support data-parallel * [WIP] Make the completion support mp and dp+mp * [WIP] Refactor auto completion unit test for MLP * [WIP] Refactor the implementation of DistributedOperatorImpl * [WIP] Improve dims_mapping update rule and fix a bug * [WIP] Support auto completion for one transformer decoder layer * [WIP] Add a minor change * [WIP] Fix a bug within the unit test * Shard XShape tensor, add embedding completion and refactor code * Add the distributed_operators dir to setup.py.in * Improve the completion process and add the unittest for gpt * fix process_mesh ut * fix process_mesh ut * update * update, test=develop * Add support for automatically completing distributed attrs of special ops * update * update * update * fix doc sample codes, test=develop * improve coverage, test=develop * add static_mode check, test=develop * Model the cluster for cost model and physical mapping * update, test=develop * add set_placement, test=develop * Add the check to make sure the candidate tensors' size is greater than zero * update doc, test=develop * update doc, test=develop * update doc, test=develop * update doc, test=develop * update, test=develop * Auto mark dist attrs annotated by user * update ndarray to nested list, test=develop * update, test=develop * Add auto-completion module for auto-parallel (based on PR#33804) * Remove unnecessary files * Remove unrelated files for the auto completion pr * Update the unit test to improve the coverage * Modify codes based on reviews * Minor changes for CI * Improve some codes based on new comments * Fix bugs caused by shallow copy in attributes.py * Improve amend_distributed_attr_for_program in context.py * Other changes for weihang's comments Co-authored-by: sandyhouse <lilong12@baidu.com>
-
- 23 Aug, 2021 8 commits
-
-
Committed by Bo Liu
-
Committed by Wilber
-
Committed by Yuang Liu
-
Committed by zyfncg
* Support getitem by Bool index * delete some debug info of bool index * support the case that the shape of bool index is different from indexed tensor
-
Committed by pangyoki
-
Committed by pangyoki
-
Committed by zhaoyingli
* adamw support cuda * adamw support cuda
-
Committed by Linjie Chen
* Add cuda device count api * update code format * fix unittest error * update code format * update comment
-
- 20 Aug, 2021 8 commits
-
-
Committed by Hao Lin
-
Committed by Yuang Liu
-
Committed by lzzyzlbb
* add rmsprop npu * add argsort npu * add argsort npu * modify according to review * modify sharedatawith according to review * modify reshape according to review * rm dygraph=false
-
Committed by Sing_chan
* [NPU] Support npu kernel for pad3d op * fix for comment of zhouwei25 * fix some bugs according to qili93's comments * add support and test for paddings in input * delete VLOG used for debug
-
Committed by zhaoyingli
* add depthwise_conv2d npu * add some tests * Delete test_unique_op_npu.py * delete trans input
-
Committed by zhaoyingli
* [NPU] Support npu op where and where grad * fix use const_cast * delete a test
-
Committed by JYChen
* add (N,C,*) input support for GroupNorm * --amend
-
Committed by shangliang Xu
-
- 19 Aug, 2021 4 commits
-
-
Committed by JingZhuangzhuang
* add npu sin op * [NPU] Support npu kernel for sin op * modify support npu kernel for sin op * modify support npu kernel for sin op * modify npu sin op * modify npu sin op * add sin op npu
-
Committed by parap1uie-s
-
Committed by lilong12
-
Committed by 王明冬
-
- 18 Aug, 2021 13 commits
-
-
Committed by lzzyzlbb
* [npu]add rmsprop op
-
Committed by xiongkun
* Add NPU kernel for norm Op: float16 and float32 * fix code for code review * fix for code review * add type for paddle_throw * remove unnecessary header file. Add more testcases * remove a broadcast
-
Committed by littletomatodonkey
* fix pad outliers err * fix pad api input type and doc * fix example of pad * add unittest for pad3d * fix unittest * fix error format * fix pad doc
-
Committed by wanghuancoder
* code refactoring, test=develop * refine, test=develop * refine, test=develop * refine, test=develop
-
Committed by Jackwaterveg
* test=develop * test=develop
-
Committed by Jackwaterveg
* test=develop * test=develop
-
Committed by WangXi
[Hybrid Performance] Move the AMP cast op, which casts fp32 params to fp16 params, to the optimizer (#34965)
-
Committed by Fan Zhang
[CPU-PSLIB] Add consistency inspection of use_var_list and data_generator data, test=develop (#34463)
-
Committed by XGZhang
-
Committed by Zhanlue Yang
* Add function to disable paddle signal handler. Paddle uses google::InstallFaultSignalHandler to handle selected system signals, mainly for debugging and bug-report purposes. However, this can conflict with other Python packages that capture similar signals, such as tvm. To resolve this issue, we provide a function to disable the signal handler * Remove signal test from WIN32 platform * Remove redundant return from disable_signal_handler() function * Add detailed messages to en_doc
-
Committed by WangXi
-
Committed by Guoxia Wang
* support class center sample of PartialFC
-
Committed by Wangzheee
* unitest_quant_dequant * fix * fix * deleted: test_trt_quant_conv2d_dequant_fuse_pass.py * fix
-
- 17 Aug, 2021 3 commits
-
-
Committed by Roc
-
Committed by Aganlengzi
-
Committed by WeiXin
* polish unittest. * polish code * polish code
-