- 31 Aug 2021 (2 commits)

Committed by Aurelius84
* polish code
* fix unittest on Windows
* refine pybind interface
* support statistic MemSize of AllocatorPool
* Replace mutex with atomic

Committed by 王明冬

- 30 Aug 2021 (3 commits)

Committed by chentianyu03

Committed by ceci3
* update ernie int8

Committed by Aurelius84
* Abstract GenerateDeviceEventFlag to shield platforms
* Remove get_cuda_flags

- 27 Aug 2021 (2 commits)

Committed by joanna.wozna.intel
* Add calculation for gru op
* Correct the types
* Remove mkldnn only
* Correct mkldnn ifdef
* Remove mkldnn ifdef
* Separate mkldnn quantizer test
* Correct Windows test
* Check different cmake fix
* Revert cmake change
* Cmake change 2
* Cmake change 3

Committed by Aurelius84
* add CPUDeviceEvent
* Polish DeviceEvent code
* Add DEVICE_EVENT_LIBS

- 26 Aug 2021 (5 commits)

Committed by wanghuancoder
* gc for newexecutor, test=develop
* refine, test=develop
* add interpretercore_gc_helper.h, test=develop
* backup
* gc with thread and device_event, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* fix bug, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* add CheckGC, test=develop

Committed by Aurelius84
* Modify into QueueSync QueueAsync
* fix compile on macOS
* fix pointer
* fix conflict
* polish unittest
* fix Windows fetch error
* polish code according to reviewer
* fix device_guard on CPU place

Committed by Wilber

Committed by liutiexing

Committed by XGZhang

- 25 Aug 2021 (2 commits)

Committed by wanghuancoder
* fix CMakeLists for new executor, test=develop
* refine, test=develop
* refine, test=develop

Committed by liutiexing

- 24 Aug 2021 (5 commits)

Committed by wanghuancoder
* add fetch, test=develop
* fix fetch2op, test=develop
* fix fetch2op, test=develop
* refine, test=develop
* fix fetch ctx, test=develop
* add wait, test=develop
* rename fetch2 to fetch_v2, test=develop
* merge, test=develop

Committed by wanghuancoder

Committed by 王明冬

Committed by Zeng Jinle

Committed by Yulong Ao
* add auto_parallel dir
* mv to paddle.distributed
* add shard_xx api
* add distributed attrs for var
* add ut, test=develop
* add dist
* update
* update
* update
* update
* update
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update
* update
* update
* update
* update
* update, test=develop
* update, test=develop
* update
* update
* delete unused proto
* restore op_desc
* restore type_defs
* update var_desc
* remove dimss_mapping for proto_pybind
* update interface.py
* update framework.py
* update
* update
* add auto_parallel dir
* mv to paddle.distributed
* add shard_xx api
* add distributed attrs for var
* add ut, test=develop
* [WIP] Add the auto completion feature and related codes
* [WIP] Improve the auto completion and related codes
* [WIP] Make the auto completion to support data-parallel
* [WIP] Make the completion support mp and dp+mp
* [WIP] Refactor auto completion unit test for MLP
* [WIP] Refactor the implementation of DistributedOperatorImpl
* [WIP] Improve dims_mapping update rule and fix a bug
* [WIP] Support auto completion for one transformer decoder layer
* [WIP] Add a minor change
* [WIP] Fix a bug within the unit test
* Shard XShape tensor, add embedding completion and refactor code
* Add the distributed_operators dir to setup.py.in
* Improve the completion process and add the unittest for gpt
* fix process_mesh ut
* fix process_mesh ut
* update
* update, test=develop
* Add support for automatically completing distributed attrs of special ops
* update
* update
* update
* fix doc sample codes, test=develop
* improve coverage, test=develop
* add static_mode check, test=develop
* Model the cluster for cost model and physical mapping
* update, test=develop
* add set_placement, test=develop
* Add the check to make sure the candidate tensors' size is greater than zero
* update doc, test=develop
* update doc, test=develop
* update doc, test=develop
* update doc, test=develop
* update, test=develop
* Auto mark dist attrs annotated by user
* update ndarray to nested list, test=develop
* update, test=develop
* Add auto-completion module for auto-parallel (based on PR#33804)
* Remove unnecessary files
* Remove unrelated files for the auto completion pr
* Update the unit test to improve the coverage
* Modify codes based on reviews
* Minor changes for CI
* Improve some codes based on new comments
* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments

Co-authored-by: sandyhouse <lilong12@baidu.com>

- 23 Aug 2021 (1 commit)

Committed by Wilber

- 20 Aug 2021 (2 commits)

Committed by Yuang Liu

Committed by wangguanqun
* add trainer desc config to distributed strategy
* code style modified
* data_feed set lod

- 18 Aug 2021 (3 commits)

Committed by wanghuancoder
* code refactoring, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop

Committed by WangXi
[Hybrid Performance] Move the cast op of AMP, which casts fp32 params to fp16 params, to the optimizer (#34965)

Committed by Chen Weihang
* fix ext_tensor.cast failed bug
* remove useless deps
* fix Windows cmake failed
* try to fix Windows make failed
* fix make error on Windows

- 17 Aug 2021 (2 commits)

Committed by chentianyu03
* copy boost optional.hpp to paddle
* copy boost optional.hpp to paddle
* move directories
* del fluid/utils
* modify .hpp to .h
* move directories
* modify to paddle::optional
* add modification description
* format code style for the files in paddle/utils
* format code style

Committed by Zeng Jinle
* add inplace passes and tests
* update
* fix use_cuda undefined; fix compile error of op compat
* add more ut
* fix CPU CI error
* check adam unique
* fix mac/windows ci, improve coverage
* fix ci error
* follow weihang's comment
* fix BlockDesc::MoveFrom
* follow qiuliang's comment
* update
* follow huihuang's comments

- 16 Aug 2021 (2 commits)

Committed by Fan Zhang

Committed by joanna.wozna.intel
* Remove force_fp32_output from elementwise_add quantization
* Fix cpu_quantize_placement test
* Review related changes

- 13 Aug 2021 (2 commits)

Committed by zyfncg
* Fix a bug: can't load more than one custom op module
* Fix a bug: can't load more than one custom op module
* add test for load multiple modules of custom c++ op
* add config for Coverage CI

Committed by Zeng Jinle

- 11 Aug 2021 (4 commits)

Committed by Wangzheee
* fix_fc_reshape_convert
* fix

Committed by Hao Lin
* Add ext_tensor.slice() API, test=develop
* Call Tensor::mutable_data first to fix bugs and add test for writing to sliced tensor
* Fix unit test bug
* Fix code format problem, test=develop
* Fix code format problem
* Fix code format problem
* strengthen unit test
* Use CustomTensorUtils::ShareDataFrom to simplify codes

Committed by lilong12
* add auto_parallel apis

Committed by hong
* add not used output var to gc_check_list; test=develop
* add useless output to gc check list; test=develop

- 10 Aug 2021 (1 commit)

Committed by chentianyu03
* add any.hpp to utils and replace boost::any with self defined paddle::any
* add copy any.hpp to custom op depends
* modify any.hpp include path
* remove boost from setup.py.in
* add copy any.hpp to custom op depends
* move any.hpp to paddle/utils/ dirs
* move any.h to extension/include directory
* copy utils to right directories

- 06 Aug 2021 (3 commits)

Committed by houj04

Committed by QingshuChen
* support kunlun black list and add kl1 op
* xpu_op_list add device_context dependency

Committed by Qi Li

- 05 Aug 2021 (1 commit)

Committed by hong
* first test version
* add test exec;
* add data transfer; test=develop
* add new exec head;
* add memcpy; test=develop
* add python fetch
* add new test
* add graph node; test=develop
* remove useless new executor test; test=develop
* remove gperf dependency; test=develop
* fix compile bugs; test=develop
* remove useless code; test=develop
* remove useless code; test=develop
* add unit test; test=develop
* polish code; test=develop
* polish code; test=develop
* add interpreter cmakefile; test=develop
* remove useless code; test=develop