- 18 Oct 2021, 1 commit
Submitted by Haohongxiang
* [HybridParallel] Support fp16 in dygraph hybrid parallel
* update
* update
* update for recompute
* add unittest of pp+fp16
* add unittest of recompute+fp16
* update
* modify ut
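A minimal sketch of how fp16 is typically enabled on top of dygraph hybrid parallel after this change; the parallel degrees, the placeholder network, and the use of fleet.distributed_scaler are illustrative assumptions, not the exact code from this commit.

```python
# Hypothetical sketch: fp16 (AMP) training under dygraph hybrid parallel.
# Launch with paddle.distributed.launch; the degrees below are placeholders.
import paddle
from paddle.distributed import fleet

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {"dp_degree": 2, "mp_degree": 2, "pp_degree": 1}
fleet.init(is_collective=True, strategy=strategy)

model = paddle.nn.Linear(1024, 1024)                       # placeholder network
optimizer = paddle.optimizer.AdamW(learning_rate=1e-3, parameters=model.parameters())
model = fleet.distributed_model(model)
optimizer = fleet.distributed_optimizer(optimizer)

scaler = paddle.amp.GradScaler(init_loss_scaling=2.**16)
scaler = fleet.distributed_scaler(scaler)                  # assumed hybrid-parallel-aware scaler wrapper

for _ in range(10):
    x = paddle.randn([16, 1024], dtype='float32')
    with paddle.amp.auto_cast(enable=True):                # run the forward in mixed precision
        loss = model(x).mean()
    scaled = scaler.scale(loss)                            # scale loss to avoid fp16 underflow
    scaled.backward()
    scaler.minimize(optimizer, scaled)                     # unscale, check inf/nan, then step
    optimizer.clear_grad()
```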
- 15 Oct 2021, 1 commit
Submitted by duanboqiang
- 14 Oct 2021, 3 commits
Submitted by duanboqiang
Submitted by ShenLiang
* add no_sync for parameters sync
* add pipeline for moe
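For context, no_sync refers to skipping the per-step gradient all-reduce while gradients accumulate locally. A hedged sketch of that pattern using paddle.DataParallel's no_sync context manager follows; whether this commit's hybrid-parallel/MoE path exposes exactly the same interface is an assumption.

```python
# Sketch: accumulate gradients locally under no_sync, sync on the last micro-batch.
import paddle
import paddle.distributed as dist

dist.init_parallel_env()
model = paddle.DataParallel(paddle.nn.Linear(32, 32))      # placeholder network
opt = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())

x = paddle.randn([8, 32])
with model.no_sync():              # no gradient all-reduce inside this block
    model(x).mean().backward()
model(x).mean().backward()         # outside the block, grads are all-reduced as usual
opt.step()
opt.clear_grad()
```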
Submitted by Yuang Liu
- 13 Oct 2021, 2 commits
Submitted by Guoxia Wang
Submitted by Leo Chen
* refine amp level
* fix typo
* update tracer._amp_level
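For context, the dygraph AMP level is scoped by auto_cast, which sets the tracer's amp level for the enclosed region. A small sketch, assuming the standard public API (the Conv2D layer and shapes are illustrative):

```python
# Sketch: amp level is scoped by auto_cast; O1 runs white-list ops (e.g. conv2d) in fp16.
import paddle

conv = paddle.nn.Conv2D(3, 2, 3)
x = paddle.rand([10, 3, 32, 32])

with paddle.amp.auto_cast(enable=True, level='O1'):
    out = conv(x)
print(out.dtype)    # expected: paddle.float16 inside the O1 autocast region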
- 12 Oct 2021, 1 commit
Submitted by Haohongxiang
* fix calling bug of HybridParallelClipGrad
* fix bugs of HybridParallelClipGrad
* add unittest of pp with HybridParallelClipGrad
* fix bugs in mp_layers.py
* update
* fix bugs in pp_layers.py
* update
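HybridParallelClipGrad is the clip variant that fleet substitutes when a ClipGradByGlobalNorm is attached to the optimizer it wraps. A hedged sketch of that wiring; the degrees and the placeholder network are assumptions.

```python
# Hypothetical sketch: global-norm gradient clipping under dygraph hybrid parallel.
import paddle
from paddle.distributed import fleet

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {"dp_degree": 1, "mp_degree": 2, "pp_degree": 1}
fleet.init(is_collective=True, strategy=strategy)

model = paddle.nn.Linear(256, 256)                         # placeholder network
clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)       # clip grads by their global norm
opt = paddle.optimizer.Momentum(learning_rate=0.01,
                                parameters=model.parameters(),
                                grad_clip=clip)

model = fleet.distributed_model(model)
opt = fleet.distributed_optimizer(opt)   # returned optimizer applies the hybrid-parallel clip
```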
- 11 Oct 2021, 1 commit
Submitted by danleifeng
* heterps: add fuse_allreduce op; test=develop
* add program_mode in minimize for pslib mode; test=develop
- 09 Oct 2021, 1 commit
Submitted by zhaoyingli
* support ClipGradByGlobalNorm in sharding
* support ClipGradByGlobalNorm in sharding
* test=allcase
- 08 Oct 2021, 1 commit
Submitted by yaoxuefeng
- 07 Oct 2021, 1 commit
Submitted by Haohongxiang
* fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer
* update
* update
- 30 Sep 2021, 1 commit
Submitted by 李季
* fix raw optim
* pre-commit test file
Co-authored-by: sneaxiy <sneaxiy@126.com>
- 29 Sep 2021, 1 commit
Submitted by WangXi
- 28 Sep 2021, 1 commit
Submitted by WangXi
- 24 Sep 2021, 2 commits
Submitted by ShenLiang
Submitted by seemingwang
* graph engine demo
* upload unsaved changes
* fix dependency error
* fix shard_num problem
* py client
* remove lock and graph-type
* add load direct graph
* add load direct graph
* add load direct graph
* batch random_sample
* batch_sample_k
* fix num_nodes size
* batch brpc
* batch brpc
* add test
* add test
* add load_nodes; change add_node function
* change sample return type to pair
* resolve conflict
* resolved conflict
* resolved conflict
* separate server and client
* merge pair type
* fix
* resolved conflict
* fixed segment fault; high-level VLOG for load edges and load nodes
* random_sample return 0
* rm useless loop
* test: load edge
* fix ret -1
* test: rm sample
* rm sample
* random_sample return future
* random_sample return int
* test fake node
* fixed here
* memory leak
* remove test code
* fix return problem
* add common_graph_table
* random sample node & test & change data-structure from linkedList to vector
* add common_graph_table
* sample with srand
* add node_types
* optimize nodes sample
* recover test
* random sample
* destruct weighted sampler
* GraphEdgeBlob
* WeightedGraphEdgeBlob to GraphEdgeBlob
* WeightedGraphEdgeBlob to GraphEdgeBlob
* pybind sample nodes api
* pull nodes with step
* fixed pull_graph_list bug; add test for pull_graph_list by step
* add graph table; name
* add graph table; name
* add pybind
* add pybind
* add FeatureNode
* add FeatureNode
* add FeatureNode Serialize
* add FeatureNode Serialize
* get_feat_node
* avoid local rpc
* fix get_node_feat
* fix get_node_feat
* remove log
* get_node_feat return py:bytes
* merge develop with graph_engine
* fix threadpool.h head
* fix
* fix typo
* resolve conflict
* fix conflict
* recover lost content
* fix pybind of FeatureNode
* recover cmake
* recover tools
* resolve conflict
* resolve linking problem
* code style
* change test_server port
* fix code problems
* remove shard_num config
* remove redundant threads
* optimize start server
* remove logs
* fix code problems by reviewers' suggestions
* move graph files into a folder
* code style change
* remove graph operations from base table
* optimize get_feat function of graph engine
* fix long long count problem
* remove redundant graph files
* remove unused shell
* recover dropout_op_pass.h
* fix potential stack overflow when request number is too large & node add & node clear & node remove
* when sample k is larger than neighbor num, return directly
* using random seed generator of paddle to speed up
* fix bug of random sample k
* fix code style
* fix code style
* add remove graph to fleet_py.cc
* fix blocking_queue problem
* fix style
* fix
* recover capacity check
* add remove graph node; add set_feature
* add remove graph node; add set_feature
* add remove graph node; add set_feature
* add remove graph node; add set_feature
* fix distributed op combining problems
* optimize
* remove logs
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>
- 18 Sep 2021, 2 commits
Submitted by WangXi
Submitted by Guoxia Wang
* fix bug
- 17 Sep 2021, 3 commits
Submitted by zhangbo9674
* add pure fp16 major function in auto_cast & tracer
* support master weight in dygraph for pure fp16
* check mix dtype of fp16&fp32 for check_finite_and_unscale op
* change pure fp16 function name
* refine some bug in auto_cast
* refine auto_cast interface logic
* add param _casted_by_pure_fp16 for class Layer
* support state_dict hook for save model by user appointed dtype in pure_fp16_decorator
* refine pure_fp16_decorator as decorator
* add unittest
* add comment
* add comment
* support recompute
* add comment for auto_cast and decorator
* support to_static_state_dict for paddle.jit.save
* remove limit on models num and optimizers num
* add lookup_table in black_list
* fix momentum and layer state_dict
* fix bug in layer state_dict
* fix bug in layer state_dict_helper
* refine unittest
* refine test_momentum_op
* refine interface and some code
* refine amp_decorator interface
* refine pure fp16 interface
* refine master weight interface
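A hedged sketch of the pure-fp16 (O2) path this work describes, written against the public paddle.amp.decorate / auto_cast APIs that the feature fed into; the placeholder model and hyper-parameters are assumptions.

```python
# Sketch: pure fp16 (O2) — decorate casts the model to fp16, the optimizer keeps
# fp32 master weights, and auto_cast(level='O2') runs the forward in fp16.
import paddle

model = paddle.nn.Linear(128, 128)                            # placeholder network
optimizer = paddle.optimizer.Momentum(learning_rate=0.01,
                                      parameters=model.parameters(),
                                      multi_precision=True)   # fp32 master weights

model, optimizer = paddle.amp.decorate(models=model, optimizers=optimizer, level='O2')
scaler = paddle.amp.GradScaler(init_loss_scaling=2.**16)

data = paddle.randn([8, 128], dtype='float32')
with paddle.amp.auto_cast(level='O2'):
    loss = model(data).mean()
scaled = scaler.scale(loss)
scaled.backward()
scaler.minimize(optimizer, scaled)      # unscale, check inf/nan, update master weights
optimizer.clear_grad()
```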
Submitted by Guoxia Wang
Submitted by Guoxia Wang
* add launch doc
- 16 Sep 2021, 2 commits
- 15 Sep 2021, 2 commits
Submitted by Haohongxiang
Submitted by WangXi
- 14 Sep 2021, 2 commits
Submitted by Haohongxiang
* Add solutions to PyLayer which is unsupported in DataParallel
* modify note format for parallel.py
* modify docs of dataparallel
* add docs of dp with pylayer
* modify docs format
* modify example format
* change example of dp with pylayer
* add unittest for dp with pylayer
* modify ut
* merge latest codes
* update
* modify for CI-Coverage
* modify text-indent
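The pattern this entry documents is calling a custom PyLayer from inside the forward of a model wrapped by paddle.DataParallel. A small sketch, assuming the standard PyLayer API; the tanh layer and sizes are illustrative.

```python
# Sketch: a custom PyLayer used inside a DataParallel-wrapped model's forward.
import paddle
import paddle.distributed as dist
from paddle.autograd import PyLayer

class CusTanh(PyLayer):
    @staticmethod
    def forward(ctx, x):
        y = paddle.tanh(x)
        ctx.save_for_backward(y)        # stash activation for the backward pass
        return y

    @staticmethod
    def backward(ctx, dy):
        y, = ctx.saved_tensor()
        return dy * (1 - paddle.square(y))

class SimpleNet(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.linear = paddle.nn.Linear(16, 16)

    def forward(self, x):
        return CusTanh.apply(self.linear(x))   # PyLayer must be invoked inside forward

dist.init_parallel_env()
model = paddle.DataParallel(SimpleNet())
loss = model(paddle.randn([4, 16])).mean()
loss.backward()                                # gradients still synchronize across ranks
```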
Submitted by Zeng Jinle
* fix raw optimizer gm
* update
* update ut
- 13 Sep 2021, 2 commits
Submitted by ShenLiang
* support grad group
* fix single card condition
Submitted by Guoxia Wang
* support hybrid parallel inference helper class
- 10 Sep 2021, 2 commits
- 08 Sep 2021, 2 commits
Submitted by Yulong Ao
* add auto_parallel dir
* mv to paddle.distributed
* add shard_xx api
* add distributed attrs for var
* add ut, test=develop
* add dist
* update
* update
* update
* update
* update
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update
* update
* update
* update
* update
* update, test=develop
* update, test=develop
* update
* update
* delete unused proto
* restore op_desc
* restore type_defs
* update var_desc
* remove dims_mapping for proto_pybind
* update interface.py
* update framework.py
* update
* update
* add auto_parallel dir
* mv to paddle.distributed
* add shard_xx api
* add distributed attrs for var
* add ut, test=develop
* [WIP] Add the auto completion feature and related codes
* [WIP] Improve the auto completion and related codes
* [WIP] Make the auto completion to support data-parallel
* [WIP] Make the completion support mp and dp+mp
* [WIP] Refactor auto completion unit test for MLP
* [WIP] Refactor the implementation of DistributedOperatorImpl
* [WIP] Improve dims_mapping update rule and fix a bug
* [WIP] Support auto completion for one transformer decoder layer
* [WIP] Add a minor change
* [WIP] Fix a bug within the unit test
* Shard XShape tensor, add embedding completion and refactor code
* Add the distributed_operators dir to setup.py.in
* Improve the completion process and add the unittest for gpt
* fix process_mesh ut
* fix process_mesh ut
* update
* update, test=develop
* Add support for automatically completing distributed attrs of special ops
* update
* update
* update
* fix doc sample codes, test=develop
* improve coverage, test=develop
* add static_mode check, test=develop
* Model the cluster for cost model and physical mapping
* update, test=develop
* add set_placement, test=develop
* Add the check to make sure the candidate tensors' size is greater than zero
* update doc, test=develop
* update doc, test=develop
* update doc, test=develop
* update doc, test=develop
* update, test=develop
* Auto mark dist attrs annotated by user
* update ndarray to nested list, test=develop
* update, test=develop
* Add auto-completion module for auto-parallel (based on PR#33804)
* Remove unnecessary files
* Remove unrelated files for the auto completion pr
* Update the unit test to improve the coverage
* Modify codes based on reviews
* Minor changes for CI
* Improve some codes based on new comments
* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments
* support shard reader
* support shard reader
* add parallel mode
* update process mesh
* add method to compute comm_group
* implement dist_embedding forward func
* implement dist matmul forward func
* implement dist reshape forward func
* add transpiler framework
* add transpiler forward
* implement transpiler forward
* implement transpiler backward & update
* add process
* add unittest
* chmod
* chmod
* chmod
* update unittest
* add unittest for gpt
* remove unused print
* rename transpiler --> partitioner
* rename transpiler --> partitioner
* chmod
* chmod
* bug fixed
* remove amp function
* update case for dp mode
* update case for dp mode
* [Auto Parallel] Integrate all parts with the newest code
* Integrate all parts of auto parallel and improve codes
* Integrate all parts by AutoParallelizer
* Add unit test for AutoParallelizer
* Improve auto completion module for pipeline parallel
* Add support for matmul_v2 in dist_matmul
* Correct the typo "stratergy" to "strategy"
* Modify distributed_strategy.proto to conform the main stream
* Restore parts of distributed_strategy to conform the develop branch
Co-authored-by: sandyhouse <lilong12@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Submitted by Zeng Jinle
* add fleet api for program pass
* turn on apply pass for CI test
* fix disable fuse_all_optimizer bug
* try to test ci
* fix CI
* fill unspecified op role
* fix fuse_allreduce
* add ut to improve coverage
* remove useless change
* improve c++ coverage
* follow some comments
* test ir pass pipeline
* update doc
* reduce ut time again
- 01 Sep 2021, 2 commits
- 25 Aug 2021, 1 commit
Submitted by WangXi
- 20 Aug 2021, 1 commit
Submitted by Yuang Liu
- 18 Aug 2021, 2 commits