- 14 Apr, 2021 9 commits
-
-
Committed by Adam Osewski
* Initial draft for SGD BF16 kernel. * Unit tests for SGD with BF16 data type. * Add VLOG message to SGD BF16 op CPU kernel. * Enhance error messages and error types. * Refactor SGD op kernels to leverage some common code. * Make it easier to add new kernel invoke code. * Fix SGD op kernel for sparse grad. * Unify quotes style. * Fix error for ROCM compilation. * Use specialized PADDLE_ENFORCE_xx functions.
-
Committed by Chen Weihang
-
Committed by xingfeng01
-
Committed by Pei Yang
* add check for runtime dynamic shape * add unittest * add lower bound case * adjust timeout of new ut to 120s
-
Committed by Chen Weihang
* add register backward hook method * add leaf grad accumulated test
-
Committed by Qi Li
* [ROCM] fix some typo in cmake, test=develop * [ROCM] fix rccl in paddle build script, test=develop
-
Committed by tianshuo78520a
* Delete grpc.cmake/distribeted/distributed_ops * reset operators/CMakeLists.txt * rm test_transpiler_ops.py * del test_transpiler_ops.py
-
Committed by zhulei
* fix matrix_inverse_op with rocm * fix matrix_inverse_op with rocm * fix matrix_inverse_op with rocm * fix matrix_inverse_op with rocm
-
Committed by xiegegege
-
- 13 Apr, 2021 5 commits
-
-
Committed by Pei Yang
* extend multiclass_nms unittest timeout threshold * adjust timeout to 200s * temporarily disable multiclass_nms trt op teller
-
Committed by YUNSHEN XIE
* fix error for long args * remove unnecessary code
-
Committed by chentianyu03
* add layer.to api * add layer.to api * add layer.to api * add the doc for Layer.to * add input type checking * modify assert and import bug * format code style * format code style * make place support str type * add SetGradVarBase method to set the gradient after conversion * modify argument place to device * modify argument place to device * modify doc of layers.to API * add xpuplace to device argument
-
Committed by Qi Li
-
Committed by jiangcheng
-
- 12 Apr, 2021 4 commits
-
-
Committed by ronnywang
* [ROCM] fix test_gru_rnn_op * [ROCM] fix test_expand_op * [ROCM] fix test_cross_entropy_loss * [ROCM] fix test_conv_nn_grad * [ROCM] fix test_bilinear_tensor_product_op * [ROCM] fix elementwise_op_function * [ROCM] fix test_lstm_cudnn_op * [ROCM] fix test_gpu_package_without_gpu_device * [ROCM] fix test_gru_unit_op * [ROCM] fix test_imperative_optimizer * [ROCM] fix rnn * [ROCM] fix group_norm_op * [ROCM] fix test_pool3d_api * [ROCM] fix test_pool3d_op
-
Committed by limingshu
-
Committed by Leo Chen
-
Committed by TTerror
* fix concat_grad on kunlun * fix concat_grad on kunlun
-
- 10 Apr, 2021 1 commit
-
-
Committed by AshburnLee
-
- 09 Apr, 2021 5 commits
-
-
Committed by niuliling123
* make high precision for avg_pool
-
Committed by Leo Chen
* [feature] support npu allocator (#30840) [feature] support npu allocator * [feature] support npu operator (#30951) [feature] support npu operator * [feature] support npu allocator, part 2 (#30972) * support npu allocator * add npu device context * fix some compile problem * fix some compile problem * add npu info * compile ok * fix include dir * support naive_best_fit_allocator * run ut ok, but failed to exit * call aclrtResetDevice before exit * fix aclFinalize * add system allocator test * add selected_gpus in gtest * add tensor_test for npu * support npu op, initial commit * add npu stream * add elementwise_add_op * compile ok * fix typo * fix elementwise_add_op_npu_test * support op run * test can run but failed * change aclopExecuteV2 to aclopCompileAndExecute * support parsing ascend rank table file (#31000) support parsing ascend rank table file * Fix reshape on GE graph. (#31084) Fix reshape on GE graph * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973) * add npu sub op * fix typo * rename test * fix bug * fix bug * add fp16 kernel * fix typo * support sub grad op * support elementwise_sub_grad op Co-authored-by: frankwhzhang <frankwhzhang@126.com> * Fix compilation problem (#31100) Fix compilation problem (#31100) * fix compile * fix code style * remove const_cast * support adding correct npu op in pybind.h (#31143) * support adding correct npu op in pybind.h * refine code * [NPU] Support executor with NPU (#31057) * [NPU] Support executor with NPU * Fix code according to reviews * Fix code * Add unittest for sub op npu * refactor npu device manager (#31154) refactor npu device manager (#31154) * fix selected npus * fix compile * fix reading flags from env * format Co-authored-by: xiayanming <41795079@qq.com> Co-authored-by: gongweibao <weibao.gong@gmail.com> Co-authored-by: frankwhzhang <frankwhzhang@126.com> Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
-
Committed by Yiqun Liu
-
Committed by Aurelius84
* Remove old custom OP to reduce whl package volume * [Custom OP]Remove old custom OP to reduce whl package volume * support macos
-
Committed by Jacek Czaja
-
- 08 Apr, 2021 3 commits
-
-
Committed by cc
* Support converting the model from fp32 to fp16
-
Committed by Zhen Wang
* Use the runtime to create the unsupported_fp16_list used in AMP. * Add more info about supported ops. * Add some comments for the function of OpSupportedInfos. * Fix the unit test of test_multi_precision_fp16_train.
-
Committed by Thomas Young
-
- 07 Apr, 2021 9 commits
-
-
Committed by danleifeng
* add uint8 type for flatten;test=develop
-
Committed by seemingwang
* graph engine demo * upload unsaved changes * fix dependency error * fix shard_num problem * py client * remove lock and graph-type * add load direct graph * add load direct graph * add load direct graph * batch random_sample * batch_sample_k * fix num_nodes size * batch brpc * batch brpc * add test * add test * add load_nodes; change add_node function * change sample return type to pair * resolve conflict * resolved conflict * resolved conflict * separate server and client * merge pair type * fix * resolved conflict * fixed segment fault; high-level VLOG for load edges and load nodes * random_sample return 0 * rm useless loop * test:load edge * fix ret -1 * test: rm sample * rm sample * random_sample return future * random_sample return int * test fake node * fixed here * memory leak * remove test code * fix return problem * add common_graph_table * random sample node &test & change data-structure from linkedList to vector * add common_graph_table * sample with srand * add node_types * optimize nodes sample * recover test * random sample * destruct weighted sampler * GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * pybind sample nodes api * pull nodes with step * fixed pull_graph_list bug; add test for pull_graph_list by step * add graph table;name * add graph table;name * add pybind * add pybind * add FeatureNode * add FeatureNode * add FeatureNode Serialize * add FeatureNode Serialize * get_feat_node * avoid local rpc * fix get_node_feat * fix get_node_feat * remove log * get_node_feat return py:bytes * merge develop with graph_engine * fix threadpool.h head * fix * fix typo * resolve conflict * fix conflict * recover lost content * fix pybind of FeatureNode * recover cmake * recover tools * resolve conflict * resolve linking problem * code style * change test_server port * fix code problems * remove shard_num config * remove redundant threads * optimize start server * remove logs * fix code problems by reviewers' suggestions * move graph files into a folder * code style change * remove graph operations from base table Co-authored-by: Huang Zhengjie <270018958@qq.com> Co-authored-by: Weiyue Su <weiyue.su@gmail.com> Co-authored-by: suweiyue <suweiyue@baidu.com> Co-authored-by: luobin06 <luobin06@baidu.com> Co-authored-by: liweibin02 <liweibin02@baidu.com> Co-authored-by: tangwei12 <tangwei12@baidu.com>
-
Committed by YUNSHEN XIE
* added ut check on windows,notest,test=windows_ci * debug,notest,test=windows_ci * debug,notest,test=windows_ci * fix bug,notest,test=windows_ci * added ut check * test for new ut add on windows * test,notest,test=windows_ci * fix bug,notest,test=windows_ci * test * test * test * test,notest,test=windows_ci * test,notest,test=windows_ci * check added ut on windows * only fetch upstream develop * modified according to comment * Update run_unittests.sh * Update run_unittests.sh
-
Committed by furnace
-
Committed by zhang wenhui
* Ascend rc (#30483) * Fix compilation on CANN20.1 and older (#30494) Fix compilation on CANN20.1 and older * Add distribution supported (#30578) Add distribution supported * Build parser for Hcom* operators (#30627) Build parser for Hcom* operators * Pass device_ids info from launch to trainer. (#30632) Pass device_ids info from launch to trainer * Add Hccl program group (#30642) Add Hccl program group * Add startup bash files of test_ascend_group. (#30645) Add startup bash files of test_ascend_group * cleanup (#30646) cleanup test_ascend_group.py * [Feature] Build parser to support distributed training (#30658) [Feature] Build parser to support distributed training * fix compilation on ascend-20.1 (#30722) fix compilation on ascend-20.1 * Dev/fix ascend string (#30749) Dev/fix ascend string * code style (#30781) code style * Merge ascend_optimizer and ascend_parser. (#30776) Merge ascend_optimizer and ascend_parser. * Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug (#30797) Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug * Add paddle ascend distribution training supported (#30796) Add paddle ascend distribution training supported * pass cxx_flags to gloo cmake (#30857) * Destroy session first. (#30954) Destroy session first.
* merge * fix, test=develop * fix, test=develop * fix style, test=develop * fix, test=develop * fix * fix log fatal, test=develop * fix enforce style, test=develop * fix, test=develop * fix, test=develop * fix rccl, test=develop * fix test, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix node_num, test=develop * fix ids str, test=develop * fix ids str, test=develop * fix ids str, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix, test=develop * fix style code, test=develop * fix style code, test=develop * fix style code, test=develop * fix style code, test=develop Co-authored-by: hutuxian <hutuxian2011@sina.cn> Co-authored-by: gongweibao <weibao.gong@gmail.com> Co-authored-by: Void Main <voidmain1313113@gmail.com> Co-authored-by: Leo Chen <chenqiuliang@baidu.com> Co-authored-by: dingsiyu <18369187719@163.com> Co-authored-by: OleNet <olenet@126.com>
-
Committed by JZ-LIANG
-
Committed by Ouyang Chao
* improve performance of DepthwiseConv(NWHC)
-
Committed by iducn
* print build summary * print build summary * print build summary * print build summary
-
Committed by tangwei12
* add PullSparseValue for pull sparse * fix bug for PullSparseValue * add test mode in lookuptable * revert API change * add comment for is_training
-
- 06 Apr, 2021 4 commits
-
-
Committed by tianshuo78520a
-
Committed by wuhuanzhou
-
Committed by zlsh80826
* fix yolobox teller condition * fix cuda double free bug
-
Committed by Pei Yang
-