- 17 Oct, 2022 13 commits
-
-
Committed by Ghost Screaming
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong. * Support allow_partial switch, which can be configured in pipeline_configs. If the tensors sent from different hosts are not the same, they shouldn't be sent partially and then concatenated into a whole tensor. * Change name allow_partial to enable_partial_send_recv. * Add global variable _enable_partial_send_recv
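The commit message does not say why the result goes wrong once `input.numel()` exceeds `INT32_MAX`. One common cause of this class of bug, assumed here purely for illustration, is an element count or index stored in a signed 32-bit integer. The sketch below shows how such a value wraps around; `wrap_to_int32` is a hypothetical helper and is not taken from the repository.

```python
INT32_MAX = 2**31 - 1


def wrap_to_int32(n: int) -> int:
    """Reinterpret an integer as a signed 32-bit two's-complement value."""
    n &= 0xFFFFFFFF  # keep only the low 32 bits
    return n - 2**32 if n >= 2**31 else n


# Hypothetical element count just past the 32-bit limit.
numel = INT32_MAX + 1

# Stored in a 32-bit signed integer, the count becomes negative,
# so any loop bound or offset derived from it would be wrong.
numel_as_int32 = wrap_to_int32(numel)
print(numel, numel_as_int32)
```

Under this assumption, the usual remedy is to carry element counts and offsets in a 64-bit integer type.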
-
Committed by Ghost Screaming
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong. * support pure bfloat16 * support bf16 linear * update PR to pass CI * tiny fix where_grad_kernel.cu * Support bfloat16 type for reducer and sharding. * Fix some bugs. * Polish code. * Polish code. * Add bfloat16 datatype in fill_grad kernels. Co-authored-by: sneaxiy <sneaxiy@126.com>
-
Committed by OccupyMars2025
* Update test_sparse_transpose_op.py * Update test_sparse_transpose_op.py
-
Committed by YuRonan
* init gumbel api * commit: update test file * fix:bug * update Gumbel API * upgrade distribution/gumbel.py * add tests/test_distribution_gumbel.py * fix:code style * fix:code style * fix:code style * fix:code style * fix bug * fix:code style * fix:code style * fix:rollback uniform * fix:delete invalid code * fix:bug and add static test * fix:code style * fix:code style * fix:delete init transforms * fix:bug * fix:bug * fix:code style * fix:code style * fix:add transforms * fix:code style * fix:code style * fix:bug * fix:bug * fix:code style * fix:code style * fix:bug * fix:code style * fix:code style * fix:bug for gumbel.py / add:judge transforms'len for transformed_distribution.py * update gumbel.py * fix:bug for test_distribution_gumbel.py * fix:bug for test_distribution_gumbel_static.py * fix:code style * fix:code style * fix:coverage * fix:bug * fix:bug * fix:code style * fix:bug * delete:no use package for gumbel.py * add:coverage transforms'len judge for test_distribution_gumbel.py * fix:code style for test_distribution_gumbel.py * fix:coverage * fix:code style * fix:code style * fix:code style * fix:code style * fix:code style * fix:en doc * fix:param * fix:copyright * fixSample; test=document_fix Co-authored-by: dasen <sen15530876201@163.com>
-
Committed by OccupyMars2025
* add sparse reshape * change the dtype in all test cases to int64 * just one test case * modify comments * Update test_sparse_reshape_op.py * change the type of "shape" from vector<int64_t> to IntArray * check whether sp_out.to_dense() is the cause of error * print sp_out * Update reshape_kernel.cc * use numpy to generate the equal paddle tensor * just check dense_tensor.numpy() * check cpu and cuda versions * Update test_sparse_reshape_op.py * supply all test cases for cpu forward coo kernel * test forward coo cuda kernel * change configuration of cuda kernel * keep only one test case * test coo cpu kernel (forward and backward) * row major or column major ??? * test cuda coo forward kernel * complete declaration and registration * Update __init__.py * rebuild * retrigger CI * add cudaMalloc and cudaMemcpy in ReshapeCooKernel and change back to row major order in a cuda dense tensor * modify minor error * test only cpu coo forward kernel * add all test cases for coo forward kernel (both cpu and gpu) * test all forward kernels (coo, csr; cpu, gpu) * add all test cases for all kinds of kernels * just retrigger CI * Update sparse_ops.yaml * Update sparse_ops.yaml * Update sparse_ops.yaml * resolve conflicts * Update sparse_ops.yaml * don't specify tensor place * new shape has -1 or 0 in it * Update unary_grad_kernel.h * correct lvalue error * code style * Update sparse_backward.yaml * Update sparse_ops.yaml * Update unary_kernel.h * Update unary.py * Update sparse_backward.yaml * Update unary.py * code style * code style * code style * Update unary.py * specify tensor place explicitly * do not use numpy array * use numpy array in unit test again * modify example code in docstring
-
Committed by Weilong Wu
-
Committed by Wang Bojun
* first version of ln_s_p with s>0 * refine and UT * pass opt draft * pass opt * code refine * code-style * bug fix * fix ci test * code style
-
Committed by Yulong Ao
* [Auto Parallel] Fix the bug for None labels * [Auto Parallel] Fix the completion bug
-
Committed by pangyoki
* skip ReplaceAllReduceOp in GraphtoBlock when nccl_ctxs_ is nullptr * update ut * test_dist_allreduce_op failed * fix test_dist_allreduce_op * add ut * fix nccl cpu compile * fix
-
Committed by Nyakku Shigure
* [CodeStyle][py2] remove `compat` module (to_bytes) * remove some unused imports * clean up to_bytes definition and unittests * Revert "clean up to_bytes definition and unittests" This reverts commit e726539e1768172a411ff60e63fab82f164343cf. * use `b` prefix instead of `encode()`
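As a minimal illustration of the last item above (not code from the repository): for ASCII text, a bytes literal with the `b` prefix yields the same `bytes` object as calling `encode()` on the equivalent `str`, without the runtime call. The variable names are illustrative only.

```python
# Bytes produced at runtime by encoding a str (UTF-8 is the default codec).
encoded_at_runtime = "paddle".encode()

# The same bytes written directly as a literal with the `b` prefix.
bytes_literal = b"paddle"

print(encoded_at_runtime == bytes_literal)
```

Note that a `b` literal may only contain ASCII characters, so this substitution applies to ASCII-only strings.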
-
Committed by Guanghua Yu
-
Committed by Guanghua Yu
-
Committed by duanyanhui
* add singleton to custom device * Update custom_device.cc Init device_init_flag_ in default
-
- 14 Oct, 2022 7 commits
-
-
Committed by Wen Sun
-
Committed by jingsongliu
* update test_image.py * update test_image.py
-
Committed by zhaoyingli
* for gpt-gen * fix reshard * adapt assign and shape op * add dist_assign & unittest * add conditional block unittest * rename unittest
-
Committed by parap1uie-s
* Fix hAPI bug of incompatibility with LayerHook
-
Committed by Wilber
-
Committed by Zhang Jun
-
Committed by Yulong Ao
-
- 13 Oct, 2022 16 commits
-
-
Committed by Siming Dai
-
Committed by yeliang2258
* fix immutable op quantize bugs * fix * fix build bug * fix test * notest,test=inference * fix ppyoloe acc drop bugs * fix test * fix test * add test * fix * fix * fix test * fix refined name bug * fix test * bias fix * fix matmul weight dequant bug * re-ci * fix tester * fix test * fix tester * update weight dequantize func * update code * update test for coverage * update test * update cmake * update cmakelist * update code * rerun ci * remove useless code
-
Committed by xiaohemaikoo
-
Committed by xiaoxiaohehe001
-
Committed by Leo Chen
-
Committed by weishengying
Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm, grid_sampler, pad3d, etc (#46291)
-
Committed by zhouweiwei2014
-
Committed by Paulina Gacek
-
Committed by zhouweiwei2014
-
Committed by Aurelius84
* [BUG]Fix expand_as_v2 bug when X and Y have different dtypes * fix commit
-
Committed by wuhuachaocoding
* combine dp and stage2 hybrid parallel. * update condition.
-
Committed by Zhang Ting
* Revert "[Hackathon No.56&38] Implement float16 data type support and forward-pass speedup for the deformable_conv_v1 operator (#46111)"
-
Committed by Xinger
* add rpc module in cpp side * add rpc module in python side * support win32 and mac for rpc * code optimization * optimize code * update rpc * update rpc launch * rpc remove rank and world_size api * fix logger import bug * remove support for win and mac * remove support for xpu, npu, cinn and rocm * remove support for xpu, npu, cinn and rocm * fix shutdown barrier timeout bug * update:python_rpc_handler to shared ptr * fix master shutdown first bug * tests support for cpu * update log to vlog * update get service info api * add single process test case * remove process group * remove some useless dependencies * update rpc api comments * update rpc comments: Example to Examples * update rpc api comments * update rpc api comments * update launch api comments * update init_rpc comments * update rpc sync and async comments * fix bug: init_rpc can't be called repeatedly in a process * update rpc api comment: make master endpoint unique * update rpc api:service to worker, timeout_ms to timeout * rename ServiceInfo to WorkerInfo * refactor: rename server to worker, log to vlog * add launch test * remove unused codes * refine
-
Committed by yangguohao
* 2022-08-30_update nn.layer.loss nn.functional.loss, test_file * 2022-08-30_update nn.layer.loss nn.functional.loss, test_file * fix: test_file * fix: test_file, docs, multi_margin_loss * fix: doc weight function * fix: test_multi_margin_loss * fix: weight np.testing.assert_allclose * fix: test_file * fix: en_doc * 2022-10-10
-
Committed by Nyakku Shigure
-
Committed by Nyakku Shigure
-
- 12 Oct, 2022 4 commits
-
-
Committed by JZ-LIANG
-
Committed by Yulong Ao
* [Auto Parallel] Support different dataloaders * [Auto Parallel] Add num_shards config for dataset * [Auto Parallel] Unify the logger and outputs of Engine API * [Auto Parallel] Fix the bugs of to_static * [Auto Parallel] Adjust the test_to_static.py * [Auto Parallel] Add the prepare API and replace __call__ with run * [Auto Parallel] Improve the private implementations of Engine * [Auto Parallel] Set capacity of dataloader for opt tuning * [Auto Parallel] [WIP] Change the fine-grained API * [Auto Parallel] Improve APIs to support different use cases * [Auto Parallel] Add removed config * [Auto Parallel] Add imports * [Auto Parallel] Fix bugs for to_static * [Auto Parallel] Remove unnecessary imports
-
Committed by zhouweiwei2014
* [Zero-Dim] support input 0D Tensor for unary api * fix CI
-
Committed by Yuang Liu
-