1. 17 Oct 2022, 13 commits
    • Add enable_partial_send_recv switch in pipeline_configs (#46992) · b9a2f29c
      Committed by Ghost Screaming
      * Fix a bug in the reduce_sum op: when input.numel() > INT32_MAX, its
      result was wrong.
      
      * Add an allow_partial switch, which can be configured in
      pipeline_configs. If the tensors sent from different hosts are not
      identical, they should not be sent partially and then
      concatenated into a whole tensor.
      
      * Change name allow_partial to enable_partial_send_recv.
      
      * Add global variable _enable_partial_send_recv
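The reduce_sum fix above guards against 32-bit overflow of the element count. A minimal pure-Python sketch of the failure mode (illustrative only, not the kernel code):

```python
# Why reduce_sum broke past INT32_MAX: a 32-bit element count wraps
# around once input.numel() exceeds 2**31 - 1.
INT32_MAX = 2**31 - 1

def wrap_int32(n: int) -> int:
    """Emulate C int32 two's-complement overflow."""
    return (n + 2**31) % 2**32 - 2**31

print(wrap_int32(INT32_MAX))      # still representable: 2147483647
print(wrap_int32(INT32_MAX + 1))  # wraps to -2147483648
```

Using a 64-bit index type for the element count avoids the wraparound.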
    • Support BF16 training for sharding (#46846) · 0b39b244
      Committed by Ghost Screaming
      * Fix a bug in the reduce_sum op: when input.numel() > INT32_MAX, its
      result was wrong.
      
      * support pure bfloat16
      
      * support bf16 linear
      
      * update PR to pass CI
      
      * Tiny fix in where_grad_kernel.cu
      
      * Support bfloat16 type for reducer and sharding.
      
      * Fix some bugs.
      
      * Polish code.
      
      * Polish code.
      
      * Add bfloat16 datatype in fill_grad kernels.
      Co-authored-by: sneaxiy <sneaxiy@126.com>
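The "pure bfloat16" support above relies on bfloat16's layout: the top 16 bits of an IEEE-754 float32 (same 8-bit exponent, only 7 mantissa bits). A hedged sketch of that conversion (simple truncation; real kernels typically round to nearest):

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    # bfloat16 = the upper 16 bits of the float32 bit pattern
    (f32_bits,) = struct.unpack("<I", struct.pack("<f", x))
    return f32_bits >> 16

def bf16_bits_to_float(bits: int) -> float:
    # widen back to float32 by zero-filling the dropped mantissa bits
    (x,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return x

print(bf16_bits_to_float(float_to_bf16_bits(1.0)))          # 1.0, exact
print(bf16_bits_to_float(float_to_bf16_bits(1.0 + 2**-10))) # 1.0, precision lost
```

The shared exponent range with float32 is why bf16 trains stably where fp16 needs loss scaling.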
    • [hidden trouble] Update test_sparse_transpose_op.py to get rid of a hidden trouble. (#47017) · d43c972c
      Committed by OccupyMars2025
      * Update test_sparse_transpose_op.py
      
      * Update test_sparse_transpose_op.py
    • [Hackathon No.8] add gumbel distribution api (#46255) · f1a9f877
      Committed by YuRonan
      * init gumbel api
      
      * commit: update test file
      
      * fix:bug
      
      * update Gumbel API
      
      * upgrade distribution/gumbel.py
      
      * add tests/test_distribution_gumbel.py
      
      * fix:code style
      
      * fix:code style
      
      * fix:code style
      
      * fix:code style
      
      * fix bug
      
      * fix:code style
      
      * fix:code style
      
      * fix:rollback uniform
      
      * fix:delete invalid code
      
      * fix:bug and add static test
      
      * fix:code style
      
      * fix:code style
      
      * fix:delete init transforms
      
      * fix:bug
      
      * fix:bug
      
      * fix:code style
      
      * fix:code style
      
      * fix:add transforms
      
      * fix:code style
      
      * fix:code style
      
      * fix:bug
      
      * fix:bug
      
      * fix:code style
      
      * fix:code style
      
      * fix:bug
      
      * fix:code style
      
      * fix:code style
      
      * fix:bug for gumbel.py / add:judge transforms'len for transformed_distribution.py
      
      * update gumbel.py
      
      * fix:bug for test_distribution_gumbel.py
      
      * fix:bug for test_distribution_gumbel_static.py
      
      * fix:code style
      
      * fix:code style
      
      * fix:coverage
      
      * fix:bug
      
      * fix:bug
      
      * fix:code style
      
      * fix:bug
      
      * delete:no use package for gumbel.py
      
      * add:coverage transforms'len judge for test_distribution_gumbel.py
      
      * fix:code style for test_distribution_gumbel.py
      
      * fix:coverage
      
      * fix:code style
      
      * fix:code style
      
      * fix:code style
      
      * fix:code style
      
      * fix:code style
      
      * fix:en doc
      
      * fix:param
      
      * fix:copyright
      
      * fixSample; test=document_fix
      Co-authored-by: dasen <sen15530876201@163.com>
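The Gumbel API added above implements the standard Gumbel (type-I extreme value) distribution. A plain-Python sketch of the underlying math (not the paddle.distribution.Gumbel implementation itself):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_cdf(x: float, loc: float = 0.0, scale: float = 1.0) -> float:
    # CDF: F(x) = exp(-exp(-(x - loc) / scale))
    z = (x - loc) / scale
    return math.exp(-math.exp(-z))

def gumbel_mean(loc: float = 0.0, scale: float = 1.0) -> float:
    # mean = loc + scale * gamma
    return loc + scale * EULER_GAMMA

print(gumbel_cdf(0.0))  # exp(-1), about 0.3679 at the location parameter
```

Sampling uses the inverse CDF: loc - scale * log(-log(U)) for U ~ Uniform(0, 1), which the API's transforms machinery expresses as a transformed uniform.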
    • [Hackathon 3rd No.22] add paddle.incubate.sparse.reshape (#46694) · abb38136
      Committed by OccupyMars2025
      * add sparse reshape
      
      * change the dtype in all test cases to int64
      
      * just one test case
      
      * modify comments
      
      * Update test_sparse_reshape_op.py
      
      * change the type of "shape" from vector<int64_t> to IntArray
      
      * check whether sp_out.to_dense() is the cause  of error
      
      * print sp_out
      
      * Update reshape_kernel.cc
      
      * use numpy to generate the equal paddle tensor
      
      * just check dense_tensor.numpy()
      
      * check cpu and cuda versions
      
      * Update test_sparse_reshape_op.py
      
      * supply all test cases for cpu forward coo kernel
      
      * test forward coo cuda kernel
      
      * change configuration of cuda kernel
      
      * keep only one test case
      
      * test coo cpu kernel (forward and backward)
      
      * row major or column major ???
      
      * test cuda coo forward kernel
      
      * complete declaration and registration
      
      * Update __init__.py
      
      * rebuild
      
      * retrigger CI
      
      * add cudaMalloc and cudaMemcpy in ReshapeCooKernel and change back to row major order in a cuda dense tensor
      
      * modify minor error
      
      * test only cpu coo forward kernel
      
      * add all test cases for coo forward kernel  (both cpu and gpu)
      
      * test all forward kernels (coo, csr; cpu, gpu)
      
      * add all test cases for all kinds of kernels
      
      * just retrigger CI
      
      * Update sparse_ops.yaml
      
      * Update sparse_ops.yaml
      
      * Update sparse_ops.yaml
      
      * resolve conflicts
      
      * Update sparse_ops.yaml
      
      * don't specify tensor place
      
      * new shape has -1 or 0 in it
      
      * Update unary_grad_kernel.h
      
      * correct lvalue error
      
      * code style
      
      * Update sparse_backward.yaml
      
      * Update sparse_ops.yaml
      
      * Update unary_kernel.h
      
      * Update unary.py
      
      * Update sparse_backward.yaml
      
      * Update unary.py
      
      * code style
      
      * code style
      
      * code style
      
      * Update unary.py
      
      * specify tensor place explicitly
      
      * do not use numpy array
      
      * use numpy array in unit test again
      
      * modify example code in docstring
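The "new shape has -1 or 0 in it" commits above follow Paddle's reshape convention. A small stand-alone sketch of that shape inference (assumed semantics: 0 copies the input dim at the same position, a single -1 is solved from the element count):

```python
def infer_reshape(src_shape, new_shape):
    # 0 copies the dimension at the same position from src_shape;
    # a single -1 is inferred so the total element count is preserved.
    out = [src_shape[i] if d == 0 else d for i, d in enumerate(new_shape)]
    numel = 1
    for d in src_shape:
        numel *= d
    if -1 in out:
        known = 1
        for d in out:
            if d != -1:
                known *= d
        out[out.index(-1)] = numel // known
    return out

print(infer_reshape([2, 3, 4], [-1, 4]))  # [6, 4]
print(infer_reshape([2, 3, 4], [0, -1]))  # [2, 12]
```

For a COO sparse tensor, only the indices are remapped under this inferred shape; the values array is untouched.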
    • support __floordiv__ (#47060) · 64307903
      Committed by Weilong Wu
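This commit wires up the `__floordiv__` protocol so that `tensor // other` dispatches to an elementwise floor-divide. A toy stand-in (not Paddle code) showing the dunder mechanism involved:

```python
class Vec:
    """Toy container illustrating the __floordiv__ protocol that
    `//` resolves to; a hypothetical stand-in for the Tensor type."""

    def __init__(self, data):
        self.data = list(data)

    def __floordiv__(self, other):
        # Python calls this for `self // other`
        return Vec(x // other for x in self.data)

print((Vec([7, 9, -5]) // 2).data)  # [3, 4, -3]
```

Note that floor division rounds toward negative infinity, so `-5 // 2` is `-3`, not `-2`.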
    • Layernorm shift partition enhance (#46816) · 9e08633c
      Committed by Wang Bojun
      * first version of ln_s_p with s>0
      
      * refine and UT
      
      * pass opt draft
      
      * pass opt
      
      * code refine
      
      * code-style
      
      * bug fix
      
      * fix ci test
      
      * code style
    • [Auto Parallel] Fix the bug of completion (#47056) · f0af2708
      Committed by Yulong Ao
      * [Auto Parallel] Fix the bug for None labels
      
      * [Auto Parallel] Fix the completion bug
    • skip ReplaceAllReduceOp in GraphtoBlock when nccl_ctxs_ is nullptr (#46911) · 2e7dc666
      Committed by pangyoki
      * skip ReplaceAllReduceOp in GraphtoBlock when nccl_ctxs_ is nullptr
      
      * update ut
      
      * test_dist_allreduce_op failed
      
      * fix test_dist_allreduce_op
      
      * add ut
      
      * fix nccl cpu compile
      
      * fix
    • [CodeStyle][py2] remove `compat` module (to_bytes) (#47035) · 198c7993
      Committed by Nyakku Shigure
      * [CodeStyle][py2] remove `compat` module (to_bytes)
      
      * remove some unused imports
      
      * clean up to_bytes definition and unittests
      
      * Revert "clean up to_bytes definition and unittests"
      
      This reverts commit e726539e1768172a411ff60e63fab82f164343cf.
      
      * use `b` prefix instead of `encode()`
    • fix dygraph new format problem export in QAT (#47023) · 6566b8f5
      Committed by Guanghua Yu
    • [Custom Device] Add singleton to custom device (#46963) · 73196e5a
      Committed by duanyanhui
      * add singleton to custom device
      
      * Update custom_device.cc
      
      Initialize device_init_flag_ by default
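The singleton added to custom_device.cc guarantees every caller shares one device-manager instance. A Python sketch of the same pattern (names are illustrative, not the C++ API):

```python
class CustomDeviceManager:
    """Singleton sketch: the first construction creates the instance,
    every later construction returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.device_init_flag = False  # initialized by default
        return cls._instance

a, b = CustomDeviceManager(), CustomDeviceManager()
print(a is b)  # True: all callers see the same manager
```

In the C++ original the same effect is typically achieved with a function-local static, which C++11 makes thread-safe to initialize.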
  2. 14 Oct 2022, 7 commits
  3. 13 Oct 2022, 16 commits
  4. 12 Oct 2022, 4 commits
    • bugfix (#46921) · acdaa4fb
      Committed by JZ-LIANG
    • [Auto Parallel] Improve the fine-grained APIs (#46552) · 686fa07a
      Committed by Yulong Ao
      * [Auto Parallel] Support different dataloaders
      
      * [Auto Parallel] Add num_shards config for dataset
      
      * [Auto Parallel] Unify the logger and outputs of Engine API
      
      * [Auto Parallel] Fix the bugs of to_static
      
      * [Auto Parallel] Adjust the test_to_static.py
      
      * [Auto Parallel] Add the prepare API and replace __call__ with run
      
      * [Auto Parallel] Improve the private implementations of Engine
      
      * [Auto Parallel] Set capacity of dataloader for opt tuning
      
      * [Auto Parallel] [WIP] Change the fine-grained API
      
      * [Auto Parallel] Improve APIs to support different user cases
      
      * [Auto Parallel] Add removed config
      
      * [Auto Parallel] Add imports
      
      * [Auto Parallel] Fix bugs for to_static
      
      * [Auto Parallel] Remove unnecessary imports
    • [Zero-Dim] support input 0D Tensor for some unary api (#45992) · 05c2b9ba
      Committed by zhouweiwei2014
      * [Zero-Dim] support input 0D Tensor for unary api
      
      * fix CI
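The 0-D support above means a unary elementwise op on a scalar-shaped tensor returns another 0-D tensor instead of promoting it to shape (1,). A pure-Python sketch of that rule, with a bare scalar standing in for a 0-D tensor:

```python
import math

def elementwise_unary(x, fn):
    """Apply fn elementwise; nested lists stand in for N-D tensors,
    a bare scalar stands in for a 0-D tensor."""
    if isinstance(x, list):
        return [elementwise_unary(e, fn) for e in x]  # N-D: recurse
    return fn(x)                                      # 0-D: scalar in, scalar out

print(elementwise_unary(4.0, math.sqrt))         # 2.0, still scalar ("0-D")
print(elementwise_unary([1.0, 4.0], math.sqrt))  # [1.0, 2.0]
```

The point of the rule is shape preservation: the output rank always matches the input rank, including rank zero.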
    • Multi groups for broadcast of sharding stage 2 (#46894) · 95768115
      Committed by Yuang Liu