1. 19 Oct, 2022 15 commits
  2. 18 Oct, 2022 14 commits
  3. 17 Oct, 2022 11 commits
    • G
      Add enable_partial_send_recv switch in pipeline_configs (#46992) · b9a2f29c
      Ghost Screaming committed
      * Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
      is wrong.
      
      * Support the allow_partial switch, which can be configured in
      pipeline_configs. If the tensors sent from different hosts are
      not the same, they shouldn't be sent partially and then
      concatenated into a whole tensor.
      
      * Change name allow_partial to enable_partial_send_recv.
      
      * Add global variable _enable_partial_send_recv
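The partial send/recv behaviour described in this commit can be illustrated with a small sketch. This is plain Python rather than Paddle code: `partial_slice`, `concat_parts`, and `transfer` are hypothetical helpers invented here, and only the switch name `enable_partial_send_recv` comes from the commit. It assumes each rank sends an equal slice of its own tensor and the receiver concatenates the slices in rank order.

```python
def partial_slice(tensor, rank, world_size):
    """Return the slice of a flat tensor that `rank` would send (hypothetical helper)."""
    if len(tensor) % world_size != 0:
        raise ValueError("tensor length must be divisible by world_size")
    chunk = len(tensor) // world_size
    return tensor[rank * chunk:(rank + 1) * chunk]


def concat_parts(parts):
    """Concatenate the received slices in rank order (hypothetical helper)."""
    whole = []
    for part in parts:
        whole.extend(part)
    return whole


def transfer(tensors_per_rank, enable_partial_send_recv):
    """Simulate what each receiving rank ends up with.

    tensors_per_rank[i] is the flat tensor held by sending rank i.
    """
    world_size = len(tensors_per_rank)
    if enable_partial_send_recv:
        # Every rank sends only its own slice; the receiver stitches them together.
        parts = [partial_slice(t, r, world_size)
                 for r, t in enumerate(tensors_per_rank)]
        whole = concat_parts(parts)
        return [list(whole) for _ in range(world_size)]
    # Switch off: every rank sends its full tensor unchanged.
    return [list(t) for t in tensors_per_rank]


same = [[1, 2, 3, 4], [1, 2, 3, 4]]
different = [[1, 2, 3, 4], [5, 6, 7, 8]]

print(transfer(same, True))        # [[1, 2, 3, 4], [1, 2, 3, 4]]
print(transfer(different, True))   # [[1, 2, 7, 8], [1, 2, 7, 8]]
print(transfer(different, False))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

With identical inputs the partial path reproduces the tensor. With differing inputs it returns a mix that matches neither sender, which is the case the switch is meant to turn off.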
    • G
      Support BF16 training for sharding (#46846) · 0b39b244
      Ghost Screaming committed
      * Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
      is wrong.
      
      * support pure bfloat16
      
      * support bf16 linear
      
      * update PR to pass CI
      
      * tiny fix where_grad_kernel.cu
      
      * Support bfloat16 type for reducer and sharding.
      
      * Fix some bugs.
      
      * Polish code.
      
      * Polish code.
      
      * Add bfloat16 datatype in fill_grad kernels.
      Co-authored-by: sneaxiy <sneaxiy@126.com>
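The reduce_sum item above says results are wrong once `input.numel()` exceeds `INT32_MAX`, but it does not state the root cause. A common source of this kind of bug is an element count or index held in a signed 32-bit integer. The sketch below only emulates that wraparound in Python; it is an assumption for illustration, not a description of the actual Paddle kernel, and `wrap_int32` is a hypothetical helper.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Emulate storing an integer in a signed 32-bit variable (two's complement)."""
    value &= 0xFFFFFFFF                      # keep the low 32 bits
    return value - 2**32 if value > INT32_MAX else value


numel = INT32_MAX + 10                       # element count just past the 32-bit limit

print(wrap_int32(INT32_MAX))                 # 2147483647 (still representable)
print(wrap_int32(numel))                     # -2147483639 (wrapped to a negative count)
```

A loop bound or offset computed from such a wrapped value would cover the wrong range of elements, which is why counts of this size are normally held in a 64-bit integer.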
    • H
      Revert "add common subexpression elimination (#44386)" (#47062) · 7c6835ca
      hong committed
      This reverts commit 166ff39a.
    • Y
      [PHI]Modify DataLayout's namespace from paddle::experimental to phi (#46869) · ec749398
      YuanRisheng committed
      * namespace modify
      
      * update by comment
    • R
      Fix warning message format error (#47045) · 13284437
      RedContritio committed
    • O
      [Hackathon 3rd No.22 ] add paddle.incubate.sparse.reshape (#46694) · abb38136
      OccupyMars2025 committed
      * add sparse reshape
      
      * change the dtype in all test cases to int64
      
      * just one test case
      
      * modify comments
      
      * Update test_sparse_reshape_op.py
      
      * change the type of "shape" from vector<int64_t> to IntArray
      
      * check whether sp_out.to_dense() is the cause  of error
      
      * print sp_out
      
      * Update reshape_kernel.cc
      
      * use numpy to generate the equal paddle tensor
      
      * just check dense_tensor.numpy()
      
      * check cpu and cuda versions
      
      * Update test_sparse_reshape_op.py
      
      * supply all test cases for cpu forward coo kernel
      
      * test forward coo cuda kernel
      
      * change configuration of cuda kernel
      
      * keep only one test case
      
      * test coo cpu kernel (forward and backward)
      
      * row major or column major ???
      
      * test cuda coo forward kernel
      
      * complete declaration and registration
      
      * Update __init__.py
      
      * rebuild
      
      * retrigger CI
      
      * add cudaMalloc and cudaMemcpy in ReshapeCooKernel and change back to row major order in a cuda dense tensor
      
      * modify minor error
      
      * test only cpu coo forward kernel
      
      * add all test cases for coo forward kernel  (both cpu and gpu)
      
      * test all forward kernels (coo, csr; cpu, gpu)
      
      * add all test cases for all kinds of kernels
      
      * just retrigger CI
      
      * Update sparse_ops.yaml
      
      * Update sparse_ops.yaml
      
      * Update sparse_ops.yaml
      
      * resolve conflicts
      
      * Update sparse_ops.yaml
      
      * don't specify tensor place
      
      * new shape has -1 or 0 in it
      
      * Update unary_grad_kernel.h
      
      * correct lvalue error
      
      * code style
      
      * Update sparse_backward.yaml
      
      * Update sparse_ops.yaml
      
      * Update unary_kernel.h
      
      * Update unary.py
      
      * Update sparse_backward.yaml
      
      * Update unary.py
      
      * code style
      
      * code style
      
      * code style
      
      * Update unary.py
      
      * specify tensor place explicitly
      
      * do not use numpy array
      
      * use numpy array in unit test again
      
      * modify example code in docstring
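The commit above adds a reshape for sparse tensors and mentions row-major order and new shapes containing -1 or 0. As a rough illustration of the idea, and not the actual kernel, the sketch below reshapes COO coordinates by flattening each one in row-major order and unflattening it into the new shape. `resolve_shape` and `reshape_coo` are hypothetical names, coordinates are kept as a list of tuples for readability, and the handling of 0 (copy the input dimension at the same position) and -1 (infer the dimension) is assumed to follow dense `paddle.reshape`.

```python
from math import prod


def resolve_shape(old_shape, new_shape):
    """Resolve 0 (copy the input dim) and -1 (infer the dim) entries in new_shape."""
    shape = []
    for i, dim in enumerate(new_shape):
        if dim == 0:
            if i >= len(old_shape):
                raise ValueError("0 refers to a dimension the input does not have")
            dim = old_shape[i]
        shape.append(dim)
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    total = prod(old_shape)
    if -1 in shape:
        known = prod(d for d in shape if d != -1)
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        shape[shape.index(-1)] = total // known
    if prod(shape) != total:
        raise ValueError("new shape must keep the number of elements")
    return shape


def reshape_coo(coords, old_shape, new_shape):
    """Map each non-zero coordinate from old_shape to new_shape (values are untouched)."""
    shape = resolve_shape(old_shape, new_shape)
    new_coords = []
    for coord in coords:
        # Row-major flattening: the last dimension varies fastest.
        flat = 0
        for c, dim in zip(coord, old_shape):
            flat = flat * dim + c
        # Unflatten into the new shape, peeling off the last dimension first.
        out = []
        for dim in reversed(shape):
            flat, rem = divmod(flat, dim)
            out.append(rem)
        new_coords.append(tuple(reversed(out)))
    return new_coords, shape


print(reshape_coo([(0, 1), (1, 2)], [2, 3], [3, -1]))   # ([(0, 1), (2, 1)], [3, 2])
print(reshape_coo([(1, 2, 3)], [2, 3, 4], [0, -1]))     # ([(1, 11)], [2, 12])
```

Only the coordinates change; the stored non-zero values keep their order, which is why a sparse reshape can avoid converting to a dense tensor.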
    • W
      support __floordiv__ (#47060) · 64307903
      Weilong Wu committed
    • W
      Layernorm shift partition enhance (#46816) · 9e08633c
      Wang Bojun committed
      * first version of ln_s_p with s>0
      
      * refine and UT
      
      * pass opt draft
      
      * pass opt
      
      * code refine
      
      * code-style
      
      * bug fix
      
      * fix ci test
      
      * code style
    • J
      fix for conv_bias_mkldnn_pass (#47037) · acbda3e4
      jakpiase committed
    • P
      skip ReplaceAllReduceOp in GraphtoBlock when nccl_ctxs_ is nullptr (#46911) · 2e7dc666
      pangyoki committed
      * skip ReplaceAllReduceOp in GraphtoBlock when nccl_ctxs_ is nullptr
      
      * update ut
      
      * test_dist_allreduce_op failed
      
      * fix test_dist_allreduce_op
      
      * add ut
      
      * fix nccl cpu compile
      
      * fix