1. 10 Feb, 2022 7 commits
  2. 09 Feb, 2022 10 commits
  3. 08 Feb, 2022 9 commits
    • S
      Make Embedding layer support more int ids type (#39381) · 60f1461a
      sneaxiy committed
      * add more int id type support for embedding
      
      * add ut
      
      * add more ut
      
      * fix ci error
    • H
      Add FuseOptimizerPass and test_dist_fuse_adam_pass unittest. (#39208) · ccdcfa2d
      hlygit66666 committed
      * add fuse_relu_depthwise_conv_pass unittest
      
      * fix atol and rtol
      
      * fix according to review
      
      * Add FuseOptimizerPass and fuse_adam_pass unittest
      
      * add sgd and momentum unittest
      
      * add fuse_optimizer_pass
      
      * close amp
      
      * close amp
      
      * update
      
      * fix run on two cards
      
      * Update test_dist_fuse_adam_pass.py
      
      * Update test_dist_fuse_momentum_pass.py
      
      * Update test_dist_fuse_sgd_pass.py
      
      * Create test_dist_fuse_sgd_pass.py
      
      * Create test_dist_fuse_sgd_pass.py
      
      * Create test_dist_fuse_sgd_pass.py
      
      * Update test_dist_fuse_adam_pass.py
      
      * Update test_dist_fuse_momentum_pass.py
      
      * Update test_dist_fuse_sgd_pass.py
    • Z
      ps optimize refactor (#38982) · 196dbfc2
      ziyoujiyi committed
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * refactor theoneps
      
      * the_one_ps
      
      * add ps pass unittest
      
      * add ps pass unittest
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * ps unittest ready
      
      * ps unittest ready
      
      * solve dist_pass init conflict
      
      * solve import CommContext error
      
      * unittest ok
      
      * implement AllocateFrom
      
      * solve setup.py.in conflict
      
      * solve conflict
      
      * solve conflict
      
      * solve conflict
      
      * .
      
      * .
      Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
    • Z
      [bf16] add bf16 cuda kernel: concat and split (#39380) · de0bad2a
      zhangbo9674 committed
      * add concat & split
      
      * add concat kernel
      
      * add concat unittest
      
      * add split unittest
    • W
      0fee0044
    • B
      optimize sharding stage3 (#39334) · 23d559dd
      Baibaifan committed
    • C
      Fix reduce_sum dtype dispatch bug on gpu (#39349) · 4d7ad277
      Chen Weihang committed
      * fix pten reduce dispatch bug
      
      * add cast before reduce
      
      * fix test failed
    • L
      [bf16] support printing bf16 tensor (#39375) · f57b21e6
      Leo Chen committed
    • S
      Add __PD_DEFINE_RAW_OP_KERNEL_FUNC for registering custom op kernel with ExecutionContext (#39352) · 5c3873f6
      sneaxiy committed
      * hack custom op
      
      * add ut
      
      * skip windows ci
  4. 07 Feb, 2022 5 commits
  5. 04 Feb, 2022 1 commit
  6. 30 Jan, 2022 5 commits
  7. 29 Jan, 2022 3 commits
    • R
      fix paddle.where broadcast bug (#39182) · 92253f11
      ronnywang committed
    • C
      [PTen] Tidy pten core headers (#39188) · dd990981
      Chen Weihang committed
      * open header for custom kernel
      
      * add core utils
      
      * tidy core code
      
      * tidy header
      
      * tidy include
      
      * tidy namespace
      
      * resolve conflict
      
      * fix unittest and coverage
      
      * remove platform using
      
      * resolve conflict
      
      * resolve conflict
      
      * fix digamma namespace error
      
      * fix xpu full kernel error
      
      * fix xpu full kernel error
      
      * polish details
      
      * add place for lib storage
    • T
      Symbolic Hessian (#39221) · 64e7c715
      Tongxin Bai committed
      * [autograd] static Jacobian pass tests.
      
      * [autograd] apply CR suggested changes.
      
      * [autograd] more tests.
      
      * [autograd] add CPUPlace in tests.
      
      * [autograd] bug fixes.
      
      * [autograd] reformatted.
      
      * [autograd] adding Hessian, in progress.
      
      * [autograd] Hessian passes. A double grad bug fixed.
      
      * [autograd] fix renaming conflict in double backward pass.
      
      * [autograd] polish tests.
      
      * fix a bug when using brackets
      
      * debug for ci
      
      * [autograd] fixing Hessian test.
      
      * polish format.
      Co-authored-by: levi131 <83750468+levi131@users.noreply.github.com>
      Co-authored-by: levi131 <limaolin01@baidu.com>