1. 26 Feb, 2019 3 commits
    • Optimize the CUDA implementation of sequence_expand op by reducing the times of... · f4634d76
      Yiqun Liu committed
      Optimize the CUDA implementation of the sequence_expand op by reducing the number of times lod data is copied from CPU to GPU. (#15493)
      
      * Optimize the CUDA implementation of the sequence_expand op by reducing the number of times lod data is copied from CPU to GPU.
      test=develop
      
      * Refine the op benchmark to support setting lod in config.
      test=develop
    • This PR improves the performance of the prior_box op by about 1.25x on CPU. (#15909) · 630c1e83
      guomingz committed
      * This PR improves the performance of the prior_box op by about 1.25x on CPU.
      
      * Test env: SKX 8180 with fake data on 28 threads (bs=1).
      * The table below shows the ~25% improvement, as measured with [eval_tp_fake_data.py](https://github.com/PaddlePaddle/Paddle/issues/15618#issuecomment-464613976).
      
      | Type |Event | Calls |   Total     |  Min.    |   Max.      |  Ave.      |  Ratio.|
      | ---------------- | ------------------ | ---- | ------- | -------- | -------- | ------------ | -------- |
      | w/ optimization  | thread0::prior_box | 6000 | 921.201 | 0.110572 | 0.383402 | **0.153533** | 0.084585 |
      | w/o optimization | thread0::prior_box | 6000 | 1151.85 | 0.102276 | 0.426702 | **0.191976** | 0.103337 |
      
      test=develop
      
      * Fix the style issue.
      
      test=develop
    • Add alloc_continuous_space_op (#15900) · 7ca8553d
      chengduo committed
      * add alloc_continuous_space_op
      test=develop
      
      * Polish code
      test=develop
      
      * Follow review comments
      test=develop
  2. 25 Feb, 2019 11 commits
  3. 24 Feb, 2019 3 commits
  4. 23 Feb, 2019 2 commits
  5. 22 Feb, 2019 21 commits