    This PR improves the performance of the prior_box op by about 1.25x on CPU. (#15909) · 630c1e83
    Committed by guomingz
    * This PR improves the performance of the prior_box op by about 1.25x on CPU.
    
    * Test environment: SKX 8180 with fake data on 28 threads (bs=1).
    * The table below shows the ~25% improvement (the average time per prior_box call drops from 0.191976 to 0.153533). The numbers were generated by [eval_tp_fake_data.py](https://github.com/PaddlePaddle/Paddle/issues/15618#issuecomment-464613976).
    
    | Type | Event | Calls | Total | Min. | Max. | Ave. | Ratio. |
    | ---------------- | ------------------ | ---- | ------- | -------- | -------- | ------------ | -------- |
    | w/ optimization  | thread0::prior_box | 6000 | 921.201 | 0.110572 | 0.383402 | **0.153533** | 0.084585 |
    | w/o optimization | thread0::prior_box | 6000 | 1151.85 | 0.102276 | 0.426702 | **0.191976** | 0.103337 |
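
As a quick sanity check of the "about 1.25x" figure, the arithmetic can be reproduced directly from the two table rows. The snippet below is only an illustrative sketch: the variable names are made up here, the values are copied from the table, and the units are whatever the profiler reported.

```python
# Values copied from the table above; variable names are illustrative only.
calls = 6000
total_with_opt = 921.201
total_without_opt = 1151.85
ave_with_opt = 0.153533
ave_without_opt = 0.191976

# Speedup is the ratio of the unoptimized time to the optimized time.
speedup_total = total_without_opt / total_with_opt
speedup_ave = ave_without_opt / ave_with_opt

# The same result expressed as a fraction of time saved per call.
time_reduction = 1.0 - total_with_opt / total_without_opt

print(f"speedup from Total: {speedup_total:.3f}x")
print(f"speedup from Ave.:  {speedup_ave:.3f}x")
print(f"time saved:         {time_reduction:.1%}")
```

Both ratios come out at roughly 1.25, which corresponds to about 20% less time spent in prior_box, so "1.25x faster" and "~25% improvement" describe the same result in terms of throughput.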
    
    test=develop
    
    * Fix the style issue.
    
    test=develop
prior_box_op.h 7.3 KB