1. 19 Sep, 2019 1 commit
    • Add a pass to fuse fc+elementwise_add+layernorm (#19776) · 3cd985a6
      Yiqun Liu committed
      * Add fc_elementwise_layernorm_fuse pass and unittest.
      
      * Add fused_fc_elementwise_layernorm op and its GPU kernel.
      test=develop
      
      * Apply fc_elementwise_layernorm_fuse_pass to GPU inference.
      
      * Add the setting of attrs in the definition of binary_op.
      test=develop
      
      * Add comment.
      
      * Implement the unittest.
      test=develop
      
      * Change the unittest name of layer_norm.
      test=develop
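
The fusion named in this commit can be illustrated with a small reference sketch. It assumes that layer normalization runs over the last dimension and uses an `epsilon` of 1e-5; both are assumptions for illustration. The pure-Python helpers below are hypothetical stand-ins, not the GPU kernel added by the commit. They only show that one fused pass per row gives the same result as running fc, elementwise_add and layer_norm one after another.

```python
import math


def fc(x, w, b):
    # x: [m][k], w: [k][n], b: [n] -> [m][n]
    return [[sum(row[p] * w[p][j] for p in range(len(w))) + b[j]
             for j in range(len(b))] for row in x]


def elementwise_add(a, y):
    return [[u + v for u, v in zip(ra, ry)] for ra, ry in zip(a, y)]


def layer_norm(x, scale, bias, epsilon=1e-5):
    # Normalize each row over its last dimension, then apply scale and bias.
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        inv_std = 1.0 / math.sqrt(var + epsilon)
        out.append([(v - mean) * inv_std * s + c
                    for v, s, c in zip(row, scale, bias)])
    return out


def fused_fc_elementwise_layernorm(x, w, b, y, scale, bias, epsilon=1e-5):
    # Reference for the fused computation: each output row is produced in one
    # pass, without materializing the fc and elementwise_add results.
    n = len(b)
    out = []
    for xi, yi in zip(x, y):
        row = [sum(xi[p] * w[p][j] for p in range(len(w))) + b[j] + yi[j]
               for j in range(n)]
        mean = sum(row) / n
        var = sum((v - mean) ** 2 for v in row) / n
        inv_std = 1.0 / math.sqrt(var + epsilon)
        out.append([(v - mean) * inv_std * scale[j] + bias[j]
                    for j, v in enumerate(row)])
    return out
```

Comparing `fused_fc_elementwise_layernorm(...)` against `layer_norm(elementwise_add(fc(x, w, b), y), scale, bias)` on small inputs is a quick way to check the equivalence that such a fuse pass relies on.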
  2. 18 Sep, 2019 1 commit
  3. 17 Sep, 2019 1 commit
  4. 11 Sep, 2019 1 commit
    • Implement the GPU kernel of fc operator (#19687) · a65c728e
      Yiqun Liu committed
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Move the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
  5. 09 Sep, 2019 1 commit
  6. 03 Sep, 2019 1 commit
    • Add a pass to enable the use of cudnn (#19346) · c5548178
      Yiqun Liu committed
      * Add an interface to enable cudnn for inference.
      
      * Add cudnn_placement_pass.
      test=develop
      
      * Set the default value of cudnn_enabled_op_types to null.
      test=develop
      
      * Add a common base class, placement_pass_base, to refine the code.
      test=develop
      
      * Call EnableCUDNN in unittest.
      test=develop
      
      * Refine cudnn_placement_pass tester.
      
      * Enable the testing of cudnn_placement_pass in inference's unittest.
      test=develop
      
      * Add the check of op kernels.
      test=develop
  7. 30 Aug, 2019 2 commits
    • d6cb1a41
    • Add a pass to replace dropout_op with scale_op when is_test is true (#19297) · fcec365d
      Yiqun Liu committed
      * Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true.
      test=develop
      
      * Delete dropout_op directly when upscale_in_train is true.
      test=develop
      
      * Improve the debug string, adding the print of op_desc information.
      
      * Fix the case when dropout's input x is reused as the next op's output.
      
      * Add the pass to inference.
      test=develop
      
      * Change the log level.
      test=develop
      
      * Add unittest for inplace case.
      
      * Add comment to explain the pass.
      
      * Apply the pass for CPU inference.
      test=develop
      
      * Fix the typo.
      test=develop
      
      * Add the check of AttrType.
      test=develop
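
The rewrite this commit describes can be sketched on a toy op list. The sketch assumes the dropout attributes `is_test`, `dropout_prob` and `dropout_implementation`. With `downgrade_in_infer`, inference-time dropout equals a scale by `1 - dropout_prob`; with `upscale_in_train`, it is an identity and can be deleted. The dict layout and the `simplify_dropout` helper are hypothetical and are not Paddle's graph IR or the pass itself. The in-place case mentioned above is not handled.

```python
def simplify_dropout(ops):
    """Rewrite inference-time dropout ops in a toy, single-input op list.

    Each op is a dict with 'type', 'input', 'output' and 'attrs' keys
    (a hypothetical layout chosen for this illustration).
    """
    renamed = {}  # output of a deleted dropout -> variable that replaces it
    result = []
    for op in ops:
        src = renamed.get(op["input"], op["input"])
        attrs = op["attrs"]
        if op["type"] == "dropout" and attrs.get("is_test", False):
            impl = attrs.get("dropout_implementation", "downgrade_in_infer")
            if impl == "upscale_in_train":
                # Identity at inference: delete the op and rewire consumers.
                renamed[op["output"]] = src
                continue
            # downgrade_in_infer: out = x * (1 - dropout_prob)
            result.append({
                "type": "scale",
                "input": src,
                "output": op["output"],
                "attrs": {"scale": 1.0 - attrs["dropout_prob"]},
            })
            continue
        result.append({**op, "input": src})
    return result


ops = [
    {"type": "relu", "input": "x", "output": "a", "attrs": {}},
    {"type": "dropout", "input": "a", "output": "b",
     "attrs": {"is_test": True, "dropout_prob": 0.25}},
    {"type": "dropout", "input": "b", "output": "c",
     "attrs": {"is_test": True, "dropout_prob": 0.5,
               "dropout_implementation": "upscale_in_train"}},
    {"type": "fc", "input": "c", "output": "d", "attrs": {}},
]
simplified = simplify_dropout(ops)
```

In this example the first dropout becomes a `scale` op, the second is removed, and the `fc` op reads `b` directly.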
  8. 22 Aug, 2019 1 commit
  9. 21 Aug, 2019 1 commit
  10. 19 Aug, 2019 2 commits
  11. 15 Aug, 2019 1 commit
  12. 12 Aug, 2019 1 commit
  13. 09 Aug, 2019 1 commit
  14. 05 Aug, 2019 1 commit
  15. 02 Aug, 2019 1 commit
  16. 31 Jul, 2019 2 commits
    • fix several security bugs reported by security team (#18831) · 0d996908
      liuwei1031 committed
      * fix security issue, test=develop
      
      * bug fix, test=develop
      
      * throw an exception when null pointer data with non-zero length PaddleBuf is passed, test=develop
    • Trt fp16 support (#18860) · 61238d31
      Zhaolong Xing committed
      * Fix Mask rcnn predictor
          1. refine memory optim algorithm to support the model with the block op.
          2. output diff : modify the affine channel fuse
          3. add condition_block_infer op
      add interface for setting trt calib table dir
      test=develop
      
      * add the missing files.
      test=develop
      
      * 1 add trt fp16 support
      test=develop
  17. 24 Jul, 2019 1 commit
    • Update trt5 for paddle-trt (#18645) · 26ae6d49
      Zhaolong Xing committed
      * update paddle-trt for:
          1. fix bug: core dump in split plugin when batch > 2.
          2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.)
          3. add new attr to dropout.
          4. shuffle channel, swish, relu6 support
          test=develop
      
      * 1. fix ci
      test=develop
  18. 17 Jul, 2019 1 commit
    • Fix Bitmain Predictor::Clone() (#18599) · 25d80791
      石晓伟 committed
      * update anakin-engine interfaces for content-dnn
      
      test=develop
      
      * support only-gpu mode of Anakin
      
      modify eltwise parse
      
      test=develop
      
      * modification for thread-safe
      
      test=develop
      
      * Integrated template instance
      
      test=develop
      
      * increase template parameters
      
      test=develop
      
      * support MLU predictor
      
      test=develop
      
      * update anakin cmake files
      
      test=develop
      
      * update TargetWrapper::set_device
      
      * update the initialization of anakin subgraph
      
      test=develop
      
      * use the default constructor of base class
      
      test=develop
      
      * load model from buffer with length
      
      test=develop
      
      * modify the access level of class
      
      test=develop
      
      * support anakin for bitmain arch
      
      test=develop
      
      * remove files
      
      * checkout cmakelists
      
      test=develop
      
      * modify interfaces
      
      test=develop
      
      * add cmake dependencies
      
      test=develop
      
      * enforce the outputs of net
      
      test=develop
  19. 11 Jul, 2019 1 commit
  20. 09 Jul, 2019 1 commit
  21. 08 Jul, 2019 2 commits
    • Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532) · 88b52a27
      Zhaolong Xing committed
      * Fix Mask rcnn predictor
          1. refine memory optim algorithm to support the model with the block op.
          2. output diff : modify the affine channel fuse
          3. add condition_block_infer op
      add interface for setting trt calib table dir
      test=develop
      
      * add the missing files.
      test=develop
    • Support Bitmain Anakin (#18542) · 15291548
      石晓伟 committed
      * update anakin-engine interfaces for content-dnn
      
      test=develop
      
      * support only-gpu mode of Anakin
      
      modify eltwise parse
      
      test=develop
      
      * modification for thread-safe
      
      test=develop
      
      * Integrated template instance
      
      test=develop
      
      * increase template parameters
      
      test=develop
      
      * support MLU predictor
      
      test=develop
      
      * update anakin cmake files
      
      test=develop
      
      * update TargetWrapper::set_device
      
      * update the initialization of anakin subgraph
      
      test=develop
      
      * use the default constructor of base class
      
      test=develop
      
      * load model from buffer with length
      
      test=develop
      
      * modify the access level of class
      
      test=develop
      
      * support anakin for bitmain arch
      
      test=develop
      
      * remove files
      
      * checkout cmakelists
      
      test=develop
  22. 03 Jul, 2019 1 commit
  23. 02 Jul, 2019 1 commit
  24. 01 Jul, 2019 1 commit
    • Fix Pooling output scale (#18186) · 7023a86c
      Michał Gallus committed
      * Int8: Fix Pooling output scale
      
      test=develop
      
      * Update scales quantization for certain operators
      
      These include: concat, transpose, pool and reshape. test=develop
      
      * Move concat minimum scale finding to quantizer
      
      test=develop
  25. 27 Jun, 2019 2 commits
  26. 21 Jun, 2019 1 commit
  27. 19 Jun, 2019 1 commit
    • fix spelling errors (#17941) · 802ea509
      翟飞跃 committed
      * fix spelling errors; test=develop
      
      * Update API.spec
      
      update md5
      
      * Update API.spec
      
      * change the order of api;test=develop
  28. 12 Jun, 2019 1 commit
  29. 11 Jun, 2019 1 commit
    • Update the Anakin interfaces for content-dnn and MLU (#17890) · bce259e5
      石晓伟 committed
      * update anakin-engine interfaces for content-dnn
      
      test=develop
      
      * support only-gpu mode of Anakin
      
      modify eltwise parse
      
      test=develop
      
      * modification for thread-safe
      
      test=develop
      
      * Integrated template instance
      
      test=develop
      
      * increase template parameters
      
      test=develop
      
      * support MLU predictor
      
      test=develop
      
      * update anakin cmake files
      
      test=develop
      
      * update TargetWrapper::set_device
      
      * update the initialization of anakin subgraph
      
      test=develop
      
      * use the default constructor of base class
      
      test=develop
  30. 06 Jun, 2019 3 commits
    • update the initialization of anakin subgraph (#17880) · d008260f
      石晓伟 committed
      test=develop
    • ae576f3c
    • INT8 MKL-DNN v2 integrate to slim (#17634) · 993c703b
      翟飞跃 committed
      * refactor PR 16865
      
      * delete mergetool files
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * create dir for int8 model before call SaveOptimModel
      
      * test=develop
      
      * mkldnn int8 only support linux; test=develop
      
      * refine code; test=develop
      
      * remove comment; test=develop
      
      * refine code; test=develop
      
      * fix bug; test=develop
      
      * add exception for mkldnn_post_training_strategy
      
      * reuse int8v2 CAPI dataset; test=develop
      
      * fix accuracy check bug; test=develop
      
      * remove tab
      
      * convert files to unix format
      
      * test=develop
      
      * reduce CI time;test=develop
      
      * reduce CI time and refine code;test=develop
      
      * refine comment; test=develop
      
      * add cmake FLAGS;test=develop
      
      * remove predict_num;test=develop
  31. 03 Jun, 2019 1 commit
  32. 29 May, 2019 1 commit
  33. 28 May, 2019 1 commit
    • Improve mobilenetv2 INT8 performance by using INT8 relu as post-op (#17570) · 04b6c29e
      lidanqing committed
      * add INT8 conv+relu6 fuse and enable mobilenetv2 INT8 test
      test=develop
      
      * change false and 0.0 to fuse_brelu and brelu_threshold
      test=develop
      
      change the "fuse_relu||fuse_brelu" to "unsigned_output"
      test=develop
      
      * Use relu instead of brelu as INT8 post-op because INT8 brelu is not enabled in mkldnn v0.18
      test=develop
      
      * continuous-integration fix
      test=develop