1. 03 Sep, 2019 1 commit
    • Add a pass to enable the use of cudnn (#19346) · c5548178
      Committed by Yiqun Liu
      * Add an interface to enable cudnn for inference.
      
      * Add cudnn_placement_pass.
      test=develop
      
      * Set the default value of cudnn_enabled_op_types to null.
      test=develop
      
      * Write a common base class, placement_pass_base, to refine the code.
      test=develop
      
      * Call EnableCUDNN in unittest.
      test=develop
      
      * Refine cudnn_placement_pass tester.
      
      * Enable the testing of cudnn_placement_pass in inference's unittest.
      test=develop
      
      * Add the check of op kernels.
      test=develop
  2. 30 Aug, 2019 1 commit
    • Add a pass to replace dropout_op with scale_op when is_test is true (#19297) · fcec365d
      Committed by Yiqun Liu
      * Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true.
      test=develop
      
      * Delete dropout_op directly when upscale_in_train is true.
      test=develop
      
      * Improve the debug string, adding the print of op_desc information.
      
      * Fix the case when dropout's input x is reused as the next op's output.
      
      * Add the pass to inference.
      test=develop
      
      * Change the log level.
      test=develop
      
      * Add unittest for inplace case.
      
      * Add comment to explain the pass.
      
      * Apply the pass for CPU inference.
      test=develop
      
      * Fix the typo.
      test=develop
      
      * Add the check of AttrType.
      test=develop
  3. 28 Aug, 2019 1 commit
    • Fix the correctness of async mode in distributed training (#18863) · 65c73684
      Committed by tangwei12
      * fix correctness of the communicator
      
      * fix a bug in send thread when sending var context is empty, test=develop
      
      * add lookup_table_prefetch_op and prefetch optimize, test=develop
      
      * remove GPU support for remote prefetch

      * force word2vec to run on CPU, test=develop

      * force the dist remote lookup table test to run on CPU, test=develop
  4. 27 Aug, 2019 1 commit
  5. 23 Aug, 2019 1 commit
  6. 21 Aug, 2019 1 commit
  7. 19 Aug, 2019 3 commits
  8. 15 Aug, 2019 1 commit
  9. 13 Aug, 2019 1 commit
  10. 12 Aug, 2019 2 commits
  11. 09 Aug, 2019 1 commit
  12. 06 Aug, 2019 1 commit
  13. 02 Aug, 2019 2 commits
    • Open gc by default (#18836) · 7ac748ad
      Committed by Zeng Jinle
      * open gc by default, test=develop
      
      * fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop
      
      * fix conditional_block op eager deletion bug, test=develop
      
      * add some comments to reviewers, test=develop
    • Fusion: seqpool_cvm_concat (#18471) · ee2f296e
      Committed by 石晓伟
      * add fusion_seqpool_cvm_concat test=develop
      
      * simplify pass, test=develop
      
      * fix code style, test=develop
  14. 29 Jul, 2019 1 commit
  15. 27 Jul, 2019 1 commit
  16. 26 Jul, 2019 1 commit
    • Feature/mem opt pass refactor (#18735) · a802da65
      Committed by Zeng Jinle
      * first version memory optimize pass, test=develop
      
      * remove move_tensor_sharing_pass, test=develop
      
      * refine code comments, add unittests, test=develop
      
      * turn off memory_optimize by default, test=develop
      
      * follow huihuang's comments, test=develop
      
      * follow chengduoZH's comments, test=develop
      
      * fix grammar error, add const qualifier, fix pass_test exception message, test=develop
      
      * follow chengduoZH's comments 2nd, test=develop
  17. 24 Jul, 2019 1 commit
    • Update trt5 for paddle-trt (#18645) · 26ae6d49
      Committed by Zhaolong Xing
      * update paddle-trt for:
          1. fix bug: core dump in the split plugin when batch > 2.
          2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.)
          3. add new attr to dropout.
          4. shuffle channel, swish, relu6 support
          test=develop
      
      * 1. fix ci
      test=develop
  18. 23 Jul, 2019 1 commit
  19. 19 Jul, 2019 1 commit
    • Support memory eager deletion on recurrent OP (#17710) · 89bc3fd8
      Committed by Huihuang Zheng
      Test PaddingRNN on V100 GPU device.
      
      Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU.
                         
      GPU memory (MiB):  6414 (this PR)   vs  6837 (without this PR)
      Speed (steps/s):   10.28 (this PR)  vs  9.89 (without this PR)
       
  20. 11 Jul, 2019 1 commit
    • Feature/buffer_shared_inplace (#17911) · d3003a16
      Committed by Zeng Jinle
      * feature/buffer_shared_inplace, test=develop
      
      * refine code, test=develop
      
      * fix elementwise_add op cpu inplace and sum inplace bug, test=develop
      
      * add unittest and debug log, test=develop
      
      * fix parallel_executor scope bug, polish code, test=develop
      
      * fix sum op, activation op, single_in_place_inference bug, test=develop
      
      * remove kLocalExecScopeName, test=develop
      
      * fix unittest,test=develop
      
      * fix out_var first version bug, test=develop
      
      * follow comments,test=develop
  21. 08 Jul, 2019 2 commits
  22. 04 Jul, 2019 1 commit
  23. 01 Jul, 2019 1 commit
    • Fix Pooling output scale (#18186) · 7023a86c
      Committed by Michał Gallus
      * Int8: Fix Pooling output scale
      
      test=develop
      
      * Update scales quantization for certain operators
      
      These include: concat, transpose, pool and reshape. test=develop
      
      * Move concat minimum scale finding to quantizer
      
      test=develop
  24. 27 Jun, 2019 1 commit
  25. 24 Jun, 2019 1 commit
  26. 14 Jun, 2019 1 commit
  27. 11 Jun, 2019 2 commits
    • Polish code of old PRs. (#17938) · da9143c1
      Committed by gongweibao
    • Update the Anakin interfaces for content-dnn and MLU (#17890) · bce259e5
      Committed by 石晓伟
      * update anakin-engine interfaces for content-dnn
      
      test=develop
      
      * support only-gpu mode of Anakin
      
      modify eltwise parse
      
      test=develop
      
      * modification for thread safety
      
      test=develop
      
      * Integrated template instance
      
      test=develop
      
      * increase template parameters
      
      test=develop
      
      * support MLU predictor
      
      test=develop
      
      * update anakin cmake files
      
      test=develop
      
      * update TargetWrapper::set_device
      
      * update the initialization of anakin subgraph
      
      test=develop
      
      * use the default constructor of base class
      
      test=develop
  28. 10 Jun, 2019 2 commits
  29. 06 Jun, 2019 1 commit
  30. 30 May, 2019 1 commit
  31. 29 May, 2019 1 commit
  32. 28 May, 2019 1 commit
    • [MKL-DNN] conv_transpose mkldnn bias pass (#17644) · 6d8075ec
      Committed by Jacek Czaja
      * - changes to graph detector
      
      - Changes to pass
      
      - Added ut for new pass
      
      - use_pass
      
      - Added pass to mkldnn passes
      
      - fix to registration
      
      - improved verbose messaging for conv bias passes
      
      - Lint fixes
      
      test=develop
      
      * - Lint fixes
      
      test=develop
  33. 27 May, 2019 1 commit
    • add Concat quantization (#17448) · 96845d21
      Committed by Sylwester Fraczek
      * add Concat quantization
      add unit test for quantizing concat
      fix for wrong value when the input is not in map of calculated scales
      add use_quantizer to concat_op.cc
      add scale_algo rules for concat
      
      test=develop
      
      * missing fix for multiple inputs quantize-squash
      
      * wojtuss review fix: adding comment
      
      test=develop