1. 28 Sep, 2019 1 commit
    • Follow comment of Merged QAT PR 18970 (#19979) · 9de67725
      bingyanghuang authored
      * Follow Wangzhen's comment in PR 18970, test=develop
      
      * Review comments, test=develop
      
      * Leave fake quantization around mul
      
      test=develop
      
      * Replace Fake with Real Quantized Mul
      
      test=develop
      
      * Fix bug in quantize placement pass
      
      Nodes in the graph are now checked by type instead of by node name when they are marked for quantization. test=develop
  2. 27 Sep, 2019 2 commits
  3. 26 Sep, 2019 1 commit
  4. 19 Sep, 2019 2 commits
    • Fix conv2d+dequantize squash for residual fusion (#19545) · 3f1d0234
      joanna.wozna.intel authored
      * Fix conv2d+dequantize squash for residual fusion
      
      test=develop
      
      * Change condition
      
      test=develop
    • Add a pass to fuse fc+elementwise_add+layernorm (#19776) · 3cd985a6
      Yiqun Liu authored
      * Add fc_elementwise_layernorm_fuse pass and unittest.
      
      * Add fused_fc_elementwise_layernorm op and its GPU kernel.
      test=develop
      
      * Apply fc_elementwise_layernorm_fuse_pass to GPU inference.
      
      * Add the setting of attrs in the definition of binary_op.
      test=develop
      
      * Add comment.
      
      * Implement the unittest.
      test=develop
      
      * Change the unittest name of layer_norm.
      test=develop
  5. 18 Sep, 2019 2 commits
  6. 16 Sep, 2019 2 commits
    • Fix warning info of build_strategy (#19805) · 82814970
      chengduo authored
      * fix warning info
      test=develop
      
      * fix bug of all_reduce_deps_pass
      test=develop
    • Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733) · c67c8758
      Yiqun Liu authored
      * Refine the code related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Move the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
      
      * Enhance fc_fuse_pass to enable fusing relu.
      
      * Allow printing the shapes of var_desc in graph.
      test=develop
      
      * Enhance fc_fuse_pass_tester.
      
      * Remove the use of PADDLE_ENFORCE.
      test=develop
      
      * Correct the number of ops after fusing.
      test=develop
      
      * Fix a typo.
      test=develop
      
      * Set activation_type to null when there is no relu in fc.
      test=develop
      
      * Refine fc_fuse_pass's code.

      * Enable setting the shape of a tensor.
      
      * Refine repeated_fc_relu_pass and add unittest.
      test=develop
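The relu fusion described in the commit above can be illustrated with a small plain-Python sketch. It assumes the unfused graph is mul followed by elementwise_add and relu, and that the fused fc op takes an `activation_type` attribute (the attribute name comes from the commit message; the function names and the use of an empty string for "no activation" are assumptions made only for this illustration, not Paddle's implementation).

```python
# Illustrative sketch only: plain-Python stand-ins for the ops, not Paddle code.

def mul(x, w):
    """Matrix product of x (m x k) and w (k x n), both lists of rows."""
    return [[sum(xi * w[k][j] for k, xi in enumerate(row))
             for j in range(len(w[0]))] for row in x]

def elementwise_add(y, b):
    """Add bias vector b to every row of y."""
    return [[v + b[j] for j, v in enumerate(row)] for row in y]

def relu(y):
    """Clamp negative entries to zero."""
    return [[max(v, 0.0) for v in row] for row in y]

def fc(x, w, b, activation_type=""):
    """Hypothetical fused op: one call replaces mul + elementwise_add (+ relu)."""
    out = elementwise_add(mul(x, w), b)
    if activation_type == "relu":
        out = relu(out)
    return out

x = [[1.0, -2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
b = [0.5, 0.5]
unfused = relu(elementwise_add(mul(x, w), b))
fused = fc(x, w, b, activation_type="relu")
```

Both paths produce the same values; the fused form simply needs one op in the graph instead of three.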
  7. 13 Sep, 2019 1 commit
    • Open fuse all reduce option (#19765) · 056fdedd
      chengduo authored
      * Open fuse all reduce op
      test=develop
      
      * Add Fuse optimization op log
      
      * Add log in fuse_optimizer op pass and fuse all_reduce op pass
      
      * replace with boost::optional<bool>
      test=develop
      
      * Polish code
      test=develop
      
      * fix code coverage
      test=develop
  8. 11 Sep, 2019 3 commits
  9. 06 Sep, 2019 1 commit
  10. 04 Sep, 2019 1 commit
    • Enable ngraph through build_strategy (#19266) · a3a4b6e5
      baojun authored
      * enable ngraph through build_strategy test=develop
      
      * add unittest test=develop
      
      * make use_ngraph unconditional test=develop
      
      * remove paddle_enforce test=develop
      
      * remove paddle_enforce test=develop
      
      * fix copyright test=develop
      
      * limit for ngraph only test=develop
  11. 03 Sep, 2019 2 commits
    • Refine PADDLE_ENFORCE code to unify PADDLE_ASSERT_MSG (#19603) · 75d15719
      Tao Luo authored
      test=develop
    • Add a pass to enable the use of cudnn (#19346) · c5548178
      Yiqun Liu authored
      * Add an interface to enable cudnn for inference.
      
      * Add cudnn_placement_pass.
      test=develop
      
      * Set the default value of cudnn_enabled_op_types to null.
      test=develop
      
      * Write a common base class, placement_pass_base, to refine the code.
      test=develop
      
      * Call EnableCUDNN in unittest.
      test=develop
      
      * Refine cudnn_placement_pass tester.
      
      * Enable the testing of cudnn_placement_pass in inference's unittest.
      test=develop
      
      * Add the check of op kernels.
      test=develop
  12. 30 Aug, 2019 1 commit
    • Add a pass to replace dropout_op with scale_op when is_test is true (#19297) · fcec365d
      Yiqun Liu authored
      * Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true.
      test=develop
      
      * Delete dropout_op directly when upscale_in_train is true.
      test=develop
      
      * Improve the debug string, adding the print of op_desc information.
      
      * Fix the case when dropout's input x is reused as the next op's output.
      
      * Add the pass to inference.
      test=develop
      
      * Change the log level.
      test=develop
      
      * Add unittest for inplace case.
      
      * Add comment to explain the pass.
      
      * Apply the pass for CPU inference.
      test=develop
      
      * Fix the typo.
      test=develop
      
      * Add the check of AttrType.
      test=develop
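The rewrite described in the commit above can be sketched in plain Python. The sketch assumes the two dropout modes behave as follows at inference time: in the default "downgrade_in_infer" mode the output is the input scaled by (1 - dropout_prob), and in "upscale_in_train" mode the output equals the input. The function names are hypothetical and only illustrate why a scale op (or no op at all) is an equivalent replacement.

```python
# Illustrative sketch only: plain-Python stand-ins, not the pass itself.

def dropout_infer(x, dropout_prob, implementation="downgrade_in_infer"):
    """Reference behaviour of dropout when is_test is true (assumed semantics)."""
    if implementation == "upscale_in_train":
        return list(x)                      # identity at inference time
    if implementation == "downgrade_in_infer":
        return [v * (1.0 - dropout_prob) for v in x]
    raise ValueError("unknown dropout implementation: " + implementation)

def scale(x, factor):
    """Stand-in for scale_op: multiply every element by a constant."""
    return [v * factor for v in x]

def simplified(x, dropout_prob, implementation):
    """What the graph computes after the rewrite (hypothetical helper)."""
    if implementation == "upscale_in_train":
        return list(x)                      # dropout op deleted directly
    return scale(x, 1.0 - dropout_prob)     # dropout op replaced by scale op

x = [1.0, 2.0, 4.0]
before = dropout_infer(x, 0.5, "downgrade_in_infer")
after = simplified(x, 0.5, "downgrade_in_infer")
```

In both modes the simplified graph returns the same values as inference-time dropout, so the dropout op itself is no longer needed.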
  13. 28 Aug, 2019 1 commit
    • Fix the correctness of async mode in distributed training (#18863) · 65c73684
      tangwei12 authored
      * fix correctness of the communicator
      
      * fix a bug in send thread when sending var context is empty, test=develop
      
      * add lookup_table_prefetch_op and prefetch optimize, test=develop
      
      * remove GPU support for remote prefetch

      * force word2vec to run on CPU, test=develop

      * force the dist remote lookup table test to run on CPU, test=develop
  14. 27 Aug, 2019 1 commit
  15. 23 Aug, 2019 1 commit
  16. 21 Aug, 2019 1 commit
  17. 19 Aug, 2019 3 commits
  18. 15 Aug, 2019 1 commit
  19. 13 Aug, 2019 1 commit
  20. 12 Aug, 2019 2 commits
  21. 09 Aug, 2019 1 commit
  22. 06 Aug, 2019 1 commit
  23. 02 Aug, 2019 2 commits
    • Open gc by default (#18836) · 7ac748ad
      Zeng Jinle authored
      * open gc by default, test=develop
      
      * fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop
      
      * fix conditional_block op eager deletion bug, test=develop
      
      * add some comments to reviewers, test=develop
    • Fusion: seqpool_cvm_concat (#18471) · ee2f296e
      石晓伟 authored
      * add fusion_seqpool_cvm_concat test=develop
      
      * simplify pass, test=develop
      
      * fix code style, test=develop
  24. 29 Jul, 2019 1 commit
  25. 27 Jul, 2019 1 commit
  26. 26 Jul, 2019 1 commit
    • Feature/mem opt pass refactor (#18735) · a802da65
      Zeng Jinle authored
      * first version memory optimize pass, test=develop
      
      * remove move_tensor_sharing_pass, test=develop
      
      * refine code comments, add unittests, test=develop
      
      * turn off memory_optimize by default, test=develop
      
      * follow huihuang's comments, test=develop
      
      * follow chengduoZH's comments, test=develop
      
      * fix grammar error, add const qualifier, fix pass_test exception message, test=develop
      
      * follow chengduoZH's comments 2nd, test=develop
  27. 24 Jul, 2019 1 commit
    • Update trt5 for paddle-trt (#18645) · 26ae6d49
      Zhaolong Xing authored
      * update paddle-trt for:
          1. fix bug: core dump in split plugin when batch > 2.
          2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.)
          3. add new attr to dropout.
          4. shuffle channel, swish, relu6 support
          test=develop
      
      * 1. fix ci
      test=develop
  28. 23 Jul, 2019 1 commit
  29. 19 Jul, 2019 1 commit
    • Support memory eager deletion on recurrent OP (#17710) · 89bc3fd8
      Huihuang Zheng authored
      Test PaddingRNN on V100 GPU device.
      
      Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU.
                         
      GPU memory (MiB):   6414 (this PR)    vs   6837 (without this PR)
      Speed (steps/s):    10.28 (this PR)   vs   9.89 (without this PR)
       
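As a quick check of the figures reported above, the relative changes can be worked out from the four numbers in the commit message. The helper name is hypothetical; only the inputs come from the source.

```python
def relative_change(with_pr, without_pr):
    """Percentage change of the 'this PR' value relative to the baseline."""
    return (with_pr - without_pr) / without_pr * 100.0

# Figures quoted in the commit message (PaddingRNN, large model, one V100).
memory_change = relative_change(6414, 6837)    # GPU memory in MiB
speed_change = relative_change(10.28, 9.89)    # steps per second

print(f"GPU memory: {memory_change:+.1f}%")
print(f"Speed:      {speed_change:+.1f}%")
```

This works out to roughly 6.2% less GPU memory (423 MiB) and roughly 3.9% more steps per second for that single configuration.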