1. 27 Nov, 2019 1 commit
    • M
      INT8 Fully-connected (#17641) · 5d7d5482
      Michał Gallus committed
      * Implement Int8 FC
      
      * Integrate FC into INT8v2
      
      test=develop
      
      * int8 FC: transpose weights before computing scales
      
      test=develop
      
      * Add support for activation_type string in FC
      
      test=develop
      
      * Disable MKL-DNN's FC in VGG16 and 19
      
      test=develop
      
      * Disable FC quantization when mkldnn FC is disabled
      
      test=develop
      
      * Solve PADDLE_ENFORCES in FC int8
      
      * Fix Paddle enforces and remove const cast
      
      test=develop
      
      * Fix style changes
      
      test=develop
      
      * Fix quantizer_tester test and add fc quantization
      
      test=develop
      
      * Fix FC test fail on CUDA
      
      * Remove unnecessary log from quantize placement pass
      
      test=develop
      
      * Add Thread ID to FC hash key
      
      test=develop
      
      * Add comments to MKL-DNN FC Kernel
      
      test=develop
      
      * Refactor quantizer
      
      test=develop
      
      * Fix linter issues
      
      test=develop
      
      * Fix crash in slim googlenet
      
      test=develop
      
      * Fix PADDLE_ENFORCE messages
      
      test=develop
  2. 26 Nov, 2019 1 commit
    • G
      Add fc padding to improve mkl GEMM's performance when N and K are multiples of 128. (#20972) · 234060f8
      GaoWei8 committed
      * Add fc padding to solve mkl performance
      test=develop
      
      * fix gpu pass and error information
      test=develop
      
      * fix fc_fuse_pass_test
      test=develop
      
      * fix error information
      test=develop
      
      * fix error information
      test=develop
      
      * fix name and add fc op padding test
      test=develop
      
      * fix attributes
      test=develop
      
      * optimize fc padding
      test=develop
      
      * fix test
      test=develop
  3. 25 Nov, 2019 1 commit
  4. 24 Nov, 2019 1 commit
  5. 22 Nov, 2019 1 commit
  6. 20 Nov, 2019 1 commit
    • Y
      Enable generating code for a given subgraph. (#21126) · 6b1e1f0d
      Yiqun Liu committed
      * Enable generating code for a given subgraph.
      
      * Support sorting the subgraph.
      
      * Remove the rearrangement of expressions because we use the sorted subgraph directly.
      
      * Enable generating code for a subgraph which is composed of grad ops.
      
      * Use expression information to check the accuracy in unittest.
      
      * Separate load and store from computation expressions.
      test=develop
      
      * Improve the loading statements in generated code.
      test=develop
      
      * Remove unused arguments from formal list.
      test=develop
  7. 18 Nov, 2019 2 commits
  8. 14 Nov, 2019 1 commit
  9. 13 Nov, 2019 1 commit
  10. 12 Nov, 2019 1 commit
  11. 11 Nov, 2019 2 commits
    • C
      Add pre-condition check for fuse optimizer op pass (#21005) · 826254f6
      Chen Weihang committed
      * add pre condition check for fuse optimizer op pass, test=develop
      
      * add log & set init to zero, test=develop
      
      * fix test_fuse_all_reduce_pass failed, test=develop
      
      * polish details, test=develop
      
      * refine PADDLE_ENFORCE & remove needless VLOG, test=develop
      
      * refactor op check method, test=develop
    • Y
      Support generating code for grad_op (#21066) · 9091f8cd
      Yiqun Liu committed
      * Add the definition of operation in fusion_group.
      
      * Use operations in OperationMap to detect fusion_group of elementwise pattern.
      
      * Add namespace fusion_group in code_generator.
      
      * Use operations recorded in OperationMap to generate code.
      
      * Move implementation code to .cc file.
      
      * Refine Operation and CodeGenerator to make it easier to generate code for grad_op.
      Refine the unittest for better reuse.
      
      * Avoid recording the template's keyword in an array.
      
      * Support the generating of code for grad_op and add unittest.
      test=develop
      
      * Remove replaced_element_in_order and use number instead.
      test=develop
  12. 08 Nov, 2019 1 commit
    • J
      Add transpose2 INT8 for mkl-dnn (#19424) · 77c20835
      joanna.wozna.intel committed
      * Add transpose2 INT8 for mkl-dnn
      
      test=develop
      
      * Fix test_transpose_int8_mkldnn
      
      test=develop
      
      * Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"
      
      This reverts commit 34011bdb, reversing
      changes made to 2ce6473f.
      
      * Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2""
      
      This reverts commit 23754dd7.
      
      * Add template to TransposeMKLDNNHandler
      
      test=develop
      
      * Resolve conflict
      
      test=develop
      
      * Restore get_size and refactor
      
      test=develop
  13. 05 Nov, 2019 1 commit
    • Z
      Support NoNeedBufferVarsInference in dygraph backward (#20868) · 878a40f5
      Zeng Jinle committed
      * support no need buffer vars in dygraph, test=develop
      
      * fix inference compilation error, test=develop
      
      * update no_need_buffer_vars_inference, test=develop
      
      * add unittests for no_need_buffer_vars_context, test=develop
      
      * refine no_need_buffer_vars by return ref, test=develop
      
      * polish some codes, test=develop
  14. 02 Nov, 2019 1 commit
  15. 01 Nov, 2019 1 commit
  16. 31 Oct, 2019 1 commit
    • H
      GradMaker for dygraph (#19706) · 8c4573a3
      hong committed
      * refactor dygraph,test=develop
      
      * fix failed unittest,test=develop
      
      * polish code,test=develop
      
      * check windows ci error,test=develop
      try to fix windows ci error by np.allclose,test=develop
      
      * polish vlog and profiler, test=develop
      
      * try to fix preceding ops order,test=develop
      
      * test transformer in windows ci, test=develop
      
      * use python c-api to speed up tracer.trace,test=develop
      
      * test=develop, fix docker with paddle nccl problem
      
      * test=develop, add ut for debug string and gradient_accumulator
      
      * test=develop, add tests for layer/gradient_accumulator/prepared_op
      
      * test=develop, fix compile error for test_prepared_op
      
      * test=develop, add more ut for dygraph
      
      * test=develop, create API.spec for dygraph api change
      
      * optimize grad maker; test=develop
      
      * optimize grad maker
      
      * test
      
      * grad make optim; test=develop
      
      * fix unittest bugs; test=develop
      
      * add dygraph grad op maker and split_op
      
      * grad op maker refactor; test=develop
      
      * add dygraph grad maker; test=develop
      
      * fix op deformable_conv_v1_op bug; test=develop
      
      * fix deformable_conv prroi pool bugs;
      
      * fix new op grad op maker bug; test=develop
      
      * fix split by ref bug; test=develop
      
      * fix dygraph auto prune bug; test=develop
      
      * fix test_trace bug; test=develop
      
      * fix fused emb seq pool bug; test=develop
      
      * remove useless code in op_desc file; test=develop
      
      * remove useless code, StrVarBaseNode; test=develop
      
      * fix review issues; test=develop
      
      * fix rank_loss grad maker; test=develop
      
      * remove flag in VarBase; test=develop
      
      * fix distributed_notify_op compile bug ; test=develop
      
      * fix reshape op double grad; test=develop
      
      * fix expand as op; test=develop
      
      * add imperative type_defs.h for demo_train; test=develop
      
      * fix inference lib cmake; test=develop
      
      * fix inference lib; test=develop
      
      * fix inference_lib; test=develop
      
      * fix inference cmake; test=develop
      
      * fix inference lib; test=develop
      
      * fix inference lib; test=develop
      
      * remove condition dygraph grad maker, modify local name; test=develop
      
      * fix split grad maker bug; test=develop
      
      * fix pyramid_op bug; test=develop
      
      * change travis time out limit; test=develop
      
      * restore travis; test=develop
      
      * change timeout limit; test=develop
  17. 29 Oct, 2019 1 commit
    • Y
      Implement a pass to detect fusion groups of elementwise ops (#19884) · b5f3be83
      Yiqun Liu committed
      * Add fusion_group_pass and elementwise pattern.
      
      * Rewrite the detector of elementwise group.
      test=develop
      
      * Add a comment in codegen.
      
      * Add more unittest cases.
      test=develop
      
      * Move code_generator related code to fusion_group directory.
      
      * Correct the including path.
      
      * Add the definition of SubGraph and finish inserting the fusion_group op in the pass.
      
      * Insert graph_vis_pass in tester to visualize the graph for debug.
  18. 24 Oct, 2019 1 commit
  19. 19 Oct, 2019 1 commit
  20. 18 Oct, 2019 1 commit
  21. 15 Oct, 2019 1 commit
  22. 14 Oct, 2019 1 commit
  23. 13 Oct, 2019 1 commit
    • Z
      Add Multihead matmul fuse pass (#20167) · b8333ede
      zhaoyuchen2018 committed
      * Add multihead fuse pass for ernie opt
      
      * Refine softmax
      
      test=develop
      
      * Refine cuda kernel
      
      * Refine cuda version
      
      * Refine cmake
      
      test=develop
      
      * refine header file
      
      * refine test case and pass
      * refine comments
  24. 12 Oct, 2019 1 commit
  25. 28 Sep, 2019 1 commit
    • B
      Follow comment of Merged QAT PR 18970 (#19979) · 9de67725
      bingyanghuang committed
      * Follow Wangzhen's comment in PR 18970, test=develop
      
      * Review comments, test=develop
      
      * Leave fake quantization around mul
      
      test=develop
      
      * Replace Fake with Real Quantized Mul
      
      test=develop
      
      * Fix bug in quantize placement pass
      
      Nodes in the graph are now checked by type instead of by node name when they are to be marked for quantization.

      test=develop
  26. 27 Sep, 2019 2 commits
  27. 26 Sep, 2019 1 commit
  28. 19 Sep, 2019 2 commits
    • J
      Fix conv2d+dequantize squash for residual fusion (#19545) · 3f1d0234
      joanna.wozna.intel committed
      * Fix conv2d+dequantize squash for residual fusion
      
      test=develop
      
      * Change condition
      
      test=develop
    • Y
      Add a pass to fuse fc+elementwise_add+layernorm (#19776) · 3cd985a6
      Yiqun Liu committed
      * Add fc_elementwise_layernorm_fuse pass and unittest.
      
      * Add fused_fc_elementwise_layernorm op and its GPU kernel.
      test=develop
      
      * Apply fc_elementwise_layernorm_fuse_pass to GPU inference.
      
      * Add the setting of attrs in the definition of binary_op.
      test=develop
      
      * Add comment.
      
      * Implement the unittest.
      test=develop
      
      * Change the unittest name of layer_norm.
      test=develop
  29. 18 Sep, 2019 2 commits
  30. 16 Sep, 2019 2 commits
    • C
      Fix warning info of build_strategy (#19805) · 82814970
      chengduo committed
      * fix warning info
      test=develop
      
      * fix bug of all_reduce_deps_pass
      test=develop
    • Y
      Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733) · c67c8758
      Yiqun Liu committed
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Move the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
      
      * Enhance fc_fuse_pass to enable fusing relu.
      
      * Allow printing the shapes of var_desc in graph.
      test=develop
      
      * Enhance fc_fuse_pass_tester.
      
      * Remove the use of PADDLE_ENFORCE.
      test=develop
      
      * Correct the number of ops after fusing.
      test=develop
      
      * Fix a typo.
      test=develop
      
      * Set activation_type to null when there is no relu in fc.
      test=develop
      
      * Refine fc_fuse_pass's codes.
      
      * Enable setting the shape of a tensor.
      
      * Refine repeated_fc_relu_pass and add unittest.
      test=develop
  31. 13 Sep, 2019 1 commit
    • C
      Open fuse all reduce option (#19765) · 056fdedd
      chengduo committed
      * Open fuse all reduce op
      test=develop
      
      * Add Fuse optimization op log
      
      * Add log in fuse_optimizer op pass and fuse all_reduce op pass
      
      * replace with boost::optional<bool>
      test=develop
      
      * Polish code
      test=develop
      
      * fix code coverage
      test=develop
  32. 11 Sep, 2019 3 commits