1. 02 Apr, 2020 1 commit
  2. 30 Mar, 2020 1 commit
  3. 26 Mar, 2020 1 commit
    • [Paddle-TRT]: Ernie Dynamic shape support. (#23138) · 430b0099
      Committed by Zhaolong Xing
      * add dynamic plugin support.
      test=develop
      
      * change emb eltwise layernorm to math function
      test=develop
      
      * add emb eltwise layernorm
      test=develop
      
      * can run dynamic shape ernie
      test=develop
      
      * fix ci
      test=develop
      
      * add ut for trt ernie dynamic
      
      test=develop
      
      * refine dynamic shape c++ interface.
      test=develop
      
      * fix comments
      test=develop
      
      * fix comments
      test=develop
  4. 05 Feb, 2020 1 commit
  5. 04 Feb, 2020 1 commit
  6. 09 Jan, 2020 1 commit
  7. 28 Nov, 2019 1 commit
  8. 07 Nov, 2019 1 commit
  9. 05 Nov, 2019 1 commit
    • Support NoNeedBufferVarsInference in dygraph backward (#20868) · 878a40f5
      Committed by Zeng Jinle
      * support no need buffer vars in dygraph, test=develop
      
      * fix inference compilation error, test=develop
      
      * update no_need_buffer_vars_inference, test=develop
      
      * add unittests for no_need_buffer_vars_context, test=develop
      
      * refine no_need_buffer_vars by return ref, test=develop
      
      * polish some codes, test=develop
  10. 30 Oct, 2019 1 commit
  11. 28 Oct, 2019 1 commit
  12. 24 Oct, 2019 1 commit
  13. 18 Oct, 2019 1 commit
  14. 02 Oct, 2019 1 commit
  15. 30 Sep, 2019 1 commit
    • fix compile paddle with anakin bug · 276b5e34
      Committed by Wilber
      * fix compile with anakin bug
      
      * remove useless deps test=develop
      
      - Fixed a bug encountered when building Paddle together with Anakin.
      - test_anakin_activate failed to compile
      - test_anakin_engine failed to compile
  16. 17 Sep, 2019 1 commit
  17. 11 Sep, 2019 2 commits
    • Make leaky relu inplacable (#19676) · 0daa5c97
      Committed by Zeng Jinle
      * make leaky relu inplacable, test=develop
      
      * force add unittests to pass coverage, test=develop
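One common reason an activation can be run in place is that its gradient can be recovered from the output alone, so the input buffer does not have to be kept. The sketch below illustrates that idea for leaky ReLU in plain Python. It assumes `alpha > 0` (so the output has the same sign as the input). The function names are hypothetical and this is not the code from the commit.

```python
def leaky_relu(x, alpha=0.02):
    """Forward pass: keep positive values, scale the rest by alpha."""
    return [v if v > 0 else alpha * v for v in x]


def leaky_relu_grad_from_x(x, dout, alpha=0.02):
    """Reference gradient computed from the original input."""
    return [g if v > 0 else alpha * g for v, g in zip(x, dout)]


def leaky_relu_grad_from_out(out, dout, alpha=0.02):
    """Gradient computed from the output only.

    Assumption: alpha > 0, so out > 0 exactly when x > 0 and the
    input can be overwritten by the output without losing the mask.
    """
    return [g if o > 0 else alpha * g for o, g in zip(out, dout)]


x = [-2.0, -0.5, 0.0, 1.5]
dout = [1.0, 1.0, 1.0, 1.0]
out = leaky_relu(x)
print(leaky_relu_grad_from_out(out, dout) == leaky_relu_grad_from_x(x, dout))
```

With `alpha <= 0` the sign of the output no longer identifies the sign of the input, so the output-only gradient would not be valid.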
    • Implement the GPU kernel of fc operator (#19687) · a65c728e
      Committed by Yiqun Liu
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Remove the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
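For readers unfamiliar with what a fused fully connected operator computes, the following is a minimal pure-Python sketch of `act(X * W + b)` with an optional activation selected by a string attribute. The parameter name `activation_type` and the set of supported values are assumptions made for illustration; the commit message only says that an attribute for the activation type was added.

```python
def fc(x, w, bias=None, activation_type=""):
    """Compute act(x @ w + bias) for a 2-D input.

    x: list of M rows, each of length K
    w: list of K rows, each of length N
    bias: optional list of length N
    activation_type: "" (no activation) or "relu" (hypothetical values)
    """
    k_dim, n_dim = len(w), len(w[0])
    out = []
    for row in x:
        # Matrix-vector product for one input row.
        y = [sum(row[k] * w[k][n] for k in range(k_dim)) for n in range(n_dim)]
        if bias is not None:
            y = [v + b for v, b in zip(y, bias)]
        if activation_type == "relu":
            y = [max(v, 0.0) for v in y]
        elif activation_type != "":
            raise ValueError("unsupported activation: " + activation_type)
        out.append(y)
    return out


x = [[1.0, -2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
print(fc(x, w, bias=[0.5, 0.5]))
print(fc(x, w, bias=[0.5, 0.5], activation_type="relu"))
```

Fusing the bias add and activation into the same operator avoids writing the intermediate result back to memory between steps, which is the usual motivation for a pass such as `fc_fuse_pass`.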
  18. 08 Sep, 2019 1 commit
  19. 19 Aug, 2019 1 commit
    • Add match_matrix_tensor op (#18525) · 78a3d837
      Committed by Aurelius84
      * add match_matrix_tensor op test=develop
      
      * fix ignore unittest if with_mkl=off test=develop
      
      * clean code and rm is_test param test=develop
      
      * modify API.spec test=develop
      
      * rm useless code in search_compute.h test=develop
      
      * modify api.spec test=develop
      
      * modify default_grad.spec test=develop
      
      * Add API test code test=develop
      
      * clean code in search_compute.h
      
      * modify PADDLE_ENFORCE and clean search_compute.h test=develop
      
      * fix code style test=develop
  20. 06 Aug, 2019 1 commit
    • Add var_conv_2d op (#18518) · e681d655
      Committed by Kevin
      * fix overflow by int32 mul test=develop
      
      * fix reference nullptr
      
      * fix codestyle test=develop
      
      * modify to point in ContextProjectFunctor test=develop
      
      * modify to point in ContextProjectFunctor test=develop
      
      * modify . to -> test=develop
      
      * add var_conv_2d op test=develop
      
      * edit api.spec test=develop
      
      * ignore unittest if with_mkl=off test=develop
      
      * fix python3 division test=develop
      
      * fix ignore unittest bug test=develop
      
      * remove useless code test=develop
      
      * modify api.spec test=develop
      
      * modify default_grad.spec test=develop
  21. 23 Jul, 2019 1 commit
  22. 27 Jun, 2019 1 commit
    • supports collective communicated training (#18175) · b7128bac
      Committed by HaoRen
      * fix prepare context redundant code problem, optimize executor by caching create_varaiables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * fix prepare context redundant code problem, optimize executor by caching create_varaiables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * fix comment
      test=develop
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * macro update
      
      * test=develop
      update support python3.5
      
      * test=develop change gpu memory use to 0.1 when test
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dumple  py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
  23. 11 Jun, 2019 1 commit
    • Update the Anakin interfaces for content-dnn and MLU (#17890) · bce259e5
      Committed by 石晓伟
      * update anakin-engine interfaces for content-dnn
      
      test=develop
      
      * support only-gpu mode of Anakin
      
      modify eltwise parse
      
      test=develop
      
      * modification for thread-safe
      
      test=develop
      
      * Integrated template instance
      
      test=develop
      
      * increase template parameters
      
      test=develop
      
      * support MLU predictor
      
      test=develop
      
      * update anakin cmake files
      
      test=develop
      
      * update TargetWrapper::set_device
      
      * update the initialization of anakin subgraph
      
      test=develop
      
      * use the default constructor of base class
      
      test=develop
  24. 30 May, 2019 1 commit
  25. 17 May, 2019 1 commit
  26. 18 Apr, 2019 1 commit
  27. 28 Mar, 2019 1 commit
  28. 22 Mar, 2019 1 commit
  29. 20 Mar, 2019 1 commit
  30. 19 Mar, 2019 1 commit
  31. 16 Mar, 2019 1 commit
  32. 15 Mar, 2019 1 commit
    • Support sync batch norm. (#16121) · 8ad672a2
      Committed by qingqing01
      * Support Sync Batch Norm.
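      * Note: do not enable it when running on a single device.
      
      Usage:
      
      import paddle.fluid as fluid
      
      # `tp` (a fluid Program) and `loss_mean` (the loss variable) must already be defined.
      build_strategy = fluid.BuildStrategy()
      build_strategy.sync_batch_norm = True
      binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
              loss_name=loss_mean.name,
              build_strategy=build_strategy)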
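To make the effect of the `sync_batch_norm` flag concrete, the sketch below shows the general idea of synchronized batch normalization in plain Python: each device contributes its local count, sum, and sum of squares, these are summed across devices, and every device normalizes with the resulting global mean and variance. It handles a single channel, simulates the all-reduce with an ordinary function, and uses hypothetical helper names. It is an illustration of the technique, not Paddle's kernel.

```python
def local_stats(values):
    """Per-device statistics: count, sum, and sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)


def all_reduce_sum(stats_per_device):
    """Stand-in for a cross-device all-reduce: element-wise sum of tuples."""
    return tuple(sum(s[i] for s in stats_per_device) for i in range(3))


def sync_batch_norm(shards, eps=1e-5):
    """Normalize every shard with statistics of the whole global batch."""
    n, total, total_sq = all_reduce_sum([local_stats(s) for s in shards])
    mean = total / n
    var = total_sq / n - mean * mean  # biased variance of the global batch
    inv_std = (var + eps) ** -0.5
    return [[(v - mean) * inv_std for v in shard] for shard in shards]


# Two simulated devices, each holding half of the batch.
shards = [[1.0, 2.0], [3.0, 4.0]]
print(sync_batch_norm(shards))
```

With a single shard the result is the same as ordinary batch normalization, which is consistent with the note that the flag should not be enabled on one device.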
  33. 22 Feb, 2019 1 commit
  34. 30 Jan, 2019 1 commit
  35. 25 Jan, 2019 1 commit
    • Adding ngraph_engine_op (#14948) · efce2567
      Committed by baojun
      * enable ngraph_engine_op
      test=develop
      
      * merge develop test=develop
      
      * avoid const_cast test=develop
      
      * rm ngraph_operator test=develop
      
      * Added TODO to move EnableNgraph test=develop
      
      * Add TODO to remove const_cast test=develop
  36. 24 Jan, 2019 1 commit
    • Add the CUDA kernel for beam_search op (#15020) · 3008fa12
      Committed by Yiqun Liu
      * Refine the beam_search op and test.
      
      * A basic CUDA implementation of beam_search for small batch_size.
      
      * Implement CUDA kernel for beam_search_op.
      
      * Use multiple CUDA threads in the same block to select the top beam.
      
      * Update the python api of beam_search op.
      
      * Enable extend function in CPU kernel of beam_search op.
      
      * Unify the CUDA codes.
      test=develop
      
      * Unify the CPU kernel of beam_search op.
      
      * Ensure the selected items of beam_search_op's CPU kernel sorted by scores.
      
      * Update the description of beam_search in API.spec.
      
      * Enable the use of CUDA kernel in beam_search op.
      
      * Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debugging statements.
      test=develop
      
      * Follow comments.
      test=develop
      
      * Call the CPU kernel for beam_search op when batch_size > 4.
      test=develop
      
      * Remove the except of is_empty op in PrepareData.
      test=develop
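As background for the items above about selecting the top beam and keeping the selected items sorted by score, here is a generic sketch of a single beam search selection step for one source sentence. It pools the candidates of all current prefixes and keeps the `beam_size` best, highest score first. The function name and data layout are hypothetical. The actual operator works on LoD tensors and also handles finished beams, which this sketch omits.

```python
import heapq


def beam_search_step(candidate_ids, candidate_scores, beam_size):
    """Select the best `beam_size` candidates across all prefixes.

    candidate_ids[i]    : token ids proposed for prefix i
    candidate_scores[i] : accumulated scores of those tokens
    Returns (score, prefix_index, token_id) tuples sorted by score,
    highest first.
    """
    items = []
    for prefix, (ids, scores) in enumerate(zip(candidate_ids, candidate_scores)):
        for token, score in zip(ids, scores):
            items.append((score, prefix, token))
    # heapq.nlargest returns its result in descending order of the key.
    return heapq.nlargest(beam_size, items, key=lambda item: item[0])


ids = [[4, 2], [3, 5]]
scores = [[0.5, 0.3], [0.6, 0.1]]
print(beam_search_step(ids, scores, beam_size=2))
```

On a GPU, the same selection can be split across threads, for example by having each thread scan a slice of the candidates and then merging the partial results. The layout used by the CUDA kernel in the commit is not described in the message.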
  37. 18 Jan, 2019 1 commit
    • Tree conv op (#15217) · e2ba9668
      Committed by zhaozhehao
      * refactor tree2col operator with new memory mechanism test=develop
      
      * test=develop
      
      * test=develop
      
      * Modified API according to panyx0718 test=develop
      
      * fix API change according to heavengate test=develop
      
      * Modify API comment test=develop
  38. 29 Dec, 2018 1 commit
  39. 26 Dec, 2018 1 commit