1. 27 Apr, 2020 1 commit
  2. 09 Apr, 2020 1 commit
    • Remove: NGraph engine from PDPD repository (#23545) · 3baaee9a
      Committed by mozga-intel
      * Remove the NGraph engine from PDPD repository
      1. Each operator was removed from the operator's directory
      2. Each test was removed from the unittest directory
      3. The parallel executor support was removed from the PDPD
      4. The CMake file was removed from the PDPD
      5. The NG flags were removed from the repository
      test=develop
      
      * Remove ngraph from:
      1. CMake file
      2. Python file
      test=develop
  3. 05 Apr, 2020 1 commit
  4. 02 Apr, 2020 1 commit
  5. 30 Mar, 2020 1 commit
  6. 26 Mar, 2020 1 commit
    • [Paddle-TRT]: Ernie Dynamic shape support. (#23138) · 430b0099
      Committed by Zhaolong Xing
      * add dynamic plugin support.
      test=develop
      
      * change emb eltwise layernorm to math function
      test=develop
      
      * add emb eltwise layernorm
      test=develop
      
      * can run dynamic shape ernie
      test=develop
      
      * fix ci
      test=develop
      
      * add ut for trt ernie dynamic
      
      test=develop
      
      * refine dynamic shape c++ interface.
      test=develop
      
      * fix comments
      test=develop
      
      * fix comments
      test=develop
  7. 05 Feb, 2020 1 commit
  8. 04 Feb, 2020 1 commit
  9. 09 Jan, 2020 1 commit
  10. 28 Nov, 2019 1 commit
  11. 07 Nov, 2019 1 commit
  12. 05 Nov, 2019 1 commit
    • Support NoNeedBufferVarsInference in dygraph backward (#20868) · 878a40f5
      Committed by Zeng Jinle
      * support no need buffer vars in dygraph, test=develop
      
      * fix inference compilation error, test=develop
      
      * update no_need_buffer_vars_inference, test=develop
      
      * add unittests for no_need_buffer_vars_context, test=develop
      
      * refine no_need_buffer_vars by return ref, test=develop
      
      * polish some codes, test=develop
  13. 30 Oct, 2019 1 commit
  14. 28 Oct, 2019 1 commit
  15. 24 Oct, 2019 1 commit
  16. 18 Oct, 2019 1 commit
  17. 02 Oct, 2019 1 commit
  18. 30 Sep, 2019 1 commit
    • fix compile paddle with anakin bug · 276b5e34
      Committed by Wilber
      * fix compile with anakin bug
      
      * remove useless deps test=develop
      
      - Fixed bugs encountered when building with Anakin.
      - test_anakin_activate failed to compile
      - test_anakin_engine failed to compile
  19. 17 Sep, 2019 1 commit
  20. 11 Sep, 2019 2 commits
    • Make leaky relu inplacable (#19676) · 0daa5c97
      Committed by Zeng Jinle
      * make leaky relu inplacable, test=develop
      
      * force add unittests to pass coverage, test=develop
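Why leaky ReLU can run in place can be sketched as follows. This is a NumPy illustration, not Paddle's kernel: it assumes a positive slope `alpha`, in which case the output has the same sign as the input and the backward pass can be computed from the output alone, so the input buffer may be overwritten. The function names are hypothetical.

```python
import numpy as np

def leaky_relu_inplace(x, alpha=0.02):
    """Overwrite x with leaky_relu(x) and return the same buffer."""
    # Scale only the negative entries; the others keep their value in x.
    np.multiply(x, alpha, out=x, where=x < 0)
    return x

def leaky_relu_grad_from_out(out, dout, alpha=0.02):
    """Backward pass that needs only the forward output (assumes alpha > 0)."""
    # For alpha > 0, out > 0 exactly when the original input was > 0.
    return np.where(out > 0, dout, alpha * dout)
```

With `alpha <= 0` the sign of the output no longer identifies the sign of the input, so this shortcut would not apply.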
    • Implement the GPU kernel of fc operator (#19687) · a65c728e
      Committed by Yiqun Liu
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Remove the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
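A fully connected (fc) operator with a fused activation, as this commit describes, can be pictured with the NumPy sketch below. It is only an illustration under assumptions: the function name `fc_forward` is hypothetical, and the `activation_type` argument is assumed to accept `"relu"` or an empty string for no activation. It does not reproduce Paddle's CPU or GPU kernels.

```python
import numpy as np

def fc_forward(x, w, b=None, activation_type=""):
    """Compute act(x @ w + b) for a 2-D input x and weight matrix w."""
    out = x @ w
    if b is not None:
        out = out + b  # bias is broadcast over the batch dimension
    if activation_type == "relu":
        out = np.maximum(out, 0.0)
    elif activation_type != "":
        raise ValueError("unsupported activation_type: %r" % activation_type)
    return out
```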
  21. 08 Sep, 2019 1 commit
  22. 19 Aug, 2019 1 commit
    • Add match_matrix_tensor op (#18525) · 78a3d837
      Committed by Aurelius84
      * add match_matrix_tensor op test=develop
      
      * fix ignore unittest if with_mkl=off test=develop
      
      * clean code and rm is_test param test=develop
      
      * modify API.spec test=develop
      
      * rm useless code in search_compute.h test=develop
      
      * modify api.spec test=develop
      
      * modify default_grad.spec test=develop
      
      * Add API test code test=develop
      
      * clean code in search_compute.h
      
      * modify PADDLE_ENFORCE and clean search_compute.h test=develop
      
      * fix code style test=develop
  23. 06 Aug, 2019 1 commit
    • Add var_conv_2d op (#18518) · e681d655
      Committed by Kevin
      * fix overflow by int32 mul test=develop
      
      * fix reference nullptr
      
      * fix codestyle test=develop
      
      * modify to point in ContextProjectFunctor test=develop
      
      * modify to point in ContextProjectFunctor test=develop
      
      * modify . to -> test=develop
      
      * add var_conv_2d op test=develop
      
      * edit api.spec test=develop
      
      * ignore unittest if with_mkl=off test=develop
      
      * fix python3 division test=develop
      
      * fix ignore unittest bug test=develop
      
      * remove useless code test=develop
      
      * modify api.spec test=develop
      
      * modify default_grad.spec test=develop
  24. 23 Jul, 2019 1 commit
  25. 27 Jun, 2019 1 commit
    • supports collective communicated training (#18175) · b7128bac
      Committed by HaoRen
      * fix prepare context redundant code problem, optimize executor by caching create_variables
      test=develop

      * supports collective training in executor

      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * fix prepare context redundant code problem, optimize executor by caching create_variables
      test=develop

      * supports collective training in executor

      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * fix comment
      test=develop
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * macro update
      
      * test=develop
      update support python3.5
      
      * test=develop change gpu memory use to 0.1 when test
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dump py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
  26. 11 Jun, 2019 1 commit
    • Update the Anakin interfaces for content-dnn and MLU (#17890) · bce259e5
      Committed by 石晓伟
      * update anakin-engine interfaces for content-dnn
      
      test=develop
      
      * support only-gpu mode of Anakin
      
      modify eltwise parse
      
      test=develop
      
      * modification for thread-safe
      
      test=develop
      
      * Integrated template instance
      
      test=develop
      
      * increase template parameters
      
      test=develop
      
      * support MLU predictor
      
      test=develop
      
      * update anakin cmake files
      
      test=develop
      
      * update TargetWrapper::set_device
      
      * update the initialization of anakin subgraph
      
      test=develop
      
      * use the default constructor of base class
      
      test=develop
  27. 30 May, 2019 1 commit
  28. 17 May, 2019 1 commit
  29. 18 Apr, 2019 1 commit
  30. 28 Mar, 2019 1 commit
  31. 22 Mar, 2019 1 commit
  32. 20 Mar, 2019 1 commit
  33. 19 Mar, 2019 1 commit
  34. 16 Mar, 2019 1 commit
  35. 15 Mar, 2019 1 commit
    • Support sync batch norm. (#16121) · 8ad672a2
      Committed by qingqing01
      * Support Sync Batch Norm.
      * Note: do not enable it when running on a single device.
      
      Usage:
      
      build_strategy = fluid.BuildStrategy()
      build_strategy.sync_batch_norm = True
      binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
              loss_name=loss_mean.name,
              build_strategy=build_strategy)
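What synchronized batch normalization changes, compared with per-device batch normalization, can be sketched in NumPy. This is a conceptual illustration only, not Paddle's implementation: each "device" is assumed to contribute its per-channel sum, sum of squares, and sample count, and adding those up stands in for the cross-device reduction. The function names are hypothetical.

```python
import numpy as np

def local_stats(x):
    """Per-channel sum, sum of squares, and sample count for x of shape (N, C)."""
    return x.sum(axis=0), (x ** 2).sum(axis=0), x.shape[0]

def synced_mean_var(device_batches):
    """Mean and biased variance over the union of all device batches."""
    sums, sq_sums, counts = zip(*(local_stats(x) for x in device_batches))
    n = sum(counts)
    # Adding the per-device partial results plays the role of an all-reduce.
    mean = np.sum(sums, axis=0) / n
    var = np.sum(sq_sums, axis=0) / n - mean ** 2
    return mean, var
```

With a single device the result equals the ordinary batch statistics, so synchronization adds nothing in that case.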
  36. 22 Feb, 2019 1 commit
  37. 30 Jan, 2019 1 commit
  38. 25 Jan, 2019 1 commit
    • Adding ngraph_engine_op (#14948) · efce2567
      Committed by baojun
      * enable ngraph_engine_op
      test=develop
      
      * merge develop test=develop
      
      * avoid const_cast test=develop
      
      * rm ngraph_operator test=develop
      
      * Added TODO to move EnableNgraph test=develop
      
      * Add TODO to remove const_cast test=develop
  39. 24 Jan, 2019 1 commit
    • Add the CUDA kernel for beam_search op (#15020) · 3008fa12
      Committed by Yiqun Liu
      * Refine the beam_search op and test.
      
      * A basic CUDA implementation of beam_search for small batch_size.
      
      * Implement CUDA kernel for beam_search_op.
      
      * Use multiple CUDA threads in the same block to select the top beam.
      
      * Update the python api of beam_search op.
      
      * Enable extend function in CPU kernel of beam_search op.
      
      * Unify the CUDA codes.
      test=develop
      
      * Unify the CPU kernel of beam_search op.
      
      * Ensure the selected items of beam_search_op's CPU kernel sorted by scores.
      
      * Update the description of beam_search in API.spec.
      
      * Enable the use of CUDA kernel in beam_search op.
      
      * Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debugging statements.
      test=develop
      
      * Follow comments.
      test=develop
      
      * Call the CPU kernel for beam_search op when batch_size > 4.
      test=develop
      
      * Remove the exception of is_empty op in PrepareData.
      test=develop
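The core selection step of a beam search, keeping the best `beam_size` candidates sorted by score, can be sketched in plain Python. This is a generic illustration under assumptions, not the operator from this commit: the candidate tuple layout `(score, prefix_index, token_id)` and the function name are hypothetical, and Paddle's LoD handling is not modeled.

```python
import heapq

def beam_search_step(candidates, beam_size):
    """Return the beam_size highest-scoring candidates, best first.

    candidates: iterable of (score, prefix_index, token_id) tuples.
    """
    return heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
```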