1. 24 Jul, 2019 1 commit
    • B
      Extend Matmul to support matrix multiplication with multiple heads (#18570) · 220eef60
      Bob Zhu committed
      * extend matmul op to support multiple head multiplication
      
      With the support of multiple heads, the multiplication of two big matrices is
      split into several (head_number) multiplications of small matrices. E.g. if
      Mat A is [3, 24] and Mat B is [24, 4], when multiplying A and B with head_number
      as 4, Mat A will be split into 4 matrices of [3, 6] and Mat B into 4 matrices of
      [6, 4]. The final result is the 4 resulting [3, 4] matrices side by side, i.e. [3, 16].
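The head-splitting rule in the message above can be sketched in plain Python. This is a minimal sketch of the arithmetic only. It assumes A is cut into contiguous column blocks and B into matching contiguous row blocks. `matmul` and `multi_head_matmul` are hypothetical helper names, not the Paddle operator's API.

```python
def matmul(a, b):
    """Plain matrix product of two row-major lists of lists."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    m = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def multi_head_matmul(a, b, head_number):
    """Hypothetical helper: split A's columns and B's rows into head_number
    contiguous blocks, multiply block h of A by block h of B, and place the
    per-head results side by side."""
    k = len(b)
    assert k % head_number == 0, "inner dimension must divide by head_number"
    step = k // head_number
    heads = []
    for h in range(head_number):
        a_h = [row[h * step:(h + 1) * step] for row in a]  # shape [n, step]
        b_h = b[h * step:(h + 1) * step]                   # shape [step, m]
        heads.append(matmul(a_h, b_h))                     # shape [n, m]
    # Concatenate along columns: shape [n, head_number * m]
    return [[x for head in heads for x in head[i]] for i in range(len(a))]


# The example from the commit message: A is [3, 24], B is [24, 4], head_number = 4.
A = [[1] * 24 for _ in range(3)]
B = [[1] * 4 for _ in range(24)]
C = multi_head_matmul(A, B, 4)
print(len(C), len(C[0]))  # 3 16
```

With head_number = 1 this reduces to an ordinary matrix product. With 4 heads each output entry only sums over 6 of the 24 inner-dimension positions, so every entry of C above is 6 rather than 24.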
  2. 22 Jul, 2019 1 commit
  3. 18 Jul, 2019 1 commit
    • Z
      Feature/auto_growth_allocator (#18561) · ae58afc5
      Zeng Jinle committed
      * feature/auto_growth_allocator, test=develop
      
      * add unittest of AlignedAllocator, test=develop
      
      * try to turn on auto_growth to test on CI, test=develop
      
      * fix segmentation fault in mixed_vector.h, test=develop
      
      * add unittests, test=develop
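The commit only names the allocators. One piece of arithmetic any aligned allocator needs is rounding a size or an address up to an alignment boundary. The sketch below shows that step alone, assuming a power-of-two alignment. `align_up` is a hypothetical name, and this is not Paddle's AlignedAllocator.

```python
def align_up(value, alignment):
    """Round value (a byte size or an address) up to the next multiple of
    alignment. Assumes alignment is a power of two, so the round-up can be
    done with a bit mask instead of a division."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


print(align_up(1, 256), align_up(256, 256), align_up(257, 256))  # 256 256 512
```

One common approach is for an allocator to over-allocate by `alignment - 1` bytes and hand back `align_up(raw_address, alignment)` as the aligned pointer; whether Paddle does exactly this is not stated in the commit.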
  4. 15 Jul, 2019 1 commit
  5. 12 Jul, 2019 1 commit
  6. 11 Jul, 2019 1 commit
    • Z
      Feature/buffer_shared_inplace (#17911) · d3003a16
      Zeng Jinle committed
      * feature/buffer_shared_inplace, test=develop
      
      * refine code, test=develop
      
      * fix elementwise_add op cpu inplace and sum inplace bug, test=develop
      
      * add unittest and debug log, test=develop
      
      * fix parallel_executor scope bug, polish code, test=develop
      
      * fix sum op, activation op, single_in_place_inference bug, test=develop
      
      * remove kLocalExecScopeName, test=develop
      
      * fix unittest, test=develop
      
      * fix out_var first version bug, test=develop
      
      * follow comments, test=develop
  7. 27 Jun, 2019 2 commits
    • K
      add WITH_COVERAGE option, default OFF (#17872) · 27fb9cad
      kh2se2013 committed
      * add WITH_COVERAGE option, default OFF
      
      test=develop
      
      * add coverage for python sdk
      
      test=develop
      
      * fix code style
      
      * fix COVERAGE_FILE path
      
      test=develop
      
      * remove coverage package
      
      test=develop
      
      * test = develop, run coverage as module
    • H
      supports collective communicated training (#18175) · b7128bac
      HaoRen committed
      * fix prepare context redundant code problem, optimize executor by caching create_variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittests for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * macro update
      
      * test=develop
      update to support python3.5
      
      * test=develop change gpu memory use to 0.1 when testing
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dump on py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
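The "scale loss grad in collective sgd transpiler" step fits a common pattern in synchronous data-parallel training: if gradients are combined with an all-reduce *sum*, each rank's contribution must be scaled by 1/nranks to obtain the average. Because backpropagation is linear in the upstream gradient, scaling the loss gradient once has the same effect as scaling every parameter gradient. The sketch simulates the ranks in a single process; all names are hypothetical and it is not the collective SGD transpiler itself.

```python
def allreduce_sum(per_rank_values):
    """Simulate an all-reduce with sum: every rank receives the element-wise total."""
    total = [sum(column) for column in zip(*per_rank_values)]
    return [list(total) for _ in per_rank_values]


def collective_sgd_step(params, per_rank_grads, lr):
    """One synchronous SGD step across len(per_rank_grads) simulated ranks."""
    nranks = len(per_rank_grads)
    # Scale each local gradient by 1/nranks before the all-reduce, so the
    # summed result is the mean gradient rather than nranks times it.
    scaled = [[g / nranks for g in grads] for grads in per_rank_grads]
    reduced = allreduce_sum(scaled)
    # Every rank applies the same averaged gradient, keeping replicas in sync.
    return [[p - lr * g for p, g in zip(params, grads)] for grads in reduced]


new_params = collective_sgd_step([1.0, 2.0], [[2.0, 4.0], [4.0, 8.0]], lr=0.1)
print(new_params)  # both ranks hold approximately [0.7, 1.4]
```

Without the 1/nranks scaling the effective learning rate would grow with the number of ranks, which is why the scaling is applied before, not after, the reduction here.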
  8. 26 Jun, 2019 3 commits
  9. 25 Jun, 2019 1 commit
    • H
      Sequence mask support tensor (#18249) · df2eee71
      Hongyu Liu committed
      * sequence mask: support max length tensor input; test=develop
      
      * add rnn_impl.py; test=develop
      
      * add basic gru lstm unittest; test=develop
      
      * fix api spec; test=develop
      
      * fix sequence_mask op bug;
      test=develop
      test=document_preview
      
      * change +-*x to elementwise_op; test=develop
      
      * add mkl flag; test=develop
      
      * fix rnn impl bug; test=develop
      
      * update api spec; test=develop
      
      * fix doc bug; test=develop
      
      * fix lstm bugs; test=develop
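A sequence mask marks, for each sequence in a batch, which time steps hold real data: entry [i][j] is 1 when j < lengths[i]. The change above lets the maximum length come from a tensor rather than a fixed attribute. The plain-Python sketch below illustrates only these semantics; the one-element list standing in for a length tensor is an assumption for illustration, not Paddle's `sequence_mask` signature.

```python
def sequence_mask(lengths, maxlen=None):
    """Return mask[i][j] = 1 if j < lengths[i] else 0, with maxlen columns.

    maxlen may be None (use the longest sequence), an int, or a one-element
    list standing in for a length tensor whose value is read at run time."""
    if maxlen is None:
        maxlen = max(lengths)
    elif isinstance(maxlen, (list, tuple)):
        (maxlen,) = maxlen  # unpack the single value from the "tensor"
    return [[1 if j < n else 0 for j in range(maxlen)] for n in lengths]


print(sequence_mask([1, 3, 2]))       # [[1, 0, 0], [1, 1, 1], [1, 1, 0]]
print(sequence_mask([1, 3, 2], [4]))  # [[1, 0, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0]]
```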
  10. 21 Jun, 2019 1 commit
  11. 20 Jun, 2019 1 commit
  12. 19 Jun, 2019 2 commits
  13. 18 Jun, 2019 1 commit
  14. 12 Jun, 2019 1 commit
  15. 06 Jun, 2019 3 commits
  16. 31 May, 2019 1 commit
  17. 30 May, 2019 2 commits
  18. 29 May, 2019 1 commit
  19. 23 May, 2019 1 commit
  20. 21 May, 2019 1 commit
  21. 13 May, 2019 1 commit
  22. 07 May, 2019 1 commit
  23. 05 May, 2019 2 commits
  24. 25 Apr, 2019 2 commits
  25. 23 Apr, 2019 1 commit
    • Q
      Support backward of backward for Relu and add a new gradient checker by... · c1c2633a
      qingqing01 committed
      Support backward of backward for Relu and add a new gradient checker by comparing theoretical and numerical Jacobian. (#16862)
      
      * Support backward of backward and a new gradient checker
      * Rename decorators.py to decorator_helper.py, since Python on the Windows CI already has a decorators package.
      
      1. Add ReluDoubleGradMaker when register relu_grad.
      2. Add a new gradient checker by comparing theoretical and numerical Jacobian.  Check double gradients by double_grad_check.
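The checker described above compares an analytically derived Jacobian with one estimated by finite differences. The sketch below applies that idea to relu and to its backward function `relu_grad` (the double-grad case) in plain Python. All function names here are hypothetical; this is not Paddle's gradient checker module or its `double_grad_check`.

```python
def relu(x):
    return [v if v > 0 else 0.0 for v in x]


def relu_grad(x, dout):
    """Backward of relu: pass dout through where x > 0, zero elsewhere."""
    return [d if v > 0 else 0.0 for v, d in zip(x, dout)]


def numerical_jacobian(f, x, delta=1e-5):
    """J[i][j] ~= d f(x)[i] / d x[j], estimated with central differences."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += delta
        x_minus[j] -= delta
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * delta)
    return jac


def relu_theoretical_jacobian(x):
    """Analytic Jacobian of relu: diagonal, 1 where x > 0, else 0."""
    n = len(x)
    return [[1.0 if i == j and x[i] > 0 else 0.0 for j in range(n)] for i in range(n)]


def jacobians_close(a, b, atol=1e-4):
    return all(abs(p - q) <= atol for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Keep inputs away from 0, where relu is not differentiable.
x = [-1.0, 0.5, 2.0, -0.3]
dout = [0.7, -1.2, 0.4, 1.5]

# First-order check: Jacobian of relu with respect to x.
first_ok = jacobians_close(relu_theoretical_jacobian(x), numerical_jacobian(relu, x))
# Double-grad check: relu_grad is linear in dout, with the same diagonal mask.
double_ok = jacobians_close(relu_theoretical_jacobian(x),
                            numerical_jacobian(lambda d: relu_grad(x, d), dout))
print(first_ok, double_ok)  # True True
```

Away from 0, `relu_grad` does not depend on x at all, so its Jacobian with respect to x is zero and only the dout direction carries a second-order gradient.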
  26. 22 Apr, 2019 2 commits
    • Z
      Move gc test to each test of op (#16999) · f188b370
      Zeng Jinle committed
      * move gc test to op_test
      test=develop
      
      * Revert "move gc test to op_test"
      
      This reverts commit cf15da65c38f57c91f53b3d8b3c2365d4aa86016.
      
      * enable gc test in some ops
      test=develop
    • W
      add parallel build script to ci … (#16901) · d9991dcc
      wopeizl committed
      * add parallel build script to ci test=develop
      * 1. classify the test case as single card/two cards/multiple cards type
         2. run test case according to the run type
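The two steps in this commit, classifying test cases by card count and then running each class according to its run type, can be sketched as a grouping pass. The card counts, group names, and test names below are hypothetical; the actual CI script is not shown in this log.

```python
from collections import defaultdict


def classify_tests(required_cards):
    """Group test names into run types by how many GPU cards each needs."""
    groups = defaultdict(list)
    for name, cards in required_cards.items():
        if cards <= 1:
            groups["single_card"].append(name)
        elif cards == 2:
            groups["two_cards"].append(name)
        else:
            groups["multiple_cards"].append(name)
    return dict(groups)


# Hypothetical test names and card counts, for illustration only.
plan = classify_tests({"test_a": 1, "test_b": 2, "test_c": 4, "test_d": 1})
for run_type, tests in plan.items():
    print(run_type, tests)
```

A runner could then launch each group in an environment sized for it, for example by restricting `CUDA_VISIBLE_DEVICES` to a single card for the single-card group.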
  27. 18 Apr, 2019 1 commit
  28. 17 Apr, 2019 1 commit
  29. 12 Apr, 2019 1 commit
  30. 10 Apr, 2019 1 commit