1. 24 Jul, 2019 3 commits
    • B
      Extend Matmul to support matrix multiplication with multiple heads (#18570) · 220eef60
      Bob Zhu committed
      * extend matmul op to support multiple head multiplication
      
      With multi-head support, the multiplication of two large matrices is
      split into the multiplication of several (head_number) smaller matrices. E.g., if
      Mat A is [3, 24] and Mat B is [24, 4], multiplying A and B with head_number
      set to 4 splits Mat A into 4 matrices of [3, 6] and Mat B into 4 matrices of
      [6, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].
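
The shape arithmetic above can be illustrated with a minimal pure-Python sketch. It is not the operator's actual kernel: the helper names `matmul` and `multihead_matmul` are hypothetical, and the sketch assumes that A is split along its columns, B along its rows, and the per-head products are concatenated along the column axis, which is one reading consistent with the shapes given in the commit message.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def multihead_matmul(a, b, head_number):
    """Hypothetical helper (not the Paddle API): multiply A [M, K] by
    B [K, N] head by head and concatenate the results into [M, head_number * N].
    """
    k = len(a[0])
    assert k == len(b), "inner dimensions must match"
    assert k % head_number == 0, "K must be divisible by head_number"
    step = k // head_number

    out = [[] for _ in a]
    for h in range(head_number):
        a_h = [row[h * step:(h + 1) * step] for row in a]  # [M, K / head_number]
        b_h = b[h * step:(h + 1) * step]                   # [K / head_number, N]
        # Append this head's [M, N] product to the right of the output rows.
        for out_row, prod_row in zip(out, matmul(a_h, b_h)):
            out_row.extend(prod_row)
    return out


# Shapes from the commit message: A is [3, 24], B is [24, 4], head_number is 4.
a = [[1] * 24 for _ in range(3)]
b = [[1] * 4 for _ in range(24)]
result = multihead_matmul(a, b, head_number=4)
print(len(result), len(result[0]))  # 3 16
```

With all-ones inputs, every output entry is 6, the length of one head's slice of the inner dimension.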
    • W
      Add python API for appending LoD level (#18702) · 075e1cf7
      whs committed
      * Make lod reset op support for append lod level.
      
      * Fix API.spec
      test=develop
      
      * Fix unit test.
      test=develop
      
      * Add python api for lod append.
      test=develop
      
      * Fix API.spec
      test=develop
      
      * Fix format of doc.
      test=develop
      
      * Fix unit test.
      test=develop
      
      * Fix doc.
      test=develop
    • C
      Enhance backward process (#18700) · 8259f141
      chengduo committed
      * prune backward ops
      test=develop
  2. 23 Jul, 2019 2 commits
  3. 22 Jul, 2019 3 commits
  4. 19 Jul, 2019 3 commits
  5. 18 Jul, 2019 2 commits
    • Z
      Feature/auto_growth_allocator (#18561) · ae58afc5
      Zeng Jinle committed
      * feature/auto_growth_allocator, test=develop
      
      * add unittest of AlignedAllocator, test=develop
      
      * try to turn on auto_growth to test on CI, test=develop
      
      * fix segmentation fault in mixed_vector.h, test=develop
      
      * add unittests, test=develop
    • H
      hash_op support int64 hash_size (#18674) · bb2f5d24
      hutuxian committed
      * hash_op support int64 hash_size
      * add corresponding UT
  6. 15 Jul, 2019 2 commits
  7. 12 Jul, 2019 2 commits
  8. 11 Jul, 2019 2 commits
    • Z
      Feature/buffer_shared_inplace (#17911) · d3003a16
      Zeng Jinle committed
      * feature/buffer_shared_inplace, test=develop
      
      * refine code, test=develop
      
      * fix elementwise_add op cpu inplace and sum inplace bug, test=develop
      
      * add unittest and debug log, test=develop
      
      * fix parallel_executor scope bug, polish code, test=develop
      
      * fix sum op, activation op, single_in_place_inference bug, test=develop
      
      * remove kLocalExecScopeName, test=develop
      
      * fix unittest,test=develop
      
      * fix out_var first version bug, test=develop
      
      * follow comments,test=develop
  9. 10 Jul, 2019 1 commit
  10. 09 Jul, 2019 2 commits
  11. 05 Jul, 2019 3 commits
    • Z
      Fix topk cannot handle 1D vector bug (#18466) · 832d8191
      zhaoyuchen2018 committed
      * Fix topk cannot handle 1D vector bug
      
      Add path to handle 1D vector
      
      test=develop
      Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
      
      * refine code
      
      test=develop
      Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
    • J
      Hide no support (#18515) · 7586cdd5
      Jiabin Yang committed
      * test=develop, fix docker with paddle nccl problem
      
      * test=develop, hide no_support api and add ut for it
    • L
      Add distributions of normal and uniform (#18023) · 43e17c79
      LielinJiang committed
      * add_distributions_of_normal_and_uniform
      
      * paddle/fluid/API.spec
      
      * modify API.spec
      
      * modified paddle/fluid/API.spec, test=develop
      
      * modify paddle/fluid/API.spec, test=develop
      
      * modify paddle/fluid/API.spec, test=develop
      
      * fix some comment, test=develop
      
      * modify API.spec, test=develop
      
      * add comment for init function, modify hard code, test=develop
      
      * modify API.spec, test=develop
      
      * modify API.spec, test=develop
      
      * make unit test function shorter, test=develop
      
      * modify paddle/fluid/API.spec
  12. 04 Jul, 2019 2 commits
  13. 03 Jul, 2019 7 commits
  14. 02 Jul, 2019 2 commits
    • Y
      supports collective training with programs (#18392) · a873fa84
      Yi Liu committed
      1. Since the allreduce op has 4 reduce types, we split them into four ops
      2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext template parameter, since the target DeviceContext is already known
      3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis
    • C
      Add find_no_grad_vars in backward.py (#17942) · e0d8c6ac
      chengduo committed
      * add not_been_used_vars to no_grad_set
      test=develop
  15. 01 Jul, 2019 1 commit
  16. 27 Jun, 2019 2 commits
    • K
      add WITH_COVERAGE option, default OFF (#17872) · 27fb9cad
      kh2se2013 committed
      * add WITH_COVERAGE option, default OFF
      
      test=develop
      
      * add coverage for python sdk
      
      test=develop
      
      * fix code style
      
      * fix COVERAGE_FILE path
      
      test=develop
      
      * remove coverage package
      
      test=develop
      
      * test = develop, run coverage as module
    • H
      supports collective communicated training (#18175) · b7128bac
      HaoRen committed
      * fix prepare context redundant code problem, optimize executor by caching create_varaiables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * macro update
      
      * test=develop
      update support python3.5
      
      * test=develop change gpu memory use to 0.1 when test
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dump on py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
  17. 26 Jun, 2019 1 commit