1. 19 Mar, 2018 1 commit
    • X
      Enable P2P memory copy · 18ac6947
      Xin Pan committed
      On K40 with 4 devices, time drops from ~4.0 to ~3.8+; the gain should be
      more obvious on better hardware
      18ac6947
  2. 16 Mar, 2018 1 commit
  3. 15 Mar, 2018 14 commits
    • T
      Implement Select OP (#9088) · 1e4c504e
      Thuan Nguyen committed
      * Fix old documentation for channel_recv
      
      * Initial design of CSP select
      
      * Redesign channel implementation for Select Op
      
      * Remove unnecessary header
      
      * Initial check-in of select op; it currently reads all the conditional_op in the cases block and also pulls out all channels involved in the select.
      
      * Init python select op API
      
      * Python select bug fix when checking op creates block
      
      * Add case_to_execute as (a) input to select, (b) into the passed inputs into the select op
      
      * Add additional code for select op
      
      * Init fibonacci test from python
      
      * implement fibonacci sequence test
      
      * update fib unit test
      
      * Improve select test cases
      
      * Shorten non-pep-8-ed lines
      
      * Add methods on channel needed by select op
      
      * Fix compile issues, finish implementation, still need to debug code
      
      * Fix issue with fibonacci test, it works now!
      
      * Change QueueMessage callback to take in a ChannelAction enum, fix select unit test
      
      * Fix case attributes
      
      * Fix issue with select control flow
      
      * Make cases - previously on each selectcase conditional_block - attributes to select
      
      * Use class constants for type of channel
      
      * Change select op to take in "cases" attribute
      
      * return boolean from select callback function to tell Channel if this RECV or SEND should be executed
      
      * Improve attributes and inputs comments on select op
      
      * Fix issues with python unit test
      
      * Assert fibonacci final output
      
      * Fix issue when channel name / channel var is null for "default" case in select op
      
      * Assert base select test output
      
      * Make QueueMessage use shared pointer and modify the order of the callback
      
      * Fixing the order in which the callback is called
      
      * Move channel utility methods to paddle/fluid/operators/concurrency/channel_util
      
      * Create channel_util and move channel util methods
      
      * Fix crash when calling select_op
      
      * Fix deadlock
      
      * Fix issue of channel destructor deadlock
      
      * Fix precommit issues
      
      * Accidentally checked in changes to beam_search_op, reverting change.
      
      * Fix dependency issue in concurrency cmake
      
      * add device_context dependency for concurrency target
      1e4c504e
    • X
      Merge pull request #9037 from panyx0718/develop · d284cf88
      Xin Pan committed
      Better timeline
      d284cf88
    • D
      [Speed]implement cudnn sequence softmax cudnn (#8978) · 128adf53
      dzhwinter committed
      * "add softmax cudnn functor support"
      
      * "add testing"
      
      * "refine cmakelist"
      
      * "sequence softmax forward speed up"
      
      * "add softmax grad"
      
      * "fix sequence softmax test"
      
      * "add double precision"
      
      * "fix softmax test"
      
      * "add softmax cudnn support"
      
      * "fix softmax cudnn test"
      
      * "add softmax to nn.py"
      
      * "fix compile bug"
      
      * "refine cmakelist"
      
      * "fix ci"
      
      * "fix based on comment"
      
      * "fix based on comments"
      
      * "fix ci"
      128adf53
    • Y
      Merge pull request #9058 from reyoung/feature/parallel_do_bug · 9b9f3f09
      Yu Yang committed
      Fix models #725
      9b9f3f09
    • R
      Merge pull request #9093 from weixing02/dockerfile · 3ff649e3
      ranqiu92 committed
      The sphinx version is specified as 1.5.6 in the Dockerfile
      3ff649e3
    • _青葱
      Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dockerfile · a68290fc
      _青葱 committed
      Merge branch develop
      a68290fc
    • _青葱
      Add comments · 45eb94e6
      _青葱 committed
      45eb94e6
    • K
      Add fp16 mul op support and bind paddle fp16 to numpy fp16 (#9017) · e26f1123
      Kexin Zhao committed
      * add fp16 mul op support
      
      * small fix
      
      * fix bug
      
      * small fix
      
      * fix PADDLE_WITH_CUDA compiling issue
      
      * reorg code
      
      * test for pybind
      
      * treat float16 as uint16_t in pybind
      
      * bind np.float16 to paddle float16
      
      * small fix
      
      * clean code
      
      * remove redundancy
      
      * fix mul_op test
      
      * address comments
      
      * small fix
      
      * add is_float16_supported func
      e26f1123
    • _青葱
      fdc3843f
    • D
      "exported scatter to python" (#9038) · 71400711
      dzhwinter committed
      * "exported scatter to python"
      
      * Revert ""exported scatter to python""
      
      This reverts commit 38745a62.
      
      * "polish scatter and export to python"
      71400711
    • T
      Merge pull request #9067 from luotao1/with_fluid · cf2addd2
      Tao Luo committed
      enable WITH_FLUID option
      cf2addd2
    • C
      Merge pull request #9072 from chengduoZH/feature/refine_parallel_do · 11c43e5d
      chengduo committed
      Refine parallel_do_grad
      11c43e5d
    • A
      41894da1
  4. 14 Mar, 2018 24 commits