- 24 Sep, 2019 1 commit
-
-
Committed by Jacek Czaja
- First implementation of BWD and FWD of pooling mkl-dnn - Compilation fix - Fix - Fix - Fix - Fix to crash - Compilation fix - Combined AcquireBackward with Fwd test=develop
-
- 23 Sep, 2019 1 commit
-
-
Committed by chengduo
* Add RecordHistoryLocalExecScopes test=develop
-
- 22 Sep, 2019 1 commit
-
-
Committed by Zeng Jinle
* refine reallocate of workspace size, test=develop
* add lock to cudnn handle calls, test=develop
-
- 20 Sep, 2019 2 commits
-
-
Committed by Zeng Jinle
-
Committed by Jacek Czaja
- LRN mkl-dnn kernel refactor test=develop - compilation fix - Another compilation fix - Compilation fix - another compilation fix - compilation fix - Crash fix - optional LRN mkldnn workspace - Added mid allocation - Workaround for tests - Removed gradient from is_test ut - Removed mid for inference - Reverted LRN mid removal for is_test - PADDLE_ENFORCE adjusted - Rebase to templatization commit - Compilation fix - compilation fix test=develop - lint test=develop - Fix to crash - Rebase to recent codebase - lin - lint - compilation fix
-
- 19 Sep, 2019 2 commits
-
-
Committed by lidanqing
* fix conflicts test=develop
* change mask_bias_reorder test=develop
* add ComputeMask function to make code clear test=develop
* change according to reviews test=develop
* change according to reviews test=develop
-
Committed by Adam
* Add template functions for Acquire primitive/primitive_desc test=develop
* Move acquire primitive descriptor to protected section test=develop
-
- 18 Sep, 2019 2 commits
-
-
Committed by Zeng Jinle
-
Committed by Zeng Jinle
-
- 17 Sep, 2019 1 commit
-
-
Committed by Adam
test=develop
-
- 16 Sep, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 14 Sep, 2019 2 commits
- 12 Sep, 2019 1 commit
-
-
Committed by Jacek Czaja
test=develop - fix to BWD test=develop
-
- 11 Sep, 2019 1 commit
-
-
Committed by Huihuang Zheng
TemporaryAllocator is a singleton used to allocate memory for cuDNN. Removing the singleton gives better memory performance. This change replaces TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed. It also adds data_feed_proto to operator to fix CI for CPU compilation.
-
- 10 Sep, 2019 2 commits
-
-
Committed by Adam
* MKLDNN handler cleanup
* MKLDNN handler cleanup test=develop
-
Committed by XiaoguangHu
Add document annotations for FLAGS that need to be open to external developers test=develop (#19692)
-
- 09 Sep, 2019 1 commit
-
-
Committed by Tao Luo
* paddle::framework::vectorize() templatization test=develop
* update pybind/imperative.cc test=develop
* revert update on unsqueeze_op.cc and warpctc_cudnn_op.cu.cc test=develop
-
- 05 Sep, 2019 2 commits
-
-
Committed by Yiqun Liu
* Add dynamic loading of nvrtc, and support runtime compilation of CUDA kernels using nvrtc. test=develop
* Call the CUDA driver API to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the code to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
-
Committed by Tao Luo
* remove assert.h
* change PADDLE_ASSERT_MSG to PADDLE_ENFORCE test=develop
* fix tensorrt paddle_enforce test=develop
-
- 03 Sep, 2019 3 commits
- 02 Sep, 2019 1 commit
-
-
Committed by zhouwei25
-
- 01 Sep, 2019 2 commits
-
-
Committed by Jacek Czaja
* - First set of modifications - Compilation fixes - compilation fix - Another compilation fix - Moved AcquireSoftmaxPrimitiveDescriptor call into handler - MKL-DNN Softmax PD refactor test=develop - Compilation fix test=develop - another compilation fix - cosmetics test=develop - Compilation fix - Fix to crash when softmax backward is created
* - Fixes after review of softmax refactoring test=develop
-
Committed by Zeng Jinle
* add retry_allocator for gpu, test=develop
* follow chengduoZH's comments, test=develop
* follow huihuang's comments, test=develop
* change f,l in enforce.h to be file,line, test=develop
* increase code coverage by adding unittests, test=develop
* fix CMakeLists.txt, test=develop
-
- 30 Aug, 2019 3 commits
-
-
Committed by Jacek Czaja
- Refactor step 1 - Compilation fix - Yet another compilation fix - Even more compilation fix - Lint fixes test=develop - Removed deprecated PADDLE_ENFORCE occurrence test=develop - Candidate fix to BN forward - Lint fixes test=develop - Refactoring in data_layout_transform - compilation fix - Another compilation fix - Step further into darkness - Yet another compilation fix - Yet another compilation fix - missing header - compilation fix - Added MKLDNN -> Paddle conversion in fetch op test=develop - Compilation fix test=develop - Lint test=develop - Mul fix - Fix to MKLDNN MUL op and Elementwise MUL UT test=develop - Workaround for different weights with groups representation Paddle vs MKL-DNN. test=develop - Candidate fix for 5D convolution with groups - Refactor of fix for conv3d and conv2d in fetch op test=develop - Compilation fix - Still same compilation fix - Compilation fix - Compilation fix - Reverted refactoring of fixes - Adapted test_conv2d_int8_mkldnn so it expects data in NCHW format not NHWC test=develop - minor fix in UT test=develop - Lint fixes test=develop
-
Committed by liuwei1031
-
Committed by Zeng Jinle
-
- 28 Aug, 2019 1 commit
-
-
Committed by Zeng Jinle
* add signal message to stderr, test=develop
* add unittests for ugly SignalHandle, test=develop
-
- 27 Aug, 2019 2 commits
- 20 Aug, 2019 2 commits
-
-
Committed by Tao Luo
* replace part of PADDLE_ASSERT with PADDLE_ENFORCE test=develop
* remove unused fallback_alloc_size_
* add unit-test of CUDAPinnedAllocator test=develop
-
Committed by Yihua Xu
* Implement the operator with sparse matrix multiply
* Update the URL of mklml library. test=develop
* Disable MKLML implementation on non-Linux platforms. test=develop
* Ignore the deprecated status for windows test=develop
-
- 19 Aug, 2019 1 commit
-
-
Committed by Zeng Jinle
* make PADDLE_ENFORCE_EQ support types that cannot be converted to string, test=develop
* follow huihuang's comments, test=develop
-
- 16 Aug, 2019 2 commits
-
-
Committed by Zeng Jinle
-
Committed by Zeng Jinle
-
- 15 Aug, 2019 1 commit
-
-
Committed by Adam
test=develop
-
- 12 Aug, 2019 2 commits
-
-
Committed by gongweibao
Polish fleet API to support cuda collective mode and nccl2 mode
-
Committed by wopeizl
* add tensorrt support for windows
-