- 14 Jan, 2018 1 commit
-
-
Committed by dzhwinter
* "unified operators"
* "add CUDNN register"
* "add use cudnn attribute"
* "add attribute"
* "test conv transpose op"
* "remove duplicated attr"
* "fix op test"
* "add attribute to set cudnn"
* "add more log"
* "need layout op register support"
* "add more log"
* "change GetExpectedKernelType"
* "fix Get attr in conv_op"
* "fix CI"
* "fix tests"
* "removed kernel priority fallback"
* "fix CI"
* "fix stack pointer bug"
* "refine buggy interface"
* "add const cast to save life"
* "fix get_output_with_grad"
* "fix op test with dataformat"
* "fix pooling"
* "fix pooling test"
* "fix CI"
* "fix with_gpu error"
* "add transform needed functional check"
* "fix unpack list error"
* "comment out parallel.do temporary"
* "fix CI"
* "fix compile doc error"
* "make threshold larger"
-
- 09 Jan, 2018 1 commit
-
-
Committed by Yiqun Liu
* Add Seq2BatchFunctor, which will be used in WarpCTCOp.
* Implement WarpCTCFunctor and WarpCTCKernel.
* Add unittest of warpctc_op.
* Modify the check_output interface in the python unittest framework to allow checking a subset of outputs.
* Use absolute offset lod in warpctc_op and related functors.
* Refine the comments of warpctc_op.
* The new python unittest supports checking a subset of the outputs, so revoke the previous change.
* Rename the transform from LoDTensor to Tensor with shape [max_sequence_length, num_sequences, sequence_width] to PaddingSequenceFunctor.
* Update to the newest codes.
* Rename the PaddingSequenceFunctor to PaddingLoDTensorFunctor and move the computation of dimensions out of the functors.
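
The padding transform named in this commit message (LoDTensor rows to a Tensor of shape [max_sequence_length, num_sequences, sequence_width], using absolute offsets) can be illustrated with a small pure-Python sketch. This is not the code from the commit; the function name `pad_lod_rows`, the nested-list representation, and the `pad_value` parameter are assumptions made only for illustration.

```python
def pad_lod_rows(rows, lod, pad_value=0):
    """Hypothetical sketch: pad variable-length sequences into a dense layout.

    rows: list of feature vectors, all sequences concatenated row by row.
    lod:  absolute offsets, e.g. [0, 2, 5] means sequence 0 is rows[0:2]
          and sequence 1 is rows[2:5].
    Returns a nested list indexed as [time_step][sequence][feature].
    """
    num_sequences = len(lod) - 1
    lengths = [lod[i + 1] - lod[i] for i in range(num_sequences)]
    max_len = max(lengths) if lengths else 0
    width = len(rows[0]) if rows else 0

    # Start with every slot filled with the padding value.
    padded = [
        [[pad_value] * width for _ in range(num_sequences)]
        for _ in range(max_len)
    ]
    # Copy each real row to its (time_step, sequence) slot.
    for seq in range(num_sequences):
        for step in range(lengths[seq]):
            padded[step][seq] = list(rows[lod[seq] + step])
    return padded


rows = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]
padded = pad_lod_rows(rows, lod=[0, 2, 5])
```

Here the shorter first sequence is padded at the last time step, so `padded[2][0]` holds only padding values while `padded[2][1]` holds the final row of the second sequence.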
-
- 26 Dec, 2017 1 commit
-
-
Committed by Luo Tao
-
- 24 Dec, 2017 1 commit
-
-
Committed by dzhwinter
* "change operator interface"
* "move devicepool to device_context"
* "fix operator test"
* "fix op_registry Run interface"
* "net op passed. Need to fix nccl multi-Context"
* "add nccl group function"
* "add nccl group function"
* "fix gpu count exceed 32 error"
* "fix recurrent op, nccl op"
* "change the other operators interface with Place"
* "fix typo"
* "fix pybind"
* "fix device in python side"
* "fix pybind failed"
* "add init for test"
* "fix CI"
-
- 15 Dec, 2017 1 commit
-
-
Committed by Yu Yang
-
- 07 Dec, 2017 1 commit
-
-
Committed by Yu Yang
* Add HasCUDNN to detect if CUDNN is installed or not
* Fix CI
-
- 29 Nov, 2017 1 commit
-
-
Committed by 武毅
* fix compile on cudnn7
* update
* update
* make silent
-
- 24 Nov, 2017 1 commit
-
-
Committed by Qiao Longfei
* make enforce a target and dependent on nccl when gpu is enabled
* add some more dependency
-
- 11 Nov, 2017 2 commits
-
-
Committed by dangqingqing
-
Committed by emailweixu
It is caused by a bug in std::call_once described in https://stackoverflow.com/questions/41717579/stdcall-once-hangs-on-second-call-after-callable-threw-on-first-call. That bug is likely caused by a deeper bug in pthread_once, which is discussed in https://patchwork.ozlabs.org/patch/482350/
-
- 26 Oct, 2017 1 commit
-
-
Committed by Qiao Longfei
* init cudnn batch norm op
* rename batch_norm_cudnn_op.cc to batch_norm_op.cu
* correct name style
* add ExtractNCWHD, simplify code
* fix ExtractNCWHD
* use CUDNN_ENFORCE instead of PADDLE_ENFORCE
-
- 24 Oct, 2017 2 commits
- 18 Oct, 2017 1 commit
-
-
Committed by Markus Kliegl
* initial matmul operator

Similar to np.matmul, but also has transpose_X and transpose_Y flags, and only supports tensors of rank 1 to 3 inclusive. For GPU, it uses cublas?gemmStridedBatched. For CPU, it uses cblas_?gemm_batch if available via MKL; otherwise, for now, it falls back to a simple serial implementation that loops over the batch dimension.
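
As a rough illustration of the serial fallback described in this commit message (loop over the batch dimension, with optional transposes), here is a pure-Python sketch. It is not the operator's actual code: the names `batched_matmul`, `transpose_x` and `transpose_y` are hypothetical, inputs are nested lists, and only the rank-3 case with equal batch sizes is handled.

```python
def transpose(matrix):
    """Return the transpose of a 2-D nested list."""
    return [list(column) for column in zip(*matrix)]


def matmul_2d(a, b):
    """Multiply two 2-D nested lists, checking the inner dimensions."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    b_columns = transpose(b)
    return [
        [sum(x * y for x, y in zip(row, column)) for column in b_columns]
        for row in a
    ]


def batched_matmul(x, y, transpose_x=False, transpose_y=False):
    """Hypothetical sketch: multiply matching matrices of two rank-3 inputs.

    x and y are lists of 2-D matrices with the same batch size. The loop
    over the batch is deliberately serial, one matrix product per entry.
    """
    if len(x) != len(y):
        raise ValueError("batch sizes do not match")
    out = []
    for x_mat, y_mat in zip(x, y):
        if transpose_x:
            x_mat = transpose(x_mat)
        if transpose_y:
            y_mat = transpose(y_mat)
        out.append(matmul_2d(x_mat, y_mat))
    return out


x = [[[1, 2], [3, 4]]]
y = [[[5, 6], [7, 8]]]
plain = batched_matmul(x, y)
with_y_transposed = batched_matmul(x, y, transpose_y=True)
```

A batched library routine would replace the Python loop with a single call; the sketch only shows what each batch entry computes.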
-
- 16 Oct, 2017 1 commit
-
-
Committed by Dong Zhihong
-
- 15 Oct, 2017 1 commit
-
-
Committed by Dong Zhihong
-
- 31 Aug, 2017 1 commit
-
-
Committed by dangqingqing
-
- 10 Aug, 2017 4 commits
- 04 Aug, 2017 1 commit
-
-
Committed by liaogang
-
- 15 Jul, 2017 1 commit
-
-
Committed by liaogang
-
- 13 Jul, 2017 1 commit
-
-
Committed by qijun
-
- 12 Jul, 2017 1 commit
-
-
Committed by qijun
-
- 11 Jul, 2017 2 commits
- 04 Jul, 2017 3 commits