- 17 Jun, 2020 1 commit
-
-
Committed by zlsh80826
* blockReduce opt * launch threads aligned to warpSize * reduce unnecessary shared memory used to broadcast the reduced value * vectorize SoftmaxKernelWithEltadd * add fp16 constraint * test=develop
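As a rough illustration of the "launch threads aligned to warpSize" item above, the sketch below rounds a requested thread count up to the next multiple of the warp size. It is not the code from this commit: the function name `align_to_warp` and the default warp size of 32 are assumptions made for the example.

```python
# Hypothetical helper, not taken from the commit.
WARP_SIZE = 32  # assumption: warp size of 32 threads


def align_to_warp(num_threads, warp_size=WARP_SIZE):
    """Round num_threads up to the nearest multiple of warp_size."""
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    # Integer ceiling division, then scale back up to a thread count.
    return ((num_threads + warp_size - 1) // warp_size) * warp_size
```

Launching a whole number of warps means every warp in the block is fully populated, which keeps warp-level reductions free of partially filled warps.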
-
- 16 Jun, 2020 2 commits
- 15 Jun, 2020 2 commits
-
-
Committed by Jeng Bai-Cheng
This commit fixes the compilation bug regarding unique_ptr of IOptimizationProfile. IOptimizationProfile has a protected destructor and its lifetime is controlled by TensorRT internally, so the application shouldn't delete a pointer to IOptimizationProfile. See the TensorRT documentation: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/c_api/classnvinfer1_1_1_i_builder.html#a9ac47e100454151d8206ac91d543299a test=develop
-
Committed by zlsh80826
* parallel move shared data test=develop * test=develop
-
- 14 Jun, 2020 1 commit
-
-
Committed by tianshuo78520a
* test=develop * test=develop * fix bug * test=develop * test=develop
-
- 12 Jun, 2020 2 commits
- 11 Jun, 2020 1 commit
-
-
Committed by Leo Chen
* use allow list instead of white list, test=develop * reduce include, test=develop
-
- 10 Jun, 2020 6 commits
-
-
Committed by Zhang Ting
-
Committed by hutuxian
Support CMatchAucCalculator based on CMatchRankAucCalculator with a new parameter ignore_rank
-
Committed by Zhou Wei
* windows publish package scripts, test=develop * windows publish package scripts, test=develop * windows publish package scripts, test=develop
-
Committed by Leo Chen
-
Committed by wangchaochaohu
-
Committed by silingtong123
* test=develop, add log message in the function UpdateDllFlag * test=develop, add the test
-
- 09 Jun, 2020 7 commits
-
-
Committed by Chen Weihang
-
Committed by Sylwester Fraczek
* remove gmock from ut test=develop * coverage enabled for r+t+m fuse pass test=develop
-
Committed by wawltor
Add support for 5-D and 6-D tensors in the reduce ops. At the same time, the compile time dropped from 22 minutes to 21 minutes after the fix.
-
Committed by liuwei1031
-
Committed by silingtong123
-
Committed by wangchaochaohu
-
Committed by Sylwester Fraczek
test=develop
-
- 08 Jun, 2020 4 commits
-
-
Committed by mapingshuo
Fixes the CUDAPlace info in the Print op
-
Committed by Aurelius84
* Support LoDTensorArray in reverse_op test=develop * polish en doc and unittest code test=develop * refine sample code test=develop * add example of LoDTensorArray test=develop * fix typo test=develop
-
Committed by Leo Chen
* refine err_msg of pybind.cc, test=develop * refine err_msg in tensor_py.h, test=develop * refine error msg, test=develop * fix test_exception, test=develop * follow comments, test=develop
-
Committed by Zhou Wei
-
- 05 Jun, 2020 6 commits
-
-
Committed by Leo Chen
* refine isfinite, test=develop * use namespace std of isfinite, test=develop, test=win_gpu
-
Committed by silingtong123
-
Committed by whs
-
Committed by Chen Weihang
* support selectedrows allreduce in multi-cards dygraph, test=develop * remove useless import modules in unittests, test=develop * add nccl cmake to get nccl version, test=develop * add if-condition to compile correctly, test=develop * add detailed version parsing for old nccl, test=develop * polish cmake details, test=develop * fix remove test cmake error, test=develop * fix cmake condition, test=develop * change unittest cmake list, test=develop * fix unittest cmake rule, test=develop, test=framep0
-
Committed by Pei Yang
-
Committed by silingtong123
* test=develop, fix a bug * test=develop, remove the macro of PADDLE_DLL_INFERENCE
-
- 04 Jun, 2020 6 commits
-
-
Committed by lilong12
* add the support of device index for device_guard.
-
Committed by lilong12
* add queue_generator_op, dequeue_op, enqueue_op and ut, test=develop
-
Committed by hutuxian
* Fix the field length in LoD scenario * Fix the missing LoD info when copying a tensor in dump field * Add some logs to make debugging easier
-
Committed by 石晓伟
-
Committed by Leo Chen
* add amp_check_finite_and_scale op, test=develop * add cpu kernel, test=develop * use bool, test=develop * follow comments, test=develop
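Going only by the op's name, `amp_check_finite_and_scale` appears to combine a finiteness check with a scaling step, as is common in mixed-precision training. The plain-Python sketch below shows that general pattern. The function name, the list-based interface, and the choice to return the values unscaled when a non-finite value is found are all assumptions for illustration, not the behavior of the actual Paddle kernel.

```python
import math


def check_finite_and_scale(values, scale):
    """Hypothetical sketch: scale values only if all of them are finite.

    Returns (outputs, found_infinite). When any value is inf or NaN,
    the inputs are returned unscaled (an assumption of this sketch).
    """
    found_infinite = not all(math.isfinite(v) for v in values)
    if found_infinite:
        return list(values), True
    return [v * scale for v in values], False
```

A training loop could use the returned flag to skip the parameter update for a step in which gradients overflowed.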
-
Committed by zhangchunle
-
- 03 Jun, 2020 2 commits
-
-
Committed by leesusu
-
Committed by Chen Weihang
-