- 10 Jan, 2020 (3 commits)
Committed by liu zhengxi
Committed by GaoWei8
* Optimize the kernel implementation of layernorm with openmp (#20895)
* Add ernie c++ inference test (#21015)
  - remove ngraph
  - optimize gpu test
  - optimize codes
* fix cmake fails on inference_download_and_uncompress (#21185)
* Add fc padding to improve mkl GEMM's performance when N and K are multiples of 128 (#20972)
  - fix gpu pass and error information
  - fix fc_fuse_pass_test
  - fix name and add fc op padding test
  - fix attributes
  - optimize fc padding
* Polish the codes of fc when padding is needed (#21378)
* Add ernie large c++ inference test (#21365)
* Modify padding strategy: remove weight copy in fc padding (#21650)
* optimize fc jit (#21878)

Co-authored-by: Yihua Xu <yihuaxu@hotmail.com>
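The fc padding trick above can be sketched in NumPy: zero-pad the K and N dimensions of the GEMM operands up to multiples of 128 so the math library hits its fastest kernels, then slice out the valid part of the result. This is an illustrative sketch under that assumption, not Paddle's actual implementation; the helper names are invented.

```python
import numpy as np

def pad_to_multiple(n, multiple=128):
    # Round n up to the next multiple (e.g. 100 -> 128, 128 -> 128).
    return ((n + multiple - 1) // multiple) * multiple

def fc_with_padding(x, w):
    # Hypothetical sketch: pad K and N with zeros, run the GEMM on the
    # padded operands, then slice the valid columns. The zero rows and
    # columns contribute nothing, so the result equals the unpadded x @ w.
    m, k = x.shape
    _, n = w.shape
    k_pad, n_pad = pad_to_multiple(k), pad_to_multiple(n)
    x_padded = np.zeros((m, k_pad), dtype=x.dtype)
    x_padded[:, :k] = x
    w_padded = np.zeros((k_pad, n_pad), dtype=w.dtype)
    w_padded[:k, :n] = w
    return (x_padded @ w_padded)[:, :n]
```

The later commit (#21650) notes the weight copy was removed from this strategy; the sketch keeps it for clarity.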
Committed by 石晓伟
* fix multi-thread error of fc_gru_fuse_pass.cc, test=develop
* export FLAGS and GLOG symbols, test=develop
- 09 Jan, 2020 (3 commits)
Committed by zhaoyuchen2018
Windows conv_fusion failed because there was no kernel; explicitly declare the lambda. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
Committed by Chen Weihang
Committed by WangXi
[Cherry-pick 1.6] fix batch_norm_grad shape=0 & allreduce shape enforce & sync_batch_norm hang in fleet (#22157)
- 08 Jan, 2020 (2 commits)
Committed by zhaoyuchen2018
* Fix softmax cuda bug
* Refine multihead log and softmax logic
* Align block to 32
Committed by liu zhengxi
* fix seqconv_eltadd_relu pass during multi-threads predictor, test=develop
* fix attention_lstm_fuse_pass during multi-threads inference, test=develop
* fix embedding_fc_lstm_fuse_pass during multi-threads inference, test=develop
* fix fc_lstm_fuse_pass during multi-threads inference, test=develop
* fix seq_concat_fc_fuse_pass during multi-threads inference, test=develop
- 07 Jan, 2020 (4 commits)
Committed by Pei Yang
Committed by Chen Weihang
* add param & grad shape check for sgd op
* add _reshape_inplece interface for dygraph parallel
* refine unittest based paddle/models scripts, test=develop
* add unittest for parallel grad fuse, test=develop
Committed by Aurelius84
* fix decay param in DecayAdagrad, test=develop (#22026)
* fix integer overflow in match_matrix (#22036)
  - fix typo, test=develop
Committed by Yibing Liu
* Fix the global_step & continuous applying error in EMA
* Fix for step 0 & add unit test, test=release/1.6
- 16 Dec, 2019 (1 commit)
Committed by 石晓伟
- 09 Dec, 2019 (2 commits)
Committed by xiegegege
* fix logger problem, test=develop
* refine logger, test=develop
Committed by Zhaolong Xing
This reverts commit 0473cdb8.
- 08 Dec, 2019 (1 commit)
Committed by Zhaolong Xing
test=release/1.6
- 06 Dec, 2019 (4 commits)
Committed by bingyanghuang
Committed by Aurelius84
Committed by 石晓伟
Committed by Zhaolong Xing
* Fix TensorRT detection bug
  1. Add a new search path for TensorRT in tensorrt.cmake
  2. Add a better debug message
  3. Fix the detection of the TensorRT version

  In NVIDIA's official docker images, TensorRT headers are located at `/usr/include/x86_64-linux-gnu` and TensorRT libraries at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` fails to detect TensorRT, and there was no debug/warning message to tell developers that detection failed. In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is defined in `NvInferVersion.h` instead of `NvInfer.h`, so add a compatibility fix.
* Fix TensorRT variables in CMake
  1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
  2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`

  A manually typed path may point to the wrong location; use the paths detected by the system instead.
* Fix TensorRT library path
  1. Add a new variable, `${TENSORRT_LIBRARY_DIR}`
  2. inference_lib.cmake and setup.py.in need the directory of the TensorRT library rather than the library file itself, so add the new variable to fix it.
* Add a more general search rule for TensorRT: let the system detect the architecture instead of assigning it manually, replacing `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.
* Remove duplicate search rules for the TensorRT libraries; use `${TENSORRT_LIBRARY_DIR}` to get the full path of libnvinfer.so.

test=release/1.6
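The version-detection compatibility fix described above boils down to probing two headers: look for `NV_TENSORRT_MAJOR` in `NvInferVersion.h` first and fall back to `NvInfer.h`. The sketch below is an illustrative Python rendering of that logic, not the actual tensorrt.cmake code; the function name and default path are assumptions.

```python
import os
import re

def find_trt_major(include_dir):
    # Probe NvInferVersion.h first (TensorRT >= 6 defines the version
    # macros there), then fall back to NvInfer.h (older releases).
    for header in ("NvInferVersion.h", "NvInfer.h"):
        path = os.path.join(include_dir, header)
        if os.path.isfile(path):
            with open(path) as f:
                match = re.search(r"#define\s+NV_TENSORRT_MAJOR\s+(\d+)", f.read())
            if match:
                return int(match.group(1))
    # No header found: caller should emit a warning instead of failing silently.
    return None
```

Returning None (rather than raising) mirrors the commit's point that detection failure should produce a visible warning for the developer.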
- 05 Dec, 2019 (5 commits)
Committed by Pei Yang
Committed by lilong12
Committed by zhouwei25
Committed by lilong12
* fix the computation for dx (grad for x) for the prelu operation (#20949)
* set the default value of alpha for prelu to 0.25, test=develop
* add the call to __syncthreads(), test=develop
* fix the implementation of cpu prelu, test=develop
* repair the implementation of element mode prelu, test=develop
* modify test_prelu_op.py, test=develop
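The PReLU math these fixes touch is compact enough to sketch. With the default alpha of 0.25 mentioned in the commit, the forward pass and the gradient with respect to x (the dx computation fixed above) can be written as the following illustrative NumPy sketch; it is not the operator's CPU/CUDA code, and the function names are invented.

```python
import numpy as np

def prelu_forward(x, alpha=0.25):
    # PReLU: identity for positive inputs, alpha * x otherwise.
    return np.where(x > 0, x, alpha * x)

def prelu_grad_x(x, dout, alpha=0.25):
    # dL/dx: pass the upstream gradient through where x > 0,
    # scale it by alpha elsewhere (the dx fix in #20949).
    return np.where(x > 0, dout, alpha * dout)
```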
Committed by zhouwei25
- 04 Dec, 2019 (6 commits)
Committed by Pei Yang
make config option DisableGlogInfo() able to mute all inference logs
Committed by tangwei12
* fix fetch handler problem and refactor

  When a user defines a FetchHandler class, he or she should initialize the handler with a variable dict. The key of the variable dict is a user-defined name; the value is a Variable generated from the Python API. On each fetch, the user-implemented handler function is called, fetched_result_dict is available inside it, and the user can access the fetched values with the user-defined keys.
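The protocol described above can be sketched with stand-in classes. These are illustrative only, not Paddle's real FetchHandler API; the class and attribute names are assumptions made for the example.

```python
class FetchHandler:
    """Stand-in for the handler base class described in the commit."""

    def __init__(self, var_dict):
        # user-defined name -> Variable (any object stands in for a Variable here)
        self.var_dict = var_dict

    def handler(self, fetched_result_dict):
        # Called on each fetch; keyed by the user-defined names.
        raise NotImplementedError

class LossPrinter(FetchHandler):
    """Example user subclass: remembers the latest fetched loss."""

    def handler(self, fetched_result_dict):
        self.last_loss = fetched_result_dict["loss"]
```

A runner would build fetched_result_dict from the variables in var_dict and invoke handler with it after each step.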
Committed by Zhaolong Xing
* ADD NV JETSON SUPPORT, test=release/1.6
* CHERRY_PICK: specify the auto growth allocator for inference, test=release/1.6
Committed by WangXi
Committed by bingyanghuang
Committed by hong
* disable reshape inplace in dygraph model, test=develop (#21157)
* fix ExecutionContext::HasInput and ExecutionContext::HasOutput depending on the scope structure, test=develop (#20721)
- 03 Dec, 2019 (9 commits)
Committed by lilong12
* set dim[0] to -1 if dim[0] < 0 and defer the assertion to runtime, test=develop
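The shape rule stated above can be sketched as a small helper: a negative leading (batch) dimension is normalized to -1, the conventional marker for "unknown at compile time". This is an illustrative sketch, not the actual operator code, and the function name is invented.

```python
def normalize_batch_dim(shape):
    # Per the commit: if the leading dim is negative, treat it as the
    # unknown batch size (-1) instead of asserting at compile time.
    shape = list(shape)
    if shape[0] < 0:
        shape[0] = -1
    return shape
```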
Committed by Lv Mengsi
* fix transpose conv, test=develop
* fix comments, test=develop
Committed by zhaoyuchen2018
* Improve argsort performance.
  - Given 200000 elements to argsort on a V100, this speeds it up ~190x: 0.53s before optimization, 0.0027s after.
  - Add fp16 support
* Refine error message
* Refine code
* Add descending sort, test=develop

Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
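The descending option added above can be built on an ascending argsort by negating the keys. The NumPy sketch below is illustrative only, not the optimized CUDA kernel this commit tunes.

```python
import numpy as np

def argsort(x, descending=False):
    # Ascending argsort of -x gives the descending order of x.
    return np.argsort(-x) if descending else np.argsort(x)
```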
Committed by Kaipeng Deng
* add Adam beta1/beta2 support for Variable. test=develop
Committed by zhaoyuchen2018
* Add asymmetric padding for conv fusion, test=develop (reference: pr/20042)
* Fix eigen build link error
* Change back file mode
* Use math function & add more checks
Committed by lilong12
* add the framework support for distfc and ut, test=develop
* fix the implementation of shard_index_op, test=develop
Committed by Kaipeng Deng
* batch_norm momentum support variable. test=develop
Committed by 石晓伟
Committed by Pei Yang