- 10 Jan, 2020 4 commits
-
-
Committed by Guo Sheng
* Fix default label dim of label_smooth_op. test=develop (#21862) * Fix unit tests of label_smooth_op's data size.
-
Committed by liu zhengxi
-
Committed by GaoWei8
* Optimize the kernel implementation of layernorm with openmp (#20895)
* Add ernie c++ inference test (#21015)
* Add ernie unit test test=develop
* Add ernie unit test test=develop
* Add ernie unit test test=develop
* remove ngraph
* optimize gpu test test=develop
* optimize codes test=develop
* fix cmake fails on inference_download_and_uncompress (#21185)
* solve cmake fails on inference_download_and_uncompress test=develop
* solve cmake fails on inference_download_and_uncompress test=develop
* Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)
* Add fc padding to solve mkl performance test=develop
* fix gpu pass and error information test=develop
* fix fc_fuse_pass_test test=develop
* fix error information test=develop
* fix error information test=develop
* fix name and add fc op padding test test=develop
* fix attributes test=develop
* optimize fc padding test=develop
* fix test test=develop
* Polish the codes of fc when needs padding (#21378) test=develop
* Add ernie large c++ inference test (#21365)
* add ernie-large test test=develop
* add ernie large c++ inference test test=develop
* Modify padding strategy: remove weight copy in fc padding (#21650) test=develop
* optimize fc jit (#21878) test=develop
Co-authored-by: Yihua Xu <yihuaxu@hotmail.com>
-
Committed by 石晓伟
* fix multi-thread error of fc_gru_fuse_pass.cc, test=develop * export FLAGS and GLOG symbols, test=develop
-
- 09 Jan, 2020 3 commits
-
-
Committed by zhaoyuchen2018
Fix Windows conv_fusion failure caused by a missing kernel: explicitly declare the lambda.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-
Committed by Chen Weihang
-
Committed by WangXi
[Cherry-pick 1.6] fix batch_norm_grad shape=0 & allreduce shape enforce & sync_batch_norm hang in fleet (#22157)
-
- 08 Jan, 2020 2 commits
-
-
Committed by zhaoyuchen2018
* Fix softmax cuda bug * Refine multihead log and softmax logic * Align block to 32
-
Committed by liu zhengxi
* fix seqconv_eltadd_relu pass during multi-threads predictor, test=develop * fix attention_lstm_fuse_pass during multi-threads inference, test=develop * fix embedding_fc_lstm_fuse_pass during multi-threads inference, test=develop * fix fc_lstm_fuse_pass during multi-threads inference, test=develop * fix seq_concat_fc_fuse_pass during multi-threads inference, test=develop
-
- 07 Jan, 2020 3 commits
-
-
Committed by Pei Yang
-
Committed by Chen Weihang
* add param & grad shape check for sgd op * add _reshape_inplece interface for dygraph parallel * refine unittest based paddle/models scripts, test=develop * add unittest for parallel grad fuse, test=develop
-
Committed by Aurelius84
* fix decay param in DecayAdagrad test=develop (#22026) * fix integer overflow in match_matrix (#22036) * fix integer overflow in match_matrix test=develop * fix integer overflow in match_matrix test=develop * fix typo test=develop
-
- 16 Dec, 2019 1 commit
-
-
Committed by 石晓伟
-
- 09 Dec, 2019 1 commit
-
-
Committed by Zhaolong Xing
This reverts commit 0473cdb8.
-
- 08 Dec, 2019 1 commit
-
-
Committed by Zhaolong Xing
test=release/1.6
-
- 06 Dec, 2019 3 commits
-
-
Committed by bingyanghuang
-
Committed by Aurelius84
-
Committed by 石晓伟
-
- 05 Dec, 2019 2 commits
-
-
Committed by Pei Yang
-
Committed by lilong12
* fix the computation for dx (grad for x) for prelu operation. (#20949) * set the default value of alpha for prelu to 0.25, test=develop * add the call to __syncthreads(), test=develop * fix the implementation of cpu prelu, test=develop * repair the implementation of element mode prelu, test=develop * modify test_prelu_op.py, test=develop
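The dx fix above concerns the PReLU gradient. As a reminder of the math, here is a minimal pure-Python sketch of PReLU with one shared alpha, using the default alpha of 0.25 mentioned in the commit. The function names are hypothetical and this is not the Paddle kernel; treating x == 0 as part of the negative branch is an assumption of this sketch.

```python
def prelu_forward(x, alpha=0.25):
    """PReLU forward: y = x if x > 0 else alpha * x (single shared alpha)."""
    return [v if v > 0 else alpha * v for v in x]


def prelu_backward(x, dout, alpha=0.25):
    """Return (dx, dalpha) for the forward pass above.

    dx     = dout          where x > 0
           = alpha * dout  otherwise
    dalpha = sum(dout * x) over the positions where x <= 0
    """
    dx = [g if v > 0 else alpha * g for v, g in zip(x, dout)]
    dalpha = sum(g * v for v, g in zip(x, dout) if v <= 0)
    return dx, dalpha


if __name__ == "__main__":
    x = [-2.0, 0.0, 3.0]
    print(prelu_forward(x))
    print(prelu_backward(x, [1.0, 1.0, 1.0]))
```

The element and channel modes mentioned in the commit differ only in how many alpha values exist and which inputs share them; the per-element rule for dx stays the same.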
-
- 04 Dec, 2019 6 commits
-
-
Committed by Pei Yang
make config option DisableGlogInfo() able to mute all inference logs
-
Committed by tangwei12
* Fix the fetch handler problem and refactor. When defining a FetchHandler class, the user should initialize the handler with a variable dict: each key is a user-defined name, and each value is a Variable generated from the Python API. For each fetch, the user implements a handler function that receives fetched_result_dict and reads the fetched values through the user-defined keys.
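The usage pattern described in this commit message can be sketched roughly as follows. This is a self-contained illustration, not the Paddle class: `FetchHandlerSketch`, `CollectLoss`, `run_one_fetch`, and the variable name `mean_0.tmp_0` are all hypothetical, and variables are stood in for by plain strings.

```python
class FetchHandlerSketch:
    """Hypothetical stand-in for a fetch handler base class."""

    def __init__(self, var_dict):
        # var_dict maps a user-defined name to a variable to fetch.
        self.var_dict = var_dict

    def handler(self, fetched_result_dict):
        raise NotImplementedError("subclasses implement handler()")


class CollectLoss(FetchHandlerSketch):
    """User-defined handler that records the value fetched under 'loss'."""

    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.seen = []

    def handler(self, fetched_result_dict):
        # Fetched values are accessed with the same user-defined keys.
        self.seen.append(fetched_result_dict["loss"])


def run_one_fetch(handler, lookup):
    """Simulate one fetch: resolve each variable, then call the handler.

    `lookup` is a hypothetical callable mapping a variable to its value.
    """
    fetched = {name: lookup(var) for name, var in handler.var_dict.items()}
    handler.handler(fetched)
    return fetched


if __name__ == "__main__":
    h = CollectLoss({"loss": "mean_0.tmp_0"})
    print(run_one_fetch(h, {"mean_0.tmp_0": 0.5}.get))
```

Keying both the input dict and the result dict by the same user-defined names means the handler never needs to know the framework's internal variable names.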
-
Committed by Zhaolong Xing
* ADD NV JETSON SUPPORT test=release/1.6 * CHERRY_PICK: specify the auto growth allocator for inference. test=release/1.6
-
Committed by WangXi
-
Committed by bingyanghuang
-
Committed by hong
* disable reshape inplace in dygraph model; test=develop (#21157) * fix ExecutionContext::HasInput and ExecutionContext::HasOutput depend on the scope structure, test=develop (#20721)
-
- 03 Dec, 2019 11 commits
-
-
Committed by lilong12
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop
-
Committed by Lv Mengsi
* fix transpose conv,test=develop * fix comments test=develop
-
Committed by zhaoyuchen2018
* Improve argsort performance.
  - Running argsort on 200000 elements on a V100 is ~190x faster: 0.53s before optimization, 0.0027s after.
  - Add fp16 support
* Refine error message
* Refine code
* Add descending sort
test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
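For readers unfamiliar with the operation, argsort returns the indices that would sort the input, and the descending option reverses the order. A minimal CPU reference in pure Python, assuming a 1-D list and a hypothetical `argsort` helper (not the CUDA kernel from this commit):

```python
def argsort(values, descending=False):
    """Return the indices that sort `values`.

    Python's sort is stable, so equal elements keep their original
    relative order in both ascending and descending mode.
    """
    return sorted(range(len(values)), key=values.__getitem__, reverse=descending)


if __name__ == "__main__":
    data = [3.0, 1.0, 2.0]
    print(argsort(data))
    print(argsort(data, descending=True))
```

A reference like this is mainly useful for checking an optimized kernel's output on small inputs.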
-
Committed by Kaipeng Deng
* add Adam beta1/beta2 support Variable. test=develop
-
Committed by zhaoyuchen2018
* Add Asypadding for conv fusion. test=develop reference: pr/20042 * Fix eigen build link error * Change back file mode * Use math function & add more checks.
-
Committed by lilong12
* add the framework support for distfc and ut, test=develop * fix the implementation of shard_index_op, test=develop
-
Committed by Kaipeng Deng
* batch_norm momentum support variable. test=develop
-
Committed by 石晓伟
-
Committed by Pei Yang
-
Committed by bingyanghuang
-
Committed by wangguanzhong
-
- 02 Dec, 2019 3 commits
-
-
Committed by Thunderbrook
* support dump param of model into afs (#20302) * support dump param to afs test=develop * code style test=develop * code style test=develop * dump param test=develop * dump param test=develop * dump param test=develop * dump param test=develop * find lookup table in order (#20932) test=develop * cherry-pick test=develop * solve pslib core in stop worker test=develop * print table stat info for pslib test=develop
-
Committed by zhaoyuchen2018
* Improve topk performance. Running topk on 200000 elements takes 1s before optimization and 0.0028s after.
* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-
Committed by zhaoyuchen2018
The op should handle k=1024. Fix seq_len < warpsize error. test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-