- 10 Jan, 2020 2 commits
-
-
Committed by Guo Sheng
* Fix default label dim of label_smooth_op. test=develop (#21862) * Fix unit tests of label_smooth_op's data size.
-
Committed by GaoWei8
* Optimize the kernel implementation of layernorm with openmp (#20895) * Add ernie c++ inference test (#21015) * Add ernie unit test test=develop * Add ernie unit test test=develop * Add ernie unit test test=develop * remove ngraph * optimize gpu test test=develop * optimize codes test=develop * fix cmake fails on inference_download_and_uncompress (#21185) * solve cmake fails on inference_download_and_uncompress test=develop * solve cmake fails on inference_download_and_uncompress test=develop * Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972) * Add fc padding to solve mkl performance test=develop * fix gpu pass and error information test=develop * fix fc_fuse_pass_test test=develop * fix error information test=develop * fix error information test=develop * fix name and add fc op padding test test=develop * fix attributes test=develop * optimize fc padding test=develop * fix test test=develop * Polish the codes of fc when needs padding (#21378) test=develop * Add ernie large c++ inference test (#21365) * add ernie-large test test=develop * add ernie large c++ inference test test=develop * Modify padding strategy: remove weight copy in fc padding (#21650) test=develop * optimize fc jit (#21878) test=develop Co-authored-by: Yihua Xu <yihuaxu@hotmail.com>
-
- 09 Jan, 2020 2 commits
-
-
Committed by Chen Weihang
-
Committed by WangXi
[Cherry-pick 1.6] fix batch_norm_grad shape=0 & allreduce shape enforce & sync_batch_norm hang in fleet (#22157)
-
- 08 Jan, 2020 1 commit
-
-
Committed by zhaoyuchen2018
* Fix softmax cuda bug * Refine multihead log and softmax logic * Align block to 32
-
- 07 Jan, 2020 3 commits
-
-
Committed by Chen Weihang
* add param & grad shape check for sgd op * add _reshape_inplece interface for dygraph parallel * refine unittest based paddle/models scripts, test=develop * add unittest for parallel grad fuse, test=develop
-
Committed by Aurelius84
* fix decay param in DecayAdagrad test=develop (#22026) * fix integer overflow in match_matrix (#22036) * fix integer overflow in match_matrix test=develop * fix integer overflow in match_matrix test=develop * fix typo test=develop
-
Committed by Yibing Liu
* Fix the global_step & continuous applying error in EMA * Fix for step 0 & add unit test test=release/1.6
-
- 09 Dec, 2019 1 commit
-
-
Committed by xiegegege
* fix logger problem test=develop * refine logger test=develop
-
- 06 Dec, 2019 3 commits
-
-
Committed by bingyanghuang
-
Committed by Aurelius84
-
Committed by Zhaolong Xing
* Fix TensorRT detection bug 1. Add new search path for TensorRT at tensorrt.cmake 2. Add better debug message 3. Fix the bug of detection of TensorRT version In the NVIDIA official docker image, TensorRT headers are located at `/usr/include/x86_64-linux-gnu` and TensorRT libraries are located at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` fails to detect TensorRT. There is no debug/warning message to tell the developer that TensorRT was not detected. In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is defined in `NvInferVersion.h` instead of `NvInfer.h`, so add a compatibility fix. * Fix TensorRT variables in CMake 1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}` 2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}` Manually typed paths may point to an incorrect TensorRT location. Use the paths detected by the system instead. * Fix TensorRT library path 1. Add new variable - `${TENSORRT_LIBRARY_DIR}` 2. Fix TensorRT library path inference_lib.cmake and setup.py.in need the directory of the TensorRT library instead of the library file itself, so add a new variable to fix it. * Add more general search rule for TensorRT Let the system detect the architecture instead of assigning it manually, so replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`. * Add more general search rule for TensorRT Remove duplicate search rules for TensorRT libraries. Use `${TENSORRT_LIBRARY_DIR}` to get full path of libnvinfer.so test=release/1.6
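The header fallback described in this commit message can be illustrated outside CMake. The Python sketch below is an illustration only, not the code in tensorrt.cmake: the helper name `detect_tensorrt_major` is hypothetical, and it mirrors just the stated rule (look for `NV_TENSORRT_MAJOR` in `NvInferVersion.h` first, then fall back to `NvInfer.h`).

```python
import os
import re

# Hypothetical helper, not part of Paddle: it mirrors the header fallback
# described in the commit message above.
_MAJOR_RE = re.compile(
    r"^\s*#\s*define\s+NV_TENSORRT_MAJOR\s+(\d+)", re.MULTILINE
)


def detect_tensorrt_major(include_dir):
    """Return the TensorRT major version found under include_dir, or None."""
    # Newer releases keep the macro in NvInferVersion.h, older ones in NvInfer.h.
    for header in ("NvInferVersion.h", "NvInfer.h"):
        path = os.path.join(include_dir, header)
        if not os.path.isfile(path):
            continue
        with open(path, "r", errors="ignore") as f:
            match = _MAJOR_RE.search(f.read())
        if match:
            return int(match.group(1))
    # Returning None lets the caller print an explicit warning instead of
    # failing silently.
    return None
```

A caller would pass the detected include directory, for example the `/usr/include/x86_64-linux-gnu` location mentioned above, and report a warning when the result is `None`.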
-
- 05 Dec, 2019 2 commits
-
-
Committed by lilong12
-
Committed by lilong12
* fix the computation for dx (grad for x) for prelu operation. (#20949) * set the default value of alpha for prelu to 0.25, test=develop * add the call to __syncthreads(), test=develop * fix the implementation of cpu prelu, test=develop * repair the implementation of element mode prelu, test=develop * modify test_prelu_op.py, test=develop
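For readers unfamiliar with the gradient this commit touches, here is a minimal sketch of the textbook PReLU forward and backward pass with a single shared alpha. It is not the Paddle kernel: the function names are hypothetical, channel and element modes are omitted, and only the default `alpha=0.25` is taken from the commit message.

```python
def prelu_forward(xs, alpha=0.25):
    """PReLU with one shared alpha: x if x > 0, otherwise alpha * x."""
    return [x if x > 0 else alpha * x for x in xs]


def prelu_backward(xs, dys, alpha=0.25):
    """Return (dx, dalpha) given inputs xs and upstream gradients dys."""
    # dx passes the gradient through for positive inputs and scales it by
    # alpha otherwise.
    dxs = [dy if x > 0 else alpha * dy for x, dy in zip(xs, dys)]
    # alpha only influences the output where x <= 0, so only those
    # positions contribute to its gradient.
    dalpha = sum(dy * x for x, dy in zip(xs, dys) if x <= 0)
    return dxs, dalpha
```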
-
- 04 Dec, 2019 4 commits
-
-
Committed by tangwei12
* fix fetch handler problem and refactor. When a user defines a FetchHandler class, he or she should initialize the handler with a variable dict. Each key of the variable dict is a user-defined name, and each value is a Variable generated from the Python API. For each fetch, the user should implement a handler function in which fetched_result_dict is available, and can access the fetched values by the user-defined keys.
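The usage pattern described in this commit message can be sketched with a self-contained mock. Nothing below is Paddle's API: `SketchFetchHandler`, `LossRecorder`, and `run_fetch` are hypothetical names, and plain strings stand in for Variables.

```python
class SketchFetchHandler:
    """Hypothetical stand-in for the described pattern; not Paddle's class."""

    def __init__(self, var_dict):
        # var_dict maps a user-defined name to a variable handle.
        self.var_dict = dict(var_dict)

    def handler(self, fetched_result_dict):
        # Subclasses receive the fetched values keyed by the user-defined names.
        raise NotImplementedError


class LossRecorder(SketchFetchHandler):
    """Example subclass that stores the value fetched under the key 'loss'."""

    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.history = []

    def handler(self, fetched_result_dict):
        self.history.append(fetched_result_dict["loss"])


def run_fetch(handler, lookup):
    """Simulate one fetch: resolve each registered variable, then call back."""
    results = {name: lookup(var) for name, var in handler.var_dict.items()}
    handler.handler(results)
```

In this mock, `lookup` plays the role of whatever resolves a variable to its current value; a dict's `get` method is enough to try it out.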
-
Committed by WangXi
-
Committed by bingyanghuang
-
Committed by hong
* disable reshape inplace in dygraph model; test=develop (#21157) * fix ExecutionContext::HasInput and ExecutionContext::HasOutput depend on the scope structure, test=develop (#20721)
-
- 03 Dec, 2019 7 commits
-
-
Committed by lilong12
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop
-
Committed by zhaoyuchen2018
* Improve argsort performance. - Sorting 200000 elements with argsort on a V100 is ~190x faster: 0.53s before optimization, 0.0027s after - Add fp16 support * Refine error message * Refine code * Add descending sort test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-
Committed by Kaipeng Deng
* add Adam beta1/beta2 support Variable. test=develop
-
Committed by zhaoyuchen2018
* Add Asypadding for conv fusion. test=develop reference: pr/20042 * Fix eigen build link error * Change back file mode * Use math function & add more checks.
-
Committed by lilong12
* add the framework support for distfc and ut, test=develop * fix the implementation of shard_index_op, test=develop
-
Committed by Kaipeng Deng
* batch_norm momentum support variable. test=develop
-
Committed by bingyanghuang
-
- 02 Dec, 2019 1 commit
-
-
Committed by Thunderbrook
* support dump param of model into afs (#20302) * support dump param to afs test=develop * code style test=develop * code style test=develop * dump param test=develop * dump param test=develop * dump param test=develop * dump param test=develop * find lookup table in order (#20932) test=develop * cherry-pick test=develop * solve pslib core in stop worker test=develop * print table stat info for pslib test=develop
-
- 29 Nov, 2019 2 commits
-
-
Committed by WangXi
-
Committed by Wojciech Uss
-
- 28 Nov, 2019 1 commit
-
-
Committed by xujiaqi01
* fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052) * fix cache table bug * add save_paddle_inference_model * fix hdfs util bug * test=develop * fix several sparse table issues (#20686) * no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto. * add find_distributed_lookup_table_grads instead of hard code GRAD * support embedding stop gradient. push sparse had an error before this fix. * fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table had to be used in all training programs before this fix. * fix pull sparse, skip slots which do not have embedding. * fix collect feasign label info, skip slots which do not have embedding. * support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables. * test=develop * add copy table (#21086) * copy some feasigns and corresponding embeddings from one sparse table to another * copy all feasigns and corresponding embeddings from one sparse table to another * copy all dense params from one table to another * copy some local vars to other local vars * fix fs_client_param bug (#21212) * fix fs_client_param bug, user can set this config through fleet_desc_file or fleet config * test=develop * fix fleet util bug (#21254) * fix fleet util bug in save paddle inference model * test=develop
-
- 26 Nov, 2019 4 commits
-
-
Committed by Lv Mengsi
* Fix gradients (#20857) * fix_gradients * fix_gradients, test=develop * fix instance norm (#21042) * fix instance norm * update unitest,test=develop * fix_bn * revert unittest,test=develop
-
Committed by bingyanghuang
-
Committed by WangXi
-
Committed by WangXi
-
- 25 Nov, 2019 3 commits
-
-
Committed by lijianshe02
* add input type and input data type check for Print_op test=develop (#21250) * add input type and input data type check for Print_op test=develop * cherry-pick error info check of Print_op for release1.6 test=develop * cherry-pick error info check of Print_op for release1.6 test=develop
-
Committed by Zhang Ting
* [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756) * All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview * fix the bug that attr(offsets) should be initialized, test=develop * [cherry-pick] maxout supports channel_last input (#20846) * maxout support channel_last input, test=develop * modified details of Input(X) and Attr(groups, axis) in doc, test=develop * [cherry-pick] lrn supports channel_last input, test=develop (#20954)
-
- 23 Nov, 2019 2 commits
-
-
Committed by Kaipeng Deng
-
Committed by Kaipeng Deng
* fix elementwise_mod FP kernel. test=develop * fix unittest. test=develop
-
- 21 Nov, 2019 1 commit
-
-
Committed by liym27
[cherry-pick]fix bug in pool/conv/conv_transpose: UpdatePaddingAndDilation, _get_padding_with_SAME and conv2dtranspose_forward_naive. (#20997) (#21225) * fix bug in pool/conv/conv_transpose: 1. It should be stride[i] not stride[0] in UpdatePaddingAndDilation; 2. fix bug of func _get_padding_with_SAME in test_conv/conv_transpose_op.py; 3. fix bug of the computation process in function conv2dtranspose_forward_naive. test=release/1.6
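The `stride[i]` versus `stride[0]` point in this commit message is easiest to see with a small calculation. The sketch below uses the common "SAME" padding convention (output size is the ceiling of input size over stride) and is an assumption about the intended rule, not a copy of `UpdatePaddingAndDilation`; the function name `same_padding` is hypothetical and dilation is ignored.

```python
def same_padding(input_size, kernel_size, strides):
    """Per-dimension (before, after) padding under the common SAME rule."""
    paddings = []
    for i in range(len(input_size)):
        # Each dimension must use its own stride, strides[i], not strides[0].
        out_size = (input_size[i] + strides[i] - 1) // strides[i]  # ceil division
        pad_sum = max(
            (out_size - 1) * strides[i] + kernel_size[i] - input_size[i], 0
        )
        pad_before = pad_sum // 2
        paddings.append((pad_before, pad_sum - pad_before))
    return paddings
```

With input size `[5, 6]`, kernel `[3, 3]`, and strides `[1, 2]`, the second dimension gets padding `(0, 1)`; using `strides[0]` for both dimensions would give `(1, 1)` instead, which is the kind of mismatch the fix addresses.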
-
- 14 Nov, 2019 1 commit
-
-
Committed by Tao Luo
test=release/1.6
-