- 06 Dec, 2019 1 commit
-
-
Committed by Zhaolong Xing
* Fix TensorRT detection bug
  1. Add new search path for TensorRT in tensorrt.cmake
  2. Add better debug messages
  3. Fix detection of the TensorRT version
  In the official NVIDIA Docker image, TensorRT headers are located at `/usr/include/x86_64-linux-gnu` and TensorRT libraries at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` fails to detect TensorRT. No debug or warning message tells the developer that TensorRT detection failed. In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is defined in `NvInferVersion.h` instead of `NvInfer.h`, so add a compatibility fix.
* Fix TensorRT variables in CMake
  1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
  2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`
  A manually typed path may point to the wrong TensorRT location. Use the paths detected by the system instead.
* Fix TensorRT library path
  1. Add new variable `${TENSORRT_LIBRARY_DIR}`
  2. Fix TensorRT library path
  inference_lib.cmake and setup.py.in need the directory of the TensorRT library rather than the library file itself, so add a new variable to fix it.
* Add more general search rule for TensorRT
  Let the system detect the architecture instead of assigning it manually, so replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.
* Add more general search rule for TensorRT
  Remove duplicate search rules for TensorRT libraries. Use `${TENSORRT_LIBRARY_DIR}` to get the full path of libnvinfer.so
  test=release/1.6
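The version-detection fix described above can be illustrated outside CMake. The following is a minimal Python sketch, not the code in tensorrt.cmake: it looks for `NV_TENSORRT_MAJOR` in `NvInferVersion.h` first and falls back to `NvInfer.h`. The function name `find_tensorrt_major` and the regular expression are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Matches lines such as "#define NV_TENSORRT_MAJOR 6".
_MAJOR_RE = re.compile(r"^\s*#\s*define\s+NV_TENSORRT_MAJOR\s+(\d+)", re.MULTILINE)


def find_tensorrt_major(include_dir):
    """Return NV_TENSORRT_MAJOR found under include_dir, or None if absent.

    Hypothetical helper: it mirrors the idea of the fix, not the CMake code.
    """
    # Check NvInferVersion.h first (newer releases), then fall back to NvInfer.h.
    for header in ("NvInferVersion.h", "NvInfer.h"):
        path = Path(include_dir) / header
        if not path.is_file():
            continue
        match = _MAJOR_RE.search(path.read_text(errors="ignore"))
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    # Fake include directory laid out like a newer TensorRT release.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "NvInfer.h").write_text('#include "NvInferVersion.h"\n')
        Path(tmp, "NvInferVersion.h").write_text("#define NV_TENSORRT_MAJOR 6\n")
        print(find_tensorrt_major(tmp))  # prints 6
```

Returning `None` instead of raising keeps the "not detected" case explicit, so a caller can emit the kind of warning message the commit says was missing.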
-
- 05 Dec, 2019 2 commits
-
-
Committed by lilong12
-
Committed by lilong12
* fix the computation for dx (grad for x) for prelu operation. (#20949)
* set the default value of alpha for prelu to 0.25, test=develop
* add the call to __syncthreads(), test=develop
* fix the implementation of cpu prelu, test=develop
* repair the implementation of element mode prelu, test=develop
* modify test_prelu_op.py, test=develop
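For reference, the gradient this commit concerns can be written down in a few lines. This is a pure-Python sketch of the textbook PReLU with one shared alpha, not the Paddle CPU or CUDA kernel; `prelu_forward` and `prelu_backward` are hypothetical names, and only the 0.25 default comes from the commit message.

```python
def prelu_forward(x, alpha=0.25):
    """PReLU with one shared alpha: x if x > 0, else alpha * x."""
    return [v if v > 0 else alpha * v for v in x]


def prelu_backward(x, dout, alpha=0.25):
    """Return (dx, dalpha) for the forward pass above.

    dx passes the upstream gradient through where x > 0 and scales it by
    alpha elsewhere; dalpha accumulates dout * x over non-positive inputs.
    """
    dx = [g if v > 0 else alpha * g for v, g in zip(x, dout)]
    dalpha = sum(g * v for v, g in zip(x, dout) if v <= 0)
    return dx, dalpha


if __name__ == "__main__":
    x = [-2.0, 0.0, 3.0]
    print(prelu_forward(x))                     # prints [-0.5, 0.0, 3.0]
    print(prelu_backward(x, [1.0, 1.0, 1.0]))   # prints ([0.25, 0.25, 1.0], -2.0)
```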
-
- 04 Dec, 2019 4 commits
-
-
Committed by tangwei12
* fix fetch handler problem and refactor
  When a user defines a FetchHandler class, he or she should initialize the handler with a variable dict. Each key of the variable dict is a user-defined name, and each value is a Variable generated from the Python API. For each fetch, the user should implement a handler function in which fetched_result_dict is available, and can access the fetched values with the user-defined keys.
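The usage pattern described in this message might look roughly like the sketch below. It is a standalone illustration, not the Paddle class: `FetchHandlerSketch`, `LossRecorder`, and `run_fetch` are hypothetical names, and the variable is represented by a plain string placeholder rather than a real Variable.

```python
class FetchHandlerSketch:
    """Hypothetical stand-in for the FetchHandler described above."""

    def __init__(self, var_dict):
        # var_dict maps a user-defined name to the variable to fetch.
        self.var_dict = var_dict

    def handler(self, fetched_result_dict):
        raise NotImplementedError("subclasses decide what to do with fetched values")


class LossRecorder(FetchHandlerSketch):
    """Example user subclass that records one fetched value per fetch."""

    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.history = []

    def handler(self, fetched_result_dict):
        # Fetched values are looked up with the same user-defined keys.
        self.history.append(fetched_result_dict["loss"])


def run_fetch(handler, fetch_fn):
    """Hypothetical driver: fetch every registered variable, then call the handler."""
    fetched = {name: fetch_fn(var) for name, var in handler.var_dict.items()}
    handler.handler(fetched)


if __name__ == "__main__":
    recorder = LossRecorder({"loss": "loss_var_placeholder"})
    run_fetch(recorder, lambda var: 0.5)  # pretend every fetch yields 0.5
    print(recorder.history)  # prints [0.5]
```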
-
Committed by WangXi
-
Committed by bingyanghuang
-
Committed by hong
* disable reshape inplace in dygraph model; test=develop (#21157)
* fix ExecutionContext::HasInput and ExecutionContext::HasOutput depending on the scope structure, test=develop (#20721)
-
- 03 Dec, 2019 7 commits
-
-
Committed by lilong12
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop
-
Committed by zhaoyuchen2018
* Improve argsort performance.
  - Running argsort on 200000 elements on a V100 is ~190x faster: 0.53s before optimization, 0.0027s after
  - Add fp16 support
* Refine error message
* Refine code
* Add descending sort
test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
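As a reference for what the descending option means, here is a minimal pure-Python sketch of argsort semantics. It says nothing about the optimized GPU kernel; the function name and `descending` flag are chosen for illustration.

```python
def argsort(values, descending=False):
    """Return the indices that would sort values.

    sorted() is stable in both directions, so tied elements keep their
    original relative order.
    """
    return sorted(range(len(values)), key=lambda i: values[i], reverse=descending)


if __name__ == "__main__":
    data = [3.0, 1.0, 2.0]
    print(argsort(data))                   # prints [1, 2, 0]
    print(argsort(data, descending=True))  # prints [0, 2, 1]
```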
-
Committed by Kaipeng Deng
* add Variable support for Adam beta1/beta2. test=develop
-
Committed by zhaoyuchen2018
* Add Asypadding for conv fusion. test=develop reference: pr/20042
* Fix eigen build link error
* Change back file mode
* Use math function & add more checks.
-
Committed by lilong12
* add the framework support for distfc and ut, test=develop
* fix the implementation of shard_index_op, test=develop
-
Committed by Kaipeng Deng
* batch_norm momentum supports Variable. test=develop
-
Committed by bingyanghuang
-
- 02 Dec, 2019 1 commit
-
-
Committed by Thunderbrook
* support dump param of model into afs (#20302)
* support dump param to afs test=develop
* code style test=develop
* code style test=develop
* dump param test=develop
* dump param test=develop
* dump param test=develop
* dump param test=develop
* find lookup table in order (#20932) test=develop
* cherry-pick test=develop
* solve pslib core in stop worker test=develop
* print table stat info for pslib test=develop
-
- 29 Nov, 2019 2 commits
-
-
Committed by WangXi
-
Committed by Wojciech Uss
-
- 28 Nov, 2019 1 commit
-
-
Committed by xujiaqi01
* fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052)
* fix cache table bug
* add save_paddle_inference_model
* fix hdfs util bug
* test=develop
* fix several sparse table issues (#20686)
* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
* add find_distributed_lookup_table_grads instead of hard-coding GRAD
* support embedding stop gradient. push sparse raised an error before this fix.
* fix fill sparse, skip slots which do not have embedding. before this fix, each slot's embedding in a sparse table had to be used in all training programs.
* fix pull sparse, skip slots which do not have embedding.
* fix collect feasign label info, skip slots which do not have embedding.
* support the case where there are multiple sparse tables in one or more training programs: each program can pull/push its own related sparse tables instead of all sparse tables.
* test=develop
* add copy table (#21086)
* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars
* fix fs_client_param bug (#21212)
* fix fs_client_param bug, user can set this config through fleet_desc_file or fleet config
* test=develop
* fix fleet util bug (#21254)
* fix fleet util bug in save paddle inference model
* test=develop
-
- 26 Nov, 2019 4 commits
-
-
Committed by Lv Mengsi
* Fix gradients (#20857)
* fix_gradients
* fix_gradients, test=develop
* fix instance norm (#21042)
* fix instance norm
* update unittest, test=develop
* fix_bn
* revert unittest, test=develop
-
Committed by bingyanghuang
-
Committed by WangXi
-
Committed by WangXi
-
- 25 Nov, 2019 3 commits
-
-
Committed by lijianshe02
* add input type and input data type check for Print_op test=develop (#21250)
* add input type and input data type check for Print_op test=develop
* cherry-pick error info check of Print_op for release1.6 test=develop
* cherry-pick error info check of Print_op for release1.6 test=develop
-
Committed by Yi Liu
* fix bug of issue #21259 (#21287): pass the `allow_out_of_range` argument of the one_hot op to the C++ back end.
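To show why forwarding this argument matters, the sketch below models a one-hot encoder in pure Python. The behavior for out-of-range indices (an all-zero row when `allow_out_of_range` is true, an error otherwise) is an assumption about the op's intended semantics and is not stated in the commit message; the function is illustrative and is not the Paddle implementation.

```python
def one_hot(indices, depth, allow_out_of_range=False):
    """One-hot encode indices into rows of length depth.

    Assumed semantics: an index outside [0, depth) yields an all-zero row
    when allow_out_of_range is True and raises ValueError otherwise.
    """
    rows = []
    for idx in indices:
        row = [0.0] * depth
        if 0 <= idx < depth:
            row[idx] = 1.0
        elif not allow_out_of_range:
            raise ValueError(f"index {idx} is out of range [0, {depth})")
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(one_hot([1, 5], depth=3, allow_out_of_range=True))
    # prints [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

If the flag were silently dropped before reaching the back end, the second call style would always raise, which matches the kind of bug the commit describes.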
-
Committed by Zhang Ting
* [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756)
* All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview
* fix the bug that attr(offsets) should be initialized, test=develop
* [cherry-pick] maxout supports channel_last input (#20846)
* maxout support channel_last input, test=develop
* modified details of Input(X) and Attr(groups, axis) in doc, test=develop
* [cherry-pick] lrn supports channel_last input, test=develop (#20954)
-
- 23 Nov, 2019 2 commits
-
-
Committed by Kaipeng Deng
-
Committed by Kaipeng Deng
* fix elementwise_mod FP kernel. test=develop
* fix unittest. test=develop
-
- 21 Nov, 2019 1 commit
-
-
Committed by liym27
[cherry-pick] fix bug in pool/conv/conv_transpose: UpdatePaddingAndDilation, _get_padding_with_SAME and conv2dtranspose_forward_naive. (#20997) (#21225)
* fix bug in pool/conv/conv_transpose:
  1. It should be stride[i], not stride[0], in UpdatePaddingAndDilation;
  2. fix bug of func _get_padding_with_SAME in test_conv/conv_transpose_op.py;
  3. fix bug of the computation process in function conv2dtranspose_forward_naive.
  test=release/1.6
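The stride[i] versus stride[0] point is easiest to see with a small calculation. The sketch below uses the common SAME-padding rule (output size is the ceiling of input size over stride); that rule is assumed here rather than copied from UpdatePaddingAndDilation, and `same_padding` is a hypothetical name.

```python
def same_padding(input_size, kernel_size, strides):
    """Return (pad_before, pad_after) per spatial dimension for SAME padding."""
    paddings = []
    for i, (in_dim, k) in enumerate(zip(input_size, kernel_size)):
        stride = strides[i]  # per-dimension stride; using strides[0] here is the bug
        out_dim = (in_dim + stride - 1) // stride          # ceil(in_dim / stride)
        pad_sum = max((out_dim - 1) * stride + k - in_dim, 0)
        pad_before = pad_sum // 2
        paddings.append((pad_before, pad_sum - pad_before))
    return paddings


if __name__ == "__main__":
    # Height uses stride 1, width uses stride 2.
    print(same_padding((5, 8), (3, 3), (1, 2)))  # prints [(1, 1), (0, 1)]
```

With strides (1, 2), the width dimension gets padding (0, 1); reusing the height stride of 1 for both dimensions would give (1, 1) instead, which is the kind of mismatch the fix addresses.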
-
- 14 Nov, 2019 1 commit
-
-
Committed by Tao Luo
test=release/1.6
-
- 11 Nov, 2019 1 commit
-
-
Committed by Huihuang Zheng
TODO: fix cudnn_conv and re-enable it test=develop test=release/1.6
-
- 07 Nov, 2019 2 commits
- 06 Nov, 2019 1 commit
-
-
Committed by bingyanghuang
-
- 01 Nov, 2019 7 commits
-
-
Committed by liym27
* [cherry-pick] fix bug in reshape: (#20781) consider the situation where the shape of the input can contain more than one -1.
* [cherry-pick] support Tensor for split and concat, support -1 in num_or_sections, add check for num_or_sections (#20780)
* improve split and concat op:
  1. support Tensor for argument 'dim' in split op.
  2. support Tensor for argument 'axis' in concat op.
* redefine function GetDataFromTensor and set unknown output shape to -1.
* add check: Attr(sections) match Input(X).
* support Tensor for attr(sections) and attr(sections) can contain -1.
* modify error message and fix bug for concat and call Resize only when necessary. test=release/1.6
* [cherry-pick] improve unsqueeze op to support int, Tensor for argument axes (#20824)
* improve unsqueeze op to support int, Tensor and Tensor list for argument axes.
* call Resize only when necessary. test=release/1.6
* [cherry-pick] Compatible int32 and int64 for attr in concat/split/unsqueeze. test=release/1.6 (#20912)
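The "-1 in sections" and "sections match Input(X)" items above can be sketched as a small shape check. This is an illustration in pure Python, assuming that at most one section may be -1 and that it absorbs whatever length remains; `resolve_sections` is a hypothetical helper, not a Paddle function.

```python
def resolve_sections(dim_size, sections):
    """Replace a single -1 in sections with the remaining length of the axis.

    Raises ValueError when the sections cannot match an axis of dim_size.
    Assumption: at most one entry may be -1.
    """
    unknown = [i for i, s in enumerate(sections) if s == -1]
    if len(unknown) > 1:
        raise ValueError("at most one section may be -1")
    known = sum(s for s in sections if s != -1)
    if unknown:
        if known > dim_size:
            raise ValueError("known sections exceed the size of the axis")
        resolved = list(sections)
        resolved[unknown[0]] = dim_size - known
        return resolved
    if known != dim_size:
        raise ValueError("sections must sum to the size of the axis")
    return list(sections)


if __name__ == "__main__":
    print(resolve_sections(10, [2, -1, 3]))  # prints [2, 5, 3]
```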
-
Committed by WangXi
-
Committed by xujiaqi01
cherry-pick1.6 simplify master+patch, remove ins when size != merge_size or has conflict slot (#20941)
* simplify master+patch, remove ins when size != merge_size or has conflict slot
* test=develop
-
Committed by xujiaqi01
* add check nan / inf in downpour worker during training
* test=develop
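The kind of check named in this commit can be sketched with the standard library. This is a generic illustration, not the downpour worker code; `find_non_finite` is a hypothetical helper.

```python
import math


def find_non_finite(values):
    """Return the positions of NaN or infinite entries in values."""
    return [i for i, v in enumerate(values) if math.isnan(v) or math.isinf(v)]


if __name__ == "__main__":
    batch = [1.0, float("nan"), 2.0, float("inf")]
    print(find_non_finite(batch))  # prints [1, 3]
```

Reporting positions rather than a single boolean makes it easier to trace which value went bad during training.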
-
Committed by 123malin
* update pserver decay blocks
* update distributed notify handler
-
Committed by liym27
[cherry-pick] keep the size of symmetric padding at 2 for 2d and 3 for 3d. test=release/1.6 (#20903) (#20939)
-
Committed by Chengmo
* Fix Paddle Cloud role maker (#20860)
-