- 24 Feb, 2021 16 commits
-
-
Committed by lilong12
* update, test=develop
-
Committed by Thunderbrook
* push multi node * multi node * MultiThread * remove log * solve bug in 30829
-
Committed by liu zhengxi
* add get_cublas_handle() api * update format * add unittests * alter function name
-
Committed by Aurelius84
* split cxx/nvcc compile flags * enhance input argument check * rename extra_cflags into extrac_cxx_flags * add name checking in setup * fix test_dispatch failed * fix word typo and rm usless import statement * refine import statement * fix unittest failed * fix cuda flags error
-
Committed by qingqing01
test=document_fix
-
Committed by Pei Yang
* add group norm plugin * fix compile problems * move concat axis check to trt op teller * add nbDims for scale and bias nv dims * add group norm unit test * fix unittest * add trt version restriction for group norm op teller * fix unittest
-
Committed by Chen Weihang
-
Committed by YUNSHEN XIE
-
Committed by Chen Weihang
* add new custom op so * fix use new method error * fix test failed
-
Committed by tangwei12
* fix entry * fix distributed lookup table fuse case * fix entry bug at first time * move entry from paddle.fluid -> paddle.distributed * fix ut with paddle.enable_static() Co-authored-by: malin10 <malin10@baidu.com>
-
Committed by Qi Li
-
Committed by yaoxuefeng
-
Committed by Aurelius84
* split build directory for each setup.py * fix template string
-
Committed by Zhou Wei
* fix some problem of Windows custom op * fix some problem of Windows custom op * fix some problem of Windows custom op
-
Committed by chentianyu03
* add error msg when dtypes of operator are not same * add error msg when dtypes of operator are not same * change error msg to warning msg when dtypes of operator are not same * modify test case to fit for python2
-
Committed by Zhou Wei
-
- 23 Feb, 2021 13 commits
-
-
Committed by alncat
* added support for fake_quantize_dequantize_abs_max op in quantization inference pass * remove const_cast to pass ci * remove compare operator to pass ci-coverage * added detailed error message for unregistered tensorrt_subgrah_pass
-
Committed by Chen Weihang
* split test & add inference test * add timeout config * change to setup install * change to jit compile * add verbose for test * fix load setup name repeat * polish details * resolve conflict * fix code format error
-
Committed by Jacek Czaja
-
Committed by Guanghua Yu
-
Committed by xiemoyuan
* Support 'bool' and 'int' for attention mask. * Update docs. * Add unittest for Transformer. * fix bugs.
-
Committed by WeiXin
* add default argument for paddle.save/static.save * edit documentation of * Add comments for special processing for protocol=2 and protocol=3. * Update python/paddle/fluid/io.py Co-authored-by: lanxianghit <47554610+lanxianghit@users.noreply.github.com>
-
Committed by Qi Li
-
Committed by yukavio
* remove PrettyTable dependence from paddle.flops * fix bug in python2.7 * fix flops * fix flops * fix bug * fix bug
-
Committed by wangchaochaohu
* fix windows for optimization of elementwise_add Op
-
Committed by joanna.wozna.intel
* Unification of bfloat16 enablement process and refactor * Remove unnecessary function * Standardize the output name search
-
Committed by Zhong Hui
[BUG FIX] Fix softmax cross entropy overflow problem.
-
Committed by Zhou Wei
-
Committed by Qi Li
-
- 22 Feb, 2021 11 commits
-
-
Committed by Thunderbrook
* save multi table one path * format
-
Committed by Qi Li
-
Committed by Huihuang Zheng
**Problem**

In our old shape transformer logic, if the user writes:

```
s = tensor.shape
...
y = paddle.some_api(s)
```

Dy2stat will change it to:

```
...
y = paddle.some_api(convert_var_shape(tensor))
```

However, this causes a fatal bug if the user changes the shape of `tensor` after the assignment. For example:

```
s = tensor.shape
...
tensor = paddle.some_change_shape_api(tensor)
...
y = paddle.some_api(s)
```

Dy2stat then produces a wrong result because the code is translated into:

```
tensor = paddle.some_change_shape_api(tensor)
...
y = paddle.some_api(convert_var_shape(tensor)) # tensor shape has been changed, not origin `s` value
```

**Solution Logic**

This cannot be solved in the old logic, so I refactored the tensor_shape_transformer logic. Now we use `s` to store the shape attribute and generate a variable `s__STATIC_CONVERT_VAR_SHAPE_SUFFIX` to store the result of the static shape API `shape(tensor)`. Given:

```
s = tensor.shape
...
y = paddle.some_api(s)
```

Dy2stat will change it to:

```
s = tensor.shape
s__STATIC_CONVERT_VAR_SHAPE_SUFFIX = shape(tensor)
...
y = paddle.some_api(choose_shape_attr_or_api(s, s__STATIC_CONVERT_VAR_SHAPE_SUFFIX))
```

In this case, the code is consistent with the original dygraph meaning, and the change-after-assign bug is fixed.

**Code Key Note**

To help reviewers: the key change of this PR is changing `self.name_to_var_shape` from "mapping name to shape node" to "mapping name to its STATIC_CONVERT_VAR_SHAPE_SUFFIX name". If a variable name has the SUFFIX, we can then choose between the attribute shape and the shape API. The other changes follow from this key change.

**Consideration**

The issue with this PR is that we store an extra static `shape` API result. Will it harm the speed of Dy2stat? In some cases it will, but we argue that the benefit is greater than the cost.

1. The extra calls to the static `shape` API happen when the coder assigns one shape variable to another. Take the following dygraph code as an instance:

```
s1 = tensor.shape
s2 = s1
s3 = s2
...
```

Then we call extra static `shape` APIs again and again. However, users seldom write code like this.

2. If the shape variable is used a lot, for example:

```
s = tensor.shape
y1 = paddle.some_api1(s)
y2 = paddle.some_api2(s)
y3 = paddle.some_api3(s)
```

Our old logic creates 3 shape APIs, but the new logic creates just 1. This is the more common user code pattern. In fact, if reviewers take a look at the current unit tests in this PR, they can see that the op numbers decrease after this PR. So we argue that this PR can also improve speed for this code pattern.
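The change-after-assign bug described in this commit message can be illustrated with a toy model in plain Python. This is only a sketch and does not use Paddle: `FakeTensor`, `reshape_to`, and the three functions below are hypothetical names invented for illustration, and a plain attribute read stands in for `convert_var_shape(tensor)` and `shape(tensor)`.

```python
# Toy model only: nothing here is Paddle's implementation.
# FakeTensor, reshape_to and the three functions are hypothetical names.

class FakeTensor:
    """Minimal stand-in for a tensor that only carries a shape."""
    def __init__(self, shape):
        self.shape = tuple(shape)


def reshape_to(tensor, new_shape):
    """Stand-in for an API that returns a tensor with a different shape."""
    return FakeTensor(new_shape)


def dygraph_program():
    """What the user wrote: `s` is captured before the tensor is reassigned."""
    tensor = FakeTensor((2, 3))
    s = tensor.shape
    tensor = reshape_to(tensor, (6,))
    return s


def old_translation():
    """Old logic: the use of `s` is replaced by a shape query at the use site."""
    tensor = FakeTensor((2, 3))
    tensor = reshape_to(tensor, (6,))
    # Stands in for convert_var_shape(tensor): reads the *current* shape.
    return tensor.shape


def new_translation():
    """New logic: the static shape is stored at the point of assignment."""
    tensor = FakeTensor((2, 3))
    s = tensor.shape
    s_static = tensor.shape  # stands in for s__STATIC_CONVERT_VAR_SHAPE_SUFFIX
    tensor = reshape_to(tensor, (6,))
    # Stands in for choose_shape_attr_or_api(s, s_static).
    return s_static


if __name__ == "__main__":
    print(dygraph_program(), old_translation(), new_translation())
```

In this toy model the old translation returns the shape of the reassigned tensor, while the new translation agrees with the original program because the shape is captured where `s` is assigned.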
-
Committed by tangwei12
* fix dist fleet ctr ut Change-Id: I59bf5123c7bd47bd0e8f1ca2a26295257597c0f5 * fix dist fleet ctr ut Change-Id: Iafcdd172364be47fe67b753774ce09af050bcbce * Update CMakeLists.txt
-
Committed by Qi Li
-
Committed by Qi Li
-
Committed by Shang Zhizhou
* update trt int8 calibrator to IEntropyCalibratorV2 * add delele opt_cache for trt_split_converter_test
-
Committed by Zhou Wei
* [2.0.1]Support New Custom OP on windows * fix CI * fix code style * fix CI * fix CI * fix coverage * fix CI * fix CI
-
Committed by Chen Weihang
-
Committed by Qi Li
* [ROCM] update fluid imperative for rocm (part1), test=develop * [ROCM] update reducer.cc after merge, test=develop * update reducer cmake after merge, test=develop
-
Committed by JamesLim
-