- 16 Sep, 2020 2 commits
-
-
Committed by mapingshuo
* fix strategy, test=develop
* fix can_apply
-
Committed by chalsliu
-
- 15 Sep, 2020 5 commits
-
-
Committed by Shang Zhizhou
* optimize slice TRT plugin. This patch removes an unnecessary barrier for the data transfer of the needed offset, so the data transfer can overlap with GPU kernel execution. It also fixes the incorrect name of the slice plugin by replacing "layernorm" with "slice". test=develop
* add serialize/deserialize to slice plugin
* add static shape slice trt plugin
* fix slice trt op convertor dynamic shape bug
* fix format by clang-format
* fix pylint format error
* fix problems commented by peiyang
Co-authored-by: NRyan Jeng <rjeng@nvidia.com>
-
Committed by Wilber
-
Committed by Shang Zhizhou
* optimize error report
* add test case for pad op converter
* fix some spelling mistakes commented by peiyang
-
Committed by GaoWei8
* change sequence length from attr to input
-
Committed by cc
* Remove the cache in post_traning_quantization, test=develop
-
- 14 Sep, 2020 10 commits
-
-
Committed by YUNSHEN XIE
-
Committed by zhupengyang
-
Committed by lilong12
* bug fix, test=develop
-
Committed by LielinJiang
Fix conv depthwise bug when in_channels=1.
-
Committed by xiaoting
-
Committed by MRXLT
* add check for sparse parameters with weight_decay
* move sparse check to adam.py
-
Committed by Chen Weihang
* move worker loop to top level
* move reader process loop to top level
* fix failed unittests
-
Committed by Zhen Wang
Update amp_check_finite_and_scale_op and add an updating_loss_scaling op for static graph amp training. (#26240)
* update amp_check_finite_and_scale_op for static_amp.
* use amp_check_finite_and_scale in static graph amp.
* update grads to zero when grads contain infinite values (as for the amp_checkout_finite_and_scale op).
* add update_loss_scaling op in cpp.
* add update_loss_scaling_op unit test.
* update the doc of the check_finite_and_unscale op.
* update the process of skipping gradient updates if the gradients have infinite values.
* update the way to zero grads.
* update test_update_loss_scaling_op.py
* add log info when infinite grads are found.
* add the unit test for UpdateLossScaling Layer.
-
Committed by ShenLiang
* rm auto from localsgd
-
Committed by Adam
* Add int8 GRU kernel with UTs
* Lint fixes
* More lint fixes
-
- 11 Sep, 2020 8 commits
-
-
Committed by Leo Chen
* temporarily disable zero_copy
* add test
* follow comments
-
Committed by Leo Chen
-
Committed by Aurelius84
* fix calcu_gradients
* fix code place
* fix embedding interface usage
-
Committed by Chen Weihang
-
Committed by liym27
-
Committed by Chen Weihang
-
Committed by Aurelius84
* support to_static(model)
* add warning and unittest
-
Committed by furnace
-
- 10 Sep, 2020 11 commits
-
-
Committed by Zhen Wang
* Use the single GPU card to execute the test_fuse_bn_act_pass UT.
-
Committed by Zhen Wang
-
Committed by lilong12
* add double grad for tile, test=develop
* add double grad for expand_v2 op, test=develop
-
Committed by lilong12
* add double grad for expand, test=develop
-
Committed by Chen Weihang
* add some unittest cases to verify jit.save, no_test
* add more unittests
* add test with example inputs
* polish implementation details
* remove useless blank
* fix fetch random error
-
Committed by ShenLiang
-
Committed by 123malin
* parameter_server_optimizer support auto_strategy
-
Committed by wawltor
Fix the CudaPinMemory bug for the equal op and add a test case for the equal op.
-
Committed by liym27
-
Committed by Huihuang Zheng
Decrease the number of running iterations to reduce CI time. The CI system shows this decreased the unittest time from about 90 seconds to about 30 seconds.
-
Committed by zhupengyang
-
- 09 Sep, 2020 4 commits
-
-
Committed by JZ-LIANG
Add lars to fleet meta optimizer.
-
Committed by liym27
-
Committed by Dong Daxiang
* refine launch and distributed repr string for print
-
Committed by Qinghe JING
* set default value for strategy in distributed_optimizer, test=develop
-