- 29 Apr, 2021 6 commits
-
-
Committed by joanna.wozna.intel
* Add bf16 uniform random initializer
* Remove duplicated section
* Change UT to CPU place only
* Put detail functions into anonymous namespace
-
Committed by arlesniak
This is a cherry-pick of #32281
-
Committed by Chen Weihang
cherry-pick of #32666
-
Committed by Pei Yang
-
Committed by Wilber
What is the follow-up fix plan?
-
Committed by Jacek Czaja
- Executor does not always have FLAGS_use_mkldnn set to true
-
- 28 Apr, 2021 2 commits
-
-
Committed by wenbin
-
Committed by jiangcheng
* Optimize update_loss_scaling_op by fusing the for loop into one kernel, test=develop
* Remove useless while loop and optimize variable names, test=develop
* Rename variable out_addrs_tensor to out_addrs_mem, test=develop
* Improve variable-name readability by changing the prefix from t_ to local_
-
- 27 Apr, 2021 5 commits
-
-
Committed by Zhong Hui
* [OPs] Bug fix: fix illegal syncthreads usage in segment mean.
-
Committed by Pei Yang
-
Committed by tianshuo78520a
This reverts commit 4b7242b0.
-
Committed by Aurelius84
-
Committed by XiangGao
Co-authored-by: NYang Zhang <yangzhang@live.com>
-
- 26 Apr, 2021 13 commits
-
-
Committed by lilong12
* add sendrecv, test=develop
-
Committed by Zhou Wei
* clear CUDA compile environment on Windows
* fix Windows CI
* fix Windows CI
* fix Windows CI
-
Committed by jiangcheng
* New optimization for where_index_op using a prefix-sum version.
* Write a scan prefix-sum kernel with stream for the where_index op.
* Optimize where_index by using cub::DeviceScan::InclusiveSum instead of the imperfect hand-written kernel.
* Remove CheckTrue struct and rename stide_array for readability.
* Optimize variable names for readability.
* Optimize function names and annotations.
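The prefix-sum approach named in this commit can be illustrated with a minimal CPU sketch. It is not the Paddle kernel: the function name `where_index` is reused only for illustration, the input is assumed to be a flat 1-D mask, and `itertools.accumulate` stands in for `cub::DeviceScan::InclusiveSum`. The idea is that an inclusive scan over the 0/1 flags gives every true element its output slot, so each element can be written independently.

```python
from itertools import accumulate


def where_index(mask):
    """Return the indices of the true elements of a flat mask (illustrative sketch)."""
    flags = [1 if m else 0 for m in mask]
    # Inclusive prefix sum: scan[i] counts the true elements in mask[0..i].
    scan = list(accumulate(flags))
    total = scan[-1] if scan else 0
    out = [0] * total
    # On a GPU each i would be handled by its own thread. Output slots never
    # collide because scan[i] - 1 is unique for every true element.
    for i, flag in enumerate(flags):
        if flag:
            out[scan[i] - 1] = i
    return out
```

For a multi-dimensional input, the flat index written here would additionally have to be converted into per-dimension coordinates, which this sketch leaves out.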
-
Committed by Thunderbrook
* optimize pull sparse
* optimize pull sparse
* change macro
* format
-
Committed by Yiqun Liu
* Unset ReserveSpace for inference program.
* Support training from an inference program.
-
Committed by WangXi
-
Committed by ShenLiang
* fix model parallel
* rm parallel_help.py
* add embedding
-
Committed by 石晓伟
-
Committed by WeiXin
* Support backward returning None.
* Edit unittest.
* Edit code according to CI.
* Improve error information.
-
Committed by jiangcheng
* optimize slice op and slice grad op, test=develop
* optimize variable names and annotation information, test=develop
-
Committed by Leo Chen
* skip ops that have no fp16 kernel
* add ut
-
Committed by Leo Chen
-
Committed by Shang Zhizhou
-
- 25 Apr, 2021 14 commits
-
-
Committed by Pei Yang
* fix airank bert emb order
* move input num check to converter
* add input num check
* add unused var check white list
-
Committed by liym27
-
Committed by liym27
-
Committed by Baibaifan
-
Committed by Pei Yang
* add trt runtime version check
* use a different wrapper, and change to a major version check
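A major-version check of the kind this commit describes could look like the sketch below. Everything in it is an assumption made for illustration: the function names are hypothetical, versions are passed as `(major, minor, patch)` tuples, and nothing here queries TensorRT itself. It only shows why comparing the major component alone tolerates minor and patch differences between the compile-time and runtime libraries.

```python
def major_versions_match(compiled, runtime):
    """Compare only the major component of two (major, minor, patch) tuples."""
    return compiled[0] == runtime[0]


def check_runtime_version(compiled, runtime):
    """Raise if the runtime library's major version differs from the compiled one."""
    if not major_versions_match(compiled, runtime):
        raise RuntimeError(
            "compiled against major version %d but running with major version %d"
            % (compiled[0], runtime[0])
        )
    return True
```

With this rule, `(7, 1, 3)` against `(7, 2, 1)` passes, while `(6, 0, 1)` against `(7, 2, 1)` is rejected.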
-
Committed by Pei Yang
-
Committed by Zhang Ting
-
Committed by Leo Chen
-
Committed by Wilber
-
Committed by pangyoki
-
Committed by Qi Li
-
Committed by minghaoBD
-
Committed by lilong12
* update
-
Committed by wawltor
* fix bug: when x.dim < y.dim, the result of compare_op is the inverse of the expected result
* add CUDA support for the compare broadcast bug fix
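One plausible way such an inversion arises, offered here as an assumption rather than a description of the actual Paddle code, is a broadcast helper that always expects the higher-rank operand first. If the operands are swapped to satisfy it, the comparison operator must be mirrored as well (`<` becomes `>`), otherwise the result is inverted. The sketch below uses nested Python lists and hypothetical helper names.

```python
import operator

# Swapping the operands of a comparison requires mirroring the operator.
MIRRORED = {
    operator.lt: operator.gt,
    operator.gt: operator.lt,
    operator.le: operator.ge,
    operator.ge: operator.le,
    operator.eq: operator.eq,
    operator.ne: operator.ne,
}


def rank(value):
    """Number of nested list levels; scalars have rank 0."""
    if not isinstance(value, list):
        return 0
    return 1 + (rank(value[0]) if value else 0)


def _apply(op, big, small):
    """Elementwise op(big, small); small is broadcast over big's leading dims."""
    if not isinstance(big, list):
        return op(big, small)
    if rank(big) > rank(small):
        return [_apply(op, row, small) for row in big]
    return [_apply(op, b, s) for b, s in zip(big, small)]


def compare(op, x, y):
    """Compute op(x, y) with broadcasting, whichever operand has the lower rank."""
    if rank(x) >= rank(y):
        return _apply(op, x, y)
    # x has fewer dimensions: swap the operands and mirror the operator.
    return _apply(MIRRORED[op], y, x)
```

For example, `compare(operator.lt, [1, 5], [[2, 2], [0, 9]])` evaluates `x < y` row by row even though `x` is the lower-rank operand.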
-