- 17 Dec, 2021 (1 commit)
Committed by zlsh80826: According to --ptxas-options=-v, SegmentOpsKernel uses 66 registers per thread. There are two ways to resolve this problem: (1) reduce the threads per block in the launch configuration, or (2) add __launch_bounds__ to tell the nvcc compiler to reduce register usage. This PR chooses the __launch_bounds__ solution because changing gpu_launch_config may affect other ops.
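
The register arithmetic behind the commit message above can be sketched as follows. This is only an illustration under stated assumptions: the 64K-registers-per-block limit holds for many NVIDIA GPUs but not all, the 1024-thread block size is a guess (the commit does not state the launch configuration), and the helper names are hypothetical, not taken from the repository. Register allocation granularity is ignored.

```cpp
#include <cstdint>

// Assumption: 65536 32-bit registers are available to one thread block.
// Check the limit for the actual target architecture.
constexpr std::uint32_t kRegsPerBlockLimit = 65536;

// Registers a block needs when every thread uses `regs_per_thread`.
// Hypothetical helper; ignores hardware allocation granularity.
constexpr std::uint32_t BlockRegisters(std::uint32_t regs_per_thread,
                                       std::uint32_t threads_per_block) {
  return regs_per_thread * threads_per_block;
}

// True when the block's total register demand fits within the limit.
constexpr bool FitsInBlock(std::uint32_t regs_per_thread,
                           std::uint32_t threads_per_block) {
  return BlockRegisters(regs_per_thread, threads_per_block) <=
         kRegsPerBlockLimit;
}

// Largest per-thread register count that still lets a block of this size
// launch. A launch-bounds annotation asks the compiler to stay under a cap
// of this kind.
constexpr std::uint32_t MaxRegsPerThread(std::uint32_t threads_per_block) {
  return kRegsPerBlockLimit / threads_per_block;
}

// Assumed block size of 1024 threads: 66 registers per thread is too many.
static_assert(!FitsInBlock(66, 1024), "66 * 1024 = 67584 > 65536");
// Option 1 from the commit message: fewer threads per block.
static_assert(FitsInBlock(66, 512), "66 * 512 = 33792 <= 65536");
// Option 2: keep 1024 threads, cap registers per thread instead.
static_assert(MaxRegsPerThread(1024) == 64, "65536 / 1024 = 64");
```

Under these assumptions, either halving the block size or capping the kernel at 64 registers per thread makes the launch fit; the commit picks the second route so that the shared launch configuration stays unchanged.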
 
- 03 Dec, 2021 (1 commit)
Committed by ronnywang: Refine structure for CUDA and ROCm, followed by four "update" entries.
 
- 27 Apr, 2021 (1 commit)
Committed by Zhong Hui: [OPs] Bug fix: fix illegal syncthreads usage in segment mean.
 
- 20 Oct, 2020 (1 commit)
Committed by wangchaochaohu
 
- 26 Sep, 2020 (1 commit)
Committed by Zhong Hui: Fix cpplint error for the atomic max/min.
 
- 24 Sep, 2020 (1 commit)
Committed by Zhong Hui: Add GPU kernels for segment ops, supporting sum, max, min, and mean.
 
