Created by: donproc
PR types
Performance optimization
PR changes
OPs
Describe
Adjust the grid and block size of the LookupTableV2/LookupTable kernels.
Profiled with nsys in Docker on a Tesla V100-SXM2 (CUDA 10.1, NVIDIA-SMI 440.33.01, Driver Version: 440.33.01).
configuration (grid, block) | input shape | type | table size | is_sparse | time (ns) |
---|---|---|---|---|---|
80, 128x8 | [16, 16] | fp16 | [2, 768] | false | 4288 |
160, 256x4 | [512, 512] | fp32 | [4000000, 128] | false | 485933 |
160, 256x4 | [512, 512] | fp32 | [2000000, 256] | false | 762171 |
160, 256x4 | [512, 512] | fp32 | [1000000, 512] | false | 1489623.5 |
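To put the fp32 timings in the table above in context, here is a rough sketch that converts them into output bandwidth. It assumes each looked-up row is written to the output exactly once and ignores the reads of the ids and the table as well as any cache effects, so it is only a lower bound on memory traffic, not a measurement from this PR. The helper name `output_gb_per_s` is made up for illustration.

```python
# Rough lower bound on output traffic for the fp32 rows in the table above.
# Assumption: each looked-up row is written once; reads and caching are ignored.
rows = [
    # (ids shape, embedding width, time in ns), copied from the table
    ((512, 512), 128, 485933.0),
    ((512, 512), 256, 762171.0),
    ((512, 512), 512, 1489623.5),
]
BYTES_PER_FP32 = 4


def output_gb_per_s(ids_shape, width, time_ns):
    """Bytes written to the output divided by kernel time (hypothetical helper)."""
    num_ids = ids_shape[0] * ids_shape[1]
    bytes_written = num_ids * width * BYTES_PER_FP32
    return bytes_written / time_ns  # bytes per ns is the same as GB/s (1e9 bytes)


bandwidths = [output_gb_per_s(*row) for row in rows]
for (shape, width, time_ns), bw in zip(rows, bandwidths):
    print(f"width={width}: {bw:.1f} GB/s written")
```

Under these assumptions the three rows come out at roughly 276, 352, and 360 GB/s of output writes; counting the matching table reads would about double those figures.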
Speedup of LookupTable for input shape [512, 512] and table size [4000000, 128]:
configuration (grid, block) | speedup |
---|---|
160, 256x4 | 6.9x |
8, 128x8 (current) | 1x |
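One plausible reading of the speedup is plain launch geometry. The sketch below assumes that a configuration written as "G, XxY" means G thread blocks of X*Y threads each, and uses the fact that a Tesla V100 has 80 streaming multiprocessors (SMs). The function `launch_geometry` is a hypothetical helper for this illustration, not part of the PR.

```python
# Launch geometry for the two configurations in the speedup table.
# Assumption: "G, XxY" means G thread blocks of X*Y threads each.
V100_SM_COUNT = 80  # a Tesla V100 has 80 streaming multiprocessors


def launch_geometry(grid, block_x, block_y, sm_count=V100_SM_COUNT):
    """Summarize how many threads a launch creates (hypothetical helper)."""
    threads_per_block = block_x * block_y
    return {
        "threads_per_block": threads_per_block,
        "total_threads": grid * threads_per_block,
        # A thread block runs on a single SM, so at most `grid` SMs can be busy.
        "max_busy_sms": min(grid, sm_count),
    }


current = launch_geometry(8, 128, 8)
proposed = launch_geometry(160, 256, 4)
print(current)
print(proposed)
```

Both configurations use 1024 threads per block, but with a grid of 8 at most 8 of the 80 SMs can have work, while a grid of 160 allows two blocks per SM. This is consistent with the measured 6.9x, though the profile above does not by itself confirm it as the cause.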