Created by: Mycaster
The quantization tool seq_generator is too slow when quantizing an embedding table.
I tried the single-threaded version on a large table of shape [16*10^8, 16]. It takes about 250 us to quantize one row, so the whole table would need roughly 4.6 days (16*10^8 rows * 250 us = 4*10^5 s), which is too long to wait for.
So I added a multithreaded version. It now takes 82 minutes with 70 threads (this still needs to be improved).
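
For reviewers, here is a minimal sketch of the row-partitioning idea behind the multithreaded version. It is not the code in this PR: it is written in Python for brevity, and `quantize_row`, `quantize_range`, `quantize_table`, and the 8-bit min-max scheme are hypothetical stand-ins for whatever seq_generator actually does per row. Pure-Python threads will not show a real speedup because of the GIL; the sketch only illustrates how the rows are split into contiguous ranges and merged back in order.

```python
from concurrent.futures import ThreadPoolExecutor


def quantize_row(row, bits=8):
    """Min-max quantize one row of floats (assumed scheme, for illustration)."""
    lo, hi = min(row), max(row)
    levels = (1 << bits) - 1
    # A constant row has no range; use scale 1.0 so every code becomes 0.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in row]
    return lo, scale, codes


def quantize_range(table, start, end):
    """Quantize rows [start, end); each worker owns one contiguous range."""
    return [quantize_row(table[i]) for i in range(start, end)]


def quantize_table(table, num_threads=4):
    """Split the table into contiguous row ranges and quantize them in parallel."""
    n = len(table)
    # Ceiling division, clamped to 1 so an empty table does not give step 0.
    chunk = max(1, (n + num_threads - 1) // num_threads)
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields results in submission order, so row order is preserved.
        parts = pool.map(lambda b: quantize_range(table, b[0], b[1]), bounds)
        return [r for part in parts for r in part]
```

Rows are independent, so no locking is needed between workers. The only shared state is the read-only input table.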