Fix quantized-inference & Add generic support of checkpoint loading (#2547)
* fix checkpoint loading when it is a dictionary
* fix some issues with saving ckpt & int8 inference
* fix quantized-inference & add generic support of checkpoint loading
* remove int8 hard-coded flag
* fix mlp return tensors
* fix several issues with loading GPT-J, GPT-NeoX, and OPT checkpoints with different TP sizes
* add more comments & description for checkpoint-loading module
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>