Created by: tensor-tang
This is an issue with the NLP online service.
When running inference, memory usage stays at about 6 GB, which is definitely larger than what is actually needed.
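To make the report reproducible, the figure can be confirmed with a small check of the process's peak resident set size. This is a minimal sketch using only the Python standard library, assuming a Linux host (where `ru_maxrss` is reported in kilobytes); the inference call itself is not shown:

```python
import resource

def peak_memory_mb() -> float:
    """Peak resident set size of this process in MB.

    On Linux, ru_maxrss is in kilobytes; on macOS it is in bytes,
    so this conversion assumes Linux.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

# Run the inference workload here, then print the peak usage observed.
print(f"peak RSS: {peak_memory_mb():.1f} MB")
```

Printing this before and after a single inference call would show how much of the ~6 GB is allocated up front versus grown during inference.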