Created by: kuke
Major improvements:
- Allow knowledge stored as float32/float64 to be transferred as float16, which halves the transferred data for float32 (and cuts it to a quarter for float64);
- Use multiple threads to post-process knowledge data, and allow several knowledge queues to serve one teacher-student online connection at the same time;
- For knowledge schema entries that already appear in the feed, read them directly from CPUPlace instead of fetching them from GPUPlace.
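
To illustrate the float16 transfer in the first item, here is a minimal NumPy sketch. It is not the code in this PR: the helper names `compress_knowledge` and `restore_knowledge` are hypothetical, and the sketch assumes the original dtype is sent alongside the data so the receiver can cast back.

```python
import numpy as np


def compress_knowledge(arr):
    """Cast float32/float64 arrays to float16 before sending (hypothetical helper).

    Returns the array to send and the original dtype, which the receiver
    needs in order to restore the data. Other dtypes pass through unchanged.
    """
    if arr.dtype in (np.float32, np.float64):
        return arr.astype(np.float16), arr.dtype
    return arr, arr.dtype


def restore_knowledge(arr, original_dtype):
    """Cast received data back to its original dtype (hypothetical helper)."""
    return arr.astype(original_dtype)


# Example: teacher logits in float32, values in [0, 1).
logits = np.random.rand(4, 10).astype(np.float32)
sent, dtype = compress_knowledge(logits)
restored = restore_knowledge(sent, dtype)

print(logits.nbytes, sent.nbytes)  # float16 payload is half the float32 size
```

The cast is lossy: float16 keeps roughly three significant decimal digits, so this presumably suits soft labels and logits better than values that need full precision.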