Created by: jianhang-liu
Redundant alloc/free calls cause significant lock overhead in multi-instance scenarios, because every allocation goes through the global mutex in BuddyAllocator. This PR addresses the following issue:
The predictor creates a temporary tensor and copies the data from the read batch into it. The allocated memory is freed shortly afterwards, when the feed op executes. To avoid this, add a "pass in data" interface to the tensor and use it in the predictor.
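
For illustration only, here is a minimal sketch of the difference between the two paths. `ToyTensor`, `CopyFrom`, `WrapExternalBuffer`, and `alloc_count` are hypothetical names made up for this example. They are not the interface added in this PR, and the counter merely stands in for trips through a mutex-guarded allocator such as BuddyAllocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for a framework tensor; not the real tensor class.
class ToyTensor {
 public:
  // Counts allocations made by all ToyTensor instances. In a real framework
  // each of these would take the allocator's global lock.
  static std::size_t alloc_count;

  // Copy path: allocate an owned buffer, then copy the caller's data into it.
  void CopyFrom(const float* src, std::size_t n) {
    owned_ = std::make_unique<float[]>(n);
    ++alloc_count;
    std::memcpy(owned_.get(), src, n * sizeof(float));
    data_ = owned_.get();
    size_ = n;
  }

  // "Pass in data" path: wrap a caller-owned buffer without allocating.
  // The caller must keep `src` alive for as long as the tensor is in use.
  void WrapExternalBuffer(float* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    owned_.reset();  // drop any buffer this tensor owned before
    data_ = src;
    size_ = n;
  }

  const float* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  std::unique_ptr<float[]> owned_;  // set only on the copy path
  float* data_ = nullptr;
  std::size_t size_ = 0;
};

std::size_t ToyTensor::alloc_count = 0;
```

In this sketch, feeding a batch through `CopyFrom` increments `alloc_count` once per call, while `WrapExternalBuffer` leaves it unchanged and makes the tensor point directly at the caller's buffer. The trade-off is that the caller becomes responsible for the buffer's lifetime.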