Created by: jianhang-liu
Redundant alloc/free calls cause significant lock overhead when multiple instances run concurrently, because BuddyAllocator guards every allocation with a global mutex. This PR fixes the following issue:
The CRF decoder OP does an alloc/free (via mutable_data) in every iteration. This PR avoids that by adding two intermediate variables to the OP, so that allocation only occurs in the first iteration.
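
The idea can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `LockedAllocator`, `CachedBuffer`, `RunNaive` and `RunCached` are hypothetical names, and the allocator is only a stand-in for any allocator that takes one global mutex on each alloc/free. The sketch counts how many times the lock-guarded `Alloc` is called with and without cached intermediate buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for an allocator that serializes every call
// through a single global mutex.
class LockedAllocator {
 public:
  float* Alloc(std::size_t n) {
    std::lock_guard<std::mutex> guard(mu_);
    ++alloc_calls_;
    return new float[n];
  }
  void Free(float* p) {
    std::lock_guard<std::mutex> guard(mu_);
    delete[] p;
  }
  int alloc_calls() const {
    std::lock_guard<std::mutex> guard(mu_);
    return alloc_calls_;
  }

 private:
  mutable std::mutex mu_;
  int alloc_calls_ = 0;
};

// Hypothetical intermediate buffer that lives across iterations and only
// goes back to the allocator when it needs more capacity.
class CachedBuffer {
 public:
  explicit CachedBuffer(LockedAllocator* alloc) : alloc_(alloc) {}
  ~CachedBuffer() {
    if (data_ != nullptr) alloc_->Free(data_);
  }
  CachedBuffer(const CachedBuffer&) = delete;
  CachedBuffer& operator=(const CachedBuffer&) = delete;

  float* Get(std::size_t n) {
    assert(n > 0);
    if (n > capacity_) {  // true on the first call, or when the size grows
      if (data_ != nullptr) alloc_->Free(data_);
      data_ = alloc_->Alloc(n);
      capacity_ = n;
    }
    return data_;
  }

 private:
  LockedAllocator* alloc_;
  float* data_ = nullptr;
  std::size_t capacity_ = 0;
};

// Before: two temporaries are allocated and freed in every iteration.
void RunNaive(LockedAllocator* alloc, int iters, std::size_t n) {
  for (int i = 0; i < iters; ++i) {
    float* a = alloc->Alloc(n);
    float* b = alloc->Alloc(n);
    a[0] = b[0] = static_cast<float>(i);  // placeholder for real work
    alloc->Free(a);
    alloc->Free(b);
  }
}

// After: two intermediate buffers are reused, so the allocator (and its
// lock) is only hit in the first iteration.
void RunCached(LockedAllocator* alloc, int iters, std::size_t n) {
  CachedBuffer a(alloc);
  CachedBuffer b(alloc);
  for (int i = 0; i < iters; ++i) {
    float* pa = a.Get(n);
    float* pb = b.Get(n);
    pa[0] = pb[0] = static_cast<float>(i);  // placeholder for real work
  }
}
```

With 10 iterations, `RunNaive` calls `Alloc` 20 times while `RunCached` calls it twice, assuming the buffer size does not grow between iterations. Each avoided call is one less acquisition of the global mutex, which is where the contention between instances comes from.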