Created by: guru4elephant
We should have a load-balancing mechanism for GPU serving. GPU memory is limited, so it is not acceptable for multiple concurrent queries to push GPU memory usage past the device's limit.
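One possible shape for this, as a rough sketch only: an admission gate that tracks an estimated memory cost per query and makes a query wait (or be rejected) when admitting it would exceed a fixed budget. Everything here is hypothetical and not an existing API of this project: the `GpuMemoryBudget` class, its method names, and the assumption that the server can estimate each query's GPU memory footprint up front. It only does bookkeeping on the host side and does not query the GPU itself.

```python
import threading
from contextlib import contextmanager


class GpuMemoryBudget:
    """Hypothetical admission gate: keeps the sum of per-query memory
    estimates at or below a fixed budget (bookkeeping only, no GPU calls)."""

    def __init__(self, budget_bytes: int):
        self.budget_bytes = budget_bytes
        self.in_use_bytes = 0
        self._cond = threading.Condition()

    def try_acquire(self, estimate_bytes: int) -> bool:
        """Reserve memory without blocking; return False if it does not fit."""
        with self._cond:
            if self.in_use_bytes + estimate_bytes > self.budget_bytes:
                return False
            self.in_use_bytes += estimate_bytes
            return True

    def acquire(self, estimate_bytes: int, timeout=None) -> bool:
        """Block until the reservation fits, or until the timeout expires."""
        if estimate_bytes > self.budget_bytes:
            raise ValueError("query can never fit within the budget")
        with self._cond:
            fits = self._cond.wait_for(
                lambda: self.in_use_bytes + estimate_bytes <= self.budget_bytes,
                timeout=timeout,
            )
            if fits:
                self.in_use_bytes += estimate_bytes
            return fits

    def release(self, estimate_bytes: int) -> None:
        """Return a reservation and wake any waiting queries."""
        with self._cond:
            self.in_use_bytes -= estimate_bytes
            self._cond.notify_all()

    @contextmanager
    def reserve(self, estimate_bytes: int, timeout=None):
        """Hold a reservation for the duration of a `with` block."""
        if not self.acquire(estimate_bytes, timeout):
            raise TimeoutError("GPU memory budget exhausted")
        try:
            yield
        finally:
            self.release(estimate_bytes)
```

A request handler could wrap the inference call in `with budget.reserve(estimated_bytes, timeout=...)`, so that excess queries queue up or fail fast instead of growing GPU memory. With several GPUs, one budget per device plus routing to the device with the most free budget would be one way to get actual load balancing; that part is not sketched here.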