Created by: cryoco
-- Added CUDA kernels for the concat op and the elementwise_add op.
-- Removed the Eigen dependency from the nearest_interp CUDA kernel.