Created by: xiebaiyuan
Optimize OpenCL memory handling and split it from the CPU code. Fix a bug when some memory is not equal to 4. test=develop