Created by: dzhwinter
Add op-level memory debug info. Usage:
export FLAGS_benchmark=true
export GLOG_vmodule=operator=3 GLOG_logtostderr=1
The GPU memory cost of each op is then printed while the program runs.
If you raise the verbosity level to 4 (e.g. GLOG_vmodule=operator=4), it also prints the details of the infer_shape and kernel-running phases separately, because InferShape allocates LoD memory.
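A small shell helper can bundle the settings above so the level is easy to switch between 3 and 4. This is a sketch, not part of Paddle: the function name run_with_op_mem_debug is hypothetical, and it sets the variables only for the wrapped command instead of exporting them into your shell.

```shell
# Hypothetical helper (not part of Paddle): run a command with op-level
# memory debug logging enabled at the given verbosity level (3 or 4).
run_with_op_mem_debug() {
  local level="$1"
  shift
  # Prefix assignments apply only to the environment of the wrapped command.
  FLAGS_benchmark=true \
  GLOG_vmodule="operator=${level}" \
  GLOG_logtostderr=1 \
  "$@"
}
```

For example, run_with_op_mem_debug 4 python train.py, where train.py stands in for your own training script.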