and code cleanup
The MUL CUDA kernel is faked (stubbed), because cuBLAS does not work.
Some enhancements are needed for TypeSystem and the unit tests.