Do we need to enhance matmul_op to support 4-D inputs?
Created by: lcy-seso
While checking the dot-product attention in ConvS2S and Transformer, I found that in multi-head (self-)attention, both inputs of the batched matrix multiplication can potentially be 4-D tensors.
It seems we could enhance the current matmul_op to accept 4-D tensors as inputs; however, I guess the right design depends on how the computation is batched to maximize speed.
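For reference, here is a minimal NumPy sketch (not Paddle code; the names and shapes are illustrative) of the semantics a 4-D-aware matmul_op would need: the two leading dimensions act as batch dimensions, and the multiplication applies to the trailing two.

```python
import numpy as np

batch, heads, seq_len, d_k = 2, 8, 10, 64

q = np.random.rand(batch, heads, seq_len, d_k)  # queries
k = np.random.rand(batch, heads, seq_len, d_k)  # keys

# np.matmul broadcasts over the leading (batch, heads) dims and
# multiplies the trailing (seq_len, d_k) x (d_k, seq_len) matrices.
scores = np.matmul(q, k.transpose(0, 1, 3, 2))
print(scores.shape)  # (2, 8, 10, 10)
```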
Alternatively, the multiple heads could simply be wrapped in a Python API using a for loop, as in the sketch below.
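A rough sketch of that alternative, again in NumPy for illustration only: each head is sliced out and handled by an ordinary 3-D batched matmul, so the op itself never sees a 4-D input.

```python
import numpy as np

batch, heads, seq_len, d_k = 2, 8, 10, 64
q = np.random.rand(batch, heads, seq_len, d_k)
k = np.random.rand(batch, heads, seq_len, d_k)

per_head = []
for h in range(heads):
    # q[:, h] and k[:, h] are 3-D: (batch, seq_len, d_k)
    per_head.append(np.matmul(q[:, h], k[:, h].transpose(0, 2, 1)))

# Re-stack the per-head results into (batch, heads, seq_len, seq_len).
scores = np.stack(per_head, axis=1)
print(scores.shape)  # (2, 8, 10, 10)
```

The trade-off, as I understand it, is that the loop issues one matmul per head, whereas a native 4-D matmul could batch all heads in a single call.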