Created by: lcy-seso
This has almost become a baseline model for NMT. The way the attention mechanism is implemented can be shared with ConvS2S and many other models.
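To illustrate why an attention implementation can be reused across models, here is a minimal sketch of attention written as a standalone function. The note does not say which attention variant is used, so dot-product scoring is an assumption here, and the names `softmax` and `attend` are hypothetical helpers rather than anything from the model's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Attention for a single query vector.

    Assumption: dot-product scoring. Other models may score with an
    MLP or a scaled dot product, but the surrounding steps stay the same.
    """
    # Score each key against the query.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights
```

Because the function only depends on a query, keys, and values, the encoder and decoder that produce those vectors could be recurrent or convolutional without changing it.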