Machine Translation
---------------------
Machine Translation transforms text in one natural language (the source language) into another natural language (the target language), and is a fundamental research direction in natural language processing. In the wave of globalization, the role machine translation plays in promoting cross-lingual communication is self-evident. Its development has gone through stages such as statistical machine translation and neural-network-based Neural Machine Translation (NMT); only after NMT matured was machine translation applied on a large scale. Early NMT was mainly based on recurrent neural networks (RNNs). Because each time step in training depends on the computation of the previous time step, it is difficult to parallelize across time steps to speed up training. Therefore, non-RNN NMT architectures have emerged, such as those based on convolutional neural networks (CNNs) and those based on self-attention.
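To make the parallelization bottleneck concrete, here is a minimal NumPy sketch (illustrative only, not the code of this example, with made-up dimensions): each hidden state `h_t` can only be computed after `h_{t-1}` is available, so the loop over time steps is inherently serial.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """x_seq: (seq_len, input_dim); returns hidden states (seq_len, hidden_dim)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x_seq:                            # serial over time steps
        h = np.tanh(x_t @ W_x + h @ W_h + b)     # h_t depends on h_{t-1}
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
seq_len, input_dim, hidden_dim = 5, 8, 16        # hypothetical sizes
states = rnn_forward(rng.normal(size=(seq_len, input_dim)),
                     rng.normal(size=(input_dim, hidden_dim)),
                     rng.normal(size=(hidden_dim, hidden_dim)),
                     np.zeros(hidden_dim))
print(states.shape)  # (5, 16)
```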
The Transformer implemented in this example is a machine translation model based on the self-attention mechanism: it contains no RNN or CNN structure and instead relies entirely on attention to learn context dependencies. Compared with RNN/CNN models, within a single layer this structure has lower computational complexity, is easier to parallelize, and models long-range dependencies more easily; it ultimately achieved the best translation quality across multiple language pairs.
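As a rough illustration of why self-attention parallelizes where an RNN cannot, the following NumPy sketch (again illustrative, not this example's implementation) computes scaled dot-product self-attention: every position attends to every other position in a single matrix product, with no serial dependency between time steps.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model). Returns (seq_len, d_v)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # pairwise scores, (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return weights @ V                                     # each output mixes all positions

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8                            # hypothetical sizes
X = rng.normal(size=(seq_len, d_model))
out = self_attention(X,
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)))
print(out.shape)  # (4, 8)
```

Because `scores` covers all pairs of positions at once, long-range dependencies are modeled in a single layer, whereas an RNN must propagate information step by step across the sequence.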