Figure 5 Computational graph after composite operator fusion
### Training Time for One Step
BERT-large scenario: With graph kernel fusion enabled for the BERT-large network, the training time per step can be reduced by more than 10%, while the accuracy remains the same as with the function disabled.
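To check a claim like this on your own setup, one option is to time a fixed number of training steps with and without fusion and compare the averages. The following is a minimal sketch, not the benchmark procedure used for the figure above: `average_step_time` and `relative_reduction` are hypothetical helper names, and `step_fn` stands for whatever callable runs one training step in your script.

```python
import time
from statistics import mean


def average_step_time(step_fn, num_steps=20, warmup=5):
    """Return the mean wall-clock seconds per call of step_fn.

    step_fn is a hypothetical zero-argument callable that runs one
    training step. Warm-up calls are excluded because the first steps
    usually include graph compilation time.
    """
    for _ in range(warmup):
        step_fn()
    durations = []
    for _ in range(num_steps):
        start = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def relative_reduction(baseline_s, fused_s):
    """Fraction by which the per-step time dropped relative to the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return (baseline_s - fused_s) / baseline_s
```

For example, with purely illustrative timings of 0.200 s per step without fusion and 0.176 s with fusion, `relative_reduction(0.200, 0.176)` is about 0.12, that is, a 12% shorter step. If the training step is dispatched asynchronously to the device, make sure `step_fn` waits for the result before returning, otherwise the measured time only covers the launch.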