Implement CRF and Viterbi decoding using elementary operators
Created by: lcy-seso
In essence, the CRF computation can be regarded as "softmax with cross entropy" taken over whole tag sequences rather than single labels, which is why the normalization factor is more complicated to compute.
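To make the analogy concrete, here is a minimal NumPy sketch of the negative log-likelihood of a linear-chain CRF for a single sequence. It is only an illustration under stated assumptions, not the framework's implementation: the names `crf_nll`, `logsumexp`, `emissions`, `transitions`, and `tags` are hypothetical, and start/end transition scores are left out for brevity. The gold-path score plays the role of the selected logit, and the forward recursion computes the log of the normalization factor.

```python
import numpy as np


def logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of one tag sequence (illustrative sketch).

    emissions:   (T, K) float array, score of each of K tags at each step
    transitions: (K, K) float array, score of moving from tag i to tag j
    tags:        (T,) integer array, the gold tag indices
    Assumption: no separate start/end transition scores.
    """
    T, K = emissions.shape

    # Score of the gold path: a gather over emissions and transitions.
    gold = emissions[np.arange(T), tags].sum()
    gold += transitions[tags[:-1], tags[1:]].sum()

    # Log normalization factor via the forward recursion.
    alpha = emissions[0]
    for t in range(1, T):
        # alpha[i] + transitions[i, j], reduced over the previous tag i.
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    log_z = logsumexp(alpha, axis=0)

    # Same shape as softmax cross entropy: log Z minus the selected score.
    return log_z - gold
```

For a sequence of length one the recursion is skipped and the result is ordinary softmax cross entropy over the `K` tags.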
Maybe, in the near future, with better support for scatter/gather/conditional operators, we could also try to implement CRF and the Viterbi decoding process using elementary operators (with no sequence padding) as a test of our framework.
This is just a personal thought; it may not be strictly necessary.
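For reference, a rough NumPy sketch of Viterbi decoding for one unpadded sequence, built only from broadcasting, max/argmax, and indexing (a gather). This is an assumption-laden illustration rather than a proposed design: `viterbi_decode`, `emissions`, and `transitions` are hypothetical names, and start/end transition scores are again omitted.

```python
import numpy as np


def viterbi_decode(emissions, transitions):
    """Best tag path for one sequence (illustrative sketch).

    emissions:   (T, K) float array, score of each of K tags at each step
    transitions: (K, K) float array, score of moving from tag i to tag j
    Returns the best path as a list of tag indices and its total score.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=np.int64)

    for t in range(1, T):
        # cand[i, j]: best score ending in tag i at t-1, then moving to tag j.
        cand = score[:, None] + transitions
        backptr[t] = np.argmax(cand, axis=0)
        score = np.max(cand, axis=0) + emissions[t]

    # Follow the back pointers from the best final tag.
    best_last = int(np.argmax(score))
    path = [best_last]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    path.reverse()
    return path, float(score[best_last])
```

Because the loop runs over the true length `T` of each sequence, no padding is involved; handling a batch of sequences with different lengths is where conditional and scatter/gather operators would come in.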