[QAT] BERT Model (#21228) · Issue · PaddlePaddle / Paddle

[QAT] BERT Model

Created by: Sand3r-

Hello, as previously agreed, Intel is going to prepare a QAT pass for BERT INT8. To do that, we'd like to find out the following:

What flavour (type) of BERT model shall we optimize? There are several that I know of, and they differ in the ops they are built from.
Could you please provide the data reader for us? Two of the QAT BERT models we have received accept only 2 inputs, while the other BERT models we have seen (the one from the benchmark repository and the one from the BERT unit test) take 4 inputs, named placeholder[0-3].
How can we find out how to compute the accuracy?
What is the performance measure that we use for the model in question? Is it words per second (wps) or something else?
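
To make the last question concrete, here is a minimal sketch of how a words-per-second figure could be computed, assuming that is the metric chosen. The names `words_per_second`, `batches`, and `run_batch` are hypothetical and not part of Paddle, and counting every token in each sequence (padding included) as a "word" is an assumption that would need confirming.

```python
import time


def words_per_second(batches, run_batch):
    """Time inference over all batches and return tokens processed per second.

    batches:   iterable of batches, each a list of token-id sequences (hypothetical layout)
    run_batch: callable that runs inference on one batch (hypothetical stand-in for the predictor)
    """
    total_words = 0
    start = time.perf_counter()
    for batch in batches:
        run_batch(batch)
        # Assumption: every token in every sequence counts as one word.
        total_words += sum(len(seq) for seq in batch)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very fast runs.
    return total_words / elapsed if elapsed > 0 else float("inf")
```

If padding tokens should be excluded, or if the metric is sentences per second instead, only the counting line would change.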