Created by: sneaxiy
This PR changes the type of the output Mask of dropout_op to uint8_t. (As a further step, Mask could be changed to something like std::vector<bool>.)
With this PR, the Transformer model in the benchmark repo stably reaches a maximum batch size of 12000.
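To illustrate why a uint8_t Mask helps with memory, here is a minimal C++ sketch. It is not the actual dropout_op kernel: `DropoutForwardSketch` and `MaskBytes` are hypothetical names, the sketch runs on the CPU only, omits any output scaling, and assumes for comparison that the mask would otherwise be stored with the input's element type (e.g. float).

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper for illustration; not the real dropout_op implementation.
// Computes y[i] = x[i] * mask[i], where mask[i] is 0 (dropped) or 1 (kept)
// and is stored as a single byte per element.
inline void DropoutForwardSketch(const std::vector<float>& x,
                                 float dropout_prob,
                                 unsigned seed,
                                 std::vector<float>* y,
                                 std::vector<uint8_t>* mask) {
  std::mt19937 engine(seed);
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);
  y->resize(x.size());
  mask->resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    // Drop the element with probability dropout_prob.
    const uint8_t keep = dist(engine) < dropout_prob ? 0 : 1;
    (*mask)[i] = keep;
    (*y)[i] = x[i] * static_cast<float>(keep);
  }
}

// Bytes needed to store a mask of n elements with element type MaskT.
template <typename MaskT>
constexpr std::size_t MaskBytes(std::size_t n) {
  return n * sizeof(MaskT);
}
```

Under these assumptions, `MaskBytes<uint8_t>(n)` is `n` bytes, while `MaskBytes<float>(n)` is `n * sizeof(float)`, which is four times larger on typical platforms. A bit-packed container such as std::vector<bool> could shrink the mask further, at the cost of less convenient element access.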