Normally, the first `throughput` value e.g. `52.257` is discarded because the start time of first batch is not correct. we average the other `throughput` as the throughput of this test.
## BERT base Pretrain Test Results
All test logs can be found [here](https://oneflow-public.oss-cn-beijing.aliyuncs.com/OF_benchmark_logs/of_leinao_benchmark_log_0813.tar.gz)
All test logs can be found [here](https://oneflow-public.oss-cn-beijing.aliyuncs.com/OF_benchmark_logs/oneflow_bert_benchmark_logs.tgz)
### Group: batch size per device = 32
BERT Base Pretrain, batch size per device=32, dtype=float32, without XLA
| node num | device num | bsz per device | throughput | speedup | memory(MiB) |
Normally, the first `samples/s` value e.g. `288.088` is discarded because the start time of first batch is not correct. we average the other `samples/s` as the throughput of this test.
## Test Results
All test logs can be found [here](https://oneflow-public.oss-cn-beijing.aliyuncs.com/OF_benchmark_logs/of_leinao_benchmark_log_0813.tar.gz)
All test logs can be found [here](https://oneflow-public.oss-cn-beijing.aliyuncs.com/OF_benchmark_logs/oneflow_resnet50_logs.tgz)
### Group: batch size per device = 128
ResNet50 V1.5, batch size per device=128, dtype=float32, without XLA
| node num | device num | bsz per device | throughput | speedup | memory(MiB) |