How to run the BERT model correctly and interpret the results?
Created by: ghost
Hi,
I have a question regarding the BERT model. I use the model from this repository and run it with the parameters you suggest; I run the pretraining stage, so I use the train.py script. I only change the batch_size and max_seq_len parameters, but the config and every other parameter is the same as recommended.
- Now my question is: how should I interpret the results? For example, if I run with batch_size 1024 I get the following output:
epoch: 4, progress: 1/1, step: 8720, loss: 8.089045, ppl: 1050.821289, next_sent_acc: 0.250000, speed: 5.793753 steps/s, file: demo_wiki_train.gz
does that mean that BERT processed 1024 * 5.79 sentences per second, or just 5.79 sentences per second? And what does a step mean in this context?
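To make sure we are talking about the same thing, this is how I am computing the two candidate throughput numbers; the values are taken from the log line above, and the token-based reading at the end is only my guess from the batch_size // max_seq_len division that shows up in the traceback further down:

```python
# The two readings I can come up with for the "speed" field, using the values
# from the log line above (batch_size=1024, speed=5.793753 steps/s).
batch_size = 1024
steps_per_sec = 5.793753

# Reading 1: one "step" is one sentence.
print("reading 1: %.2f sentences/s" % steps_per_sec)

# Reading 2: one "step" is one full batch of batch_size sentences.
print("reading 2: %.2f sentences/s" % (batch_size * steps_per_sec))

# A third possibility (my guess from the batch_size // max_seq_len division in
# the traceback below): batch_size counts tokens, so one step would process
# batch_size // max_seq_len sentences rather than batch_size of them.
```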
- Second question: why can't I set batch_size smaller than max_seq_len? I would like to run some tests with a batch of 8, for example, but I cannot:
Traceback (most recent call last):
  File "train.py", line 449, in <module>
    train(args)
  File "train.py", line 270, in train
    batch_size=args.batch_size // args.max_seq_len)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/contrib/memory_usage_calc.py", line 75, in memory_usage
    raise ValueError("The batch size need to be positive.")
ValueError: The batch size need to be positive.
I don't understand why this is the case. In other DL frameworks I can run whatever batch size I want, regardless of max_seq_len.
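If I read the traceback correctly, the problem seems to be the integer division on line 270 of train.py; here is a minimal sketch of my understanding (the check itself is paraphrased from the error message, not taken from the actual Paddle code):

```python
# Minimal sketch of what I think is going on, based only on the traceback
# above; memory_usage_check paraphrases the Paddle check, it is not the real code.
def effective_batch_size(batch_size, max_seq_len):
    # train.py (line 270 in the traceback) divides the two values, so
    # batch_size apparently counts tokens rather than sentences.
    return batch_size // max_seq_len

def memory_usage_check(batch_size):
    # Paraphrase of the check that raises in memory_usage_calc.py.
    if batch_size <= 0:
        raise ValueError("The batch size need to be positive.")

print(effective_batch_size(1024, 128))   # 8 -> the check passes
print(effective_batch_size(8, 128))      # 0 -> 8 // 128 == 0

try:
    memory_usage_check(effective_batch_size(8, 128))
except ValueError as e:
    print(e)  # "The batch size need to be positive." -- matches the traceback
```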
- Last question: why doesn't fp16 speed things up? When I run batch_size=512, max_seq_len=128 with fp32 I get roughly 18-20 steps/s, and when I use --use_fp16=true I get the same numbers. Is this the desired behavior? I use an NVIDIA V100 16GB. I do see now that for larger batch_size fp16 speeds things up a bit and uses less memory.
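In case it matters, this is roughly how I compare the two runs: the regex just pulls the speed field out of the log format shown above, and the sentences/s conversion again relies on my token-based assumption for batch_size. The speed values here are placeholders in the range I observed, not exact measurements:

```python
import re

# Pulls the "speed: X steps/s" field out of log lines like the one quoted above.
SPEED_RE = re.compile(r"speed: ([0-9.]+) steps/s")

def steps_per_sec(log_line):
    return float(SPEED_RE.search(log_line).group(1))

fp32_line = "... speed: 18.5 steps/s, file: demo_wiki_train.gz"  # placeholder value
fp16_line = "... speed: 18.7 steps/s, file: demo_wiki_train.gz"  # placeholder value

batch_size, max_seq_len = 512, 128
sentences_per_step = batch_size // max_seq_len  # 4, under my token-based assumption

for name, line in [("fp32", fp32_line), ("fp16", fp16_line)]:
    sps = steps_per_sec(line)
    print("%s: %.2f steps/s (~%.1f sentences/s)" % (name, sps, sps * sentences_per_step))
```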