How to run the BERT model correctly and interpret the results?
Created by: ghost
Hi,
I have a question regarding the BERT model. I use the model from this repository and run it with the parameters you suggest; I run the pretraining stage, so I use the train.py script. I only change the batch_size and max_seq_len parameters, but the config and every other parameter is the same as recommended.
- Now my question is: how should I interpret the results? For example, if I run with batch_size 1024 I get the following output:
epoch: 4, progress: 1/1, step: 8720, loss: 8.089045, ppl: 1050.821289, next_sent_acc: 0.250000, speed: 5.793753 steps/s, file: demo_wiki_train.gz
does that mean that BERT processed 1024 * 5.79 sentences per second, or just 5.79 sentences per second? And what does a step mean in this context?
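To make sure we are talking about the same thing, this is how I am computing the two candidate throughput numbers; the values are taken from the log line above, and the token-based reading at the end is only my guess from the batch_size // max_seq_len division that shows up in the traceback further down:

```python
# The two readings I can come up with for the "speed" field, using the values
# from the log line above (batch_size=1024, speed=5.793753 steps/s).
batch_size = 1024
steps_per_sec = 5.793753

# Reading 1: one "step" is one sentence.
print("reading 1: %.2f sentences/s" % steps_per_sec)

# Reading 2: one "step" is one full batch of batch_size sentences.
print("reading 2: %.2f sentences/s" % (batch_size * steps_per_sec))

# A third possibility (my guess from the batch_size // max_seq_len division in
# the traceback below): batch_size counts tokens, so one step would process
# batch_size // max_seq_len sentences rather than batch_size of them.
```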
- Second question: why can't I set batch_size smaller than max_seq_len? I would like to run some tests with a batch of 8, for example, but I cannot:
Traceback (most recent call last):
  File "train.py", line 449, in <module>
    train(args)
  File "train.py", line 270, in train
    batch_size=args.batch_size // args.max_seq_len)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/contrib/memory_usage_calc.py", line 75, in memory_usage
    raise ValueError("The batch size need to be positive.")
ValueError: The batch size need to be positive.
I don't understand why this is the case. In other DL frameworks I can run whatever batch size I want, regardless of max_seq_len.
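If I read the traceback correctly, the problem seems to be the integer division on line 270 of train.py; here is a minimal sketch of my understanding (the check itself is paraphrased from the error message, not taken from the actual Paddle code):

```python
# Minimal sketch of what I think is going on, based only on the traceback
# above; memory_usage_check paraphrases the Paddle check, it is not the real code.
def effective_batch_size(batch_size, max_seq_len):
    # train.py (line 270 in the traceback) divides the two values, so
    # batch_size apparently counts tokens rather than sentences.
    return batch_size // max_seq_len

def memory_usage_check(batch_size):
    # Paraphrase of the check that raises in memory_usage_calc.py.
    if batch_size <= 0:
        raise ValueError("The batch size need to be positive.")

print(effective_batch_size(1024, 128))   # 8 -> the check passes
print(effective_batch_size(8, 128))      # 0 -> 8 // 128 == 0

try:
    memory_usage_check(effective_batch_size(8, 128))
except ValueError as e:
    print(e)  # "The batch size need to be positive." -- matches the traceback
```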
- Last question: why doesn't fp16 speed things up? When I run batch_size=512, max_seq_len=128 with fp32 I get roughly 18-20 steps/s, and when I use --use_fp16=true I get the same numbers. Is this the desired behavior? I use an NVIDIA V100 16GB. I do see now that for larger batch_size fp16 speeds things up a bit and uses less memory.
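In case it matters, this is roughly how I compare the two runs: the regex just pulls the speed field out of the log format shown above, and the sentences/s conversion again relies on my token-based assumption for batch_size. The speed values here are placeholders in the range I observed, not exact measurements:

```python
import re

# Pulls the "speed: X steps/s" field out of log lines like the one quoted above.
SPEED_RE = re.compile(r"speed: ([0-9.]+) steps/s")

def steps_per_sec(log_line):
    return float(SPEED_RE.search(log_line).group(1))

fp32_line = "... speed: 18.5 steps/s, file: demo_wiki_train.gz"  # placeholder value
fp16_line = "... speed: 18.7 steps/s, file: demo_wiki_train.gz"  # placeholder value

batch_size, max_seq_len = 512, 128
sentences_per_step = batch_size // max_seq_len  # 4, under my token-based assumption

for name, line in [("fp32", fp32_line), ("fp16", fp16_line)]:
    sps = steps_per_sec(line)
    print("%s: %.2f steps/s (~%.1f sentences/s)" % (name, sps, sps * sentences_per_step))
```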