BERT fails to run in fp32 and fp16 (#3974) · Issue · PaddlePaddle / models

BERT fails to run in fp32 and fp16

Created by: ccmeteorljh
Paddle commit ID: 691ced87c087d3b25c2069e96c74c17a36ff2de2
-----------  Configuration Arguments -----------
batch_size: 32
bert_config_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/bert_config.json
checkpoints: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/save
data_dir: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/data
decr_every_n_nan_or_inf: 2
decr_ratio: 0.8
do_lower_case: True
do_test: True
do_train: True
do_val: True
enable_ce: False
epoch: 2
in_tokens: False
incr_every_n_steps: 1000
incr_ratio: 2.0
init_checkpoint: None
init_loss_scaling: 4294967296
init_pretraining_params: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params
learning_rate: 5e-05
lr_scheduler: linear_warmup_decay
max_seq_len: 128
num_iteration_per_drop_scope: 1
random_seed: 1
save_steps: 1000
shuffle: True
skip_steps: 100
task_name: XNLI
use_cuda: True
use_dynamic_loss_scaling: True
use_fast_executor: False
use_fp16: False
validation_steps: 1000
verbose: False
vocab_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/vocab.txt
warmup_proportion: 0.1
weight_decay: 0.01
------------------------------------------------
attention_probs_dropout_prob: 0.1
directionality: bidi
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
intermediate_size: 3072
max_position_embeddings: 512
num_attention_heads: 12
num_hidden_layers: 12
pooler_fc_size: 768
pooler_num_attention_heads: 12
pooler_num_fc_layers: 3
pooler_size_per_head: 128
pooler_type: first_token_transform
type_vocab_size: 2
vocab_size: 21128
------------------------------------------------
Device count: 1
Num train examples: 392702
Max train steps: 24543
Num warmup steps: 2454
W1124 22:17:56.750891 20557 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W1124 22:17:56.756631 20557 device_context.cc:244] device: 0, cuDNN Version: 7.4.
Load pretraining parameters from /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params.
I1124 22:18:00.329705 20557 parallel_executor.cc:423] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I1124 22:18:00.433110 20557 build_strategy.cc:364] SeqOnlyAllReduceOps:0, num_trainers:1
I1124 22:18:00.579540 20557 parallel_executor.cc:287] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I1124 22:18:00.645557 20557 parallel_executor.cc:370] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py:773: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "run_classifier.py", line 428, in <module>
    main(args)
  File "run_classifier.py", line 331, in main
    outputs = exe.run(train_compiled_program, fetch_list=fetch_list)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 774, in run
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 769, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 828, in _run_impl
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 668, in _run_parallel
    tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::operators::ReadOp::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
3   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
4   paddle::framework::details::ComputationOpHandle::RunImpl()
5   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
6   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*)
7   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
8   std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
9   ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 2479, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/reader.py", line 424, in _init_non_iterable
    outputs={'Out': self._feed_list})
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/reader.py", line 331, in __init__
    self._init_non_iterable()
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/reader.py", line 258, in from_generator
    return_list)
  File "/home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/model/classifier.py", line 45, in create_model
    feed_list=inputs, capacity=50, iterable=False)
  File "run_classifier.py", line 211, in main
    num_labels=num_labels)
  File "run_classifier.py", line 428, in <module>
    main(args)

----------------------
Error Message Summary:
----------------------
Error: The feeded Variable input_mask should have dimensions = 3, shape = [-1, 128, 1], but received feeded shape [32, 125, 1]
  [Hint: Expected DimensionIsCompatibleWith(shapes[i], in_dims) == true, but received DimensionIsCompatibleWith(shapes[i], in_dims):0 != true:1.] at (/paddle/paddle/fluid/operators/reader/read_op.cc:133)
  [operator < read > error]
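
The error summary says the `read` op declared `input_mask` with shape [-1, 128, 1] (matching `max_seq_len: 128`) but received a batch of shape [32, 125, 1]. One plausible reading, which is an assumption because the issue does not show the data reader, is that the batch was padded to the longest sequence in that batch (125) instead of to the fixed `max_seq_len`. The sketch below reproduces that shape difference in plain Python; `pad_batch` and `shape` are hypothetical helpers written for illustration and are not part of PaddleLARK or Paddle.

```python
def pad_batch(token_id_seqs, max_seq_len=None, pad_id=0):
    """Pad token-id sequences and build the matching input mask.

    Hypothetical helper. If max_seq_len is None, pad to the longest
    sequence in the batch (variable width); otherwise pad every batch
    to max_seq_len (fixed width).
    """
    if max_seq_len is not None:
        width = max_seq_len
    else:
        width = max(len(s) for s in token_id_seqs)
    padded, mask = [], []
    for seq in token_id_seqs:
        seq = seq[:width]  # truncate sequences longer than the target width
        n_pad = width - len(seq)
        padded.append([[t] for t in seq] + [[pad_id]] * n_pad)
        mask.append([[1.0]] * len(seq) + [[0.0]] * n_pad)
    return padded, mask


def shape(nested):
    """Return the shape of a rectangular, non-empty nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# 32 dummy sequences whose longest member has 125 tokens, as in the error.
batch = [[1] * 125] + [[1] * 60] * 31

_, dynamic_mask = pad_batch(batch)                 # pads to the batch maximum
_, fixed_mask = pad_batch(batch, max_seq_len=128)  # pads to the declared width

print(shape(dynamic_mask))  # [32, 125, 1]
print(shape(fixed_mask))    # [32, 128, 1]
```

If this reading is right, only the fixed-width variant would satisfy a feed variable declared as [-1, 128, 1]. Whether the fix belongs in the reader (pad to `max_seq_len`) or in the variable declaration (allow a variable second dimension) cannot be determined from the log alone.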