PaddlePaddle / models · Issue #3975

Closed
Opened November 25, 2019 by saxon_zh (Guest)

BERT fp32 and fp16 runs fail

Created by: ccmeteorljh

Paddle commit-id: 691ced87c087d3b25c2069e96c74c17a36ff2de2. The fp32 run fails with the following error:

-----------  Configuration Arguments -----------
batch_size: 32
bert_config_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/bert_config.json
checkpoints: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/save
data_dir: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/data
decr_every_n_nan_or_inf: 2
decr_ratio: 0.8
do_lower_case: True
do_test: True
do_train: True
do_val: True
enable_ce: False
epoch: 2
in_tokens: False
incr_every_n_steps: 1000
incr_ratio: 2.0
init_checkpoint: None
init_loss_scaling: 4294967296
init_pretraining_params: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params
learning_rate: 5e-05
lr_scheduler: linear_warmup_decay
max_seq_len: 128
num_iteration_per_drop_scope: 1
random_seed: 1
save_steps: 1000
shuffle: True
skip_steps: 100
task_name: XNLI
use_cuda: True
use_dynamic_loss_scaling: True
use_fast_executor: False
use_fp16: False
validation_steps: 1000
verbose: False
vocab_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/vocab.txt
warmup_proportion: 0.1
weight_decay: 0.01
------------------------------------------------
attention_probs_dropout_prob: 0.1
directionality: bidi
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
intermediate_size: 3072
max_position_embeddings: 512
num_attention_heads: 12
num_hidden_layers: 12
pooler_fc_size: 768
pooler_num_attention_heads: 12
pooler_num_fc_layers: 3
pooler_size_per_head: 128
pooler_type: first_token_transform
type_vocab_size: 2
vocab_size: 21128
------------------------------------------------
Device count: 1
Num train examples: 392702
Max train steps: 24543
Num warmup steps: 2454
W1124 22:17:56.750891 20557 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W1124 22:17:56.756631 20557 device_context.cc:244] device: 0, cuDNN Version: 7.4.
Load pretraining parameters from /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params.
I1124 22:18:00.329705 20557 parallel_executor.cc:423] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I1124 22:18:00.433110 20557 build_strategy.cc:364] SeqOnlyAllReduceOps:0, num_trainers:1
I1124 22:18:00.579540 20557 parallel_executor.cc:287] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I1124 22:18:00.645557 20557 parallel_executor.cc:370] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py:773: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "run_classifier.py", line 428, in <module>
    main(args)
  File "run_classifier.py", line 331, in main
    outputs = exe.run(train_compiled_program, fetch_list=fetch_list)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 774, in run
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 769, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 828, in _run_impl
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 668, in _run_parallel
    tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::operators::ReadOp::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
3   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
4   paddle::framework::details::ComputationOpHandle::RunImpl()
5   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
6   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*)
7   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
8   std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
9   ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 2479, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/reader.py", line 424, in _init_non_iterable
    outputs={'Out': self._feed_list})
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/reader.py", line 331, in __init__
    self._init_non_iterable()
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/reader.py", line 258, in from_generator
    return_list)
  File "/home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/model/classifier.py", line 45, in create_model
    feed_list=inputs, capacity=50, iterable=False)
  File "run_classifier.py", line 211, in main
    num_labels=num_labels)
  File "run_classifier.py", line 428, in <module>
    main(args)

----------------------
Error Message Summary:
----------------------
Error: The feeded Variable input_mask should have dimensions = 3, shape = [-1, 128, 1], but received feeded shape [32, 125, 1]
  [Hint: Expected DimensionIsCompatibleWith(shapes[i], in_dims) == true, but received DimensionIsCompatibleWith(shapes[i], in_dims):0 != true:1.] at (/paddle/paddle/fluid/operators/reader/read_op.cc:133)
  [operator < read > error]
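
For reference, the shape check above fails because the feed variable input_mask is declared with a fixed shape of [-1, max_seq_len, 1] (max_seq_len = 128), while the reader hands the executor a batch padded only to the longest sequence it contains (125). A minimal sketch of padding the mask batch up to the declared length, assuming a hypothetical helper (pad_batch_to_max_seq_len is not part of the BERT reader code):

```python
import numpy as np

def pad_batch_to_max_seq_len(input_mask, max_seq_len=128):
    """Hypothetical helper: pad the sequence dimension of a
    [batch, cur_len, 1] mask with zeros so the feed shape matches
    the declared [-1, max_seq_len, 1] variable."""
    batch_size, cur_len, last_dim = input_mask.shape
    if cur_len >= max_seq_len:
        return input_mask[:, :max_seq_len, :]
    pad = np.zeros((batch_size, max_seq_len - cur_len, last_dim),
                   dtype=input_mask.dtype)
    return np.concatenate([input_mask, pad], axis=1)

# Example: a batch padded only to 125 tokens becomes 128 tokens,
# which is what the read op expects.
mask = np.ones((32, 125, 1), dtype="float32")
print(pad_batch_to_max_seq_len(mask).shape)  # (32, 128, 1)
```

If this is indeed the cause, the token, position and sentence id feeds would need the same treatment so every input keeps the declared 128-token layout.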

The fp16 run fails with the following error:

-----------  Configuration Arguments -----------
batch_size: 32
bert_config_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/bert_config.json
checkpoints: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/save
data_dir: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/data
decr_every_n_nan_or_inf: 2
decr_ratio: 0.8
do_lower_case: True
do_test: True
do_train: True
do_val: True
enable_ce: False
epoch: 2
in_tokens: False
incr_every_n_steps: 1000
incr_ratio: 2.0
init_checkpoint: None
init_loss_scaling: 4294967296
init_pretraining_params: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params
learning_rate: 5e-05
lr_scheduler: linear_warmup_decay
max_seq_len: 128
num_iteration_per_drop_scope: 1
random_seed: 1
save_steps: 1000
shuffle: True
skip_steps: 100
task_name: XNLI
use_cuda: True
use_dynamic_loss_scaling: True
use_fast_executor: False
use_fp16: True
validation_steps: 1000
verbose: False
vocab_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/vocab.txt
warmup_proportion: 0.1
weight_decay: 0.01
------------------------------------------------
attention_probs_dropout_prob: 0.1
directionality: bidi
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
intermediate_size: 3072
max_position_embeddings: 512
num_attention_heads: 12
num_hidden_layers: 12
pooler_fc_size: 768
pooler_num_attention_heads: 12
pooler_num_fc_layers: 3
pooler_size_per_head: 128
pooler_type: first_token_transform
type_vocab_size: 2
vocab_size: 21128
------------------------------------------------
Device count: 1
Num train examples: 392702
Max train steps: 24543
Num warmup steps: 2454
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'dtype' in fluid.embedding only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in cast only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in dropout only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'y' in matmul only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in matmul only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in fc only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in reshape only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in transpose only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in softmax only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in mean only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in accuracy only support float16 in GPU now. 
  (input_name, op_name, extra_message))
W1124 23:13:03.476914 21451 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W1124 23:13:03.490574 21451 device_context.cc:244] device: 0, cuDNN Version: 7.4.
Load pretraining parameters from /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params.
Cast parameters to float16 data format.
I1124 23:13:08.745292 21451 parallel_executor.cc:423] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I1124 23:13:08.883260 21451 build_strategy.cc:364] SeqOnlyAllReduceOps:0, num_trainers:1
I1124 23:13:09.099370 21451 parallel_executor.cc:287] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I1124 23:13:09.196668 21451 parallel_executor.cc:370] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py:773: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "run_classifier.py", line 428, in <module>
    main(args)
  File "run_classifier.py", line 331, in main
    outputs = exe.run(train_compiled_program, fetch_list=fetch_list)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 774, in run
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 769, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 828, in _run_impl
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 668, in _run_parallel
    tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::platform::float16 const* paddle::framework::Tensor::data<paddle::platform::float16>() const
3   void paddle::operators::ElementwiseComputeEx<paddle::operators::MulFunctor<paddle::platform::float16, void>, paddle::platform::CUDADeviceContext, paddle::platform::float16, paddle::platform::float16>(paddle::framework::ExecutionContext const&, paddle::framework::Tensor const*, paddle::framework::Tensor const*, int, paddle::operators::MulFunctor<paddle::platform::float16, void>, paddle::framework::Tensor*)
4   void paddle::operators::default_elementwise_mul<paddle::platform::CUDADeviceContext, paddle::platform::float16>(paddle::framework::ExecutionContext const&, paddle::framework::Tensor const*, paddle::framework::Tensor const*, paddle::framework::Tensor*)
5   paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16>::Compute(paddle::framework::ExecutionContext const&) const
6   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 4ul, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, int>, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, long>, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
7   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
8   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
9   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
10  paddle::framework::details::ComputationOpHandle::RunImpl()
11  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
12  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*)
13  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
14  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
15  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 2479, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layers/math_op_patch.py", line 239, in __impl__
    attrs={'axis': axis})
  File "/home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/optimization.py", line 153, in optimization
    param.name] * weight_decay * scheduled_lr
  File "run_classifier.py", line 227, in main
    decr_ratio=args.decr_ratio)
  File "run_classifier.py", line 428, in <module>
    main(args)

----------------------
Error Message Summary:
----------------------
InvalidArgumentError: Tensor holds the wrong type, it holds float, but desires to be ::paddle::platform::float16.
  [Hint: Expected valid == true, but received valid:0 != true:1.] at (/paddle/paddle/fluid/framework/tensor_impl.h:33)
  [operator < elementwise_mul > error]
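
For context, the Python stack points at the weight-decay term built in optimization.py (param.name] * weight_decay * scheduled_lr): once the parameters are cast to float16, the elementwise_mul kernel is selected for float16, but one of its operands still holds float32, hence the dtype complaint. A hedged sketch of the kind of explicit cast that would make both operands agree (names are illustrative; this is not the repository's actual fix):

```python
import paddle.fluid as fluid

def fp16_weight_decay_term(param_fp16, scheduled_lr, weight_decay):
    # Illustrative sketch only: when the parameter tensor has been cast
    # to float16, cast the fp32 scheduled learning rate to float16 as
    # well so the elementwise_mul kernel sees matching dtypes.
    scheduled_lr_fp16 = fluid.layers.cast(scheduled_lr, dtype="float16")
    return param_fp16 * weight_decay * scheduled_lr_fp16
```

Alternatively, computing the decay term on the float32 master copy of the weights and casting only the result would sidestep the mismatch while keeping the decay arithmetic in full precision.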