PaddlePaddle / models · Issue #3975

Closed
Opened November 25, 2019 by saxon_zh (Guest)

BERT fp32 and fp16 runs fail

Created by: ccmeteorljh

Paddle commit-id: 691ced87c087d3b25c2069e96c74c17a36ff2de2. The fp32 run fails with the following error:

-----------  Configuration Arguments -----------
batch_size: 32
bert_config_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/bert_config.json
checkpoints: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/save
data_dir: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/data
decr_every_n_nan_or_inf: 2
decr_ratio: 0.8
do_lower_case: True
do_test: True
do_train: True
do_val: True
enable_ce: False
epoch: 2
in_tokens: False
incr_every_n_steps: 1000
incr_ratio: 2.0
init_checkpoint: None
init_loss_scaling: 4294967296
init_pretraining_params: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params
learning_rate: 5e-05
lr_scheduler: linear_warmup_decay
max_seq_len: 128
num_iteration_per_drop_scope: 1
random_seed: 1
save_steps: 1000
shuffle: True
skip_steps: 100
task_name: XNLI
use_cuda: True
use_dynamic_loss_scaling: True
use_fast_executor: False
use_fp16: False
validation_steps: 1000
verbose: False
vocab_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/vocab.txt
warmup_proportion: 0.1
weight_decay: 0.01
------------------------------------------------
attention_probs_dropout_prob: 0.1
directionality: bidi
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
intermediate_size: 3072
max_position_embeddings: 512
num_attention_heads: 12
num_hidden_layers: 12
pooler_fc_size: 768
pooler_num_attention_heads: 12
pooler_num_fc_layers: 3
pooler_size_per_head: 128
pooler_type: first_token_transform
type_vocab_size: 2
vocab_size: 21128
------------------------------------------------
Device count: 1
Num train examples: 392702
Max train steps: 24543
Num warmup steps: 2454
W1124 22:17:56.750891 20557 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W1124 22:17:56.756631 20557 device_context.cc:244] device: 0, cuDNN Version: 7.4.
Load pretraining parameters from /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params.
I1124 22:18:00.329705 20557 parallel_executor.cc:423] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I1124 22:18:00.433110 20557 build_strategy.cc:364] SeqOnlyAllReduceOps:0, num_trainers:1
I1124 22:18:00.579540 20557 parallel_executor.cc:287] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I1124 22:18:00.645557 20557 parallel_executor.cc:370] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py:773: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "run_classifier.py", line 428, in <module>
    main(args)
  File "run_classifier.py", line 331, in main
    outputs = exe.run(train_compiled_program, fetch_list=fetch_list)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 774, in run
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 769, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 828, in _run_impl
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 668, in _run_parallel
    tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::operators::ReadOp::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
3   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
4   paddle::framework::details::ComputationOpHandle::RunImpl()
5   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
6   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*)
7   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
8   std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
9   ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 2479, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/reader.py", line 424, in _init_non_iterable
    outputs={'Out': self._feed_list})
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/reader.py", line 331, in __init__
    self._init_non_iterable()
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/reader.py", line 258, in from_generator
    return_list)
  File "/home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/model/classifier.py", line 45, in create_model
    feed_list=inputs, capacity=50, iterable=False)
  File "run_classifier.py", line 211, in main
    num_labels=num_labels)
  File "run_classifier.py", line 428, in <module>
    main(args)

----------------------
Error Message Summary:
----------------------
Error: The feeded Variable input_mask should have dimensions = 3, shape = [-1, 128, 1], but received feeded shape [32, 125, 1]
  [Hint: Expected DimensionIsCompatibleWith(shapes[i], in_dims) == true, but received DimensionIsCompatibleWith(shapes[i], in_dims):0 != true:1.] at (/paddle/paddle/fluid/operators/reader/read_op.cc:133)
  [operator < read > error]
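
For reference, the shape check above fails because the feed variable input_mask is declared with a fixed shape of [-1, max_seq_len, 1] (max_seq_len = 128), while the reader hands the executor a batch padded only to the longest sequence it contains (125). A minimal sketch of padding the mask batch up to the declared length, assuming a hypothetical helper (pad_batch_to_max_seq_len is not part of the BERT reader code):

```python
import numpy as np

def pad_batch_to_max_seq_len(input_mask, max_seq_len=128):
    """Hypothetical helper: pad the sequence dimension of a
    [batch, cur_len, 1] mask with zeros so the feed shape matches
    the declared [-1, max_seq_len, 1] variable."""
    batch_size, cur_len, last_dim = input_mask.shape
    if cur_len >= max_seq_len:
        return input_mask[:, :max_seq_len, :]
    pad = np.zeros((batch_size, max_seq_len - cur_len, last_dim),
                   dtype=input_mask.dtype)
    return np.concatenate([input_mask, pad], axis=1)

# Example: a batch padded only to 125 tokens becomes 128 tokens,
# which is what the read op expects.
mask = np.ones((32, 125, 1), dtype="float32")
print(pad_batch_to_max_seq_len(mask).shape)  # (32, 128, 1)
```

If this is indeed the cause, the token, position and sentence id feeds would need the same treatment so every input keeps the declared 128-token layout.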

The fp16 run fails with the following error:

-----------  Configuration Arguments -----------
batch_size: 32
bert_config_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/bert_config.json
checkpoints: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/save
data_dir: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/data
decr_every_n_nan_or_inf: 2
decr_ratio: 0.8
do_lower_case: True
do_test: True
do_train: True
do_val: True
enable_ce: False
epoch: 2
in_tokens: False
incr_every_n_steps: 1000
incr_ratio: 2.0
init_checkpoint: None
init_loss_scaling: 4294967296
init_pretraining_params: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params
learning_rate: 5e-05
lr_scheduler: linear_warmup_decay
max_seq_len: 128
num_iteration_per_drop_scope: 1
random_seed: 1
save_steps: 1000
shuffle: True
skip_steps: 100
task_name: XNLI
use_cuda: True
use_dynamic_loss_scaling: True
use_fast_executor: False
use_fp16: True
validation_steps: 1000
verbose: False
vocab_path: /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/vocab.txt
warmup_proportion: 0.1
weight_decay: 0.01
------------------------------------------------
attention_probs_dropout_prob: 0.1
directionality: bidi
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 768
initializer_range: 0.02
intermediate_size: 3072
max_position_embeddings: 512
num_attention_heads: 12
num_hidden_layers: 12
pooler_fc_size: 768
pooler_num_attention_heads: 12
pooler_num_fc_layers: 3
pooler_size_per_head: 128
pooler_type: first_token_transform
type_vocab_size: 2
vocab_size: 21128
------------------------------------------------
Device count: 1
Num train examples: 392702
Max train steps: 24543
Num warmup steps: 2454
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'dtype' in fluid.embedding only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in cast only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in dropout only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'y' in matmul only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in matmul only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in fc only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in reshape only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in transpose only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in softmax only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in mean only support float16 in GPU now. 
  (input_name, op_name, extra_message))
/usr/local/lib/python2.7/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in accuracy only support float16 in GPU now. 
  (input_name, op_name, extra_message))
W1124 23:13:03.476914 21451 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W1124 23:13:03.490574 21451 device_context.cc:244] device: 0, cuDNN Version: 7.4.
Load pretraining parameters from /home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/chinese_L-12_H-768_A-12/params.
Cast parameters to float16 data format.
I1124 23:13:08.745292 21451 parallel_executor.cc:423] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I1124 23:13:08.883260 21451 build_strategy.cc:364] SeqOnlyAllReduceOps:0, num_trainers:1
I1124 23:13:09.099370 21451 parallel_executor.cc:287] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I1124 23:13:09.196668 21451 parallel_executor.cc:370] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py:773: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "run_classifier.py", line 428, in <module>
    main(args)
  File "run_classifier.py", line 331, in main
    outputs = exe.run(train_compiled_program, fetch_list=fetch_list)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 774, in run
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 769, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 828, in _run_impl
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 668, in _run_parallel
    tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::platform::float16 const* paddle::framework::Tensor::data<paddle::platform::float16>() const
3   void paddle::operators::ElementwiseComputeEx<paddle::operators::MulFunctor<paddle::platform::float16, void>, paddle::platform::CUDADeviceContext, paddle::platform::float16, paddle::platform::float16>(paddle::framework::ExecutionContext const&, paddle::framework::Tensor const*, paddle::framework::Tensor const*, int, paddle::operators::MulFunctor<paddle::platform::float16, void>, paddle::framework::Tensor*)
4   void paddle::operators::default_elementwise_mul<paddle::platform::CUDADeviceContext, paddle::platform::float16>(paddle::framework::ExecutionContext const&, paddle::framework::Tensor const*, paddle::framework::Tensor const*, paddle::framework::Tensor*)
5   paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16>::Compute(paddle::framework::ExecutionContext const&) const
6   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 4ul, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, int>, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, long>, paddle::operators::ElementwiseMulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
7   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
8   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
9   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
10  paddle::framework::details::ComputationOpHandle::RunImpl()
11  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
12  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*)
13  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
14  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
15  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 2479, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layers/math_op_patch.py", line 239, in __impl__
    attrs={'axis': axis})
  File "/home/crim/benchmark/models/PaddleNLP/PaddleLARK/BERT/optimization.py", line 153, in optimization
    param.name] * weight_decay * scheduled_lr
  File "run_classifier.py", line 227, in main
    decr_ratio=args.decr_ratio)
  File "run_classifier.py", line 428, in <module>
    main(args)

----------------------
Error Message Summary:
----------------------
InvalidArgumentError: Tensor holds the wrong type, it holds float, but desires to be ::paddle::platform::float16.
  [Hint: Expected valid == true, but received valid:0 != true:1.] at (/paddle/paddle/fluid/framework/tensor_impl.h:33)
  [operator < elementwise_mul > error]
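
For context, the Python stack points at the weight-decay term built in optimization.py (param.name] * weight_decay * scheduled_lr): once the parameters are cast to float16, the elementwise_mul kernel is selected for float16, but one of its operands still holds float32, hence the dtype complaint. A hedged sketch of the kind of explicit cast that would make both operands agree (names are illustrative; this is not the repository's actual fix):

```python
import paddle.fluid as fluid

def fp16_weight_decay_term(param_fp16, scheduled_lr, weight_decay):
    # Illustrative sketch only: when the parameter tensor has been cast
    # to float16, cast the fp32 scheduled learning rate to float16 as
    # well so the elementwise_mul kernel sees matching dtypes.
    scheduled_lr_fp16 = fluid.layers.cast(scheduled_lr, dtype="float16")
    return param_fp16 * weight_decay * scheduled_lr_fp16
```

Alternatively, computing the decay term on the float32 master copy of the weights and casting only the result would sidestep the mismatch while keeping the decay arithmetic in full precision.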