Missing `use_task_id` in pre-trained ERNIE-en-base
Created by: winston-zillow
Download https://ernie.bj.bcebos.com/ERNIE_Base_en_stable-2.0.0.tar.gz and uncompress it; the included `ernie_config.json` does not have a value for `use_task_id`:
```json
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "sent_type_vocab_size": 4,
  "task_type_vocab_size": 16,
  "vocab_size": 30522
}
```
As a consequence, when attempting to save the model with `save_inference_model` (so that I can convert it to ONNX using the paddle2onnx tool), it complains that the task-id input is missing after the model is pruned for inference inside `save_inference_model`, as shown in the debug output I obtained by modifying the save method locally:
```
cloned 2 -> read_file_0.tmp_0 VarType.LOD_TENSOR False
cloned 6 -> _input_reader_reader VarType.READER True
cloned 32 -> read_file_0.tmp_6 VarType.LOD_TENSOR False
cloned 41 -> read_file_0.tmp_5 VarType.LOD_TENSOR False
cloned 44 -> read_file_0.tmp_4 VarType.LOD_TENSOR False
cloned 63 -> read_file_0.tmp_1 VarType.LOD_TENSOR False
cloned 219 -> read_file_0.tmp_3 VarType.LOD_TENSOR False
cloned 374 -> read_file_0.tmp_2 VarType.LOD_TENSOR False
pruned 116 -> read_file_0.tmp_4 VarType.LOD_TENSOR False
pruned 117 -> read_file_0.tmp_0 VarType.LOD_TENSOR False
pruned 129 -> read_file_0.tmp_1 VarType.LOD_TENSOR False
pruned 718 -> read_file_0.tmp_2 VarType.LOD_TENSOR False
```
For your reference, my model-saving code looks like:
```python
# cloned and modified from `ernie/finetune/classifier`
def create_model(args,
                 pyreader_name,
                 ernie_config,
                 is_prediction=False,
                 task_name="",
                 is_classify=False,
                 is_regression=False,
                 ernie_version="1.0"):
    ...
    (src_ids, sent_ids, pos_ids, task_ids, input_mask, labels,
     qids) = fluid.layers.read_file(pyreader)
    ernie = ErnieModel(
        ...
    cls_feats = ernie.get_pooled_output()
    inputs = (src_ids, sent_ids, pos_ids, task_ids, input_mask)
    return inputs, cls_feats, pyreader, (labels, qids)
```
```python
# imitating `run_classifier.py`
startup_prog = fluid.Program()
train_program = fluid.Program()
with fluid.program_guard(train_program, startup_prog):
    with fluid.unique_name.guard():
        inputs, ernie_latent, train_pyreader, _ = create_model(
            args,
            pyreader_name='input_reader',
            ernie_config=ernie_config,
            is_classify=args.is_classify,
            is_regression=args.is_regression)

# exe.run(startup_prog)  # I think there is no need to run this

init_pretraining_params(
    exe,
    args.init_pretraining_params,
    main_program=startup_prog,
    use_fp16=args.use_fp16)

fluid.io.save_inference_model(dirname="/ernie_onnx_path",
                              feeded_var_names=[var.name for var in inputs],
                              target_vars=[ernie_latent],
                              main_program=train_program,
                              executor=exe)
```
After I set `use_task_id` to `true` in the config file, saving the inference model works. Please confirm that this is the right fix (and fix the pre-trained archive).