How to correctly run transformer?
Created by: sfraczek
Hi,
I have encountered a number of problems with the fluid/neural_machine_translation/transformer model. Am I doing something wrong? How do I run it correctly?
Steps I have taken
Following the instructions in https://github.com/PaddlePaddle/models/blob/develop/fluid/neural_machine_translation/transformer/README_cn.md, I downloaded WMT'16 EN-DE from https://github.com/google/seq2seq/blob/master/docs/data.md by clicking the download link.
Next I extracted it to the wmt16_en_de directory.
Next I ran paste -d '\t' train.tok.clean.bpe.32000.en train.tok.clean.bpe.32000.de > train.tok.clean.bpe.32000.en-de
Then I ran sed -i '1i\<s>\n<e>\n<unk>' vocab.bpe.32000 to prepend the special tokens to the vocabulary.
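As a quick sanity check of the preprocessing I used this little snippet of my own (not part of the repo; it assumes the files are in wmt16_en_de/ as above):

# Check that the merged training file is tab-separated source/target pairs
# and that the three special tokens now sit at the top of the vocabulary.
with open("wmt16_en_de/train.tok.clean.bpe.32000.en-de") as f:
    for _, line in zip(range(3), f):
        assert line.count("\t") == 1, "expected exactly one tab per line"
with open("wmt16_en_de/vocab.bpe.32000") as f:
    head = [next(f).strip() for _ in range(3)]
print(head)  # expecting ['<s>', '<e>', '<unk>']

If either check fails, the paste/sed steps above would be the likely culprit.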
In config.py I changed use_gpu = True to use_gpu = False.
In train.py I added import multiprocessing and changed dev_count = fluid.core.get_cuda_device_count() to dev_count = fluid.core.get_cuda_device_count() if TrainTaskConfig.use_gpu else multiprocessing.cpu_count().
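Put together, the relevant part of train.py now looks roughly like this (my paraphrase of the edit; TrainTaskConfig is whatever train.py already imports from config.py):

import multiprocessing

import paddle.fluid as fluid

# Fall back to the CPU core count when use_gpu is False.
dev_count = fluid.core.get_cuda_device_count() \
    if TrainTaskConfig.use_gpu else multiprocessing.cpu_count()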
Training
I launched training with
python -u train.py --src_vocab_fpath wmt16_en_de/vocab.bpe.32000 --trg_vocab_fpath wmt16_en_de/vocab.bpe.32000 --special_token '<s>' '<e>' '<unk>' --train_file_pattern wmt16_en_de/train.tok.clean.bpe.32000.en-de --use_token_batch True --batch_size 3200 --sort_type pool --pool_size 200000
but I got:
E0719 14:26:29.439303 55138 graph.cc:43] softmax_with_cross_entropy_grad input var not in all_var list: softmax_with_cross_entropy_0.tmp_0@GRAD
epoch: 0, consumed 0.000161s
Traceback (most recent call last):
File "train.py", line 428, in <module>
train(args)
File "train.py", line 419, in train
"pass_" + str(pass_id) + ".checkpoint"))
File "/home/sfraczek/Paddle/build/python/paddle/fluid/io.py", line 288, in save_persistables
filename=filename)
File "/home/sfraczek/Paddle/build/python/paddle/fluid/io.py", line 166, in save_vars
filename=filename)
File "/home/sfraczek/Paddle/build/python/paddle/fluid/io.py", line 197, in save_vars
executor.run(save_program)
File "/home/sfraczek/Paddle/build/python/paddle/fluid/executor.py", line 449, in run
self.executor.run(program.desc, scope, 0, True, True)
paddle.fluid.core.EnforceNotMet: holder_ should not be null
Tensor not initialized yet when Tensor::type() is called. at [/home/sfraczek/Paddle/paddle/fluid/framework/tensor.h:139]
PaddlePaddle Call Stacks:
0 0x7f060e948f1cp paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 572
1 0x7f060e94b901p paddle::framework::Tensor::type() const + 209
2 0x7f060f617bf6p paddle::operators::SaveOp::SaveLodTensor(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::Variable*) const + 614
3 0x7f060f618472p paddle::operators::SaveOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 210
So I commented out
#fluid.io.save_persistables(
#    exe,
#    os.path.join(TrainTaskConfig.ckpt_dir,
#                 "pass_" + str(pass_id) + ".checkpoint"))
and it worked.
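Instead of deleting the call outright, I also considered just guarding it so training keeps going when the save fails (my own stopgap sketch, assuming exe, pass_id and TrainTaskConfig are in scope as they are in train.py):

try:
    fluid.io.save_persistables(
        exe,
        os.path.join(TrainTaskConfig.ckpt_dir,
                     "pass_" + str(pass_id) + ".checkpoint"))
except Exception as e:
    # Skip the checkpoint rather than abort the whole run on the
    # "holder_ should not be null" error above.
    print("Skipping checkpoint for pass %d: %s" % (pass_id, e))

But that only hides the problem, so I would still like to understand the root cause.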
Inference
Next I tried to run inference.
I found that the file wmt16_en_de/newstest2013.tok.bpe.32000.en-de doesn't exist, but based on the README I guessed that I should run
paste -d '\t' newstest2013.tok.bpe.32000.en newstest2013.tok.bpe.32000.de > newstest2013.tok.bpe.32000.en-de
Is this correct?
I then ran
python -u infer.py --src_vocab_fpath wmt16_en_de/vocab.bpe.32000 --trg_vocab_fpath wmt16_en_de/vocab.bpe.32000 --special_token '<s>' '<e>' '<unk>' --test_file_pattern wmt16_en_de/newstest2013.tok.bpe.32000.en-de --batch_size 4 model_path trained_models/pass_20.infer.model beam_size 5
but there was no output from the script; it also ended without any error. I tried other input files, but it doesn't output anything either.
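To rule out a trivial cause, here is a small snippet of my own (not part of infer.py) that checks whether the test file pattern actually matches a non-empty file:

import glob

# Make sure --test_file_pattern resolves to something and the file has lines.
pattern = "wmt16_en_de/newstest2013.tok.bpe.32000.en-de"
matches = glob.glob(pattern)
print("matched files:", matches)
for path in matches:
    with open(path) as f:
        print(path, "has", sum(1 for _ in f), "lines")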
I added profiling to infer.py by adding import paddle.fluid.profiler as profiler and
+ parser.add_argument(
+     "--profile",
+     type=bool,
+     default=False,
+     help="Enables/disables profiling.")
and
+ if args.profile:
+     with profiler.profiler("CPU", sorted_key='total') as cpuprof:
+         infer(args)
+ else:
+     infer(args)
But there is no output from the profiler either.
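One thing I am not sure about is the --profile flag itself: with type=bool, argparse turns any non-empty string into True (even "False") and leaves the flag False when it is omitted, so maybe the profiler block never runs at all. A more explicit variant I could try (my own sketch, not from the repo):

import argparse
import distutils.util

parser = argparse.ArgumentParser()
parser.add_argument(
    "--profile",
    # Parse "True"/"False" explicitly instead of relying on bool().
    type=lambda v: bool(distutils.util.strtobool(v)),
    default=False,
    help="Enables/disables profiling.")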
Please help.