int8 unit tests fail on 6148 machine
Created by: luotao1
PR_CI_Coverage runs on a 5117 machine, and we added a 6148 machine for the nightly jobs.
PR_CI_Manylinux_Coverage
This job uses cmake .. -DWITH_GPU=ON
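For reference, a local reproduction of this job might look like the sketch below. Only `-DWITH_GPU=ON` is taken from the CI description; the remaining flags and paths are assumptions based on a typical Paddle build and may differ from the actual CI setup. Note also that the NaiveExecutor warnings in the logs below suggest the binary was built without ON_INFER=ON; whether that contributes to the failures is unclear.

```shell
# Hypothetical local reproduction of the PR_CI_Manylinux_Coverage build.
# Only -DWITH_GPU=ON comes from this issue; the other flags are common
# Paddle build options and are assumptions, not the exact CI invocation.
mkdir -p build && cd build
cmake .. -DWITH_GPU=ON -DWITH_MKLDNN=ON -DWITH_TESTING=ON
make -j"$(nproc)"

# Run one of the failing int8 tests in isolation:
ctest -R test_analyzer_int8_vgg16 --output-on-failure
```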
- test_analyzer_int8_vgg16 (OTHER_FAULT)
[18:14:53] [Step 1/1] I1205 18:14:37.423142 107913 analysis_predictor.cc:475] ======= optimize end =======
[18:14:53] [Step 1/1] --- Running warmup iteration for quantization
[18:14:53] [Step 1/1] W1205 18:14:37.528687 107913 naive_executor.cc:45] The NaiveExecutor can not work properly if the cmake flag ON_INFER is not set.
[18:14:53] [Step 1/1] W1205 18:14:37.528708 107913 naive_executor.cc:47] Unlike the training phase, all the scopes and variables will be reused to save the allocation overhead.
[18:14:53] [Step 1/1] W1205 18:14:37.528713 107913 naive_executor.cc:50] Please re-compile the inference library by setting the cmake flag ON_INFER=ON if you are running Paddle Inference
- test_qat_int8_vgg19_mkldnn (Failed)
[18:19:10] [Step 1/1] 94/98 Test #825: test_qat_int8_vgg19_mkldnn ................***Failed 23.54 sec
[18:19:10] [Step 1/1] WARNING: OMP_NUM_THREADS set to 4, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
[18:19:10] [Step 1/1] PLEASE USE OMP_NUM_THREADS WISELY.
[18:19:10] [Step 1/1] 2019-12-05 18:18:49,599-INFO: QAT FP32 & INT8 prediction run.
[18:19:10] [Step 1/1] 2019-12-05 18:18:49,599-INFO: QAT model: /root/.cache/inference_demo/int8v2/VGG19_QAT/model
[18:19:10] [Step 1/1] 2019-12-05 18:18:49,599-INFO: Dataset: /root/.cache/inference_demo/int8v2/data.bin
[18:19:10] [Step 1/1] 2019-12-05 18:18:49,599-INFO: Batch size: 25
[18:19:10] [Step 1/1] 2019-12-05 18:18:49,599-INFO: Batch number: 2
[18:19:10] [Step 1/1] 2019-12-05 18:18:49,599-INFO: Accuracy drop threshold: 0.1.
[18:19:10] [Step 1/1] 2019-12-05 18:18:49,599-INFO: --- QAT FP32 prediction start ---
[18:19:10] [Step 1/1] Child killed
- test_analyzer_qat_performance_benchmark (OTHER_FAULT)
[18:15:12] [Step 1/1] W1205 18:14:41.332576 109699 naive_executor.cc:47] Unlike the training phase, all the scopes and variables will be reused to save the allocation overhead.
[18:15:12] [Step 1/1] W1205 18:14:41.332579 109699 naive_executor.cc:50] Please re-compile the inference library by setting the cmake flag ON_INFER=ON if you are running Paddle Inference
[18:15:12] [Step 1/1] W1205 18:15:05.995213 109699 naive_executor.cc:45] The NaiveExecutor can not work properly if the cmake flag ON_INFER is not set.
[18:15:12] [Step 1/1] W1205 18:15:05.996029 109699 naive_executor.cc:47] Unlike the training phase, all the scopes and variables will be reused to save the allocation overhead.
[18:15:12] [Step 1/1] W1205 18:15:05.996034 109699 naive_executor.cc:50] Please re-compile the inference library by setting the cmake flag ON_INFER=ON if you are running Paddle Inference
- test_analyzer_int8_mobilenet_ssd (OTHER_FAULT)
[18:14:52] [Step 1/1] --- Running analysis [ir_graph_to_program_pass]
[18:14:52] [Step 1/1] I1205 18:14:36.681308 109630 analysis_predictor.cc:475] ======= optimize end =======
[18:14:52] [Step 1/1] I1205 18:14:36.682399 109630 tester_helper.h:376] Thread 0, number of threads 1, run 1 times...
[18:14:52] [Step 1/1] W1205 18:14:37.378950 109630 naive_executor.cc:45] The NaiveExecutor can not work properly if the cmake flag ON_INFER is not set.
[18:14:52] [Step 1/1] W1205 18:14:37.378979 109630 naive_executor.cc:47] Unlike the training phase, all the scopes and variables will be reused to save the allocation overhead.
[18:14:52] [Step 1/1] W1205 18:14:37.378983 109630 naive_executor.cc:50] Please re-compile the inference library by setting the cmake flag ON_INFER=ON if you are running Paddle Inference
PR_CI_Manylinux_Coverage_CPU
This job uses cmake .. -DWITH_GPU=OFF
http://ci.paddlepaddle.org/viewLog.html?buildId=238814&buildTypeId=Paddle_PaddleManylinux_PrCiManylinuxCoverageCpu
- test_qat_int8_resnet101_mkldnn (Failed)
[17:12:47] [Step 1/1] 2019-12-05 17:11:55,736-INFO: --- QAT FP32 prediction start ---
[17:12:47] [Step 1/1] 2019-12-05 17:12:09,631-INFO: batch 1, acc1: 0.9200, acc5: 0.9600, latency: 496.9521 ms, fps: 2.01
[17:12:47] [Step 1/1] 2019-12-05 17:12:18,439-INFO: batch 2, acc1: 0.7200, acc5: 0.9200, latency: 338.0427 ms, fps: 2.96
[17:12:47] [Step 1/1] 2019-12-05 17:12:18,725-INFO: Total inference run time: 21.55 s
[17:12:47] [Step 1/1] 2019-12-05 17:12:18,821-INFO: --- QAT INT8 prediction start ---
[17:12:47] [Step 1/1] 2019-12-05 17:12:43,925-INFO: batch 1, acc1: 0.8400, acc5: 1.0000, latency: 175.9410 ms, fps: 5.68
[17:12:47] [Step 1/1] W1205 17:12:46.972718 137610 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
[17:12:47] [Step 1/1] W1205 17:12:46.972774 137610 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
[17:12:47] [Step 1/1] W1205 17:12:46.972779 137610 init.cc:214] The detail failure signal is:
[17:12:47] [Step 1/1]
[17:12:47] [Step 1/1] W1205 17:12:46.972784 137610 init.cc:217] *** Aborted at 1575565966 (unix time) try "date -d @1575565966" if you are using GNU date ***
[17:12:47] [Step 1/1] W1205 17:12:46.974493 137610 init.cc:217] PC: @ 0x0 (unknown)
[17:12:47] [Step 1/1] W1205 17:12:46.974908 137610 init.cc:217] *** SIGSEGV (@0x7f4dd02c2378) received by PID 137610 (TID 0x7f4e2e5c0700) from PID 18446744072907137912; stack trace: ***
[17:12:47] [Step 1/1] W1205 17:12:46.975927 137610 init.cc:217] @ 0x7f4e2dd7c390 (unknown)
[17:12:47] [Step 1/1] W1205 17:12:46.975975 137610 init.cc:217] @ 0x7f4dd02c2378 (unknown)
[17:12:47] [Step 1/1] Segmentation fault
[17:12:47] [Step 1/1]
[17:12:47] [Step 1/1]
[17:12:47] [Step 1/1] 99% tests passed, 1 tests failed out of 92
[17:12:47] [Step 1/1]
[17:12:47] [Step 1/1] Total Test time (real) = 153.34 sec
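To isolate the segfault above, the single failing test can be re-run directly from the build directory. This is a sketch using standard ctest options; the working directory is an assumption.

```shell
# Re-run only the segfaulting test with its full output (standard ctest flags):
ctest -R test_qat_int8_resnet101_mkldnn --output-on-failure -V

# Or re-run whatever failed in the previous ctest invocation:
ctest --rerun-failed --output-on-failure
```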