nGraph unit tests fail on the 6148 machine
Created by: luotao1
PR_CI_Coverage runs on a 5117 machine, and we added a 6148 machine for nightly jobs.
PR_CI_Manylinux_Coverage
This job uses cmake .. -DWITH_GPU=ON.
- test_batch_norm_ngraph_op
[18:19:16] [Step 1/1] 89/98 Test #761: test_batch_norm_ngraph_op .....................***Failed 15.97 sec
[18:19:16] [Step 1/1] W1205 18:19:07.457509 176318 executor.cc:67] FLAGS_use_ngraph=True, garbage collection strategy is disabled in Executor
[18:19:16] [Step 1/1] W1205 18:19:11.907506 176318 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
[18:19:16] [Step 1/1] W1205 18:19:11.943367 176318 device_context.cc:244] device: 0, cuDNN Version: 7.3.
[18:19:16] [Step 1/1] test_batch_norm_ngraph_op failed
[18:19:16] [Step 1/1] E.
[18:19:16] [Step 1/1] ======================================================================
[18:19:16] [Step 1/1] ERROR: test_check_output (paddle.fluid.tests.unittests.test_batch_norm_op.TestBatchNormOpInference)
[18:19:16] [Step 1/1] ----------------------------------------------------------------------
[18:19:16] [Step 1/1] Traceback (most recent call last):
[18:19:16] [Step 1/1] File "/paddle/build/python/paddle/fluid/tests/unittests/test_batch_norm_op.py", line 282, in test_check_output
[18:19:16] [Step 1/1] [2, 3, 4, 5])
[18:19:16] [Step 1/1] File "/paddle/build/python/paddle/fluid/tests/unittests/test_batch_norm_op.py", line 221, in check_with_place
[18:19:16] [Step 1/1] place)
[18:19:16] [Step 1/1] File "/paddle/build/python/paddle/fluid/tests/unittests/test_batch_norm_op.py", line 152, in create_or_get_tensor
[18:19:16] [Step 1/1] tensor.set(var, place)
[18:19:16] [Step 1/1] RuntimeError:
[18:19:16] [Step 1/1]
[18:19:16] [Step 1/1] --------------------------------------------
[18:19:16] [Step 1/1] C++ Call Stacks (More useful to developers):
[18:19:16] [Step 1/1] --------------------------------------------
[18:19:16] [Step 1/1] 0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
[18:19:16] [Step 1/1] 1 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
[18:19:16] [Step 1/1] 2 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
[18:19:16] [Step 1/1] 3 paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
[18:19:16] [Step 1/1] 4 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
[18:19:16] [Step 1/1] 5 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
[18:19:16] [Step 1/1]
[18:19:16] [Step 1/1] ----------------------
[18:19:16] [Step 1/1] Error Message Summary:
[18:19:16] [Step 1/1] ----------------------
[18:19:16] [Step 1/1] ResourceExhaustedError:
[18:19:16] [Step 1/1]
[18:19:16] [Step 1/1] Out of memory error on GPU 0. Cannot allocate 480.000000B memory on GPU 0, available memory is only 0.000000B.
[18:19:16] [Step 1/1]
[18:19:16] [Step 1/1] Please check whether there is any other process using GPU 0.
[18:19:16] [Step 1/1] 1. If yes, please stop them, or start PaddlePaddle on another GPU.
[18:19:16] [Step 1/1] 2. If no, please decrease the batch size of your model.
[18:19:16] [Step 1/1] at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:58)
[18:19:16] [Step 1/1]
[18:19:16] [Step 1/1]
[18:19:16] [Step 1/1] ----------------------------------------------------------------------
[18:19:16] [Step 1/1] Ran 2 tests in 11.102s
[18:19:16] [Step 1/1]
[18:19:16] [Step 1/1] FAILED (errors=1)
[18:19:16] [Step 1/1]
[18:19:16] [Step 1/1] op test forward passed: CPUPlace NCHW
[18:19:16] [Step 1/1] op test forward passed: CPUPlace NHWC
[18:19:16] [Step 1/1] op test forward passed: CUDAPlace(0) NCHW
[18:19:16] [Step 1/1] op test forward passed: CUDAPlace(0) NHWC
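The error summary in the log above reports 0 B available on GPU 0 and suggests checking whether another process is using the device. A minimal sketch of such a check is shown here, assuming nvidia-smi is installed on the CI machine. The helper names (parse_gpu_memory, query_gpu_memory, free_mib) are hypothetical and not part of Paddle.

```python
import subprocess

# nvidia-smi query fields; with "nounits" the memory values are plain MiB numbers.
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' CSV rows into {index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = [field.strip() for field in line.split(",")]
        usage[int(index)] = (int(used), int(total))
    return usage


def query_gpu_memory():
    """Run nvidia-smi (assumed to be on PATH) and parse its output."""
    output = subprocess.check_output(QUERY)
    return parse_gpu_memory(output.decode("utf-8"))


def free_mib(usage, device=0):
    """Return total minus used memory for one device, in MiB."""
    used, total = usage[device]
    return total - used
```

Calling free_mib(query_gpu_memory(), 0) right before the test step would show whether GPU 0 is already full when test_batch_norm_ngraph_op starts. This only narrows down the cause; it does not tell which process holds the memory.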
- test_mean_ngraph_op
[18:19:11] [Step 1/1] 87/98 Test #777: test_mean_ngraph_op ...........................***Failed 3.94 sec
[18:19:11] [Step 1/1] Traceback (most recent call last):
[18:19:11] [Step 1/1] File "/paddle/tools/test_runner.py", line 20, in <module>
[18:19:11] [Step 1/1] import paddle.fluid as fluid
[18:19:11] [Step 1/1] File "/paddle/build/python/paddle/__init__.py", line 30, in <module>
[18:19:11] [Step 1/1] import paddle.dataset
[18:19:11] [Step 1/1] File "/paddle/build/python/paddle/dataset/__init__.py", line 25, in <module>
[18:19:11] [Step 1/1] import paddle.dataset.sentiment
[18:19:11] [Step 1/1] File "/paddle/build/python/paddle/dataset/sentiment.py", line 29, in <module>
[18:19:11] [Step 1/1] import nltk
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/nltk/__init__.py", line 129, in <module>
[18:19:11] [Step 1/1] from nltk.collocations import *
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/nltk/collocations.py", line 40, in <module>
[18:19:11] [Step 1/1] from nltk.metrics import ContingencyMeasures, BigramAssocMeasures, TrigramAssocMeasures
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/nltk/metrics/__init__.py", line 16, in <module>
[18:19:11] [Step 1/1] from nltk.metrics.scores import (
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/nltk/metrics/scores.py", line 18, in <module>
[18:19:11] [Step 1/1] from scipy.stats.stats import betai
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/scipy/stats/__init__.py", line 367, in <module>
[18:19:11] [Step 1/1] from .stats import *
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/scipy/stats/stats.py", line 173, in <module>
[18:19:11] [Step 1/1] from . import distributions
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/scipy/stats/distributions.py", line 10, in <module>
[18:19:11] [Step 1/1] from ._distn_infrastructure import (entropy, rv_discrete, rv_continuous,
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 16, in <module>
[18:19:11] [Step 1/1] from scipy.misc import doccer
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/scipy/misc/__init__.py", line 100, in <module>
[18:19:11] [Step 1/1] from .pilutil import *
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 19, in <module>
[18:19:11] [Step 1/1] from PIL import Image, ImageFilter
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/PIL/Image.py", line 151, in <module>
[18:19:11] [Step 1/1] from pathlib2 import Path
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/pathlib2/__init__.py", line 52, in <module>
[18:19:11] [Step 1/1] from scandir import scandir as os_scandir
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/site-packages/scandir.py", line 452, in <module>
[18:19:11] [Step 1/1] libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/ctypes/util.py", line 274, in find_library
[18:19:11] [Step 1/1] return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/ctypes/util.py", line 103, in _findLib_gcc
[18:19:11] [Step 1/1] stdout=subprocess.PIPE)
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/subprocess.py", line 394, in __init__
[18:19:11] [Step 1/1] errread, errwrite)
[18:19:11] [Step 1/1] File "/usr/local/python2.7.15/lib/python2.7/subprocess.py", line 938, in _execute_child
[18:19:11] [Step 1/1] self.pid = os.fork()
[18:19:11] [Step 1/1] OSError: [Errno 12] Cannot allocate memory
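The traceback above ends in os.fork() raising Errno 12 while ctypes.util.find_library spawns a subprocess during import paddle.fluid, so the failure happens before any test code runs and appears to concern host memory rather than the mean op itself. A minimal sketch for inspecting host memory on a Linux machine is shown here, assuming /proc/meminfo is readable. The helper names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'Field: value'
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


def commit_headroom_kb(info):
    """CommitLimit minus Committed_AS, in kB.

    The kernel only enforces CommitLimit in strict overcommit mode
    (vm.overcommit_memory=2), so treat a negative value as a hint.
    """
    return info["CommitLimit"] - info["Committed_AS"]
```

Logging read_meminfo()["MemAvailable"] and commit_headroom_kb(read_meminfo()) at the start of the job on the 6148 machine would show whether host memory is already exhausted when the tests run in parallel.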