Paddle 1.5 multi-GPU test crashes
Closed
Created by: dipthomas
- Version and environment info: 1) PaddlePaddle version: Paddle 1.5, built from source; 2) GPU: K40, CUDA 8.0, cuDNN 7.1.3, NCCL 2.2.13; 3) OS: Ubuntu 16.04, gcc 5.4
- Installation method: 1) local build:
cmake .. -DCMAKE_INSTALL_PREFIX=output \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_TESTING=OFF \
-DWITH_PYTHON=ON \
-DWITH_MKL=OFF \
-DWITH_GPU=ON \
-DWITH_FLUID_ONLY=ON \
-DWITH_DISTRIBUTE=ON \
-DPYTHON_EXECUTABLE=/root/Python-2.7.16/build/bin/python \
-DPYTHON_INCLUDE_DIR=$PYTHON_ROOT/include/python2.7 \
-DPYTHON_LIBRARY=$PYTHON_ROOT/lib/libpython2.7.so \
-DPYTHON_NUMPY_INCLUDE_DIR=$PYTHON_ROOT/lib/python2.7/site-packages/numpy/core/include
- Crash information: after installation completes, the crash occurs when calling:
import paddle.fluid
paddle.fluid.install_check.run_check()
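One way to narrow this down is to rerun the same check in a child process, so that a crash is reported as a signal name instead of taking the session down, first with only one GPU visible and then with all of them. The sketch below is a hypothetical diagnostic, not part of Paddle: `run_isolated` and its parameters are made-up names, and it assumes Python 3.5+ for the helper itself, while the child command stays Python 2 compatible so it can be run with the interpreter Paddle was built against. `CUDA_VISIBLE_DEVICES` and `NCCL_DEBUG=INFO` are standard CUDA and NCCL environment variables.

```python
import os
import signal
import subprocess
import sys

# The check from the report; kept Python 2 compatible so the child process can
# use the same interpreter Paddle was built against.
CHECK_CODE = "import paddle.fluid; paddle.fluid.install_check.run_check()"


def run_isolated(code=CHECK_CODE, python=sys.executable,
                 visible_devices=None, nccl_debug="INFO"):
    """Run `code` with `python` in a child process; return (status, stdout, stderr).

    visible_devices: value for CUDA_VISIBLE_DEVICES (e.g. "0"), or None to
    inherit the current setting. nccl_debug: value for NCCL_DEBUG, or None.
    """
    env = dict(os.environ)
    if nccl_debug is not None:
        env["NCCL_DEBUG"] = nccl_debug  # NCCL then logs its init steps to stderr
    if visible_devices is not None:
        env["CUDA_VISIBLE_DEVICES"] = visible_devices
    proc = subprocess.run(
        [python, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        status = "killed by " + signal.Signals(-proc.returncode).name
    else:
        status = "exit code %d" % proc.returncode
    return status, proc.stdout, proc.stderr
```

For example, compare `run_isolated(python="/root/Python-2.7.16/build/bin/python", visible_devices="0")` with the same call without `visible_devices`. Assuming `run_check` only exercises the multi-card path when more than one GPU is visible, a crash (for instance `killed by SIGSEGV`) in only the second run points at the multi-GPU/NCCL path, and the `NCCL_DEBUG` lines at the end of stderr show how far NCCL initialization got.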
Created by: JiabinYang
It looks like a problem with your NCCL2 installation. I tried your configuration and it worked fine for me:
# An image for building paddle binaries and install
# Use cuda devel base image for both cpu and gpu environment
# When you modify it, please be aware of cudnn-runtime version
# and libcudnn.so.x in paddle/scripts/docker/build.sh
FROM nvidia/cuda:8.0-cudnn7-devel-ubuntu16.04
# FROM nvidia/cuda:8.0-cudnn7-devel-ubuntu14.04
MAINTAINER PaddlePaddle Authors <paddle-dev@baidu.com>
ARG UBUNTU_MIRRO

# ENV variables
ARG WITH_GPU
ARG WITH_AVX
ARG WITH_DOC
#ENV WOBOQ OFF
ENV WITH_GPU=${WITH_GPU:-ON}
# TODO: For CPU version, change this to ENV WITH_GPU=${WITH_GPU:-OFF}
ENV WITH_AVX=${WITH_AVX:-ON}
ENV WITH_DOC=${WITH_DOC:-OFF}
ENV HOME /root

RUN apt-get update
WORKDIR /usr/bin
RUN apt install -y gcc-4.8 g++-4.8
RUN cp gcc gcc.bak
RUN cp g++ g++.bak
RUN rm gcc
RUN rm g++
RUN ln -s gcc-4.8 gcc
RUN ln -s g++-4.8 g++
WORKDIR /home
RUN apt-get install -y python-dev python-pip wget vim git

# install cmake
WORKDIR /home
RUN wget https://cmake.org/files/v3.4/cmake-3.4.0-Linux-x86_64.tar.gz
RUN tar -xvf cmake-3.4.0-Linux-x86_64.tar.gz
ENV PATH=/home/cmake-3.4.0-Linux-x86_64/bin:$PATH
# RUN echo "/home/cmake-3.4.3-Linux-x86_64/bin:$PATH" >> ~/.bashrc

# install python2.7.15
RUN apt-get install -y build-essential checkinstall
RUN apt-get install -y libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev
ENV version=2.7.15
RUN wget https://www.python.org/ftp/python/$version/Python-$version.tgz
RUN tar -xvf Python-$version.tgz
WORKDIR /home/Python-$version
RUN ./configure
RUN make && make install
WORKDIR /home
RUN wget https://files.pythonhosted.org/packages/d3/3e/1d74cdcb393b68ab9ee18d78c11ae6df8447099f55fe86ee842f9c5b166c/setuptools-40.0.0.zip
RUN apt-get -y install unzip
RUN unzip setuptools-40.0.0.zip
WORKDIR /home/setuptools-40.0.0
RUN python setup.py build
RUN python setup.py install
WORKDIR /home/Python-2.7.15
RUN ./configure
RUN make && make install
WORKDIR /home/setuptools-40.0.0
RUN python setup.py build
RUN python setup.py install

# install pip 10.0.1
WORKDIR /home
RUN wget https://files.pythonhosted.org/packages/ae/e8/2340d46ecadb1692a1e455f13f75e596d4eab3d11a57446f08259dee8f02/pip-10.0.1.tar.gz
RUN tar -zxvf pip-10.0.1.tar.gz
WORKDIR pip-10.0.1
RUN python setup.py install
WORKDIR /home
RUN rm /usr/bin/pip
# point /usr/bin/pip at the pip installed above
RUN ln -s /usr/local/bin/pip /usr/bin/pip

# install && config virtualenv and virtualenvwrapper
RUN python -m pip install virtualenv virtualenvwrapper
RUN mkdir $HOME/.virtualenvs
ENV WORKON_HOME=$HOME/.virtualenvs
RUN echo "WORKON_HOME=$HOME/.virtualenvs" >> ~/.bashrc
RUN echo "source /usr/local/bin/virtualenvwrapper.sh" >> ~/.bashrc

# install NCCL 2.2.13 for CUDA 8.0, then build Paddle
RUN /bin/bash -c "source /usr/local/bin/virtualenvwrapper.sh \
    && mkvirtualenv paddle-venv \
    && wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb \
    && dpkg -i nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb \
    && apt-get install -y --allow-downgrades --allow-change-held-packages libnccl2=2.2.13-1+cuda8.0 libnccl-dev=2.2.13-1+cuda8.0 \
    && apt install -y swig patchelf \
    && python -m pip install numpy protobuf wheel \
    && python -m pip install --upgrade setuptools \
    && git clone https://github.com/PaddlePaddle/Paddle.git \
    && cd Paddle \
    && git checkout develop \
    && mkdir build && cd build \
    && cmake .. -DWITH_FLUID_ONLY=ON -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Auto \
    && make -j$(nproc)"

CMD source ~/.bashrc
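To check the NCCL diagnosis on the failing machine, a minimal Python 3 sketch can compare the installed NCCL packages against the versions pinned in the Dockerfile above (via `dpkg-query`'s `--showformat` option) and test whether the dynamic loader can resolve the NCCL shared library at all. The helper names are hypothetical and this is not a Paddle tool. It assumes NCCL was installed from NVIDIA's .deb packages (with a tarball install, `dpkg-query` reports nothing and only the loader check is meaningful), and that Paddle opens NCCL at runtime by library name, either `libnccl.so` (from libnccl-dev) or the `libnccl.so.2` soname (from libnccl2); both are assumptions, not something stated in this thread.

```python
import ctypes
import subprocess

# Versions pinned by the apt-get line in the Dockerfile above.
EXPECTED = {
    "libnccl2": "2.2.13-1+cuda8.0",
    "libnccl-dev": "2.2.13-1+cuda8.0",
}


def parse_dpkg_versions(text):
    """Turn '<package> <version>' lines into a {package: version} dict."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


def installed_versions(packages):
    """Query dpkg for package versions; packages without a version line are omitted."""
    proc = subprocess.run(
        # dpkg-query itself expands the \n escape in the format string.
        ["dpkg-query", "-W", "--showformat=${Package} ${Version}\\n"] + list(packages),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # "no packages found" messages are not needed
        universal_newlines=True,
    )
    return parse_dpkg_versions(proc.stdout)


def mismatches(expected, found):
    """Return {package: (expected_version, found_version_or_None)} for each difference."""
    return {pkg: (want, found.get(pkg))
            for pkg, want in expected.items()
            if found.get(pkg) != want}


def can_dlopen(name):
    """True if ctypes (dlopen) can resolve `name` through the loader's search path."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True
```

On the machine that crashes, `print(mismatches(EXPECTED, installed_versions(EXPECTED)))` prints every package whose version differs from the pin (an empty dict means they match), and `print({n: can_dlopen(n) for n in ("libnccl.so", "libnccl.so.2")})` shows whether either library name resolves. A mismatch or an unresolvable library would be consistent with the NCCL installation problem suggested above.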