diff --git a/doc/getstarted/basic_usage/index_cn.rst b/doc/getstarted/basic_usage/index_cn.rst deleted file mode 100644 index b473944fc7fb89d3e0a0b330933f2226734bb5bd..0000000000000000000000000000000000000000 --- a/doc/getstarted/basic_usage/index_cn.rst +++ /dev/null @@ -1,108 +0,0 @@ -经典的线性回归任务 -================== - -PaddlePaddle是源于百度的一个深度学习平台。这份简短的介绍将向你展示如何利用PaddlePaddle来解决一个经典的线性回归问题。 - -任务简介 --------- - -我们展示如何用PaddlePaddle解决 `单变量的线性回归 `_ 问题。线性回归的输入是一批点 `(x, y)` ,其中 `y = wx + b + ε`, 而 ε 是一个符合高斯分布的随机变量。线性回归的输出是从这批点估计出来的参数 `w` 和 `b` 。 - -一个例子是房产估值。我们假设房产的价格(y)是其大小(x)的一个线性函数,那么我们可以通过收集市场上房子的大小和价格,用来估计线性函数的参数w 和 b。 - -准备数据 ------------ - -假设变量 `x` 和 `y` 的真实关系为: `y = 2x + 0.3 + ε`,这里展示如何使用观测数据来拟合这一线性关系。首先,Python代码将随机产生2000个观测点,作为线性回归的输入。下面脚本符合PaddlePaddle期待的读取数据的Python程序的模式。 - -.. code-block:: python - - # dataprovider.py - from paddle.trainer.PyDataProvider2 import * - import random - - # 定义输入数据的类型: 2个浮点数 - @provider(input_types=[dense_vector(1), dense_vector(1)],use_seq=False) - def process(settings, input_file): - for i in xrange(2000): - x = random.random() - yield [x], [2*x+0.3] - -训练模型 ------------ - -为了还原 `y = 2x + 0.3`,我们先从一条随机的直线 `y' = wx + b` 开始,然后利用观测数据调整 `w` 和 `b` 使得 `y'` 和 `y` 的差距不断减小,最终趋于接近。这个过程就是模型的训练过程,而 `w` 和 `b` 就是模型的参数,即我们的训练目标。 - -在PaddlePaddle里,该模型的网络配置如下。 - -.. code-block:: python - - # trainer_config.py - from paddle.trainer_config_helpers import * - - # 1. 定义数据来源,调用上面的process函数获得观测数据 - data_file = 'empty.list' - with open(data_file, 'w') as f: f.writelines(' ') - define_py_data_sources2(train_list=data_file, test_list=None, - module='dataprovider', obj='process',args={}) - - # 2. 学习算法。控制如何改变模型参数 w 和 b - settings(batch_size=12, learning_rate=1e-3, learning_method=MomentumOptimizer()) - - # 3. 神经网络配置 - x = data_layer(name='x', size=1) - y = data_layer(name='y', size=1) - # 线性计算网络层: ȳ = wx + b - ȳ = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b')) - # 计算误差函数,即 ȳ 和真实 y 之间的距离 - cost = square_error_cost(input= ȳ, label=y) - outputs(cost) - - -这段简短的配置展示了PaddlePaddle的基本用法: - -- 第一部分定义了数据输入。一般情况下,PaddlePaddle先从一个文件列表里获得数据文件地址,然后交给用户自定义的函数(例如上面的 `process`函数)进行读入和预处理从而得到真实输入。本文中由于输入数据是随机生成的不需要读输入文件,所以放一个空列表(`empty.list`)即可。 - -- 第二部分主要是选择学习算法,它定义了模型参数改变的规则。PaddlePaddle提供了很多优秀的学习算法,这里使用一个基于momentum的随机梯度下降(SGD)算法,该算法每批量(batch)读取12个采样数据进行随机梯度计算来更新更新。 - -- 最后一部分是神经网络的配置。由于PaddlePaddle已经实现了丰富的网络层,所以很多时候你需要做的只是定义正确的网络层并把它们连接起来。这里使用了三种网络单元: - - - **数据层**:数据层 `data_layer` 是神经网络的入口,它读入数据并将它们传输到接下来的网络层。这里数据层有两个,分别对应于变量 `x` 和 `y`。 - - **全连接层**:全连接层 `fc_layer` 是基础的计算单元,这里利用它建模变量之间的线性关系。计算单元是神经网络的核心,PaddlePaddle支持大量的计算单元和任意深度的网络连接,从而可以拟合任意的函数来学习复杂的数据关系。 - - **回归误差代价层**:回归误差代价层 `square_error_cost` 是众多误差代价函数层的一种,它们在训练过程作为网络的出口,用来计算模型的误差,是模型参数优化的目标函数。 - -定义了网络结构并保存为 `trainer_config.py` 之后,运行以下训练命令: - -.. code-block:: bash - - paddle train --config=trainer_config.py --save_dir=./output --num_passes=30 - -PaddlePaddle将在观测数据集上迭代训练30轮,并将每轮的模型结果存放在 `./output` 路径下。从输出日志可以看到,随着轮数增加误差代价函数的输出在不断的减小,这意味着模型在训练数据上不断的改进,直到逼近真实解:` y = 2x + 0.3 ` - -模型检验 ------------ - -训练完成后,我们希望能够检验模型的好坏。一种常用的做法是用学习的模型对另外一组测试数据进行预测,评价预测的效果。在这个例子中,由于已经知道了真实答案,我们可以直接观察模型的参数是否符合预期来进行检验。 - -PaddlePaddle将每个模型参数作为一个numpy数组单独存为一个文件,所以可以利用如下方法读取模型的参数。 - -.. code-block:: python - - import numpy as np - import os - - def load(file_name): - with open(file_name, 'rb') as f: - f.read(16) # skip header for float type. 
- return np.fromfile(f, dtype=np.float32) - - print 'w=%.6f, b=%.6f' % (load('output/pass-00029/w'), load('output/pass-00029/b')) - # w=1.999743, b=0.300137 - -.. image:: ./parameters.png - :align: center - :scale: 80 % -
-从图中可以看到,虽然 `w` 和 `b` 都使用随机值初始化,但在起初的几轮训练中它们都在快速逼近真实值,并且后续仍在不断改进,使得最终得到的模型几乎与真实模型一致。 -
-这样,我们用PaddlePaddle解决了单变量线性回归问题,包括数据输入、模型训练和最后的结果验证。
diff --git a/doc/getstarted/basic_usage/index_en.rst b/doc/getstarted/basic_usage/index_en.rst deleted file mode 100644 index 2cc438ebbe0f97345d25354b93b4ebbd43502415..0000000000000000000000000000000000000000 --- a/doc/getstarted/basic_usage/index_en.rst +++ /dev/null @@ -1,101 +0,0 @@ -Simple Linear Regression -======================== -
-PaddlePaddle is a deep learning platform open-sourced by Baidu. With PaddlePaddle, you can easily train a classic neural network within a couple of lines of configuration, or you can build sophisticated models that provide state-of-the-art performance on difficult learning tasks like sentiment analysis, machine translation, image captioning and so on. -
-Problem Background ------------------- -
-Now, to give you a hint of what using PaddlePaddle looks like, let's start with a fundamental learning problem - `simple linear regression `_: you have observed a set of two-dimensional data points of ``X`` and ``Y``, where ``X`` is an explanatory variable and ``Y`` is the corresponding dependent variable, and you want to recover the underlying correlation between ``X`` and ``Y``. Linear regression can be used in many practical scenarios. For example, ``X`` can be a variable about house size, and ``Y`` a variable about house price. You can build a model that captures the relationship between them by observing real estate markets. -
-Prepare the Data ----------------- -
-Suppose the true relationship can be characterized as ``Y = 2X + 0.3``; let's see how to recover this pattern only from observed data. Here is a piece of Python code that feeds synthetic data to PaddlePaddle. The code is pretty self-explanatory; the only extra thing you need to add for PaddlePaddle is a definition of the input data types. -
- .. code-block:: python - - # dataprovider.py - from paddle.trainer.PyDataProvider2 import * - import random - - # define data types of input: 2 real numbers - @provider(input_types=[dense_vector(1), dense_vector(1)],use_seq=False) - def process(settings, input_file): - for i in xrange(2000): - x = random.random() - yield [x], [2*x+0.3] -
-Train a Neural Network ----------------------- -
-To recover this relationship between ``X`` and ``Y``, we use a neural network with one layer of linear activation units and a square error cost layer. Don't worry if you are not familiar with these terminologies; it's just saying that we are starting from a random line ``Y' = wX + b``, then we gradually adapt ``w`` and ``b`` to minimize the difference between ``Y'`` and ``Y``. Here is what it looks like in PaddlePaddle: -
- .. code-block:: python - - # trainer_config.py - from paddle.trainer_config_helpers import * - - # 1. read data. Suppose you saved above python code as dataprovider.py - data_file = 'empty.list' - with open(data_file, 'w') as f: f.writelines(' ') - define_py_data_sources2(train_list=data_file, test_list=None, - module='dataprovider', obj='process',args={}) - - # 2. learning algorithm - settings(batch_size=12, learning_rate=1e-3, learning_method=MomentumOptimizer()) - - # 3. Network configuration - x = data_layer(name='x', size=1) - y = data_layer(name='y', size=1) - y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b')) - cost = square_error_cost(input=y_predict, label=y) - outputs(cost) -
-Some of the most fundamental usages of PaddlePaddle are demonstrated: -
- The first part shows how to feed data into PaddlePaddle. In general cases, PaddlePaddle reads raw data from a list of files, and then does some user-defined processing to get the real input. In this case, we only need to create a placeholder file since we are generating synthetic data on the fly. -
- The second part describes the learning algorithm. It defines in what ways adjustments are made to model parameters. PaddlePaddle provides a rich set of optimizers, but a simple momentum-based optimizer will suffice here, and it processes 12 data points each time. -
- Finally, the network configuration. It usually is as simple as "stacking" layers. Three kinds of layers are used in this configuration: - - **Data Layer**: a network always starts with one or more data layers. They provide input data to the rest of the network. In this problem, two data layers are used respectively for ``X`` and ``Y``. - - **FC Layer**: FC layer is short for Fully Connected Layer, which connects all the input units to the current layer and does the actual computation specified by the activation function. Computation layers like this are the fundamental building blocks of a deeper model. - - **Cost Layer**: in the training phase, cost layers are usually the last layers of the network. They measure the performance of the current model, and provide guidance to adjust parameters. -
-Now that everything is ready, you can train the network with a simple command line call: -
- .. code-block:: bash - - paddle train --config=trainer_config.py --save_dir=./output --num_passes=30 - -
-This means that PaddlePaddle will train this network on the synthetic dataset for 30 passes, and save all the models under path ``./output``. You will see from the messages printed out during the training phase that the model cost is decreasing as time goes by, which indicates we are getting a closer guess. -
-Evaluate the Model ------------------- -
-Usually, a different dataset that was left out during the training phase should be used to evaluate the model. However, we are lucky enough to know the real answer: ``w=2, b=0.3``; thus a better option is to check the model parameters directly. -
-In PaddlePaddle, training is just to get a collection of model parameters, which are ``w`` and ``b`` in this case. Each parameter is saved in an individual file in the popular ``numpy`` array format. Here is the code that reads parameters from the last pass. -
- .. code-block:: python - - import numpy as np - import os - - def load(file_name): - with open(file_name, 'rb') as f: - f.read(16) # skip header for float type. - return np.fromfile(f, dtype=np.float32) - - print 'w=%.6f, b=%.6f' % (load('output/pass-00029/w'), load('output/pass-00029/b')) - # w=1.999743, b=0.300137 -
- .. image:: parameters.png - :align: center -
-Although it starts from a random guess, you can see that the value of ``w`` changes quickly towards 2 and ``b`` changes quickly towards 0.3. In the end, the predicted line is almost identical to the real answer. -
-There, you have recovered the underlying pattern between ``X`` and ``Y`` only from observed data.
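As a quick cross-check that is independent of PaddlePaddle (an illustrative addition, not part of the original tutorials above), the same two parameters can be recovered in closed form with ordinary least squares, since the data provider shown earlier generates noise-free points of ``y = 2x + 0.3``. A minimal numpy sketch, in the same Python 2 style as the docs:

.. code-block:: python

    import random
    import numpy as np

    # regenerate 2000 observations the same way dataprovider.py does
    xs = np.array([random.random() for _ in xrange(2000)])
    ys = 2 * xs + 0.3

    # least squares for y = w*x + b: design matrix columns are [x, 1]
    A = np.column_stack([xs, np.ones_like(xs)])
    (w, b), _, _, _ = np.linalg.lstsq(A, ys)
    print 'w=%.6f, b=%.6f' % (w, b)  # prints w=2.000000, b=0.300000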
diff --git a/doc/getstarted/basic_usage/parameters.png b/doc/getstarted/basic_usage/parameters.png deleted file mode 100644 index 2ec67480951e21f0400bce1c34b3108dcd65c18c..0000000000000000000000000000000000000000 Binary files a/doc/getstarted/basic_usage/parameters.png and /dev/null differ diff --git a/doc/getstarted/build_and_install/build_from_source_cn.rst b/doc/getstarted/build_and_install/build_from_source_cn.rst new file mode 100644 index 0000000000000000000000000000000000000000..55665ac8edfcf20290936fba4c3e410b33e1f3d4 --- /dev/null +++ b/doc/getstarted/build_and_install/build_from_source_cn.rst @@ -0,0 +1,113 @@ +从源码编译PaddlePaddle +====================== + +.. _build_step: + +编译方法 +---------------- + +PaddlePaddle主要使用 `CMake `_ 以及GCC, G++作为编译工具。 +我们推荐您使用PaddlePaddle编译环境镜像完成编译,这样可以免去单独安装编译依赖的步骤,可选的不同编译环境 +可以在 `这里 `_ 找到。 +编译PaddlePaddle,需要执行: + +.. code-block:: bash + + git clone https://github.com/PaddlePaddle/Paddle.git + cd Paddle + # 如果使用Docker编译环境,执行下面的命令编译CPU-Only的二进制 + docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=OFF" paddlepaddle/paddle_manylinux_devel:cuda8.0_cudnn5 bash -x paddle/scripts/docker/build.sh + # 如果不使用Docker编译环境,执行下面的命令 + mkdir build + cd build + cmake -DWITH_GPU=OFF -DWITH_TESTING=OFF .. + make + + +编译完成后会在build/python/dist目录下生成输出的whl包,可以选在在当前机器安装也可以拷贝到目标机器安装: + +.. code-block:: bash + + pip install python/dist/*.whl + + +.. _build_step: + +编译依赖 +---------------- + +PaddlePaddle编译需要使用到下面的依赖(包含但不限于),其他的依赖软件,会自动在编译时下载。 + +.. csv-table:: PaddlePaddle编译依赖 + :header: "依赖", "版本", "说明" + :widths: 10, 15, 30 + + "CMake", ">=3.5", "" + "GCC", "4.8.2", "推荐使用CentOS的devtools2" + "Python", "2.7.x", "依赖libpython2.7.so" + "pip", ">=9.0", "" + "numpy", "", "" + "SWIG", ">=2.0", "" + "Go", ">=1.8", "可选" + + +.. _build_options: + +编译选项 +---------------- + +PaddlePaddle的编译选项,包括生成CPU/GPU二进制文件、链接何种BLAS库等。 +用户可在调用cmake的时候设置它们,详细的cmake使用方法可以参考 +`官方文档 `_ 。 + +在cmake的命令行中,通过使用 ``-D`` 命令设置该类编译选项,例如: + +.. code-block:: bash + + cmake .. -DWITH_GPU=OFF + +.. csv-table:: 编译选项说明 + :header: "选项", "说明", "默认值" + :widths: 1, 7, 2 + + "WITH_GPU", "是否支持GPU", "ON" + "WITH_C_API", "是否仅编译CAPI", "OFF" + "WITH_DOUBLE", "是否使用双精度浮点数", "OFF" + "WITH_DSO", "是否运行时动态加载CUDA动态库,而非静态加载CUDA动态库。", "ON" + "WITH_AVX", "是否编译含有AVX指令集的PaddlePaddle二进制文件", "ON" + "WITH_PYTHON", "是否内嵌PYTHON解释器", "ON" + "WITH_STYLE_CHECK", "是否编译时进行代码风格检查", "ON" + "WITH_TESTING", "是否开启单元测试", "ON" + "WITH_DOC", "是否编译中英文文档", "OFF" + "WITH_SWIG_PY", "是否编译PYTHON的SWIG接口,该接口可用于预测和定制化训练", "Auto" + "WITH_GOLANG", "是否编译go语言的可容错parameter server", "ON" + "WITH_MKL", "是否使用MKL数学库,如果为否则是用OpenBLAS", "ON" + +BLAS ++++++ + +PaddlePaddle支持 `MKL `_ 和 +`OpenBlAS `_ 两种BLAS库。默认使用MKL。如果使用MKL并且机器含有AVX2指令集, +还会下载MKL-DNN数学库,详细参考 `这里 `_ 。 + +如果关闭MKL,则会使用OpenBLAS作为BLAS库。 + +CUDA/cuDNN ++++++++++++ + +PaddlePaddle在编译时/运行时会自动找到系统中安装的CUDA和cuDNN库进行编译和执行。 +使用参数 :code:`-DCUDA_ARCH_NAME=Auto` 可以指定开启自动检测SM架构,加速编译。 + +PaddlePaddle可以使用cuDNN v5.1之后的任何一个版本来编译运行,但尽量请保持编译和运行使用的cuDNN是同一个版本。 +我们推荐使用最新版本的cuDNN。 + +编译选项的设置 +++++++++++++++ + +PaddePaddle通过编译时指定路径来实现引用各种BLAS/CUDA/cuDNN库。cmake编译时,首先在系统路径( :code:`/usr/lib:/usr/local/lib` )中搜索这几个库,同时也会读取相关路径变量来进行搜索。 通过使用 ``-D`` 命令可以设置,例如 + +.. code-block:: bash + + cmake .. 
-DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5 + +**注意:这几个编译选项的设置,只在第一次cmake的时候有效。如果之后想要重新设置,推荐清理整个编译目录(** :code:`rm -rf` **)后,再指定。**
diff --git a/doc/getstarted/build_and_install/build_from_source_en.md b/doc/getstarted/build_and_install/build_from_source_en.md deleted file mode 100644 index 2f1461489495618718d5abaeab9cbeda9b93700f..0000000000000000000000000000000000000000 --- a/doc/getstarted/build_and_install/build_from_source_en.md +++ /dev/null @@ -1,236 +0,0 @@ -Installing from Sources -========================== -
-* [1. Download and Setup](#download) -* [2. Requirements](#requirements) -* [3. Build on Ubuntu](#ubuntu) -* [4. Build on Centos](#centos) - -
-## Download and Setup -You can download PaddlePaddle from the [github source](https://github.com/PaddlePaddle/Paddle). -
-```bash -git clone https://github.com/PaddlePaddle/Paddle paddle -cd paddle -``` -## Requirements -
-To compile the source code, your computer must be equipped with the following dependencies. -
-- **Compiler**: GCC >= 4.8 or Clang >= 3.3 (AppleClang >= 5.1) and gfortran compiler -- **CMake**: CMake >= 3.0 (at least CMake 3.4 on Mac OS X) -- **BLAS**: MKL, OpenBLAS or ATLAS -- **Python**: only support Python 2.7 -- **Go** -
-**Note:** For CUDA 7.0 and CUDA 7.5, GCC 5.0 and up are not supported! -For CUDA 8.0, GCC versions later than 5.3 are not supported! -
-### Options -
-PaddlePaddle supports some build options. -
-| Optional | Description |
-| -------- | ----------- |
-| WITH_GPU | Compile PaddlePaddle with NVIDIA GPU |
-| WITH_AVX | Compile PaddlePaddle with AVX intrinsics |
-| WITH_DSO | Compile PaddlePaddle with dynamic linked CUDA |
-| WITH_TESTING | Compile PaddlePaddle with unit testing |
-| WITH_SWIG_PY | Compile PaddlePaddle with inference api |
-| WITH_STYLE_CHECK | Compile PaddlePaddle with style check |
-| WITH_PYTHON | Compile PaddlePaddle with python interpreter |
-| WITH_DOUBLE | Compile PaddlePaddle with double precision |
-| WITH_RDMA | Compile PaddlePaddle with RDMA support |
-| WITH_TIMER | Compile PaddlePaddle with stats timer |
-| WITH_PROFILER | Compile PaddlePaddle with GPU profiler |
-| WITH_DOC | Compile PaddlePaddle with documentation |
-| WITH_COVERAGE | Compile PaddlePaddle with code coverage |
-| COVERALLS_UPLOAD | Package code coverage data to coveralls |
-| ON_TRAVIS | Exclude special unit test on Travis CI |
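For instance, several of these flags can be combined in a single cmake invocation (an illustrative combination, not a command from the original guide):

```bash
cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DWITH_DOC=OFF
```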
- - -**Note:** - - The GPU version works best with Cuda Toolkit 8.0 and cuDNN v5. - - Other versions like Cuda Toolkit 7.0, 7.5 and cuDNN v3, v4 are also supported. - - **To utilize cuDNN v5, Cuda Toolkit 7.5 is prerequisite and vice versa.** - -As a simple example, consider the following: - -1. **BLAS Dependencies(optional)** - - CMake will search BLAS libraries from the system. If not found, OpenBLAS will be downloaded, built and installed automatically. - To utilize preinstalled BLAS, you can simply specify MKL, OpenBLAS or ATLAS via `MKL_ROOT`, `OPENBLAS_ROOT` or `ATLAS_ROOT`. - - ```bash - # specify MKL - cmake .. -DMKL_ROOT= - # or specify OpenBLAS - cmake .. -DOPENBLAS_ROOT= - ``` - -2. **Doc Dependencies(optional)** - - To generate PaddlePaddle's documentation, install dependencies and set `-DWITH_DOC=ON` as follows: - - ```bash - pip install 'sphinx>=1.4.0' - pip install sphinx_rtd_theme recommonmark - - # install doxygen on Ubuntu - sudo apt-get install doxygen - # install doxygen on Mac OS X - brew install doxygen - - # active docs in cmake - cmake .. -DWITH_DOC=ON` - ``` - -## Build on Ubuntu 14.04 - -### Install Dependencies - -- **Paddle Dependencies** - - ```bash - # necessary - sudo apt-get update - sudo apt-get install -y git curl gcc g++ gfortran make build-essential automake - sudo apt-get install -y python python-pip python-numpy libpython-dev bison - sudo pip install 'protobuf==3.1.0.post1' - - # Install Go - # You can follow https://golang.org/doc/install for a detailed explanation. - wget -O go.tgz https://storage.googleapis.com/golang/go1.8.1.linux-amd64.tar.gz && \ - tar -C $HOME -xzf go.tgz && \ - mkdir $HOME/gopath && \ - rm go.tgz - - # Setup environment variables - export GOROOT=$HOME/go - export GOPATH=$HOME/gopath - export PATH=$PATH:$GOROOT/bin - - # install cmake 3.4 - curl -sSL https://cmake.org/files/v3.4/cmake-3.4.1.tar.gz | tar -xz && \ - cd cmake-3.4.1 && ./bootstrap && make -j4 && sudo make install && \ - cd .. && rm -rf cmake-3.4.1 - ``` - -- **GPU Dependencies (optional)** - - To build GPU version, you will need the following installed: - - 1. a CUDA-capable GPU - 2. A supported version of Linux with a GCC compiler and toolchain - 3. NVIDIA CUDA Toolkit (available at http://developer.nvidia.com/cuda-downloads) - 4. NVIDIA cuDNN Library (available at https://developer.nvidia.com/cudnn) - - The CUDA development environment relies on tight integration with the host development environment, - including the host compiler and C runtime libraries, and is therefore only supported on - distribution versions that have been qualified for this CUDA Toolkit release. - - After downloading cuDNN library, issue the following commands: - - ```bash - sudo tar -xzf cudnn-7.5-linux-x64-v5.1.tgz -C /usr/local - sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn* - ``` - Then you need to set LD\_LIBRARY\_PATH, PATH environment variables in ~/.bashrc. - - ```bash - export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH - export PATH=/usr/local/cuda/bin:$PATH - ``` - -### Build and Install - -As usual, the best option is to create build folder under paddle project directory. - -```bash -mkdir build && cd build -``` - -Finally, you can build and install PaddlePaddle: - -```bash -# you can add build option here, such as: -cmake .. 
-DCMAKE_INSTALL_PREFIX= -# please use sudo make install, if you want to install PaddlePaddle into the system -make -j `nproc` && make install -# set PaddlePaddle installation path in ~/.bashrc -export PATH=/bin:$PATH -# install PaddlePaddle Python modules. -sudo pip install /opt/paddle/share/wheels/*.whl -
-## Build on Centos 7 -
-### Install Dependencies -
-- **CPU Dependencies** -
- ```bash - # necessary - sudo yum update - sudo yum install -y epel-release - sudo yum install -y make cmake3 python-devel python-pip gcc-gfortran swig git - sudo pip install wheel numpy - sudo pip install 'protobuf>=3.0.0' - ``` -
-- **GPU Dependencies (optional)** -
- To build GPU version, you will need the following installed: -
- 1. a CUDA-capable GPU - 2. A supported version of Linux with a GCC compiler and toolchain - 3. NVIDIA CUDA Toolkit (available at http://developer.nvidia.com/cuda-downloads) - 4. NVIDIA cuDNN Library (available at https://developer.nvidia.com/cudnn) -
- The CUDA development environment relies on tight integration with the host development environment, - including the host compiler and C runtime libraries, and is therefore only supported on - distribution versions that have been qualified for this CUDA Toolkit release. -
- After downloading cuDNN library, issue the following commands: -
- ```bash - sudo tar -xzf cudnn-7.5-linux-x64-v5.1.tgz -C /usr/local - sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn* - ``` - Then you need to set LD\_LIBRARY\_PATH, PATH environment variables in ~/.bashrc. -
- ```bash - export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH - export PATH=/usr/local/cuda/bin:$PATH - ``` -
-### Build and Install -
-As usual, the best option is to create build folder under paddle project directory. -
-```bash -mkdir build && cd build -``` -
-Finally, you can build and install PaddlePaddle: -
-```bash -# you can add build option here, such as: -cmake3 .. -DCMAKE_INSTALL_PREFIX= -# please use sudo make install, if you want to install PaddlePaddle into the system -make -j `nproc` && make install -# set PaddlePaddle installation path in ~/.bashrc -export PATH=/bin:$PATH -# install PaddlePaddle Python modules. -sudo pip install /opt/paddle/share/wheels/*.whl -```
diff --git a/doc/getstarted/build_and_install/build_from_source_en.rst b/doc/getstarted/build_and_install/build_from_source_en.rst new file mode 100644 index 0000000000000000000000000000000000000000..9a3ed7dd57137ddf3d6213222c17433822b01dbb --- /dev/null +++ b/doc/getstarted/build_and_install/build_from_source_en.rst @@ -0,0 +1,127 @@ +Build PaddlePaddle from Sources +=============================== +
+.. _build_step: +
+How To Build +---------------- +
+PaddlePaddle mainly uses `CMake `_ and GCC, G++ as compile +tools. We recommend that you use our pre-built Docker image to run the build +to avoid installing dependencies by yourself. We have several build environment +Docker images `here `_. +Then run: +
+.. code-block:: bash +
+ git clone https://github.com/PaddlePaddle/Paddle.git + cd Paddle + # run the following command to build CPU-Only binaries if you are using docker + docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=OFF" paddlepaddle/paddle_manylinux_devel:cuda8.0_cudnn5 bash -x paddle/scripts/docker/build.sh + # else run these commands + mkdir build + cd build + cmake -DWITH_GPU=OFF -DWITH_TESTING=OFF .. 
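+      # Note (illustrative addition, not in the original guide): the build can be
+      # parallelized across CPU cores, e.g. `make -j $(nproc)`, as the older
+      # Markdown guide above does.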
+ make +
+When the compile finishes, you can get the output whl package under +build/python/dist, then you can choose to install the whl on the local +machine or copy it to the target machine. +
+.. code-block:: bash +
+ pip install python/dist/*.whl +
+.. _build_deps: +
+Compile Dependencies +-------------------- +
+PaddlePaddle needs the following dependencies when compiling; other dependencies +will be downloaded automatically. +
+.. csv-table:: PaddlePaddle Compile Dependencies + :header: "Dependency", "Version", "Description" + :widths: 10, 15, 30 +
+ "CMake", ">=3.5", "" + "GCC", "4.8.2", "Recommend devtools2 for CentOS" + "Python", "2.7.x", "Need libpython2.7.so" + "pip", ">=9.0", "" + "numpy", "", "" + "SWIG", ">=2.0", "" + "Go", ">=1.8", "Optional" + +
+.. _build_options: +
+Build Options +---------------- +
+Build options include whether to build binaries for CPU or GPU, which BLAS +library to use, etc. You may pass these settings when running cmake. +For a detailed cmake tutorial please refer to `here `_ . +
+.. _build_options_bool: +
+Bool Type Options +----------------- +
+You can add the :code:`-D` argument to pass such options, like: +
+.. code-block:: bash +
+ cmake .. -DWITH_GPU=OFF +
+.. csv-table:: Bool Type Options + :header: "Option", "Description", "Default" + :widths: 1, 7, 2 +
+ "WITH_GPU", "Build with GPU support", "ON" + "WITH_C_API", "Build only CAPI", "OFF" + "WITH_DOUBLE", "Build with double precision", "OFF" + "WITH_DSO", "Dynamically load CUDA libraries", "ON" + "WITH_AVX", "Build with AVX support", "ON" + "WITH_PYTHON", "Build with integrated Python interpreter", "ON" + "WITH_STYLE_CHECK", "Check code style when building", "ON" + "WITH_TESTING", "Build unit tests", "ON" + "WITH_DOC", "Build documentation", "OFF" + "WITH_SWIG_PY", "Build Python SWIG interface for V2 API", "Auto" + "WITH_GOLANG", "Build fault-tolerant parameter server written in go", "ON" + "WITH_MKL", "Use MKL as BLAS library, else use OpenBLAS", "ON" + +
+BLAS ++++++ +
+PaddlePaddle supports `MKL `_ and +`OpenBLAS `_ as the BLAS library. By default it uses MKL. +If you are using MKL and your machine supports AVX2, MKL-DNN will also be downloaded +and used; see `details `_ for more information. +
+If you choose not to use MKL, then OpenBLAS will be used. +
+CUDA/cuDNN ++++++++++++ +
+PaddlePaddle will automatically find CUDA and cuDNN when compiling and running. +The parameter :code:`-DCUDA_ARCH_NAME=Auto` can be used to detect the SM architecture +automatically in order to speed up the build. +
+PaddlePaddle can be built with cuDNN v5.1 or any later version, and we intend to +keep up with the latest cuDNN versions. Be sure to run with the same version of cuDNN +that you built with. +
+Pass Compile Options +++++++++++++++++++++ +
+You can pass compile options to use the intended BLAS/CUDA/cuDNN libraries. +When running the cmake command, it will first search system paths like +:code:`/usr/lib:/usr/local/lib` and then the paths that you +passed to cmake, e.g. +
+.. code-block:: bash +
+ cmake .. 
-DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5 + +**NOTE: These options only take effect when running cmake for the first time, you need to clean the cmake cache or clean the build directory (** :code:`rm -rf` **) if you want to change it.** diff --git a/doc/getstarted/build_and_install/cmake.png b/doc/getstarted/build_and_install/cmake.png deleted file mode 100644 index a58cd09ad99cf27cc1ca5785fe54d726b83a82f6..0000000000000000000000000000000000000000 Binary files a/doc/getstarted/build_and_install/cmake.png and /dev/null differ diff --git a/doc/getstarted/build_and_install/cmake/build_from_source_cn.rst b/doc/getstarted/build_and_install/cmake/build_from_source_cn.rst deleted file mode 100644 index be0c1ffa451b2901ec06621dd4d886f800b4562e..0000000000000000000000000000000000000000 --- a/doc/getstarted/build_and_install/cmake/build_from_source_cn.rst +++ /dev/null @@ -1,43 +0,0 @@ -PaddlePaddle的编译选项 -====================== - -PaddlePaddle的编译选项,包括生成CPU/GPU二进制文件、链接何种BLAS库等。用户可在调用cmake的时候设置它们,详细的cmake使用方法可以参考 `官方文档 `_ 。 - -Bool型的编译选项 ----------------- -用户可在cmake的命令行中,通过使用 ``-D`` 命令设置该类编译选项,例如 - -.. code-block:: bash - - cmake .. -DWITH_GPU=OFF - -.. csv-table:: Bool型的编译选项 - :widths: 1, 7, 2 - :file: compile_options.csv - -BLAS/CUDA/Cudnn的编译选项 --------------------------- -BLAS -+++++ - -PaddlePaddle支持以下任意一种BLAS库:`MKL `_ ,`ATLAS `_ ,`OpenBlAS `_ 和 `REFERENCE BLAS `_ 。 - -.. csv-table:: BLAS路径相关的编译选项 - :widths: 1, 2, 7 - :file: cblas_settings.csv - -CUDA/Cudnn -+++++++++++ - -PaddlePaddle可以使用cudnn v2之后的任何一个版本来编译运行,但尽量请保持编译和运行使用的cudnn是同一个版本。 我们推荐使用最新版本的cudnn v5.1。 - -编译选项的设置 -++++++++++++++ - -PaddePaddle通过编译时指定路径来实现引用各种BLAS/CUDA/Cudnn库。cmake编译时,首先在系统路径(/usr/lib\:/usr/local/lib)中搜索这几个库,同时也会读取相关路径变量来进行搜索。 通过使用 ``-D`` 命令可以设置,例如 - -.. code-block:: bash - - cmake .. 
-DMKL_ROOT=/opt/mkl/ -DCUDNN_ROOT=/opt/cudnnv5 - -注意:这几个编译选项的设置,只在第一次cmake的时候有效。如果之后想要重新设置,推荐清理整个编译目录(``rm -rf``)后,再指定。 diff --git a/doc/getstarted/build_and_install/cmake/cblas_settings.csv b/doc/getstarted/build_and_install/cmake/cblas_settings.csv deleted file mode 100644 index a6356baf16a0d3d2499e39d2055d8ee878dcaef2..0000000000000000000000000000000000000000 --- a/doc/getstarted/build_and_install/cmake/cblas_settings.csv +++ /dev/null @@ -1,5 +0,0 @@ -编译选项,描述,注意 -MKL_ROOT,MKL的路径,${MKL_ROOT}/include下需要包含mkl.h,${MKL_ROOT}/lib目录下需要包含mkl_core,mkl_sequential和mkl_intel_lp64三个库。 -ATLAS_ROOT,ATLAS的路径,${ATLAS_ROOT}/include下需要包含cblas.h,${ATLAS_ROOT}/lib下需要包含cblas和atlas两个库。 -OPENBLAS_ROOT,OpenBLAS的路径,${OPENBLAS_ROOT}/include下需要包含cblas.h,${OPENBLAS_ROOT}/lib下需要包含openblas库。 -REFERENCE_CBLAS_ROOT,REFERENCE BLAS的路径,${REFERENCE_CBLAS_ROOT}/include下需要包含cblas.h,${REFERENCE_CBLAS_ROOT}/lib下需要包含cblas库。 \ No newline at end of file diff --git a/doc/getstarted/build_and_install/cmake/compile_options.csv b/doc/getstarted/build_and_install/cmake/compile_options.csv deleted file mode 100644 index 463b825470579d0c3736a408b1e82dd33e6f8d42..0000000000000000000000000000000000000000 --- a/doc/getstarted/build_and_install/cmake/compile_options.csv +++ /dev/null @@ -1,12 +0,0 @@ -选项,说明,默认值 -WITH_GPU,是否支持GPU。,取决于是否寻找到CUDA工具链 -WITH_DOUBLE,是否使用双精度浮点数。,否 -WITH_DSO,是否运行时动态加载CUDA动态库,而非静态加载CUDA动态库。,是 -WITH_AVX,是否编译含有AVX指令集的PaddlePaddle二进制文件,是 -WITH_PYTHON,是否内嵌PYTHON解释器。方便今后的嵌入式移植工作。,是 -WITH_STYLE_CHECK,是否编译时进行代码风格检查,是 -WITH_RDMA,是否开启RDMA,否 -WITH_TIMER,是否开启计时功能。如果开启会导致运行略慢,打印的日志变多,但是方便调试和测Benchmark,否 -WITH_TESTING,是否开启单元测试,取决于是否寻找到GTEST -WITH_DOC,是否编译中英文文档,否 -WITH_SWIG_PY,是否编译PYTHON的SWIG接口,该接口可用于预测和定制化训练,取决于是否寻找到SWIG \ No newline at end of file diff --git a/doc/getstarted/build_and_install/docker_install_cn.rst b/doc/getstarted/build_and_install/docker_install_cn.rst index 0d34dec8e908c5e61001500725187a2233797f46..07933b2e0bbca809f6c4e90e7ff8f71d1b3304b2 100644 --- a/doc/getstarted/build_and_install/docker_install_cn.rst +++ b/doc/getstarted/build_and_install/docker_install_cn.rst @@ -1,222 +1,139 @@ -PaddlePaddle的Docker容器使用方式 +使用Docker安装运行PaddlePaddle ================================ -PaddlePaddle目前唯一官方支持的运行的方式是Docker容器。因为Docker能在所有主要操作系统(包括Linux,Mac OS X和Windows)上运行。 请注意,您需要更改 `Dockers设置 `_ 才能充分利用Mac OS X和Windows上的硬件资源。 +使用Docker安装和运行PaddlePaddle可以无需考虑依赖环境即可运行。并且也可以在Windows的docker中运行。 +您可以在 `Docker官网 `_ 获得基本的Docker安装和使用方法。 -Docker使用入门 ------------------------------- - -几个基础的概念帮助理解和使用Docker: +如果您在使用Windows,可以参考 +`这篇 `_ +教程,完成在Windows上安装和使用Docker。 -- *镜像*:一个Docker镜像是一个打包好的软件。它包含了这个软件本身和它所依赖的运行环境。PaddlePaddle的Docker镜像就包含了PaddlePaddle的Python库以及其依赖的多个Python库。这样我们可以直接在Docker中运行需要的程序而不需要安装后在执行。可以执行: +在了解Docker的基本使用方法之后,即可开始下面的步骤: - .. code-block:: bash +.. _docker_pull: - docker images +获取PaddlePaddle的Docker镜像 +------------------------------ - 来列出当前系统中的所有镜像,同样可以执行: +执行下面的命令获取最新的PaddlePaddle Docker镜像 .. code-block:: bash - - docker pull paddlepaddle/paddle:0.10.0 - 来下载Docker镜像,paddlepaddle/paddle是从官方镜像源Dockerhub.com下载的,推荐国内用户使用docker.paddlepaddle.org/paddle下载。 + docker pull paddlepaddle/paddle -- *容器*: 如果说一个Docker镜像就是一个程序,那容器就是这个程序运行时产生的“进程”。 - 实际上,一个容器就是一个操作系统的进程,但是是运行在独立的进程空间,文件系统以及网络之上。 - 可以执行: +对于国内用户,我们提供了加速访问的镜像源: .. 
code-block:: bash - docker run paddlepaddle/paddle:0.10.0 + docker pull docker.paddlepaddle.org/paddle - 来使用一个镜像启动一个容器。 - -- 默认情况下,Docker容器会运行在独立的文件系统空间之上,我们无法在Docker容器中 - 访问到主机上的文件。可以通过*挂载Volume*的方式,将主机上的文件或目录挂载到 - Docker容器中。下面的命令把当前目录挂载到了容器中的 /data 目录下,容器使用 - debian镜像,并且启动后执行 :code:`ls /data`。 +下载GPU版本的Docker镜像: .. code-block:: bash - docker run --rm -v $(pwd):/data debian ls /data - -PaddlePaddle发布的Docker镜像使用说明 ------------------------------- - -我们把PaddlePaddle的编译环境打包成一个镜像,称为开发镜像,里面涵盖了 -PaddlePaddle需要的所有编译工具。把编译出来的PaddlePaddle也打包成一个镜 -像,称为生产镜像,里面涵盖了PaddlePaddle运行所需的所有环境。每次 -PaddlePaddle发布新版本的时候都会发布对应版本的生产镜像以及开发镜像。运 -行镜像包括纯CPU版本和GPU版本以及其对应的非AVX版本。我们会在 -`dockerhub.com `_ -和国内镜像`docker.paddlepaddle.org` 提供最新 -的Docker镜像,可以在"tags"标签下找到最新的Paddle镜像版本。 - -**注意:为了方便在国内的开发者下载Docker镜像,我们提供了国内的镜像服务器供大家使用。如果您在国内,请把文档里命令中的paddlepaddle/paddle替换成docker.paddlepaddle.org/paddle。** - -1. 开发镜像::code:`paddlepaddle/paddle:0.10.0-dev` - - 这个镜像包含了Paddle相关的开发工具以及编译和运行环境。用户可以使用开发镜像代替配置本地环境,完成开发,编译,发布, - 文档编写等工作。由于不同的Paddle的版本可能需要不同的依赖和工具,所以如果需要自行配置开发环境需要考虑版本的因素。 - 开发镜像包含了以下工具: - - - gcc/clang - - nvcc - - Python - - sphinx - - woboq - - sshd - 很多开发者会使用远程的安装有GPU的服务器工作,用户可以使用ssh登录到这台服务器上并执行 :code:`docker exec`进入开发镜像并开始工作, - 也可以在开发镜像中启动一个SSHD服务,方便开发者直接登录到镜像中进行开发: - - 以交互容器方式运行开发镜像: - - .. code-block:: bash - - docker run -it --rm -v $(pwd):/paddle paddlepaddle/paddle:0.10.0-dev /bin/bash - - 或者,可以以后台进程方式运行容器: - - .. code-block:: bash - - docker run -d -p 2202:22 -p 8888:8888 -v $(pwd):/paddle paddlepaddle/paddle:0.10.0-dev /usr/sbin/sshd -D - - 然后用密码 :code:`root` SSH进入容器: - - .. code-block:: bash - - ssh -p 2202 root@localhost - - SSH方式的一个优点是我们可以从多个终端进入容器。比如,一个终端运行vi,另一个终端运行Python。另一个好处是我们可以把PaddlePaddle容器运行在远程服务器上,并在笔记本上通过SSH与其连接。 - -2. 生产镜像:根据CPU、GPU和非AVX区分了如下4个镜像: - - - GPU/AVX::code:`paddlepaddle/paddle:-gpu` - - GPU/no-AVX::code:`paddlepaddle/paddle:-gpu-noavx` - - CPU/AVX::code:`paddlepaddle/paddle:` - - CPU/no-AVX::code:`paddlepaddle/paddle:-noavx` - - 纯CPU镜像以及GPU镜像都会用到AVX指令集,但是2008年之前生产的旧电脑不支持AVX。以下指令能检查Linux电脑是否支持AVX: - - .. code-block:: bash - - if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi - - 如果输出是No,就需要选择使用no-AVX的镜像 - - **注:在0.10.0之后的版本,PaddlePaddle都可以自动判断硬件是否支持AVX,所以无需判断AVX即可使用** + docker pull paddlepaddle/paddle:latest-gpu + docker pull docker.paddlepaddle.org/paddle:latest-gpu - 以上方法在GPU镜像里也能用,只是请不要忘记提前在物理机上安装GPU最新驱动。 - 为了保证GPU驱动能够在镜像里面正常运行,我们推荐使用[nvidia-docker](https://github.com/NVIDIA/nvidia-docker)来运行镜像。 +选择下载使用不同的BLAS库的Docker镜像: - .. code-block:: bash - - nvidia-docker run -it --rm paddledev/paddle:0.10.0-gpu /bin/bash + .. code-block:: bash - 注意: 如果使用nvidia-docker存在问题,你也许可以尝试更老的方法,具体如下,但是我们并不推荐这种方法。: + # 默认是使用MKL的镜像 + docker pull paddlepaddle/paddle + # 使用OpenBLAS的镜像 + docker pull paddlepaddle/paddle:latest-openblas - .. code-block:: bash +下载指定版本的Docker镜像,可以从 `DockerHub网站 `_ 获取可选的tag,并执行下面的命令: - export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')" - export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}') - docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:0.10.0-gpu + .. code-block:: bash -3. 运行以及发布您的AI程序 + docker pull paddlepaddle/paddle:[tag] + # 比如: + docker pull docker.paddlepaddle.org/paddle:0.10.0-gpu - 假设您已经完成了一个AI训练的python程序 :code:`a.py`,这个程序是您在开发机上使用开发镜像完成开发。此时您可以运行这个命令在开发机上进行测试运行: +.. _docker_run: - .. 
code-block:: bash +在Docker中执行PaddlePaddle训练程序 +------------------------------ - docker run -it -v $PWD:/work paddle /work/a.py +假设您已经在当前目录(比如在/home/work)编写了一个PaddlePaddle的程序 :code:`train.py` (可以参考 +`PaddlePaddleBook `_ +编写),就可以使用下面的命令开始执行训练: - 如果要使用GPU,请运行: + .. code-block:: bash - .. code-block:: bash + cd /home/work + docker run -it -v $PWD:/work paddlepaddle/paddle /work/train.py + +上述命令中, :code:`-it` 参数说明容器已交互式运行; :code:`-v $PWD:/work` +指定将当前路径(Linux中$PWD变量会展开为当前路径的绝对路径)挂载到容器内部的 :code:`/work` +目录; :code:`paddlepaddle/paddle` 指定需要使用的容器; 最后 :code:`/work/train.py` +为容器内执行的命令,即运行训练程序。 - nvidia-docker run -it -v $PWD:/work paddle /work/a.py +当然,您也可以进入到Docker容器中,以交互式的方式执行或调试您的代码: + .. code-block:: bash + docker run -it -v $PWD:/work paddlepaddle/paddle /bin/bash + cd /work + python train.py - 这里`a.py`包含的所有依赖假设都可以在Paddle的运行容器中。如果需要包含更多的依赖、或者需要发布您的应用的镜像,可以编写`Dockerfile`使用`FROM paddledev/paddle:0.10.0` - 创建和发布自己的AI程序镜像。 +**注:PaddlePaddle Docker镜像为了减小体积,默认没有安装vim,您可以在容器中执行** :code:`apt-get install -y vim` **安装后,在容器中编辑代码。** -运行PaddlePaddle Book ---------------------- +.. _docker_run_book: -Jupyter Notebook是一个开源的web程序,大家可以通过它制作和分享带有代码、公式、图表、文字的交互式文档。用户可以通过网页浏览文档。 +使用Docker启动PaddlePaddle Book教程 +------------------------------ +使用Docker可以快速在本地启动一个包含了PaddlePaddle官方Book教程的Jupyter Notebook,可以通过网页浏览。 PaddlePaddle Book是为用户和开发者制作的一个交互式的Jupyter Notebook。 如果您想要更深入了解deep learning,PaddlePaddle Book一定是您最好的选择。 +大家可以通过它阅读教程,或者制作和分享带有代码、公式、图表、文字的交互式文档。 我们提供可以直接运行PaddlePaddle Book的Docker镜像,直接运行: -.. code-block:: bash + .. code-block:: bash - docker run -p 8888:8888 paddlepaddle/book + docker run -p 8888:8888 paddlepaddle/book 然后在浏览器中输入以下网址: -.. code-block:: text + .. code-block:: text - http://localhost:8888/ + http://localhost:8888/ 就这么简单,享受您的旅程! -通过Docker容器开发PaddlePaddle ------------------------------- - -开发人员可以在Docker开发镜像中开发PaddlePaddle。这样开发人员可以以一致的方式在不同的平台上工作 - Linux,Mac OS X和Windows。 +.. _docker_run_gpu: -1. 制作PaddlePaddle开发镜像 - - PaddlePaddle每次发布新版本都会发布对应的开发镜像供开发者直接使用。这里介绍如生成造这个开发镜像。 - 生成Docker镜像的方式有两个,一个是直接把一个容器转换成镜像,另一个是创建Dockerfile并运行docker build指令按照Dockerfile生成镜像。第一个方法的好处是简单快捷,适合自己实验,可以快速迭代。第二个方法的好处是Dockerfile可以把整个生成流程描述很清楚,其他人很容易看懂镜像生成过程,持续集成系统也可以简单地复现这个过程。我们采用第二个方法。Dockerfile位于PaddlePaddle repo的根目录。生成生产镜像只需要运行: - - .. code-block:: bash - - git clone https://github.com/PaddlePaddle/Paddle.git - cd Paddle - docker build -t paddle:dev . - - docker build这个命令的-t指定了生成的镜像的名字,这里我们用paddle:dev。到此,PaddlePaddle开发镜像就被构建完毕了。 +使用Docker执行GPU训练 +------------------------------ -2. 制作PaddlePaddle生产镜像 +为了保证GPU驱动能够在镜像里面正常运行,我们推荐使用 +`nvidia-docker `_ 来运行镜像。 +请不要忘记提前在物理机上安装GPU最新驱动。 - 生产镜像的生成分为两步,第一步是运行: + .. code-block:: bash - .. code-block:: bash - - docker run -v $(pwd):/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=OFF" -e "WITH_TEST=ON" paddle:dev + nvidia-docker run -it -v $PWD:/work paddledev/paddle:latest-gpu /bin/bash - 以上命令会编译PaddlePaddle,生成运行程序,以及生成创建生产镜像的Dockerfile。所有生成的的文件都在build目录下。“WITH_GPU”控制生成的生产镜像是否支持GPU,“WITH_AVX”控制生成的生产镜像是否支持AVX,”WITH_TEST“控制是否生成单元测试。 +**注: 如果没有安装nvidia-docker,可以尝试以下的方法,将CUDA库和Linux设备挂载到Docker容器内:** - 第二步是运行: + .. code-block:: bash - .. code-block:: bash - - docker build -t paddle:prod -f build/Dockerfile ./build + export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')" + export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}') + docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:latest-gpu - 以上命令会按照生成的Dockerfile把生成的程序拷贝到生产镜像中并做相应的配置,最终生成名为paddle:prod的生产镜像。 +**关于AVX:** -3. 
运行单元测试 +AVX是一种CPU指令集,可以加速PaddlePaddle的计算。最新的PaddlePaddle Docker镜像默认 +是开启AVX编译的,所以,如果您的电脑不支持AVX,需要单独 +`编译 <./build_from_source_cn.rst>`_ PaddlePaddle为no-avx版本。 - 运行以下指令: +以下指令能检查Linux电脑是否支持AVX: .. code-block:: bash - - docker run -it -v $(pwd):/paddle paddle:dev bash -c "cd /paddle/build && ctest" - -文档 ----- - -Paddle的Docker开发镜像带有一个通过 `woboq code browser -`_ 生成的HTML版本的C++源代码,便于用户浏览C++源码。 -只要在Docker里启动PaddlePaddle的时候给它一个名字,就可以再运行另一个Nginx Docker镜像来服务HTML代码: - -.. code-block:: bash - - docker run -d --name paddle-cpu-doc paddle:0.10.0-dev - docker run -d --volumes-from paddle-cpu-doc -p 8088:80 nginx + if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi -接着我们就能够打开浏览器在 http://localhost:8088/paddle/ 浏览代码。 +如果输出是No,就需要选择使用no-AVX的镜像 diff --git a/doc/getstarted/build_and_install/docker_install_en.rst b/doc/getstarted/build_and_install/docker_install_en.rst index 94860240f6a4a9bed8a865684a8a79960489280e..9b977c9c72e36b4b47cbf56ae848ab83d9895783 100644 --- a/doc/getstarted/build_and_install/docker_install_en.rst +++ b/doc/getstarted/build_and_install/docker_install_en.rst @@ -1,270 +1,146 @@ PaddlePaddle in Docker Containers ================================= -Docker container is currently the only officially-supported way to -running PaddlePaddle. This is reasonable as Docker now runs on all -major operating systems including Linux, Mac OS X, and Windows. -Please be aware that you will need to change `Dockers settings -`_ to make full use -of your hardware resource on Mac OS X and Windows. +Run PaddlePaddle in Docker container so that you don't need to care about +runtime dependencies, also you can run under Windows system. You can get +tutorials at `here `_ . -Working With Docker -------------------- +If you are using Windows, please refer to +`this `_ +tutorial to start running docker under windows. -Docker is simple as long as we understand a few basic concepts: +After you've read above tutorials you may proceed the following steps. -- *image*: A Docker image is a pack of software. It could contain one or more programs and all their dependencies. For example, the PaddlePaddle's Docker image includes pre-built PaddlePaddle and Python and many Python packages. We can run a Docker image directly, other than installing all these software. We can type +.. _docker_pull: - .. code-block:: bash - - docker images +Pull PaddlePaddle Docker Image +------------------------------ - to list all images in the system. We can also run +Run the following command to download the latest Docker images: .. code-block:: bash - - docker pull paddlepaddle/paddle:0.10.0 - to download a Docker image, paddlepaddle/paddle in this example, - from Dockerhub.com. + docker pull paddlepaddle/paddle -- *container*: considering a Docker image a program, a container is a - "process" that runs the image. Indeed, a container is exactly an - operating system process, but with a virtualized filesystem, network - port space, and other virtualized environment. We can type +For users in China, we provide a faster mirror: .. code-block:: bash - docker run paddlepaddle/paddle:0.10.0 + docker pull docker.paddlepaddle.org/paddle - to start a container to run a Docker image, paddlepaddle/paddle in this example. - -- By default docker container have an isolated file system namespace, - we can not see the files in the host file system. By using *volume*, - mounted files in host will be visible inside docker container. 
- Following command will mount current dirctory into /data inside - docker container, run docker container from debian image with - command :code:`ls /data`. +Download GPU version images: .. code-block:: bash - docker run --rm -v $(pwd):/data debian ls /data - -Usage of CPU-only and GPU Images ----------------------------------- - -We package PaddlePaddle's compile environment into a Docker image, -called the develop image, it contains all compiling tools that -PaddlePaddle needs. We package compiled PaddlePaddle program into a -Docker image as well, called the production image, it contains all -runtime environment that running PaddlePaddle needs. For each version -of PaddlePaddle, we release both of them. Production image includes -CPU-only version and a CUDA GPU version and their no-AVX versions. - -We put the docker images on `dockerhub.com -`_. You can find the -latest versions under "tags" tab at dockerhub.com. - -** NOTE: If you are in China, you can use our Docker image registry mirror to speed up the download process. To use it, please replace all paddlepaddle/paddle in the commands to docker.paddlepaddle.org/paddle.** - - -1. development image :code:`paddlepaddle/paddle:-dev` - - This image has packed related develop tools and runtime - environment. Users and developers can use this image instead of - their own local computer to accomplish development, build, - releasing, document writing etc. While different version of paddle - may depends on different version of libraries and tools, if you - want to setup a local environment, you must pay attention to the - versions. The development image contains: - - - gcc/clang - - nvcc - - Python - - sphinx - - woboq - - sshd - - Many developers use servers with GPUs, they can use ssh to login to - the server and run :code:`docker exec` to enter the docker - container and start their work. Also they can start a development - docker image with SSHD service, so they can login to the container - and start work. - -2. Production images, this image might have multiple variants: - - - GPU/AVX::code:`paddlepaddle/paddle:-gpu` - - GPU/no-AVX::code:`paddlepaddle/paddle:-gpu-noavx` - - CPU/AVX::code:`paddlepaddle/paddle:` - - CPU/no-AVX::code:`paddlepaddle/paddle:-noavx` - - Please be aware that the CPU-only and the GPU images both use the - AVX instruction set, but old computers produced before 2008 do not - support AVX. The following command checks if your Linux computer - supports AVX: - - .. code-block:: bash - - if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi - - **NOTE:versions after 0.10.0 will automatically detect system AVX support, so manual detect is not needed in this case.** - To run the CPU-only image as an interactive container: - - .. code-block:: bash - - docker run -it --rm paddlepaddle/paddle:0.10.0 /bin/bash - - Above method work with the GPU image too -- the recommended way is - using `nvidia-docker `_. - - Please install nvidia-docker first following this `tutorial - `_. - - Now you can run a GPU image: - - .. code-block:: bash - - nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0-gpu /bin/bash - - -Train Model Using Python API ----------------------------- - -Our official docker image provides a runtime for PaddlePaddle -programs. The typical workflow will be as follows: - -Create a directory as workspace: - -.. code-block:: bash - - mkdir ~/workspace - -Edit a PaddlePaddle python program using your favourite editor - -.. 
code-block:: bash - - emacs ~/workspace/example.py - -Run the program using docker: - -.. code-block:: bash - - docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0 python /workspace/example.py - -Or if you are using GPU for training: + docker pull paddlepaddle/paddle:latest-gpu + docker pull docker.paddlepaddle.org/paddle:latest-gpu -.. code-block:: bash +Choose between different BLAS version: - nvidia-docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0-gpu python /workspace/example.py - -Above commands will start a docker container by running :code:`python -/workspace/example.py`. It will stop once :code:`python -/workspace/example.py` finishes. - -Another way is to tell docker to start a :code:`/bin/bash` session and -run PaddlePaddle program interactively: - -.. code-block:: bash - - docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0 /bin/bash - # now we are inside docker container - cd /workspace - python example.py - -Running with GPU is identical: - -.. code-block:: bash - - nvidia-docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0-gpu /bin/bash - # now we are inside docker container - cd /workspace - python example.py - - -Develop PaddlePaddle or Train Model Using C++ API ---------------------------------------------------- - -We will be using PaddlePaddle development image since it contains all -compiling tools and dependencies. + .. code-block:: bash -1. Build PaddlePaddle develop image + # image using MKL by default + docker pull paddlepaddle/paddle + # image using OpenBLAS + docker pull paddlepaddle/paddle:latest-openblas - Use following command to build PaddlePaddle develop image: - .. code-block:: bash +If you want to use legacy versions, choose a tag from +`DockerHub `_ +and run: - git clone https://github.com/PaddlePaddle/Paddle.git && cd Paddle - docker build -t paddle:dev . - -2. Build PaddlePaddle production image + .. code-block:: bash - There are two steps for building production image, the first step is to run: + docker pull paddlepaddle/paddle:[tag] + # i.e. + docker pull docker.paddlepaddle.org/paddle:0.10.0-gpu - .. code-block:: bash +.. _docker_run: - docker run -v $(pwd):/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=OFF" -e "WITH_TEST=ON" paddle:dev +Launch your training program in Docker +------------------------------ - The above command will compile PaddlePaddle and create a Dockerfile for building production image. All the generated files are in the build directory. "WITH_GPU" controls if the generated production image supports GPU. "WITH_AVX" controls if the generated production image supports AVX. "WITH_TEST" controls if the unit test will be generated. +Assume that you have already written a PaddlePaddle program +named :code:`train.py` under directory :code:`/home/work` (refer to +`PaddlePaddleBook `_ +for more samples), then run the following command: - The second step is to run: + .. code-block:: bash - .. code-block:: bash + cd /home/work + docker run -it -v $PWD:/work paddlepaddle/paddle /work/train.py - docker build -t paddle:prod -f build/Dockerfile ./build +In the above command, :code:`-it` means run the container interactively; +:code:`-v $PWD:/work` means mount the current directory ($PWD will expand +to current absolute path in Linux) under :code:`/work` in the container. +:code:`paddlepaddle/paddle` to specify image to use; finnally +:code:`/work/train.py` is the command to run inside docker. 
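+
+As a further illustrative example (the dataset path here is hypothetical), a second :code:`-v` flag can mount a separate data directory, so the container reads training data from the host and writes model output back to it:
+
+  .. code-block:: bash
+
+     docker run -it -v $PWD:/work -v /home/work/data:/data paddlepaddle/paddle /work/train.py
+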
- The above command will generate the production image by copying the compiled PaddlePaddle program into the image. +Also, you can go into the container shell, run or debug your code +interactively: -3. Run unit test + .. code-block:: bash + docker run -it -v $PWD:/work paddlepaddle/paddle /bin/bash + cd /work + python train.py - Following command will run unit test: **NOTE: We did not install vim in the default docker image to reduce the image size, you can run** :code:`apt-get install -y vim` **to install it if you need to edit python files.** - .. code-block:: bash - - docker run -it -v $(pwd):/paddle paddle:dev bash -c "cd /paddle/build && ctest" +.. _docker_run_book:
PaddlePaddle Book ------------------ -The Jupyter Notebook is an open-source web application that allows -you to create and share documents that contain live code, equations, -visualizations and explanatory text in a single browser. - -PaddlePaddle Book is an interactive Jupyter Notebook for users and developers. -We already exposed port 8888 for this book. If you want to +You can create a container serving PaddlePaddle Book using Jupyter Notebook in +one minute using Docker. PaddlePaddle Book is an interactive Jupyter Notebook +for users and developers. If you want to dig deeper into deep learning, PaddlePaddle Book definitely is your best choice. We provide a packaged book image; simply issue the command: .. code-block:: bash - docker run -p 8888:8888 paddlepaddle/book + docker run -p 8888:8888 paddlepaddle/book Then, you can copy and paste the address into your local browser: .. code-block:: text - http://localhost:8888/ + http://localhost:8888/ That's all. Enjoy your journey! +.. _docker_run_gpu:
-Documentation ------------- +Train with GPU using Docker +------------------------------ -Paddle Docker images include an HTML version of C++ source code -generated using `woboq code browser -`_. This makes it easy -for users to browse and understand the C++ source code. +We recommend using +`nvidia-docker `_ +to run GPU training jobs. Please ensure you have the latest +GPU driver installed before moving on. -As long as we give the Paddle Docker container a name, we can run an -additional Nginx Docker container to serve the volume from the Paddle -container: .. code-block:: bash nvidia-docker run -it -v $PWD:/work paddledev/paddle:latest-gpu /bin/bash -.. code-block:: bash **NOTE: If you don't have nvidia-docker installed, try the following method to mount CUDA libs and devices into the container.** - docker run -d --name paddle-cpu-doc paddle: - docker run -d --volumes-from paddle-cpu-doc -p 8088:80 nginx .. code-block:: bash -Then we can direct our Web browser to the HTML version of source code -at http://localhost:8088/paddle/ export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')" export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}') docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:latest-gpu +
+**About AVX:** +
+AVX is a CPU instruction set that can accelerate PaddlePaddle's calculations. +The latest PaddlePaddle Docker image turns AVX on by default, so if your +computer doesn't support AVX, you'll probably need to +`build <./build_from_source_en.rst>`_ with :code:`WITH_AVX=OFF`. +
+The following command will tell you whether your computer supports AVX. +
+ .. 
code-block:: bash + + if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi diff --git a/doc/getstarted/build_and_install/index_cn.rst b/doc/getstarted/build_and_install/index_cn.rst index dd9923697ab85825557aa89a08870bece7c76673..88c5142ddee994ed0c0dc520195311e97f5a549e 100644 --- a/doc/getstarted/build_and_install/index_cn.rst +++ b/doc/getstarted/build_and_install/index_cn.rst @@ -6,12 +6,13 @@ 安装流程 ++++++++ -PaddlePaddle提供Docker镜像来部署环境。 +PaddlePaddle提供pip和Docker的安装方式: .. toctree:: :maxdepth: 1 - - docker_install_cn.rst + + pip_install_cn.rst + docker_install_cn.rst 编译流程 @@ -19,9 +20,14 @@ PaddlePaddle提供Docker镜像来部署环境。 .. warning:: - 编译流程主要推荐高级用户查看,普通用户请走安装流程。 + 建议直接使用上述安装流程,方便快速安装。只有在遇到需要独立定制的二进制时才需要编译。 .. toctree:: :maxdepth: 1 - cmake/build_from_source_cn.rst + build_from_source_cn.rst + +常见问题解答 +++++++++++ + +`常见问题解答 `_ diff --git a/doc/getstarted/build_and_install/index_en.rst b/doc/getstarted/build_and_install/index_en.rst index 8a53588e0439df8f4d5fd529b7a20262c67d4e58..c8b60d03578ba6a9b73134ec53b440d057e36079 100644 --- a/doc/getstarted/build_and_install/index_en.rst +++ b/doc/getstarted/build_and_install/index_en.rst @@ -1,22 +1,33 @@ Install and Build ================= -Install PaddlePaddle ----------------------- +.. _install_steps: -.. toctree:: - :maxdepth: 1 +Install Steps +++++++++ + +You can choose either pip or Docker to complete your install: + +.. toctree:: + :maxdepth: 1 + + pip_install_en.rst + docker_install_en.rst - docker_install_en.rst Build from Source ----------------- .. warning:: - Please use :code:`docker` image to install paddle. The building guide is used for hacking or contributing PaddlePaddle source code. + We recommend to directly install via above installation steps, you'll only need to build PaddlePaddle from source when you need a modifed binary. .. toctree:: :maxdepth: 1 build_from_source_en.md + +FAQ +++++++++++ + +`FAQ `_ diff --git a/doc/getstarted/build_and_install/paddleci.png b/doc/getstarted/build_and_install/paddleci.png new file mode 100644 index 0000000000000000000000000000000000000000..16087ce059aa3c07ce8c927d983eb86351915825 Binary files /dev/null and b/doc/getstarted/build_and_install/paddleci.png differ diff --git a/doc/getstarted/build_and_install/pip_install_cn.rst b/doc/getstarted/build_and_install/pip_install_cn.rst new file mode 100644 index 0000000000000000000000000000000000000000..41312da48c055826186a560ef9653653e45d1047 --- /dev/null +++ b/doc/getstarted/build_and_install/pip_install_cn.rst @@ -0,0 +1,86 @@ +使用pip安装PaddlePaddle +================================ + +PaddlePaddle可以使用常用的Python包管理工具 +`pip `_ +完成安装,并可以在大多数主流的Linux操作系统以及MacOS上执行。 + +.. _pip_install: + +使用pip安装 +------------------------------ + + +执行下面的命令即可在当前机器上安装PaddlePaddle的运行时环境,并自动下载安装依赖软件。 + + .. code-block:: bash + + pip install paddlepaddle + + +如果需要安装支持GPU的版本,需要执行: + + .. code-block:: bash + + pip install paddlepaddle-gpu + +如果需要获取并安装最新的(开发分支)PaddlePaddle,可以从我们的CI系统中下载最新的whl安装包和c-api开发包并安装, +您可以从下面的表格中找到需要的版本: + +如果在点击下面链接时出现如下登陆界面,点击“Log in as guest”即可开始下载: + +.. image:: paddleci.png + :scale: 50 % + :align: center + +.. 
csv-table:: 各个版本最新的whl包 + :header: "版本说明", "cp27-cp27mu", "cp27-cp27mu", "C-API" + :widths: 1, 3, 3, 3 + + "cpu_avx_mkl", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "`paddle.tgz `_" + "cpu_avx_openblas", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "暂无" + "cuda7.5_cudnn5_avx_mkl", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "`paddle.tgz `_" + "cuda8.0_cudnn5_avx_mkl", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "`paddle.tgz `_" + "cuda8.0_cudnn7_avx_mkl", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "`paddle.tgz `_" + +.. _pip_dependency: + +运行环境依赖 +------------------------------ + +PaddlePaddle安装包由于不仅仅包含.py程序,而且包含了C++编写的部分,所以我们确保发布的二进制包可以支持主流的Linux操作系统,比如CentOS 6以上,Ubuntu 14.04以上,MacOS 10.12以上。 + +PaddlePaddle发布的安装包会尽量对齐 `manylinux1 `_ 标准,通常使用CentOS 5作为编译环境。但由于CUDA库通常需要CentOS 6以上,而且CentOS 5即将停止维护,所以我们默认使用CentOS 6作为标准编译环境。 + +.. csv-table:: PaddlePaddle环境依赖 + :header: "依赖", "版本", "说明" + :widths: 10, 15, 30 + + "操作系统", "Linux, MacOS", "CentOS 6以上,Ubuntu 14.04以上,MacOS 10.12以上" + "Python", "2.7.x", "暂时不支持Python3" + "libc.so", "GLIBC_2.7", "glibc至少包含GLIBC_2.7以上的符号" + "libstdc++.so", "GLIBCXX_3.4.11, CXXABI_1.3.3", "至少包含GLIBCXX_3.4.11, CXXABI_1.3.3以上的符号" + "libgcc_s.so", "GCC_3.3", "至少包含GCC_3.3以上的符号" + +.. _pip_faq: + +安装常见问题和解决方法 +------------------------------ + +- paddlepaddle*.whl is not a supported wheel on this platform. + + 出现这个问题的主要原因是,没有找到和当前系统匹配的paddlepaddle安装包。请检查Python版本是否为2.7系列。另外最新的pip官方源中的安装包默认是manylinux1标准,需要使用最新的pip (>9.0.0) 才可以安装。可以使用下面的命令更新您的pip: + + .. code-block:: bash + + pip install --upgrade pip + + 如果仍然存在问题,可以执行: + + .. code-block:: bash + + python -c "import pip; print(pip.pep425tags.get_supported())" + + 获取当前系统支持的安装包格式,并检查和需安装的包是否匹配。pypi安装包可以在 `这个 `_ 链接中找到。 + + 如果系统支持的是 linux_x86_64 而安装包是 manylinux1_x86_64 ,需要升级pip版本到最新; 如果系统支持 manylinux1_x86_64 而安装包(本地)是 linux_x86_64 ,可以重命名这个whl包为 manylinux1_x86_64 再安装。 \ No newline at end of file diff --git a/doc/getstarted/build_and_install/pip_install_en.rst b/doc/getstarted/build_and_install/pip_install_en.rst new file mode 100644 index 0000000000000000000000000000000000000000..4f295e14baa1465a93b8eef1b3f3b6b47eeea905 --- /dev/null +++ b/doc/getstarted/build_and_install/pip_install_en.rst @@ -0,0 +1,104 @@ +Install PaddlePaddle Using pip +================================ + +You can use current widely used Python package management +tool `pip `_ +to install PaddlePaddle. This method can be used in +most of current Linux systems or MacOS. + +.. _pip_install: + +Install Using pip +------------------------------ + +Run the following command to install PaddlePaddle on the current +machine, it will also download requirements. + + .. code-block:: bash + + pip install paddlepaddle + + +If you wish to install GPU version, just run: + + .. code-block:: bash + + pip install paddlepaddle-gpu + +If you wish to install the latest develop branch PaddlePaddle, +you can download the latest whl package from our CI system. Access +the below links, log in as guest, then click at the "Artifact" +tab, you'll find the download link of whl packages. + +If the links below shows up the login form, just click "Log in as guest" to start the download: + +.. image:: paddleci.png + :scale: 50 % + :align: center + +.. 
csv-table:: whl package of each version + :header: "version", "cp27-cp27mu", "cp27-cp27m", "C-API" + :widths: 1, 3, 3, 3 + + "cpu_avx_mkl", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "`paddle.tgz `_" + "cpu_avx_openblas", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "Not Available" + "cuda7.5_cudnn5_avx_mkl", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "`paddle.tgz `_" + "cuda8.0_cudnn5_avx_mkl", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "`paddle.tgz `_" + "cuda8.0_cudnn7_avx_mkl", "`paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl `_", "`paddlepaddle-0.10.0-cp27-cp27m-linux_x86_64.whl `_", "`paddle.tgz `_" + +.. _pip_dependency: + +Runtime Dependency +------------------------------ + +PaddlePaddle installation packages (whl) do not only contain .py files, +but also binaries built from C++ code. We ensure that PaddlePaddle can +run on current mainline Linux distributions such as CentOS 6 and Ubuntu 14.04, +as well as on MacOS 10.12. + +PaddlePaddle whl packages try to satisfy the +`manylinux1 `_ +standard, which uses CentOS 5 as the default build environment. But CUDA libraries +require at least CentOS 6, and CentOS 5 is about to reach end of life, +so we use CentOS 6 as the default build environment. + +.. csv-table:: PaddlePaddle Runtime Deps + :header: "Dependency", "version", "description" + :widths: 10, 15, 30 + + "OS", "Linux, MacOS", "CentOS 6 or later, Ubuntu 14.04 or later, MacOS 10.12 or later" + "Python", "2.7.x", "Currently Python3 is not supported" + "libc.so", "GLIBC_2.7", "glibc must include at least the GLIBC_2.7 symbols" + "libstdc++.so", "GLIBCXX_3.4.11, CXXABI_1.3.3", "Must include at least the GLIBCXX_3.4.11 and CXXABI_1.3.3 symbols" + "libgcc_s.so", "GCC_3.3", "Must include at least the GCC_3.3 symbols" + +.. _pip_faq: + +FAQ +------------------------------ + +- paddlepaddle*.whl is not a supported wheel on this platform. + + The main cause of this issue is that your current platform is + not supported. Please check that you are using the Python 2.7 series. + Besides, pypi only supports the manylinux1 standard, so you'll need to + upgrade pip to a version above 9.0.0. Run the command below to update pip: + + .. code-block:: bash + + pip install --upgrade pip + + If the problem still exists, run the following command: + + .. code-block:: bash + + python -c "import pip; print(pip.pep425tags.get_supported())" + + This prints the package tags supported on the current system; check whether + they match the file name of the whl package. You can find the default whl + packages `here `_. + + If your system supports linux_x86_64 but the whl package is manylinux1_x86_64, + you'll need to update pip to the latest version; if your system supports + manylinux1_x86_64 but the whl package is linux_x86_64, you can rename the + file to the manylinux1_x86_64 suffix and then install it. diff --git a/doc/getstarted/index_cn.rst b/doc/getstarted/index_cn.rst index aa418c657a4ba16cce61c030066f4d3e14e891cc..a9087be6f350c5656cabb0c64ba0f200d1c666cc 100644 --- a/doc/getstarted/index_cn.rst +++ b/doc/getstarted/index_cn.rst @@ -1,10 +1,61 @@ 新手入门 ============ +.. _quick_install: + +快速安装 +++++++++ + +PaddlePaddle支持使用pip快速安装,目前支持CentOS 6以上、Ubuntu 14.04以上以及MacOS 10.12,且需要已安装Python 2.7。 +执行下面的命令完成快速安装: + + .. code-block:: bash + + pip install paddlepaddle + +如果需要安装支持GPU的版本,需要执行: + + .. 
code-block:: bash + + pip install paddlepaddle-gpu + +更详细的安装和编译方法参考: + .. toctree:: :maxdepth: 1 build_and_install/index_cn.rst - concepts/use_concepts_cn.rst -- `深度学习入门课程 `_ +.. _quick_start: + +快速开始 +++++++++ + +创建一个 housing.py 并粘贴此Python代码: + + .. code-block:: python + + import paddle.v2 as paddle + + # Initialize PaddlePaddle. + paddle.init(use_gpu=False, trainer_count=1) + + # Configure the neural network. + x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(13)) + y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear()) + + # Infer using provided test data. + probs = paddle.infer( + output_layer=y_predict, + parameters=paddle.dataset.uci_housing.model(), + input=[item for item in paddle.dataset.uci_housing.test()()]) + + for i in xrange(len(probs)): + print 'Predicted price: ${:,.2f}'.format(probs[i][0] * 1000) + +执行 :code:`python housing.py` ,它应该会打印出测试房价数据的预测结果。 + +.. toctree:: + :maxdepth: 1 + + concepts/use_concepts_cn.rst diff --git a/doc/getstarted/index_en.rst b/doc/getstarted/index_en.rst index be3253e3d41b99a2b696e2c5ef6463ed49680d69..d14e3f5c0cc90792fce9cb82e65da482c44dc433 100644 --- a/doc/getstarted/index_en.rst +++ b/doc/getstarted/index_en.rst @@ -1,9 +1,61 @@ GET STARTED ============ +.. _quick_install: + +Quick Install +---------------------- + +You can use pip to install PaddlePaddle with a single command. It supports +CentOS 6 and above, Ubuntu 14.04 and above, and MacOS 10.12, with Python 2.7 installed. +Simply run the following command to install: + + .. code-block:: bash + + pip install paddlepaddle + +If you need to install the GPU version, run: + + .. code-block:: bash + + pip install paddlepaddle-gpu + +For more details about installation and build: + .. toctree:: :maxdepth: 1 build_and_install/index_en.rst - +- `Deep Learning 101 `_ + +.. _quick_start: + +Quick Start ++++++++++++ + +Create a new file called housing.py, and paste this Python +code: + + + .. code-block:: python + + import paddle.v2 as paddle + + # Initialize PaddlePaddle. + paddle.init(use_gpu=False, trainer_count=1) + + # Configure the neural network. + x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(13)) + y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear()) + + # Infer using provided test data. + probs = paddle.infer( + output_layer=y_predict, + parameters=paddle.dataset.uci_housing.model(), + input=[item for item in paddle.dataset.uci_housing.test()()]) + + for i in xrange(len(probs)): + print 'Predicted price: ${:,.2f}'.format(probs[i][0] * 1000) + +Run :code:`python housing.py` and voila! It should print out a list of predictions +for the test housing data. 
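+
+The snippet above runs inference on CPU. If you installed the GPU package
+(:code:`paddlepaddle-gpu`), the same script can run on GPU; a minimal sketch,
+assuming a working CUDA setup, changes only the init call:
+
+  .. code-block:: python
+
+      # Hypothetical GPU variant of the init call in housing.py above.
+      paddle.init(use_gpu=True, trainer_count=1)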
diff --git a/doc/howto/optimization/cpu_profiling.md b/doc/howto/optimization/cpu_profiling.md new file mode 100644 index 0000000000000000000000000000000000000000..32d89a7c183d57e0e69039dfb2c78703d9866f7c --- /dev/null +++ b/doc/howto/optimization/cpu_profiling.md @@ -0,0 +1,163 @@ +此教程会介绍如何使用Python的cProfile包,与Python库yep,google perftools来运行性能分析(Profiling)与调优。 + +运行性能分析可以让开发人员科学的,有条不紊的对程序进行性能优化。性能分析是性能调优的基础。因为在程序实际运行中,真正的瓶颈可能和程序员开发过程中想象的瓶颈相去甚远。 + +性能优化的步骤,通常是循环重复若干次『性能分析 --> 寻找瓶颈 ---> 调优瓶颈 --> 性能分析确认调优效果』。其中性能分析是性能调优的至关重要的量化指标。 + +Paddle提供了Python语言绑定。用户使用Python进行神经网络编程,训练,测试。Python解释器通过`pybind`和`swig`调用Paddle的动态链接库,进而调用Paddle C++部分的代码。所以Paddle的性能分析与调优分为两个部分: + +* Python代码的性能分析 +* Python与C++混合代码的性能分析 + + +## Python代码的性能分析 + +### 生成性能分析文件 + +Python标准库中提供了性能分析的工具包,[cProfile](https://docs.python.org/2/library/profile.html)。生成Python性能分析的命令如下: + +```bash +python -m cProfile -o profile.out main.py +``` + +其中`-o`标识了一个输出的文件名,用来存储本次性能分析的结果。如果不指定这个文件,`cProfile`会打印一些统计信息到`stdout`。这不方便我们进行后期处理(进行`sort`, `split`, `cut`等等)。 + +### 查看性能分析文件 + +当main.py运行完毕后,性能分析结果文件`profile.out`就生成出来了。我们可以使用[cprofilev](https://github.com/ymichael/cprofilev)来查看性能分析结果。`cprofilev`是一个Python的第三方库。使用它会开启一个HTTP服务,将性能分析结果以网页的形式展示出来。 + +使用`pip install cprofilev`安装`cprofilev`工具。安装完成后,使用如下命令开启HTTP服务 + +```bash +cprofilev -a 0.0.0.0 -p 3214 -f profile.out main.py +``` + +其中`-a`标识HTTP服务绑定的IP。使用`0.0.0.0`允许外网访问这个HTTP服务。`-p`标识HTTP服务的端口。`-f`标识性能分析的结果文件。`main.py`标识被性能分析的源文件。 + +访问对应网址,即可显示性能分析的结果。性能分析结果格式如下: + +```text + ncalls tottime percall cumtime percall filename:lineno(function) + 1 0.284 0.284 29.514 29.514 main.py:1() + 4696 0.128 0.000 15.748 0.003 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/executor.py:20(run) + 4696 12.040 0.003 12.040 0.003 {built-in method run} + 1 0.144 0.144 6.534 6.534 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/__init__.py:14() +``` + +每一列的含义是: + +| 列名 | 含义 | +| --- | --- | +| ncalls | 函数的调用次数 | +| tottime | 函数实际使用的总时间。该时间去除掉本函数调用其他函数的时间 | +| percall | tottime的每次调用平均时间 | +| cumtime | 函数总时间。包含这个函数调用其他函数的时间 | +| percall | cumtime的每次调用平均时间 | +| filename:lineno(function) | 文件名, 行号,函数名 | + + +### 寻找性能瓶颈 + +通常`tottime`和`cumtime`是寻找瓶颈的关键指标。这两个指标代表了某一个函数真实的运行时间。 + +将性能分析结果按照tottime排序,效果如下: + +```text + 4696 12.040 0.003 12.040 0.003 {built-in method run} + 300005 0.874 0.000 1.681 0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/dataset/mnist.py:38(reader) + 107991 0.676 0.000 1.519 0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:219(__init__) + 4697 0.626 0.000 2.291 0.000 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:428(sync_with_cpp) + 1 0.618 0.618 0.618 0.618 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/__init__.py:1() + +``` + +可以看到最耗时的函数是C++端的`run`函数。这需要联合我们第二节`Python与C++混合代码的性能分析`来进行调优。而`sync_with_cpp`函数的总共耗时很长,每次调用的耗时也很长。于是我们可以点击`sync_with_cpp`的详细信息,了解其调用关系。 + +```text +Called By: + + Ordered by: internal time + List reduced from 4497 to 2 due to restriction <'sync_with_cpp'> + +Function was called by... 
+ ncalls tottime cumtime +/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:428(sync_with_cpp) <- 4697 0.626 2.291 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:562(sync_with_cpp) +/home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:562(sync_with_cpp) <- 4696 0.019 2.316 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:487(clone) + 1 0.000 0.001 /home/yuyang/perf_test/.env/lib/python2.7/site-packages/paddle/v2/fluid/framework.py:534(append_backward) + + +Called: + + Ordered by: internal time + List reduced from 4497 to 2 due to restriction <'sync_with_cpp'> +``` + +通常观察热点函数间的调用关系,和对应行的代码,就可以了解到问题代码在哪里。当我们做出性能修正后,再次进行性能分析(profiling)即可检查我们调优后的修正是否能够改善程序的性能。 + + + +## Python与C++混合代码的性能分析 + +### 生成性能分析文件 + +C++的性能分析工具非常多。常见的包括`gprof`, `valgrind`, `google-perftools`。但是调试Python中使用的动态链接库与直接调试原始二进制相比增加了很多复杂度。幸而Python的一个第三方库`yep`提供了方便的和`google-perftools`交互的方法。于是这里使用`yep`进行Python与C++混合代码的性能分析 + +使用`yep`前需要安装`google-perftools`与`yep`包。ubuntu下安装命令为 + +```bash +apt install libgoogle-perftools-dev +pip install yep +``` + +安装完毕后,我们可以通过 + +```bash +python -m yep -v main.py +``` + +生成性能分析文件。生成的性能分析文件为`main.py.prof`。 + +命令行中的`-v`指定在生成性能分析文件之后,在命令行显示分析结果。我们可以在命令行中简单的看一下生成效果。因为C++与Python不同,编译时可能会去掉调试信息,运行时也可能因为多线程产生混乱不可读的性能分析结果。为了生成更可读的性能分析结果,可以采取下面几点措施: + +1. 编译时指定`-g`生成调试信息。使用cmake的话,可以将CMAKE_BUILD_TYPE指定为`RelWithDebInfo`。 +2. 编译时一定要开启优化。单纯的`Debug`编译性能会和`-O2`或者`-O3`有非常大的差别。`Debug`模式下的性能测试是没有意义的。 +3. 运行性能分析的时候,先从单线程开始,再开启多线程,进而多机。毕竟如果单线程调试更容易。可以设置`OMP_NUM_THREADS=1`这个环境变量关闭openmp优化。 + +### 查看性能分析文件 + +在运行完性能分析后,会生成性能分析结果文件。我们可以使用[pprof](https://github.com/google/pprof)来显示性能分析结果。注意,这里使用了用`Go`语言重构后的`pprof`,因为这个工具具有web服务界面,且展示效果更好。 + +安装`pprof`的命令和一般的`Go`程序是一样的,其命令如下: + +```bash +go get github.com/google/pprof +``` + +进而我们可以使用如下命令开启一个HTTP服务: + +```bash +pprof -http=0.0.0.0:3213 `which python` ./main.py.prof +``` + +这行命令中,`-http`指开启HTTP服务。`which python`会产生当前Python二进制的完整路径,进而指定了Python可执行文件的路径。`./main.py.prof`输入了性能分析结果。 + +访问对应的网址,我们可以查看性能分析的结果。结果如下图所示: + +![result](./pprof_1.png) + + +### 寻找性能瓶颈 + +与寻找Python代码的性能瓶颈类似,寻找Python与C++混合代码的性能瓶颈也是要看`tottime`和`cumtime`。而`pprof`展示的调用图也可以帮助我们发现性能中的问题。 + +例如下图中, + +![kernel_perf](./pprof_2.png) + +在一次训练中,乘法和乘法梯度的计算占用2%-4%左右的计算时间。而`MomentumOp`占用了17%左右的计算时间。显然,`MomentumOp`的性能有问题。 + +在`pprof`中,对于性能的关键路径都做出了红色标记。先检查关键路径的性能问题,再检查其他部分的性能问题,可以更有次序的完成性能的优化。 + +## 总结 + +至此,两种性能分析的方式都介绍完毕了。希望通过这两种性能分析的方式,Paddle的开发人员和使用人员可以有次序的,科学的发现和解决性能问题。 diff --git a/doc/howto/optimization/pprof_1.png b/doc/howto/optimization/pprof_1.png new file mode 100644 index 0000000000000000000000000000000000000000..8e9edbf377672d0ef40f2fc7bd39e746923550cb Binary files /dev/null and b/doc/howto/optimization/pprof_1.png differ diff --git a/doc/howto/optimization/pprof_2.png b/doc/howto/optimization/pprof_2.png new file mode 100644 index 0000000000000000000000000000000000000000..172ba20399ba974d27f4c072425277b69b02520b Binary files /dev/null and b/doc/howto/optimization/pprof_2.png differ diff --git a/paddle/gserver/layers/ROIPoolLayer.cpp b/paddle/gserver/layers/ROIPoolLayer.cpp index 02402894d3354a6af221948a3360ef830881bf39..2c8256b91c97b513ce7237b8174c522430094926 100644 --- a/paddle/gserver/layers/ROIPoolLayer.cpp +++ b/paddle/gserver/layers/ROIPoolLayer.cpp @@ -13,6 +13,7 @@ See the License for the specific language governing permissions and limitations under the License. 
*/ #include "ROIPoolLayer.h" +#include namespace paddle { @@ -126,10 +127,8 @@ void ROIPoolLayer::forward(PassType passType) { bool isEmpty = (hend <= hstart) || (wend <= wstart); size_t poolIndex = ph * pooledWidth_ + pw; - if (isEmpty) { - outputData[poolIndex] = 0; - argmaxData[poolIndex] = -1; - } + outputData[poolIndex] = isEmpty ? 0 : -FLT_MAX; + argmaxData[poolIndex] = -1; for (size_t h = hstart; h < hend; ++h) { for (size_t w = wstart; w < wend; ++w) { diff --git a/paddle/operators/CMakeLists.txt b/paddle/operators/CMakeLists.txt index 7ab09b6c6541b5bb1ec7503da85bacfda3a0112f..a4c4374cf2f8b4b034d05e3a4c2221300a944214 100644 --- a/paddle/operators/CMakeLists.txt +++ b/paddle/operators/CMakeLists.txt @@ -73,6 +73,13 @@ function(op_library TARGET) file(APPEND ${pybind_file} "USE_OP(conv2d);\n") endif() + # conv_cudnn_op contains several operators + if ("${TARGET}" STREQUAL "conv_cudnn_op") + set(pybind_flag 1) + # It's enough to just adding one operator to pybind + file(APPEND ${pybind_file} "USE_OP(conv2d_cudnn);\n") + endif() + # pool_op contains several operators if ("${TARGET}" STREQUAL "pool_op") set(pybind_flag 1) @@ -193,6 +200,7 @@ set(DEPS_OPS lod_rank_table_op lod_tensor_to_array_op array_to_lod_tensor_op + max_sequence_len_op lstm_op tensor_array_read_write_op gru_op @@ -215,6 +223,7 @@ op_library(pool_with_index_op DEPS pooling) op_library(lod_rank_table_op SRCS lod_rank_table_op.cc DEPS lod_rank_table) op_library(lod_tensor_to_array_op SRCS lod_tensor_to_array_op.cc DEPS lod_rank_table_op) op_library(array_to_lod_tensor_op SRCS array_to_lod_tensor_op.cc DEPS lod_rank_table_op) +op_library(max_sequence_len_op SRCS max_sequence_len_op.cc DEPS lod_rank_table) op_library(tensor_array_read_write_op SRCS tensor_array_read_write_op.cc) if(WITH_GPU) op_library(nccl_op DEPS nccl_common) diff --git a/paddle/operators/conv_cudnn_op.cc b/paddle/operators/conv_cudnn_op.cc index c03dc3e4fb07ac6ecde42be93a1138d91778edf4..0dd8c13b2ad6ff206066ccb98a4c009e4c3b4fd0 100644 --- a/paddle/operators/conv_cudnn_op.cc +++ b/paddle/operators/conv_cudnn_op.cc @@ -17,10 +17,10 @@ namespace paddle { namespace operators { -class CudnnConvOpMaker : public Conv2DOpMaker { +class CudnnConv2DOpMaker : public Conv2DOpMaker { public: - CudnnConvOpMaker(framework::OpProto* proto, - framework::OpAttrChecker* op_checker) + CudnnConv2DOpMaker(framework::OpProto* proto, + framework::OpAttrChecker* op_checker) : Conv2DOpMaker(proto, op_checker) { AddAttr("workspace_size_MB", "workspace size for cudnn, in MB, " @@ -32,16 +32,43 @@ class CudnnConvOpMaker : public Conv2DOpMaker { } }; +class CudnnConv3DOpMaker : public Conv3DOpMaker { + public: + CudnnConv3DOpMaker(framework::OpProto* proto, + framework::OpAttrChecker* op_checker) + : Conv3DOpMaker(proto, op_checker) { + AddAttr("workspace_size_MB", + "workspace size for cudnn, in MB, " + "workspace is a section of GPU memory which will be " + "allocated/freed each time the operator runs, larger " + "workspace size can increase performance but also requires " + "better hardware. 
This size should be chosen carefully.") + .SetDefault(4096); + } +}; + } // namespace operators } // namespace paddle namespace ops = paddle::operators; -REGISTER_OP(conv_cudnn, ops::ConvOp, ops::CudnnConvOpMaker, conv_cudnn_grad, - ops::ConvOpGrad); +REGISTER_OP(conv2d_cudnn, ops::ConvOp, ops::CudnnConv2DOpMaker, + conv2d_cudnn_grad, ops::ConvOpGrad); + +REGISTER_OP(conv3d_cudnn, ops::ConvOp, ops::CudnnConv3DOpMaker, + conv3d_cudnn_grad, ops::ConvOpGrad); + +REGISTER_OP_CPU_KERNEL(conv2d_cudnn, + ops::GemmConvKernel, + ops::GemmConvKernel); +REGISTER_OP_CPU_KERNEL( + conv2d_cudnn_grad, + ops::GemmConvGradKernel, + ops::GemmConvGradKernel); -REGISTER_OP_CPU_KERNEL(conv_cudnn, +REGISTER_OP_CPU_KERNEL(conv3d_cudnn, ops::GemmConvKernel, ops::GemmConvKernel); REGISTER_OP_CPU_KERNEL( - conv_cudnn_grad, ops::GemmConvGradKernel, + conv3d_cudnn_grad, + ops::GemmConvGradKernel, ops::GemmConvGradKernel); diff --git a/paddle/operators/conv_cudnn_op.cu.cc b/paddle/operators/conv_cudnn_op.cu.cc index 5eaf6b33704eb371fff4b949c6cc32a7a5dbc812..a9763d424801cfced5fe4c4718a335a24b81cfdc 100644 --- a/paddle/operators/conv_cudnn_op.cu.cc +++ b/paddle/operators/conv_cudnn_op.cu.cc @@ -56,6 +56,21 @@ class CudnnConvOpKernel : public framework::OpKernel { ScopedFilterDescriptor filter_desc; ScopedConvolutionDescriptor conv_desc; DataLayout layout = DataLayout::kNCHW; + if (input->dims().size() == 5) { + layout = DataLayout::kNCDHW; + } + + cudnnConvolutionDescriptor_t cudnn_conv_desc = + conv_desc.descriptor(paddings, strides, dilations); + +#if CUDNN_VERSION_MIN(7, 0, 0) + // cudnn 7 can support groups, no need to do it mannually + // FIXME(typhoonzero): find a better way to disable groups + // rather than setting it to 1. + PADDLE_ENFORCE(platform::dynload::cudnnSetConvolutionGroupCount( + cudnn_conv_desc, groups)); + groups = 1; +#endif cudnnTensorDescriptor_t cudnn_input_desc = input_desc.descriptor( layout, framework::vectorize2int(input->dims()), groups); @@ -63,19 +78,34 @@ class CudnnConvOpKernel : public framework::OpKernel { layout, framework::vectorize2int(output->dims()), groups); cudnnFilterDescriptor_t cudnn_filter_desc = filter_desc.descriptor( layout, framework::vectorize2int(filter->dims()), groups); - cudnnConvolutionDescriptor_t cudnn_conv_desc = - conv_desc.descriptor(paddings, strides, dilations); int input_channels = input->dims()[1]; - int input_height = input->dims()[2]; - int input_width = input->dims()[3]; - int output_channels = output->dims()[1]; - int output_height = output->dims()[2]; - int output_width = output->dims()[3]; + int input_height, input_width, input_depth; + if (input->dims().size() == 5) { + input_depth = input->dims()[2]; + input_height = input->dims()[3]; + input_width = input->dims()[4]; + } else { // dim size is enforced in InferShape + input_depth = 1; + input_height = input->dims()[2]; + input_width = input->dims()[3]; + } + int output_channels = filter->dims()[0]; + int output_height, output_width, output_depth; + if (output->dims().size() == 5) { + output_depth = output->dims()[2]; + output_height = output->dims()[3]; + output_width = output->dims()[4]; + } else { + output_depth = 1; + output_height = output->dims()[2]; + output_width = output->dims()[3]; + } - int group_offset_in = input_channels / groups * input_height * input_width; + int group_offset_in = + input_channels / groups * input_height * input_width * input_depth; int group_offset_out = - output_channels / groups * output_height * output_width; + output_channels / groups * output_height * 
output_width * output_depth; int group_offset_filter = filter->numel() / groups; // ------------------- cudnn conv workspace --------------------- void* cudnn_workspace = nullptr; @@ -138,12 +168,26 @@ class CudnnConvGradOpKernel : public framework::OpKernel { // ------------------- cudnn descriptors --------------------- ScopedTensorDescriptor input_desc; ScopedTensorDescriptor output_grad_desc; - ScopedTensorDescriptor input_grad_desc; ScopedFilterDescriptor filter_desc; ScopedFilterDescriptor filter_grad_desc; ScopedConvolutionDescriptor conv_desc; DataLayout layout = DataLayout::kNCHW; + if (input->dims().size() == 5) { + layout = DataLayout::kNCDHW; + } + + cudnnConvolutionDescriptor_t cudnn_conv_desc = + conv_desc.descriptor(paddings, strides, dilations); + +#if CUDNN_VERSION_MIN(7, 0, 0) + // cudnn 7 can support groups, no need to do it mannually + // FIXME(typhoonzero): find a better way to disable groups + // rather than setting it to 1. + PADDLE_ENFORCE(platform::dynload::cudnnSetConvolutionGroupCount( + cudnn_conv_desc, groups)); + groups = 1; +#endif cudnnTensorDescriptor_t cudnn_input_desc = input_desc.descriptor( layout, framework::vectorize2int(input->dims()), groups); @@ -152,22 +196,35 @@ class CudnnConvGradOpKernel : public framework::OpKernel { layout, framework::vectorize2int(output_grad->dims()), groups); cudnnFilterDescriptor_t cudnn_filter_desc = filter_desc.descriptor( layout, framework::vectorize2int(filter->dims()), groups); - cudnnTensorDescriptor_t cudnn_input_grad_desc = nullptr; - cudnnFilterDescriptor_t cudnn_filter_grad_desc = nullptr; - - cudnnConvolutionDescriptor_t cudnn_conv_desc = - conv_desc.descriptor(paddings, strides, dilations); int input_channels = input->dims()[1]; - int input_height = input->dims()[2]; - int input_width = input->dims()[3]; + int input_height, input_width, input_depth; + if (input->dims().size() == 5) { + input_depth = input->dims()[2]; + input_height = input->dims()[3]; + input_width = input->dims()[4]; + } else { // dim size is enforced in InferShape + input_depth = 1; + input_height = input->dims()[2]; + input_width = input->dims()[3]; + } + int output_grad_channels = filter->dims()[0]; - int output_grad_height = output_grad->dims()[2]; - int output_grad_width = output_grad->dims()[3]; + int output_grad_height, output_grad_width, output_grad_depth; + if (input->dims().size() == 5) { + output_grad_depth = output_grad->dims()[2]; + output_grad_height = output_grad->dims()[3]; + output_grad_width = output_grad->dims()[4]; + } else { + output_grad_depth = 1; + output_grad_height = output_grad->dims()[2]; + output_grad_width = output_grad->dims()[3]; + } - int group_offset_in = input_channels / groups * input_height * input_width; - int group_offset_out = - output_grad_channels / groups * output_grad_height * output_grad_width; + int group_offset_in = + input_channels / groups * input_height * input_width * input_depth; + int group_offset_out = output_grad_channels / groups * output_grad_height * + output_grad_width * output_grad_depth; int group_offset_filter = filter->numel() / groups; // ------------------- cudnn backward algorithm --------------------- cudnnConvolutionBwdDataAlgo_t data_algo; @@ -180,8 +237,6 @@ class CudnnConvGradOpKernel : public framework::OpKernel { auto handle = ctx.cuda_device_context().cudnn_handle(); if (input_grad) { - cudnn_input_grad_desc = input_grad_desc.descriptor( - layout, framework::vectorize2int(input_grad->dims()), groups); PADDLE_ENFORCE( 
platform::dynload::cudnnGetConvolutionBackwardDataAlgorithm( handle, cudnn_filter_desc, @@ -190,19 +245,17 @@ class CudnnConvGradOpKernel : public framework::OpKernel { cudnn_output_grad_desc, cudnn_conv_desc, // dxDesc: Handle to the previously initialized output tensor // descriptor. - cudnn_input_grad_desc, + cudnn_input_desc, CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT, workspace_size_limit, &data_algo)); PADDLE_ENFORCE( platform::dynload::cudnnGetConvolutionBackwardDataWorkspaceSize( handle, cudnn_filter_desc, cudnn_output_grad_desc, - cudnn_conv_desc, cudnn_input_grad_desc, data_algo, &tmp_size)); + cudnn_conv_desc, cudnn_input_desc, data_algo, &tmp_size)); workspace_size_in_bytes = std::max(workspace_size_in_bytes, tmp_size); } if (filter_grad) { - cudnn_filter_grad_desc = filter_grad_desc.descriptor( - layout, framework::vectorize2int(filter_grad->dims()), groups); PADDLE_ENFORCE( platform::dynload::cudnnGetConvolutionBackwardFilterAlgorithm( handle, cudnn_input_desc, cudnn_output_grad_desc, cudnn_conv_desc, @@ -222,7 +275,6 @@ class CudnnConvGradOpKernel : public framework::OpKernel { platform::GPUPlace gpu = boost::get(ctx.GetPlace()); cudnn_workspace = paddle::memory::Alloc(gpu, workspace_size_in_bytes); // ------------------- cudnn conv backward data --------------------- - // FIXME(typhoonzero): template type T may not be the same as cudnn call. T alpha = 1.0f, beta = 0.0f; if (input_grad) { T* input_grad_data = input_grad->mutable_data(ctx.GetPlace()); @@ -233,21 +285,20 @@ class CudnnConvGradOpKernel : public framework::OpKernel { handle, &alpha, cudnn_filter_desc, filter_data + i * group_offset_filter, cudnn_output_grad_desc, output_grad_data + i * group_offset_out, cudnn_conv_desc, data_algo, - cudnn_workspace, workspace_size_in_bytes, &beta, - cudnn_input_grad_desc, input_grad_data + i * group_offset_in)); + cudnn_workspace, workspace_size_in_bytes, &beta, cudnn_input_desc, + input_grad_data + i * group_offset_in)); } } // ------------------- cudnn conv backward filter --------------------- if (filter_grad) { T* filter_grad_data = filter_grad->mutable_data(ctx.GetPlace()); // Because beta is zero, it is unnecessary to reset filter_grad. 
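+          // alpha/beta are the standard cuDNN blend factors:
+          //   dest = alpha * conv_result + beta * dest.
+          // With beta == 0 each call overwrites its group's slice of
+          // filter_grad, so no explicit zero-fill is needed beforehand.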
- for (int i = 0; i < groups; i++) { PADDLE_ENFORCE(platform::dynload::cudnnConvolutionBackwardFilter( handle, &alpha, cudnn_input_desc, input_data + i * group_offset_in, cudnn_output_grad_desc, output_grad_data + i * group_offset_out, cudnn_conv_desc, filter_algo, cudnn_workspace, - workspace_size_in_bytes, &beta, cudnn_filter_grad_desc, + workspace_size_in_bytes, &beta, cudnn_filter_desc, filter_grad_data + i * group_offset_filter)); } } @@ -259,8 +310,16 @@ class CudnnConvGradOpKernel : public framework::OpKernel { } // namespace operators } // namespace paddle -REGISTER_OP_GPU_KERNEL(conv_cudnn, paddle::operators::CudnnConvOpKernel, +REGISTER_OP_GPU_KERNEL(conv2d_cudnn, + paddle::operators::CudnnConvOpKernel, + paddle::operators::CudnnConvOpKernel); +REGISTER_OP_GPU_KERNEL(conv2d_cudnn_grad, + paddle::operators::CudnnConvGradOpKernel, + paddle::operators::CudnnConvGradOpKernel); + +REGISTER_OP_GPU_KERNEL(conv3d_cudnn, + paddle::operators::CudnnConvOpKernel, paddle::operators::CudnnConvOpKernel); -REGISTER_OP_GPU_KERNEL(conv_cudnn_grad, +REGISTER_OP_GPU_KERNEL(conv3d_cudnn_grad, paddle::operators::CudnnConvGradOpKernel, paddle::operators::CudnnConvGradOpKernel); diff --git a/paddle/operators/math/selected_rows_functor.cu b/paddle/operators/math/selected_rows_functor.cu index c40649e55ef93dec852ff6949b5cb134495e4ebf..c1dd323ba29e03e3ab4a3e4d7248388b408fb9d6 100644 --- a/paddle/operators/math/selected_rows_functor.cu +++ b/paddle/operators/math/selected_rows_functor.cu @@ -227,7 +227,6 @@ template struct SelectedRowsAddToTensor; template struct SelectedRowsAddToTensor; template struct SelectedRowsAddToTensor; template struct SelectedRowsAddToTensor; - } // namespace math } // namespace operators } // namespace paddle diff --git a/paddle/operators/max_sequence_len_op.cc b/paddle/operators/max_sequence_len_op.cc new file mode 100644 index 0000000000000000000000000000000000000000..798022c9dd904a0ac189b4b550a94264a433ebf2 --- /dev/null +++ b/paddle/operators/max_sequence_len_op.cc @@ -0,0 +1,66 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
*/ + +#include "paddle/framework/lod_rank_table.h" +#include "paddle/framework/op_registry.h" +#include "paddle/framework/operator.h" + +namespace paddle { +namespace operators { + +class MaxSeqenceLenOp : public framework::OperatorBase { + public: + MaxSeqenceLenOp(const std::string &type, + const framework::VariableNameMap &inputs, + const framework::VariableNameMap &outputs, + const framework::AttributeMap &attrs) + : OperatorBase(type, inputs, outputs, attrs) {} + + void Run(const framework::Scope &scope, + const platform::DeviceContext &dev_ctx) const override { + auto &rank_table = + scope.FindVar(Input("RankTable"))->Get(); + auto *out = + scope.FindVar(Output("Out"))->GetMutable(); + int64_t *out_ptr = out->mutable_data({1}, platform::CPUPlace()); + *out_ptr = rank_table.items()[0].length; + } +}; + +class MaxSeqenceLenOpProtoMaker : public framework::OpProtoAndCheckerMaker { + public: + MaxSeqenceLenOpProtoMaker(framework::OpProto *proto, + framework::OpAttrChecker *op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddInput("RankTable", "The lod_rank_table."); + AddOutput("Out", "The max sequence length."); + AddComment( + R"DOC(Calculate the max sequence length through lod_rank_table.)DOC"); + } +}; + +class MaxSeqenceLenInferShape : public framework::InferShapeBase { + public: + void operator()(framework::InferShapeContext *context) const override { + PADDLE_ENFORCE(context->HasInput("RankTable")); + context->SetOutputDim("Out", {1}); + } +}; +} // namespace operators +} // namespace paddle + +REGISTER_OPERATOR(max_sequence_len, paddle::operators::MaxSeqenceLenOp, + paddle::operators::MaxSeqenceLenOpProtoMaker, + paddle::operators::MaxSeqenceLenInferShape, + paddle::framework::EmptyGradOpMaker); diff --git a/paddle/platform/cudnn_helper.h b/paddle/platform/cudnn_helper.h index c5d8a6066ef3becb601344590f977a38c2af0a63..80a4c9bb4bbcd03cf849d86118db4e502382f031 100644 --- a/paddle/platform/cudnn_helper.h +++ b/paddle/platform/cudnn_helper.h @@ -116,7 +116,7 @@ inline cudnnTensorFormat_t GetCudnnTensorFormat( case DataLayout::kNCHW: return CUDNN_TENSOR_NCHW; case DataLayout::kNCDHW: - return CUDNN_TENSOR_NCHW; // TODO(chengduoZH) : add CUDNN_TENSOR_NCDHW + return CUDNN_TENSOR_NCHW; // NOTE: cudnn treat NdTensor as the same default: PADDLE_THROW("Unknown cudnn equivalent for order"); } @@ -143,7 +143,7 @@ class ScopedTensorDescriptor { strides[i] = dims[i + 1] * strides[i + 1]; } // Update tensor descriptor dims setting if groups > 1 - // FIXME(typhoonzero): Assume using NCHW or NCDHW order + // NOTE: Assume using NCHW or NCDHW order std::vector dims_with_group(dims.begin(), dims.end()); // copy if (groups > 1) { dims_with_group[1] = dims_with_group[1] / groups; @@ -186,7 +186,6 @@ class ScopedFilterDescriptor { // width of the filter. std::vector kernel_with_group(kernel.begin(), kernel.end()); if (groups > 1) { - // M /= groups kernel_with_group[0] /= groups; // NOTE: input filter(C) of the filter is already asserted to be C/groups. 
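+      // For example, with groups == 2 and a filter of shape [M, C, H, W],
+      // each group's descriptor describes a filter of shape [M/2, C/2, H, W];
+      // the group-offset arithmetic in the conv kernels relies on this split.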
} diff --git a/python/paddle/trainer/config_parser.py b/python/paddle/trainer/config_parser.py index 5ba0e50c6ba0f84a3ea87d5a5199fef23a5b05ea..cfe2a34a1f34a9c828486a7a6dbe320f230bb986 100644 --- a/python/paddle/trainer/config_parser.py +++ b/python/paddle/trainer/config_parser.py @@ -2798,19 +2798,18 @@ class AddToLayer(LayerBase): name, self.layer_type, 0, inputs=inputs, **xargs) config_assert(len(inputs) > 0, 'inputs cannot be empty for AddToLayer') - if len(self.inputs) > 1: - for input_index in xrange(len(self.inputs)): - assert self.get_input_layer(0).height == self.get_input_layer( input_index).height - assert self.get_input_layer(0).width == self.get_input_layer( input_index).width - assert self.get_input_layer(0).depth == self.get_input_layer( input_index).depth + layer_size = self.get_input_layer(0).size + # To preserve height, width, depth. + layer_with_hwc = self.get_input_layer(0) + for input_index in xrange(len(self.inputs)): + input_layer = self.get_input_layer(input_index) + assert layer_size == input_layer.size + if input_layer.height and input_layer.width and input_layer.depth: + layer_with_hwc = input_layer - self.set_layer_size(self.get_input_layer(0).size) - self.set_layer_height_width(self.get_input_layer(0).height, \ - self.get_input_layer(0).width) - self.set_layer_depth(self.get_input_layer(0).depth) + self.set_layer_size(layer_with_hwc.size) + self.set_layer_height_width(layer_with_hwc.height, layer_with_hwc.width) + self.set_layer_depth(layer_with_hwc.depth) self.create_bias_parameter(bias, self.config.size) diff --git a/python/paddle/v2/fluid/__init__.py b/python/paddle/v2/fluid/__init__.py index 5df612bf3530c843c16b337f2b8f83445fcf39b5..9677c9568c6783921545364bca7b2c9c0041d823 100644 --- a/python/paddle/v2/fluid/__init__.py +++ b/python/paddle/v2/fluid/__init__.py @@ -1,11 +1,41 @@ -import sys -import core -__all__ = ['proto'] -argv = [] -if core.is_compile_gpu(): - argv = list(sys.argv) + [ - "--tryfromenv=fraction_of_gpu_memory_to_use,use_pinned_memory" - ] -else: - argv = list(sys.argv) + ["--tryfromenv=use_pinned_memory"] -core.init_gflags(argv) +# import all classes inside framework into the fluid module +import framework +from framework import * +# import all classes inside executor into the fluid module +import executor +from executor import * + +import io +import evaluator +import initializer +import layers +import nets +import optimizer +import backward +import regularizer + +from core import LoDTensor, CPUPlace, GPUPlace + +Tensor = LoDTensor +__all__ = framework.__all__ + executor.__all__ + [ 'io', 'initializer', 'layers', 'nets', 'optimizer', 'backward', 'regularizer', 'LoDTensor', 'CPUPlace', 'GPUPlace', 'Tensor' ] + + +def __read_gflags_from_env__(): + """ + Enable reading gflags from environment variables. 
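+
+    For example (illustrative), launching with the environment variable
+    ``FLAGS_use_pinned_memory=1`` set lets gflags pick up that flag through
+    ``--tryfromenv``.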
+ + Returns: + None + """ + import sys + import core + read_env_flags = ['use_pinned_memory'] + if core.is_compile_gpu(): + read_env_flags.append('fraction_of_gpu_memory_to_use') + core.init_gflags(sys.argv + ["--tryfromenv=" + ",".join(read_env_flags)]) + + +__read_gflags_from_env__() diff --git a/python/paddle/v2/fluid/evaluator.py b/python/paddle/v2/fluid/evaluator.py index f78d2f814c89aa6b5ee8387f2558a97c754e655c..bd4a6fda1fd20e68d5a42e76f6ab516bb5c00cff 100644 --- a/python/paddle/v2/fluid/evaluator.py +++ b/python/paddle/v2/fluid/evaluator.py @@ -1,9 +1,13 @@ import numpy as np -from paddle.v2.fluid.framework import Program, g_main_program, unique_name, Variable -import paddle.v2.fluid.core as core +import layers +from framework import Program, unique_name, Variable +from layer_helper import LayerHelper -def _clone_var_in_block_(block, var): +__all__ = ['Accuracy'] + + +def _clone_var_(block, var): assert isinstance(var, Variable) return block.create_var( name=var.name, @@ -16,175 +20,115 @@ def _clone_var_in_block_(block, var): class Evaluator(object): """ - Evalutor Base class. - - create metric states - add mini-batch evaluator caculate operator - add increment operator to accumulate the metric states + Base Class for all evaluators + + Args: + name(str): The name of evaluator. such as, "accuracy". Used for generate + temporary variable name. + main_program(Program, optional): The evaluator should be added to this + main_program. Default g_main_program + startup_program(Program, optional):The parameter should be added to this + startup_program. Default g_startup_program + + Attributes: + states(list): The list of state variables. states will be reset to zero + when `reset` is invoked. + metrics(list): The list of metrics variables. They will be calculate + every mini-batch """ def __init__(self, name, **kwargs): + self.states = [] + self.metrics = [] + self.helper = LayerHelper(name, **kwargs) + + def reset(self, executor, reset_program=None): """ - init the global states + reset metric states at the begin of each pass/user specified batch """ - self._states = {} - if kwargs.has_key("main_program"): - self._main_program = kwargs.get("main_program") - else: - self._main_program = g_main_program + if reset_program is None: + reset_program = Program() + + for var in self.states: + assert isinstance(var, Variable) + g_var = _clone_var_(reset_program.current_block(), var) + layers.fill_constant( + shape=g_var.shape, + value=0.0, + dtype=g_var.dtype, + out=g_var, + main_program=reset_program) - def states(self): - return self._states + executor.run(reset_program) - def _update_ops(self, *args, **kwargs): + def eval(self, executor, eval_program=None): """ - append update ops to the global states + Evaluate the statistics merged by multiple mini-batches. 
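+
+        Returns:
+            The merged metric (e.g. the accuracy) as a numpy array.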
""" raise NotImplementedError() - def reset(self, executor, reset_program=None): + def create_state(self, suffix, dtype, shape): """ - Clear metric states at the begin of each pass/user specified batch - """ - if reset_program == None: - reset_program = Program() - else: - reset_program = program - block = reset_program.global_block() - for k, var in self._states.iteritems(): - g_var = _clone_var_in_block_(block, var) - zeros = block.create_var(dtype="float32", persistable=True) - block.append_op( - type="fill_constant", - outputs={"Out": [zeros]}, - attrs={ - "shape": g_var.shape, - "value": .0, - "dtype": 5, - }) - block.append_op( - type="scale", inputs={"X": zeros}, outputs={"Out": g_var}) - executor.run(reset_program, fetch_list=self._states.values()) + Create state variable. + + NOTE: It is not a public API. + + Args: + suffix(str): the state suffix. + dtype(str|core.DataType): the state data type + shape(tuple|list): the shape of state + + Returns: State variable - def eval(self, executor, eval_program=None): - """ - Merge the mini-batch statistics to form the evaluation result for multiple mini-batches. """ - raise NotImplementedError() + state = self.helper.create_variable( + name="_".join([unique_name(self.helper.name), suffix]), + persistable=True, + dtype=dtype, + shape=shape) + self.states.append(state) + return state class Accuracy(Evaluator): """ - Accuracy need two state variable Total, Correct + Average Accuracy for multiple mini-batches. """ - def __init__(self, *args, **kwargs): + def __init__(self, input, label, k=1, **kwargs): super(Accuracy, self).__init__("accuracy", **kwargs) - block = self._main_program.global_block() - g_total = block.create_var( - name=unique_name("Total"), - persistable=True, - dtype="int64", - shape=[1]) - g_correct = block.create_var( - name=unique_name("Correct"), - persistable=True, - dtype="int64", - shape=[1]) - self._states["Total"] = g_total - self._states["Correct"] = g_correct - - def _update_ops(self, input, label, k=1, **kwargs): - block = self._main_program.global_block() - topk_out = block.create_var(dtype=input.dtype) - topk_indices = block.create_var(dtype="int64") - block.append_op( - type="top_k", - inputs={"X": [input]}, - outputs={"Out": [topk_out], - "Indices": [topk_indices]}, - attrs={"k": k}) - acc_out = block.create_var(dtype=kwargs.get("out_dtype", "float32")) - correct = block.create_var(dtype="int64", persistable=True) - total = block.create_var(dtype="int64", persistable=True) - block.append_op( - type="accuracy", - inputs={ - "Out": [topk_out], - "Indices": [topk_indices], - "Label": [label] - }, - outputs={ - "Accuracy": [acc_out], - "Correct": [correct], - "Total": [total], - }) - - block.append_op( - type="cast", - inputs={"X": [self._states["Total"]]}, - outputs={"Out": [self._states["Total"]]}, - attrs={ - "in_dtype": 5, # float32 - "out_dtype": 2, # int32 - }) - block.append_op( - type="cast", - inputs={"X": [self._states["Correct"]]}, - outputs={"Out": [self._states["Correct"]]}, - attrs={ - "in_dtype": 5, - "out_dtype": 2, - }) - - block.append_op( - type="elementwise_add", - inputs={"X": [self._states["Total"]], - "Y": [total]}, - outputs={"Out": [self._states["Total"]]}) - block.append_op( - type="elementwise_add", - inputs={"X": [self._states["Correct"]], - "Y": [correct]}, - outputs={"Out": [self._states["Correct"]]}) - - return acc_out + main_program = self.helper.main_program + if main_program.current_block().idx != 0: + raise ValueError("You can only invoke Evaluator in root block") + + self.total = 
self.create_state(dtype='int64', shape=[1], suffix='total') + self.correct = self.create_state( + dtype='int64', shape=[1], suffix='correct') + kwargs = {'main_program': main_program} + total = self.helper.create_tmp_variable(dtype='int') + correct = self.helper.create_tmp_variable(dtype='int') + acc = layers.accuracy( + input=input, + label=label, + k=k, + total=total, + correct=correct, + **kwargs) + total = layers.cast(x=total, dtype='int64', **kwargs) + correct = layers.cast(x=correct, dtype='int64', **kwargs) + layers.sums(input=[self.total, total], out=self.total, **kwargs) + layers.sums(input=[self.correct, correct], out=self.correct, **kwargs) + + self.metrics.append(acc) def eval(self, executor, eval_program=None): - if eval_program != None: - eval_program = eval_program - else: + if eval_program is None: eval_program = Program() - block = eval_program.global_block() - eval_out = block.create_var(dtype=self._states["Total"].dtype) - e_total = _clone_var_in_block_(block, self._states["Total"]) - e_correct = _clone_var_in_block_(block, self._states["Correct"]) - block.append_op( - type="cast", - inputs={"X": [e_total]}, - outputs={"Out": [e_total]}, - attrs={ - "in_dtype": 2, # int32 - "out_dtype": 5, # float32 - }) - block.append_op( - type="cast", - inputs={"X": [e_correct]}, - outputs={"Out": [e_correct]}, - attrs={ - "in_dtype": 2, - "out_dtype": 5, - }) - block.append_op( - type="elementwise_div", - inputs={"X": e_correct, - "Y": e_total}, - outputs={"Out": eval_out}) - out = executor.run(eval_program, fetch_list=[eval_out]) - return np.array(out[0]) - - -def accuracy(*args, **kwargs): - cls = Accuracy(*args, **kwargs) - out = cls._update_ops(*args, **kwargs) - return cls, out + block = eval_program.current_block() + kwargs = {'main_program': eval_program} + total = _clone_var_(block, self.total) + correct = _clone_var_(block, self.correct) + total = layers.cast(total, dtype='float32', **kwargs) + correct = layers.cast(correct, dtype='float32', **kwargs) + out = layers.elementwise_div(x=correct, y=total, **kwargs) + return np.array(executor.run(eval_program, fetch_list=[out])[0]) diff --git a/python/paddle/v2/fluid/executor.py b/python/paddle/v2/fluid/executor.py index ed1c2c06daa7ede97e138049a1f7044d071c31e8..3e26d1b983a3c924ce2392c266bcd32e27c7b309 100644 --- a/python/paddle/v2/fluid/executor.py +++ b/python/paddle/v2/fluid/executor.py @@ -1,9 +1,40 @@ -import paddle.v2.fluid.core as core -from paddle.v2.fluid.framework import Block, Program, g_main_program +import numpy as np +from . 
import core +from framework import Program, g_main_program + +__all__ = ['Executor', 'g_scope'] g_scope = core.Scope() +def as_numpy(tensor): + if isinstance(tensor, list): + return [as_numpy(t) for t in tensor] + assert isinstance(tensor, core.LoDTensor) + lod = tensor.lod() + tensor_data = np.array(tensor) + if len(lod) == 0: + ans = tensor_data + else: + raise RuntimeError("LoD Calculate lacks unit tests and buggy") + # elif len(lod) == 1: + # ans = [] + # idx = 0 + # while idx < len(lod) - 1: + # ans.append(tensor_data[lod[idx]:lod[idx + 1]]) + # idx += 1 + # else: + # for l in reversed(lod): + # ans = [] + # idx = 0 + # while idx < len(l) - 1: + # ans.append(tensor_data[l[idx]:l[idx + 1]]) + # idx += 1 + # tensor_data = ans + # ans = tensor_data + return ans + + class Executor(object): def __init__(self, places): if not isinstance(places, list) and not isinstance(places, tuple): @@ -16,6 +47,47 @@ class Executor(object): act_places.append(p) self.executor = core.Executor(act_places) + self.places = places + + def aslodtensor(self, data): + def accumulate(data): + if not isinstance(data, list): + return 1 + return sum([accumulate(sub) for sub in data]) + + def parselod(data): + seq_lens = [accumulate(seq) for seq in data] + cur_len = 0 + lod = [cur_len] + for l in seq_lens: + cur_len += l + lod.append(cur_len) + return lod + + assert len(self.places) != 0 + if not isinstance(data, list): + # pure tensor case + tensor = core.LoDTensor() + tensor.set(data, self.places[0]) + return tensor + else: + raise RuntimeError("Current implementation lacks unittests") + # lodtensor case + lod = [] + if not isinstance(data[0], list): + lod.append(parselod(data)) + flattened_data = np.concatenate(data, axis=0).astype("int64") + else: + while isinstance(data[0], list): + lod.append(parselod(seq)) + flattened_data = [item for seq in data for item in seq] + data = flattened_data + flattened_data = np.concatenate(data, axis=0).astype("int64") + flattened_data = flattened_data.reshape([len(flattened_data), 1]) + tensor = core.LoDTensor() + tensor.set(flattened_data, self.places[0]) + tensor.set_lod(lod) + return tensor def run(self, program=None, @@ -23,7 +95,8 @@ class Executor(object): fetch_list=None, feed_var_name='feed', fetch_var_name='fetch', - scope=None): + scope=None, + return_numpy=True): if feed is None: feed = {} if fetch_list is None: @@ -52,7 +125,10 @@ class Executor(object): inputs={'X': [feed_var]}, outputs={'Out': [out]}, attrs={'col': i}) - core.set_feed_variable(scope, feed[name], feed_var.name, i) + cur_feed = feed[name] + if not isinstance(cur_feed, core.LoDTensor): + cur_feed = self.aslodtensor(cur_feed) + core.set_feed_variable(scope, cur_feed, feed_var.name, i) fetch_var = global_block.create_var( name=fetch_var_name, @@ -66,7 +142,11 @@ class Executor(object): attrs={'col': i}) self.executor.run(program.desc, scope, 0, True) - return [ + outs = [ core.get_fetch_variable(scope, fetch_var_name, i) for i in xrange(len(fetch_list)) ] + + if return_numpy: + outs = as_numpy(outs) + return outs diff --git a/python/paddle/v2/fluid/framework.py b/python/paddle/v2/fluid/framework.py index 872c19c2f6f4afbd25a5f7a9df38bd3dd0b61d5f..9a62698b86b8fb38384f8c7d76ac14d3a0c95cac 100644 --- a/python/paddle/v2/fluid/framework.py +++ b/python/paddle/v2/fluid/framework.py @@ -1,12 +1,12 @@ -import paddle.v2.fluid.core as core -import paddle.v2.fluid.proto.framework_pb2 as framework_pb2 import collections + import numpy as np -import copy +from . 
import core +import proto.framework_pb2 as framework_pb2 __all__ = [ 'Block', 'Variable', 'Program', 'Operator', 'default_startup_program', - 'default_main_program' + 'default_main_program', 'g_startup_program', 'g_main_program' ] diff --git a/python/paddle/v2/fluid/initializer.py b/python/paddle/v2/fluid/initializer.py index 9f23e68a7635b6e6ae927603dbcc47d63f9c7f3d..d3f648f8460814a3f251d7aa9560d748af85235c 100644 --- a/python/paddle/v2/fluid/initializer.py +++ b/python/paddle/v2/fluid/initializer.py @@ -1,10 +1,7 @@ -import paddle.v2.fluid.framework as framework +import framework import numpy as np -__all__ = [ - 'ConstantInitializer', 'UniformInitializer', 'NormalInitializer', - 'XavierInitializer' -] +__all__ = ['Constant', 'Uniform', 'Normal', 'Xavier'] class Initializer(object): @@ -368,3 +365,19 @@ class MSRAInitializer(Initializer): }) var.op = op return op + + +# We short the class name, since users will use the initializer with the package +# name. The sample code: +# +# import paddle.fluid as fluid +# +# hidden = fluid.layers.fc(..., +# param_attr=ParamAttr(fluid.initializer.Xavier())) +# +# It is no need to add an `Initializer` as the class suffix +Constant = ConstantInitializer +Uniform = UniformInitializer +Normal = NormalInitializer +Xavier = XavierInitializer +MSRA = MSRAInitializer diff --git a/python/paddle/v2/fluid/layer_helper.py b/python/paddle/v2/fluid/layer_helper.py index e0880354fbc5a09bd49de7ec9c5dffc1e3c6259e..5f8855551114a9a9b671d1630c9e8a3f0cb5c04b 100644 --- a/python/paddle/v2/fluid/layer_helper.py +++ b/python/paddle/v2/fluid/layer_helper.py @@ -1,10 +1,9 @@ import copy import itertools -from paddle.v2.fluid.framework import Variable, g_main_program, \ - g_startup_program, unique_name, Program, dtype_is_floating -from paddle.v2.fluid.initializer import ConstantInitializer, \ - UniformInitializer, XavierInitializer +from framework import Variable, g_main_program, \ + g_startup_program, unique_name, dtype_is_floating +from paddle.v2.fluid.initializer import Constant, Xavier class LayerHelper(object): @@ -209,7 +208,7 @@ class LayerHelper(object): def _get_default_initializer(self, dtype): if dtype is None or dtype_is_floating(dtype) is True: - return XavierInitializer() + return Xavier() else: # For integer and boolean types, initialize with all zeros - return ConstantInitializer() + return Constant() diff --git a/python/paddle/v2/fluid/layers.py b/python/paddle/v2/fluid/layers.py index d094035fe5cae2e77fc2364e8ccb03c350f1301a..28bc3d214b559a089efb2bb736eb49cb1ba4de25 100644 --- a/python/paddle/v2/fluid/layers.py +++ b/python/paddle/v2/fluid/layers.py @@ -1,9 +1,7 @@ -import paddle.v2.fluid.core as core -import paddle.v2.fluid.proto.framework_pb2 as framework_pb2 -from paddle.v2.fluid.framework import OpProtoHolder, Variable, Program, \ - Operator -from paddle.v2.fluid.initializer import ConstantInitializer, \ - NormalInitializer, XavierInitializer +from . 
import core +import proto.framework_pb2 as framework_pb2 +from framework import OpProtoHolder, Variable, Program, Operator +from initializer import Constant, Normal, Xavier from paddle.v2.fluid.layer_helper import LayerHelper, unique_name import re import cStringIO @@ -58,10 +56,10 @@ def fc(input, """ def _get_default_param_initializer(): - return XavierInitializer() + return Xavier() def _get_default_bias_initializer(): - return ConstantInitializer() + return Constant() helper = LayerHelper('fc', **locals()) @@ -139,7 +137,7 @@ def embedding(input, """ def _get_default_param_initializer(): - return XavierInitializer() + return Xavier() helper = LayerHelper('embedding', **locals()) w = helper.create_parameter( @@ -418,6 +416,7 @@ def _create_op_func_(op_type): _create_op_func_('mean') _create_op_func_('mul') _create_op_func_('elementwise_add') +_create_op_func_('elementwise_div') _create_op_func_('dropout') _create_op_func_('reshape') _create_op_func_('sigmoid') @@ -457,13 +456,14 @@ def concat(input, axis, main_program=None, startup_program=None): return out -def sums(input, main_program=None, startup_program=None): +def sums(input, out=None, main_program=None, startup_program=None): """ This function takes in the input and performs the sum operation on it and returns that as the output. """ helper = LayerHelper('sum', **locals()) - out = helper.create_tmp_variable(dtype=helper.input_dtype()) + if out is None: + out = helper.create_tmp_variable(dtype=helper.input_dtype()) helper.append_op(type='sum', inputs={'X': input}, outputs={'Out': out}) return out @@ -475,7 +475,7 @@ def linear_chain_crf(input, main_program=None, startup_program=None): def _get_default_param_initializer(): - return XavierInitializer() + return Xavier() helper = LayerHelper('linear_chain_crf', **locals()) size = input.shape[1] @@ -606,7 +606,7 @@ def square_error_cost(input, label, **kwargs): return square_out -def accuracy(input, label, k=1, **kwargs): +def accuracy(input, label, k=1, correct=None, total=None, **kwargs): """ This function computes the accuracy using the input and label. The output is the top_k inputs and their indices. @@ -620,10 +620,11 @@ def accuracy(input, label, k=1, **kwargs): outputs={"Out": [topk_out], "Indices": [topk_indices]}, attrs={"k": k}) - acc_out_dtype = kwargs.get("out_dtype", "float32") acc_out = helper.create_tmp_variable(dtype="float32") - correct = helper.create_tmp_variable(dtype="int64") - total = helper.create_tmp_variable(dtype="int64") + if correct is None: + correct = helper.create_tmp_variable(dtype="int64") + if total is None: + total = helper.create_tmp_variable(dtype="int64") helper.append_op( type="accuracy", inputs={ @@ -658,10 +659,10 @@ def sequence_conv(input, """ def _get_default_bias_initializer(): - return ConstantInitializer() + return Constant() def _get_default_param_initializer(): - return XavierInitializer() + return Xavier() # FIXME(dzh) : want to unify the argument of python layer # function. So we ignore some unecessary attributes. 
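+# Note: layers such as sums, fill_constant and accuracy accept optional output
+# variables (out/correct/total) so that callers like the Accuracy evaluator can
+# accumulate into persistent state in place, e.g.
+#   layers.sums(input=[self.total, batch_total], out=self.total)
+# where `batch_total` is an illustrative per-batch counter variable.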
@@ -722,11 +723,11 @@ def conv2d(input, """ def _get_default_bias_initializer(): - return ConstantInitializer() + return Constant() def _get_default_param_initializer(filter_size, num_channels): std = (2.0 / (filter_size[0]**2 * num_channels))**0.5 - return NormalInitializer(0.0, std, 0) + return Normal(0.0, std, 0) helper = LayerHelper('conv2d', **locals()) dtype = helper.input_dtype() @@ -875,22 +876,20 @@ def batch_norm(input, attr=helper.param_attr, shape=param_shape, dtype=dtype, - initializer=ConstantInitializer(1.0)) + initializer=Constant(1.0)) bias = helper.create_parameter( attr=helper.param_attr, shape=param_shape, dtype=dtype, - initializer=ConstantInitializer(0.0)) + initializer=Constant(0.0)) mean = helper.create_global_variable( dtype=input.dtype, shape=param_shape, persistable=True) - helper.set_variable_initializer( - var=mean, initializer=ConstantInitializer(0.0)) + helper.set_variable_initializer(var=mean, initializer=Constant(0.0)) variance = helper.create_global_variable( dtype=input.dtype, shape=param_shape, persistable=True) - helper.set_variable_initializer( - var=variance, initializer=ConstantInitializer(1.0)) + helper.set_variable_initializer(var=variance, initializer=Constant(1.0)) # create output # mean and mean_out share the same memory @@ -1355,6 +1354,33 @@ def lod_rank_table(x, level=0, main_program=None): return table +def max_sequence_len(rank_table, main_program=None): + """ + This function creates an operator to calculate the length of + max seqence through input rank_table(should be a lod_rank_table) + """ + helper = LayerHelper("max_seqence_len", **locals()) + res = helper.create_tmp_variable(dtype="int64") + helper.append_op( + type="max_sequence_len", + inputs={"RankTable": rank_table}, + outputs={"Out": res}) + return res + + +def topk(input, k, main_program=None, startup_program=None): + helper = LayerHelper('topk', **locals()) + topk_out = helper.create_tmp_variable(dtype=input.data_type) + topk_indices = helper.create_tmp_variable(dtype='int64') + helper.append_op( + type='top_k', + inputs={'X': [input]}, + outputs={'Out': [topk_out], + 'Indices': [topk_indices]}, + attrs={'k': k}) + return topk_out, topk_indices + + def lod_tensor_to_array(x, table, main_program=None): """ This function creates an operator to convert an LOD_Tensor to @@ -1388,14 +1414,20 @@ def array_to_lod_tensor(x, table, main_program=None): return tmp -def fill_constant(shape, dtype, value, main_program=None, startup_program=None): +def fill_constant(shape, + dtype, + value, + out=None, + main_program=None, + startup_program=None): """ This function creates a tensor , with shape as mentioned in the input and specified dtype and fills this up with a constant value that comes in the input. It also sets the stop_gradient to be True. 
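 For example, an illustrative call:

     zero = fill_constant(shape=[1], dtype='int64', value=0)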
""" helper = LayerHelper("fill_constant", **locals()) - out = helper.create_tmp_variable(dtype=dtype) + if out is None: + out = helper.create_tmp_variable(dtype=dtype) helper.append_op( type='fill_constant', inputs={}, diff --git a/python/paddle/v2/fluid/nets.py b/python/paddle/v2/fluid/nets.py index 5e14ca594bc7965dc29039ba57bb7b26b1ce6871..05728ad75a5bd1e87aa3c75ffcc4eac34b6b956c 100644 --- a/python/paddle/v2/fluid/nets.py +++ b/python/paddle/v2/fluid/nets.py @@ -1,4 +1,4 @@ -import paddle.v2.fluid.layers as layers +import layers __all__ = ["simple_img_conv_pool", "sequence_conv_pool"] diff --git a/python/paddle/v2/fluid/optimizer.py b/python/paddle/v2/fluid/optimizer.py index e82f0f060de6af63f63d5601ae94059192076e6f..934e024742fd00bf05cc0d7caaaa870c18a68074 100644 --- a/python/paddle/v2/fluid/optimizer.py +++ b/python/paddle/v2/fluid/optimizer.py @@ -1,16 +1,13 @@ from collections import defaultdict -import paddle.v2.fluid.framework as framework -from paddle.v2.fluid.framework import unique_name, Program -from paddle.v2.fluid.backward import append_backward_ops -from paddle.v2.fluid.initializer import ConstantInitializer -from paddle.v2.fluid.regularizer import append_regularization_ops -from paddle.v2.fluid.layer_helper import LayerHelper +import framework +from backward import append_backward_ops +from framework import unique_name +from initializer import Constant +from layer_helper import LayerHelper +from regularizer import append_regularization_ops -__all__ = [ - 'SGDOptimizer', 'MomentumOptimizer', 'AdagradOptimizer', 'AdamOptimizer', - 'AdamaxOptimizer', 'DecayedAdagradOptimizer' -] +__all__ = ['SGD', 'Momentum', 'Adagrad', 'Adam', 'Adamax', 'DecayedAdagrad'] class Optimizer(object): @@ -48,7 +45,7 @@ class Optimizer(object): persistable=True) param_lr = param_lr * self._learning_rate self.helper.set_variable_initializer( - var=param_lr_var, initializer=ConstantInitializer(param_lr)) + var=param_lr_var, initializer=Constant(param_lr)) return param_lr_var def _create_accumulators(self, block, parameters): @@ -96,7 +93,7 @@ class Optimizer(object): type=param.type, shape=param.shape) self.helper.set_variable_initializer( - var, initializer=ConstantInitializer(value=float(fill_value))) + var, initializer=Constant(value=float(fill_value))) self._accumulators[name][param.name] = var def _get_accumulator(self, name, param): @@ -360,7 +357,7 @@ class AdamOptimizer(Optimizer): lod_level=0, persistable=True) self.helper.set_variable_initializer( - self._beta1_pow_acc, initializer=ConstantInitializer(self._beta1)) + self._beta1_pow_acc, initializer=Constant(self._beta1)) self._beta2_pow_acc = self.helper.create_global_variable( name=unique_name('beta2_pow_acc'), @@ -370,7 +367,7 @@ class AdamOptimizer(Optimizer): persistable=True) self.helper.set_variable_initializer( - self._beta2_pow_acc, initializer=ConstantInitializer(self._beta2)) + self._beta2_pow_acc, initializer=Constant(self._beta2)) # Create accumulator tensors for first and second moments for p in parameters: @@ -462,7 +459,7 @@ class AdamaxOptimizer(Optimizer): lod_level=0, persistable=True) self.helper.set_variable_initializer( - self._beta1_pow_acc, initializer=ConstantInitializer(self._beta1)) + self._beta1_pow_acc, initializer=Constant(self._beta1)) # Create accumulator tensors for first moment and infinity norm for p in parameters: @@ -559,3 +556,19 @@ class DecayedAdagradOptimizer(Optimizer): attrs={"epsilon": self._epsilon}) return decayed_adagrad_op + + +# We short the class name, since users will use the optimizer with 
the package +# name. The sample code: +# +# import paddle.fluid as fluid +# +# sgd = fluid.optimizer.SGD(...) +# +# There is no need to add `Optimizer` as the class suffix +SGD = SGDOptimizer +Momentum = MomentumOptimizer +Adagrad = AdagradOptimizer +Adam = AdamOptimizer +Adamax = AdamaxOptimizer +DecayedAdagrad = DecayedAdagradOptimizer diff --git a/python/paddle/v2/fluid/regularizer.py b/python/paddle/v2/fluid/regularizer.py index 098cd0dd6439554f49e429ab75fb11bfa2c9d28c..c2c18e1951234f7160ff9f92d6dd6922a56683dd 100644 --- a/python/paddle/v2/fluid/regularizer.py +++ b/python/paddle/v2/fluid/regularizer.py @@ -1,8 +1,6 @@ -import paddle.v2.fluid.framework as framework +import framework -__all__ = [ - 'append_regularization_ops', 'L2DecayRegularizer', 'L1DecayRegularizer' -] +__all__ = ['append_regularization_ops', 'L1Decay', 'L2Decay'] def append_regularization_ops(parameters_and_grads): @@ -139,3 +137,16 @@ class L1DecayRegularizer(WeightDecayRegularizer): attrs={"scale": self._regularization_coeff}) return decay + + +# We shorten the class name, since users will use the regularizer with the package +# name. The sample code: +# +# import paddle.fluid as fluid +# +# hidden = fluid.layers.fc(..., +# param_attr=ParamAttr(fluid.regularizer.L2Decay(...))) +# +# There is no need to add `Regularizer` as the class suffix +L1Decay = L1DecayRegularizer +L2Decay = L2DecayRegularizer diff --git a/python/paddle/v2/fluid/tests/.gitignore b/python/paddle/v2/fluid/tests/.gitignore index fcc52c04886865d96c1bfe1597a9dc99c181de1f..a648f2b387c2c7b9422eea6749e43e7b8871f60f 100644 --- a/python/paddle/v2/fluid/tests/.gitignore +++ b/python/paddle/v2/fluid/tests/.gitignore @@ -1,2 +1,3 @@ image/ fit_a_line.model/ +tmp diff --git a/python/paddle/v2/fluid/tests/book/test_fit_a_line.py b/python/paddle/v2/fluid/tests/book/test_fit_a_line.py index a899f1088d77c4ca6462cf5306393444ea114e6c..9f98493adb21a03b8efde0f88c490e77c9d303e7 100644 --- a/python/paddle/v2/fluid/tests/book/test_fit_a_line.py +++ b/python/paddle/v2/fluid/tests/book/test_fit_a_line.py @@ -1,23 +1,18 @@ import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid.core as core -import paddle.v2.fluid.framework as framework -import paddle.v2.fluid.layers as layers -from paddle.v2.fluid.executor import Executor -from paddle.v2.fluid.io import save_persistables, load_persistables -from paddle.v2.fluid.optimizer import SGDOptimizer +import paddle.v2.fluid as fluid -x = layers.data(name='x', shape=[13], dtype='float32') +x = fluid.layers.data(name='x', shape=[13], dtype='float32') -y_predict = layers.fc(input=x, size=1, act=None) +y_predict = fluid.layers.fc(input=x, size=1, act=None) -y = layers.data(name='y', shape=[1], dtype='float32') +y = fluid.layers.data(name='y', shape=[1], dtype='float32') -cost = layers.square_error_cost(input=y_predict, label=y) -avg_cost = layers.mean(x=cost) +cost = fluid.layers.square_error_cost(input=y_predict, label=y) +avg_cost = fluid.layers.mean(x=cost) -sgd_optimizer = SGDOptimizer(learning_rate=0.001) -opts = sgd_optimizer.minimize(avg_cost) +sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001) +sgd_optimizer.minimize(avg_cost) BATCH_SIZE = 20 @@ -26,32 +21,24 @@ train_reader = paddle.batch( paddle.reader.shuffle( paddle.dataset.uci_housing.train(), buf_size=500), batch_size=BATCH_SIZE) -place = core.CPUPlace() -exe = Executor(place) +place = fluid.CPUPlace() +exe = fluid.Executor(place) -exe.run(framework.default_startup_program()) +exe.run(fluid.default_startup_program()) PASS_NUM = 100 for pass_id in range(PASS_NUM): -
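The alias tables above are what make spellings like ``fluid.optimizer.SGD`` and ``fluid.regularizer.L2Decay`` in the tests below resolve. A minimal sketch of how the shortened names read at a call site, assuming the ``paddle.v2.fluid`` package layout used in this patch (the ``paddle.fluid`` path in the sample comments is the intended future name, not a real import path yet):

.. code-block:: python

    import paddle.v2.fluid as fluid

    # The short names are plain module-level aliases, so both
    # spellings construct exactly the same classes.
    sgd = fluid.optimizer.SGD(learning_rate=0.001)  # SGDOptimizer
    l2 = fluid.regularizer.L2Decay(0.0005)          # L2DecayRegularizer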
save_persistables(exe, "./fit_a_line.model/") - load_persistables(exe, "./fit_a_line.model/") + fluid.io.save_persistables(exe, "./fit_a_line.model/") + fluid.io.load_persistables(exe, "./fit_a_line.model/") for data in train_reader(): - x_data = np.array(map(lambda x: x[0], data)).astype("float32") - y_data = np.array(map(lambda x: x[1], data)).astype("float32") - - tensor_x = core.LoDTensor() - tensor_x.set(x_data, place) - # print tensor_x.get_dims() - - tensor_y = core.LoDTensor() - tensor_y.set(y_data, place) - # print tensor_y.get_dims() - outs = exe.run(framework.default_main_program(), - feed={'x': tensor_x, - 'y': tensor_y}, - fetch_list=[avg_cost]) - out = np.array(outs[0]) - - if out[0] < 10.0: + x_data = np.array(map(lambda _: _[0], data)).astype("float32") + y_data = np.array(map(lambda _: _[1], data)).astype("float32") + + avg_loss_value, = exe.run(fluid.default_main_program(), + feed={'x': x_data, + 'y': y_data}, + fetch_list=[avg_cost]) + + if avg_loss_value[0] < 10.0: exit(0) # if avg cost less than 10.0, we think our code is good. exit(1) diff --git a/python/paddle/v2/fluid/tests/book/test_image_classification_train.py b/python/paddle/v2/fluid/tests/book/test_image_classification_train.py index 76cbd410f94a4be04ba71d1e3175eaed590ac80a..690c53397198889ac6005aaacbfa9d6e02b7da3d 100644 --- a/python/paddle/v2/fluid/tests/book/test_image_classification_train.py +++ b/python/paddle/v2/fluid/tests/book/test_image_classification_train.py @@ -1,19 +1,12 @@ +from __future__ import print_function import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid.core as core -import paddle.v2.fluid.framework as framework -import paddle.v2.fluid.layers as layers -import paddle.v2.fluid.nets as nets -import paddle.v2.fluid.evaluator as evaluator -from paddle.v2.fluid.io import get_inference_program -from paddle.v2.fluid.executor import Executor -from paddle.v2.fluid.initializer import XavierInitializer -from paddle.v2.fluid.optimizer import AdamOptimizer +import paddle.v2.fluid as fluid def resnet_cifar10(input, depth=32): def conv_bn_layer(input, ch_out, filter_size, stride, padding, act='relu'): - tmp = layers.conv2d( + tmp = fluid.layers.conv2d( input=input, filter_size=filter_size, num_filters=ch_out, @@ -21,12 +14,11 @@ def resnet_cifar10(input, depth=32): padding=padding, act=None, bias_attr=False) - return layers.batch_norm(input=tmp, act=act) + return fluid.layers.batch_norm(input=tmp, act=act) - def shortcut(input, ch_in, ch_out, stride, program, init_program): + def shortcut(input, ch_in, ch_out, stride): if ch_in != ch_out: - return conv_bn_layer(input, ch_out, 1, stride, 0, None, program, - init_program) + return conv_bn_layer(input, ch_out, 1, stride, 0, None) else: return input @@ -34,7 +26,7 @@ def resnet_cifar10(input, depth=32): tmp = conv_bn_layer(input, ch_out, 3, stride, 1) tmp = conv_bn_layer(tmp, ch_out, 3, 1, 1, act=None) short = shortcut(input, ch_in, ch_out, stride) - return layers.elementwise_add(x=tmp, y=short, act='relu') + return fluid.layers.elementwise_add(x=tmp, y=short, act='relu') def layer_warp(block_func, input, ch_in, ch_out, count, stride): tmp = block_func(input, ch_in, ch_out, stride) @@ -49,14 +41,14 @@ def resnet_cifar10(input, depth=32): res1 = layer_warp(basicblock, conv1, 16, 16, n, 1) res2 = layer_warp(basicblock, res1, 16, 32, n, 2) res3 = layer_warp(basicblock, res2, 32, 64, n, 2) - pool = layers.pool2d( + pool = fluid.layers.pool2d( input=res3, pool_size=8, pool_type='avg', pool_stride=1) return pool def vgg16_bn_drop(input): def 
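The rewritten ``test_fit_a_line.py`` leans on ``Executor.run`` accepting plain numpy arrays in ``feed`` and handing back numpy arrays for ``fetch_list``, which is what lets the manual ``LoDTensor`` boxing and ``np.array(outs[0])`` unboxing disappear. A minimal sketch of the new round trip, under the same ``paddle.v2.fluid`` API:

.. code-block:: python

    import numpy as np
    import paddle.v2.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1, act=None)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    x_np = np.random.random((20, 13)).astype('float32')  # hypothetical batch
    y_out, = exe.run(fluid.default_main_program(),
                     feed={'x': x_np},        # numpy in ...
                     fetch_list=[y_predict])  # ... numpy out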
conv_block(input, num_filter, groups, dropouts): - return nets.img_conv_group( + return fluid.nets.img_conv_group( input=input, pool_size=2, pool_stride=2, @@ -73,26 +65,20 @@ def vgg16_bn_drop(input): conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0]) conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0]) - drop = layers.dropout(x=conv5, dropout_prob=0.5) - fc1 = layers.fc(input=drop, - size=512, - act=None, - param_attr={"initializer": XavierInitializer()}) - reshape1 = layers.reshape(x=fc1, shape=list(fc1.shape + (1, 1))) - bn = layers.batch_norm(input=reshape1, act='relu') - drop2 = layers.dropout(x=bn, dropout_prob=0.5) - fc2 = layers.fc(input=drop2, - size=512, - act=None, - param_attr={"initializer": XavierInitializer()}) + drop = fluid.layers.dropout(x=conv5, dropout_prob=0.5) + fc1 = fluid.layers.fc(input=drop, size=512, act=None) + reshape1 = fluid.layers.reshape(x=fc1, shape=list(fc1.shape + (1, 1))) + bn = fluid.layers.batch_norm(input=reshape1, act='relu') + drop2 = fluid.layers.dropout(x=bn, dropout_prob=0.5) + fc2 = fluid.layers.fc(input=drop2, size=512, act=None) return fc2 classdim = 10 data_shape = [3, 32, 32] -images = layers.data(name='pixel', shape=data_shape, dtype='float32') -label = layers.data(name='label', shape=[1], dtype='int64') +images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32') +label = fluid.layers.data(name='label', shape=[1], dtype='int64') # Add neural network config # option 1. resnet @@ -100,35 +86,29 @@ label = layers.data(name='label', shape=[1], dtype='int64') # option 2. vgg net = vgg16_bn_drop(images) -# print(program) +predict = fluid.layers.fc(input=net, size=classdim, act='softmax') +cost = fluid.layers.cross_entropy(input=predict, label=label) +avg_cost = fluid.layers.mean(x=cost) -predict = layers.fc(input=net, size=classdim, act='softmax') -cost = layers.cross_entropy(input=predict, label=label) -avg_cost = layers.mean(x=cost) - -# optimizer = SGDOptimizer(learning_rate=0.001) -optimizer = AdamOptimizer(learning_rate=0.001) +optimizer = fluid.optimizer.Adam(learning_rate=0.001) opts = optimizer.minimize(avg_cost) -accuracy, acc_out = evaluator.accuracy(input=predict, label=label) +accuracy = fluid.evaluator.Accuracy(input=predict, label=label) BATCH_SIZE = 128 PASS_NUM = 1 train_reader = paddle.batch( paddle.reader.shuffle( - paddle.dataset.cifar.train10(), buf_size=BATCH_SIZE * 10), + paddle.dataset.cifar.train10(), buf_size=128 * 10), batch_size=BATCH_SIZE) -test_reader = paddle.batch(paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE) - -place = core.CPUPlace() -exe = Executor(place) +place = fluid.CPUPlace() +exe = fluid.Executor(place) -exe.run(framework.default_startup_program()) +exe.run(fluid.default_startup_program()) for pass_id in range(PASS_NUM): - batch_id = 0 accuracy.reset(exe) for data in train_reader(): img_data = np.array(map(lambda x: x[0].reshape(data_shape), @@ -139,56 +119,13 @@ for pass_id in range(PASS_NUM): batch_size = batch_size * i y_data = y_data.reshape([batch_size, 1]) - tensor_img = core.LoDTensor() - tensor_y = core.LoDTensor() - tensor_img.set(img_data, place) - tensor_y.set(y_data, place) - - outs = exe.run(framework.default_main_program(), - feed={"pixel": tensor_img, - "label": tensor_y}, - fetch_list=[avg_cost, acc_out]) - - loss = np.array(outs[0]) - acc = np.array(outs[1]) + loss, acc = exe.run(fluid.default_main_program(), + feed={"pixel": img_data, + "label": y_data}, + fetch_list=[avg_cost] + accuracy.metrics) pass_acc = accuracy.eval(exe) - - batch_id = batch_id + 1 - - 
test_accuracy, test_acc_out = evaluator.accuracy( - input=predict, label=label) - - test_target = [avg_cost, test_acc_out] + test_accuracy.states().values() - inference_program = get_inference_program(test_target) - - test_accuracy.reset(exe) - - for data in test_reader(): - x_data = np.array(map(lambda x: x[0].reshape(data_shape), - data)).astype("float32") - y_data = np.array(map(lambda x: x[1], data)).astype("int64") - y_data = np.expand_dims(y_data, axis=1) - - tensor_x = core.LoDTensor() - tensor_x.set(x_data, place) - - tensor_y = core.LoDTensor() - tensor_y.set(y_data, place) - - outs = exe.run(inference_program, - feed={'pixel': tensor_x, - 'label': tensor_y}, - fetch_list=[avg_cost, test_acc_out]) - out = np.array(outs[0]) - acc = np.array(outs[1]) - - test_pass_acc = test_accuracy.eval(exe) - - print("pass_id:" + str(pass_id) + " batch_id:" + str(batch_id) + - " loss:" + str(loss) + " acc:" + str(acc) + " pass_acc:" + str( - pass_acc) + " test_pass_acc:" + str(test_pass_acc)) - - if batch_id > 1: - # this model is slow, so if we can train two mini batch, we think it works properly. - exit(0) + print("loss:" + str(loss) + " acc:" + str(acc) + " pass_acc:" + str( + pass_acc)) + # this model is slow, so if we can train two mini batch, we think it works properly. + exit(0) exit(1) diff --git a/python/paddle/v2/fluid/tests/book/test_label_semantic_roles.py b/python/paddle/v2/fluid/tests/book/test_label_semantic_roles.py index 9c9064ba9639829ef3afd8111278b17035bee84a..93987a2b80dc9ca304a708d4799bc38b448a68c4 100644 --- a/python/paddle/v2/fluid/tests/book/test_label_semantic_roles.py +++ b/python/paddle/v2/fluid/tests/book/test_label_semantic_roles.py @@ -1,11 +1,7 @@ import numpy as np import paddle.v2 as paddle import paddle.v2.dataset.conll05 as conll05 -import paddle.v2.fluid.core as core -import paddle.v2.fluid.framework as framework -import paddle.v2.fluid.layers as layers -from paddle.v2.fluid.executor import Executor, g_scope -from paddle.v2.fluid.optimizer import SGDOptimizer +import paddle.v2.fluid as fluid word_dict, verb_dict, label_dict = conll05.get_dict() word_dict_len = len(word_dict) @@ -34,23 +30,23 @@ def load_parameter(file_name, h, w): def db_lstm(): # 8 features - word = layers.data(name='word_data', shape=[1], dtype='int64') - predicate = layers.data(name='verb_data', shape=[1], dtype='int64') - ctx_n2 = layers.data(name='ctx_n2_data', shape=[1], dtype='int64') - ctx_n1 = layers.data(name='ctx_n1_data', shape=[1], dtype='int64') - ctx_0 = layers.data(name='ctx_0_data', shape=[1], dtype='int64') - ctx_p1 = layers.data(name='ctx_p1_data', shape=[1], dtype='int64') - ctx_p2 = layers.data(name='ctx_p2_data', shape=[1], dtype='int64') - mark = layers.data(name='mark_data', shape=[1], dtype='int64') - - predicate_embedding = layers.embedding( + word = fluid.layers.data(name='word_data', shape=[1], dtype='int64') + predicate = fluid.layers.data(name='verb_data', shape=[1], dtype='int64') + ctx_n2 = fluid.layers.data(name='ctx_n2_data', shape=[1], dtype='int64') + ctx_n1 = fluid.layers.data(name='ctx_n1_data', shape=[1], dtype='int64') + ctx_0 = fluid.layers.data(name='ctx_0_data', shape=[1], dtype='int64') + ctx_p1 = fluid.layers.data(name='ctx_p1_data', shape=[1], dtype='int64') + ctx_p2 = fluid.layers.data(name='ctx_p2_data', shape=[1], dtype='int64') + mark = fluid.layers.data(name='mark_data', shape=[1], dtype='int64') + + predicate_embedding = fluid.layers.embedding( input=predicate, size=[pred_len, word_dim], dtype='float32', is_sparse=IS_SPARSE, 
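``fluid.evaluator.Accuracy`` replaces the old ``evaluator.accuracy`` function: the object owns its accumulator state, ``reset(exe)`` clears that state at the start of a pass, ``accuracy.metrics`` extends the fetch list with the per-batch metric variables, and ``eval(exe)`` reports the accuracy aggregated over the pass. A sketch of the loop skeleton under those assumptions (the network and shapes here are made up):

.. code-block:: python

    import numpy as np
    import paddle.v2.fluid as fluid

    image = fluid.layers.data(name='img', shape=[784], dtype='float32')
    label = fluid.layers.data(name='lbl', shape=[1], dtype='int64')
    predict = fluid.layers.fc(input=image, size=10, act='softmax')
    avg_cost = fluid.layers.mean(x=fluid.layers.cross_entropy(
        input=predict, label=label))
    accuracy = fluid.evaluator.Accuracy(input=predict, label=label)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    accuracy.reset(exe)  # clear evaluator state at the start of a pass
    img = np.random.random((4, 784)).astype('float32')
    lbl = np.random.randint(0, 10, size=(4, 1)).astype('int64')
    loss, acc = exe.run(fluid.default_main_program(),
                        feed={'img': img, 'lbl': lbl},
                        fetch_list=[avg_cost] + accuracy.metrics)
    pass_acc = accuracy.eval(exe)  # accuracy accumulated over the pass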
param_attr={'name': 'vemb'}) - mark_embedding = layers.embedding( + mark_embedding = fluid.layers.embedding( input=mark, size=[mark_dict_len, mark_dim], dtype='float32', @@ -58,7 +54,7 @@ def db_lstm(): word_input = [word, ctx_n2, ctx_n1, ctx_0, ctx_p1, ctx_p2] emb_layers = [ - layers.embedding( + fluid.layers.embedding( size=[word_dict_len, word_dim], input=x, param_attr={'name': embedding_name, @@ -68,12 +64,12 @@ def db_lstm(): emb_layers.append(mark_embedding) hidden_0_layers = [ - layers.fc(input=emb, size=hidden_dim) for emb in emb_layers + fluid.layers.fc(input=emb, size=hidden_dim) for emb in emb_layers ] - hidden_0 = layers.sums(input=hidden_0_layers) + hidden_0 = fluid.layers.sums(input=hidden_0_layers) - lstm_0 = layers.dynamic_lstm( + lstm_0 = fluid.layers.dynamic_lstm( input=hidden_0, size=hidden_dim, candidate_activation='relu', @@ -84,12 +80,12 @@ def db_lstm(): input_tmp = [hidden_0, lstm_0] for i in range(1, depth): - mix_hidden = layers.sums(input=[ - layers.fc(input=input_tmp[0], size=hidden_dim), - layers.fc(input=input_tmp[1], size=hidden_dim) + mix_hidden = fluid.layers.sums(input=[ + fluid.layers.fc(input=input_tmp[0], size=hidden_dim), + fluid.layers.fc(input=input_tmp[1], size=hidden_dim) ]) - lstm = layers.dynamic_lstm( + lstm = fluid.layers.dynamic_lstm( input=mix_hidden, size=hidden_dim, candidate_activation='relu', @@ -99,9 +95,9 @@ def db_lstm(): input_tmp = [mix_hidden, lstm] - feature_out = layers.sums(input=[ - layers.fc(input=input_tmp[0], size=label_dict_len), - layers.fc(input=input_tmp[1], size=label_dict_len) + feature_out = fluid.layers.sums(input=[ + fluid.layers.fc(input=input_tmp[0], size=label_dict_len), + fluid.layers.fc(input=input_tmp[1], size=label_dict_len) ]) return feature_out @@ -116,7 +112,7 @@ def to_lodtensor(data, place): lod.append(cur_len) flattened_data = np.concatenate(data, axis=0).astype("int64") flattened_data = flattened_data.reshape([len(flattened_data), 1]) - res = core.LoDTensor() + res = fluid.LoDTensor() res.set(flattened_data, place) res.set_lod([lod]) return res @@ -125,29 +121,29 @@ def to_lodtensor(data, place): def main(): # define network topology feature_out = db_lstm() - target = layers.data(name='target', shape=[1], dtype='int64') - crf_cost = layers.linear_chain_crf( + target = fluid.layers.data(name='target', shape=[1], dtype='int64') + crf_cost = fluid.layers.linear_chain_crf( input=feature_out, label=target, param_attr={"name": 'crfw', "learning_rate": mix_hidden_lr}) - avg_cost = layers.mean(x=crf_cost) + avg_cost = fluid.layers.mean(x=crf_cost) # TODO(qiao) # 1. add crf_decode_layer and evaluator # 2. 
use other optimizer and check why out will be NAN - sgd_optimizer = SGDOptimizer(learning_rate=0.0001) - opts = sgd_optimizer.minimize(avg_cost) + sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.0001) + sgd_optimizer.minimize(avg_cost) train_data = paddle.batch( paddle.reader.shuffle( paddle.dataset.conll05.test(), buf_size=8192), batch_size=BATCH_SIZE) - place = core.CPUPlace() - exe = Executor(place) + place = fluid.CPUPlace() + exe = fluid.Executor(place) - exe.run(framework.default_startup_program()) + exe.run(fluid.default_startup_program()) - embedding_param = g_scope.find_var(embedding_name).get_tensor() + embedding_param = fluid.g_scope.find_var(embedding_name).get_tensor() embedding_param.set( load_parameter(conll05.get_embedding(), word_dict_len, word_dim), place) @@ -164,7 +160,7 @@ def main(): mark_data = to_lodtensor(map(lambda x: x[7], data), place) target = to_lodtensor(map(lambda x: x[8], data), place) - outs = exe.run(framework.default_main_program(), + outs = exe.run(fluid.default_main_program(), feed={ 'word_data': word_data, 'ctx_n2_data': ctx_n2_data, diff --git a/python/paddle/v2/fluid/tests/book/test_recognize_digits_conv.py b/python/paddle/v2/fluid/tests/book/test_recognize_digits_conv.py index 0bea5f95c895b278db86f25f54e2795d3ec0af69..ba686b56f8603834c12f5ed24e0ef7308c78585d 100644 --- a/python/paddle/v2/fluid/tests/book/test_recognize_digits_conv.py +++ b/python/paddle/v2/fluid/tests/book/test_recognize_digits_conv.py @@ -1,23 +1,18 @@ +from __future__ import print_function import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid.core as core -import paddle.v2.fluid.evaluator as evaluator -import paddle.v2.fluid.framework as framework -import paddle.v2.fluid.layers as layers -import paddle.v2.fluid.nets as nets -from paddle.v2.fluid.executor import Executor -from paddle.v2.fluid.optimizer import AdamOptimizer +import paddle.v2.fluid as fluid -images = layers.data(name='pixel', shape=[1, 28, 28], dtype='float32') -label = layers.data(name='label', shape=[1], dtype='int64') -conv_pool_1 = nets.simple_img_conv_pool( +images = fluid.layers.data(name='pixel', shape=[1, 28, 28], dtype='float32') +label = fluid.layers.data(name='label', shape=[1], dtype='int64') +conv_pool_1 = fluid.nets.simple_img_conv_pool( input=images, filter_size=5, num_filters=20, pool_size=2, pool_stride=2, act="relu") -conv_pool_2 = nets.simple_img_conv_pool( +conv_pool_2 = fluid.nets.simple_img_conv_pool( input=conv_pool_1, filter_size=5, num_filters=50, @@ -25,13 +20,13 @@ conv_pool_2 = nets.simple_img_conv_pool( pool_stride=2, act="relu") -predict = layers.fc(input=conv_pool_2, size=10, act="softmax") -cost = layers.cross_entropy(input=predict, label=label) -avg_cost = layers.mean(x=cost) -optimizer = AdamOptimizer(learning_rate=0.01, beta1=0.9, beta2=0.999) -opts = optimizer.minimize(avg_cost) +predict = fluid.layers.fc(input=conv_pool_2, size=10, act="softmax") +cost = fluid.layers.cross_entropy(input=predict, label=label) +avg_cost = fluid.layers.mean(x=cost) +optimizer = fluid.optimizer.Adam(learning_rate=0.01) +optimizer.minimize(avg_cost) -accuracy, acc_out = evaluator.accuracy(input=predict, label=label) +accuracy = fluid.evaluator.Accuracy(input=predict, label=label) BATCH_SIZE = 50 PASS_NUM = 3 @@ -40,10 +35,10 @@ train_reader = paddle.batch( paddle.dataset.mnist.train(), buf_size=500), batch_size=BATCH_SIZE) -place = core.CPUPlace() -exe = Executor(place) +place = fluid.CPUPlace() +exe = fluid.Executor(place) -exe.run(framework.default_startup_program()) 
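The ``to_lodtensor`` helper these tests share flattens a batch of variable-length sequences into one tensor and records the sequence boundaries as a level-of-detail (LoD) offset vector; three sequences of lengths 3, 1, and 2 become the offsets ``[0, 3, 4, 6]``. A small sketch of the same construction, assuming ``fluid.LoDTensor`` is re-exported the way these tests use it:

.. code-block:: python

    import numpy as np
    import paddle.v2.fluid as fluid

    seqs = [[1, 2, 3], [4], [5, 6]]  # hypothetical word-id sequences
    flattened = np.concatenate(seqs).astype('int64').reshape([-1, 1])

    res = fluid.LoDTensor()
    res.set(flattened, fluid.CPUPlace())
    res.set_lod([[0, 3, 4, 6]])  # cumulative offsets of the three sequences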
+exe.run(fluid.default_startup_program()) for pass_id in range(PASS_NUM): accuracy.reset(exe) @@ -53,17 +48,10 @@ for pass_id in range(PASS_NUM): y_data = np.array(map(lambda x: x[1], data)).astype("int64") y_data = y_data.reshape([BATCH_SIZE, 1]) - tensor_img = core.LoDTensor() - tensor_y = core.LoDTensor() - tensor_img.set(img_data, place) - tensor_y.set(y_data, place) - - outs = exe.run(framework.default_main_program(), - feed={"pixel": tensor_img, - "label": tensor_y}, - fetch_list=[avg_cost, acc_out]) - loss = np.array(outs[0]) - acc = np.array(outs[1]) + loss, acc = exe.run(fluid.default_main_program(), + feed={"pixel": img_data, + "label": y_data}, + fetch_list=[avg_cost] + accuracy.metrics) pass_acc = accuracy.eval(exe) print("pass_id=" + str(pass_id) + " acc=" + str(acc) + " pass_acc=" + str(pass_acc)) diff --git a/python/paddle/v2/fluid/tests/book/test_recognize_digits_mlp.py b/python/paddle/v2/fluid/tests/book/test_recognize_digits_mlp.py index f57a5c8d98cd8b89e1d300b4d1fe00d6b24b0d68..c96d186ffe8d9313cb818a55d68dfc3c13db19cc 100644 --- a/python/paddle/v2/fluid/tests/book/test_recognize_digits_mlp.py +++ b/python/paddle/v2/fluid/tests/book/test_recognize_digits_mlp.py @@ -1,42 +1,39 @@ +from __future__ import print_function import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid.core as core -import paddle.v2.fluid.framework as framework -import paddle.v2.fluid.layers as layers -import paddle.v2.fluid.evaluator as evaluator -from paddle.v2.fluid.io import get_inference_program -from paddle.v2.fluid.executor import Executor -from paddle.v2.fluid.initializer import UniformInitializer -from paddle.v2.fluid.optimizer import MomentumOptimizer -from paddle.v2.fluid.regularizer import L2DecayRegularizer +import paddle.v2.fluid as fluid BATCH_SIZE = 128 -image = layers.data(name='x', shape=[784], dtype='float32') +image = fluid.layers.data(name='x', shape=[784], dtype='float32') param_attr = { 'name': None, - 'initializer': UniformInitializer( - low=-1.0, high=1.0), - 'regularization': L2DecayRegularizer(0.0005 * BATCH_SIZE) + 'regularization': fluid.regularizer.L2Decay(0.0005 * BATCH_SIZE) } -hidden1 = layers.fc(input=image, size=128, act='relu', param_attr=param_attr) -hidden2 = layers.fc(input=hidden1, size=64, act='relu', param_attr=param_attr) +hidden1 = fluid.layers.fc(input=image, + size=128, + act='relu', + param_attr=param_attr) +hidden2 = fluid.layers.fc(input=hidden1, + size=64, + act='relu', + param_attr=param_attr) -predict = layers.fc(input=hidden2, - size=10, - act='softmax', - param_attr=param_attr) +predict = fluid.layers.fc(input=hidden2, + size=10, + act='softmax', + param_attr=param_attr) -label = layers.data(name='y', shape=[1], dtype='int64') +label = fluid.layers.data(name='y', shape=[1], dtype='int64') -cost = layers.cross_entropy(input=predict, label=label) -avg_cost = layers.mean(x=cost) +cost = fluid.layers.cross_entropy(input=predict, label=label) +avg_cost = fluid.layers.mean(x=cost) -optimizer = MomentumOptimizer(learning_rate=0.001, momentum=0.9) +optimizer = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9) opts = optimizer.minimize(avg_cost) -accuracy, acc_out = evaluator.accuracy(input=predict, label=label) +accuracy = fluid.evaluator.Accuracy(input=predict, label=label) train_reader = paddle.batch( paddle.reader.shuffle( @@ -45,10 +42,10 @@ train_reader = paddle.batch( test_reader = paddle.batch(paddle.dataset.mnist.test(), batch_size=128) -place = core.CPUPlace() -exe = Executor(place) +place = fluid.CPUPlace() +exe = 
fluid.Executor(place) -exe.run(framework.default_startup_program()) +exe.run(fluid.default_startup_program()) PASS_NUM = 100 for pass_id in range(PASS_NUM): @@ -58,25 +55,24 @@ for pass_id in range(PASS_NUM): y_data = np.array(map(lambda x: x[1], data)).astype("int64") y_data = np.expand_dims(y_data, axis=1) - tensor_x = core.LoDTensor() + tensor_x = fluid.LoDTensor() tensor_x.set(x_data, place) - tensor_y = core.LoDTensor() + tensor_y = fluid.LoDTensor() tensor_y.set(y_data, place) - outs = exe.run(framework.default_main_program(), + outs = exe.run(fluid.default_main_program(), feed={'x': tensor_x, 'y': tensor_y}, - fetch_list=[avg_cost, acc_out]) + fetch_list=[avg_cost] + accuracy.metrics) out = np.array(outs[0]) acc = np.array(outs[1]) pass_acc = accuracy.eval(exe) - test_accuracy, test_acc_out = evaluator.accuracy( - input=predict, label=label) + test_accuracy = fluid.evaluator.Accuracy(input=predict, label=label) - test_target = [avg_cost, test_acc_out] + test_accuracy.states().values() - inference_program = get_inference_program(test_target) + test_target = [avg_cost] + test_accuracy.metrics + test_accuracy.states + inference_program = fluid.io.get_inference_program(test_target) test_accuracy.reset(exe) for data in test_reader(): @@ -84,18 +80,10 @@ for pass_id in range(PASS_NUM): y_data = np.array(map(lambda x: x[1], data)).astype("int64") y_data = np.expand_dims(y_data, axis=1) - tensor_x = core.LoDTensor() - tensor_x.set(x_data, place) - - tensor_y = core.LoDTensor() - tensor_y.set(y_data, place) - - outs = exe.run(inference_program, - feed={'x': tensor_x, - 'y': tensor_y}, - fetch_list=[avg_cost, test_acc_out]) - out = np.array(outs[0]) - acc = np.array(outs[1]) + out, acc = exe.run(inference_program, + feed={'x': x_data, + 'y': y_data}, + fetch_list=[avg_cost] + test_accuracy.metrics) test_pass_acc = test_accuracy.eval(exe) print("pass_id=" + str(pass_id) + " train_cost=" + str( diff --git a/python/paddle/v2/fluid/tests/book/test_understand_sentiment_conv.py b/python/paddle/v2/fluid/tests/book/test_understand_sentiment_conv.py index 3103be83a63d64fcba87132ddc5d830b92047b27..be875a952b7086ee64984525d70ffd3f1ecb5fae 100644 --- a/python/paddle/v2/fluid/tests/book/test_understand_sentiment_conv.py +++ b/python/paddle/v2/fluid/tests/book/test_understand_sentiment_conv.py @@ -1,40 +1,35 @@ +from __future__ import print_function import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid.core as core -import paddle.v2.fluid.evaluator as evaluator -import paddle.v2.fluid.framework as framework -import paddle.v2.fluid.layers as layers -import paddle.v2.fluid.nets as nets -from paddle.v2.fluid.executor import Executor -from paddle.v2.fluid.optimizer import AdamOptimizer +import paddle.v2.fluid as fluid def convolution_net(input_dim, class_dim=2, emb_dim=32, hid_dim=32): - data = layers.data(name="words", shape=[1], dtype="int64") - label = layers.data(name="label", shape=[1], dtype="int64") + data = fluid.layers.data(name="words", shape=[1], dtype="int64") + label = fluid.layers.data(name="label", shape=[1], dtype="int64") - emb = layers.embedding(input=data, size=[input_dim, emb_dim]) - conv_3 = nets.sequence_conv_pool( + emb = fluid.layers.embedding(input=data, size=[input_dim, emb_dim]) + conv_3 = fluid.nets.sequence_conv_pool( input=emb, num_filters=hid_dim, filter_size=3, act="tanh", pool_type="sqrt") - conv_4 = nets.sequence_conv_pool( + conv_4 = fluid.nets.sequence_conv_pool( input=emb, num_filters=hid_dim, filter_size=4, act="tanh", pool_type="sqrt") - prediction = 
layers.fc(input=[conv_3, conv_4], - size=class_dim, - act="softmax") - cost = layers.cross_entropy(input=prediction, label=label) - avg_cost = layers.mean(x=cost) - adam_optimizer = AdamOptimizer(learning_rate=0.002) - opts = adam_optimizer.minimize(avg_cost) - accuracy, acc_out = evaluator.accuracy(input=prediction, label=label) - return avg_cost, accuracy, acc_out + prediction = fluid.layers.fc(input=[conv_3, conv_4], + size=class_dim, + act="softmax") + cost = fluid.layers.cross_entropy(input=prediction, label=label) + avg_cost = fluid.layers.mean(x=cost) + adam_optimizer = fluid.optimizer.Adam(learning_rate=0.002) + adam_optimizer.minimize(avg_cost) + accuracy = fluid.evaluator.Accuracy(input=prediction, label=label) + return avg_cost, accuracy, accuracy.metrics[0] def to_lodtensor(data, place): @@ -46,7 +41,7 @@ def to_lodtensor(data, place): lod.append(cur_len) flattened_data = np.concatenate(data, axis=0).astype("int64") flattened_data = flattened_data.reshape([len(flattened_data), 1]) - res = core.LoDTensor() + res = fluid.LoDTensor() res.set(flattened_data, place) res.set_lod([lod]) return res @@ -67,10 +62,10 @@ def main(): paddle.reader.shuffle( paddle.dataset.imdb.train(word_dict), buf_size=1000), batch_size=BATCH_SIZE) - place = core.CPUPlace() - exe = Executor(place) + place = fluid.CPUPlace() + exe = fluid.Executor(place) - exe.run(framework.default_startup_program()) + exe.run(fluid.default_startup_program()) for pass_id in xrange(PASS_NUM): accuracy.reset(exe) @@ -80,15 +75,14 @@ def main(): label = np.array(map(lambda x: x[1], data)).astype("int64") label = label.reshape([BATCH_SIZE, 1]) - tensor_label = core.LoDTensor() + tensor_label = fluid.LoDTensor() tensor_label.set(label, place) - outs = exe.run(framework.default_main_program(), - feed={"words": tensor_words, - "label": tensor_label}, - fetch_list=[cost, acc_out]) - cost_val = np.array(outs[0]) - acc_val = np.array(outs[1]) + cost_val, acc_val = exe.run( + fluid.default_main_program(), + feed={"words": tensor_words, + "label": tensor_label}, + fetch_list=[cost, acc_out]) pass_acc = accuracy.eval(exe) print("cost=" + str(cost_val) + " acc=" + str(acc_val) + " pass_acc=" + str(pass_acc)) diff --git a/python/paddle/v2/fluid/tests/book/test_understand_sentiment_dynamic_lstm.py b/python/paddle/v2/fluid/tests/book/test_understand_sentiment_dynamic_lstm.py index 208978224f4e83a23efadae37fbe51d0d59dafe8..094a3cdcda12eaee351476e99a388c44b3c81cd6 100644 --- a/python/paddle/v2/fluid/tests/book/test_understand_sentiment_dynamic_lstm.py +++ b/python/paddle/v2/fluid/tests/book/test_understand_sentiment_dynamic_lstm.py @@ -1,11 +1,6 @@ import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid.core as core -import paddle.v2.fluid.evaluator as evaluator -import paddle.v2.fluid.framework as framework -import paddle.v2.fluid.layers as layers -from paddle.v2.fluid.executor import Executor -from paddle.v2.fluid.optimizer import AdamOptimizer +import paddle.v2.fluid as fluid def stacked_lstm_net(input_dim, @@ -14,36 +9,36 @@ def stacked_lstm_net(input_dim, hid_dim=512, stacked_num=3): assert stacked_num % 2 == 1 - data = layers.data(name="words", shape=[1], dtype="int64") - label = layers.data(name="label", shape=[1], dtype="int64") + data = fluid.layers.data(name="words", shape=[1], dtype="int64") + label = fluid.layers.data(name="label", shape=[1], dtype="int64") - emb = layers.embedding(input=data, size=[input_dim, emb_dim]) + emb = fluid.layers.embedding(input=data, size=[input_dim, emb_dim]) # add bias attr # 
TODO(qijun) linear act - fc1 = layers.fc(input=emb, size=hid_dim) - lstm1, cell1 = layers.dynamic_lstm(input=fc1, size=hid_dim) + fc1 = fluid.layers.fc(input=emb, size=hid_dim) + lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=hid_dim) inputs = [fc1, lstm1] for i in range(2, stacked_num + 1): - fc = layers.fc(input=inputs, size=hid_dim) - lstm, cell = layers.dynamic_lstm( + fc = fluid.layers.fc(input=inputs, size=hid_dim) + lstm, cell = fluid.layers.dynamic_lstm( input=fc, size=hid_dim, is_reverse=(i % 2) == 0) inputs = [fc, lstm] - fc_last = layers.sequence_pool(input=inputs[0], pool_type='max') - lstm_last = layers.sequence_pool(input=inputs[1], pool_type='max') + fc_last = fluid.layers.sequence_pool(input=inputs[0], pool_type='max') + lstm_last = fluid.layers.sequence_pool(input=inputs[1], pool_type='max') - prediction = layers.fc(input=[fc_last, lstm_last], - size=class_dim, - act='softmax') - cost = layers.cross_entropy(input=prediction, label=label) - avg_cost = layers.mean(x=cost) - adam_optimizer = AdamOptimizer(learning_rate=0.002) - opts = adam_optimizer.minimize(avg_cost) - accuracy, acc_out = evaluator.accuracy(input=prediction, label=label) - return avg_cost, accuracy, acc_out + prediction = fluid.layers.fc(input=[fc_last, lstm_last], + size=class_dim, + act='softmax') + cost = fluid.layers.cross_entropy(input=prediction, label=label) + avg_cost = fluid.layers.mean(x=cost) + adam_optimizer = fluid.optimizer.Adam(learning_rate=0.002) + adam_optimizer.minimize(avg_cost) + accuracy = fluid.evaluator.Accuracy(input=prediction, label=label) + return avg_cost, accuracy, accuracy.metrics[0] def to_lodtensor(data, place): @@ -55,7 +50,7 @@ def to_lodtensor(data, place): lod.append(cur_len) flattened_data = np.concatenate(data, axis=0).astype("int64") flattened_data = flattened_data.reshape([len(flattened_data), 1]) - res = core.LoDTensor() + res = fluid.LoDTensor() res.set(flattened_data, place) res.set_lod([lod]) return res @@ -77,10 +72,10 @@ def main(): paddle.reader.shuffle( paddle.dataset.imdb.train(word_dict), buf_size=1000), batch_size=BATCH_SIZE) - place = core.CPUPlace() - exe = Executor(place) + place = fluid.CPUPlace() + exe = fluid.Executor(place) - exe.run(framework.default_startup_program()) + exe.run(fluid.default_startup_program()) for pass_id in xrange(PASS_NUM): accuracy.reset(exe) @@ -90,15 +85,14 @@ def main(): label = np.array(map(lambda x: x[1], data)).astype("int64") label = label.reshape([BATCH_SIZE, 1]) - tensor_label = core.LoDTensor() + tensor_label = fluid.LoDTensor() tensor_label.set(label, place) - outs = exe.run(framework.default_main_program(), - feed={"words": tensor_words, - "label": tensor_label}, - fetch_list=[cost, acc_out]) - cost_val = np.array(outs[0]) - acc_val = np.array(outs[1]) + cost_val, acc_val = exe.run( + fluid.default_main_program(), + feed={"words": tensor_words, + "label": tensor_label}, + fetch_list=[cost, acc_out]) pass_acc = accuracy.eval(exe) print("cost=" + str(cost_val) + " acc=" + str(acc_val) + " pass_acc=" + str(pass_acc)) diff --git a/python/paddle/v2/fluid/tests/book/test_understand_sentiment_lstm.py b/python/paddle/v2/fluid/tests/book/test_understand_sentiment_lstm.py index 8aebeba653cf49438929fa51312b5af33c3b438d..b2479320330bde5771c3d4a8e2923b5ab1eecf2e 100644 --- a/python/paddle/v2/fluid/tests/book/test_understand_sentiment_lstm.py +++ b/python/paddle/v2/fluid/tests/book/test_understand_sentiment_lstm.py @@ -1,40 +1,39 @@ import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid.core as core 
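In ``stacked_lstm_net`` above, every ``fluid.layers.dynamic_lstm`` returns a hidden-state sequence plus a cell-state sequence, and the scan direction flips on each level via ``is_reverse=(i % 2) == 0``. A condensed sketch of one stacking step, mirroring the calls in the test with arbitrary sizes (no claim that this is the canonical way to stack LSTMs):

.. code-block:: python

    import paddle.v2.fluid as fluid

    data = fluid.layers.data(name='words', shape=[1], dtype='int64')
    emb = fluid.layers.embedding(input=data, size=[5000, 32])  # made-up dict size

    fc1 = fluid.layers.fc(input=emb, size=128)
    lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=128)

    # the next level consumes both outputs and runs in the opposite direction
    fc2 = fluid.layers.fc(input=[fc1, lstm1], size=128)
    lstm2, cell2 = fluid.layers.dynamic_lstm(
        input=fc2, size=128, is_reverse=True)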
-import paddle.v2.fluid.framework as framework -import paddle.v2.fluid.layers as layers -from paddle.v2.fluid.executor import Executor -from paddle.v2.fluid.optimizer import AdamOptimizer +import paddle.v2.fluid as fluid def lstm_net(dict_dim, class_dim=2, emb_dim=32, seq_len=80, batch_size=50): - data = layers.data( + data = fluid.layers.data( name="words", shape=[seq_len * batch_size, 1], append_batch_size=False, dtype="int64") - label = layers.data( + label = fluid.layers.data( name="label", shape=[batch_size, 1], append_batch_size=False, dtype="int64") - emb = layers.embedding(input=data, size=[dict_dim, emb_dim]) - emb = layers.reshape(x=emb, shape=[batch_size, seq_len, emb_dim]) - emb = layers.transpose(x=emb, axis=[1, 0, 2]) + emb = fluid.layers.embedding(input=data, size=[dict_dim, emb_dim]) + emb = fluid.layers.reshape(x=emb, shape=[batch_size, seq_len, emb_dim]) + emb = fluid.layers.transpose(x=emb, axis=[1, 0, 2]) - c_pre_init = layers.fill_constant( + c_pre_init = fluid.layers.fill_constant( dtype=emb.dtype, shape=[batch_size, emb_dim], value=0.0) - layer_1_out = layers.lstm(emb, c_pre_init=c_pre_init, hidden_dim=emb_dim) - layer_1_out = layers.transpose(x=layer_1_out, axis=[1, 0, 2]) + layer_1_out = fluid.layers.lstm( + emb, c_pre_init=c_pre_init, hidden_dim=emb_dim) + layer_1_out = fluid.layers.transpose(x=layer_1_out, axis=[1, 0, 2]) - prediction = layers.fc(input=layer_1_out, size=class_dim, act="softmax") - cost = layers.cross_entropy(input=prediction, label=label) + prediction = fluid.layers.fc(input=layer_1_out, + size=class_dim, + act="softmax") + cost = fluid.layers.cross_entropy(input=prediction, label=label) - avg_cost = layers.mean(x=cost) - adam_optimizer = AdamOptimizer(learning_rate=0.002) - opts = adam_optimizer.minimize(avg_cost) - acc = layers.accuracy(input=prediction, label=label) + avg_cost = fluid.layers.mean(x=cost) + adam_optimizer = fluid.optimizer.Adam(learning_rate=0.002) + adam_optimizer.minimize(avg_cost) + acc = fluid.layers.accuracy(input=prediction, label=label) return avg_cost, acc @@ -48,7 +47,7 @@ def to_lodtensor(data, place): lod.append(cur_len) flattened_data = np.concatenate(data, axis=0).astype("int64") flattened_data = flattened_data.reshape([len(flattened_data), 1]) - res = core.LoDTensor() + res = fluid.LoDTensor() res.set(flattened_data, place) res.set_lod([lod]) return res @@ -65,7 +64,7 @@ def prepare_feed_data(data, place): label = np.array(map(lambda x: x[1], data)).astype("int64") label = label.reshape([len(label), 1]) - tensor_label = core.LoDTensor() + tensor_label = fluid.LoDTensor() tensor_label.set(label, place) return tensor_words, tensor_label @@ -86,17 +85,17 @@ def main(): paddle.reader.shuffle( paddle.dataset.imdb.train(word_dict), buf_size=BATCH_SIZE * 10), batch_size=BATCH_SIZE) - place = core.CPUPlace() - exe = Executor(place) + place = fluid.CPUPlace() + exe = fluid.Executor(place) - exe.run(framework.default_startup_program()) + exe.run(fluid.default_startup_program()) for pass_id in xrange(PASS_NUM): for data in train_data(): chopped_data = chop_data(data) tensor_words, tensor_label = prepare_feed_data(chopped_data, place) - outs = exe.run(framework.default_main_program(), + outs = exe.run(fluid.default_main_program(), feed={"words": tensor_words, "label": tensor_label}, fetch_list=[cost, acc]) diff --git a/python/paddle/v2/fluid/tests/book/test_word2vec.py b/python/paddle/v2/fluid/tests/book/test_word2vec.py index 0629e1cab7fd7e501d9cbf3ae8ee22fe9383ad2b..b0cd1a518cd1be60474df126470573a5a5b81b70 100644 --- 
a/python/paddle/v2/fluid/tests/book/test_word2vec.py +++ b/python/paddle/v2/fluid/tests/book/test_word2vec.py @@ -1,10 +1,6 @@ import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid.core as core -import paddle.v2.fluid.framework as framework -import paddle.v2.fluid.layers as layers -from paddle.v2.fluid.executor import Executor -from paddle.v2.fluid.optimizer import SGDOptimizer +import paddle.v2.fluid as fluid PASS_NUM = 100 EMBED_SIZE = 32 @@ -16,57 +12,57 @@ IS_SPARSE = True word_dict = paddle.dataset.imikolov.build_dict() dict_size = len(word_dict) -first_word = layers.data(name='firstw', shape=[1], dtype='int64') -second_word = layers.data(name='secondw', shape=[1], dtype='int64') -third_word = layers.data(name='thirdw', shape=[1], dtype='int64') -forth_word = layers.data(name='forthw', shape=[1], dtype='int64') -next_word = layers.data(name='nextw', shape=[1], dtype='int64') +first_word = fluid.layers.data(name='firstw', shape=[1], dtype='int64') +second_word = fluid.layers.data(name='secondw', shape=[1], dtype='int64') +third_word = fluid.layers.data(name='thirdw', shape=[1], dtype='int64') +forth_word = fluid.layers.data(name='forthw', shape=[1], dtype='int64') +next_word = fluid.layers.data(name='nextw', shape=[1], dtype='int64') -embed_first = layers.embedding( +embed_first = fluid.layers.embedding( input=first_word, size=[dict_size, EMBED_SIZE], dtype='float32', is_sparse=IS_SPARSE, param_attr={'name': 'shared_w'}) -embed_second = layers.embedding( +embed_second = fluid.layers.embedding( input=second_word, size=[dict_size, EMBED_SIZE], dtype='float32', is_sparse=IS_SPARSE, param_attr={'name': 'shared_w'}) -embed_third = layers.embedding( +embed_third = fluid.layers.embedding( input=third_word, size=[dict_size, EMBED_SIZE], dtype='float32', is_sparse=IS_SPARSE, param_attr={'name': 'shared_w'}) -embed_forth = layers.embedding( +embed_forth = fluid.layers.embedding( input=forth_word, size=[dict_size, EMBED_SIZE], dtype='float32', is_sparse=IS_SPARSE, param_attr={'name': 'shared_w'}) -concat_embed = layers.concat( +concat_embed = fluid.layers.concat( input=[embed_first, embed_second, embed_third, embed_forth], axis=1) -hidden1 = layers.fc(input=concat_embed, size=HIDDEN_SIZE, act='sigmoid') -predict_word = layers.fc(input=hidden1, size=dict_size, act='softmax') -cost = layers.cross_entropy(input=predict_word, label=next_word) -avg_cost = layers.mean(x=cost) -sgd_optimizer = SGDOptimizer(learning_rate=0.001) -opts = sgd_optimizer.minimize(avg_cost) +hidden1 = fluid.layers.fc(input=concat_embed, size=HIDDEN_SIZE, act='sigmoid') +predict_word = fluid.layers.fc(input=hidden1, size=dict_size, act='softmax') +cost = fluid.layers.cross_entropy(input=predict_word, label=next_word) +avg_cost = fluid.layers.mean(x=cost) +sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001) +sgd_optimizer.minimize(avg_cost) train_reader = paddle.batch( paddle.dataset.imikolov.train(word_dict, N), BATCH_SIZE) -place = core.CPUPlace() -exe = Executor(place) +place = fluid.CPUPlace() +exe = fluid.Executor(place) # fix https://github.com/PaddlePaddle/Paddle/issues/5434 then remove # below exit line. 
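The word2vec network above ties all four context-word embeddings to one parameter matrix by passing each ``fluid.layers.embedding`` the same ``param_attr={'name': 'shared_w'}``. A two-input sketch of that sharing pattern (dictionary size and embedding width are made up):

.. code-block:: python

    import paddle.v2.fluid as fluid

    w1 = fluid.layers.data(name='w1', shape=[1], dtype='int64')
    w2 = fluid.layers.data(name='w2', shape=[1], dtype='int64')

    # both lookups read and update the single parameter named 'shared_w'
    e1 = fluid.layers.embedding(
        input=w1, size=[1000, 32], param_attr={'name': 'shared_w'})
    e2 = fluid.layers.embedding(
        input=w2, size=[1000, 32], param_attr={'name': 'shared_w'})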
exit(0) -exe.run(framework.default_startup_program()) +exe.run(fluid.default_startup_program()) for pass_id in range(PASS_NUM): for data in train_reader(): @@ -74,36 +70,15 @@ for pass_id in range(PASS_NUM): input_data = map(lambda x: np.array(x).astype("int64"), input_data) input_data = map(lambda x: np.expand_dims(x, axis=1), input_data) - first_data = input_data[0] - first_tensor = core.LoDTensor() - first_tensor.set(first_data, place) - - second_data = input_data[1] - second_tensor = core.LoDTensor() - second_tensor.set(second_data, place) - - third_data = input_data[2] - third_tensor = core.LoDTensor() - third_tensor.set(third_data, place) - - forth_data = input_data[3] - forth_tensor = core.LoDTensor() - forth_tensor.set(forth_data, place) - - next_data = input_data[4] - next_tensor = core.LoDTensor() - next_tensor.set(next_data, place) - - outs = exe.run(framework.default_main_program(), - feed={ - 'firstw': first_tensor, - 'secondw': second_tensor, - 'thirdw': third_tensor, - 'forthw': forth_tensor, - 'nextw': next_tensor - }, - fetch_list=[avg_cost]) - out = np.array(outs[0]) - if out[0] < 10.0: + avg_cost_np = exe.run(fluid.default_main_program(), + feed={ + 'firstw': input_data[0], + 'secondw': input_data[1], + 'thirdw': input_data[2], + 'forthw': input_data[3], + 'nextw': input_data[4] + }, + fetch_list=[avg_cost]) + if avg_cost_np[0] < 10.0: exit(0) # if avg cost less than 10.0, we think our code is good. exit(1) diff --git a/python/paddle/v2/fluid/tests/op_test.py b/python/paddle/v2/fluid/tests/op_test.py index 51023bd19a8326152335eabc9e96600427527f26..e83c4a0622013cbfebdf39434ef252412697acb1 100644 --- a/python/paddle/v2/fluid/tests/op_test.py +++ b/python/paddle/v2/fluid/tests/op_test.py @@ -261,7 +261,10 @@ class OpTest(unittest.TestCase): feed_map = self.feed_var(inputs, place) exe = Executor(place) - outs = exe.run(program, feed=feed_map, fetch_list=fetch_list) + outs = exe.run(program, + feed=feed_map, + fetch_list=fetch_list, + return_numpy=False) for out_name, out_dup in Operator.get_op_outputs(self.op_type): if out_name not in self.outputs: @@ -500,5 +503,6 @@ class OpTest(unittest.TestCase): fetch_list = [g for p, g in param_grad_list] executor = Executor(place) - result = executor.run(prog, feed_dict, fetch_list) - return map(np.array, result) + return map( + np.array, + executor.run(prog, feed_dict, fetch_list, return_numpy=False)) diff --git a/python/paddle/v2/fluid/tests/test_array_read_write_op.py b/python/paddle/v2/fluid/tests/test_array_read_write_op.py index e019a4e15f0e25deaedf30911b44e576c8f89013..b7790b01062d480cbd6c9e1a626d318385b4f61e 100644 --- a/python/paddle/v2/fluid/tests/test_array_read_write_op.py +++ b/python/paddle/v2/fluid/tests/test_array_read_write_op.py @@ -52,15 +52,13 @@ class TestArrayReadWrite(unittest.TestCase): exe = Executor(cpu) - tensor = core.LoDTensor() - tensor.set(numpy.random.random(size=(100, 100)).astype('float32'), cpu) - - outs = map(numpy.array, - exe.run(feed={'x0': tensor, - 'x1': tensor, - 'x2': tensor}, - fetch_list=[a_sum, x_sum], - scope=scope)) + tensor = numpy.random.random(size=(100, 100)).astype('float32') + + outs = exe.run(feed={'x0': tensor, + 'x1': tensor, + 'x2': tensor}, + fetch_list=[a_sum, x_sum], + scope=scope) self.assertEqual(outs[0], outs[1]) total_sum = layers.sums(input=[a_sum, x_sum]) @@ -72,12 +70,11 @@ class TestArrayReadWrite(unittest.TestCase): [each_x.name + "@GRAD" for each_x in x]) g_out = [ item.sum() - for item in map( - numpy.array, - exe.run(feed={'x0': tensor, - 'x1': tensor, - 'x2': 
tensor}, - fetch_list=g_vars)) + for item in exe.run( + feed={'x0': tensor, + 'x1': tensor, + 'x2': tensor}, + fetch_list=g_vars) ] g_out_sum = numpy.array(g_out).sum() diff --git a/python/paddle/v2/fluid/tests/test_conditional_block.py b/python/paddle/v2/fluid/tests/test_conditional_block.py index 2a30fd107968ce0fa188bda44e731ad760dce1f5..d953ee7ddc37d150d87cbd680379410a4d16f6b1 100644 --- a/python/paddle/v2/fluid/tests/test_conditional_block.py +++ b/python/paddle/v2/fluid/tests/test_conditional_block.py @@ -21,18 +21,15 @@ class ConditionalBlock(unittest.TestCase): exe = Executor(cpu) exe.run(g_startup_program) - x = core.LoDTensor() - x.set(numpy.random.random(size=(10, 1)).astype('float32'), cpu) + x = numpy.random.random(size=(10, 1)).astype('float32') - outs = map(numpy.array, exe.run(feed={'X': x}, fetch_list=[out]))[0] + outs = exe.run(feed={'X': x}, fetch_list=[out])[0] print outs loss = layers.mean(x=out) append_backward_ops(loss=loss) - outs = map(numpy.array, - exe.run(feed={'X': x}, - fetch_list=[ - g_main_program.block(0).var(data.name + "@GRAD") - ]))[0] + outs = exe.run( + feed={'X': x}, + fetch_list=[g_main_program.block(0).var(data.name + "@GRAD")])[0] print outs diff --git a/python/paddle/v2/fluid/tests/test_conv2d_op.py b/python/paddle/v2/fluid/tests/test_conv2d_op.py index 2240dc73cdd31f320fed174dd811e93c6640137f..e82e3ab0c9c0bc75a13a8948fda925bc4f0b6512 100644 --- a/python/paddle/v2/fluid/tests/test_conv2d_op.py +++ b/python/paddle/v2/fluid/tests/test_conv2d_op.py @@ -16,8 +16,8 @@ def conv2d_forward_naive(input, filter, group, conv_param): out_w = 1 + (in_w + 2 * pad[1] - (dilation[1] * (f_w - 1) + 1)) / stride[1] out = np.zeros((in_n, out_c, out_h, out_w)) - d_bolck_w = (dilation[0] * (f_h - 1) + 1) - d_bolck_h = (dilation[1] * (f_w - 1) + 1) + d_bolck_h = (dilation[0] * (f_h - 1) + 1) + d_bolck_w = (dilation[1] * (f_w - 1) + 1) input_pad = np.pad(input, ((0, ), (0, ), (pad[0], ), (pad[1], )), mode='constant', @@ -167,27 +167,27 @@ class TestWithDilation(TestConv2dOp): #----------------Conv2dCudnn---------------- class TestCudnn(TestConv2dOp): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" class TestCudnnWithPad(TestWithPad): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" class TestCudnnWithStride(TestWithStride): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" class TestCudnnWithGroup(TestWithGroup): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" class TestCudnnWith1x1(TestWith1x1): def init_op_type(self): - self.op_type = "conv_cudnn" + self.op_type = "conv2d_cudnn" # cudnn v5 does not support dilation conv. 
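The ``op_test.py`` hunks above are the flip side of the numpy ``feed`` change: ``Executor.run`` now converts fetched variables to numpy arrays by default, so code that needs the raw ``LoDTensor`` (for instance to inspect its LoD) opts out with ``return_numpy=False``. A sketch of both forms, assuming the executor behaves as these tests expect:

.. code-block:: python

    import numpy as np
    import paddle.v2.fluid as fluid

    x = fluid.layers.data(name='x', shape=[2], dtype='float32')
    out = fluid.layers.mean(x=x)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    feed = {'x': np.ones((4, 2)).astype('float32')}
    arr, = exe.run(feed=feed, fetch_list=[out])  # numpy array by default
    raw, = exe.run(feed=feed, fetch_list=[out],
                   return_numpy=False)           # raw LoDTensor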
diff --git a/python/paddle/v2/fluid/tests/test_conv3d_op.py b/python/paddle/v2/fluid/tests/test_conv3d_op.py index 934ea46437d67b78309a86a2779e0c6577399136..8593dff20b5c283d5862206dfb0c0d2501039d07 100644 --- a/python/paddle/v2/fluid/tests/test_conv3d_op.py +++ b/python/paddle/v2/fluid/tests/test_conv3d_op.py @@ -169,5 +169,31 @@ class TestWithDilation(TestConv3dOp): self.groups = 3 +class TestCudnn(TestConv3dOp): + def init_op_type(self): + self.op_type = "conv3d_cudnn" + + +class TestWithGroup1Cudnn(TestWithGroup1): + def init_op_type(self): + self.op_type = "conv3d_cudnn" + + +class TestWithGroup2Cudnn(TestWithGroup2): + def init_op_type(self): + self.op_type = "conv3d_cudnn" + + +class TestWith1x1Cudnn(TestWith1x1): + def init_op_type(self): + self.op_type = "conv3d_cudnn" + + +# FIXME(typhoonzero): find a way to determine if +# using cudnn > 6 in python +# class TestWithDilationCudnn(TestWithDilation): +# def init_op_type(self): +# self.op_type = "conv3d_cudnn" + if __name__ == '__main__': unittest.main() diff --git a/python/paddle/v2/fluid/tests/test_executor_and_mul.py b/python/paddle/v2/fluid/tests/test_executor_and_mul.py index da64739de5eb4eca8db8ac8370276c41692a7242..558273e30dff7fb74f78751f4fe569f79a453d0d 100644 --- a/python/paddle/v2/fluid/tests/test_executor_and_mul.py +++ b/python/paddle/v2/fluid/tests/test_executor_and_mul.py @@ -1,5 +1,5 @@ import unittest -from paddle.v2.fluid.layers import mul, data +from paddle.v2.fluid.layers import mul, data, sequence_pool import paddle.v2.fluid.core as core from paddle.v2.fluid.executor import Executor from paddle.v2.fluid.framework import g_main_program @@ -17,17 +17,13 @@ class TestExecutor(unittest.TestCase): out = mul(x=a, y=b) place = core.CPUPlace() a_np = numpy.random.random((100, 784)).astype('float32') - tensor_a = core.LoDTensor() - tensor_a.set(a_np, place) b_np = numpy.random.random((784, 100)).astype('float32') - tensor_b = core.LoDTensor() - tensor_b.set(b_np, place) exe = Executor(place) outs = exe.run(g_main_program, - feed={'a': tensor_a, - 'b': tensor_b}, + feed={'a': a_np, + 'b': b_np}, fetch_list=[out]) - out = numpy.array(outs[0]) + out = outs[0] self.assertEqual((100, 100), out.shape) self.assertTrue(numpy.allclose(out, numpy.dot(a_np, b_np))) diff --git a/python/paddle/v2/fluid/tests/test_inference_model_io.py b/python/paddle/v2/fluid/tests/test_inference_model_io.py index 74f1ce23262bbc969f9544885a7390534c76cdf6..60aed62ead83dedbeb9438c431ec292558d88ce5 100644 --- a/python/paddle/v2/fluid/tests/test_inference_model_io.py +++ b/python/paddle/v2/fluid/tests/test_inference_model_io.py @@ -1,13 +1,13 @@ -import paddle.v2 as paddle -import paddle.v2.fluid.layers as layers +import unittest + +import numpy as np import paddle.v2.fluid.core as core -import paddle.v2.fluid.optimizer as optimizer +import paddle.v2.fluid.executor as executor +import paddle.v2.fluid.layers as layers +import paddle.v2.fluid.optimizer as optimizer from paddle.v2.fluid.framework import Program from paddle.v2.fluid.io import save_inference_model, load_inference_model -import paddle.v2.fluid.executor as executor -import unittest -import numpy as np class TestBook(unittest.TestCase): @@ -44,7 +44,7 @@ class TestBook(unittest.TestCase): x=cost, main_program=program, startup_program=init_program) sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001) - opts = sgd_optimizer.minimize(avg_cost, init_program) + sgd_optimizer.minimize(avg_cost, init_program) place = core.CPUPlace() exe = executor.Executor(place) @@ -52,25 +52,20 @@ class 
TestBook(unittest.TestCase): exe.run(init_program, feed={}, fetch_list=[]) for i in xrange(100): - x_data = np.array( + tensor_x = np.array( [[1, 1], [1, 2], [3, 4], [5, 2]]).astype("float32") - y_data = np.array([[-2], [-3], [-7], [-7]]).astype("float32") + tensor_y = np.array([[-2], [-3], [-7], [-7]]).astype("float32") - tensor_x = core.LoDTensor() - tensor_x.set(x_data, place) - tensor_y = core.LoDTensor() - tensor_y.set(y_data, place) exe.run(program, feed={'x': tensor_x, 'y': tensor_y}, fetch_list=[avg_cost]) save_inference_model(MODEL_DIR, ["x", "y"], [avg_cost], exe, program) - outs = exe.run(program, - feed={'x': tensor_x, - 'y': tensor_y}, - fetch_list=[avg_cost]) - expected = np.array(outs[0]) + expected = exe.run(program, + feed={'x': tensor_x, + 'y': tensor_y}, + fetch_list=[avg_cost])[0] reload(executor) # reload to build a new scope exe = executor.Executor(place) @@ -83,7 +78,7 @@ class TestBook(unittest.TestCase): feed={feed_var_names[0]: tensor_x, feed_var_names[1]: tensor_y}, fetch_list=fetch_vars) - actual = np.array(outs[0]) + actual = outs[0] self.assertEqual(feed_var_names, ["x", "y"]) self.assertEqual(len(fetch_vars), 1) diff --git a/python/paddle/v2/fluid/tests/test_lod_array_length_op.py b/python/paddle/v2/fluid/tests/test_lod_array_length_op.py index a01ae83772185df218b8c453557dc0cac719673b..8a4be545eda841dbda33b7c8cae9f91a4199f2f8 100644 --- a/python/paddle/v2/fluid/tests/test_lod_array_length_op.py +++ b/python/paddle/v2/fluid/tests/test_lod_array_length_op.py @@ -13,7 +13,7 @@ class TestLoDArrayLength(unittest.TestCase): arr_len = layers.array_length(arr) cpu = core.CPUPlace() exe = Executor(cpu) - result = numpy.array(exe.run(fetch_list=[arr_len])[0]) + result = exe.run(fetch_list=[arr_len])[0] self.assertEqual(11, result[0]) diff --git a/python/paddle/v2/fluid/tests/test_lod_tensor_array_ops.py b/python/paddle/v2/fluid/tests/test_lod_tensor_array_ops.py index 16e64b8cd52d72a3bbc84e43d772b843dad0129a..0a916a55bc3d097e17fb504b0d6b2f2818f030c9 100644 --- a/python/paddle/v2/fluid/tests/test_lod_tensor_array_ops.py +++ b/python/paddle/v2/fluid/tests/test_lod_tensor_array_ops.py @@ -18,7 +18,11 @@ class TestCPULoDTensorArrayOps(unittest.TestCase): tensor.set_lod([[0, 3, 9, 10]]) expect = map(lambda x: numpy.array(x).astype('int32'), [[3, 0, 9], [4, 1], [5, 2], [6], [7], [8]]) - self.main(tensor=tensor, expect_array=expect, expect_lod=[] * 6) + self.main( + tensor=tensor, + expect_array=expect, + expect_lod=[] * 6, + expect_max_len=6) def test_lod_tensor_to_array_level_0_empty_seq(self): tensor = core.LoDTensor() @@ -27,7 +31,11 @@ class TestCPULoDTensorArrayOps(unittest.TestCase): tensor.set_lod([[0, 3, 9, 9, 10]]) expect = map(lambda x: numpy.array(x).astype('int32'), [[3, 0, 9], [4, 1], [5, 2], [6], [7], [8]]) - self.main(tensor=tensor, expect_array=expect, expect_lod=[] * 6) + self.main( + tensor=tensor, + expect_array=expect, + expect_lod=[] * 6, + expect_max_len=6) def test_lod_tensor_to_array_level_1(self): tensor = core.LoDTensor() @@ -44,7 +52,11 @@ class TestCPULoDTensorArrayOps(unittest.TestCase): ] lod = [[[0, 2, 5]], [[0, 6, 12]], [[0, 3]]] - self.main(tensor=tensor, expect_array=expect, expect_lod=lod) + self.main( + tensor=tensor, + expect_array=expect, + expect_lod=lod, + expect_max_len=3) def test_lod_tensor_to_array_level_1_empty_seq(self): tensor = core.LoDTensor() @@ -63,7 +75,11 @@ class TestCPULoDTensorArrayOps(unittest.TestCase): ] lod = [[[0, 5, 8, 8, 15]], [[0, 2, 6, 7, 8]], [[0, 2, 6]], [[0, 2]]] - self.main(tensor=tensor, 
expect_array=expect, expect_lod=lod) + self.main( + tensor=tensor, + expect_array=expect, + expect_lod=lod, + expect_max_len=4) def test_lod_tensor_to_array_level_2(self): tensor = core.LoDTensor() @@ -80,7 +96,11 @@ class TestCPULoDTensorArrayOps(unittest.TestCase): ] lod = [[[0, 1, 3, 4], [0, 1, 4, 8, 12]], [[0, 4, 7], [0, 1, 5, 9, 17, 21, 27, 31]], [[0, 2], [0, 6, 7]]] - self.main(tensor=tensor, expect_array=expect, expect_lod=lod) + self.main( + tensor=tensor, + expect_array=expect, + expect_lod=lod, + expect_max_len=3) def test_lod_tensor_to_array_level_2_skip_level(self): tensor = core.LoDTensor() @@ -88,14 +108,21 @@ class TestCPULoDTensorArrayOps(unittest.TestCase): numpy.arange(50).reshape(50, 1).astype('int32'), self.place()) tensor.set_lod([[0, 2, 5, 6], [0, 2, 5, 6, 10, 12, 13], [0, 3, 7, 11, 17, 21, 22, 23, 27, 31, 39, 45, 46, 50]]) - self.main(tensor=tensor, expect_array=None, expect_lod=None, level=1) - - def main(self, tensor, expect_array, expect_lod, level=0): + self.main( + tensor=tensor, + expect_array=None, + expect_lod=None, + expect_max_len=4, + level=1) + + def main(self, tensor, expect_array, expect_lod, expect_max_len, level=0): place = self.place() program = Program() x = layers.data(name='x', shape=[10], main_program=program) x.persistable = True table = layers.lod_rank_table(x, level=level, main_program=program) + max_len = layers.max_sequence_len(table, main_program=program) + max_len.persistable = True array = layers.lod_tensor_to_array(x, table, main_program=program) array.persistable = True @@ -110,6 +137,10 @@ class TestCPULoDTensorArrayOps(unittest.TestCase): self.check_array_same(array, expect_array, expect_lod) self.check_tensor_same(scope.find_var(result.name).get_tensor(), tensor) + self.assertEqual( + numpy.array(scope.find_var(max_len.name).get_tensor())[0], + expect_max_len) + def check_array_same(self, array, expect_tensor, expect_lod): self.assertEqual(len(expect_tensor), len(array)) for i, exp in enumerate(zip(expect_tensor, expect_lod)): @@ -151,10 +182,11 @@ class TestCPULoDTensorArrayOpGrad(unittest.TestCase): exe = Executor(place) g_out = [ - item.sum() - for item in map( - numpy.array, - exe.run(program, feed={'x': tensor}, fetch_list=[g_vars])) + numpy.array(item).sum() + for item in exe.run(program, + feed={'x': tensor}, + fetch_list=[g_vars], + return_numpy=False) ] g_out_sum = numpy.array(g_out).sum() diff --git a/python/paddle/v2/fluid/tests/test_mnist_if_else_op.py b/python/paddle/v2/fluid/tests/test_mnist_if_else_op.py index e76357a5be07d79eafee4c3a27911efe8a3eaef4..50fcc4a72ddbd6d7a3d3b73434c6ac8de5a006e2 100644 --- a/python/paddle/v2/fluid/tests/test_mnist_if_else_op.py +++ b/python/paddle/v2/fluid/tests/test_mnist_if_else_op.py @@ -65,17 +65,10 @@ class TestMNISTIfElseOp(unittest.TestCase): y_data = np.array(map(lambda x: x[1], data)).astype("int64") y_data = np.expand_dims(y_data, axis=1) - tensor_x = core.LoDTensor() - tensor_x.set(x_data, place) - - tensor_y = core.LoDTensor() - tensor_y.set(y_data, place) - - outs = map(np.array, - exe.run(kwargs['main_program'], - feed={'x': tensor_x, - 'y': tensor_y}, - fetch_list=[avg_loss])) + outs = exe.run(kwargs['main_program'], + feed={'x': x_data, + 'y': y_data}, + fetch_list=[avg_loss]) print outs[0] if outs[0] < 1.0: return @@ -129,19 +122,12 @@ class TestMNISTIfElseOp(unittest.TestCase): for data in train_reader(): x_data = np.array(map(lambda x: x[0], data)).astype("float32") y_data = np.array(map(lambda x: x[1], data)).astype("int64") - y_data = np.expand_dims(y_data, axis=1) - - 
-            tensor_x = core.LoDTensor()
-            tensor_x.set(x_data, place)
-
-            tensor_y = core.LoDTensor()
-            tensor_y.set(y_data, place)
+            y_data = y_data.reshape((y_data.shape[0], 1))
 
-            outs = map(np.array,
-                       exe.run(kwargs['main_program'],
-                               feed={'x': tensor_x,
-                                     'y': tensor_y},
-                               fetch_list=[avg_loss]))
+            outs = exe.run(kwargs['main_program'],
+                           feed={'x': x_data,
+                                 'y': y_data},
+                           fetch_list=[avg_loss])
             print outs[0]
             if outs[0] < 1.0:
                 return
diff --git a/python/paddle/v2/fluid/tests/test_parameter.py b/python/paddle/v2/fluid/tests/test_parameter.py
index d467e4bbb79b291c442c643158ef6c0d630920dd..13f6278ad8b7244e7980b32463f29d7a824b4572 100644
--- a/python/paddle/v2/fluid/tests/test_parameter.py
+++ b/python/paddle/v2/fluid/tests/test_parameter.py
@@ -24,7 +24,7 @@ class TestParameter(unittest.TestCase):
         self.assertEqual(0, param.block.idx)
         exe = Executor(core.CPUPlace())
         p = exe.run(g_main_program, fetch_list=[param])[0]
-        self.assertTrue(np.allclose(np.array(p), np.ones(shape) * val))
+        self.assertTrue(np.allclose(p, np.ones(shape) * val))
         p = io.get_parameter_value_by_name('fc.w', exe, g_main_program)
         self.assertTrue(np.allclose(np.array(p), np.ones(shape) * val))
diff --git a/python/paddle/v2/fluid/tests/test_recurrent_op.py b/python/paddle/v2/fluid/tests/test_recurrent_op.py
index 88bcdc3e6a21881ace2be53c22a62d78df30a974..84548847f76c6315da000e1b3d062deafe55a05e 100644
--- a/python/paddle/v2/fluid/tests/test_recurrent_op.py
+++ b/python/paddle/v2/fluid/tests/test_recurrent_op.py
@@ -156,7 +156,7 @@ class RecurrentOpTest1(unittest.TestCase):
                       feed=self.feed_map,
                       fetch_list=[self.output])
 
-        return np.array(out[0])
+        return out[0]
 
     def backward(self):
         self.feed_map = {
@@ -171,7 +171,8 @@ class RecurrentOpTest1(unittest.TestCase):
         exe = Executor(self.place)
         return exe.run(self.main_program,
                        feed=self.feed_map,
-                       fetch_list=fetch_list)
+                       fetch_list=fetch_list,
+                       return_numpy=False)
 
     def test_backward(self):
         self.check_forward()
diff --git a/python/paddle/v2/fluid/tests/test_rnn_memory_helper_op.py b/python/paddle/v2/fluid/tests/test_rnn_memory_helper_op.py
index a3cba92504a28590083df57e69f7662a887d94a6..9999165ed509aa40f31f26aa676f381561bd0016 100644
--- a/python/paddle/v2/fluid/tests/test_rnn_memory_helper_op.py
+++ b/python/paddle/v2/fluid/tests/test_rnn_memory_helper_op.py
@@ -7,12 +7,6 @@ import numpy as np
 import paddle.v2.fluid.core as core
 
 
-def create_tensor(np_data, place):
-    tensor = core.LoDTensor()
-    tensor.set(np_data, place)
-    return tensor
-
-
 class RNNMemoryHelperOpTest(unittest.TestCase):
     def setUp(self):
         self.program = Program()
@@ -30,13 +24,13 @@ class RNNMemoryHelperOpTest(unittest.TestCase):
     def test_forward(self):
         x_np = np.random.normal(size=(2, 3)).astype("float32")
-        self.feed_map = {'X': create_tensor(x_np, self.place)}
+        self.feed_map = {'X': x_np}
         self.fetch_list = [self.Out]
 
         exe = Executor(self.place)
         out = exe.run(self.program,
                       feed=self.feed_map,
                       fetch_list=self.fetch_list)
-        np.isclose(np.array(out[0]), x_np, rtol=1e-5)
+        self.assertTrue(np.allclose(out[0], x_np, rtol=1e-5))
 
 
 class RNNMemoryHelperGradOpTest(unittest.TestCase):
@@ -66,8 +60,7 @@ class RNNMemoryHelperGradOpTest(unittest.TestCase):
     def test_backward(self):
         self.feed_map = {
-            name: create_tensor(
-                np.random.normal(size=(2, 3)).astype("float32"), self.place)
+            name: np.random.normal(size=(2, 3)).astype("float32")
             for name in self.input_names
         }
         self.fetch_list = [self.output_vars['X@GRAD']]
 
@@ -76,7 +69,7 @@ class RNNMemoryHelperGradOpTest(unittest.TestCase):
         out = exe.run(self.program,
                       feed=self.feed_map,
                       fetch_list=self.fetch_list)
-        np.isclose(np.array(out[0]), self.feed_map['Out@GRAD'], rtol=1e-5)
+        self.assertTrue(np.allclose(out[0], self.feed_map['Out@GRAD'], rtol=1e-5))
 
 
 class RNNMemoryHelperGradOpWithoutInputTest(unittest.TestCase):
@@ -110,8 +103,7 @@ class RNNMemoryHelperGradOpWithoutInputTest(unittest.TestCase):
     def test_backward(self):
         self.feed_map = {
-            name: create_tensor(
-                np.random.normal(size=(2, 3)).astype("float32"), self.place)
+            name: np.random.normal(size=(2, 3)).astype("float32")
             for name in ['X', 'Out']
         }
         self.fetch_list = [self.output_vars['X@GRAD']]
 
@@ -120,10 +112,9 @@ class RNNMemoryHelperGradOpWithoutInputTest(unittest.TestCase):
         out = exe.run(self.program,
                       feed=self.feed_map,
                       fetch_list=self.fetch_list)
-        np.isclose(
-            np.array(out[0]),
-            np.zeros(shape=(2, 3)).astype("float32"),
-            rtol=1e-5)
+        self.assertTrue(
+            np.allclose(
+                out[0], np.zeros(shape=(2, 3)).astype("float32"), rtol=1e-5))
 
 
 if __name__ == '__main__':
diff --git a/python/paddle/v2/fluid/tests/test_shrink_rnn_memory.py b/python/paddle/v2/fluid/tests/test_shrink_rnn_memory.py
index 953629d610e183cdddf97081f94a77951fe979d8..05f6a560644f18da6ff2e015911901cd73cc36c9 100644
--- a/python/paddle/v2/fluid/tests/test_shrink_rnn_memory.py
+++ b/python/paddle/v2/fluid/tests/test_shrink_rnn_memory.py
@@ -27,19 +27,16 @@ class TestShrinkRNNMemory(unittest.TestCase):
         tensor_np = numpy.random.random(size=(3, 100)).astype('float32')
         tensor.set(tensor_np, cpu)
         exe = Executor(cpu)
-        outs = map(numpy.array,
-                   exe.run(feed={'x': tensor}, fetch_list=[mem1, mem2, mem3]))
+        outs = exe.run(feed={'x': tensor}, fetch_list=[mem1, mem2, mem3])
         self.assertTrue(numpy.allclose(tensor_np[0:3], outs[0]))
         self.assertTrue(numpy.allclose(tensor_np[0:2], outs[1]))
         self.assertTrue(numpy.allclose(tensor_np[0:1], outs[2]))
 
         mem3_mean = layers.mean(x=mem3)
         append_backward_ops(loss=mem3_mean)
-        x_grad = map(numpy.array,
-                     exe.run(feed={'x': tensor},
-                             fetch_list=[
-                                 g_main_program.global_block().var('x@GRAD')
-                             ]))[0]
+        x_grad = exe.run(
+            feed={'x': tensor},
+            fetch_list=[g_main_program.global_block().var('x@GRAD')])[0]
         self.assertAlmostEqual(1.0, x_grad.sum(), delta=0.1)
diff --git a/python/paddle/v2/fluid/tests/test_split_and_merge_lod_tensor_op.py b/python/paddle/v2/fluid/tests/test_split_and_merge_lod_tensor_op.py
index a98cb3bbab8442886206b59a2b591fee96deeb9f..f5da4e408f0a83dbf6da530b478e91bbf9cd5ab2 100644
--- a/python/paddle/v2/fluid/tests/test_split_and_merge_lod_tensor_op.py
+++ b/python/paddle/v2/fluid/tests/test_split_and_merge_lod_tensor_op.py
@@ -98,7 +98,11 @@ class TestCPULoDTensorArrayOps(unittest.TestCase):
         exe = Executor(place)
         scope = core.Scope()
-        exe.run(program, feed={'x': tensor, 'y': mask}, scope=scope)
+        exe.run(program,
+                feed={'x': tensor,
+                      'y': mask},
+                scope=scope,
+                return_numpy=False)
 
         var_true = scope.find_var(out_true.name).get_tensor()
 
@@ -169,7 +173,8 @@ class TestCPUSplitMergeLoDTensorGrad(unittest.TestCase):
                     feed={'x': tensor,
                           'y': mask},
                     fetch_list=[g_vars],
-                    scope=scope))
+                    scope=scope,
+                    return_numpy=False))
         ]
         g_out_sum = np.array(g_out).sum()
diff --git a/python/paddle/v2/fluid/tests/test_while_op.py b/python/paddle/v2/fluid/tests/test_while_op.py
index fca0cdcc319ff661ced33b6bcd242c894941576c..033b03a4957131e1155c61e8ed2f10eefb23fda4 100644
--- a/python/paddle/v2/fluid/tests/test_while_op.py
+++ b/python/paddle/v2/fluid/tests/test_while_op.py
@@ -55,19 +55,10 @@ class TestWhileOp(unittest.TestCase):
         for i in xrange(3):
             d.append(numpy.random.random(size=[10]).astype('float32'))
 
-        d_tensor = []
-        for item in d:
-            t = core.LoDTensor()
-            t.set(item, cpu)
-            d_tensor.append(t)
-
-        outs = map(numpy.array,
-                   exe.run(feed={
-                       'd0': d_tensor[0],
-                       'd1': d_tensor[1],
-                       'd2': d_tensor[2]
-                   },
-                           fetch_list=[sum_result]))
+        outs = exe.run(feed={'d0': d[0],
+                             'd1': d[1],
+                             'd2': d[2]},
+                       fetch_list=[sum_result])
 
         self.assertAlmostEqual(numpy.sum(d), numpy.sum(outs[0]), delta=0.01)
diff --git a/python/paddle/v2/fluid/tests/tmp/inference_model/__model__ b/python/paddle/v2/fluid/tests/tmp/inference_model/__model__
deleted file mode 100644
index e333d10da94943372b0fe4dedd9d857817ec9ca6..0000000000000000000000000000000000000000
Binary files a/python/paddle/v2/fluid/tests/tmp/inference_model/__model__ and /dev/null differ
diff --git a/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.b_0 b/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.b_0
deleted file mode 100644
index b1e5fad056e58f23c2cf917a3f4c4d4632ae7d58..0000000000000000000000000000000000000000
Binary files a/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.b_0 and /dev/null differ
diff --git a/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.w_0 b/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.w_0
deleted file mode 100644
index 2f41796c0495570941c236c8c3f422b3cbd5edd2..0000000000000000000000000000000000000000
Binary files a/python/paddle/v2/fluid/tests/tmp/inference_model/fc_0.w_0 and /dev/null differ
diff --git a/python/paddle/v2/framework/tests/test_elementwise_mod_op.py b/python/paddle/v2/framework/tests/test_elementwise_mod_op.py
deleted file mode 100644
index 35c38147a24fb237b4607836a86cffa81b2d8904..0000000000000000000000000000000000000000
--- a/python/paddle/v2/framework/tests/test_elementwise_mod_op.py
+++ /dev/null
@@ -1,36 +0,0 @@
-import unittest
-import numpy as np
-from op_test import OpTest
-
-
-class ElementwiseModOp(OpTest):
-    def setUp(self):
-        self.op_type = "elementwise_mod"
-        """ Warning
-        CPU gradient check error!
-        'X': np.random.randint((32,84)).astype("int32"),
-        'Y': np.random.randint((32,84)).astype("int32")
-        """
-        self.inputs = {
-            'X': np.random.randint(1, 10, [13, 17]).astype("int32"),
-            'Y': np.random.randint(1, 10, [13, 17]).astype("int32")
-        }
-        self.outputs = {'Out': np.mod(self.inputs['X'], self.inputs['Y'])}
-
-    def test_check_output(self):
-        self.check_output()
-
-    def test_check_grad_normal(self):
-        self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.05)
-
-    def test_check_grad_ingore_x(self):
-        self.check_grad(
-            ['Y'], 'Out', max_relative_error=0.05, no_grad_set=set("X"))
-
-    def test_check_grad_ingore_y(self):
-        self.check_grad(
-            ['X'], 'Out', max_relative_error=0.05, no_grad_set=set('Y'))
-
-
-if __name__ == '__main__':
-    unittest.main()