Commit 3c0f0793, authored by chengduoZH

remove conflict and fix InferShape function

@@ -13,9 +13,13 @@ function train() {
     log="logs/${topology}-mkldnn-${bs}.log"
   elif [ $3 == "False" ]; then
     thread=`nproc`
+    # each trainer_count use only 1 core to avoid conflict
+    export OMP_NUM_THREADS=1
+    export MKL_NUM_THREADS=1
     log="logs/${topology}-${thread}mklml-${bs}.log"
   else
     echo "Wrong input $3, use True or False."
+    exit 0
   fi
   args="batch_size=${bs}"
   config="${topology}.py"
......

####################################
Build, Installation, and Unit Tests
####################################

.. contents::

1. Running the Docker GPU image reports "CUDA driver version is insufficient"
--------------------------------------------------------------------------------

When using the PaddlePaddle GPU Docker image, users often see ``Cuda Error: CUDA driver version is insufficient for CUDA runtime version``. The cause is that the CUDA driver and libraries on the host have not been mapped into the container.
The fix is:

.. code-block:: bash

    $ export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
    $ export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
    $ docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddlepaddle:latest-gpu

For more on installing and using Docker, see the `PaddlePaddle Docker documentation <http://www.paddlepaddle.org/doc_cn/build_and_install/install/docker_install.html>`_ .
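
2. Building from source with CMake: the PythonLibs and PythonInterp versions found do not match
-----------------------------------------------------------------------------------------------------

This is caused by a flaw in how CMake currently searches for Python. If several Python versions are installed on the system, the Python library and the Python interpreter that CMake finds may belong to different versions, which makes the PaddlePaddle build fail. The correct fix is to specify the Python version explicitly:

.. code-block:: bash

    cmake .. -DPYTHON_EXECUTABLE=<exc_path> -DPYTHON_LIBRARY=<lib_path> -DPYTHON_INCLUDE_DIR=<inc_path>

Replace ``<exc_path>``, ``<lib_path>``, and ``<inc_path>`` with the corresponding Python paths on the local machine.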
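
The three paths can be read from the Python interpreter you intend to build against. The following is a minimal sketch, assuming a standard CPython installation. The helper name ``python_paths_for_cmake`` is hypothetical, and the ``LIBDIR`` and ``LDLIBRARY`` configuration variables may be unset on some platforms, in which case the library path has to be located by hand.

```python
import os
import sys
import sysconfig


def python_paths_for_cmake():
    """Collect candidate values for the three -DPYTHON_* options."""
    paths = {
        "PYTHON_EXECUTABLE": sys.executable,
        "PYTHON_INCLUDE_DIR": sysconfig.get_paths()["include"],
    }
    # LIBDIR and LDLIBRARY may be missing on some platforms (e.g. Windows).
    lib_dir = sysconfig.get_config_var("LIBDIR")
    lib_name = sysconfig.get_config_var("LDLIBRARY")
    if lib_dir and lib_name:
        paths["PYTHON_LIBRARY"] = os.path.join(lib_dir, lib_name)
    return paths


# Print the options in a form that can be pasted after "cmake ..".
for option, value in sorted(python_paths_for_cmake().items()):
    print("-D%s=%s" % (option, value))
```

Run the snippet with the exact interpreter you want PaddlePaddle to use, and check that the printed paths exist before passing them to cmake.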

3. Building from source with CMake: the Paddle version number is 0.0.0
------------------------------------------------------------------------

If :code:`paddle version` prints :code:`PaddlePaddle 0.0.0`, or :code:`cmake ..` prints

.. code-block:: bash

    CMake Warning at cmake/version.cmake:20 (message):
      Cannot add paddle version from git tag

then fetch all remote branches to the local machine with :code:`git fetch upstream` and rerun cmake.
4. paddlepaddle\*.whl is not a supported wheel on this platform.
------------------------------------------------------------------------
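
The main cause of this problem is that no paddlepaddle package matching the current system was found. The latest paddlepaddle Python packages support Linux x86_64 and MacOS 10.12, on systems with Python 2.7 and pip 9.0.1 installed.

To upgrade :code:`pip`\:

.. code-block:: bash

    pip install --upgrade pip

If that does not help, run :code:`python -c "import pip; print(pip.pep425tags.get_supported())"` to list the package suffixes supported by the current system, and compare them with the suffix of the package being installed.

If the system supports :code:`linux_x86_64` but the package is :code:`manylinux1_x86_64`, upgrade pip to the latest version.
If the system supports :code:`manylinux1_x86_64` but the (local) package is :code:`linux_x86_64`, rename the whl file so that its suffix is :code:`manylinux1_x86_64` and install it again.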
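
To make the comparison concrete, the suffix of a package can be extracted from its file name. The sketch below assumes the standard wheel naming scheme ``{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl``. The file name is a made-up example and ``wheel_tags`` is a hypothetical helper.

```python
def wheel_tags(filename):
    """Split a wheel file name into its python, ABI, and platform tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # The last three dash-separated fields are the three tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


# Made-up file name, used only for illustration.
tags = wheel_tags("paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl")
print(tags)
```

The resulting triple can then be compared with the entries printed by the pip command above.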

5. After building and installing, import paddle.v2 as paddle raises ImportError: No module named v2
------------------------------------------------------------------------------------------------------

First check whether paddle v1 was installed previously. If so, uninstall it first:

:code:`pip uninstall py_paddle paddle`

Then install the paddle Python packages by running the following in the build directory:

:code:`pip install python/dist/paddle*.whl && pip install ../paddle/dist/py_paddle*.whl`
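
6. Encountering an "illegal instruction" (“非法指令”) error
-------------------------------------------------------------

PaddlePaddle uses AVX SIMD instructions to improve CPU efficiency, so running a binary release that does not match the CPU (for example, an AVX build on a CPU without AVX support) can cause this error. Choose the correct version.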

7. All Python-related unit tests fail
-------------------------------------

If all of the following Python-related unit tests fail:

.. code-block:: bash

    24 - test_PyDataProvider (Failed)
    26 - test_RecurrentGradientMachine (Failed)
    27 - test_NetworkCompare (Failed)
    28 - test_PyDataProvider2 (Failed)
    32 - test_Prediction (Failed)
    33 - test_Compare (Failed)
    34 - test_Trainer (Failed)
    35 - test_TrainerOnePass (Failed)
    36 - test_CompareTwoNets (Failed)
    37 - test_CompareTwoOpts (Failed)
    38 - test_CompareSparse (Failed)
    39 - test_recurrent_machine_generation (Failed)
    40 - test_PyDataProviderWrapper (Failed)
    41 - test_config_parser (Failed)
    42 - test_swig_api (Failed)
    43 - layers_test (Failed)

and the PaddlePaddle unit test log says:

.. code-block:: bash

    paddle package is already in your PYTHONPATH. But unittest need a clean environment.
    Please uninstall paddle package before start unittest. Try to 'pip uninstall paddle'.

The fix is:

* Uninstall the PaddlePaddle package with :code:`pip uninstall paddle` to remove stale installations, so that the unit tests run in a clean environment. If the PaddlePaddle package is already in Python's site-packages, the unit tests import the package from site-packages instead of the one under :code:`/python` in the source tree. Setting :code:`PYTHONPATH` to :code:`/python` does not help either, because Python's search path gives priority to the installed package.
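
################################
Cluster Training and Prediction
################################

.. contents::

1. In multi-node cluster training, all errors in the logs are network communication errors
----------------------------------------------------------------------------------------------

During multi-node cluster training, the logs report network communication errors such as :code:`Connection reset by peer`.
Such errors usually occur because an error on one node made its training process exit, after which the other nodes could no longer connect to it. Troubleshoot as follows:

* In :code:`train.log` and :code:`server.log`, find the earliest error and check whether it was triggered by another problem (for example FPE, out of memory, or out of disk space).

* If the earliest error is itself a network communication problem, the likely cause is a port conflict from running in non-exclusive mode. Ask the cluster operators (OP) whether the current MPI cluster supports submitting jobs with the resource=full parameter. If it does, add this parameter when submitting and change the job port.

* If the current MPI cluster does not support exclusive mode, ask the operators whether you can switch to another cluster or upgrade the current one.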
####################
FAQ FAQ
#################### ====
.. contents:: .. toctree::
:maxdepth: 1
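
1. How to reduce memory usage
---------------------------------

Training a neural network is very demanding on both host memory and GPU memory. It often consumes tens of GB of host memory and several GB of GPU memory.
PaddlePaddle's memory usage falls into the following categories\:

* DataProvider buffer pool memory (host memory only)
* Neuron activation memory (host memory and GPU memory)
* Parameter memory (host memory and GPU memory)
* Other miscellaneous memory

The last category covers memory that PaddlePaddle itself uses, such as string allocations and temporary variables. It is not considered here.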
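
Reducing DataProvider buffer pool memory
++++++++++++++++++++++++++++++++++++++++

PyDataProvider loads data asynchronously and shuffles by randomly picking samples directly from an in-memory pool. That is,

.. graphviz::

    digraph {
        rankdir=LR;
        "Data files" -> "Memory pool" -> "PaddlePaddle training"
    }

So shrinking this memory pool reduces memory usage and also speeds up data loading before training starts. However, the pool size effectively determines the granularity of the shuffle. If you shrink the pool but still want the data to be random, it is best to shuffle the data files before each read. Possible code:

.. literalinclude:: src/reduce_min_pool_size.py

This can greatly reduce memory usage and may also speed up training. For details, see :ref:`api_pydataprovider2` .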
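
To see why the pool size sets the shuffle granularity, here is a minimal pure-Python sketch of pool-based shuffling. It is an illustration only, not PaddlePaddle's implementation. ``pool_shuffle`` and ``pool_size`` are hypothetical names.

```python
import random


def pool_shuffle(samples, pool_size, seed=None):
    """Yield samples in random order using a bounded in-memory pool.

    A sample can only be emitted once it is in the pool, so with a small
    pool each sample stays close to its original position in the stream.
    """
    rng = random.Random(seed)
    pool = []
    for sample in samples:
        pool.append(sample)
        if len(pool) >= pool_size:
            # Emit a random element; the pool drops back to pool_size - 1 items.
            yield pool.pop(rng.randrange(len(pool)))
    # Drain what is left once the input is exhausted.
    rng.shuffle(pool)
    for sample in pool:
        yield sample


shuffled = list(pool_shuffle(range(10), pool_size=3, seed=0))
print(shuffled)
```

With ``pool_size=3``, no element is emitted more than two positions earlier than where it appeared in the input, so an ordered input stays mostly ordered. This is why shuffling the input files beforehand matters when the pool is small.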
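
Neuron activation memory
++++++++++++++++++++++++

During training, the network temporarily stores some data for every activation, such as the neuron activation values.
During backpropagation, these data are used to update the parameters. The memory they use depends mainly on two settings:
the batch size and the length of each sequence. In other words, it is proportional to the number of time steps contained in each mini-batch.

So there are two options:

* Reduce the batch size, that is, set :code:`settings(batch_size=1000)` in the network configuration to a smaller value. However, the batch size is itself a hyperparameter of the network, and reducing it may affect the training result.
* Reduce the sequence length, or simply drop very long sequences. For example, if most sequences in a dataset have a length of 100-200 but one sequence has a length of 10000, it can easily exhaust memory, especially in RNNs such as LSTM.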
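
Parameter memory
++++++++++++++++

PaddlePaddle supports many optimization algorithms (optimizers), and different optimizers need different amounts of memory.
For example, the :code:`adadelta` algorithm needs memory of about 5 times the size of the weight parameters. If the saved model directory is :code:`100M`, this optimizer needs at least :code:`500M` of memory.

To reduce memory usage, consider an optimizer with lower memory overhead, such as :code:`momentum`.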
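
2. How to speed up PaddlePaddle training
----------------------------------------

Training can be sped up in the following ways\:

* Reduce the data loading time
* Speed up the training computation
* Use distributed training to harness more computing resources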
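
Reducing the data loading time
++++++++++++++++++++++++++++++

When using \ :code:`pydataprovider`\ , you can shrink the buffer pool and enable in-memory caching, which greatly speeds up data loading.
Shrinking the :code:`DataProvider` buffer pool works the same way as shrinking it to reduce memory usage, as described earlier.

.. literalinclude:: src/reduce_min_pool_size.py

In addition, the :code:`@provider` interface has a :code:`cache` parameter that controls caching. When it is set to :code:`CacheType.CACHE_PASS_IN_MEM`, the data generated in the first :code:`pass` (one pass is one full sweep over all training data) is cached in memory. Later passes read directly from the in-memory cache instead of reading from the :code:`python` side again. This also greatly reduces the data loading time.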
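
Speeding up training
++++++++++++++++++++

PaddlePaddle supports sparse training. Sparse training requires the training features to be one of :code:`sparse_binary_vector` , :code:`sparse_vector` , or :code:`integer_value` . In addition, the layer that interacts with this training data must have its parameter set to sparse update mode, that is, :code:`sparse_update=True` .

Here a simple :code:`word2vec` language model is used as an example. The task uses the two words before and the two words after a word to predict the word in the middle. The DataProvider for this task is\:

.. literalinclude:: src/word2vec_dataprovider.py

The configuration for this task is\:

.. literalinclude:: src/word2vec_config.py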
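
Using more computing resources
++++++++++++++++++++++++++++++

More computing resources can be used in the following ways\:

* Single-machine CPU training

  * Use multi-threaded training. Set the command-line argument :code:`trainer_count`.

* Single-machine GPU training

  * Train on a GPU. Set the command-line argument :code:`use_gpu`.
  * Train on multiple GPUs. Set the command-line arguments :code:`use_gpu` and :code:`trainer_count` .

* Multi-machine training

  * See :ref:`cluster_train` .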
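
4. How to choose the learning rate for SGD
------------------------------------------

When training with sgd/async_sgd, an important question is how to choose the right learning_rate. If learning_rate is too large, training may not converge. If it is too small, convergence may be slow and training takes too long.

A common practice is to start from a relatively large learning_rate. If training does not converge, divide the learning rate by 10 and try again, until training converges. How can you tell that training is not converging? Estimate the minimum cost that the model could reach if it always produced the same constant output.

If the cost during training is clearly higher than this constant-output cost, training can be judged as not converging. For example, take a three-class problem with multi-class-cross-entropy as the cost, where classes 0, 1, and 2 make up :code:`0.2, 0.5, 0.3` of the data. The minimum cost a constant output can reach is :code:`-(0.2*log(0.2)+0.5*log(0.5)+0.3*log(0.3))=1.03` . If the cost is still above this value after one pass (or earlier), training is not converging and the learning rate should be lowered.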
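
The constant-output baseline in the example is simply the entropy of the class distribution, computed with the natural logarithm. A small sketch to reproduce the number (``constant_output_cost`` is a hypothetical helper name):

```python
import math


def constant_output_cost(class_ratios):
    """Cross-entropy of always predicting the class ratios themselves."""
    return -sum(p * math.log(p) for p in class_ratios if p > 0)


baseline = constant_output_cost([0.2, 0.5, 0.3])
print(round(baseline, 2))  # 1.03
```

Any training cost that stays above this baseline means the model is doing no better than ignoring its input.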
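
5. How to initialize parameters
-------------------------------

By default, PaddlePaddle initializes parameters with mean 0 and standard deviation :math:`\frac{1}{\sqrt{d}}` , where :math:`d` is the width of the parameter matrix. This initialization generally does not give bad results. For custom initialization, PaddlePaddle currently provides two options\:

* Gaussian distribution. Set :code:`param_attr` to :code:`param_attr=ParamAttr(initial_mean=0.0, initial_std=1.0)`
* Uniform distribution. Set :code:`param_attr` to :code:`param_attr=ParamAttr(initial_max=1.0, initial_min=-1.0)`

For example, the following code sets both the parameter initialization and the bias initialization of a fully connected layer.

.. code-block:: python

    hidden = fc_layer(input=ipt, param_attr=ParamAttr(initial_max=1.0, initial_min=-1.0),
                      bias_attr=ParamAttr(initial_mean=1.0, initial_std=0.0))

The code above initializes every bias to 1.0 and draws the parameters from a uniform distribution over :code:`[-1.0, 1.0]` .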
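
6. How to share parameters
--------------------------

PaddlePaddle uses the :code:`name` of a parameter as its ID, and parameters with the same name are shared. A parameter name can be set with :code:`ParamAttr(name="YOUR_PARAM_NAME")` . A more convenient way is to make the parameters that should be shared use the same :code:`ParamAttr` object.

A parameter-sharing configuration for a simple fully connected network looks like this\:

.. literalinclude:: ../../python/paddle/trainer_config_helpers/tests/configs/shared_fc.py

Here :code:`hidden_a` and :code:`hidden_b` use the same parameter and bias, and the two inputs of the softmax layer also share the same parameter :code:`softmax_param`.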
12. A protocol message was rejected because it was too big
----------------------------------------------------------

If the following error occurs when training an NLP-related model:

.. code-block:: bash

    [libprotobuf ERROR google/protobuf/io/coded_stream.cc:171] A protocol message was rejected because it was too big (more than 67108864 bytes).  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
    F1205 14:59:50.295174 14703 TrainerConfigHelper.cpp:59] Check failed: m->conf.ParseFromString(configProtoStr)

A likely cause is that one of the args passed to the dataprovider is too large, usually because a large dictionary is passed directly. A faulty define_py_data_sources2 call looks like this:

.. code-block:: python

    src_dict = dict()
    for line_count, line in enumerate(open(src_dict_path, "r")):
        src_dict[line.strip()] = line_count

    define_py_data_sources2(
        train_list,
        test_list,
        module="dataprovider",
        obj="process",
        args={"src_dict": src_dict})

The solution is to pass the path of the dictionary to the dataprovider as args, and load the dictionary from that path inside the dataprovider. That is, define_py_data_sources2 should be changed to:

.. code-block:: python

    define_py_data_sources2(
        train_list,
        test_list,
        module="dataprovider",
        obj="process",
        args={"src_dict_path": src_dict_path})

For the complete source code, see the `seqToseq <https://github.com/PaddlePaddle/Paddle/tree/develop/demo/seqToseq>`_ example.
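
On the dataprovider side, the dictionary is then rebuilt from the path. Below is a minimal sketch of the loading step, mirroring the loop in the faulty example above. ``load_dict`` is a hypothetical helper name, and how the path reaches it (for example through the provider's initialization hook) depends on your dataprovider.

```python
import os
import tempfile


def load_dict(dict_path):
    """Map each line (one word per line) to its zero-based line number."""
    word_to_id = {}
    with open(dict_path, "r") as f:
        for line_count, line in enumerate(f):
            word_to_id[line.strip()] = line_count
    return word_to_id


# Small self-check with a temporary dictionary file.
with tempfile.NamedTemporaryFile("w", suffix=".dict", delete=False) as tmp:
    tmp.write("<s>\n<e>\n<unk>\n")
src_dict = load_dict(tmp.name)
os.remove(tmp.name)
print(src_dict)
```

Only the short path string travels through the serialized configuration, so the message size no longer grows with the dictionary.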

13. How to specify GPU devices
------------------------------

For example, on a machine with 4 GPUs numbered from 0, to use GPUs 2 and 3:

* Method 1: select specific GPUs through the `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ environment variable.

.. code-block:: bash

    env CUDA_VISIBLE_DEVICES=2,3 paddle train --use_gpu=true --trainer_count=2

* Method 2: select them through the command-line argument ``--gpu_id``.

.. code-block:: bash

    paddle train --use_gpu=true --trainer_count=2 --gpu_id=2
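
14. What to do when training exits with :code:`Floating point exception`
------------------------------------------------------------------------

The Paddle binary traps floating-point exceptions at run time and exits immediately whenever one occurs (that is, whenever NaN or Inf appears during training). Floating-point exceptions are usually caused by overflow, division by zero, and similar problems.
The main causes are:

* The parameters or the gradients become too large during training, so that accumulating, multiplying, or dividing them overflows.
* The model never converges and diverges to very large values.
* The training data is problematic and drives the parameters into a singular state. Alternatively, the scale of the input data is too large, with some feature values reaching the millions, in which case matrix multiplication may overflow.

The main remedies are to reduce the learning rate or to normalize the data.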
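
16. What format does PaddlePaddle store parameters in, and how to convert to and from plain text
---------------------------------------------------------------------------------------------------

A model parameter file saved by PaddlePaddle consists of a 16-byte header followed by the network parameters. In the header, bytes 1-4 hold the PaddlePaddle version and should simply be filled with 0. Bytes 5-8 hold the number of bytes per parameter: 4 when the saved parameters are float, 8 when they are double. Bytes 9-16 hold the total number of saved parameters.

To convert saved parameters back to plain text, load the network parameters into a :code:`numpy.array` of the matching data type and skip the header of the parameter file. Unless PaddlePaddle was compiled with double precision, it computes in float precision by default and the saved parameters are float as well. In that case, :code:`dtype=float32` is normally used with :code:`numpy.array` . For example:

.. code-block:: python

    import numpy as np

    def read_parameter(fname, width):
        # read the file as raw bytes
        with open(fname, "rb") as f:
            s = f.read()
        # skip the 16-byte header
        vec = np.frombuffer(s[16:], dtype=np.float32)
        # width is the size of the corresponding layer
        np.savetxt(fname + ".csv", vec.reshape(width, -1),
                   fmt="%.6f", delimiter=",")

To convert plain-text parameters into a file that PaddlePaddle can load, first build the header and then write the network parameters. The code below converts a randomly generated matrix into a parameter file that PaddlePaddle can load.

.. code-block:: python

    import struct
    import numpy as np

    def gen_rand_param(param_file, width, height, need_trans):
        np.random.seed()
        # header: version (0), bytes per value (4), number of values;
        # "iiq" packs to 16 bytes (two 4-byte ints and one 8-byte int)
        header = struct.pack("iiq", 0, 4, height * width)
        param = np.float32(np.random.rand(height, width))
        with open(param_file, "wb") as fparam:
            fparam.write(header + param.tobytes())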
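
The header layout described above can also be checked without NumPy. The sketch below packs and unpacks the 16-byte header with the standard ``struct`` module. It assumes little-endian byte order and float parameters, and the helper names are hypothetical.

```python
import struct

# version (int32), bytes per value (int32), value count (int64)
HEADER_FORMAT = "<iiq"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_parameters(values):
    """Serialize floats as a header followed by a float32 payload."""
    header = struct.pack(HEADER_FORMAT, 0, 4, len(values))
    payload = struct.pack("<%df" % len(values), *values)
    return header + payload


def unpack_parameters(blob):
    """Return the list of float32 values stored after the header."""
    version, value_size, count = struct.unpack(HEADER_FORMAT, blob[:HEADER_SIZE])
    if value_size != 4:
        raise ValueError("this sketch only handles float32 parameters")
    end = HEADER_SIZE + value_size * count
    return list(struct.unpack("<%df" % count, blob[HEADER_SIZE:end]))


blob = pack_parameters([0.5, -1.25, 2.0])
print(HEADER_SIZE, unpack_parameters(blob))
```

The explicit ``<`` prefix fixes the field sizes, so the header is 16 bytes regardless of the platform's native ``long`` size.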

17. How to load pre-trained parameters
--------------------------------------

* For a layer that loads pre-trained parameters, set the parameter attribute :code:`is_static=True` so that the layer's parameters stay unchanged during training. Taking the embedding layer as an example:

.. code-block:: python

    emb_para = paddle.attr.Param(name='emb', is_static=True)
    paddle.layer.embedding(size=word_dim, input=x, param_attr=emb_para)

* Load the pre-trained parameters from the model file into a :code:`numpy.array`. After creating the parameters, use :code:`parameters.set()` to load the pre-trained values. The first 16 bytes of a parameter file saved by PaddlePaddle are the header, so the data must be read starting from the 17th byte. Taking the embedding layer as an example:

.. code-block:: python

    def load_parameter(file_name, h, w):
        with open(file_name, 'rb') as f:
            f.read(16)  # skip header.
            return np.fromfile(f, dtype=np.float32).reshape(h, w)

    parameters = paddle.parameters.create(my_cost)
    parameters.set('emb', load_parameter(emb_param_file, 30000, 256))
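
19. How to output multiple layers in PaddlePaddle
-------------------------------------------------

* Pass the layers to be output as the :code:`output_layer` argument of :code:`paddle.inference.Inference()` :

.. code-block:: python

    inferer = paddle.inference.Inference(output_layer=[layer1, layer2], parameters=parameters)

* Specify which fields to output. Taking the :code:`value` field as an example:

.. code-block:: python

    out = inferer.infer(input=data_batch, flatten_result=False, field=["value"])

With :code:`flatten_result=False`, the result is a :code:`list` whose length equals the number of output fields. Each element of that :code:`list` is itself a :code:`list` holding the corresponding field of every output layer, and each field result is a :code:`numpy.array`. The default value of :code:`flatten_result` is :code:`True`. In that case, PaddlePaddle concatenates, for each field, the results of all output layers row by row. If the dimensions of the :code:`numpy.array` results of the output layers do not match for that field, the program fails.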
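
20. How to use the :code:`name` parameter of :code:`paddle.layer.memory`
------------------------------------------------------------------------

* :code:`paddle.layer.memory` retrieves the output of a specific layer at the previous time step. That layer is specified through the :code:`name` parameter: :code:`paddle.layer.memory` is associated with the layer whose :code:`name` has the same value, and uses that layer's output at the previous time step as its own output at the current time step.

* Every layer in PaddlePaddle has a unique name, which the user sets through the :code:`name` parameter. If the user does not set it explicitly, PaddlePaddle sets it automatically. :code:`paddle.layer.memory` is not a real layer. Its own name is set through the :code:`memory_name` parameter, and PaddlePaddle sets it automatically if the user does not. The :code:`name` parameter of :code:`paddle.layer.memory` specifies the layer to associate with, and must be set explicitly by the user.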

21. Using dropout
-----------------

* There are two ways to use dropout in PaddlePaddle:

  * Set :code:`drop_rate` in the :code:`layer_attr` of the layer. Taking :code:`paddle.layer.fc` as an example:

    .. code-block:: python

        fc = paddle.layer.fc(input=input, layer_attr=paddle.attr.ExtraLayerAttribute(drop_rate=0.5))

  * Use :code:`paddle.layer.dropout`. Taking :code:`paddle.layer.fc` as an example:

    .. code-block:: python

        fc = paddle.layer.fc(input=input)
        drop_fc = paddle.layer.dropout(input=fc, dropout_rate=0.5)

* :code:`paddle.layer.dropout` actually uses :code:`paddle.layer.add_to` and sets :code:`drop_rate` in that layer in the first way. This approach consumes more memory.

* PaddlePaddle implements dropout in the activation function, not in the layer.

* :code:`paddle.layer.lstmemory`, :code:`paddle.layer.grumemory`, and :code:`paddle.layer.recurrent` do not activate their output in the usual way, so the first way of setting :code:`drop_rate` cannot be used in these layers. To apply dropout to them, use the second way, that is, :code:`paddle.layer.dropout`.

22. How to set learning rate annealing
--------------------------------------

Set learning_rate_schedule and the related parameters in the optimizer. Taking the Adam algorithm as an example:

.. code-block:: python

    optimizer = paddle.optimizer.Adam(
        learning_rate=1e-3,
        learning_rate_decay_a=0.5,
        learning_rate_decay_b=0.75,
        learning_rate_schedule="poly",)

PaddlePaddle currently supports 8 values of learning_rate_schedule. They and the corresponding learning rate formulas are:

* "constant"

  lr = learning_rate

* "poly"

  lr = learning_rate * pow(1 + learning_rate_decay_a * num_samples_processed, -learning_rate_decay_b)

  Here num_samples_processed is the number of samples trained so far. The same applies below.

* "caffe_poly"

  lr = learning_rate * pow(1.0 - num_samples_processed / learning_rate_decay_a, learning_rate_decay_b)

* "exp"

  lr = learning_rate * pow(learning_rate_decay_a, num_samples_processed / learning_rate_decay_b)

* "discexp"

  lr = learning_rate * pow(learning_rate_decay_a, floor(num_samples_processed / learning_rate_decay_b))

* "linear"

  lr = max(learning_rate - learning_rate_decay_a * num_samples_processed, learning_rate_decay_b)

* "manual"

  This schedule anneals the learning rate piecewise by the number of samples trained. The user defines the piecewise decay factor through the :code:`learning_rate_args` parameter, and the current learning rate is the product of :code:`learning_rate` and the current decay factor. Taking the Adam algorithm as an example:

  .. code-block:: python

      optimizer = paddle.optimizer.Adam(
          learning_rate=1e-3,
          learning_rate_schedule="manual",
          learning_rate_args="1000:1.0,2000:0.9,3000:0.8",)

  In this example, the learning rate is :code:`1e-3 * 1.0` while at most 1000 samples have been trained, :code:`1e-3 * 0.9` when more than 1000 and at most 2000 samples have been trained, and :code:`1e-3 * 0.8` once more than 2000 samples have been trained.

* "pass_manual"

  This schedule anneals the learning rate piecewise by the number of passes trained. The user defines the piecewise decay factor through the :code:`learning_rate_args` parameter, and the current learning rate is the product of :code:`learning_rate` and the current decay factor. Taking the Adam algorithm as an example:

  .. code-block:: python

      optimizer = paddle.optimizer.Adam(
          learning_rate=1e-3,
          learning_rate_schedule="pass_manual",
          learning_rate_args="1:1.0,2:0.9,3:0.8",)

  In this example, the learning rate is :code:`1e-3 * 1.0` while at most 1 pass has been trained, :code:`1e-3 * 0.9` when more than 1 and at most 2 passes have been trained, and :code:`1e-3 * 0.8` once more than 2 passes have been trained.
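
The formulas above can be written directly as functions. This is a sketch that transcribes the listed formulas, not PaddlePaddle's implementation. The short names ``a``, ``b``, and ``n`` stand for ``learning_rate_decay_a``, ``learning_rate_decay_b``, and ``num_samples_processed``, and the parsing of ``learning_rate_args`` follows the description of the "manual" example (the last factor is kept once all boundaries are passed), which is an assumption.

```python
import math


def poly(lr, a, b, n):
    return lr * math.pow(1 + a * n, -b)


def exp(lr, a, b, n):
    return lr * math.pow(a, float(n) / b)


def discexp(lr, a, b, n):
    return lr * math.pow(a, math.floor(float(n) / b))


def linear(lr, a, b, n):
    return max(lr - a * n, b)


def manual(lr, args, n):
    """args is a string such as "1000:1.0,2000:0.9,3000:0.8"."""
    segments = [seg.split(":") for seg in args.split(",")]
    # Fall back to the last factor when n is beyond every boundary.
    factor = float(segments[-1][1])
    for boundary, value in segments:
        if n <= int(boundary):
            factor = float(value)
            break
    return lr * factor


print(manual(1e-3, "1000:1.0,2000:0.9,3000:0.8", 1500))
```

Plotting these functions over the expected number of samples is a quick way to sanity-check a choice of decay parameters before training.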
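
23. What to do about the :code:`Duplicated layer name` error
------------------------------------------------------------

This error usually occurs because the user gave the :code:`name` parameter of different layers the same value. When it occurs, find the layers whose :code:`name` values are identical and give them different names.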
build_and_install/index_cn.rst
model/index_cn.rst
parameter/index_cn.rst
local/index_cn.rst
cluster/index_cn.rst

##############################
Local Training and Prediction
##############################

.. contents::
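
4. What to do when training exits with :code:`Floating point exception`
-----------------------------------------------------------------------

The Paddle binary traps floating-point exceptions at run time and exits immediately whenever one occurs (that is, whenever NaN or Inf appears during training). Floating-point exceptions are usually caused by overflow, division by zero, and similar problems.
The main causes are:

* The parameters or the gradients become too large during training, so that accumulating, multiplying, or dividing them overflows.
* The model never converges and diverges to very large values.
* The training data is problematic and drives the parameters into a singular state. Alternatively, the scale of the input data is too large, with some feature values reaching the millions, in which case matrix multiplication may overflow.

There are two effective solutions:

1. Set the :code:`gradient_clipping_threshold` parameter, for example:

.. code-block:: python

    optimizer = paddle.optimizer.RMSProp(
        learning_rate=1e-3,
        gradient_clipping_threshold=10.0,
        regularization=paddle.optimizer.L2Regularization(rate=8e-4))

For details, see the `nmt_without_attention <https://github.com/PaddlePaddle/models/blob/develop/nmt_without_attention/train.py#L35>`_ example.

2. Set the :code:`error_clipping_threshold` parameter, for example:

.. code-block:: python

    decoder_inputs = paddle.layer.fc(
        act=paddle.activation.Linear(),
        size=decoder_size * 3,
        bias_attr=False,
        input=[context, current_word],
        layer_attr=paddle.attr.ExtraLayerAttribute(
            error_clipping_threshold=100.0))

For the complete code, see the `machine translation <https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/train.py#L66>`_ example.

The two methods differ as follows:

1. Both clip gradients, but at different times. The former is applied when the :code:`optimizer` updates the network parameters. The latter is invoked during the backward computation of the activation function.
2. They clip different things. The former clips the gradients of the learnable parameters. The latter clips the gradient propagated back to the previous layer.

In addition, reducing the learning rate or normalizing the data can also resolve this kind of problem.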
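
As a rough illustration of what clipping does to an oversized gradient, the sketch below rescales a gradient vector so that its L2 norm does not exceed a threshold. This is a generic norm-clipping sketch, not a description of how PaddlePaddle implements ``gradient_clipping_threshold`` or ``error_clipping_threshold``. ``clip_by_norm`` is a hypothetical helper.

```python
import math


def clip_by_norm(grad, threshold):
    """Scale grad so that its L2 norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold or norm == 0.0:
        # Already small enough: return an unchanged copy.
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


clipped = clip_by_norm([30.0, 40.0], threshold=10.0)
print(clipped)
```

The direction of the gradient is preserved and only its magnitude is bounded, which keeps a single bad batch from pushing the parameters into overflow.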
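
5. How to call the infer interface to output the prediction results of multiple layers
------------------------------------------------------------------------------------------

* Pass the layers to be output as the :code:`output_layer` argument of :code:`paddle.inference.Inference()` :

.. code-block:: python

    inferer = paddle.inference.Inference(output_layer=[layer1, layer2], parameters=parameters)

* Specify which fields to output. Taking the :code:`value` field as an example:

.. code-block:: python

    out = inferer.infer(input=data_batch, field=["value"])

Note that:

* If two layers are specified as output layers, the expected result is actually two matrices;
* Suppose the output A of the first layer is an N1 * M1 matrix and the output B of the second layer is an N2 * M2 matrix;
* By default, paddle.v2 concatenates A and B horizontally. When N1 and N2 differ, the following error is raised:

.. code-block:: python

    ValueError: all the input array dimensions except for the concatenation axis must match exactly

The concatenation fails because the output matrices of the layers have different heights. This often happens when:

* a sequence layer and a non-sequence layer are output together;
* several output layers process sequences of different lengths.

In this case, set :code:`flatten_result=False` when calling the infer interface to skip the concatenation step. The return value of infer is then a Python list:

* The number of elements in the list equals the number of output layers in the network;
* Each element of the list is the output matrix of one layer, of type numpy ndarray;
* The height of each layer's output matrix equals the number of samples for non-sequence input, and the total number of elements in the input sequences for sequence input. The width equals the layer's size in the configuration.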

####################
Model Configuration
####################

.. contents::
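
4. Differences between the recurrent layers
-------------------------------------------

Taking LSTM as an example, PaddlePaddle contains the following recurrent layers:

* :code:`paddle.layer.lstmemory`
* :code:`paddle.networks.simple_lstm`
* :code:`paddle.networks.lstmemory_group`
* :code:`paddle.networks.bidirectional_lstm`

By implementation, they fall into two classes:

1. Recurrent layers implemented with recurrent_group:

  * With this class of recurrent layer, the user can access the intermediate values that the recurrent unit computes within one time step (for example hidden states and memory cells);
  * The :code:`paddle.networks.lstmemory_group` listed above belongs to this class;

2. Recurrent layers implemented as a single whole:

  * With this class of recurrent layer, the user can only access the output values;
  * The :code:`paddle.layer.lstmemory` , :code:`paddle.networks.simple_lstm` , and :code:`paddle.networks.bidirectional_lstm` listed above belong to this class;

Implementing the recurrent layer as a single whole allows more optimization of the CPU and GPU computation, so the second class is more efficient than the recurrent group implementation. In practice, if you do not need access to the intermediate variables of the LSTM and only need the output of the recurrent layer, we recommend the second class.

In addition, for LSTM, PaddlePaddle also contains the computation unit :code:`paddle.networks.lstmemory_unit` :

  * Unlike the recurrent layers described above, :code:`paddle.networks.lstmemory_unit` defines the computation of an LSTM unit within one time step. It is not a complete recurrent layer and cannot take sequence data as input;
  * :code:`paddle.networks.lstmemory_unit` can only be used as the step function inside a recurrent_group;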

###################
Parameter Settings
###################

.. contents::
1. 如何选择SGD算法的学习率
--------------------------
在采用sgd/async_sgd进行训练时,一个重要的问题是选择正确的learning_rate。如果learning_rate太大,那么训练有可能不收敛,如果learning_rate太小,那么收敛可能很慢,导致训练时间过长。
通常做法是从一个比较大的learning_rate开始试,如果不收敛,那减少学习率10倍继续试验,直到训练收敛为止。那么如何判断训练不收敛呢?可以估计出如果模型采用不变的输出最小的cost0是多少。
如果训练过程的的cost明显高于这个常数输出的cost,那么我们可以判断为训练不收敛。举一个例子,假如我们是三分类问题,采用multi-class-cross-entropy作为cost,数据中0,1,2三类的比例为 :code:`0.2, 0.5, 0.3` , 那么常数输出所能达到的最小cost是 :code:`-(0.2*log(0.2)+0.5*log(0.5)+0.3*log(0.3))=1.03` 。如果训练一个pass(或者更早)后,cost还大于这个数,那么可以认为训练不收敛,应该降低学习率。
2. 如何设置学习率退火(learning rate annealing)
------------------------------------------------
在相应的优化算法里设置learning_rate_schedule及相关参数,以使用Adam算法为例,代码如下:
.. code-block:: python
optimizer = paddle.optimizer.Adam(
learning_rate=1e-3,
learning_rate_decay_a=0.5,
learning_rate_decay_b=0.75,
learning_rate_schedule="poly",)
PaddlePaddle目前支持8种learning_rate_schedule,这8种learning_rate_schedule及其对应学习率计算方式如下:
* "constant"
lr = learning_rate
* "poly"
lr = learning_rate * pow(1 + learning_rate_decay_a * num_samples_processed, -learning_rate_decay_b)
其中,num_samples_processed为已训练样本数,下同。
* "caffe_poly"
lr = learning_rate * pow(1.0 - num_samples_processed / learning_rate_decay_a, learning_rate_decay_b)
* "exp"
lr = learning_rate * pow(learning_rate_decay_a, num_samples_processed / learning_rate_decay_b)
* "discexp"
lr = learning_rate * pow(learning_rate_decay_a, floor(num_samples_processed / learning_rate_decay_b))
* "linear"
lr = max(learning_rate - learning_rate_decay_a * num_samples_processed, learning_rate_decay_b)
* "manual"
这是一种按已训练样本数分段取值的学习率退火方法。使用该learning_rate_schedule时,用户通过参数 :code:`learning_rate_args` 设置学习率衰减因子分段函数,当前的学习率为所设置 :code:`learning_rate` 与当前的衰减因子的乘积。以使用Adam算法为例,代码如下:
.. code-block:: python
optimizer = paddle.optimizer.Adam(
learning_rate=1e-3,
learning_rate_schedule="manual",
learning_rate_args="1000:1.0,2000:0.9,3000:0.8",)
在该示例中,当已训练样本数小于等于1000时,学习率为 :code:`1e-3 * 1.0`;当已训练样本数大于1000小于等于2000时,学习率为 :code:`1e-3 * 0.9`;当已训练样本数大于2000时,学习率为 :code:`1e-3 * 0.8`。
* "pass_manual"
这是一种按已训练pass数分段取值的学习率退火方法。使用该learning_rate_schedule时,用户通过参数 :code:`learning_rate_args` 设置学习率衰减因子分段函数,当前的学习率为所设置 :code:`learning_rate` 与当前的衰减因子的乘积。以使用Adam算法为例,代码如下:
.. code-block:: python
optimizer = paddle.optimizer.Adam(
learning_rate=1e-3,
learning_rate_schedule="manual",
learning_rate_args="1:1.0,2:0.9,3:0.8",)
在该示例中,当已训练pass数小于等于1时,学习率为 :code:`1e-3 * 1.0`;当已训练pass数大于1小于等于2时,学习率为 :code:`1e-3 * 0.9`;当已训练pass数大于2时,学习率为 :code:`1e-3 * 0.8`。
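3. How to initialize parameters
-------------------------------

By default, PaddlePaddle initializes parameters with mean 0 and standard deviation :math:`\frac{1}{\sqrt{d}}`, where :math:`d` is the width of the parameter matrix. In general this initialization does not give bad results. To customize initialization, PaddlePaddle currently provides two methods\:

* Gaussian distribution. Set :code:`param_attr` to :code:`param_attr=ParamAttr(initial_mean=0.0, initial_std=1.0)`
* Uniform distribution. Set :code:`param_attr` to :code:`param_attr=ParamAttr(initial_max=1.0, initial_min=-1.0)`

For example, the following code sets both the parameter initialization and the bias initialization of a fully connected layer.

.. code-block:: python

    hidden = fc_layer(input=ipt, param_attr=ParamAttr(initial_max=1.0, initial_min=-1.0),
                      bias_attr=ParamAttr(initial_mean=1.0, initial_std=0.0))

The code above initializes every bias to 1.0 and draws the parameters from a uniform distribution over :code:`[-1.0, 1.0]`.
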
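4. How to share parameters
--------------------------

PaddlePaddle uses the name :code:`name` as a parameter's ID, and parameters with the same name are shared. The name can be set with :code:`ParamAttr(name="YOUR_PARAM_NAME")`. A more convenient way is to let the parameters that should be shared use the same :code:`ParamAttr` object.

A configuration example of parameter sharing in a simple fully connected network\:

.. literalinclude:: ../../python/paddle/trainer_config_helpers/tests/configs/shared_fc.py

Here :code:`hidden_a` and :code:`hidden_b` use the same parameter and bias. The two inputs of the softmax layer also use the same parameter, :code:`softmax_param`.
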
5. How to load pre-trained parameters
-------------------------------------

* For the layer that loads pre-trained parameters, set its parameter attribute :code:`is_static=True` so that the layer's parameters stay unchanged during training. Taking the embedding layer as an example:

  .. code-block:: python

      emb_para = paddle.attr.Param(name='emb', is_static=True)
      paddle.layer.embedding(size=word_dim, input=x, param_attr=emb_para)

* Load the pre-trained parameters from the model file into a :code:`numpy.array`, then, after creating the parameters, load them with :code:`parameters.set()`. The first 16 bytes of a parameter file saved by PaddlePaddle are header information, so reading into the :code:`numpy.array` must start from the 17th byte. Taking the embedding layer as an example:

  .. code-block:: python

      import numpy as np

      def load_parameter(file_name, h, w):
          with open(file_name, 'rb') as f:
              f.read(16)  # skip header.
              return np.fromfile(f, dtype=np.float32).reshape(h, w)

      parameters = paddle.parameters.create(my_cost)
      parameters.set('emb', load_parameter(emb_param_file, 30000, 256))

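6. What is the stored parameter format, and how to convert it to and from plain text
------------------------------------------------------------------------------------

A parameter file saved by PaddlePaddle consists of a 16-byte header followed by the network parameters. In the header, bytes 1-4 hold the PaddlePaddle version information and should simply be filled with 0; bytes 5-8 hold the number of bytes occupied by each parameter value, which is 4 when the saved parameters are float and 8 when they are double; bytes 9-16 hold the total number of saved parameter values.

To convert saved parameters back to plain text, load the network parameters into a :code:`numpy.array` of the matching data type, skipping the header of the parameter file. Unless PaddlePaddle was compiled with double precision, computation uses float precision by default and the saved parameters are float as well, so :code:`dtype=float32` is the usual setting for the :code:`numpy.array`. For example:

.. code-block:: python

    import numpy as np

    def read_parameter(fname, width):
        # open in binary mode; text mode can corrupt the raw bytes
        with open(fname, "rb") as f:
            s = f.read()
        # skip header
        vec = np.frombuffer(s[16:], dtype=np.float32)
        # width is the size of the corresponding layer
        np.savetxt(fname + ".csv", vec.reshape(width, -1),
                   fmt="%.6f", delimiter=",")

To convert plain-text parameters into a file that PaddlePaddle can load, first build the header and then write the network parameters. The code below converts a randomly generated matrix into a parameter file loadable by PaddlePaddle.

.. code-block:: python

    import struct
    import numpy as np

    def gen_rand_param(param_file, width, height, need_trans):
        np.random.seed()
        # header: version (4 bytes), bytes per value (4 bytes), value count (8 bytes);
        # "q" is always 8 bytes, whereas "l" is only 4 bytes on some platforms
        header = struct.pack("iiq", 0, 4, height * width)
        param = np.float32(np.random.rand(height, width))
        with open(param_file, "wb") as fparam:
            fparam.write(header + param.tobytes())
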
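The header layout described above can be checked with a small round trip that uses only the Python standard library. This is a sketch under the stated layout (4-byte version, 4-byte value size, 8-byte value count, native byte order); :code:`write_param_file` and :code:`read_param_file` are hypothetical helper names, not PaddlePaddle functions.

```python
import array
import struct

# version (int32), bytes per value (int32), value count (int64)
HEADER_FORMAT = "iiq"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def write_param_file(path, values):
    """Write float32 values preceded by the 16-byte header."""
    data = array.array("f", values)
    header = struct.pack(HEADER_FORMAT, 0, data.itemsize, len(data))
    with open(path, "wb") as f:
        f.write(header + data.tobytes())


def read_param_file(path):
    """Return (version, values) read back from a float32 parameter file."""
    with open(path, "rb") as f:
        version, value_size, count = struct.unpack(
            HEADER_FORMAT, f.read(HEADER_SIZE))
        data = array.array("f")
        if value_size != data.itemsize:
            raise ValueError("only float32 parameter files are handled here")
        data.frombytes(f.read(count * value_size))
    return version, list(data)
```

Reading the value size and count from the header, rather than assuming them, lets the reader reject a double-precision file instead of silently misinterpreting it.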
7. A protocol message was rejected because it was too big
------------------------------------------------------------

If the following error appears while training an NLP-related model:

.. code-block:: bash

    [libprotobuf ERROR google/protobuf/io/coded_stream.cc:171] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
    F1205 14:59:50.295174 14703 TrainerConfigHelper.cpp:59] Check failed: m->conf.ParseFromString(configProtoStr)

A likely cause is that one of the args passed to the dataprovider is too large, usually because a large dictionary is passed directly. A faulty :code:`define_py_data_sources2` call looks like this:

.. code-block:: python

    src_dict = dict()
    for line_count, line in enumerate(open(src_dict_path, "r")):
        src_dict[line.strip()] = line_count

    define_py_data_sources2(
        train_list,
        test_list,
        module="dataprovider",
        obj="process",
        args={"src_dict": src_dict})

The solution is to pass the path of the dictionary as args to the dataprovider and load the dictionary from that path inside the dataprovider. That is, :code:`define_py_data_sources2` should be changed to:

.. code-block:: python

    define_py_data_sources2(
        train_list,
        test_list,
        module="dataprovider",
        obj="process",
        args={"src_dict_path": src_dict_path})

For the complete source code, see the `seqToseq <https://github.com/PaddlePaddle/Paddle/tree/develop/demo/seqToseq>`_ example.

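On the dataprovider side, loading the dictionary from the received path could look roughly like the sketch below. The helper name :code:`load_dict` is hypothetical; how it is wired into the dataprovider's initialization depends on the dataprovider code and is not shown here.

```python
def load_dict(dict_path):
    """Map each stripped line of the dictionary file to its zero-based line number."""
    word_to_id = {}
    with open(dict_path, "r") as f:
        for line_count, line in enumerate(f):
            word_to_id[line.strip()] = line_count
    return word_to_id
```

Only the short path string then travels through the serialized trainer configuration, so the protobuf message stays far below the size limit reported in the error.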
# How to write a new operator

 - [Background](#background)
 - [Implementing C++ Types](#implementing-c-types)
 - [Defining ProtoMaker](#defining-protomaker)
 - [Defining Operator](#defining-operator)
 - [Registering Operator](#registering-operator)
 - [Compilation](#compilation)
 - [Python Binding](#python-binding)
 - [Unit Tests](#unit-tests)
   - [Testing Forward Operators](#testing-forward-operators)
   - [Testing Backward Operators](#testing-backward-operators)
   - [Compiling and Running](#compiling-and-running)
 - [Remarks](#remarks)

## Background

Here are the base types needed. For details, please refer to the design docs.

...@@ -232,4 +235,122 @@ The system will automatically bind to Python and link it to a generated library.

## Unit Tests

Unit tests for an operator include

1. comparing a forward operator's implementations on different devices,
2. comparing a backward operator's implementations on different devices, and
3. a scaling test for the backward operator.

Here, we introduce the [unit tests for `MulOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/test_mul_op.py).

### Testing Forward Operators
A forward operator unit test inherits `unittest.TestCase` and defines metaclass `__metaclass__ = OpTestMeta`. More concrete tests are performed in `OpTestMeta`. Testing a forward operator requires the following:
1. Defining input, output and relevant attributes in `setUp` method.
2. Generating random input data.
3. Implementing the same computation logic in a Python script:
```python
import unittest
import numpy as np
from gradient_checker import GradientChecker, create_op
from op_test_util import OpTestMeta
class TestMulOp(unittest.TestCase):
__metaclass__ = OpTestMeta
def setUp(self):
self.type = "mul"
self.inputs = {
'X': np.random.random((32, 84)).astype("float32"),
'Y': np.random.random((84, 100)).astype("float32")
}
self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
```
The test then takes the result of the Python computation and compares it with the forward operator's own output.

The code above first loads the required packages. In addition, we have

- `self.type = "mul"` defines the type, which must be identical to the operator's registered type.
- `self.inputs` defines the inputs as `numpy.array` values and initializes them.
- `self.outputs` defines the outputs by performing the same computation in the Python script and storing its result.

### Testing Backward Operators
A backward operator unit test inherits `GradientChecker`, which inherits `unittest.TestCase`. As a result, **a backward operator unit test needs to have the prefix `test_`**.
```python
class TestMulGradOp(GradientChecker):
def setUp(self):
self.op = create_op("mul")
self.inputs = {
'X': np.random.random((32, 84)).astype("float32"),
'Y': np.random.random((84, 100)).astype("float32")
}
def test_cpu_gpu_compare(self):
self.compare_grad(self.op, self.inputs)
def test_normal(self):
# mul op will enlarge the relative error
self.check_grad(
self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5)
def test_ignore_x(self):
self.check_grad(
self.op,
self.inputs, ["Y"],
"Out",
max_relative_error=0.5,
no_grad_set={"X"})
def test_ignore_y(self):
self.check_grad(
self.op,
self.inputs, ["X"],
"Out",
max_relative_error=0.5,
no_grad_set={"Y"})
```
Some key points in the code above include:

- `create_op("mul")` creates the forward operator that corresponds to the backward operator.
- `compare_grad` compares the gradient results computed on the CPU and on the GPU.
- `test_normal` calls `check_grad` to validate the scaling tests' correctness and stability through numeric methods.
  - The first argument `self.op` denotes the forward operator.
  - The second argument `self.inputs` denotes the input dictionary, whose keys must be identical to the names in the `ProtoMaker` definitions.
  - The third argument `["X", "Y"]` selects `X` and `Y` to be scale tested.
  - The fourth argument `"Out"` points to the network's final output target `Out`.
- `test_ignore_x` and `test_ignore_y` test the cases where there is only one scaling input.

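To make "validating through numeric methods" concrete, here is a minimal sketch of a numeric gradient check for a matrix product, written in plain Python. It is not the `GradientChecker` implementation: the helper names are hypothetical, and the scalar target is assumed to be the sum of all entries of `Out`. It only illustrates comparing a central-difference estimate against an analytic gradient by relative error.

```python
def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def loss(x, y):
    """Scalar target: the sum of all entries of X * Y."""
    return sum(sum(row) for row in matmul(x, y))


def numeric_grad_x(x, y, delta=1e-3):
    """Central-difference estimate of d(loss)/dX, one entry at a time."""
    grad = [[0.0] * len(x[0]) for _ in x]
    for i in range(len(x)):
        for k in range(len(x[0])):
            original = x[i][k]
            x[i][k] = original + delta
            plus = loss(x, y)
            x[i][k] = original - delta
            minus = loss(x, y)
            x[i][k] = original  # restore the perturbed entry
            grad[i][k] = (plus - minus) / (2 * delta)
    return grad


def analytic_grad_x(x, y):
    """d(loss)/dX[i][k] equals the sum of row k of Y."""
    row_sums = [sum(row) for row in y]
    return [list(row_sums) for _ in x]


def max_relative_error(a, b):
    """Largest element-wise relative difference between two matrices."""
    worst = 0.0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            scale = max(abs(va), abs(vb), 1e-12)
            worst = max(worst, abs(va - vb) / scale)
    return worst
```

A check in the spirit of `max_relative_error=0.5` above would then assert that `max_relative_error(numeric_grad_x(x, y), analytic_grad_x(x, y))` stays below the chosen threshold.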
### Compiling and Running
Any new unit testing file of the format `test_*.py` added to the directory `python/paddle/v2/framework/tests` is automatically added to the project to compile.

Note that **unlike the compile test for Ops, running unit tests requires compiling the entire project** with the flag `WITH_TESTING` on, i.e. `cmake paddle_dir -DWITH_TESTING=ON`.

After successfully compiling the project, run the following command to run unit tests:
```bash
make test ARGS="-R test_mul_op -V"
```
Or,
```bash
ctest -R test_mul_op
```
## Remarks
- Every `*_op.h` (if applicable), `*_op.cc`, and `*_op.cu` (if applicable) must be created for a unique Op. Compiling will fail if multiple operators are included per file.
- The type with which an operator is registered needs to be identical to the Op's name. Registering `REGISTER_OP(B, ...)` in `A_op.cc` will cause unit testing failures.
- If the operator does not implement a GPU kernel, please refrain from creating an empty `*_op.cu` file, or else unit tests will fail.
- If multiple operators rely on some shared methods, a file NOT named `*_op.*` can be created to store them, such as `gather.h`.
...@@ -26,7 +26,7 @@ cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope) ...@@ -26,7 +26,7 @@ cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope)
cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry) cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry)
cc_library(grad_op_builder SRCS grad_op_builder.cc DEPS operator) cc_library(grad_op_builder SRCS grad_op_builder.cc DEPS operator)
cc_library(op_registry SRCS op_registry.cc DEPS grad_op_builder op_proto_maker) cc_library(op_registry SRCS op_registry.cc DEPS grad_op_builder op_proto_maker op_info)
cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry) cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry)
cc_test(grad_op_builder_test SRCS grad_op_builder_test.cc DEPS grad_op_builder op_registry add_op) cc_test(grad_op_builder_test SRCS grad_op_builder_test.cc DEPS grad_op_builder op_registry add_op)
......
...@@ -24,6 +24,9 @@ static ProgramDesc* g_program_desc = nullptr; ...@@ -24,6 +24,9 @@ static ProgramDesc* g_program_desc = nullptr;
ProgramDesc& GetProgramDesc() { ProgramDesc& GetProgramDesc() {
if (g_program_desc == nullptr) { if (g_program_desc == nullptr) {
g_program_desc = new ProgramDesc(); g_program_desc = new ProgramDesc();
auto root_block = g_program_desc->mutable_blocks()->Add();
root_block->set_idx(0);
root_block->set_parent_idx(-1);
} }
return *g_program_desc; return *g_program_desc;
} }
......
...@@ -45,6 +45,21 @@ inline AttrType AttrTypeID() { ...@@ -45,6 +45,21 @@ inline AttrType AttrTypeID() {
Attribute GetAttrValue(const OpDesc::Attr& attr_desc); Attribute GetAttrValue(const OpDesc::Attr& attr_desc);
class AttrReader {
public:
explicit AttrReader(const AttributeMap& attrs) : attrs_(attrs) {}
template <typename T>
inline const T& Get(const std::string& name) const {
PADDLE_ENFORCE(attrs_.count(name) != 0, "%s should be in AttributeMap",
name);
return boost::get<T>(attrs_.at(name));
}
private:
const AttributeMap& attrs_;
};
// check whether a value(attribute) fit a certain limit // check whether a value(attribute) fit a certain limit
template <typename T> template <typename T>
class GreaterThanChecker { class GreaterThanChecker {
......
...@@ -2,7 +2,7 @@

## Motivation

In Neural Network, most models are solved by the backpropagation algorithm (known as **BP**) at present. Technically, BP calculates the gradient of the loss function, then propagates it back through the networks following the chain rule. Hence we need a module that chains the gradient operators/expressions together to construct the backward pass. Every forward network needs a backward network to construct the full computation graph. The operator/expression's backward pass will be generated with respect to the forward pass.

## Implementation

...@@ -24,9 +24,9 @@ A backward network is built up with several backward operators. Backward operato

| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients |

In most cases, there is a one-to-one relation between the forward and backward operators. These relations are recorded by a global hash map (`OpInfoMap`). To follow the philosophy of minimum core and to make operators pluggable, the registry mechanism is introduced.

For example, we have `mul_op`, and we can register its information and corresponding backward operator by the following macro:

```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
...@@ -48,7 +48,7 @@ The function `BuildGradOp` will sequentially execute following processes:

1. Get the `type_` of given forward operator, and then get the corresponding backward operator's type by looking up the `OpInfoMap`.

2. Build two maps named `inputs` and `outputs` to temporarily store backward operator's inputs and outputs. Copy forward operator's `inputs_` and `outputs_` to map `inputs`, except those that are not necessary for gradient computing.

3. Add forward inputs' gradient variables into map `outputs`, and add forward outputs' gradient variables into map `inputs`.

...@@ -56,11 +56,11 @@ The function `BuildGradOp` will sequentially execute following processes:

### Backward Network Building

A backward network is a series of backward operators. The main idea of building a backward network is creating backward operators in the inverted sequence and appending them together one by one. There are some corner cases that need special processing.

1. Op

   When the input forward network is an Op, return its gradient operator immediately. If all of its outputs are in the no-gradient set, then return a special `NOP`.

2. NetOp

...@@ -68,33 +68,33 @@ A backward network is a series of backward operators. The main idea of building

3. RnnOp

   RnnOp is a nested stepnet operator. The backward module needs to recursively call `Backward` for every stepnet.

4. Sharing Variables

   As illustrated in figure 1 and figure 2, two operators share the same variable name **W@GRAD**, which will overwrite their shared input variable.

<p align="center">
<img src="./images/duplicate_op.png" width="50%" ><br/>

Figure 1. Sharing variables in operators.

</p>

   Sharing a variable between operators, or using the same input variable in multiple operators, can lead to duplicate gradient variables. As illustrated in figure 2, we need to rename the gradient names recursively and add a generic add operator to prevent overwriting.

<p align="center">
<img src="images/duplicate_op2.png" width="40%" ><br/>

Figure 2. Replace sharing variable's gradient with `Add` operator.

</p>

   Because the framework finds variables according to their names, we need to rename the output links. We add an integer suffix to represent its position in the clockwise direction.

5. Part of the Gradient is Zero.

   In the whole graph, there are cases where one operator's gradient is not needed but its input's gradient is a dependency of another operator. In such a case we need to fill that position with a gradient matrix of the same shape. In our implementation, we insert a special `fillZeroLike` operator.

Following the rules above, collect the subgraph's `OutputGradients`/`InputGradients` as the NetOp's and return it.

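The reverse-order construction and the handling of shared gradient variables described above can be sketched in Python. This is an illustration only, not the framework's C++ `Backward` implementation: operators are modelled as hypothetical `(type, inputs, outputs)` tuples, the `@RENAME@<n>` suffix is an assumed naming scheme, and the `NOP`, RnnOp, and `fillZeroLike` cases are left out.

```python
from collections import defaultdict

GRAD_SUFFIX = "@GRAD"


def build_backward(forward_ops):
    """Build backward ops for a list of (op_type, input_names, output_names)."""
    backward_ops = []
    writers = defaultdict(list)  # gradient name -> positions of ops writing it

    # Create backward operators in the inverted sequence.
    for op_type, inputs, outputs in reversed(forward_ops):
        grad_op = {
            "type": op_type + "_grad",
            "inputs": list(inputs) + list(outputs)
                      + [name + GRAD_SUFFIX for name in outputs],
            "outputs": [name + GRAD_SUFFIX for name in inputs],
        }
        for name in grad_op["outputs"]:
            writers[name].append(len(backward_ops))
        backward_ops.append(grad_op)

    # A gradient written more than once would be overwritten: rename each
    # write and sum the renamed variables with an add operator.
    pending = []  # (position of the last writer, add operator)
    for name, positions in writers.items():
        if len(positions) < 2:
            continue
        renamed = []
        for n, pos in enumerate(positions):
            new_name = "%s@RENAME@%d" % (name, n)
            outs = backward_ops[pos]["outputs"]
            outs[outs.index(name)] = new_name
            renamed.append(new_name)
        pending.append((positions[-1],
                        {"type": "add", "inputs": renamed, "outputs": [name]}))

    # Insert from the back so earlier positions stay valid.
    for pos, add_op in sorted(pending, key=lambda item: item[0], reverse=True):
        backward_ops.insert(pos + 1, add_op)
    return backward_ops
```

Placing each add operator directly after the last writer of the shared gradient means any later backward operator that reads that gradient sees the summed value.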
...@@ -14,7 +14,7 @@ limitations under the License. */ ...@@ -14,7 +14,7 @@ limitations under the License. */
#include "paddle/framework/operator.h" #include "paddle/framework/operator.h"
#include <algorithm> #include <algorithm>
#include "paddle/framework/op_registry.h" #include <atomic>
namespace paddle { namespace paddle {
namespace framework { namespace framework {
...@@ -33,6 +33,24 @@ ExecutionContext::GetEigenDevice<platform::GPUPlace, Eigen::GpuDevice>() const { ...@@ -33,6 +33,24 @@ ExecutionContext::GetEigenDevice<platform::GPUPlace, Eigen::GpuDevice>() const {
} }
#endif #endif
const Tensor* GetTensorFromVar(const Variable* var) {
if (var->IsType<LoDTensor>()) {
return &var->Get<LoDTensor>();
}
PADDLE_ENFORCE(var->IsType<Tensor>(),
"The Input must be LoDTensor or Tensor.");
return &var->Get<Tensor>();
}
Tensor* GetTensorFromVar(Variable* var) {
if (var->IsType<LoDTensor>()) {
return var->GetMutable<LoDTensor>();
}
PADDLE_ENFORCE(var->IsType<Tensor>(),
"The Input must be LoDTensor or Tensor.");
return var->GetMutable<Tensor>();
}
std::string OperatorBase::Input(const std::string& name) const { std::string OperatorBase::Input(const std::string& name) const {
auto& ins = Inputs(name); auto& ins = Inputs(name);
PADDLE_ENFORCE_LE(ins.size(), 1UL, PADDLE_ENFORCE_LE(ins.size(), 1UL,
......
...@@ -24,6 +24,7 @@ limitations under the License. */ ...@@ -24,6 +24,7 @@ limitations under the License. */
#include "paddle/framework/framework.pb.h" #include "paddle/framework/framework.pb.h"
#include "paddle/framework/lod_tensor.h" #include "paddle/framework/lod_tensor.h"
#include "paddle/framework/scope.h" #include "paddle/framework/scope.h"
#include "paddle/framework/shape_inference.h"
#include "paddle/framework/tensor.h" #include "paddle/framework/tensor.h"
#include "paddle/platform/device_context.h" #include "paddle/platform/device_context.h"
#include "paddle/platform/place.h" #include "paddle/platform/place.h"
...@@ -56,6 +57,9 @@ class OperatorBase; ...@@ -56,6 +57,9 @@ class OperatorBase;
class InferShapeContext; class InferShapeContext;
class ExecutionContext; class ExecutionContext;
extern const Tensor* GetTensorFromVar(const Variable* var);
extern Tensor* GetTensorFromVar(Variable* var);
/** /**
* OperatorBase has the basic element that Net will call to do computation. * OperatorBase has the basic element that Net will call to do computation.
* Only CreateOperator from OpRegistry will new Operator directly. User * Only CreateOperator from OpRegistry will new Operator directly. User
...@@ -262,15 +266,6 @@ class InferShapeContext { ...@@ -262,15 +266,6 @@ class InferShapeContext {
return res; return res;
} }
const Tensor* GetTensorFromVar(const Variable* var) const {
if (var->IsType<LoDTensor>()) {
return &var->Get<LoDTensor>();
}
PADDLE_ENFORCE(var->IsType<Tensor>(),
"The Input(%s) must be LoDTensor or Tensor.");
return &var->Get<Tensor>();
}
void ShareLoD(const std::string& in, const std::string& out, size_t i = 0, void ShareLoD(const std::string& in, const std::string& out, size_t i = 0,
size_t j = 0) const { size_t j = 0) const {
PADDLE_ENFORCE_LT(i, InputSize(in)); PADDLE_ENFORCE_LT(i, InputSize(in));
...@@ -340,6 +335,78 @@ class ExecutionContext : public InferShapeContext { ...@@ -340,6 +335,78 @@ class ExecutionContext : public InferShapeContext {
const platform::DeviceContext& device_context_; const platform::DeviceContext& device_context_;
}; };
class RuntimeInferShapeContext : public InferShapeContextBase {
public:
RuntimeInferShapeContext(const OperatorBase& op, const Scope& scope)
: op_(op), scope_(scope) {}
bool HasInput(const std::string& name) const {
auto ipt = op_.Input(name);
auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
return var != nullptr;
}
bool HasOutput(const std::string& name) const {
auto ipt = op_.Output(name);
auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
return var != nullptr;
}
DDim GetInputDim(const std::string& name) const {
return GetDim(op_.Input(name));
}
void SetInputDim(const std::string& name, const DDim& dim) {
SetDim(op_.Input(name), dim);
}
DDim GetOutputDim(const std::string& name) const {
return GetDim(op_.Output(name));
}
void SetOutputDim(const std::string& name, const DDim& dim) {
SetDim(op_.Output(name), dim);
}
AttrReader Attrs() const { return AttrReader(op_.Attrs()); }
const std::vector<std::string>& Inputs(const std::string& name) const {
return op_.Inputs(name);
}
const std::vector<std::string>& Outputs(const std::string& name) const {
return op_.Outputs(name);
}
private:
template <bool Allocate>
Tensor* GetTensor(const std::string& name) const {
Tensor* t = nullptr;
auto* var = scope_.FindVar(name);
if (!var->IsType<LoDTensor>() && !var->IsType<Tensor>()) {
if (Allocate) {
t = var->GetMutable<LoDTensor>();
} else {
PADDLE_THROW("Variable(%s) should be tensor", name);
}
} else {
t = GetTensorFromVar(scope_.FindVar(name));
}
return t;
}
DDim GetDim(const std::string& name) const {
return GetTensor<false>(name)->dims();
}
void SetDim(const std::string& name, const DDim& dim) {
GetTensor<true>(name)->Resize(dim);
}
const OperatorBase& op_;
const Scope& scope_;
};
class OpKernel { class OpKernel {
public: public:
/** /**
...@@ -383,8 +450,10 @@ class OperatorWithKernel : public OperatorBase { ...@@ -383,8 +450,10 @@ class OperatorWithKernel : public OperatorBase {
const VariableNameMap& outputs, const AttributeMap& attrs) const VariableNameMap& outputs, const AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) {} : OperatorBase(type, inputs, outputs, attrs) {}
// runtime infershape
void InferShape(const Scope& scope) const override { void InferShape(const Scope& scope) const override {
InferShape(InferShapeContext(*this, scope)); auto c = RuntimeInferShapeContext(*this, scope);
InferShape(&c);
} }
void Run(const Scope& scope, void Run(const Scope& scope,
...@@ -406,7 +475,7 @@ class OperatorWithKernel : public OperatorBase { ...@@ -406,7 +475,7 @@ class OperatorWithKernel : public OperatorBase {
} }
protected: protected:
virtual void InferShape(const InferShapeContext& ctx) const = 0; virtual void InferShape(InferShapeContextBase* ctx) const = 0;
}; };
} // namespace framework } // namespace framework
......
...@@ -14,6 +14,7 @@ limitations under the License. */ ...@@ -14,6 +14,7 @@ limitations under the License. */
#include "paddle/framework/operator.h" #include "paddle/framework/operator.h"
#include "gtest/gtest.h" #include "gtest/gtest.h"
#include "paddle/framework/op_info.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
namespace paddle { namespace paddle {
...@@ -114,7 +115,7 @@ class OpWithKernelTest : public OperatorWithKernel { ...@@ -114,7 +115,7 @@ class OpWithKernelTest : public OperatorWithKernel {
using OperatorWithKernel::OperatorWithKernel; using OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override {} void InferShape(framework::InferShapeContextBase* ctx) const override {}
}; };
template <typename T1, typename T2> template <typename T1, typename T2>
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/ddim.h"
namespace paddle {
namespace framework {
class InferShapeContextBase {
public:
virtual ~InferShapeContextBase() {}
virtual bool HasInput(const std::string &name) const = 0;
virtual bool HasOutput(const std::string &name) const = 0;
virtual framework::DDim GetInputDim(const std::string &name) const = 0;
std::vector<framework::DDim> GetInputsDim(const std::string &name) const {
const std::vector<std::string> &names = Inputs(name);
return GetDims(names);
}
virtual void SetInputDim(const std::string &name,
const framework::DDim &dim) = 0;
void SetInputsDim(const std::string &name,
const std::vector<framework::DDim> &dims) {
auto &names = Inputs(name);
SetDims(names, dims);
}
virtual framework::DDim GetOutputDim(const std::string &name) const = 0;
std::vector<framework::DDim> GetOutputsDim(const std::string &name) const {
const std::vector<std::string> &names = Outputs(name);
return GetDims(names);
}
virtual void SetOutputDim(const std::string &name, const DDim &dim) = 0;
void SetOutputsDim(const std::string &name,
const std::vector<framework::DDim> &dims) {
auto &names = Outputs(name);
SetDims(names, dims);
}
virtual AttrReader Attrs() const = 0;
virtual const std::vector<std::string> &Inputs(
const std::string &name) const = 0;
virtual const std::vector<std::string> &Outputs(
const std::string &name) const = 0;
// TODO(qiao) implement this function
void ShareLoD(const std::string &in, const std::string &out, size_t i = 0,
size_t j = 0) const {}
protected:
virtual framework::DDim GetDim(const std::string &name) const = 0;
virtual void SetDim(const std::string &name, const framework::DDim &dim) = 0;
std::vector<framework::DDim> GetDims(
const std::vector<std::string> &names) const {
std::vector<framework::DDim> ret;
ret.reserve(names.size());
std::transform(
names.begin(), names.end(), std::back_inserter(ret),
[this](const std::string &name) { return this->GetDim(name); });
return ret;
}
void SetDims(const std::vector<std::string> &names,
const std::vector<framework::DDim> &dims) {
size_t length = names.size();
PADDLE_ENFORCE_EQ(length, dims.size());
for (size_t i = 0; i < length; ++i) {
SetDim(names[i], dims[i]);
}
}
};
} // namespace framework
} // namespace paddle
...@@ -7,7 +7,7 @@ Variable is also known as *blob* in MxNet and Caffe2. It is the input and outpu ...@@ -7,7 +7,7 @@ Variable is also known as *blob* in MxNet and Caffe2. It is the input and outpu
For the flexibility of a DL system, a variable should be able to contain any typed value -- a tensor in most cases, but could also be some integer IDs or a scope of other variables in the case of RNN. For the flexibility of a DL system, a variable should be able to contain any typed value -- a tensor in most cases, but could also be some integer IDs or a scope of other variables in the case of RNN.
To use the minimum amount of memory, we'd like that a variable to allocate memory when it has to, or, lazy memory allocation. Let's take the following example: To use the minimum amount of memory, we would like that a variable allocates memory only when it has to, or, lazy memory allocation. Let's take the following example:
```cpp ```cpp
Variable vr, v1, v2; Variable vr, v1, v2;
...@@ -38,7 +38,7 @@ This syntax for lazy memory allocation when we call `Randomize` and `Mult`, thos ...@@ -38,7 +38,7 @@ This syntax for lazy memory allocation when we call `Randomize` and `Mult`, thos
To make memory allocation lazy, we cannot assume that we know the type held by a variable at definition time. In other words, `class Variable` cannot be a template `template <T> class Variable`. To make memory allocation lazy, we cannot assume that we know the type held by a variable at definition time. In other words, `class Variable` cannot be a template `template <T> class Variable`.
Because we don't know the type `T`, we cannot save a `T*` as `Variable's` data member. Instead, we save an interface object `Placeholder`, who can return the pointer to the saved object via `Placeholder::Ptr()` as `void*`. Because we don't know the type `T`, we cannot save a `T*` as `Variable's` data member. Instead, we save an interface object `Placeholder`, which can return the pointer to the saved object via `Placeholder::Ptr()` as `void*`.
Still, `Variable` needs to know `T` so that it can `delete<T>(ptr)` and so that `Variable::Get` can check the expected type against the saved object's type. Still, `Variable` needs to know `T` so that it can `delete<T>(ptr)` and so that `Variable::Get` can check the expected type against the saved object's type.
...@@ -49,4 +49,4 @@ Because `PlaceholderImpl` knows `T`, it can save and return `typeid(T)` for the ...@@ -49,4 +49,4 @@ Because `PlaceholderImpl` knows `T`, it can save and return `typeid(T)` for the
## Conclusion ## Conclusion
The technique type hiding utilizes C++ class templates, interface and derivation, and C++ RTTI (typeid). This combination saves us from definition something like `caffe2::TypeMata`, which takes hundreds of lines of C++ code. The technique type hiding utilizes C++ class templates, interface and derivation, and C++ RTTI (typeid). This combination saves us from defining something like `caffe2::TypeMeta`, which takes hundreds of lines of C++ code.
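The type-hiding technique described in this document can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Paddle implementation: `Placeholder`, `PlaceholderImpl`, `Ptr()`, and `Variable::Get` are named in the text, while `GetMutable`, `IsType`, `IsInitialized`, `Type()`, and the use of `std::unique_ptr` and `std::type_index` are choices made here for the example.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>

// Hypothetical sketch of type hiding: Variable is not a template, yet it can
// hold a value of any type and allocates it only when first requested.
class Variable {
 public:
  // Returns the held object after checking that it really is a T.
  template <typename T>
  const T& Get() const {
    if (!IsType<T>()) {
      throw std::runtime_error("Variable does not hold the requested type");
    }
    return *static_cast<const T*>(holder_->Ptr());
  }

  // Lazy allocation: a T is created on first use, or when the held type differs.
  template <typename T>
  T* GetMutable() {
    if (!IsType<T>()) {
      holder_.reset(new PlaceholderImpl<T>(new T()));
    }
    assert(holder_ != nullptr);
    return static_cast<T*>(holder_->Ptr());
  }

  template <typename T>
  bool IsType() const {
    return holder_ != nullptr && holder_->Type() == std::type_index(typeid(T));
  }

  bool IsInitialized() const { return holder_ != nullptr; }

 private:
  // Interface that hides T from Variable's data members.
  struct Placeholder {
    virtual ~Placeholder() = default;
    virtual std::type_index Type() const = 0;
    virtual void* Ptr() const = 0;
  };

  // The only place that knows T: it owns the object, so deletion is typed,
  // and it reports typeid(T) for the check in Get.
  template <typename T>
  struct PlaceholderImpl : Placeholder {
    explicit PlaceholderImpl(T* ptr) : ptr_(ptr) {}
    std::type_index Type() const override { return typeid(T); }
    void* Ptr() const override { return ptr_.get(); }
    std::unique_ptr<T> ptr_;
  };

  std::unique_ptr<Placeholder> holder_;
};
```

A freshly constructed `Variable` holds nothing. The first `GetMutable<T>()` call allocates the object, and a later `Get<U>()` with a mismatched `U` fails the `typeid` comparison instead of reinterpreting the memory.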
...@@ -94,10 +94,14 @@ add_subdirectory(math) ...@@ -94,10 +94,14 @@ add_subdirectory(math)
set(DEPS_OPS set(DEPS_OPS
recurrent_op recurrent_op
cond_op) cond_op
cross_entropy_op
softmax_with_cross_entropy_op)
op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc
DEPS framework_proto tensor net_op) DEPS framework_proto tensor net_op)
op_library(cond_op SRCS cond_op.cc DEPS framework_proto tensor operator net_op) op_library(cond_op SRCS cond_op.cc DEPS framework_proto tensor operator net_op)
op_library(cross_entropy_op DEPS cross_entropy_function)
op_library(softmax_with_cross_entropy_op DEPS cross_entropy_function softmax_function)
list(REMOVE_ITEM GENERAL_OPS ${DEPS_OPS}) list(REMOVE_ITEM GENERAL_OPS ${DEPS_OPS})
foreach(src ${GENERAL_OPS}) foreach(src ${GENERAL_OPS})
......
...@@ -22,25 +22,23 @@ class AccuracyOp : public framework::OperatorWithKernel { ...@@ -22,25 +22,23 @@ class AccuracyOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(ctx->HasInput("Inference"),
ctx.InputVar("Inference"),
"Input(Inference) of AccuracyOp should not be null."); "Input(Inference) of AccuracyOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Label"), PADDLE_ENFORCE(ctx->HasInput("Label"),
"Input(Label) of AccuracyOp should not be null."); "Input(Label) of AccuracyOp should not be null.");
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(ctx->HasOutput("Accuracy"),
ctx.OutputVar("Accuracy"),
"Output(Accuracy) of AccuracyOp should not be null."); "Output(Accuracy) of AccuracyOp should not be null.");
auto *inference = ctx.Input<framework::Tensor>("Inference"); auto inference_dim = ctx->GetInputDim("Inference");
auto *label = ctx.Input<framework::Tensor>("Label"); auto label_dim = ctx->GetInputDim("Label");
PADDLE_ENFORCE_EQ(label->dims().size(), 1, "label must be a vector"); PADDLE_ENFORCE_EQ(label_dim.size(), 1, "label must be a vector");
PADDLE_ENFORCE_EQ(inference->dims()[0], label->dims()[0], PADDLE_ENFORCE_EQ(inference_dim[0], label_dim[0],
"inference size must be the same as label size"); "inference size must be the same as label size");
ctx.Output<framework::Tensor>("Accuracy")->Resize({1}); ctx->SetOutputDim("Accuracy", {1});
ctx.ShareLoD("Inference", /*->*/ "Accuracy"); ctx->ShareLoD("Inference", /*->*/ "Accuracy");
} }
}; };
......
...@@ -22,10 +22,9 @@ class ActivationOp : public framework::OperatorWithKernel { ...@@ -22,10 +22,9 @@ class ActivationOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
ctx.Output<framework::Tensor>("Y")->Resize( ctx->SetOutputDim("Y", ctx->GetInputDim("X"));
ctx.Input<framework::Tensor>("X")->dims()); ctx->ShareLoD("X", /*->*/ "Y");
ctx.ShareLoD("X", /*->*/ "Y");
} }
}; };
...@@ -34,9 +33,8 @@ class ActivationOpGrad : public framework::OperatorWithKernel { ...@@ -34,9 +33,8 @@ class ActivationOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
ctx.Output<framework::Tensor>(framework::GradVarName("X")) ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("Y"));
->Resize(ctx.Input<framework::Tensor>("Y")->dims());
} }
}; };
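The operator hunks in this commit share one migration pattern: `InferShape` no longer fetches `Tensor` objects from the context and resizes them, but queries and sets dimensions by variable name through a context pointer. The sketch below shows that pattern in isolation. It assumes a hypothetical `MockInferShapeContext` and uses `std::vector<int64_t>` as a stand-in for the framework's dimension type; neither is Paddle's actual class. Only the method names `HasInput`, `HasOutput`, `GetInputDim`, and `SetOutputDim` come from the diff; the presence checks are borrowed from the other operators, since `ActivationOp` itself has none.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the framework's dimension type (an assumption for this sketch).
using Dims = std::vector<int64_t>;

// Hypothetical context: it stores shapes by variable name, so shape inference
// can run without any tensor being allocated.
class MockInferShapeContext {
 public:
  void AddInput(const std::string& name, const Dims& dims) { inputs_[name] = dims; }
  void AddOutput(const std::string& name) { outputs_[name] = Dims{}; }

  bool HasInput(const std::string& name) const { return inputs_.count(name) > 0; }
  bool HasOutput(const std::string& name) const { return outputs_.count(name) > 0; }

  Dims GetInputDim(const std::string& name) const { return inputs_.at(name); }
  Dims GetOutputDim(const std::string& name) const { return outputs_.at(name); }
  void SetOutputDim(const std::string& name, const Dims& dims) {
    outputs_.at(name) = dims;
  }

 private:
  std::map<std::string, Dims> inputs_;
  std::map<std::string, Dims> outputs_;
};

// Mirrors the shape rule of the new ActivationOp::InferShape:
// output Y takes the shape of input X.
void ActivationInferShape(MockInferShapeContext* ctx) {
  if (!ctx->HasInput("X")) {
    throw std::invalid_argument("Input(X) should not be null.");
  }
  if (!ctx->HasOutput("Y")) {
    throw std::invalid_argument("Output(Y) should not be null.");
  }
  ctx->SetOutputDim("Y", ctx->GetInputDim("X"));
}
```

Because the function touches only names and dimensions, the same shape logic could in principle run both at graph-construction time and at run time, which appears to be the motivation for the pointer-to-base signature.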
......
...@@ -22,25 +22,23 @@ class AddOp : public framework::OperatorWithKernel { ...@@ -22,25 +22,23 @@ class AddOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) of AddOp should not be null.");
"Input(X) of AddOp should not be null."); PADDLE_ENFORCE(ctx->HasInput("Y"), "Input(Y) of AddOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Input(Y) of AddOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"),
"Output(Out) of AddOp should not be null."); "Output(Out) of AddOp should not be null.");
PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("X")->dims(), auto x_dims = ctx->GetInputDim("X");
ctx.Input<Tensor>("Y")->dims(), auto y_dims = ctx->GetInputDim("Y");
PADDLE_ENFORCE_EQ(x_dims, y_dims,
"The two inputs of AddOp must have the same dimensions."); "The two inputs of AddOp must have the same dimensions.");
ctx.Output<framework::Tensor>("Out")->Resize( ctx->SetOutputDim("Out", x_dims);
ctx.Input<Tensor>("X")->dims());
} }
}; };
class AddOpMaker : public framework::OpProtoAndCheckerMaker { class AddOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AddOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) AddOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The first input of add op"); AddInput("X", "The first input of add op");
AddInput("Y", "The second input of add op"); AddInput("Y", "The second input of add op");
...@@ -58,7 +56,7 @@ class AddOpGrad : public framework::OperatorWithKernel { ...@@ -58,7 +56,7 @@ class AddOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override {} void InferShape(framework::InferShapeContextBase* ctx) const override {}
}; };
} // namespace operators } // namespace operators
......
...@@ -22,28 +22,28 @@ class ClipOp : public framework::OperatorWithKernel { ...@@ -22,28 +22,28 @@ class ClipOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of ClipOp should not be null."); "Input(X) of ClipOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of ClipOp should not be null."); "Output(Out) of ClipOp should not be null.");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto max = Attr<float>("max"); auto max = Attr<float>("max");
auto min = Attr<float>("min"); auto min = Attr<float>("min");
PADDLE_ENFORCE_LT(min, max, "max should be greater than min."); PADDLE_ENFORCE_LT(min, max, "max should be greater than min.");
ctx.Output<Tensor>("Out")->Resize(x_dims); ctx->SetOutputDim("Out", x_dims);
ctx.ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
} }
}; };
template <typename AttrType> template <typename AttrType>
class ClipOpMaker : public framework::OpProtoAndCheckerMaker { class ClipOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ClipOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) ClipOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor)The input of clip op." "(Tensor)The input of clip op."
"The input should be a k-D tensor(k > 0 and k < 7)"); "The number of dimensions must be between [1, 9].");
AddOutput("Out", "(Tensor)The output of clip op with shape as input(X)"); AddOutput("Out", "(Tensor)The output of clip op with shape as input(X)");
AddAttr<AttrType>( AddAttr<AttrType>(
"min", "(float)Minimum value, under which element is replaced by min."); "min", "(float)Minimum value, under which element is replaced by min.");
...@@ -61,14 +61,13 @@ class ClipOpGrad : public framework::OperatorWithKernel { ...@@ -61,14 +61,13 @@ class ClipOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null"); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X")); if (ctx->HasOutput(framework::GradVarName("X"))) {
if (x_grad != nullptr) { ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
x_grad->Resize(x_dims);
} }
} }
}; };
......
...@@ -24,31 +24,30 @@ class ConcatOp : public framework::OperatorWithKernel { ...@@ -24,31 +24,30 @@ class ConcatOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of ConcatOp should not be null."); "Output(Out) of ConcatOp should not be null.");
auto ins = ctx.MultiInput<framework::Tensor>("X"); auto ins = ctx->GetInputsDim("X");
auto *out = ctx.Output<framework::Tensor>("Out"); size_t axis = static_cast<size_t>(ctx->Attrs().Get<int>("axis"));
size_t axis = static_cast<size_t>(ctx.Attr<int>("axis"));
size_t n = ins.size(); size_t n = ins.size();
PADDLE_ENFORCE_GT(n, 1, "Input tensor count should be > 1."); PADDLE_ENFORCE_GT(n, 1, "Input tensor count should be > 1.");
auto out_dims = ins[0]->dims(); auto out_dims = ins[0];
size_t in_zero_dims_size = out_dims.size(); size_t in_zero_dims_size = out_dims.size();
for (size_t i = 1; i < n; i++) { for (size_t i = 1; i < n; i++) {
for (size_t j = 0; j < in_zero_dims_size; j++) { for (size_t j = 0; j < in_zero_dims_size; j++) {
if (j == axis) { if (j == axis) {
out_dims[axis] += ins[i]->dims()[j]; out_dims[axis] += ins[i][j];
continue; continue;
} }
PADDLE_ENFORCE_EQ(out_dims[j], ins[i]->dims()[j], PADDLE_ENFORCE_EQ(out_dims[j], ins[i][j],
"Input tensors should have the same " "Input tensors should have the same "
"elements except the specified axis."); "elements except the specified axis.");
} }
} }
out->Resize(out_dims); ctx->SetOutputDim("Out", out_dims);
} }
}; };
......
...@@ -27,27 +27,25 @@ class Conv2DOp : public framework::OperatorWithKernel { ...@@ -27,27 +27,25 @@ class Conv2DOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Input"), PADDLE_ENFORCE(ctx->HasInput("Input"),
"Input(Input) of Conv2DOp should not be null."); "Input(Input) of Conv2DOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Filter"), PADDLE_ENFORCE(ctx->HasInput("Filter"),
"Input(Filter) of Conv2DOp should not be null."); "Input(Filter) of Conv2DOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Output"), PADDLE_ENFORCE(ctx->HasOutput("Output"),
"Output(Output) of Conv2DOp should not be null."); "Output(Output) of Conv2DOp should not be null.");
auto in = ctx.Input<Tensor>("Input"); auto in_dims = ctx->GetInputDim("Input");
auto filter = ctx.Input<Tensor>("Filter"); auto filter_dims = ctx->GetInputDim("Filter");
auto out = ctx.Output<framework::Tensor>("Output"); std::vector<int> strides = ctx->Attrs().Get<std::vector<int>>("strides");
std::vector<int> strides = Attr<std::vector<int>>("strides"); std::vector<int> paddings = ctx->Attrs().Get<std::vector<int>>("paddings");
std::vector<int> paddings = Attr<std::vector<int>>("paddings"); int groups = ctx->Attrs().Get<int>("groups");
int groups = Attr<int>("groups"); int input_channels = in_dims[1];
int input_channels = in->dims()[1]; int output_channels = filter_dims[0];
int output_channels = filter->dims()[0];
PADDLE_ENFORCE_EQ(in_dims.size(), 4, "Conv2DOp input should be 4-D.");
PADDLE_ENFORCE_EQ(in->dims().size(), 4, "Conv2DOp input should be 4-D."); PADDLE_ENFORCE_EQ(filter_dims.size(), 4, "Conv2DOp filter should be 4-D.");
PADDLE_ENFORCE_EQ(filter->dims().size(), 4, PADDLE_ENFORCE_EQ(input_channels, filter_dims[1] * groups,
"Conv2DOp filter should be 4-D.");
PADDLE_ENFORCE_EQ(input_channels, filter->dims()[1] * groups,
"The number of input channels should be equal to filter " "The number of input channels should be equal to filter "
"channels * groups."); "channels * groups.");
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
...@@ -55,17 +53,17 @@ class Conv2DOp : public framework::OperatorWithKernel { ...@@ -55,17 +53,17 @@ class Conv2DOp : public framework::OperatorWithKernel {
"The number of output channels should be divisible by groups."); "The number of output channels should be divisible by groups.");
auto output_height = auto output_height =
outputSize(in->dims()[2], filter->dims()[2], paddings[0], strides[0]); outputSize(in_dims[2], filter_dims[2], paddings[0], strides[0]);
auto output_width = auto output_width =
outputSize(in->dims()[3], filter->dims()[3], paddings[1], strides[1]); outputSize(in_dims[3], filter_dims[3], paddings[1], strides[1]);
out->Resize( ctx->SetOutputDim(
{in->dims()[0], filter->dims()[0], output_height, output_width}); "Output", {in_dims[0], filter_dims[0], output_height, output_width});
} }
}; };
class Conv2DOpMaker : public framework::OpProtoAndCheckerMaker { class Conv2DOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Conv2DOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) Conv2DOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"Input", "Input",
...@@ -108,14 +106,15 @@ class Conv2DOpGrad : public framework::OperatorWithKernel { ...@@ -108,14 +106,15 @@ class Conv2DOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
auto in = ctx.Input<Tensor>("Input"); auto in_dims = ctx->GetInputDim("Input");
auto filter = ctx.Input<Tensor>("Filter"); auto filter_dims = ctx->GetInputDim("Filter");
auto d_in = ctx.Output<framework::Tensor>(framework::GradVarName("Input")); if (ctx->HasOutput(framework::GradVarName("Input"))) {
auto d_filter = ctx->SetOutputDim(framework::GradVarName("Input"), in_dims);
ctx.Output<framework::Tensor>(framework::GradVarName("Filter")); }
if (d_in) d_in->Resize(in->dims()); if (ctx->HasOutput(framework::GradVarName("Filter"))) {
if (d_filter) d_filter->Resize(filter->dims()); ctx->SetOutputDim(framework::GradVarName("Filter"), filter_dims);
}
} }
}; };
......
...@@ -24,22 +24,22 @@ class CosSimOp : public framework::OperatorWithKernel { ...@@ -24,22 +24,22 @@ class CosSimOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
// notnull check // notnull check
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of CosSimOp should not be null."); "Input(X) of CosSimOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), PADDLE_ENFORCE(ctx->HasInput("Y"),
"Input(Y) of CosSimOp should not be null."); "Input(Y) of CosSimOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of CosSimOp should not be null."); "Output(Out) of CosSimOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("XNorm"), PADDLE_ENFORCE(ctx->HasOutput("XNorm"),
"Output(XNorm) of CosSimOp should not be null."); "Output(XNorm) of CosSimOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("YNorm"), PADDLE_ENFORCE(ctx->HasOutput("YNorm"),
"Output(YNorm) of CosSimOp should not be null."); "Output(YNorm) of CosSimOp should not be null.");
// shape check // shape check
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto y_dims = ctx.Input<Tensor>("Y")->dims(); auto y_dims = ctx->GetInputDim("Y");
PADDLE_ENFORCE_EQ(x_dims.size(), y_dims.size(), PADDLE_ENFORCE_EQ(x_dims.size(), y_dims.size(),
"Ranks of Input(X) and Input(Y) must be equal."); "Ranks of Input(X) and Input(Y) must be equal.");
...@@ -54,16 +54,16 @@ class CosSimOp : public framework::OperatorWithKernel { ...@@ -54,16 +54,16 @@ class CosSimOp : public framework::OperatorWithKernel {
" just 1 (which will be broadcasted to match Input(X))."); " just 1 (which will be broadcasted to match Input(X)).");
// resize tensor // resize tensor
ctx.Output<framework::Tensor>("Out")->Resize({x_dims[0], 1}); ctx->SetOutputDim("Out", {x_dims[0], 1});
ctx.Output<framework::Tensor>("XNorm")->Resize({x_dims[0], 1}); ctx->SetOutputDim("XNorm", {x_dims[0], 1});
ctx.Output<framework::Tensor>("YNorm")->Resize({y_dims[0], 1}); ctx->SetOutputDim("YNorm", {y_dims[0], 1});
ctx.ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
} }
}; };
class CosSimOpMaker : public framework::OpProtoAndCheckerMaker { class CosSimOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CosSimOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) CosSimOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The 1st input of cos_sim op."); AddInput("X", "The 1st input of cos_sim op.");
AddInput("Y", "The 2nd input of cos_sim op."); AddInput("Y", "The 2nd input of cos_sim op.");
...@@ -98,27 +98,23 @@ class CosSimOpGrad : public framework::OperatorWithKernel { ...@@ -98,27 +98,23 @@ class CosSimOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
// notnull check // notnull check
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) must not be null."); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) must not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) must not be null."); PADDLE_ENFORCE(ctx->HasInput("Y"), "Input(Y) must not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("XNorm"), PADDLE_ENFORCE(ctx->HasInput("XNorm"), "Input(XNorm) must not be null.");
"Input(XNorm) must not be null."); PADDLE_ENFORCE(ctx->HasInput("YNorm"), "Input(YNorm) must not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("YNorm"), PADDLE_ENFORCE(ctx->HasInput("Out"), "Input(Out) must not be null.");
"Input(YNorm) must not be null."); PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Out"),
"Input(Out) must not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")),
"Input(Out@GRAD) must not be null."); "Input(Out@GRAD) must not be null.");
// shape check // shape check
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto y_dims = ctx.Input<Tensor>("Y")->dims(); auto y_dims = ctx->GetInputDim("Y");
auto xnorm_dims = ctx.Input<Tensor>("XNorm")->dims(); auto xnorm_dims = ctx->GetInputDim("XNorm");
auto ynorm_dims = ctx.Input<Tensor>("YNorm")->dims(); auto ynorm_dims = ctx->GetInputDim("YNorm");
auto out_dims = ctx.Input<Tensor>("Out")->dims(); auto out_dims = ctx->GetInputDim("Out");
auto out_grad_dims = auto out_grad_dims = ctx->GetInputDim(framework::GradVarName("Out"));
ctx.Input<Tensor>(framework::GradVarName("Out"))->dims();
PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(), PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(),
"Ranks of Input(X) and Input(Y) must be equal."); "Ranks of Input(X) and Input(Y) must be equal.");
...@@ -143,10 +139,14 @@ class CosSimOpGrad : public framework::OperatorWithKernel { ...@@ -143,10 +139,14 @@ class CosSimOpGrad : public framework::OperatorWithKernel {
"Shape of Input(Out@Grad) must be [X.Dim(0), 1]."); "Shape of Input(Out@Grad) must be [X.Dim(0), 1].");
// resize tensor // resize tensor
auto *x_grad = ctx.Output<framework::Tensor>(framework::GradVarName("X")); auto x_grad_name = framework::GradVarName("X");
auto *y_grad = ctx.Output<framework::Tensor>(framework::GradVarName("Y")); auto y_grad_name = framework::GradVarName("Y");
if (x_grad) x_grad->Resize(x_dims); if (ctx->HasOutput(x_grad_name)) {
if (y_grad) y_grad->Resize(y_dims); ctx->SetOutputDim(x_grad_name, x_dims);
}
if (ctx->HasOutput(y_grad_name)) {
ctx->SetOutputDim(y_grad_name, y_dims);
}
} }
}; };
......
...@@ -25,16 +25,14 @@ class CropOp : public framework::OperatorWithKernel { ...@@ -25,16 +25,14 @@ class CropOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of CropOp should not be null."); "Input(X) of CropOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of CropOp should not be null."); "Output(Out) of CropOp should not be null.");
auto x_dim = ctx.Input<Tensor>("X")->dims(); auto x_dim = ctx->GetInputDim("X");
auto *y = ctx.Input<Tensor>("Y"); if (!ctx->HasInput("Y")) {
auto *out = ctx.Output<Tensor>("Out"); auto shape = ctx->Attrs().Get<std::vector<int>>("shape");
if (y == nullptr) {
auto shape = Attr<std::vector<int>>("shape");
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
int64_t(shape.size()), x_dim.size(), int64_t(shape.size()), x_dim.size(),
"Shape size should be equal to dimension size of input tensor."); "Shape size should be equal to dimension size of input tensor.");
...@@ -42,19 +40,20 @@ class CropOp : public framework::OperatorWithKernel { ...@@ -42,19 +40,20 @@ class CropOp : public framework::OperatorWithKernel {
for (size_t i = 0; i < shape.size(); ++i) { for (size_t i = 0; i < shape.size(); ++i) {
tensor_shape[i] = static_cast<int64_t>(shape[i]); tensor_shape[i] = static_cast<int64_t>(shape[i]);
} }
out->Resize(framework::make_ddim(tensor_shape)); ctx->SetOutputDim("Out", framework::make_ddim(tensor_shape));
} else { } else {
PADDLE_ENFORCE_EQ(framework::arity(x_dim), framework::arity(y->dims()), auto y_dim = ctx->GetInputDim("Y");
PADDLE_ENFORCE_EQ(framework::arity(x_dim), framework::arity(y_dim),
"Tensor rank of both CropOp's " "Tensor rank of both CropOp's "
"inputs must be same."); "inputs must be same.");
out->Resize(y->dims()); ctx->SetOutputDim("Out", y_dim);
} }
} }
}; };
class CropOpMaker : public framework::OpProtoAndCheckerMaker { class CropOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CropOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) CropOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"The input of pad op. " "The input of pad op. "
...@@ -116,14 +115,14 @@ class CropOpGrad : public framework::OperatorWithKernel { ...@@ -116,14 +115,14 @@ class CropOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null"); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X")); auto x_grad_name = framework::GradVarName("X");
if (x_grad != nullptr) { if (ctx->HasOutput(x_grad_name)) {
x_grad->Resize(x_dims); ctx->SetOutputDim(x_grad_name, x_dims);
} }
} }
}; };
......
...@@ -22,33 +22,30 @@ class CrossEntropyOp : public framework::OperatorWithKernel { ...@@ -22,33 +22,30 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should be not null."); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should be not null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Label"), PADDLE_ENFORCE(ctx->HasInput("Label"), "Input(Label) should be not null.");
"Input(Label) should be not null."); PADDLE_ENFORCE(ctx->HasOutput("Y"), "Output(Y) should be not null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Y"),
"Output(Y) should be not null."); auto x_dims = ctx->GetInputDim("X");
auto label_dims = ctx->GetInputDim("Label");
auto x = ctx.Input<Tensor>("X"); PADDLE_ENFORCE_EQ(x_dims.size(), 2, "Input(X)'s rank should be 2.");
auto label = ctx.Input<Tensor>("Label"); PADDLE_ENFORCE_EQ(label_dims.size(), 2, "Input(Label)'s rank should be 2.");
PADDLE_ENFORCE_EQ(x->dims().size(), 2, "Input(X)'s rank should be 2."); PADDLE_ENFORCE_EQ(x_dims[0], label_dims[0],
PADDLE_ENFORCE_EQ(label->dims().size(), 2,
"Input(Label)'s rank should be 2.");
PADDLE_ENFORCE_EQ(x->dims()[0], label->dims()[0],
"The 1st dimension of Input(X) and Input(Label) should " "The 1st dimension of Input(X) and Input(Label) should "
"be equal."); "be equal.");
if (ctx.Attr<bool>("softLabel")) { if (ctx->Attrs().Get<bool>("softLabel")) {
PADDLE_ENFORCE_EQ(x->dims()[1], label->dims()[1], PADDLE_ENFORCE_EQ(x_dims[1], label_dims[1],
"If Attr(softLabel) == true, the 2nd dimension of " "If Attr(softLabel) == true, the 2nd dimension of "
"Input(X) and Input(Label) should be equal."); "Input(X) and Input(Label) should be equal.");
} else { } else {
PADDLE_ENFORCE_EQ(label->dims()[1], 1, PADDLE_ENFORCE_EQ(label_dims[1], 1,
"If Attr(softLabel) == false, the 2nd dimension of " "If Attr(softLabel) == false, the 2nd dimension of "
"Input(Label) should be 1."); "Input(Label) should be 1.");
} }
ctx.Output<Tensor>("Y")->Resize({x->dims()[0], 1}); ctx->SetOutputDim("Y", {x_dims[0], 1});
ctx.ShareLoD("X", /*->*/ "Y"); ctx->ShareLoD("X", /*->*/ "Y");
} }
}; };
...@@ -57,50 +54,45 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel { ...@@ -57,50 +54,45 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should be not null."); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should be not null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Label"), PADDLE_ENFORCE(ctx->HasInput("Label"), "Input(Label) should be not null.");
"Input(Label) should be not null."); PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Y")),
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Y")),
"Input(Y@GRAD) should be not null."); "Input(Y@GRAD) should be not null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar(framework::GradVarName("X")), PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("X")),
"Output(X@GRAD) should be not null."); "Output(X@GRAD) should be not null.");
auto x = ctx.Input<Tensor>("X"); auto x_dims = ctx->GetInputDim("X");
auto label = ctx.Input<Tensor>("Label"); auto label_dims = ctx->GetInputDim("Label");
auto dy = ctx.Input<Tensor>(framework::GradVarName("Y")); auto dy_dims = ctx->GetInputDim(framework::GradVarName("Y"));
PADDLE_ENFORCE_EQ(x->dims().size(), 2, "Input(X)'s rank should be 2."); PADDLE_ENFORCE_EQ(x_dims.size(), 2, "Input(X)'s rank should be 2.");
PADDLE_ENFORCE_EQ(dy->dims().size(), 2, PADDLE_ENFORCE_EQ(dy_dims.size(), 2, "Input(Y@Grad)'s rank should be 2.");
"Input(Y@Grad)'s rank should be 2."); PADDLE_ENFORCE_EQ(label_dims.size(), 2, "Input(Label)'s rank should be 2.");
PADDLE_ENFORCE_EQ(label->dims().size(), 2, PADDLE_ENFORCE_EQ(x_dims[0], label_dims[0],
"Input(Label)'s rank should be 2.");
PADDLE_ENFORCE_EQ(x->dims()[0], label->dims()[0],
"The 1st dimension of Input(X) and Input(Label) should " "The 1st dimension of Input(X) and Input(Label) should "
"be equal."); "be equal.");
PADDLE_ENFORCE_EQ(x->dims()[0], dy->dims()[0], PADDLE_ENFORCE_EQ(x_dims[0], dy_dims[0],
"The 1st dimension of Input(X) and Input(Y@Grad) should " "The 1st dimension of Input(X) and Input(Y@Grad) should "
"be equal."); "be equal.");
PADDLE_ENFORCE_EQ(dy->dims()[1], 1, PADDLE_ENFORCE_EQ(dy_dims[1], 1,
"The 2nd dimension of Input(Y@Grad) should be 1."); "The 2nd dimension of Input(Y@Grad) should be 1.");
if (ctx.Attr<bool>("softLabel")) { if (ctx->Attrs().Get<bool>("softLabel")) {
PADDLE_ENFORCE_EQ(x->dims()[1], label->dims()[1], PADDLE_ENFORCE_EQ(x_dims[1], label_dims[1],
"When Attr(softLabel) == true, the 2nd dimension of " "When Attr(softLabel) == true, the 2nd dimension of "
"Input(X) and Input(Label) should be equal."); "Input(X) and Input(Label) should be equal.");
} else { } else {
PADDLE_ENFORCE_EQ(label->dims()[1], 1, PADDLE_ENFORCE_EQ(label_dims[1], 1,
"When Attr(softLabel) == false, the 2nd dimension of " "When Attr(softLabel) == false, the 2nd dimension of "
"Input(Label) should be 1."); "Input(Label) should be 1.");
} }
ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
auto dx = ctx.Output<Tensor>(framework::GradVarName("X"));
dx->Resize(x->dims());
} }
}; };
class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker { class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CrossEntropyOpMaker(framework::OpProto *proto, CrossEntropyOpMaker(framework::OpProto* proto,
framework::OpAttrChecker *op_checker) framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor, default Tensor<float>), a 2-D tensor with shape N x D, " "(Tensor, default Tensor<float>), a 2-D tensor with shape N x D, "
......
...@@ -12,62 +12,12 @@ ...@@ -12,62 +12,12 @@
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/framework/op_registry.h"
#include "paddle/operators/cross_entropy_op.h" #include "paddle/operators/cross_entropy_op.h"
#include "paddle/platform/assert.h"
#include "paddle/platform/hostdevice.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
template <typename T> namespace {
__global__ void CrossEntropyKernel(T* Y, const T* X, const int* label,
const int N, const int D) {
// TODO(qingqing) define CUDA_1D_KERNEL_LOOP macro in a common file.
// CUDA_1D_KERNEL_LOOP(i, N) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
i += blockDim.x * gridDim.x) {
PADDLE_ASSERT(label[i] >= 0 && label[i] < D);
Y[i] = -TolerableValue<T>()(log(X[i * D + label[i]]));
}
}
template <typename T>
__device__ __forceinline__ T sum_single_warp(T val) {
val += __shfl_down(val, 16);
val += __shfl_down(val, 8);
val += __shfl_down(val, 4);
val += __shfl_down(val, 2);
val += __shfl_down(val, 1);
return val;
}
template <typename T>
__global__ void SoftCrossEntropyKernel(T* Y, const T* X, const T* label,
const int class_num) {
int tid = threadIdx.x;
extern __shared__ T d_sum[];
d_sum[tid] = 0;
int cur_idx = tid;
int next_idx = blockIdx.x * class_num + tid;
while (cur_idx < class_num) {
d_sum[tid] += TolerableValue<T>()(std::log(X[next_idx])) * label[next_idx];
next_idx += blockDim.x;
cur_idx += blockDim.x;
}
__syncthreads();
for (unsigned int stride = blockDim.x >> 1; stride >= 32; stride >>= 1) {
if (tid < stride) d_sum[tid] += d_sum[tid + stride];
__syncthreads();
}
T val = d_sum[tid];
val = sum_single_warp<T>(val);
if (tid == 0) Y[blockIdx.x] = -val;
}
// TODO(qingqing): make zero setting a common function. // TODO(qingqing): make zero setting a common function.
template <typename T> template <typename T>
__global__ void Zero(T* X, const int N) { __global__ void Zero(T* X, const int N) {
...@@ -100,6 +50,7 @@ __global__ void SoftCrossEntropyGradientKernel(T* dX, const T* dY, const T* X, ...@@ -100,6 +50,7 @@ __global__ void SoftCrossEntropyGradientKernel(T* dX, const T* dY, const T* X,
dX[ids] = -label[ids] * dY[row_ids] / X[ids]; dX[ids] = -label[ids] * dY[row_ids] / X[ids];
} }
} }
} // namespace
template <typename T> template <typename T>
class CrossEntropyOpCUDAKernel : public framework::OpKernel { class CrossEntropyOpCUDAKernel : public framework::OpKernel {
...@@ -107,36 +58,13 @@ class CrossEntropyOpCUDAKernel : public framework::OpKernel { ...@@ -107,36 +58,13 @@ class CrossEntropyOpCUDAKernel : public framework::OpKernel {
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"This kernel only runs on GPU device."); "This kernel only runs on GPU device.");
const Tensor* x = ctx.Input<Tensor>("X"); const Tensor* x = ctx.Input<Tensor>("X");
const Tensor* label = ctx.Input<Tensor>("Label"); const Tensor* label = ctx.Input<Tensor>("Label");
Tensor* y = ctx.Output<Tensor>("Y"); Tensor* y = ctx.Output<Tensor>("Y");
y->mutable_data<T>(ctx.GetPlace());
const T* x_data = x->data<T>(); math::CrossEntropyFunctor<platform::GPUPlace, T>()(
T* y_data = y->mutable_data<T>(ctx.GetPlace()); ctx, y, x, label, ctx.Attr<bool>("softLabel"));
int batch_size = x->dims()[0];
int class_num = x->dims()[1];
if (ctx.Attr<bool>("softLabel")) {
auto* label_data = ctx.Input<Tensor>("Label")->data<T>();
int block = class_num > 512 ? 512 : pow(2, int(std::log2(class_num)));
SoftCrossEntropyKernel<
T><<<batch_size, block, block * sizeof(T),
reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context())
.stream()>>>(y_data, x_data, label_data, class_num);
} else {
auto* label_data = ctx.Input<Tensor>("Label")->data<int>();
int block = 512;
int grid = (batch_size + block - 1) / block;
CrossEntropyKernel<T><<<
grid, block, 0, reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context())
.stream()>>>(y_data, x_data, label_data,
batch_size, class_num);
}
} }
}; };
...@@ -150,6 +78,7 @@ class CrossEntropyGradientOpCUDAKernel : public framework::OpKernel { ...@@ -150,6 +78,7 @@ class CrossEntropyGradientOpCUDAKernel : public framework::OpKernel {
const Tensor* x = ctx.Input<Tensor>("X"); const Tensor* x = ctx.Input<Tensor>("X");
const Tensor* label = ctx.Input<Tensor>("Label"); const Tensor* label = ctx.Input<Tensor>("Label");
Tensor* dx = ctx.Output<Tensor>(framework::GradVarName("X")); Tensor* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
dx->mutable_data<T>(ctx.GetPlace());
const T* dy_data = const T* dy_data =
ctx.Input<Tensor>(framework::GradVarName("Y"))->data<T>(); ctx.Input<Tensor>(framework::GradVarName("Y"))->data<T>();
......
...@@ -15,7 +15,7 @@ limitations under the License. */ ...@@ -15,7 +15,7 @@ limitations under the License. */
#pragma once #pragma once
#include "paddle/framework/eigen.h" #include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/platform/hostdevice.h" #include "paddle/operators/math/cross_entropy.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
...@@ -25,18 +25,6 @@ template <typename T, int MajorType = Eigen::RowMajor, ...@@ -25,18 +25,6 @@ template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex> typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>; using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename T>
struct TolerableValue {
HOSTDEVICE T operator()(const T& x) const {
PADDLE_ASSERT(std::is_floating_point<T>::value);
const T kApproInf = 1e20;
if (x == INFINITY) return kApproInf;
if (x == -INFINITY) return -kApproInf;
return x;
}
};
template <typename T> template <typename T>
class CrossEntropyOpKernel : public framework::OpKernel { class CrossEntropyOpKernel : public framework::OpKernel {
public: public:
...@@ -46,28 +34,10 @@ class CrossEntropyOpKernel : public framework::OpKernel { ...@@ -46,28 +34,10 @@ class CrossEntropyOpKernel : public framework::OpKernel {
const Tensor* x = ctx.Input<Tensor>("X"); const Tensor* x = ctx.Input<Tensor>("X");
const Tensor* labels = ctx.Input<Tensor>("Label"); const Tensor* labels = ctx.Input<Tensor>("Label");
Tensor* y = ctx.Output<Tensor>("Y"); Tensor* y = ctx.Output<Tensor>("Y");
T* y_data = y->mutable_data<T>(ctx.GetPlace()); y->mutable_data<T>(ctx.GetPlace());
const int batch_size = x->dims()[0];
if (ctx.Attr<bool>("softLabel")) {
auto prob = EigenMatrix<T>::From(*x);
auto lbl_mat = EigenMatrix<T>::From(*labels);
auto loss = EigenMatrix<T>::From(*y);
loss.device(ctx.GetEigenDevice<platform::CPUPlace>()) = math::CrossEntropyFunctor<platform::CPUPlace, T>()(
-((lbl_mat * prob.log().unaryExpr(TolerableValue<T>())) ctx, y, x, labels, ctx.Attr<bool>("softLabel"));
.sum(Eigen::DSizes<int, 1>(1))
.reshape(Eigen::DSizes<int, 2>(batch_size, 1)));
} else {
const int class_num = x->dims()[1];
const T* x_data = x->data<T>();
const int* label_data = labels->data<int>();
for (int i = 0; i < batch_size; ++i) {
int index = i * class_num + label_data[i];
y_data[i] = -TolerableValue<T>()(std::log(x_data[index]));
}
}
} }
}; };
......
...@@ -24,25 +24,25 @@ class DropoutOp : public framework::OperatorWithKernel { ...@@ -24,25 +24,25 @@ class DropoutOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) must not be null."); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) must not be null.");
PADDLE_ENFORCE_GE(ctx.Attr<float>("dropout_prob"), 0); PADDLE_ENFORCE_GE(ctx->Attrs().Get<float>("dropout_prob"), 0);
PADDLE_ENFORCE_LE(ctx.Attr<float>("dropout_prob"), 1); PADDLE_ENFORCE_LE(ctx->Attrs().Get<float>("dropout_prob"), 1);
auto dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
ctx.Output<Tensor>("Out")->Resize(dims); ctx->SetOutputDim("Out", x_dims);
if (ctx.Attr<bool>("is_training")) { if (ctx->Attrs().Get<bool>("is_training") == 1) {
ctx.Output<Tensor>("Mask")->Resize(dims); ctx->SetOutputDim("Mask", x_dims);
} }
ctx.ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
} }
}; };
template <typename AttrType> template <typename AttrType>
class DropoutOpMaker : public framework::OpProtoAndCheckerMaker { class DropoutOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
DropoutOpMaker(framework::OpProto *proto, DropoutOpMaker(framework::OpProto* proto,
framework::OpAttrChecker *op_checker) framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddAttr<AttrType>("dropout_prob", "Probability of setting units to zero.") AddAttr<AttrType>("dropout_prob", "Probability of setting units to zero.")
.SetDefault(.5f); .SetDefault(.5f);
...@@ -70,27 +70,26 @@ class DropoutOpGrad : public framework::OperatorWithKernel { ...@@ -70,27 +70,26 @@ class DropoutOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE(ctx.Attr<bool>("is_training"), PADDLE_ENFORCE_EQ(ctx->Attrs().Get<bool>("is_training"), 1,
"GradOp is only callable when is_training is true"); "GradOp is only callable when is_training is true");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) must not be null."); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) must not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Mask"), "Mask must not be null."); PADDLE_ENFORCE(ctx->HasInput("Mask"), "Mask must not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) must not be null."); "Input(Out@GRAD) must not be null.");
PADDLE_ENFORCE_GE(ctx.Attr<AttrType>("dropout_prob"), 0); PADDLE_ENFORCE_GE(ctx->Attrs().Get<AttrType>("dropout_prob"), 0);
PADDLE_ENFORCE_LE(ctx.Attr<AttrType>("dropout_prob"), 1); PADDLE_ENFORCE_LE(ctx->Attrs().Get<AttrType>("dropout_prob"), 1);
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims(); auto out_dims = ctx->GetInputDim(framework::GradVarName("Out"));
PADDLE_ENFORCE_EQ(x_dims, out_dims, PADDLE_ENFORCE_EQ(x_dims, out_dims,
"Dimensions of Input(X) and Out@Grad must be the same."); "Dimensions of Input(X) and Out@Grad must be the same.");
auto mask_dims = ctx.Input<Tensor>("Mask")->dims(); auto mask_dims = ctx->GetInputDim("Mask");
PADDLE_ENFORCE_EQ(x_dims, mask_dims, PADDLE_ENFORCE_EQ(x_dims, mask_dims,
"Dimensions of Input(X) and Mask must be the same."); "Dimensions of Input(X) and Mask must be the same.");
auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X")); ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
x_grad->Resize(x_dims);
} }
}; };
......
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
limitations under the License. */ limitations under the License. */
#include "paddle/operators/elementwise_add_op.h" #include "paddle/operators/elementwise_add_op.h"
#include "paddle/operators/elementwise_op.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
......
...@@ -14,7 +14,7 @@ ...@@ -14,7 +14,7 @@
#pragma once #pragma once
#include "paddle/operators/elementwise_op.h" #include "paddle/operators/elementwise_op_function.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
......
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
limitations under the License. */ limitations under the License. */
#include "paddle/operators/elementwise_div_op.h" #include "paddle/operators/elementwise_div_op.h"
#include "paddle/operators/elementwise_op.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
......
...@@ -14,7 +14,7 @@ ...@@ -14,7 +14,7 @@
#pragma once #pragma once
#include "paddle/operators/elementwise_op.h" #include "paddle/operators/elementwise_op_function.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
......
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
limitations under the License. */ limitations under the License. */
#include "paddle/operators/elementwise_mul_op.h" #include "paddle/operators/elementwise_mul_op.h"
#include "paddle/operators/elementwise_op.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
......
...@@ -13,7 +13,7 @@ ...@@ -13,7 +13,7 @@
limitations under the License. */ limitations under the License. */
#pragma once #pragma once
#include "paddle/operators/elementwise_op.h" #include "paddle/operators/elementwise_op_function.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License"); Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. you may not use this file except in compliance with the License.
You may obtain a copy of the License at You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#pragma once #pragma once
#include <iostream>
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/math/math_function.h" #include "paddle/framework/operator.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
/*
* Out = X ⊙ Y
* If Y's shape does not match X's shape, they will be reshaped.
* For example:
* 1. shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
* pre=2, n=3*4, post=5
* x.shape(2, 12, 5) * y.shape(1,12,1).broadcast(2,12,5)
* 2. shape(X) = (2, 3, 4, 5), shape(Y) = (4,5)
* pre=2*3, n=4*5, post=1
* x.shape(2, 3, 20) * y.shape(1,1,20).broadcast(2,3,20)
*/
inline void get_mid_dims(const framework::DDim& x_dims,
const framework::DDim& y_dims, const int axis,
int& pre, int& n, int& post) {
pre = 1;
n = 1;
post = 1;
for (int i = 0; i < axis; ++i) {
pre *= x_dims[i];
}
for (int i = 0; i < y_dims.size(); ++i) {
PADDLE_ENFORCE_EQ(x_dims[i + axis], y_dims[i],
"Broadcast dimension mismatch.");
n *= y_dims[i];
}
for (int i = axis + y_dims.size(); i < x_dims.size(); ++i) {
post *= x_dims[i];
}
}
#define EIGEN_FUNCTOR(name, eigen_op) \
struct Eigen##name##Functor { \
template <typename Place, typename T> \
inline void Run(const framework::Tensor* x, const framework::Tensor* y, \
framework::Tensor* z, \
const framework::ExecutionContext& ctx) { \
auto x_e = framework::EigenVector<T>::Flatten(*x); \
auto y_e = framework::EigenVector<T>::Flatten(*y); \
auto z_e = framework::EigenVector<T>::Flatten(*z); \
z_e.device(ctx.GetEigenDevice<Place>()) = eigen_op(x_e, y_e); \
} \
template <typename Place, typename T> \
inline void RunBroadCast(const framework::Tensor* x, \
const framework::Tensor* y, framework::Tensor* z, \
const framework::ExecutionContext& ctx, int pre, \
int n) { \
auto x_e = framework::EigenVector<T>::Flatten(*x); \
auto y_e = framework::EigenVector<T>::Flatten(*y); \
auto z_e = framework::EigenVector<T>::Flatten(*z); \
auto y_bcast = y_e.reshape(Eigen::DSizes<int, 2>(1, n)) \
.broadcast(Eigen::DSizes<int, 2>(pre, 1)) \
.reshape(Eigen::DSizes<int, 1>(x_e.size())); \
z_e.device(ctx.GetEigenDevice<Place>()) = eigen_op(x_e, y_bcast); \
} \
template <typename Place, typename T> \
inline void RunBroadCast2(const framework::Tensor* x, \
const framework::Tensor* y, \
framework::Tensor* z, \
const framework::ExecutionContext& ctx, int pre, \
int n, int post) { \
auto x_e = framework::EigenVector<T>::Flatten(*x); \
auto y_e = framework::EigenVector<T>::Flatten(*y); \
auto z_e = framework::EigenVector<T>::Flatten(*z); \
auto y_bcast = y_e.reshape(Eigen::DSizes<int, 3>(1, n, 1)) \
.broadcast(Eigen::DSizes<int, 3>(pre, 1, post)) \
.reshape(Eigen::DSizes<int, 1>(x_e.size())); \
z_e.device(ctx.GetEigenDevice<Place>()) = eigen_op(x_e, y_bcast); \
} \
}
template <class functor, typename Place, typename T>
void ElementwiseCompute(const framework::ExecutionContext& ctx) {
using Tensor = framework::Tensor;
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* z = ctx.Output<Tensor>("Out");
z->mutable_data<T>(ctx.GetPlace());
auto x_dims = x->dims();
auto y_dims = y->dims();
PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(),
"Rank of first input must >= rank of second input.")
if (x_dims == y_dims || product(y_dims) == 1) {
functor f;
f.template Run<Place, T>(x, y, z, ctx);
return;
}
int axis = ctx.Attr<int>("axis");
axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis);
PADDLE_ENFORCE(axis >= 0 && axis < x_dims.size(),
"Axis should be in range [0, x_dims)");
int pre, n, post;
get_mid_dims(x_dims, y_dims, axis, pre, n, post);
if (post == 1) {
functor f;
f.template RunBroadCast<Place, T>(x, y, z, ctx, pre, n);
return;
} else {
functor f;
f.template RunBroadCast2<Place, T>(x, y, z, ctx, pre, n, post);
return;
}
}
#define EIGEN_ADD(x, y) ((x) + (y))
EIGEN_FUNCTOR(Add, EIGEN_ADD);
#define EIGEN_SUB(x, y) ((x) - (y))
EIGEN_FUNCTOR(Sub, EIGEN_SUB);
#define EIGEN_MUL(x, y) ((x) * (y))
EIGEN_FUNCTOR(Mul, EIGEN_MUL);
#define EIGEN_DIV(x, y) ((x) / (y))
EIGEN_FUNCTOR(Div, EIGEN_DIV);
template <typename Place, typename T, typename functor, typename functor1,
typename broadcastfunctor, typename broadcast2functor>
void ElementwiseGradCompute(const framework::ExecutionContext& ctx) {
using Tensor = framework::Tensor;
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* out = ctx.Input<Tensor>("Out");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto place = ctx.GetEigenDevice<Place>();
auto x_dims = x->dims();
auto y_dims = y->dims();
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
}
if (x_dims == y_dims) {
functor f;
f(place, x, y, out, dx, dy, dout);
return;
}
if (product(y_dims) == 1) {
functor1 f;
f(place, x, y, out, dx, dy, dout);
return;
}
int axis = ctx.Attr<int>("axis");
axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis);
int pre, n, post;
get_mid_dims(x_dims, y_dims, axis, pre, n, post);
if (post == 1) {
broadcastfunctor f;
f(place, x, y, out, dx, dy, dout, pre, n);
return;
} else {
broadcast2functor f;
f(place, x, y, out, dx, dy, dout, pre, n, post);
return;
}
}
class ElementwiseOp : public framework::OperatorWithKernel { class ElementwiseOp : public framework::OperatorWithKernel {
public: public:
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
using Tensor = framework::Tensor; using Tensor = framework::Tensor;
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of elementwise op should not be null"); "Input(X) of elementwise op should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), PADDLE_ENFORCE(ctx->HasInput("Y"),
"Input(Y) of elementwise op should not be null"); "Input(Y) of elementwise op should not be null");
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(ctx->HasOutput("Out"),
ctx.OutputVar("Out"),
"Output(Out) of elementwise op should not be null."); "Output(Out) of elementwise op should not be null.");
auto x_dim = ctx.Input<Tensor>("X")->dims(); auto x_dim = ctx->GetInputDim("X");
auto y_dim = ctx.Input<Tensor>("Y")->dims(); auto y_dim = ctx->GetInputDim("Y");
PADDLE_ENFORCE_GE(x_dim.size(), y_dim.size(), PADDLE_ENFORCE_GE(x_dim.size(), y_dim.size(),
"Rank of first input must >= rank of second input.") "Rank of first input must >= rank of second input.")
ctx.Output<framework::Tensor>("Out")->Resize(x_dim); ctx->SetOutputDim("Out", x_dim);
ctx.ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
} }
}; };
...@@ -284,27 +106,26 @@ class ElementwiseOpGrad : public framework::OperatorWithKernel { ...@@ -284,27 +106,26 @@ class ElementwiseOpGrad : public framework::OperatorWithKernel {
using Tensor = framework::Tensor; using Tensor = framework::Tensor;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null"); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) should not be null"); PADDLE_ENFORCE(ctx->HasInput("Y"), "Input(Y) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto y_dims = ctx.Input<Tensor>("Y")->dims(); auto y_dims = ctx->GetInputDim("Y");
auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims(); auto out_dims = ctx->GetInputDim(framework::GradVarName("Out"));
auto* x_grad = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
auto* y_grad = ctx.Output<framework::Tensor>(framework::GradVarName("Y"));
PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(), PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(),
"Rank of first input must >= rank of second input.") "Rank of first input must >= rank of second input.")
if (x_grad) { auto x_grad_name = framework::GradVarName("X");
x_grad->Resize(x_dims); auto y_grad_name = framework::GradVarName("Y");
if (ctx->HasOutput(x_grad_name)) {
ctx->SetOutputDim(x_grad_name, x_dims);
} }
if (ctx->HasOutput(y_grad_name)) {
if (y_grad) { ctx->SetOutputDim(y_grad_name, y_dims);
y_grad->Resize(y_dims);
} }
} }
}; };
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
#include "paddle/framework/operator.h"
#include "paddle/operators/math/math_function.h"
namespace paddle {
namespace operators {
/*
* Out = X ⊙ Y
* If Y's shape does not match X's shape, they will be reshaped.
* For example:
* 1. shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
* pre=2, n=3*4, post=5
* x.shape(2, 12, 5) * y.shape(1,12,1).broadcast(2,12,5)
* 2. shape(X) = (2, 3, 4, 5), shape(Y) = (4,5)
* pre=2*3, n=4*5, post=1
* x.shape(2, 3, 20) * y.shape(1,1,20).broadcast(2,3,20)
*/
inline void get_mid_dims(const framework::DDim& x_dims,
const framework::DDim& y_dims, const int axis,
int& pre, int& n, int& post) {
pre = 1;
n = 1;
post = 1;
for (int i = 0; i < axis; ++i) {
pre *= x_dims[i];
}
for (int i = 0; i < y_dims.size(); ++i) {
PADDLE_ENFORCE_EQ(x_dims[i + axis], y_dims[i],
"Broadcast dimension mismatch.");
n *= y_dims[i];
}
for (int i = axis + y_dims.size(); i < x_dims.size(); ++i) {
post *= x_dims[i];
}
}
#define EIGEN_FUNCTOR(name, eigen_op) \
struct Eigen##name##Functor { \
template <typename Place, typename T> \
inline void Run(const framework::Tensor* x, const framework::Tensor* y, \
framework::Tensor* z, \
const framework::ExecutionContext& ctx) { \
auto x_e = framework::EigenVector<T>::Flatten(*x); \
auto y_e = framework::EigenVector<T>::Flatten(*y); \
auto z_e = framework::EigenVector<T>::Flatten(*z); \
z_e.device(ctx.GetEigenDevice<Place>()) = eigen_op(x_e, y_e); \
} \
template <typename Place, typename T> \
inline void RunBroadCast(const framework::Tensor* x, \
const framework::Tensor* y, framework::Tensor* z, \
const framework::ExecutionContext& ctx, int pre, \
int n) { \
auto x_e = framework::EigenVector<T>::Flatten(*x); \
auto y_e = framework::EigenVector<T>::Flatten(*y); \
auto z_e = framework::EigenVector<T>::Flatten(*z); \
auto y_bcast = y_e.reshape(Eigen::DSizes<int, 2>(1, n)) \
.broadcast(Eigen::DSizes<int, 2>(pre, 1)) \
.reshape(Eigen::DSizes<int, 1>(x_e.size())); \
z_e.device(ctx.GetEigenDevice<Place>()) = eigen_op(x_e, y_bcast); \
} \
template <typename Place, typename T> \
inline void RunBroadCast2(const framework::Tensor* x, \
const framework::Tensor* y, \
framework::Tensor* z, \
const framework::ExecutionContext& ctx, int pre, \
int n, int post) { \
auto x_e = framework::EigenVector<T>::Flatten(*x); \
auto y_e = framework::EigenVector<T>::Flatten(*y); \
auto z_e = framework::EigenVector<T>::Flatten(*z); \
auto y_bcast = y_e.reshape(Eigen::DSizes<int, 3>(1, n, 1)) \
.broadcast(Eigen::DSizes<int, 3>(pre, 1, post)) \
.reshape(Eigen::DSizes<int, 1>(x_e.size())); \
z_e.device(ctx.GetEigenDevice<Place>()) = eigen_op(x_e, y_bcast); \
} \
}
template <class functor, typename Place, typename T>
void ElementwiseCompute(const framework::ExecutionContext& ctx) {
using Tensor = framework::Tensor;
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* z = ctx.Output<Tensor>("Out");
z->mutable_data<T>(ctx.GetPlace());
auto x_dims = x->dims();
auto y_dims = y->dims();
PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(),
"Rank of first input must >= rank of second input.")
if (x_dims == y_dims || product(y_dims) == 1) {
functor f;
f.template Run<Place, T>(x, y, z, ctx);
return;
}
int axis = ctx.Attr<int>("axis");
axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis);
PADDLE_ENFORCE(axis >= 0 && axis < x_dims.size(),
"Axis should be in range [0, x_dims)");
int pre, n, post;
get_mid_dims(x_dims, y_dims, axis, pre, n, post);
if (post == 1) {
functor f;
f.template RunBroadCast<Place, T>(x, y, z, ctx, pre, n);
return;
} else {
functor f;
f.template RunBroadCast2<Place, T>(x, y, z, ctx, pre, n, post);
return;
}
}
#define EIGEN_ADD(x, y) ((x) + (y))
EIGEN_FUNCTOR(Add, EIGEN_ADD);
#define EIGEN_SUB(x, y) ((x) - (y))
EIGEN_FUNCTOR(Sub, EIGEN_SUB);
#define EIGEN_MUL(x, y) ((x) * (y))
EIGEN_FUNCTOR(Mul, EIGEN_MUL);
#define EIGEN_DIV(x, y) ((x) / (y))
EIGEN_FUNCTOR(Div, EIGEN_DIV);
template <typename Place, typename T, typename functor, typename functor1,
typename broadcastfunctor, typename broadcast2functor>
void ElementwiseGradCompute(const framework::ExecutionContext& ctx) {
using Tensor = framework::Tensor;
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* out = ctx.Input<Tensor>("Out");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto place = ctx.GetEigenDevice<Place>();
auto x_dims = x->dims();
auto y_dims = y->dims();
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
}
if (x_dims == y_dims) {
functor f;
f(place, x, y, out, dx, dy, dout);
return;
}
if (product(y_dims) == 1) {
functor1 f;
f(place, x, y, out, dx, dy, dout);
return;
}
int axis = ctx.Attr<int>("axis");
axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis);
int pre, n, post;
get_mid_dims(x_dims, y_dims, axis, pre, n, post);
if (post == 1) {
broadcastfunctor f;
f(place, x, y, out, dx, dy, dout, pre, n);
return;
} else {
broadcast2functor f;
f(place, x, y, out, dx, dy, dout, pre, n, post);
return;
}
}
} // namespace operators
} // namespace paddle
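The `pre` / `n` / `post` decomposition described in the header comment of `get_mid_dims` can be tried outside the framework. The following is a minimal sketch, not the code from this commit: it assumes plain `std::vector<int>` in place of `framework::DDim` and a thrown exception in place of `PADDLE_ENFORCE_EQ`. The names `MidDims`, `ComputeMidDims`, and `BroadcastIndex` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical standalone stand-in for get_mid_dims. std::vector<int>
// replaces framework::DDim and an exception replaces PADDLE_ENFORCE_EQ.
struct MidDims {
  int pre;   // product of X's dimensions before `axis`
  int n;     // product of the dimensions that Y covers
  int post;  // product of X's dimensions after Y's span
};

inline MidDims ComputeMidDims(const std::vector<int>& x_dims,
                              const std::vector<int>& y_dims, int axis) {
  const int x_rank = static_cast<int>(x_dims.size());
  const int y_rank = static_cast<int>(y_dims.size());
  // axis == -1 aligns Y with the trailing dimensions of X.
  if (axis == -1) axis = x_rank - y_rank;
  if (axis < 0 || axis + y_rank > x_rank) {
    throw std::invalid_argument("axis out of range");
  }
  MidDims d{1, 1, 1};
  for (int i = 0; i < axis; ++i) d.pre *= x_dims[i];
  for (int i = 0; i < y_rank; ++i) {
    if (x_dims[i + axis] != y_dims[i]) {
      throw std::invalid_argument("Broadcast dimension mismatch.");
    }
    d.n *= y_dims[i];
  }
  for (int i = axis + y_rank; i < x_rank; ++i) d.post *= x_dims[i];
  return d;
}

// Maps a flat row-major index into X, viewed as (pre, n, post), to the
// index of the Y element that is broadcast onto that position.
inline int BroadcastIndex(int flat_index, const MidDims& d) {
  assert(d.n > 0 && d.post > 0);
  return (flat_index / d.post) % d.n;
}
```

With the shapes from the comment, `ComputeMidDims({2, 3, 4, 5}, {3, 4}, 1)` yields `pre=2, n=12, post=5`, and passing `{4, 5}` with `axis=-1` yields `pre=6, n=20, post=1`, which corresponds to the `post == 1` branch that dispatches to `RunBroadCast` instead of `RunBroadCast2`.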
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
limitations under the License. */ limitations under the License. */
#include "paddle/operators/elementwise_sub_op.h" #include "paddle/operators/elementwise_sub_op.h"
#include "paddle/operators/elementwise_op.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
......
...@@ -13,7 +13,7 @@ ...@@ -13,7 +13,7 @@
limitations under the License. */ limitations under the License. */
#pragma once #pragma once
#include "paddle/operators/elementwise_op.h" #include "paddle/operators/elementwise_op_function.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
......
...@@ -22,15 +22,13 @@ class FillZerosLikeOp : public framework::OperatorWithKernel { ...@@ -22,15 +22,13 @@ class FillZerosLikeOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of FillZerosLikeOp should not be null."); "Input(X) of FillZerosLikeOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Y"), PADDLE_ENFORCE(ctx->HasOutput("Y"),
"Output(Y) of FillZerosLikeOp should not be null."); "Output(Y) of FillZerosLikeOp should not be null.");
ctx->SetOutputDim("Y", ctx->GetInputDim("X"));
ctx.Output<framework::Tensor>("Y")->Resize( ctx->ShareLoD("X", /*->*/ "Y");
ctx.Input<framework::Tensor>("X")->dims());
ctx.ShareLoD("X", /*->*/ "Y");
} }
}; };
......
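The hunk above shows the pattern this commit applies across operators: `InferShape` now works only on dimensions looked up by name instead of dereferencing tensors. A toy sketch of that idea follows. `ToyShapeContext` and its `AddInput`/`DeclareOutput` helpers are hypothetical stand-ins written for illustration; only the method names `HasInput`, `HasOutput`, `GetInputDim`, and `SetOutputDim` are taken from the diff, and their real signatures may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Dims = std::vector<int64_t>;

// Toy shape-only context: stores dimensions by variable name and never
// touches tensor memory. Not the framework's InferShapeContextBase.
class ToyShapeContext {
 public:
  void AddInput(const std::string& name, const Dims& dims) {
    inputs_[name] = dims;
  }
  void DeclareOutput(const std::string& name) { outputs_[name] = Dims(); }

  bool HasInput(const std::string& name) const {
    return inputs_.count(name) > 0;
  }
  bool HasOutput(const std::string& name) const {
    return outputs_.count(name) > 0;
  }
  const Dims& GetInputDim(const std::string& name) const {
    return inputs_.at(name);
  }
  const Dims& GetOutputDim(const std::string& name) const {
    return outputs_.at(name);
  }
  void SetOutputDim(const std::string& name, const Dims& dims) {
    outputs_.at(name) = dims;
  }

 private:
  std::map<std::string, Dims> inputs_;
  std::map<std::string, Dims> outputs_;
};

// Same shape rule as FillZerosLikeOp in the diff: Y takes the shape of X.
void InferFillZerosLikeShape(ToyShapeContext* ctx) {
  if (!ctx->HasInput("X")) {
    throw std::runtime_error("Input(X) should not be null.");
  }
  if (!ctx->HasOutput("Y")) {
    throw std::runtime_error("Output(Y) should not be null.");
  }
  ctx->SetOutputDim("Y", ctx->GetInputDim("X"));
}
```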
...@@ -23,19 +23,19 @@ class GatherOp : public framework::OperatorWithKernel { ...@@ -23,19 +23,19 @@ class GatherOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of GatherOp should not be null."); "Input(X) of GatherOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Index"), PADDLE_ENFORCE(ctx->HasInput("Index"),
"Input(Index) of GatherOp should not be null."); "Input(Index) of GatherOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of GatherOp should not be null."); "Output(Out) of GatherOp should not be null.");
int batch_size = ctx.Input<Tensor>("Index")->dims()[0]; int batch_size = ctx->GetInputDim("Index")[0];
PADDLE_ENFORCE_GE(batch_size, 0, "Batch size must be >= 0"); PADDLE_ENFORCE_GE(batch_size, 0, "Batch size must be >= 0");
framework::DDim output_dims(ctx.Input<Tensor>("X")->dims()); framework::DDim output_dims(ctx->GetInputDim("X"));
output_dims[0] = batch_size; output_dims[0] = batch_size;
ctx.Output<framework::Tensor>("Out")->Resize(output_dims); ctx->SetOutputDim("Out", output_dims);
} }
}; };
...@@ -44,17 +44,14 @@ class GatherGradOp : public framework::OperatorWithKernel { ...@@ -44,17 +44,14 @@ class GatherGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
auto X_grad = ctx.Output<framework::Tensor>(framework::GradVarName("X")); ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
auto X = ctx.Input<Tensor>("X");
X_grad->Resize(X->dims());
} }
}; };
class GatherOpMaker : public framework::OpProtoAndCheckerMaker { class GatherOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
GatherOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) GatherOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The source input of gather op"); AddInput("X", "The source input of gather op");
AddInput("Index", "The index input of gather op"); AddInput("Index", "The index input of gather op");
......
...@@ -43,13 +43,10 @@ class GaussianRandomOp : public framework::OperatorWithKernel { ...@@ -43,13 +43,10 @@ class GaussianRandomOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(ctx->HasOutput("Out"),
ctx.OutputVar("Out"),
"Output(Out) of GaussianRandomOp should not be null."); "Output(Out) of GaussianRandomOp should not be null.");
auto dims = ctx->Attrs().Get<std::vector<int>>("dims");
auto* tensor = ctx.Output<framework::Tensor>("Out");
auto dims = Attr<std::vector<int>>("dims");
std::vector<int64_t> temp; std::vector<int64_t> temp;
temp.reserve(dims.size()); temp.reserve(dims.size());
for (auto dim : dims) { for (auto dim : dims) {
...@@ -57,7 +54,7 @@ class GaussianRandomOp : public framework::OperatorWithKernel { ...@@ -57,7 +54,7 @@ class GaussianRandomOp : public framework::OperatorWithKernel {
} }
PADDLE_ENFORCE(dims.size() > 0UL, PADDLE_ENFORCE(dims.size() > 0UL,
"dims can be one int or array. dims must be set."); "dims can be one int or array. dims must be set.");
tensor->Resize(framework::make_ddim(temp)); ctx->SetOutputDim("Out", framework::make_ddim(temp));
} }
}; };
......
...@@ -22,27 +22,26 @@ class LookupTableOp : public framework::OperatorWithKernel { ...@@ -22,27 +22,26 @@ class LookupTableOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("W"), PADDLE_ENFORCE(ctx->HasInput("W"),
"Input(W) of LookupTableOp should not be null."); "Input(W) of LookupTableOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Ids"), PADDLE_ENFORCE(ctx->HasInput("Ids"),
"Input(Ids) of LookupTableOp should not be null."); "Input(Ids) of LookupTableOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of LookupTableOp should not be null."); "Output(Out) of LookupTableOp should not be null.");
auto table_t = ctx.Input<Tensor>("W"); auto table_dims = ctx->GetInputDim("W");
auto ids_t = ctx.Input<Tensor>("Ids"); auto ids_dims = ctx->GetInputDim("Ids");
auto output_t = ctx.Output<framework::Tensor>("Out");
output_t->Resize({ids_t->dims()[0], table_t->dims()[1]}); ctx->SetOutputDim("Out", {ids_dims[0], table_dims[1]});
ctx.ShareLoD("Ids", /*->*/ "Out"); ctx->ShareLoD("Ids", /*->*/ "Out");
} }
}; };
class LookupTableOpMaker : public framework::OpProtoAndCheckerMaker { class LookupTableOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LookupTableOpMaker(framework::OpProto *proto, LookupTableOpMaker(framework::OpProto* proto,
framework::OpAttrChecker *op_checker) framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("W", AddInput("W",
"An input represents embedding tensors," "An input represents embedding tensors,"
...@@ -66,11 +65,9 @@ class LookupTableOpGrad : public framework::OperatorWithKernel { ...@@ -66,11 +65,9 @@ class LookupTableOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &context) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
auto table = context.Input<Tensor>("W"); auto table_dims = ctx->GetInputDim("W");
auto d_table = ctx->SetOutputDim(framework::GradVarName("W"), table_dims);
context.Output<framework::Tensor>(framework::GradVarName("W"));
d_table->Resize(table->dims());
} }
}; };
......
...@@ -22,37 +22,36 @@ class LstmUnitOp : public framework::OperatorWithKernel { ...@@ -22,37 +22,36 @@ class LstmUnitOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) of LSTM should not be null.");
"Input(X) of LSTM should not be null."); PADDLE_ENFORCE(ctx->HasInput("C_prev"),
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("C_prev"),
"Input(C_prev) of LSTM should not be null."); "Input(C_prev) of LSTM should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("C"), PADDLE_ENFORCE(ctx->HasOutput("C"),
"Output(C) of LSTM should not be null."); "Output(C) of LSTM should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("H"), PADDLE_ENFORCE(ctx->HasOutput("H"),
"Output(H) of LSTM should not be null."); "Output(H) of LSTM should not be null.");
auto *x = ctx.Input<framework::Tensor>("X"); auto x_dims = ctx->GetInputDim("X");
auto *c_prev = ctx.Input<framework::Tensor>("C_prev"); auto c_prev_dims = ctx->GetInputDim("C_prev");
PADDLE_ENFORCE_EQ(x->dims().size(), 2, "Input(X)'s rank must be 2."); PADDLE_ENFORCE_EQ(x_dims.size(), 2, "Input(X)'s rank must be 2.");
PADDLE_ENFORCE(x->dims()[0] == c_prev->dims()[0], PADDLE_ENFORCE(x_dims[0] == c_prev_dims[0],
"Batch size of inputs and states must be equal"); "Batch size of inputs and states must be equal");
PADDLE_ENFORCE(x->dims()[1] == c_prev->dims()[1] * 4, PADDLE_ENFORCE(x_dims[1] == c_prev_dims[1] * 4,
"Dimension of FC should equal to prev state * 4"); "Dimension of FC should equal to prev state * 4");
int b_size = c_prev->dims()[0]; // batch size int b_size = c_prev_dims[0]; // batch size
int s_dim = c_prev->dims()[1]; // state dim int s_dim = c_prev_dims[1]; // state dim
ctx.Output<framework::LoDTensor>("C")->Resize({b_size, s_dim}); ctx->SetOutputDim("C", {b_size, s_dim});
ctx.Output<framework::LoDTensor>("H")->Resize({b_size, s_dim}); ctx->SetOutputDim("H", {b_size, s_dim});
} }
}; };
template <typename AttrType> template <typename AttrType>
class LstmUnitOpMaker : public framework::OpProtoAndCheckerMaker { class LstmUnitOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LstmUnitOpMaker(framework::OpProto *proto, LstmUnitOpMaker(framework::OpProto* proto,
framework::OpAttrChecker *op_checker) framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "FC input before the non-linear activation."); AddInput("X", "FC input before the non-linear activation.");
AddInput( AddInput(
...@@ -79,15 +78,14 @@ class LstmUnitGradOp : public framework::OperatorWithKernel { ...@@ -79,15 +78,14 @@ class LstmUnitGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("C")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("C")),
"Input(C@GRAD) should not be null"); "Input(C@GRAD) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("H")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("H")),
"Input(H@GRAD) should not be null"); "Input(H@GRAD) should not be null");
ctx.Output<framework::LoDTensor>(framework::GradVarName("X")) ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
->Resize(ctx.Input<Tensor>("X")->dims()); ctx->SetOutputDim(framework::GradVarName("C_prev"),
ctx.Output<framework::LoDTensor>(framework::GradVarName("C_prev")) ctx->GetInputDim("C_prev"));
->Resize(ctx.Input<Tensor>("C_prev")->dims());
} }
}; };
......
if(WITH_GPU) if(WITH_GPU)
nv_library(math_function SRCS math_function.cc math_function.cu im2col.cc nv_library(math_function SRCS math_function.cc math_function.cu im2col.cc
im2col.cu pooling.cc pooling.cu DEPS cblas device_context) im2col.cu pooling.cc pooling.cu DEPS cblas device_context)
nv_library(softmax_function SRCS softmax.cc softmax.cu
DEPS operator)
nv_library(cross_entropy_function SRCS cross_entropy.cc cross_entropy.cu
DEPS operator)
else() else()
cc_library(math_function SRCS math_function.cc im2col.cc pooling.cc DEPS cblas device_context) cc_library(math_function SRCS math_function.cc im2col.cc pooling.cc DEPS cblas device_context)
cc_library(softmax_function SRCS softmax.cc DEPS operator)
cc_library(cross_entropy_function SRCS cross_entropy.cc DEPS operator)
endif() endif()
nv_test(math_function_test SRCS math_function_test.cc DEPS math_function tensor) nv_test(math_function_test SRCS math_function_test.cc DEPS math_function tensor)
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/math/cross_entropy.h"
namespace paddle {
namespace operators {
namespace math {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename T>
class CrossEntropyFunctor<platform::CPUPlace, T> {
public:
void operator()(const framework::ExecutionContext& ctx,
framework::Tensor* out, const framework::Tensor* prob,
const framework::Tensor* labels, const bool softLabel) {
const int batch_size = prob->dims()[0];
if (softLabel) {
auto in = EigenMatrix<T>::From(*prob);
auto lbl = EigenMatrix<T>::From(*labels);
auto loss = EigenMatrix<T>::From(*out);
loss.device(ctx.GetEigenDevice<platform::CPUPlace>()) =
-((lbl * in.log().unaryExpr(math::TolerableValue<T>()))
.sum(Eigen::DSizes<int, 1>(1))
.reshape(Eigen::DSizes<int, 2>(batch_size, 1)));
} else {
const int class_num = prob->dims()[1];
const T* prob_data = prob->data<T>();
T* loss_data = out->data<T>();
const int* label_data = labels->data<int>();
for (int i = 0; i < batch_size; ++i) {
int index = i * class_num + label_data[i];
loss_data[i] = -math::TolerableValue<T>()(std::log(prob_data[index]));
}
}
}
};
template class CrossEntropyFunctor<platform::CPUPlace, float>;
} // namespace math
} // namespace operators
} // namespace paddle
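The hard-label branch of the CPU functor above can be read in isolation as follows. This is a minimal sketch on plain `std::vector` data rather than `framework::Tensor`; the function names `tolerable_value` and `hard_label_cross_entropy` are illustrative assumptions. It shows why the infinity clamp matters: a predicted probability of zero yields a large finite loss instead of `inf`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Mirrors the idea of TolerableValue: map +/-infinity to a large finite value.
template <typename T>
T tolerable_value(T x) {
  const T kApproInf = static_cast<T>(1e20);
  if (x == std::numeric_limits<T>::infinity()) return kApproInf;
  if (x == -std::numeric_limits<T>::infinity()) return -kApproInf;
  return x;
}

// prob is row-major [batch_size x class_num]; labels holds one class index
// per row. Returns one loss value per row.
template <typename T>
std::vector<T> hard_label_cross_entropy(const std::vector<T>& prob,
                                        const std::vector<int>& labels,
                                        int class_num) {
  std::vector<T> loss(labels.size());
  for (std::size_t i = 0; i < labels.size(); ++i) {
    const T p = prob[i * class_num + labels[i]];
    loss[i] = -tolerable_value(std::log(p));
  }
  return loss;
}
```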
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/math/cross_entropy.h"
namespace paddle {
namespace operators {
namespace math {
namespace {
template <typename T>
__global__ void CrossEntropyKernel(T* Y, const T* X, const int* label,
const int N, const int D) {
// TODO(qingqing) define CUDA_1D_KERNEL_LOOP macro in a common file.
// CUDA_1D_KERNEL_LOOP(i, N) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
i += blockDim.x * gridDim.x) {
PADDLE_ASSERT(label[i] >= 0 && label[i] < D);
Y[i] = -math::TolerableValue<T>()(log(X[i * D + label[i]]));
}
}
template <typename T>
__device__ __forceinline__ T sum_single_warp(T val) {
val += __shfl_down(val, 16);
val += __shfl_down(val, 8);
val += __shfl_down(val, 4);
val += __shfl_down(val, 2);
val += __shfl_down(val, 1);
return val;
}
template <typename T>
__global__ void SoftCrossEntropyKernel(T* Y, const T* X, const T* label,
const int class_num) {
int tid = threadIdx.x;
extern __shared__ T d_sum[];
d_sum[tid] = 0;
int cur_idx = tid;
int next_idx = blockIdx.x * class_num + tid;
while (cur_idx < class_num) {
d_sum[tid] +=
math::TolerableValue<T>()(std::log(X[next_idx])) * label[next_idx];
next_idx += blockDim.x;
cur_idx += blockDim.x;
}
__syncthreads();
for (unsigned int stride = blockDim.x >> 1; stride >= 32; stride >>= 1) {
if (tid < stride) d_sum[tid] += d_sum[tid + stride];
__syncthreads();
}
T val = d_sum[tid];
val = sum_single_warp<T>(val);
if (tid == 0) Y[blockIdx.x] = -val;
}
} // namespace
using Tensor = framework::Tensor;
template <typename T>
class CrossEntropyFunctor<platform::GPUPlace, T> {
public:
void operator()(const framework::ExecutionContext& ctx,
framework::Tensor* out, const framework::Tensor* prob,
const framework::Tensor* labels, bool softLabel) {
const T* prob_data = prob->data<T>();
T* loss_data = out->mutable_data<T>(ctx.GetPlace());
int batch_size = prob->dims()[0];
int class_num = prob->dims()[1];
if (softLabel) {
const T* label_data = labels->data<T>();
int block = class_num > 512 ? 512 : pow(2, int(std::log2(class_num)));
SoftCrossEntropyKernel<
T><<<batch_size, block, block * sizeof(T),
reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context())
.stream()>>>(loss_data, prob_data, label_data, class_num);
} else {
const int* label_data = labels->data<int>();
int block = 512;
int grid = (batch_size + block - 1) / block;
CrossEntropyKernel<T><<<
grid, block, 0, reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context())
.stream()>>>(loss_data, prob_data, label_data,
batch_size, class_num);
}
}
};
template class CrossEntropyFunctor<platform::GPUPlace, float>;
} // namespace math
} // namespace operators
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/operator.h"
#include "paddle/framework/tensor.h"
#include "paddle/platform/hostdevice.h"
namespace paddle {
namespace operators {
namespace math {
template <typename T>
struct TolerableValue {
HOSTDEVICE T operator()(const T& x) const {
PADDLE_ASSERT(std::is_floating_point<T>::value);
const T kApproInf = 1e20;
if (x == INFINITY) return kApproInf;
if (x == -INFINITY) return -kApproInf;
return x;
}
};
template <typename Place, typename T>
class CrossEntropyFunctor {
public:
// (TODO caoying) it is much better to use DeviceContext as the first
// parameter.
void operator()(const framework::ExecutionContext& context,
framework::Tensor* out, const framework::Tensor* prob,
const framework::Tensor* labels, const bool softLabel);
};
} // namespace math
} // namespace operators
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/math/softmax.h"
namespace paddle {
namespace operators {
namespace math {
template class SoftmaxFunctor<platform::CPUPlace, float>;
} // namespace math
} // namespace operators
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/math/softmax.h"
namespace paddle {
namespace operators {
namespace math {
template class SoftmaxFunctor<platform::GPUPlace, float>;
} // namespace math
} // namespace operators
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/operator.h"
#include "paddle/framework/tensor.h"
namespace paddle {
namespace operators {
namespace math {
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename T>
struct ValueClip {
HOSTDEVICE T operator()(const T& x) const {
const T kThreshold = -64.;
return x < kThreshold ? kThreshold : x;
}
};
template <typename Place, typename T>
class SoftmaxFunctor {
public:
void operator()(const framework::ExecutionContext& context,
const framework::Tensor* X, framework::Tensor* Y) {
auto logits = EigenMatrix<T>::From(*X);
auto softmax = EigenMatrix<T>::From(*Y);
const int kBatchDim = 0;
const int kClassDim = 1;
const int batch_size = logits.dimension(kBatchDim);
const int num_classes = logits.dimension(kClassDim);
Eigen::DSizes<int, 1> along_class(kClassDim);
Eigen::DSizes<int, 2> batch_by_one(batch_size, 1);
Eigen::DSizes<int, 2> one_by_class(1, num_classes);
auto shifted_logits = (logits -
logits.maximum(along_class)
.eval()
.reshape(batch_by_one)
.broadcast(one_by_class))
.unaryExpr(ValueClip<T>());
softmax.device(context.GetEigenDevice<Place>()) = shifted_logits.exp();
softmax.device(context.GetEigenDevice<Place>()) =
(softmax *
softmax.sum(along_class)
.inverse()
.eval()
.reshape(batch_by_one)
.broadcast(one_by_class));
}
};
} // namespace math
} // namespace operators
} // namespace paddle
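The Eigen expression in `SoftmaxFunctor` subtracts the row maximum, clips the shifted logits at -64, exponentiates, and normalizes by the row sum. A plain-loop sketch of the same steps is given below, assuming a row-major `[batch_size x num_classes]` layout; the function name `stable_softmax` and the use of `std::vector<double>` are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Row-wise softmax over a row-major [batch_size x num_classes] matrix.
std::vector<double> stable_softmax(const std::vector<double>& logits,
                                   int batch_size, int num_classes) {
  const double kThreshold = -64.0;  // same clip value as ValueClip above
  std::vector<double> out(logits.size());
  for (int b = 0; b < batch_size; ++b) {
    const double* row = logits.data() + b * num_classes;
    double* out_row = out.data() + b * num_classes;
    // Shift by the row maximum so the largest exponent is exp(0) = 1.
    const double row_max = *std::max_element(row, row + num_classes);
    double sum = 0.0;
    for (int c = 0; c < num_classes; ++c) {
      const double shifted = std::max(row[c] - row_max, kThreshold);
      out_row[c] = std::exp(shifted);
      sum += out_row[c];
    }
    for (int c = 0; c < num_classes; ++c) {
      out_row[c] /= sum;
    }
  }
  return out;
}
```

The clip keeps every output strictly positive, so a later `log` in the cross-entropy step does not see an exact zero from this stage.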
...@@ -22,18 +22,18 @@ class MeanOp : public framework::OperatorWithKernel { ...@@ -22,18 +22,18 @@ class MeanOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of MeanOp should not be null."); "Input(X) of MeanOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of MeanOp should not be null."); "Output(Out) of MeanOp should not be null.");
ctx.Output<framework::Tensor>("Out")->Resize({1}); ctx->SetOutputDim("Out", {1});
} }
}; };
class MeanOpMaker : public framework::OpProtoAndCheckerMaker { class MeanOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MeanOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) MeanOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of mean op"); AddInput("X", "The input of mean op");
AddOutput("Out", "The output of mean op").NotInGradient(); AddOutput("Out", "The output of mean op").NotInGradient();
...@@ -47,9 +47,8 @@ class MeanGradOp : public framework::OperatorWithKernel { ...@@ -47,9 +47,8 @@ class MeanGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
ctx.Output<framework::Tensor>(framework::GradVarName("X")) ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
->Resize(ctx.Input<Tensor>("X")->dims());
} }
}; };
......
...@@ -26,22 +26,22 @@ class MinusOp : public framework::OperatorWithKernel { ...@@ -26,22 +26,22 @@ class MinusOp : public framework::OperatorWithKernel {
: OperatorWithKernel(type, inputs, outputs, attrs) {} : OperatorWithKernel(type, inputs, outputs, attrs) {}
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of MinusOp should not be null."); "Input(X) of MinusOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), PADDLE_ENFORCE(ctx->HasInput("Y"),
"Input(Y) of MinusOp should not be null."); "Input(Y) of MinusOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of MinusOp should not be null."); "Output(Out) of MinusOp should not be null.");
auto *left_tensor = ctx.Input<framework::Tensor>("X"); auto x_dims = ctx->GetInputDim("X");
auto *right_tensor = ctx.Input<framework::Tensor>("Y"); auto y_dims = ctx->GetInputDim("Y");
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
left_tensor->numel(), right_tensor->numel(), x_dims, y_dims,
"Minus operator must take two tensor with same num of elements"); "Minus operator must take two tensor with same num of elements");
ctx.Output<framework::Tensor>("Out")->Resize(left_tensor->dims()); ctx->SetOutputDim("Out", x_dims);
ctx.ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
} }
}; };
......
...@@ -22,20 +22,19 @@ class ModifiedHuberLossOp : public framework::OperatorWithKernel { ...@@ -22,20 +22,19 @@ class ModifiedHuberLossOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& context) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(context.InputVar("X"), "X must be initialized."); PADDLE_ENFORCE(ctx->HasInput("X"), "X must be initialized.");
PADDLE_ENFORCE_NOT_NULL(context.InputVar("Y"), "Y must be initialized."); PADDLE_ENFORCE(ctx->HasInput("Y"), "Y must be initialized.");
auto* x = context.Input<Tensor>("X"); auto x_dims = ctx->GetInputDim("X");
auto* y = context.Input<Tensor>("Y"); auto y_dims = ctx->GetInputDim("Y");
PADDLE_ENFORCE_EQ(x->dims(), y->dims(), PADDLE_ENFORCE_EQ(x_dims, y_dims, "The shape of X and Y must be the same.");
"The shape of X and Y must be the same."); PADDLE_ENFORCE_EQ(x_dims.size(), 2, "The tensor rank of X must be 2.");
PADDLE_ENFORCE_EQ(x->dims().size(), 2, "The tensor rank of X must be 2."); PADDLE_ENFORCE_EQ(x_dims[1], 1, "The 2nd dimension of X must be 1.");
PADDLE_ENFORCE_EQ(x->dims()[1], 1, "The 2nd dimension of X must be 1.");
context.Output<framework::Tensor>("IntermediateVal")->Resize(x->dims()); ctx->SetOutputDim("IntermediateVal", x_dims);
context.Output<framework::Tensor>("Out")->Resize({x->dims()[0], 1}); ctx->SetOutputDim("Out", {x_dims[0], 1});
} }
}; };
...@@ -75,27 +74,28 @@ class ModifiedHuberLossGradOp : public framework::OperatorWithKernel { ...@@ -75,27 +74,28 @@ class ModifiedHuberLossGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& context) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
auto* x = context.Input<Tensor>("X"); PADDLE_ENFORCE(ctx->HasInput("X"), "X must be initialized.");
auto* y = context.Input<Tensor>("Y"); PADDLE_ENFORCE(ctx->HasInput("Y"), "Y must be initialized.");
auto* intermediate_val = context.Input<Tensor>("IntermediateVal"); PADDLE_ENFORCE(ctx->HasInput("IntermediateVal"),
auto* out_grad = context.Input<Tensor>(framework::GradVarName("Out"));
auto* x_grad =
context.Output<framework::Tensor>(framework::GradVarName("X"));
PADDLE_ENFORCE_NOT_NULL(x, "X must be initialized.");
PADDLE_ENFORCE_NOT_NULL(y, "Y must be initialized.");
PADDLE_ENFORCE_NOT_NULL(intermediate_val,
"Intermediate value must not be null."); "Intermediate value must not be null.");
PADDLE_ENFORCE_NOT_NULL(out_grad, "Input(Out@Grad) must not be null."); PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@Grad) must not be null.");
auto x_dims = ctx->GetInputDim("X");
auto y_dims = ctx->GetInputDim("Y");
auto intermediate_dims = ctx->GetInputDim("IntermediateVal");
auto out_grad_dims = ctx->GetInputDim(framework::GradVarName("Out"));
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
intermediate_val->dims(), x->dims(), intermediate_dims, x_dims,
"The shape of X and intermediate value must be the same."); "The shape of X and intermediate value must be the same.");
PADDLE_ENFORCE_EQ(out_grad->dims(), x->dims(), PADDLE_ENFORCE_EQ(out_grad_dims, x_dims,
"The shape of Input(Out@Grad) and X must be the same."); "The shape of Input(Out@Grad) and X must be the same.");
if (x_grad) x_grad->Resize(x->dims()); if (ctx->HasOutput(framework::GradVarName("X"))) {
ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
}
} }
}; };
......
...@@ -24,27 +24,23 @@ class MulOp : public framework::OperatorWithKernel { ...@@ -24,27 +24,23 @@ class MulOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) of MulOp should not be null.");
"Input(X) of MulOp should not be null."); PADDLE_ENFORCE(ctx->HasInput("Y"), "Input(Y) of MulOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Input(Y) of MulOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"),
"Output(Out) of MulOp should not be null."); "Output(Out) of MulOp should not be null.");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto y_dims = ctx.Input<Tensor>("Y")->dims(); auto y_dims = ctx->GetInputDim("Y");
int x_num_col_dims = Attr<int>("x_num_col_dims"); int x_num_col_dims = ctx->Attrs().Get<int>("x_num_col_dims");
int y_num_col_dims = Attr<int>("y_num_col_dims"); int y_num_col_dims = ctx->Attrs().Get<int>("y_num_col_dims");
PADDLE_ENFORCE(x_dims.size() > x_num_col_dims, PADDLE_ENFORCE(x_dims.size() > x_num_col_dims,
"The rank of input tensor X(%s) should be larger than " "The rank of input tensor X should be larger than "
"`mul_op`'s `x_num_col_dims`.", "`mul_op`'s `x_num_col_dims`.");
ctx.op().Input("X"));
PADDLE_ENFORCE(y_dims.size() > y_num_col_dims, PADDLE_ENFORCE(y_dims.size() > y_num_col_dims,
"The rank of input tensor Y(%s) should be larger than " "The rank of input tensor Y should be larger than "
"`mul_op`'s `y_num_col_dims`.", "`mul_op`'s `y_num_col_dims`.");
ctx.op().Input("Y"));
auto x_mat_dims = framework::flatten_to_2d(x_dims, x_num_col_dims); auto x_mat_dims = framework::flatten_to_2d(x_dims, x_num_col_dims);
auto y_mat_dims = framework::flatten_to_2d(y_dims, y_num_col_dims); auto y_mat_dims = framework::flatten_to_2d(y_dims, y_num_col_dims);
...@@ -52,15 +48,14 @@ class MulOp : public framework::OperatorWithKernel { ...@@ -52,15 +48,14 @@ class MulOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
x_mat_dims[1], y_mat_dims[0], x_mat_dims[1], y_mat_dims[0],
"First matrix's width must be equal with second matrix's height."); "First matrix's width must be equal with second matrix's height.");
ctx.Output<framework::Tensor>("Out")->Resize( ctx->SetOutputDim("Out", {x_mat_dims[0], y_mat_dims[1]});
{x_mat_dims[0], y_mat_dims[1]}); ctx->ShareLoD("X", /*->*/ "Out");
ctx.ShareLoD("X", /*->*/ "Out");
} }
}; };
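Note on the `MulOp` shape inference above: both inputs are flattened to 2-D before the width/height check, and the output shape is `{x_mat_dims[0], y_mat_dims[1]}`. The following is a minimal standalone sketch of that rule. `FlattenTo2D` and `InferMulOutShape` are hypothetical stand-ins written for illustration, not the framework's `flatten_to_2d`; the sketch assumes the leading `num_col_dims` dimensions collapse into rows and the remaining ones into columns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Dims = std::vector<int64_t>;

// Hypothetical stand-in for flatten_to_2d (assumed behaviour): the first
// `num_col_dims` dimensions are multiplied into the row count, the rest
// into the column count.
std::pair<int64_t, int64_t> FlattenTo2D(const Dims& dims, int num_col_dims) {
  // Mirrors the rank check in the diff: rank must exceed num_col_dims.
  assert(num_col_dims >= 0 &&
         dims.size() > static_cast<std::size_t>(num_col_dims));
  int64_t rows = 1;
  int64_t cols = 1;
  for (std::size_t i = 0; i < dims.size(); ++i) {
    if (i < static_cast<std::size_t>(num_col_dims)) {
      rows *= dims[i];
    } else {
      cols *= dims[i];
    }
  }
  return {rows, cols};
}

// Output shape of the matrix product after flattening both operands.
std::pair<int64_t, int64_t> InferMulOutShape(const Dims& x, int x_num_col_dims,
                                             const Dims& y, int y_num_col_dims) {
  auto x_mat = FlattenTo2D(x, x_num_col_dims);
  auto y_mat = FlattenTo2D(y, y_num_col_dims);
  // First matrix's width must equal second matrix's height.
  assert(x_mat.second == y_mat.first);
  return {x_mat.first, y_mat.second};
}
```

For example, an X of shape `{2, 3, 4}` with `x_num_col_dims = 1` is treated as a 2 x 12 matrix, so it can be multiplied with a Y of shape `{12, 5}`.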
class MulOpMaker : public framework::OpProtoAndCheckerMaker { class MulOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MulOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) MulOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The first input of mul op"); AddInput("X", "The first input of mul op");
AddInput("Y", "The second input of mul op"); AddInput("Y", "The second input of mul op");
...@@ -100,16 +95,14 @@ class MulOpGrad : public framework::OperatorWithKernel { ...@@ -100,16 +95,14 @@ class MulOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null"); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) should not be null"); PADDLE_ENFORCE(ctx->HasInput("Y"), "Input(Y) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto y_dims = ctx.Input<Tensor>("Y")->dims(); auto y_dims = ctx->GetInputDim("Y");
auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims(); auto out_dims = ctx->GetInputDim(framework::GradVarName("Out"));
auto *x_grad = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
auto *y_grad = ctx.Output<framework::Tensor>(framework::GradVarName("Y"));
auto x_mat_dims = auto x_mat_dims =
framework::flatten_to_2d(x_dims, Attr<int>("x_num_col_dims")); framework::flatten_to_2d(x_dims, Attr<int>("x_num_col_dims"));
...@@ -125,8 +118,15 @@ class MulOpGrad : public framework::OperatorWithKernel { ...@@ -125,8 +118,15 @@ class MulOpGrad : public framework::OperatorWithKernel {
"The second dimension of Out@GRAD must equal to the second " "The second dimension of Out@GRAD must equal to the second "
"dimension of the second operand."); "dimension of the second operand.");
if (x_grad) x_grad->Resize(x_dims); auto x_grad_name = framework::GradVarName("X");
if (y_grad) y_grad->Resize(y_dims); auto y_grad_name = framework::GradVarName("Y");
if (ctx->HasOutput(x_grad_name)) {
ctx->SetOutputDim(x_grad_name, x_dims);
}
if (ctx->HasOutput(y_grad_name)) {
ctx->SetOutputDim(y_grad_name, y_dims);
}
} }
}; };
......
...@@ -18,61 +18,64 @@ namespace paddle { ...@@ -18,61 +18,64 @@ namespace paddle {
namespace operators { namespace operators {
using Tensor = framework::Tensor; using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
class MultiplexOp : public framework::OperatorWithKernel { class MultiplexOp : public framework::OperatorWithKernel {
public: public:
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE(!ctx.MultiInputVar("X").empty(), PADDLE_ENFORCE(ctx->HasInput("Ids"), "Input(Ids) shouldn't be null.");
"Input(X) should not be null"); PADDLE_ENFORCE(!ctx->Inputs("X").empty(),
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), "MultiInput(X) shouldn't be empty.");
"Output(Out) shouldn't be null."); PADDLE_ENFORCE(ctx->HasOutput("Out"), "Output(Out) shouldn't be null.");
auto ins = ctx.MultiInput<Tensor>("X"); auto ids_dim = ctx->GetInputDim("Ids");
auto *out = ctx.Output<LoDTensor>("Out");
auto num_ins = ins.size();
PADDLE_ENFORCE(num_ins > 2,
"multiplex operator should have more than 2 inputs.");
PADDLE_ENFORCE_EQ(ins[0]->dims().size(), 1,
"The first input must be a index vector.");
auto in_dim = ins[1]->dims();
for (size_t i = 2; i < num_ins; i++) {
auto dim = ins[i]->dims();
PADDLE_ENFORCE( PADDLE_ENFORCE(
in_dim == dim, ids_dim.size() == 2 && ids_dim[1] == 1,
"All the input tensors except the first one must have the same size"); "The index tensor must be a vector with size batchSize x 1.");
auto ins_dims = ctx->GetInputsDim("X");
auto num_ins = ins_dims.size();
PADDLE_ENFORCE(num_ins > 1,
"multiplex operator should have more than "
"one candidate input tensors.");
auto in_dim = ins_dims[0];
PADDLE_ENFORCE(in_dim.size() >= 2,
"The rank of candidate tensors must be not less than 2.");
for (size_t i = 1; i < num_ins; i++) {
auto dim = ins_dims[i];
PADDLE_ENFORCE(in_dim == dim,
"All the candidate tensors must have the same size.");
} }
out->Resize(in_dim); ctx->SetOutputDim("Out", in_dim);
} }
}; };
class MultiplexOpMaker : public framework::OpProtoAndCheckerMaker { class MultiplexOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MultiplexOpMaker(framework::OpProto *proto, MultiplexOpMaker(framework::OpProto* proto,
framework::OpAttrChecker *op_checker) framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input tensors of multiplex operator.").AsDuplicable(); AddInput("Ids", "The index tensor of multiplex operator.");
AddInput("X", "The candidate tensors of multiplex operator.")
.AsDuplicable();
AddOutput("Out", "The output tensor of multiplex operator."); AddOutput("Out", "The output tensor of multiplex operator.");
AddComment(R"DOC(Multiplex operator AddComment(R"DOC(Multiplex operator
Multiplex multiple tensors according to the index provided by the first Multiplex multiple tensors according to the index provided by the index tensor.
input tensor.
ins[0]: the index tensor. Ids: the index tensor.
ins[1:N]: the candidate output tensors. X[0 : N - 1]: the candidate tensors for output (N >= 2).
For each index i from 0 to batchSize - 1, the output is the i-th row of the For each index i from 0 to batchSize - 1, the output is the i-th row of the
the (index[i] + 1)-th tensor. the (Ids[i])-th tensor.
For i-th row of the output tensor: For i-th row of the output tensor:
y[i][j] = x_{k}[i][j], j = 0,1, ... , (x_{1}.width - 1) y[i] = x_{k}[i]
where y is the output tensor. `x_{k}` is the k-th input tensor where y is the output tensor. `x_{k}` is the k-th input tensor
and `k = x{0}[i] + 1`. and `k = Ids[i]`.
)DOC"); )DOC");
} }
}; };
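Note on the revised multiplex semantics: after this change the index lives in a separate `Ids` input and is zero-based, so row `i` of the output is row `i` of candidate `X[Ids[i]]`. A minimal CPU-only sketch of that rule follows. `Matrix` and `Multiplex` are hypothetical names used for illustration, and nested `std::vector`s stand in for tensors; this is not the kernel code from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Row-wise selection: out[i] = candidates[ids[i]][i].
Matrix Multiplex(const std::vector<int32_t>& ids,
                 const std::vector<Matrix>& candidates) {
  // The op requires more than one candidate tensor.
  assert(candidates.size() > 1);
  const std::size_t rows = candidates[0].size();
  // One index per row (batchSize x 1 in the op).
  assert(ids.size() == rows);
  // All candidates must have the same number of rows.
  for (const Matrix& c : candidates) {
    assert(c.size() == rows);
  }
  Matrix out(rows);
  for (std::size_t i = 0; i < rows; ++i) {
    const int32_t k = ids[i];
    // Index must be nonnegative and within the candidate list.
    assert(k >= 0 && static_cast<std::size_t>(k) < candidates.size());
    out[i] = candidates[static_cast<std::size_t>(k)][i];
  }
  return out;
}
```

With two candidates and `ids = {1, 0}`, the first output row comes from the second candidate and the second output row from the first.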
...@@ -82,21 +85,19 @@ class MultiplexGradOp : public framework::OperatorWithKernel { ...@@ -82,21 +85,19 @@ class MultiplexGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE(!ctx.MultiInputVar("X").empty(), PADDLE_ENFORCE(!ctx->Inputs("X").empty(), "Input(X) should not be null.");
"Input(X) should not be null"); PADDLE_ENFORCE(!ctx->Outputs(framework::GradVarName("X")).empty(),
PADDLE_ENFORCE(!ctx.MultiOutputVar(framework::GradVarName("X")).empty(), "Output(X@Grad) should not be null.");
"Output(X@Grad) should not be null"); PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), "Input(Out@GRAD) should not be null.");
"Input(Out@GRAD) shouldn't be null."); std::vector<framework::DDim> d_ins;
auto d_ins = ctx.MultiOutput<LoDTensor>(framework::GradVarName("X")); auto ins = ctx->GetInputsDim("X");
auto ins = ctx.MultiInput<Tensor>("X"); // No need to compute gradient for Input(Ids)
// don't compute gradient for index (ins[0]) for (size_t i = 0; i < ins.size(); i++) {
for (size_t i = 1; i < ins.size(); i++) { d_ins.push_back(ins[i]);
if (d_ins[i]) {
d_ins[i]->Resize(ins[i]->dims());
}
} }
ctx->SetOutputsDim(framework::GradVarName("X"), d_ins);
} }
}; };
......
...@@ -18,27 +18,30 @@ ...@@ -18,27 +18,30 @@
namespace paddle { namespace paddle {
namespace operators { namespace operators {
using Tensor = framework::Tensor;
template <typename Place, typename T> template <typename Place, typename T>
class MultiplexGPUKernel : public framework::OpKernel { class MultiplexGPUKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& ctx) const { void Compute(const framework::ExecutionContext& ctx) const {
auto ins = ctx.MultiInput<framework::Tensor>("X"); auto ins = ctx.MultiInput<Tensor>("X");
auto* out = ctx.Output<framework::LoDTensor>("Out"); auto* ids = ctx.Input<Tensor>("Ids");
auto* out = ctx.Output<Tensor>("Out");
out->mutable_data<T>(ctx.GetPlace()); out->mutable_data<T>(ctx.GetPlace());
auto rows = ins[1]->dims()[0]; auto rows = ins[0]->dims()[0];
auto cols = ins[1]->dims()[1]; auto cols = ins[0]->numel() / rows;
// copy index to cpu // copy index to cpu
framework::Tensor index_t_cpu; Tensor index_t_cpu;
index_t_cpu.CopyFrom<T>(*(ins[0]), platform::CPUPlace()); index_t_cpu.CopyFrom<int32_t>(*ids, platform::CPUPlace());
auto* index = index_t_cpu.data<T>(); auto* index = index_t_cpu.data<int32_t>();
auto stream = reinterpret_cast<const platform::CUDADeviceContext&>( auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context()) ctx.device_context())
.stream(); .stream();
Place place = boost::get<Place>(ctx.GetPlace()); Place place = boost::get<Place>(ctx.GetPlace());
for (auto i = 0; i < rows; i++) { for (auto i = 0; i < rows; i++) {
int k = (int)index[i] + 1; int32_t k = index[i];
PADDLE_ENFORCE_GE(k, 0, "index must be nonnegative.");
PADDLE_ENFORCE_LT(k, ins.size(), PADDLE_ENFORCE_LT(k, ins.size(),
"index exceeds the number of candidate tensors."); "index exceeds the number of candidate tensors.");
memory::Copy(place, out->data<T>() + i * cols, place, memory::Copy(place, out->data<T>() + i * cols, place,
...@@ -51,11 +54,11 @@ template <typename Place, typename T> ...@@ -51,11 +54,11 @@ template <typename Place, typename T>
class MultiplexGradGPUKernel : public framework::OpKernel { class MultiplexGradGPUKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& ctx) const { void Compute(const framework::ExecutionContext& ctx) const {
auto* d_out = ctx.Input<framework::Tensor>(framework::GradVarName("Out")); auto* d_out = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto ins = ctx.MultiInput<framework::Tensor>("X"); auto ins = ctx.MultiInput<Tensor>("X");
auto d_ins = auto* ids = ctx.Input<Tensor>("Ids");
ctx.MultiOutput<framework::Tensor>(framework::GradVarName("X")); auto d_ins = ctx.MultiOutput<Tensor>(framework::GradVarName("X"));
for (size_t i = 1; i < d_ins.size(); i++) { for (size_t i = 0; i < d_ins.size(); i++) {
if (d_ins[i]) { if (d_ins[i]) {
d_ins[i]->mutable_data<T>(ctx.GetPlace()); d_ins[i]->mutable_data<T>(ctx.GetPlace());
auto t = framework::EigenVector<T>::Flatten(*d_ins[i]); auto t = framework::EigenVector<T>::Flatten(*d_ins[i]);
...@@ -63,19 +66,19 @@ class MultiplexGradGPUKernel : public framework::OpKernel { ...@@ -63,19 +66,19 @@ class MultiplexGradGPUKernel : public framework::OpKernel {
} }
} }
auto rows = ins[1]->dims()[0]; auto rows = ins[0]->dims()[0];
auto cols = ins[1]->dims()[1]; auto cols = ins[0]->numel() / rows;
// copy index to cpu // copy index to cpu
framework::Tensor index_t_cpu; Tensor index_t_cpu;
index_t_cpu.CopyFrom<T>(*(ins[0]), platform::CPUPlace()); index_t_cpu.CopyFrom<int32_t>(*ids, platform::CPUPlace());
auto* index = index_t_cpu.data<T>(); auto* index = index_t_cpu.data<int32_t>();
auto stream = reinterpret_cast<const platform::CUDADeviceContext&>( auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context()) ctx.device_context())
.stream(); .stream();
Place place = boost::get<Place>(ctx.GetPlace()); Place place = boost::get<Place>(ctx.GetPlace());
for (auto i = 0; i < rows; i++) { for (auto i = 0; i < rows; i++) {
int k = (int)index[i] + 1; size_t k = static_cast<size_t>(index[i]);
if (d_ins[k]) { if (d_ins[k]) {
memory::Copy(place, d_ins[k]->data<T>() + i * cols, place, memory::Copy(place, d_ins[k]->data<T>() + i * cols, place,
d_out->data<T>() + i * cols, cols * sizeof(T), stream); d_out->data<T>() + i * cols, cols * sizeof(T), stream);
......
...@@ -27,16 +27,18 @@ class MultiplexCPUKernel : public framework::OpKernel { ...@@ -27,16 +27,18 @@ class MultiplexCPUKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& ctx) const { void Compute(const framework::ExecutionContext& ctx) const {
auto ins = ctx.MultiInput<framework::Tensor>("X"); auto ins = ctx.MultiInput<framework::Tensor>("X");
auto* out = ctx.Output<framework::LoDTensor>("Out"); auto ids = ctx.Input<framework::Tensor>("Ids");
auto* out = ctx.Output<framework::Tensor>("Out");
out->mutable_data<T>(ctx.GetPlace()); out->mutable_data<T>(ctx.GetPlace());
auto rows = ins[1]->dims()[0]; auto rows = ins[0]->dims()[0];
auto cols = ins[1]->dims()[1]; auto cols = ins[0]->numel() / rows;
auto* index = ins[0]->data<T>(); auto index = ids->data<int32_t>();
Place place = boost::get<Place>(ctx.GetPlace()); Place place = boost::get<Place>(ctx.GetPlace());
for (auto i = 0; i < rows; i++) { for (auto i = 0; i < rows; i++) {
int k = (int)index[i] + 1; int32_t k = index[i];
PADDLE_ENFORCE_GE(k, 0, "index must be nonnegative.");
PADDLE_ENFORCE_LT(static_cast<size_t>(k), ins.size(), PADDLE_ENFORCE_LT(static_cast<size_t>(k), ins.size(),
"index exceeds the number of candidate tensors."); "index exceeds the number of candidate tensors.");
memory::Copy(place, out->data<T>() + i * cols, place, memory::Copy(place, out->data<T>() + i * cols, place,
...@@ -50,10 +52,11 @@ class MultiplexGradCPUKernel : public framework::OpKernel { ...@@ -50,10 +52,11 @@ class MultiplexGradCPUKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& ctx) const { void Compute(const framework::ExecutionContext& ctx) const {
auto* d_out = ctx.Input<framework::Tensor>(framework::GradVarName("Out")); auto* d_out = ctx.Input<framework::Tensor>(framework::GradVarName("Out"));
auto* ids = ctx.Input<framework::Tensor>("Ids");
auto ins = ctx.MultiInput<framework::Tensor>("X"); auto ins = ctx.MultiInput<framework::Tensor>("X");
auto d_ins = auto d_ins =
ctx.MultiOutput<framework::Tensor>(framework::GradVarName("X")); ctx.MultiOutput<framework::Tensor>(framework::GradVarName("X"));
for (size_t i = 1; i < d_ins.size(); i++) { for (size_t i = 0; i < d_ins.size(); i++) {
if (d_ins[i]) { if (d_ins[i]) {
d_ins[i]->mutable_data<T>(ctx.GetPlace()); d_ins[i]->mutable_data<T>(ctx.GetPlace());
auto t = framework::EigenVector<T>::Flatten(*d_ins[i]); auto t = framework::EigenVector<T>::Flatten(*d_ins[i]);
...@@ -61,12 +64,12 @@ class MultiplexGradCPUKernel : public framework::OpKernel { ...@@ -61,12 +64,12 @@ class MultiplexGradCPUKernel : public framework::OpKernel {
} }
} }
auto rows = ins[1]->dims()[0]; auto rows = ins[0]->dims()[0];
auto cols = ins[1]->dims()[1]; auto cols = ins[0]->numel() / rows;
auto* index = ins[0]->data<T>(); auto* index = ids->data<int32_t>();
Place place = boost::get<Place>(ctx.GetPlace()); Place place = boost::get<Place>(ctx.GetPlace());
for (auto i = 0; i < rows; i++) { for (auto i = 0; i < rows; i++) {
int k = (int)index[i] + 1; size_t k = static_cast<size_t>(index[i]);
if (d_ins[k]) { if (d_ins[k]) {
memory::Copy(place, d_ins[k]->data<T>() + i * cols, place, memory::Copy(place, d_ins[k]->data<T>() + i * cols, place,
d_out->data<T>() + i * cols, cols * sizeof(T)); d_out->data<T>() + i * cols, cols * sizeof(T));
......
...@@ -24,14 +24,13 @@ class PadOp : public framework::OperatorWithKernel { ...@@ -24,14 +24,13 @@ class PadOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) of PadOp should not be null.");
"Input(X) of PadOp should not be null."); PADDLE_ENFORCE(ctx->HasOutput("Out"),
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"),
"Output(Out) of PadOp should not be null."); "Output(Out) of PadOp should not be null.");
auto x_dim = ctx.Input<Tensor>("X")->dims(); auto x_dim = ctx->GetInputDim("X");
auto paddings = Attr<std::vector<int>>("paddings"); auto paddings = ctx->Attrs().Get<std::vector<int>>("paddings");
PADDLE_ENFORCE_EQ(x_dim.size() * 2, int64_t(paddings.size()), PADDLE_ENFORCE_EQ(x_dim.size() * 2, int64_t(paddings.size()),
"Size of paddings should be equal to 2 * dimension size " "Size of paddings should be equal to 2 * dimension size "
"of input tensor."); "of input tensor.");
...@@ -39,19 +38,18 @@ class PadOp : public framework::OperatorWithKernel { ...@@ -39,19 +38,18 @@ class PadOp : public framework::OperatorWithKernel {
for (int i = 0; i < x_dim.size(); ++i) { for (int i = 0; i < x_dim.size(); ++i) {
out_dims[i] = x_dim[i] + paddings[i * 2] + paddings[i * 2 + 1]; out_dims[i] = x_dim[i] + paddings[i * 2] + paddings[i * 2 + 1];
} }
ctx.Output<framework::Tensor>("Out")->Resize( ctx->SetOutputDim("Out", framework::make_ddim(out_dims));
framework::make_ddim(out_dims));
if (out_dims[0] == x_dim[0]) { if (out_dims[0] == x_dim[0]) {
// Only pass LoD when the first dimension is equal between // Only pass LoD when the first dimension is equal between
// output and input. // output and input.
ctx.ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
} }
} }
}; };
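Note on the `PadOp` shape rule above: each output dimension is the input dimension plus the paddings before and after it, with `paddings` laid out as `{before_0, after_0, before_1, after_1, ...}`. A small standalone sketch, assuming plain vectors in place of `DDim` (`InferPadOutShape` is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// out[i] = x[i] + paddings[2 * i] + paddings[2 * i + 1]
std::vector<int64_t> InferPadOutShape(const std::vector<int64_t>& x_dims,
                                      const std::vector<int>& paddings) {
  // Size of paddings must be 2 * rank of the input.
  assert(paddings.size() == x_dims.size() * 2);
  std::vector<int64_t> out_dims(x_dims.size());
  for (std::size_t i = 0; i < x_dims.size(); ++i) {
    out_dims[i] = x_dims[i] + paddings[i * 2] + paddings[i * 2 + 1];
  }
  return out_dims;
}
```

For a 2 x 3 input with `paddings = {0, 1, 1, 2}` the result is 3 x 6. The first dimension changed, so under the rule in the diff the LoD would not be shared with the output in that case.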
class PadOpMaker : public framework::OpProtoAndCheckerMaker { class PadOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PadOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) PadOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"The input of pad op. " "The input of pad op. "
...@@ -101,14 +99,14 @@ class PadOpGrad : public framework::OperatorWithKernel { ...@@ -101,14 +99,14 @@ class PadOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null"); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto *x_g = ctx.Output<framework::Tensor>(framework::GradVarName("X")); auto x_grad_name = framework::GradVarName("X");
if (x_g != nullptr) { if (ctx->HasOutput(x_grad_name)) {
x_g->Resize(x_dims); ctx->SetOutputDim(x_grad_name, x_dims);
} }
} }
}; };
......
...@@ -27,32 +27,31 @@ class PoolOp : public framework::OperatorWithKernel { ...@@ -27,32 +27,31 @@ class PoolOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"X(Input) of Pooling should not be null."); "X(Input) of Pooling should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Out(Output) of Pooling should not be null."); "Out(Output) of Pooling should not be null.");
auto in_x = ctx.Input<Tensor>("X"); auto in_x_dims = ctx->GetInputDim("X");
auto out = ctx.Output<Tensor>("Out");
bool global_pooling = Attr<bool>("globalPooling"); std::string pooling_type = ctx->Attrs().Get<std::string>("poolingType");
std::string pooling_type = Attr<std::string>("poolingType"); std::vector<int> ksize = ctx->Attrs().Get<std::vector<int>>("ksize");
std::vector<int> ksize = Attr<std::vector<int>>("ksize"); std::vector<int> strides = ctx->Attrs().Get<std::vector<int>>("strides");
std::vector<int> strides = Attr<std::vector<int>>("strides"); std::vector<int> paddings = ctx->Attrs().Get<std::vector<int>>("paddings");
std::vector<int> paddings = Attr<std::vector<int>>("paddings");
PADDLE_ENFORCE(pooling_type == "max" || pooling_type == "avg", PADDLE_ENFORCE(pooling_type == "max" || pooling_type == "avg",
"pooling_type should be 'max' or 'avg'"); "pooling_type should be 'max' or 'avg'");
PADDLE_ENFORCE(in_x->dims().size() == 4 || in_x->dims().size() == 5, PADDLE_ENFORCE(in_x_dims.size() == 4 || in_x_dims.size() == 5,
"Pooling intput should be 4-D or 5-D"); "Pooling intput should be 4-D or 5-D");
if (global_pooling) { if (ctx->Attrs().Get<bool>("globalPooling")) {
ksize.resize(static_cast<size_t>(in_x->dims().size()) - 2); ksize.resize(static_cast<size_t>(in_x_dims.size()) - 2);
for (size_t i = 0; i < ksize.size(); ++i) for (size_t i = 0; i < ksize.size(); ++i)
ksize[i] = static_cast<int>(in_x->dims()[i + 2]); ksize[i] = static_cast<int>(in_x_dims[i + 2]);
} }
PADDLE_ENFORCE(in_x->dims().size() == static_cast<size_t>(ksize.size() + 2), PADDLE_ENFORCE(in_x_dims.size() - ksize.size() == 2,
"Input size and Pooling size should be consistent."); "Input size and Pooling size should be consistent.");
PADDLE_ENFORCE(ksize.size() == 2 || ksize.size() == 3, PADDLE_ENFORCE(ksize.size() == 2 || ksize.size() == 3,
"Pooling size should be 2 elements. or 3 elements."); "Pooling size should be 2 elements. or 3 elements.");
...@@ -61,12 +60,12 @@ class PoolOp : public framework::OperatorWithKernel { ...@@ -61,12 +60,12 @@ class PoolOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_EQ(ksize.size(), paddings.size(), PADDLE_ENFORCE_EQ(ksize.size(), paddings.size(),
"paddings size and pooling size should be the same."); "paddings size and pooling size should be the same.");
std::vector<int64_t> output_shape({in_x->dims()[0], in_x->dims()[1]}); std::vector<int64_t> output_shape({in_x_dims[0], in_x_dims[1]});
for (size_t i = 0; i < ksize.size(); ++i) { for (size_t i = 0; i < ksize.size(); ++i) {
output_shape.push_back(OutputSizePool(in_x->dims()[i + 2], ksize[i], output_shape.push_back(
paddings[i], strides[i])); OutputSizePool(in_x_dims[i + 2], ksize[i], paddings[i], strides[i]));
} }
out->Resize(framework::make_ddim(output_shape)); ctx->SetOutputDim("Out", framework::make_ddim(output_shape));
} }
}; };
...@@ -75,17 +74,13 @@ class PoolOpGrad : public framework::OperatorWithKernel { ...@@ -75,17 +74,13 @@ class PoolOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"X(Input) of Pooling should not be null."); "X(Input) of Pooling should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("X")),
"Out(Output) of Pooling should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.Output<Tensor>(framework::GradVarName("X")),
"Input@Grad of Pooling should not be null."); "Input@Grad of Pooling should not be null.");
auto in = ctx.Input<Tensor>("X"); ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
auto d_in = ctx.Output<Tensor>(framework::GradVarName("X"));
d_in->Resize(in->dims());
} }
}; };
......
...@@ -26,19 +26,14 @@ class PReluOp : public framework::OperatorWithKernel { ...@@ -26,19 +26,14 @@ class PReluOp : public framework::OperatorWithKernel {
: OperatorWithKernel(type, inputs, outputs, attrs) {} : OperatorWithKernel(type, inputs, outputs, attrs) {}
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null"); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
auto *in = ctx.Input<framework::Tensor>("X"); PADDLE_ENFORCE(ctx->HasInput("Alpha"), "Input(Alpha) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Alpha"), PADDLE_ENFORCE(product(ctx->GetInputDim("Alpha")) == 1,
"Input(Alpha) should not be null"); "Size of weight Alpha must be one.");
auto *alpha = ctx.Input<framework::Tensor>("Alpha"); PADDLE_ENFORCE(ctx->HasOutput("Out"), "Output(Out) should not be null");
PADDLE_ENFORCE(alpha->numel() == 1, "Size of weight Alpha must be one."); ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
ctx->ShareLoD("X", /*->*/ "Out");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"),
"Output(Out) should not be null");
auto *out = ctx.Output<framework::Tensor>("Out");
out->Resize(in->dims());
ctx.ShareLoD("X", /*->*/ "Out");
} }
}; };
...@@ -68,19 +63,13 @@ class PReluGradOp : public framework::OperatorWithKernel { ...@@ -68,19 +63,13 @@ class PReluGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) must not be null."); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) must not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto *dx = ctx.Output<framework::Tensor>(framework::GradVarName("X")); ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
auto *x = ctx.Input<framework::Tensor>("X"); ctx->SetOutputDim(framework::GradVarName("Alpha"),
ctx->GetInputDim("Alpha"));
auto *dalpha =
ctx.Output<framework::Tensor>(framework::GradVarName("Alpha"));
auto *alpha = ctx.Input<framework::Tensor>("Alpha");
dx->Resize(x->dims());
dalpha->Resize(alpha->dims());
} }
}; };
......
...@@ -25,22 +25,21 @@ class RankLossOp : public framework::OperatorWithKernel { ...@@ -25,22 +25,21 @@ class RankLossOp : public framework::OperatorWithKernel {
: OperatorWithKernel(type, inputs, outputs, attrs) {} : OperatorWithKernel(type, inputs, outputs, attrs) {}
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
// input check // input check
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Label"), PADDLE_ENFORCE(ctx->HasInput("Label"), "Input(Label) shouldn't be null");
"Input(Label) shouldn't be null"); PADDLE_ENFORCE(ctx->HasInput("Left"), "Input(Left) shouldn't be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Left"), PADDLE_ENFORCE(ctx->HasInput("Right"), "Input(Right) shouldn't be null");
"Input(Left) shouldn't be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Right"), auto label_dims = ctx->GetInputDim("Label");
"Input(Right) shouldn't be null"); auto left_dims = ctx->GetInputDim("Left");
auto label_dims = ctx.Input<framework::Tensor>("Label")->dims(); auto right_dims = ctx->GetInputDim("Right");
auto left_dims = ctx.Input<framework::Tensor>("Left")->dims();
auto right_dims = ctx.Input<framework::Tensor>("Right")->dims();
PADDLE_ENFORCE((label_dims == left_dims) && (left_dims == right_dims), PADDLE_ENFORCE((label_dims == left_dims) && (left_dims == right_dims),
"All inputs must have the same size"); "All inputs must have the same size");
PADDLE_ENFORCE((label_dims.size() == 2) && (label_dims[1] == 1), PADDLE_ENFORCE((label_dims.size() == 2) && (label_dims[1] == 1),
"All inputs must be row vector with size batch_size x 1."); "All inputs must be row vector with size batch_size x 1.");
ctx.Output<framework::Tensor>("Out")->Resize(label_dims); ctx->SetOutputDim("Out", label_dims);
} }
}; };
...@@ -91,25 +90,22 @@ class RankLossGradOp : public framework::OperatorWithKernel { ...@@ -91,25 +90,22 @@ class RankLossGradOp : public framework::OperatorWithKernel {
: OperatorWithKernel(type, inputs, outputs, attrs) {} : OperatorWithKernel(type, inputs, outputs, attrs) {}
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Label"), PADDLE_ENFORCE(ctx->HasInput("Label"), "Input(Label) shouldn't be null.");
"Input(Label) shouldn't be null."); PADDLE_ENFORCE(ctx->HasInput("Left"), "Input(Left) shouldn't be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Left"), PADDLE_ENFORCE(ctx->HasInput("Right"), "Input(Right) shouldn't be null.");
"Input(Left) shouldn't be null."); PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Right"),
"Input(Right) shouldn't be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")),
"Input(Out@GRAD) shouldn't be null."); "Input(Out@GRAD) shouldn't be null.");
auto dims = ctx.Input<framework::Tensor>("Left")->dims(); auto dims = ctx->GetInputDim("Left");
auto *left_grad = auto left_grad_name = framework::GradVarName("Left");
ctx.Output<framework::Tensor>(framework::GradVarName("Left")); auto right_grad_name = framework::GradVarName("Right");
auto *right_grad =
ctx.Output<framework::Tensor>(framework::GradVarName("Right")); if (ctx->HasOutput(left_grad_name)) {
if (left_grad) { ctx->SetOutputDim(left_grad_name, dims);
left_grad->Resize(dims);
} }
if (right_grad) {
right_grad->Resize(dims); if (ctx->HasOutput(right_grad_name)) {
ctx->SetOutputDim(right_grad_name, dims);
} }
} }
}; };
......
...@@ -26,14 +26,14 @@ class ReshapeOp : public framework::OperatorWithKernel { ...@@ -26,14 +26,14 @@ class ReshapeOp : public framework::OperatorWithKernel {
: OperatorWithKernel(type, inputs, outputs, attrs) {} : OperatorWithKernel(type, inputs, outputs, attrs) {}
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
// input check // input check
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of ReshapeOp should not be null."); "Input(X) of ReshapeOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of ReshapeOp should not be null."); "Output(Out) of ReshapeOp should not be null.");
auto shape = ctx.Attr<std::vector<int>>("shape"); auto shape = ctx->Attrs().Get<std::vector<int>>("shape");
PADDLE_ENFORCE(shape.size() > 0, "Attr(shape) shouldn't be empty."); PADDLE_ENFORCE(shape.size() > 0, "Attr(shape) shouldn't be empty.");
for (auto dim : shape) { for (auto dim : shape) {
PADDLE_ENFORCE(dim > 0, "Each dimension of shape must be positive."); PADDLE_ENFORCE(dim > 0, "Each dimension of shape must be positive.");
...@@ -41,8 +41,8 @@ class ReshapeOp : public framework::OperatorWithKernel { ...@@ -41,8 +41,8 @@ class ReshapeOp : public framework::OperatorWithKernel {
// capacity check // capacity check
int64_t capacity = int64_t capacity =
std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int>()); std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int>());
auto *in = ctx.Input<framework::Tensor>("X"); auto x_dims = ctx->GetInputDim("X");
int64_t in_size = framework::product(in->dims()); int64_t in_size = framework::product(x_dims);
PADDLE_ENFORCE_EQ(capacity, in_size, PADDLE_ENFORCE_EQ(capacity, in_size,
"The size of Input(X) mismatches with Attr(shape)."); "The size of Input(X) mismatches with Attr(shape).");
// resize output // resize output
...@@ -50,11 +50,11 @@ class ReshapeOp : public framework::OperatorWithKernel { ...@@ -50,11 +50,11 @@ class ReshapeOp : public framework::OperatorWithKernel {
std::transform(shape.begin(), shape.end(), shape_int64.begin(), std::transform(shape.begin(), shape.end(), shape_int64.begin(),
[](int a) { return static_cast<int64_t>(a); }); [](int a) { return static_cast<int64_t>(a); });
auto out_dims = framework::make_ddim(shape_int64); auto out_dims = framework::make_ddim(shape_int64);
ctx.Output<framework::Tensor>("Out")->Resize(out_dims); ctx->SetOutputDim("Out", out_dims);
if (shape[0] == in->dims()[0]) { if (shape[0] == x_dims[0]) {
// Only pass LoD when the first dimension is equal between // Only pass LoD when the first dimension is equal between
// output and input. // output and input.
ctx.ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
} }
} }
}; };
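The capacity check in `ReshapeOp::InferShape` can be sketched outside the framework as below. This is a minimal illustration, not Paddle code: `ShapeCapacity` and `IsValidReshape` are hypothetical names, and a plain `std::vector<int>` stands in for `Attr(shape)`. Note that `std::accumulate` takes its accumulator type from the initial value, so the `1` in the hunk above makes the product accumulate as `int`; the sketch starts from `int64_t{1}` instead, on the assumption that large shapes should not overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Paddle): number of elements implied by a
// shape. The accumulator is int64_t because the initial value is int64_t.
inline int64_t ShapeCapacity(const std::vector<int>& shape) {
  return std::accumulate(shape.begin(), shape.end(), int64_t{1},
                         [](int64_t acc, int dim) { return acc * dim; });
}

// Hypothetical helper: true when `shape` is a valid reshape target for a
// tensor holding `in_size` elements. Mirrors the three checks in the hunk:
// non-empty shape, every dimension positive, equal element count.
inline bool IsValidReshape(const std::vector<int>& shape, int64_t in_size) {
  if (shape.empty()) return false;
  for (int dim : shape) {
    if (dim <= 0) return false;
  }
  return ShapeCapacity(shape) == in_size;
}
```

For example, a 24-element input accepts `{2, 3, 4}` but rejects `{2, 3}`.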
...@@ -94,13 +94,11 @@ class ReshapeGradOp : public framework::OperatorWithKernel { ...@@ -94,13 +94,11 @@ class ReshapeGradOp : public framework::OperatorWithKernel {
: OperatorWithKernel(type, inputs, outputs, attrs) {} : OperatorWithKernel(type, inputs, outputs, attrs) {}
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) shouldn't be null."); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) shouldn't be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) shouldn't be null."); "Input(Out@GRAD) shouldn't be null.");
auto dims = ctx.Input<framework::Tensor>("X")->dims(); ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
auto *d_in = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
d_in->Resize(dims);
} }
}; };
......
...@@ -24,16 +24,16 @@ class RowwiseAddOp : public framework::OperatorWithKernel { ...@@ -24,16 +24,16 @@ class RowwiseAddOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of RowwiseAddOp should not be null."); "Input(X) of RowwiseAddOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("b"), PADDLE_ENFORCE(ctx->HasInput("b"),
"Input(b) of RowwiseAddOp should not be null."); "Input(b) of RowwiseAddOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of RowwiseAddOp should not be null."); "Output(Out) of RowwiseAddOp should not be null.");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto b_dims = ctx.Input<Tensor>("b")->dims(); auto b_dims = ctx->GetInputDim("b");
PADDLE_ENFORCE_GT( PADDLE_ENFORCE_GT(
x_dims.size(), b_dims.size(), x_dims.size(), b_dims.size(),
"The rank of input `X` must be larger than the one of input `b`."); "The rank of input `X` must be larger than the one of input `b`.");
...@@ -43,16 +43,17 @@ class RowwiseAddOp : public framework::OperatorWithKernel { ...@@ -43,16 +43,17 @@ class RowwiseAddOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims, framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims,
"The width of two operands must be same"); "The width of two operands must be same");
PADDLE_ENFORCE_EQ(ctx.OutputSize("Out"), 1, "The output size must be 1"); PADDLE_ENFORCE_EQ(ctx->Outputs("Out").size(), 1,
ctx.Output<framework::Tensor>("Out")->Resize(x_dims); "The output size must be 1");
ctx.ShareLoD("X", /*->*/ "Out"); ctx->SetOutputDim("Out", x_dims);
ctx->ShareLoD("X", /*->*/ "Out");
} }
}; };
class RowwiseAddOpMaker : public framework::OpProtoAndCheckerMaker { class RowwiseAddOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
RowwiseAddOpMaker(framework::OpProto *proto, RowwiseAddOpMaker(framework::OpProto* proto,
framework::OpAttrChecker *op_checker) framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The left input of row-wise add op, must be matrix"); AddInput("X", "The left input of row-wise add op, must be matrix");
AddInput("b", "The right input of row-wise add op, must be vector"); AddInput("b", "The right input of row-wise add op, must be vector");
...@@ -69,25 +70,29 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel { ...@@ -69,25 +70,29 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "X should not be null"); PADDLE_ENFORCE(ctx->HasInput("X"), "X should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("b"), "b should not be null"); PADDLE_ENFORCE(ctx->HasInput("b"), "b should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto b_dims = ctx.Input<Tensor>("b")->dims(); auto b_dims = ctx->GetInputDim("b");
PADDLE_ENFORCE_GT( PADDLE_ENFORCE_GT(
x_dims.size(), b_dims.size(), x_dims.size(), b_dims.size(),
"The rank of input `X` must be larger than the one of input `b`."); "The rank of input `X` must be larger than the one of input `b`.");
int num_col_dims = x_dims.size() - b_dims.size(); int64_t num_col_dims = x_dims.size() - b_dims.size();
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims, framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims,
"The width of two operands must be same"); "The width of two operands must be same");
auto *dx = ctx.Output<framework::Tensor>(framework::GradVarName("X")); auto x_grad_name = framework::GradVarName("X");
auto *db = ctx.Output<framework::Tensor>(framework::GradVarName("b")); auto b_grad_name = framework::GradVarName("b");
if (dx) dx->Resize(x_dims); if (ctx->HasOutput(x_grad_name)) {
if (db) db->Resize(b_dims); ctx->SetOutputDim(x_grad_name, x_dims);
}
if (ctx->HasOutput(b_grad_name)) {
ctx->SetOutputDim(b_grad_name, b_dims);
}
} }
}; };
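The rank and width checks shared by `RowwiseAddOp` and `RowwiseAddGradOp` amount to "the trailing dimensions of `X` must equal the dimensions of `b`". A rough standalone sketch follows, assuming a `std::vector<int64_t>` as a stand-in for `framework::DDim`; `Dims` and `TrailingDimsMatch` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Dims = std::vector<int64_t>;  // stand-in for framework::DDim (assumption)

// Hypothetical helper: mirrors
//   rank(X) > rank(b)  and
//   slice_ddim(x_dims, num_col_dims, x_dims.size()) == b_dims
inline bool TrailingDimsMatch(const Dims& x_dims, const Dims& b_dims) {
  if (x_dims.size() <= b_dims.size()) return false;  // rank(X) must be larger
  const std::size_t num_col_dims = x_dims.size() - b_dims.size();
  // Compare b_dims against the last rank(b) entries of x_dims.
  return std::equal(b_dims.begin(), b_dims.end(),
                    x_dims.begin() + num_col_dims);
}
```

With `X` of shape `[32, 4, 5]`, a bias of shape `[4, 5]` or `[5]` passes, while `[4]` does not.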
......
...@@ -26,16 +26,13 @@ class ScaleOp : public framework::OperatorWithKernel { ...@@ -26,16 +26,13 @@ class ScaleOp : public framework::OperatorWithKernel {
: OperatorWithKernel(type, inputs, outputs, attrs) {} : OperatorWithKernel(type, inputs, outputs, attrs) {}
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of ScaleOp should not be null."); "Input(X) of ScaleOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of ScaleOp should not be null."); "Output(Out) of ScaleOp should not be null.");
ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
auto *in = ctx.Input<framework::Tensor>("X"); ctx->ShareLoD("X", /*->*/ "Out");
auto *out = ctx.Output<framework::Tensor>("Out");
out->Resize(in->dims());
ctx.ShareLoD("X", /*->*/ "Out");
} }
}; };
......
...@@ -23,29 +23,30 @@ class ScatterOp : public framework::OperatorWithKernel { ...@@ -23,29 +23,30 @@ class ScatterOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Ref"), PADDLE_ENFORCE(ctx->HasInput("Ref"),
"Input(Ref) of ScatterOp should not be null."); "Input(Ref) of ScatterOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Index"), PADDLE_ENFORCE(ctx->HasInput("Index"),
"Input(Index) of ScatterOp should not be null."); "Input(Index) of ScatterOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Updates"), PADDLE_ENFORCE(ctx->HasInput("Updates"),
"Input(Updates) of ScatterOp should not be null."); "Input(Updates) of ScatterOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of ScatterOp should not be null."); "Output(Out) of ScatterOp should not be null.");
PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("Index")->dims().size(), 1, auto updates_dims = ctx->GetInputDim("Updates");
auto ref_dims = ctx->GetInputDim("Ref");
PADDLE_ENFORCE_EQ(ctx->GetInputDim("Index").size(), 1,
"Update Index should be 1-D."); "Update Index should be 1-D.");
PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("Ref")->dims().size(), PADDLE_ENFORCE_EQ(ref_dims.size(), updates_dims.size(),
ctx.Input<Tensor>("Updates")->dims().size(),
"Reference and Updates should have the same shape size"); "Reference and Updates should have the same shape size");
PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("Updates")->dims()[0], PADDLE_ENFORCE_EQ(ctx->GetInputDim("Updates")[0],
ctx.Input<Tensor>("Index")->dims()[0], ctx->GetInputDim("Index")[0],
"Updates and Index should have same batch-size."); "Updates and Index should have same batch-size.");
framework::DDim data_dim(ctx.Input<Tensor>("Updates")->dims()); framework::DDim data_dim(updates_dims);
for (int i = 1; i < data_dim.size(); ++i) for (int i = 1; i < data_dim.size(); ++i) {
PADDLE_ENFORCE_EQ(data_dim[i], ctx.Input<Tensor>("Updates")->dims()[i]); PADDLE_ENFORCE_EQ(data_dim[i], updates_dims[i]);
ctx.Output<framework::Tensor>("Out")->Resize( }
ctx.Input<Tensor>("Ref")->dims()); ctx->SetOutputDim("Out", ref_dims);
} }
}; };
...@@ -54,22 +55,17 @@ class ScatterGradOp : public framework::OperatorWithKernel { ...@@ -54,22 +55,17 @@ class ScatterGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
auto *dUpdates = ctx->SetOutputDim(framework::GradVarName("Updates"),
ctx.Output<framework::Tensor>(framework::GradVarName("Updates")); ctx->GetInputDim("Updates"));
auto *Updates = ctx.Input<Tensor>("Updates"); ctx->SetOutputDim(framework::GradVarName("Ref"), ctx->GetInputDim("Ref"));
auto *dRef = ctx.Output<framework::Tensor>(framework::GradVarName("Ref"));
auto *Ref = ctx.Input<Tensor>("Ref");
dRef->Resize(Ref->dims());
dUpdates->Resize(Updates->dims());
} }
}; };
class ScatterOpMaker : public framework::OpProtoAndCheckerMaker { class ScatterOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ScatterOpMaker(framework::OpProto *proto, ScatterOpMaker(framework::OpProto* proto,
framework::OpAttrChecker *op_checker) framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Ref", "The source input of scatter op"); AddInput("Ref", "The source input of scatter op");
AddInput("Index", AddInput("Index",
...@@ -84,6 +80,7 @@ Out[Index] = Ref[Index] + Updates ...@@ -84,6 +80,7 @@ Out[Index] = Ref[Index] + Updates
)DOC"); )DOC");
} }
}; };
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
......
...@@ -22,23 +22,12 @@ class SequencePoolOp : public framework::OperatorWithKernel { ...@@ -22,23 +22,12 @@ class SequencePoolOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of SequencePoolOp should not be null."); "Input(X) of SequenceAvgPoolOp should not be null.");
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(ctx->HasOutput("Out"),
ctx.OutputVar("Out"), "Output(Out) of SequenceAvgPoolOp should not be null.");
"Output(Out) of SequencePoolOp should not be null."); ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
auto* x = ctx.Input<framework::LoDTensor>("X");
auto dims = x->dims();
auto lod = x->lod();
PADDLE_ENFORCE_EQ(lod.size(), 1UL, "Only support one level sequence now.");
PADDLE_ENFORCE_GE(
dims[0],
/*batch size = */ static_cast<int64_t>(lod[0].size() - 1),
"The first dimension of Input(X) must be large than batch size.");
dims[0] = lod[0].size() - 1;
ctx.Output<framework::LoDTensor>("Out")->Resize({dims});
} }
}; };
...@@ -85,22 +74,18 @@ class SequencePoolGradOp : public framework::OperatorWithKernel { ...@@ -85,22 +74,18 @@ class SequencePoolGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Gradient of Out should not be null."); "Gradient of Out should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"), "The input X should not be null.");
"The input X should not be null."); auto og_dims = ctx->GetInputDim(framework::GradVarName("Out"));
auto og_dims = auto x_dims = ctx->GetInputDim("X");
ctx.Input<framework::LoDTensor>(framework::GradVarName("Out"))->dims();
auto x_dims = ctx.Input<framework::LoDTensor>("X")->dims();
PADDLE_ENFORCE_EQ(og_dims.size(), x_dims.size(), PADDLE_ENFORCE_EQ(og_dims.size(), x_dims.size(),
"The rank of output grad must equal to Input(X)."); "The rank of output grad must equal to Input(X).");
for (int64_t i = 1; i < og_dims.size(); ++i) { for (int64_t i = 1; i < og_dims.size(); ++i) {
PADDLE_ENFORCE_EQ(og_dims[i], x_dims[i], "The dimension mismatch."); PADDLE_ENFORCE_EQ(og_dims[i], x_dims[i], "The dimension mismatch.");
} }
auto* x_grad = ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
ctx.Output<framework::LoDTensor>(framework::GradVarName("X"));
x_grad->Resize(x_dims);
} }
}; };
......
...@@ -46,16 +46,27 @@ class SequencePoolKernel : public framework::OpKernel { ...@@ -46,16 +46,27 @@ class SequencePoolKernel : public framework::OpKernel {
int strategy = context.Attr<int>("strategy"); int strategy = context.Attr<int>("strategy");
auto dims = in->dims(); auto dims = in->dims();
auto lod = in->lod()[0]; auto lod = in->lod();
int64_t w = in->numel() / dims[0]; int64_t w = in->numel() / dims[0];
// InferShape by lod
PADDLE_ENFORCE_EQ(lod.size(), 1UL, "Only support one level sequence now.");
PADDLE_ENFORCE_GE(
dims[0],
/*batch size = */ static_cast<int64_t>(lod[0].size() - 1),
"The first dimension of Input(X) must be no less than the batch size.");
dims[0] = lod[0].size() - 1;
out->Resize({dims});
auto lod_level_0 = lod[0];
out->mutable_data<T>(context.GetPlace()); out->mutable_data<T>(context.GetPlace());
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
for (int i = 0; i < static_cast<int>(lod.size()) - 1; ++i) { for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) {
Tensor in_t = Tensor in_t = in->Slice<T>(static_cast<int>(lod_level_0[i]),
in->Slice<T>(static_cast<int>(lod[i]), static_cast<int>(lod[i + 1])); static_cast<int>(lod_level_0[i + 1]));
Tensor out_t = out->Slice<T>(i, i + 1); Tensor out_t = out->Slice<T>(i, i + 1);
int64_t h = static_cast<int64_t>(lod[i + 1] - lod[i]); int64_t h = static_cast<int64_t>(lod_level_0[i + 1] - lod_level_0[i]);
auto in_e = EigenMatrix<T>::From(in_t, framework::make_ddim({h, w})); auto in_e = EigenMatrix<T>::From(in_t, framework::make_ddim({h, w}));
auto out_e = EigenVector<T>::Flatten(out_t); auto out_e = EigenVector<T>::Flatten(out_t);
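Since the LoD is only known at run time, this hunk moves the output resize into the kernel: the output has one row per sequence, that is `lod[0].size() - 1` rows. The sketch below illustrates that idea with plain vectors. It is an assumption-laden stand-in rather than the kernel itself: `SequenceAvgPool` is a hypothetical name, it implements only averaging (the real kernel selects behavior through the `strategy` attribute), and leaving empty sequences as zeros is a choice made here, not something the source specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: `data` is a row-major [rows x width] matrix and
// `offsets` plays the role of the level-0 LoD, so sequence i covers rows
// [offsets[i], offsets[i + 1]).
inline std::vector<float> SequenceAvgPool(
    const std::vector<float>& data, std::size_t width,
    const std::vector<std::size_t>& offsets) {
  assert(!offsets.empty());
  assert(width > 0 && data.size() % width == 0);
  assert(offsets.back() * width <= data.size());

  // Output shape is inferred from the LoD: one row per sequence.
  const std::size_t batch = offsets.size() - 1;
  std::vector<float> out(batch * width, 0.0f);

  for (std::size_t i = 0; i < batch; ++i) {
    const std::size_t begin = offsets[i];
    const std::size_t end = offsets[i + 1];
    const std::size_t h = end - begin;  // number of rows in sequence i
    if (h == 0) continue;               // assumption: empty sequence stays zero
    for (std::size_t r = begin; r < end; ++r) {
      for (std::size_t c = 0; c < width; ++c) {
        out[i * width + c] += data[r * width + c];
      }
    }
    for (std::size_t c = 0; c < width; ++c) {
      out[i * width + c] /= static_cast<float>(h);
    }
  }
  return out;
}
```

Three input rows with offsets `{0, 2, 3}` therefore produce two output rows: the mean of rows 0 and 1, and row 2 unchanged.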
......
...@@ -22,19 +22,18 @@ class SGDOp : public framework::OperatorWithKernel { ...@@ -22,19 +22,18 @@ class SGDOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("param"), PADDLE_ENFORCE(ctx->HasInput("param"),
"Input(param) of SGDOp should not be null."); "Input(param) of SGDOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("grad"), PADDLE_ENFORCE(ctx->HasInput("grad"),
"Input(grad) of SGDOp should not be null."); "Input(grad) of SGDOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("param_out"), PADDLE_ENFORCE(ctx->HasOutput("param_out"),
"Output(param_out) of SGDOp should not be null."); "Output(param_out) of SGDOp should not be null.");
PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("param")->dims(), auto param_dim = ctx->GetInputDim("param");
ctx.Input<Tensor>("grad")->dims(), PADDLE_ENFORCE_EQ(param_dim, ctx->GetInputDim("grad"),
"Two input of SGD Op's dimension must be same."); "Two input of SGD Op's dimension must be same.");
ctx.Output<framework::Tensor>("param_out") ctx->SetOutputDim("param_out", param_dim);
->Resize(ctx.Input<Tensor>("param")->dims());
} }
}; };
......
...@@ -22,33 +22,28 @@ class SmoothL1LossOp : public framework::OperatorWithKernel { ...@@ -22,33 +22,28 @@ class SmoothL1LossOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "X must be initialized."); PADDLE_ENFORCE(ctx->HasInput("X"), "X must be initialized.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Y must be initialized."); PADDLE_ENFORCE(ctx->HasInput("Y"), "Y must be initialized.");
auto* x = ctx.Input<framework::Tensor>("X"); auto x_dims = ctx->GetInputDim("X");
auto* y = ctx.Input<framework::Tensor>("Y"); auto y_dims = ctx->GetInputDim("Y");
PADDLE_ENFORCE_EQ(x->dims(), y->dims(), PADDLE_ENFORCE_EQ(x_dims, y_dims, "The shape of X and Y must be the same.");
"The shape of X and Y must be the same."); PADDLE_ENFORCE_GE(x_dims.size(), 2,
PADDLE_ENFORCE_GE(x->dims().size(), 2,
"The tensor rank of X must be at least 2."); "The tensor rank of X must be at least 2.");
auto* inside_weight = ctx.Input<framework::Tensor>("InsideWeight"); if (ctx->HasInput("InsideWeight")) {
if (inside_weight) { PADDLE_ENFORCE(ctx->HasInput("OutsideWeight"),
auto* outside_weight = ctx.Input<framework::Tensor>("OutsideWeight");
PADDLE_ENFORCE_NOT_NULL(outside_weight,
"If weights are provided, must specify both " "If weights are provided, must specify both "
"inside and outside weights."); "inside and outside weights.");
PADDLE_ENFORCE_EQ(inside_weight->dims(), x->dims(), PADDLE_ENFORCE_EQ(ctx->GetInputDim("InsideWeight"), x_dims,
"The shape of InsideWeight must be same as X."); "The shape of InsideWeight must be same as X.");
PADDLE_ENFORCE_EQ(outside_weight->dims(), x->dims(), PADDLE_ENFORCE_EQ(ctx->GetInputDim("OutsideWeight"), x_dims,
"The shape of OutsideWeight must be same as X."); "The shape of OutsideWeight must be same as X.");
} }
auto* diff = ctx.Output<framework::Tensor>("Diff"); ctx->SetOutputDim("Diff", x_dims);
auto* out = ctx.Output<framework::Tensor>("Out");
diff->Resize(x->dims());
// loss is a two-rank tensor // loss is a two-rank tensor
out->Resize({x->dims()[0], 1}); ctx->SetOutputDim("Out", {x_dims[0], 1});
} }
}; };
...@@ -99,12 +94,9 @@ class SmoothL1LossGradOp : public framework::OperatorWithKernel { ...@@ -99,12 +94,9 @@ class SmoothL1LossGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
auto in_dims = ctx.Input<framework::Tensor>("X")->dims(); auto in_dims = ctx->GetInputDim("X");
auto out_dims = auto out_dims = ctx->GetInputDim(framework::GradVarName("Out"));
ctx.Input<framework::Tensor>(framework::GradVarName("Out"))->dims();
auto* x_grad = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
auto* y_grad = ctx.Output<framework::Tensor>(framework::GradVarName("Y"));
PADDLE_ENFORCE_GE(out_dims.size(), 2, PADDLE_ENFORCE_GE(out_dims.size(), 2,
"The tensor rank of Input(Out@Grad) should be 2."); "The tensor rank of Input(Out@Grad) should be 2.");
...@@ -114,8 +106,14 @@ class SmoothL1LossGradOp : public framework::OperatorWithKernel { ...@@ -114,8 +106,14 @@ class SmoothL1LossGradOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_EQ(out_dims[1], 1, PADDLE_ENFORCE_EQ(out_dims[1], 1,
"The 2nd dimension of Input(Out@Grad) must be 1."); "The 2nd dimension of Input(Out@Grad) must be 1.");
if (x_grad) x_grad->Resize(in_dims); auto x_grad_name = framework::GradVarName("X");
if (y_grad) y_grad->Resize(in_dims); auto y_grad_name = framework::GradVarName("Y");
if (ctx->HasOutput(x_grad_name)) {
ctx->SetOutputDim(x_grad_name, in_dims);
}
if (ctx->HasOutput(y_grad_name)) {
ctx->SetOutputDim(y_grad_name, in_dims);
}
} }
}; };
......
...@@ -22,22 +22,23 @@ class SoftmaxOp : public framework::OperatorWithKernel { ...@@ -22,22 +22,23 @@ class SoftmaxOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of SoftmaxOp should not be null."); "Input(X) of SoftmaxOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Y"), PADDLE_ENFORCE(ctx->HasOutput("Y"),
"Output(Y) of SoftmaxOp should not be null."); "Output(Y) of SoftmaxOp should not be null.");
PADDLE_ENFORCE(ctx.Input<Tensor>("X")->dims().size() == 2UL, auto x_dims = ctx->GetInputDim("X");
PADDLE_ENFORCE(x_dims.size() == 2UL,
"The input of softmax op must be a matrix."); "The input of softmax op must be a matrix.");
ctx.Output<framework::Tensor>("Y")->Resize(ctx.Input<Tensor>("X")->dims()); ctx->SetOutputDim("Y", x_dims);
} }
}; };
class SoftmaxOpMaker : public framework::OpProtoAndCheckerMaker { class SoftmaxOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SoftmaxOpMaker(framework::OpProto *proto, SoftmaxOpMaker(framework::OpProto* proto,
framework::OpAttrChecker *op_checker) framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"The input tensor of softmax. " "The input tensor of softmax. "
...@@ -68,16 +69,15 @@ class SoftmaxOpGrad : public framework::OperatorWithKernel { ...@@ -68,16 +69,15 @@ class SoftmaxOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) should be not null."); PADDLE_ENFORCE(ctx->HasInput("Y"), "Input(Y) should be not null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Y")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Y")),
"Input(Y@GRAD) should be not null."); "Input(Y@GRAD) should be not null.");
PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("Y")->dims(), PADDLE_ENFORCE_EQ(ctx->GetInputDim("Y"),
ctx.Input<Tensor>(framework::GradVarName("Y"))->dims(), ctx->GetInputDim(framework::GradVarName("Y")),
"Input(Y) and its gradients should have a same shape."); "Input(Y) and its gradients should have a same shape.");
ctx.Output<framework::Tensor>(framework::GradVarName("X")) ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
->Resize(ctx.Input<Tensor>("X")->dims());
} }
}; };
......
...@@ -15,6 +15,7 @@ limitations under the License. */ ...@@ -15,6 +15,7 @@ limitations under the License. */
#pragma once #pragma once
#include "paddle/framework/eigen.h" #include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/math/softmax.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
...@@ -30,36 +31,11 @@ class SoftmaxKernel : public framework::OpKernel { ...@@ -30,36 +31,11 @@ class SoftmaxKernel : public framework::OpKernel {
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto X = context.Input<Tensor>("X"); auto X = context.Input<Tensor>("X");
auto Y = context.Output<Tensor>("Y"); auto Y = context.Output<Tensor>("Y");
Y->mutable_data<T>(context.GetPlace());
auto logits = EigenMatrix<T>::From(*X);
auto softmax = EigenMatrix<T>::From(*Y);
const int kBatchDim = 0;
const int kClassDim = 1;
const int batch_size = logits.dimension(kBatchDim);
const int num_classes = logits.dimension(kClassDim);
Eigen::DSizes<int, 1> along_class(kClassDim); // allocate memory on device.
Eigen::DSizes<int, 2> batch_by_one(batch_size, 1); Y->mutable_data<T>(context.GetPlace());
Eigen::DSizes<int, 2> one_by_class(1, num_classes);
auto shifted_logits = (logits -
logits.maximum(along_class)
.eval()
.reshape(batch_by_one)
.broadcast(one_by_class));
softmax.device(context.GetEigenDevice<Place>()) = shifted_logits.exp();
softmax.device(context.GetEigenDevice<Place>()) = math::SoftmaxFunctor<Place, T>()(context, X, Y);
(softmax *
softmax.sum(along_class)
.inverse()
.eval()
.reshape(batch_by_one)
.broadcast(one_by_class));
} }
}; };
...@@ -67,8 +43,6 @@ template <typename Place, typename T> ...@@ -67,8 +43,6 @@ template <typename Place, typename T>
class SoftmaxGradKernel : public framework::OpKernel { class SoftmaxGradKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
std::shared_ptr<Tensor> scale_ = std::make_shared<Tensor>();
auto Y = context.Input<Tensor>("Y"); auto Y = context.Input<Tensor>("Y");
auto dY = context.Input<Tensor>(framework::GradVarName("Y")); auto dY = context.Input<Tensor>(framework::GradVarName("Y"));
auto dX = context.Output<Tensor>(framework::GradVarName("X")); auto dX = context.Output<Tensor>(framework::GradVarName("X"));
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/softmax_with_cross_entropy_op.h"
namespace paddle {
namespace operators {
class SoftmaxWithCrossEntropyOpMaker
: public framework::OpProtoAndCheckerMaker {
public:
SoftmaxWithCrossEntropyOpMaker(framework::OpProto* proto,
framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Logits",
"(Tensor, default: Tensor<float>), The unscaled log probabilities, "
"a 2-D tensor with shape [N x K]. N is the batch size, "
"and K is the number of classes.")
.NotInGradient();
AddInput(
"Label",
"(Tensor, default: Tensor<int>), The ground truth which is a 2-D "
"tensor. "
"If softLabel is false, Label is a Tensor<int> with shape [N x 1]. "
"If softLabel is true, Label is a Tensor<float/double> "
"with shape [N x K].");
AddOutput(
"Softmax",
"(Tensor, default: Tensor<float>), A 2-D tensor with shape [N x K]. "
"The softmax activation values for the given input batch, "
"which will be used in backward calculation.")
.AsIntermediate();
AddOutput("Loss",
"(Tensor, default: Tensor<float>), A 2-D tensor. The cross "
"entropy loss with shape [N x 1].");
AddAttr<bool>(
"softLabel",
"(bool, default: false), A flag to indicate whether to interpret "
"the given labels as soft labels.")
.SetDefault(false);
AddComment(R"DOC(
Cross entropy loss with softmax is used extensively as the output layer. This
operator computes the softmax-normalized values for each row of the input
tensor, and then computes the cross-entropy loss. This provides a more
numerically stable gradient.
Because this operator performs a softmax on logits internally, it expects
unscaled logits. Please do not call this op with the output of the softmax
operator, which would produce incorrect results.
When Attr(softLabel) is false, this operator expects mutually exclusive hard
labels: each sample in a batch belongs to exactly one class with probability 1.
Equation:
1) hard label (one-hot label)
Loss_j = -\text{Logit}_{Label_j} + \log\left(\sum_{i=1}^{K}\exp(\text{Logit}_i)\right), j = 1, ..., N
2) soft label (a distribution over all classes)
Loss_j = -\sum_{i=1}^{K}\text{Label}_i\left(\text{Logit}_i-\log\left(\sum_{k=1}^{K}\exp(\text{Logit}_k)\right)\right), j = 1, ..., N
Here Logit_i and Label_i are taken from row j, N is the batch size, and K is
the number of classes.
)DOC");
}
};
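The per-sample loss equations in the DOC comment above can be sketched in plain C++ as follows. This is a hedged illustration of the formulas, not the operator's kernel: `LogSumExp`, `HardLabelLoss` and `SoftLabelLoss` are hypothetical names, each call handles a single row of `Logits`, and subtracting the row maximum before exponentiating is the usual log-sum-exp stabilization, assumed here rather than taken from this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// log(sum_k exp(logits[k])), shifted by the row maximum so that exp() does
// not overflow for large logits.
inline double LogSumExp(const std::vector<double>& logits) {
  assert(!logits.empty());
  const double max_logit = *std::max_element(logits.begin(), logits.end());
  double sum = 0.0;
  for (double v : logits) sum += std::exp(v - max_logit);
  return max_logit + std::log(sum);
}

// Hard label: Loss = -Logit[label] + log(sum_k exp(Logit[k])).
inline double HardLabelLoss(const std::vector<double>& logits,
                            std::size_t label) {
  assert(label < logits.size());
  return -logits[label] + LogSumExp(logits);
}

// Soft label: Loss = -sum_i Label[i] * (Logit[i] - log(sum_k exp(Logit[k]))).
inline double SoftLabelLoss(const std::vector<double>& logits,
                            const std::vector<double>& label) {
  assert(label.size() == logits.size());
  const double lse = LogSumExp(logits);
  double loss = 0.0;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    loss -= label[i] * (logits[i] - lse);
  }
  return loss;
}
```

A one-hot soft label gives the same value as the corresponding hard label, which is a quick sanity check that the two equations agree.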
class SoftmaxWithCrossEntropyOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Logits"),
"Input(Logits) should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Label"), "Input(Label) should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Softmax"),
"Output(Softmax) should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Loss"), "Output(Loss) should not be null.");
auto logits_dims = ctx->GetInputDim("Logits");
auto labels_dims = ctx->GetInputDim("Label");
PADDLE_ENFORCE_EQ(
logits_dims.size(), 2UL,
"The input of softmax_with_cross_entropy should be a 2-D tensor.");
PADDLE_ENFORCE_EQ(labels_dims.size(), 2UL,
"The labels should be a 2-D tensor.");
if (ctx->Attrs().Get<bool>("softLabel")) {
PADDLE_ENFORCE_EQ(logits_dims[1], labels_dims[1],
"If Attr(softLabel) == true, the 2nd dimension of "
"Input(Logits) and Input(Label) should be equal.");
} else {
PADDLE_ENFORCE_EQ(labels_dims[1], 1UL,
"If Attr(softLabel) == false, the 2nd dimension of "
"Input(Label) should be 1.");
}
ctx->SetOutputDim("Softmax", logits_dims);
ctx->SetOutputDim("Loss", {logits_dims[0], 1});
ctx->ShareLoD("Logits", /*->*/ "Softmax");
ctx->ShareLoD("Logits", /*->*/ "Loss");
}
};
class SoftmaxWithCrossEntropyOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Loss")),
"Input(Loss@GRAD) should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Softmax"),
"Input(Softmax) should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Label"), "Input(Label) should not be null.");
PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("Logits")),
"Output(Logits@GRAD) should not be null.");
auto softmax_dims = ctx->GetInputDim("Softmax");
auto labels_dims = ctx->GetInputDim("Label");
PADDLE_ENFORCE_EQ(labels_dims.size(), 2UL,
"The labels should be a 2-D tensor.");
if (ctx->Attrs().Get<bool>("softLabel")) {
PADDLE_ENFORCE_EQ(softmax_dims[1], labels_dims[1],
"When Attr(softLabel) == true, the 2nd dimension of "
"Input(Softmax) and Input(Label) should be equal.");
} else {
PADDLE_ENFORCE_EQ(labels_dims[1], 1UL,
"When Attr(softLabel) == false, the 2nd dimension of "
"Input(Label) should be 1.");
}
ctx->SetOutputDim(framework::GradVarName("Logits"),
ctx->GetInputDim("Softmax"));
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(softmax_with_cross_entropy, ops::SoftmaxWithCrossEntropyOp,
ops::SoftmaxWithCrossEntropyOpMaker,
softmax_with_cross_entropy_grad,
ops::SoftmaxWithCrossEntropyOpGrad);
REGISTER_OP_CPU_KERNEL(softmax_with_cross_entropy,
ops::SoftmaxWithCrossEntropyKernel<float>);
REGISTER_OP_CPU_KERNEL(softmax_with_cross_entropy_grad,
ops::SoftmaxWithCrossEntropyGradKernel<float>);
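
The two loss equations in the operator comment above can be sketched in plain C++ as follows. This is a minimal, framework-free illustration and not the code behind `math::SoftmaxFunctor` or `math::CrossEntropyFunctor`, which are not shown in this diff. The function names `SoftmaxRows`, `HardLabelLoss` and `SoftLabelLoss` are hypothetical, and subtracting the row maximum before exponentiating is an assumption about how one would normally keep the computation stable.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-wise softmax of an n x k row-major matrix. Each row is shifted by its
// maximum before exponentiating so that exp() cannot overflow.
std::vector<double> SoftmaxRows(const std::vector<double>& logits,
                                std::size_t n, std::size_t k) {
  assert(k > 0 && logits.size() == n * k);
  std::vector<double> out(n * k);
  for (std::size_t j = 0; j < n; ++j) {
    const double* row = logits.data() + j * k;
    const double max_logit = *std::max_element(row, row + k);
    double sum = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
      out[j * k + i] = std::exp(row[i] - max_logit);
      sum += out[j * k + i];
    }
    for (std::size_t i = 0; i < k; ++i) out[j * k + i] /= sum;
  }
  return out;
}

// Hard labels: Loss_j = -log(softmax[j][label_j]).
std::vector<double> HardLabelLoss(const std::vector<double>& softmax,
                                  const std::vector<int>& labels,
                                  std::size_t n, std::size_t k) {
  assert(softmax.size() == n * k && labels.size() == n);
  std::vector<double> loss(n);
  for (std::size_t j = 0; j < n; ++j) {
    assert(labels[j] >= 0 && static_cast<std::size_t>(labels[j]) < k);
    loss[j] = -std::log(softmax[j * k + labels[j]]);
  }
  return loss;
}

// Soft labels: Loss_j = -sum_i label[j][i] * log(softmax[j][i]).
std::vector<double> SoftLabelLoss(const std::vector<double>& softmax,
                                  const std::vector<double>& labels,
                                  std::size_t n, std::size_t k) {
  assert(softmax.size() == n * k && labels.size() == n * k);
  std::vector<double> loss(n, 0.0);
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < k; ++i) {
      // Skip zero weights so that 0 * log(0) never produces NaN.
      if (labels[j * k + i] != 0.0) {
        loss[j] -= labels[j * k + i] * std::log(softmax[j * k + i]);
      }
    }
  }
  return loss;
}
```

Since `-log(softmax_i)` equals `-Logit_i + log(sum_k exp(Logit_k))`, both functions agree with the equations in the comment. A one-hot soft label gives the same loss as the corresponding hard label.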
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/softmax_with_cross_entropy_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
namespace {
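template <typename T>
__global__ void CrossEntropyGrad(T* out_grad, const T* in_grad,
const int* labels, const int batch_size,
const int class_num) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
if (tid < batch_size * class_num) {
int sample_idx = tid / class_num;
int class_idx = tid % class_num;
PADDLE_ASSERT(labels[sample_idx] >= 0 && labels[sample_idx] < class_num);
// Each thread owns exactly one element: scale it, then subtract 1 at the
// labelled class. __syncthreads() only synchronizes within a block, so it
// cannot order the scaling against an update issued from another block.
T grad = out_grad[tid] * in_grad[sample_idx];
if (class_idx == labels[sample_idx]) grad -= 1.;
out_grad[tid] = grad;
}
}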
template <typename T>
__global__ void SoftCrossEntropyGradientKernel(T* logit_grad,
const T* loss_grad,
const T* labels,
const int batch_size,
const int class_num) {
int ids = blockIdx.x * blockDim.x + threadIdx.x;
if (ids < batch_size * class_num) {
int row_ids = ids / class_num;
logit_grad[ids] = logit_grad[ids] * loss_grad[row_ids] - labels[ids];
}
}
} // namespace
template <typename T>
class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& context) const override {
PADDLE_ENFORCE(platform::is_gpu_place(context.GetPlace()),
"This kernel only runs on GPU device.");
const Tensor* logits = context.Input<Tensor>("Logits");
const Tensor* labels = context.Input<Tensor>("Label");
Tensor* softmax = context.Output<Tensor>("Softmax");
Tensor* loss = context.Output<Tensor>("Loss");
softmax->mutable_data<T>(context.GetPlace());
loss->mutable_data<T>(context.GetPlace());
math::SoftmaxFunctor<platform::GPUPlace, T>()(context, logits, softmax);
math::CrossEntropyFunctor<platform::GPUPlace, T>()(
context, loss, softmax, labels, context.Attr<bool>("softLabel"));
}
};
template <typename T>
class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& context) const override {
PADDLE_ENFORCE(platform::is_gpu_place(context.GetPlace()),
"This kernel only runs on GPU device.");
const Tensor* labels = context.Input<Tensor>("Label");
const T* loss_grad_data =
context.Input<Tensor>(framework::GradVarName("Loss"))->data<T>();
Tensor* logit_grad =
context.Output<Tensor>(framework::GradVarName("Logits"));
logit_grad->ShareDataWith<T>(*context.Input<Tensor>("Softmax"));
T* logit_grad_data = logit_grad->data<T>();
const int batch_size = logit_grad->dims()[0];
const int class_num = logit_grad->dims()[1];
int block = 512;
int grid = (batch_size * class_num + block - 1) / block;
if (context.Attr<bool>("softLabel")) {
const T* label_data = labels->data<T>();
SoftCrossEntropyGradientKernel<T><<<
grid, block, 0, reinterpret_cast<const platform::CUDADeviceContext&>(
context.device_context())
.stream()>>>(logit_grad_data, loss_grad_data,
label_data, batch_size, class_num);
} else {
const int* label_data = labels->data<int>();
CrossEntropyGrad<T><<<
grid, block, 0, reinterpret_cast<const platform::CUDADeviceContext&>(
context.device_context())
.stream()>>>(logit_grad_data, loss_grad_data,
label_data, batch_size, class_num);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(softmax_with_cross_entropy,
ops::SoftmaxWithCrossEntropyCUDAKernel<float>);
REGISTER_OP_GPU_KERNEL(softmax_with_cross_entropy_grad,
ops::SoftmaxWithCrossEntropyGradCUDAKernel<float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
#include "paddle/operators/math/cross_entropy.h"
#include "paddle/operators/math/softmax.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename T>
class SoftmaxWithCrossEntropyKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& context) const override {
PADDLE_ENFORCE(platform::is_cpu_place(context.GetPlace()),
"This kernel only runs on CPU.");
const Tensor* logits = context.Input<Tensor>("Logits");
const Tensor* labels = context.Input<Tensor>("Label");
Tensor* softmax = context.Output<Tensor>("Softmax");
Tensor* loss = context.Output<Tensor>("Loss");
softmax->mutable_data<T>(context.GetPlace());
loss->mutable_data<T>(context.GetPlace());
math::SoftmaxFunctor<platform::CPUPlace, T>()(context, logits, softmax);
math::CrossEntropyFunctor<platform::CPUPlace, T>()(
context, loss, softmax, labels, context.Attr<bool>("softLabel"));
}
};
template <typename T>
class SoftmaxWithCrossEntropyGradKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& context) const override {
const Tensor* out_grad =
context.Input<Tensor>(framework::GradVarName("Loss"));
const Tensor* labels = context.Input<Tensor>("Label");
Tensor* logit_grad =
context.Output<Tensor>(framework::GradVarName("Logits"));
logit_grad->ShareDataWith<T>(*context.Input<Tensor>("Softmax"));
const int class_num = logit_grad->dims()[1];
if (context.Attr<bool>("softLabel")) {
auto out_grad_mat = EigenMatrix<T>::From(*out_grad);
auto logit_grad_mat = EigenMatrix<T>::From(*logit_grad);
auto lbl_mat = EigenMatrix<T>::From(*labels);
logit_grad_mat.device(context.GetEigenDevice<platform::CPUPlace>()) =
logit_grad_mat *
out_grad_mat.broadcast(Eigen::DSizes<int, 2>(1, class_num)) -
lbl_mat;
} else {
const int batch_size = logit_grad->dims()[0];
const int* label_data = labels->data<int>();
const T* out_grad_data = out_grad->data<T>();
T* logit_grad_data = logit_grad->data<T>();
for (int i = 0; i < batch_size; ++i) {
int index = i * class_num + label_data[i];
logit_grad_data[index] =
(out_grad_data[i] * logit_grad_data[index] - 1.);
}
}
}
};
} // namespace operators
} // namespace paddle
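
The gradient kernels above reuse the stored softmax output because the derivative of the loss with respect to the logits has a closed form. The following is a minimal single-sample sketch of that closed form, `upstream * (softmax_i - label_i)`, which holds when the labels sum to 1. It is the textbook expression, not a transcription of the kernels. It matches the arithmetic in the kernels shown here when the upstream gradient is 1. `RowLoss` and `RowLossGrad` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// log(sum_i exp(logit[i])), computed with the usual max shift.
double LogSumExp(const std::vector<double>& logit) {
  assert(!logit.empty());
  const double m = *std::max_element(logit.begin(), logit.end());
  double sum = 0.0;
  for (double v : logit) sum += std::exp(v - m);
  return m + std::log(sum);
}

// Loss for one sample: -sum_i label[i] * (logit[i] - logsumexp(logit)).
double RowLoss(const std::vector<double>& logit,
               const std::vector<double>& label) {
  assert(logit.size() == label.size());
  const double lse = LogSumExp(logit);
  double loss = 0.0;
  for (std::size_t i = 0; i < logit.size(); ++i) {
    loss -= label[i] * (logit[i] - lse);
  }
  return loss;
}

// Analytic gradient with respect to the logits, assuming the labels sum to 1:
// grad[i] = upstream * (softmax[i] - label[i]).
std::vector<double> RowLossGrad(const std::vector<double>& logit,
                                const std::vector<double>& label,
                                double upstream) {
  assert(logit.size() == label.size());
  const double lse = LogSumExp(logit);
  std::vector<double> grad(logit.size());
  for (std::size_t i = 0; i < logit.size(); ++i) {
    grad[i] = upstream * (std::exp(logit[i] - lse) - label[i]);
  }
  return grad;
}
```

A quick way to gain confidence in a gradient like this is to compare it against a central finite difference of `RowLoss`, perturbing one logit at a time.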
...@@ -24,40 +24,42 @@ class SplitOp : public framework::OperatorWithKernel { ...@@ -24,40 +24,42 @@ class SplitOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
// infershape auto in_dims = ctx->GetInputDim("X");
auto *in = ctx.Input<framework::Tensor>("X"); auto outs_names = ctx->Outputs("Out");
auto outs = ctx.MultiOutput<framework::Tensor>("Out"); size_t axis = static_cast<size_t>(ctx->Attrs().Get<int>("axis"));
size_t axis = static_cast<size_t>(ctx.Attr<int>("axis")); size_t num = static_cast<size_t>(ctx->Attrs().Get<int>("num"));
size_t num = static_cast<size_t>(ctx.Attr<int>("num")); std::vector<int> sections = static_cast<std::vector<int>>(
std::vector<int> sections = ctx->Attrs().Get<std::vector<int>>("sections"));
static_cast<std::vector<int>>(ctx.Attr<std::vector<int>>("sections")); const size_t outs_number = outs_names.size();
const size_t n = outs.size(); std::vector<framework::DDim> outs_dims;
outs_dims.reserve(outs_number);
if (num > 0) { if (num > 0) {
int64_t in_axis_dim = in->dims()[axis]; int64_t in_axis_dim = in_dims[axis];
PADDLE_ENFORCE_EQ(in_axis_dim % num, 0, PADDLE_ENFORCE_EQ(in_axis_dim % num, 0,
"tensor split does not result" "tensor split does not result"
" in an equal division"); " in an equal division");
size_t out_axis_dim = in_axis_dim / num; size_t out_axis_dim = in_axis_dim / num;
for (size_t i = 0; i < n; ++i) { for (size_t i = 0; i < outs_number; ++i) {
auto dim = in->dims(); auto dim = in_dims;
dim[axis] = out_axis_dim; dim[axis] = out_axis_dim;
outs[i]->Resize(dim); outs_dims.push_back(dim);
} }
} else if (sections.size() > 0) { } else if (sections.size() > 0) {
PADDLE_ENFORCE_EQ(sections.size(), n, PADDLE_ENFORCE_EQ(sections.size(), outs_number,
"tensor split sections size" "tensor split sections size"
"should be equal to output size."); "should be equal to output size.");
for (size_t i = 0; i < n; ++i) { for (size_t i = 0; i < outs_number; ++i) {
auto dim = in->dims(); auto dim = in_dims;
dim[axis] = sections[i]; dim[axis] = sections[i];
outs[i]->Resize(dim); outs_dims.push_back(dim);
} }
} else { } else {
PADDLE_ENFORCE_NOT_NULL(nullptr, "split operator should", PADDLE_ENFORCE_NOT_NULL(nullptr, "split operator should",
" specify indices or sections."); " specify indices or sections.");
} }
ctx->SetOutputsDim("Out", outs_dims);
} }
}; };
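
The shape rule that the new `SplitOp::InferShape` applies can be sketched on its own, outside the framework. This is a hedged illustration: `Dims` and `SplitOutputDims` are hypothetical names, plain vectors stand in for `framework::DDim`, and the sketch assumes that the number of outputs equals `num` in the even-split case, whereas the operator takes it from the declared output names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Dims = std::vector<std::int64_t>;

// Computes one output shape per split piece. If num > 0 the axis is divided
// evenly; otherwise each entry of sections gives the size of one piece.
std::vector<Dims> SplitOutputDims(const Dims& in_dims, std::size_t axis,
                                  std::size_t num,
                                  const std::vector<int>& sections) {
  assert(axis < in_dims.size());
  std::vector<Dims> outs;
  if (num > 0) {
    const std::int64_t pieces = static_cast<std::int64_t>(num);
    if (in_dims[axis] % pieces != 0) {
      throw std::invalid_argument("split does not divide the axis evenly");
    }
    Dims dim = in_dims;
    dim[axis] = in_dims[axis] / pieces;
    outs.assign(num, dim);
  } else if (!sections.empty()) {
    outs.reserve(sections.size());
    for (int section : sections) {
      Dims dim = in_dims;
      dim[axis] = section;
      outs.push_back(dim);
    }
  } else {
    throw std::invalid_argument("specify either num or sections");
  }
  return outs;
}
```

For example, splitting a `[4, 6]` input along axis 1 with `num = 3` yields three `[4, 2]` shapes, while `sections = {1, 5}` yields `[4, 1]` and `[4, 5]`.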
......
...@@ -22,24 +22,19 @@ class SquaredL2DistanceOp : public framework::OperatorWithKernel { ...@@ -22,24 +22,19 @@ class SquaredL2DistanceOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(ctx->HasInput("X"),
ctx.InputVar("X"),
"Input(X) of SquaredL2DistanceOp should not be null."); "Input(X) of SquaredL2DistanceOp should not be null.");
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(ctx->HasInput("Y"),
ctx.InputVar("Y"),
"Input(Y) of SquaredL2DistanceOp should not be null."); "Input(Y) of SquaredL2DistanceOp should not be null.");
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(
ctx.OutputVar("sub_result"), ctx->HasOutput("sub_result"),
"Output(sub_result) of SquaredL2DistanceOp should not be null."); "Output(sub_result) of SquaredL2DistanceOp should not be null.");
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(ctx->HasOutput("Out"),
ctx.OutputVar("Out"),
"Output(Out) of SquaredL2DistanceOp should not be null."); "Output(Out) of SquaredL2DistanceOp should not be null.");
auto* x = ctx.Input<Tensor>("X"); auto x_dims = ctx->GetInputDim("X");
auto x_dims = x->dims(); auto y_dims = ctx->GetInputDim("Y");
auto* y = ctx.Input<Tensor>("Y");
auto y_dims = y->dims();
PADDLE_ENFORCE_EQ(framework::arity(x_dims), framework::arity(y_dims), PADDLE_ENFORCE_EQ(framework::arity(x_dims), framework::arity(y_dims),
"Tensor rank of both SquaredL2DistanceOp's " "Tensor rank of both SquaredL2DistanceOp's "
...@@ -47,17 +42,16 @@ class SquaredL2DistanceOp : public framework::OperatorWithKernel { ...@@ -47,17 +42,16 @@ class SquaredL2DistanceOp : public framework::OperatorWithKernel {
int rank = framework::arity(x_dims); int rank = framework::arity(x_dims);
PADDLE_ENFORCE_GE(rank, 2, "Tensor rank should be at least equal to 2."); PADDLE_ENFORCE_GE(rank, 2, "Tensor rank should be at least equal to 2.");
PADDLE_ENFORCE_EQ(x->numel() / x_dims[0], y->numel() / y_dims[0], PADDLE_ENFORCE_EQ(product(x_dims) / x_dims[0], product(y_dims) / y_dims[0],
"Product of dimensions except the first dimension of " "Product of dimensions except the first dimension of "
"input and target must be equal."); "input and target must be equal.");
PADDLE_ENFORCE(y_dims[0] == 1 || y_dims[0] == x_dims[0], PADDLE_ENFORCE(y_dims[0] == 1 || y_dims[0] == x_dims[0],
"First dimension of target must be equal to input " "First dimension of target must be equal to input "
"or to 1."); "or to 1.");
ctx.Output<framework::Tensor>("sub_result") ctx->SetOutputDim("sub_result", {x_dims[0], product(x_dims) / x_dims[0]});
->Resize({x_dims[0], x->numel() / x_dims[0]}); ctx->SetOutputDim("Out", {x_dims[0], 1});
ctx.Output<framework::Tensor>("Out")->Resize({x_dims[0], 1}); ctx->ShareLoD("X", /*->*/ "Out");
ctx.ShareLoD("X", /*->*/ "Out");
} }
}; };
...@@ -92,22 +86,22 @@ class SquaredL2DistanceGradOp : public framework::OperatorWithKernel { ...@@ -92,22 +86,22 @@ class SquaredL2DistanceGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Gradient of Out should not be null"); "Gradient of Out should not be null");
auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims(); auto out_dims = ctx->GetInputDim(framework::GradVarName("Out"));
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto y_dims = ctx.Input<Tensor>("Y")->dims(); auto y_dims = ctx->GetInputDim("Y");
PADDLE_ENFORCE_EQ(out_dims[0], x_dims[0], PADDLE_ENFORCE_EQ(out_dims[0], x_dims[0],
"First dimension of output gradient and " "First dimension of output gradient and "
"input value must be equal."); "input value must be equal.");
PADDLE_ENFORCE_EQ(out_dims[1], 1, PADDLE_ENFORCE_EQ(out_dims[1], 1,
"Second dimension of output gradient " "Second dimension of output gradient "
"must be 1."); "must be 1.");
auto* x_grad = ctx.Output<framework::Tensor>(framework::GradVarName("X")); auto x_grad_name = framework::GradVarName("X");
auto* y_grad = ctx.Output<framework::Tensor>(framework::GradVarName("Y")); auto y_grad_name = framework::GradVarName("Y");
if (x_grad) x_grad->Resize(x_dims); if (ctx->HasOutput(x_grad_name)) ctx->SetOutputDim(x_grad_name, x_dims);
if (y_grad) y_grad->Resize(y_dims); if (ctx->HasOutput(y_grad_name)) ctx->SetOutputDim(y_grad_name, y_dims);
} }
}; };
......
...@@ -21,31 +21,27 @@ class SumOp : public framework::OperatorWithKernel { ...@@ -21,31 +21,27 @@ class SumOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE(!ctx.MultiInputVar("X").empty(), auto x_dims = ctx->GetInputsDim("X");
"Input(X) of SumOp should not be null."); PADDLE_ENFORCE(!x_dims.empty(), "Input(X) of SumOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of SumOp should not be null."); "Output(Out) of SumOp should not be null.");
auto ins = ctx.MultiInput<framework::Tensor>("X"); auto in_dim = x_dims[0];
auto *out = ctx.Output<framework::Tensor>("Out"); size_t N = x_dims.size();
int N = ins.size();
auto in_dim = ins[0]->dims();
PADDLE_ENFORCE_GT(N, 1, "Input tensors count should > 1."); PADDLE_ENFORCE_GT(N, 1, "Input tensors count should > 1.");
for (int i = 1; i < N; i++) { for (size_t i = 1; i < N; i++) {
auto dim = ins[i]->dims(); auto dim = x_dims[i];
PADDLE_ENFORCE(in_dim == dim, "Input tensors must have same shape"); PADDLE_ENFORCE(in_dim == dim, "Input tensors must have same shape");
} }
out->Resize(in_dim); ctx->SetOutputDim("Out", in_dim);
ctx.ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
} }
}; };
class SumOpMaker : public framework::OpProtoAndCheckerMaker { class SumOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SumOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) SumOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "the input tensors of sum operator.").AsDuplicable(); AddInput("X", "the input tensors of sum operator.").AsDuplicable();
AddOutput("Out", "the output tensor of sum operator."); AddOutput("Out", "the output tensor of sum operator.");
...@@ -63,13 +59,16 @@ class SumGradOp : public framework::OperatorWithKernel { ...@@ -63,13 +59,16 @@ class SumGradOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
auto outputs = auto out_grad_dims = ctx->GetInputDim(framework::GradVarName("Out"));
ctx.MultiOutput<framework::Tensor>(framework::GradVarName("X")); auto x_grad_names = ctx->Outputs(framework::GradVarName("X"));
auto dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims(); size_t x_length = x_grad_names.size();
for (auto output : outputs) { std::vector<framework::DDim> x_grad_dims;
output->Resize(dims); x_grad_dims.reserve(x_length);
for (size_t i = 0; i < x_length; ++i) {
x_grad_dims.push_back(out_grad_dims);
} }
ctx->SetOutputsDim(framework::GradVarName("X"), x_grad_dims);
} }
}; };
......
...@@ -22,26 +22,26 @@ class TopkOp : public framework::OperatorWithKernel { ...@@ -22,26 +22,26 @@ class TopkOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of TopkOp should not be null."); "Input(X) of TopkOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of TopkOp should not be null."); "Output(Out) of TopkOp should not be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Indices"), PADDLE_ENFORCE(ctx->HasOutput("Indices"),
"Output(Indices) of TopkOp should not be null."); "Output(Indices) of TopkOp should not be null.");
auto *input = ctx.Input<framework::Tensor>("X"); auto input_dims = ctx->GetInputDim("X");
const int k = static_cast<int>(ctx.Attr<int>("k")); const int k = static_cast<int>(ctx->Attrs().Get<int>("k"));
PADDLE_ENFORCE_GE(k, 1, "k must >= 1"); PADDLE_ENFORCE_GE(k, 1, "k must >= 1");
PADDLE_ENFORCE_GE(input->dims().size(), 1, "input must have >= 1d shape"); PADDLE_ENFORCE_GE(input_dims.size(), 1, "input must have >= 1d shape");
PADDLE_ENFORCE_GE(input->dims()[input->dims().size() - 1], k, PADDLE_ENFORCE_GE(input_dims[input_dims.size() - 1], k,
"input must have >= k columns"); "input must have >= k columns");
framework::DDim dims = input->dims(); framework::DDim dims = input_dims;
dims[dims.size() - 1] = k; dims[dims.size() - 1] = k;
ctx.Output<framework::Tensor>("Out")->Resize(dims); ctx->SetOutputDim("Out", dims);
ctx.Output<framework::Tensor>("Indices")->Resize(dims); ctx->SetOutputDim("Indices", dims);
} }
}; };
......
...@@ -24,12 +24,11 @@ class TransposeOp : public framework::OperatorWithKernel { ...@@ -24,12 +24,11 @@ class TransposeOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null"); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"), "Output(Out) should not be null");
"Output(Out) should not be null"); auto x_dims = ctx->GetInputDim("X");
auto x_dims = ctx.Input<Tensor>("X")->dims(); std::vector<int> axis = ctx->Attrs().Get<std::vector<int>>("axis");
std::vector<int> axis = ctx.Attr<std::vector<int>>("axis");
size_t x_rank = x_dims.size(); size_t x_rank = x_dims.size();
size_t axis_size = axis.size(); size_t axis_size = axis.size();
...@@ -51,14 +50,14 @@ class TransposeOp : public framework::OperatorWithKernel { ...@@ -51,14 +50,14 @@ class TransposeOp : public framework::OperatorWithKernel {
for (size_t i = 0; i < axis_size; i++) { for (size_t i = 0; i < axis_size; i++) {
out_dims[i] = x_dims[axis[i]]; out_dims[i] = x_dims[axis[i]];
} }
ctx.Output<framework::Tensor>("Out")->Resize(out_dims); ctx->SetOutputDim("Out", out_dims);
} }
}; };
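
The output shape computed in `TransposeOp::InferShape` follows the rule `out_dims[i] = x_dims[axis[i]]`. Below is a minimal standalone sketch of that rule. `Dims` and `TransposeDims` are hypothetical names, and the permutation check is an assumption, since the validation code of the operator falls between the hunks shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Dims = std::vector<std::int64_t>;

// Returns the shape of the transposed tensor: out[i] = x_dims[axis[i]].
// axis must be a permutation of 0 .. rank-1.
Dims TransposeDims(const Dims& x_dims, const std::vector<int>& axis) {
  const std::size_t rank = x_dims.size();
  if (axis.size() != rank) {
    throw std::invalid_argument("axis size must equal the input rank");
  }
  std::vector<bool> seen(rank, false);
  for (int a : axis) {
    if (a < 0 || static_cast<std::size_t>(a) >= rank || seen[a]) {
      throw std::invalid_argument("axis must be a permutation of the dims");
    }
    seen[a] = true;
  }
  Dims out_dims(rank);
  for (std::size_t i = 0; i < rank; ++i) {
    out_dims[i] = x_dims[axis[i]];
  }
  return out_dims;
}
```

With an input of shape `[2, 3, 4]` and `axis = {2, 0, 1}`, the result is `[4, 2, 3]`.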
class TransposeOpMaker : public framework::OpProtoAndCheckerMaker { class TransposeOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
TransposeOpMaker(framework::OpProto *proto, TransposeOpMaker(framework::OpProto* proto,
framework::OpAttrChecker *op_checker) framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"X", "X",
...@@ -94,14 +93,15 @@ class TransposeOpGrad : public framework::OperatorWithKernel { ...@@ -94,14 +93,15 @@ class TransposeOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null"); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto x_dims = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx->GetInputDim("X");
auto *x_grad = ctx.Output<framework::Tensor>(framework::GradVarName("X")); ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
if (ctx->HasOutput(framework::GradVarName("X"))) {
if (x_grad) x_grad->Resize(x_dims); ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
}
} }
}; };
......
...@@ -23,18 +23,18 @@ namespace operators { ...@@ -23,18 +23,18 @@ namespace operators {
template <typename T> template <typename T>
class CPUUniformRandomKernel : public framework::OpKernel { class CPUUniformRandomKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* tensor = context.Output<framework::Tensor>("Out"); auto* tensor = ctx.Output<framework::Tensor>("Out");
T* data = tensor->mutable_data<T>(context.GetPlace()); T* data = tensor->mutable_data<T>(ctx.GetPlace());
unsigned int seed = static_cast<unsigned int>(context.Attr<int>("seed")); unsigned int seed = static_cast<unsigned int>(ctx.Attr<int>("seed"));
std::minstd_rand engine; std::minstd_rand engine;
if (seed == 0) { if (seed == 0) {
seed = std::random_device()(); seed = std::random_device()();
} }
engine.seed(seed); engine.seed(seed);
std::uniform_real_distribution<T> dist( std::uniform_real_distribution<T> dist(
static_cast<T>(context.Attr<float>("min")), static_cast<T>(ctx.Attr<float>("min")),
static_cast<T>(context.Attr<float>("max"))); static_cast<T>(ctx.Attr<float>("max")));
int64_t size = tensor->numel(); int64_t size = tensor->numel();
for (int64_t i = 0; i < size; ++i) { for (int64_t i = 0; i < size; ++i) {
data[i] = dist(engine); data[i] = dist(engine);
...@@ -47,21 +47,20 @@ class UniformRandomOp : public framework::OperatorWithKernel { ...@@ -47,21 +47,20 @@ class UniformRandomOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
protected: protected:
void InferShape(const framework::InferShapeContext& ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE(ctx->HasOutput("Out"),
ctx.OutputVar("Out"),
"Output(Out) of UniformRandomOp should not be null."); "Output(Out) of UniformRandomOp should not be null.");
PADDLE_ENFORCE(Attr<float>("min") < Attr<float>("max"), PADDLE_ENFORCE(
ctx->Attrs().Get<float>("min") < ctx->Attrs().Get<float>("max"),
"uniform_random's min must be less than max"); "uniform_random's min must be less than max");
auto* tensor = ctx.Output<framework::Tensor>("Out");
auto dims = Attr<std::vector<int>>("dims"); auto dims = Attr<std::vector<int>>("dims");
std::vector<int64_t> temp; std::vector<int64_t> temp;
temp.reserve(dims.size()); temp.reserve(dims.size());
for (auto dim : dims) { for (auto dim : dims) {
temp.push_back(static_cast<int64_t>(dim)); temp.push_back(static_cast<int64_t>(dim));
} }
tensor->Resize(framework::make_ddim(temp)); ctx->SetOutputDim("Out", framework::make_ddim(temp));
} }
}; };
......
...@@ -107,7 +107,7 @@ struct EnforceNotMet : public std::exception { ...@@ -107,7 +107,7 @@ struct EnforceNotMet : public std::exception {
template <typename... Args> template <typename... Args>
inline typename std::enable_if<sizeof...(Args) != 0, void>::type throw_on_error( inline typename std::enable_if<sizeof...(Args) != 0, void>::type throw_on_error(
int stat, const Args&... args) { bool stat, const Args&... args) {
if (UNLIKELY(!(stat))) { if (UNLIKELY(!(stat))) {
throw std::runtime_error(string::Sprintf(args...)); throw std::runtime_error(string::Sprintf(args...));
} }
......
if(WITH_PYTHON) if(WITH_PYTHON)
cc_library(paddle_pybind SHARED cc_library(paddle_pybind SHARED
SRCS pybind.cc SRCS pybind.cc protobuf.cc
DEPS pybind python backward DEPS pybind python backward
${GLOB_OP_LIB}) ${GLOB_OP_LIB})
endif(WITH_PYTHON) endif(WITH_PYTHON)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/pybind/protobuf.h"
#include <deque>
#include <iostream>
#include "paddle/framework/attribute.h"
// Cast boost::variant for PyBind.
// Copy from
// https://github.com/pybind/pybind11/issues/576#issuecomment-269563199
namespace pybind11 {
namespace detail {
// Can be replaced by a generic lambda in C++14
struct variant_caster_visitor : public boost::static_visitor<handle> {
return_value_policy policy;
handle parent;
variant_caster_visitor(return_value_policy policy, handle parent)
: policy(policy), parent(parent) {}
template <class T>
handle operator()(T const &src) const {
return make_caster<T>::cast(src, policy, parent);
}
};
template <class Variant>
struct variant_caster;
template <template <class...> class V, class... Ts>
struct variant_caster<V<Ts...>> {
using Type = V<Ts...>;
template <typename T>
typename std::enable_if<
!std::is_same<T, boost::detail::variant::void_>::value,
bool>::type
try_load(handle src, bool convert) {
auto caster = make_caster<T>();
if (!load_success_ && caster.load(src, convert)) {
load_success_ = true;
value = cast_op<T>(caster);
return true;
}
return false;
}
template <typename T>
typename std::enable_if<std::is_same<T, boost::detail::variant::void_>::value,
bool>::type
try_load(handle src, bool convert) {
return false;
}
bool load(handle src, bool convert) {
auto unused = {false, try_load<Ts>(src, convert)...};
(void)(unused);
return load_success_;
}
static handle cast(Type const &src,
return_value_policy policy,
handle parent) {
variant_caster_visitor visitor(policy, parent);
return boost::apply_visitor(visitor, src);
}
PYBIND11_TYPE_CASTER(Type, _("Variant"));
bool load_success_{false};
};
// Add specialization for concrete variant type
template <class... Args>
struct type_caster<boost::variant<Args...>>
: variant_caster<boost::variant<Args...>> {};
} // namespace detail
} // namespace pybind11
namespace paddle {
namespace pybind {
using namespace paddle::framework; // NOLINT
// convert between std::vector and protobuf repeated.
template <typename T>
inline std::vector<T> RepeatedToVector(
const google::protobuf::RepeatedField<T> &repeated_field) {
std::vector<T> ret;
ret.reserve(repeated_field.size());
std::copy(
repeated_field.begin(), repeated_field.end(), std::back_inserter(ret));
return ret;
}
template <typename T, typename RepeatedField>
inline void VectorToRepeated(const std::vector<T> &vec,
RepeatedField *repeated_field) {
repeated_field->Reserve(vec.size());
for (const auto &elem : vec) {
*repeated_field->Add() = elem;
}
}
// Overload for std::vector<bool>, whose elements are proxy objects and are
// therefore iterated by value.
template <typename RepeatedField>
inline void VectorToRepeated(const std::vector<bool> &vec,
RepeatedField *repeated_field) {
repeated_field->Reserve(vec.size());
for (auto elem : vec) {
*repeated_field->Add() = elem;
}
}
class ProgramDescBind;
class OpDescBind;
class BlockDescBind;
class VarDescBind;
// For each protobuf message we provide an XXXBind class that keeps local state
// for fast reads and writes. Local changes are written back into the protobuf
// message (by the `Sync` method) only when the message itself is requested.
class VarDescBind {
public:
explicit VarDescBind(const std::string &name) { desc_.set_name(name); }
VarDesc *Proto() { return &desc_; }
py::bytes Name() const { return desc_.name(); }
void SetShape(const std::vector<int64_t> &dims) {
VectorToRepeated(dims, desc_.mutable_lod_tensor()->mutable_dims());
}
void SetDataType(framework::DataType data_type) {
desc_.mutable_lod_tensor()->set_data_type(data_type);
}
std::vector<int64_t> Shape() const {
return RepeatedToVector(desc_.lod_tensor().dims());
}
framework::DataType DataType() const {
return desc_.lod_tensor().data_type();
}
private:
VarDesc desc_;
};
class OpDescBind {
public:
OpDesc *Proto() {
Sync();
return &op_desc_;
}
std::string Type() const { return op_desc_.type(); }
void SetType(const std::string &type) { op_desc_.set_type(type); }
const std::vector<std::string> &Input(const std::string &name) const {
auto it = inputs_.find(name);
PADDLE_ENFORCE(
it != inputs_.end(), "Input %s cannot be found in Op %s", name, Type());
return it->second;
}
std::vector<std::string> InputNames() const {
std::vector<std::string> retv;
retv.reserve(this->inputs_.size());
for (auto &ipt : this->inputs_) {
retv.push_back(ipt.first);
}
return retv;
}
void SetInput(const std::string &param_name,
const std::vector<std::string> &args) {
need_update_ = true;
inputs_[param_name] = args;
}
const std::vector<std::string> &Output(const std::string &name) const {
auto it = outputs_.find(name);
PADDLE_ENFORCE(it != outputs_.end(),
"Output %s cannot be found in Op %s",
name,
Type());
return it->second;
}
std::vector<std::string> OutputNames() const {
std::vector<std::string> retv;
retv.reserve(this->outputs_.size());
for (auto &ipt : this->outputs_) {
retv.push_back(ipt.first);
}
return retv;
}
void SetOutput(const std::string &param_name,
const std::vector<std::string> &args) {
need_update_ = true;
this->outputs_[param_name] = args;
}
std::string DebugString() { return this->Proto()->DebugString(); }
bool HasAttr(const std::string &name) const {
return attrs_.find(name) != attrs_.end();
}
framework::AttrType GetAttrType(const std::string &name) const {
auto it = attrs_.find(name);
PADDLE_ENFORCE(it != attrs_.end(), "Attribute %s is not found", name);
return static_cast<framework::AttrType>(it->second.which() - 1);
}
std::vector<std::string> AttrNames() const {
std::vector<std::string> retv;
retv.reserve(attrs_.size());
for (auto &attr : attrs_) {
retv.push_back(attr.first);
}
return retv;
}
void SetAttr(const std::string &name, const Attribute &v) {
this->attrs_[name] = v;
need_update_ = true;
}
void SetBlockAttr(const std::string &name, BlockDescBind &block);
Attribute GetAttr(const std::string &name) const {
auto it = attrs_.find(name);
PADDLE_ENFORCE(it != attrs_.end(), "Attribute %s is not found", name);
return it->second;
}
int GetBlockAttr(const std::string &name) const {
auto it = attrs_.find(name);
PADDLE_ENFORCE(it != attrs_.end(), "Attribute %s is not found", name);
return boost::get<BlockDesc *>(it->second)->idx();
}
private:
struct SetAttrDescVisitor : public boost::static_visitor<void> {
explicit SetAttrDescVisitor(OpDesc::Attr *attr) : attr_(attr) {}
mutable OpDesc::Attr *attr_;
void operator()(int v) const { attr_->set_i(v); }
void operator()(float v) const { attr_->set_f(v); }
void operator()(const std::string &v) const { attr_->set_s(v); }
void operator()(bool b) const { attr_->set_b(b); }
void operator()(const std::vector<int> &v) const {
VectorToRepeated(v, attr_->mutable_ints());
}
void operator()(const std::vector<float> &v) const {
VectorToRepeated(v, attr_->mutable_floats());
}
void operator()(const std::vector<std::string> &v) const {
VectorToRepeated(v, attr_->mutable_strings());
}
void operator()(const std::vector<bool> &v) const {
VectorToRepeated(v, attr_->mutable_bools());
}
void operator()(BlockDesc *desc) const {
attr_->set_block_idx(desc->idx());
}
void operator()(boost::blank) const { PADDLE_THROW("Unexpected branch"); }
};
void Sync() {
if (need_update_) {
this->op_desc_.mutable_inputs()->Clear();
for (auto &ipt : inputs_) {
auto *input = op_desc_.add_inputs();
input->set_parameter(ipt.first);
VectorToRepeated(ipt.second, input->mutable_arguments());
}
this->op_desc_.mutable_outputs()->Clear();
for (auto &opt : outputs_) {
auto *output = op_desc_.add_outputs();
output->set_parameter(opt.first);
VectorToRepeated(opt.second, output->mutable_arguments());
}
this->op_desc_.mutable_attrs()->Clear();
for (auto &attr : attrs_) {
auto *attr_desc = op_desc_.add_attrs();
attr_desc->set_name(attr.first);
attr_desc->set_type(
static_cast<framework::AttrType>(attr.second.which() - 1));
boost::apply_visitor(SetAttrDescVisitor(attr_desc), attr.second);
}
need_update_ = false;
}
}
OpDesc op_desc_;
std::unordered_map<std::string, std::vector<std::string>> inputs_;
std::unordered_map<std::string, std::vector<std::string>> outputs_;
std::unordered_map<std::string, Attribute> attrs_;
// need_update_ indicates that some local changes have not been synchronized
// yet. Set need_update_ to true whenever a local change must be synchronized.
bool need_update_{false};
};
class BlockDescBind {
public:
BlockDescBind(ProgramDescBind *prog, BlockDesc *desc)
: prog_(prog), desc_(desc), need_update_(false) {}
BlockDescBind(const BlockDescBind &o) = delete;
BlockDescBind &operator=(const BlockDescBind &o) = delete;
int32_t ID() const { return desc_->idx(); }
int32_t Parent() const { return desc_->parent_idx(); }
VarDescBind *NewVar(py::bytes name_bytes) {
std::string name = name_bytes;
need_update_ = true;
auto it = vars_.find(name);
PADDLE_ENFORCE(it == vars_.end(), "Duplicated variable %s", name);
auto var = new VarDescBind(name);
vars_[name].reset(var);
return var;
}
VarDescBind *Var(py::bytes name_bytes) const {
std::string name = name_bytes;
auto it = vars_.find(name);
PADDLE_ENFORCE(
it != vars_.end(), "Can not find variable %s in current block.", name);
return it->second.get();
}
std::vector<VarDescBind *> AllVars() const {
std::vector<VarDescBind *> res;
for (const auto &p : vars_) {
res.push_back(p.second.get());
}
return res;
}
BlockDescBind *ParentBlock() const;
OpDescBind *AppendOp() {
need_update_ = true;
ops_.emplace_back(new OpDescBind());
return ops_.back().get();
}
OpDescBind *PrependOp() {
need_update_ = true;
ops_.emplace_front(new OpDescBind());
return ops_.front().get();
}
std::vector<OpDescBind *> AllOps() const {
std::vector<OpDescBind *> res;
for (const auto &op : ops_) {
res.push_back(op.get());
}
return res;
}
void Sync() {
if (need_update_) {
auto &op_field = *this->desc_->mutable_ops();
op_field.Clear();
op_field.Reserve(static_cast<int>(ops_.size()));
for (auto &op_desc : ops_) {
op_field.AddAllocated(op_desc->Proto());
}
need_update_ = false;
}
}
BlockDesc *RawPtr() { return desc_; }
private:
ProgramDescBind *prog_; // not_own
BlockDesc *desc_; // not_own
bool need_update_;
std::deque<std::unique_ptr<OpDescBind>> ops_;
std::unordered_map<std::string, std::unique_ptr<VarDescBind>> vars_;
};
using ProgDescMap =
std::unordered_map<ProgramDesc *, std::unique_ptr<ProgramDescBind>>;
static ProgDescMap *g_bind_map = nullptr;
class ProgramDescBind {
public:
static ProgramDescBind &Instance(ProgramDesc *prog) {
if (g_bind_map == nullptr) {
g_bind_map = new ProgDescMap();
}
auto &map = *g_bind_map;
auto &ptr = map[prog];
if (ptr == nullptr) {
ptr.reset(new ProgramDescBind(prog));
}
return *ptr;
}
ProgramDescBind(const ProgramDescBind &o) = delete;
ProgramDescBind &operator=(const ProgramDescBind &o) = delete;
BlockDescBind *AppendBlock(const BlockDescBind &parent) {
auto *b = prog_->add_blocks();
b->set_parent_idx(parent.ID());
b->set_idx(prog_->blocks_size() - 1);
blocks_.emplace_back(new BlockDescBind(this, b));
return blocks_.back().get();
}
BlockDescBind *Block(size_t idx) { return blocks_[idx].get(); }
std::string DebugString() { return Proto()->DebugString(); }
size_t Size() const { return blocks_.size(); }
ProgramDesc *Proto() {
for (auto &block : blocks_) {
block->Sync();
}
return prog_;
}
private:
explicit ProgramDescBind(ProgramDesc *prog) : prog_(prog) {
for (auto &block : *prog->mutable_blocks()) {
blocks_.emplace_back(new BlockDescBind(this, &block));
}
}
// Not owned
ProgramDesc *prog_;
std::vector<std::unique_ptr<BlockDescBind>> blocks_;
};
BlockDescBind *BlockDescBind::ParentBlock() const {
if (this->desc_->parent_idx() == -1) {
return nullptr;
}
return prog_->Block(static_cast<size_t>(this->desc_->parent_idx()));
}
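void OpDescBind::SetBlockAttr(const std::string &name, BlockDescBind &block) {
BlockDesc *desc = block.RawPtr();
this->attrs_[name] = desc;
// Mark the op as dirty so that the next Sync() writes the new attribute,
// matching what SetAttr does.
need_update_ = true;
}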
// Bind Methods
void BindProgramDesc(py::module &m) {
py::class_<ProgramDescBind>(m, "ProgramDesc", "")
.def_static("instance",
[]() -> ProgramDescBind * {
return &ProgramDescBind::Instance(&GetProgramDesc());
},
py::return_value_policy::reference)
.def_static("__create_program_desc__",
[]() -> ProgramDescBind * {
// Only used for unit-test
auto *prog_desc = new ProgramDesc;
auto *block = prog_desc->mutable_blocks()->Add();
block->set_idx(0);
block->set_parent_idx(-1);
return &ProgramDescBind::Instance(prog_desc);
},
py::return_value_policy::reference)
.def("append_block",
&ProgramDescBind::AppendBlock,
py::return_value_policy::reference)
.def("block", &ProgramDescBind::Block, py::return_value_policy::reference)
.def("__str__", &ProgramDescBind::DebugString)
.def("num_blocks", &ProgramDescBind::Size);
}
void BindBlockDesc(py::module &m) {
py::class_<BlockDescBind>(m, "BlockDesc", "")
.def_property_readonly("id", &BlockDescBind::ID)
.def_property_readonly("parent", &BlockDescBind::Parent)
.def("append_op",
&BlockDescBind::AppendOp,
py::return_value_policy::reference)
.def("prepend_op",
&BlockDescBind::PrependOp,
py::return_value_policy::reference)
.def(
"new_var", &BlockDescBind::NewVar, py::return_value_policy::reference)
.def("var", &BlockDescBind::Var, py::return_value_policy::reference)
.def("all_vars",
&BlockDescBind::AllVars,
py::return_value_policy::reference)
.def("all_ops",
&BlockDescBind::AllOps,
py::return_value_policy::reference);
}
void BindVarDsec(py::module &m) {
py::enum_<framework::DataType>(m, "DataType", "")
.value("BOOL", DataType::BOOL)
.value("INT16", DataType::INT16)
.value("INT32", DataType::INT32)
.value("INT64", DataType::INT64)
.value("FP16", DataType::FP16)
.value("FP32", DataType::FP32)
.value("FP64", DataType::FP64);
py::class_<VarDescBind>(m, "VarDesc", "")
.def("name", &VarDescBind::Name, py::return_value_policy::reference)
.def("set_shape", &VarDescBind::SetShape)
.def("set_data_type", &VarDescBind::SetDataType)
.def("shape", &VarDescBind::Shape, py::return_value_policy::reference)
.def("data_type", &VarDescBind::DataType);
}
void BindOpDesc(py::module &m) {
py::enum_<framework::AttrType>(m, "AttrType", "")
.value("INT", AttrType::INT)
.value("INTS", AttrType::INTS)
.value("FLOAT", AttrType::FLOAT)
.value("FLOATS", AttrType::FLOATS)
.value("STRING", AttrType::STRING)
.value("STRINGS", AttrType::STRINGS)
.value("BOOL", AttrType::BOOLEAN)
.value("BOOLS", AttrType::BOOLEANS)
.value("BLOCK", AttrType::BLOCK);
py::class_<OpDescBind> op_desc(m, "OpDesc", "");
op_desc.def("type", &OpDescBind::Type)
.def("set_type", &OpDescBind::SetType)
.def("input", &OpDescBind::Input)
.def("input_names", &OpDescBind::InputNames)
.def("set_input", &OpDescBind::SetInput)
.def("output", &OpDescBind::Output)
.def("output_names", &OpDescBind::OutputNames)
.def("set_output", &OpDescBind::SetOutput)
.def("__str__", &OpDescBind::DebugString)
.def("__repr__", &OpDescBind::DebugString)
.def("has_attr", &OpDescBind::HasAttr)
.def("attr_type", &OpDescBind::GetAttrType)
.def("attr_names", &OpDescBind::AttrNames)
.def("set_attr", &OpDescBind::SetAttr)
.def("attr", &OpDescBind::GetAttr)
.def("set_block_attr", &OpDescBind::SetBlockAttr)
.def("get_block_attr", &OpDescBind::GetBlockAttr);
}
} // namespace pybind
} // namespace paddle
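The caching scheme used by the binding classes above (local maps plus a `need_update_` flag that `Sync` clears) can be illustrated with a small Python sketch. `LazyOpDesc`, `sync_count`, and the sorted list that stands in for the protobuf message are hypothetical names invented for this illustration; they are not part of Paddle.

```python
class LazyOpDesc:
    """Hypothetical stand-in for the dirty-flag pattern in OpDescBind."""

    def __init__(self):
        self._inputs = {}          # fast local cache, edited freely
        self._serialized = []      # stands in for the protobuf message
        self._need_update = False  # True when the cache is ahead of the message
        self.sync_count = 0        # bookkeeping for this demonstration only

    def set_input(self, name, args):
        # Writes only touch the local cache and mark it dirty.
        self._inputs[name] = list(args)
        self._need_update = True

    def proto(self):
        # Rebuild the serialized form only if something changed since last time.
        if self._need_update:
            self._serialized = sorted(self._inputs.items())
            self.sync_count += 1
            self._need_update = False
        return self._serialized


op = LazyOpDesc()
op.set_input("X", ["a", "b"])
op.set_input("Y", ["c"])
first = op.proto()   # triggers one rebuild
second = op.proto()  # no rebuild, nothing changed in between
```

Two writes followed by two reads cause a single rebuild, which is the point of deferring synchronization until the message is requested.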
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <Python.h>
#include <fstream>
#include <vector>
#include "paddle/framework/op_registry.h"
#include "pybind11/numpy.h"
#include "pybind11/pybind11.h"
#include "pybind11/stl.h"
namespace py = pybind11;
namespace paddle {
namespace pybind {
void BindProgramDesc(py::module& m);
void BindBlockDesc(py::module& m);
void BindVarDsec(py::module& m);
void BindOpDesc(py::module& m);
} // namespace pybind
} // namespace paddle
@@ -12,13 +12,10 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#include <Python.h>
-#include <fstream>
-#include <vector>
+#include "paddle/pybind/protobuf.h"
 #include "paddle/framework/backward.h"
 #include "paddle/framework/lod_tensor.h"
-#include "paddle/framework/op_registry.h"
 #include "paddle/operators/cond_op.h"
 #include "paddle/operators/net_op.h"
 #include "paddle/operators/recurrent_op.h"
@@ -27,11 +24,6 @@ limitations under the License. */
 #include "paddle/pybind/pybind.h"
 #include "paddle/pybind/tensor_py.h"
 #include "paddle/string/to_string.h"
-#include "pybind11/numpy.h"
-#include "pybind11/pybind11.h"
-#include "pybind11/stl.h"
-namespace py = pybind11;
 namespace paddle {
 namespace pybind {
@@ -320,6 +312,11 @@ All parameter, weight, gradient are variables in Paddle.
   m.def("is_compile_gpu", IsCompileGPU);
+  BindProgramDesc(m);
+  BindBlockDesc(m);
+  BindVarDsec(m);
+  BindOpDesc(m);
   return m.ptr();
 }
 }  // namespace pybind
......
@@ -177,7 +177,7 @@ def get_gradient(scope, op, inputs, outputs, grad_name, place,
 class OpTest(unittest.TestCase):
-    def check_output_with_place(self, place):
+    def check_output_with_place(self, place, atol):
         self.scope = core.Scope()
         op_inputs = self.inputs if hasattr(self, "inputs") else dict()
         op_outputs = self.outputs if hasattr(self, "outputs") else dict()
@@ -206,22 +206,23 @@ class OpTest(unittest.TestCase):
                         self.scope.find_var(sub_out_name).get_tensor())
                     self.assertTrue(
                         np.allclose(
-                            actual, expect, atol=1e-05),
-                        "output name: " + out_name + " has diff")
+                            actual, expect, atol=atol),
+                        "output name: " + out_name + " has diff.")
             else:
                 actual = np.array(self.scope.find_var(out_name).get_tensor())
                 expect = self.outputs[out_name]
                 self.assertTrue(
                     np.allclose(
-                        actual, expect, atol=1e-05),
-                    "output name: " + out_name + " has diff")
+                        actual, expect, atol=atol),
+                    "output name: " + out_name + " has diff.")
-    def check_output(self):
+    def check_output(self, atol=1e-5):
         places = [core.CPUPlace()]
         if core.is_compile_gpu():
             places.append(core.GPUPlace(0))
         for place in places:
-            self.check_output_with_place(place)
+            self.check_output_with_place(place, atol)
     def __assert_is_close(self, numeric_grads, analytic_grads, names,
                           max_relative_error, msg_prefix):
@@ -235,9 +236,10 @@ class OpTest(unittest.TestCase):
             def err_msg():
                 offset = np.argmax(diff_mat > max_relative_error)
-                return "%s Variable %s max gradient diff %f over limit %f, the first " \
-                       "error element is %d" % (
-                    msg_prefix, name, max_diff, max_relative_error, offset)
+                return ("%s Variable %s max gradient diff %f over limit %f, "
+                        "the first error element is %d") % (
+                            msg_prefix, name, max_diff, max_relative_error,
+                            offset)
             self.assertLessEqual(max_diff, max_relative_error, err_msg())
......
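The hunks above turn the hard-coded `atol=1e-05` into a parameter of `check_output`. As a rough illustration of why a caller might want to loosen it, the sketch below compares two arrays with `np.allclose` under two tolerances. The arrays and the offset `5e-4` are made-up values, not taken from any Paddle test.

```python
import numpy as np

# Made-up data: the "actual" values are offset from the expected ones by 5e-4.
expect = np.array([1.0, 2.0, 3.0], dtype="float32")
actual = expect + np.float32(5e-4)

# np.allclose checks |actual - expect| <= atol + rtol * |expect|, with the
# default rtol of 1e-5, so the absolute tolerance dominates for small values.
strict = np.allclose(actual, expect, atol=1e-5)  # offset exceeds the bound
loose = np.allclose(actual, expect, atol=1e-3)   # offset fits within the bound
```

With the default tolerance the comparison fails, while a test that passes `atol=1e-3` accepts the same output.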
@@ -5,22 +5,31 @@ from op_test import OpTest
 def modified_huber_loss_forward(val):
     if val < -1:
-        return -4 * val
+        return -4. * val
     elif val < 1:
-        return (1 - val) * (1 - val)
+        return (1. - val) * (1. - val)
     else:
-        return 0
+        return 0.
 class TestModifiedHuberLossOp(OpTest):
     def setUp(self):
         self.op_type = 'modified_huber_loss'
         samples_num = 32
-        self.inputs = {
-            'X': np.random.uniform(-1, 1., (samples_num, 1)).astype('float32'),
-            'Y': np.random.choice([0, 1], samples_num).reshape((samples_num, 1))
-        }
-        product_res = self.inputs['X'] * (2 * self.inputs['Y'] - 1)
+        x_np = np.random.uniform(-2., 2., (samples_num, 1)).astype('float32')
+        y_np = np.random.choice([0, 1], samples_num).reshape(
+            (samples_num, 1)).astype('float32')
+        product_res = x_np * (2. * y_np - 1.)
+        # keep away from the junction of piecewise function
+        for pos, val in np.ndenumerate(product_res):
+            while abs(val - 1.) < 0.05:
+                x_np[pos] = np.random.uniform(-2., 2.)
+                y_np[pos] = np.random.choice([0, 1])
+                product_res[pos] = x_np[pos] * (2 * y_np[pos] - 1)
+                val = product_res[pos]
+        self.inputs = {'X': x_np, 'Y': y_np}
         loss = np.vectorize(modified_huber_loss_forward)(product_res)
         self.outputs = {
@@ -32,7 +41,7 @@ class TestModifiedHuberLossOp(OpTest):
         self.check_output()
     def test_check_grad(self):
-        self.check_grad(['X'], 'Out', max_relative_error=0.005)
+        self.check_grad(['X'], 'Out', max_relative_error=0.01)
 if __name__ == '__main__':
......
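The piecewise loss and the "keep away from the junction" resampling in the hunk above can be sketched in vectorized NumPy. This is a simplified illustration, not the test's code: `modified_huber_loss` and `resample_near` are hypothetical helper names, and the sketch resamples the product `x * (2y - 1)` directly instead of redrawing `x` and `y` separately.

```python
import numpy as np


def modified_huber_loss(z):
    """Piecewise loss: -4z for z < -1, (1 - z)^2 for -1 <= z < 1, else 0."""
    z = np.asarray(z, dtype="float64")
    return np.where(z < -1.0, -4.0 * z,
                    np.where(z < 1.0, (1.0 - z) ** 2, 0.0))


def resample_near(z, center=1.0, margin=0.05, low=-2.0, high=2.0, rng=None):
    """Redraw every entry closer than `margin` to `center` until none remain."""
    rng = np.random.default_rng(0) if rng is None else rng
    z = np.array(z, dtype="float64")
    mask = np.abs(z - center) < margin
    while mask.any():
        z[mask] = rng.uniform(low, high, size=mask.sum())
        mask = np.abs(z - center) < margin
    return z


losses = modified_huber_loss([-2.0, 0.0, 2.0])
cleaned = resample_near([0.99, 1.02, -0.5])
```

Entries that already lie outside the margin, such as `-0.5` here, are left untouched; only the two values next to `1.0` are redrawn.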
@@ -6,20 +6,22 @@ from op_test import OpTest
 class TestMultiplexOp(OpTest):
     def setUp(self):
         self.op_type = "multiplex"
-        rows = 3
-        index = np.array([3, 1, 0])
+        rows = 4
+        index = np.arange(0, rows).astype('int32')
+        np.random.shuffle(index)
+        index = np.reshape(index, (rows, 1))
         ins1 = np.random.random((rows, 10)).astype("float32")
         ins2 = np.random.random((rows, 10)).astype("float32")
         ins3 = np.random.random((rows, 10)).astype("float32")
         ins4 = np.random.random((rows, 10)).astype("float32")
         self.inputs = {
-            'X': [('index', index), ('x1', ins1), ('x2', ins2), ('x3', ins3),
-                  ('x4', ins4)]
+            'Ids': index,
+            'X': [('x1', ins1), ('x2', ins2), ('x3', ins3), ('x4', ins4)]
         }
         # multiplex output
         output = np.zeros_like(ins1)
         for i in range(0, rows):
-            k = index[i] + 1
+            k = index[i][0]
             output[i] = self.inputs['X'][k][1][i]
         self.outputs = {'Out': output}
......
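The reference output computed in the test above (row `i` of the result is row `i` of the candidate selected by `Ids[i]`) can also be written without a loop. The sketch below is only an illustration of that selection rule; `multiplex` is a hypothetical helper name and the small constant arrays are made-up data.

```python
import numpy as np


def multiplex(ids, candidates):
    """For each row i, pick row i from candidates[ids[i]]."""
    ids = np.asarray(ids).reshape(-1)
    stacked = np.stack(candidates)      # shape: (num_candidates, rows, cols)
    rows = np.arange(stacked.shape[1])
    return stacked[ids, rows]           # shape: (rows, cols)


# Three candidates filled with 0, 1 and 2 so the selection is easy to read.
x1 = np.zeros((3, 2))
x2 = np.ones((3, 2))
x3 = np.full((3, 2), 2.0)
ids = np.array([[2], [0], [1]])
out = multiplex(ids, [x1, x2, x3])
```

Here row 0 comes from `x3`, row 1 from `x1`, and row 2 from `x2`, matching the loop in the test.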
import unittest
import paddle.v2.framework.core as core


class TestOpDesc(unittest.TestCase):
    def test_op_desc(self):
        prog = core.ProgramDesc.__create_program_desc__()
        self.assertIsNotNone(prog)
        block = prog.block(0)
        self.assertIsNotNone(block)
        op = block.append_op()
        self.assertIsNotNone(op)
        op.set_type("test")
        self.assertEqual("test", op.type())
        op.set_input("X", ["a", "b", "c"])
        self.assertEqual(["a", "b", "c"], op.input("X"))
        self.assertEqual(["X"], op.input_names())
        op.set_output("Out", ["z"])
        self.assertEqual(['z'], op.output("Out"))
        self.assertEqual(["Out"], op.output_names())
        op.set_attr("int_attr", 1)
        self.assertEqual(1, op.attr("int_attr"))
        self.assertTrue(op.has_attr("int_attr"))
        self.assertEqual(core.AttrType.INT, op.attr_type("int_attr"))
        op.set_attr("float_attr", -1.32)
        self.assertAlmostEqual(-1.32, op.attr("float_attr"), delta=1e-4)
        self.assertTrue(op.has_attr("float_attr"))
        op.set_attr("bool_attr", False)
        self.assertFalse(op.attr("bool_attr"))
        op.set_attr("string_attr", "abc")
        self.assertEqual("abc", op.attr("string_attr"))
        self.assertTrue(op.has_attr("string_attr"))
        op.set_attr("ints_attr", [1, 2, 3])
        self.assertEqual([1, 2, 3], op.attr("ints_attr"))
        expected = [1.2, 2.3, 3.4]
        op.set_attr("floats_attr", expected)
        for e, a in zip(expected, op.attr("floats_attr")):
            self.assertAlmostEqual(e, a, delta=1e-4)
        op.set_attr("strings_attr", ["a", "b", "c"])
        self.assertEqual(["a", "b", "c"], op.attr("strings_attr"))
        op.set_attr("bools_attr", [True, False, True])
        self.assertEqual([True, False, True], op.attr("bools_attr"))
        self.assertEqual(8, len(op.attr_names()))
        op.set_block_attr("block_attr", prog.block(0))
        self.assertEqual(0, op.get_block_attr("block_attr"))


class TestProgramDesc(unittest.TestCase):
    def test_instance(self):
        program_desc = core.ProgramDesc.__create_program_desc__()
        self.assertIsNotNone(program_desc)
        del program_desc
        program_desc = core.ProgramDesc.instance()
        self.assertIsNotNone(program_desc)
        self.assertIsNotNone(program_desc.block(0))
        del program_desc

    def test_append_block(self):
        prog_desc = core.ProgramDesc.__create_program_desc__()
        self.assertIsNotNone(prog_desc)
        block_root = prog_desc.block(0)
        self.assertIsNotNone(block_root)
        self.assertEqual(block_root.id, 0)
        block1 = prog_desc.append_block(block_root)
        block2 = prog_desc.append_block(block1)
        self.assertIsNotNone(block1)
        self.assertEqual(block1.id, block2.parent)
        self.assertEqual(block_root.id, block1.parent)
        block3 = prog_desc.append_block(block_root)
        self.assertEqual(block3.parent, block_root.id)
        self.assertEqual(prog_desc.block(1).id, 1)
        self.assertEqual(4, prog_desc.num_blocks())


class TestVarDesc(unittest.TestCase):
    def test_shape(self):
        program_desc = core.ProgramDesc.__create_program_desc__()
        block = program_desc.block(0)
        var = block.new_var('my_var')
        src_shape = [3, 2, 10, 8]
        var.set_shape(src_shape)
        res_shape = var.shape()
        self.assertEqual(src_shape, res_shape)

    def test_data_type(self):
        program_desc = core.ProgramDesc.__create_program_desc__()
        block = program_desc.block(0)
        var = block.new_var('my_var')
        var.set_data_type(core.DataType.INT32)
        self.assertEqual(core.DataType.INT32, var.data_type())


class TestBlockDesc(unittest.TestCase):
    def test_add_var(self):
        prog = core.ProgramDesc.__create_program_desc__()
        self.assertIsNotNone(prog)
        block = prog.block(0)
        self.assertIsNotNone(block)
        var1 = block.new_var("var1")
        var2 = block.new_var("var2")
        var3 = block.new_var("var3")
        all_vars = block.all_vars()
        self.assertEqual(set(all_vars), set([var1, var2, var3]))
        var2_re = block.var("var2")
        self.assertEqual(var2_re, var2)

    def test_add_op(self):
        prog = core.ProgramDesc.__create_program_desc__()
        self.assertIsNotNone(prog)
        block = prog.block(0)
        self.assertIsNotNone(block)
        op1 = block.append_op()
        op2 = block.append_op()
        op0 = block.prepend_op()
        all_ops = block.all_ops()
        self.assertEqual(all_ops, [op0, op1, op2])


if __name__ == '__main__':
    unittest.main()
@@ -5,7 +5,7 @@ from op_test import OpTest
 def stable_softmax(x):
     """Compute the softmax of vector x in a numerically stable way."""
-    shiftx = x - np.max(x)
+    shiftx = x - np.max(x).clip(-64.)
     exps = np.exp(shiftx)
     return exps / np.sum(exps)
......
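The `stable_softmax` helper changed above relies on the fact that subtracting a constant from every input leaves the softmax unchanged while keeping `np.exp` away from overflow. The standalone sketch below mirrors the new line, writing the floor of `-64` with `np.maximum` instead of `.clip`. The commit does not state why the floor was added, so the sketch only reproduces it and does not exercise it. The input values are made up.

```python
import numpy as np


def stable_softmax(x):
    """Softmax of a 1-D vector, shifted by its maximum before exponentiation."""
    x = np.asarray(x, dtype="float64")
    # Shift by the largest entry, but never by less than -64 (mirrors the diff).
    shift = np.maximum(np.max(x), -64.0)
    exps = np.exp(x - shift)
    return exps / np.sum(exps)


# np.exp(1000.0) alone would overflow to inf; the shifted version stays finite.
probs = stable_softmax([1000.0, 1001.0, 1002.0])
reference = stable_softmax([0.0, 1.0, 2.0])
```

Both calls give the same distribution, because the two inputs differ only by a constant offset.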
import unittest
import numpy as np
from op_test import OpTest
from test_softmax_op import stable_softmax


class TestSoftmaxWithCrossEntropyOp(OpTest):
    """
    Test softmax with cross entropy operator with discrete one-hot labels.
    """

    def setUp(self):
        self.op_type = "softmax_with_cross_entropy"
        batch_size = 3
        class_num = 37
        logits = np.random.uniform(0.1, 1.0,
                                   [batch_size, class_num]).astype("float32")
        softmax = np.apply_along_axis(stable_softmax, 1, logits)
        labels = np.random.randint(0, class_num, [batch_size, 1], dtype="int32")
        cross_entropy = np.asmatrix(
            [[-np.log(softmax[i][labels[i][0]])]
             for i in range(softmax.shape[0])],
            dtype="float32")
        self.inputs = {"Logits": logits, "Label": labels}
        self.outputs = {"Softmax": softmax, "Loss": cross_entropy}

    def test_check_output(self):
        self.check_output()

    def test_check_grad(self):
        self.check_grad(["Logits"], "Loss", max_relative_error=0.05)


class TestSoftmaxWithCrossEntropyOp2(OpTest):
    """
    Test softmax with cross entropy operator with soft labels.
    """

    def setUp(self):
        self.op_type = "softmax_with_cross_entropy"
        batch_size = 2
        class_num = 17
        logits = np.random.uniform(0.1, 1.0,
                                   [batch_size, class_num]).astype("float32")
        softmax = np.apply_along_axis(stable_softmax, 1, logits)
        labels = np.random.uniform(0.1, 1.0,
                                   [batch_size, class_num]).astype("float32")
        labels /= np.sum(labels, axis=1, keepdims=True)
        cross_entropy = (-labels * np.log(softmax)).sum(
            axis=1, keepdims=True).astype("float32")
        self.inputs = {"Logits": logits, "Label": labels}
        self.outputs = {"Softmax": softmax, "Loss": cross_entropy}
        self.attrs = {"softLabel": True}

    def test_check_output(self):
        self.check_output()

    def test_check_grad(self):
        self.check_grad(["Logits"], "Loss", max_relative_error=0.05)


if __name__ == "__main__":
    unittest.main()
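The two test classes above compute the expected loss in two ways: by indexing the softmax with an integer label, and by weighting `log(softmax)` with a soft label distribution. The sketch below shows that the two formulas agree when the soft labels are one-hot. The helper names `softmax_rows`, `hard_label_loss`, and `soft_label_loss` are hypothetical, and the logits are random made-up data.

```python
import numpy as np


def softmax_rows(logits):
    """Row-wise softmax, shifted by each row's maximum for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


def hard_label_loss(softmax, labels):
    """Cross entropy with integer class labels of shape (batch, 1)."""
    rows = np.arange(softmax.shape[0])
    return -np.log(softmax[rows, labels.reshape(-1)]).reshape(-1, 1)


def soft_label_loss(softmax, labels):
    """Cross entropy with per-class label weights of shape (batch, classes)."""
    return (-labels * np.log(softmax)).sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
logits = rng.uniform(0.1, 1.0, size=(3, 5))
softmax = softmax_rows(logits)

hard = np.array([[1], [4], [0]])
one_hot = np.zeros_like(softmax)
one_hot[np.arange(3), hard.reshape(-1)] = 1.0

loss_hard = hard_label_loss(softmax, hard)
loss_soft = soft_label_loss(softmax, one_hot)
```

With one-hot weights every term except the labelled class vanishes, so `loss_soft` reduces to `loss_hard`.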