Commit aa320894 authored by: Q qijun

fix conflict

@@ -61,32 +61,32 @@ Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddl
## Installation
It is recommended to check out the
-[Docker installation guide](http://doc.paddlepaddle.org/develop/doc/getstarted/build_and_install/docker_install_en.html)
+[Docker installation guide](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/build_and_install/docker_install_en.html)
before looking into the
-[build from source guide](http://doc.paddlepaddle.org/develop/doc/getstarted/build_and_install/build_from_source_en.html).
+[build from source guide](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/build_and_install/build_from_source_en.html).
## Documentation
-We provide [English](http://doc.paddlepaddle.org/develop/doc/) and
+We provide [English](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/index_en.html) and
-[Chinese](http://doc.paddlepaddle.org/doc_cn/) documentation.
+[Chinese](http://www.paddlepaddle.org/docs/develop/documentation/zh/getstarted/index_cn.html) documentation.
-- [Deep Learning 101](http://book.paddlepaddle.org/index.html)
+- [Deep Learning 101](http://www.paddlepaddle.org/docs/develop/book/01.fit_a_line/index.html)
You might want to start from this online interactive book that can run in a Jupyter Notebook.
-- [Distributed Training](http://doc.paddlepaddle.org/develop/doc/howto/usage/cluster/cluster_train_en.html)
+- [Distributed Training](http://www.paddlepaddle.org/docs/develop/documentation/en/howto/usage/cluster/cluster_train_en.html)
You can run distributed training jobs on MPI clusters.
-- [Distributed Training on Kubernetes](http://doc.paddlepaddle.org/develop/doc/howto/usage/k8s/k8s_en.html)
+- [Distributed Training on Kubernetes](http://www.paddlepaddle.org/docs/develop/documentation/en/howto/usage/cluster/k8s_en.html)
You can also run distributed training jobs on Kubernetes clusters.
-- [Python API](http://doc.paddlepaddle.org/develop/doc/api/index_en.html)
+- [Python API](http://www.paddlepaddle.org/docs/develop/documentation/en/api/index_en.html)
Our new API enables much shorter programs.
-- [How to Contribute](http://doc.paddlepaddle.org/develop/doc/howto/dev/contribute_to_paddle_en.html)
+- [How to Contribute](http://www.paddlepaddle.org/docs/develop/documentation/en/howto/dev/contribute_to_paddle_en.html)
We appreciate your contributions!
...
@@ -305,3 +305,10 @@ lstm_unit
---------
.. autofunction:: paddle.v2.fluid.layers.lstm_unit
    :noindex:
+sequence_softmax
+----------------
+.. autofunction:: paddle.v2.fluid.layers.sequence_softmax
+    :noindex:
-# PaddlePaddle Distributed Training
+# Distributed Training
## Overview
@@ -181,8 +181,8 @@ PaddlePaddle can use a variety of distributed computing platforms to build distributed training jobs
## Running on Different Clusters
-- [fabric](fabric_cn.md)
+- [fabric cluster](fabric_cn.md)
-- [openmpi](openmpi_cn.md)
+- [openmpi cluster](openmpi_cn.md)
-- [kubernetes](k8s_cn.md)
+- [kubernetes single machine](k8s_cn.md)
- [kubernetes distributed](k8s_distributed_cn.md)
-- [kubernetes on AWS](k8s_aws_cn.md)
+- [running kubernetes cluster training on AWS](k8s_aws_cn.md)
-# PaddlePaddle Distributed Training
+# Distributed Training
## Introduction
@@ -188,5 +188,4 @@ These cluster platforms provide API or environment variables for training proces
- [fabric](fabric_en.md)
- [openmpi](openmpi_en.md)
- [kubernetes](k8s_en.md)
-- kubernetes distributed
- [kubernetes on AWS](k8s_aws_en.md)
@@ -493,7 +493,7 @@ spec:
spec:
containers:
- name: paddle-data
-  image: paddledev/paddle-tutorial:k8s_data
+  image: paddlepaddle/paddle-tutorial:k8s_data
  imagePullPolicy: Always
  volumeMounts:
  - mountPath: "/efs"
@@ -522,7 +522,7 @@ NAME DESIRED SUCCESSFUL AGE
paddle-data 1 1 6m
```
-Data preparation is done by docker image `paddledev/paddle-tutorial:k8s_data`, see [here](src/k8s_data/README.md) for how to build this docker image and source code.
+Data preparation is done by the docker image `paddlepaddle/paddle-tutorial:k8s_data`; see [here](src/k8s_data/README.md) for how to build this docker image and for its source code.
#### Start Training
@@ -545,7 +545,7 @@ spec:
claimName: efsvol
containers:
- name: trainer
-  image: paddledev/paddle-tutorial:k8s_train
+  image: paddlepaddle/paddle-tutorial:k8s_train
command: ["bin/bash", "-c", "/root/start.sh"] command: ["bin/bash", "-c", "/root/start.sh"]
  env:
  - name: JOB_NAME
@@ -617,7 +617,7 @@ kubectl --kubeconfig=kubeconfig log -f POD_NAME
Run `kubectl --kubeconfig=kubeconfig describe job paddle-cluster-job` to check the training job status. It will complete in around 20 minutes.
-The details for start `pserver` and `trainer` are hidden inside docker image `paddledev/paddle-tutorial:k8s_train`, see [here](src/k8s_train/README.md) for how to build the docker image and source code.
+The details of starting `pserver` and `trainer` are hidden inside the docker image `paddlepaddle/paddle-tutorial:k8s_train`; see [here](src/k8s_train/README.md) for how to build the docker image and for its source code.
#### Inspect Training Output
...
# Kubernetes Single-Machine Training
-In this document we show how to launch a single-machine CPU Paddle training job on a Kubernetes cluster. The next document describes how to launch a distributed training job.
+In this document we show how to launch a single-machine CPU PaddlePaddle training job on a Kubernetes cluster. The next document describes how to launch a distributed training job.
## Build the Docker Image
-In a fully functional Kubernetes cluster we usually install a distributed file system such as Ceph to store the training data, so that every process of a distributed Paddle training job can read data from Ceph. In this example we only demonstrate a single-machine job, so we can simplify the environment requirements and put the training data directly into Paddle's Docker image. To do that, we need to build a Paddle image that contains the training data.
-Paddle's [Quick Start Tutorial](http://www.paddlepaddle.org/doc/demo/quick_start/index_en.html) describes how to download the training data with scripts from the Paddle source tree. The `paddledev/paddle:cpu-demo-latest` image contains the Paddle source code and demos (note that the default Paddle image `paddledev/paddle:cpu-latest` does not include the source code; the images for the various Paddle versions are listed in the [Docker installation guide](http://www.paddlepaddle.org/doc/build/docker_install.html)), so we use this image to download the training data into a Docker container and then commit the container with the training data as a new image.
+In a fully functional Kubernetes cluster we usually install a distributed file system such as Ceph to store the training data, so that every process of a distributed PaddlePaddle training job can read data from Ceph. In this example we only demonstrate a single-machine job, so we can simplify the environment requirements and put the training data directly into PaddlePaddle's Docker image. To do that, we need to build a PaddlePaddle image that contains the training data.
+PaddlePaddle's `paddlepaddle/paddle:cpu-demo-latest` image contains the PaddlePaddle source code and demos (note that the default PaddlePaddle production image `paddlepaddle/paddle:latest` does not include the source code; the images for the various PaddlePaddle versions are listed in the [Docker Installation Guide](http://paddlepaddle.org/docs/develop/documentation/zh/getstarted/build_and_install/docker_install_cn.html)). Below we use this image to download the data into a Docker container and commit the container with the training data as a new image.
### Run the Container
```
-$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest
+$ docker run --name quick_start_data -it paddlepaddle/paddle:cpu-demo-latest
```
### Download the Data
@@ -103,7 +104,7 @@ spec:
restartPolicy: Never
```
-### Create the Paddle Job
+### Create the PaddlePaddle Job
Use the yaml file created above to create the Kubernetes Job; the command is:
...
@@ -28,7 +28,7 @@ The PaddlePaddle image needs to provide the runtime environment for the `paddle pserver` and `paddle train` processes
- Copy the training files into the container
- Generate the start-up arguments for the `paddle pserver` and `paddle train` processes, and launch training
-Because the official image `paddledev/paddle:cpu-latest` already contains PaddlePaddle's executables but not the functionality above, we can build on top of it and add start-up scripts to produce a new image that does this work. See the reference image's [*Dockerfile*](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/usage/cluster/src/k8s_train/Dockerfile).
+Because the official image `paddlepaddle/paddle:latest` already contains PaddlePaddle's executables but not the functionality above, we can build on top of it and add start-up scripts to produce a new image that does this work. See the reference image's [*Dockerfile*](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/usage/cluster/src/k8s_train/Dockerfile).
```bash
$ cd doc/howto/usage/k8s/src/k8s_train
@@ -62,7 +62,7 @@ spec:
hostNetwork: true
containers:
- name: paddle-data
-  image: paddledev/paddle-tutorial:k8s_data
+  image: paddlepaddle/paddle-tutorial:k8s_data
  imagePullPolicy: Always
  volumeMounts:
  - mountPath: "/mnt"
...
# Paddle On Kubernetes
+# PaddlePaddle On Kubernetes
->In this article, we will introduce how to run Paddle training job on single CPU machine using Kubernetes. In next article, we will introduce how to run Paddle training job on distributed cluster.
+In this article, we will introduce how to run a PaddlePaddle training job on a single CPU machine using Kubernetes. In the next article, we will introduce how to run a PaddlePaddle training job on a distributed cluster.
## Build Docker Image
-In distributed Kubernetes cluster, we will use Ceph or other shared storage system for storing training related data so that all processes in Paddle training can retrieve data from Ceph. In this example, we will only demo training job on single machine. In order to simplify the requirement of the environment, we will directly put training data into Paddle's Docker Image, so we need to create a Paddle Docker image that already includes the training data.
+In a distributed Kubernetes cluster, we will use Ceph or another distributed storage system to store training-related data, so that all processes in PaddlePaddle training can retrieve data from Ceph. In this example, we will only demo a training job on a single machine. To simplify the environment requirements, we will put the training data directly into the PaddlePaddle Docker image, so we need to create a PaddlePaddle Docker image that includes the training data.
-Paddle's [Quick Start Tutorial](http://www.paddlepaddle.org/doc/demo/quick_start/index_en.html) introduces how to download and train data by using script from Paddle's source code. And `paddledev/paddle:cpu-demo-latest` image has the Paddle source code and demo. (Caution: Default Paddle image `paddledev/paddle:cpu-latest` doesn't include the source code, Paddle's different versions of image can be referred here: [Docker installation guide](http://www.paddlepaddle.org/doc/build/docker_install.html)), so we run this container and download the training data, and then commit the whole container to be a new Docker image.
+The Docker image `paddlepaddle/paddle:cpu-demo-latest` has the PaddlePaddle source code and demos. (Caution: the default production image `paddlepaddle/paddle:latest` doesn't include the source code; images for PaddlePaddle's different versions are listed in the [Docker Installation Guide](http://paddlepaddle.org/docs/develop/documentation/zh/getstarted/build_and_install/docker_install_en.html).) So we run this image, download the training data inside it, and then commit the whole container as a new Docker image.
### Run Docker Container
```
-$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest
+$ docker run --name quick_start_data -it paddlepaddle/paddle:cpu-demo-latest
```
### Download Training Data
@@ -67,7 +76,7 @@ $ docker commit quick_start_data mypaddle/paddle:quickstart
## Use Kubernetes For Training
->We will use Kubernetes job for training process, following steps shows how to do the training with Kubernetes.
+We will use a Kubernetes job for the training process; the following steps show how to do the training with Kubernetes.
### Create Yaml Files
@@ -99,7 +108,7 @@ spec:
restartPolicy: Never
```
-### Start Paddle Job
+### Start PaddlePaddle Job
Use the above yaml file to start the Kubernetes job.
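A hedged sketch of that step (not part of this diff; it assumes the yaml above was saved as `job.yaml`, an illustrative filename):

```bash
# Create the training job from the yaml file, then watch its
# status until the SUCCESSFUL count reaches 1.
kubectl create -f job.yaml
kubectl get jobs
```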
...
-FROM paddledev/paddle:cpu-latest
+FROM paddlepaddle/paddle:latest
MAINTAINER zjsxzong89@gmail.com
...
-FROM paddledev/paddle:cpu-latest
+FROM paddlepaddle/paddle:latest
COPY start.sh /root/
COPY start_paddle.py /root/
...
if(WITH_PYTHON)
  cc_library(paddle_pybind SHARED
-    SRCS pybind.cc exception.cc protobuf.cc
+    SRCS pybind.cc exception.cc protobuf.cc const_value.cc
    DEPS pybind python backward proto_desc paddle_memory executor prune init
    ${GLOB_OP_LIB})
endif(WITH_PYTHON)
...
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "const_value.h"
#include "paddle/framework/operator.h"

namespace paddle {
namespace pybind {

// Expose the framework's well-known name constants to Python as
// zero-argument functions, e.g. core.kGradVarSuffix() -> "@GRAD".
void BindConstValue(pybind11::module& m) {
  m.def("kEmptyVarName", [] { return framework::kEmptyVarName; });
  m.def("kTempVarName", [] { return framework::kTempVarName; });
  m.def("kGradVarSuffix", [] { return framework::kGradVarSuffix; });
  m.def("kZeroVarSuffix", [] { return framework::kZeroVarSuffix; });
}

}  // namespace pybind
}  // namespace paddle
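For context, a short sketch of how these bindings surface on the Python side once `BindConstValue` is registered into the `core` module (the suffix values match the `test_const_value.py` test added below):

```python
import paddle.v2.fluid.core as core

# Each binding is exposed as a zero-argument function returning the constant.
assert core.kGradVarSuffix() == "@GRAD"
assert core.kTempVarName() == "@TEMP@"
assert core.kZeroVarSuffix() == "@ZERO"
```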
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <Python.h>
#include "paddle/platform/enforce.h"
#include "pybind11/pybind11.h"

namespace py = pybind11;

namespace paddle {
namespace pybind {

extern void BindConstValue(pybind11::module& m);

}  // namespace pybind
}  // namespace paddle
@@ -30,6 +30,7 @@ limitations under the License. */
#include "paddle/operators/net_op.h"
#include "paddle/platform/enforce.h"
#include "paddle/platform/place.h"
+#include "paddle/pybind/const_value.h"
#include "paddle/pybind/exception.h"
#include "paddle/pybind/pybind.h"
#include "paddle/pybind/tensor_py.h"
@@ -431,6 +432,7 @@ All parameter, weight, gradient are variables in Paddle.
  BindBlockDesc(m);
  BindVarDsec(m);
  BindOpDesc(m);
+  BindConstValue(m);
  py::class_<framework::LoDRankTable>(m, "LodRankTable")
      .def("items", [](framework::LoDRankTable &table) {
...
@@ -16,12 +16,13 @@ import regularizer
from param_attr import ParamAttr
from data_feeder import DataFeeder
from core import LoDTensor, CPUPlace, GPUPlace
+import clip
Tensor = LoDTensor
__all__ = framework.__all__ + executor.__all__ + [
    'io', 'initializer', 'layers', 'nets', 'optimizer', 'backward',
    'regularizer', 'LoDTensor', 'CPUPlace', 'GPUPlace', 'Tensor', 'ParamAttr',
-    'DataFeeder'
+    'DataFeeder', 'clip'
]
...
import functools
import layers

__all__ = ['GradientClipByValue', 'append_gradient_clip_ops']


class BaseGradientClipAttr(object):
    def process_context(self, context, p_g):
        raise NotImplementedError()

    def create_operators(self, param, grad):
        raise NotImplementedError()


class NullGradientClipAttr(BaseGradientClipAttr):
    # Default attribute: leaves the gradient untouched.
    def process_context(self, context, p_g):
        pass

    def create_operators(self, param, grad):
        return param, grad


class GradientClipByValue(BaseGradientClipAttr):
    def __init__(self, max, min=None):
        max = float(max)
        if min is None:
            # If min is not given, clip symmetrically to [-max, max].
            min = -max
        else:
            min = float(min)
        self.max = max
        self.min = min

    def process_context(self, context, p_g):
        pass

    def create_operators(self, param, grad):
        new_grad = layers.clip(x=grad, min=self.min, max=self.max)
        return param, new_grad


def append_gradient_clip_ops(param_grad):
    # Apply each parameter's clip attribute (if any) to its gradient
    # and return the new (param, grad) pairs.
    context = dict()
    create_op_callbacks = []
    for p, g in param_grad:
        clip_attr = getattr(p, 'clip_attr', NullGradientClipAttr())
        if clip_attr is None:
            clip_attr = NullGradientClipAttr()
        if not isinstance(clip_attr, BaseGradientClipAttr):
            raise TypeError(
                "clip attribute should be an instance of BaseGradientClipAttr")
        clip_attr.process_context(context=context, p_g=param_grad)
        create_op_callbacks.append(
            functools.partial(
                clip_attr.create_operators, param=p, grad=g))
    return [each_callback() for each_callback in create_op_callbacks]


ClipByValue = GradientClipByValue
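For context, a minimal sketch of how this attribute reaches `append_gradient_clip_ops` from user code, mirroring the MNIST test change later in this commit (the layer and its input are illustrative):

```python
import paddle.v2.fluid as fluid

image = fluid.layers.data(name='pixel', shape=[784], dtype='float32')
# ParamAttr(clip=...) is stored on the Parameter as `clip_attr`;
# Optimizer.minimize() then calls append_gradient_clip_ops() on it.
hidden = fluid.layers.fc(input=image,
                         size=128,
                         act='relu',
                         param_attr=fluid.ParamAttr(
                             clip=fluid.clip.ClipByValue(10)))
```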
import collections
+import contextlib
import numpy as np
-from . import core
import proto.framework_pb2 as framework_pb2
-import google.protobuf.message
+from . import core
-import contextlib
__all__ = [
    'Block', 'Variable', 'Program', 'Operator', 'default_startup_program',
@@ -12,6 +12,18 @@ __all__ = [
    'switch_main_program'
]
+EMPTY_VAR_NAME = core.kEmptyVarName()
+TEMP_VAR_NAME = core.kTempVarName()
+GRAD_VAR_SUFFIX = core.kGradVarSuffix()
+ZERO_VAR_SUFFIX = core.kZeroVarSuffix()
+def grad_var_name(var_name):
+    """
+    Return the gradient variable name for a given variable name.
+    """
+    return var_name + GRAD_VAR_SUFFIX
def unique_name(prefix):
    """
@@ -704,6 +716,7 @@ class Block(object):
                trainable=p.trainable,
                optimize_attr=p.optimize_attr,
                regularizer=p.regularizer,
+                clip_attr=p.clip_attr,
                name=v.name)
            self.vars[new_p.name] = new_p
@@ -866,6 +879,8 @@ class Parameter(Variable):
        self.regularizer = kwargs.get('regularizer', None)
+        self.clip_attr = kwargs.get('clip_attr', None)
# program is a global instance.
_main_program_ = Program()
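A tiny sketch of the new helper for reference (the variable name is illustrative; the suffix value matches the constants test below):

```python
from paddle.v2.fluid.framework import grad_var_name

# GRAD_VAR_SUFFIX is "@GRAD", so:
assert grad_var_name("fc_0.w_0") == "fc_0.w_0@GRAD"
```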
...
@@ -2,7 +2,7 @@ from ..registry import register_layer
__all__ = [
    'mean', 'mul', 'dropout', 'reshape', 'sigmoid', 'scale', 'transpose',
    'sigmoid_cross_entropy_with_logits', 'elementwise_add', 'elementwise_div',
-    'elementwise_sub', 'elementwise_mul', 'clip', 'abs'
+    'elementwise_sub', 'elementwise_mul', 'clip', 'abs', 'sequence_softmax'
]
for _OP in set(__all__):
...
@@ -6,6 +6,7 @@ from framework import unique_name, program_guard
from initializer import Constant
from layer_helper import LayerHelper
from regularizer import append_regularization_ops
+from clip import append_gradient_clip_ops
__all__ = ['SGD', 'Momentum', 'Adagrad', 'Adam', 'Adamax', 'DecayedAdagrad']
@@ -197,9 +198,13 @@ class Optimizer(object):
        `create_optimization_pass()` into one.
        """
        params_grads = append_backward_ops(loss, parameter_list, no_grad_set)
+        params_grads = append_gradient_clip_ops(params_grads)
        # Add regularization if any
        params_grads = append_regularization_ops(params_grads,
                                                 self.regularization)
        optimize_ops = self.create_optimization_pass(params_grads, loss,
                                                     startup_program)
        return optimize_ops
...
from initializer import Initializer, Xavier, Constant
from regularizer import WeightDecayRegularizer
+__all__ = ['ParamAttr']
class ParamAttr(object):
    def __init__(self,
@@ -8,12 +10,14 @@ class ParamAttr(object):
                 initializer=None,
                 learning_rate=1.0,
                 regularizer=None,
-                 trainable=True):
+                 trainable=True,
+                 clip=None):
        self.name = name
        self.initializer = initializer
        self.learning_rate = learning_rate
        self.regularizer = regularizer
        self.trainable = trainable
+        self.clip = clip
    def set_default_initializer(self, initializer):
        if initializer is None:
@@ -56,7 +60,8 @@ class ParamAttr(object):
            'name': self.name,
            'learning_rate': self.learning_rate,
            'regularizer': self.regularizer,
-            'trainable': self.trainable
+            'trainable': self.trainable,
+            'clip_attr': self.clip
        }
        if with_initializer:
            kwargs['initializer'] = self.initializer
...
@@ -11,7 +11,9 @@ regularizer = fluid.regularizer.L2Decay(0.0005 * BATCH_SIZE)
hidden1 = fluid.layers.fc(input=image,
                          size=128,
                          act='relu',
-                          param_attr=regularizer)
+                          param_attr=fluid.ParamAttr(
+                              regularizer=regularizer,
+                              clip=fluid.clip.ClipByValue(10)))
hidden2 = fluid.layers.fc(input=hidden1,
                          size=64,
                          act='relu',
...
@@ -3,10 +3,7 @@ import numpy as np
from op_test import OpTest
import paddle.v2.fluid.core as core
from paddle.v2.fluid.op import Operator
+from paddle.v2.fluid.framework import grad_var_name
-def grad_var_name(var_name):
-    return var_name + "@GRAD"
def get_backward_op(scope, op, no_grad_set):
...
import unittest
import paddle.v2.fluid.framework as framework


class TestConstValue(unittest.TestCase):
    def test_const_value(self):
        self.assertEqual(framework.GRAD_VAR_SUFFIX, "@GRAD")
        self.assertEqual(framework.TEMP_VAR_NAME, "@TEMP@")
        self.assertEqual(framework.ZERO_VAR_SUFFIX, "@ZERO")


if __name__ == '__main__':
    unittest.main()
@@ -187,6 +187,15 @@ class TestBook(unittest.TestCase):
                x_t=x_t, hidden_t_prev=prev_hidden, cell_t_prev=prev_cell))
        print(str(program))
+    def test_sequence_softmax(self):
+        program = Program()
+        with program_guard(program):
+            seq_data = layers.data(
+                name='seq_data', shape=[10, 10], dtype='float32', lod_level=1)
+            seq = layers.fc(input=seq_data, size=20)
+            self.assertIsNotNone(layers.sequence_softmax(x=seq))
+        print(str(program))
    def test_get_places(self):
        program = Program()
        with program_guard(program):
...
import unittest
import paddle.v2.fluid.op as op import paddle.v2.fluid.op as op
import paddle.v2.fluid.core as core
import paddle.v2.fluid.proto.framework_pb2 as framework_pb2
...
from __future__ import print_function
import unittest
-from paddle.v2.fluid.framework import Program, default_main_program, program_guard
+from paddle.v2.fluid.framework import Program, default_main_program, program_guard, grad_var_name
import paddle.v2.fluid.layers as layers
main_program = default_main_program()
@@ -109,12 +109,10 @@ class TestProgram(unittest.TestCase):
        self.assertEqual(add_op.idx, 1)
        param_to_grad = prog.append_backward(mean_out, set())
-        def grad_name(name):
-            return name + "@GRAD"
        for var_name in ("mul.x", "mul.y", "mul.out", "add.y", "add.out",
                         "mean.out"):
-            self.assertEqual(param_to_grad[var_name][0], grad_name(var_name))
+            self.assertEqual(param_to_grad[var_name][0],
+                             grad_var_name(var_name))
            self.assertEqual(param_to_grad[var_name][1], 0)
        expect_ops = [
...
import unittest
import paddle.v2.fluid.layers as layers
-from paddle.v2.fluid.framework import Program
+from paddle.v2.fluid.framework import Program, grad_var_name
from paddle.v2.fluid.executor import Executor
from paddle.v2.fluid.backward import append_backward_ops
import numpy as np
@@ -164,7 +164,7 @@ class RecurrentOpTest1(unittest.TestCase):
            for x in self.data_field
        }
        fetch_list = [
-            self.main_program.global_block().var(x + "@GRAD")
+            self.main_program.global_block().var(grad_var_name(x))
            for x in self.data_field
        ]
...