提交 c4394bc5 编写于 作者: T tensor-tang

Merge remote-tracking branch 'ups/develop' into refine/infershape

# PaddlePaddle发行规范 # PaddlePaddle发行规范
PaddlePaddle使用git-flow branching model做分支管理,使用[Semantic Versioning](http://semver.org/)标准表示PaddlePaddle版本号。 PaddlePaddle使用Trunk Based Development,使用[Semantic Versioning](http://semver.org/)标准表示PaddlePaddle版本号。
PaddlePaddle每次发新的版本,遵循以下流程: PaddlePaddle每次发新的版本,遵循以下流程:
1.`develop`分支派生出新的分支,分支名为`release/版本号`。例如,`release/0.10.0` 1.`develop`分支派生出新的分支,分支名为`release/版本号`。例如,`release/0.10.0`
1. 将新分支的版本打上tag,tag为`版本号rc.Patch号`。第一个tag为`0.10.0rc1`,第二个为`0.10.0rc2`,依次类推。 2. 将新分支的版本打上tag,tag为`版本号rc-Patch号`。例如,第一个tag为`0.10.0-rc0`
1. 对这个版本的提交,做如下几个操作: 3. 新分支一般不接受新的feature和优化。QA在release分支上进行测试。研发基于最新的develop开发。
* 使用Regression Test List作为检查列表,测试本次release的正确性。 4. QA和研发发现的bug,在develop上修复验证后,cherry-pick修复到release分支。直到release分支相对稳定。
* 如果失败,记录下所有失败的例子,在这个`release/版本号`分支中,修复所有bug后,Patch号加一,到第二步 5. 如果有需要,在release分支最新代码上打上新的tag,比如`0.10.0-rc1`,让更多的用户加入测试。重复3-4步。
* 修改`python/setup.py.in`中的版本信息,并将`istaged`字段设为`True` 6. release分支稳定后,打上正式的release tag,比如`0.10.0`
* 将这个版本的python wheel包发布到pypi。 7. 将这个版本的python wheel包发布到pypi。
* 更新Docker镜像(参考后面的操作细节)。 8. 更新Docker镜像(参考后面的操作细节)。
1. 第三步完成后,将`release/版本号`分支合入master分支,将master分支的合入commit打上tag,tag为`版本号`。同时再将`master`分支合入`develop`分支。
1. 协同完成Release Note的书写。
需要注意的是: 需要注意的是:
* `release/版本号`分支一旦建立,一般不允许再从`develop`分支合入`release/版本号`。这样保证`release/版本号`分支功能的封闭,方便测试人员测试PaddlePaddle的行为。 * bug修复需要先在develop上进行,然后进入release分支。而不是直接在release分支上开发。
*`release/版本号`分支存在的时候,如果有bugfix的行为,需要将bugfix的分支同时merge到`master`, `develop``release/版本号`这三个分支。
* release分支原则上只接受修复类的修改,不接受新feature。
## 发布wheel包到pypi ## 发布wheel包到pypi
...@@ -61,24 +60,21 @@ docker push [镜像]:[version] ...@@ -61,24 +60,21 @@ docker push [镜像]:[version]
## PaddlePaddle 分支规范 ## PaddlePaddle 分支规范
PaddlePaddle开发过程使用[git-flow](http://nvie.com/posts/a-successful-git-branching-model/)分支规范,并适应github的特性做了一些区别。 PaddlePaddle开发过程使用[Trunk Based Development](https://trunkbaseddevelopment.com/) 开发规范。
* PaddlePaddle的主版本库遵循[git-flow](http://nvie.com/posts/a-successful-git-branching-model/)分支规范。其中:
* `master`分支为稳定(stable branch)版本分支。每一个`master`分支的版本都是经过单元测试和回归测试的版本。
* `develop`分支为开发(develop branch)版本分支。每一个`develop`分支的版本都经过单元测试,但并没有经过回归测试。
* `release/版本号`分支为每一次Release时建立的临时分支。在这个阶段的代码正在经历回归测试。
* 其他用户的fork版本库并不需要严格遵守[git-flow](http://nvie.com/posts/a-successful-git-branching-model/)分支规范,但所有fork的版本库的所有分支都相当于特性分支。 * `develop`分支为开发(develop branch)版本分支。每一个`develop`分支的版本都经过单元测试。并且会经过模型回归测试。
* 建议,开发者fork的版本库使用`develop`分支同步主版本库的`develop`分支 * `release/版本号`分支为每一次Release时建立的临时分支。release分支主要用于测试,bug修复和最终发版。
* 建议,开发者fork的版本库中,再基于`develop`版本fork出自己的功能分支。 * `master`分支因为历史原因,已经废弃。
* 当功能分支开发完毕后,向PaddlePaddle的主版本库提交`Pull Reuqest`,进而进行代码评审。
* 在评审过程中,开发者修改自己的代码,可以继续在自己的功能分支提交代码。
* BugFix分支也是在开发者自己的fork版本库维护,与功能分支不同的是,BugFix分支需要分别给主版本库的`master``develop`与可能有的`release/版本号`分支,同时提起`Pull Request` * 其他开发者fork的feature branch。
* 建议,开发者的feature branch需要同步主版本库的`develop`分支。
* 建议,开发者的feature branch需要基于主版本库中的`develop`分支。
* 当feature branch开发完毕后,向PaddlePaddle的主版本库提交`Pull Reuqest`,进而进行代码评审。
* 在评审过程中,开发者修改自己的代码,可以继续在自己的feature branch提交代码。
## PaddlePaddle回归测试列表 ## PaddlePaddle回归测试列表
本列表说明PaddlePaddle发版之前需要测试的功能点。 TODO
### PaddlePaddle Book中所有章节 ### PaddlePaddle Book中所有章节
......
...@@ -4,26 +4,21 @@ PaddlePaddle manages its branches using "git-flow branching model", and [Semanti ...@@ -4,26 +4,21 @@ PaddlePaddle manages its branches using "git-flow branching model", and [Semanti
Each time we release a new PaddlePaddle version, we should follow the below steps: Each time we release a new PaddlePaddle version, we should follow the below steps:
1. Fork a new branch from `develop` named `release/[version]`, e.g. `release/0.10.0`. 1. Create a new release branch from `develop`,named `release/[version]`. E.g.,`release/0.10.0`
1. Push a new tag on the release branch, the tag name should be like `[version]rc.patch`. The 2. Create a new tag for the release branch, tag format: `version-rc.Patch`. E.g. the first tag is `0.10.0-rc0`
first tag should be `0.10.0rc1`, and the second should be `0.10.0.rc2` and so on. 3. New release branch normally doesn't accept new features or optimizations. QA will test on the release branch. Developer should develop based on `develop` branch.
1. After that, we should do: 4. If QA or Developer find bugs. They should first fix and verify on `develop` branch. Then cherry-pick the fix to the release branch. Wait until the release branch is stable.
* Run all regression test on the Regression Test List (see PaddlePaddle TeamCity CI), to confirm 5. If necessary, create a new tag on the relese branch, e.g. `0.10.0-rc1`. Involve more users to try it and repeat step 3-4.
that this release has no major bugs. 6. After release branch is stable,Create the official release tag,such as `0.10.0`.
* If regression test fails, we must fix those bugs and create a new `release/[version]` 7. Release the python wheel package to pypi.
branch from previous release branch. 8. Update the docker image (More details below).
* Modify `python/setup.py.in`, change the version number and change `ISTAGED` to `True`.
* Publish PaddlePaddle release wheel packages to pypi (see below instructions for detail). NOTE:
* Update the Docker images (see below instructions for detail).
1. After above step, merge `release/[version]` branch to master and push a tag on the master commit, * bug fix should happen on `develop` branch, then cherry-pick to relese branch. Avoid developing directly on release branch.
then merge `master` to `develop`.
1. Update the Release Note. * release normally only accept bug fixes. Don't add new features.
***NOTE:***
* Do ***NOT*** merge commits from develop branch to release branches to keep the release branch contain
features only for current release, so that we can test on that version.
* If we want to fix bugs on release branches, we must merge the fix to master, develop and release branch.
## Publish Wheel Packages to pypi ## Publish Wheel Packages to pypi
...@@ -97,26 +92,22 @@ You can then checkout the latest pushed tags at https://hub.docker.com/r/paddlep ...@@ -97,26 +92,22 @@ You can then checkout the latest pushed tags at https://hub.docker.com/r/paddlep
## Branching Model ## Branching Model
We use [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) as our branching model, PaddlePaddle uses [Trunk Based Development](https://trunkbaseddevelopment.com/) as our branching model.
with some modifications:
* `develop` branch is used for development. Each comment to `develop` branc goes through unit tests and model regression tests.
* `master` branch is the stable branch. Each version on the master branch is tested and guaranteed. * `release/[version]` branch is used for each release. Release branch is used for tests, bug fix and evetual release.
* `develop` branch is for development. Each commit on develop branch has passed CI unit test, but no * `master` branch as been deprecated for historical reasons
regression tests are run.
* `release/[version]` branch is used to publish each release. Latest release version branches have * Developer's feature branch。
bugfix only for that version, but no feature updates. * Developer's feature branch should sync with upstream `develop` branch.
* Developer forks are not required to follow * Developer's feature branch should be forked from upstream `develop` branch.
[git-flow](http://nvie.com/posts/a-successful-git-branching-model/) * After feature branch is ready, create a `Pull Request` against the Paddle repo and go through code review.
branching model, all forks is like a feature branch. * In the review process, develop modify codes and push to their own feature branch.
* Advise: developer fork's develop branch is used to sync up with main repo's develop branch.
* Advise: developer use it's fork's develop branch to for new branch to start developing.
* Use that branch on developer's fork to create pull requests and start reviews.
* developer can push new commits to that branch when the pull request is open.
* Bug fixes are also started from developers forked repo. And, bug fixes branch can merge to
`master`, `develop` and `releases`.
## PaddlePaddle Regression Test List ## PaddlePaddle Regression Test List
TODO
### All Chapters of PaddlePaddle Book ### All Chapters of PaddlePaddle Book
We need to guarantee that all the chapters of PaddlePaddle Book can run correctly. Including We need to guarantee that all the chapters of PaddlePaddle Book can run correctly. Including
......
...@@ -100,7 +100,7 @@ paddle.fluid.layers.gru_unit ArgSpec(args=['input', 'hidden', 'size', 'param_att ...@@ -100,7 +100,7 @@ paddle.fluid.layers.gru_unit ArgSpec(args=['input', 'hidden', 'size', 'param_att
paddle.fluid.layers.linear_chain_crf ArgSpec(args=['input', 'label', 'param_attr'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.layers.linear_chain_crf ArgSpec(args=['input', 'label', 'param_attr'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.crf_decoding ArgSpec(args=['input', 'param_attr', 'label'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.layers.crf_decoding ArgSpec(args=['input', 'param_attr', 'label'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.cos_sim ArgSpec(args=['X', 'Y'], varargs=None, keywords=None, defaults=None) paddle.fluid.layers.cos_sim ArgSpec(args=['X', 'Y'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.cross_entropy ArgSpec(args=['input', 'label', 'soft_label'], varargs=None, keywords=None, defaults=(False,)) paddle.fluid.layers.cross_entropy ArgSpec(args=['input', 'label', 'soft_label', 'ignore_index'], varargs=None, keywords=None, defaults=(False, -100))
paddle.fluid.layers.square_error_cost ArgSpec(args=['input', 'label'], varargs=None, keywords=None, defaults=None) paddle.fluid.layers.square_error_cost ArgSpec(args=['input', 'label'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.chunk_eval ArgSpec(args=['input', 'label', 'chunk_scheme', 'num_chunk_types', 'excluded_chunk_types'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.layers.chunk_eval ArgSpec(args=['input', 'label', 'chunk_scheme', 'num_chunk_types', 'excluded_chunk_types'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.sequence_conv ArgSpec(args=['input', 'num_filters', 'filter_size', 'filter_stride', 'padding', 'bias_attr', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(3, 1, None, None, None, None)) paddle.fluid.layers.sequence_conv ArgSpec(args=['input', 'num_filters', 'filter_size', 'filter_stride', 'padding', 'bias_attr', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(3, 1, None, None, None, None))
...@@ -142,7 +142,7 @@ paddle.fluid.layers.beam_search ArgSpec(args=['pre_ids', 'pre_scores', 'ids', 's ...@@ -142,7 +142,7 @@ paddle.fluid.layers.beam_search ArgSpec(args=['pre_ids', 'pre_scores', 'ids', 's
paddle.fluid.layers.row_conv ArgSpec(args=['input', 'future_context_size', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(None, None)) paddle.fluid.layers.row_conv ArgSpec(args=['input', 'future_context_size', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(None, None))
paddle.fluid.layers.multiplex ArgSpec(args=['inputs', 'index'], varargs=None, keywords=None, defaults=None) paddle.fluid.layers.multiplex ArgSpec(args=['inputs', 'index'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.layer_norm ArgSpec(args=['input', 'scale', 'shift', 'begin_norm_axis', 'epsilon', 'param_attr', 'bias_attr', 'act', 'name'], varargs=None, keywords=None, defaults=(True, True, 1, 1e-05, None, None, None, None)) paddle.fluid.layers.layer_norm ArgSpec(args=['input', 'scale', 'shift', 'begin_norm_axis', 'epsilon', 'param_attr', 'bias_attr', 'act', 'name'], varargs=None, keywords=None, defaults=(True, True, 1, 1e-05, None, None, None, None))
paddle.fluid.layers.softmax_with_cross_entropy ArgSpec(args=['logits', 'label', 'soft_label'], varargs=None, keywords=None, defaults=(False,)) paddle.fluid.layers.softmax_with_cross_entropy ArgSpec(args=['logits', 'label', 'soft_label', 'ignore_index'], varargs=None, keywords=None, defaults=(False, -100))
paddle.fluid.layers.smooth_l1 ArgSpec(args=['x', 'y', 'inside_weight', 'outside_weight', 'sigma'], varargs=None, keywords=None, defaults=(None, None, None)) paddle.fluid.layers.smooth_l1 ArgSpec(args=['x', 'y', 'inside_weight', 'outside_weight', 'sigma'], varargs=None, keywords=None, defaults=(None, None, None))
paddle.fluid.layers.one_hot ArgSpec(args=['input', 'depth'], varargs=None, keywords=None, defaults=None) paddle.fluid.layers.one_hot ArgSpec(args=['input', 'depth'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.autoincreased_step_counter ArgSpec(args=['counter_name', 'begin', 'step'], varargs=None, keywords=None, defaults=(None, 1, 1)) paddle.fluid.layers.autoincreased_step_counter ArgSpec(args=['counter_name', 'begin', 'step'], varargs=None, keywords=None, defaults=(None, 1, 1))
......
...@@ -138,6 +138,11 @@ class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -138,6 +138,11 @@ class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker {
"(bool, default false), a flag indicating whether to " "(bool, default false), a flag indicating whether to "
"interpretate the given labels as soft labels.") "interpretate the given labels as soft labels.")
.SetDefault(false); .SetDefault(false);
AddAttr<int>("ignore_index",
"(int, default -100), Specifies a target value that is"
"ignored and does not contribute to the input gradient."
"Only valid if soft_label is set to False")
.SetDefault(-100);
AddComment(R"DOC( AddComment(R"DOC(
CrossEntropy Operator. CrossEntropy Operator.
......
...@@ -40,7 +40,7 @@ class CrossEntropyOpKernel : public framework::OpKernel<T> { ...@@ -40,7 +40,7 @@ class CrossEntropyOpKernel : public framework::OpKernel<T> {
math::CrossEntropyFunctor<DeviceContext, T>()( math::CrossEntropyFunctor<DeviceContext, T>()(
ctx.template device_context<DeviceContext>(), &y_2d, &x_2d, &labels_2d, ctx.template device_context<DeviceContext>(), &y_2d, &x_2d, &labels_2d,
ctx.Attr<bool>("soft_label")); ctx.Attr<bool>("soft_label"), ctx.Attr<int>("ignore_index"));
} }
}; };
...@@ -74,16 +74,22 @@ class XeGradFunctor { ...@@ -74,16 +74,22 @@ class XeGradFunctor {
const T* dy, // NOLINT const T* dy, // NOLINT
const T* x, // NOLINT const T* x, // NOLINT
const int64_t* label, // NOLINT const int64_t* label, // NOLINT
size_t num_classes) size_t num_classes, size_t ignore_index)
: dx_(dx), dy_(dy), x_(x), label_(label), num_classes_(num_classes) {} : dx_(dx),
dy_(dy),
x_(x),
label_(label),
num_classes_(num_classes),
ignore_index_(ignore_index) {}
HOSTDEVICE void operator()(size_t sample_id) { HOSTDEVICE void operator()(size_t sample_id) {
auto x_is_true_offset = sample_id * num_classes_ + label_[sample_id]; auto x_is_true_offset = sample_id * num_classes_ + label_[sample_id];
for (size_t x_offset = sample_id * num_classes_; for (size_t x_offset = sample_id * num_classes_;
x_offset < (sample_id + 1) * num_classes_; ++x_offset) { x_offset < (sample_id + 1) * num_classes_; ++x_offset) {
dx_[x_offset] = x_offset != x_is_true_offset dx_[x_offset] =
? static_cast<T>(0) (x_offset != x_is_true_offset || label_[sample_id] == ignore_index_)
: -dy_[sample_id] / x_[x_offset]; ? static_cast<T>(0)
: -dy_[sample_id] / x_[x_offset];
} }
} }
...@@ -93,6 +99,7 @@ class XeGradFunctor { ...@@ -93,6 +99,7 @@ class XeGradFunctor {
const T* x_; const T* x_;
const int64_t* label_; const int64_t* label_;
size_t num_classes_; size_t num_classes_;
size_t ignore_index_;
}; };
template <typename DeviceContext, typename T> template <typename DeviceContext, typename T>
...@@ -109,6 +116,7 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel<T> { ...@@ -109,6 +116,7 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel<T> {
// unnecessary to convert tensors to 2-D views. // unnecessary to convert tensors to 2-D views.
int rank = x->dims().size(); int rank = x->dims().size();
int64_t class_num = x->dims()[rank - 1]; int64_t class_num = x->dims()[rank - 1];
int64_t ignore_index = ctx.Attr<int>("ignore_index");
if (ctx.Attr<bool>("soft_label")) { if (ctx.Attr<bool>("soft_label")) {
XeSoftlabelGradFunctor<T> functor(dx_data, dy->data<T>(), x->data<T>(), XeSoftlabelGradFunctor<T> functor(dx_data, dy->data<T>(), x->data<T>(),
label->data<T>(), label->data<T>(),
...@@ -118,9 +126,9 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel<T> { ...@@ -118,9 +126,9 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel<T> {
static_cast<size_t>(dx->numel())); static_cast<size_t>(dx->numel()));
for_range(functor); for_range(functor);
} else { } else {
XeGradFunctor<T> functor(dx_data, dy->data<T>(), x->data<T>(), XeGradFunctor<T> functor(
label->data<int64_t>(), dx_data, dy->data<T>(), x->data<T>(), label->data<int64_t>(),
static_cast<size_t>(class_num)); static_cast<size_t>(class_num), static_cast<size_t>(ignore_index));
platform::ForRange<DeviceContext> for_range( platform::ForRange<DeviceContext> for_range(
ctx.template device_context<DeviceContext>(), ctx.template device_context<DeviceContext>(),
static_cast<size_t>(dy->numel())); static_cast<size_t>(dy->numel()));
......
...@@ -28,7 +28,8 @@ class CrossEntropyFunctor<platform::CPUDeviceContext, T> { ...@@ -28,7 +28,8 @@ class CrossEntropyFunctor<platform::CPUDeviceContext, T> {
public: public:
void operator()(const platform::CPUDeviceContext& ctx, framework::Tensor* out, void operator()(const platform::CPUDeviceContext& ctx, framework::Tensor* out,
const framework::Tensor* prob, const framework::Tensor* prob,
const framework::Tensor* labels, const bool softLabel) { const framework::Tensor* labels, const bool softLabel,
const int ignore_index) {
const int batch_size = prob->dims()[0]; const int batch_size = prob->dims()[0];
if (softLabel) { if (softLabel) {
auto in = EigenMatrix<T>::From(*prob); auto in = EigenMatrix<T>::From(*prob);
...@@ -49,8 +50,12 @@ class CrossEntropyFunctor<platform::CPUDeviceContext, T> { ...@@ -49,8 +50,12 @@ class CrossEntropyFunctor<platform::CPUDeviceContext, T> {
int lbl = label_data[i]; int lbl = label_data[i];
PADDLE_ENFORCE_GE(lbl, 0); PADDLE_ENFORCE_GE(lbl, 0);
PADDLE_ENFORCE_LT(lbl, class_num); PADDLE_ENFORCE_LT(lbl, class_num);
PADDLE_ENFORCE((lbl >= 0 && lbl < class_num) || lbl == ignore_index);
int index = i * class_num + lbl; int index = i * class_num + lbl;
loss_data[i] = -math::TolerableValue<T>()(std::log(prob_data[index])); loss_data[i] =
lbl == ignore_index
? 0
: -math::TolerableValue<T>()(std::log(prob_data[index]));
} }
} }
} }
......
...@@ -23,11 +23,14 @@ namespace math { ...@@ -23,11 +23,14 @@ namespace math {
namespace { namespace {
template <typename T> template <typename T>
__global__ void CrossEntropyKernel(T* Y, const T* X, const int64_t* label, __global__ void CrossEntropyKernel(T* Y, const T* X, const int64_t* label,
const int N, const int D) { const int N, const int D,
const int ignore_index) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N; for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
i += blockDim.x * gridDim.x) { i += blockDim.x * gridDim.x) {
PADDLE_ASSERT(label[i] >= 0 && label[i] < D); PADDLE_ASSERT(label[i] >= 0 && label[i] < D || label[i] == ignore_index);
Y[i] = -math::TolerableValue<T>()(log(X[i * D + label[i]])); Y[i] = ignore_index == label[i]
? 0
: -math::TolerableValue<T>()(log(X[i * D + label[i]]));
} }
} }
...@@ -57,7 +60,8 @@ class CrossEntropyFunctor<platform::CUDADeviceContext, T> { ...@@ -57,7 +60,8 @@ class CrossEntropyFunctor<platform::CUDADeviceContext, T> {
public: public:
void operator()(const platform::CUDADeviceContext& ctx, void operator()(const platform::CUDADeviceContext& ctx,
framework::Tensor* out, const framework::Tensor* prob, framework::Tensor* out, const framework::Tensor* prob,
const framework::Tensor* labels, bool softLabel) { const framework::Tensor* labels, bool softLabel,
const int ignore_index) {
const T* prob_data = prob->data<T>(); const T* prob_data = prob->data<T>();
T* loss_data = out->mutable_data<T>(ctx.GetPlace()); T* loss_data = out->mutable_data<T>(ctx.GetPlace());
...@@ -77,7 +81,8 @@ class CrossEntropyFunctor<platform::CUDADeviceContext, T> { ...@@ -77,7 +81,8 @@ class CrossEntropyFunctor<platform::CUDADeviceContext, T> {
int block = 512; int block = 512;
int grid = (batch_size + block - 1) / block; int grid = (batch_size + block - 1) / block;
CrossEntropyKernel<T><<<grid, block, 0, ctx.stream()>>>( CrossEntropyKernel<T><<<grid, block, 0, ctx.stream()>>>(
loss_data, prob_data, label_data, batch_size, class_num); loss_data, prob_data, label_data, batch_size, class_num,
ignore_index);
} }
} }
}; };
......
...@@ -38,7 +38,8 @@ class CrossEntropyFunctor { ...@@ -38,7 +38,8 @@ class CrossEntropyFunctor {
public: public:
void operator()(const DeviceContext& context, framework::Tensor* out, void operator()(const DeviceContext& context, framework::Tensor* out,
const framework::Tensor* prob, const framework::Tensor* prob,
const framework::Tensor* labels, const bool softLabel); const framework::Tensor* labels, const bool softLabel,
const int ignore_index);
}; };
} // namespace math } // namespace math
} // namespace operators } // namespace operators
......
...@@ -44,6 +44,12 @@ class SoftmaxWithCrossEntropyOpMaker ...@@ -44,6 +44,12 @@ class SoftmaxWithCrossEntropyOpMaker
"(bool, default: false), A flag to indicate whether to interpretate " "(bool, default: false), A flag to indicate whether to interpretate "
"the given labels as soft labels.") "the given labels as soft labels.")
.SetDefault(false); .SetDefault(false);
AddAttr<int>(
"ignore_index",
"(int, default -100), Specifies a target value that is ignored and"
"does not contribute to the input gradient. Only valid if soft_label"
"is set to False")
.SetDefault(-100);
AddComment(R"DOC( AddComment(R"DOC(
Softmax With Cross Entropy Operator. Softmax With Cross Entropy Operator.
......
...@@ -26,7 +26,8 @@ using Tensor = framework::Tensor; ...@@ -26,7 +26,8 @@ using Tensor = framework::Tensor;
namespace { namespace {
template <typename T> template <typename T>
__global__ void CrossEntropyGrad(T* logit_grad, const int64_t* labels, __global__ void CrossEntropyGrad(T* logit_grad, const int64_t* labels,
const int batch_size, const int class_num) { const int batch_size, const int class_num,
const int ignore_index) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch_size; for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch_size;
i += blockDim.x * gridDim.x) { i += blockDim.x * gridDim.x) {
int idx = i * class_num + labels[i]; int idx = i * class_num + labels[i];
...@@ -260,6 +261,7 @@ class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel<T> { ...@@ -260,6 +261,7 @@ class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel<T> {
auto* loss_data = loss->mutable_data<T>(context.GetPlace()); auto* loss_data = loss->mutable_data<T>(context.GetPlace());
auto soft_label = context.Attr<bool>("soft_label"); auto soft_label = context.Attr<bool>("soft_label");
auto ignore_index = context.Attr<int>("ignore_index");
if (soft_label) { if (soft_label) {
int batch_size = logits->dims()[0]; int batch_size = logits->dims()[0];
int feature_size = logits->dims()[1]; int feature_size = logits->dims()[1];
...@@ -272,7 +274,8 @@ class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel<T> { ...@@ -272,7 +274,8 @@ class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel<T> {
math::SoftmaxCUDNNFunctor<T>()(context.cuda_device_context(), logits, math::SoftmaxCUDNNFunctor<T>()(context.cuda_device_context(), logits,
softmax); softmax);
math::CrossEntropyFunctor<platform::CUDADeviceContext, T>()( math::CrossEntropyFunctor<platform::CUDADeviceContext, T>()(
context.cuda_device_context(), loss, softmax, labels, false); context.cuda_device_context(), loss, softmax, labels, false,
ignore_index);
} }
} }
}; };
...@@ -295,7 +298,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> { ...@@ -295,7 +298,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> {
const int class_num = logit_grad->dims()[1]; const int class_num = logit_grad->dims()[1];
int block = 512; int block = 512;
auto stream = context.cuda_device_context().stream(); auto stream = context.cuda_device_context().stream();
auto ignore_index = context.Attr<int>("ignore_index");
if (context.Attr<bool>("soft_label")) { if (context.Attr<bool>("soft_label")) {
int grid = (batch_size * class_num + block - 1) / block; int grid = (batch_size * class_num + block - 1) / block;
const T* label_data = labels->data<T>(); const T* label_data = labels->data<T>();
...@@ -305,7 +308,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> { ...@@ -305,7 +308,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> {
int grid = (batch_size + block - 1) / block; int grid = (batch_size + block - 1) / block;
const int64_t* label_data = labels->data<int64_t>(); const int64_t* label_data = labels->data<int64_t>();
CrossEntropyGrad<T><<<grid, block, 0, stream>>>( CrossEntropyGrad<T><<<grid, block, 0, stream>>>(
logit_grad_data, label_data, batch_size, class_num); logit_grad_data, label_data, batch_size, class_num, ignore_index);
int num = batch_size * class_num; int num = batch_size * class_num;
grid = (num + block - 1) / block; grid = (num + block - 1) / block;
Scale<T><<<grid, block, 0, stream>>>(logit_grad_data, loss_grad_data, num, Scale<T><<<grid, block, 0, stream>>>(logit_grad_data, loss_grad_data, num,
......
...@@ -45,7 +45,8 @@ class SoftmaxWithCrossEntropyKernel : public framework::OpKernel<T> { ...@@ -45,7 +45,8 @@ class SoftmaxWithCrossEntropyKernel : public framework::OpKernel<T> {
math::SoftmaxFunctor<platform::CPUDeviceContext, T>()(dev_ctx, logits, math::SoftmaxFunctor<platform::CPUDeviceContext, T>()(dev_ctx, logits,
softmax); softmax);
math::CrossEntropyFunctor<platform::CPUDeviceContext, T>()( math::CrossEntropyFunctor<platform::CPUDeviceContext, T>()(
dev_ctx, loss, softmax, labels, context.Attr<bool>("soft_label")); dev_ctx, loss, softmax, labels, context.Attr<bool>("soft_label"),
context.Attr<int>("ignore_index"));
} }
}; };
......
...@@ -968,7 +968,7 @@ def dropout(x, dropout_prob, is_test=False, seed=None, name=None): ...@@ -968,7 +968,7 @@ def dropout(x, dropout_prob, is_test=False, seed=None, name=None):
return out return out
def cross_entropy(input, label, soft_label=False): def cross_entropy(input, label, soft_label=False, ignore_index=-100):
""" """
**Cross Entropy Layer** **Cross Entropy Layer**
...@@ -1012,7 +1012,10 @@ def cross_entropy(input, label, soft_label=False): ...@@ -1012,7 +1012,10 @@ def cross_entropy(input, label, soft_label=False):
tensor<float/double> with shape [N x D]. tensor<float/double> with shape [N x D].
soft_label (bool): a flag indicating whether to soft_label (bool): a flag indicating whether to
interpretate the given labels as soft interpretate the given labels as soft
labels, default `False`. labels. Default: `False`.
ignore_index (int): Specifies a target value that is ignored and does
not contribute to the input gradient. Only valid
if soft_label is set to False. Default: -100
Returns: Returns:
A 2-D tensor with shape [N x 1], the cross entropy loss. A 2-D tensor with shape [N x 1], the cross entropy loss.
...@@ -1037,7 +1040,8 @@ def cross_entropy(input, label, soft_label=False): ...@@ -1037,7 +1040,8 @@ def cross_entropy(input, label, soft_label=False):
inputs={'X': [input], inputs={'X': [input],
'Label': [label]}, 'Label': [label]},
outputs={'Y': [out]}, outputs={'Y': [out]},
attrs={"soft_label": soft_label}) attrs={"soft_label": soft_label,
"ignore_index": ignore_index})
return out return out
...@@ -4242,7 +4246,10 @@ def multiplex(inputs, index): ...@@ -4242,7 +4246,10 @@ def multiplex(inputs, index):
return out return out
def softmax_with_cross_entropy(logits, label, soft_label=False): def softmax_with_cross_entropy(logits,
label,
soft_label=False,
ignore_index=-100):
""" """
**Softmax With Cross Entropy Operator.** **Softmax With Cross Entropy Operator.**
...@@ -4284,6 +4291,10 @@ def softmax_with_cross_entropy(logits, label, soft_label=False): ...@@ -4284,6 +4291,10 @@ def softmax_with_cross_entropy(logits, label, soft_label=False):
soft_label is set to true, Label is a Tensor<float/double> with soft_label is set to true, Label is a Tensor<float/double> with
soft_label (bool): A flag to indicate whether to interpretate the given soft_label (bool): A flag to indicate whether to interpretate the given
labels as soft labels. By default, `soft_label` is set to False. labels as soft labels. By default, `soft_label` is set to False.
ignore_index (int): Specifies a target value that is ignored and does
not contribute to the input gradient. Only valid
if soft_label is set to False. Default: -100
Returns: Returns:
Variable: The cross entropy loss is a 2-D tensor with shape [N x 1]. Variable: The cross entropy loss is a 2-D tensor with shape [N x 1].
...@@ -4305,7 +4316,8 @@ def softmax_with_cross_entropy(logits, label, soft_label=False): ...@@ -4305,7 +4316,8 @@ def softmax_with_cross_entropy(logits, label, soft_label=False):
'Label': label}, 'Label': label},
outputs={'Softmax': softmax, outputs={'Softmax': softmax,
'Loss': loss}, 'Loss': loss},
attrs={'soft_label': soft_label}) attrs={'soft_label': soft_label,
'ignore_index': ignore_index})
return loss return loss
......
...@@ -209,5 +209,34 @@ class TestCrossEntropyOp6(OpTest): ...@@ -209,5 +209,34 @@ class TestCrossEntropyOp6(OpTest):
["X"], "Y", max_relative_error=0.05, numeric_grad_delta=0.001) ["X"], "Y", max_relative_error=0.05, numeric_grad_delta=0.001)
class TestCrossEntropyOp7(OpTest):
"""Test cross-entropy with ignore index.
"""
def setUp(self):
self.op_type = "cross_entropy"
batch_size = 30
class_num = 10
ignore_index = 3
X = randomize_probability(batch_size, class_num, dtype='float64')
label = np.random.randint(0, class_num, (batch_size, 1), dtype="int64")
cross_entropy = np.asmatrix(
[[-np.log(X[i][label[i][0]])]
if label[i][0] != ignore_index else [0]
for i in range(X.shape[0])],
dtype="float64")
self.inputs = {"X": X, "Label": label}
self.outputs = {"Y": cross_entropy}
self.attrs = {"soft_label": False, "ignore_index": ignore_index}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(["X"], "Y", numeric_grad_delta=0.001)
if __name__ == "__main__": if __name__ == "__main__":
unittest.main() unittest.main()
...@@ -556,6 +556,15 @@ class TestBook(unittest.TestCase): ...@@ -556,6 +556,15 @@ class TestBook(unittest.TestCase):
out = layers.sequence_enumerate(input=x, win_size=2, pad_value=0) out = layers.sequence_enumerate(input=x, win_size=2, pad_value=0)
print(str(program)) print(str(program))
def test_cross_entropy(self):
program = Program()
with program_guard(program):
x = layers.data(name="x", shape=[30, 10], dtype="float32")
label = layers.data(name="label", shape=[30, 1], dtype="int32")
mode = 'channel'
out = layers.cross_entropy(x, label, False, 4)
self.assertIsNotNone(out)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -88,5 +88,40 @@ class TestSoftmaxWithCrossEntropyOp2(OpTest): ...@@ -88,5 +88,40 @@ class TestSoftmaxWithCrossEntropyOp2(OpTest):
self.check_grad(["Logits"], "Loss") self.check_grad(["Logits"], "Loss")
class TestSoftmaxWithCrossEntropyOp3(OpTest):
"""
Test softmax with cross entropy operator with ignore_index.
"""
def setUp(self):
self.op_type = "softmax_with_cross_entropy"
batch_size = 41
class_num = 37
logits = np.random.uniform(0.1, 1.0,
[batch_size, class_num]).astype("float64")
softmax = np.apply_along_axis(stable_softmax, 1, logits)
labels = np.random.randint(0, class_num, [batch_size, 1], dtype="int64")
ignore_index = 7
cross_entropy = np.asmatrix(
[[-np.log(softmax[i][labels[i][0]])]
if labels[i] != ignore_index else [0]
for i in range(softmax.shape[0])],
dtype="float64")
self.inputs = {"Logits": logits, "Label": labels}
self.outputs = {
"Softmax": softmax.astype("float64"),
"Loss": cross_entropy.astype("float64")
}
self.attrs = {"ignore_index": ignore_index}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(["Logits"], "Loss")
if __name__ == "__main__": if __name__ == "__main__":
unittest.main() unittest.main()
...@@ -14,11 +14,14 @@ ...@@ -14,11 +14,14 @@
__all__ = [ __all__ = [
'map_readers', 'buffered', 'compose', 'chain', 'shuffle', 'map_readers', 'buffered', 'compose', 'chain', 'shuffle',
'ComposeNotAligned', 'firstn', 'xmap_readers', 'PipeReader' 'ComposeNotAligned', 'firstn', 'xmap_readers', 'PipeReader',
'multiprocess_reader'
] ]
from threading import Thread from threading import Thread
import subprocess import subprocess
import multiprocessing
import sys
from six.moves.queue import Queue from six.moves.queue import Queue
from six.moves import zip_longest from six.moves import zip_longest
...@@ -332,6 +335,100 @@ def xmap_readers(mapper, reader, process_num, buffer_size, order=False): ...@@ -332,6 +335,100 @@ def xmap_readers(mapper, reader, process_num, buffer_size, order=False):
return xreader return xreader
def multiprocess_reader(readers, use_pipe=True, queue_size=1000):
"""
multiprocess_reader use python multi process to read data from readers
and then use multiprocess.Queue or multiprocess.Pipe to merge all
data. The process number is equal to the number of input readers, each
process call one reader.
Multiprocess.Queue require the rw access right to /dev/shm, some
platform does not support.
you need to create multiple readers first, these readers should be independent
to each other so that each process can work independently.
An example:
.. code-block:: python
reader0 = reader(["file01", "file02"])
reader1 = reader(["file11", "file12"])
reader1 = reader(["file21", "file22"])
reader = multiprocess_reader([reader0, reader1, reader2],
queue_size=100, use_pipe=False)
"""
try:
import ujson as json
except Exception as e:
sys.stderr.write("import ujson error: " + str(e) + " use json\n")
import json
assert type(readers) is list and len(readers) > 0
def _read_into_queue(reader, queue):
for sample in reader():
if sample is None:
raise ValueError("sample has None")
queue.put(sample)
queue.put(None)
def queue_reader():
queue = multiprocessing.Queue(queue_size)
for reader in readers:
p = multiprocessing.Process(
target=_read_into_queue, args=(reader, queue))
p.start()
reader_num = len(readers)
finish_num = 0
while finish_num < reader_num:
sample = queue.get()
if sample is None:
finish_num += 1
else:
yield sample
def _read_into_pipe(reader, conn):
for sample in reader():
if sample is None:
raise ValueError("sample has None!")
conn.send(json.dumps(sample))
conn.send(json.dumps(None))
conn.close()
def pipe_reader():
conns = []
for reader in readers:
parent_conn, child_conn = multiprocessing.Pipe()
conns.append(parent_conn)
p = multiprocessing.Process(
target=_read_into_pipe, args=(reader, child_conn))
p.start()
reader_num = len(readers)
finish_num = 0
conn_to_remove = []
while finish_num < reader_num:
for conn in conn_to_remove:
conns.remove(conn)
conn_to_remove = []
for conn in conns:
sample = json.loads(conn.recv())
if sample is None:
finish_num += 1
conn.close()
conn_to_remove.append(conn)
else:
yield sample
if use_pipe:
return pipe_reader
else:
return queue_reader
def _buf2lines(buf, line_break="\n"): def _buf2lines(buf, line_break="\n"):
# FIXME: line_break should be automatically configured. # FIXME: line_break should be automatically configured.
lines = buf.split(line_break) lines = buf.split(line_break)
......
...@@ -14,6 +14,7 @@ ...@@ -14,6 +14,7 @@
import time import time
import unittest import unittest
import functools
import paddle.reader import paddle.reader
...@@ -174,5 +175,33 @@ class TestPipeReader(unittest.TestCase): ...@@ -174,5 +175,33 @@ class TestPipeReader(unittest.TestCase):
temp.close() temp.close()
class TestMultiProcessReader(unittest.TestCase):
def setup(self):
self.samples = []
for i in range(1000):
self.samples.append([[i], [i + 1, i + 2], i + 3])
def reader(index):
for i in range(len(self.samples)):
if i % 3 == index:
yield self.samples[i]
self.reader0 = functools.partial(reader, 0)
self.reader1 = functools.partial(reader, 1)
self.reader2 = functools.partial(reader, 2)
def reader_test(self, use_pipe):
self.setup()
results = []
for data in paddle.reader.multiprocess_reader(
[self.reader0, self.reader1, self.reader2], 100, use_pipe)():
results.append(data)
self.assertEqual(sorted(self.samples), sorted(results))
def test_multi_process_reader(self):
self.reader_test(use_pipe=False)
self.reader_test(use_pipe=True)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册