diff --git a/doc/fluid/dev/releasing_process_cn.md b/doc/fluid/dev/releasing_process_cn.md index 4c6728fba7150b0f1e180e57590f18a5b677c70d..acea9a2b5df903a958edf3683900e165670e196f 100644 --- a/doc/fluid/dev/releasing_process_cn.md +++ b/doc/fluid/dev/releasing_process_cn.md @@ -1,24 +1,23 @@ # PaddlePaddle发行规范 -PaddlePaddle使用git-flow branching model做分支管理,使用[Semantic Versioning](http://semver.org/)标准表示PaddlePaddle版本号。 +PaddlePaddle使用Trunk Based Development,使用[Semantic Versioning](http://semver.org/)标准表示PaddlePaddle版本号。 PaddlePaddle每次发新的版本,遵循以下流程: 1. 从`develop`分支派生出新的分支,分支名为`release/版本号`。例如,`release/0.10.0` -1. 将新分支的版本打上tag,tag为`版本号rc.Patch号`。第一个tag为`0.10.0rc1`,第二个为`0.10.0rc2`,依次类推。 -1. 对这个版本的提交,做如下几个操作: - * 使用Regression Test List作为检查列表,测试本次release的正确性。 - * 如果失败,记录下所有失败的例子,在这个`release/版本号`分支中,修复所有bug后,Patch号加一,到第二步 - * 修改`python/setup.py.in`中的版本信息,并将`istaged`字段设为`True`。 - * 将这个版本的python wheel包发布到pypi。 - * 更新Docker镜像(参考后面的操作细节)。 -1. 第三步完成后,将`release/版本号`分支合入master分支,将master分支的合入commit打上tag,tag为`版本号`。同时再将`master`分支合入`develop`分支。 -1. 协同完成Release Note的书写。 +2. 将新分支的版本打上tag,tag为`版本号rc-Patch号`。例如,第一个tag为`0.10.0-rc0`。 +3. 新分支一般不接受新的feature和优化。QA在release分支上进行测试。研发基于最新的develop开发。 +4. QA和研发发现的bug,在develop上修复验证后,cherry-pick修复到release分支。直到release分支相对稳定。 +5. 如果有需要,在release分支最新代码上打上新的tag,比如`0.10.0-rc1`,让更多的用户加入测试。重复3-4步。 +6. release分支稳定后,打上正式的release tag,比如`0.10.0`。 +7. 将这个版本的python wheel包发布到pypi。 +8. 更新Docker镜像(参考后面的操作细节)。 需要注意的是: -* `release/版本号`分支一旦建立,一般不允许再从`develop`分支合入`release/版本号`。这样保证`release/版本号`分支功能的封闭,方便测试人员测试PaddlePaddle的行为。 -* 在`release/版本号`分支存在的时候,如果有bugfix的行为,需要将bugfix的分支同时merge到`master`, `develop`和`release/版本号`这三个分支。 +* bug修复需要先在develop上进行,然后进入release分支。而不是直接在release分支上开发。 + +* release分支原则上只接受修复类的修改,不接受新feature。 ## 发布wheel包到pypi @@ -61,24 +60,21 @@ docker push [镜像]:[version] ## PaddlePaddle 分支规范 -PaddlePaddle开发过程使用[git-flow](http://nvie.com/posts/a-successful-git-branching-model/)分支规范,并适应github的特性做了一些区别。 - -* PaddlePaddle的主版本库遵循[git-flow](http://nvie.com/posts/a-successful-git-branching-model/)分支规范。其中: - * `master`分支为稳定(stable branch)版本分支。每一个`master`分支的版本都是经过单元测试和回归测试的版本。 - * `develop`分支为开发(develop branch)版本分支。每一个`develop`分支的版本都经过单元测试,但并没有经过回归测试。 - * `release/版本号`分支为每一次Release时建立的临时分支。在这个阶段的代码正在经历回归测试。 +PaddlePaddle开发过程使用[Trunk Based Development](https://trunkbaseddevelopment.com/) 开发规范。 -* 其他用户的fork版本库并不需要严格遵守[git-flow](http://nvie.com/posts/a-successful-git-branching-model/)分支规范,但所有fork的版本库的所有分支都相当于特性分支。 - * 建议,开发者fork的版本库使用`develop`分支同步主版本库的`develop`分支 - * 建议,开发者fork的版本库中,再基于`develop`版本fork出自己的功能分支。 - * 当功能分支开发完毕后,向PaddlePaddle的主版本库提交`Pull Reuqest`,进而进行代码评审。 - * 在评审过程中,开发者修改自己的代码,可以继续在自己的功能分支提交代码。 +* `develop`分支为开发(develop branch)版本分支。每一个`develop`分支的版本都经过单元测试。并且会经过模型回归测试。 +* `release/版本号`分支为每一次Release时建立的临时分支。release分支主要用于测试,bug修复和最终发版。 +* `master`分支因为历史原因,已经废弃。 -* BugFix分支也是在开发者自己的fork版本库维护,与功能分支不同的是,BugFix分支需要分别给主版本库的`master`、`develop`与可能有的`release/版本号`分支,同时提起`Pull Request`。 +* 其他开发者fork的feature branch。 + * 建议,开发者的feature branch需要同步主版本库的`develop`分支。 + * 建议,开发者的feature branch需要基于主版本库中的`develop`分支。 + * 当feature branch开发完毕后,向PaddlePaddle的主版本库提交`Pull Reuqest`,进而进行代码评审。 + * 在评审过程中,开发者修改自己的代码,可以继续在自己的feature branch提交代码。 ## PaddlePaddle回归测试列表 -本列表说明PaddlePaddle发版之前需要测试的功能点。 +TODO ### PaddlePaddle Book中所有章节 diff --git a/doc/fluid/dev/releasing_process_en.md b/doc/fluid/dev/releasing_process_en.md index 2c1c30c1eddfde6d9a8e2637be86537c43cc1b00..b810dc941d27fdb5004812ab58e105502e83280f 100644 --- a/doc/fluid/dev/releasing_process_en.md +++ b/doc/fluid/dev/releasing_process_en.md @@ -4,26 +4,21 @@ PaddlePaddle manages its branches using "git-flow branching model", and [Semanti Each time we release a new PaddlePaddle version, we should follow the below steps: -1. Fork a new branch from `develop` named `release/[version]`, e.g. `release/0.10.0`. -1. Push a new tag on the release branch, the tag name should be like `[version]rc.patch`. The - first tag should be `0.10.0rc1`, and the second should be `0.10.0.rc2` and so on. -1. After that, we should do: - * Run all regression test on the Regression Test List (see PaddlePaddle TeamCity CI), to confirm - that this release has no major bugs. - * If regression test fails, we must fix those bugs and create a new `release/[version]` - branch from previous release branch. - * Modify `python/setup.py.in`, change the version number and change `ISTAGED` to `True`. - * Publish PaddlePaddle release wheel packages to pypi (see below instructions for detail). - * Update the Docker images (see below instructions for detail). -1. After above step, merge `release/[version]` branch to master and push a tag on the master commit, - then merge `master` to `develop`. -1. Update the Release Note. - -***NOTE:*** - -* Do ***NOT*** merge commits from develop branch to release branches to keep the release branch contain - features only for current release, so that we can test on that version. -* If we want to fix bugs on release branches, we must merge the fix to master, develop and release branch. +1. Create a new release branch from `develop`,named `release/[version]`. E.g.,`release/0.10.0` +2. Create a new tag for the release branch, tag format: `version-rc.Patch`. E.g. the first tag is `0.10.0-rc0`。 +3. New release branch normally doesn't accept new features or optimizations. QA will test on the release branch. Developer should develop based on `develop` branch. +4. If QA or Developer find bugs. They should first fix and verify on `develop` branch. Then cherry-pick the fix to the release branch. Wait until the release branch is stable. +5. If necessary, create a new tag on the relese branch, e.g. `0.10.0-rc1`. Involve more users to try it and repeat step 3-4. +6. After release branch is stable,Create the official release tag,such as `0.10.0`. +7. Release the python wheel package to pypi. +8. Update the docker image (More details below). + +NOTE: + +* bug fix should happen on `develop` branch, then cherry-pick to relese branch. Avoid developing directly on release branch. + +* release normally only accept bug fixes. Don't add new features. + ## Publish Wheel Packages to pypi @@ -97,26 +92,22 @@ You can then checkout the latest pushed tags at https://hub.docker.com/r/paddlep ## Branching Model -We use [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) as our branching model, -with some modifications: - -* `master` branch is the stable branch. Each version on the master branch is tested and guaranteed. -* `develop` branch is for development. Each commit on develop branch has passed CI unit test, but no - regression tests are run. -* `release/[version]` branch is used to publish each release. Latest release version branches have - bugfix only for that version, but no feature updates. -* Developer forks are not required to follow - [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) - branching model, all forks is like a feature branch. - * Advise: developer fork's develop branch is used to sync up with main repo's develop branch. - * Advise: developer use it's fork's develop branch to for new branch to start developing. - * Use that branch on developer's fork to create pull requests and start reviews. - * developer can push new commits to that branch when the pull request is open. -* Bug fixes are also started from developers forked repo. And, bug fixes branch can merge to - `master`, `develop` and `releases`. +PaddlePaddle uses [Trunk Based Development](https://trunkbaseddevelopment.com/) as our branching model. + +* `develop` branch is used for development. Each comment to `develop` branc goes through unit tests and model regression tests. +* `release/[version]` branch is used for each release. Release branch is used for tests, bug fix and evetual release. +* `master` branch as been deprecated for historical reasons + +* Developer's feature branch。 + * Developer's feature branch should sync with upstream `develop` branch. + * Developer's feature branch should be forked from upstream `develop` branch. + * After feature branch is ready, create a `Pull Request` against the Paddle repo and go through code review. + * In the review process, develop modify codes and push to their own feature branch. ## PaddlePaddle Regression Test List +TODO + ### All Chapters of PaddlePaddle Book We need to guarantee that all the chapters of PaddlePaddle Book can run correctly. Including diff --git a/paddle/fluid/API.spec b/paddle/fluid/API.spec index ae5f30e431aba4cae04b0fb35f00bce84f18de33..842fde1ec5f16aeec28a790f1869eaed64e3516c 100644 --- a/paddle/fluid/API.spec +++ b/paddle/fluid/API.spec @@ -100,7 +100,7 @@ paddle.fluid.layers.gru_unit ArgSpec(args=['input', 'hidden', 'size', 'param_att paddle.fluid.layers.linear_chain_crf ArgSpec(args=['input', 'label', 'param_attr'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.layers.crf_decoding ArgSpec(args=['input', 'param_attr', 'label'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.layers.cos_sim ArgSpec(args=['X', 'Y'], varargs=None, keywords=None, defaults=None) -paddle.fluid.layers.cross_entropy ArgSpec(args=['input', 'label', 'soft_label'], varargs=None, keywords=None, defaults=(False,)) +paddle.fluid.layers.cross_entropy ArgSpec(args=['input', 'label', 'soft_label', 'ignore_index'], varargs=None, keywords=None, defaults=(False, -100)) paddle.fluid.layers.square_error_cost ArgSpec(args=['input', 'label'], varargs=None, keywords=None, defaults=None) paddle.fluid.layers.chunk_eval ArgSpec(args=['input', 'label', 'chunk_scheme', 'num_chunk_types', 'excluded_chunk_types'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.layers.sequence_conv ArgSpec(args=['input', 'num_filters', 'filter_size', 'filter_stride', 'padding', 'bias_attr', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(3, 1, None, None, None, None)) @@ -142,7 +142,7 @@ paddle.fluid.layers.beam_search ArgSpec(args=['pre_ids', 'pre_scores', 'ids', 's paddle.fluid.layers.row_conv ArgSpec(args=['input', 'future_context_size', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(None, None)) paddle.fluid.layers.multiplex ArgSpec(args=['inputs', 'index'], varargs=None, keywords=None, defaults=None) paddle.fluid.layers.layer_norm ArgSpec(args=['input', 'scale', 'shift', 'begin_norm_axis', 'epsilon', 'param_attr', 'bias_attr', 'act', 'name'], varargs=None, keywords=None, defaults=(True, True, 1, 1e-05, None, None, None, None)) -paddle.fluid.layers.softmax_with_cross_entropy ArgSpec(args=['logits', 'label', 'soft_label'], varargs=None, keywords=None, defaults=(False,)) +paddle.fluid.layers.softmax_with_cross_entropy ArgSpec(args=['logits', 'label', 'soft_label', 'ignore_index'], varargs=None, keywords=None, defaults=(False, -100)) paddle.fluid.layers.smooth_l1 ArgSpec(args=['x', 'y', 'inside_weight', 'outside_weight', 'sigma'], varargs=None, keywords=None, defaults=(None, None, None)) paddle.fluid.layers.one_hot ArgSpec(args=['input', 'depth'], varargs=None, keywords=None, defaults=None) paddle.fluid.layers.autoincreased_step_counter ArgSpec(args=['counter_name', 'begin', 'step'], varargs=None, keywords=None, defaults=(None, 1, 1)) diff --git a/paddle/fluid/operators/cross_entropy_op.cc b/paddle/fluid/operators/cross_entropy_op.cc index 578ab63bc380ee62d76e34b7cf3cbd590bfa2eda..66f19fe7ecfa51b2ce917f0c5fcb6d486f1a7307 100644 --- a/paddle/fluid/operators/cross_entropy_op.cc +++ b/paddle/fluid/operators/cross_entropy_op.cc @@ -138,6 +138,11 @@ class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker { "(bool, default false), a flag indicating whether to " "interpretate the given labels as soft labels.") .SetDefault(false); + AddAttr("ignore_index", + "(int, default -100), Specifies a target value that is" + "ignored and does not contribute to the input gradient." + "Only valid if soft_label is set to False") + .SetDefault(-100); AddComment(R"DOC( CrossEntropy Operator. diff --git a/paddle/fluid/operators/cross_entropy_op.h b/paddle/fluid/operators/cross_entropy_op.h index 36b58d80144d242277f6fc970a3a61a6721d4b50..03974a7fc511b1e1cb5b0eca532b260fdf9bf964 100644 --- a/paddle/fluid/operators/cross_entropy_op.h +++ b/paddle/fluid/operators/cross_entropy_op.h @@ -40,7 +40,7 @@ class CrossEntropyOpKernel : public framework::OpKernel { math::CrossEntropyFunctor()( ctx.template device_context(), &y_2d, &x_2d, &labels_2d, - ctx.Attr("soft_label")); + ctx.Attr("soft_label"), ctx.Attr("ignore_index")); } }; @@ -74,16 +74,22 @@ class XeGradFunctor { const T* dy, // NOLINT const T* x, // NOLINT const int64_t* label, // NOLINT - size_t num_classes) - : dx_(dx), dy_(dy), x_(x), label_(label), num_classes_(num_classes) {} + size_t num_classes, size_t ignore_index) + : dx_(dx), + dy_(dy), + x_(x), + label_(label), + num_classes_(num_classes), + ignore_index_(ignore_index) {} HOSTDEVICE void operator()(size_t sample_id) { auto x_is_true_offset = sample_id * num_classes_ + label_[sample_id]; for (size_t x_offset = sample_id * num_classes_; x_offset < (sample_id + 1) * num_classes_; ++x_offset) { - dx_[x_offset] = x_offset != x_is_true_offset - ? static_cast(0) - : -dy_[sample_id] / x_[x_offset]; + dx_[x_offset] = + (x_offset != x_is_true_offset || label_[sample_id] == ignore_index_) + ? static_cast(0) + : -dy_[sample_id] / x_[x_offset]; } } @@ -93,6 +99,7 @@ class XeGradFunctor { const T* x_; const int64_t* label_; size_t num_classes_; + size_t ignore_index_; }; template @@ -109,6 +116,7 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel { // unnecessary to convert tensors to 2-D views. int rank = x->dims().size(); int64_t class_num = x->dims()[rank - 1]; + int64_t ignore_index = ctx.Attr("ignore_index"); if (ctx.Attr("soft_label")) { XeSoftlabelGradFunctor functor(dx_data, dy->data(), x->data(), label->data(), @@ -118,9 +126,9 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel { static_cast(dx->numel())); for_range(functor); } else { - XeGradFunctor functor(dx_data, dy->data(), x->data(), - label->data(), - static_cast(class_num)); + XeGradFunctor functor( + dx_data, dy->data(), x->data(), label->data(), + static_cast(class_num), static_cast(ignore_index)); platform::ForRange for_range( ctx.template device_context(), static_cast(dy->numel())); diff --git a/paddle/fluid/operators/math/cross_entropy.cc b/paddle/fluid/operators/math/cross_entropy.cc index caff35e03ae3a144f799d982c859ded62cb3e93d..18bf1a66f6d9903f32048574dc93faf7e98953ac 100644 --- a/paddle/fluid/operators/math/cross_entropy.cc +++ b/paddle/fluid/operators/math/cross_entropy.cc @@ -28,7 +28,8 @@ class CrossEntropyFunctor { public: void operator()(const platform::CPUDeviceContext& ctx, framework::Tensor* out, const framework::Tensor* prob, - const framework::Tensor* labels, const bool softLabel) { + const framework::Tensor* labels, const bool softLabel, + const int ignore_index) { const int batch_size = prob->dims()[0]; if (softLabel) { auto in = EigenMatrix::From(*prob); @@ -49,8 +50,12 @@ class CrossEntropyFunctor { int lbl = label_data[i]; PADDLE_ENFORCE_GE(lbl, 0); PADDLE_ENFORCE_LT(lbl, class_num); + PADDLE_ENFORCE((lbl >= 0 && lbl < class_num) || lbl == ignore_index); int index = i * class_num + lbl; - loss_data[i] = -math::TolerableValue()(std::log(prob_data[index])); + loss_data[i] = + lbl == ignore_index + ? 0 + : -math::TolerableValue()(std::log(prob_data[index])); } } } diff --git a/paddle/fluid/operators/math/cross_entropy.cu b/paddle/fluid/operators/math/cross_entropy.cu index 0de58d5fddd84d33f708c4c73e5a19dc2fe8a86b..c92341ea55ea21773acba33665e267b2f1c25fe3 100644 --- a/paddle/fluid/operators/math/cross_entropy.cu +++ b/paddle/fluid/operators/math/cross_entropy.cu @@ -23,11 +23,14 @@ namespace math { namespace { template __global__ void CrossEntropyKernel(T* Y, const T* X, const int64_t* label, - const int N, const int D) { + const int N, const int D, + const int ignore_index) { for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N; i += blockDim.x * gridDim.x) { - PADDLE_ASSERT(label[i] >= 0 && label[i] < D); - Y[i] = -math::TolerableValue()(log(X[i * D + label[i]])); + PADDLE_ASSERT(label[i] >= 0 && label[i] < D || label[i] == ignore_index); + Y[i] = ignore_index == label[i] + ? 0 + : -math::TolerableValue()(log(X[i * D + label[i]])); } } @@ -57,7 +60,8 @@ class CrossEntropyFunctor { public: void operator()(const platform::CUDADeviceContext& ctx, framework::Tensor* out, const framework::Tensor* prob, - const framework::Tensor* labels, bool softLabel) { + const framework::Tensor* labels, bool softLabel, + const int ignore_index) { const T* prob_data = prob->data(); T* loss_data = out->mutable_data(ctx.GetPlace()); @@ -77,7 +81,8 @@ class CrossEntropyFunctor { int block = 512; int grid = (batch_size + block - 1) / block; CrossEntropyKernel<<>>( - loss_data, prob_data, label_data, batch_size, class_num); + loss_data, prob_data, label_data, batch_size, class_num, + ignore_index); } } }; diff --git a/paddle/fluid/operators/math/cross_entropy.h b/paddle/fluid/operators/math/cross_entropy.h index adc5b3fe47cd3bf524eb56747b6bd51e345a2eb6..e8aeb5d0575ac0f6b8761e97896df73578e8a103 100644 --- a/paddle/fluid/operators/math/cross_entropy.h +++ b/paddle/fluid/operators/math/cross_entropy.h @@ -38,7 +38,8 @@ class CrossEntropyFunctor { public: void operator()(const DeviceContext& context, framework::Tensor* out, const framework::Tensor* prob, - const framework::Tensor* labels, const bool softLabel); + const framework::Tensor* labels, const bool softLabel, + const int ignore_index); }; } // namespace math } // namespace operators diff --git a/paddle/fluid/operators/softmax_with_cross_entropy_op.cc b/paddle/fluid/operators/softmax_with_cross_entropy_op.cc index 53cb716a979229c99fcbdc12f1aeab4e21b320f3..1a9324ec862fc3dd7ce669c5fed94527cac22b8f 100644 --- a/paddle/fluid/operators/softmax_with_cross_entropy_op.cc +++ b/paddle/fluid/operators/softmax_with_cross_entropy_op.cc @@ -44,6 +44,12 @@ class SoftmaxWithCrossEntropyOpMaker "(bool, default: false), A flag to indicate whether to interpretate " "the given labels as soft labels.") .SetDefault(false); + AddAttr( + "ignore_index", + "(int, default -100), Specifies a target value that is ignored and" + "does not contribute to the input gradient. Only valid if soft_label" + "is set to False") + .SetDefault(-100); AddComment(R"DOC( Softmax With Cross Entropy Operator. diff --git a/paddle/fluid/operators/softmax_with_cross_entropy_op.cu b/paddle/fluid/operators/softmax_with_cross_entropy_op.cu index a559b01ed32a48e3befb37c2ae8935b4f3a4acb0..148faec4af50c4fe3a8e9d1f22e0da70c8ddcb44 100644 --- a/paddle/fluid/operators/softmax_with_cross_entropy_op.cu +++ b/paddle/fluid/operators/softmax_with_cross_entropy_op.cu @@ -26,7 +26,8 @@ using Tensor = framework::Tensor; namespace { template __global__ void CrossEntropyGrad(T* logit_grad, const int64_t* labels, - const int batch_size, const int class_num) { + const int batch_size, const int class_num, + const int ignore_index) { for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch_size; i += blockDim.x * gridDim.x) { int idx = i * class_num + labels[i]; @@ -260,6 +261,7 @@ class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel { auto* loss_data = loss->mutable_data(context.GetPlace()); auto soft_label = context.Attr("soft_label"); + auto ignore_index = context.Attr("ignore_index"); if (soft_label) { int batch_size = logits->dims()[0]; int feature_size = logits->dims()[1]; @@ -272,7 +274,8 @@ class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel { math::SoftmaxCUDNNFunctor()(context.cuda_device_context(), logits, softmax); math::CrossEntropyFunctor()( - context.cuda_device_context(), loss, softmax, labels, false); + context.cuda_device_context(), loss, softmax, labels, false, + ignore_index); } } }; @@ -295,7 +298,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel { const int class_num = logit_grad->dims()[1]; int block = 512; auto stream = context.cuda_device_context().stream(); - + auto ignore_index = context.Attr("ignore_index"); if (context.Attr("soft_label")) { int grid = (batch_size * class_num + block - 1) / block; const T* label_data = labels->data(); @@ -305,7 +308,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel { int grid = (batch_size + block - 1) / block; const int64_t* label_data = labels->data(); CrossEntropyGrad<<>>( - logit_grad_data, label_data, batch_size, class_num); + logit_grad_data, label_data, batch_size, class_num, ignore_index); int num = batch_size * class_num; grid = (num + block - 1) / block; Scale<<>>(logit_grad_data, loss_grad_data, num, diff --git a/paddle/fluid/operators/softmax_with_cross_entropy_op.h b/paddle/fluid/operators/softmax_with_cross_entropy_op.h index dd6f6aca5ada7aa215d3b3444194fc53efeb7020..e9aba3b37b8cc01d4fe5de5200579d4e93f67e56 100644 --- a/paddle/fluid/operators/softmax_with_cross_entropy_op.h +++ b/paddle/fluid/operators/softmax_with_cross_entropy_op.h @@ -45,7 +45,8 @@ class SoftmaxWithCrossEntropyKernel : public framework::OpKernel { math::SoftmaxFunctor()(dev_ctx, logits, softmax); math::CrossEntropyFunctor()( - dev_ctx, loss, softmax, labels, context.Attr("soft_label")); + dev_ctx, loss, softmax, labels, context.Attr("soft_label"), + context.Attr("ignore_index")); } }; diff --git a/python/paddle/fluid/layers/nn.py b/python/paddle/fluid/layers/nn.py index 8408e6d2a12edacb310ed5eb543ad51585f3d82a..3ae0fac4bef5c47964f9a9cd8dd45b57e705e1f8 100644 --- a/python/paddle/fluid/layers/nn.py +++ b/python/paddle/fluid/layers/nn.py @@ -968,7 +968,7 @@ def dropout(x, dropout_prob, is_test=False, seed=None, name=None): return out -def cross_entropy(input, label, soft_label=False): +def cross_entropy(input, label, soft_label=False, ignore_index=-100): """ **Cross Entropy Layer** @@ -1012,7 +1012,10 @@ def cross_entropy(input, label, soft_label=False): tensor with shape [N x D]. soft_label (bool): a flag indicating whether to interpretate the given labels as soft - labels, default `False`. + labels. Default: `False`. + ignore_index (int): Specifies a target value that is ignored and does + not contribute to the input gradient. Only valid + if soft_label is set to False. Default: -100 Returns: A 2-D tensor with shape [N x 1], the cross entropy loss. @@ -1037,7 +1040,8 @@ def cross_entropy(input, label, soft_label=False): inputs={'X': [input], 'Label': [label]}, outputs={'Y': [out]}, - attrs={"soft_label": soft_label}) + attrs={"soft_label": soft_label, + "ignore_index": ignore_index}) return out @@ -4242,7 +4246,10 @@ def multiplex(inputs, index): return out -def softmax_with_cross_entropy(logits, label, soft_label=False): +def softmax_with_cross_entropy(logits, + label, + soft_label=False, + ignore_index=-100): """ **Softmax With Cross Entropy Operator.** @@ -4284,6 +4291,10 @@ def softmax_with_cross_entropy(logits, label, soft_label=False): soft_label is set to true, Label is a Tensor with soft_label (bool): A flag to indicate whether to interpretate the given labels as soft labels. By default, `soft_label` is set to False. + ignore_index (int): Specifies a target value that is ignored and does + not contribute to the input gradient. Only valid + if soft_label is set to False. Default: -100 + Returns: Variable: The cross entropy loss is a 2-D tensor with shape [N x 1]. @@ -4305,7 +4316,8 @@ def softmax_with_cross_entropy(logits, label, soft_label=False): 'Label': label}, outputs={'Softmax': softmax, 'Loss': loss}, - attrs={'soft_label': soft_label}) + attrs={'soft_label': soft_label, + 'ignore_index': ignore_index}) return loss diff --git a/python/paddle/fluid/tests/unittests/test_cross_entropy_op.py b/python/paddle/fluid/tests/unittests/test_cross_entropy_op.py index fa367f95fc9c65dd782d53a2799cacadf74dcfd2..f22badbea0c67b210f7ac4e14e5d647f1cffa6cc 100644 --- a/python/paddle/fluid/tests/unittests/test_cross_entropy_op.py +++ b/python/paddle/fluid/tests/unittests/test_cross_entropy_op.py @@ -209,5 +209,34 @@ class TestCrossEntropyOp6(OpTest): ["X"], "Y", max_relative_error=0.05, numeric_grad_delta=0.001) +class TestCrossEntropyOp7(OpTest): + """Test cross-entropy with ignore index. + """ + + def setUp(self): + self.op_type = "cross_entropy" + batch_size = 30 + class_num = 10 + ignore_index = 3 + + X = randomize_probability(batch_size, class_num, dtype='float64') + + label = np.random.randint(0, class_num, (batch_size, 1), dtype="int64") + cross_entropy = np.asmatrix( + [[-np.log(X[i][label[i][0]])] + if label[i][0] != ignore_index else [0] + for i in range(X.shape[0])], + dtype="float64") + self.inputs = {"X": X, "Label": label} + self.outputs = {"Y": cross_entropy} + self.attrs = {"soft_label": False, "ignore_index": ignore_index} + + def test_check_output(self): + self.check_output() + + def test_check_grad(self): + self.check_grad(["X"], "Y", numeric_grad_delta=0.001) + + if __name__ == "__main__": unittest.main() diff --git a/python/paddle/fluid/tests/unittests/test_layers.py b/python/paddle/fluid/tests/unittests/test_layers.py index bc4d364c74c6cb6b8f0df59e7ede77e6271f4b96..b04346b052903959f44aa96f6fccb7d20652e854 100644 --- a/python/paddle/fluid/tests/unittests/test_layers.py +++ b/python/paddle/fluid/tests/unittests/test_layers.py @@ -556,6 +556,15 @@ class TestBook(unittest.TestCase): out = layers.sequence_enumerate(input=x, win_size=2, pad_value=0) print(str(program)) + def test_cross_entropy(self): + program = Program() + with program_guard(program): + x = layers.data(name="x", shape=[30, 10], dtype="float32") + label = layers.data(name="label", shape=[30, 1], dtype="int32") + mode = 'channel' + out = layers.cross_entropy(x, label, False, 4) + self.assertIsNotNone(out) + if __name__ == '__main__': unittest.main() diff --git a/python/paddle/fluid/tests/unittests/test_softmax_with_cross_entropy_op.py b/python/paddle/fluid/tests/unittests/test_softmax_with_cross_entropy_op.py index b7e5ff6d52ad7dde3dd94b3bd660cfca383e1ada..a18941dd3126ac027f022ddafbbaed8516166233 100644 --- a/python/paddle/fluid/tests/unittests/test_softmax_with_cross_entropy_op.py +++ b/python/paddle/fluid/tests/unittests/test_softmax_with_cross_entropy_op.py @@ -88,5 +88,40 @@ class TestSoftmaxWithCrossEntropyOp2(OpTest): self.check_grad(["Logits"], "Loss") +class TestSoftmaxWithCrossEntropyOp3(OpTest): + """ + Test softmax with cross entropy operator with ignore_index. + """ + + def setUp(self): + self.op_type = "softmax_with_cross_entropy" + batch_size = 41 + class_num = 37 + + logits = np.random.uniform(0.1, 1.0, + [batch_size, class_num]).astype("float64") + softmax = np.apply_along_axis(stable_softmax, 1, logits) + labels = np.random.randint(0, class_num, [batch_size, 1], dtype="int64") + ignore_index = 7 + cross_entropy = np.asmatrix( + [[-np.log(softmax[i][labels[i][0]])] + if labels[i] != ignore_index else [0] + for i in range(softmax.shape[0])], + dtype="float64") + + self.inputs = {"Logits": logits, "Label": labels} + self.outputs = { + "Softmax": softmax.astype("float64"), + "Loss": cross_entropy.astype("float64") + } + self.attrs = {"ignore_index": ignore_index} + + def test_check_output(self): + self.check_output() + + def test_check_grad(self): + self.check_grad(["Logits"], "Loss") + + if __name__ == "__main__": unittest.main() diff --git a/python/paddle/reader/decorator.py b/python/paddle/reader/decorator.py index 6d7ac876fdf65fe0e85cceb94c311a93d9ea39c2..5b9459b670ac8583ee0e65a3c1b51f6248bb6303 100644 --- a/python/paddle/reader/decorator.py +++ b/python/paddle/reader/decorator.py @@ -14,11 +14,14 @@ __all__ = [ 'map_readers', 'buffered', 'compose', 'chain', 'shuffle', - 'ComposeNotAligned', 'firstn', 'xmap_readers', 'PipeReader' + 'ComposeNotAligned', 'firstn', 'xmap_readers', 'PipeReader', + 'multiprocess_reader' ] from threading import Thread import subprocess +import multiprocessing +import sys from six.moves.queue import Queue from six.moves import zip_longest @@ -332,6 +335,100 @@ def xmap_readers(mapper, reader, process_num, buffer_size, order=False): return xreader +def multiprocess_reader(readers, use_pipe=True, queue_size=1000): + """ + multiprocess_reader use python multi process to read data from readers + and then use multiprocess.Queue or multiprocess.Pipe to merge all + data. The process number is equal to the number of input readers, each + process call one reader. + + Multiprocess.Queue require the rw access right to /dev/shm, some + platform does not support. + + you need to create multiple readers first, these readers should be independent + to each other so that each process can work independently. + + An example: + + .. code-block:: python + + reader0 = reader(["file01", "file02"]) + reader1 = reader(["file11", "file12"]) + reader1 = reader(["file21", "file22"]) + reader = multiprocess_reader([reader0, reader1, reader2], + queue_size=100, use_pipe=False) + """ + + try: + import ujson as json + except Exception as e: + sys.stderr.write("import ujson error: " + str(e) + " use json\n") + import json + + assert type(readers) is list and len(readers) > 0 + + def _read_into_queue(reader, queue): + for sample in reader(): + if sample is None: + raise ValueError("sample has None") + queue.put(sample) + queue.put(None) + + def queue_reader(): + queue = multiprocessing.Queue(queue_size) + for reader in readers: + p = multiprocessing.Process( + target=_read_into_queue, args=(reader, queue)) + p.start() + + reader_num = len(readers) + finish_num = 0 + while finish_num < reader_num: + sample = queue.get() + if sample is None: + finish_num += 1 + else: + yield sample + + def _read_into_pipe(reader, conn): + for sample in reader(): + if sample is None: + raise ValueError("sample has None!") + conn.send(json.dumps(sample)) + conn.send(json.dumps(None)) + conn.close() + + def pipe_reader(): + conns = [] + for reader in readers: + parent_conn, child_conn = multiprocessing.Pipe() + conns.append(parent_conn) + p = multiprocessing.Process( + target=_read_into_pipe, args=(reader, child_conn)) + p.start() + + reader_num = len(readers) + finish_num = 0 + conn_to_remove = [] + while finish_num < reader_num: + for conn in conn_to_remove: + conns.remove(conn) + conn_to_remove = [] + for conn in conns: + sample = json.loads(conn.recv()) + if sample is None: + finish_num += 1 + conn.close() + conn_to_remove.append(conn) + else: + yield sample + + if use_pipe: + return pipe_reader + else: + return queue_reader + + def _buf2lines(buf, line_break="\n"): # FIXME: line_break should be automatically configured. lines = buf.split(line_break) diff --git a/python/paddle/reader/tests/decorator_test.py b/python/paddle/reader/tests/decorator_test.py index 537df489b9738864933b3a7922d178701db3d19f..c324092f8850e4bd64955aa9c987746b5cec54b5 100644 --- a/python/paddle/reader/tests/decorator_test.py +++ b/python/paddle/reader/tests/decorator_test.py @@ -14,6 +14,7 @@ import time import unittest +import functools import paddle.reader @@ -174,5 +175,33 @@ class TestPipeReader(unittest.TestCase): temp.close() +class TestMultiProcessReader(unittest.TestCase): + def setup(self): + self.samples = [] + for i in range(1000): + self.samples.append([[i], [i + 1, i + 2], i + 3]) + + def reader(index): + for i in range(len(self.samples)): + if i % 3 == index: + yield self.samples[i] + + self.reader0 = functools.partial(reader, 0) + self.reader1 = functools.partial(reader, 1) + self.reader2 = functools.partial(reader, 2) + + def reader_test(self, use_pipe): + self.setup() + results = [] + for data in paddle.reader.multiprocess_reader( + [self.reader0, self.reader1, self.reader2], 100, use_pipe)(): + results.append(data) + self.assertEqual(sorted(self.samples), sorted(results)) + + def test_multi_process_reader(self): + self.reader_test(use_pipe=False) + self.reader_test(use_pipe=True) + + if __name__ == '__main__': unittest.main()