Commit 89e63b13 authored by wangmeng28

Merge remote-tracking branch 'upstream/develop' into factorization_machine_layer

@@ -12,11 +12,11 @@ Machine:
System: CentOS release 6.3 (Final), Docker 1.12.1.
PaddlePaddle: paddlepaddle/paddle:latest (for MKLML and MKL-DNN), paddlepaddle/paddle:latest-openblas (for OpenBLAS)
- MKL-DNN tag v0.11
- MKLML 2018.0.1.20171007
- OpenBLAS v0.2.20
(TODO: will rerun after 0.11.0)

On each machine, we will test and compare the performance of training on a single node using MKL-DNN / MKLML / OpenBLAS respectively.
@@ -31,17 +31,37 @@ Input image size - 3 * 224 * 224, Time: images/second
| BatchSize | 64 | 128 | 256 |
|--------------|-------| -----| --------|
| OpenBLAS | 7.80 | 9.00 | 10.80 |
| MKLML | 12.12 | 13.70 | 16.18 |
| MKL-DNN | 28.46 | 29.83 | 30.44 |
chart on batch size 128
TBD
- ResNet-50
| BatchSize | 64 | 128 | 256 |
|--------------|-------| ------| -------|
| OpenBLAS | 25.22 | 25.68 | 27.12 |
| MKLML | 32.52 | 31.89 | 33.12 |
| MKL-DNN | 81.69 | 82.35 | 84.08 |
chart on batch size 128
TBD
- GoogLeNet
| BatchSize | 64 | 128 | 256 |
|--------------|-------| ------| -------|
| OpenBLAS | 89.52 | 96.97 | 108.25 |
| MKLML | 128.46| 137.89| 158.63 |
| MKL-DNN | 250.46| 264.83| 269.50 |
chart on batch size 128
TBD
### Laptop
TBD
### Desktop
......
# Python Data Reader Design Doc

During the training and testing phases, PaddlePaddle programs need to read data. To make it easy for users to write the code that reads input data, we define the following:

- A *reader*: a function that reads data (from a file, the network, a random number generator, etc.) and yields data items.
- A *reader creator*: a function that returns a reader function.
- A *reader decorator*: a function that takes in one or more readers and returns a reader.
- A *batch reader*: a function that reads data (from a *reader*, file, network, random number generator, etc.) and yields a batch of data items.

We also provide a function that converts a reader into a batch reader, as well as frequently used reader creators and reader decorators.

## Data Reader Interface

A *data reader* does not have to be a function that reads and yields data items. It can be any function with no parameters that creates an iterable (anything that can be used in `for x in iterable`):

```
iterable = data_reader()
```

Each element produced by the iterable should be a **single** entry of data, **not** a mini batch. The entry could be a single item or a tuple of items. Each item should be of one of the [supported types](http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types) (e.g., a numpy 1d array of float32, an int, or a list of ints).

An example implementation of a single-item data reader creator is as follows:

```python
def reader_creator_random_image(width, height):
@@ -29,7 +29,7 @@ def reader_creator_random_image(width, height):
    return reader
```
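For concreteness, a minimal single-item creator along these lines could look like the sketch below (the use of numpy, the value range, and the endless loop are illustrative assumptions, not part of the original example):

```python
import numpy as np

def reader_creator_random_image(width, height):
    def reader():
        # yield one random float32 "image" per iteration, forever
        while True:
            yield np.random.uniform(-1, 1, size=width * height).astype('float32')
    return reader
```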
An example implementation of a multiple-item data reader creator is as follows:

```python
def reader_creator_random_image_and_label(width, height, label):
    def reader():
@@ -40,9 +40,10 @@ def reader_creator_random_image_and_label(width, height, label):
```

## Batch Reader Interface

A *batch reader* can be any function with no parameters that creates an iterable (anything that can be used in `for x in iterable`). The output of the iterable should be a batch (a list) of data items, and each item inside the list must be a tuple.

Here are some valid outputs:

```python
# a mini batch of three data items. Each data item consists of three columns of data, each of which is 1.
[(1, 1, 1),
@@ -58,20 +59,22 @@ Here are valid outputs:
```

Please note that each item inside the list must be a tuple. Below is an invalid output:

```python
# wrong, [1,1,1] needs to be inside a tuple: ([1,1,1],).
# Otherwise it is ambiguous whether [1,1,1] means a single column of data [1, 1, 1],
# or three columns of data, each of which is 1.
[[1,1,1],
 [2,2,2],
 [3,3,3]]
```

It is easy to convert a reader to a batch reader:

```python
mnist_train = paddle.dataset.mnist.train()
mnist_train_batch_reader = paddle.batch(mnist_train, 128)
```
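Conceptually, `paddle.batch` can be thought of as a reader decorator along the lines of the following sketch (an illustration only; the actual implementation may differ, for example in how it treats the final partial batch):

```python
def batch(reader, batch_size):
    def batch_reader():
        b = []
        for item in reader():
            b.append(item)
            if len(b) == batch_size:
                yield b
                b = []
        if b:
            yield b  # emit the final, possibly smaller, batch
    return batch_reader
```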
It is also straightforward to create a custom batch reader:

```python
def custom_batch_reader():
    while True:
@@ -85,7 +88,8 @@ mnist_random_image_batch_reader = custom_batch_reader
```

## Usage

The following shows how we can use the reader with PaddlePaddle: the batch reader, a mapping from item(s) to data layer(s), the batch size, and the total number of passes are passed into `paddle.train`:

```python
# two data layers are created:
@@ -99,13 +103,13 @@ paddle.train(batch_reader, {"image":0, "label":1}, 128, 10, ...)
```

## Data Reader Decorator

A *data reader decorator* takes in a single reader or multiple data readers and returns a new data reader. It is similar to a [python decorator](https://wiki.python.org/moin/PythonDecorators), but it does not use the `@` syntax.

Since we have a strict interface for data readers (no parameters, return a single data item), data readers can be used flexibly via data reader decorators. Below are a few examples.

### Prefetch Data

Since reading data may take some time and training cannot proceed without data, it is generally a good idea to prefetch the data.

Use `paddle.reader.buffered` to prefetch data:
@@ -117,9 +121,9 @@ buffered_reader = paddle.reader.buffered(paddle.dataset.mnist.train(), 100)
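A buffered reader of this kind can be sketched with a background thread and a bounded queue, along the following lines (an assumption about the general approach, not the actual `paddle.reader.buffered` implementation):

```python
import threading
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

def buffered(reader, size):
    end = object()  # sentinel marking the end of the underlying reader

    def buffered_reader():
        q = queue.Queue(maxsize=size)

        def fill():
            # eagerly pull items from the wrapped reader into the queue
            for item in reader():
                q.put(item)
            q.put(end)

        t = threading.Thread(target=fill)
        t.daemon = True
        t.start()
        item = q.get()
        while item is not end:
            yield item
            item = q.get()

    return buffered_reader
```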
### Compose Multiple Data Readers

For example, suppose we want to use a source of real images (say, by reusing the mnist dataset) and a source of random images as input for [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661).

We can do the following:

```python
def reader_creator_random_image(width, height):
@@ -139,13 +143,13 @@ false_reader = reader_creator_bool(False)

reader = paddle.reader.compose(paddle.dataset.mnist.train(), data_reader_creator_random_image(20, 20), true_reader, false_reader)
# Skipped 1 because paddle.dataset.mnist.train() produces two items per data entry.
# And we don't care about the second item at this time.
paddle.train(paddle.batch(reader, 128), {"true_image":0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...)
```
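One way to picture what such a composition does is the sketch below: it zips the component readers together and flattens each reader's output columns into one tuple (an illustration of the idea only; the real `paddle.reader.compose` may differ, for example in how it checks that the readers stay aligned):

```python
def compose(*readers):
    def to_tuple(x):
        return x if isinstance(x, tuple) else (x,)

    def reader():
        # draw one entry from every reader and concatenate their columns
        for entries in zip(*[r() for r in readers]):
            yield sum([to_tuple(e) for e in entries], ())

    return reader
```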
### Shuffle

Given the shuffle buffer size `n`, `paddle.reader.shuffle` returns a data reader that buffers `n` data entries and shuffles them before a data entry is read.

Example:

```python
@@ -154,21 +158,21 @@ reader = paddle.reader.shuffle(paddle.dataset.mnist.train(), 512)
```
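The buffering behaviour described above can be sketched as follows (illustrative only; the shipped `paddle.reader.shuffle` may handle the tail of the stream differently):

```python
import random

def shuffle(reader, buf_size):
    def shuffled_reader():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for b in buf:
                    yield b
                buf = []
        # flush whatever is left at the end of the stream
        random.shuffle(buf)
        for b in buf:
            yield b
    return shuffled_reader
```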
## Q & A

### Why does a reader return only a single entry, and not a mini batch?

Returning a single entry makes reusing existing data readers much easier (for example, if an existing reader returned 3 entries instead of a single entry, the training code would be more complicated because it has to handle cases like a batch size of 2).

We provide the function `paddle.batch` to turn a (single entry) reader into a batch reader.

### Why do we need a batch reader? Isn't it sufficient to give the reader and batch_size as arguments during training?

In most cases, passing the reader and batch_size as arguments to the train method would be sufficient. However, sometimes the user wants to customize the order of data entries inside a mini batch, or even change the batch size dynamically. For these cases a batch reader is helpful, as sketched below.
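For instance, a batch reader that draws a new batch size for every mini batch could look like this sketch (a hypothetical example; the function name and size range are made up for illustration):

```python
import random

def dynamic_batch_reader(reader, min_size, max_size):
    def batch_reader():
        batch, size = [], random.randint(min_size, max_size)
        for item in reader():
            batch.append(item)
            if len(batch) == size:
                yield batch
                # pick a fresh batch size for the next mini batch
                batch, size = [], random.randint(min_size, max_size)
        # note: this sketch silently drops a trailing partial batch
    return batch_reader
```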
### Why use a dictionary instead of a list to provide mapping?

Using a dictionary (`{"image":0, "label":1}`) instead of a list (`["image", "label"]`) has the advantage that the user can easily reuse an item (e.g., using `{"image_a":0, "image_b":0, "label":1}`) or skip an item (e.g., using `{"image_a":0, "label":2}`).

### How to create a custom data reader creator?

```python
def image_reader_creator(image_path, label_path, n):
@@ -192,7 +196,7 @@ paddle.train(paddle.batch(reader, 128), {"image":0, "label":1}, ...)
```

### How is `paddle.train` implemented?

An example implementation of `paddle.train` is:

```python
def train(batch_reader, mapping, batch_size, total_pass):
```
......
#include <paddle/capi.h>
#include <time.h>

#include "../common/common.h"

#define CONFIG_BIN "./trainer_config.bin"
@@ -27,20 +28,19 @@ int main() {
  CHECK(paddle_arguments_resize(in_args, 1));

  // Create input matrix.
  paddle_matrix mat = paddle_matrix_create(/* sample_num */ 1,
                                           /* size */ 784,
                                           /* useGPU */ false);
  srand(time(0));

  paddle_real* array;

  // Get First row.
  CHECK(paddle_matrix_get_row(mat, 0, &array));

  for (int i = 0; i < 784; ++i) {
    array[i] = rand() / ((float)RAND_MAX);
  }

  CHECK(paddle_arguments_set_value(in_args, 0, mat));
@@ -53,17 +53,18 @@ int main() {
  CHECK(paddle_arguments_get_value(out_args, 0, prob));

  uint64_t height;
  uint64_t width;

  CHECK(paddle_matrix_get_shape(prob, &height, &width));
  CHECK(paddle_matrix_get_row(prob, 0, &array));

  printf("Prob: \n");
  for (int i = 0; i < height * width; ++i) {
    printf("%.4f ", array[i]);
    if ((i + 1) % width == 0) {
      printf("\n");
    }
  }
  printf("\n");
......
@@ -41,6 +41,7 @@ bool BatchNormBaseLayer::init(const LayerMap& layerMap,
    useGlobalStats_ = config_.use_global_stats();
  }
  movingAvgFraction_ = config_.moving_average_fraction();
  epsilon_ = config_.epsilon();

  weight_.reset(new Weight(1, channels_, parameters_[0]));
  movingMean_.reset(new Weight(1, channels_, parameters_[1]));
......
@@ -94,6 +94,8 @@ protected:
  bool useGlobalStats_;
  // use to compute moving mean and variance.
  real movingAvgFraction_;
  // Epsilon is a small constant added for numerical stability in batch normalization.
  real epsilon_;
};

}  // namespace paddle
@@ -22,8 +22,6 @@ namespace paddle {

REGISTER_LAYER(batch_norm, BatchNormalizationLayer);

bool BatchNormalizationLayer::init(const LayerMap& layerMap,
                                   const ParameterMap& parameterMap) {
  /* Initialize the basic parent class */
@@ -53,7 +51,7 @@ void BatchNormalizationLayer::calMeanAndStd(const MatrixPtr& mat) {
  calMovingMeanAndVar();

  savedInvVar_->subScalar(-epsilon_);
  savedInvVar_->sqrt2(*savedInvVar_);
}
@@ -74,7 +72,7 @@ void BatchNormalizationLayer::setMeanAndStd() {
  savedInvVar_->copyFrom(*(movingVar_->getW()));
  savedInvVar_->downClip(real(0.0));
  savedInvVar_->subScalar(-epsilon_);
  savedInvVar_->sqrt2(*savedInvVar_);
}
......
@@ -39,9 +39,6 @@ public:
  void backward(const UpdateCallback& callback = nullptr) override;

protected:
  /// Load pre-calculated mean and std.
  void setMeanAndStd();
......
@@ -21,8 +21,6 @@ namespace paddle {

REGISTER_LAYER(cudnn_batch_norm, CudnnBatchNormLayer);

bool CudnnBatchNormLayer::init(const LayerMap& layerMap,
                               const ParameterMap& parameterMap) {
  /* Initialize the basic parent class */
@@ -61,6 +59,9 @@ void CudnnBatchNormLayer::forward(PassType passType) {
  real* movingMean = movingMean_->getW()->getData();
  real* movingVar = movingVar_->getW()->getData();

  // cuDNN does not allow an epsilon value less than CUDNN_BN_MIN_EPSILON.
  eps_ = std::max(CUDNN_BN_MIN_EPSILON, static_cast<double>(epsilon_));

  if (!useGlobalStats_) {
    REGISTER_TIMER_INFO("CudnnBatchFwTimer", getName().c_str());
    real* savedMean = savedMean_->getData();
@@ -75,7 +76,7 @@ void CudnnBatchNormLayer::forward(PassType passType) {
                          1.0 - movingAvgFraction_,
                          movingMean,
                          movingVar,
                          eps_,
                          savedMean,
                          savedInvVar);
  } else {
@@ -90,7 +91,7 @@ void CudnnBatchNormLayer::forward(PassType passType) {
                                  beta,
                                  movingMean,
                                  movingVar,
                                  eps_);
    } else {
      // There is a limitation in cudnn library.
      // When the batch size is larger than 1024 in cuDNN v5.1,
@@ -101,7 +102,7 @@ void CudnnBatchNormLayer::forward(PassType passType) {
                                         beta,
                                         movingMean,
                                         movingVar,
                                         eps_,
                                         batchSize,
                                         channels_,
                                         imageH_ * imageD_,
@@ -128,6 +129,9 @@ void CudnnBatchNormLayer::backward(const UpdateCallback& callback) {
  real* savedMean = savedMean_->getData();
  real* savedInvVar = savedInvVar_->getData();

  // cuDNN does not allow an epsilon value less than CUDNN_BN_MIN_EPSILON.
  eps_ = std::max(CUDNN_BN_MIN_EPSILON, static_cast<double>(epsilon_));

  auto create = [](MatrixPtr& m, size_t h, size_t w, real** p) {
    Matrix::resizeOrCreate(m, h, w, false, true);
    m->zeroMem();
@@ -157,7 +161,7 @@ void CudnnBatchNormLayer::backward(const UpdateCallback& callback) {
                    gamma,
                    gammaGrad,
                    betaGrad,
                    eps_,
                    savedMean,
                    savedInvVar);
......
@@ -14,6 +14,7 @@ limitations under the License. */

#pragma once

#include <cudnn.h>
#include "BatchNormBaseLayer.h"
#include "Layer.h"
#include "paddle/utils/Stat.h"
@@ -46,12 +47,9 @@ public:
  void backward(const UpdateCallback& callback = nullptr) override;

protected:
  /// Epsilon value used in the batch normalization formula.
  /// Same epsilon value should be used in forward and backward functions.
  double eps_;

  /// Input/output tensor descriptor desc
  hl_tensor_descriptor ioDesc_;
......
@@ -38,12 +38,13 @@ bool MKLDNNAddtoLayer::init(const LayerMap& layerMap,
}

void MKLDNNAddtoLayer::reshape(
    int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) {
  CHECK_EQ(layerSize_, getSize()) << "this layer size can not be changed";
  reshapeInput(bs, ih, iw);
  ic = inputLayers_[0]->getSize() / ih / iw;
  CHECK_EQ((size_t)ic * ih * iw, inputLayers_[0]->getSize());
  CHECK_EQ(inputLayers_[0]->getOutputValue()->getElementCnt(),
           (size_t)bs * ic * ih * iw);
  for (size_t i = 0; i < inputLayers_.size(); i++) {
    CHECK_EQ(int64_t(bs), inputLayers_[i]->getOutput().getBatchSize());
    CHECK_EQ(layerSize_, inputLayers_[i]->getSize());
@@ -57,47 +58,43 @@ void MKLDNNAddtoLayer::reshape(
}

void MKLDNNAddtoLayer::resetFwd(std::vector<primitive>& pipeline,
                                std::vector<MKLDNNMatrixPtr>& inputs,
                                MKLDNNMatrixPtr& out) {
  resetFwdBuffers(inputs, biasVal_, out);

  std::shared_ptr<sum::primitive_desc> fwdPD;
  std::shared_ptr<sum::primitive_desc> biasPD;
  resetFwdPD(fwdPD, biasPD, inputs, biasVal_, out);

  resetFwdPipeline(pipeline, fwdPD, biasPD, inputs, biasVal_, out);
}

void MKLDNNAddtoLayer::resetBwd(std::vector<primitive>& pipeline,
                                std::vector<MKLDNNMatrixPtr>& inputs,
                                MKLDNNMatrixPtr& out) {
  resetBwdBuffers(inputs, biasGrad_, out);

  // backward only need share output grad to input grad
  for (size_t i = 0; i < inputs.size(); i++) {
    if (inputs[i] != nullptr) {
      inputs[i] = out;
      inputLayers_[i]->getOutputGrad()->setData(inputs[i]->getData());
    }
  }

  // backward bias
  bwdBias_ = nullptr;
  if (biasGrad_) {
    std::vector<float> scales(bs_, 1.0);
    std::vector<memory::primitive_desc> srcPDs(bs_,
                                               biasGrad_->getPrimitiveDesc());
    auto biasPD =
        sum::primitive_desc(biasGrad_->getMemoryDesc(), scales, srcPDs);
    std::vector<primitive::at> srcs;
    for (size_t i = 0; i < grads_.size(); ++i) {
      srcs.push_back(*(grads_[i]));
    }
    bwdBias_.reset(new sum(biasPD, srcs, *biasGrad_));
    pipeline.push_back(*bwdBias_);
  }
}
@@ -208,7 +205,7 @@ void MKLDNNAddtoLayer::resetBwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
  inputs.resize(inputLayers_.size());
  for (size_t i = 0; i < inputs.size(); i++) {
    resetInGrad(inputs[i], inVals_[i]->getPrimitiveDesc(), i);
    CHECK_PRIMITIVE_DESC_EQ(inputs[i], out->getPrimitiveDesc());
  }
......
@@ -26,9 +26,6 @@ namespace paddle {
 */
class MKLDNNAddtoLayer : public MKLDNNLayer {
protected:
  // layer size == ic * ih * iw == oc * oh * ow, and can not be changed
  size_t layerSize_;
@@ -50,52 +47,19 @@ public:
            const ParameterMap& parameterMap) override;

  void reshape(
      int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) override;

  void resetFwd(std::vector<mkldnn::primitive>& pipeline,
                std::vector<MKLDNNMatrixPtr>& inputs,
                MKLDNNMatrixPtr& out) override;

  void resetBwd(std::vector<mkldnn::primitive>& pipeline,
                std::vector<MKLDNNMatrixPtr>& inputs,
                MKLDNNMatrixPtr& out) override;

  void updateWeights(const UpdateCallback& callback) override;

protected:
  void resetFwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
                       MKLDNNMatrixPtr& bias,
                       MKLDNNMatrixPtr& out);
@@ -110,17 +74,10 @@ protected:
                  std::vector<MKLDNNMatrixPtr>& inputs,
                  MKLDNNMatrixPtr& bias,
                  MKLDNNMatrixPtr& out);

  void resetBwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
                       MKLDNNMatrixPtr& bias,
                       MKLDNNMatrixPtr& out);

  void prepareBias(MKLDNNMatrixPtr& bias,
                   const MatrixPtr& biasMat,
                   const MKLDNNMatrixPtr& out,
......
@@ -21,8 +21,6 @@ namespace paddle {

REGISTER_LAYER(mkldnn_batch_norm, MKLDNNBatchNormLayer);

bool MKLDNNBatchNormLayer::init(const LayerMap& layerMap,
                                const ParameterMap& parameterMap) {
  if (!MKLDNNLayer::init(layerMap, parameterMap)) {
@@ -50,6 +48,8 @@ bool MKLDNNBatchNormLayer::init(const LayerMap& layerMap,
    useGlobalStats_ = config_.use_global_stats();
  }
  movingAvgFraction_ = config_.moving_average_fraction();
  epsilon_ = config_.epsilon();

  VLOG(MKLDNN_BASE) << "--- " << (useGlobalStats_ ? "use" : "do not use")
                    << " --- global stats";
  VLOG(MKLDNN_BASE) << "Moving average fraction: " << movingAvgFraction_;
@@ -116,21 +116,20 @@ void MKLDNNBatchNormLayer::calMovingMeanAndVar() {
}

void MKLDNNBatchNormLayer::reshape(
    int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) {
  reshapeInput(bs, ih, iw);
  oh = ih;
  ow = iw;
  // ic_ and oc can not be changed
  CHECK_EQ((size_t)ic,
           inputLayers_[0]->getOutputValue()->getElementCnt() / bs / ih / iw)
      << "Input channel can not be changed";
  reshapeOutput(oh, ow);
  resizeOutput(bs, oc * oh * ow);
}

void MKLDNNBatchNormLayer::resetFwd(std::vector<primitive>& pipeline,
                                    std::vector<MKLDNNMatrixPtr>& inputs,
                                    MKLDNNMatrixPtr& out) {
  // In training phase, it will always calculate mean and var,
  // so useGlobalStats must be false.
@@ -140,25 +139,23 @@ void MKLDNNBatchNormLayer::resetFwd(std::vector<primitive>& pipeline,
    useGlobalStats_ = false;
  }

  resetFwdBuffers(inputs[0], wgtVal_, out);

  resetFwdPD(fwdPD_, inputs[0], wgtVal_, out);

  resetFwdPipeline(pipeline, fwdPD_, inputs[0], wgtVal_, out);
}

void MKLDNNBatchNormLayer::resetBwd(std::vector<primitive>& pipeline,
                                    std::vector<MKLDNNMatrixPtr>& inputs,
                                    MKLDNNMatrixPtr& out) {
  std::shared_ptr<bn_bwd::primitive_desc> pd;

  resetBwdBuffers(inputs[0], wgtGrad_, out);

  resetBwdPD(pd, inputs[0], wgtGrad_, out);

  resetBwdPipeline(pipeline, pd, inputs[0], wgtGrad_, out);
}

void MKLDNNBatchNormLayer::forward(PassType passType) {
@@ -213,7 +210,7 @@ void MKLDNNBatchNormLayer::resetFwdPD(
  if (wgt) {
    flags_ = (flags_ | batch_normalization_flag::use_scale_shift);
  }
  auto fwdDesc = bn_fwd::desc(pk, in->getMemoryDesc(), epsilon_, flags_);
  pd.reset(new bn_fwd::primitive_desc(fwdDesc, engine_));
  CHECK_PRIMITIVE_DESC_EQ(out, pd->dst_primitive_desc());
  if (wgt) {
@@ -260,9 +257,9 @@ void MKLDNNBatchNormLayer::resetFwdPipeline(
void MKLDNNBatchNormLayer::resetBwdBuffers(MKLDNNMatrixPtr& in,
                                           MKLDNNMatrixPtr& wgt,
                                           MKLDNNMatrixPtr& out) {
  CHECK(inVals_[0] && outVal_);
  resetOutGrad(out, outVal_->getPrimitiveDesc());
  resetInGrad(in, inVals_[0]->getPrimitiveDesc());
  if (gradScaleShift_) {
    CHECK(wgtVal_);
    resetWithMatrix(wgt, gradScaleShift_, wgtVal_->getPrimitiveDesc());
@@ -280,7 +277,7 @@ void MKLDNNBatchNormLayer::resetBwdPD(
  }
  CHECK_PRIMITIVE_DESC_EQ(out, in->getPrimitiveDesc());
  auto md = in->getMemoryDesc();
  auto bwdDesc = bn_bwd::desc(prop_kind::backward, md, md, epsilon_, flags_);
  pd.reset(new bn_bwd::primitive_desc(bwdDesc, engine_, *fwdPD_));
  CHECK(pd->weights_primitive_desc() == fwdPD_->weights_primitive_desc());
  CHECK_PRIMITIVE_DESC_EQ(wgt, pd->diff_weights_primitive_desc());
@@ -297,11 +294,12 @@ void MKLDNNBatchNormLayer::resetBwdPipeline(
  if (pd == nullptr) {
    return;
  }
  CHECK(inVals_[0]);
  bwdData_.reset(
      wgt && wgtVal_
          ? new bn_bwd(
                *pd, *inVals_[0], *mean_, *var_, *out, *wgtVal_, *in, *wgt)
          : new bn_bwd(*pd, *inVals_[0], *mean_, *var_, *out, *in));
  pipeline.push_back(*bwdData_);
}
......
@@ -32,7 +32,8 @@ protected:
  std::shared_ptr<bn_fwd::primitive_desc> fwdPD_;

  // Epsilon value used in the batch normalization formula.
  real epsilon_;

  // weight and bias in paddle
  std::unique_ptr<Weight> weight_;
  std::unique_ptr<Weight> biases_;
@@ -73,18 +74,14 @@ public:
  void forward(PassType passType) override;

  void reshape(
      int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) override;

  void resetFwd(std::vector<mkldnn::primitive>& pipeline,
                std::vector<MKLDNNMatrixPtr>& inputs,
                MKLDNNMatrixPtr& out) override;

  void resetBwd(std::vector<mkldnn::primitive>& pipeline,
                std::vector<MKLDNNMatrixPtr>& inputs,
                MKLDNNMatrixPtr& out) override;

  void updateWeights(const UpdateCallback& callback) override;
@@ -98,11 +95,7 @@ protected:
   * moving = moving * AvgFraction + local * (1 - AvgFraction)
   */
  void calMovingMeanAndVar();

  void resetFwdBuffers(MKLDNNMatrixPtr& in,
                       MKLDNNMatrixPtr& wgt,
                       MKLDNNMatrixPtr& out);
@@ -115,12 +108,6 @@ protected:
                  MKLDNNMatrixPtr& in,
                  MKLDNNMatrixPtr& wgt,
                  MKLDNNMatrixPtr& out);

  void resetBwdBuffers(MKLDNNMatrixPtr& in,
                       MKLDNNMatrixPtr& wgt,
                       MKLDNNMatrixPtr& out);
......
@@ -32,17 +32,16 @@ bool MKLDNNConcatLayer::init(const LayerMap& layerMap,
}

void MKLDNNConcatLayer::reshape(
    int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) {
  reshapeInput(bs, ih, iw);
  ic = inputLayers_[0]->getSize() / ih / iw;
  CHECK_EQ((size_t)ic * ih * iw, inputLayers_[0]->getSize());
  CHECK_EQ(inputLayers_[0]->getOutputValue()->getElementCnt(),
           (size_t)bs * ic * ih * iw);
  CHECK_GT(inputLayers_.size(), 1UL);
  channels_.resize(inputLayers_.size());
  channels_[0] = ic;
  oc = ic;
  for (size_t i = 1; i < inputLayers_.size(); i++) {
    int batchsize, height, witdh;
    reshapeInput(batchsize, height, witdh, i);
@@ -52,37 +51,31 @@ void MKLDNNConcatLayer::reshape(
    channels_[i] = inputLayers_[i]->getSize() / height / witdh;
    CHECK_EQ((size_t)channels_[i] * height * witdh, inputLayers_[i]->getSize());
    oc += channels_[i];
  }
  oh = ih;
  ow = iw;
  reshapeOutput(oh, ow);
  resizeOutput(bs, oc * oh * ow);
}

void MKLDNNConcatLayer::resetFwd(std::vector<primitive>& pipeline,
                                 std::vector<MKLDNNMatrixPtr>& inputs,
                                 MKLDNNMatrixPtr& out) {
  resetFwdBuffers(inputs, out);

  std::shared_ptr<concat::primitive_desc> fwdPD;
  resetFwdPD(fwdPD, inputs, out);

  resetFwdPipeline(pipeline, fwdPD, inputs, out);
}

void MKLDNNConcatLayer::resetBwd(std::vector<primitive>& pipeline,
                                 std::vector<MKLDNNMatrixPtr>& inputs,
                                 MKLDNNMatrixPtr& out) {
  resetBwdBuffers(inputs, out);

  resetBwdPipeline(pipeline, bwds_, inputs, out);
}
void MKLDNNConcatLayer::resetFwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
@@ -90,10 +83,7 @@ void MKLDNNConcatLayer::resetFwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
  inputs.resize(inputLayers_.size());
  bool has8c = false, has16c = false, hasnc = false;
  for (size_t i = 0; i < inputs.size(); i++) {
    resetInValue(inputs[i], nullptr, i, channels_[i]);
    CHECK(inputs[i]);
    auto dm = inputs[i]->getDims();
    // inputs format can be different, but ndims must equal
@@ -114,8 +104,6 @@ void MKLDNNConcatLayer::resetFwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
      has16c = true;
    }
  }

  format outFmt;
  if (has16c && oc_ % 16 == 0) {
@@ -168,14 +156,9 @@ void MKLDNNConcatLayer::resetBwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
  inputs.resize(inputLayers_.size());
  for (size_t i = 0; i < inputs.size(); i++) {
    CHECK(inVals_[i]);
    resetInGrad(inputs[i], inVals_[i]->getPrimitiveDesc(), i);
    CHECK_PRIMITIVE_DESC_EQ(inputs[i], inVals_[i]->getPrimitiveDesc());
  }
}

void MKLDNNConcatLayer::resetBwdPipeline(
......
@@ -26,8 +26,6 @@ namespace paddle {
 */
class MKLDNNConcatLayer : public MKLDNNLayer {
protected:
  std::vector<std::shared_ptr<mkldnn::primitive>> bwds_;
  // input channel numbers
  std::vector<int> channels_;
@@ -47,18 +45,14 @@ public:
            const ParameterMap& parameterMap) override;

  void reshape(
      int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) override;

  void resetFwd(std::vector<mkldnn::primitive>& pipeline,
                std::vector<MKLDNNMatrixPtr>& inputs,
                MKLDNNMatrixPtr& out) override;

  void resetBwd(std::vector<mkldnn::primitive>& pipeline,
                std::vector<MKLDNNMatrixPtr>& inputs,
                MKLDNNMatrixPtr& out) override;

  void printSizeInfo() override {
@@ -72,38 +66,16 @@ public:
              << ", " << ow_;
  }

  size_t keepCondition() {
    // reset when the total element size of all inputs changed
    size_t totalSize = inputLayers_[0]->getOutputValue()->getElementCnt();
    for (size_t i = 1; i < inputLayers_.size(); ++i) {
      totalSize += inputLayers_[i]->getOutputValue()->getElementCnt();
    }
    return totalSize;
  }

protected:
  void resetFwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
                       MKLDNNMatrixPtr& out);
  void resetFwdPD(std::shared_ptr<mkldnn::concat::primitive_desc>& pd,
@@ -113,11 +85,6 @@ protected:
                  std::shared_ptr<mkldnn::concat::primitive_desc>& pd,
                  std::vector<MKLDNNMatrixPtr>& inputs,
                  MKLDNNMatrixPtr& out);

  void resetBwdBuffers(std::vector<MKLDNNMatrixPtr>& inputs,
                       MKLDNNMatrixPtr& out);
  void resetBwdPipeline(std::vector<mkldnn::primitive>& pipeline,
......
@@ -90,7 +90,7 @@ void MKLDNNConvLayer::convertWeightsToPaddle() {
}

void MKLDNNConvLayer::reshape(
    int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) {
  reshapeInput(bs, ih, iw);

  // cal output sizes
@@ -105,21 +105,17 @@ void MKLDNNConvLayer::reshape(
}

void MKLDNNConvLayer::resetFwd(std::vector<primitive>& pipeline,
                               std::vector<MKLDNNMatrixPtr>& inputs,
                               MKLDNNMatrixPtr& out) {
  resetFwdPD(fwdPD_);

  resetFwdBuffers(fwdPD_, inputs[0], wgtVal_, biasVal_, out);

  resetFwdPipeline(pipeline, fwdPD_, inputs[0], wgtVal_, biasVal_, out);
}

void MKLDNNConvLayer::resetBwd(std::vector<primitive>& pipeline,
                               std::vector<MKLDNNMatrixPtr>& inputs,
                               MKLDNNMatrixPtr& out) {
  std::shared_ptr<conv_bwdWgt::primitive_desc> bwdWgtPD;
  std::shared_ptr<conv_bwdData::primitive_desc> bwdDataPD;
@@ -128,9 +124,10 @@ void MKLDNNConvLayer::resetBwd(std::vector<primitive>& pipeline,
  resetBwdDataPD(bwdDataPD);

  resetBwdBuffers(bwdWgtPD, bwdDataPD, inputs[0], wgtGrad_, biasGrad_, out);

  resetBwdPipeline(
      pipeline, bwdWgtPD, bwdDataPD, inputs[0], wgtGrad_, biasGrad_, out);
}

void MKLDNNConvLayer::updateWeights(const UpdateCallback& callback) {
@@ -236,14 +233,14 @@ void MKLDNNConvLayer::resetBwdWgtPD(
  loadConvSettings(wgtDims, biasDims, strides, dilations, padL, padR);

  // create backward weight using input, output and weight value memory desc
  CHECK(inVals_[0]) << "Should have internal input value";
  CHECK(outVal_) << "Should have internal output value";
  CHECK(wgtVal_) << "Should have weight value";
  algorithm algo = algorithm::convolution_direct;
  padding_kind padKind = padding_kind::zero;
  auto bwdWgtDesc = biasVal_ != nullptr
                        ? conv_bwdWgt::desc(algo,
                                            inVals_[0]->getMemoryDesc(),
                                            wgtVal_->getMemoryDesc(),
                                            biasVal_->getMemoryDesc(),
                                            outVal_->getMemoryDesc(),
@@ -252,7 +249,7 @@ void MKLDNNConvLayer::resetBwdWgtPD(
                                            padR,
                                            padKind)
                        : conv_bwdWgt::desc(algo,
                                            inVals_[0]->getMemoryDesc(),
                                            wgtVal_->getMemoryDesc(),
                                            outVal_->getMemoryDesc(),
                                            strides,
@@ -260,7 +257,7 @@ void MKLDNNConvLayer::resetBwdWgtPD(
                                            padR,
                                            padKind);
  pd.reset(new conv_bwdWgt::primitive_desc(bwdWgtDesc, engine_, *fwdPD_));
  CHECK_PRIMITIVE_DESC_EQ(inVals_[0], pd->src_primitive_desc());
  CHECK_PRIMITIVE_DESC_EQ(
      outVal_,
      pd->diff_dst_primitive_desc(),
@@ -280,12 +277,12 @@ void MKLDNNConvLayer::resetBwdDataPD(
  memory::dims wgtDims, biasDims, strides, dilations, padL, padR;
  loadConvSettings(wgtDims, biasDims, strides, dilations, padL, padR);

  CHECK(inVals_[0]) << "Should have internal input value";
  CHECK(outVal_) << "Should have internal output value";
  // create backward data using input and output value memory desc
  // but using weight memory desc with any format
  auto bwdDataDesc = conv_bwdData::desc(algorithm::convolution_direct,
                                        inVals_[0]->getMemoryDesc(),
                                        MKLDNNMatrix::createMemoryDesc(wgtDims),
                                        outVal_->getMemoryDesc(),
                                        strides,
@@ -294,7 +291,7 @@ void MKLDNNConvLayer::resetBwdDataPD(
                                        padding_kind::zero);
  pd.reset(new conv_bwdData::primitive_desc(bwdDataDesc, engine_, *fwdPD_));
  CHECK_PRIMITIVE_DESC_EQ(
      inVals_[0],
      pd->diff_src_primitive_desc(),
      "primitive desc of in value and grad should be equal");
  CHECK_PRIMITIVE_DESC_EQ(
@@ -346,12 +343,12 @@ void MKLDNNConvLayer::resetBwdPipeline(
    MKLDNNMatrixPtr& wgt,
    MKLDNNMatrixPtr& bias,
    MKLDNNMatrixPtr& out) {
  CHECK(inVals_[0]);
  // add bwdWgt handle
  if (bias) {
    bwdWgt_.reset(new conv_bwdWgt(*wgtPD, *inVals_[0], *out, *wgt, *bias));
  } else {
    bwdWgt_.reset(new conv_bwdWgt(*wgtPD, *inVals_[0], *out, *wgt));
  }
  pipeline.push_back(*bwdWgt_);
......
...@@ -69,18 +69,14 @@ public: ...@@ -69,18 +69,14 @@ public:
const ParameterMap& parameterMap) override; const ParameterMap& parameterMap) override;
void reshape( void reshape(
int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) override; int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) override;
void resetFwd(std::vector<mkldnn::primitive>& pipeline, void resetFwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override; MKLDNNMatrixPtr& out) override;
void resetBwd(std::vector<mkldnn::primitive>& pipeline, void resetBwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override; MKLDNNMatrixPtr& out) override;
void updateWeights(const UpdateCallback& callback) override; void updateWeights(const UpdateCallback& callback) override;
...@@ -107,48 +103,26 @@ protected: ...@@ -107,48 +103,26 @@ protected:
mkldnn::memory::dims& padL, mkldnn::memory::dims& padL,
mkldnn::memory::dims& padR); mkldnn::memory::dims& padR);
/**
* reset the forward primitive descriptor.
*/
void resetFwdPD(std::shared_ptr<conv_fwd::primitive_desc>& pd); void resetFwdPD(std::shared_ptr<conv_fwd::primitive_desc>& pd);
/**
* reset the MKLDNNMatrix buffers used in forward.
*/
void resetFwdBuffers(std::shared_ptr<conv_fwd::primitive_desc>& pd, void resetFwdBuffers(std::shared_ptr<conv_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
/**
* reset the forward pipeline.
*/
void resetFwdPipeline(std::vector<mkldnn::primitive>& pipeline, void resetFwdPipeline(std::vector<mkldnn::primitive>& pipeline,
std::shared_ptr<conv_fwd::primitive_desc>& pd, std::shared_ptr<conv_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
/**
* reset the backward weight primitive descriptor.
*/
void resetBwdWgtPD(std::shared_ptr<conv_bwdWgt::primitive_desc>& pd); void resetBwdWgtPD(std::shared_ptr<conv_bwdWgt::primitive_desc>& pd);
/**
* reset the backward data primitive descriptor.
*/
void resetBwdDataPD(std::shared_ptr<conv_bwdData::primitive_desc>& pd); void resetBwdDataPD(std::shared_ptr<conv_bwdData::primitive_desc>& pd);
/**
* reset the MKLDNNMatrix buffers used in backward.
*/
void resetBwdBuffers(std::shared_ptr<conv_bwdWgt::primitive_desc>& wgtPD, void resetBwdBuffers(std::shared_ptr<conv_bwdWgt::primitive_desc>& wgtPD,
std::shared_ptr<conv_bwdData::primitive_desc>& dataPD, std::shared_ptr<conv_bwdData::primitive_desc>& dataPD,
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
/**
* reset the backward pipeline.
*/
void resetBwdPipeline(std::vector<mkldnn::primitive>& pipeline, void resetBwdPipeline(std::vector<mkldnn::primitive>& pipeline,
std::shared_ptr<conv_bwdWgt::primitive_desc>& wgtPD, std::shared_ptr<conv_bwdWgt::primitive_desc>& wgtPD,
std::shared_ptr<conv_bwdData::primitive_desc>& dataPD, std::shared_ptr<conv_bwdData::primitive_desc>& dataPD,
......
...@@ -74,7 +74,7 @@ void MKLDNNFcLayer::convertWeightsToPaddle() { ...@@ -74,7 +74,7 @@ void MKLDNNFcLayer::convertWeightsToPaddle() {
} }
void MKLDNNFcLayer::reshape( void MKLDNNFcLayer::reshape(
int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) { int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) {
reshapeInput(bs, ih, iw); reshapeInput(bs, ih, iw);
CHECK_EQ(iLayerSize_, inputLayers_[0]->getSize()); CHECK_EQ(iLayerSize_, inputLayers_[0]->getSize());
...@@ -87,32 +87,29 @@ void MKLDNNFcLayer::reshape( ...@@ -87,32 +87,29 @@ void MKLDNNFcLayer::reshape(
} }
void MKLDNNFcLayer::resetFwd(std::vector<primitive>& pipeline, void MKLDNNFcLayer::resetFwd(std::vector<primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
resetFwdBuffers(in, wgt, bias, out); resetFwdBuffers(inputs[0], wgtVal_, biasVal_, out);
resetFwdPD(fwdPD_, in, wgt, bias, out); resetFwdPD(fwdPD_, inputs[0], wgtVal_, biasVal_, out);
resetFwdPipeline(pipeline, fwdPD_, in, wgt, bias, out); resetFwdPipeline(pipeline, fwdPD_, inputs[0], wgtVal_, biasVal_, out);
} }
void MKLDNNFcLayer::resetBwd(std::vector<primitive>& pipeline, void MKLDNNFcLayer::resetBwd(std::vector<primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
std::shared_ptr<fc_bwdWgt::primitive_desc> bwdWgtPD; std::shared_ptr<fc_bwdWgt::primitive_desc> bwdWgtPD;
std::shared_ptr<fc_bwdData::primitive_desc> bwdDataPD; std::shared_ptr<fc_bwdData::primitive_desc> bwdDataPD;
resetBwdBuffers(in, wgt, bias, out); resetBwdBuffers(inputs[0], wgtGrad_, biasGrad_, out);
resetBwdWgtPD(bwdWgtPD, wgt, bias, out); resetBwdWgtPD(bwdWgtPD, wgtGrad_, biasGrad_, out);
resetBwdDataPD(bwdDataPD, in, out); resetBwdDataPD(bwdDataPD, inputs[0], out);
resetBwdPipeline(pipeline, bwdWgtPD, bwdDataPD, in, wgt, bias, out); resetBwdPipeline(
pipeline, bwdWgtPD, bwdDataPD, inputs[0], wgtGrad_, biasGrad_, out);
} }
void MKLDNNFcLayer::updateWeights(const UpdateCallback& callback) { void MKLDNNFcLayer::updateWeights(const UpdateCallback& callback) {
...@@ -193,9 +190,9 @@ void MKLDNNFcLayer::resetBwdBuffers(MKLDNNMatrixPtr& in, ...@@ -193,9 +190,9 @@ void MKLDNNFcLayer::resetBwdBuffers(MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
CHECK(inVal_ && outVal_); CHECK(inVals_[0] && outVal_);
resetOutGrad(out, outVal_->getPrimitiveDesc()); resetOutGrad(out, outVal_->getPrimitiveDesc());
resetInGrad(in, inVal_->getPrimitiveDesc()); resetInGrad(in, inVals_[0]->getPrimitiveDesc());
CHECK(wgtVal_); CHECK(wgtVal_);
resetWithMatrix(wgt, weight_->getWGrad(), wgtVal_->getPrimitiveDesc()); resetWithMatrix(wgt, weight_->getWGrad(), wgtVal_->getPrimitiveDesc());
...@@ -212,14 +209,15 @@ void MKLDNNFcLayer::resetBwdWgtPD( ...@@ -212,14 +209,15 @@ void MKLDNNFcLayer::resetBwdWgtPD(
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
CHECK(inVal_); CHECK(inVals_[0]);
fc_bwdWgt::desc bwdWgtDesc = bias ? fc_bwdWgt::desc(inVal_->getMemoryDesc(), fc_bwdWgt::desc bwdWgtDesc =
wgt->getMemoryDesc(), bias ? fc_bwdWgt::desc(inVals_[0]->getMemoryDesc(),
bias->getMemoryDesc(), wgt->getMemoryDesc(),
out->getMemoryDesc()) bias->getMemoryDesc(),
: fc_bwdWgt::desc(inVal_->getMemoryDesc(), out->getMemoryDesc())
wgt->getMemoryDesc(), : fc_bwdWgt::desc(inVals_[0]->getMemoryDesc(),
out->getMemoryDesc()); wgt->getMemoryDesc(),
out->getMemoryDesc());
pd.reset(new fc_bwdWgt::primitive_desc(bwdWgtDesc, engine_, *fwdPD_)); pd.reset(new fc_bwdWgt::primitive_desc(bwdWgtDesc, engine_, *fwdPD_));
} }
...@@ -245,11 +243,11 @@ void MKLDNNFcLayer::resetBwdPipeline( ...@@ -245,11 +243,11 @@ void MKLDNNFcLayer::resetBwdPipeline(
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
CHECK(inVal_); CHECK(inVals_[0]);
if (bias) { if (bias) {
bwdWgt_.reset(new fc_bwdWgt(*bwdWgtPD, *inVal_, *out, *wgt, *bias)); bwdWgt_.reset(new fc_bwdWgt(*bwdWgtPD, *inVals_[0], *out, *wgt, *bias));
} else { } else {
bwdWgt_.reset(new fc_bwdWgt(*bwdWgtPD, *inVal_, *out, *wgt)); bwdWgt_.reset(new fc_bwdWgt(*bwdWgtPD, *inVals_[0], *out, *wgt));
} }
pipeline.push_back(*bwdWgt_); pipeline.push_back(*bwdWgt_);
......
...@@ -52,18 +52,14 @@ public: ...@@ -52,18 +52,14 @@ public:
const ParameterMap& parameterMap) override; const ParameterMap& parameterMap) override;
void reshape( void reshape(
int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) override; int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) override;
void resetFwd(std::vector<mkldnn::primitive>& pipeline, void resetFwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override; MKLDNNMatrixPtr& out) override;
void resetBwd(std::vector<mkldnn::primitive>& pipeline, void resetBwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override; MKLDNNMatrixPtr& out) override;
void updateWeights(const UpdateCallback& callback) override; void updateWeights(const UpdateCallback& callback) override;
...@@ -73,11 +69,6 @@ public: ...@@ -73,11 +69,6 @@ public:
void convertWeightsToPaddle() override; void convertWeightsToPaddle() override;
protected: protected:
/**
* Forward functions: reset buffers(input, output, weight and bias),
* reset primitive descriptor,
* reset pipeline.
*/
void resetFwdBuffers(MKLDNNMatrixPtr& in, void resetFwdBuffers(MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
...@@ -93,13 +84,6 @@ protected: ...@@ -93,13 +84,6 @@ protected:
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
/**
* Backward functions: reset buffers(input, output, weight and bias),
* reset primitive descriptor for backward weight,
* reset primitive descriptor for backward data,
* reset pipeline.
*/
void resetBwdBuffers(MKLDNNMatrixPtr& in, void resetBwdBuffers(MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
......
...@@ -48,31 +48,20 @@ void MKLDNNLayer::forward(PassType passType) { ...@@ -48,31 +48,20 @@ void MKLDNNLayer::forward(PassType passType) {
REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str()); REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str());
CHECK(!inputLayers_.empty()); CHECK(!inputLayers_.empty());
copySeqInfoToOutputs(); copySeqInfoToOutputs();
size_t elemenCnt = inputLayers_[0]->getOutputValue()->getElementCnt(); if (condition_ != keepCondition()) {
if (inputElemenCnt_ != elemenCnt) {
VLOG(MKLDNN_BASE) << getName() << " reset mkldnn forward"; VLOG(MKLDNN_BASE) << getName() << " reset mkldnn forward";
// reset when input total sizes changed, not only the batchsize condition_ = keepCondition();
inputElemenCnt_ = elemenCnt;
pipelineFwd_.clear();
reshape(bs_, ic_, ih_, iw_, oc_, oh_, ow_); reshape(bs_, ic_, ih_, iw_, oc_, oh_, ow_);
// all cpu device output grad or value share output's printSizeInfo();
// the output_.value and output_.grad are shared with CPU device
shareCPUDevice(); shareCPUDevice();
resetFwd(pipelineFwd_, inVal_, wgtVal_, biasVal_, outVal_); pipelineFwd_.clear();
// MKLDNNLayer output value should be MKLDNNMatrix inVals_.resize(inputLayers_.size(), nullptr);
// so external output value is necessary. extInVals_.resize(inputLayers_.size(), nullptr);
// Then external input value is not necessary, cvtInVals_.resize(inputLayers_.size(), nullptr);
// since input may be mkldnn internal buffer. resetFwd(pipelineFwd_, inVals_, outVal_);
CHECK(extOutVal_) << "external output value is necessary"; prepareValueConversions(pipelineFwd_);
output_.value = std::dynamic_pointer_cast<Matrix>(extOutVal_);
CHECK(inVal_ && outVal_) << "internal memories are necessary";
if (cvtInVal_) {
pipelineFwd_.insert(pipelineFwd_.begin(), *cvtInVal_);
}
if (cvtOutVal_) {
pipelineFwd_.push_back(*cvtOutVal_);
}
convertWeightsFromPaddle(); convertWeightsFromPaddle();
printSizeInfo();
printValueFormat(); printValueFormat();
needResetBwd_ = true; needResetBwd_ = true;
} }
...@@ -80,8 +69,8 @@ void MKLDNNLayer::forward(PassType passType) { ...@@ -80,8 +69,8 @@ void MKLDNNLayer::forward(PassType passType) {
if (inputLayers_[0]->getType() == "data" && inputLayers_.size() == 1) { if (inputLayers_[0]->getType() == "data" && inputLayers_.size() == 1) {
// Update input value data when input layer is "data" type, // Update input value data when input layer is "data" type,
// since the input value data address might be changed. // since the input value data address might be changed.
CHECK(extInVal_); CHECK(extInVals_[0]);
extInVal_->setData(getInputValue(0, CPU_DEVICE)->getData()); extInVals_[0]->setData(getInputValue(0, CPU_DEVICE)->getData());
} }
if (!outputOnlyMKLDNN_) { if (!outputOnlyMKLDNN_) {
...@@ -99,22 +88,13 @@ void MKLDNNLayer::backward(const UpdateCallback& callback) { ...@@ -99,22 +88,13 @@ void MKLDNNLayer::backward(const UpdateCallback& callback) {
if (needResetBwd_) { if (needResetBwd_) {
VLOG(MKLDNN_BASE) << getName() << " reset mkldnn backward"; VLOG(MKLDNN_BASE) << getName() << " reset mkldnn backward";
pipelineBwd_.clear(); pipelineBwd_.clear();
inGrads_.resize(inputLayers_.size(), nullptr);
extInGrads_.resize(inputLayers_.size(), nullptr);
cvtInGrads_.resize(inputLayers_.size(), nullptr);
pipelineMergeGrad_.clear(); pipelineMergeGrad_.clear();
mergeGrad_ = nullptr; mergeGrad_ = nullptr;
resetBwd(pipelineBwd_, inGrad_, wgtGrad_, biasGrad_, outGrad_); resetBwd(pipelineBwd_, inGrads_, outGrad_);
// external output grad is not necessary prepareGradConversions(pipelineBwd_);
// since output may be mkldnn internal buffer or merge them directly.
CHECK(outGrad_) << "internal output grad is necessary";
if (extOutGrad_) {
CHECK_EQ(extOutGrad_->getData(), output_.grad->getData())
<< "the external buffer should share the same data with output_.grad";
}
if (cvtOutGrad_) {
pipelineBwd_.insert(pipelineBwd_.begin(), *cvtOutGrad_);
}
if (cvtInGrad_) {
pipelineBwd_.push_back(*cvtInGrad_);
}
printGradFormat(); printGradFormat();
needResetBwd_ = false; needResetBwd_ = false;
} }
...@@ -141,8 +121,8 @@ void MKLDNNLayer::backward(const UpdateCallback& callback) { ...@@ -141,8 +121,8 @@ void MKLDNNLayer::backward(const UpdateCallback& callback) {
void MKLDNNLayer::reshapeInput(int& batchsize, void MKLDNNLayer::reshapeInput(int& batchsize,
int& height, int& height,
int& width, int& width,
size_t inputIdx) { size_t idx) {
const Argument& input = inputLayers_[inputIdx]->getOutput(); const Argument& input = inputLayers_[idx]->getOutput();
batchsize = input.getBatchSize(); batchsize = input.getBatchSize();
int h = input.getFrameHeight(); int h = input.getFrameHeight();
int w = input.getFrameWidth(); int w = input.getFrameWidth();
...@@ -176,27 +156,30 @@ void MKLDNNLayer::resetWithMatrix(MKLDNNMatrixPtr& dnn, ...@@ -176,27 +156,30 @@ void MKLDNNLayer::resetWithMatrix(MKLDNNMatrixPtr& dnn,
void MKLDNNLayer::resetInValue( void MKLDNNLayer::resetInValue(
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
const std::shared_ptr<memory::primitive_desc>& intPD, const std::shared_ptr<memory::primitive_desc>& intPD,
size_t inputIdx) { size_t idx,
cvtInVal_ = nullptr; int inputChannel) {
extInVal_ = nullptr; cvtInVals_[idx] = nullptr;
extInVals_[idx] = nullptr;
in = nullptr; in = nullptr;
CHECK_GT(bs_ * ic_ * ih_ * iw_, 0); inputChannel = inputChannel == 0 ? ic_ : inputChannel;
CHECK_GT(bs_ * inputChannel * ih_ * iw_, 0);
auto extPD = MKLDNNMatrix::createPrimitiveDesc( auto extPD = MKLDNNMatrix::createPrimitiveDesc(
{bs_, ic_, ih_, iw_}, format::nchw, engine_); {bs_, inputChannel, ih_, iw_}, format::nchw, engine_);
const MatrixPtr& inMat = inputLayers_[inputIdx]->getOutputValue(); const MatrixPtr& inMat = inputLayers_[idx]->getOutputValue();
extInVal_ = std::dynamic_pointer_cast<MKLDNNMatrix>(inMat); extInVals_[idx] = std::dynamic_pointer_cast<MKLDNNMatrix>(inMat);
CHECK_EQ(inputIsOnlyMKLDNN(), extInVal_ != nullptr); CHECK_EQ(inputIsOnlyMKLDNN(), extInVals_[idx] != nullptr);
if (extInVal_ == nullptr || extInVal_->getFormat() == format::nc) { if (extInVals_[idx] == nullptr ||
extInVal_ = MKLDNNMatrix::create(extPD, inMat); extInVals_[idx]->getFormat() == format::nc) {
extInVals_[idx] = MKLDNNMatrix::create(extPD, inMat);
} }
in = extInVal_; in = extInVals_[idx];
if (nullptr == intPD || in->getPrimitiveDesc() == *intPD) { if (nullptr == intPD || in->getPrimitiveDesc() == *intPD) {
return; return;
} }
// need create reorder // need create reorder
in = MKLDNNMatrix::create(*intPD); in = MKLDNNMatrix::create(*intPD);
cvtInVal_ = MKLDNNMatrix::createReorder(extInVal_, in); cvtInVals_[idx] = MKLDNNMatrix::createReorder(extInVals_[idx], in);
CHECK(cvtInVal_) << "should not be emptry"; CHECK(cvtInVals_[idx]) << "should not be emptry";
} }
void MKLDNNLayer::resetOutValue(MKLDNNMatrixPtr& out, void MKLDNNLayer::resetOutValue(MKLDNNMatrixPtr& out,
...@@ -218,11 +201,11 @@ void MKLDNNLayer::resetOutValue(MKLDNNMatrixPtr& out, ...@@ -218,11 +201,11 @@ void MKLDNNLayer::resetOutValue(MKLDNNMatrixPtr& out,
void MKLDNNLayer::resetInGrad(MKLDNNMatrixPtr& in, void MKLDNNLayer::resetInGrad(MKLDNNMatrixPtr& in,
memory::primitive_desc intPD, memory::primitive_desc intPD,
size_t inputIdx) { size_t idx) {
cvtInGrad_ = nullptr; cvtInGrads_[idx] = nullptr;
extInGrad_ = nullptr; extInGrads_[idx] = nullptr;
in = nullptr; in = nullptr;
LayerPtr& input = inputLayers_[inputIdx]; LayerPtr& input = inputLayers_[idx];
if (input->getOutputGrad() == nullptr) { if (input->getOutputGrad() == nullptr) {
// no need input grad // no need input grad
return; return;
...@@ -237,23 +220,25 @@ void MKLDNNLayer::resetInGrad(MKLDNNMatrixPtr& in, ...@@ -237,23 +220,25 @@ void MKLDNNLayer::resetInGrad(MKLDNNMatrixPtr& in,
in = MKLDNNMatrix::create(intPD, inMat); in = MKLDNNMatrix::create(intPD, inMat);
Argument& arg = input->getOutput(this->getName()); Argument& arg = input->getOutput(this->getName());
arg.grad = std::dynamic_pointer_cast<Matrix>(in); arg.grad = std::dynamic_pointer_cast<Matrix>(in);
CHECK_PRIMITIVE_DESC_EQ(inVal_, intPD); CHECK_PRIMITIVE_DESC_EQ(inVals_[idx], intPD);
if (inputIsOnlyMKLDNN()) { if (inputIsOnlyMKLDNN()) {
return; return;
} }
extInGrad_ = in; extInGrads_[idx] = in;
if (isPaddleFormat(extInGrad_->getFormat())) { if (isPaddleFormat(extInGrads_[idx]->getFormat())) {
return; return;
} }
// need create reorder // need create reorder
CHECK(extInVal_ != nullptr && isPaddleFormat(extInVal_->getFormat())) CHECK(extInVals_[idx] != nullptr &&
isPaddleFormat(extInVals_[idx]->getFormat()))
<< "should have external input value and the format must be nchw(nc)"; << "should have external input value and the format must be nchw(nc)";
extInGrad_ = MKLDNNMatrix::create(extInVal_->getPrimitiveDesc(), inMat); extInGrads_[idx] =
CHECK_PRIMITIVE_DESC_EQ(inVal_, intPD); MKLDNNMatrix::create(extInVals_[idx]->getPrimitiveDesc(), inMat);
CHECK_PRIMITIVE_DESC_EQ(inVals_[idx], intPD);
in = MKLDNNMatrix::create(intPD); in = MKLDNNMatrix::create(intPD);
cvtInGrad_ = MKLDNNMatrix::createReorder(in, extInGrad_); cvtInGrads_[idx] = MKLDNNMatrix::createReorder(in, extInGrads_[idx]);
CHECK(cvtInGrad_); CHECK(cvtInGrads_[idx]);
} }
void MKLDNNLayer::resetOutGrad(MKLDNNMatrixPtr& out, void MKLDNNLayer::resetOutGrad(MKLDNNMatrixPtr& out,
...@@ -309,22 +294,8 @@ void MKLDNNLayer::resetMergeGrad(MKLDNNMatrixPtr& out) { ...@@ -309,22 +294,8 @@ void MKLDNNLayer::resetMergeGrad(MKLDNNMatrixPtr& out) {
srcs.push_back(*src); srcs.push_back(*src);
} }
// TODO(TJ): remove me when mkldnn sum support different formats auto sumPD = sum::primitive_desc(out->getMemoryDesc(), scales, srcPDs);
for (size_t i = 1; i < srcPDs.size(); ++i) { mergeGrad_.reset(new sum(sumPD, srcs, *out));
CHECK(srcPDs[0] == srcPDs[i]);
}
tmpOutGrad_ = out;
tmpCvt_ = nullptr;
if (out->getPrimitiveDesc() != srcPDs[0]) {
tmpOutGrad_ = MKLDNNMatrix::create(srcPDs[0]);
tmpCvt_ = MKLDNNMatrix::createReorder(tmpOutGrad_, out);
CHECK(tmpCvt_);
pipelineMergeGrad_.push_back(*tmpCvt_);
}
auto sumPD =
sum::primitive_desc(tmpOutGrad_->getMemoryDesc(), scales, srcPDs);
mergeGrad_.reset(new sum(sumPD, srcs, *tmpOutGrad_));
pipelineMergeGrad_.insert(pipelineMergeGrad_.begin(), *mergeGrad_); pipelineMergeGrad_.insert(pipelineMergeGrad_.begin(), *mergeGrad_);
} }
......
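
The hunks above generalize the layer's per-input bookkeeping: `inVal_`/`extInVal_`/`cvtInVal_` become the vectors `inVals_`/`extInVals_`/`cvtInVals_`, `resetInValue()` and `resetInGrad()` take the input index `idx`, and `resetInValue()` additionally accepts an `inputChannel` override because the branches of a concat layer may carry different channel counts. Below is a minimal sketch of how a multi-input layer could drive the extended `resetInValue()`; the layer name and the per-branch `channels_` member are hypothetical and not part of this change:

```cpp
// Hypothetical multi-input layer sketch: reset one internal input value per
// branch, passing the branch's own channel count (channels_ is an assumed
// member caching each input's channels; passing 0 would fall back to ic_).
void MKLDNNConcatLikeLayer::resetFwd(std::vector<mkldnn::primitive>& pipeline,
                                     std::vector<MKLDNNMatrixPtr>& inputs,
                                     MKLDNNMatrixPtr& out) {
  for (size_t i = 0; i < inputs.size(); ++i) {
    // nullptr internal primitive desc: keep the external nchw layout here
    resetInValue(inputs[i], nullptr, i, channels_[i]);
  }
  // ... then reset the output value, the concat primitives, and fill pipeline ...
}
```
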
...@@ -34,15 +34,16 @@ typedef std::shared_ptr<MKLDNNLayer> MKLDNNLayerPtr; ...@@ -34,15 +34,16 @@ typedef std::shared_ptr<MKLDNNLayer> MKLDNNLayerPtr;
*/ */
class MKLDNNLayer : public Layer { class MKLDNNLayer : public Layer {
protected: protected:
// input value element count
size_t inputElemenCnt_;
// batch size // batch size
int bs_; int bs_;
// their sizes are always from the first input layer
// input image channel, height and width // input image channel, height and width
int ic_, ih_, iw_; int ic_, ih_, iw_;
// output image channel, height and width // output image channel, height and width
int oc_, oh_, ow_; int oc_, oh_, ow_;
// the condition that forward need be reset
size_t condition_;
// backward also need reset after reset forward handle // backward also need reset after reset forward handle
bool needResetBwd_; bool needResetBwd_;
...@@ -67,18 +68,18 @@ protected: ...@@ -67,18 +68,18 @@ protected:
* When all layers are mkldnn layers, they could save internal data. * When all layers are mkldnn layers, they could save internal data.
*/ */
// below MKLDNNMatrix buffers are all internal buffers // below MKLDNNMatrix buffers are all internal buffers
MKLDNNMatrixPtr inVal_; std::vector<MKLDNNMatrixPtr> inVals_;
MKLDNNMatrixPtr inGrad_; std::vector<MKLDNNMatrixPtr> inGrads_;
MKLDNNMatrixPtr outVal_; MKLDNNMatrixPtr outVal_;
MKLDNNMatrixPtr outGrad_; MKLDNNMatrixPtr outGrad_;
// below are external value and grad // below are external value and grad
MKLDNNMatrixPtr extInVal_; std::vector<MKLDNNMatrixPtr> extInVals_;
MKLDNNMatrixPtr extInGrad_; std::vector<MKLDNNMatrixPtr> extInGrads_;
MKLDNNMatrixPtr extOutVal_; MKLDNNMatrixPtr extOutVal_;
MKLDNNMatrixPtr extOutGrad_; MKLDNNMatrixPtr extOutGrad_;
// convert handle between external and internal buffers // convert handle between external and internal buffers
std::shared_ptr<mkldnn::reorder> cvtInVal_; std::vector<std::shared_ptr<mkldnn::reorder>> cvtInVals_;
std::shared_ptr<mkldnn::reorder> cvtInGrad_; std::vector<std::shared_ptr<mkldnn::reorder>> cvtInGrads_;
std::shared_ptr<mkldnn::reorder> cvtOutVal_; std::shared_ptr<mkldnn::reorder> cvtOutVal_;
std::shared_ptr<mkldnn::reorder> cvtOutGrad_; std::shared_ptr<mkldnn::reorder> cvtOutGrad_;
...@@ -93,23 +94,11 @@ protected: ...@@ -93,23 +94,11 @@ protected:
std::vector<mkldnn::primitive> pipelineMergeGrad_; std::vector<mkldnn::primitive> pipelineMergeGrad_;
// tmp input argument to save input grad, only used to merge grad // tmp input argument to save input grad, only used to merge grad
Argument tmpInArg_; Argument tmpInArg_;
// since mkldnn sum do not support different formats:
// can refer to https://github.com/01org/mkl-dnn/issues/134
// so need create reorder manually and save tmp MKLDNNMatrix
MKLDNNMatrixPtr tmpOutGrad_;
std::shared_ptr<mkldnn::primitive> tmpCvt_;
public: public:
explicit MKLDNNLayer(const LayerConfig& config) explicit MKLDNNLayer(const LayerConfig& config)
: Layer(config), : Layer(config),
inputElemenCnt_(0), condition_(0),
bs_(0),
ic_(0),
ih_(0),
iw_(0),
oc_(0),
oh_(0),
ow_(0),
needResetBwd_(true), needResetBwd_(true),
outputOnlyMKLDNN_(false), outputOnlyMKLDNN_(false),
engine_(mkldnn::engine::cpu, 0), engine_(mkldnn::engine::cpu, 0),
...@@ -125,31 +114,28 @@ public: ...@@ -125,31 +114,28 @@ public:
virtual void backward(const UpdateCallback& callback); virtual void backward(const UpdateCallback& callback);
/** /**
* reshape the input image sizes * reshape the input and output channels and image sizes
* and reset output image and buffer size * and reset output buffer size
* output channel can not be changed
*/ */
virtual void reshape( virtual void reshape(
int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) = 0; int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) = 0;
/** /**
* reset the mkldnn forward primitve and memories * reset the mkldnn forward primitve and memories
* only would be called when input size changes * only would be called when input size changes
* weight and bias buffers should be coverd by child class itself
*/ */
virtual void resetFwd(std::vector<mkldnn::primitive>& pipeline, virtual void resetFwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) = 0; MKLDNNMatrixPtr& out) = 0;
/** /**
* reset the mkldnn backward primitve and memories * reset the mkldnn backward primitve and memories
* only would be called when needed * only would be called when needed
* weight and bias buffers should be coverd by child class itself
*/ */
virtual void resetBwd(std::vector<mkldnn::primitive>& pipeline, virtual void resetBwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) = 0; MKLDNNMatrixPtr& out) = 0;
/** /**
...@@ -175,13 +161,19 @@ public: ...@@ -175,13 +161,19 @@ public:
void addOutputArgument(int deviceId) { Layer::addOutputArgument(deviceId); } void addOutputArgument(int deviceId) { Layer::addOutputArgument(deviceId); }
protected: protected:
/**
* Some layers may have different condition to reset the forward.
* The function returns the condition that do not need reset forward.
*/
inline virtual size_t keepCondition() {
// reset when the first input element size changed, not only the batchsize
return inputLayers_[0]->getOutputValue()->getElementCnt();
}
/** /**
* reshape the input image sizes and input batchsize * reshape the input image sizes and input batchsize
*/ */
void reshapeInput(int& batchsize, void reshapeInput(int& batchsize, int& height, int& width, size_t idx = 0);
int& height,
int& width,
size_t inputIdx = 0);
/** /**
* reshape output image sizes * reshape output image sizes
...@@ -199,11 +191,13 @@ protected: ...@@ -199,11 +191,13 @@ protected:
/** /**
* reset input value from input MKLDNNMatrix and internal primitive desc. * reset input value from input MKLDNNMatrix and internal primitive desc.
* reset both internal and external buffer and create reorder if necessary. * reset both internal and external buffer and create reorder if necessary.
* input channel may be different in concat.
*/ */
void resetInValue( void resetInValue(
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
const std::shared_ptr<mkldnn::memory::primitive_desc>& intPD = nullptr, const std::shared_ptr<mkldnn::memory::primitive_desc>& intPD = nullptr,
size_t inputIdx = 0); size_t idx = 0,
int inputChannel = 0);
/** /**
* reset output value from internal primitive desc. * reset output value from internal primitive desc.
...@@ -218,7 +212,7 @@ protected: ...@@ -218,7 +212,7 @@ protected:
*/ */
void resetInGrad(MKLDNNMatrixPtr& in, void resetInGrad(MKLDNNMatrixPtr& in,
mkldnn::memory::primitive_desc intPD, mkldnn::memory::primitive_desc intPD,
size_t inputIdx = 0); size_t idx = 0);
/** /**
* reset output grad from internal primitive desc. * reset output grad from internal primitive desc.
...@@ -296,17 +290,19 @@ protected: ...@@ -296,17 +290,19 @@ protected:
* print the mkldnn memory format of value * print the mkldnn memory format of value
*/ */
virtual void printValueFormat() { virtual void printValueFormat() {
if (extInVal_) { for (size_t i = 0; i < inVals_.size(); ++i) {
VLOG(MKLDNN_FMTS) << extInVal_->getFormat() << " >>> "; if (!inVals_[i]) {
} continue;
if (inVal_) { }
VLOG(MKLDNN_FMTS) << inVal_->getFormat() << " >>>"; VLOG(MKLDNN_FMTS) << "Input " << i << ", " << inputLayers_[i]->getName()
<< ": " << (extInVals_[i] ? extInVals_[i]->getFormat()
: inVals_[i]->getFormat())
<< " >>> " << inVals_[i]->getFormat() << " >>>";
} }
if (outVal_) { if (outVal_) {
VLOG(MKLDNN_FMTS) << outVal_->getFormat() << " >>> "; VLOG(MKLDNN_FMTS) << outVal_->getFormat() << " >>> "
} << (extOutVal_ ? extOutVal_->getFormat()
if (extOutVal_) { : outVal_->getFormat());
VLOG(MKLDNN_FMTS) << extOutVal_->getFormat();
} }
if (wgtVal_) { if (wgtVal_) {
VLOG(MKLDNN_FMTS) << "Weight value format: " << wgtVal_->getFormat(); VLOG(MKLDNN_FMTS) << "Weight value format: " << wgtVal_->getFormat();
...@@ -320,17 +316,19 @@ protected: ...@@ -320,17 +316,19 @@ protected:
* print the mkldnn memory format of grad * print the mkldnn memory format of grad
*/ */
virtual void printGradFormat() { virtual void printGradFormat() {
if (extOutGrad_) {
VLOG(MKLDNN_FMTS) << extOutGrad_->getFormat();
}
if (outGrad_) { if (outGrad_) {
VLOG(MKLDNN_FMTS) << outGrad_->getFormat() << " <<< "; VLOG(MKLDNN_FMTS) << outGrad_->getFormat() << " <<< "
<< (extOutGrad_ ? extOutGrad_->getFormat()
: outGrad_->getFormat());
} }
if (inGrad_) { for (size_t i = 0; i < inGrads_.size(); ++i) {
VLOG(MKLDNN_FMTS) << inGrad_->getFormat() << " <<<"; if (!inGrads_[i]) {
} continue;
if (extInGrad_) { }
VLOG(MKLDNN_FMTS) << extInGrad_->getFormat() << " <<< "; VLOG(MKLDNN_FMTS) << "Input " << i << ", " << inputLayers_[i]->getName()
<< ": " << (extInGrads_[i] ? extInGrads_[i]->getFormat()
: inGrads_[i]->getFormat())
<< " <<< " << inGrads_[i]->getFormat() << " <<<";
} }
if (wgtGrad_) { if (wgtGrad_) {
VLOG(MKLDNN_FMTS) << "Weight grad format: " << wgtGrad_->getFormat(); VLOG(MKLDNN_FMTS) << "Weight grad format: " << wgtGrad_->getFormat();
...@@ -437,6 +435,41 @@ private: ...@@ -437,6 +435,41 @@ private:
outputOtherDevice_[i].cpuSequenceDims = output_.cpuSequenceDims; outputOtherDevice_[i].cpuSequenceDims = output_.cpuSequenceDims;
} }
} }
void prepareValueConversions(std::vector<mkldnn::primitive>& pipeline) {
// MKLDNNLayer output value should be MKLDNNMatrix
// so external output value is necessary.
// Then external input value is not necessary,
// since input may be mkldnn internal buffer.
CHECK(extOutVal_) << "external output value is necessary";
output_.value = std::dynamic_pointer_cast<Matrix>(extOutVal_);
CHECK(inVals_[0] && outVal_) << "internal memories are necessary";
for (size_t i = 0; i < cvtInVals_.size(); ++i) {
if (cvtInVals_[i]) {
pipeline.insert(pipeline.begin(), *cvtInVals_[i]);
}
}
if (cvtOutVal_) {
pipeline.push_back(*cvtOutVal_);
}
}
void prepareGradConversions(std::vector<mkldnn::primitive>& pipeline) {
// external output grad is not necessary
// since output may be mkldnn internal buffer or merge them directly.
CHECK(outGrad_) << "internal output grad is necessary";
if (extOutGrad_) {
CHECK_EQ(extOutGrad_->getData(), output_.grad->getData())
<< "the external buffer should share the same data with output_.grad";
}
if (cvtOutGrad_) {
pipeline.insert(pipeline.begin(), *cvtOutGrad_);
}
for (size_t i = 0; i < cvtInGrads_.size(); ++i) {
if (cvtInGrads_[i]) {
pipeline.push_back(*cvtInGrads_[i]);
}
}
}
}; };
} // namespace paddle } // namespace paddle
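
The `condition_`/`keepCondition()` pair above replaces the old `inputElemenCnt_` check: `forward()` now rebuilds its primitives whenever `keepCondition()` returns a value different from the cached `condition_`, and the default implementation only watches the first input's element count. A hypothetical subclass (not part of this change; constructor and the required pure-virtual overrides are omitted) could widen the condition to cover every input:

```cpp
// Sketch only: reset the forward pass when the element count of any input
// changes, not just the first one.
class MKLDNNMultiInputLayer : public MKLDNNLayer {
protected:
  size_t keepCondition() override {
    size_t total = 0;
    for (auto& input : inputLayers_) {
      total += input->getOutputValue()->getElementCnt();
    }
    return total;
  }
};
```
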
...@@ -58,10 +58,11 @@ bool MKLDNNPoolLayer::init(const LayerMap& layerMap, ...@@ -58,10 +58,11 @@ bool MKLDNNPoolLayer::init(const LayerMap& layerMap,
} }
void MKLDNNPoolLayer::reshape( void MKLDNNPoolLayer::reshape(
int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) { int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) {
reshapeInput(bs, ih, iw); reshapeInput(bs, ih, iw);
// ic_ and oc can not be changed // ic_ and oc can not be changed
CHECK_EQ(inputElemenCnt_ / bs / ih / iw, (size_t)ic) CHECK_EQ((size_t)ic,
inputLayers_[0]->getOutputValue()->getElementCnt() / bs / ih / iw)
<< "Input channel can not be changed"; << "Input channel can not be changed";
// cal output sizes // cal output sizes
...@@ -74,29 +75,25 @@ void MKLDNNPoolLayer::reshape( ...@@ -74,29 +75,25 @@ void MKLDNNPoolLayer::reshape(
} }
void MKLDNNPoolLayer::resetFwd(std::vector<primitive>& pipeline, void MKLDNNPoolLayer::resetFwd(std::vector<primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
resetFwdBuffers(in, out); resetFwdBuffers(inputs[0], out);
resetFwdPD(fwdPD_, in, out); resetFwdPD(fwdPD_, inputs[0], out);
resetFwdPipeline(pipeline, fwdPD_, in, out); resetFwdPipeline(pipeline, fwdPD_, inputs[0], out);
} }
void MKLDNNPoolLayer::resetBwd(std::vector<primitive>& pipeline, void MKLDNNPoolLayer::resetBwd(std::vector<primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
std::shared_ptr<pool_bwd::primitive_desc> pd; std::shared_ptr<pool_bwd::primitive_desc> pd;
resetBwdBuffers(in, out); resetBwdBuffers(inputs[0], out);
resetBwdPD(pd, in, out); resetBwdPD(pd, inputs[0], out);
resetBwdPipeline(pipeline, pd, in, out); resetBwdPipeline(pipeline, pd, inputs[0], out);
} }
void MKLDNNPoolLayer::resetFwdBuffers(MKLDNNMatrixPtr& in, void MKLDNNPoolLayer::resetFwdBuffers(MKLDNNMatrixPtr& in,
...@@ -151,9 +148,9 @@ void MKLDNNPoolLayer::resetFwdPipeline( ...@@ -151,9 +148,9 @@ void MKLDNNPoolLayer::resetFwdPipeline(
void MKLDNNPoolLayer::resetBwdBuffers(MKLDNNMatrixPtr& in, void MKLDNNPoolLayer::resetBwdBuffers(MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
CHECK(inVal_ && outVal_); CHECK(inVals_[0] && outVal_);
resetOutGrad(out, outVal_->getPrimitiveDesc()); resetOutGrad(out, outVal_->getPrimitiveDesc());
resetInGrad(in, inVal_->getPrimitiveDesc()); resetInGrad(in, inVals_[0]->getPrimitiveDesc());
} }
void MKLDNNPoolLayer::resetBwdPD(std::shared_ptr<pool_bwd::primitive_desc>& pd, void MKLDNNPoolLayer::resetBwdPD(std::shared_ptr<pool_bwd::primitive_desc>& pd,
......
...@@ -53,18 +53,14 @@ public: ...@@ -53,18 +53,14 @@ public:
const ParameterMap& parameterMap) override; const ParameterMap& parameterMap) override;
void reshape( void reshape(
int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) override; int& bs, int& ic, int& ih, int& iw, int& oc, int& oh, int& ow) override;
void resetFwd(std::vector<mkldnn::primitive>& pipeline, void resetFwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override; MKLDNNMatrixPtr& out) override;
void resetBwd(std::vector<mkldnn::primitive>& pipeline, void resetBwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in, std::vector<MKLDNNMatrixPtr>& inputs,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override; MKLDNNMatrixPtr& out) override;
void printSizeInfo() override { void printSizeInfo() override {
...@@ -75,11 +71,6 @@ public: ...@@ -75,11 +71,6 @@ public:
} }
protected: protected:
/**
* Forward functions: reset buffers(input, output),
* reset primitive descriptor,
* reset pipeline.
*/
void resetFwdBuffers(MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& out); void resetFwdBuffers(MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& out);
void resetFwdPD(std::shared_ptr<pool_fwd::primitive_desc>& pd, void resetFwdPD(std::shared_ptr<pool_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr in, MKLDNNMatrixPtr in,
...@@ -88,12 +79,6 @@ protected: ...@@ -88,12 +79,6 @@ protected:
std::shared_ptr<pool_fwd::primitive_desc>& pd, std::shared_ptr<pool_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
/**
* Backward functions: reset buffers(input, output),
* reset primitive descriptor,
* reset pipeline.
*/
void resetBwdBuffers(MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& out); void resetBwdBuffers(MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& out);
void resetBwdPD(std::shared_ptr<pool_bwd::primitive_desc>& pd, void resetBwdPD(std::shared_ptr<pool_bwd::primitive_desc>& pd,
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
......
...@@ -315,7 +315,7 @@ TEST(MKLDNNLayer, AddtoLayer) { ...@@ -315,7 +315,7 @@ TEST(MKLDNNLayer, AddtoLayer) {
static void getMKLDNNConcatConfig(TestConfig& cfg, static void getMKLDNNConcatConfig(TestConfig& cfg,
const std::vector<testImageDesc>& inputs) { const std::vector<testImageDesc>& inputs) {
CHECK_GE(inputs.size(), 2) << "at least two inputs"; CHECK_GE(inputs.size(), 2UL) << "at least two inputs";
int oc = inputs[0].ic; int oc = inputs[0].ic;
for (size_t i = 1; i < inputs.size(); ++i) { for (size_t i = 1; i < inputs.size(); ++i) {
CHECK_EQ(inputs[i].bs, inputs[0].bs); CHECK_EQ(inputs[i].bs, inputs[0].bs);
......
...@@ -98,7 +98,6 @@ $y = \max(x, 0)$ ...@@ -98,7 +98,6 @@ $y = \max(x, 0)$
} }
}; };
template <typename AttrType>
class LeakyReluOpMaker : public framework::OpProtoAndCheckerMaker { class LeakyReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LeakyReluOpMaker(framework::OpProto *proto, LeakyReluOpMaker(framework::OpProto *proto,
...@@ -106,8 +105,7 @@ class LeakyReluOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -106,8 +105,7 @@ class LeakyReluOpMaker : public framework::OpProtoAndCheckerMaker {
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of LeakyRelu operator"); AddInput("X", "Input of LeakyRelu operator");
AddOutput("Y", "Output of LeakyRelu operator"); AddOutput("Y", "Output of LeakyRelu operator");
AddAttr<AttrType>("alpha", "The small negative slope") AddAttr<float>("alpha", "The small negative slope").SetDefault(0.02f);
.SetDefault(static_cast<AttrType>(0.02f));
AddComment(R"DOC( AddComment(R"DOC(
LeakyRelu Activation Operator. LeakyRelu Activation Operator.
...@@ -117,7 +115,6 @@ $y = \max(x, \alpha * x)$ ...@@ -117,7 +115,6 @@ $y = \max(x, \alpha * x)$
} }
}; };
template <typename AttrType>
class SoftShrinkOpMaker : public framework::OpProtoAndCheckerMaker { class SoftShrinkOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SoftShrinkOpMaker(framework::OpProto *proto, SoftShrinkOpMaker(framework::OpProto *proto,
...@@ -125,8 +122,7 @@ class SoftShrinkOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -125,8 +122,7 @@ class SoftShrinkOpMaker : public framework::OpProtoAndCheckerMaker {
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of Softshrink operator"); AddInput("X", "Input of Softshrink operator");
AddOutput("Y", "Output of Softshrink operator"); AddOutput("Y", "Output of Softshrink operator");
AddAttr<AttrType>("lambda", "non-negative offset") AddAttr<float>("lambda", "non-negative offset").SetDefault(0.5f);
.SetDefault(static_cast<AttrType>(0.5f));
AddComment(R"DOC( AddComment(R"DOC(
Softshrink Activation Operator. Softshrink Activation Operator.
...@@ -173,7 +169,6 @@ $$y = x - \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$ ...@@ -173,7 +169,6 @@ $$y = x - \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
} }
}; };
template <typename AttrType>
class HardShrinkOpMaker : public framework::OpProtoAndCheckerMaker { class HardShrinkOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
HardShrinkOpMaker(framework::OpProto *proto, HardShrinkOpMaker(framework::OpProto *proto,
...@@ -181,8 +176,8 @@ class HardShrinkOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -181,8 +176,8 @@ class HardShrinkOpMaker : public framework::OpProtoAndCheckerMaker {
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of HardShrink operator"); AddInput("X", "Input of HardShrink operator");
AddOutput("Y", "Output of HardShrink operator"); AddOutput("Y", "Output of HardShrink operator");
AddAttr<AttrType>("threshold", "The value of threshold for HardShrink") AddAttr<float>("threshold", "The value of threshold for HardShrink")
.SetDefault(static_cast<AttrType>(0.5)); .SetDefault(0.5f);
AddComment(R"DOC( AddComment(R"DOC(
HardShrink Activation Operator. HardShrink Activation Operator.
...@@ -308,17 +303,16 @@ $$y = \frac{x}{1 + |x|}$$ ...@@ -308,17 +303,16 @@ $$y = \frac{x}{1 + |x|}$$
} }
}; };
template <typename AttrType>
class BReluOpMaker : public framework::OpProtoAndCheckerMaker { class BReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BReluOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) BReluOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of BRelu operator"); AddInput("X", "Input of BRelu operator");
AddOutput("Y", "Output of BRelu operator"); AddOutput("Y", "Output of BRelu operator");
AddAttr<AttrType>("t_min", "The min marginal value of BRelu") AddAttr<float>("t_min", "The min marginal value of BRelu")
.SetDefault(static_cast<AttrType>(0)); .SetDefault(static_cast<float>(0));
AddAttr<AttrType>("t_max", "The max marginal value of BRelu") AddAttr<float>("t_max", "The max marginal value of BRelu")
.SetDefault(static_cast<AttrType>(24)); .SetDefault(static_cast<float>(24));
AddComment(R"DOC( AddComment(R"DOC(
BRelu Activation Operator. BRelu Activation Operator.
...@@ -328,7 +322,6 @@ $y = \max(\min(x, t_{min}), t_{max})$ ...@@ -328,7 +322,6 @@ $y = \max(\min(x, t_{min}), t_{max})$
} }
}; };
template <typename AttrType>
class SoftReluOpMaker : public framework::OpProtoAndCheckerMaker { class SoftReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SoftReluOpMaker(framework::OpProto *proto, SoftReluOpMaker(framework::OpProto *proto,
...@@ -336,8 +329,8 @@ class SoftReluOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -336,8 +329,8 @@ class SoftReluOpMaker : public framework::OpProtoAndCheckerMaker {
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of SoftRelu operator"); AddInput("X", "Input of SoftRelu operator");
AddOutput("Y", "Output of SoftRelu operator"); AddOutput("Y", "Output of SoftRelu operator");
AddAttr<AttrType>("threshold", "The threshold value of SoftRelu") AddAttr<float>("threshold", "The threshold value of SoftRelu")
.SetDefault(static_cast<AttrType>(40)); .SetDefault(40.0f);
AddComment(R"DOC( AddComment(R"DOC(
SoftRelu Activation Operator. SoftRelu Activation Operator.
...@@ -347,15 +340,13 @@ $y = \ln(1 + \exp(\max(\min(x, threshold), threshold))$ ...@@ -347,15 +340,13 @@ $y = \ln(1 + \exp(\max(\min(x, threshold), threshold))$
} }
}; };
template <typename AttrType>
class ELUOpMaker : public framework::OpProtoAndCheckerMaker { class ELUOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ELUOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) ELUOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of ELU operator"); AddInput("X", "Input of ELU operator");
AddOutput("Y", "Output of ELU operator"); AddOutput("Y", "Output of ELU operator");
AddAttr<AttrType>("alpha", "The alpha value of ELU") AddAttr<float>("alpha", "The alpha value of ELU").SetDefault(1.0f);
.SetDefault(static_cast<AttrType>(1.0f));
AddComment(R"DOC( AddComment(R"DOC(
ELU Activation Operator. ELU Activation Operator.
...@@ -368,15 +359,14 @@ $y = \max(0, x) + \min(0, \alpha * (e^x - 1))$ ...@@ -368,15 +359,14 @@ $y = \max(0, x) + \min(0, \alpha * (e^x - 1))$
} }
}; };
template <typename AttrType>
class Relu6OpMaker : public framework::OpProtoAndCheckerMaker { class Relu6OpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Relu6OpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) Relu6OpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of Relu6 operator"); AddInput("X", "Input of Relu6 operator");
AddOutput("Y", "Output of Relu6 operator"); AddOutput("Y", "Output of Relu6 operator");
AddAttr<AttrType>("threshold", "The threshold value of Relu6") AddAttr<float>("threshold", "The threshold value of Relu6")
.SetDefault(static_cast<AttrType>(6)); .SetDefault(6.0f);
AddComment(R"DOC( AddComment(R"DOC(
Relu6 Activation Operator. Relu6 Activation Operator.
...@@ -386,15 +376,13 @@ $y = \min(\max(0, x), 6)$ ...@@ -386,15 +376,13 @@ $y = \min(\max(0, x), 6)$
} }
}; };
template <typename AttrType>
class PowOpMaker : public framework::OpProtoAndCheckerMaker { class PowOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PowOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) PowOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of Pow operator"); AddInput("X", "Input of Pow operator");
AddOutput("Y", "Output of Pow operator"); AddOutput("Y", "Output of Pow operator");
AddAttr<AttrType>("factor", "The exponential factor of Pow") AddAttr<float>("factor", "The exponential factor of Pow").SetDefault(1.0f);
.SetDefault(static_cast<AttrType>(1));
AddComment(R"DOC( AddComment(R"DOC(
Pow Activation Operator. Pow Activation Operator.
...@@ -404,17 +392,16 @@ $y = x^{factor}$ ...@@ -404,17 +392,16 @@ $y = x^{factor}$
} }
}; };
template <typename AttrType>
class STanhOpMaker : public framework::OpProtoAndCheckerMaker { class STanhOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
STanhOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) STanhOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of STanh operator"); AddInput("X", "Input of STanh operator");
AddOutput("Y", "Output of STanh operator"); AddOutput("Y", "Output of STanh operator");
AddAttr<AttrType>("scale_a", "The scale parameter of a for the input") AddAttr<float>("scale_a", "The scale parameter of a for the input")
.SetDefault(static_cast<AttrType>(2 / 3)); .SetDefault(2.0f / 3.0f);
AddAttr<AttrType>("scale_b", "The scale parameter of b for the input") AddAttr<float>("scale_b", "The scale parameter of b for the input")
.SetDefault(static_cast<AttrType>(1.7159)); .SetDefault(1.7159f);
AddComment(R"DOC( AddComment(R"DOC(
STanh Activation Operator. STanh Activation Operator.
...@@ -424,7 +411,6 @@ $$y = b * \frac{e^{a * x} - e^{-a * x}}{e^{a * x} + e^{-a * x}}$$ ...@@ -424,7 +411,6 @@ $$y = b * \frac{e^{a * x} - e^{-a * x}}{e^{a * x} + e^{-a * x}}$$
} }
}; };
template <typename AttrType>
class ThresholdedReluOpMaker : public framework::OpProtoAndCheckerMaker { class ThresholdedReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ThresholdedReluOpMaker(framework::OpProto *proto, ThresholdedReluOpMaker(framework::OpProto *proto,
...@@ -432,8 +418,8 @@ class ThresholdedReluOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -432,8 +418,8 @@ class ThresholdedReluOpMaker : public framework::OpProtoAndCheckerMaker {
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of ThresholdedRelu operator"); AddInput("X", "Input of ThresholdedRelu operator");
AddOutput("Y", "Output of ThresholdedRelu operator"); AddOutput("Y", "Output of ThresholdedRelu operator");
AddAttr<AttrType>("threshold", "The threshold location of activation") AddAttr<float>("threshold", "The threshold location of activation")
.SetDefault(static_cast<AttrType>(1.0)); .SetDefault(1.0f);
AddComment(R"DOC( AddComment(R"DOC(
ThresholdedRelu Activation Operator. ThresholdedRelu Activation Operator.
...@@ -448,7 +434,6 @@ $$ ...@@ -448,7 +434,6 @@ $$
} }
}; };
template <typename AttrType>
class HardSigmoidOpMaker : public framework::OpProtoAndCheckerMaker { class HardSigmoidOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
HardSigmoidOpMaker(framework::OpProto *proto, HardSigmoidOpMaker(framework::OpProto *proto,
...@@ -456,10 +441,10 @@ class HardSigmoidOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -456,10 +441,10 @@ class HardSigmoidOpMaker : public framework::OpProtoAndCheckerMaker {
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of HardSigmoid operator"); AddInput("X", "Input of HardSigmoid operator");
AddOutput("Y", "Output of HardSigmoid operator"); AddOutput("Y", "Output of HardSigmoid operator");
AddAttr<AttrType>("slope", "Slope for linear approximation of sigmoid") AddAttr<float>("slope", "Slope for linear approximation of sigmoid")
.SetDefault(static_cast<AttrType>(0.2)); .SetDefault(0.2f);
AddAttr<AttrType>("offset", "Offset for linear approximation of sigmoid") AddAttr<float>("offset", "Offset for linear approximation of sigmoid")
.SetDefault(static_cast<AttrType>(0.5)); .SetDefault(0.5f);
AddComment(R"DOC( AddComment(R"DOC(
HardSigmoid Activation Operator. HardSigmoid Activation Operator.
...@@ -499,7 +484,7 @@ REGISTER_OP(tanh, ops::ActivationOp, ops::TanhOpMaker, tanh_grad, ...@@ -499,7 +484,7 @@ REGISTER_OP(tanh, ops::ActivationOp, ops::TanhOpMaker, tanh_grad,
REGISTER_OP(tanh_shrink, ops::ActivationOp, ops::TanhShrinkOpMaker, REGISTER_OP(tanh_shrink, ops::ActivationOp, ops::TanhShrinkOpMaker,
tanh_shrink_grad, ops::ActivationOpGrad); tanh_shrink_grad, ops::ActivationOpGrad);
REGISTER_OP(softshrink, ops::ActivationOp, ops::SoftShrinkOpMaker<float>, REGISTER_OP(softshrink, ops::ActivationOp, ops::SoftShrinkOpMaker,
softshrink_grad, ops::ActivationOpGrad); softshrink_grad, ops::ActivationOpGrad);
REGISTER_OP(sqrt, ops::ActivationOp, ops::SqrtOpMaker, sqrt_grad, REGISTER_OP(sqrt, ops::ActivationOp, ops::SqrtOpMaker, sqrt_grad,
...@@ -523,35 +508,34 @@ REGISTER_OP(softplus, ops::ActivationOp, ops::SoftplusOpMaker, softplus_grad, ...@@ -523,35 +508,34 @@ REGISTER_OP(softplus, ops::ActivationOp, ops::SoftplusOpMaker, softplus_grad,
REGISTER_OP(softsign, ops::ActivationOp, ops::SoftsignOpMaker, softsign_grad, REGISTER_OP(softsign, ops::ActivationOp, ops::SoftsignOpMaker, softsign_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP(brelu, ops::ActivationOp, ops::BReluOpMaker<float>, brelu_grad, REGISTER_OP(brelu, ops::ActivationOp, ops::BReluOpMaker, brelu_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP(leaky_relu, ops::ActivationOp, ops::LeakyReluOpMaker<float>, REGISTER_OP(leaky_relu, ops::ActivationOp, ops::LeakyReluOpMaker,
leaky_relu_grad, ops::ActivationOpGrad); leaky_relu_grad, ops::ActivationOpGrad);
REGISTER_OP(soft_relu, ops::ActivationOp, ops::SoftReluOpMaker<float>, REGISTER_OP(soft_relu, ops::ActivationOp, ops::SoftReluOpMaker, soft_relu_grad,
soft_relu_grad, ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP(elu, ops::ActivationOp, ops::ELUOpMaker<float>, elu_grad, REGISTER_OP(elu, ops::ActivationOp, ops::ELUOpMaker, elu_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP(relu6, ops::ActivationOp, ops::Relu6OpMaker<float>, relu6_grad, REGISTER_OP(relu6, ops::ActivationOp, ops::Relu6OpMaker, relu6_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP(pow, ops::ActivationOp, ops::PowOpMaker<float>, pow_grad, REGISTER_OP(pow, ops::ActivationOp, ops::PowOpMaker, pow_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP(stanh, ops::ActivationOp, ops::STanhOpMaker<float>, stanh_grad, REGISTER_OP(stanh, ops::ActivationOp, ops::STanhOpMaker, stanh_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP(hard_shrink, ops::ActivationOp, ops::HardShrinkOpMaker<float>, REGISTER_OP(hard_shrink, ops::ActivationOp, ops::HardShrinkOpMaker,
hard_shrink_grad, ops::ActivationOpGrad); hard_shrink_grad, ops::ActivationOpGrad);
REGISTER_OP(thresholded_relu, ops::ActivationOp, REGISTER_OP(thresholded_relu, ops::ActivationOp, ops::ThresholdedReluOpMaker,
ops::ThresholdedReluOpMaker<float>, thresholded_relu_grad, thresholded_relu_grad, ops::ActivationOpGrad);
ops::ActivationOpGrad);
REGISTER_OP(hard_sigmoid, ops::ActivationOp, ops::HardSigmoidOpMaker<float>, REGISTER_OP(hard_sigmoid, ops::ActivationOp, ops::HardSigmoidOpMaker,
hard_sigmoid_grad, ops::ActivationOpGrad); hard_sigmoid_grad, ops::ActivationOpGrad);
#define REGISTER_ACTIVATION_CPU_KERNEL(act_type, functor, grad_functor) \ #define REGISTER_ACTIVATION_CPU_KERNEL(act_type, functor, grad_functor) \
......
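
The activation hunks above all apply the same simplification: the `template <typename AttrType>` parameter on the op makers is dropped and every attribute is declared as a plain `float`, so `REGISTER_OP` can name the maker without a `<float>` instantiation. A sketch of the resulting maker shape, using an illustrative op name that is not one of the operators in this change:

```cpp
// Illustrative maker following the pattern above: the attribute type is fixed
// to float, so no class template parameter is needed.
class ExampleOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  ExampleOpMaker(framework::OpProto *proto,
                 framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "Input of Example operator");
    AddOutput("Y", "Output of Example operator");
    AddAttr<float>("alpha", "Scale applied by the Example operator")
        .SetDefault(1.0f);
    AddComment(R"DOC(
Example Activation Operator.

$y = alpha * x$

)DOC");
  }
};
```
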
...@@ -109,4 +109,5 @@ paramOut = param + paramUpdate$$ ...@@ -109,4 +109,5 @@ paramOut = param + paramUpdate$$
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(adadelta, ops::AdadeltaOp, ops::AdadeltaOpMaker); REGISTER_OP_WITHOUT_GRADIENT(adadelta, ops::AdadeltaOp, ops::AdadeltaOpMaker);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
adadelta, ops::AdadeltaOpKernel<paddle::platform::CPUPlace, float>); adadelta, ops::AdadeltaOpKernel<paddle::platform::CPUPlace, float>,
ops::AdadeltaOpKernel<paddle::platform::CPUPlace, double>);
...@@ -17,4 +17,5 @@ ...@@ -17,4 +17,5 @@
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL( REGISTER_OP_GPU_KERNEL(
adadelta, ops::AdadeltaOpKernel<paddle::platform::GPUPlace, float>); adadelta, ops::AdadeltaOpKernel<paddle::platform::GPUPlace, float>,
ops::AdadeltaOpKernel<paddle::platform::GPUPlace, double>);
...@@ -33,8 +33,8 @@ class AdadeltaOpKernel : public framework::OpKernel<T> { ...@@ -33,8 +33,8 @@ class AdadeltaOpKernel : public framework::OpKernel<T> {
avg_squared_grad_out_tensor->mutable_data<T>(ctx.GetPlace()); avg_squared_grad_out_tensor->mutable_data<T>(ctx.GetPlace());
avg_squared_update_out_tensor->mutable_data<T>(ctx.GetPlace()); avg_squared_update_out_tensor->mutable_data<T>(ctx.GetPlace());
float rho = ctx.Attr<float>("rho"); T rho = static_cast<T>(ctx.Attr<float>("rho"));
float epsilon = ctx.Attr<float>("epsilon"); T epsilon = static_cast<T>(ctx.Attr<float>("epsilon"));
auto param = framework::EigenVector<T>::Flatten( auto param = framework::EigenVector<T>::Flatten(
*ctx.Input<framework::Tensor>("Param")); *ctx.Input<framework::Tensor>("Param"));
......
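
The optimizer hunks show the kernel side of the same change: kernels stay templated on the element type `T`, read their attributes as `float`, and cast them to `T`, which is what lets the registrations above list both a `float` and a `double` instantiation. A sketch under the same illustrative op name; the `Compute` signature is assumed to match this codebase's `OpKernel` interface, and the plain CPU loop stands in for the Eigen expressions that the real kernels use:

```cpp
// Illustrative kernel: the float attribute is promoted to T, so a single
// implementation serves both the float and the double instantiation.
template <typename Place, typename T>
class ExampleOpKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto* x_tensor = ctx.Input<framework::Tensor>("X");
    auto* y_tensor = ctx.Output<framework::Tensor>("Y");
    T* y = y_tensor->mutable_data<T>(ctx.GetPlace());
    const T* x = x_tensor->data<T>();

    T alpha = static_cast<T>(ctx.Attr<float>("alpha"));

    for (int64_t i = 0; i < framework::product(x_tensor->dims()); ++i) {
      y[i] = alpha * x[i];
    }
  }
};

// Registered for both precisions, mirroring the adadelta hunks above:
//   REGISTER_OP_CPU_KERNEL(example,
//       ops::ExampleOpKernel<paddle::platform::CPUPlace, float>,
//       ops::ExampleOpKernel<paddle::platform::CPUPlace, double>);
```
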
...@@ -14,8 +14,8 @@ ...@@ -14,8 +14,8 @@
#define EIGEN_USE_GPU #define EIGEN_USE_GPU
#include "paddle/operators/adagrad_op.h" #include "paddle/operators/adagrad_op.h"
#include "paddle/operators/math/selected_rows_functor.h"
#include "paddle/operators/math/math_function.h" #include "paddle/operators/math/math_function.h"
#include "paddle/operators/math/selected_rows_functor.h"
#include "paddle/platform/cuda_helper.h" #include "paddle/platform/cuda_helper.h"
namespace paddle { namespace paddle {
...@@ -134,8 +134,8 @@ struct SparseAdagradFunctor<platform::GPUPlace, T> { ...@@ -134,8 +134,8 @@ struct SparseAdagradFunctor<platform::GPUPlace, T> {
T, 256><<<grid2, threads, 0, T, 256><<<grid2, threads, 0,
reinterpret_cast<const platform::CUDADeviceContext&>(context) reinterpret_cast<const platform::CUDADeviceContext&>(context)
.stream()>>>(grad_merge_data, grad_merge->rows().data(), .stream()>>>(grad_merge_data, grad_merge->rows().data(),
lr, param_data, lr, param_data, moment_data, grad_width,
moment_data, grad_width, epsilon); epsilon);
} }
}; };
......
...@@ -127,4 +127,5 @@ paramOut = param - learningRate * moment_1/ ($\sqrt{(moment_2)} + \epsilon)$$ ...@@ -127,4 +127,5 @@ paramOut = param - learningRate * moment_1/ ($\sqrt{(moment_2)} + \epsilon)$$
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(adam, ops::AdamOp, ops::AdamOpMaker); REGISTER_OP_WITHOUT_GRADIENT(adam, ops::AdamOp, ops::AdamOpMaker);
REGISTER_OP_CPU_KERNEL(adam, REGISTER_OP_CPU_KERNEL(adam,
ops::AdamOpKernel<paddle::platform::CPUPlace, float>); ops::AdamOpKernel<paddle::platform::CPUPlace, float>,
ops::AdamOpKernel<paddle::platform::CPUPlace, double>);
...@@ -17,4 +17,5 @@ ...@@ -17,4 +17,5 @@
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(adam, REGISTER_OP_GPU_KERNEL(adam,
ops::AdamOpKernel<paddle::platform::GPUPlace, float>); ops::AdamOpKernel<paddle::platform::GPUPlace, float>,
ops::AdamOpKernel<paddle::platform::GPUPlace, double>);
...@@ -31,9 +31,9 @@ class AdamOpKernel : public framework::OpKernel<T> { ...@@ -31,9 +31,9 @@ class AdamOpKernel : public framework::OpKernel<T> {
moment1_out_tensor->mutable_data<T>(ctx.GetPlace()); moment1_out_tensor->mutable_data<T>(ctx.GetPlace());
moment2_out_tensor->mutable_data<T>(ctx.GetPlace()); moment2_out_tensor->mutable_data<T>(ctx.GetPlace());
float beta1 = ctx.Attr<float>("beta1"); T beta1 = static_cast<T>(ctx.Attr<float>("beta1"));
float beta2 = ctx.Attr<float>("beta2"); T beta2 = static_cast<T>(ctx.Attr<float>("beta2"));
float epsilon = ctx.Attr<float>("epsilon"); T epsilon = static_cast<T>(ctx.Attr<float>("epsilon"));
auto param = framework::EigenVector<T>::Flatten( auto param = framework::EigenVector<T>::Flatten(
*ctx.Input<framework::Tensor>("Param")); *ctx.Input<framework::Tensor>("Param"));
......
...@@ -126,4 +126,5 @@ division by 0 error. ...@@ -126,4 +126,5 @@ division by 0 error.
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(adamax, ops::AdamaxOp, ops::AdamaxOpMaker); REGISTER_OP_WITHOUT_GRADIENT(adamax, ops::AdamaxOp, ops::AdamaxOpMaker);
REGISTER_OP_CPU_KERNEL(adamax, REGISTER_OP_CPU_KERNEL(adamax,
ops::AdamaxOpKernel<paddle::platform::CPUPlace, float>); ops::AdamaxOpKernel<paddle::platform::CPUPlace, float>,
ops::AdamaxOpKernel<paddle::platform::CPUPlace, double>);
...@@ -17,4 +17,5 @@ ...@@ -17,4 +17,5 @@
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(adamax, REGISTER_OP_GPU_KERNEL(adamax,
ops::AdamaxOpKernel<paddle::platform::GPUPlace, float>); ops::AdamaxOpKernel<paddle::platform::GPUPlace, float>,
ops::AdamaxOpKernel<paddle::platform::GPUPlace, double>);
...@@ -31,9 +31,9 @@ class AdamaxOpKernel : public framework::OpKernel<T> { ...@@ -31,9 +31,9 @@ class AdamaxOpKernel : public framework::OpKernel<T> {
moment_out_tensor->mutable_data<T>(ctx.GetPlace()); moment_out_tensor->mutable_data<T>(ctx.GetPlace());
inf_norm_out_tensor->mutable_data<T>(ctx.GetPlace()); inf_norm_out_tensor->mutable_data<T>(ctx.GetPlace());
float beta1 = ctx.Attr<float>("beta1"); T beta1 = static_cast<T>(ctx.Attr<float>("beta1"));
float beta2 = ctx.Attr<float>("beta2"); T beta2 = static_cast<T>(ctx.Attr<float>("beta2"));
float epsilon = ctx.Attr<float>("epsilon"); T epsilon = static_cast<T>(ctx.Attr<float>("epsilon"));
auto param = framework::EigenVector<T>::Flatten( auto param = framework::EigenVector<T>::Flatten(
*ctx.Input<framework::Tensor>("Param")); *ctx.Input<framework::Tensor>("Param"));
......
...@@ -139,7 +139,7 @@ bool BeamSearch::NextItemSet(std::vector<BeamSearch::Item> *items) { ...@@ -139,7 +139,7 @@ bool BeamSearch::NextItemSet(std::vector<BeamSearch::Item> *items) {
items->reserve(framework::product(ids.dims())); items->reserve(framework::product(ids.dims()));
for (size_t offset = abs_lod[lod_level_][sent_offset_]; for (size_t offset = abs_lod[lod_level_][sent_offset_];
offset < abs_lod[lod_level_][sent_offset_ + 1]; offset++) { offset < abs_lod[lod_level_][sent_offset_ + 1]; offset++) {
for (int d = 0; d < instance_dim; d++) { for (size_t d = 0; d < instance_dim; d++) {
const size_t dim_offset = offset * instance_dim + d; const size_t dim_offset = offset * instance_dim + d;
items->emplace_back(offset, ids_data[dim_offset], items->emplace_back(offset, ids_data[dim_offset],
scores_data[dim_offset]); scores_data[dim_offset]);
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/ftrl_op.h"
namespace paddle {
namespace operators {
class FTRLOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Param"),
"Input(Param) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasInput("SquaredAccumulator"),
"Input(SquaredAccumulator) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasInput("LinearAccumulator"),
"Input(LinearAccumulator) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Grad"),
"Input(Grad) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasInput("LearningRate"),
"Input(LearningRate) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("ParamOut"),
"Output(ParamOut) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("SquaredAccumOut"),
"Output(SquaredAccumOut) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("LinearAccumOut"),
"Output(LinearAccumOut) of FTRL should not be null.");
auto param_dim = ctx->GetInputDim("Param");
PADDLE_ENFORCE_EQ(param_dim, ctx->GetInputDim("Grad"),
"Two input of FTRL Op's dimension must be same.");
auto lr_dim = ctx->GetInputDim("LearningRate");
PADDLE_ENFORCE_EQ(framework::product(lr_dim), 1,
"Learning Rate should be a scalar.");
ctx->SetOutputDim("ParamOut", param_dim);
ctx->SetOutputDim("SquaredAccumOut", param_dim);
ctx->SetOutputDim("LinearAccumOut", param_dim);
}
};
class FTRLOpMaker : public framework::OpProtoAndCheckerMaker {
public:
FTRLOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param",
"(Tensor, default Tensor<float>) "
"Input parameter value that has to be updated.");
AddInput("SquaredAccumulator",
"(Tensor, default Tensor<float>) "
"Accumulator that accumulates squared gradients.");
AddInput("LinearAccumulator",
"(Tensor, default Tensor<float>) "
"Accumulator that accumulates linear gradients.");
AddInput("Grad",
"(Tensor, default Tensor<float>) "
"Input gradient of the parameter.");
AddInput("LearningRate",
"(Tensor, default Tensor<float>) "
"The learning rate should be a tensor of size 1.");
AddOutput("ParamOut", "(Tensor) Output updated parameter value.");
AddOutput("SquaredAccumOut",
"(Tensor) Output accumulated squared"
" gradients.");
AddOutput("LinearAccumOut",
"(Tensor) Output accumulated linear"
" gradients.");
AddAttr<float>("l1",
"(float, default 0.0) "
"L1 regularization strength.")
.SetDefault(0.0f);
AddAttr<float>("l2",
"(float, default 0.0) "
"L2 regularization strength.")
.SetDefault(0.0f);
AddAttr<float>("lr_power",
"(float, default -0.5f) "
"Learning Rate Power.")
.SetDefault(-0.5f);
AddComment(R"DOC(
FTRL (Follow The Regularized Leader) Operator.
Optimizer that implements the FTRL algorithm:
$$
new\_accum = squared\_accum + grad^2 \\
if (lr\_power == -0.5) {
linear\_accum += grad -
                \frac{\surd(new\_accum) - \surd(squared\_accum)}{learning\_rate} * param \\
} else {
linear\_accum += grad -
                \frac{new\_accum^{-lr\_power} - squared\_accum^{-lr\_power}}{learning\_rate} * param \\
}
x = (l1 * sign(linear\_accum) - linear\_accum)
if (lr\_power == -0.5) {
y = \frac{\surd(new\_accum)}{learning\_rate} + (2 * l2) \\
pre\_shrink = \frac{x}{y} \\
param = (abs(linear\_accum) > l1).select(pre\_shrink, 0.0) \\
} else {
y = \frac{new\_accum^{-lr\_power}}{learning\_rate} + (2 * l2) \\
pre\_shrink = \frac{x}{y} \\
param = (abs(linear\_accum) > l1).select(pre\_shrink, 0.0) \\
}
squared\_accum += grad^2;
$$
The paper that proposed Follow The Regularized Leader (FTRL):
(https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf)
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(ftrl, ops::FTRLOp, ops::FTRLOpMaker);
REGISTER_OP_CPU_KERNEL(ftrl,
ops::FTRLOpKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
You may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/ftrl_op.h"
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(ftrl,
ops::FTRLOpKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
template <typename Place, typename T>
class FTRLOpKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* param_out = ctx.Output<Tensor>("ParamOut");
auto* sq_accum_out = ctx.Output<Tensor>("SquaredAccumOut");
auto* lin_accum_out = ctx.Output<Tensor>("LinearAccumOut");
param_out->mutable_data<T>(ctx.GetPlace());
sq_accum_out->mutable_data<T>(ctx.GetPlace());
lin_accum_out->mutable_data<T>(ctx.GetPlace());
auto grad = ctx.Input<Tensor>("Grad");
auto l1 = static_cast<T>(ctx.Attr<float>("l1"));
auto l2 = static_cast<T>(ctx.Attr<float>("l2"));
auto lr_power = static_cast<T>(ctx.Attr<float>("lr_power"));
auto p = EigenVector<T>::Flatten(*ctx.Input<Tensor>("Param"));
auto sq_accum =
EigenVector<T>::Flatten(*ctx.Input<Tensor>("SquaredAccumulator"));
auto lin_accum =
EigenVector<T>::Flatten(*ctx.Input<Tensor>("LinearAccumulator"));
auto g = EigenVector<T>::Flatten(*grad);
auto lr = EigenVector<T>::Flatten(*ctx.Input<Tensor>("LearningRate"));
auto p_out = EigenVector<T>::Flatten(*param_out);
auto s_acc_out = EigenVector<T>::Flatten(*sq_accum_out);
auto l_acc_out = EigenVector<T>::Flatten(*lin_accum_out);
auto place = ctx.GetEigenDevice<Place>();
Eigen::DSizes<int, 1> grad_dsize(grad->numel());
auto new_accum = sq_accum + g * g;
// Special case for lr_power = -0.5
if (lr_power == static_cast<T>(-0.5)) {
l_acc_out.device(place) =
lin_accum + g -
((new_accum.sqrt() - sq_accum.sqrt()) / lr.broadcast(grad_dsize)) * p;
} else {
l_acc_out.device(place) =
lin_accum + g -
((new_accum.pow(-lr_power) - sq_accum.pow(-lr_power)) /
lr.broadcast(grad_dsize)) *
p;
}
auto x = (l_acc_out.constant(l1) * l_acc_out.sign() - l_acc_out);
if (lr_power == static_cast<T>(-0.5)) {
auto y = (new_accum.sqrt() / lr.broadcast(grad_dsize)) +
l_acc_out.constant(static_cast<T>(2) * l2);
auto pre_shrink = x / y;
p_out.device(place) =
(l_acc_out.abs() > l_acc_out.constant(l1))
.select(pre_shrink, p.constant(static_cast<T>(0)));
} else {
auto y = (new_accum.pow(-lr_power) / lr.broadcast(grad_dsize)) +
l_acc_out.constant(static_cast<T>(2) * l2);
auto pre_shrink = x / y;
p_out.device(place) =
(l_acc_out.abs() > l_acc_out.constant(l1))
.select(pre_shrink, p.constant(static_cast<T>(0)));
}
s_acc_out.device(place) = sq_accum + g * g;
}
};
} // namespace operators
} // namespace paddle
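For readers cross-checking the Eigen expressions in `FTRLOpKernel`, the update can be restated element-wise. The following is a hedged scalar sketch (plain C++, not part of the operator; names mirror the kernel and the helper `FtrlStep` is invented for illustration):

```cpp
// Scalar re-statement of one FTRL step as implemented by FTRLOpKernel above.
#include <cmath>
#include <iostream>

struct FtrlState {
  double param, squared_accum, linear_accum;
};

void FtrlStep(FtrlState& s, double grad, double lr,
              double l1, double l2, double lr_power) {
  const double new_accum = s.squared_accum + grad * grad;
  // linear_accum += grad - (f(new_accum) - f(squared_accum)) / lr * param
  if (lr_power == -0.5) {
    s.linear_accum += grad -
        (std::sqrt(new_accum) - std::sqrt(s.squared_accum)) / lr * s.param;
  } else {
    s.linear_accum += grad -
        (std::pow(new_accum, -lr_power) - std::pow(s.squared_accum, -lr_power)) /
            lr * s.param;
  }
  const double x = l1 * (s.linear_accum > 0 ? 1.0 : -1.0) - s.linear_accum;
  const double y = (lr_power == -0.5 ? std::sqrt(new_accum)
                                     : std::pow(new_accum, -lr_power)) / lr +
                   2.0 * l2;
  // Shrink the parameter only while the linear accumulator exceeds the l1 threshold.
  s.param = std::abs(s.linear_accum) > l1 ? x / y : 0.0;
  s.squared_accum = new_accum;
}

int main() {
  FtrlState s{1.0, 0.1, 0.0};
  FtrlStep(s, /*grad=*/0.2, /*lr=*/0.1, /*l1=*/0.0, /*l2=*/0.0, /*lr_power=*/-0.5);
  std::cout << s.param << " " << s.squared_accum << " " << s.linear_accum << "\n";
  return 0;
}
```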
...@@ -114,18 +114,19 @@ class GRUUnitOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -114,18 +114,19 @@ class GRUUnitOpMaker : public framework::OpProtoAndCheckerMaker {
.SetDefault(sigmoid) .SetDefault(sigmoid)
.InEnum({identity, sigmoid, tanh, relu}); .InEnum({identity, sigmoid, tanh, relu});
AddComment(R"DOC( AddComment(R"DOC(
GRUUnit Operator. GRUUnit Operator implements partial calculations of the GRU unit as follows:
This operator implements partial calculations of the GRU unit as follows:
$$ $$
update \ gate: u_t = actGate(xu_t + W_u * hidden_{prev} + bias_u) \\ update \ gate: u_t = actGate(xu_t + W_u * h_{t-1} + b_u) \\
reset \ gate: r_t = actGate(xr_t + W_r * hidden_{prev} + bias_r) \\ reset \ gate: r_t = actGate(xr_t + W_r * h_{t-1} + b_r) \\
output \ candidate: {h}_t = actNode({xc}_t + W_c * dot(r_t, hidden_{prev}) + bias_c) \\ output \ candidate: {h}_t = actNode(xc_t + W_c * dot(r_t, h_{t-1}) + b_c) \\
output: h_t = dot((1-u_t), {h}_t) + dot(u_t, hidden_{prev}) output: h_t = dot((1 - u_t), h_{t-1}) + dot(u_t, {h}_t)
$$ $$
The rest of GRU unit can be completed by using FCOp's output as the input of GRUUnitOp. which is the same as one time step of the GRU Operator.
@note To implement the complete GRU unit, a fully-connected operator must be
used beforehand to feed xu, xr and xc as the Input of the GRUUnit operator.
)DOC"); )DOC");
} }
...@@ -150,12 +151,6 @@ class GRUUnitGradOp : public framework::OperatorWithKernel { ...@@ -150,12 +151,6 @@ class GRUUnitGradOp : public framework::OperatorWithKernel {
"ResetHiddenPrev"); "ResetHiddenPrev");
PADDLE_ENFORCE(ctx->HasInput("Hidden"), PADDLE_ENFORCE(ctx->HasInput("Hidden"),
"Input(%s) of GRUUnitGradOp should not be null.", "Hidden"); "Input(%s) of GRUUnitGradOp should not be null.", "Hidden");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Gate")),
"Input(%s@GRAD) of GRUUnitGradOp should not be null.",
"Gate");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("ResetHiddenPrev")),
"Input(%s@GRAD) of GRUUnitGradOp should not be null.",
"ResetHiddenPrev");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Hidden")), PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Hidden")),
"Input(%s@GRAD) of GRUUnitGradOp should not be null.", "Input(%s@GRAD) of GRUUnitGradOp should not be null.",
"Hidden"); "Hidden");
......
...@@ -110,7 +110,7 @@ class GRUUnitKernel : public framework::OpKernel<T> { ...@@ -110,7 +110,7 @@ class GRUUnitKernel : public framework::OpKernel<T> {
auto c = g.slice(c_offsets, extents); // output candidate auto c = g.slice(c_offsets, extents); // output candidate
// calculate final output // calculate final output
h.device(place) = u * (h_p - c) + c; h.device(place) = u * (c - h_p) + h_p;
} }
}; };
...@@ -146,35 +146,27 @@ class GRUUnitGradKernel : public framework::OpKernel<T> { ...@@ -146,35 +146,27 @@ class GRUUnitGradKernel : public framework::OpKernel<T> {
auto* weight_grad = auto* weight_grad =
context.Output<Tensor>(framework::GradVarName("Weight")); context.Output<Tensor>(framework::GradVarName("Weight"));
auto* bias_grad = context.Output<Tensor>(framework::GradVarName("Bias")); auto* bias_grad = context.Output<Tensor>(framework::GradVarName("Bias"));
input_grad->mutable_data<T>(context.GetPlace());
hidden_prev_grad->mutable_data<T>(context.GetPlace());
weight_grad->mutable_data<T>(context.GetPlace());
Tensor gate_grad; Tensor gate_grad;
gate_grad.mutable_data<T>(input->dims(), context.GetPlace());
Tensor reset_hidden_prev_grad; Tensor reset_hidden_prev_grad;
reset_hidden_prev_grad.mutable_data<T>(reset_hidden_prev->dims(),
context.GetPlace());
int batch_size = input->dims()[0];
int frame_size = hidden_prev->dims()[1];
const T* hidden_prev_data = hidden_prev->data<T>(); const T* hidden_prev_data = hidden_prev->data<T>();
T* hidden_prev_grad_data = hidden_prev_grad->data<T>();
const T* weight_data = weight->data<T>(); const T* weight_data = weight->data<T>();
T* weight_grad_data = weight_grad->data<T>(); T* gate_grad_data =
T* gate_grad_data = gate_grad.data<T>(); gate_grad.mutable_data<T>(input->dims(), context.GetPlace());
const T* reset_hidden_prev_data = reset_hidden_prev->data<T>(); const T* reset_hidden_prev_data = reset_hidden_prev->data<T>();
T* reset_hidden_prev_grad_data = reset_hidden_prev_grad.data<T>(); T* reset_hidden_prev_grad_data = reset_hidden_prev_grad.mutable_data<T>(
reset_hidden_prev->dims(), context.GetPlace());
auto h_p = EigenMatrix<T>::From(*hidden_prev); auto h_p = EigenMatrix<T>::From(*hidden_prev);
auto g = EigenMatrix<T>::From(*gate); auto g = EigenMatrix<T>::From(*gate);
auto d_h = EigenMatrix<T>::From(*hidden_grad); auto d_h = EigenMatrix<T>::From(*hidden_grad);
auto d_x = EigenMatrix<T>::From(*input_grad);
auto d_h_p = EigenMatrix<T>::From(*hidden_prev_grad);
auto d_g = EigenMatrix<T>::From(gate_grad); auto d_g = EigenMatrix<T>::From(gate_grad);
auto d_r_h_p = EigenMatrix<T>::From(reset_hidden_prev_grad); auto d_r_h_p = EigenMatrix<T>::From(reset_hidden_prev_grad);
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
int batch_size = input->dims()[0];
int frame_size = hidden_prev->dims()[1];
Eigen::array<int, 2> extents({{batch_size, frame_size}}); Eigen::array<int, 2> extents({{batch_size, frame_size}});
Eigen::array<int, 2> u_offsets({{0, 0}}); Eigen::array<int, 2> u_offsets({{0, 0}});
auto u = g.slice(u_offsets, extents); // update gate auto u = g.slice(u_offsets, extents); // update gate
...@@ -185,38 +177,52 @@ class GRUUnitGradKernel : public framework::OpKernel<T> { ...@@ -185,38 +177,52 @@ class GRUUnitGradKernel : public framework::OpKernel<T> {
// backward for unactivated update gate // backward for unactivated update gate
ActGradCompute(context.Attr<int>("gate_activation"), place, u, u, ActGradCompute(context.Attr<int>("gate_activation"), place, u, u,
d_g.slice(u_offsets, extents), d_h * (h_p - c)); d_g.slice(u_offsets, extents), d_h * (c - h_p));
// backward for unactivated output candidate // backward for unactivated output candidate
ActGradCompute(context.Attr<int>("activation"), place, c, c, ActGradCompute(context.Attr<int>("activation"), place, c, c,
d_g.slice(c_offsets, extents), d_h * (u.constant(T(1)) - u)); d_g.slice(c_offsets, extents), d_h * u);
// backward for reset_hidden_prev // backward for reset_hidden_prev
math::gemm<Place, T>(context.device_context(), false, true, batch_size, math::gemm<Place, T>(context.device_context(), false, true, batch_size,
frame_size, frame_size, 1, frame_size, frame_size, 1,
gate_grad_data + frame_size * 2, frame_size * 3, gate_grad_data + frame_size * 2, frame_size * 3,
weight_data + frame_size * frame_size * 2, frame_size, weight_data + frame_size * frame_size * 2, frame_size,
0, reset_hidden_prev_grad_data, frame_size); 0, reset_hidden_prev_grad_data, frame_size);
// backward for state_weight
math::gemm<Place, T>(
context.device_context(), true, false, frame_size, frame_size,
batch_size, 1, reset_hidden_prev_data, frame_size,
gate_grad_data + frame_size * 2, frame_size * 3, 0,
weight_grad_data + frame_size * frame_size * 2, frame_size);
// backward for unactivated reset gate // backward for unactivated reset gate
ActGradCompute(context.Attr<int>("gate_activation"), place, r, r, ActGradCompute(context.Attr<int>("gate_activation"), place, r, r,
d_g.slice(r_offsets, extents), d_r_h_p * h_p); d_g.slice(r_offsets, extents), d_r_h_p * h_p);
// backward for update_gate_weight and reset_gate_weight // backward for weight
math::gemm<Place, T>(context.device_context(), true, false, frame_size, if (weight_grad) {
frame_size * 2, batch_size, 1, hidden_prev_data, T* weight_grad_data = weight_grad->mutable_data<T>(context.GetPlace());
frame_size, gate_grad_data, frame_size * 3, 0, // backward for state_weight
weight_grad_data, frame_size * 2); math::gemm<Place, T>(
context.device_context(), true, false, frame_size, frame_size,
batch_size, 1, reset_hidden_prev_data, frame_size,
gate_grad_data + frame_size * 2, frame_size * 3, 0,
weight_grad_data + frame_size * frame_size * 2, frame_size);
// backward for update_gate_weight and reset_gate_weight
math::gemm<Place, T>(context.device_context(), true, false, frame_size,
frame_size * 2, batch_size, 1, hidden_prev_data,
frame_size, gate_grad_data, frame_size * 3, 0,
weight_grad_data, frame_size * 2);
}
// backward for hidden_prev // backward for hidden_prev
d_h_p.device(place) = d_r_h_p * r + d_h * u; if (hidden_prev_grad) {
math::gemm<Place, T>(context.device_context(), false, true, batch_size, T* hidden_prev_grad_data =
frame_size, frame_size * 2, 1, gate_grad_data, hidden_prev_grad->mutable_data<T>(context.GetPlace());
frame_size * 3, weight_data, frame_size * 2, 1, auto d_h_p = EigenMatrix<T>::From(*hidden_prev_grad);
hidden_prev_grad_data, frame_size); d_h_p.device(place) = d_r_h_p * r + d_h * (u.constant(T(1)) - u);
math::gemm<Place, T>(context.device_context(), false, true, batch_size,
frame_size, frame_size * 2, 1, gate_grad_data,
frame_size * 3, weight_data, frame_size * 2, 1,
hidden_prev_grad_data, frame_size);
}
// backward for input // backward for input
d_x.device(place) = d_g; if (input_grad) {
input_grad->mutable_data<T>(context.GetPlace());
auto d_x = EigenMatrix<T>::From(*input_grad);
d_x.device(place) = d_g;
}
// backward for bias // backward for bias
if (bias_grad) { if (bias_grad) {
bias_grad->mutable_data<T>(context.GetPlace()); bias_grad->mutable_data<T>(context.GetPlace());
......
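The restructuring above makes every gradient output of the GRU unit optional: each block allocates memory and runs only when the corresponding output variable was actually requested. A minimal hedged sketch of that guard pattern outside the framework (the `Backward`/`MaybeOutput` names and raw-pointer outputs are illustrative, not PaddlePaddle API):

```cpp
// Sketch: compute a gradient only when its output buffer was requested,
// mirroring the null checks on input_grad / weight_grad / hidden_prev_grad above.
#include <iostream>
#include <vector>

// Returns nullptr when the caller did not ask for this gradient,
// analogous to context.Output<Tensor>(GradVarName(...)) returning null.
std::vector<double>* MaybeOutput(bool wanted, std::vector<double>& storage) {
  return wanted ? &storage : nullptr;
}

void Backward(const std::vector<double>& upstream_grad,
              std::vector<double>* input_grad,     // may be null
              std::vector<double>* weight_grad) {  // may be null
  if (input_grad) {
    input_grad->assign(upstream_grad.begin(), upstream_grad.end());
  }
  if (weight_grad) {
    weight_grad->assign(upstream_grad.size(), 0.0);  // placeholder computation
  }
}

int main() {
  std::vector<double> d_y{1.0, 2.0}, d_x, d_w;
  Backward(d_y, MaybeOutput(true, d_x), MaybeOutput(false, d_w));
  std::cout << d_x.size() << " " << d_w.size() << "\n";  // prints "2 0"
  return 0;
}
```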
...@@ -271,7 +271,7 @@ class LinearChainCRFOpKernel : public framework::OpKernel<T> { ...@@ -271,7 +271,7 @@ class LinearChainCRFOpKernel : public framework::OpKernel<T> {
ll -= std::log(sum); ll -= std::log(sum);
// Now ll is equal to -log(Z). // Now ll is equal to -log(Z).
const int* lbl = label.data<int>(); const int64_t* lbl = label.data<int64_t>();
PADDLE_ENFORCE_LT( PADDLE_ENFORCE_LT(
static_cast<size_t>(*std::max_element(lbl, lbl + seq_length)), tag_num, static_cast<size_t>(*std::max_element(lbl, lbl + seq_length)), tag_num,
"An invalid tag label that execesses the largest tag number."); "An invalid tag label that execesses the largest tag number.");
...@@ -449,7 +449,7 @@ class LinearChainCRFGradOpKernel : public framework::OpKernel<T> { ...@@ -449,7 +449,7 @@ class LinearChainCRFGradOpKernel : public framework::OpKernel<T> {
Tensor* emission_grad) const { Tensor* emission_grad) const {
const T* w_exps = transition_exps.data<T>(); const T* w_exps = transition_exps.data<T>();
const T* x_exps = emission_exps.data<T>(); const T* x_exps = emission_exps.data<T>();
const int* label_value = label.data<int>(); const int64_t* label_value = label.data<int64_t>();
T* beta_value = beta->data<T>(); T* beta_value = beta->data<T>();
auto x_dims = emission_exps.dims(); auto x_dims = emission_exps.dims();
......
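The hunk above reads the label buffer with the element type it is actually stored in (`int64_t` indices). Reading 64-bit elements through a 32-bit view silently yields interleaved high/low halves instead of the intended labels; a tiny standalone illustration (not framework code, little-endian assumed):

```cpp
// Why the label pointer type matters: 64-bit label indices misread as 32-bit values.
#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

int main() {
  std::vector<int64_t> labels{3, 7, 11};  // label tensor elements are 64-bit indices
  std::cout << labels[1] << "\n";         // correct read: 7
  // Wrong-width view: the bytes at "element 1 of an int32 array" are not labels[1].
  int32_t misread = 0;
  std::memcpy(&misread,
              reinterpret_cast<const char*>(labels.data()) + sizeof(int32_t),
              sizeof(int32_t));
  std::cout << misread << "\n";           // 0 on little-endian: high half of labels[0]
  return 0;
}
```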
...@@ -179,7 +179,9 @@ REGISTER_OP(sequence_conv, ops::SequenceConvOp, ops::SequenceConvOpMaker, ...@@ -179,7 +179,9 @@ REGISTER_OP(sequence_conv, ops::SequenceConvOp, ops::SequenceConvOpMaker,
sequence_conv_grad, ops::SequenceConvGradOp); sequence_conv_grad, ops::SequenceConvGradOp);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
sequence_conv, ops::SequenceConvKernel<paddle::platform::CPUPlace, float>); sequence_conv, ops::SequenceConvKernel<paddle::platform::CPUPlace, float>,
ops::SequenceConvKernel<paddle::platform::CPUPlace, double>);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
sequence_conv_grad, sequence_conv_grad,
ops::SequenceConvGradKernel<paddle::platform::CPUPlace, float>); ops::SequenceConvGradKernel<paddle::platform::CPUPlace, float>,
ops::SequenceConvGradKernel<paddle::platform::CPUPlace, double>);
...@@ -16,7 +16,9 @@ ...@@ -16,7 +16,9 @@
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL( REGISTER_OP_GPU_KERNEL(
sequence_conv, ops::SequenceConvKernel<paddle::platform::GPUPlace, float>); sequence_conv, ops::SequenceConvKernel<paddle::platform::GPUPlace, float>,
ops::SequenceConvKernel<paddle::platform::GPUPlace, double>);
REGISTER_OP_GPU_KERNEL( REGISTER_OP_GPU_KERNEL(
sequence_conv_grad, sequence_conv_grad,
ops::SequenceConvGradKernel<paddle::platform::GPUPlace, float>); ops::SequenceConvGradKernel<paddle::platform::GPUPlace, float>,
ops::SequenceConvGradKernel<paddle::platform::GPUPlace, double>);
...@@ -144,7 +144,7 @@ function gen_dockerfile() { ...@@ -144,7 +144,7 @@ function gen_dockerfile() {
DOCKERFILE_GPU_ENV="" DOCKERFILE_GPU_ENV=""
DOCKERFILE_CUDNN_DSO="" DOCKERFILE_CUDNN_DSO=""
if [[ ${WITH_GPU:-OFF} == 'ON' ]]; then if [[ ${WITH_GPU:-OFF} == 'ON' ]]; then
DOCKERFILE_GPU_ENV="ENV LD_LIBRARY_PATH /usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}" DOCKERFILE_GPU_ENV="ENV LD_LIBRARY_PATH /usr/lib/x86_64-linux-gnu:\${LD_LIBRARY_PATH}"
DOCKERFILE_CUDNN_DSO="RUN ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.5 /usr/lib/x86_64-linux-gnu/libcudnn.so" DOCKERFILE_CUDNN_DSO="RUN ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.5 /usr/lib/x86_64-linux-gnu/libcudnn.so"
fi fi
......
...@@ -138,7 +138,7 @@ void Trainer::init(const std::shared_ptr<TrainerConfigHelper>& config, ...@@ -138,7 +138,7 @@ void Trainer::init(const std::shared_ptr<TrainerConfigHelper>& config,
} }
if (FLAGS_use_mkldnn) { if (FLAGS_use_mkldnn) {
CHECK_EQ(FLAGS_trainer_count, 1UL) << "MKLDNN only need 1 trainer"; CHECK_EQ(FLAGS_trainer_count, 1) << "MKLDNN only need 1 trainer";
} }
if (testing) { if (testing) {
......
...@@ -11,7 +11,6 @@ add_unittest_without_exec(test_Trainer ...@@ -11,7 +11,6 @@ add_unittest_without_exec(test_Trainer
test_Trainer.cpp) test_Trainer.cpp)
add_test(NAME test_Trainer add_test(NAME test_Trainer
COMMAND ${PADDLE_SOURCE_DIR}/paddle/.set_python_path.sh -d ${PADDLE_SOURCE_DIR}/python/ COMMAND ${PADDLE_SOURCE_DIR}/paddle/.set_python_path.sh -d ${PADDLE_SOURCE_DIR}/python/
${PYTHON_EXECUTABLE} ${PADDLE_SOURCE_DIR}/paddle/trainer/tests/gen_proto_data.py &&
${PADDLE_SOURCE_DIR}/paddle/.set_python_path.sh -d ${PADDLE_SOURCE_DIR}/python/ ${PADDLE_SOURCE_DIR}/paddle/.set_python_path.sh -d ${PADDLE_SOURCE_DIR}/python/
${CMAKE_CURRENT_BINARY_DIR}/test_Trainer ${CMAKE_CURRENT_BINARY_DIR}/test_Trainer
WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle/) WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle/)
......
#edit-mode: -*- python -*-
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# TODO(luotao02): This config is only used for unittest. It is out of date now and will be updated later.
TrainData(ProtoData(
files = 'trainer/tests/train_files.txt',
usage_ratio = 1.0,
))
TestData(ProtoData(
files = 'trainer/tests/test_files.txt'
))
default_initial_std(1)
default_decay_rate(4e-4)
default_device(0)
Inputs("features", "word", "pos", "chunk")
Outputs("crf")
Layer(
name = "features",
type = "data",
size = 4339,
)
Layer(
name = "word",
type = "data",
size = 478,
)
Layer(
name = "pos",
type = "data",
size = 45
)
Layer(
name = "chunk",
type = "data",
size = 23
)
Layer(
name = "output",
type = "mixed",
size = 23,
bias = False,
device = -1,
inputs = [
FullMatrixProjection("features", parameter_name="feature_weights"),
# TableProjection("word"),
# TableProjection("pos"),
],
)
Layer(
name = "crf",
type = "crf",
size = 23,
device = -1,
inputs = [
Input("output", parameter_name="crfw"),
"chunk"
]
)
Layer(
name = "crf_decoding",
type = "crf_decoding",
size = 23,
device = -1,
inputs = [
Input("output", parameter_name="crfw"),
"chunk"
]
)
Evaluator(
name = "error",
type = "sum",
inputs = "crf_decoding",
)
'''
# chunk evaluator cannot be used for GPU training
Evaluator(
name = "chunk_f1",
type = "chunk",
inputs = ["crf_decoding", "chunk"],
chunk_scheme = "IOB",
num_chunk_types = 11,
)
'''
Settings(
algorithm = 'sgd',
batch_size = 100,
average_window = 0.5,
max_average_window = 2500,
learning_rate = 1e-1,
learning_rate_decay_a = 5e-7,
learning_rate_decay_b = 0.75,
l1weight = 0,
l2weight = 1,
c1 = 0.0001,
backoff = 0.5,
owlqn_steps = 100,
max_backoff = 5,
)
The source diff for this file is too large to display; you can view the blob instead.
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from cStringIO import StringIO
import paddle.proto.DataFormat_pb2 as DataFormat
from google.protobuf.internal.encoder import _EncodeVarint
import logging
import pprint
logging.basicConfig(
format='[%(levelname)s %(asctime)s %(filename)s:%(lineno)s] %(message)s', )
logger = logging.getLogger('paddle')
logger.setLevel(logging.INFO)
OOV_POLICY_IGNORE = 0
OOV_POLICY_USE = 1
OOV_POLICY_ERROR = 2
num_original_columns = 3
# Feature combination patterns.
# [[-1,0], [0,0]] means previous token at column 0 and current token at
# column 0 are combined as one feature.
patterns = [
[[-2, 0]],
[[-1, 0]],
[[0, 0]],
[[1, 0]],
[[2, 0]],
[[-1, 0], [0, 0]],
[[0, 0], [1, 0]],
[[-2, 1]],
[[-1, 1]],
[[0, 1]],
[[1, 1]],
[[2, 1]],
[[-2, 1], [-1, 1]],
[[-1, 1], [0, 1]],
[[0, 1], [1, 1]],
[[1, 1], [2, 1]],
[[-2, 1], [-1, 1], [0, 1]],
[[-1, 1], [0, 1], [1, 1]],
[[0, 1], [1, 1], [2, 1]],
]
def make_features(sequence):
length = len(sequence)
num_features = len(sequence[0])
def get_features(pos):
if pos < 0:
return ['#B%s' % -pos] * num_features
if pos >= length:
return ['#E%s' % (pos - length + 1)] * num_features
return sequence[pos]
for i in xrange(length):
for pattern in patterns:
fname = '/'.join([get_features(i + pos)[f] for pos, f in pattern])
sequence[i].append(fname)
'''
Source file format:
Each line is for one timestep. The features are separated by space.
An empty line indicates end of a sequence.
cutoff: a list of numbers. If the count of a feature is smaller than this,
it will be ignored.
If oov_policy[i] is OOV_POLICY_USE, id 0 is reserved for OOV features of
the i-th column.
Returns a list of dicts, one for each column.
'''
def create_dictionaries(filename, cutoff, oov_policy):
def add_to_dict(sequence, dicts):
num_features = len(dicts)
for features in sequence:
l = len(features)
assert l == num_features, "Wrong number of features " + line
for i in xrange(l):
if features[i] in dicts[i]:
dicts[i][features[i]] += 1
else:
dicts[i][features[i]] = 1
num_features = len(cutoff)
dicts = []
for i in xrange(num_features):
dicts.append(dict())
f = open(filename, 'rb')
sequence = []
for line in f:
line = line.strip()
if not line:
make_features(sequence)
add_to_dict(sequence, dicts)
sequence = []
continue
features = line.split(' ')
sequence.append(features)
for i in xrange(num_features):
dct = dicts[i]
n = 1 if oov_policy[i] == OOV_POLICY_USE else 0
todo = []
for k, v in dct.iteritems():
if v < cutoff[i]:
todo.append(k)
else:
dct[k] = n
n += 1
if oov_policy[i] == OOV_POLICY_USE:
# placeholder so that len(dct) will be the number of features
# including OOV
dct['#OOV#'] = 0
logger.info('column %d dict size=%d, ignored %d' % (i, n, len(todo)))
for k in todo:
del dct[k]
f.close()
return dicts
def encode_varint(v):
out = StringIO()
_EncodeVarint(out.write, v)
return out.getvalue()
def write_proto(file, message):
s = message.SerializeToString()
packed_len = encode_varint(len(s))
file.write(packed_len + s)
'''
If oov_policy[i] == OOV_POLICY_USE, features in the i-th column that do not
exist in dicts[i] will be assigned id 0.
If oov_policy[i] == OOV_POLICY_ERROR, all features in the i-th column MUST exist
in dicts[i].
'''
def gen_proto_file(input_file, dicts, oov_policy, output_file):
def write_sequence(out, sequence):
num_features = len(dicts)
is_beginning = True
for features in sequence:
assert len(features) == num_features, \
"Wrong number of features: " + line
sample = DataFormat.DataSample()
for i in xrange(num_original_columns):
id = dicts[i].get(features[i], -1)
if id != -1:
sample.id_slots.append(id)
elif oov_policy[i] == OOV_POLICY_IGNORE:
sample.id_slots.append(0xffffffff)
elif oov_policy[i] == OOV_POLICY_ERROR:
logger.fatal("Unknown token: %s" % features[i])
else:
sample.id_slots.append(0)
if patterns:
dim = 0
vec = sample.vector_slots.add()
for i in xrange(num_original_columns, num_features):
id = dicts[i].get(features[i], -1)
if id != -1:
vec.ids.append(dim + id)
elif oov_policy[i] == OOV_POLICY_IGNORE:
pass
elif oov_policy[i] == OOV_POLICY_ERROR:
logger.fatal("Unknown token: %s" % features[i])
else:
vec.ids.append(dim + 0)
dim += len(dicts[i])
sample.is_beginning = is_beginning
is_beginning = False
write_proto(out, sample)
num_features = len(dicts)
f = open(input_file, 'rb')
out = open(output_file, 'wb')
header = DataFormat.DataHeader()
if patterns:
slot_def = header.slot_defs.add()
slot_def.type = DataFormat.SlotDef.VECTOR_SPARSE_NON_VALUE
slot_def.dim = sum(
[len(dicts[i]) for i in xrange(num_original_columns, len(dicts))])
logger.info("feature_dim=%s" % slot_def.dim)
for i in xrange(num_original_columns):
slot_def = header.slot_defs.add()
slot_def.type = DataFormat.SlotDef.INDEX
slot_def.dim = len(dicts[i])
write_proto(out, header)
num_sequences = 0
sequence = []
for line in f:
line = line.strip()
if not line:
make_features(sequence)
write_sequence(out, sequence)
sequence = []
num_sequences += 1
continue
features = line.split(' ')
sequence.append(features)
f.close()
out.close()
logger.info("num_sequences=%s" % num_sequences)
dict2 = {
'B-ADJP': 0,
'I-ADJP': 1,
'B-ADVP': 2,
'I-ADVP': 3,
'B-CONJP': 4,
'I-CONJP': 5,
'B-INTJ': 6,
'I-INTJ': 7,
'B-LST': 8,
'I-LST': 9,
'B-NP': 10,
'I-NP': 11,
'B-PP': 12,
'I-PP': 13,
'B-PRT': 14,
'I-PRT': 15,
'B-SBAR': 16,
'I-SBAR': 17,
'B-UCP': 18,
'I-UCP': 19,
'B-VP': 20,
'I-VP': 21,
'O': 22
}
if __name__ == '__main__':
cutoff = [3, 1, 0]
cutoff += [3] * len(patterns)
oov_policy = [OOV_POLICY_IGNORE, OOV_POLICY_ERROR, OOV_POLICY_ERROR]
oov_policy += [OOV_POLICY_IGNORE] * len(patterns)
dicts = create_dictionaries('trainer/tests/train.txt', cutoff, oov_policy)
dicts[2] = dict2
gen_proto_file('trainer/tests/train.txt', dicts, oov_policy,
'trainer/tests/train_proto.bin')
gen_proto_file('trainer/tests/test.txt', dicts, oov_policy,
'trainer/tests/test_proto.bin')
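gen_proto_data.py frames its output as a stream of records: `write_proto` prefixes each serialized protobuf message with a varint of its byte length (`encode_varint`) and appends the bytes. A hedged C++ sketch of the same framing, with a hand-rolled varint encoder standing in for the protobuf helper (`EncodeVarint`/`WriteRecord` are illustrative names):

```cpp
// Length-prefixed record framing as used by write_proto(): a base-128 varint
// length followed by that many payload bytes, repeated per record.
#include <cstdint>
#include <iostream>
#include <string>

// Protobuf-style varint: 7 bits per byte, high bit set on all but the last byte.
std::string EncodeVarint(uint64_t value) {
  std::string out;
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
  return out;
}

// Append one framed record (length prefix + payload) to the stream.
void WriteRecord(std::string& stream, const std::string& payload) {
  stream += EncodeVarint(payload.size());
  stream += payload;
}

int main() {
  std::string stream;
  WriteRecord(stream, "header-bytes");   // stands in for the serialized DataHeader
  WriteRecord(stream, "sample-bytes");   // stands in for one serialized DataSample
  std::cout << stream.size() << " bytes written\n";  // 24 payload bytes + 2 length bytes
  return 0;
}
```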
Confidence NN B-NP
in IN B-PP
the DT B-NP
pound NN I-NP
is VBZ B-VP
widely RB I-VP
expected VBN I-VP
to TO I-VP
take VB I-VP
another DT B-NP
sharp JJ I-NP
dive NN I-NP
if IN B-SBAR
trade NN B-NP
figures NNS I-NP
for IN B-PP
September NNP B-NP
, , O
due JJ B-ADJP
for IN B-PP
release NN B-NP
tomorrow NN B-NP
, , O
fail VB B-VP
to TO I-VP
show VB I-VP
a DT B-NP
substantial JJ I-NP
improvement NN I-NP
from IN B-PP
July NNP B-NP
and CC I-NP
August NNP I-NP
's POS B-NP
near-record JJ I-NP
deficits NNS I-NP
. . O
Chancellor NNP O
of IN B-PP
the DT B-NP
Exchequer NNP I-NP
Nigel NNP B-NP
Lawson NNP I-NP
's POS B-NP
restated VBN I-NP
commitment NN I-NP
to TO B-PP
a DT B-NP
firm NN I-NP
monetary JJ I-NP
policy NN I-NP
has VBZ B-VP
helped VBN I-VP
to TO I-VP
prevent VB I-VP
a DT B-NP
freefall NN I-NP
in IN B-PP
sterling NN B-NP
over IN B-PP
the DT B-NP
past JJ I-NP
week NN I-NP
. . O
But CC O
analysts NNS B-NP
reckon VBP B-VP
underlying VBG B-NP
support NN I-NP
for IN B-PP
sterling NN B-NP
has VBZ B-VP
been VBN I-VP
eroded VBN I-VP
by IN B-PP
the DT B-NP
chancellor NN I-NP
's POS B-NP
failure NN I-NP
to TO B-VP
announce VB I-VP
any DT B-NP
new JJ I-NP
policy NN I-NP
measures NNS I-NP
in IN B-PP
his PRP$ B-NP
Mansion NNP I-NP
House NNP I-NP
speech NN I-NP
last JJ B-NP
Thursday NNP I-NP
. . O
This DT B-NP
has VBZ B-VP
increased VBN I-VP
the DT B-NP
risk NN I-NP
of IN B-PP
the DT B-NP
government NN I-NP
being VBG B-VP
forced VBN I-VP
to TO I-VP
increase VB I-VP
base NN B-NP
rates NNS I-NP
to TO B-PP
16 CD B-NP
% NN I-NP
from IN B-PP
their PRP$ B-NP
current JJ I-NP
15 CD I-NP
% NN I-NP
level NN I-NP
to TO B-VP
defend VB I-VP
the DT B-NP
pound NN I-NP
, , O
economists NNS B-NP
and CC O
foreign JJ B-NP
exchange NN I-NP
market NN I-NP
analysts NNS I-NP
say VBP B-VP
. . O
`` `` O
The DT B-NP
risks NNS I-NP
for IN B-PP
sterling NN B-NP
of IN B-PP
a DT B-NP
bad JJ I-NP
trade NN I-NP
figure NN I-NP
are VBP B-VP
very RB B-ADVP
heavily RB I-ADVP
on IN B-PP
the DT B-NP
down JJ I-NP
side NN I-NP
, , O
'' '' O
said VBD B-VP
Chris NNP B-NP
Dillow NNP I-NP
, , O
senior JJ B-NP
U.K. NNP I-NP
economist NN I-NP
at IN B-PP
Nomura NNP B-NP
Research NNP I-NP
Institute NNP I-NP
. . O
`` `` O
If IN B-SBAR
there EX B-NP
is VBZ B-VP
another DT B-NP
bad JJ I-NP
trade NN I-NP
number NN I-NP
, , O
there EX B-NP
could MD B-VP
be VB I-VP
an DT B-NP
awful JJ I-NP
lot NN I-NP
of IN B-PP
pressure NN B-NP
, , O
'' '' O
noted VBD B-VP
Simon NNP B-NP
Briscoe NNP I-NP
, , O
U.K. NNP B-NP
economist NN I-NP
for IN B-PP
Midland NNP B-NP
Montagu NNP I-NP
, , O
a DT B-NP
unit NN I-NP
of IN B-PP
Midland NNP B-NP
Bank NNP I-NP
PLC NNP I-NP
. . O
Forecasts NNS B-NP
for IN B-PP
the DT B-NP
trade NN I-NP
figures NNS I-NP
range VBP B-VP
widely RB B-ADVP
, , O
but CC O
few JJ B-NP
economists NNS I-NP
expect VBP B-VP
the DT B-NP
data NNS I-NP
to TO B-VP
show VB I-VP
a DT B-NP
very RB I-NP
marked VBN I-NP
improvement NN I-NP
from IN B-PP
the DT O
# # O
2 CD O
billion CD O
-LRB- ( O
$ $ B-ADJP
3.2 CD O
billion CD O
-RRB- ) O
deficit NN B-NP
in IN B-PP
the DT B-NP
current JJ I-NP
account NN I-NP
reported VBD B-VP
for IN B-PP
August NNP B-NP
. . O
The DT B-NP
August NNP I-NP
deficit NN I-NP
and CC O
the DT B-NP
# # I-NP
2.2 CD I-NP
billion CD I-NP
gap NN I-NP
registered VBN B-VP
in IN B-PP
July NNP B-NP
are VBP B-VP
topped VBN I-VP
only RB B-ADVP
by IN B-PP
the DT B-NP
# # I-NP
2.3 CD I-NP
billion CD I-NP
deficit NN I-NP
of IN B-PP
October NNP B-NP
1988 CD I-NP
. . O
Sanjay NNP B-NP
Joshi NNP I-NP
, , O
European JJ B-NP
economist NN I-NP
at IN B-PP
Baring NNP B-NP
Brothers NNPS I-NP
& CC I-NP
Co. NNP I-NP
, , O
said VBD B-VP
there EX B-NP
is VBZ B-VP
no DT B-NP
sign NN I-NP
that IN B-SBAR
Britain NNP B-NP
's POS B-NP
manufacturing NN I-NP
industry NN I-NP
is VBZ B-VP
transforming VBG I-VP
itself PRP B-NP
to TO B-VP
boost VB I-VP
exports NNS B-NP
. . O
At IN B-PP
the DT B-NP
same JJ I-NP
time NN I-NP
, , O
he PRP B-NP
remains VBZ B-VP
fairly RB B-ADJP
pessimistic JJ I-ADJP
about IN B-PP
the DT B-NP
outlook NN I-NP
for IN B-PP
imports NNS B-NP
, , O
given VBN B-PP
continued VBD B-NP
high JJ I-NP
consumer NN I-NP
and CC I-NP
capital NN I-NP
goods NNS I-NP
inflows NNS I-NP
. . O
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
However RB B-ADVP
, , O
Mr. NNP B-NP
Dillow NNP I-NP
said VBD B-VP
he PRP B-NP
believes VBZ B-VP
that IN B-SBAR
a DT B-NP
reduction NN I-NP
in IN B-PP
raw JJ B-NP
material NN I-NP
stockbuilding VBG I-NP
by IN B-PP
industry NN B-NP
could MD B-VP
lead VB I-VP
to TO B-PP
a DT B-NP
sharp JJ I-NP
drop NN I-NP
in IN B-PP
imports NNS B-NP
. . O
Combined VBN B-PP
with IN B-PP
at IN B-ADVP
least JJS I-ADVP
some DT B-NP
rebound NN I-NP
in IN B-PP
exports NNS B-NP
after IN B-PP
August NNP B-NP
's POS B-NP
unexpected JJ I-NP
decline NN I-NP
, , O
the DT B-NP
deficit NN I-NP
could MD B-VP
narrow VB I-VP
to TO B-PP
as RB B-NP
little JJ I-NP
as IN I-NP
# # I-NP
1.3 CD I-NP
billion CD I-NP
. . O
Mr. NNP B-NP
Briscoe NNP I-NP
, , O
who WP B-NP
also RB B-ADVP
forecasts VBZ B-VP
a DT B-NP
# # I-NP
1.3 CD I-NP
billion CD I-NP
current JJ I-NP
account NN I-NP
gap NN I-NP
, , O
warns VBZ B-VP
that IN B-SBAR
even RB B-SBAR
if IN I-SBAR
the DT B-NP
trade NN I-NP
figures NNS I-NP
are VBP B-VP
bullish JJ B-ADJP
for IN B-PP
sterling NN B-NP
, , O
the DT B-NP
currency NN I-NP
wo MD B-VP
n't RB I-VP
advance VB I-VP
much JJ B-NP
because IN B-SBAR
investors NNS B-NP
will MD B-VP
want VB I-VP
to TO I-VP
see VB I-VP
further JJ B-NP
evidence NN I-NP
of IN B-PP
the DT B-NP
turnaround NN I-NP
before IN B-PP
adjusting VBG B-VP
positions NNS B-NP
. . O
Nevertheless RB B-ADVP
, , O
he PRP B-NP
noted VBD B-VP
, , O
`` `` O
No DT B-NP
one PRP I-NP
will MD B-VP
want VB I-VP
to TO I-VP
go VB I-VP
into IN B-PP
the DT B-NP
trade NN I-NP
figures NNS I-NP
without IN B-PP
a DT B-NP
flat JJ I-NP
position NN I-NP
'' '' O
in IN B-PP
the DT B-NP
pound NN I-NP
. . O
Meanwhile RB B-ADVP
, , O
overall JJ B-NP
evidence NN I-NP
on IN B-PP
the DT B-NP
economy NN I-NP
remains VBZ B-VP
fairly RB B-ADJP
clouded VBN I-ADJP
. . O
In IN B-PP
his PRP$ B-NP
Mansion NNP I-NP
House NNP I-NP
speech NN I-NP
, , O
Mr. NNP B-NP
Lawson NNP I-NP
warned VBD B-VP
that IN B-SBAR
a DT B-NP
further JJ I-NP
slowdown NN I-NP
can MD B-VP
be VB I-VP
expected VBN I-VP
as IN B-SBAR
the DT B-NP
impact NN I-NP
of IN B-PP
the DT B-NP
last JJ I-NP
rise NN I-NP
in IN B-PP
interest NN B-NP
rates NNS I-NP
earlier RBR B-NP
this DT I-NP
month NN I-NP
takes VBZ B-VP
effect NN B-NP
. . O
U.K. JJ B-NP
base NN I-NP
rates NNS I-NP
are VBP B-VP
at IN B-PP
their PRP$ B-NP
highest JJS I-NP
level NN I-NP
in IN B-PP
eight CD B-NP
years NNS I-NP
. . O
But CC O
consumer NN B-NP
expenditure NN I-NP
data NNS I-NP
released VBD B-VP
Friday NNP B-NP
do VBP B-VP
n't RB I-VP
suggest VB I-VP
that IN B-SBAR
the DT B-NP
U.K. NNP I-NP
economy NN I-NP
is VBZ B-VP
slowing VBG I-VP
that DT B-ADVP
quickly RB I-ADVP
. . O
The DT B-NP
figures NNS I-NP
show VBP B-VP
that DT O
spending NN B-NP
rose VBD B-VP
0.1 CD B-NP
% NN I-NP
in IN B-PP
the DT B-NP
third JJ I-NP
quarter NN I-NP
from IN B-PP
the DT B-NP
second JJ I-NP
quarter NN I-NP
and CC O
was VBD B-VP
up IN B-ADVP
3.8 CD B-NP
% NN I-NP
from IN B-PP
a DT B-NP
year NN I-NP
ago RB B-ADVP
. . O
This DT B-NP
compares VBZ B-VP
with IN B-PP
a DT B-NP
1.6 CD I-NP
% NN I-NP
rise NN I-NP
in IN B-PP
the DT B-NP
second NN I-NP
from IN B-PP
the DT B-NP
first JJ I-NP
quarter NN I-NP
and CC O
a DT B-NP
5.4 CD I-NP
% NN I-NP
increase NN I-NP
from IN B-PP
the DT B-NP
second JJ I-NP
quarter NN I-NP
of IN B-PP
1988 CD B-NP
. . O
Mr. NNP B-NP
Dillow NNP I-NP
said VBD B-VP
the DT B-NP
data NNS I-NP
show VBP B-VP
the DT B-NP
economy NN I-NP
`` `` O
is VBZ B-VP
still RB B-ADVP
quite RB B-ADJP
strong JJ I-ADJP
, , O
'' '' O
but CC O
suggestions NNS B-NP
that IN B-SBAR
much NN B-NP
of IN B-PP
the DT B-NP
spending NN I-NP
went VBD B-VP
on IN B-PP
services NNS B-NP
rather RB B-PP
than IN I-PP
consumer NN B-NP
goods NNS I-NP
should MD B-VP
reduce VB I-VP
fears NNS B-NP
of IN B-PP
more JJR B-NP
import NN I-NP
rises NNS I-NP
. . O
Certainly RB B-ADVP
, , O
the DT B-NP
chancellor NN I-NP
has VBZ B-VP
made VBN I-VP
it PRP B-NP
clear JJ B-ADJP
that IN B-SBAR
he PRP B-NP
is VBZ B-VP
prepared VBN I-VP
to TO I-VP
increase VB I-VP
interest NN B-NP
rates NNS I-NP
again RB B-ADVP
if IN B-SBAR
necessary JJ B-ADJP
to TO B-VP
both DT I-VP
ensure VB I-VP
that IN B-SBAR
a DT B-NP
substantial JJ I-NP
slowdown NN I-NP
does VBZ B-VP
take VB I-VP
place NN B-NP
and CC O
that DT O
sterling NN B-NP
does VBZ B-VP
n't RB I-VP
decline VB I-VP
further JJ B-ADVP
. . O
Thursday NNP B-NP
, , O
he PRP B-NP
reminded VBD B-VP
his PRP$ B-NP
audience NN I-NP
that IN B-SBAR
the DT B-NP
government NN I-NP
`` `` O
can MD B-VP
not RB I-VP
allow VB I-VP
the DT B-NP
necessary JJ I-NP
rigor NN I-NP
of IN B-PP
monetary JJ B-NP
policy NN I-NP
to TO B-VP
be VB I-VP
undermined VBN I-VP
by IN B-PP
exchange NN B-NP
rate NN I-NP
weakness NN I-NP
. . O
'' '' O
Analysts NNS B-NP
agree VBP B-VP
there EX B-NP
is VBZ B-VP
little JJ B-NP
holding NN B-VP
sterling NN B-NP
firm NN B-ADJP
at IN B-PP
the DT B-NP
moment NN I-NP
other JJ B-ADJP
than IN B-PP
Mr. NNP B-NP
Lawson NNP I-NP
's POS B-NP
promise NN I-NP
that IN B-SBAR
rates NNS B-NP
will MD B-VP
be VB I-VP
pushed VBN I-VP
higher JJR B-ADJP
if IN B-SBAR
necessary JJ B-ADJP
. . O
And CC O
, , O
they PRP B-NP
warn VBP B-VP
, , O
any DT B-NP
further JJ I-NP
drop NN I-NP
in IN B-PP
the DT B-NP
government NN I-NP
's POS B-NP
popularity NN I-NP
could MD B-VP
swiftly RB I-VP
make VB I-VP
this DT B-NP
promise NN I-NP
sound NN B-VP
hollow JJ B-ADJP
. . O
Sterling NNP B-NP
was VBD B-VP
already RB I-VP
showing VBG I-VP
some DT B-NP
signs NNS I-NP
of IN B-PP
a DT B-NP
lack NN I-NP
of IN B-PP
confidence NN B-NP
in IN B-PP
Mr. NNP B-NP
Lawson NNP I-NP
's POS B-NP
promise NN I-NP
Friday NNP B-NP
. . O
In IN B-PP
European JJ B-NP
trading NN I-NP
it PRP B-NP
declined VBD B-VP
to TO B-PP
$ $ B-NP
1.5890 CD I-NP
and CC O
2.9495 CD B-NP
marks NNS I-NP
from IN B-PP
$ $ B-NP
1.5940 CD I-NP
and CC O
2.9429 CD B-NP
marks NNS I-NP
late JJ B-NP
Thursday NNP I-NP
. . O
Economists NNS B-NP
suggested VBD B-VP
that IN B-SBAR
if IN B-SBAR
the DT B-NP
pound NN I-NP
falls VBZ B-VP
much JJ B-NP
below IN B-PP
2.90 CD B-NP
marks NNS I-NP
, , O
the DT B-NP
government NN I-NP
will MD B-VP
be VB I-VP
forced VBN I-VP
to TO I-VP
increase VB I-VP
rates NNS B-NP
to TO B-PP
16 CD B-NP
% NN I-NP
, , O
both DT B-VP
to TO I-VP
halt VB B-VP
any DT B-NP
further JJ I-NP
decline NN I-NP
and CC O
ensure VB B-VP
that IN B-SBAR
the DT B-NP
balance NN I-NP
of IN B-PP
monetary JJ B-NP
policy NN I-NP
remains VBZ B-VP
unchanged JJ B-ADJP
. . O
Friday NNP B-NP
's POS B-NP
Market NNP I-NP
Activity NN I-NP
The DT B-NP
dollar NN I-NP
posted VBD B-VP
gains NNS B-NP
in IN B-PP
quiet JJ B-NP
trading NN I-NP
as IN B-SBAR
concerns NNS B-NP
about IN B-PP
equities NNS B-NP
abated VBN B-VP
. . O
Foreign JJ B-NP
exchange NN I-NP
dealers NNS I-NP
said VBD B-VP
that IN B-SBAR
the DT B-NP
currency NN I-NP
market NN I-NP
has VBZ B-VP
begun VBN I-VP
to TO I-VP
distance VB I-VP
itself PRP B-NP
from IN B-PP
the DT B-NP
volatile JJ I-NP
stock NN I-NP
exchange NN I-NP
, , O
which WDT B-NP
has VBZ B-VP
preoccupied VBN I-VP
the DT B-NP
market NN I-NP
since IN B-PP
Oct. NNP B-NP
13 CD I-NP
, , O
when WRB B-ADVP
the DT B-NP
Dow NNP I-NP
Jones NNP I-NP
Industrial NNP I-NP
Average NNP I-NP
plunged VBD B-VP
more JJR B-NP
than IN I-NP
190 CD I-NP
points NNS I-NP
. . O
Currency NN B-NP
analysts NNS I-NP
predict VBP B-VP
that IN B-SBAR
in IN B-PP
the DT B-NP
coming VBG I-NP
week NN I-NP
the DT B-NP
foreign JJ I-NP
exchange NN I-NP
market NN I-NP
will MD B-VP
shift VB I-VP
its PRP$ B-NP
focus NN I-NP
back RB B-ADVP
to TO B-PP
economic JJ B-NP
fundamentals NNS I-NP
, , O
keeping VBG B-VP
a DT B-NP
close NN I-NP
eye NN I-NP
out IN B-ADVP
for IN B-PP
any DT B-NP
signs NNS I-NP
of IN B-PP
monetary JJ B-NP
easing NN I-NP
by IN B-PP
U.S. NNP B-NP
Federal NNP I-NP
Reserve NNP I-NP
. . O
Late RB B-ADVP
in IN B-PP
the DT B-NP
New NNP I-NP
York NNP I-NP
trading NN I-NP
day NN I-NP
, , O
the DT B-NP
dollar NN I-NP
was VBD B-VP
quoted VBN I-VP
at IN B-PP
1.8578 CD B-NP
marks NNS I-NP
, , O
up IN B-ADVP
from IN B-PP
1.8470 CD B-NP
marks NNS I-NP
late JJ B-NP
Thursday NNP I-NP
in IN B-PP
New NNP B-NP
York NNP I-NP
. . O
The DT B-NP
U.S. NNP I-NP
currency NN I-NP
was VBD B-VP
also RB I-VP
changing VBG I-VP
hands NNS B-NP
at IN B-PP
142.43 CD B-NP
yen NN I-NP
, , O
up IN B-ADVP
from IN B-PP
141.70 CD B-NP
yen NN I-NP
in IN B-PP
New NNP B-NP
York NNP I-NP
late JJ B-NP
Thursday NNP I-NP
. . O
In IN B-PP
Tokyo NNP B-NP
on IN B-PP
Monday NNP B-NP
, , O
the DT B-NP
U.S. NNP I-NP
currency NN I-NP
opened VBD B-VP
for IN B-PP
trading NN B-NP
at IN B-PP
141.95 CD B-NP
yen NN I-NP
, , O
up IN B-ADVP
from IN B-PP
Friday NNP B-NP
's POS B-NP
Tokyo NNP I-NP
...@@ -24,7 +24,6 @@ using namespace std; // NOLINT ...@@ -24,7 +24,6 @@ using namespace std; // NOLINT
static const string& configFile1 = "trainer/tests/sample_trainer_config.conf"; static const string& configFile1 = "trainer/tests/sample_trainer_config.conf";
static const string& configFile2 = static const string& configFile2 =
"trainer/tests/sample_trainer_config_hsigmoid.conf"; "trainer/tests/sample_trainer_config_hsigmoid.conf";
static const string& configFile3 = "trainer/tests/chunking.conf";
static const string& configFile4 = static const string& configFile4 =
"trainer/tests/sample_trainer_config_parallel.conf"; "trainer/tests/sample_trainer_config_parallel.conf";
...@@ -95,13 +94,6 @@ TEST(checkGradient, multi) { ...@@ -95,13 +94,6 @@ TEST(checkGradient, multi) {
TEST(checkGradient, hsigmoid) { checkGradientTest(configFile2, false, false); } TEST(checkGradient, hsigmoid) { checkGradientTest(configFile2, false, false); }
TEST(checkGradient, chunk) {
checkGradientTest(configFile3, false, false);
#ifdef PADDLE_WITH_CUDA
checkGradientTest(configFile3, true, true);
#endif
}
TEST(checkGradient, non_parallel) { TEST(checkGradient, non_parallel) {
checkGradientTest(configFile4, false, false); checkGradientTest(configFile4, false, false);
} }
......
...@@ -15,12 +15,7 @@ ...@@ -15,12 +15,7 @@
from paddle.trainer_config_helpers import * from paddle.trainer_config_helpers import *
TrainData(ProtoData( TrainData(SimpleData(
files = "dummy_list",
constant_slots = [1.0],
async_load_data = True))
TestData(SimpleData(
files = "trainer/tests/sample_filelist.txt", files = "trainer/tests/sample_filelist.txt",
feat_dim = 3, feat_dim = 3,
context_len = 0, context_len = 0,
......
Confidence NN B-NP
in IN B-PP
the DT B-NP
pound NN I-NP
is VBZ B-VP
widely RB I-VP
expected VBN I-VP
to TO I-VP
take VB I-VP
another DT B-NP
sharp JJ I-NP
dive NN I-NP
if IN B-SBAR
trade NN B-NP
figures NNS I-NP
for IN B-PP
September NNP B-NP
, , O
due JJ B-ADJP
for IN B-PP
release NN B-NP
tomorrow NN B-NP
, , O
fail VB B-VP
to TO I-VP
show VB I-VP
a DT B-NP
substantial JJ I-NP
improvement NN I-NP
from IN B-PP
July NNP B-NP
and CC I-NP
August NNP I-NP
's POS B-NP
near-record JJ I-NP
deficits NNS I-NP
. . O
Chancellor NNP O
of IN B-PP
the DT B-NP
Exchequer NNP I-NP
Nigel NNP B-NP
Lawson NNP I-NP
's POS B-NP
restated VBN I-NP
commitment NN I-NP
to TO B-PP
a DT B-NP
firm NN I-NP
monetary JJ I-NP
policy NN I-NP
has VBZ B-VP
helped VBN I-VP
to TO I-VP
prevent VB I-VP
a DT B-NP
freefall NN I-NP
in IN B-PP
sterling NN B-NP
over IN B-PP
the DT B-NP
past JJ I-NP
week NN I-NP
. . O
But CC O
analysts NNS B-NP
reckon VBP B-VP
underlying VBG B-NP
support NN I-NP
for IN B-PP
sterling NN B-NP
has VBZ B-VP
been VBN I-VP
eroded VBN I-VP
by IN B-PP
the DT B-NP
chancellor NN I-NP
's POS B-NP
failure NN I-NP
to TO B-VP
announce VB I-VP
any DT B-NP
new JJ I-NP
policy NN I-NP
measures NNS I-NP
in IN B-PP
his PRP$ B-NP
Mansion NNP I-NP
House NNP I-NP
speech NN I-NP
last JJ B-NP
Thursday NNP I-NP
. . O
This DT B-NP
has VBZ B-VP
increased VBN I-VP
the DT B-NP
risk NN I-NP
of IN B-PP
the DT B-NP
government NN I-NP
being VBG B-VP
forced VBN I-VP
to TO I-VP
increase VB I-VP
base NN B-NP
rates NNS I-NP
to TO B-PP
16 CD B-NP
% NN I-NP
from IN B-PP
their PRP$ B-NP
current JJ I-NP
15 CD I-NP
% NN I-NP
level NN I-NP
to TO B-VP
defend VB I-VP
the DT B-NP
pound NN I-NP
, , O
economists NNS B-NP
and CC O
foreign JJ B-NP
exchange NN I-NP
market NN I-NP
analysts NNS I-NP
say VBP B-VP
. . O
`` `` O
The DT B-NP
risks NNS I-NP
for IN B-PP
sterling NN B-NP
of IN B-PP
a DT B-NP
bad JJ I-NP
trade NN I-NP
figure NN I-NP
are VBP B-VP
very RB B-ADVP
heavily RB I-ADVP
on IN B-PP
the DT B-NP
down JJ I-NP
side NN I-NP
, , O
'' '' O
said VBD B-VP
Chris NNP B-NP
Dillow NNP I-NP
, , O
senior JJ B-NP
U.K. NNP I-NP
economist NN I-NP
at IN B-PP
Nomura NNP B-NP
Research NNP I-NP
Institute NNP I-NP
. . O
`` `` O
If IN B-SBAR
there EX B-NP
is VBZ B-VP
another DT B-NP
bad JJ I-NP
trade NN I-NP
number NN I-NP
, , O
there EX B-NP
could MD B-VP
be VB I-VP
an DT B-NP
awful JJ I-NP
lot NN I-NP
of IN B-PP
pressure NN B-NP
, , O
'' '' O
noted VBD B-VP
Simon NNP B-NP
Briscoe NNP I-NP
, , O
U.K. NNP B-NP
economist NN I-NP
for IN B-PP
Midland NNP B-NP
Montagu NNP I-NP
, , O
a DT B-NP
unit NN I-NP
of IN B-PP
Midland NNP B-NP
Bank NNP I-NP
PLC NNP I-NP
. . O
Forecasts NNS B-NP
for IN B-PP
the DT B-NP
trade NN I-NP
figures NNS I-NP
range VBP B-VP
widely RB B-ADVP
, , O
but CC O
few JJ B-NP
economists NNS I-NP
expect VBP B-VP
the DT B-NP
data NNS I-NP
to TO B-VP
show VB I-VP
a DT B-NP
very RB I-NP
marked VBN I-NP
improvement NN I-NP
from IN B-PP
the DT O
# # O
2 CD O
billion CD O
-LRB- ( O
$ $ B-ADJP
3.2 CD O
billion CD O
-RRB- ) O
deficit NN B-NP
in IN B-PP
the DT B-NP
current JJ I-NP
account NN I-NP
reported VBD B-VP
for IN B-PP
August NNP B-NP
. . O
The DT B-NP
August NNP I-NP
deficit NN I-NP
and CC O
the DT B-NP
# # I-NP
2.2 CD I-NP
billion CD I-NP
gap NN I-NP
registered VBN B-VP
in IN B-PP
July NNP B-NP
are VBP B-VP
topped VBN I-VP
only RB B-ADVP
by IN B-PP
the DT B-NP
# # I-NP
2.3 CD I-NP
billion CD I-NP
deficit NN I-NP
of IN B-PP
October NNP B-NP
1988 CD I-NP
. . O
Sanjay NNP B-NP
Joshi NNP I-NP
, , O
European JJ B-NP
economist NN I-NP
at IN B-PP
Baring NNP B-NP
Brothers NNPS I-NP
& CC I-NP
Co. NNP I-NP
, , O
said VBD B-VP
there EX B-NP
is VBZ B-VP
no DT B-NP
sign NN I-NP
that IN B-SBAR
Britain NNP B-NP
's POS B-NP
manufacturing NN I-NP
industry NN I-NP
is VBZ B-VP
transforming VBG I-VP
itself PRP B-NP
to TO B-VP
boost VB I-VP
exports NNS B-NP
. . O
At IN B-PP
the DT B-NP
same JJ I-NP
time NN I-NP
, , O
he PRP B-NP
remains VBZ B-VP
fairly RB B-ADJP
pessimistic JJ I-ADJP
about IN B-PP
the DT B-NP
outlook NN I-NP
for IN B-PP
imports NNS B-NP
, , O
given VBN B-PP
continued VBD B-NP
high JJ I-NP
consumer NN I-NP
and CC I-NP
capital NN I-NP
goods NNS I-NP
inflows NNS I-NP
. . O
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
However RB B-ADVP
, , O
Mr. NNP B-NP
Dillow NNP I-NP
said VBD B-VP
he PRP B-NP
believes VBZ B-VP
that IN B-SBAR
a DT B-NP
reduction NN I-NP
in IN B-PP
raw JJ B-NP
material NN I-NP
stockbuilding VBG I-NP
by IN B-PP
industry NN B-NP
could MD B-VP
lead VB I-VP
to TO B-PP
a DT B-NP
sharp JJ I-NP
drop NN I-NP
in IN B-PP
imports NNS B-NP
. . O
Combined VBN B-PP
with IN B-PP
at IN B-ADVP
least JJS I-ADVP
some DT B-NP
rebound NN I-NP
in IN B-PP
exports NNS B-NP
after IN B-PP
August NNP B-NP
's POS B-NP
unexpected JJ I-NP
decline NN I-NP
, , O
the DT B-NP
deficit NN I-NP
could MD B-VP
narrow VB I-VP
to TO B-PP
as RB B-NP
little JJ I-NP
as IN I-NP
# # I-NP
1.3 CD I-NP
billion CD I-NP
. . O
Mr. NNP B-NP
Briscoe NNP I-NP
, , O
who WP B-NP
also RB B-ADVP
forecasts VBZ B-VP
a DT B-NP
# # I-NP
1.3 CD I-NP
billion CD I-NP
current JJ I-NP
account NN I-NP
gap NN I-NP
, , O
warns VBZ B-VP
that IN B-SBAR
even RB B-SBAR
if IN I-SBAR
the DT B-NP
trade NN I-NP
figures NNS I-NP
are VBP B-VP
bullish JJ B-ADJP
for IN B-PP
sterling NN B-NP
, , O
the DT B-NP
currency NN I-NP
wo MD B-VP
n't RB I-VP
advance VB I-VP
much JJ B-NP
because IN B-SBAR
investors NNS B-NP
will MD B-VP
want VB I-VP
to TO I-VP
see VB I-VP
further JJ B-NP
evidence NN I-NP
of IN B-PP
the DT B-NP
turnaround NN I-NP
before IN B-PP
adjusting VBG B-VP
positions NNS B-NP
. . O
Nevertheless RB B-ADVP
, , O
he PRP B-NP
noted VBD B-VP
, , O
`` `` O
No DT B-NP
one PRP I-NP
will MD B-VP
want VB I-VP
to TO I-VP
go VB I-VP
into IN B-PP
the DT B-NP
trade NN I-NP
figures NNS I-NP
without IN B-PP
a DT B-NP
flat JJ I-NP
position NN I-NP
'' '' O
in IN B-PP
the DT B-NP
pound NN I-NP
. . O
Meanwhile RB B-ADVP
, , O
overall JJ B-NP
evidence NN I-NP
on IN B-PP
the DT B-NP
economy NN I-NP
remains VBZ B-VP
fairly RB B-ADJP
clouded VBN I-ADJP
. . O
In IN B-PP
his PRP$ B-NP
Mansion NNP I-NP
House NNP I-NP
speech NN I-NP
, , O
Mr. NNP B-NP
Lawson NNP I-NP
warned VBD B-VP
that IN B-SBAR
a DT B-NP
further JJ I-NP
slowdown NN I-NP
can MD B-VP
be VB I-VP
expected VBN I-VP
as IN B-SBAR
the DT B-NP
impact NN I-NP
of IN B-PP
the DT B-NP
last JJ I-NP
rise NN I-NP
in IN B-PP
interest NN B-NP
rates NNS I-NP
earlier RBR B-NP
this DT I-NP
month NN I-NP
takes VBZ B-VP
effect NN B-NP
. . O
U.K. JJ B-NP
base NN I-NP
rates NNS I-NP
are VBP B-VP
at IN B-PP
their PRP$ B-NP
highest JJS I-NP
level NN I-NP
in IN B-PP
eight CD B-NP
years NNS I-NP
. . O
But CC O
consumer NN B-NP
expenditure NN I-NP
data NNS I-NP
released VBD B-VP
Friday NNP B-NP
do VBP B-VP
n't RB I-VP
suggest VB I-VP
that IN B-SBAR
the DT B-NP
U.K. NNP I-NP
economy NN I-NP
is VBZ B-VP
slowing VBG I-VP
that DT B-ADVP
quickly RB I-ADVP
. . O
The DT B-NP
figures NNS I-NP
show VBP B-VP
that DT O
spending NN B-NP
rose VBD B-VP
0.1 CD B-NP
% NN I-NP
in IN B-PP
the DT B-NP
third JJ I-NP
quarter NN I-NP
from IN B-PP
the DT B-NP
second JJ I-NP
quarter NN I-NP
and CC O
was VBD B-VP
up IN B-ADVP
3.8 CD B-NP
% NN I-NP
from IN B-PP
a DT B-NP
year NN I-NP
ago RB B-ADVP
. . O
This DT B-NP
compares VBZ B-VP
with IN B-PP
a DT B-NP
1.6 CD I-NP
% NN I-NP
rise NN I-NP
in IN B-PP
the DT B-NP
second NN I-NP
from IN B-PP
the DT B-NP
first JJ I-NP
quarter NN I-NP
and CC O
a DT B-NP
5.4 CD I-NP
% NN I-NP
increase NN I-NP
from IN B-PP
the DT B-NP
second JJ I-NP
quarter NN I-NP
of IN B-PP
1988 CD B-NP
. . O
Mr. NNP B-NP
Dillow NNP I-NP
said VBD B-VP
the DT B-NP
data NNS I-NP
show VBP B-VP
the DT B-NP
economy NN I-NP
`` `` O
is VBZ B-VP
still RB B-ADVP
quite RB B-ADJP
strong JJ I-ADJP
, , O
'' '' O
but CC O
suggestions NNS B-NP
that IN B-SBAR
much NN B-NP
of IN B-PP
the DT B-NP
spending NN I-NP
went VBD B-VP
on IN B-PP
services NNS B-NP
rather RB B-PP
than IN I-PP
consumer NN B-NP
goods NNS I-NP
should MD B-VP
reduce VB I-VP
fears NNS B-NP
of IN B-PP
more JJR B-NP
import NN I-NP
rises NNS I-NP
. . O
Certainly RB B-ADVP
, , O
the DT B-NP
chancellor NN I-NP
has VBZ B-VP
made VBN I-VP
it PRP B-NP
clear JJ B-ADJP
that IN B-SBAR
he PRP B-NP
is VBZ B-VP
prepared VBN I-VP
to TO I-VP
increase VB I-VP
interest NN B-NP
rates NNS I-NP
again RB B-ADVP
if IN B-SBAR
necessary JJ B-ADJP
to TO B-VP
both DT I-VP
ensure VB I-VP
that IN B-SBAR
a DT B-NP
substantial JJ I-NP
slowdown NN I-NP
does VBZ B-VP
take VB I-VP
place NN B-NP
and CC O
that DT O
sterling NN B-NP
does VBZ B-VP
n't RB I-VP
decline VB I-VP
further JJ B-ADVP
. . O
Thursday NNP B-NP
, , O
he PRP B-NP
reminded VBD B-VP
his PRP$ B-NP
audience NN I-NP
that IN B-SBAR
the DT B-NP
government NN I-NP
`` `` O
can MD B-VP
not RB I-VP
allow VB I-VP
the DT B-NP
necessary JJ I-NP
rigor NN I-NP
of IN B-PP
monetary JJ B-NP
policy NN I-NP
to TO B-VP
be VB I-VP
undermined VBN I-VP
by IN B-PP
exchange NN B-NP
rate NN I-NP
weakness NN I-NP
. . O
'' '' O
Analysts NNS B-NP
agree VBP B-VP
there EX B-NP
is VBZ B-VP
little JJ B-NP
holding NN B-VP
sterling NN B-NP
firm NN B-ADJP
at IN B-PP
the DT B-NP
moment NN I-NP
other JJ B-ADJP
than IN B-PP
Mr. NNP B-NP
Lawson NNP I-NP
's POS B-NP
promise NN I-NP
that IN B-SBAR
rates NNS B-NP
will MD B-VP
be VB I-VP
pushed VBN I-VP
higher JJR B-ADJP
if IN B-SBAR
necessary JJ B-ADJP
. . O
And CC O
, , O
they PRP B-NP
warn VBP B-VP
, , O
any DT B-NP
further JJ I-NP
drop NN I-NP
in IN B-PP
the DT B-NP
government NN I-NP
's POS B-NP
popularity NN I-NP
could MD B-VP
swiftly RB I-VP
make VB I-VP
this DT B-NP
promise NN I-NP
sound NN B-VP
hollow JJ B-ADJP
. . O
Sterling NNP B-NP
was VBD B-VP
already RB I-VP
showing VBG I-VP
some DT B-NP
signs NNS I-NP
of IN B-PP
a DT B-NP
lack NN I-NP
of IN B-PP
confidence NN B-NP
in IN B-PP
Mr. NNP B-NP
Lawson NNP I-NP
's POS B-NP
promise NN I-NP
Friday NNP B-NP
. . O
In IN B-PP
European JJ B-NP
trading NN I-NP
it PRP B-NP
declined VBD B-VP
to TO B-PP
$ $ B-NP
1.5890 CD I-NP
and CC O
2.9495 CD B-NP
marks NNS I-NP
from IN B-PP
$ $ B-NP
1.5940 CD I-NP
and CC O
2.9429 CD B-NP
marks NNS I-NP
late JJ B-NP
Thursday NNP I-NP
. . O
Economists NNS B-NP
suggested VBD B-VP
that IN B-SBAR
if IN B-SBAR
the DT B-NP
pound NN I-NP
falls VBZ B-VP
much JJ B-NP
below IN B-PP
2.90 CD B-NP
marks NNS I-NP
, , O
the DT B-NP
government NN I-NP
will MD B-VP
be VB I-VP
forced VBN I-VP
to TO I-VP
increase VB I-VP
rates NNS B-NP
to TO B-PP
16 CD B-NP
% NN I-NP
, , O
both DT B-VP
to TO I-VP
halt VB B-VP
any DT B-NP
further JJ I-NP
decline NN I-NP
and CC O
ensure VB B-VP
that IN B-SBAR
the DT B-NP
balance NN I-NP
of IN B-PP
monetary JJ B-NP
policy NN I-NP
remains VBZ B-VP
unchanged JJ B-ADJP
. . O
Friday NNP B-NP
's POS B-NP
Market NNP I-NP
Activity NN I-NP
The DT B-NP
dollar NN I-NP
posted VBD B-VP
gains NNS B-NP
in IN B-PP
quiet JJ B-NP
trading NN I-NP
as IN B-SBAR
concerns NNS B-NP
about IN B-PP
equities NNS B-NP
abated VBN B-VP
. . O
Foreign JJ B-NP
exchange NN I-NP
dealers NNS I-NP
said VBD B-VP
that IN B-SBAR
the DT B-NP
currency NN I-NP
market NN I-NP
has VBZ B-VP
begun VBN I-VP
to TO I-VP
distance VB I-VP
itself PRP B-NP
from IN B-PP
the DT B-NP
volatile JJ I-NP
stock NN I-NP
exchange NN I-NP
, , O
which WDT B-NP
has VBZ B-VP
preoccupied VBN I-VP
the DT B-NP
market NN I-NP
since IN B-PP
Oct. NNP B-NP
13 CD I-NP
, , O
when WRB B-ADVP
the DT B-NP
Dow NNP I-NP
Jones NNP I-NP
Industrial NNP I-NP
Average NNP I-NP
plunged VBD B-VP
more JJR B-NP
than IN I-NP
190 CD I-NP
points NNS I-NP
. . O
Currency NN B-NP
analysts NNS I-NP
predict VBP B-VP
that IN B-SBAR
in IN B-PP
the DT B-NP
coming VBG I-NP
week NN I-NP
the DT B-NP
foreign JJ I-NP
exchange NN I-NP
market NN I-NP
will MD B-VP
shift VB I-VP
its PRP$ B-NP
focus NN I-NP
back RB B-ADVP
to TO B-PP
economic JJ B-NP
fundamentals NNS I-NP
, , O
keeping VBG B-VP
a DT B-NP
close NN I-NP
eye NN I-NP
out IN B-ADVP
for IN B-PP
any DT B-NP
signs NNS I-NP
of IN B-PP
monetary JJ B-NP
easing NN I-NP
by IN B-PP
U.S. NNP B-NP
Federal NNP I-NP
Reserve NNP I-NP
. . O
Late RB B-ADVP
in IN B-PP
the DT B-NP
New NNP I-NP
York NNP I-NP
trading NN I-NP
day NN I-NP
, , O
the DT B-NP
dollar NN I-NP
was VBD B-VP
quoted VBN I-VP
at IN B-PP
1.8578 CD B-NP
marks NNS I-NP
, , O
up IN B-ADVP
from IN B-PP
1.8470 CD B-NP
marks NNS I-NP
late JJ B-NP
Thursday NNP I-NP
in IN B-PP
New NNP B-NP
York NNP I-NP
. . O
The DT B-NP
U.S. NNP I-NP
currency NN I-NP
was VBD B-VP
also RB I-VP
changing VBG I-VP
hands NNS B-NP
at IN B-PP
142.43 CD B-NP
yen NN I-NP
, , O
up IN B-ADVP
from IN B-PP
141.70 CD B-NP
yen NN I-NP
in IN B-PP
New NNP B-NP
York NNP I-NP
late JJ B-NP
Thursday NNP I-NP
. . O
In IN B-PP
Tokyo NNP B-NP
on IN B-PP
Monday NNP B-NP
, , O
the DT B-NP
U.S. NNP I-NP
currency NN I-NP
opened VBD B-VP
for IN B-PP
trading NN B-NP
at IN B-PP
141.95 CD B-NP
yen NN I-NP
, , O
up IN B-ADVP
from IN B-PP
Friday NNP B-NP
's POS B-NP
Tokyo NNP I-NP
close NN I-NP
of IN B-PP
141.35 CD B-NP
yen NN I-NP
. . O
On IN B-PP
the DT B-NP
Commodity NNP I-NP
Exchange NNP I-NP
in IN B-PP
New NNP B-NP
York NNP I-NP
, , O
gold NN B-NP
for IN B-PP
current JJ B-NP
delivery NN I-NP
settled VBD B-VP
at IN B-PP
$ $ B-NP
367.30 CD I-NP
an DT B-NP
ounce NN I-NP
, , O
up IN B-ADVP
20 CD B-NP
cents NNS I-NP
. . O
Estimated VBN B-NP
volume NN I-NP
was VBD B-VP
a DT B-NP
light NN I-NP
2.4 CD I-NP
million CD I-NP
ounces NNS I-NP
. . O
In IN B-PP
early JJ B-NP
trading NN I-NP
in IN B-PP
Hong NNP B-NP
Kong NNP I-NP
Monday NNP B-NP
, , O
gold NN B-NP
was VBD B-VP
quoted VBN I-VP
at IN B-PP
$ $ B-NP
366.50 CD I-NP
an DT B-NP
ounce NN I-NP
. . O
East NNP B-NP
Rock NNP I-NP
Partners NNP I-NP
Limited NNP I-NP
Partnership NNP I-NP
said VBD B-VP
it PRP B-NP
proposed VBD B-VP
to TO I-VP
acquire VB I-VP
A.P. NNP B-NP
Green NNP I-NP
Industries NNP I-NP
Inc. NNP I-NP
for IN B-PP
$ $ B-NP
40 CD I-NP
a DT B-NP
share NN I-NP
. . O
In IN B-PP
an DT B-NP
Oct. NNP I-NP
19 CD I-NP
letter NN I-NP
to TO B-PP
A.P. NNP B-NP
Green NNP I-NP
's POS B-NP
board NN I-NP
, , O
East NNP B-NP
Rock NNP I-NP
said VBD B-VP
the DT B-NP
offer NN I-NP
is VBZ B-VP
subject NN B-ADJP
to TO B-PP
the DT B-NP
signing NN I-NP
of IN B-PP
a DT B-NP
merger NN I-NP
agreement NN I-NP
by IN B-PP
no DT B-ADVP
later RB I-ADVP
than IN B-PP
Oct. NNP B-NP
31 CD I-NP
. . O
The DT B-NP
letter NN I-NP
, , O
attached VBN B-VP
to TO B-PP
a DT B-NP
filing NN I-NP
with IN B-PP
the DT B-NP
Securities NNP I-NP
and CC I-NP
Exchange NNP I-NP
Commission NNP I-NP
, , O
said VBD B-VP
the DT B-NP
approval NN I-NP
is VBZ B-VP
also RB B-ADVP
contingent JJ B-ADJP
upon IN B-PP
obtaining VBG B-VP
satisfactory JJ B-NP
financing NN I-NP
. . O
An DT B-NP
A.P. NNP I-NP
Green NNP I-NP
official NN I-NP
declined VBD B-VP
to TO I-VP
comment VB I-VP
on IN B-PP
the DT B-NP
filing NN I-NP
. . O
The DT B-NP
$ $ I-NP
40-a-share JJ I-NP
proposal NN I-NP
values VBZ B-VP
the DT B-NP
company NN I-NP
at IN B-PP
about RB B-NP
$ $ I-NP
106.6 CD I-NP
million CD I-NP
. . O
A.P. NNP B-NP
Green NNP I-NP
currently RB B-ADVP
has VBZ B-VP
2,664,098 CD B-NP
shares NNS I-NP
outstanding JJ B-ADJP
. . O
Its PRP$ B-NP
stock NN I-NP
closed VBD B-VP
at IN B-PP
$ $ B-NP
38 CD I-NP
, , O
up IN B-ADVP
$ $ B-NP
1.875 CD I-NP
, , O
in IN B-PP
national JJ B-NP
over-the-counter JJ I-NP
trading NN I-NP
. . O
The DT B-NP
company NN I-NP
is VBZ B-VP
a DT B-NP
Mexico NNP I-NP
, , I-NP
Mo. NNP I-NP
, , I-NP
maker NN I-NP
of IN B-PP
refractory JJ B-NP
products NNS I-NP
. . O
East NNP B-NP
Rock NNP I-NP
also RB B-ADVP
said VBD B-VP
in IN B-PP
the DT B-NP
filing NN I-NP
that IN B-SBAR
it PRP B-NP
boosted VBD B-VP
its PRP$ B-NP
stake NN I-NP
in IN B-PP
A.P. NNP B-NP
Green NNP I-NP
to TO B-PP
8.7 CD B-NP
% NN I-NP
. . O
It PRP B-NP
now RB B-ADVP
holds VBZ B-VP
233,000 CD B-NP
A.P. NNP I-NP
Green NNP I-NP
common JJ I-NP
shares NNS I-NP
, , O
including VBG B-PP
30,000 CD B-NP
shares NNS I-NP
bought VBD B-VP
last JJ B-NP
Thursday NNP I-NP
for IN B-PP
$ $ B-NP
35.50 CD I-NP
to TO I-NP
$ $ I-NP
36.50 CD I-NP
a DT B-NP
share NN I-NP
. . O
New NNP B-NP
York-based JJ I-NP
John NNP I-NP
Kuhns NNP I-NP
and CC I-NP
Robert NNP I-NP
MacDonald NNP I-NP
control NN B-VP
East NNP B-NP
Rock NNP I-NP
Partners NNP I-NP
Inc. NNP I-NP
, , O
the DT B-NP
sole JJ I-NP
general JJ I-NP
partner NN I-NP
of IN B-PP
East NNP B-NP
Rock NNP I-NP
Partners NNP I-NP
L.P NNP I-NP
. . O
The DT B-NP
sole JJ I-NP
limited JJ I-NP
partner NN I-NP
of IN B-PP
the DT B-NP
partnership NN I-NP
is VBZ B-VP
Westwood NNP B-NP
Brick NNP I-NP
Lime NNP I-NP
Inc. NNP I-NP
, , O
an DT B-NP
indirect JJ I-NP
subsidiary NN I-NP
of IN B-PP
Westwood NNP B-NP
Group NNP I-NP
Inc NNP I-NP
. . O
Both DT B-NP
Westwood NNP B-NP
Brick NNP I-NP
and CC O
Westwood NNP B-NP
Group NNP I-NP
are VBP B-VP
based VBN I-VP
in IN B-PP
Boston NNP B-NP
. . O
Freight NN B-NP
rates NNS I-NP
, , O
declining VBG B-VP
for IN B-PP
most RBS B-NP
of IN B-PP
the DT B-NP
decade NN I-NP
because IN B-PP
of IN I-PP
competition NN B-NP
spurred VBN B-VP
by IN B-PP
deregulation NN B-NP
, , O
are VBP B-VP
bottoming VBG I-VP
out IN B-PRT
, , O
turning VBG B-VP
upward RB B-ADVP
and CC O
threatening VBG B-VP
to TO I-VP
fuel VB I-VP
inflation NN B-NP
. . O
Trucking NNP B-NP
, , I-NP
shipping VBG I-NP
and CC I-NP
air-freight NN I-NP
companies NNS I-NP
have VBP B-VP
announced VBN I-VP
rate NN B-NP
increases NNS I-NP
, , O
scheduled VBN B-VP
for IN B-PP
this DT B-NP
fall NN I-NP
or CC O
early JJ B-NP
next JJ I-NP
year NN I-NP
, , O
reflecting VBG B-VP
higher JJR B-NP
costs NNS I-NP
and CC O
tightened VBD B-NP
demand NN I-NP
for IN B-PP
freight NN B-NP
transport NN I-NP
. . O
Major JJ B-NP
shippers NNS I-NP
say VBP B-VP
they PRP B-NP
expect VBP B-VP
freight NN B-NP
rates NNS I-NP
to TO B-VP
rise VB I-VP
at IN B-ADVP
least JJS I-ADVP
as RB B-ADVP
fast RB I-ADVP
as IN B-PP
inflation NN B-NP
and CC B-ADVP
maybe RB I-ADVP
faster RBR B-ADVP
in IN B-PP
the DT B-NP
next JJ I-NP
few JJ I-NP
years NNS I-NP
. . O
That DT B-NP
's VBZ B-VP
a DT B-NP
big JJ I-NP
change NN I-NP
from IN B-PP
recent JJ B-NP
years NNS I-NP
when WRB B-ADVP
freight NN B-NP
haulage NN I-NP
was VBD B-VP
a DT B-NP
bright JJ I-NP
spot NN I-NP
for IN B-PP
U.S. NNP B-NP
productivity NN I-NP
, , O
helping VBG B-VP
to TO I-VP
restrain VB I-VP
inflation NN B-NP
and CC O
make VB B-VP
U.S. NNP B-NP
industry NN I-NP
more RBR B-ADJP
competitive JJ I-ADJP
abroad RB B-ADVP
. . O
`` `` O
Demand NN B-NP
has VBZ B-VP
caught VBN I-VP
up IN B-PRT
with IN B-PP
the DT B-NP
supply NN I-NP
of IN B-PP
certain JJ B-NP
types NNS I-NP
of IN B-PP
freight NN B-NP
transportation NN I-NP
, , O
and CC O
rates NNS B-NP
are VBP B-VP
starting VBG I-VP
to TO I-VP
move VB I-VP
up IN B-ADVP
'' '' O
at IN B-PP
a DT B-NP
rate NN I-NP
`` `` O
close RB B-ADJP
to TO B-PP
or CC O
slightly RB B-ADJP
more JJR I-ADJP
than IN B-PP
the DT B-NP
inflation NN I-NP
rate NN I-NP
, , O
'' '' O
said VBD B-VP
Clifford NNP B-NP
Sayre NNP I-NP
, , O
director NN B-NP
of IN B-PP
logistics NNS B-NP
at IN B-PP
Du NNP B-NP
Pont NNP I-NP
Co NNP I-NP
. . O
Shippers NNS B-NP
surveyed VBN B-VP
recently RB B-ADVP
by IN B-PP
Ohio NNP B-NP
State NNP I-NP
University NNP I-NP
said VBD B-VP
they PRP B-NP
expect VBP B-VP
their PRP$ B-NP
freight-transport JJ I-NP
, , I-NP
storage NN I-NP
and CC I-NP
distribution NN I-NP
costs NNS I-NP
to TO B-VP
rise VB I-VP
about IN B-NP
4 CD I-NP
% NN I-NP
this DT B-NP
year NN I-NP
. . O
Only RB B-NP
10 CD I-NP
% NN I-NP
of IN B-PP
the DT B-NP
250 CD I-NP
shippers NNS I-NP
polled VBN B-VP
expected VBN B-VP
their PRP$ B-NP
freight-transport JJ I-NP
costs NNS I-NP
to TO B-VP
decrease VB I-VP
, , O
compared VBN B-PP
with IN B-PP
30 CD B-NP
% NN I-NP
who WP B-NP
had VBD B-VP
looked VBN I-VP
to TO B-PP
freight VB B-NP
transport NN I-NP
to TO B-VP
reduce VB I-VP
costs NNS B-NP
in IN B-PP
past JJ B-NP
years NNS I-NP
. . O
`` `` O
This DT B-NP
is VBZ B-VP
the DT B-NP
first JJ I-NP
year NN I-NP
since IN B-PP
transportation NN B-NP
deregulation NN I-NP
in IN B-PP
1980 CD B-NP
that IN B-ADVP
we PRP B-NP
have VBP B-VP
had VBN I-VP
such JJ B-NP
a DT I-NP
dramatic JJ I-NP
and CC I-NP
broad-based JJ I-NP
upturn NN I-NP
in IN B-PP
perceived VBN B-NP
transportation NN I-NP
rates NNS I-NP
, , O
'' '' O
said VBD B-VP
Bernard NNP B-NP
LaLonde NNP I-NP
, , O
a DT B-NP
transportation NN I-NP
logistics NNS I-NP
professor NN I-NP
at IN B-PP
Ohio NNP B-NP
State NNP I-NP
in IN B-PP
Columbus NNP B-NP
. . O
The DT B-NP
deregulation NN I-NP
of IN B-PP
railroads NNS B-NP
and CC I-NP
trucking NN I-NP
companies NNS I-NP
that WDT B-NP
began VBD B-VP
in IN B-PP
1980 CD B-NP
enabled VBD B-VP
shippers NNS B-NP
to TO B-VP
bargain VB I-VP
for IN B-PP
transportation NN B-NP
. . O
Carriers NNP B-NP
could MD B-VP
use VB I-VP
their PRP$ B-NP
equipment NN I-NP
more RBR B-ADVP
efficiently RB I-ADVP
, , O
leading VBG B-VP
to TO B-PP
overcapacity NN B-NP
they PRP B-NP
were VBD B-VP
eager JJ B-ADJP
to TO B-VP
fill VB I-VP
. . O
Shippers NNS B-NP
cut VBP B-VP
about RB B-NP
$ $ I-NP
35 CD I-NP
billion CD I-NP
from IN B-PP
their PRP$ B-NP
annual JJ I-NP
, , I-NP
inter-city JJ I-NP
truck NN I-NP
and CC I-NP
rail NN I-NP
costs NNS I-NP
, , O
to TO B-PP
about RB B-NP
$ $ I-NP
150 CD I-NP
billion CD I-NP
, , O
or CC O
about IN B-NP
6.4 CD I-NP
% NN I-NP
of IN B-PP
gross JJ B-NP
national JJ I-NP
product NN I-NP
, , O
down RB B-ADVP
from IN B-PP
8 CD B-NP
% NN I-NP
of IN B-PP
GNP NNP B-NP
in IN B-PP
1981 CD B-NP
. . O
But CC O
with IN B-PP
much NN B-NP
of IN B-PP
the DT B-NP
inefficiency NN I-NP
squeezed VBN B-VP
out IN B-PP
of IN B-PP
the DT B-NP
freight-transport JJ I-NP
system NN I-NP
, , O
rising VBG B-NP
costs NNS I-NP
are VBP B-VP
likely JJ B-ADJP
to TO B-VP
be VB I-VP
reflected VBN I-VP
directly RB B-ADVP
in IN B-PP
higher JJR B-NP
freight NN I-NP
rates NNS I-NP
. . O
`` `` O
Shippers NNS B-NP
are VBP B-VP
saying VBG I-VP
` `` O
the DT B-NP
party NN I-NP
's POS B-VP
over IN B-ADJP
, , O
' '' O
'' '' O
said VBD B-VP
Mr. NNP B-NP
LaLonde NNP I-NP
. . O
`` `` O
Shippers NNS B-NP
wo MD B-VP
n't RB I-VP
be VB I-VP
able JJ B-ADJP
to TO B-VP
look VB I-VP
for IN B-PP
transportation-cost JJ B-NP
savings NNS I-NP
as IN B-SBAR
they PRP B-NP
have VBP B-VP
for IN B-PP
the DT B-NP
last JJ I-NP
eight CD I-NP
or CC I-NP
nine CD I-NP
years NNS I-NP
. . O
Transport NN B-NP
rates NNS I-NP
wo MD B-VP
n't RB I-VP
be VB I-VP
an DT B-NP
opportunity NN I-NP
for IN B-PP
offsetting VBG B-VP
cost NN B-NP
increases NNS I-NP
in IN B-PP
other JJ B-NP
segments NNS I-NP
of IN B-PP
the DT B-NP
economy NN I-NP
. . O
'' '' O
Robert NNP B-NP
Delaney NNP I-NP
, , O
a DT B-NP
consultant NN I-NP
at IN B-PP
Arthur NNP B-NP
D. NNP I-NP
Little NNP I-NP
Inc. NNP I-NP
, , O
Cambridge NNP B-NP
, , O
Mass. NNP B-NP
, , O
said VBD B-VP
`` `` O
We PRP B-NP
've VBP B-VP
gotten VBN I-VP
all PDT B-NP
the DT I-NP
benefits NNS I-NP
of IN B-PP
deregulation NN B-NP
in IN B-PP
freight-cost JJ B-NP
reductions NNS I-NP
. . O
Now RB B-ADVP
we PRP B-NP
are VBP B-VP
starting VBG I-VP
to TO I-VP
see VB I-VP
real JJ B-NP
freight-rate JJ I-NP
increases NNS I-NP
as IN B-SBAR
carriers NNS B-NP
replace VBP B-VP
equipment NN B-NP
, , O
pay VB B-VP
higher JJR B-NP
fuel NN I-NP
costs NNS I-NP
and CC O
pay VB B-VP
more JJR B-NP
for IN B-PP
labor NN B-NP
. . O
You PRP B-NP
'll MD B-VP
see VB I-VP
carriers NNS B-NP
try VB B-VP
to TO I-VP
recoup VB I-VP
some DT B-NP
of IN B-PP
the DT B-NP
price NN I-NP
cutting VBG I-NP
that WDT B-NP
occurred VBD B-VP
previously RB B-ADVP
. . O
'' '' O
Not RB B-NP
everyone NN I-NP
believes VBZ B-VP
that IN B-SBAR
the DT B-NP
good JJ I-NP
times NNS I-NP
are VBP B-VP
over IN B-ADJP
for IN B-PP
shippers NNS B-NP
. . O
`` `` O
There EX B-NP
's VBZ B-VP
still RB B-ADVP
a DT B-NP
lot NN I-NP
of IN B-PP
pressure NN B-NP
on IN B-PP
rates NNS B-NP
in IN B-PP
both DT B-NP
rail NN I-NP
and CC I-NP
truck NN I-NP
, , O
'' '' O
said VBD B-VP
Gerard NNP B-NP
McCullough NNP I-NP
, , O
lecturer NN B-NP
in IN B-PP
transportation NN B-NP
at IN B-PP
Massachusetts NNP B-NP
Institute NNP I-NP
of IN B-PP
Technology NNP B-NP
. . O
Less-than-truckload JJ B-NP
companies NNS I-NP
, , O
which WDT B-NP
carry VBP B-VP
the DT B-NP
freight NN I-NP
of IN B-PP
several JJ B-NP
shippers NNS I-NP
in IN B-PP
each DT B-NP
truck NN I-NP
trailer NN I-NP
, , O
discounted VBD B-VP
away RB B-ADVP
a DT B-NP
4.7 CD I-NP
% NN I-NP
rate NN I-NP
increase NN I-NP
implemented VBD B-VP
last JJ B-NP
April NNP I-NP
. . O
The DT B-NP
carriers NNS I-NP
were VBD B-VP
competing VBG I-VP
fiercely RB B-ADVP
for IN B-PP
market NN B-NP
share NN I-NP
. . O
Railroad-rate JJ B-NP
increases NNS I-NP
are VBP B-VP
likely JJ B-ADJP
to TO B-VP
be VB I-VP
restrained VBN I-VP
by IN B-PP
weakening VBG B-NP
rail-traffic JJ I-NP
levels NNS I-NP
and CC O
keen JJ B-NP
competition NN I-NP
for IN B-PP
freight NN B-NP
from IN B-PP
trucks NNS B-NP
. . O
An DT B-NP
official NN I-NP
at IN B-PP
Consolidated NNP B-NP
Freightways NNP I-NP
Inc. NNP I-NP
, , O
a DT B-NP
Menlo NNP I-NP
Park NNP I-NP
, , I-NP
Calif. NNP I-NP
, , I-NP
less-than-truckload JJ I-NP
carrier NN I-NP
, , O
said VBD B-VP
rate NN B-NP
discounting NN I-NP
in IN B-PP
that DT B-NP
industry NN I-NP
has VBZ B-VP
begun VBN I-VP
to TO I-VP
`` `` O
stabilize VB B-VP
. . O
'' '' O
Consolidated NNP B-NP
Freightways NNP I-NP
plans VBZ B-VP
to TO I-VP
raise VB I-VP
its PRP$ B-NP
rates NNS I-NP
5.3 CD B-NP
% NN I-NP
late JJ B-NP
this DT I-NP
year NN I-NP
or CC O
early JJ B-NP
next JJ I-NP
year NN I-NP
, , O
and CC O
at IN B-NP
least JJS I-NP
two CD I-NP
competitors NNS I-NP
have VBP B-VP
announced VBN I-VP
similar JJ B-NP
increases NNS I-NP
. . O
Truckers NNS B-NP
are VBP B-VP
`` `` O
trying VBG B-VP
to TO I-VP
send VB I-VP
signals NNS B-NP
that IN B-SBAR
they PRP B-NP
need VBP B-VP
to TO I-VP
stop VB I-VP
the DT B-NP
bloodletting NN I-NP
, , O
forget VB B-VP
about IN B-PP
market NN B-NP
share NN I-NP
and CC O
go VB B-VP
for IN B-PP
higher JJR B-NP
rates NNS I-NP
, , O
'' '' O
said VBD B-VP
Michael NNP B-NP
Lloyd NNP I-NP
, , O
an DT B-NP
analyst NN I-NP
at IN B-PP
Salomon NNP B-NP
Bros NNP I-NP
. . O
And CC O
`` `` O
shippers NNS B-NP
are VBP B-VP
getting VBG I-VP
the DT B-NP
feeling NN I-NP
that IN B-SBAR
they PRP B-NP
have VBP B-VP
played VBN I-VP
one CD B-NP
trucker NN I-NP
off IN B-ADVP
against IN B-PP
another DT B-NP
as RB B-NP
much JJ I-NP
as IN B-SBAR
they PRP B-NP
can MD B-VP
, , O
'' '' O
he PRP B-NP
said VBD B-VP
. . O
Air-freight NN B-NP
carriers NNS I-NP
raised VBD B-VP
their PRP$ B-NP
rates NNS I-NP
for IN B-PP
U.S. NNP B-NP
products NNS I-NP
going VBG B-VP
across IN B-PP
the DT B-NP
Pacific NNP I-NP
to TO B-PP
Asia NNP B-NP
by IN B-PP
about IN B-NP
20 CD I-NP
% NN I-NP
earlier RBR B-NP
this DT I-NP
month NN I-NP
. . O
And CC O
Japan NNP B-NP
Air NNP I-NP
Lines NNPS I-NP
said VBD B-VP
it PRP B-NP
plans VBZ B-VP
to TO I-VP
boost VB I-VP
its PRP$ B-NP
rates NNS I-NP
a DT B-NP
further JJ I-NP
25 CD I-NP
% NN I-NP
over IN B-PP
the DT B-NP
next JJ I-NP
two CD I-NP
years NNS I-NP
. . O
Such JJ B-NP
rate NN I-NP
increases NNS I-NP
`` `` O
will MD B-VP
increase VB I-VP
the DT B-NP
total JJ I-NP
cost NN I-NP
of IN B-PP
U.S. NNP B-NP
products NNS I-NP
and CC O
slow JJ B-VP
down RP B-PRT
the DT B-NP
rate NN I-NP
of IN B-PP
increase NN B-NP
of IN B-PP
U.S. NNP B-NP
exports NNS I-NP
, , O
'' '' O
said VBD B-VP
Richard NNP B-NP
Connors NNP I-NP
, , O
a DT B-NP
senior JJ I-NP
vice NN I-NP
president NN I-NP
of IN B-PP
Yusen NNP B-NP
Air NNP I-NP
& CC I-NP
Sea NNP I-NP
Service NNP I-NP
U.S.A. NNP I-NP
Inc. NNP I-NP
, , O
the DT B-NP
U.S. NNP I-NP
air-freight-forwarding JJ I-NP
subsidiary NN I-NP
of IN B-PP
Nippon NNP B-NP
Yusen NNP I-NP
Kaisha NNP I-NP
of IN B-PP
Japan NNP B-NP
. . O
Ship NN B-NP
companies NNS I-NP
carrying VBG B-VP
bulk NN B-NP
commodities NNS I-NP
, , O
such JJ B-PP
as IN I-PP
oil NN B-NP
, , O
grain NN B-NP
, , O
coal NN B-NP
and CC O
iron NN B-NP
ore NN I-NP
, , O
have VBP B-VP
been VBN I-VP
able JJ B-ADJP
to TO B-VP
increase VB I-VP
their PRP$ B-NP
rates NNS I-NP
in IN B-PP
the DT B-NP
last JJ I-NP
couple NN I-NP
of IN B-PP
years NNS B-NP
. . O
Some DT B-NP
bulk NN I-NP
shipping VBG I-NP
rates NNS I-NP
have VBP B-VP
increased VBN I-VP
`` `` O
3 CD B-NP
% NN I-NP
to TO I-NP
4 CD I-NP
% NN I-NP
in IN B-PP
the DT B-NP
past JJ I-NP
few JJ I-NP
months NNS I-NP
, , O
'' '' O
said VBD B-VP
Salomon NNP B-NP
's POS B-NP
Mr. NNP I-NP
Lloyd NNP I-NP
. . O
And CC O
ship NN B-NP
lines NNS I-NP
carrying VBG B-VP
containers NNS B-NP
are VBP B-VP
also RB I-VP
trying VBG I-VP
to TO I-VP
raise VB I-VP
their PRP$ B-NP
rates NNS I-NP
. . O
Carriers NNP B-NP
boosted VBD B-VP
rates NNS B-NP
more JJR B-NP
than IN I-NP
10 CD I-NP
% NN I-NP
in IN B-PP
the DT B-NP
North NNP I-NP
Atlantic NNP I-NP
between IN B-PP
the DT B-NP
U.S. NNP I-NP
and CC O
Europe NNP B-NP
last JJ B-NP
September NNP I-NP
, , O
hoping VBG B-VP
to TO I-VP
partly RB I-VP
restore VB I-VP
rates NNS B-NP
to TO B-PP
earlier JJR B-NP
levels NNS I-NP
. . O
Ship NN B-NP
lines NNS I-NP
operating VBG B-VP
in IN B-PP
the DT B-NP
Pacific NNP I-NP
plan NN B-VP
to TO I-VP
raise VB I-VP
rates NNS B-NP
on IN B-PP
containers NNS B-NP
carrying VBG B-VP
U.S. NNP B-NP
exports NNS I-NP
to TO B-PP
Asia NNP B-NP
about IN B-NP
10 CD I-NP
% NN I-NP
, , O
effective JJ B-ADJP
next JJ B-NP
April NNP I-NP
. . O
MGM NNP B-NP
Grand NNP I-NP
Inc. NNP I-NP
said VBD B-VP
it PRP B-NP
filed VBD B-VP
a DT B-NP
registration NN I-NP
statement NN I-NP
with IN B-PP
the DT B-NP
Securities NNP I-NP
and CC I-NP
Exchange NNP I-NP
Commission NNP I-NP
for IN B-PP
a DT B-NP
public JJ I-NP
offering NN I-NP
of IN B-PP
six CD B-NP
million CD I-NP
common JJ I-NP
shares NNS I-NP
. . O
The DT B-NP
Beverly NNP I-NP
Hills NNP I-NP
, , I-NP
Calif.-based JJ I-NP
company NN I-NP
said VBD B-VP
it PRP B-NP
would MD B-VP
have VB I-VP
26.9 CD B-NP
million CD I-NP
common JJ I-NP
shares NNS I-NP
outstanding JJ B-ADJP
after IN B-PP
the DT B-NP
offering NN I-NP
. . O
The DT B-NP
hotel NN I-NP
and CC I-NP
Gaming NNP I-NP
company NN I-NP
said VBD B-VP
Merrill NNP B-NP
Lynch NNP I-NP
Capital NNP I-NP
Markets NNPS I-NP
will MD B-VP
lead VB I-VP
the DT B-NP
underwriters NNS I-NP
. . O
Proceeds NNS B-NP
from IN B-PP
the DT B-NP
sale NN I-NP
will MD B-VP
be VB I-VP
used VBN I-VP
for IN B-PP
remodeling VBG B-NP
and CC I-NP
refurbishing VBG I-NP
projects NNS I-NP
, , B-PP
as RB I-PP
well RB I-PP
as IN I-PP
for IN B-PP
the DT B-NP
planned VBN I-NP
MGM NNP I-NP
Grand NNP I-NP
hotel\/casino NN I-NP
and CC I-NP
theme NN I-NP
park NN I-NP
. . O
Bob NNP B-NP
Stone NNP I-NP
stewed JJ B-VP
over IN B-PP
a DT B-NP
letter NN I-NP
from IN B-PP
his PRP$ B-NP
manager NN I-NP
putting VBG B-VP
him PRP B-NP
on IN B-PP
probation NN B-NP
for IN B-PP
insubordination NN B-NP
. . O
Mr. NNP B-NP
Stone NNP I-NP
thought VBD B-VP
the DT B-NP
discipline NN I-NP
was VBD B-VP
unfair JJ B-ADJP
; : O
he PRP B-NP
believed VBD B-VP
that IN B-SBAR
his PRP$ B-NP
manager NN I-NP
wanted VBD B-VP
to TO I-VP
get VB I-VP
rid JJ B-ADJP
of IN B-PP
him PRP B-NP
for IN B-PP
personal JJ B-NP
reasons NNS I-NP
. . O
Unable JJ B-ADJP
to TO B-VP
persuade VB I-VP
the DT B-NP
manager NN I-NP
to TO B-VP
change VB I-VP
his PRP$ B-NP
decision NN I-NP
, , O
he PRP B-NP
went VBD B-VP
to TO B-PP
a DT B-NP
`` `` I-NP
company NN I-NP
court NN I-NP
'' '' O
for IN B-PP
a DT B-NP
hearing NN I-NP
. . O
At IN B-PP
the DT B-NP
scheduled VBN I-NP
time NN I-NP
, , O
Mr. NNP B-NP
Stone NNP I-NP
entered VBD B-VP
a DT B-NP
conference NN I-NP
room NN I-NP
in IN B-PP
a DT B-NP
building NN I-NP
near IN B-PP
where WRB B-ADVP
he PRP B-NP
worked VBD B-VP
. . O
After IN B-SBAR
the DT B-NP
three CD I-NP
members NNS I-NP
of IN B-PP
the DT B-NP
court NN I-NP
introduced VBD B-VP
themselves PRP B-NP
, , O
the DT B-NP
chairman NN I-NP
of IN B-PP
the DT B-NP
panel NN I-NP
said VBD B-VP
: : O
`` `` O
Go VB B-VP
ahead RB B-ADVP
and CC O
tell VB B-VP
us PRP B-NP
what WP B-NP
happened VBD B-VP
. . O
We PRP B-NP
may MD B-VP
ask VB I-VP
questions NNS B-NP
as IN B-SBAR
you PRP B-NP
go VBP B-VP
along IN B-PRT
, , O
or CC O
we PRP B-NP
may MD B-VP
wait VB I-VP
until IN B-PP
the DT B-NP
end NN I-NP
. . O
'' '' O
No DT B-NP
lawyers NNS I-NP
or CC I-NP
tape NN I-NP
recorders NNS I-NP
were VBD B-VP
present JJ B-ADJP
. . O
The DT B-NP
only RB I-NP
extra JJ I-NP
people NNS I-NP
were VBD B-VP
a DT B-NP
couple NN I-NP
of IN B-PP
personnel NNS B-NP
specialists NNS I-NP
, , O
one CD B-NP
of IN B-PP
whom WP B-NP
knew VBD B-VP
Mr. NNP B-NP
Stone NNP I-NP
's POS B-NP
case NN I-NP
intimately RB B-ADVP
and CC O
would MD B-VP
help VB I-VP
fill VB I-VP
in IN B-PRT
any DT B-NP
facts NNS I-NP
needed VBN B-VP
to TO B-VP
give VB I-VP
the DT B-NP
court NN I-NP
the DT B-NP
full JJ I-NP
picture NN I-NP
. . O
Over IN B-PP
a DT B-NP
cup NN I-NP
of IN B-PP
coffee NN B-NP
, , O
Mr. NNP B-NP
Stone NNP I-NP
told VBD B-VP
his PRP$ B-NP
story NN I-NP
. . O
He PRP B-NP
talked VBD B-VP
about IN B-NP
20 CD I-NP
minutes NNS I-NP
. . O
When WRB B-ADVP
he PRP B-NP
was VBD B-VP
through IN B-ADJP
, , O
the DT B-NP
court NN I-NP
members NNS I-NP
asked VBD B-VP
many JJ B-NP
questions NNS I-NP
, , O
then RB B-ADVP
the DT B-NP
chairman NN I-NP
said VBD B-VP
they PRP B-NP
would MD B-VP
like VB I-VP
to TO I-VP
hear VB I-VP
his PRP$ B-NP
manager NN I-NP
's POS B-NP
side NN I-NP
and CC O
talk VB B-VP
to TO B-PP
witnesses NNS B-NP
. . O
The DT B-NP
chairman NN I-NP
promised VBD B-VP
Mr. NNP B-NP
Stone NNP I-NP
a DT B-NP
decision NN I-NP
within IN B-PP
two CD B-NP
weeks NNS I-NP
. . O
Bob NNP B-NP
Stone NNP I-NP
is VBZ B-VP
a DT B-NP
fictional JJ I-NP
name NN I-NP
, , O
but CC O
the DT B-NP
incident NN I-NP
described VBN B-VP
is VBZ B-VP
real JJ B-ADJP
. . O
It PRP B-NP
happened VBD B-VP
at IN B-PP
Northrop NNP B-NP
Corp. NNP I-NP
in IN B-PP
Los NNP B-NP
Angeles NNP I-NP
. . O
The DT B-NP
court NN I-NP
is VBZ B-VP
called VBN I-VP
the DT B-NP
Management NNP I-NP
Appeals NNP I-NP
Committee NNP I-NP
, , O
or CC O
just RB B-NP
`` `` I-NP
MAC NNP I-NP
, , O
'' '' O
and CC O
it PRP B-NP
is VBZ B-VP
likely JJ B-ADJP
to TO B-VP
hear VB I-VP
a DT B-NP
couple NN I-NP
of IN I-NP
dozen NN I-NP
cases VBZ I-NP
a DT B-NP
year NN I-NP
. . O
Alter VB B-VP
some DT B-NP
details NNS I-NP
of IN B-PP
this DT B-NP
example NN I-NP
and CC O
it PRP B-NP
could MD B-VP
be VB I-VP
taking VBG I-VP
place NN B-NP
today NN B-ADVP
at IN B-PP
Federal NNP B-NP
Express NNP I-NP
in IN B-PP
Memphis NNP B-NP
, , O
the DT B-NP
Defense NNP I-NP
and CC I-NP
Underseas NNP I-NP
Systems NNP I-NP
divisions NNS I-NP
of IN B-PP
Honeywell NNP B-NP
in IN B-PP
Minneapolis NNP B-NP
, , O
a DT B-NP
General NNP I-NP
Electric NNP I-NP
plant NN I-NP
in IN B-PP
Columbia NNP B-NP
, , O
Md. NNP B-NP
, , O
or CC O
a DT B-NP
number NN I-NP
of IN B-PP
other JJ B-NP
companies NNS I-NP
. . O
These DT B-NP
firms NNS I-NP
are VBP B-VP
pioneers NNS B-NP
in IN B-PP
a DT B-NP
significant JJ I-NP
new JJ I-NP
trend NN I-NP
in IN B-PP
the DT B-NP
corporate JJ I-NP
world NN I-NP
: : O
the DT B-NP
rise NN I-NP
of IN B-PP
what WP B-NP
I PRP B-NP
call VBP B-VP
corporate JJ B-NP
due JJ I-NP
process NN I-NP
. . O
Although IN B-SBAR
corporate JJ B-NP
due JJ I-NP
process NN I-NP
is VBZ B-VP
practiced VBN I-VP
today NN B-NP
in IN B-PP
few JJ B-NP
companies NNS I-NP
-- : O
perhaps RB B-ADVP
40 CD B-NP
to TO I-NP
60 CD I-NP
-- : O
it PRP B-NP
is VBZ B-VP
one CD B-NP
of IN B-PP
the DT B-NP
fastest JJS I-NP
developing VBG I-NP
trends NNS I-NP
in IN B-PP
industry NN B-NP
. . O
In IN B-PP
the DT B-NP
coming VBG I-NP
decade NN I-NP
a DT B-NP
majority NN I-NP
of IN B-PP
people-oriented JJ B-NP
companies NNS I-NP
are VBP B-VP
likely JJ B-ADJP
to TO B-VP
adopt VB I-VP
it PRP B-NP
. . O
Corporate JJ B-NP
due JJ I-NP
process NN I-NP
appeals NNS B-VP
to TO B-PP
management NN B-NP
for IN B-PP
a DT B-NP
variety NN I-NP
of IN B-PP
reasons NNS B-NP
. . O
It PRP B-NP
reduces VBZ B-VP
lawsuits NNS B-NP
from IN B-PP
disgruntled JJ B-NP
employees NNS I-NP
and CC I-NP
ex-employees NNS I-NP
, , O
with IN B-PP
all DT B-NP
that WDT B-NP
means VBZ B-VP
for IN B-PP
reduced VBN B-NP
legal JJ I-NP
costs NNS I-NP
and CC O
better RBR B-NP
public JJ I-NP
relations NNS I-NP
. . O
It PRP B-NP
helps VBZ B-VP
to TO I-VP
keep VB I-VP
out IN B-PRT
unions NNS B-NP
. . O
It PRP B-NP
increases VBZ B-VP
employee NN B-NP
commitment NN I-NP
to TO B-PP
the DT B-NP
company NN I-NP
, , O
with IN B-PP
all DT B-NP
that WDT B-NP
means VBZ B-VP
for IN B-PP
efficiency NN B-NP
and CC O
quality NN B-NP
control NN I-NP
. . O
What WP B-NP
must MD O
your PRP$ B-NP
management NN I-NP
team NN I-NP
do VBP B-VP
to TO B-VP
establish VB I-VP
corporate JJ B-NP
due JJ I-NP
process NN I-NP
? . O
Here RB B-ADVP
are VBP B-VP
four CD B-NP
key JJ I-NP
steps NNS I-NP
: : O
1 CD B-LST
. . O
Make VB B-VP
sure JJ B-ADJP
you PRP B-NP
have VBP B-VP
a DT B-NP
strong JJ I-NP
personnel NNS I-NP
department NN I-NP
. . O
It PRP B-NP
must MD B-VP
be VB I-VP
able JJ B-ADJP
to TO B-VP
handle VB I-VP
most RBS B-NP
of IN B-PP
the DT B-NP
complaints NNS I-NP
that WDT B-NP
can MD B-VP
not RB I-VP
be VB I-VP
solved VBN I-VP
in IN B-PP
the DT B-NP
trenches NNS I-NP
by IN B-PP
managers NNS B-NP
and CC O
their PRP$ B-NP
subordinates NNS I-NP
, , O
else RB B-ADVP
the DT B-NP
company NN I-NP
court NN I-NP
or CC I-NP
adjudicators NNS I-NP
will MD B-VP
be VB B-VP
inundated VBN I-VP
with IN B-PP
cases NNS B-NP
. . O
At IN B-PP
Polaroid NNP B-NP
, , O
the DT B-NP
Personnel NNP I-NP
Policy NNP I-NP
Planning NNP I-NP
Committee NNP I-NP
may MD B-VP
hear VB I-VP
only RB B-NP
about IN I-NP
20 CD I-NP
cases VBZ I-NP
a DT B-NP
year NN I-NP
; : O
the DT B-NP
rest NN I-NP
of IN B-PP
the DT B-NP
many JJ I-NP
hundreds NNS I-NP
of IN B-PP
complaints NNS B-NP
are VBP B-VP
resolved VBN I-VP
at IN B-PP
earlier JJR B-NP
stages NNS I-NP
. . O
At IN B-PP
TWA NNP B-NP
, , O
the DT B-NP
System NNP I-NP
Board NNP I-NP
of IN B-PP
Adjustment NNP B-NP
hears VBZ B-VP
50 CD B-NP
to TO I-NP
75 CD I-NP
cases VBZ I-NP
a DT B-NP
year NN I-NP
, , O
only RB B-NP
a DT I-NP
fraction NN I-NP
of IN B-PP
the DT B-NP
complaints NNS I-NP
brought VBN B-VP
to TO B-PP
personnel NNS B-NP
specialists NNS I-NP
. . O
At IN B-PP
Citicorp NNP B-NP
, , O
the DT B-NP
Problem NNP I-NP
Review NNP I-NP
Board NNP I-NP
may MD B-VP
hear VB I-VP
only RB B-NP
12 CD I-NP
or CC I-NP
so RB I-NP
cases VBZ I-NP
because IN B-PP
of IN I-PP
personnel NNS B-NP
's POS B-NP
skill NN I-NP
in IN B-PP
complaint-resolution NN B-NP
. . O
In IN B-PP
a DT B-NP
typical JJ I-NP
year NN I-NP
, , O
up IN B-NP
to TO I-NP
20 CD I-NP
% NN I-NP
of IN B-PP
the DT B-NP
work NN I-NP
force NN I-NP
goes VBZ B-VP
to TO B-PP
personnel NNS B-NP
specialists NNS I-NP
with IN B-PP
complaints NNS B-NP
of IN B-PP
unfair JJ B-NP
treatment NN I-NP
. . O
In IN B-PP
a DT B-NP
large JJ I-NP
company NN I-NP
that WDT B-NP
means VBZ B-VP
many JJ B-NP
hundreds NNS I-NP
of IN B-PP
complaints NNS B-NP
for IN B-PP
personnel NNS B-NP
to TO B-VP
handle VB I-VP
. . O
2 CD B-LST
. . O
Formally RB B-ADVP
or CC I-ADVP
informally RB I-ADVP
, , O
train NN B-VP
all DT B-NP
your PRP$ I-NP
managers NNS I-NP
and CC I-NP
supervisors NNS I-NP
in IN B-PP
the DT B-NP
company NN I-NP
's POS B-NP
due-process NN I-NP
approach NN I-NP
. . O
See VB B-VP
that IN B-SBAR
they PRP B-NP
know VBP B-VP
company NN B-NP
personnel NNS I-NP
policy NN I-NP
backwards RB B-ADVP
and CC I-ADVP
forwards RB I-ADVP
, , O
for IN O
it PRP B-NP
is VBZ B-VP
the DT B-NP
`` `` I-NP
law NN I-NP
'' '' O
governing VBG B-VP
company NN B-NP
courts NNS I-NP
and CC I-NP
adjudicators NNS I-NP
. . O
Coach NNP B-VP
them PRP B-NP
in IN B-PP
handling NN B-VP
complaints NNS B-NP
so RB B-SBAR
that IN I-SBAR
they PRP B-NP
can MD B-VP
resolve VB I-VP
problems NNS B-NP
immediately RB B-ADVP
. . O
In IN B-SBAR
case NN O
managers NNS B-NP
and CC O
personnel NNS B-NP
specialists NNS I-NP
are VBP B-VP
unsuccessful JJ B-ADJP
and CC O
subordinates NNS B-NP
take VBP B-VP
their PRP$ B-NP
complaints NNS I-NP
to TO B-PP
a DT B-NP
company NN I-NP
court NN I-NP
or CC I-NP
adjudicator NN I-NP
, , O
teach VB B-VP
managers NNS B-NP
to TO B-VP
accept VB I-VP
reversals NNS B-NP
as IN B-PP
a DT B-NP
fact NN I-NP
of IN B-PP
business NN B-NP
life NN I-NP
, , O
for IN O
in IN B-PP
a DT B-NP
good JJ I-NP
due-process NN I-NP
system NN I-NP
they PRP B-NP
are VBP B-VP
bound VBN I-VP
to TO I-VP
happen VB I-VP
. . O
In IN B-PP
the DT B-NP
15 CD I-NP
companies NNS I-NP
I PRP B-NP
studied VBD B-VP
, , O
reversal NN B-NP
rates NNS I-NP
range VBP B-VP
on IN B-PP
the DT B-NP
average NN I-NP
from IN B-PP
20 CD B-NP
% NN I-NP
to TO B-PP
40 CD B-NP
% NN I-NP
. . O
3 CD B-LST
. . O
Decide VB B-VP
whether IN O
you PRP B-NP
want VBP B-VP
a DT B-NP
panel NN I-NP
system NN I-NP
or CC O
a DT B-NP
single JJ I-NP
adjudicator NN I-NP
. . O
A DT B-NP
panel NN I-NP
system NN I-NP
like IN B-PP
that DT B-NP
in NN B-PP
the DT B-NP
Bob NNP I-NP
Stone NNP I-NP
example NN I-NP
enjoys VBZ B-VP
such JJ B-NP
advantages NNS I-NP
as IN B-PP
high JJ B-NP
credibility NN I-NP
and CC O
, , O
for IN B-PP
the DT B-NP
panelists NNS I-NP
, , O
mutual JJ B-NP
support NN I-NP
. . O
An DT B-NP
adjudicator NN I-NP
system NN I-NP
-- : O
that DT B-INTJ
is VBZ I-INTJ
, , O
an DT B-NP
investigator NN I-NP
who WP B-NP
acts VBZ B-VP
first JJ B-ADVP
as IN B-PP
a DT B-NP
fact-finder NN I-NP
and CC O
then RB O
switches VBZ B-VP
hats NNS B-NP
and CC O
arbitrates VBZ B-VP
the DT B-NP
facts NNS I-NP
-- : O
has VBZ B-VP
such JJ B-NP
advantages NNS I-NP
as IN B-PP
speed NN B-NP
, , O
flexibility NN B-NP
and CC O
maximum JJ B-NP
privacy NN I-NP
. . O
International NNP B-NP
Business NNP I-NP
Machines NNPS I-NP
and CC O
Bank NNP B-NP
of IN B-PP
America NNP B-NP
are VBP B-VP
among IN B-PP
the DT B-NP
companies NNS I-NP
using VBG B-VP
the DT B-NP
single-adjudicator JJ I-NP
approach NN I-NP
. . O
4 CD B-LST
. . O
Make VB B-VP
your PRP$ B-NP
due-process NN I-NP
system NN I-NP
visible JJ B-ADJP
. . O
It PRP B-NP
wo MD B-VP
n't RB I-VP
do VB I-VP
any DT B-NP
good NN I-NP
for IN B-PP
anybody NN B-NP
unless IN B-SBAR
employees NNS B-NP
know VBP B-VP
about IN B-PP
it PRP B-NP
. . O
Most JJS B-NP
managements NNS I-NP
hesitate VBP B-VP
to TO I-VP
go VB I-VP
all DT B-ADVP
out NN I-ADVP
in IN B-PP
advertising VBG B-VP
their PRP$ B-NP
due-process NN I-NP
systems NNS I-NP
for IN B-PP
fear NN B-NP
of IN B-PP
encouraging VBG B-VP
cranks NNS B-NP
and CC O
chronic JJ B-NP
soreheads NNS I-NP
to TO B-VP
file VB I-VP
complaints NNS B-NP
. . O
On IN B-PP
the DT B-NP
other JJ I-NP
hand NN I-NP
, , O
they PRP B-NP
make VBP B-VP
sure JJ B-ADJP
at IN B-PP
a DT B-NP
minimum NN I-NP
that IN B-SBAR
their PRP$ B-NP
systems NNS I-NP
are VBP B-VP
described VBN I-VP
in IN B-PP
their PRP$ B-NP
employee NN I-NP
handbooks NNS I-NP
and CC O
talked VBD B-VP
up IN B-PRT
by IN B-PP
personnel NNS B-NP
specialists NNS I-NP
. . O
Smith-Kline NNP B-NP
Beecham NNP I-NP
goes VBZ B-VP
further JJ B-ADVP
and CC O
sometimes RB B-VP
features VBZ I-VP
its PRP$ B-NP
grievance NN I-NP
procedure NN I-NP
in IN B-PP
closed-circuit JJ B-NP
TV NN I-NP
programs NNS I-NP
. . O
Naturally RB B-ADVP
, , O
one CD B-NP
of IN B-PP
the DT B-NP
best JJS I-NP
ways NNS I-NP
to TO B-VP
guarantee VB I-VP
visibility NN B-NP
for IN B-PP
your PRP$ B-NP
due-process NN I-NP
system NN I-NP
is VBZ B-VP
for IN B-SBAR
top JJ B-NP
management NN I-NP
to TO B-VP
support VB I-VP
it PRP B-NP
. . O
At IN B-PP
IBM NNP B-NP
, , O
the DT B-NP
company NN I-NP
's POS B-NP
Open NNP I-NP
Door NNP I-NP
system NN I-NP
is VBZ B-VP
sometimes RB B-ADVP
the DT B-NP
subject NN I-NP
of IN B-PP
memorandums NNS B-NP
from IN B-PP
the DT B-NP
chief JJ I-NP
executive NN I-NP
. . O
Federal NNP B-NP
Express NNP I-NP
goes VBZ B-VP
further JJ B-ADVP
in IN B-PP
this DT B-NP
respect NN I-NP
than IN B-PP
any DT B-NP
company NN I-NP
I PRP B-NP
know VBP B-VP
of IN B-PP
with IN B-PP
both DT B-NP
Frederick NNP B-NP
Smith NNP I-NP
and CC O
James NNP B-NP
Barksdale NNP I-NP
, , O
chief JJ B-NP
executive NN I-NP
and CC O
chief JJ B-NP
operating VBG I-NP
officer NN I-NP
, , O
respectively RB B-ADVP
, , O
sitting VBG B-VP
in IN B-PRT
on IN B-PP
the DT B-NP
Appeals NNP I-NP
Board NNP I-NP
almost RB B-NP
every DT I-NP
Tuesday NNP I-NP
to TO B-VP
decide VB I-VP
cases NNS B-NP
. . O
Mr. NNP B-NP
Ewing NNP I-NP
is VBZ B-VP
a DT B-NP
consultant NN I-NP
based VBN B-VP
in IN B-PP
Winchester NNP B-NP
, , O
Mass. NNP B-NP
, , O
and CC O
author NN B-NP
of IN B-PP
`` `` O
Justice NNP B-NP
on IN B-PP
the DT B-NP
Job NNP I-NP
: : O
Resolving NNP B-VP
Grievances NNP B-NP
in IN B-PP
the DT B-NP
Nonunion NNP I-NP
Workplace NN I-NP
'' '' O
-LRB- ( O
Harvard NNP B-NP
Business NNP I-NP
School NNP I-NP
Press NNP I-NP
, , O
1989 CD B-NP
-RRB- ) O
. . O
Tokyo NNP B-NP
stocks NNS I-NP
closed VBD B-VP
higher JJR B-ADVP
in IN B-PP
active JJ B-NP
trading NN I-NP
Friday NNP B-NP
, , O
marking VBG B-VP
the DT B-NP
fourth JJ I-NP
consecutive JJ I-NP
daily JJ I-NP
gain NN I-NP
since IN B-PP
Monday NNP B-NP
's POS B-NP
sharp JJ I-NP
fall NN I-NP
. . O
London JJ B-NP
shares NNS I-NP
closed VBD B-VP
moderately RB B-ADVP
lower JJR I-ADVP
in IN B-PP
thin JJ B-NP
trading NN I-NP
. . O
At IN B-PP
Tokyo NNP B-NP
, , O
the DT B-NP
Nikkei NNP I-NP
index NN I-NP
of IN B-PP
225 CD B-NP
selected VBN I-NP
issues NNS I-NP
was VBD B-VP
up IN B-ADVP
112.16 CD B-NP
points NNS I-NP
to TO B-PP
35486.38 CD B-NP
. . O
The DT B-NP
index NN I-NP
advanced VBD B-VP
266.66 CD B-NP
points NNS I-NP
Thursday NNP B-NP
. . O
In IN B-PP
early JJ B-NP
trading NN I-NP
in IN B-PP
Tokyo NNP B-NP
Monday NNP B-NP
, , O
the DT B-NP
Nikkei NNP I-NP
index NN I-NP
rose VBD B-VP
101.98 CD B-NP
points NNS I-NP
to TO B-PP
35588.36 CD B-NP
. . O
Friday NNP B-NP
's POS B-NP
volume NN I-NP
on IN B-PP
the DT B-NP
First NNP I-NP
Section NN I-NP
was VBD B-VP
estimated VBN I-VP
at IN B-PP
one CD B-NP
billion CD I-NP
shares NNS I-NP
, , O
up IN B-ADVP
from IN B-PP
862 CD B-NP
million CD I-NP
Thursday NNP B-NP
. . O
Winners NNS B-NP
outpaced VBD B-VP
losers NNS B-NP
, , O
572 CD B-ADVP
to TO I-ADVP
368 CD I-ADVP
, , O
while IN B-SBAR
181 CD B-NP
issues NNS I-NP
remained VBD B-VP
unchanged JJ B-ADJP
. . O
With IN B-SBAR
investors NNS B-NP
relieved VBN B-ADJP
at IN B-PP
the DT B-NP
overnight JJ I-NP
gain NN I-NP
in IN B-PP
New NNP B-NP
York NNP I-NP
stocks NNS I-NP
, , O
small-lot JJ B-NP
buying NN I-NP
orders NNS I-NP
streamed VBD B-VP
into IN B-PP
the DT B-NP
market NN I-NP
from IN B-PP
early JJ B-NP
morning NN I-NP
, , O
making VBG B-VP
traders NNS B-NP
believe VBP B-VP
the DT B-NP
market NN I-NP
was VBD B-VP
back RB B-ADVP
to TO B-PP
normal JJ B-NP
. . O
The DT B-NP
Nikkei NNP I-NP
, , O
which WDT B-NP
reached VBD B-VP
as RB B-ADJP
high JJ I-ADJP
as IN B-PP
35611.38 CD B-NP
right NN B-ADVP
after IN B-PP
the DT B-NP
opening NN I-NP
, , O
surrendered VBD B-VP
part NN B-NP
of IN B-PP
its PRP$ B-NP
early JJ I-NP
advance NN I-NP
toward IN B-PP
the DT B-NP
end NN I-NP
of IN B-PP
the DT B-NP
day NN I-NP
because IN B-PP
of IN I-PP
profit-taking NN B-NP
. . O
`` `` O
Investors NNS B-NP
, , B-NP
especially RB I-NP
dealers NNS B-NP
, , O
do VBP B-VP
n't RB I-VP
want VB I-VP
to TO I-VP
hold VB I-VP
a DT B-NP
position NN I-NP
over IN B-PP
the DT B-NP
weekend NN I-NP
, , O
'' '' O
a DT B-NP
trader NN I-NP
at IN B-PP
Dai-ichi NNP B-NP
Securities NNP I-NP
said VBD B-VP
, , O
adding VBG B-VP
, , O
though RB B-ADVP
, , O
that IN B-SBAR
the DT B-NP
trading NN I-NP
mood NN I-NP
remained VBD B-VP
positive JJ B-ADJP
through IN B-PP
the DT B-NP
afternoon NN I-NP
session NN I-NP
. . O
The DT B-NP
Tokyo NNP I-NP
Stock NNP I-NP
Price NNP I-NP
Index NNP I-NP
-LRB- ( O
Topix NNP B-NP
-RRB- ) O
of IN B-PP
all DT B-NP
issues NNS I-NP
listed VBN B-VP
in IN B-PP
the DT B-NP
First NNP I-NP
Section NN I-NP
, , O
which WDT B-NP
gained VBD B-VP
22.78 CD B-NP
points NNS I-NP
Thursday NNP B-NP
, , O
was VBD B-VP
up IN B-ADVP
14.06 CD B-NP
points NNS I-NP
, , O
or CC O
0.53 CD B-NP
% NN I-NP
, , O
at IN B-PP
2679.72 CD B-NP
. . O
The DT B-NP
Second JJ I-NP
Section NN I-NP
index NN I-NP
, , O
which WDT B-NP
rose VBD B-VP
15.72 CD B-NP
points NNS I-NP
Thursday NNP B-NP
, , O
was VBD B-VP
up IN B-ADVP
11.88 CD B-NP
points NNS I-NP
, , O
or CC O
0.32 CD B-NP
% NN I-NP
, , O
to TO B-VP
close VB I-VP
at IN B-PP
3717.46 CD B-NP
. . O
Volume NN B-NP
in IN B-PP
the DT B-NP
second JJ I-NP
section NN I-NP
was VBD B-VP
estimated VBN I-VP
at IN B-PP
30 CD B-NP
million CD I-NP
shares NNS I-NP
, , O
up IN B-ADVP
from IN B-PP
28 CD B-NP
million CD I-NP
Thursday NNP B-NP
. . O
In IN B-PP
turmoil NN B-NP
caused VBN B-VP
by IN B-PP
the DT O
previous JJ B-NP
Friday NNP I-NP
's POS B-NP
plunge NN I-NP
in IN B-PP
New NNP B-NP
York NNP I-NP
stocks NNS I-NP
, , O
the DT B-NP
Nikkei NNP I-NP
marked VBD B-VP
a DT B-NP
sharp JJ I-NP
647.33-point JJ I-NP
fall NN I-NP
Monday NNP B-NP
. . O
But CC O
the DT B-NP
Nikkei NNP I-NP
fell VBD B-VP
an DT B-NP
overall JJ I-NP
1.8 CD I-NP
% NN I-NP
in IN B-PP
value NN B-NP
that DT B-NP
day NN I-NP
compared VBN B-PP
with IN B-PP
Wall NNP B-NP
Street NNP I-NP
's POS I-NP
far RB B-ADJP
sharper JJR I-ADJP
6.9 CD B-ADJP
% NN I-ADJP
drop NN B-NP
on IN B-PP
Oct. NNP B-NP
13 CD I-NP
. . O
The DT B-NP
Tokyo NNP I-NP
market NN I-NP
's POS B-NP
resiliency NN I-NP
helped VBD B-VP
participants NNS B-NP
to TO B-VP
regain VB I-VP
confidence NN B-NP
gradually RB B-ADVP
as IN B-SBAR
they PRP B-NP
spent VBD B-VP
more JJR B-NP
time NN I-NP
on IN B-PP
analyzing VBG B-VP
factors NNS B-NP
that WDT B-NP
caused VBD B-VP
the DT B-NP
Friday NNP I-NP
plunge NN I-NP
and CC O
realized VBD B-VP
these DT B-NP
problems NNS I-NP
were VBD B-VP
unique JJ B-ADJP
to TO B-PP
New NNP B-NP
York NNP I-NP
stocks NNS I-NP
and CC B-ADJP
not RB I-ADJP
directly RB B-ADJP
related VBN I-ADJP
to TO B-PP
Tokyo NNP B-NP
. . O
The DT B-NP
Nikkei NNP I-NP
continued VBD B-VP
to TO I-VP
gain VB I-VP
for IN B-PP
the DT B-NP
rest NN I-NP
of IN B-PP
the DT B-NP
week NN I-NP
, , O
adding VBG B-VP
1017.69 CD B-NP
points NNS I-NP
in IN B-PP
four CD B-NP
days NNS I-NP
-- : O
more JJR B-VP
than IN I-VP
erasing VBG I-VP
Monday NNP B-NP
's POS B-NP
losses NNS I-NP
. . O
But CC O
further JJ B-NP
major JJ I-NP
advances NNS I-NP
on IN B-PP
the DT B-NP
Nikkei NNP I-NP
are VBP B-VP
n't RB I-VP
foreseen VBN I-VP
this DT B-NP
week NN I-NP
by IN B-PP
market NN B-NP
observers NNS I-NP
. . O
Investors NNS B-NP
are VBP B-VP
still RB I-VP
waiting VBG I-VP
to TO I-VP
see VB I-VP
how WRB B-ADVP
the DT B-NP
U.S. NNP I-NP
government NN I-NP
will MD B-VP
decide VB I-VP
on IN B-PP
interest NN B-NP
rates NNS I-NP
and CC O
how WRB B-ADVP
the DT B-NP
dollar NN I-NP
will MD B-VP
be VB I-VP
stabilized VBN I-VP
. . O
Some DT B-NP
high-priced JJ I-NP
issues NNS I-NP
made VBD B-VP
a DT B-NP
comeback NN I-NP
Friday NNP B-NP
. . O
Pioneer NNP B-NP
surged VBD B-VP
450 CD B-NP
yen NN I-NP
-LRB- ( O
$ $ B-NP
3.16 CD I-NP
-RRB- ) O
to TO B-PP
6,050 CD B-NP
yen NN I-NP
-LRB- ( O
$ $ B-NP
42.60 CD I-NP
-RRB- ) O
. . O
Kyocera NNP B-NP
advanced VBD B-VP
80 CD B-NP
yen NN I-NP
to TO B-PP
5,440 CD B-NP
. . O
Fanuc NNP B-NP
gained VBD B-VP
100 CD B-NP
to TO B-PP
7,580 CD B-NP
. . O
Breweries NNP B-NP
attracted VBD B-VP
investors NNS B-NP
because IN B-PP
of IN I-PP
their PRP$ B-NP
land NN I-NP
property NN I-NP
holdings NNS I-NP
that WDT B-NP
could MD B-VP
figure VB I-VP
in IN B-PP
development NN B-NP
or CC O
other JJ B-NP
plans NNS I-NP
, , O
traders NNS B-NP
said VBD B-VP
. . O
Sapporo NNP B-NP
gained VBD B-VP
80 CD B-NP
to TO B-PP
1,920 CD B-NP
and CC O
Kirin NNP B-NP
added VBD B-VP
60 CD B-NP
to TO B-PP
2,070 CD B-NP
. . O
Housings NNS B-NP
, , I-NP
constructions NNS I-NP
and CC I-NP
pharmaceuticals NNS I-NP
continued VBD B-VP
to TO I-VP
be VB I-VP
bought VBN I-VP
following VBG B-PP
Thursday NNP B-NP
's POS B-NP
gains NNS I-NP
because IN B-PP
of IN I-PP
strong JJ B-NP
earnings NNS I-NP
outlooks NNS I-NP
. . O
Daiwa NNP B-NP
House NNP I-NP
gained VBD B-VP
50 CD B-NP
to TO B-PP
2,660 CD B-NP
. . O
Misawa NNP B-NP
Homes NNP I-NP
was VBD B-VP
up IN B-ADVP
20 CD B-NP
at IN B-PP
2,960 CD B-NP
. . O
Kajima NNP B-NP
advanced VBD B-VP
40 CD B-NP
to TO B-PP
2,120 CD B-NP
and CC O
Ohbayashi NNP B-NP
added VBD B-VP
50 CD B-NP
to TO B-PP
1,730 CD B-NP
. . O
Fujisawa NNP B-NP
added VBD B-VP
80 CD B-NP
to TO B-PP
2,010 CD B-NP
and CC O
Mochida NNP B-NP
advanced VBD B-VP
230 CD B-NP
to TO B-PP
4,400 CD B-NP
. . O
London JJ B-NP
share NN I-NP
prices NNS I-NP
were VBD B-VP
influenced VBN I-VP
largely RB B-ADVP
by IN B-PP
declines NNS B-NP
on IN B-PP
Wall NNP B-NP
Street NNP I-NP
and CC O
weakness NN B-NP
in IN B-PP
the DT B-NP
British JJ I-NP
pound NN I-NP
. . O
The DT B-NP
key JJ I-NP
Financial NNP I-NP
Times-Stock NNP I-NP
Exchange NNP I-NP
100-share JJ I-NP
index NN I-NP
ended VBD B-VP
10.2 CD B-NP
points NNS I-NP
lower JJR B-ADVP
at IN B-PP
2179.1 CD B-NP
, , O
above IN B-ADVP
its PRP$ B-NP
intraday JJ I-NP
low NN I-NP
of IN B-PP
2176.9 CD B-NP
, , B-ADVP
but CC I-ADVP
off IN B-ADVP
the DT B-NP
day NN I-NP
's POS I-NP
high NN B-NP
of IN B-PP
2189 CD B-NP
. . O
The DT B-NP
index NN I-NP
finished VBD B-VP
2.4 CD B-NP
% NN I-NP
under IN B-PP
its PRP$ B-NP
close NN I-NP
of IN B-PP
2233.9 CD B-NP
the DT B-NP
previous JJ I-NP
Friday NNP I-NP
, , O
although IN B-SBAR
it PRP B-NP
recouped VBD B-VP
some DT B-NP
of IN B-PP
the DT B-NP
sharp JJ I-NP
losses NNS I-NP
staged VBD B-VP
early JJ B-NP
last JJ I-NP
week NN I-NP
on IN B-PP
the DT B-NP
back RB I-NP
of IN B-PP
Wall NNP B-NP
Street NNP I-NP
's POS B-NP
fall NN I-NP
. . O
London NNP B-NP
was VBD B-VP
weak JJ B-ADJP
throughout IN B-PP
Friday NNP B-NP
's POS B-NP
trading NN I-NP
, , O
however RB B-ADVP
, , O
on IN B-PP
what WP B-NP
dealers NNS B-NP
attributed VBD B-VP
to TO B-PP
generally RB B-NP
thin JJ I-NP
interest NN I-NP
ahead RB B-ADVP
of IN B-PP
the DT B-NP
weekend NN I-NP
and CC O
this DT B-NP
week NN I-NP
's POS I-NP
potentially RB B-ADJP
important JJ I-ADJP
U.K. NNP B-NP
trade NN I-NP
figures NNS I-NP
for IN B-PP
September NNP B-NP
. . O
The DT B-NP
FT-SE NNP I-NP
100 CD I-NP
largely RB B-ADVP
remained VBD B-VP
within IN B-PP
an DT B-NP
11-point JJ I-NP
range NN I-NP
establshed VBN B-VP
within IN B-PP
the DT B-NP
first JJ I-NP
hour NN I-NP
of IN B-PP
trading NN B-NP
before IN B-PP
it PRP B-NP
eased VBD B-VP
to TO B-PP
an DT B-NP
intraday JJ I-NP
low JJ I-NP
late RB B-ADVP
in IN B-PP
the DT B-NP
session NN I-NP
when WRB B-ADVP
a DT B-NP
flurry NN I-NP
of IN B-PP
program NN B-NP
selling VBG I-NP
pushed VBN B-VP
Wall NNP B-NP
Street NNP I-NP
lower JJR B-ADVP
. . O
The DT B-NP
FT NNP I-NP
30-share JJ I-NP
index NN I-NP
closed VBD B-VP
11.0 CD B-NP
points NNS I-NP
lower JJR B-ADVP
at IN B-PP
1761.0 CD B-NP
. . O
Volume NN B-NP
was VBD B-VP
extremely RB B-ADJP
thin JJ I-ADJP
at IN B-PP
351.3 CD B-NP
million CD I-NP
shares NNS I-NP
, , O
the DT B-NP
lightest JJS I-NP
volume NN I-NP
of IN B-PP
the DT B-NP
week NN I-NP
and CC O
modestly RB B-ADVP
under IN B-PP
Thursday NNP B-NP
's POS B-NP
387.4 CD I-NP
million CD I-NP
shares NNS I-NP
. . O
Dealers NNS B-NP
said VBD B-VP
the DT B-NP
day NN I-NP
's POS B-NP
action NN I-NP
was VBD B-VP
featureless JJ B-ADJP
outside IN B-PP
some DT B-NP
response NN I-NP
to TO B-PP
sterling NN B-NP
's POS B-NP
early JJ I-NP
weakness NN I-NP
against IN B-PP
the DT B-NP
mark NN I-NP
, , O
and CC O
fears NNS B-NP
that IN B-SBAR
Wall NNP B-NP
Street NNP I-NP
might MD B-VP
open RB I-VP
lower JJR B-ADVP
after IN B-PP
its PRP$ B-NP
strong JJ I-NP
leap NN I-NP
forward RB B-ADVP
Thursday NNP B-NP
. . O
They PRP B-NP
added VBD B-VP
that IN B-SBAR
market-makers NNS B-NP
were VBD B-VP
largely RB I-VP
sidelined VBN I-VP
after IN B-PP
aggressively RB B-VP
supporting VBG I-VP
the DT B-NP
market NN I-NP
Thursday NNP B-NP
in IN B-PP
their PRP$ B-NP
quest NN I-NP
to TO B-VP
cover VB I-VP
internal JJ B-NP
shortages NNS I-NP
of IN B-PP
FT-SE NNP B-NP
100 CD I-NP
shares NNS I-NP
. . O
Interest NN B-NP
may MD B-VP
remain VB I-VP
limited JJ B-ADJP
into IN B-PP
tomorrow NN B-NP
's POS B-NP
U.K. NNP I-NP
trade NN I-NP
figures NNS I-NP
, , O
which WDT B-NP
the DT B-NP
market NN I-NP
will MD B-VP
be VB I-VP
watching VBG I-VP
closely RB B-ADVP
to TO B-VP
see VB I-VP
if IN B-SBAR
there EX B-NP
is VBZ B-VP
any DT B-NP
improvement NN I-NP
after IN B-PP
disappointing JJ B-NP
numbers NNS I-NP
in IN B-PP
the DT B-NP
previous JJ I-NP
two CD I-NP
months NNS I-NP
. . O
The DT B-NP
key JJ I-NP
corporate JJ I-NP
news NN I-NP
of IN B-PP
the DT B-NP
day NN I-NP
was VBD B-VP
that IN B-SBAR
British JJ B-NP
Airways NNPS I-NP
decided VBD B-VP
to TO I-VP
withdraw VB I-VP
from IN B-PP
a DT B-NP
management-led JJ I-NP
bid NN I-NP
for IN B-PP
UAL NNP B-NP
Corp. NNP I-NP
, , O
the DT B-NP
parent NN I-NP
of IN B-PP
United NNP B-NP
Airlines NNPS I-NP
. . O
British JJ B-NP
Airways NNPS I-NP
rose VBD B-VP
initially RB B-ADVP
after IN B-PP
announcing VBG B-VP
its PRP$ B-NP
withdrawal NN I-NP
from IN B-PP
the DT B-NP
UAL NNP I-NP
deal NN I-NP
. . O
Dealers NNS B-NP
said VBD B-VP
they PRP B-NP
viewed VBD B-VP
the DT O
initial JJ O
# # O
390-million CD O
-LRB- ( O
$ $ B-ADJP
622 CD O
million CD O
-RRB- ) O
outlay NN B-NP
for IN B-PP
a DT B-NP
15 CD I-NP
% NN I-NP
stake NN I-NP
in IN B-PP
the DT B-NP
airline NN I-NP
as IN B-PP
a DT B-NP
bit NN I-NP
much JJ I-NP
. . O
Its PRP$ B-NP
shares NNS I-NP
slid VBD B-VP
in IN B-PP
late JJ B-NP
dealings NNS I-NP
to TO B-VP
close VB I-VP
a DT B-NP
penny NN I-NP
per IN B-PP
share NN B-NP
lower JJR B-ADVP
at IN B-PP
197 CD B-NP
pence NN I-NP
. . O
The DT B-NP
airline NN I-NP
was VBD B-VP
the DT B-NP
most RBS I-NP
active JJ I-NP
FT-SE NNP I-NP
100 CD I-NP
at IN B-PP
8.2 CD B-NP
million CD I-NP
shares NNS I-NP
traded VBN B-VP
. . O
The DT B-NP
next JJ I-NP
most RBS I-NP
active JJ I-NP
top-tier JJ I-NP
stock NN I-NP
was VBD B-VP
B.A.T NNP B-NP
Industries NNPS I-NP
, , O
the DT B-NP
target NN I-NP
of IN B-PP
Sir NNP B-NP
James NNP I-NP
Goldsmith NNP I-NP
's POS B-NP
# # B-ADJP
13.4 CD O
billion CD O
bid NN B-NP
. . O
The DT B-NP
company NN I-NP
gained VBD B-VP
shareholder NN B-NP
approval NN I-NP
Thursday NNP B-NP
to TO B-VP
restructure VB I-VP
in IN B-PP
a DT B-NP
bid NN I-NP
to TO B-VP
fend VB I-VP
off IN B-PRT
the DT B-NP
hostile JJ I-NP
takeover NN I-NP
. . O
Sir NNP B-NP
James NNP I-NP
said VBD B-VP
Thursday NNP B-NP
night NN I-NP
that IN B-SBAR
his PRP$ B-NP
plans NNS I-NP
for IN B-PP
the DT B-NP
takeover NN I-NP
had VBD B-VP
n't RB I-VP
changed VBN I-VP
. . O
B.A.T NNP B-NP
ended VBD B-VP
the DT B-NP
day NN I-NP
at IN B-PP
778 CD B-NP
, , O
down JJ B-ADVP
5 NN B-NP
, , O
on IN B-PP
turnover NN B-NP
of IN B-PP
7.5 CD B-NP
million CD I-NP
shares NNS I-NP
. . O
Dealers NNS B-NP
said VBD B-VP
it PRP B-NP
was VBD B-VP
hit VBN I-VP
by IN B-PP
some DT B-NP
profit-taking NN I-NP
after IN B-PP
gains NNS B-NP
since IN B-PP
mid-week NN B-NP
. . O
In IN B-PP
other JJ B-NP
active JJ I-NP
shares NNS I-NP
, , O
Trusthouse NNP B-NP
Forte NNP I-NP
shed VB B-VP
10 CD B-NP
to TO B-PP
294 CD B-NP
on IN B-PP
volume NN B-NP
of IN B-PP
6.4 CD B-NP
million CD I-NP
shares NNS I-NP
after IN B-PP
a DT B-NP
Barclays NNP I-NP
De NNP I-NP
Zoete NNP I-NP
Wedd NNP I-NP
downgrading NN I-NP
, , O
while IN B-SBAR
Hillsdown NNP B-NP
Holdings NNP I-NP
, , O
a DT B-NP
food NN I-NP
products NNS I-NP
concern VBP I-NP
, , O
was VBD B-VP
boosted VBN I-VP
2 CD B-NP
to TO B-PP
271 CD B-NP
after IN O
it PRP B-NP
disclosed VBD B-VP
it PRP B-NP
would MD B-VP
seek VB I-VP
shareholder NN B-NP
approval NN I-NP
to TO B-VP
begin VB I-VP
share NN B-NP
repurchases NNS I-NP
. . O
Elsewhere RB B-ADVP
in IN B-PP
Europe NNP B-NP
, , O
share NN B-NP
prices NNS I-NP
closed VBD B-VP
higher JJR B-ADVP
in IN B-PP
Stockholm NNP B-NP
, , I-NP
Brussels NNP I-NP
and CC I-NP
Milan NNP I-NP
. . O
Prices NNS B-NP
were VBD B-VP
lower JJR B-ADJP
in IN B-PP
Frankfurt NNP B-NP
, , I-NP
Zurich NNP I-NP
, , I-NP
Paris NNP I-NP
and CC I-NP
Amsterdam NNP I-NP
. . O
South JJ B-NP
African JJ I-NP
gold NN I-NP
stocks NNS I-NP
closed VBD B-VP
moderately RB B-ADVP
lower JJR I-ADVP
. . O
Share NN B-NP
prices NNS I-NP
closed VBD B-VP
higher JJR B-ADVP
in IN B-PP
Sydney NNP B-NP
, , O
Taipei NNP B-NP
, , O
Wellington NNP B-NP
, , O
Manila NNP B-NP
, , O
Hong NNP B-NP
Kong NNP I-NP
and CC O
Singapore NNP B-NP
and CC O
were VBD B-VP
lower JJR B-ADJP
in IN B-PP
Seoul NNP B-NP
. . O
Here RB B-ADVP
are VBP B-VP
price NN B-NP
trends NNS I-NP
on IN B-PP
the DT B-NP
world NN I-NP
's POS B-NP
major JJ I-NP
stock NN I-NP
markets NNS I-NP
, , O
as IN B-SBAR
calculated VBN B-VP
by IN B-PP
Morgan NNP B-NP
Stanley NNP I-NP
Capital NNP I-NP
International NNP I-NP
Perspective NNP I-NP
, , O
Geneva NNP B-NP
. . O
To TO B-VP
make VB I-VP
them PRP B-NP
directly RB B-ADJP
comparable JJ I-ADJP
, , O
each DT B-NP
index NN I-NP
is VBZ B-VP
based VBN I-VP
on IN B-PP
the DT B-NP
close NN I-NP
of IN B-PP
1969 CD B-NP
equaling VBG B-VP
100 CD B-NP
. . O
The DT B-NP
percentage NN I-NP
change NN I-NP
is VBZ B-VP
since IN B-PP
year-end NN B-NP
. . O
The DT B-NP
U.S. NNP I-NP
is VBZ B-VP
required VBN I-VP
to TO I-VP
notify VB I-VP
foreign JJ B-NP
dictators NNS I-NP
if IN B-SBAR
it PRP B-NP
knows VBZ B-VP
of IN B-PP
coup NN B-NP
plans NNS I-NP
likely JJ B-ADJP
to TO B-VP
endanger VB I-VP
their PRP$ B-NP
lives NNS I-NP
, , O
government NN B-NP
officials NNS I-NP
said VBD B-VP
. . O
The DT B-NP
notification NN I-NP
policy NN I-NP
was VBD B-VP
part NN B-NP
of IN B-PP
a DT B-NP
set NN I-NP
of IN B-PP
guidelines NNS B-NP
on IN B-PP
handling NN B-VP
coups NNS B-NP
outlined VBN B-VP
in IN B-PP
a DT B-NP
secret JJ I-NP
1988 CD I-NP
exchange NN I-NP
of IN B-PP
letters NNS B-NP
between IN B-PP
the DT B-NP
Reagan NNP I-NP
administration NN I-NP
and CC O
the DT B-NP
Senate NNP I-NP
Intelligence NNP I-NP
Committee NNP I-NP
. . O
The DT B-NP
existence NN I-NP
of IN B-PP
the DT B-NP
guidelines NNS I-NP
has VBZ B-VP
become VBN I-VP
known VBN I-VP
since IN B-SBAR
President NNP B-NP
Bush NNP I-NP
disclosed VBD B-VP
them PRP B-NP
privately RB B-ADVP
to TO B-PP
seven CD B-NP
Republican NNP I-NP
senators NNS I-NP
at IN B-PP
a DT B-NP
White NNP I-NP
House NNP I-NP
meeting NN I-NP
last JJ B-NP
Monday NNP I-NP
. . O
Officials NNS B-NP
familiar JJ B-ADJP
with IN B-PP
the DT B-NP
meeting NN I-NP
said VBD B-VP
Mr. NNP B-NP
Bush NNP I-NP
cited VBD B-VP
the DT B-NP
policy NN I-NP
as IN B-PP
an DT B-NP
example NN I-NP
of IN B-PP
the DT B-NP
sort NN I-NP
of IN B-PP
congressional JJ B-NP
requirements NNS I-NP
the DT B-NP
administration NN I-NP
contends VBZ B-VP
contribute VB B-VP
to TO B-PP
the DT B-NP
failure NN I-NP
of IN B-PP
such JJ B-NP
covert JJ I-NP
actions NNS I-NP
as IN B-PP
this DT B-NP
month NN I-NP
's POS B-NP
futile JJ I-NP
effort NN I-NP
to TO B-VP
oust VB I-VP
Panamanian JJ B-NP
dictator NN I-NP
Manuel NNP I-NP
Noriega NNP I-NP
. . O
According VBG B-PP
to TO B-PP
the DT B-NP
officials NNS I-NP
, , O
Mr. NNP B-NP
Bush NNP I-NP
even RB B-ADVP
read VB B-VP
to TO B-PP
the DT B-NP
senators NNS I-NP
selections NNS B-NP
from IN B-PP
a DT B-NP
highly RB I-NP
classified VBN I-NP
letter NN I-NP
from IN B-PP
the DT B-NP
committee NN I-NP
to TO B-PP
the DT B-NP
White NNP I-NP
House NNP I-NP
discussing VBG B-VP
the DT B-NP
guidelines NNS I-NP
. . O
They PRP B-NP
said VBD B-VP
the DT B-NP
president NN I-NP
conceded VBD B-VP
the DT B-NP
notification NN I-NP
requirement NN I-NP
did VBD B-VP
n't RB I-VP
affect VB I-VP
his PRP$ B-NP
decision NN I-NP
to TO B-VP
lend VB I-VP
only RB B-NP
minor JJ I-NP
support NN I-NP
to TO B-PP
this DT B-NP
month NN I-NP
's POS B-NP
Panama NNP I-NP
coup NN I-NP
effort NN I-NP
. . O
No DT B-NP
notification NN I-NP
was VBD B-VP
ever RB I-VP
considered VBN I-VP
, , O
officials NNS B-NP
said VBD B-VP
, , O
apparently RB B-ADVP
because IN B-SBAR
the DT B-NP
U.S. NNP I-NP
did VBD B-VP
n't RB I-VP
think VB I-VP
the DT B-NP
coup NN I-NP
plotters NNS I-NP
intended VBN B-VP
to TO I-VP
kill VB I-VP
Mr. NNP B-NP
Noriega NNP I-NP
, , O
but CC O
merely RB B-VP
sought VBD I-VP
to TO I-VP
imprison VB I-VP
him PRP B-NP
. . O
What WP B-NP
's VBZ B-VP
more JJR B-NP
, , O
both DT B-NP
administration NN B-NP
and CC O
congressional JJ B-NP
officials NNS I-NP
hint VBP B-VP
that IN B-SBAR
the DT B-NP
notification NN I-NP
requirement NN I-NP
is VBZ B-VP
likely JJ B-ADJP
to TO B-VP
be VB I-VP
dropped VBN I-VP
from IN B-PP
the DT B-NP
guidelines NNS I-NP
on IN B-PP
coup NN B-NP
attempts NNS I-NP
that WDT B-NP
are VBP B-VP
being VBG I-VP
rewritten VBN I-VP
by IN B-PP
the DT B-NP
panel NN I-NP
and CC O
the DT B-NP
White NNP I-NP
House NNP I-NP
. . O
The DT B-NP
rewriting VBG I-NP
was VBD B-VP
launched VBN I-VP
at IN B-PP
a DT B-NP
meeting NN I-NP
between IN B-PP
Mr. NNP B-NP
Bush NNP I-NP
and CC O
intelligence NN B-NP
committee NN I-NP
leaders NNS I-NP
Oct. NNP B-NP
12 CD I-NP
, , O
a DT B-NP
few JJ I-NP
days NNS I-NP
before IN B-PP
the DT B-NP
meeting NN I-NP
at IN B-PP
which WDT B-NP
the DT B-NP
president NN I-NP
complained VBD B-VP
about IN B-PP
the DT B-NP
rules NNS I-NP
. . O
However RB B-ADVP
, , O
the DT B-NP
disclosure NN I-NP
of IN B-PP
...@@ -541,8 +541,12 @@ message LayerConfig { ...@@ -541,8 +541,12 @@ message LayerConfig {
// for switch order layer // for switch order layer
optional ReshapeConfig reshape_conf = 59; optional ReshapeConfig reshape_conf = 59;
// for batch normalization layer
// The small constant added to the variance to improve numeric stability.
optional double epsilon = 60 [ default = 0.00001 ];
// for factorization machine layer // for factorization machine layer
optional uint32 factor_size = 60; optional uint32 factor_size = 61;
} }
message EvaluatorConfig { message EvaluatorConfig {
......
...@@ -1116,35 +1116,6 @@ def PyData(files=None, ...@@ -1116,35 +1116,6 @@ def PyData(files=None,
return data_config return data_config
@config_func
def ProtoData(files=None,
type=None,
file_group_queue_capacity=None,
load_file_count=None,
constant_slots=None,
load_thread_num=None,
**xargs):
data_config = create_data_config_proto(**xargs)
if type is None:
data_config.type = 'proto'
else:
data_config.type = type
data_config.files = files
# When type="proto_group", one data provider contains at most
# load_file_count files, and there are at most
# (queue_capacity + load_thread_num + 1) data providers in memory
if file_group_queue_capacity is not None:
data_config.file_group_conf.queue_capacity = file_group_queue_capacity
if load_file_count is not None:
data_config.file_group_conf.load_file_count = load_file_count
if load_thread_num is not None:
data_config.file_group_conf.load_thread_num = load_thread_num
if constant_slots:
data_config.constant_slots.extend(constant_slots)
return data_config
#real data for training is actually provided by "sub_data" data providers. #real data for training is actually provided by "sub_data" data providers.
@config_func @config_func
def MultiData(sub_data=[]): def MultiData(sub_data=[]):
...@@ -2066,13 +2037,20 @@ class ParameterReluLayer(LayerBase): ...@@ -2066,13 +2037,20 @@ class ParameterReluLayer(LayerBase):
def __init__(self, name, inputs, partial_sum=1, **args): def __init__(self, name, inputs, partial_sum=1, **args):
super(ParameterReluLayer, self).__init__( super(ParameterReluLayer, self).__init__(
name, self.layer_type, 0, inputs=inputs, **args) name, self.layer_type, 0, inputs=inputs, **args)
input_layer = self.get_input_layer(0) input_layer = self.get_input_layer(0)
config_assert(len(self.inputs) == 1, "prelu layer has only one input.") config_assert(len(self.inputs) == 1, "prelu layer has only one input.")
config_assert(input_layer.size % partial_sum == 0, config_assert(input_layer.size % partial_sum == 0,
"a wrong setting for partial_sum") "a wrong setting for partial_sum")
dims = [1, input_layer.size / partial_sum]
self.set_layer_size(input_layer.size) self.set_layer_size(input_layer.size)
self.config.partial_sum = partial_sum self.config.partial_sum = partial_sum
self.create_input_parameter(0, input_layer.size / partial_sum) self.create_input_parameter(0, input_layer.size / partial_sum, dims)
self.set_layer_height_width(self.get_input_layer(0).height, \
self.get_input_layer(0).width)
self.set_layer_depth(self.get_input_layer(0).depth)
@config_layer('conv') @config_layer('conv')
...@@ -2434,6 +2412,7 @@ class BatchNormLayer(LayerBase): ...@@ -2434,6 +2412,7 @@ class BatchNormLayer(LayerBase):
bias=True, bias=True,
img3D=False, img3D=False,
use_global_stats=True, use_global_stats=True,
epsilon=1e-5,
moving_average_fraction=0.9, moving_average_fraction=0.9,
batch_norm_type=None, batch_norm_type=None,
mean_var_names=None, mean_var_names=None,
...@@ -2482,6 +2461,9 @@ class BatchNormLayer(LayerBase): ...@@ -2482,6 +2461,9 @@ class BatchNormLayer(LayerBase):
self.config.use_global_stats = use_global_stats self.config.use_global_stats = use_global_stats
if moving_average_fraction is not None: if moving_average_fraction is not None:
self.config.moving_average_fraction = moving_average_fraction self.config.moving_average_fraction = moving_average_fraction
if epsilon is not None:
assert epsilon >= 1e-5, "epsilon must be no less than 1e-5."
self.config.epsilon = epsilon
input_layer = self.get_input_layer(0) input_layer = self.get_input_layer(0)
image_conf = self.config.inputs[0].image_conf image_conf = self.config.inputs[0].image_conf
...@@ -2714,7 +2696,7 @@ Usage: ...@@ -2714,7 +2696,7 @@ Usage:
max_sort_size = -1, inputs = ["output", "score"]) max_sort_size = -1, inputs = ["output", "score"])
Input data: Samples of the same query should be loaded as a sequence, Input data: Samples of the same query should be loaded as a sequence,
by ProtoDataProvider or PyDataProvider etc.. User should provide by PyDataProvider etc. The user should provide
scores for each sample. The score slot should be the 2nd scores for each sample. The score slot should be the 2nd
input of lambdaRank layer. input of lambdaRank layer.
......
...@@ -297,7 +297,7 @@ def auc_evaluator( ...@@ -297,7 +297,7 @@ def auc_evaluator(
def pnpair_evaluator( def pnpair_evaluator(
input, input,
label, label,
info, query_id,
weight=None, weight=None,
name=None, ): name=None, ):
""" """
...@@ -308,16 +308,20 @@ def pnpair_evaluator( ...@@ -308,16 +308,20 @@ def pnpair_evaluator(
.. code-block:: python .. code-block:: python
eval = pnpair_evaluator(input, label, info) eval = pnpair_evaluator(input, label, query_id)
:param input: Input Layer name. The output prediction of network. :param input: Input Layer name. The output prediction of network.
:type input: LayerOutput :type input: LayerOutput
:param label: Label layer name. :param label: Label layer name.
:type label: LayerOutput :type label: LayerOutput
:param info: Info layer name. (TODO, explaination) :param query_id: Query_id layer name. Query_id indicates which query
:type info: LayerOutput each sample belongs to. Its shape should be
the same as the output of the Label layer.
:type query_id: LayerOutput
:param weight: Weight Layer name. It should be a matrix with size :param weight: Weight Layer name. It should be a matrix with size
[sample_num, 1]. (TODO, explaination) [sample_num, 1] which indicates the weight of each sample.
The default weight of each sample is 1 if the weight layer is None.
The pair weight is the mean of the two samples' weights.
:type weight: LayerOutput :type weight: LayerOutput
:param name: Evaluator name. :param name: Evaluator name.
:type name: None|basestring :type name: None|basestring
...@@ -326,8 +330,8 @@ def pnpair_evaluator( ...@@ -326,8 +330,8 @@ def pnpair_evaluator(
input = [input] input = [input]
if label: if label:
input.append(label) input.append(label)
if info: if query_id:
input.append(info) input.append(query_id)
evaluator_base( evaluator_base(
input=input, input=input,
type="pnpair", type="pnpair",
......
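For reference, a minimal sketch of wiring the renamed query_id input into pnpair_evaluator in a trainer config (not part of this commit; the layer names and sizes are illustrative):

from paddle.trainer_config_helpers import *

# prediction comes from the ranking network; label and query_id are data slots.
prediction = data_layer(name='prediction', size=1)
label = data_layer(name='label', size=1)
query_id = data_layer(name='query_id', size=1)

# query_id tells the evaluator which query each sample belongs to, so that
# positive-negative pairs are only formed within the same query.
pnpair_evaluator(input=prediction, label=label, query_id=query_id)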
...@@ -2510,12 +2510,12 @@ def img_conv_layer(input, ...@@ -2510,12 +2510,12 @@ def img_conv_layer(input,
input is raw pixels of image(mono or RGB), or it may be the previous layer's input is raw pixels of image(mono or RGB), or it may be the previous layer's
num_filters * num_group. num_filters * num_group.
There are several group of filter in PaddlePaddle implementation. There are several groups of filters in PaddlePaddle implementation.
Each group will process some channel of the inputs. For example, if an input Each group will process some channels of the input. For example, if
num_channel = 256, group = 4, num_filter=32, the PaddlePaddle will create num_channel = 256, group = 4, num_filter=32, the PaddlePaddle will create
32*4 = 128 filters to process inputs. The channels will be split into 4 32*4 = 128 filters to process the input. The channels will be split into 4
pieces. First 256/4 = 64 channels will process by first 32 filters. The pieces. The first 256/4 = 64 channels will be processed by the first 32 filters. The
rest channels will be processed by rest group of filters. remaining channels will be processed by the remaining groups of filters.
The example usage is: The example usage is:
...@@ -2531,53 +2531,68 @@ def img_conv_layer(input, ...@@ -2531,53 +2531,68 @@ def img_conv_layer(input,
:type name: basestring :type name: basestring
:param input: The input of this layer. :param input: The input of this layer.
:type input: LayerOutput :type input: LayerOutput
:param filter_size: The x dimension of a filter kernel. Or input a tuple for :param filter_size: The dimensions of the filter kernel. If the parameter is
two image dimension. set to one integer, the two dimensions on the x and y axes
will be the same when filter_size_y is not set. If it is set
to a list, the first element indicates the dimension on
the x axis, and the second is used to specify the dimension
on the y axis when filter_size_y is not provided.
:type filter_size: int | tuple | list :type filter_size: int | tuple | list
:param filter_size_y: The y dimension of a filter kernel. Since PaddlePaddle :param filter_size_y: The dimension of the filter kernel on the y axis. If the parameter
currently supports rectangular filters, the filter's is not set, it will be set automatically according to filter_size.
shape will be (filter_size, filter_size_y). :type filter_size_y: int
:type filter_size_y: int | None
:param num_filters: Each filter group's number of filter :param num_filters: Each filter group's number of filter
:param act: Activation type. ReluActivation is the default activation. :param act: Activation type. ReluActivation is the default activation.
:type act: BaseActivation :type act: BaseActivation
:param groups: Group size of filters. :param groups: The group number. 1 is the default group number.
:type groups: int :type groups: int
:param stride: The x dimension of the stride. Or input a tuple for two image :param stride: The strides. If the parameter is set to one integer, the strides
dimension. on the x and y axes will be the same when stride_y is not set. If it is
set to a list, the first element indicates the stride on the x axis,
and the second is used to specify the stride on the y axis when
stride_y is not provided. 1 is the default value.
:type stride: int | tuple | list :type stride: int | tuple | list
:param stride_y: The y dimension of the stride. :param stride_y: The stride on the y axis.
:type stride_y: int :type stride_y: int
:param padding: The x dimension of the padding. Or input a tuple for two :param padding: The padding sizes. If the parameter is set to one integer, the padding
image dimension sizes on the x and y axes will be the same when padding_y is not set. If it
is set to a list, the first element indicates the padding size on the
x axis, and the second is used to specify the padding size on the y axis
when padding_y is not provided. 0 is the default padding size.
:type padding: int | tuple | list :type padding: int | tuple | list
:param padding_y: The y dimension of the padding. :param padding_y: The padding size on the y axis.
:type padding_y: int :type padding_y: int
:param dilation: The x dimension of the dilation. Or input a tuple for two :param dilation: The dimensions of the dilation. If the parameter is set to one integer,
image dimension the two dimensions on the x and y axes will be the same when dilation_y is not
set. If it is set to a list, the first element indicates the dimension
on the x axis, and the second is used to specify the dimension on the y
axis when dilation_y is not provided. 1 is the default dimension.
:type dilation: int | tuple | list :type dilation: int | tuple | list
:param dilation_y: The y dimension of the dilation. :param dilation_y: The dimension of the dilation on the y axis.
:type dilation_y: int :type dilation_y: int
:param bias_attr: The bias attribute. If the parameter is set to False or an object :param bias_attr: The bias attribute. If the parameter is set to False or an object
whose type is not ParameterAttribute, no bias is defined. If the whose type is not ParameterAttribute, no bias is defined. If the
parameter is set to True, the bias is initialized to zero. parameter is set to True, the bias is initialized to zero.
:type bias_attr: ParameterAttribute | None | bool | Any :type bias_attr: ParameterAttribute | None | bool | Any
:param num_channels: number of input channels. If None will be set :param num_channels: The number of input channels. If the parameter is not set or
automatically from previous output. set to None, its actual value will be automatically set to
the channel number of the input.
:type num_channels: int :type num_channels: int
:param param_attr: Convolution param attribute. None means default attribute :param param_attr: The parameter attribute. See ParameterAttribute for
details.
:type param_attr: ParameterAttribute :type param_attr: ParameterAttribute
:param shared_biases: Is biases will be shared between filters or not. :param shared_biases: Whether biases will be shared between filters or not.
:type shared_biases: bool :type shared_biases: bool
:param layer_attr: Layer Extra Attribute. :param layer_attr: The extra layer attributes. See ExtraLayerAttribute for
details.
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:param trans: true if it is a convTransLayer, false if it is a convLayer :param trans: True if it is a convTransLayer, False if it is a convLayer
:type trans: bool :type trans: bool
:param layer_type: specify the layer_type, default is None. If trans=True, :param layer_type: Specify the layer type. If the dilation's dimension on one axis is
layer_type has to be "exconvt" or "cudnn_convt", larger than 1, layer_type has to be "cudnn_conv" or "cudnn_convt".
otherwise layer_type has to be either "exconv" or If trans=True, layer_type has to be "exconvt" or "cudnn_convt",
"cudnn_conv" otherwise layer_type has to be either "exconv" or "cudnn_conv".
:type layer_type: String :type layer_type: basestring
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
""" """
...@@ -2682,7 +2697,7 @@ def img_pool_layer(input, ...@@ -2682,7 +2697,7 @@ def img_pool_layer(input,
""" """
Image pooling Layer. Image pooling Layer.
The details of pooling layer, please refer ufldl's pooling_ . For the details of the pooling layer, please refer to ufldl's pooling_ .
.. _pooling: http://ufldl.stanford.edu/tutorial/supervised/Pooling/ .. _pooling: http://ufldl.stanford.edu/tutorial/supervised/Pooling/
...@@ -2714,32 +2729,37 @@ def img_pool_layer(input, ...@@ -2714,32 +2729,37 @@ def img_pool_layer(input,
padding_y=2, padding_y=2,
pool_type=MaxPooling()) pool_type=MaxPooling())
:param padding: pooling padding width. :param padding: The padding size on the x axis. 0 is the default padding size.
:type padding: int :type padding: int
:param padding_y: pooling padding height. It's equal to padding by default. :param padding_y: The padding size on the y axis. If the parameter is not set
:type padding_y: int | None or set to None, it will be set to 'padding' automatically.
:param name: name of pooling layer :param name: The name of this layer. It is optional.
:type name: basestring. :type name: basestring
:param input: The input of this layer. :param input: The input of this layer.
:type input: LayerOutput :type input: LayerOutput
:param pool_size: pooling window width :param pool_size: The pooling window length on the x axis.
:type pool_size: int :type pool_size: int
:param pool_size_y: pooling window height. It's eaqual to pool_size by default. :param pool_size_y: The pooling window length on the y axis. If the parameter is
:type pool_size_y: int | None not set or set to None, its actual value will be automatically
:param num_channels: number of input channel. set to pool_size.
:type pool_size_y: int
:param num_channels: The number of input channels. If the parameter is not set or
set to None, its actual value will be automatically set to
the channel number of the input.
:type num_channels: int :type num_channels: int
:param pool_type: pooling type. MaxPooling or AvgPooling. Default is :param pool_type: Pooling type. MaxPooling is the default pooling.
MaxPooling.
:type pool_type: BasePoolingType :type pool_type: BasePoolingType
:param stride: stride width of pooling. :param stride: The stride on the x axis. 1 is the default value.
:type stride: int :type stride: int
:param stride_y: stride height of pooling. It is equal to stride by default. :param stride_y: The stride on the y axis. If the parameter is not set or set to
:type stride_y: int | None None, its actual value will be automatically set to 'stride'.
:param layer_attr: Extra Layer attribute. :type stride_y: int
:param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
details.
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:param ceil_mode: Wether to use ceil mode to calculate output height and with. :param ceil_mode: Whether to use the ceil function to calculate output height and width.
Defalut is True. If set false, Otherwise use floor. True is the default. If it is set to False, the floor function will
be used.
:type ceil_mode: bool :type ceil_mode: bool
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
...@@ -2845,24 +2865,32 @@ def img_pool3d_layer(input, ...@@ -2845,24 +2865,32 @@ def img_pool3d_layer(input,
:param padding: pooling padding width. :param padding: pooling padding width.
:type padding: int | tuple | list :type padding: int | tuple | list
:param name: name of pooling layer :param name: The name of this layer. It is optional.
:type name: basestring. :type name: basestring.
:param input: The input of this layer. :param input: The input of this layer.
:type input: LayerOutput :type input: LayerOutput
:param pool_size: pooling window width :param pool_size: The pooling window lengths along the three axes. If the parameter
is set to one integer, the three lengths will be the same.
:type pool_size: int | tuple | list :type pool_size: int | tuple | list
:param num_channels: number of input channel. :param num_channels: The number of input channels. If the parameter is not set or
set to None, its actual value will be automatically set to
the channel number of the input.
:type num_channels: int :type num_channels: int
:param pool_type: pooling type. MaxPooling or AvgPooling. Default is :param pool_type: Pooling type. MaxPooling is the default pooling.
MaxPooling.
:type pool_type: BasePoolingType :type pool_type: BasePoolingType
:param stride: stride width of pooling. :param stride: The strides of the pooling along the three axes. If the parameter
is set to one integer, the three strides will be the same. 1 is the
default value.
:type stride: int | tuple | list :type stride: int | tuple | list
:param layer_attr: Extra Layer attribute. :param padding: The sizes of padding along the three axes. If the parameter is set to
one integer, they will be the same. 0 is the default padding size.
:type padding: int | tuple | list
:param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
details.
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:param ceil_mode: Wether to use ceil mode to calculate output height and with. :param ceil_mode: Whether to use the ceil function to calculate output height and width.
Defalut is True. If set false, Otherwise use floor. True is the default. If it is set to False, the floor function will
be used.
:type ceil_mode: bool :type ceil_mode: bool
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
...@@ -2941,9 +2969,11 @@ def spp_layer(input, ...@@ -2941,9 +2969,11 @@ def spp_layer(input,
pyramid_height=None, pyramid_height=None,
layer_attr=None): layer_attr=None):
""" """
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. A layer that performs spatial pyramid pooling.
The details please refer to
`Kaiming He's paper <https://arxiv.org/abs/1406.4729>`_. Reference:
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
https://arxiv.org/abs/1406.4729
The example usage is: The example usage is:
...@@ -2958,13 +2988,16 @@ def spp_layer(input, ...@@ -2958,13 +2988,16 @@ def spp_layer(input,
:type name: basestring :type name: basestring
:param input: The input of this layer. :param input: The input of this layer.
:type input: LayerOutput :type input: LayerOutput
:param num_channels: number of input channel. :param num_channels: The number of input channels. If the parameter is not set or
set to None, its actual value will be automatically set to
the channel number of the input.
:type num_channels: int :type num_channels: int
:param pool_type: Pooling type. MaxPooling or AveragePooling. Default is MaxPooling. :param pool_type: Pooling type. MaxPooling is the default pooling.
:type scale: BasePoolingType :type scale: BasePoolingType
:param pyramid_height: pyramid height. :param pyramid_height: The pyramid height of this pooling.
:type pyramid_height: int :type pyramid_height: int
:param layer_attr: Extra Layer Attribute. :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
details.
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
...@@ -3088,6 +3121,7 @@ def batch_norm_layer(input, ...@@ -3088,6 +3121,7 @@ def batch_norm_layer(input,
param_attr=None, param_attr=None,
layer_attr=None, layer_attr=None,
batch_norm_type=None, batch_norm_type=None,
epsilon=1e-5,
moving_average_fraction=0.9, moving_average_fraction=0.9,
use_global_stats=None, use_global_stats=None,
mean_var_names=None): mean_var_names=None):
...@@ -3158,6 +3192,8 @@ def batch_norm_layer(input, ...@@ -3158,6 +3192,8 @@ def batch_norm_layer(input,
will use the mean and variance of the current batch will use the mean and variance of the current batch
of test data. of test data.
:type use_global_stats: bool | None. :type use_global_stats: bool | None.
:param epsilon: The small constant added to the variance to improve numeric stability.
:type epsilon: float.
:param moving_average_fraction: Factor used in the moving average computation. :param moving_average_fraction: Factor used in the moving average computation.
:math:`runningMean = newMean*(1-factor) + runningMean*factor` :math:`runningMean = newMean*(1-factor) + runningMean*factor`
:type moving_average_fraction: float. :type moving_average_fraction: float.
...@@ -3175,6 +3211,7 @@ def batch_norm_layer(input, ...@@ -3175,6 +3211,7 @@ def batch_norm_layer(input,
assert (batch_norm_type is None) or (batch_norm_type == "batch_norm") or \ assert (batch_norm_type is None) or (batch_norm_type == "batch_norm") or \
(batch_norm_type == "mkldnn_batch_norm") or \ (batch_norm_type == "mkldnn_batch_norm") or \
(batch_norm_type == "cudnn_batch_norm") (batch_norm_type == "cudnn_batch_norm")
l = Layer( l = Layer(
name=name, name=name,
img3D=img3D, img3D=img3D,
...@@ -3184,6 +3221,7 @@ def batch_norm_layer(input, ...@@ -3184,6 +3221,7 @@ def batch_norm_layer(input,
type=LayerType.BATCH_NORM_LAYER, type=LayerType.BATCH_NORM_LAYER,
batch_norm_type=batch_norm_type, batch_norm_type=batch_norm_type,
bias=ParamAttr.to_bias(bias_attr), bias=ParamAttr.to_bias(bias_attr),
epsilon=epsilon,
moving_average_fraction=moving_average_fraction, moving_average_fraction=moving_average_fraction,
use_global_stats=use_global_stats, use_global_stats=use_global_stats,
mean_var_names=mean_var_names, mean_var_names=mean_var_names,
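A minimal usage sketch of the new epsilon argument (not part of this commit; conv stands for the output of a preceding img_conv_layer):

# epsilon is forwarded into LayerConfig.epsilon; the config parser rejects
# values smaller than 1e-5.
bn = batch_norm_layer(input=conv,
                      act=ReluActivation(),
                      epsilon=1e-5,
                      moving_average_fraction=0.9)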
...@@ -4697,7 +4735,7 @@ def conv_projection(input, ...@@ -4697,7 +4735,7 @@ def conv_projection(input,
will be same when filter_size_y is not set. If it is set will be same when filter_size_y is not set. If it is set
to a list, the first element indicates the dimension on to a list, the first element indicates the dimension on
the x axis, and the second is used to specify the dimension the x axis, and the second is used to specify the dimension
on the y axis when filter_size is not provided. on the y axis when filter_size_y is not provided.
:type filter_size: int | tuple | list :type filter_size: int | tuple | list
:param filter_size_y: The dimension of the filter kernel on the y axis. If the parameter :param filter_size_y: The dimension of the filter kernel on the y axis. If the parameter
is not set, it will be set automatically according to filter_size. is not set, it will be set automatically according to filter_size.
...@@ -6574,10 +6612,11 @@ def row_conv_layer(input, ...@@ -6574,10 +6612,11 @@ def row_conv_layer(input,
@layer_support() @layer_support()
@wrap_name_default() @wrap_name_default()
@wrap_param_attr_default()
def prelu_layer(input, def prelu_layer(input,
name=None, name=None,
partial_sum=1, partial_sum=1,
channel_shared=None,
num_channels=None,
param_attr=None, param_attr=None,
layer_attr=None): layer_attr=None):
""" """
...@@ -6608,6 +6647,14 @@ def prelu_layer(input, ...@@ -6608,6 +6647,14 @@ def prelu_layer(input,
- partial_sum = number of outputs, indicates all elements share the same weight. - partial_sum = number of outputs, indicates all elements share the same weight.
:type partial_sum: int :type partial_sum: int
:param channel_shared: whether or not the parameters are shared across channels.
- channel_shared = True, we set the partial_sum to the number of outputs.
- channel_shared = False, we set the partial_sum to the number of elements in one channel.
:type channel_shared: bool
:param num_channels: The number of input channels.
:type num_channels: int
:param param_attr: The parameter attribute. See ParameterAttribute for details. :param param_attr: The parameter attribute. See ParameterAttribute for details.
:type param_attr: ParameterAttribute :type param_attr: ParameterAttribute
:param layer_attr: The extra layer attribute. See ExtraLayerAttribute for :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
...@@ -6618,7 +6665,25 @@ def prelu_layer(input, ...@@ -6618,7 +6665,25 @@ def prelu_layer(input,
""" """
assert isinstance(input, LayerOutput), 'prelu_layer accepts only one input.' assert isinstance(input, LayerOutput), 'prelu_layer accepts only one input.'
assert isinstance(param_attr, ParameterAttribute)
if not param_attr:
param_attr = ParamAttr(initial_mean=0.25, initial_std=0.0)
else:
assert isinstance(param_attr, ParameterAttribute)
if num_channels is None:
assert input.num_filters is not None, \
'the input channel cannot be detected, please specify the num_channels parameter'
num_channels = input.num_filters
if channel_shared is not None:
assert isinstance(channel_shared, bool)
assert (input.height != 0 and input.width != 0), \
'input height and width must be set'
if channel_shared:
partial_sum = input.height * input.width * num_channels
else:
partial_sum = input.height * input.width
l = Layer( l = Layer(
name=name, name=name,
...@@ -6630,6 +6695,7 @@ def prelu_layer(input, ...@@ -6630,6 +6695,7 @@ def prelu_layer(input,
name=name, name=name,
layer_type=LayerType.PRELU, layer_type=LayerType.PRELU,
parents=input, parents=input,
num_filters=num_channels,
size=l.config.size) size=l.config.size)
...@@ -7079,7 +7145,7 @@ def img_conv3d_layer(input, ...@@ -7079,7 +7145,7 @@ def img_conv3d_layer(input,
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:param trans: True if it is a convTransLayer, False if it is a convLayer :param trans: True if it is a convTransLayer, False if it is a convLayer
:type trans: bool :type trans: bool
:param layer_type: Specify the layer_type. If the parameter is set, it must be "deconv3d" :param layer_type: Specify the layer type. If the parameter is set, it must be "deconv3d"
when trans=True. If not set, it will be automatically set to "deconv3d" when trans=True. If not set, it will be automatically set to "deconv3d"
when trans=True and "conv3d" when trans=False. when trans=True and "conv3d" when trans=False.
:type layer_type: basestring :type layer_type: basestring
......
...@@ -65,6 +65,7 @@ layers { ...@@ -65,6 +65,7 @@ layers {
height: 227 height: 227
width: 227 width: 227
depth: 1 depth: 1
epsilon: 1e-05
} }
layers { layers {
name: "__crmnorm_0__" name: "__crmnorm_0__"
......
...@@ -65,6 +65,7 @@ layers { ...@@ -65,6 +65,7 @@ layers {
height: 256 height: 256
width: 256 width: 256
depth: 1 depth: 1
epsilon: 1e-05
} }
layers { layers {
name: "__crmnorm_0__" name: "__crmnorm_0__"
......
...@@ -36,6 +36,7 @@ layers { ...@@ -36,6 +36,7 @@ layers {
height: 6 height: 6
width: 20 width: 20
depth: 3 depth: 3
epsilon: 1e-05
} }
parameters { parameters {
name: "___batch_norm_0__.w0" name: "___batch_norm_0__.w0"
......
...@@ -4,6 +4,8 @@ layers { ...@@ -4,6 +4,8 @@ layers {
type: "data" type: "data"
size: 300 size: 300
active_type: "" active_type: ""
height: 10
width: 10
} }
layers { layers {
name: "__prelu_layer_0__" name: "__prelu_layer_0__"
...@@ -15,6 +17,9 @@ layers { ...@@ -15,6 +17,9 @@ layers {
input_parameter_name: "___prelu_layer_0__.w0" input_parameter_name: "___prelu_layer_0__.w0"
} }
partial_sum: 1 partial_sum: 1
height: 10
width: 10
depth: 1
} }
layers { layers {
name: "__prelu_layer_1__" name: "__prelu_layer_1__"
...@@ -26,6 +31,9 @@ layers { ...@@ -26,6 +31,9 @@ layers {
input_parameter_name: "___prelu_layer_1__.w0" input_parameter_name: "___prelu_layer_1__.w0"
} }
partial_sum: 1 partial_sum: 1
height: 10
width: 10
depth: 1
} }
layers { layers {
name: "__prelu_layer_2__" name: "__prelu_layer_2__"
...@@ -37,41 +45,100 @@ layers { ...@@ -37,41 +45,100 @@ layers {
input_parameter_name: "___prelu_layer_2__.w0" input_parameter_name: "___prelu_layer_2__.w0"
} }
partial_sum: 5 partial_sum: 5
height: 10
width: 10
depth: 1
}
layers {
name: "__prelu_layer_3__"
type: "prelu"
size: 300
active_type: ""
inputs {
input_layer_name: "input"
input_parameter_name: "___prelu_layer_3__.w0"
}
partial_sum: 300
height: 10
width: 10
depth: 1
}
layers {
name: "__prelu_layer_4__"
type: "prelu"
size: 300
active_type: ""
inputs {
input_layer_name: "input"
input_parameter_name: "___prelu_layer_4__.w0"
}
partial_sum: 100
height: 10
width: 10
depth: 1
} }
parameters { parameters {
name: "___prelu_layer_0__.w0" name: "___prelu_layer_0__.w0"
size: 300 size: 300
initial_mean: 0.0 initial_mean: 0.25
initial_std: 0.057735026919 initial_std: 0.0
dims: 1
dims: 300
initial_strategy: 0 initial_strategy: 0
initial_smart: true initial_smart: false
} }
parameters { parameters {
name: "___prelu_layer_1__.w0" name: "___prelu_layer_1__.w0"
size: 300 size: 300
initial_mean: 0.0 initial_mean: 0.25
initial_std: 0.057735026919 initial_std: 0.0
dims: 1
dims: 300
initial_strategy: 0 initial_strategy: 0
initial_smart: true initial_smart: false
} }
parameters { parameters {
name: "___prelu_layer_2__.w0" name: "___prelu_layer_2__.w0"
size: 60 size: 60
initial_mean: 0.0 initial_mean: 0.25
initial_std: 0.129099444874 initial_std: 0.0
dims: 1
dims: 60
initial_strategy: 0
initial_smart: false
}
parameters {
name: "___prelu_layer_3__.w0"
size: 1
initial_mean: 0.25
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
}
parameters {
name: "___prelu_layer_4__.w0"
size: 3
initial_mean: 0.25
initial_std: 0.0
dims: 1
dims: 3
initial_strategy: 0 initial_strategy: 0
initial_smart: true initial_smart: false
} }
input_layer_names: "input" input_layer_names: "input"
output_layer_names: "__prelu_layer_2__" output_layer_names: "__prelu_layer_4__"
sub_models { sub_models {
name: "root" name: "root"
layer_names: "input" layer_names: "input"
layer_names: "__prelu_layer_0__" layer_names: "__prelu_layer_0__"
layer_names: "__prelu_layer_1__" layer_names: "__prelu_layer_1__"
layer_names: "__prelu_layer_2__" layer_names: "__prelu_layer_2__"
layer_names: "__prelu_layer_3__"
layer_names: "__prelu_layer_4__"
input_layer_names: "input" input_layer_names: "input"
output_layer_names: "__prelu_layer_2__" output_layer_names: "__prelu_layer_4__"
is_recurrent_layer_group: false is_recurrent_layer_group: false
} }
from paddle.trainer_config_helpers import * from paddle.trainer_config_helpers import *
data = data_layer(name='input', size=300) data = data_layer(name='input', size=300, height=10, width=10)
prelu = prelu_layer(input=data) prelu = prelu_layer(input=data, num_channels=3)
prelu = prelu_layer(input=data, partial_sum=1) prelu = prelu_layer(input=data, partial_sum=1, num_channels=3)
prelu = prelu_layer(input=data, partial_sum=5) prelu = prelu_layer(input=data, partial_sum=5, num_channels=3)
prelu = prelu_layer(input=data, channel_shared=True, num_channels=3)
prelu = prelu_layer(input=data, channel_shared=False, num_channels=3)
outputs(prelu) outputs(prelu)
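For the configuration above, the partial_sum values that channel_shared resolves to can be checked with a little arithmetic, matching the generated protostr:

height, width, num_channels = 10, 10, 3
# channel_shared=True: all elements share one weight
partial_sum = height * width * num_channels   # 300 -> parameter size 300 / 300 = 1
# channel_shared=False: one weight per channel
partial_sum = height * width                  # 100 -> parameter size 300 / 100 = 3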
...@@ -62,21 +62,15 @@ __all__ = [ ...@@ -62,21 +62,15 @@ __all__ = [
cp.begin_parse() cp.begin_parse()
def init(**kwargs): def set_omp_mkl_env_vars(trainer_count):
import py_paddle.swig_paddle as api '''Auto set CPU environment if it has not been set before.
args = [] export KMP_AFFINITY, OMP_DYNAMIC according to the Hyper Threading status.
args_dict = {} export OMP_NUM_THREADS, MKL_NUM_THREADS according to trainer_count.
# NOTE: append arguments if they are in ENV '''
for ek, ev in os.environ.iteritems(): import platform
if ek.startswith("PADDLE_INIT_"): if not platform.system() in ['Linux', 'Darwin']:
args_dict[ek.replace("PADDLE_INIT_", "").lower()] = str(ev) return
args_dict.update(kwargs)
# NOTE: overwrite arguments from ENV if it is in kwargs
for key in args_dict.keys():
args.append('--%s=%s' % (key, str(args_dict[key])))
# auto set cpu environment
def set_env(key, value): def set_env(key, value):
'''If the key has not been set in the environment, set it with value.''' '''If the key has not been set in the environment, set it with value.'''
assert isinstance(key, str) assert isinstance(key, str)
...@@ -85,22 +79,59 @@ def init(**kwargs): ...@@ -85,22 +79,59 @@ def init(**kwargs):
if envset is None: if envset is None:
os.environ[key] = value os.environ[key] = value
ht = os.popen("lscpu |grep \"per core\"|awk -F':' '{print $2}'|xargs") def num_physical_cores():
ht = int(ht.read()) '''Get the number of physical cores'''
if ht == 1: # ht is off if platform.system() == "Linux":
set_env("OMP_DYNAMIC", "false") num_sockets = int(
set_env("KMP_AFFINITY", "granularity=fine,compact,0,0") os.popen("lscpu |grep \"Socket\" |awk -F':' '{print $2}'|xargs")
else: .read())
num_cores_per_socket = int(
os.popen(
"lscpu |grep \"per socket\" |awk -F':' '{print $2}'|xargs")
.read())
return num_sockets * num_cores_per_socket
else:
cmds = {"Darwin": "sysctl -n hw.physicalcpu"}
return int(os.popen(cmds.get(platform.system(), "expr 1")).read())
def num_logical_processors():
'''Get the number of logical processors'''
cmds = {
"Linux": "grep \"processor\" /proc/cpuinfo|sort -u|wc -l",
"Darwin": "sysctl -n hw.logicalcpu"
}
return int(os.popen(cmds.get(platform.system(), "expr 1")).read())
num_cores = num_physical_cores()
num_processors = num_logical_processors()
if num_processors > num_cores: # Hyper Threading is enabled
set_env("OMP_DYNAMIC", "true") set_env("OMP_DYNAMIC", "true")
set_env("KMP_AFFINITY", "granularity=fine,compact,1,0") set_env("KMP_AFFINITY", "granularity=fine,compact,1,0")
processors = os.popen("grep \"processor\" /proc/cpuinfo|sort -u|wc -l") else:
processors = int(processors.read()) set_env("OMP_DYNAMIC", "false")
trainers = kwargs.get('trainer_count', 1) set_env("KMP_AFFINITY", "granularity=fine,compact,0,0")
threads = processors / trainers threads = num_processors / trainer_count
threads = '1' if threads < 1 else str(threads) threads = '1' if threads < 1 else str(threads)
set_env("OMP_NUM_THREADS", threads) set_env("OMP_NUM_THREADS", threads)
set_env("MKL_NUM_THREADS", threads) set_env("MKL_NUM_THREADS", threads)
def init(**kwargs):
import py_paddle.swig_paddle as api
args = []
args_dict = {}
# NOTE: append arguments if they are in ENV
for ek, ev in os.environ.iteritems():
if ek.startswith("PADDLE_INIT_"):
args_dict[ek.replace("PADDLE_INIT_", "").lower()] = str(ev)
args_dict.update(kwargs)
# NOTE: overwrite arguments from ENV if it is in kwargs
for key in args_dict.keys():
args.append('--%s=%s' % (key, str(args_dict[key])))
set_omp_mkl_env_vars(kwargs.get('trainer_count', 1))
if 'use_gpu' in kwargs: if 'use_gpu' in kwargs:
cp.g_command_config_args['use_gpu'] = kwargs['use_gpu'] cp.g_command_config_args['use_gpu'] = kwargs['use_gpu']
if 'use_mkldnn' in kwargs: if 'use_mkldnn' in kwargs:
......
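A rough illustration of the effect (a sketch; the exact values depend on the host, and variables the user has already exported are never overwritten):

import os
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=4)
# On a Linux machine with 8 physical cores and 16 logical processors this leaves:
#   OMP_DYNAMIC=true, KMP_AFFINITY=granularity=fine,compact,1,0   (Hyper-Threading on)
#   OMP_NUM_THREADS=4, MKL_NUM_THREADS=4                          (16 processors / 4 trainers)
print(os.environ.get('OMP_NUM_THREADS'), os.environ.get('MKL_NUM_THREADS'))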
...@@ -15,6 +15,37 @@ def unique_name(prefix): ...@@ -15,6 +15,37 @@ def unique_name(prefix):
return "_".join([prefix, str(uid)]) return "_".join([prefix, str(uid)])
def convert_np_dtype_to_dtype_(np_dtype):
dtype = np.dtype(np_dtype)
if dtype == np.float32:
return core.DataType.FP32
elif dtype == np.float64:
return core.DataType.FP64
elif dtype == np.float16:
return core.DataType.FP16
elif dtype == np.int32:
return core.DataType.INT32
elif dtype == np.int16:
return core.DataType.INT16
elif dtype == np.int64:
return core.DataType.INT64
elif dtype == np.bool:
return core.DataType.BOOL
else:
raise ValueError("Not supported numpy dtype " + str(dtype))
def dtype_is_floating(dtype):
if not isinstance(dtype, core.DataType):
dtype = convert_np_dtype_to_dtype_(dtype)
if (dtype == core.DataType.FP16 or dtype == core.DataType.FP32 or
dtype == core.DataType.FP64):
return True
else:
return False
def _debug_string_(proto, throw_on_error=True): def _debug_string_(proto, throw_on_error=True):
error_fields = list() error_fields = list()
if not proto.IsInitialized(error_fields) and throw_on_error: if not proto.IsInitialized(error_fields) and throw_on_error:
...@@ -66,7 +97,7 @@ class Variable(object): ...@@ -66,7 +97,7 @@ class Variable(object):
"matched.".format(self.name, old_shape, shape)) "matched.".format(self.name, old_shape, shape))
if dtype is not None: if dtype is not None:
if not isinstance(dtype, core.DataType): if not isinstance(dtype, core.DataType):
dtype = Variable._convert_np_dtype_to_dtype_(dtype) dtype = convert_np_dtype_to_dtype_(dtype)
if is_new_var: if is_new_var:
self.desc.set_data_type(dtype) self.desc.set_data_type(dtype)
else: else:
...@@ -148,26 +179,6 @@ class Variable(object): ...@@ -148,26 +179,6 @@ class Variable(object):
uid = core.unique_integer(prefix) # unique during whole process. uid = core.unique_integer(prefix) # unique during whole process.
return "_".join([prefix, str(uid)]) return "_".join([prefix, str(uid)])
@staticmethod
def _convert_np_dtype_to_dtype_(np_dtype):
dtype = np.dtype(np_dtype)
if dtype == np.float32:
return core.DataType.FP32
elif dtype == np.float64:
return core.DataType.FP64
elif dtype == np.float16:
return core.DataType.FP16
elif dtype == np.int32:
return core.DataType.INT32
elif dtype == np.int16:
return core.DataType.INT16
elif dtype == np.int64:
return core.DataType.INT64
elif dtype == np.bool:
return core.DataType.BOOL
else:
raise ValueError("Not supported numpy dtype " + str(dtype))
def get_all_op_protos(): def get_all_op_protos():
""" """
......
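A small sketch of how the now module-level dtype helpers behave (illustrative only):

import numpy as np
import paddle.v2.fluid.core as core
from paddle.v2.fluid.framework import convert_np_dtype_to_dtype_, dtype_is_floating

assert convert_np_dtype_to_dtype_(np.float32) == core.DataType.FP32
assert dtype_is_floating('float64')       # strings are accepted via np.dtype
assert not dtype_is_floating(np.int64)    # integer and bool dtypes are not floating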
...@@ -285,3 +285,86 @@ class XavierInitializer(Initializer): ...@@ -285,3 +285,86 @@ class XavierInitializer(Initializer):
}) })
var.op = op var.op = op
return op return op
class MSRAInitializer(Initializer):
"""Implements the MSRA initializer a.k.a. Kaiming Initializer
This class implements the weight initialization from the paper
Delving Deep into Rectifiers: Surpassing Human-Level Performance on
ImageNet Classification[1] by Kaiming He, Xiangyu Zhang, Shaoqing Ren
and Jian Sun. This is a robust initialization method that particularly
considers the rectifier nonlinearities. In case of Uniform distribution,
the range is [-x, x], where x = sqrt(6 / fan_in). In case of Normal
distribution, the mean is 0 and the standard deviation
is sqrt(2/ fan_in).
References:
[1] Delving Deep into Rectifiers: Surpassing Human-Level Performance
on ImageNet Classification
(https://arxiv.org/abs/1502.01852)
"""
def __init__(self, uniform=True, fan_in=None, seed=0):
"""Constructor for MSRAInitializer
Args:
uniform: whether to use uniform or normal distribution
fan_in: fan_in for MSRAInitializer. If None, it is
inferred from the variable.
seed: random seed
Note: It is recommended to set fan_in to None for most cases.
"""
assert uniform is not None
assert seed is not None
super(MSRAInitializer, self).__init__()
self._uniform = uniform
self._fan_in = fan_in
self._seed = seed
def __call__(self, var, block):
"""Add MSRA initialization ops for a variable
Args:
var: Variable that needs to be initialized
block: The block in which initialization ops
should be added
Returns:
the initialization op
"""
assert isinstance(var, framework.Variable)
assert isinstance(block, framework.Block)
f_in, f_out = self._compute_fans(var)
# If fan_in is passed, use it
fan_in = f_in if self._fan_in is None else self._fan_in
if self._uniform:
limit = np.sqrt(6.0 / float(fan_in))
op = block.prepend_op(
type="uniform_random",
outputs={"Out": var},
attrs={
"shape": var.shape,
"data_type": int(var.data_type),
"min": -limit,
"max": limit,
"seed": self._seed
})
else:
std = np.sqrt(2.0 / float(fan_in))
op = block.prepend_op(
type="gaussian_random",
outputs={"Out": var},
attrs={
"shape": var.shape,
"data_type": int(var.data_type),
"mean": 0.0,
"std": std,
"seed": self._seed
})
var.op = op
return op
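A usage sketch (illustrative; it relies on the fc interface with param_initializer that is added later in this commit):

import paddle.v2.fluid.layers as layers
from paddle.v2.fluid.initializer import MSRAInitializer

image = layers.data(name='image', shape=[784], data_type='float32')
# Normal variant: weights are drawn with mean 0 and std sqrt(2 / fan_in),
# where fan_in = 784 for this [784, 128] weight.
hidden = layers.fc(input=image,
                   size=128,
                   act='relu',
                   param_initializer=MSRAInitializer(uniform=False))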
...@@ -2,7 +2,7 @@ import copy ...@@ -2,7 +2,7 @@ import copy
import itertools import itertools
from paddle.v2.fluid.framework import Variable, g_main_program, \ from paddle.v2.fluid.framework import Variable, g_main_program, \
g_startup_program, unique_name, Program g_startup_program, unique_name, Program, dtype_is_floating
from paddle.v2.fluid.initializer import ConstantInitializer, \ from paddle.v2.fluid.initializer import ConstantInitializer, \
UniformInitializer, XavierInitializer UniformInitializer, XavierInitializer
...@@ -61,7 +61,7 @@ class LayerHelper(object): ...@@ -61,7 +61,7 @@ class LayerHelper(object):
@property @property
def param_attr(self): def param_attr(self):
default = {'name': None, 'initializer': XavierInitializer()} default = {'name': None}
actual = self.kwargs.get('param_attr', None) actual = self.kwargs.get('param_attr', None)
if actual is None: if actual is None:
actual = default actual = default
...@@ -72,7 +72,7 @@ class LayerHelper(object): ...@@ -72,7 +72,7 @@ class LayerHelper(object):
@property @property
def bias_attr(self): def bias_attr(self):
default = {'name': None, 'initializer': ConstantInitializer()} default = {'name': None}
bias_attr = self.kwargs.get('bias_attr', None) bias_attr = self.kwargs.get('bias_attr', None)
if bias_attr is None: if bias_attr is None:
bias_attr = default bias_attr = default
...@@ -119,12 +119,17 @@ class LayerHelper(object): ...@@ -119,12 +119,17 @@ class LayerHelper(object):
attr_copy = copy.deepcopy(attr) attr_copy = copy.deepcopy(attr)
if initializer is not None: if initializer is not None:
attr_copy['initializer'] = initializer attr_copy['initializer'] = initializer
else:
attr_copy['initializer'] = self._get_default_initializer(dtype)
if attr_copy['name'] is None: if attr_copy['name'] is None:
attr_copy['name'] = unique_name(".".join([self.name, suffix])) attr_copy['name'] = unique_name(".".join([self.name, suffix]))
self.startup_program.global_block().create_parameter( self.startup_program.global_block().create_parameter(
dtype=dtype, shape=shape, **attr_copy) dtype=dtype, shape=shape, **attr_copy)
return self.main_program.global_block().create_parameter( return self.main_program.global_block().create_parameter(
name=attr_copy['name'], dtype=dtype, shape=shape) name=attr_copy['name'],
dtype=dtype,
shape=shape,
trainable=attr_copy.get('trainable', True))
def create_tmp_variable(self, dtype): def create_tmp_variable(self, dtype):
return self.main_program.current_block().create_var( return self.main_program.current_block().create_var(
...@@ -149,13 +154,19 @@ class LayerHelper(object): ...@@ -149,13 +154,19 @@ class LayerHelper(object):
persistable=True, persistable=True,
initializer=initializer) initializer=initializer)
def append_bias_op(self, input_var, dim_start=1, dim_end=None): def append_bias_op(self,
input_var,
bias_initializer,
dim_start=1,
dim_end=None):
""" """
Append bias operator and return its output. If the user does not set Append bias operator and return its output. If the user does not set
bias_attr, append_bias_op will return input_var bias_attr, append_bias_op will return input_var
:param input_var: the input variable. The len(input_var.shape) is larger :param input_var: the input variable. The len(input_var.shape) is
or equal than 2. larger than or equal to 2.
:param bias_initializer: an instance of a subclass of Initializer used to
initialize the bias
:param dim_start: :param dim_start:
:param dim_end: the shape of the bias will be :param dim_end: the shape of the bias will be
input_var.shape[dim_start:dim_end]. The bias is broadcasted to other input_var.shape[dim_start:dim_end]. The bias is broadcasted to other
...@@ -167,7 +178,11 @@ class LayerHelper(object): ...@@ -167,7 +178,11 @@ class LayerHelper(object):
return input_var return input_var
b = self.create_parameter( b = self.create_parameter(
attr=bias_attr, shape=size, dtype=input_var.data_type, suffix='b') attr=bias_attr,
shape=size,
dtype=input_var.data_type,
suffix='b',
initializer=bias_initializer)
tmp = self.create_tmp_variable(dtype=input_var.data_type) tmp = self.create_tmp_variable(dtype=input_var.data_type)
self.append_op( self.append_op(
type='elementwise_add', type='elementwise_add',
...@@ -191,3 +206,10 @@ class LayerHelper(object): ...@@ -191,3 +206,10 @@ class LayerHelper(object):
outputs={"Y": [tmp]}, outputs={"Y": [tmp]},
attrs=act) attrs=act)
return tmp return tmp
def _get_default_initializer(self, dtype):
if dtype is None or dtype_is_floating(dtype) is True:
return XavierInitializer()
else:
# For integer and boolean types, initialize with all zeros
return ConstantInitializer()
...@@ -3,7 +3,7 @@ import paddle.v2.fluid.proto.framework_pb2 as framework_pb2 ...@@ -3,7 +3,7 @@ import paddle.v2.fluid.proto.framework_pb2 as framework_pb2
from paddle.v2.fluid.framework import OpProtoHolder, Variable, Program, \ from paddle.v2.fluid.framework import OpProtoHolder, Variable, Program, \
Operator Operator
from paddle.v2.fluid.initializer import ConstantInitializer, \ from paddle.v2.fluid.initializer import ConstantInitializer, \
NormalInitializer NormalInitializer, XavierInitializer
from paddle.v2.fluid.layer_helper import LayerHelper, unique_name from paddle.v2.fluid.layer_helper import LayerHelper, unique_name
import re import re
import cStringIO import cStringIO
...@@ -17,11 +17,13 @@ __all__ = [ ...@@ -17,11 +17,13 @@ __all__ = [
def fc(input, def fc(input,
size, size,
num_flatten_dims=1,
param_attr=None, param_attr=None,
param_initializer=None,
bias_attr=None, bias_attr=None,
name=None, bias_initializer=None,
act=None, act=None,
num_flatten_dims=1, name=None,
main_program=None, main_program=None,
startup_program=None): startup_program=None):
""" """
...@@ -30,11 +32,15 @@ def fc(input, ...@@ -30,11 +32,15 @@ def fc(input,
Args: Args:
input: The input tensor to the function input: The input tensor to the function
size: The size of the layer size: The size of the layer
num_flatten_dims: Number of columns in input
param_attr: The parameters/weights to the FC Layer param_attr: The parameters/weights to the FC Layer
param_initializer: Initializer used for the weight/parameter.
If None, XavierInitializer() is used
bias_attr: The bias parameter for the FC layer bias_attr: The bias parameter for the FC layer
name: Name/alias of the function bias_initializer: Initializer used for the bias.
If None, then ConstantInitializer() is used
act: Activation to be applied to the output of FC layer act: Activation to be applied to the output of FC layer
num_flatten_dims: Number of columns in input name: Name/alias of the function
main_program: Name of the main program that calls this main_program: Name of the main program that calls this
startup_program: Name of the startup program startup_program: Name of the startup program
...@@ -50,10 +56,23 @@ def fc(input, ...@@ -50,10 +56,23 @@ def fc(input,
to the LayerHelper constructor. to the LayerHelper constructor.
""" """
def _get_default_param_initializer():
return XavierInitializer()
def _get_default_bias_initializer():
return ConstantInitializer()
helper = LayerHelper('fc', **locals()) helper = LayerHelper('fc', **locals())
dtype = helper.input_dtype() dtype = helper.input_dtype()
if param_initializer is None:
param_initializer = _get_default_param_initializer()
if bias_initializer is None:
bias_initializer = _get_default_bias_initializer()
mul_results = [] mul_results = []
for input_var, param_attr in helper.iter_inputs_and_params(): for input_var, param_attr in helper.iter_inputs_and_params():
input_shape = input_var.shape input_shape = input_var.shape
...@@ -61,7 +80,10 @@ def fc(input, ...@@ -61,7 +80,10 @@ def fc(input,
reduce(lambda a, b: a * b, input_shape[num_flatten_dims:], 1) reduce(lambda a, b: a * b, input_shape[num_flatten_dims:], 1)
] + [size] ] + [size]
w = helper.create_parameter( w = helper.create_parameter(
attr=param_attr, shape=param_shape, dtype=dtype) attr=param_attr,
initializer=param_initializer,
shape=param_shape,
dtype=dtype)
tmp = helper.create_tmp_variable(dtype) tmp = helper.create_tmp_variable(dtype)
helper.append_op( helper.append_op(
type="mul", type="mul",
...@@ -82,16 +104,17 @@ def fc(input, ...@@ -82,16 +104,17 @@ def fc(input,
helper.append_op( helper.append_op(
type="sum", inputs={"X": mul_results}, outputs={"Out": pre_bias}) type="sum", inputs={"X": mul_results}, outputs={"Out": pre_bias})
# add bias # add bias
pre_activation = helper.append_bias_op(pre_bias) pre_activation = helper.append_bias_op(pre_bias, bias_initializer)
# add activation # add activation
return helper.append_activation(pre_activation) return helper.append_activation(pre_activation)
def embedding(input, def embedding(input,
size, size,
data_type='float32',
is_sparse=False, is_sparse=False,
param_initializer=None,
param_attr=None, param_attr=None,
data_type='float32',
main_program=None, main_program=None,
startup_program=None): startup_program=None):
""" """
...@@ -100,9 +123,9 @@ def embedding(input, ...@@ -100,9 +123,9 @@ def embedding(input,
Args: Args:
input: The input to the function input: The input to the function
size: The size of the layer size: The size of the layer
data_type: The type of data : float32, float_16, int etc
is_sparse: A flag that declares whether the input is sparse is_sparse: A flag that declares whether the input is sparse
param_attr: Parameters for this layer param_attr: Parameters for this layer
data_type: The type of data : float32, float_16, int etc
main_program: Name of the main program that calls this main_program: Name of the main program that calls this
startup_program: Name of the startup program startup_program: Name of the startup program
...@@ -114,9 +137,16 @@ def embedding(input, ...@@ -114,9 +137,16 @@ def embedding(input,
to the LayerHelper constructor. to the LayerHelper constructor.
""" """
def _get_default_param_initializer():
return XavierInitializer()
helper = LayerHelper('embedding', **locals()) helper = LayerHelper('embedding', **locals())
w = helper.create_parameter( w = helper.create_parameter(
attr=helper.param_attr, shape=size, dtype=data_type) attr=helper.param_attr,
shape=size,
dtype=data_type,
initializer=param_initializer or _get_default_param_initializer())
tmp = helper.create_tmp_variable(data_type) tmp = helper.create_tmp_variable(data_type)
helper.append_op( helper.append_op(
type='lookup_table', type='lookup_table',
...@@ -130,7 +160,6 @@ def embedding(input, ...@@ -130,7 +160,6 @@ def embedding(input,
# TODO(qijun): expose H0 and C0 # TODO(qijun): expose H0 and C0
def dynamic_lstm(input, def dynamic_lstm(input,
size, size,
data_type='float32',
param_attr=None, param_attr=None,
bias_attr=None, bias_attr=None,
use_peepholes=True, use_peepholes=True,
...@@ -138,6 +167,7 @@ def dynamic_lstm(input, ...@@ -138,6 +167,7 @@ def dynamic_lstm(input,
gate_activation='sigmoid', gate_activation='sigmoid',
cell_activation='tanh', cell_activation='tanh',
candidate_activation='tanh', candidate_activation='tanh',
data_type='float32',
main_program=None, main_program=None,
startup_program=None): startup_program=None):
helper = LayerHelper('lstm', **locals()) helper = LayerHelper('lstm', **locals())
...@@ -178,9 +208,9 @@ def dynamic_lstm(input, ...@@ -178,9 +208,9 @@ def dynamic_lstm(input,
def data(name, def data(name,
shape, shape,
append_batch_size=True,
data_type='float32', data_type='float32',
type=core.VarDesc.VarType.LOD_TENSOR, type=core.VarDesc.VarType.LOD_TENSOR,
append_batch_size=True,
main_program=None, main_program=None,
startup_program=None, startup_program=None,
stop_gradient=True): stop_gradient=True):
...@@ -190,9 +220,9 @@ def data(name, ...@@ -190,9 +220,9 @@ def data(name,
Args: Args:
name: The name/alias of the function name: The name/alias of the function
shape: Tuple declaring the shape. shape: Tuple declaring the shape.
append_batch_size: Whether or not to append the data as a batch.
data_type: The type of data : float32, float_16, int etc data_type: The type of data : float32, float_16, int etc
type: The output type. By default it is LOD_TENSOR. type: The output type. By default it is LOD_TENSOR.
append_batch_size: Whether or not to append the data as a batch.
main_program: Name of the main program that calls this main_program: Name of the main program that calls this
startup_program: Name of the startup program startup_program: Name of the startup program
stop_gradient: A boolean that mentions whether gradient should flow. stop_gradient: A boolean that mentions whether gradient should flow.
...@@ -226,7 +256,7 @@ def data(name, ...@@ -226,7 +256,7 @@ def data(name,
stop_gradient=stop_gradient) stop_gradient=stop_gradient)
def create_tensor(dtype, name=None, main_program=None): def create_tensor(dtype, name=None, main_program=None, startup_program=None):
helper = LayerHelper("create_tensor", **locals()) helper = LayerHelper("create_tensor", **locals())
return helper.create_variable(name=helper.name, dtype=dtype) return helper.create_variable(name=helper.name, dtype=dtype)
...@@ -390,30 +420,12 @@ _create_op_func_('mul') ...@@ -390,30 +420,12 @@ _create_op_func_('mul')
_create_op_func_('elementwise_add') _create_op_func_('elementwise_add')
_create_op_func_('dropout') _create_op_func_('dropout')
_create_op_func_('reshape') _create_op_func_('reshape')
_create_op_func_('elementwise_add')
_create_op_func_('sigmoid') _create_op_func_('sigmoid')
_create_op_func_('scale') _create_op_func_('scale')
_create_op_func_('reshape') _create_op_func_('reshape')
_create_op_func_('transpose') _create_op_func_('transpose')
def fill_constant(data_type, shape, value=None, program=None):
"""
This function creates a tensor , with shape as mentioned in the input and
specified data_type and fills this up with a constant value that
comes in the input.
"""
helper = LayerHelper('fill_constant', **locals())
out = helper.create_tmp_variable(dtype=data_type)
helper.append_op(
type='fill_constant',
outputs={'Out': [out]},
attrs={'data_type': data_type,
'shape': shape,
'value': value})
return out
def cast(x, data_type, main_program=None): def cast(x, data_type, main_program=None):
""" """
This function takes in the input with input_data_type This function takes in the input with input_data_type
...@@ -456,7 +468,42 @@ def sums(input, main_program=None, startup_program=None): ...@@ -456,7 +468,42 @@ def sums(input, main_program=None, startup_program=None):
return out return out
def assign(input, output, main_program=None): def linear_chain_crf(input,
label,
param_attr=None,
param_initializer=None,
main_program=None,
startup_program=None):
def _get_default_param_initializer():
return XavierInitializer()
helper = LayerHelper('linear_chain_crf', **locals())
size = input.shape[1]
transition = helper.create_parameter(
attr=helper.param_attr,
shape=[size + 2, size],
dtype=helper.input_dtype(),
initializer=param_initializer or _get_default_param_initializer())
alpha = helper.create_tmp_variable(dtype=helper.input_dtype())
emission_exps = helper.create_tmp_variable(dtype=helper.input_dtype())
transition_exps = helper.create_tmp_variable(dtype=helper.input_dtype())
log_likelihood = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(
type='linear_chain_crf',
inputs={"Emission": [input],
"Transition": transition,
"Label": label},
outputs={
"Alpha": [alpha],
"EmissionExps": [emission_exps],
"TransitionExps": transition_exps,
"LogLikelihood": log_likelihood
})
return log_likelihood
def assign(input, output, main_program=None, startup_program=None):
helper = LayerHelper('assign', **locals()) helper = LayerHelper('assign', **locals())
helper.append_op( helper.append_op(
type='scale', type='scale',
...@@ -468,7 +515,7 @@ def assign(input, output, main_program=None): ...@@ -468,7 +515,7 @@ def assign(input, output, main_program=None):
def split_lod_tensor(input, def split_lod_tensor(input,
mask, mask,
level, level=0,
main_program=None, main_program=None,
startup_program=None): startup_program=None):
helper = LayerHelper('split_lod_tensor', **locals()) helper = LayerHelper('split_lod_tensor', **locals())
...@@ -490,11 +537,11 @@ def merge_lod_tensor(in_true, ...@@ -490,11 +537,11 @@ def merge_lod_tensor(in_true,
in_false, in_false,
x, x,
mask, mask,
level, level=0,
main_program=None, main_program=None,
startup_program=None): startup_program=None):
helper = LayerHelper('merge_lod_tensor', **locals()) helper = LayerHelper('merge_lod_tensor', **locals())
out = helper.create_tmp_variable(dtype=x.data_type) out = helper.create_tmp_variable(dtype=in_true.data_type)
helper.append_op( helper.append_op(
type='merge_lod_tensor', type='merge_lod_tensor',
inputs={'X': x, inputs={'X': x,
...@@ -596,10 +643,12 @@ def sequence_conv(input, ...@@ -596,10 +643,12 @@ def sequence_conv(input,
num_filters, num_filters,
filter_size=3, filter_size=3,
filter_stride=1, filter_stride=1,
act=None,
padding=None, padding=None,
bias_attr=None, bias_attr=None,
bias_initializer=None,
param_attr=None, param_attr=None,
param_initializer=None,
act=None,
main_program=None, main_program=None,
startup_program=None): startup_program=None):
""" """
...@@ -607,6 +656,13 @@ def sequence_conv(input, ...@@ -607,6 +656,13 @@ def sequence_conv(input,
other convolutional configurations for the filters and stride as given other convolutional configurations for the filters and stride as given
in the input parameters to the function. in the input parameters to the function.
""" """
def _get_default_bias_initializer():
return ConstantInitializer()
def _get_default_param_initializer():
return XavierInitializer()
# FIXME(dzh) : want to unify the argument of python layer # FIXME(dzh) : want to unify the argument of python layer
# function. So we ignore some unnecessary attributes. # function. So we ignore some unnecessary attributes.
# such as, padding_trainable, context_start. # such as, padding_trainable, context_start.
...@@ -614,9 +670,17 @@ def sequence_conv(input, ...@@ -614,9 +670,17 @@ def sequence_conv(input,
helper = LayerHelper('sequence_conv', **locals()) helper = LayerHelper('sequence_conv', **locals())
dtype = helper.input_dtype() dtype = helper.input_dtype()
if param_initializer is None:
param_initializer = _get_default_param_initializer()
if bias_initializer is None:
bias_initializer = _get_default_bias_initializer()
filter_shape = [filter_size * input.shape[1], num_filters] filter_shape = [filter_size * input.shape[1], num_filters]
filter = helper.create_parameter( filter = helper.create_parameter(
attr=helper.param_attr, shape=filter_shape, dtype=dtype) attr=helper.param_attr,
shape=filter_shape,
dtype=dtype,
initializer=param_initializer)
pre_bias = helper.create_tmp_variable(dtype) pre_bias = helper.create_tmp_variable(dtype)
helper.append_op( helper.append_op(
...@@ -631,20 +695,22 @@ def sequence_conv(input, ...@@ -631,20 +695,22 @@ def sequence_conv(input,
'contextStart': -int(filter_size / 2), 'contextStart': -int(filter_size / 2),
'contextLength': filter_size 'contextLength': filter_size
}) })
pre_act = helper.append_bias_op(pre_bias) pre_act = helper.append_bias_op(pre_bias, bias_initializer)
return helper.append_activation(pre_act) return helper.append_activation(pre_act)
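A hedged sketch of calling `sequence_conv` with its reworked argument order; the embedding input and sizes are assumptions for illustration (the layer centers a window of `filter_size` steps around each position via `contextStart = -filter_size / 2`):

```python
import paddle.v2.fluid.layers as layers

# Illustrative only: a token sequence embedded to 32 dims, then convolved
# over a 3-step context window with 64 filters and a ReLU activation.
word = layers.data(name='word', shape=[1], data_type='int64')
emb = layers.embedding(input=word, size=[10000, 32], data_type='float32')
conv = layers.sequence_conv(input=emb, num_filters=64, filter_size=3, act='relu')
```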
def conv2d(input, def conv2d(input,
num_filters, num_filters,
name=None, filter_size,
filter_size=[1, 1],
act=None,
groups=None,
stride=[1, 1], stride=[1, 1],
padding=None, padding=None,
bias_attr=None, groups=None,
param_attr=None, param_attr=None,
param_initializer=None,
bias_attr=None,
bias_initializer=None,
act=None,
name=None,
main_program=None, main_program=None,
startup_program=None): startup_program=None):
""" """
...@@ -654,6 +720,14 @@ def conv2d(input, ...@@ -654,6 +720,14 @@ def conv2d(input,
This function can also append an activation on top of the This function can also append an activation on top of the
conv-2d output, if mentioned in the input parameters. conv-2d output, if mentioned in the input parameters.
""" """
def _get_default_bias_initializer():
return ConstantInitializer()
def _get_default_param_initializer(filter_size, num_channels):
std = (2.0 / (filter_size[0]**2 * num_channels))**0.5
return NormalInitializer(0.0, std, 0)
helper = LayerHelper('conv2d', **locals()) helper = LayerHelper('conv2d', **locals())
dtype = helper.input_dtype() dtype = helper.input_dtype()
...@@ -675,12 +749,17 @@ def conv2d(input, ...@@ -675,12 +749,17 @@ def conv2d(input,
input_shape = input.shape input_shape = input.shape
filter_shape = [num_filters, num_filter_channels] + filter_size filter_shape = [num_filters, num_filter_channels] + filter_size
std = (2.0 / (filter_size[0]**2 * num_channels))**0.5 if param_initializer is None:
param_initializer = _get_default_param_initializer(filter_size,
num_channels)
if bias_initializer is None:
bias_initializer = _get_default_bias_initializer()
filter = helper.create_parameter( filter = helper.create_parameter(
attr=helper.param_attr, attr=helper.param_attr,
shape=filter_shape, shape=filter_shape,
dtype=dtype, dtype=dtype,
initializer=NormalInitializer(0.0, std, 0)) initializer=param_initializer)
pre_bias = helper.create_tmp_variable(dtype) pre_bias = helper.create_tmp_variable(dtype)
helper.append_op( helper.append_op(
...@@ -694,7 +773,8 @@ def conv2d(input, ...@@ -694,7 +773,8 @@ def conv2d(input,
'paddings': padding, 'paddings': padding,
'groups': groups}) 'groups': groups})
pre_act = helper.append_bias_op(pre_bias, dim_start=1, dim_end=2) pre_act = helper.append_bias_op(
pre_bias, bias_initializer, dim_start=1, dim_end=2)
return helper.append_activation(pre_act) return helper.append_activation(pre_act)
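Written out, the default filter initializer that `conv2d` now falls back to is a zero-mean Gaussian whose scale follows the He/MSRA rule, with `k` the filter height (`filter_size[0]`) and `C_in` the number of input channels:

```latex
\sigma = \sqrt{\frac{2}{k^{2}\, C_{in}}}, \qquad W \sim \mathcal{N}\!\left(0,\ \sigma^{2}\right)
```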
...@@ -1311,7 +1391,7 @@ def array_to_lod_tensor(x, table, main_program=None): ...@@ -1311,7 +1391,7 @@ def array_to_lod_tensor(x, table, main_program=None):
return tmp return tmp
def fill_constant(shape, dtype, value, main_program=None): def fill_constant(shape, dtype, value, main_program=None, startup_program=None):
""" """
This function creates a tensor, with shape as mentioned in the input and This function creates a tensor, with shape as mentioned in the input and
specified data_type and fills this up with a constant value that specified data_type and fills this up with a constant value that
...@@ -1332,6 +1412,31 @@ def fill_constant(shape, dtype, value, main_program=None): ...@@ -1332,6 +1412,31 @@ def fill_constant(shape, dtype, value, main_program=None):
return out return out
def fill_constant_batch_size_like(input,
shape,
dtype,
value,
input_dim_idx=0,
output_dim_idx=0,
main_program=None,
startup_program=None):
helper = LayerHelper("fill_constant_batch_size_like", **locals())
out = helper.create_tmp_variable(dtype=dtype)
helper.append_op(
type='fill_constant_batch_size_like',
inputs={'Input': input},
outputs={'Out': [out]},
attrs={
'shape': shape,
'data_type': out.data_type,
'value': float(value),
'input_dim_idx': input_dim_idx,
'output_dim_idx': output_dim_idx
})
out.stop_gradient = True
return out
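A short sketch of the new `fill_constant_batch_size_like` layer, mirroring how the MNIST IfElse test later in this commit builds a per-example constant whose leading dimension tracks the label batch:

```python
import paddle.v2.fluid.layers as layers

# Illustrative: a constant tensor of 5s whose batch dimension is copied
# from `label`, then used as the right-hand side of an elementwise compare.
label = layers.data(name='y', shape=[1], data_type='int64')
limit = layers.fill_constant_batch_size_like(
    input=label, dtype='int64', shape=[1], value=5.0)
cond = layers.less_than(x=label, y=limit)
```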
def ones(shape, dtype, main_program=None): def ones(shape, dtype, main_program=None):
""" """
This function performs the same function as fill_constant() declared above This function performs the same function as fill_constant() declared above
...@@ -1394,7 +1499,7 @@ def create_array(dtype, main_program=None): ...@@ -1394,7 +1499,7 @@ def create_array(dtype, main_program=None):
dtype=dtype) dtype=dtype)
def less_than(x, y, cond=None, main_program=None): def less_than(x, y, cond=None, main_program=None, **ignored):
helper = LayerHelper("less_than", **locals()) helper = LayerHelper("less_than", **locals())
if cond is None: if cond is None:
cond = helper.create_tmp_variable(dtype='bool') cond = helper.create_tmp_variable(dtype='bool')
...@@ -1472,13 +1577,20 @@ class ConditionalBlockGuard(BlockGuard): ...@@ -1472,13 +1577,20 @@ class ConditionalBlockGuard(BlockGuard):
class ConditionalBlock(object): class ConditionalBlock(object):
def __init__(self, inputs, name=None, main_program=None): def __init__(self,
inputs,
name=None,
main_program=None,
startup_program=None):
for each_input in inputs: for each_input in inputs:
if not isinstance(each_input, Variable): if not isinstance(each_input, Variable):
raise TypeError("Each input should be variable") raise TypeError("Each input should be variable")
self.inputs = inputs self.inputs = inputs
self.helper = LayerHelper( self.helper = LayerHelper(
'conditional_block', name=name, main_program=main_program) 'conditional_block',
name=name,
main_program=main_program,
startup_program=startup_program)
def block(self): def block(self):
return ConditionalBlockGuard(self) return ConditionalBlockGuard(self)
...@@ -1523,3 +1635,148 @@ class ConditionalBlock(object): ...@@ -1523,3 +1635,148 @@ class ConditionalBlock(object):
outputs={'Out': out_list, outputs={'Out': out_list,
'Scope': [step_scope]}, 'Scope': [step_scope]},
attrs={'block': inside_block}) attrs={'block': inside_block})
class IfElseBlockGuard(object):
def __init__(self, is_true, ifelse):
if not isinstance(ifelse, IfElse):
raise TypeError("ifelse must be an instance of IfElse class")
if ifelse.status != IfElse.OUT_IF_ELSE_BLOCKS:
raise ValueError("You cannot invoke IfElse.block() inside a block")
self.is_true = is_true
self.ie = ifelse
if is_true:
self.cond_block = ifelse.conditional_true_block
else:
self.cond_block = ifelse.conditional_false_block
if not isinstance(self.cond_block, ConditionalBlock):
raise TypeError("Unexpected situation")
self.cond_block = self.cond_block.block()
def __enter__(self):
self.ie.status = IfElse.IN_IF_ELSE_TRUE_BLOCKS if self.is_true else IfElse.IN_IF_ELSE_FALSE_BLOCKS
self.cond_block.__enter__()
def __exit__(self, exc_type, exc_val, exc_tb):
if not self.cond_block.__exit__(exc_type, exc_val, exc_tb):
# re-raise inside exception
return False
if len(self.ie.output_table[1 if self.is_true else 0]) == 0:
raise ValueError("Must set output inside block")
self.ie.status = IfElse.OUT_IF_ELSE_BLOCKS
class IfElse(object):
OUT_IF_ELSE_BLOCKS = 0
IN_IF_ELSE_TRUE_BLOCKS = 1
IN_IF_ELSE_FALSE_BLOCKS = 2
def __init__(self, cond, name=None, main_program=None,
startup_program=None):
if not isinstance(cond, Variable):
raise TypeError("cond must be a Variable")
self.helper = LayerHelper(
'ifelse',
name=name,
main_program=main_program,
startup_program=startup_program)
self.cond = cond
self.input_table = {}
self.status = IfElse.OUT_IF_ELSE_BLOCKS
self.conditional_true_block = ConditionalBlock(inputs=[self.cond])
self.conditional_false_block = ConditionalBlock(inputs=[self.cond])
self.output_table = ([], []) # (true_outs, false_outs)
def input(self, x):
if self.status == IfElse.OUT_IF_ELSE_BLOCKS:
raise ValueError("input must in true/false blocks")
if id(x) not in self.input_table:
parent_block = self.parent_block()
out_true = parent_block.create_var(
name=unique_name('ifelse_input' + self.helper.name),
dtype=x.data_type)
out_false = parent_block.create_var(
name=unique_name('ifelse_input' + self.helper.name),
dtype=x.data_type)
parent_block.append_op(
type='split_lod_tensor',
inputs={
'X': x,
'Mask': self.cond,
},
outputs={'OutTrue': out_true,
'OutFalse': out_false},
attrs={'level': 0})
self.input_table[id(x)] = (out_true, out_false)
else:
out_true, out_false = self.input_table[id(x)]
if self.status == IfElse.IN_IF_ELSE_TRUE_BLOCKS:
return out_true
else:
return out_false
def parent_block(self):
current_block = self.helper.main_program.current_block()
return self.helper.main_program.block(current_block.parent_idx)
def true_block(self):
return IfElseBlockGuard(True, self)
def false_block(self):
return IfElseBlockGuard(False, self)
def output(self, *outs):
if self.status == self.OUT_IF_ELSE_BLOCKS:
raise ValueError("output can only be invoked in the sub-block")
out_table = self.output_table[1 if self.status ==
self.IN_IF_ELSE_TRUE_BLOCKS else 0]
parent_block = self.parent_block()
for each_out in outs:
if not isinstance(each_out, Variable):
raise TypeError("Each output should be a variable")
# create outside tensor
outside_out = parent_block.create_var(
name=unique_name("_".join([self.helper.name, 'output'])),
dtype=each_out.data_type)
out_table.append(outside_out)
# assign local var to outside
assign(
input=each_out,
output=outside_out,
main_program=self.helper.main_program,
startup_program=self.helper.startup_program)
def __call__(self):
if self.status != self.OUT_IF_ELSE_BLOCKS:
raise ValueError("IfElse::__call__ must be out of sub-block")
false_len, true_len = map(len, self.output_table)
if false_len == 0 and true_len == 0:
raise ValueError("Must invoke true_block/false_block before "
"__call__")
elif false_len != true_len and false_len != 0 and true_len != 0:
raise ValueError("The output side must be same")
elif false_len == 0 or true_len == 0:
return self.output_table[0 if false_len != 0 else 1]
# else none of false_len/true_len is zero
# merge together
rlist = []
for false_var, true_var in zip(*self.output_table):
rlist.append(
merge_lod_tensor(
in_true=true_var,
in_false=false_var,
mask=self.cond,
x=self.cond,
level=0,
main_program=self.helper.main_program,
startup_program=self.helper.startup_program))
return rlist
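A condensed sketch of the `IfElse` API defined above, following the pattern of the MNIST test added later in this commit: `ie.input` routes a tensor into the taken branch, each branch registers its result with `ie.output`, and calling the object merges the true/false partitions back into one LoD tensor. The layer names and sizes are illustrative only.

```python
import paddle.v2.fluid.layers as layers

# Illustrative only; image/label/limit are data layers as in the MNIST test.
image = layers.data(name='x', shape=[784], data_type='float32')
label = layers.data(name='y', shape=[1], data_type='int64')
limit = layers.fill_constant_batch_size_like(
    input=label, dtype='int64', shape=[1], value=5.0)
cond = layers.less_than(x=label, y=limit)

ie = layers.IfElse(cond)
with ie.true_block():
    hidden = layers.fc(input=ie.input(image), size=100, act='tanh')
    ie.output(layers.fc(input=hidden, size=10, act='softmax'))
with ie.false_block():
    hidden = layers.fc(input=ie.input(image), size=200, act='tanh')
    ie.output(layers.fc(input=hidden, size=10, act='softmax'))
prob = ie()[0]  # merged prediction across both branches
```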
...@@ -170,7 +170,8 @@ class Optimizer(object): ...@@ -170,7 +170,8 @@ class Optimizer(object):
optimize_ops = [] optimize_ops = []
for param_and_grad in parameters_and_grads: for param_and_grad in parameters_and_grads:
if param_and_grad[1] is not None: if param_and_grad[0].trainable is True and param_and_grad[
1] is not None:
optimize_op = self._append_optimize_op(loss.block, optimize_op = self._append_optimize_op(loss.block,
param_and_grad) param_and_grad)
optimize_ops.append(optimize_op) optimize_ops.append(optimize_op)
......
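The optimizer change above skips any parameter whose `trainable` flag is off; the SRL example below relies on this to keep its pre-loaded word embedding frozen. A minimal hedged sketch (vocabulary size and dimensions are placeholders):

```python
import paddle.v2.fluid.layers as layers

# Illustrative: an embedding created with trainable=False will no longer
# receive an optimize op when minimize() walks the parameter list.
word = layers.data(name='word', shape=[1], data_type='int64')
emb = layers.embedding(
    input=word,
    size=[10000, 32],  # placeholder vocab size / embedding dim
    data_type='float32',
    param_attr={'name': 'emb', 'trainable': False})
```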
import numpy as np
import paddle.v2 as paddle
import paddle.v2.dataset.conll05 as conll05
import paddle.v2.fluid.core as core
import paddle.v2.fluid.framework as framework
import paddle.v2.fluid.layers as layers
from paddle.v2.fluid.executor import Executor, g_scope
from paddle.v2.fluid.optimizer import SGDOptimizer
word_dict, verb_dict, label_dict = conll05.get_dict()
word_dict_len = len(word_dict)
label_dict_len = len(label_dict)
pred_len = len(verb_dict)
mark_dict_len = 2
word_dim = 32
mark_dim = 5
hidden_dim = 512
depth = 8
mix_hidden_lr = 1e-3
IS_SPARSE = True
PASS_NUM = 10
BATCH_SIZE = 20
embedding_name = 'emb'
def load_parameter(file_name, h, w):
with open(file_name, 'rb') as f:
f.read(16) # skip header.
return np.fromfile(f, dtype=np.float32).reshape(h, w)
def db_lstm():
# 8 features
word = layers.data(name='word_data', shape=[1], data_type='int64')
predicate = layers.data(name='verb_data', shape=[1], data_type='int64')
ctx_n2 = layers.data(name='ctx_n2_data', shape=[1], data_type='int64')
ctx_n1 = layers.data(name='ctx_n1_data', shape=[1], data_type='int64')
ctx_0 = layers.data(name='ctx_0_data', shape=[1], data_type='int64')
ctx_p1 = layers.data(name='ctx_p1_data', shape=[1], data_type='int64')
ctx_p2 = layers.data(name='ctx_p2_data', shape=[1], data_type='int64')
mark = layers.data(name='mark_data', shape=[1], data_type='int64')
predicate_embedding = layers.embedding(
input=predicate,
size=[pred_len, word_dim],
data_type='float32',
is_sparse=IS_SPARSE,
param_attr={'name': 'vemb'})
mark_embedding = layers.embedding(
input=mark,
size=[mark_dict_len, mark_dim],
data_type='float32',
is_sparse=IS_SPARSE)
word_input = [word, ctx_n2, ctx_n1, ctx_0, ctx_p1, ctx_p2]
emb_layers = [
layers.embedding(
size=[word_dict_len, word_dim],
input=x,
param_attr={'name': embedding_name,
'trainable': False}) for x in word_input
]
emb_layers.append(predicate_embedding)
emb_layers.append(mark_embedding)
hidden_0_layers = [
layers.fc(input=emb, size=hidden_dim) for emb in emb_layers
]
hidden_0 = layers.sums(input=hidden_0_layers)
lstm_0 = layers.dynamic_lstm(
input=hidden_0,
size=hidden_dim,
candidate_activation='relu',
gate_activation='sigmoid',
cell_activation='sigmoid')
# stack L-LSTM and R-LSTM with direct edges
input_tmp = [hidden_0, lstm_0]
for i in range(1, depth):
mix_hidden = layers.sums(input=[
layers.fc(input=input_tmp[0], size=hidden_dim),
layers.fc(input=input_tmp[1], size=hidden_dim)
])
lstm = layers.dynamic_lstm(
input=mix_hidden,
size=hidden_dim,
candidate_activation='relu',
gate_activation='sigmoid',
cell_activation='sigmoid',
is_reverse=((i % 2) == 1))
input_tmp = [mix_hidden, lstm]
feature_out = layers.sums(input=[
layers.fc(input=input_tmp[0], size=label_dict_len),
layers.fc(input=input_tmp[1], size=label_dict_len)
])
return feature_out
def to_lodtensor(data, place):
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = core.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
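As a concrete illustration of what `to_lodtensor` builds: two sequences of lengths 2 and 3 become one flat `(5, 1)` int64 tensor plus the offset list `[0, 2, 5]`, which records where each sequence starts and ends in the flat buffer.

```python
# Worked example (illustrative values):
#   data = [[1, 2], [3, 4, 5]]
#   seq_lens = [2, 3]        ->  lod = [0, 2, 5]
#   flattened_data.shape == (5, 1), dtype int64
example = to_lodtensor([[1, 2], [3, 4, 5]], core.CPUPlace())
```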
def main():
# define network topology
feature_out = db_lstm()
target = layers.data(name='target', shape=[1], data_type='int64')
crf_cost = layers.linear_chain_crf(
input=feature_out,
label=target,
param_attr={"name": 'crfw',
"learning_rate": mix_hidden_lr})
avg_cost = layers.mean(x=crf_cost)
# TODO(qiao)
# 1. add crf_decode_layer and evaluator
# 2. use other optimizer and check why out will be NAN
sgd_optimizer = SGDOptimizer(learning_rate=0.0001)
opts = sgd_optimizer.minimize(avg_cost)
train_data = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.conll05.test(), buf_size=8192),
batch_size=BATCH_SIZE)
place = core.CPUPlace()
exe = Executor(place)
exe.run(framework.default_startup_program())
embedding_param = g_scope.find_var(embedding_name).get_tensor()
embedding_param.set(
load_parameter(conll05.get_embedding(), word_dict_len, word_dim), place)
batch_id = 0
for pass_id in xrange(PASS_NUM):
for data in train_data():
word_data = to_lodtensor(map(lambda x: x[0], data), place)
ctx_n2_data = to_lodtensor(map(lambda x: x[1], data), place)
ctx_n1_data = to_lodtensor(map(lambda x: x[2], data), place)
ctx_0_data = to_lodtensor(map(lambda x: x[3], data), place)
ctx_p1_data = to_lodtensor(map(lambda x: x[4], data), place)
ctx_p2_data = to_lodtensor(map(lambda x: x[5], data), place)
verb_data = to_lodtensor(map(lambda x: x[6], data), place)
mark_data = to_lodtensor(map(lambda x: x[7], data), place)
target = to_lodtensor(map(lambda x: x[8], data), place)
outs = exe.run(framework.default_main_program(),
feed={
'word_data': word_data,
'ctx_n2_data': ctx_n2_data,
'ctx_n1_data': ctx_n1_data,
'ctx_0_data': ctx_0_data,
'ctx_p1_data': ctx_p1_data,
'ctx_p2_data': ctx_p2_data,
'verb_data': verb_data,
'mark_data': mark_data,
'target': target
},
fetch_list=[avg_cost])
avg_cost_val = np.array(outs[0])
if batch_id % 10 == 0:
print("avg_cost=" + str(avg_cost_val))
# exit early for CI
exit(0)
batch_id = batch_id + 1
if __name__ == '__main__':
main()
...@@ -54,17 +54,17 @@ def to_lodtensor(data, place): ...@@ -54,17 +54,17 @@ def to_lodtensor(data, place):
return res return res
def chop_data(data, chop_len=80, batch_len=50): def chop_data(data, chop_len=80, batch_size=50):
data = [(x[0][:chop_len], x[1]) for x in data if len(x[0]) >= chop_len] data = [(x[0][:chop_len], x[1]) for x in data if len(x[0]) >= chop_len]
return data[:batch_len] return data[:batch_size]
def prepare_feed_data(data, place): def prepare_feed_data(data, place):
tensor_words = to_lodtensor(map(lambda x: x[0], data), place) tensor_words = to_lodtensor(map(lambda x: x[0], data), place)
label = np.array(map(lambda x: x[1], data)).astype("int64") label = np.array(map(lambda x: x[1], data)).astype("int64")
label = label.reshape([50, 1]) label = label.reshape([len(label), 1])
tensor_label = core.LoDTensor() tensor_label = core.LoDTensor()
tensor_label.set(label, place) tensor_label.set(label, place)
...@@ -72,33 +72,41 @@ def prepare_feed_data(data, place): ...@@ -72,33 +72,41 @@ def prepare_feed_data(data, place):
def main(): def main():
word_dict = paddle.dataset.imdb.word_dict() BATCH_SIZE = 100
cost, acc = lstm_net(dict_dim=len(word_dict), class_dim=2) PASS_NUM = 5
batch_size = 100 word_dict = paddle.dataset.imdb.word_dict()
train_data = paddle.batch( print "load word dict successfully"
paddle.reader.buffered( dict_dim = len(word_dict)
paddle.dataset.imdb.train(word_dict), size=batch_size * 10), class_dim = 2
batch_size=batch_size)
data = chop_data(next(train_data())) cost, acc = lstm_net(dict_dim=dict_dim, class_dim=class_dim)
train_data = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.imdb.train(word_dict), buf_size=BATCH_SIZE * 10),
batch_size=BATCH_SIZE)
place = core.CPUPlace() place = core.CPUPlace()
tensor_words, tensor_label = prepare_feed_data(data, place)
exe = Executor(place) exe = Executor(place)
exe.run(framework.default_startup_program()) exe.run(framework.default_startup_program())
while True: for pass_id in xrange(PASS_NUM):
outs = exe.run(framework.default_main_program(), for data in train_data():
feed={"words": tensor_words, chopped_data = chop_data(data)
"label": tensor_label}, tensor_words, tensor_label = prepare_feed_data(chopped_data, place)
fetch_list=[cost, acc])
cost_val = np.array(outs[0]) outs = exe.run(framework.default_main_program(),
acc_val = np.array(outs[1]) feed={"words": tensor_words,
"label": tensor_label},
print("cost=" + str(cost_val) + " acc=" + str(acc_val)) fetch_list=[cost, acc])
if acc_val > 0.9: cost_val = np.array(outs[0])
break acc_val = np.array(outs[1])
print("cost=" + str(cost_val) + " acc=" + str(acc_val))
if acc_val > 0.7:
exit(0)
exit(1)
if __name__ == '__main__': if __name__ == '__main__':
......
import unittest
import numpy as np
from op_test import OpTest
class TestFTRLOp(OpTest):
def setUp(self):
self.op_type = "ftrl"
w = np.random.random((102, 105)).astype("float32")
g = np.random.random((102, 105)).astype("float32")
sq_accum = np.full((102, 105), 0.1).astype("float32")
linear_accum = np.full((102, 105), 0.1).astype("float32")
lr = np.array([0.01]).astype("float32")
l1 = 0.1
l2 = 0.2
lr_power = -0.5
self.inputs = {
'Param': w,
'SquaredAccumulator': sq_accum,
'LinearAccumulator': linear_accum,
'Grad': g,
'LearningRate': lr
}
self.attrs = {
'l1': l1,
'l2': l2,
'lr_power': lr_power,
'learning_rate': lr
}
new_accum = sq_accum + g * g
if lr_power == -0.5:
linear_out = linear_accum + g - (
(np.sqrt(new_accum) - np.sqrt(sq_accum)) / lr) * w
else:
linear_out = linear_accum + g - ((np.power(
new_accum, -lr_power) - np.power(sq_accum, -lr_power)) / lr) * w
x = (l1 * np.sign(linear_out) - linear_out)
if lr_power == -0.5:
y = (np.sqrt(new_accum) / lr) + (2 * l2)
pre_shrink = x / y
param_out = np.where(np.abs(linear_out) > l1, pre_shrink, 0.0)
else:
y = (np.power(new_accum, -lr_power) / lr) + (2 * l2)
pre_shrink = x / y
param_out = np.where(np.abs(linear_out) > l1, pre_shrink, 0.0)
sq_accum_out = sq_accum + g * g
self.outputs = {
'ParamOut': param_out,
'SquaredAccumOut': sq_accum_out,
'LinearAccumOut': linear_out
}
def test_check_output(self):
self.check_output()
if __name__ == "__main__":
unittest.main()
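For reference, the numpy code above implements the FTRL-Proximal update with `lr_power = -0.5`; in the notation of the test (`w`: Param, `n`: SquaredAccumulator, `z`: LinearAccumulator, `g`: Grad, `eta`: LearningRate) it reads:

```latex
n_{t+1} = n_t + g_t^{2}, \qquad
z_{t+1} = z_t + g_t - \frac{\sqrt{n_{t+1}} - \sqrt{n_t}}{\eta}\, w_t, \qquad
w_{t+1} =
\begin{cases}
\dfrac{l_1\,\operatorname{sign}(z_{t+1}) - z_{t+1}}{\sqrt{n_{t+1}}/\eta + 2 l_2}
  & \text{if } |z_{t+1}| > l_1 \\
0 & \text{otherwise}
\end{cases}
```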
...@@ -28,8 +28,8 @@ def relu(x): ...@@ -28,8 +28,8 @@ def relu(x):
class TestGRUUnitOp(OpTest): class TestGRUUnitOp(OpTest):
batch_size = 3 batch_size = 5
frame_size = 5 frame_size = 10
activate = { activate = {
GRUActivationType.identity: identity, GRUActivationType.identity: identity,
GRUActivationType.sigmoid: sigmoid, GRUActivationType.sigmoid: sigmoid,
...@@ -77,7 +77,7 @@ class TestGRUUnitOp(OpTest): ...@@ -77,7 +77,7 @@ class TestGRUUnitOp(OpTest):
c = self.activate[self.attrs['activation']](np.dot(r_h_p, w_c) + c = self.activate[self.attrs['activation']](np.dot(r_h_p, w_c) +
g[:, frame_size * 2:]) g[:, frame_size * 2:])
g = np.hstack((u_r, c)) g = np.hstack((u_r, c))
h = u * h_p + (1 - u) * c h = u * c + (1 - u) * h_p
self.outputs = { self.outputs = {
'Gate': g.astype('float64'), 'Gate': g.astype('float64'),
'ResetHiddenPrev': r_h_p.astype('float64'), 'ResetHiddenPrev': r_h_p.astype('float64'),
...@@ -92,10 +92,7 @@ class TestGRUUnitOp(OpTest): ...@@ -92,10 +92,7 @@ class TestGRUUnitOp(OpTest):
self.check_output() self.check_output()
def test_check_grad(self): def test_check_grad(self):
self.check_grad( self.check_grad(['Input', 'HiddenPrev', 'Weight'], ['Hidden'])
['Input', 'HiddenPrev', 'Weight'],
['Hidden', 'ResetHiddenPrev', 'Gate'],
max_relative_error=0.007)
class TestGRUUnitOpWithBias(TestGRUUnitOp): class TestGRUUnitOpWithBias(TestGRUUnitOp):
...@@ -104,18 +101,20 @@ class TestGRUUnitOpWithBias(TestGRUUnitOp): ...@@ -104,18 +101,20 @@ class TestGRUUnitOpWithBias(TestGRUUnitOp):
frame_size = self.frame_size frame_size = self.frame_size
super(TestGRUUnitOpWithBias, self).set_inputs() super(TestGRUUnitOpWithBias, self).set_inputs()
self.inputs['Bias'] = np.random.uniform( self.inputs['Bias'] = np.random.uniform(
-0.1, 0.1, (1, frame_size * 3)).astype('float32') -0.1, 0.1, (1, frame_size * 3)).astype('float64')
self.attrs = { self.attrs = {
'activation': GRUActivationType.identity, 'activation': GRUActivationType.identity,
'gate_activation': GRUActivationType.sigmoid 'gate_activation': GRUActivationType.sigmoid
} }
def test_check_grad(self): def test_check_grad(self):
self.check_grad(['Input', 'HiddenPrev', 'Weight', 'Bias'], ['Hidden'])
def test_check_grad_ingore_input(self):
self.check_grad( self.check_grad(
['Input', 'HiddenPrev', 'Weight', 'Bias'], ['Hidden'], ['HiddenPrev', 'Weight', 'Bias'], ['Hidden'],
max_relative_error=0.007) no_grad_set=set('Input'))
if __name__ == '__main__': if __name__ == '__main__':
exit(0) # FIXME(yuyang18): This unittest is not pass. Fix it later
unittest.main() unittest.main()
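The sign fix above (`h = u * c + (1 - u) * h_p`) swaps which side of the update-gate interpolation holds the candidate state. In the test's conventions, with `u_t` the update gate, `r_t` the reset gate, `x_{c,t}` the candidate slice `g[:, frame_size * 2:]` of the projected input, and `phi` the chosen activation, the corrected reference computes:

```latex
\tilde{h}_t = \phi\big((r_t \odot h_{t-1})\,W_c + x_{c,t}\big), \qquad
h_t = u_t \odot \tilde{h}_t + (1 - u_t) \odot h_{t-1}
```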
...@@ -223,5 +223,109 @@ class TestXavierInitializer(unittest.TestCase): ...@@ -223,5 +223,109 @@ class TestXavierInitializer(unittest.TestCase):
self.assertEqual(init_op.attr('seed'), 134) self.assertEqual(init_op.attr('seed'), 134)
class TestMSRAInitializer(unittest.TestCase):
def test_uniform_msra_initializer(self):
"""Test MSRA initializer with uniform distribution on
for matrix multiply.
"""
program = framework.Program()
block = program.global_block()
param = block.create_parameter(
dtype="float32",
shape=[5, 10],
lod_level=0,
name="param",
initializer=initializer.MSRAInitializer())
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'uniform_random')
limit = np.sqrt(6.0 / param.shape[0])
self.assertAlmostEqual(init_op.attr('min'), -limit, delta=DELTA)
self.assertAlmostEqual(init_op.attr('max'), limit, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 0)
def test_uniform_msra_initializer_conv(self):
"""Test MSRA initializer with uniform distribution on
for convolutions.
"""
program = framework.Program()
block = program.global_block()
param = block.create_parameter(
dtype="float32",
shape=[5, 10, 15, 20],
lod_level=0,
name="param",
initializer=initializer.MSRAInitializer())
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'uniform_random')
receptive_field_size = float(15 * 20)
limit = np.sqrt(6.0 / (param.shape[1] * receptive_field_size))
self.assertAlmostEqual(init_op.attr('min'), -limit, delta=DELTA)
self.assertAlmostEqual(init_op.attr('max'), limit, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 0)
def test_normal_msra_initializer(self):
"""Test MSRA initializer with normal distribution on
for matrix multiply.
"""
program = framework.Program()
block = program.global_block()
param = block.create_parameter(
dtype="float32",
shape=[5, 10],
lod_level=0,
name="param",
initializer=initializer.MSRAInitializer(uniform=False))
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'gaussian_random')
std = np.sqrt(2.0 / param.shape[0])
self.assertAlmostEqual(init_op.attr('mean'), 0.0, delta=DELTA)
self.assertAlmostEqual(init_op.attr('std'), std, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 0)
def test_normal_msra_initializer_conv(self):
"""Test MSRA initializer with normal distribution on
for convolutions.
"""
program = framework.Program()
block = program.global_block()
param = block.create_parameter(
dtype="float32",
shape=[5, 10, 15, 20],
lod_level=0,
name="param",
initializer=initializer.MSRAInitializer(uniform=False))
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'gaussian_random')
receptive_field_size = float(15 * 20)
std = np.sqrt(2.0 / (param.shape[1] * receptive_field_size))
self.assertAlmostEqual(init_op.attr('mean'), 0.0, delta=DELTA)
self.assertAlmostEqual(init_op.attr('std'), std, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 0)
def test_msra_initializer_supplied_arguments(self):
"""Test the MSRA initializer with supplied arguments
"""
program = framework.Program()
block = program.global_block()
block.create_parameter(
dtype="float32",
shape=[5, 10],
lod_level=0,
name="param",
initializer=initializer.MSRAInitializer(
fan_in=12, seed=134))
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'uniform_random')
limit = np.sqrt(6.0 / 12)
self.assertAlmostEqual(init_op.attr('min'), -limit, delta=DELTA)
self.assertAlmostEqual(init_op.attr('max'), limit, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 134)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
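The expectations encoded in the MSRA tests above reduce to the standard He initialization bounds, with fan-in read off the parameter shape (the first dimension for a matmul weight; channels times the receptive field for a convolution filter; or a user-supplied `fan_in`):

```latex
W \sim U\!\left(-\sqrt{\tfrac{6}{\text{fan\_in}}},\ \sqrt{\tfrac{6}{\text{fan\_in}}}\right)
\ \text{(uniform)}, \qquad
W \sim \mathcal{N}\!\left(0,\ \tfrac{2}{\text{fan\_in}}\right)
\ \text{(normal, i.e. } \sigma = \sqrt{\tfrac{2}{\text{fan\_in}}}\text{)}
```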
import unittest
import paddle.v2.fluid.layers as layers import paddle.v2.fluid.layers as layers
import paddle.v2.fluid.nets as nets import paddle.v2.fluid.nets as nets
from paddle.v2.fluid.framework import Program from paddle.v2.fluid.framework import Program
import paddle.v2.fluid.core as core
import unittest
class TestBook(unittest.TestCase): class TestBook(unittest.TestCase):
...@@ -20,7 +20,8 @@ class TestBook(unittest.TestCase): ...@@ -20,7 +20,8 @@ class TestBook(unittest.TestCase):
avg_cost = layers.mean(x=cost, main_program=program) avg_cost = layers.mean(x=cost, main_program=program)
self.assertIsNotNone(avg_cost) self.assertIsNotNone(avg_cost)
program.append_backward(avg_cost) program.append_backward(avg_cost)
print str(program)
# print str(program)
def test_recognize_digits_mlp(self): def test_recognize_digits_mlp(self):
program = Program() program = Program()
...@@ -49,7 +50,7 @@ class TestBook(unittest.TestCase): ...@@ -49,7 +50,7 @@ class TestBook(unittest.TestCase):
input=predict, label=label, main_program=program) input=predict, label=label, main_program=program)
avg_cost = layers.mean(x=cost, main_program=program) avg_cost = layers.mean(x=cost, main_program=program)
self.assertIsNotNone(avg_cost) self.assertIsNotNone(avg_cost)
print str(program) # print str(program)
def test_simple_conv2d(self): def test_simple_conv2d(self):
program = Program() program = Program()
...@@ -64,7 +65,7 @@ class TestBook(unittest.TestCase): ...@@ -64,7 +65,7 @@ class TestBook(unittest.TestCase):
filter_size=[4, 4], filter_size=[4, 4],
main_program=program) main_program=program)
print str(program) # print str(program)
def test_recognize_digits_conv(self): def test_recognize_digits_conv(self):
program = Program() program = Program()
...@@ -103,7 +104,7 @@ class TestBook(unittest.TestCase): ...@@ -103,7 +104,7 @@ class TestBook(unittest.TestCase):
program.append_backward(avg_cost) program.append_backward(avg_cost)
print str(program) # print str(program)
def test_word_embedding(self): def test_word_embedding(self):
program = Program() program = Program()
...@@ -164,7 +165,24 @@ class TestBook(unittest.TestCase): ...@@ -164,7 +165,24 @@ class TestBook(unittest.TestCase):
avg_cost = layers.mean(x=cost, main_program=program) avg_cost = layers.mean(x=cost, main_program=program)
self.assertIsNotNone(avg_cost) self.assertIsNotNone(avg_cost)
print str(program) # print str(program)
def test_linear_chain_crf(self):
program = Program()
# Change g_program, so the rest layers use `g_program`
images = layers.data(
name='pixel',
shape=[784],
data_type='float32',
main_program=program)
label = layers.data(
name='label', shape=[1], data_type='int32', main_program=program)
hidden = layers.fc(input=images, size=128, main_program=program)
crf = layers.linear_chain_crf(
input=hidden, label=label, main_program=program)
# print str(program)
if __name__ == '__main__': if __name__ == '__main__':
......
...@@ -104,7 +104,7 @@ class TestLinearChainCrfOp(OpTest): ...@@ -104,7 +104,7 @@ class TestLinearChainCrfOp(OpTest):
transition_exps = np.exp(transition) transition_exps = np.exp(transition)
labels = np.random.randint( labels = np.random.randint(
low=0, high=TAG_NUM, size=(lod[-1][-1], 1), dtype="int32") low=0, high=TAG_NUM, size=(lod[-1][-1], 1), dtype="int64")
self.inputs = { self.inputs = {
"Emission": (emission, lod), "Emission": (emission, lod),
......
import paddle.v2.fluid.layers as layers
from paddle.v2.fluid.framework import Program
from paddle.v2.fluid.executor import Executor
from paddle.v2.fluid.optimizer import MomentumOptimizer
import paddle.v2.fluid.core as core
import paddle.v2 as paddle
import unittest
import numpy as np
class TestMNISTIfElseOp(unittest.TestCase):
def test_raw_api(self):
kwargs = {'startup_program': Program(), 'main_program': Program()}
image = layers.data(
name='x', shape=[784], data_type='float32', **kwargs)
label = layers.data(name='y', shape=[1], data_type='int64', **kwargs)
limit = layers.fill_constant_batch_size_like(
input=label, dtype='int64', shape=[1], value=5.0, **kwargs)
cond = layers.less_than(x=label, y=limit, **kwargs)
true_image, false_image = layers.split_lod_tensor(
input=image, mask=cond, **kwargs)
true_out = layers.create_tensor(dtype='float32', **kwargs)
true_cond = layers.ConditionalBlock([true_image], **kwargs)
with true_cond.block():
hidden = layers.fc(input=true_image, size=100, act='tanh', **kwargs)
prob = layers.fc(input=hidden, size=10, act='softmax', **kwargs)
layers.assign(input=prob, output=true_out, **kwargs)
false_out = layers.create_tensor(dtype='float32', **kwargs)
false_cond = layers.ConditionalBlock([false_image], **kwargs)
with false_cond.block():
hidden = layers.fc(input=false_image,
size=200,
act='tanh',
**kwargs)
prob = layers.fc(input=hidden, size=10, act='softmax', **kwargs)
layers.assign(input=prob, output=false_out, **kwargs)
prob = layers.merge_lod_tensor(
in_true=true_out, in_false=false_out, mask=cond, x=image, **kwargs)
loss = layers.cross_entropy(input=prob, label=label, **kwargs)
avg_loss = layers.mean(x=loss, **kwargs)
optimizer = MomentumOptimizer(learning_rate=0.001, momentum=0.9)
optimizer.minimize(avg_loss, kwargs['startup_program'])
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=8192),
batch_size=200)
place = core.CPUPlace()
exe = Executor(place)
exe.run(kwargs['startup_program'])
PASS_NUM = 100
for pass_id in range(PASS_NUM):
for data in train_reader():
x_data = np.array(map(lambda x: x[0], data)).astype("float32")
y_data = np.array(map(lambda x: x[1], data)).astype("int64")
y_data = np.expand_dims(y_data, axis=1)
tensor_x = core.LoDTensor()
tensor_x.set(x_data, place)
tensor_y = core.LoDTensor()
tensor_y.set(y_data, place)
outs = map(np.array,
exe.run(kwargs['main_program'],
feed={'x': tensor_x,
'y': tensor_y},
fetch_list=[avg_loss]))
print outs[0]
if outs[0] < 1.0:
return
self.assertFalse(True)
def test_ifelse(self):
kwargs = {'startup_program': Program(), 'main_program': Program()}
image = layers.data(
name='x', shape=[784], data_type='float32', **kwargs)
label = layers.data(name='y', shape=[1], data_type='int64', **kwargs)
limit = layers.fill_constant_batch_size_like(
input=label, dtype='int64', shape=[1], value=5.0, **kwargs)
cond = layers.less_than(x=label, y=limit, **kwargs)
ie = layers.IfElse(cond, **kwargs)
with ie.true_block():
true_image = ie.input(image)
hidden = layers.fc(input=true_image, size=100, act='tanh', **kwargs)
prob = layers.fc(input=hidden, size=10, act='softmax', **kwargs)
ie.output(prob)
with ie.false_block():
false_image = ie.input(image)
hidden = layers.fc(input=false_image,
size=200,
act='tanh',
**kwargs)
prob = layers.fc(input=hidden, size=10, act='softmax', **kwargs)
ie.output(prob)
prob = ie()
loss = layers.cross_entropy(input=prob[0], label=label, **kwargs)
avg_loss = layers.mean(x=loss, **kwargs)
optimizer = MomentumOptimizer(learning_rate=0.001, momentum=0.9)
optimizer.minimize(avg_loss, kwargs['startup_program'])
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=8192),
batch_size=200)
place = core.CPUPlace()
exe = Executor(place)
exe.run(kwargs['startup_program'])
PASS_NUM = 100
for pass_id in range(PASS_NUM):
for data in train_reader():
x_data = np.array(map(lambda x: x[0], data)).astype("float32")
y_data = np.array(map(lambda x: x[1], data)).astype("int64")
y_data = np.expand_dims(y_data, axis=1)
tensor_x = core.LoDTensor()
tensor_x.set(x_data, place)
tensor_y = core.LoDTensor()
tensor_y.set(y_data, place)
outs = map(np.array,
exe.run(kwargs['main_program'],
feed={'x': tensor_x,
'y': tensor_y},
fetch_list=[avg_loss]))
print outs[0]
if outs[0] < 1.0:
return
self.assertFalse(True)
if __name__ == '__main__':
unittest.main()
import unittest import unittest
from paddle.v2.fluid.framework import Variable, g_main_program, Program from paddle.v2.fluid.framework import g_main_program, Program, convert_np_dtype_to_dtype_
import paddle.v2.fluid.core as core import paddle.v2.fluid.core as core
import numpy as np import numpy as np
...@@ -7,7 +7,7 @@ import numpy as np ...@@ -7,7 +7,7 @@ import numpy as np
class TestVariable(unittest.TestCase): class TestVariable(unittest.TestCase):
def test_np_dtype_convert(self): def test_np_dtype_convert(self):
DT = core.DataType DT = core.DataType
convert = Variable._convert_np_dtype_to_dtype_ convert = convert_np_dtype_to_dtype_
self.assertEqual(DT.FP32, convert(np.float32)) self.assertEqual(DT.FP32, convert(np.float32))
self.assertEqual(DT.FP16, convert("float16")) self.assertEqual(DT.FP16, convert("float16"))
self.assertEqual(DT.FP64, convert("float64")) self.assertEqual(DT.FP64, convert("float64"))
......