Commit 67a3277e, authored by wanghaoshuang

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_crop

@@ -53,6 +53,15 @@ TBD
- GoogLeNet
| Batch size | 64     | 128    | 256    |
|------------|--------|--------|--------|
| OpenBLAS   | 89.52  | 96.97  | 108.25 |
| MKLML      | 128.46 | 137.89 | 158.63 |
| MKL-DNN    | 250.46 | 264.83 | 269.50 |
Chart on batch size 128
TBD
### Laptop
TBD
### Desktop
......
# Python Data Reader Design Doc
During the training and testing phases, PaddlePaddle programs need to read data. To help users write code that reads input data, we define the following:
- A *reader*: a function that reads data (from a file, network, random number generator, etc.) and yields data items.
- A *reader creator*: a function that returns a reader function.
- A *reader decorator*: a function that takes one or more readers and returns a reader.
- A *batch reader*: a function that reads data (from a *reader*, file, network, random number generator, etc.) and yields a batch of data items.

We also provide a function that converts a reader into a batch reader, as well as frequently used reader creators and reader decorators.
## Data Reader Interface
A *data reader* doesn't have to be a function that reads and yields data items. It can be any function with no parameters that creates an iterable (anything that can be used in `for x in iterable`):
```
iterable = data_reader()
```
Each element produced by the iterable should be a **single** data entry, **not** a mini batch. A data entry can be a single item or a tuple of items. Each item should be of one of the [supported types](http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types) (e.g., a numpy 1d array of float32, an int, a list of int, etc.).
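The interface only requires a parameterless function that returns an iterable, so a reader does not have to be a generator. The sketch below uses plain Python and no Paddle imports. `reader_creator_from_list` is a hypothetical name used only for illustration. Each element of the returned list is one data entry, here a tuple of two items.

```python
def reader_creator_from_list(entries):
    """Hypothetical reader creator: the returned reader takes no arguments and
    returns an iterable whose elements are single data entries (not batches)."""
    def reader():
        # A plain list is iterable, so a generator is not required.
        return list(entries)
    return reader


# Each entry is a tuple of two items: (feature, label).
reader = reader_creator_from_list([(0.5, 1), (0.25, 0)])
for feature, label in reader():
    print(feature, label)
```

Every call to `reader()` builds a fresh iterable, so the same reader can be iterated once per pass.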
An example implementation of a single-item data reader creator:
```python
def reader_creator_random_image(width, height):
    ...  # (lines collapsed in this diff view)
    return reader
```
An example implementation of a multiple-item data reader creator:
```python
def reader_creator_random_image_and_label(width, height, label):
    def reader():
        ...  # (lines collapsed in this diff view)
```
@@ -40,9 +40,10 @@ def reader_creator_random_image_and_label(width, height, label):
## Batch Reader Interface
A *batch reader* can be any function with no parameters that creates an iterable (anything that can be used in `for x in iterable`). Each output of the iterable should be a batch (list) of data items, and each item inside the list should be a tuple.
Here are some valid outputs:
```python
# a mini batch of three data items. Each data item consists of three columns of data, each of which is 1.
[(1, 1, 1),
# ... (lines collapsed in this diff view)
```
@@ -58,20 +59,22 @@ Here are valid outputs:
Please note that each item inside the list must be a tuple. Below is an invalid output:
```python
# wrong, [1,1,1] needs to be inside a tuple: ([1,1,1],).
# Otherwise it is ambiguous whether [1,1,1] means a single column of data [1, 1, 1],
# or three columns of data, each of which is 1.
[[1,1,1],
 [2,2,2],
 [3,3,3]]
```
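The tuple rule above can be checked mechanically. Below is a minimal sketch of such a check. `is_valid_batch` is a hypothetical helper, not part of PaddlePaddle.

```python
def is_valid_batch(batch):
    """Hypothetical check: a batch must be a list whose items are all tuples."""
    return isinstance(batch, list) and all(isinstance(item, tuple) for item in batch)


three_columns = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
one_column = [([1, 1, 1],), ([2, 2, 2],), ([3, 3, 3],)]
ambiguous = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]

print(is_valid_batch(three_columns))  # True
print(is_valid_batch(one_column))     # True
print(is_valid_batch(ambiguous))      # False
```

Wrapping each list in a one-element tuple, as in `one_column`, is what removes the ambiguity. The tuple length is then the number of columns.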
It is easy to convert a reader into a batch reader:
```python
mnist_train = paddle.dataset.mnist.train()
mnist_train_batch_reader = paddle.batch(mnist_train, 128)
```
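Conceptually, the conversion groups consecutive entries into lists. Below is a plain-Python sketch of such a conversion. `batch_sketch` is a hypothetical name, not `paddle.batch` itself. The sketch assumes a trailing partial batch is still yielded; the library may handle the last batch differently.

```python
def batch_sketch(reader, batch_size):
    """Hypothetical stand-in for a reader-to-batch-reader conversion."""
    def batch_reader():
        batch = []
        for entry in reader():
            batch.append(entry)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # assumption: emit the final, possibly smaller, batch
            yield batch
    return batch_reader


def toy_reader():
    for i in range(5):
        yield (i,)


print(list(batch_sketch(toy_reader, 2)()))
# [[(0,), (1,)], [(2,), (3,)], [(4,)]]
```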
It is also straightforward to create a custom batch reader:
```python
def custom_batch_reader():
    while True:
        ...  # (lines collapsed in this diff view)
```
@@ -85,7 +88,8 @@ mnist_random_image_batch_reader = custom_batch_reader
## Usage
The following shows how to use a reader with PaddlePaddle. The batch reader, a mapping from item(s) to data layers, the batch size, and the total number of passes are passed into `paddle.train`:
```python
# two data layers are created:
# ... (lines collapsed in this diff view)
```
@@ -99,13 +103,13 @@ paddle.train(batch_reader, {"image":0, "label":1}, 128, 10, ...)
## Data Reader Decorator
A *data reader decorator* takes one or more data readers and returns a new data reader. It is similar to a [python decorator](https://wiki.python.org/moin/PythonDecorators), but it does not use the `@` syntax.
Data readers have a strict interface: they take no parameters, and each element they produce is a single data entry. Because of this, they can be combined flexibly using data reader decorators. Following are a few examples:
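To illustrate the pattern, here is a minimal decorator that wraps one reader and transforms each entry. `map_reader_sketch` is a hypothetical name used for illustration only. The decorators this document actually describes are `paddle.reader.buffered`, `paddle.reader.compose` and `paddle.reader.shuffle`.

```python
def map_reader_sketch(func, reader):
    """Hypothetical decorator: returns a new reader that applies func to every
    entry produced by the wrapped reader."""
    def new_reader():
        for entry in reader():
            yield func(entry)
    return new_reader


def numbers():
    return [(1,), (2,), (3,)]


doubled = map_reader_sketch(lambda entry: (entry[0] * 2,), numbers)
print(list(doubled()))  # [(2,), (4,), (6,)]
```

The result is itself a parameterless reader, so it can be passed to any other decorator or to a batch conversion.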
### Prefetch Data
Since reading data may take some time and training cannot proceed without data, it is generally a good idea to prefetch the data.
Use `paddle.reader.buffered` to prefetch data:
@@ -117,9 +121,9 @@ buffered_reader = paddle.reader.buffered(paddle.dataset.mnist.train(), 100)
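One way prefetching can work is a background thread that fills a bounded queue while training consumes entries. Below is a minimal sketch using only the standard library. `buffered_sketch` is hypothetical and is not necessarily how `paddle.reader.buffered` is implemented.

```python
import queue
import threading


def buffered_sketch(reader, size):
    """Hypothetical prefetching decorator: a background thread keeps up to
    `size` entries in a bounded queue while the consumer reads from it."""
    end = object()  # sentinel marking the end of the data

    def new_reader():
        q = queue.Queue(maxsize=size)

        def worker():
            for entry in reader():
                q.put(entry)  # blocks while the buffer is full
            q.put(end)

        threading.Thread(target=worker, daemon=True).start()
        while True:
            entry = q.get()
            if entry is end:
                break
            yield entry

    return new_reader


def small_reader():
    for i in range(5):
        yield (i,)


print(list(buffered_sketch(small_reader, 2)()))  # [(0,), (1,), (2,), (3,), (4,)]
```

For brevity, the sketch has two limitations:
- It does not propagate exceptions raised inside the worker thread.
- A consumer that stops early leaves the daemon worker blocked on `put`.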
### Compose Multiple Data Readers
For example, suppose we want to use a source of real images (say, by reusing the mnist dataset) and a source of random images as input for [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661).
We can do the following:
```python
def reader_creator_random_image(width, height):
    ...  # (lines collapsed in this diff view)
reader = paddle.reader.compose(paddle.dataset.mnist.train(), reader_creator_random_image(20, 20), true_reader, false_reader)
# Skipped 1 because paddle.dataset.mnist.train() produces two items per data entry.
# And we don't care about the second item at this time.
paddle.train(paddle.batch(reader, 128), {"true_image":0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...)
```
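The index arithmetic in the mapping above follows from flattening one entry from each reader into a single tuple:
- index 0 is the real image, and index 1 (its label) is skipped;
- index 2 is the random image;
- indexes 3 and 4 are the two labels.

Below is a minimal sketch of that flattening. It assumes the readers are read in lockstep with `zip`, which stops at the shortest reader; `paddle.reader.compose` may treat readers of unequal length differently. `compose_sketch` and the toy readers are hypothetical.

```python
def compose_sketch(*readers):
    """Hypothetical composition: take one entry from each reader and flatten
    them into a single tuple."""
    def as_tuple(entry):
        return entry if isinstance(entry, tuple) else (entry,)

    def new_reader():
        for entries in zip(*(r() for r in readers)):
            yield sum((as_tuple(e) for e in entries), ())
    return new_reader


def images_and_labels():  # two items per entry, like mnist
    yield ("img0", 7)
    yield ("img1", 3)


def random_images():      # one item per entry
    yield "rnd0"
    yield "rnd1"


def constant(value):
    def reader():
        while True:
            yield value
    return reader


reader = compose_sketch(images_and_labels, random_images, constant(True), constant(False))
print(list(reader()))
# [('img0', 7, 'rnd0', True, False), ('img1', 3, 'rnd1', True, False)]
```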
### Shuffle
Given the shuffle buffer size `n`, `paddle.reader.shuffle` returns a data reader that buffers `n` data entries and shuffles them before a data entry is read.
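The buffering behaviour just described can be sketched as follows:
1. Fill a buffer with up to `buf_size` entries.
2. Shuffle the buffer and emit its contents.
3. Repeat until the reader is exhausted.

`shuffle_sketch` and its `seed` parameter are hypothetical additions for illustration and reproducibility.

```python
import random


def shuffle_sketch(reader, buf_size, seed=None):
    """Hypothetical shuffle decorator: shuffle entries within consecutive
    windows of buf_size entries; the last partial window is shuffled too."""
    def new_reader():
        rng = random.Random(seed)
        buf = []
        for entry in reader():
            buf.append(entry)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                yield from buf
                buf = []
        if buf:
            rng.shuffle(buf)
            yield from buf
    return new_reader


def ordered():
    return [(i,) for i in range(10)]


shuffled = list(shuffle_sketch(ordered, 4, seed=0)())
print(sorted(shuffled) == ordered())  # True: same entries, new order
```

Entries only move within their own window, so a larger buffer gives a more thorough shuffle at the cost of memory.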
Example:
```python
# ... (lines collapsed in this diff view)
```
@@ -154,21 +158,21 @@ reader = paddle.reader.shuffle(paddle.dataset.mnist.train(), 512)
## Q & A
### Why does a reader return only a single entry, and not a mini batch?
Returning a single entry makes reusing existing data readers much easier. For example, if an existing reader returned 3 entries instead of a single entry, the training code would be more complicated, because it would need to handle cases like a batch size of 2.
We provide the function `paddle.batch` to turn a (single-entry) reader into a batch reader.
### Why do we need a batch reader? Isn't it sufficient to give the reader and batch_size as arguments during training?
In most cases, it would be sufficient to pass the reader and batch_size as arguments to the train method. However, sometimes the user wants to customize the order of data entries inside a mini batch, or even change the batch size dynamically. In these cases, a batch reader gives the user direct control over how each mini batch is formed.
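As an illustration of changing the batch size dynamically, here is a sketch of a custom batch reader creator. The i-th batch has a size taken from a list. `batch_reader_creator_with_sizes` is a hypothetical name, and reusing the last size once the list runs out is an arbitrary design choice.

```python
import itertools


def batch_reader_creator_with_sizes(reader, sizes):
    """Hypothetical custom batch reader creator: the i-th batch holds sizes[i]
    entries; once `sizes` is exhausted, the last size is reused."""
    def batch_reader():
        entries = iter(reader())
        i = 0
        while True:
            size = sizes[min(i, len(sizes) - 1)]
            batch = list(itertools.islice(entries, size))
            if not batch:
                return
            yield batch
            i += 1
    return batch_reader


def seven_entries():
    return [(n,) for n in range(7)]


for batch in batch_reader_creator_with_sizes(seven_entries, [1, 2, 4])():
    print(batch)
# [(0,)]
# [(1,), (2,)]
# [(3,), (4,), (5,), (6,)]
```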
### Why use a dictionary instead of a list to provide the mapping?
Using a dictionary (`{"image":0, "label":1}`) instead of a list (`["image", "label"]`) lets the user easily reuse an item (e.g., `{"image_a":0, "image_b":0, "label":1}`) or skip an item (e.g., `{"image_a":0, "label":2}`).
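The reuse and skip cases can be seen in a small sketch of how such a mapping might be applied to one data entry. `to_feed` is a hypothetical helper. This section does not show how `paddle.train` actually consumes the mapping.

```python
def to_feed(entry, mapping):
    """Hypothetical helper: build named inputs from one data entry (a tuple)
    using a dict that maps data-layer names to tuple indices."""
    return {name: entry[index] for name, index in mapping.items()}


entry = ("img", "unused", 5)  # three items per entry
# Index 0 is reused for two layers and index 1 is skipped.
print(to_feed(entry, {"image_a": 0, "image_b": 0, "label": 2}))
# {'image_a': 'img', 'image_b': 'img', 'label': 5}
```

A list such as `["image", "label"]` can only bind positions 0, 1, ... in order. It has no way to name index 0 twice or to leave index 1 out.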
### How to create a custom data reader creator?
```python
def image_reader_creator(image_path, label_path, n):
    ...  # (lines collapsed in this diff view)
```
@@ -192,7 +196,7 @@ paddle.train(paddle.batch(reader, 128), {"image":0, "label":1}, ...)
### How is `paddle.train` implemented?
An example implementation of `paddle.train` is:
```python
def train(batch_reader, mapping, batch_size, total_pass):
    ...  # (remaining lines collapsed in this diff view)
```
......
@@ -38,7 +38,7 @@ inline bool IsExpand(std::vector<int64_t>& filter_dim,
                     std::vector<int>& dilations) {
  bool filter_1 = true, strides_1 = true, padding_0 = true, dilation_1 = true;
  for (size_t j = 0; j < strides.size(); ++j) {
    filter_1 = filter_1 && (static_cast<int>(filter_dim[j + 2]) == 1);
    strides_1 = strides_1 && (strides[j] == 1);
    padding_0 = padding_0 && (paddings[j] == 0);
    dilation_1 = dilation_1 && (dilations[j] == 1);
@@ -91,32 +91,28 @@ class GemmConvKernel : public framework::OpKernel<T> {
    const int batch_size = static_cast<int>(input->dims()[0]);
    // filter_shape_vec: {k_o, k_i, k_h, k_w} or {k_o, k_i, k_d, k_h, k_w}
    std::vector<int64_t> filter_shape_vec(framework::vectorize(filter.dims()));
    // output_shape_vec: {o_n, o_c, o_h, o_w} or {o_n, o_c, o_d, o_h, o_w}
    std::vector<int64_t> output_shape_vec(framework::vectorize(output->dims()));
    // use col_shape in the im2col calculation
    // col_shape_vec: {i_c/g, k_h, k_w, o_h, o_w} or {i_c/g, k_d, k_h, k_w, o_d,
    // o_h, o_w}
    size_t data_dim = filter_shape_vec.size() - 2;
    std::vector<int64_t> col_shape_vec(1 + 2 * data_dim);
    col_shape_vec[0] = input->dims()[1] / groups;
    for (size_t j = 0; j < data_dim; ++j) {
      col_shape_vec[j + 1] = filter_shape_vec[j + 2];
      col_shape_vec[j + 1 + data_dim] = output_shape_vec[j + 2];
    }
    framework::DDim col_shape(framework::make_ddim(col_shape_vec));
    // use col_matrix_shape in the gemm calculation
    // size: (i_c/g * k_h * k_w, o_h * o_w) or (i_c/g * k_d * k_h * k_w, o_d *
    // o_h * o_w)
    framework::DDim col_matrix_shape =
        framework::flatten_to_2d(col_shape, data_dim + 1);
    bool is_expand = IsExpand(filter_shape_vec, strides, paddings, dilations);
    Tensor col;
@@ -159,13 +155,13 @@ class GemmConvKernel : public framework::OpKernel<T> {
        col.ShareDataWith(in_slice);
        col_matrix.ShareDataWith(col);
        col_matrix.Resize(col_matrix_shape);
      } else if (data_dim == 2U) {
        // im2col
        im2col(context.device_context(), in_slice, dilations, strides,
               std::vector<int>{paddings[0], paddings[1], paddings[0],
                                paddings[1]},
               &col);
      } else if (data_dim == 3U) {
        // vol2col
        vol2col(context.device_context(), in_slice, dilations, strides,
                paddings, &col);
@@ -206,26 +202,22 @@ class GemmConvGradKernel : public framework::OpKernel<T> {
    const int batch_size = static_cast<int>(input->dims()[0]);
    // filter_shape_vec: {k_o, k_i, k_h, k_w} or {k_o, k_i, k_d, k_h, k_w}
    std::vector<int64_t> filter_shape_vec(framework::vectorize(filter.dims()));
    // output_shape_vec: {o_n, o_c, o_h, o_w} or {o_n, o_c, o_d, o_h, o_w}
    std::vector<int64_t> output_shape_vec(
        framework::vectorize(output_grad->dims()));
    // use col_shape in the im2col calculation
    // col_shape_vec: {i_c/g, k_h, k_w, o_h, o_w} or {i_c/g, k_d, k_h, k_w, o_d,
    // o_h, o_w}
    size_t data_dim = filter_shape_vec.size() - 2;
    std::vector<int64_t> col_shape_vec(1 + 2 * data_dim);
    col_shape_vec[0] = input->dims()[1] / groups;
    for (size_t j = 0; j < data_dim; ++j) {
      col_shape_vec[j + 1] = filter_shape_vec[j + 2];
      col_shape_vec[j + 1 + data_dim] = output_shape_vec[j + 2];
    }
    framework::DDim col_shape(framework::make_ddim(col_shape_vec));
    // use col_matrix_shape in the gemm calculation
@@ -233,7 +225,7 @@ class GemmConvGradKernel : public framework::OpKernel<T> {
    // or
    // (i_c/g * k_d * k_h * k_w, o_d * o_h * o_w)
    framework::DDim col_matrix_shape =
        framework::flatten_to_2d(col_shape, data_dim + 1);
    framework::DDim input_shape = framework::slice_ddim(
        input->dims(), 1, static_cast<int>(input->dims().size()));
@@ -294,12 +286,12 @@ class GemmConvGradKernel : public framework::OpKernel<T> {
                                out_grad_slice, false, T(1.0), &col_matrix,
                                T(0.0));
          if (is_expand && data_dim == 2U) {
            col2im(context.device_context(), col, dilations, strides,
                   std::vector<int>{paddings[0], paddings[1], paddings[0],
                                    paddings[1]},
                   &in_grad_slice);
          } else if (is_expand && data_dim == 3U) {
            col2vol(context.device_context(), col, dilations, strides, paddings,
                    &in_grad_slice);
          }
@@ -328,12 +320,12 @@ class GemmConvGradKernel : public framework::OpKernel<T> {
          col.ShareDataWith(in_slice);
          col_matrix.ShareDataWith(col);
          col_matrix.Resize(col_matrix_shape);
        } else if (data_dim == 2U) {
          im2col(context.device_context(), in_slice, dilations, strides,
                 std::vector<int>{paddings[0], paddings[1], paddings[0],
                                  paddings[1]},
                 &col);
        } else if (data_dim == 3U) {
          vol2col(context.device_context(), in_slice, dilations, strides,
                  paddings, &col);
        }
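Because `filter_shape_vec` now keeps its leading `{k_o, k_i}` entries, `IsExpand` reads the spatial kernel sizes at `filter_dim[j + 2]`. Below is a Python transcription of the visible loop, written as a sketch. The hunk does not show the function's return statement. The sketch assumes it returns true unless all four conditions hold, which would be consistent with the kernels sharing `in_slice` directly instead of calling im2col/vol2col.

```python
def is_expand(filter_dims, strides, paddings, dilations):
    """Sketch of the IsExpand check. filter_dims is the full filter shape
    {k_o, k_i, k_h, k_w} (or with k_d), so spatial sizes start at index 2.
    Assumption: returns True unless every condition below holds."""
    filter_1 = all(filter_dims[j + 2] == 1 for j in range(len(strides)))
    strides_1 = all(s == 1 for s in strides)
    padding_0 = all(p == 0 for p in paddings)
    dilation_1 = all(d == 1 for d in dilations)
    return not (filter_1 and strides_1 and padding_0 and dilation_1)


# 1x1 kernel, 64 output / 32 input channels: no expansion needed.
print(is_expand([64, 32, 1, 1], [1, 1], [0, 0], [1, 1]))  # False
# 3x3 kernel with padding 1: expansion is needed.
print(is_expand([64, 32, 3, 3], [1, 1], [1, 1], [1, 1]))  # True
```

With the old index `filter_dim[j]`, the first call would have compared the channel counts 64 and 32 against 1. It would then have reported that expansion is needed even for a 1x1 kernel.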
......
@@ -68,30 +68,26 @@ class GemmConvTransposeKernel : public framework::OpKernel<T> {
    const int batch_size = static_cast<int>(input->dims()[0]);
    // input_shape_vec: {n, c, h, w} or {n, c, d, h, w}
    std::vector<int64_t> input_shape_vec = framework::vectorize(input->dims());
    // filter_shape_vec: {k_o, k_c, k_h, k_w} or {k_o, k_c, k_d, k_h, k_w}
    std::vector<int64_t> filter_shape_vec = framework::vectorize(filter.dims());
    // use col_shape in the im2col and col2im (or vol2col and col2vol)
    // calculation
    // col_shape_vec: {c, k_h, k_w, h, w} or {c, k_d, k_h, k_w, d, h, w}
    size_t data_dim = filter_shape_vec.size() - 2;
    std::vector<int64_t> col_shape_vec(1 + 2 * data_dim);
    col_shape_vec[0] = output->dims()[1];
    for (size_t j = 0; j < data_dim; ++j) {
      col_shape_vec[j + 1] = filter_shape_vec[j + 2];
      col_shape_vec[j + 1 + data_dim] = input_shape_vec[j + 2];
    }
    DDim col_shape(framework::make_ddim(col_shape_vec));
    // use col_matrix_shape in the gemm calculation
    // size: (c * k_h * k_w, h * w) or (c * k_d * k_h * k_w, d * h * w)
    DDim col_matrix_shape = framework::flatten_to_2d(col_shape, data_dim + 1);
    Tensor col;
    col.mutable_data<T>(col_shape, context.GetPlace());
@@ -136,7 +132,7 @@ class GemmConvTransposeKernel : public framework::OpKernel<T> {
                             input_batch, false, static_cast<T>(1.0),
                             &col_matrix, static_cast<T>(0.0));
      if (data_dim == 2U) {
        // col2im: col_matrix -> dy
        // from (c * k_h * k_w, h * w) to (c, o_h, o_w)
        col2im(context.device_context(), col,
@@ -144,7 +140,7 @@ class GemmConvTransposeKernel : public framework::OpKernel<T> {
               std::vector<int>{paddings[0], paddings[1], paddings[0],
                                paddings[1]},
               &output_batch);
      } else if (data_dim == 3U) {
        // col2vol: col_matrix -> dy
        // from (c * k_d * k_h * k_w, d * h * w) to (c, o_d, o_h, o_w)
        col2vol(context.device_context(), col, dilations, strides, paddings,
@@ -176,30 +172,26 @@ class GemmConvTransposeGradKernel : public framework::OpKernel<T> {
    const int batch_size = static_cast<int>(input->dims()[0]);
    // input_shape_vec: {n, c, h, w} or {n, c, d, h, w}
    std::vector<int64_t> input_shape_vec = framework::vectorize(input->dims());
    // filter_shape_vec: {k_o, k_c, k_h, k_w} or {k_o, k_c, k_d, k_h, k_w}
    std::vector<int64_t> filter_shape_vec = framework::vectorize(filter.dims());
    // use col_shape in the im2col and col2im (or vol2col and col2vol)
    // calculation
    // col_shape_vec: {c, k_h, k_w, h, w} or {c, k_d, k_h, k_w, d, h, w}
    size_t data_dim = filter_shape_vec.size() - 2;
    std::vector<int64_t> col_shape_vec(1 + 2 * data_dim);
    col_shape_vec[0] = output_grad->dims()[1];
    for (size_t j = 0; j < data_dim; ++j) {
      col_shape_vec[j + 1] = filter_shape_vec[j + 2];
      col_shape_vec[j + 1 + data_dim] = input_shape_vec[j + 2];
    }
    DDim col_shape(framework::make_ddim(col_shape_vec));
    // use col_matrix_shape in the gemm calculation
    // size: (c * k_h * k_w, h * w) or (c * k_d * k_h * k_w, d * h * w)
    DDim col_matrix_shape = framework::flatten_to_2d(col_shape, data_dim + 1);
    // output size: (c, o_h, o_w) or (c, o_d, o_h, o_w)
    DDim output_shape = framework::slice_ddim(output_grad->dims(), 1,
@@ -248,7 +240,7 @@ class GemmConvTransposeGradKernel : public framework::OpKernel<T> {
        Tensor output_grad_batch =
            output_grad->Slice(i, i + 1).Resize(output_shape);
        if (data_dim == 2U) {
          // im2col: dy -> col matrix
          // from (c, o_h, o_w) to (c * k_h * k_w, h * w)
          im2col(context.device_context(), output_grad_batch,
@@ -256,7 +248,7 @@ class GemmConvTransposeGradKernel : public framework::OpKernel<T> {
                 std::vector<int>{paddings[0], paddings[1], paddings[0],
                                  paddings[1]},
                 &col);
        } else if (data_dim == 3U) {
          // vol2col: dy -> col_matrix
          // from (c, o_d, o_h, o_w) to (c * k_d * k_h * k_w, d * h * w)
          vol2col(context.device_context(), output_grad_batch, dilations,
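All four kernels now build `col_shape_vec` the same way:
- `data_dim` is the filter rank minus 2;
- the first slot holds a channel count;
- the next `data_dim` slots hold the kernel sizes from index 2 of the filter shape;
- the last `data_dim` slots hold spatial sizes from index 2 of a full N, C, ... shape.

That shape is the output for conv and the input for conv transpose. Below is a Python sketch of this arithmetic. `col_shapes` is a hypothetical name, and the `flatten_to_2d(col_shape, data_dim + 1)` semantics it uses are assumed: the first `data_dim + 1` dims form the rows and the rest form the columns.

```python
from math import prod


def col_shapes(channels, filter_dims, spatial_source_dims):
    """Sketch of the col_shape / col_matrix_shape arithmetic.

    channels:            input channels / groups for conv, output channels
                         for conv transpose (col_shape_vec[0])
    filter_dims:         full filter shape, kernel sizes from index 2
    spatial_source_dims: full N, C, ... shape whose dims from index 2 fill
                         the trailing slots
    """
    data_dim = len(filter_dims) - 2
    col_shape = ([channels]
                 + list(filter_dims[2:2 + data_dim])
                 + list(spatial_source_dims[2:2 + data_dim]))
    # Assumed flatten_to_2d semantics: rows = product of the first
    # data_dim + 1 dims, columns = product of the remaining dims.
    rows = prod(col_shape[:data_dim + 1])
    cols = prod(col_shape[data_dim + 1:])
    return col_shape, (rows, cols)


# conv: input (8, 6, 32, 32), groups = 2, filter (16, 3, 3, 3), output (8, 16, 30, 30)
print(col_shapes(6 // 2, [16, 3, 3, 3], [8, 16, 30, 30]))
# ([3, 3, 3, 30, 30], (27, 900))

# conv transpose: input (8, 16, 10, 10), filter (16, 4, 3, 3), output channels 4
print(col_shapes(4, [16, 4, 3, 3], [8, 16, 10, 10]))
# ([4, 3, 3, 10, 10], (36, 100))
```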
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/ftrl_op.h"
namespace paddle {
namespace operators {
class FTRLOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Param"),
"Input(Param) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasInput("SquaredAccumulator"),
"Input(SquaredAccumulator) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasInput("LinearAccumulator"),
"Input(LinearAccumulator) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Grad"),
"Input(Grad) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasInput("LearningRate"),
"Input(LearningRate) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("ParamOut"),
"Output(ParamOut) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("SquaredAccumOut"),
"Output(SquaredAccumOut) of FTRL should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("LinearAccumOut"),
"Output(LinearAccumOut) of FTRL should not be null.");
auto param_dim = ctx->GetInputDim("Param");
PADDLE_ENFORCE_EQ(param_dim, ctx->GetInputDim("Grad"),
"The dimensions of Input(Param) and Input(Grad) of FTRL must be the same.");
auto lr_dim = ctx->GetInputDim("LearningRate");
PADDLE_ENFORCE_EQ(framework::product(lr_dim), 1,
"Learning Rate should be a scalar.");
ctx->SetOutputDim("ParamOut", param_dim);
ctx->SetOutputDim("SquaredAccumOut", param_dim);
ctx->SetOutputDim("LinearAccumOut", param_dim);
}
};
class FTRLOpMaker : public framework::OpProtoAndCheckerMaker {
public:
FTRLOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param",
"(Tensor, default Tensor<float>) "
"Input parameter value that has to be updated.");
AddInput("SquaredAccumulator",
"(Tensor, default Tensor<float>) "
"Accumulator that accumulates squared gradients.");
AddInput("LinearAccumulator",
"(Tensor, default Tensor<float>) "
"Accumulator that accumulates linear gradients.");
AddInput("Grad",
"(Tensor, default Tensor<float>) "
"Input gradient of the parameter.");
AddInput("LearningRate",
"(Tensor, default Tensor<float>) "
"The learning rate should be a tensor of size 1.");
AddOutput("ParamOut", "(Tensor) Output updated parameter value.");
AddOutput("SquaredAccumOut",
"(Tensor) Output accumulated squared"
" gradients.");
AddOutput("LinearAccumOut",
"(Tensor) Output accumulated linear"
" gradients.");
AddAttr<float>("l1",
"(float, default 0.0) "
"L1 regularization strength.")
.SetDefault(0.0f);
AddAttr<float>("l2",
"(float, default 0.0) "
"L2 regularization strength.")
.SetDefault(0.0f);
AddAttr<float>("lr_power",
"(float, default -0.5f) "
"Learning Rate Power.")
.SetDefault(-0.5f);
AddComment(R"DOC(
FTRL (Follow The Regularized Leader) Operator.
Optimizer that implements the FTRL algorithm:
$$
new\_accum = squared\_accum + grad^2 \\
if (lr\_power == -0.5) {
linear\_accum += grad - (\surd(new\_accum) - \surd(squared\_accum)) /
(learning\_rate * param) \\
} else {
linear\_accum += grad -
(new\_accum^{-lr\_power} - accum^{-lr\_power}) /
(learning\_rate * param) \\
}
x = (l1 * sign(linear\_accum) - linear\_accum)
if (lr\_power == -0.5) {
y = \frac{\surd(new\_accum)}{learning\_rate} + (2 * l2) \\
pre\_shrink = \frac{x}{y} \\
param = (abs(linear\_accum) > l1).select(pre\_shrink, 0.0) \\
} else {
y = \frac{new\_accum^{-lr\_power}}{learning\_rate} + (2 * l2) \\
pre\_shrink = \frac{x}{y} \\
param = (abs(linear\_accum) > l1).select(pre\_shrink, 0.0) \\
}
squared\_accum += grad^2;
$$
The paper that proposed Follow The Regularized Leader (FTRL):
(https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf)
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(ftrl, ops::FTRLOp, ops::FTRLOpMaker);
REGISTER_OP_CPU_KERNEL(ftrl,
ops::FTRLOpKernel<paddle::platform::CPUPlace, float>);
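To make the FTRL update rule above concrete, here is a minimal per-coordinate sketch in plain Python (no NumPy). `ftrl_update` is a hypothetical helper written for illustration, not a Paddle API; it assumes scalar float inputs and mirrors the branches of the DOC formula and of `FTRLOpKernel`, including the `lr_power == -0.5` fast path.

```python
import math


def ftrl_update(param, sq_accum, lin_accum, grad, lr,
                l1=0.0, l2=0.0, lr_power=-0.5):
    """Hypothetical scalar FTRL step; returns (param, sq_accum, lin_accum)."""
    new_accum = sq_accum + grad * grad
    if lr_power == -0.5:
        # Fast path: x^{-lr_power} is sqrt(x) when lr_power == -0.5.
        sigma = (math.sqrt(new_accum) - math.sqrt(sq_accum)) / lr
        denom = math.sqrt(new_accum) / lr + 2.0 * l2
    else:
        sigma = (new_accum ** -lr_power - sq_accum ** -lr_power) / lr
        denom = new_accum ** -lr_power / lr + 2.0 * l2
    lin_out = lin_accum + grad - sigma * param
    # L1 shrinkage: coordinates with |linear_accum| <= l1 become exactly 0.
    if abs(lin_out) > l1:
        sign = 1.0 if lin_out > 0 else -1.0
        param_out = (l1 * sign - lin_out) / denom
    else:
        param_out = 0.0
    return param_out, new_accum, lin_out
```

With the hyperparameters used in `TestFTRLOp` (lr = 0.01, l1 = 0.1, l2 = 0.2, both accumulators 0.1) and, say, param = 0.5 and grad = 0.3, the sketch returns a parameter of roughly 0.1246. A coordinate whose updated linear accumulator stays within [-l1, l1] is set to exactly 0, which is how the L1 term produces sparse weights.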
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. */

#define EIGEN_USE_GPU
#include "paddle/operators/ftrl_op.h"

namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(ftrl,
                       ops::FTRLOpKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"

namespace paddle {
namespace operators {

using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
          typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;

template <typename Place, typename T>
class FTRLOpKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto* param_out = ctx.Output<Tensor>("ParamOut");
    auto* sq_accum_out = ctx.Output<Tensor>("SquaredAccumOut");
    auto* lin_accum_out = ctx.Output<Tensor>("LinearAccumOut");

    param_out->mutable_data<T>(ctx.GetPlace());
    sq_accum_out->mutable_data<T>(ctx.GetPlace());
    lin_accum_out->mutable_data<T>(ctx.GetPlace());

    auto grad = ctx.Input<Tensor>("Grad");

    auto l1 = static_cast<T>(ctx.Attr<float>("l1"));
    auto l2 = static_cast<T>(ctx.Attr<float>("l2"));
    auto lr_power = static_cast<T>(ctx.Attr<float>("lr_power"));

    // Every tensor is viewed as a flat 1-D Eigen vector; the update is
    // elementwise.
    auto p = EigenVector<T>::Flatten(*ctx.Input<Tensor>("Param"));
    auto sq_accum =
        EigenVector<T>::Flatten(*ctx.Input<Tensor>("SquaredAccumulator"));
    auto lin_accum =
        EigenVector<T>::Flatten(*ctx.Input<Tensor>("LinearAccumulator"));
    auto g = EigenVector<T>::Flatten(*grad);
    auto lr = EigenVector<T>::Flatten(*ctx.Input<Tensor>("LearningRate"));

    auto p_out = EigenVector<T>::Flatten(*param_out);
    auto s_acc_out = EigenVector<T>::Flatten(*sq_accum_out);
    auto l_acc_out = EigenVector<T>::Flatten(*lin_accum_out);
    auto place = ctx.GetEigenDevice<Place>();

    // The size-1 learning rate is broadcast to the number of gradient elements.
    Eigen::DSizes<int, 1> grad_dsize(grad->numel());

    auto new_accum = sq_accum + g * g;
    // Special case for lr_power = -0.5, where x^{-lr_power} is sqrt(x).
    if (lr_power == static_cast<T>(-0.5)) {
      l_acc_out.device(place) =
          lin_accum + g -
          ((new_accum.sqrt() - sq_accum.sqrt()) / lr.broadcast(grad_dsize)) * p;
    } else {
      l_acc_out.device(place) =
          lin_accum + g -
          ((new_accum.pow(-lr_power) - sq_accum.pow(-lr_power)) /
           lr.broadcast(grad_dsize)) *
              p;
    }

    auto x = (l_acc_out.constant(l1) * l_acc_out.sign() - l_acc_out);
    // Coordinates with |linear_accum| <= l1 are set to zero (L1 sparsity).
    if (lr_power == static_cast<T>(-0.5)) {
      auto y = (new_accum.sqrt() / lr.broadcast(grad_dsize)) +
               l_acc_out.constant(static_cast<T>(2) * l2);
      auto pre_shrink = x / y;
      p_out.device(place) =
          (l_acc_out.abs() > l_acc_out.constant(l1))
              .select(pre_shrink, p.constant(static_cast<T>(0)));
    } else {
      auto y = (new_accum.pow(-lr_power) / lr.broadcast(grad_dsize)) +
               l_acc_out.constant(static_cast<T>(2) * l2);
      auto pre_shrink = x / y;
      p_out.device(place) =
          (l_acc_out.abs() > l_acc_out.constant(l1))
              .select(pre_shrink, p.constant(static_cast<T>(0)));
    }

    s_acc_out.device(place) = sq_accum + g * g;
  }
};

} // namespace operators
} // namespace paddle
...@@ -70,11 +70,18 @@ input value and Y as the target value. Huber loss can evaluate the fitness of
X to Y. Unlike MSE loss, Huber loss is more robust to outliers. The
shapes of X and Y are [batch_size, 1]. The equation is:

$$
Out_{\delta}(X, Y)_i =
\begin{cases}
0.5 * (Y_i - X_i)^2,
\quad |Y_i - X_i| \leq \delta \\
\delta * (|Y_i - X_i| - 0.5 * \delta),
\quad otherwise
\end{cases}
$$

In the above equation, $Out_\delta(X, Y)_i$, $X_i$ and $Y_i$ represent the i-th
element of Out, X and Y.

)DOC");
  }
......
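The piecewise Huber definition above can be checked with a small elementwise sketch. `huber_loss` below is a hypothetical pure-Python helper, not the operator's kernel, and it assumes X and Y are flat sequences of floats rather than [batch_size, 1] tensors.

```python
def huber_loss(x, y, delta):
    """Hypothetical elementwise Huber loss Out_delta(X, Y)_i."""
    out = []
    for x_i, y_i in zip(x, y):
        r = abs(y_i - x_i)  # absolute residual |Y_i - X_i|
        if r <= delta:
            out.append(0.5 * r * r)  # quadratic region, same as 0.5 * (Y_i - X_i)^2
        else:
            out.append(delta * (r - 0.5 * delta))  # linear region
    return out
```

For an outlier residual of 3.0 with delta = 1.0 the loss is 2.5, whereas the squared term 0.5 * r^2 would give 4.5; the linear branch is what makes Huber loss less sensitive to outliers than MSE. Both branches equal 0.5 * delta^2 at |Y_i - X_i| = delta, so the loss is continuous.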
...@@ -297,7 +297,25 @@ void set_constant_with_place<platform::GPUPlace>(
template struct RowwiseAdd<platform::GPUPlace, float>;
template struct RowwiseAdd<platform::GPUPlace, double>;
template struct ColwiseSum<platform::GPUPlace, float>;
// template struct ColwiseSum<platform::GPUPlace, double>;
// The generic instantiation ColwiseSum<platform::GPUPlace, double> failed in
// debug mode (and only in this case), so it is reimplemented here as an
// explicit specialization that computes the column sums of `input` as
// input^T * ones with gemv.
template <>
void ColwiseSum<platform::GPUPlace, double>::operator()(
    const platform::DeviceContext& context, const framework::Tensor& input,
    framework::Tensor* vector) {
  auto in_dims = input.dims();
  auto size = input.numel() / in_dims[0];
  PADDLE_ENFORCE_EQ(vector->numel(), size);
  framework::Tensor one;
  one.mutable_data<double>({in_dims[0]}, context.GetPlace());
  SetConstant<platform::GPUPlace, double> set;
  set(context, &one, static_cast<double>(1.0));
  gemv<platform::GPUPlace, double>(context, true, static_cast<int>(in_dims[0]),
                                   static_cast<int>(in_dims[1]), 1.0,
                                   input.data<double>(), one.data<double>(),
                                   0.0, vector->data<double>());
}
} // namespace math
} // namespace operators
......
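The double specialization above obtains column sums from a single matrix-vector product with a vector of ones, i.e. `vector = input^T * 1`. A minimal pure-Python sketch of that idea follows; `gemv_t` and `colwise_sum` are hypothetical helpers that only mimic a transposed `gemv` call with alpha = 1 and beta = 0 on a row-major list-of-rows matrix, not Paddle's `math::gemv`.

```python
def gemv_t(a, x, alpha=1.0):
    """Return alpha * A^T x for a row-major M x N matrix given as a list of rows."""
    m = len(a)
    n = len(a[0]) if m else 0
    if len(x) != m:
        raise ValueError("x must have one entry per row of A")
    # y[j] = alpha * sum_i A[i][j] * x[i]; with beta = 0 no previous y is used.
    return [alpha * sum(a[i][j] * x[i] for i in range(m)) for j in range(n)]


def colwise_sum(a):
    """Column sums of A computed as A^T * ones."""
    ones = [1.0] * len(a)  # plays the role of the `one` tensor filled with 1.0
    return gemv_t(a, ones)
```

Routing the reduction through gemv lets a BLAS routine do the work instead of the Eigen reduction that misbehaved in debug builds.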
...@@ -145,6 +145,8 @@ struct SelectedRowsAddTo<platform::CPUPlace, T> {
template struct SelectedRowsAddTo<platform::CPUPlace, float>;
template struct SelectedRowsAddTo<platform::CPUPlace, double>;
template struct SelectedRowsAddTo<platform::CPUPlace, int>;
template struct SelectedRowsAddTo<platform::CPUPlace, int64_t>;
template <typename T>
struct SelectedRowsAddToTensor<platform::CPUPlace, T> {
...@@ -175,6 +177,8 @@ struct SelectedRowsAddToTensor<platform::CPUPlace, T> {
template struct SelectedRowsAddToTensor<platform::CPUPlace, float>;
template struct SelectedRowsAddToTensor<platform::CPUPlace, double>;
template struct SelectedRowsAddToTensor<platform::CPUPlace, int>;
template struct SelectedRowsAddToTensor<platform::CPUPlace, int64_t>;
} // namespace math
} // namespace operators
......
...@@ -173,6 +173,8 @@ struct SelectedRowsAddTo<platform::GPUPlace, T> {
template struct SelectedRowsAddTo<platform::GPUPlace, float>;
template struct SelectedRowsAddTo<platform::GPUPlace, double>;
template struct SelectedRowsAddTo<platform::GPUPlace, int>;
template struct SelectedRowsAddTo<platform::GPUPlace, int64_t>;
namespace {
template <typename T, int block_size>
...@@ -223,6 +225,8 @@ struct SelectedRowsAddToTensor<platform::GPUPlace, T> {
template struct SelectedRowsAddToTensor<platform::GPUPlace, float>;
template struct SelectedRowsAddToTensor<platform::GPUPlace, double>;
template struct SelectedRowsAddToTensor<platform::GPUPlace, int>;
template struct SelectedRowsAddToTensor<platform::GPUPlace, int64_t>;
} // namespace math
} // namespace operators
......
...@@ -176,4 +176,6 @@ namespace ops = paddle::operators;
REGISTER_OPERATOR(sum, ops::SumOp, ops::SumOpMaker, ops::SumGradMaker,
                  ops::SumOpVarTypeInference);
REGISTER_OP_CPU_KERNEL(sum, ops::SumKernel<paddle::platform::CPUPlace, float>,
                       ops::SumKernel<paddle::platform::CPUPlace, double>,
                       ops::SumKernel<paddle::platform::CPUPlace, int>,
                       ops::SumKernel<paddle::platform::CPUPlace, int64_t>);
...@@ -14,4 +14,6 @@ limitations under the License. */
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(sum, ops::SumKernel<paddle::platform::GPUPlace, float>,
                       ops::SumKernel<paddle::platform::GPUPlace, double>,
                       ops::SumKernel<paddle::platform::GPUPlace, int>,
                       ops::SumKernel<paddle::platform::GPUPlace, int64_t>);
...@@ -31,6 +31,16 @@ constexpr int PADDLE_CUDA_NUM_THREADS = 512;
// For atomicAdd.
USE_CUDA_ATOMIC(Add, float);
USE_CUDA_ATOMIC(Add, int);
USE_CUDA_ATOMIC(Add, unsigned int);
USE_CUDA_ATOMIC(Add, unsigned long long int);

// CUDA provides no atomicAdd overload for signed 64-bit integers. Two's-
// complement addition yields the same bits for signed and unsigned operands,
// so the unsigned long long int overload is reused for int64_t.
CUDA_ATOMIC_WRAPPER(Add, int64_t) {
  static_assert(sizeof(int64_t) == sizeof(long long int),
                "long long int must have the same size as int64_t");
  return CudaAtomicAdd(reinterpret_cast<unsigned long long int*>(address),
                       static_cast<unsigned long long int>(val));
}

#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
USE_CUDA_ATOMIC(Add, double);
......
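The `int64_t` wrapper above relies on two's-complement addition producing the same bit pattern whether the operands are read as signed or unsigned 64-bit integers. The Python sketch below emulates that bit reinterpretation with masking; it illustrates the arithmetic only, is not CUDA code, and its helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def as_unsigned64(v):
    """Bits of a signed int64 value read as an unsigned 64-bit integer."""
    return v & MASK64


def as_signed64(u):
    """Bits of an unsigned 64-bit integer read back as a signed int64 value."""
    return u - (1 << 64) if u >= (1 << 63) else u


def add_int64_via_unsigned(a, b):
    """Add two int64 values on their unsigned bit patterns, as the wrapper does."""
    total = (as_unsigned64(a) + as_unsigned64(b)) & MASK64  # wraps modulo 2^64
    return as_signed64(total)
```

For example, -5 + 3 computed this way yields -2, matching ordinary signed addition whenever the true sum fits in `int64_t`.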
import unittest
import numpy as np
from op_test import OpTest


class TestFTRLOp(OpTest):
    def setUp(self):
        self.op_type = "ftrl"
        w = np.random.random((102, 105)).astype("float32")
        g = np.random.random((102, 105)).astype("float32")
        sq_accum = np.full((102, 105), 0.1).astype("float32")
        linear_accum = np.full((102, 105), 0.1).astype("float32")
        lr = np.array([0.01]).astype("float32")
        l1 = 0.1
        l2 = 0.2
        lr_power = -0.5

        self.inputs = {
            'Param': w,
            'SquaredAccumulator': sq_accum,
            'LinearAccumulator': linear_accum,
            'Grad': g,
            'LearningRate': lr
        }
        # The learning rate is fed through the LearningRate input; the op
        # declares no "learning_rate" attribute.
        self.attrs = {'l1': l1, 'l2': l2, 'lr_power': lr_power}

        # NumPy reference implementation of the FTRL update.
        new_accum = sq_accum + g * g
        if lr_power == -0.5:
            linear_out = linear_accum + g - (
                (np.sqrt(new_accum) - np.sqrt(sq_accum)) / lr) * w
        else:
            linear_out = linear_accum + g - ((np.power(
                new_accum, -lr_power) - np.power(sq_accum, -lr_power)) / lr) * w

        x = (l1 * np.sign(linear_out) - linear_out)
        if lr_power == -0.5:
            y = (np.sqrt(new_accum) / lr) + (2 * l2)
            pre_shrink = x / y
            param_out = np.where(np.abs(linear_out) > l1, pre_shrink, 0.0)
        else:
            y = (np.power(new_accum, -lr_power) / lr) + (2 * l2)
            pre_shrink = x / y
            param_out = np.where(np.abs(linear_out) > l1, pre_shrink, 0.0)

        sq_accum_out = sq_accum + g * g

        self.outputs = {
            'ParamOut': param_out,
            'SquaredAccumOut': sq_accum_out,
            'LinearAccumOut': linear_out
        }

    def test_check_output(self):
        self.check_output()


if __name__ == "__main__":
    unittest.main()
...@@ -21,7 +21,7 @@ class TestBook(unittest.TestCase):
        self.assertIsNotNone(avg_cost)
        program.append_backward(avg_cost)
        print str(program)

    def test_recognize_digits_mlp(self):
        program = Program()
...@@ -50,7 +50,8 @@ class TestBook(unittest.TestCase):
            input=predict, label=label, main_program=program)
        avg_cost = layers.mean(x=cost, main_program=program)
        self.assertIsNotNone(avg_cost)

        print str(program)

    def test_simple_conv2d(self):
        program = Program()
...@@ -65,7 +66,7 @@ class TestBook(unittest.TestCase):
            filter_size=[4, 4],
            main_program=program)
        print str(program)

    def test_recognize_digits_conv(self):
        program = Program()
...@@ -104,7 +105,7 @@ class TestBook(unittest.TestCase):
        program.append_backward(avg_cost)
        print str(program)

    def test_word_embedding(self):
        program = Program()
...@@ -165,7 +166,7 @@ class TestBook(unittest.TestCase):
        avg_cost = layers.mean(x=cost, main_program=program)
        self.assertIsNotNone(avg_cost)
        print str(program)

    def test_linear_chain_crf(self):
        program = Program()
...@@ -182,7 +183,7 @@ class TestBook(unittest.TestCase):
        crf = layers.linear_chain_crf(
            input=hidden, label=label, main_program=program)
        print str(program)


if __name__ == '__main__':
......