提交 e43073e8 编写于 作者: Y Yan Chunwei 提交者: GitHub

design of TensorArray to simplify varient/dynamic RNNs (#4447)

* some enforce change

* add LoDTensor::NumElements(id,id)

* update

* init

* finish

* refacter interfaces
上级 cce682fe
# Design for TensorArray
TensorArray as a new concept is borrowed from TensorFlow,
it is meant to be used with dynamic iteration primitives such as `while_loop` and `map_fn`.
This concept can be used to support our new design of dynamic operations, and help to refactor some existing variant-sentence-related layers,
such as `RecurrentGradientMachine`.
In [our design for dynamic RNN](https://github.com/PaddlePaddle/Paddle/pull/4401),
`TensorArray` is used to segment inputs and store states in all time steps.
By providing some methods similar to a C++ array,
the definition of some state-based dynamic models such as RNN could be more natural and highly flexible.
## Dynamic-Related Methods
Some basic methods should be proposed as follows:
### stack()
Pack the values in a `TensorArray` into a tensor with rank one higher than each tensor in `values`.
### unstack(axis=0)
Unpacks the given dimension of a rank-`R` tensor into rank-`(R-1)` tensors.
### concat()
Return the values in the `TensorArray` as a concatenated Tensor.
### write(index, value, data_shared=true)
Write value into index of the TensorArray.
### read(index)
Read the value at location `index` in the `TensorArray`.
### size()
Return the number of values.
## LoDTensor-related Supports
The `RecurrentGradientMachine` in Paddle serves as a flexible RNN layer; it takes variant length sequences as input,
because each step of RNN could only take a tensor-represented batch of data as input,
some preprocess should be taken on the inputs such as sorting the sentences by their length in descending order and cut each word and pack to new batches.
Such cut-like operations can be embedded into `TensorArray` as general methods called `unpack` and `pack`.
With these two methods, a variant-sentence-RNN can be implemented like
```c++
// input is the varient-length data
LodTensor sentence_input(xxx);
TensorArray ta;
Tensor indice_map;
Tensor boot_state = xxx; // to initialize rnn's first state
TensorArray::unpack(input, 1/*level*/, true/*sort_by_length*/, &ta, &indice_map);
TessorArray step_outputs;
TensorArray states;
for (int step = 0; step = ta.size(); step++) {
auto state = states.read(step);
// rnnstep is a function which acts like a step of RNN
auto step_input = ta.read(step);
auto step_output = rnnstep(step_input, state);
step_outputs.write(step_output, true/*data_shared*/);
}
// rnn_output is the final output of an rnn
LoDTensor rnn_output = ta.pack(ta, indice_map);
```
the code above shows that by embedding the LoDTensor-related preprocess operations into `TensorArray`,
the implementation of a RNN that supports varient-length sentences is far more concise than `RecurrentGradientMachine` because the latter mixes all the codes together, hard to read and extend.
some details are as follows.
### unpack(level, sort_by_length)
Split LodTensor in some `level` and generate batches, if set `sort_by_length`, will sort by length.
Returns:
- a new `TensorArray`, whose values are LodTensors and represents batches of data.
- an int32 Tensor, which stores the map from the new batch's indices to original LoDTensor
### pack(level, indices_map)
Recover the original LoD-arranged LoDTensor with the values in a `TensorArray` and `level` and `indices_map`.
......@@ -72,6 +72,22 @@ bool operator==(const LoD& a, const LoD& b) {
return true;
}
size_t LoDTensor::NumElements(size_t level, size_t idx) const {
PADDLE_ENFORCE_LT(level, NumLevels());
PADDLE_ENFORCE_LT(idx, NumElements(level));
// the last level of LoD, just return number of records in Tensor
if (level == NumLevels() - 1) {
return lod_[level][idx + 1] - lod_[level][idx];
}
// high level of LoD, and there is another lower level, return number of
// lower-level elements
auto tmp = SliceInLevel(lod_, level, idx, idx + 1);
PADDLE_ENFORCE_GE(tmp.size(), 2);
// there is a 0 as a placeholder stored in LoD, so the number of elements
// equals lod.size() - 1
return tmp[1].size() - 1;
}
void LoDTensor::ShrinkLevels(size_t level_begin, size_t level_end) {
auto new_lod = framework::SliceLevels(lod_, level_begin, level_end);
lod_ = new_lod;
......
......@@ -38,6 +38,18 @@ using Vector = thrust::host_vector<
T, thrust::system::cuda::experimental::pinned_allocator<T>>;
#endif
/*
* 3-level LoD stores
*
* 0 10 20
* 0 5 10 15 20
* 0 2 5 7 10 12 15 20
*
* - in a level, each element indicates offset in the underlying Tensor
* - the first element should be 0 and that indicates that this sequence start
* from 0
* - each sequence's begin and end(no-inclusive) is level[id, id+1]
*/
using LoD = std::vector<Vector<size_t>>;
LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end);
......@@ -65,11 +77,8 @@ class LoDTensor : public Tensor {
* Get a element from LoD.
*/
size_t lod_element(size_t level, size_t elem) const {
PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
NumLevels());
PADDLE_ENFORCE(elem < NumElements(level),
"element begin [%d] out of range [%d]", elem,
NumElements(level));
PADDLE_ENFORCE_LT(level, NumLevels());
PADDLE_ENFORCE_LT(elem, NumElements(level));
return (lod_)[level][elem];
}
......@@ -82,12 +91,23 @@ class LoDTensor : public Tensor {
* Number of elements in a level.
*/
size_t NumElements(size_t level = 0) const {
PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
NumLevels());
PADDLE_ENFORCE_LT(level, NumLevels());
// the last offset is the end of last element
return (lod_)[level].size() - 1;
}
/*
* Number of lower-level elements.
* For example, a 2-level lod-tensor
*
* 0-th level | |
* 1-th level || |||
*
* NumElements(0, 0) get 2
* NumElements(0, 1) get 3
*/
size_t NumElements(size_t level, size_t idx) const;
/*
* Shrink levels[level_begin:level_end]
*/
......
......@@ -56,6 +56,12 @@ TEST_F(LoDTensorTester, NumElements) {
ASSERT_EQ(lod_tensor_.NumElements(2), 8UL);
}
TEST_F(LoDTensorTester, NumElements2) {
ASSERT_EQ(lod_tensor_.NumElements(0, 0), 2UL);
ASSERT_EQ(lod_tensor_.NumElements(0, 1), 2UL);
ASSERT_EQ(lod_tensor_.NumElements(1, 1), 2UL);
}
TEST_F(LoDTensorTester, ShrinkLevels) {
// slice 1 level
for (size_t level = 0; level < 3UL; ++level) {
......@@ -65,7 +71,7 @@ TEST_F(LoDTensorTester, ShrinkLevels) {
ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor_.NumElements(level));
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
}
// slice 2 level
// shrink 2 level
for (size_t level = 0; level < 2UL; ++level) {
LoDTensor new_lod_tensor = lod_tensor_;
new_lod_tensor.ShrinkLevels(level, level + 2);
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册