A Data Reader Decorator takes one or more data readers and returns a new data reader. It is similar to a [python decorator](https://wiki.python.org/moin/PythonDecorators), but it does not use the `@` syntax.
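For illustration, a data reader here is simply a function with no parameters that yields one data item at a time (the strict interface described below). A minimal sketch of a hypothetical reader creator (the name and random-image data are placeholders, not part of the API):

```python
import numpy

# A hypothetical reader creator. The returned `reader` takes no
# parameters and yields one data item per iteration.
def reader_creator_random_image(width, height):
    def reader():
        while True:
            yield numpy.random.uniform(-1, 1, size=width * height)
    return reader
```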
Since we have a strict interface for data readers (no parameters, returning a single data item at a time), data readers can be used flexibly via data reader decorators. Following are a few examples:
Given shuffle buffer size `n`, `paddle.reader.shuffle` will return a data reader that buffers `n` data entries and shuffles them before a data entry is read.
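A minimal sketch of how such a shuffle decorator could work (an illustration under the interface above, not the actual `paddle.reader.shuffle` implementation):

```python
import random

# Buffers up to buf_size entries, shuffles the buffer, then yields
# entries one at a time; the final partial buffer is flushed as well.
def shuffle(reader, buf_size):
    def shuffled_reader():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for b in buf:
                    yield b
                buf = []
        # flush and shuffle whatever remains at the end of the stream
        random.shuffle(buf)
        for b in buf:
            yield b
    return shuffled_reader
```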
### Why return only a single entry, but not a mini batch?
If a mini batch were returned, the data reader would need to take care of the batch size. But batch size is a training concept, so it makes more sense for the user to specify it as a parameter to `train`.
Practically, always returning a single entry makes reusing existing data readers much easier (e.g., if an existing data reader returned 3 entries instead of a single entry, the training code would be more complex because it would need to handle cases like a batch size of 2).
An example implementation of `paddle.train` could be:
```python
import itertools

def minibatch_decorator(reader, minibatch_size):
    def ret():
        r = reader()
        # islice collects up to minibatch_size entries at a time, so the
        # final, possibly smaller, mini batch is yielded as well.
        buf = list(itertools.islice(r, minibatch_size))
        while len(buf) > 0:
            yield buf
            buf = list(itertools.islice(r, minibatch_size))
    return ret

def train(reader, mapping, batch_size, total_pass):
    for pass_idx in range(total_pass):
        for mini_batch in minibatch_decorator(reader, batch_size):  # this loop will never end in online learning.
            # the forward/backward computation on mini_batch (using
            # mapping) is omitted in this example.
            pass
```
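Putting the pieces together, usage could look like the following sketch, reusing the hypothetical reader creator and shuffle decorator sketched above (`my_mapping` is a placeholder for the model's input mapping):

```python
# Shuffle the data with a buffer of 512 entries, then train with
# mini batches of 128 entries for 10 passes.
reader = shuffle(reader_creator_random_image(32, 32), buf_size=512)
train(reader, my_mapping, batch_size=128, total_pass=10)
```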