Commit 5688c381 authored by Helin Wang

fix according to comments

Parent: 478b0c8a
# Python Data Reader Design Doc
At training and testing time, PaddlePaddle programs need to read data. To ease the users' work of writing data reading code, we define that:
- A *reader* is a function that reads data (from a file, network, random number generator, etc.) and yields data items.
- A *reader creator* is a function that returns a reader function.
- A *reader decorator* is a function that accepts one or more readers and returns a reader.

We also provide frequently used reader creators and reader decorators.
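To make the three roles concrete, here is a minimal sketch in plain Python. The names `reader_creator_range` and `reader_decorator_square` are hypothetical and only illustrate the definitions; they are not part of the PaddlePaddle API.

```python
def reader_creator_range(n):
    """Hypothetical reader creator: returns a reader that yields 0..n-1."""
    def reader():
        for i in range(n):
            yield i
    return reader


def reader_decorator_square(reader):
    """Hypothetical reader decorator: takes a reader, returns a new reader."""
    def squared_reader():
        for item in reader():
            yield item * item
    return squared_reader


reader = reader_creator_range(4)          # a reader creator returns a reader
print(list(reader()))                      # prints [0, 1, 2, 3]
squared = reader_decorator_square(reader)  # a decorator wraps a reader
print(list(squared()))                     # prints [0, 1, 4, 9]
```

Because a reader is called anew for each pass, every call starts a fresh iteration over the data.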
## Data Reader Interface
Indeed, a *data reader* doesn't have to be a function that reads and yields data items. It can be any function with no parameters that creates an iterable (anything that can be used in `for x in iterable`):
```
iterable = data_reader()
```

Element produced for the iterable should be a **single** entry of data, **not** …
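Since any parameter-free function that returns an iterable qualifies, a data reader need not be a generator. A small sketch, assuming hypothetical names `list_reader`, `container_reader`, and `Samples`, where each element of the iterable is one data entry:

```python
class Samples:
    """Hypothetical iterable container holding a fixed list of entries."""

    def __init__(self, entries):
        self.entries = list(entries)

    def __iter__(self):
        return iter(self.entries)


def list_reader():
    # Returns a plain list: any iterable is acceptable.
    return [(0.1, 0), (0.2, 1), (0.3, 0)]


def container_reader():
    # Returns an object implementing __iter__ instead of a generator.
    return Samples([(1.0, 1), (2.0, 0)])


for data_reader in (list_reader, container_reader):
    iterable = data_reader()   # no parameters, returns an iterable
    for entry in iterable:     # each element is a single data entry
        print(entry)
```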
An example implementation for a single-item data reader:
```python
import numpy

def reader_creator_random_image(width, height):
    def reader():
        while True:
            # Each entry is a flat image of width*height values in [-1, 1).
            yield numpy.random.uniform(-1, 1, size=width*height)
    return reader
```
An example implementation for a multiple-item data reader:
```python
import numpy

def reader_creator_random_image_and_label(width, height, label):
    def reader():
        while True:
            # Each entry is a tuple: (flat random image, constant label).
            yield numpy.random.uniform(-1, 1, size=width*height), label
    return reader
```
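The random readers above never terminate, so a consumer has to bound how many entries it takes. A sketch of consuming a multiple-item reader with `itertools.islice`; to stay dependency-free it swaps NumPy for the standard `random` module, which is an assumption made only for this illustration:

```python
import itertools
import random


def reader_creator_random_image_and_label(width, height, label):
    # Standard-library stand-in for the NumPy version: each image is a
    # flat list of width*height floats drawn uniformly from [-1, 1].
    def reader():
        while True:
            image = [random.uniform(-1, 1) for _ in range(width * height)]
            yield image, label
    return reader


reader = reader_creator_random_image_and_label(20, 20, True)

# The reader is infinite, so take a bounded number of entries.
for image, label in itertools.islice(reader(), 3):
    print(len(image), label)  # prints "400 True" three times
```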
## Usage
…

`buffered_reader = paddle.reader.buffered(paddle.dataset.mnist, 100)`

…
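`paddle.reader.buffered(reader, size)` wraps a reader so that entries are read ahead into a buffer. The sketch below shows one way such a decorator could work, assuming a background thread and a bounded `queue.Queue`; it is an illustration, not PaddlePaddle's implementation, and the finite `reader` is a hypothetical stand-in for a dataset:

```python
import queue
import threading


def buffered(reader, size):
    """Sketch of a buffering decorator: read ahead up to `size` entries."""
    end = object()  # sentinel marking the end of the wrapped reader

    def fill(q):
        for entry in reader():
            q.put(entry)  # blocks while the buffer is full
        q.put(end)

    def buffered_reader():
        q = queue.Queue(maxsize=size)
        t = threading.Thread(target=fill, args=(q,), daemon=True)
        t.start()
        while True:
            entry = q.get()
            if entry is end:
                break
            yield entry

    return buffered_reader


def reader():
    # Hypothetical finite reader used only to exercise the decorator.
    for i in range(5):
        yield i


print(list(buffered(reader, 2)()))  # prints [0, 1, 2, 3, 4]
```

If a consumer stops early, the filler thread stays blocked on `put`; making it a daemon keeps it from holding the interpreter open. An exception in the wrapped reader would leave the consumer waiting, so a fuller version would forward errors through the queue.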
### Compose Multiple Data Readers
For example, suppose we want to use a source of real images (reusing the MNIST dataset) and a source of random images as input for [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661).
We can do:
```python
import numpy
import paddle

def reader_creator_random_image(width, height):
    def reader():
        while True:
            yield numpy.random.uniform(-1, 1, size=width*height)
    return reader

def reader_creator_bool(t):
    def reader():
        while True:
            yield t
    return reader

true_reader = reader_creator_bool(True)
false_reader = reader_creator_bool(False)

reader = paddle.reader.compose(paddle.dataset.mnist, reader_creator_random_image(20, 20), true_reader, false_reader)
# Skipped 1 because paddle.dataset.mnist produces two items per data entry.
# And we don't care about the second item at this time.
paddle.train(reader, {"true_image": 0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...)
```

…
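The index mapping passed to `paddle.train` follows from how composition flattens entries. A minimal sketch of a compose decorator, assuming it reads the readers in lockstep and flattens tuple entries into one tuple; it is not PaddlePaddle's code, and `mnist_like` and `constant` are hypothetical stand-ins for `paddle.dataset.mnist` and the readers above:

```python
def compose(*readers):
    """Sketch: zip readers in lockstep and flatten their entries."""
    def as_tuple(entry):
        # Tuple entries contribute all their items, others contribute one.
        return entry if isinstance(entry, tuple) else (entry,)

    def composed_reader():
        for entries in zip(*(r() for r in readers)):
            yield sum((as_tuple(e) for e in entries), ())
    return composed_reader


def mnist_like():
    # Stand-in for paddle.dataset.mnist: yields (image, label) pairs.
    yield [0.0] * 4, 7
    yield [1.0] * 4, 3


def constant(value):
    # Infinite reader repeating one value.
    def reader():
        while True:
            yield value
    return reader


# constant([0.5] * 4) stands in for reader_creator_random_image(20, 20).
reader = compose(mnist_like, constant([0.5] * 4), constant(True), constant(False))
for entry in reader():
    # Index 1 is the MNIST label, which is skipped in the mapping.
    print(entry[0], entry[2], entry[3], entry[4])
```

`zip` stops at the shortest reader, so the finite MNIST stand-in bounds the infinite ones, and each composed entry has five items, matching indices 0 and 2 to 4 in the `paddle.train` mapping.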