Commit 9b58632c · Author: HongyingG · Committer: Hao Wang

py_reader_en (#564)

* py_reader_en

* Textual Check
Parent 6472ec6e
.. _user_guide_use_py_reader_en:
############################################
Use PyReader to read training and test data
############################################
Paddle Fluid supports PyReader, which feeds data from Python to C++. Unlike :ref:`user_guide_use_numpy_array_as_train_data_en` , when PyReader is in use, loading data in Python runs asynchronously with :code:`Executor::Run()` reading that data.
Moreover, PyReader can work with :code:`double_buffer_reader` to improve the performance of reading data.
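The asynchronous hand-off can be pictured with a plain-Python sketch. This is not PyReader's implementation; it only shows a bounded buffer sitting between a producer thread (loading data) and a consumer (reading data). The names ``feed_batches``, ``consume`` and ``run_pipeline`` are hypothetical and chosen for illustration.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the data (illustrative only)


def feed_batches(batches, buffer):
    """Producer: push batches into the bounded buffer, then signal the end."""
    for batch in batches:
        buffer.put(batch)  # blocks while the buffer is full
    buffer.put(_END)


def consume(buffer):
    """Consumer: pop batches until the end marker arrives."""
    seen = []
    while True:
        batch = buffer.get()  # blocks while the buffer is empty
        if batch is _END:
            return seen
        seen.append(batch)


def run_pipeline(batches, capacity=2):
    """Run producer and consumer concurrently over a buffer of given size."""
    buffer = queue.Queue(maxsize=capacity)
    producer = threading.Thread(target=feed_batches, args=(batches, buffer))
    producer.start()
    result = consume(buffer)
    producer.join()
    return result
```

Because the producer only blocks when the buffer is full, it can keep preparing the next batches while the consumer is still busy with the current one.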
Create PyReader Object
################################
You can create a PyReader object as follows:

.. code-block:: python

    import paddle.fluid as fluid

    py_reader = fluid.layers.py_reader(capacity=64,
                                       shapes=[(-1,3,224,224), (-1,1)],
                                       dtypes=['float32', 'int64'],
                                       name='py_reader',
                                       use_double_buffer=True)

In the code, ``capacity`` is the buffer size of the PyReader;
``shapes`` gives the shape of each input in a batch (such as the image and the label in an image classification task);
``dtypes`` gives the data type of each input in a batch;
``name`` is the name of the PyReader instance;
``use_double_buffer`` is True by default, which means :code:`double_buffer_reader` is used.
To create several PyReader objects (usually two, one for the training phase and one for the testing phase), give each object a different name. For example, in the same task, the PyReader objects for the training and testing phases are created as follows:

.. code-block:: python

    import paddle.fluid as fluid

    train_py_reader = fluid.layers.py_reader(capacity=64,
                                             shapes=[(-1,3,224,224), (-1,1)],
                                             dtypes=['float32', 'int64'],
                                             name='train',
                                             use_double_buffer=True)

    test_py_reader = fluid.layers.py_reader(capacity=64,
                                            shapes=[(-1,3,224,224), (-1,1)],
                                            dtypes=['float32', 'int64'],
                                            name='test',
                                            use_double_buffer=True)

Note: A PyReader object cannot be copied with :code:`Program.clone()` , so you have to create separate PyReader objects for the training and testing phases with the method mentioned above.
For the same reason, the parameters of the training phase have to be shared with the testing phase through :code:`fluid.unique_name.guard()` .
Details are as follows:

.. code-block:: python

    import paddle.fluid as fluid
    import paddle.dataset.mnist as mnist
    import paddle.v2
    import numpy

    def network(is_train):
        # Each phase gets its own PyReader with a distinct name.
        reader = fluid.layers.py_reader(
            capacity=10,
            shapes=((-1, 784), (-1, 1)),
            dtypes=('float32', 'int64'),
            name="train_reader" if is_train else "test_reader",
            use_double_buffer=True)
        img, label = fluid.layers.read_file(reader)
        ...
        # Here, we omitted the definition of loss of the model
        return loss, reader

    train_prog = fluid.Program()
    train_startup = fluid.Program()

    with fluid.program_guard(train_prog, train_startup):
        # Use fluid.unique_name.guard() to share parameters with the test network
        with fluid.unique_name.guard():
            train_loss, train_reader = network(True)
            adam = fluid.optimizer.Adam(learning_rate=0.01)
            adam.minimize(train_loss)

    test_prog = fluid.Program()
    test_startup = fluid.Program()

    with fluid.program_guard(test_prog, test_startup):
        # The same guard makes the test network reuse the training parameter names
        with fluid.unique_name.guard():
            test_loss, test_reader = network(False)

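Why a name guard leads to shared parameters can be illustrated with a toy name generator. The sketch below is an assumption about the general mechanism, not Paddle's code: names are numbered per prefix, and the hypothetical ``guard()`` restarts the numbering, so building the same network twice under separate guards yields identical parameter names. ``generate``, ``guard`` and ``build_network`` are illustrative stand-ins.

```python
import collections
import contextlib

_counters = collections.defaultdict(int)  # hypothetical per-prefix counters


def generate(prefix):
    """Return prefix_N, where N counts earlier requests for this prefix."""
    name = "{}_{}".format(prefix, _counters[prefix])
    _counters[prefix] += 1
    return name


@contextlib.contextmanager
def guard():
    """Temporarily swap in fresh counters so numbering restarts at 0."""
    global _counters
    saved = _counters
    _counters = collections.defaultdict(int)
    try:
        yield
    finally:
        _counters = saved


def build_network():
    # Stand-in for network(): two layers each request a parameter name.
    return [generate("fc_w"), generate("fc_w")]


with guard():
    train_names = build_network()
with guard():
    test_names = build_network()

# Without a guard, the second build keeps counting and gets different names.
unguarded_names = build_network() + build_network()
```

In this toy model the two guarded builds produce the same names, while the unguarded builds do not, which is the property the training and testing programs rely on to refer to the same parameters.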
Configure data source of PyReader objects
##########################################
PyReader provides :code:`decorate_tensor_provider` and :code:`decorate_paddle_reader` , both of which receive a Python :code:`generator` as the data source. The difference is:

1. :code:`decorate_tensor_provider` : the :code:`generator` yields a :code:`list` or :code:`tuple` each time, and each element of the :code:`list` or :code:`tuple` is a :code:`LoDTensor` or a Numpy array. The :code:`shape` of each :code:`LoDTensor` or Numpy array must be the same as the :code:`shapes` specified when the PyReader was created.

2. :code:`decorate_paddle_reader` : the :code:`generator` yields a :code:`list` or :code:`tuple` each time, and each element of the :code:`list` or :code:`tuple` is a Numpy array, but the :code:`shape` of the Numpy array does not have to be the same as the :code:`shapes` specified when the PyReader was created. :code:`decorate_paddle_reader` will :code:`reshape` the Numpy array internally.
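What "the same shape" means for :code:`decorate_tensor_provider` can be sketched in plain Python. The helpers below are hypothetical and not part of Paddle; they assume that ``-1`` in a declared shape stands for a dimension of any size (typically the batch dimension), as in the ``shapes`` arguments used in this guide.

```python
def matches_declared_shape(actual, declared):
    """Return True if `actual` fits `declared`, treating -1 as any size."""
    if len(actual) != len(declared):
        return False
    return all(d == -1 or a == d for a, d in zip(actual, declared))


def check_batch(batch_shapes, declared_shapes):
    """Check every slot of one generated batch against the declared shapes."""
    if len(batch_shapes) != len(declared_shapes):
        return False
    return all(
        matches_declared_shape(a, d)
        for a, d in zip(batch_shapes, declared_shapes)
    )


declared = [(-1, 784), (-1, 1)]

# Image batch already flattened to 784 values per sample: fits.
ok = check_batch([(512, 784), (512, 1)], declared)

# Image batch still shaped 28 x 28 per sample: does not fit as-is.
bad = check_batch([(512, 28, 28), (512, 1)], declared)
```

A batch like the second one does not satisfy the rule described for :code:`decorate_tensor_provider` and would have to be reshaped before being yielded.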
Train and test model with PyReader
##################################
The following code continues the example above:

.. code-block:: python

    place = fluid.CUDAPlace(0)
    startup_exe = fluid.Executor(place)
    startup_exe.run(train_startup)
    startup_exe.run(test_startup)

    trainer = fluid.ParallelExecutor(
        use_cuda=True, loss_name=train_loss.name, main_program=train_prog)

    # The tester shares variables (parameters) with the trainer.
    tester = fluid.ParallelExecutor(
        use_cuda=True, share_vars_from=trainer, main_program=test_prog)

    train_reader.decorate_paddle_reader(
        paddle.v2.reader.shuffle(paddle.batch(mnist.train(), 512), buf_size=8192))

    test_reader.decorate_paddle_reader(paddle.batch(mnist.test(), 512))

    for epoch_id in range(10):
        train_reader.start()
        try:
            while True:
                print('train_loss', numpy.array(
                    trainer.run(fetch_list=[train_loss.name])))
        except fluid.core.EOFException:
            print('End of epoch', epoch_id)
            train_reader.reset()

        test_reader.start()
        try:
            while True:
                print('test loss', numpy.array(
                    tester.run(fetch_list=[test_loss.name])))
        except fluid.core.EOFException:
            print('End of testing')
            test_reader.reset()

Specific steps are as follows:

1. Before the start of every epoch, call :code:`start()` to start the PyReader;

2. At the end of every epoch, :code:`read_file` throws the exception :code:`fluid.core.EOFException` . After catching the exception, call :code:`reset()` to reset the state of the PyReader so that the next epoch can start.
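The start / read-until-EOF / reset cycle can be tried without Paddle using a small stand-in. ``FakeReader`` and the local ``EOFException`` below are hypothetical and only mimic the behavior described in the steps above; they are not Paddle classes.

```python
class EOFException(Exception):
    """Stand-in for fluid.core.EOFException in this sketch."""


class FakeReader(object):
    """Hypothetical reader that mimics start() / reset() and end-of-data."""

    def __init__(self, batches):
        self._batches = list(batches)
        self._iter = None

    def start(self):
        self._iter = iter(self._batches)

    def read(self):
        if self._iter is None:
            raise RuntimeError("call start() before reading")
        try:
            return next(self._iter)
        except StopIteration:
            raise EOFException()

    def reset(self):
        self._iter = None


def run_epochs(reader, num_epochs):
    """Run the start / read-until-EOF / reset cycle and collect batches."""
    history = []
    for epoch_id in range(num_epochs):
        reader.start()
        seen = []
        try:
            while True:
                seen.append(reader.read())
        except EOFException:
            # End of epoch: reset so the next start() begins from scratch.
            reader.reset()
        history.append(seen)
    return history
```

Calling ``run_epochs(FakeReader([1, 2, 3]), 2)`` walks through the data twice, once per epoch, with a reset in between.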