A data provider supplies the data used for training. It is passed to `paddle.train` as a parameter.
## Data Provider Interface
A data provider is a function with no parameters that creates an iterable (anything that can be used in `for x in iterable`):
```
iterable = data_provider()
```
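A minimal sketch of such a function, assuming toy in-memory data. Here `data_provider` is a generator function, so every call returns a fresh iterator. That is presumably one benefit of making the provider a function rather than an iterable: a generator can be consumed only once, so calling `data_provider()` again yields a new pass over the data. The records (`[word_ids, label]`, a list of int plus an int) are made up for illustration.

```python
# Hypothetical toy dataset: each record is (list of word ids, class label).
_RECORDS = [
    ([1, 5, 9], 0),
    ([2, 7], 1),
    ([3, 3, 8, 4], 0),
]


def data_provider():
    """Take no parameters and return an iterable of single entries.

    Because this is a generator function, each call creates a new,
    independent iterator over the data.
    """
    for word_ids, label in _RECORDS:
        # One entry = one list with one item per column.
        yield [word_ids, label]


iterable = data_provider()
for entry in iterable:
    print(entry)
```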
Each element produced by the iterable should be a **single** data entry, in the format `[column_0_item, column_1_item, ...]`. Each item in the list must be of a supported data type (e.g., a NumPy 1-D array of float32, or a list of int).
For example, `column_0_item` could be the image pixels, as a NumPy 1-D array of float32, and `column_1_item` could be the image label, as a single int:
```
for entry in iterable:
    pixel = entry[0]  # column 0: image pixels
    label = entry[1]  # column 1: image label
```
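The image example above could be produced by a provider like the following sketch. It assumes NumPy and uses random pixels and labels as placeholders for real data; the flattened 28×28 image size, the 10 classes, and the name `image_data_provider` are arbitrary choices for illustration.

```python
import numpy as np

NUM_SAMPLES = 4          # hypothetical dataset size
IMAGE_SIZE = 28 * 28     # flattened image length (assumed)
NUM_CLASSES = 10         # assumed number of labels


def image_data_provider():
    """Yield single entries of the form [pixels, label]."""
    rng = np.random.default_rng(seed=0)
    for _ in range(NUM_SAMPLES):
        # Column 0: image pixels as a 1-D float32 NumPy array.
        pixel = rng.random(IMAGE_SIZE, dtype=np.float32)
        # Column 1: image label as a single Python int.
        label = int(rng.integers(NUM_CLASSES))
        yield [pixel, label]


for entry in image_data_provider():
    pixel = entry[0]
    label = entry[1]
    print(pixel.shape, pixel.dtype, label)
```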
## Usage
`paddle.train` takes the data provider, a mapping from data provider columns to data layers, the batch size, and the total number of passes.
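
The sketch below is not Paddle's implementation. `toy_train`, `_to_feed`, the layer names `"image"` and `"label"`, and the dict form of the column mapping are all hypothetical. It only illustrates how the four arguments could fit together: the trainer calls the provider once per pass, groups single entries into mini-batches of the requested size, and uses the mapping to route each column to its data layer.

```python
def toy_train(data_provider, column_mapping, batch_size, num_passes):
    """Hypothetical trainer loop; returns the feeds it would send to the model.

    column_mapping maps a data layer name to a data provider column index.
    """
    feeds = []
    for _ in range(num_passes):
        batch = []
        # A fresh iterable per pass, since the provider is a function.
        for entry in data_provider():
            batch.append(entry)
            if len(batch) == batch_size:
                feeds.append(_to_feed(batch, column_mapping))
                batch = []
        if batch:  # last, possibly smaller, mini-batch of the pass
            feeds.append(_to_feed(batch, column_mapping))
    return feeds


def _to_feed(batch, column_mapping):
    # For each data layer, gather its column from every entry in the batch.
    return {layer: [entry[col] for entry in batch]
            for layer, col in column_mapping.items()}


def data_provider():
    # Hypothetical provider: 3 entries of [pixels, label].
    yield [[0.1, 0.2], 0]
    yield [[0.3, 0.4], 1]
    yield [[0.5, 0.6], 1]


feeds = toy_train(data_provider, {"image": 0, "label": 1},
                  batch_size=2, num_passes=2)
```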
### Why return only a single entry rather than a mini-batch?
If the data provider returned mini-batches, it would have to know the batch size. But batch size is a training concept, so it makes more sense for the user to specify it as a parameter of `train`.
Concretely, always returning a single entry makes reusing existing data providers much easier. For example, if an existing data provider returned 3 entries at a time instead of one, the training code would be more complex, because it would need to handle cases such as a batch size of 2, which does not evenly divide the provider's chunks of 3.
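
The sketch below illustrates this with a hypothetical existing provider that yields chunks of 3 entries. A small wrapper (`to_single_entry`, a name made up here) flattens it into the single-entry interface, after which the equally hypothetical `batches` helper can form mini-batches of any size, including a batch size of 2 that straddles chunk boundaries. This is not Paddle code; it only shows why the single-entry contract keeps the batching logic simple.

```python
import itertools


def chunked_provider():
    # Hypothetical existing provider that yields 3 entries at a time.
    yield [[1, 0], [2, 1], [3, 0]]
    yield [[4, 1], [5, 0], [6, 1]]


def to_single_entry(chunked):
    """Wrap a chunk-yielding provider so it yields one entry at a time."""
    def provider():
        for chunk in chunked():
            yield from chunk
    return provider


def batches(data_provider, batch_size):
    # With single entries, grouping into any batch size is straightforward.
    it = iter(data_provider())
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


single = to_single_entry(chunked_provider)
print([len(b) for b in batches(single, 2)])
```

Because `single` follows the same no-argument interface as any other provider, the trainer never needs to know how the underlying data was chunked.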