In this section, we will introduce the three main components of training a recognizer or localizer:
the data pipeline, the model, and the iteration pipeline.
## Data pipeline
Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers.
In the data loader, a data preparation pipeline is defined to pre-process the data.
At the end of the data preparation pipeline, a dict of data items corresponding
to the arguments of the model's forward method is returned and fed into the model.
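As a minimal sketch of this contract (the key and argument names here are illustrative, not the library's actual API): because the pipeline's output keys match the forward method's argument names, the dict can be unpacked directly into the call.

```python
# Hypothetical sketch: the dict produced by a data preparation pipeline
# is unpacked into a model's forward method. Key names ('imgs', 'label')
# are illustrative only.

def forward(imgs, label):
    """Toy stand-in for a model's forward method."""
    return {"batch": len(imgs), "label": label}

# A dict as it might look at the end of the data preparation pipeline.
data = {"imgs": [[0.1, 0.2], [0.3, 0.4]], "label": 1}

# The dict's keys match forward()'s argument names, so it can be
# unpacked with ** directly.
out = forward(**data)
print(out["batch"])  # 2
```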
> Since the data in action recognition & localization may not be the same size (image size, gt bbox size, etc.), the `DataContainer` type in MMCV is used to help collect and distribute data of different size. See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
The data preparation pipeline and the dataset are decoupled.
Usually, the dataset
defines how to process the annotations, while the data pipeline defines all the steps to prepare a data dict.
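The decoupling can be sketched as follows. This is a simplified toy example under stated assumptions, not the library's actual classes: the dataset only parses annotation lines into records, and the pipeline (a list of callables passed in) does all data preparation.

```python
# Illustrative sketch of the decoupling. Class, function, and key names
# are hypothetical.

class ToyDataset:
    def __init__(self, ann_lines, pipeline):
        # The dataset's job: turn raw annotation lines into records.
        self.records = []
        for line in ann_lines:
            path, label = line.split()
            self.records.append({"filename": path, "label": int(label)})
        self.pipeline = pipeline

    def __getitem__(self, idx):
        # The pipeline's job: prepare the final data dict.
        results = dict(self.records[idx])  # copy so transforms can mutate
        for transform in self.pipeline:
            results = transform(results)
        return results

def load_fake(results):
    # Pretend to decode frames from results["filename"].
    results["imgs"] = [0.0] * 4
    return results

ds = ToyDataset(["a.mp4 3", "b.mp4 7"], pipeline=[load_fake])
print(ds[1])  # {'filename': 'b.mp4', 'label': 7, 'imgs': [0.0, 0.0, 0.0, 0.0]}
```

Because the pipeline is just a list handed to the dataset, the same dataset class can be reused with different preparation steps for training and testing.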
A data preparation pipeline consists of a sequence of operations.
Each operation takes a dict as input and outputs a dict for the next transformation.
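The dict-in / dict-out contract can be sketched with two toy transforms (simplified stand-ins, not the library's actual implementations): one updates an existing key, and the last one keeps only the keys the model needs.

```python
# Minimal sketch of dict-in / dict-out operations chained in sequence.
# The transform names mimic common conventions but are simplified
# stand-ins.

class Resize:
    def __init__(self, scale):
        self.scale = scale

    def __call__(self, results):
        # Update an existing key in the result dict.
        results["img_shape"] = self.scale
        return results

class Collect:
    def __init__(self, keys):
        self.keys = keys

    def __call__(self, results):
        # Keep only the keys needed downstream.
        return {k: results[k] for k in self.keys}

pipeline = [Resize((224, 224)), Collect(["img_shape", "label"])]
results = {"filename": "a.mp4", "img_shape": (340, 256), "label": 5}
for transform in pipeline:
    results = transform(results)
# results == {'img_shape': (224, 224), 'label': 5}
```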
A typical pipeline is shown in the following figure.
As the pipeline proceeds, each operation can add new keys (marked green) to the result dict or update existing keys (marked orange).
![pipeline figure](imgs/data_pipeline.png)
The operations are categorized into three types: data loading, pre-processing, and formatting.
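A pipeline config grouped by these three categories might look like the following sketch. The transform names and arguments follow common OpenMMLab conventions but are shown only for illustration; the exact names and options depend on the codebase version.

```python
# Illustrative pipeline config grouped by operation category.
# Transform names and arguments are assumptions, not a definitive config.
train_pipeline = [
    # --- data loading ---
    dict(type="SampleFrames", clip_len=32, frame_interval=2, num_clips=1),
    dict(type="RawFrameDecode"),
    # --- pre-processing ---
    dict(type="Resize", scale=(-1, 256)),
    dict(type="CenterCrop", crop_size=224),
    dict(type="Flip", flip_ratio=0.5),
    # --- formatting ---
    dict(type="FormatShape", input_format="NCTHW"),
    dict(type="Collect", keys=["imgs", "label"]),
    dict(type="ToTensor", keys=["imgs", "label"]),
]
```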