# DataProvider Tutorial #

DataProvider is responsible for data management in PaddlePaddle and corresponds to the Data Layer.

## Input Data Format ##

PaddlePaddle uses a **Slot** to describe a data layer of a neural network. One slot describes one data layer. Each slot stores a series of samples, and each sample contains a set of features. A slot has three attributes:

+ **Dimension**: the dimension of the features.
+ **SlotType**: PaddlePaddle has 5 slot types. The following table compares the four most commonly used ones.

| SlotType | Feature Description | Vector Description |
|---|---|---|
| DenseSlot | Continuous features | Dense vector |
| SparseNonValueSlot | Discrete features without weights | Sparse vector with all non-zero elements equal to 1 |
| SparseValueSlot | Discrete features with weights | Sparse vector |
| IndexSlot | Mostly the same as SparseNonValueSlot, but specifically for a single label | Sparse vector with only one value in each time step |

The remaining one is **StringSlot**. It stores character strings and can be used for debugging or to describe the data ID for prediction, etc.

+ **SeqType**: a **sequence** is a sample whose features are expanded along the time dimension, and a **sub-sequence** is a continuous ordered subset of a sequence. For example, (a1, a2) and (a3, a4, a5) are two sub-sequences of the sequence (a1, a2, a3, a4, a5). PaddlePaddle has 3 sequence types:
    - **NonSeq**: the input sample is not a sequence
    - **Seq**: the input sample is a sequence without sub-sequences
    - **SubSeq**: the input sample is a sequence with sub-sequences

## Python DataProvider

PyDataProviderWrapper is a Python decorator in PaddlePaddle, used to read a custom Python DataProvider class. It currently supports all SlotTypes and SeqTypes of input data. Users only need to be concerned with how to read samples from a file. To get started, see its [Use Case](python_case.md) and API Reference.
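
As an informal illustration of the slot types and sequence types described under Input Data Format, the sketch below shows how each of the four common slot types could be expanded into a vector, and how a sequence splits into sub-sequences. It is plain Python and does not use PyDataProviderWrapper or any PaddlePaddle API. Every function name, and the choice to mark sub-sequences by their start offsets, is an assumption made for this example only.

```python
# Illustrative only: plain Python, no PaddlePaddle imports.
# All function names below are hypothetical, not part of PaddlePaddle.

def sparse_non_value_to_dense(indices, dimension):
    """Expand active feature ids into a vector whose non-zero elements are all 1."""
    dense = [0.0] * dimension
    for i in indices:
        dense[i] = 1.0
    return dense


def sparse_value_to_dense(pairs, dimension):
    """Expand (feature id, weight) pairs into a dense vector."""
    dense = [0.0] * dimension
    for i, weight in pairs:
        dense[i] = weight
    return dense


def index_to_dense(index, dimension):
    """Expand a single label id into a vector with exactly one non-zero value."""
    return sparse_non_value_to_dense([index], dimension)


def split_into_sub_sequences(sequence, starts):
    """Cut a sequence into contiguous sub-sequences that begin at each start offset."""
    bounds = list(starts) + [len(sequence)]
    return [sequence[begin:end] for begin, end in zip(bounds, bounds[1:])]


dimension = 5

# DenseSlot-style sample: one continuous value per dimension.
dense_sample = [0.1, 0.0, 2.5, 0.0, 1.0]

# SparseNonValueSlot-style sample: only the ids of the active features.
print(sparse_non_value_to_dense([1, 3], dimension))

# SparseValueSlot-style sample: ids together with their weights.
print(sparse_value_to_dense([(1, 0.5), (3, 2.0)], dimension))

# IndexSlot-style sample: a single label id.
print(index_to_dense(2, dimension))

# Seq vs. SubSeq: the same five time steps, flat or grouped into two sub-sequences.
sequence = ["a1", "a2", "a3", "a4", "a5"]
print(split_into_sub_sequences(sequence, [0, 2]))
```

The last call reproduces the example from the text: the sequence (a1, a2, a3, a4, a5) is grouped into the sub-sequences (a1, a2) and (a3, a4, a5).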