DataSources
Data sources are helpers for defining Paddle training and testing data. Paddle uses several data attributes:
- Data ProviderType: such as Python, Protobuf
- Data File list: a single file that contains all data file paths
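As a minimal sketch of the data file list described above, the snippet below writes and reads back a list file. It assumes the list file holds one data file path per line. The helper names `write_file_list` and `read_file_list` and the example paths are hypothetical and are not part of Paddle.

```python
import os
import tempfile


def write_file_list(list_path, data_paths):
    """Write one data file path per line (assumed list-file layout)."""
    with open(list_path, "w") as f:
        for path in data_paths:
            f.write(path + "\n")


def read_file_list(list_path):
    """Read the list file back, skipping blank lines."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


# Hypothetical data file names, used only for illustration.
tmp_dir = tempfile.mkdtemp()
train_list = os.path.join(tmp_dir, "train.list")
write_file_list(train_list, ["data/train-00000.txt", "data/train-00001.txt"])
paths = read_file_list(train_list)
```

The resulting `train.list` path is the kind of value that would be passed as a list name to a data source helper.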
paddle.trainer_config_helpers.data_sources.define_py_data_sources(train_list, test_list, module, obj, args=None, train_async=False, data_cls=<function PyData>)
Define Python train and test data sources in one call. If train and test use the same data provider configuration, module, obj, and args each take a single value; otherwise each takes a list or tuple of values, one for train and one for test. For example:
define_py_data_sources("train.list", "test.list", module="data_provider", obj="process", args={"dictionary": dict_name})
Or:
define_py_data_sources("train.list", "test.list", module="data_provider", obj=["process_train", "process_test"], args=[{"dictionary": dict_train}, {"dictionary": dict_test}])
See the data provider documentation for the related data provider.
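The "single value or list/tuple" convention for module and obj can be sketched as follows. This is an illustration only, not Paddle's actual implementation; `split_train_test` is a hypothetical helper name. It assumes a two-element list or tuple is ordered (train, test), as in the second example above.

```python
def split_train_test(value):
    """Return a (train_value, test_value) pair.

    A list or tuple is treated as (train, test); any other value
    is shared by both train and test.
    """
    if isinstance(value, (list, tuple)):
        if len(value) != 2:
            raise ValueError("expected exactly two items: (train, test)")
        return value[0], value[1]
    return value, value


# Same module for train and test, different objects.
module_train, module_test = split_train_test("data_provider")
obj_train, obj_test = split_train_test(["process_train", "process_test"])
```

This sketch does not cover args, which may itself legitimately be a list or tuple and so needs more careful handling than a simple type check.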
Parameters:
- data_cls –
- train_list (basestring) – Train list name.
- test_list (basestring) – Test list name.
- module (basestring or tuple or list) – Python module name. If train and test differ, pass a tuple or list to this argument.
- obj (basestring or tuple or list) – Python object name. May be a function name if using PyDataProviderWrapper. If train and test differ, pass a tuple or list to this argument.
- args (string or picklable object or list or tuple) – The best practice is to pass arguments to the DataProvider as a dict() and receive them with @init_hook_wrapper. If train and test differ, pass a tuple or list to this argument.
- train_async (bool) – Whether training data is loaded asynchronously.
Returns: None
Return type: None