DataSources

Data Sources are helpers to define Paddle training data or testing data. The following data attributes are used by Paddle:

  • Data ProviderType: such as Python, Protobuf
  • Data File list: a single file that contains all data file paths
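A file list is typically just a plain text file with one data file path per line; for example, a train.list might look like the following (the paths are placeholders for illustration):

./data/train/part-000
./data/train/part-001
./data/train/part-002

The data provider's process function is then called once for each path in the list.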
paddle.trainer_config_helpers.data_sources.define_py_data_sources(train_list, test_list, module, obj, args=None, train_async=False, data_cls=<function PyData>)

Define Python train/test data sources in one method. If train and test use the same data provider configuration, pass a single value for module/obj/args; otherwise pass a list or tuple of two values, train first and test second. For example:

define_py_data_sources("train.list", "test.list", module="data_provider"
                       obj="process", args={"dictionary": dict_name})

Or, when train and test use different data providers:

define_py_data_sources("train.list", "test.list", module="data_provider"
                       obj=["process_train", "process_test"],
                       args=[{"dictionary": dict_train}, {"dictionary": dict_test}])

For how to write the corresponding data provider, refer to the data provider documentation.
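As a rough illustration, the data_provider module referenced in the first example might look like the sketch below. This assumes the PyDataProvider2-style @provider decorator; the on_init hook name, the input types, and the "words<TAB>label" line format are made up for illustration, not part of this API.

# data_provider.py -- a minimal sketch, not a definitive implementation.
from paddle.trainer.PyDataProvider2 import provider, integer_value, \
    integer_value_sequence

def on_init(settings, dictionary, **kwargs):
    # Receives the args dict passed to define_py_data_sources,
    # e.g. {"dictionary": dict_name}.
    settings.dictionary = dictionary
    settings.input_types = [integer_value_sequence(len(dictionary)),
                            integer_value(2)]

@provider(init_hook=on_init)
def process(settings, file_name):
    # Invoked once for every path listed in train.list / test.list.
    with open(file_name) as f:
        for line in f:
            text, label = line.rstrip('\n').split('\t')
            word_ids = [settings.dictionary.get(w, 0) for w in text.split()]
            yield word_ids, int(label)

With such a module on the Python path, obj="process" resolves to the decorated process function, and the "dictionary" entry in args is handed to the init hook.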

Parameters:
  • data_cls
  • train_list (basestring) – Train list name.
  • test_list (basestring) – Test list name.
  • module (basestring or tuple or list) – Python module name. If train and test use different providers, pass a tuple or list of two module names.
  • obj (basestring or tuple or list) – Python object name. May be a function name if using PyDataProviderWrapper. If train and test use different providers, pass a tuple or list of two object names.
  • args (string or picklable object or list or tuple) – The best practice is to pass arguments into the DataProvider as a dict() and receive them with @init_hook_wrapper. If train and test use different providers, pass a tuple or list of two argument objects.
  • train_async (bool) – Whether the training data is loaded asynchronously.
Returns:
  None

Return type:
  None