Created by: yaoxuefeng6
PR types
Others
PR changes
APIs
Describe
1. Hide some setter methods and other methods in the dataset class. These settings are now passed as key-value pairs to `init()` and applied once.
2. Move some methods from the base class to child classes.
3. Modify the related unit tests.
4. Update the example code.
Dataset API Python demo:

```python
import paddle
import paddle.fluid as fluid

paddle.enable_static()

slots = ["slot1", "slot2", "slot3", "slot4"]
slots_vars = []
for slot in slots:
    var = fluid.layers.data(name=slot, shape=[1], dtype="int64", lod_level=1)
    slots_vars.append(var)

# create a dataset instance directly with paddle.distributed.InMemoryDataset
dataset = paddle.distributed.InMemoryDataset()

# call init() once to initialize single-node settings.
dataset.init(
    batch_size=32,
    thread_num=3,
    pipe_command="cat",
    use_var=slots_vars)

# call _init_distributed_settings() to initialize distributed settings.
dataset._init_distributed_settings(
    fea_eval=True,
    candidate_size=10000)

# call update_settings() to update specific settings on the existing instance.
dataset.update_settings(batch_size=2)

dataset.set_filelist(
    ["test_run_with_dump_a.txt", "test_run_with_dump_b.txt"])
dataset.load_into_memory()
dataset.local_shuffle()

place = paddle.CUDAPlace(0) if paddle.fluid.core.is_compiled_with_cuda() \
    else paddle.CPUPlace()
exe = paddle.static.Executor(place)
startup_program = paddle.static.Program()
main_program = paddle.static.Program()
exe.run(startup_program)
exe.train_from_dataset(main_program, dataset)
```
Updated according to review comments:
1. Only expose InMemoryDataset and QueueDataset in paddle.distributed; they can be created directly without using a factory.
2. Add a `_init_distributed_settings()` method to set advanced distributed settings.
3. Add an `update_settings()` method to update some settings on an existing dataset instance.
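The configure-once-then-update pattern described above can be sketched in plain Python. This is a minimal, hypothetical illustration of the idea (the class and setting names here are invented for the sketch and are not part of the Paddle codebase):

```python
class DatasetSketch:
    """Hypothetical sketch of the init()/update_settings() pattern:
    all settings are passed once as keyword arguments at initialization,
    then selectively updated later, instead of calling one setter
    method per option."""

    def __init__(self):
        # defaults mirroring the options used in the demo above
        self._settings = {
            "batch_size": 1,
            "thread_num": 1,
            "pipe_command": "cat",
        }

    def init(self, **kwargs):
        # apply every supported option once, at initialization time
        for key, value in kwargs.items():
            if key not in self._settings:
                raise KeyError("unsupported setting: %s" % key)
            self._settings[key] = value

    def update_settings(self, **kwargs):
        # update only the options that changed on an existing instance;
        # unspecified options keep their current values
        self.init(**kwargs)


dataset = DatasetSketch()
dataset.init(batch_size=32, thread_num=3, pipe_command="cat")
dataset.update_settings(batch_size=2)
```

After `update_settings(batch_size=2)`, only `batch_size` changes; `thread_num` stays at the value set in `init()`, which is the behavior the PR's `update_settings()` API is meant to provide.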