Add a function to split datasets into multiple files for cluster training
Created by: typhoonzero
We need to add a function that splits each downloaded dataset into multiple raw files, and a function that reads only a subset of those files for each trainer.
Define a split function like:
def split(count, suffix="%05d"):
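A rough sketch of what such a function could look like, assuming it consumes a Paddle-style reader and pickles count records per output part; the reader, filename, and dumper parameters here are illustrative assumptions, not a final API:

import pickle

def split(reader, count, filename="train-part", suffix="-%05d", dumper=pickle.dump):
    """Split the records produced by reader() into multiple raw files,
    count records per file, named filename + (suffix % part_index)."""
    buf = []
    part = 0
    for record in reader():
        buf.append(record)
        if len(buf) == count:
            with open(filename + suffix % part, "wb") as f:
                dumper(buf, f)
            buf = []
            part += 1
    # write the remaining records, if any
    if buf:
        with open(filename + suffix % part, "wb") as f:
            dumper(buf, f)

Each dataset module could call such a split once after download to produce the part files that the cluster reader below picks up.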
Define cluster file readers like:

import glob

def cluster_files_reader(train_file_pattern, trainer_count, trainer_id):
    train_file_list = glob.glob(train_file_pattern)
    train_file_list.sort()
    my_train_list = []
    # each trainer keeps only the files whose index maps to its trainer_id
    for idx, f in enumerate(train_file_list):
        if idx % trainer_count == trainer_id:
            my_train_list.append(f)
    for f in my_train_list:
        # do reader stuff, e.g. open f and yield its records
        ...
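For illustration, each trainer process could then build its own shard of the data as below; the environment variable names, the file pattern, and the assumption that the final loop yields samples are hypothetical, since the cluster launcher ultimately decides how trainer_id and trainer_count are passed in:

import os

# Hypothetical: assume the cluster launcher exports these variables.
trainer_id = int(os.getenv("TRAINER_ID", "0"))
trainer_count = int(os.getenv("TRAINER_COUNT", "1"))

# Assuming cluster_files_reader yields samples from its files,
# each trainer iterates only over its own subset of the part files.
for sample in cluster_files_reader("./train_data/train-part-*", trainer_count, trainer_id):
    pass  # feed sample into the local trainer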