Commit a3f75315, author: ms_yan

repair format and type problem in split

Parent 32a72c19
@@ -613,7 +613,7 @@ class Dataset:
 # if we still need more rows, give them to the first split.
 # if we have too many rows, remove the extras from the first split that has
 # enough rows.
-size_difference = dataset_size - absolute_sizes_sum
+size_difference = int(dataset_size - absolute_sizes_sum)
 if size_difference > 0:
     absolute_sizes[0] += size_difference
 else:
@@ -647,10 +647,14 @@ class Dataset:
 Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the
 original dataset.
 If after rounding:
-    -Any size equals 0, an error will occur.
-    -The sum of split sizes < K, the difference will be added to the first split.
-    -The sum of split sizes > K, the difference will be removed from the first large
-      enough split such that it will have atleast 1 row after removing the difference.
+
+    - Any size equals 0, an error will occur.
+
+    - The sum of split sizes < K, the difference will be added to the first split.
+
+    - The sum of split sizes > K, the difference will be removed from the first large
+      enough split such that it will have atleast 1 row after removing the difference.
+
 randomize (bool, optional): determines whether or not to split the data randomly (default=True).
     If true, the data will be randomly split. Otherwise, each split will be created with
     consecutive rows from the dataset.
@@ -1282,10 +1286,14 @@ class MappableDataset(SourceDataset):
 Datasets of size round(f1*K), round(f2*K), …, round(fn*K) where K is the size of the
 original dataset.
 If after rounding:
-    -Any size equals 0, an error will occur.
-    -The sum of split sizes < K, the difference will be added to the first split.
-    -The sum of split sizes > K, the difference will be removed from the first large
-      enough split such that it will have atleast 1 row after removing the difference.
+
+    - Any size equals 0, an error will occur.
+
+    - The sum of split sizes < K, the difference will be added to the first split.
+
+    - The sum of split sizes > K, the difference will be removed from the first large
+      enough split such that it will have atleast 1 row after removing the difference.
+
 randomize (bool, optional): determines whether or not to split the data randomly (default=True).
     If true, the data will be randomly split. Otherwise, each split will be created with
     consecutive rows from the dataset.
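
For reviewers, the rounding rule described in the docstring, together with the `int(...)` cast from the first hunk, can be sketched roughly as follows. This is an illustration only, not the code in this commit: the function name `absolute_split_sizes`, the half-up rounding, and the `RuntimeError` messages are assumptions. Only `size_difference`, `absolute_sizes` and `dataset_size` come from the diff.

```python
def absolute_split_sizes(fractions, dataset_size):
    """Turn fractional split sizes into row counts that sum to dataset_size.

    Hypothetical helper for illustration; not the library's own function.
    """
    # Round each fraction of the dataset to a whole number of rows.
    # Half-up rounding is an assumption; the docstring only says "round".
    absolute_sizes = [int(f * dataset_size + 0.5) for f in fractions]

    # A split that rounds to zero rows is an error.
    if any(size == 0 for size in absolute_sizes):
        raise RuntimeError("a split size rounded to 0 rows")

    # Cast to int, as in the commit, so the correction is a whole row count.
    size_difference = int(dataset_size - sum(absolute_sizes))
    if size_difference > 0:
        # Too few rows assigned: give the remainder to the first split.
        absolute_sizes[0] += size_difference
    elif size_difference < 0:
        # Too many rows assigned: take the extras from the first split that
        # still keeps at least one row afterwards.
        extra = -size_difference
        for i, size in enumerate(absolute_sizes):
            if size - extra >= 1:
                absolute_sizes[i] -= extra
                break
        else:
            raise RuntimeError("no split is large enough to absorb the extra rows")
    return absolute_sizes
```

With three fractions of 0.33 over 10 rows, each split rounds to 3 and the leftover row goes to the first split. With two fractions of 0.5 over 5 rows, both round up to 3 and the first split gives one row back.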
......