Unverified commit 2b81d13c, authored by 1want2sleep, committed by GitHub

Changed the reference format (#47963)

* fix some docs bugs; test=document_fix

* Update batch_sampler.py

* Update dataset.py

* Update dataset.py

* Update sampler.py

* for codestyle; test=document_fix

* fix copy-from issue; test=document_fix
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Ligoml <limengliu@tiaozhan.com>
Parent 519e7426
@@ -37,20 +37,20 @@ class BatchSampler(Sampler):
 Args:
-dataset(Dataset): this could be a :code:`paddle.io.Dataset`
-implement or other python object which implemented
+dataset(Dataset, optional): this should be an instance of a subclass of :ref:`api_paddle_io_Dataset` or
+:ref:`api_paddle_io_IterableDataset` or other python object which implemented
 :code:`__len__` for BatchSampler to get indices as the
-range of :attr:`dataset` length. Default None.
-sampler (Sampler): this could be a :code:`paddle.io.Dataset`
-instance which implemented :code:`__iter__` to yield
+range of :attr:`dataset` length. Default None, disabled.
+sampler (Sampler, optional): this should be a :ref:`api_paddle_io_Sample`
+instance which implemented :code:`__iter__` to generate
 sample indices. :attr:`sampler` and :attr:`dataset`
 can not be set in the same time. If :attr:`sampler`
-is set, :attr:`shuffle` should not be set. Default None.
-shuffle(bool): whether to shuffle indices order before genrating
-batch indices. Default False.
-batch_size(int): sample indice number in a mini-batch indices.
-drop_last(bool): whether drop the last incomplete batch dataset size
-is not divisible by the batch size. Default False
+is set, :attr:`dataset` should not be set. Default None, disabled.
+shuffle(bool, optional): whether to shuffle indices order before generating
+batch indices. Default False, don't shuffle indices before generating batch indices.
+batch_size(int, optional): sample indice number in a mini-batch indices. default 1, each mini-batch includes 1 sample.
+drop_last(bool, optional): whether drop the last incomplete (less than 1 mini-batch) batch dataset. Default False, keep it.
+see :ref:`api_paddle_io_DataLoader`
 Returns:
 BatchSampler: an iterable object for indices iterating
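Note on the `BatchSampler` arguments documented in this hunk: the interaction of `batch_size`, `shuffle` and `drop_last` can be illustrated with a plain-Python sketch. This is not Paddle's implementation; the function name `batch_indices` and the `seed` parameter are hypothetical and exist only for this illustration.

```python
import random
from typing import Iterator, List, Optional


def batch_indices(
    num_samples: int,
    batch_size: int = 1,
    shuffle: bool = False,
    drop_last: bool = False,
    seed: Optional[int] = None,  # hypothetical, only to make the sketch repeatable
) -> Iterator[List[int]]:
    """Yield one list of sample indices per mini-batch."""
    indices = list(range(num_samples))
    if shuffle:
        # Shuffle the index order before grouping indices into batches.
        random.Random(seed).shuffle(indices)
    batch: List[int] = []
    for idx in indices:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # The trailing batch is incomplete; keep it unless drop_last is set.
    if batch and not drop_last:
        yield batch


print(list(batch_indices(10, batch_size=4)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(list(batch_indices(10, batch_size=4, drop_last=True)))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With 10 samples and a batch size of 4, the last batch holds only 2 indices, which is the case `drop_last` controls.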
@@ -92,7 +92,6 @@ class BatchSampler(Sampler):
 print(batch_indices)
-see `paddle.io.DataLoader`
 """
@@ -183,22 +182,24 @@ class DistributedBatchSampler(BatchSampler):
 Dataset is assumed to be of constant size.
 Args:
-dataset(paddle.io.Dataset): this could be a `paddle.io.Dataset` implement
+dataset(Dataset): this could be an instance of subclass of :ref:`api_paddle_io_Dataset`
 or other python object which implemented
-`__len__` for BatchSampler to get sample
-number of data source.
-batch_size(int): sample indice number in a mini-batch indices.
+`__len__` for BatchSampler to get indices of samples.
+batch_size(int): sample size of each mini-batch.
 num_replicas(int, optional): porcess number in distributed training.
 If :attr:`num_replicas` is None, :attr:`num_replicas` will be
-retrieved from :code:`paddle.distributed.ParallenEnv`.
+retrieved from :ref:`api_paddle_distributed_ParallelEnv` .
 Default None.
 rank(int, optional): the rank of the current process among :attr:`num_replicas`
 processes. If :attr:`rank` is None, :attr:`rank` is retrieved from
-:code:`paddle.distributed.ParallenEnv`. Default None.
-shuffle(bool): whther to shuffle indices order before genrating
+:ref:`api_paddle_distributed_ParallelEnv`. Default None.
+shuffle(bool, optional): whther to shuffle indices order before genrating
 batch indices. Default False.
-drop_last(bool): whether drop the last incomplete batch dataset size
-is not divisible by the batch size. Default False
+drop_last(bool, optional): whether drop the last incomplete(less than a mini-batch) batch dataset size.
+Default False.
+Returns:
+DistributedBatchSampler, return an iterable object for indices iterating.
 Examples:
 .. code-block:: python
......
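Note on `num_replicas` and `rank`: the idea of giving each process its own subset of sample indices can be sketched in plain Python as below. This is a simplified round-robin split under assumed rules (wrap-around padding so every rank gets the same count); the way Paddle actually assigns indices to ranks may differ, and `shard_indices` is a hypothetical name.

```python
import math
from typing import List


def shard_indices(num_samples: int, num_replicas: int, rank: int) -> List[int]:
    """Return the sample indices one process would read (simplified sketch)."""
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    # Every rank receives the same number of indices.
    per_replica = math.ceil(num_samples / num_replicas)
    total = per_replica * num_replicas
    # Assumption: pad by wrapping around to the first indices.
    padded = [i % num_samples for i in range(total)]
    # Round-robin assignment: rank r takes positions r, r + num_replicas, ...
    return padded[rank:total:num_replicas]


print(shard_indices(10, num_replicas=4, rank=0))  # [0, 4, 8]
print(shard_indices(10, num_replicas=4, rank=2))  # [2, 6, 0]
```

Each rank would then group its own indices into mini-batches of `batch_size`, so no two processes read the same unpadded sample in one pass.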
@@ -89,19 +89,20 @@ class IterableDataset(Dataset):
 An abstract class to encapsulate methods and behaviors of iterable datasets.
 All datasets in iterable-style (can only get sample one by one sequentially, like
-a Python iterator) should be a subclass of `paddle.io.IterableDataset`. All subclasses should
+a Python iterator) should be a subclass of :ref:`api_paddle_io_IterableDataset` . All subclasses should
 implement following methods:
-:code:`__iter__`: yield sample sequentially. This method is required by reading dataset sample in :code:`paddle.io.DataLoader`.
+:code:`__iter__`: yield sample sequentially. This method is required by reading dataset sample in :ref:`api_paddle_io_DataLoader` .
 .. note::
 do not implement :code:`__getitem__` and :code:`__len__` in IterableDataset, should not be called either.
-see :code:`paddle.io.DataLoader`.
+see :ref:`api_paddle_io_DataLoader` .
 Examples:
 .. code-block:: python
+:name: code-example1
 import numpy as np
 from paddle.io import IterableDataset
@@ -128,9 +129,10 @@ class IterableDataset(Dataset):
 among workers as follows. In both the methods, worker information that can be getted in
 a worker process by `paddle.io.get_worker_info` will be needed.
-Example 1: splitting data copy in each worker in :code:`__iter__`
+splitting data copy in each worker in :code:`__iter__`
 .. code-block:: python
+:name: code-example2
 import math
 import paddle
@@ -169,9 +171,10 @@ class IterableDataset(Dataset):
 print(data)
 # outputs: [2, 5, 3, 6, 4, 7]
-Example 2: splitting data copy in each worker by :code:`worker_init_fn`
+splitting data copy in each worker by :code:`worker_init_fn`
 .. code-block:: python
+:name: code-example3
 import math
 import paddle
@@ -370,16 +373,16 @@ class ComposeDataset(Dataset):
 class ChainDataset(IterableDataset):
 """
-A Dataset which chains multiple iterable-tyle datasets.
+A Dataset which chains multiple iterable-style datasets.
 This dataset is used for assembling multiple datasets which should
-be :code:`paddle.io.IterableDataset`.
+be :ref:`api_paddle_io_IterableDataset`.
 Args:
-datasets(list of Dataset): List of datasets to be chainned.
+datasets(list of IterableDatasets): List of datasets to be chainned.
 Returns:
-Dataset: A Dataset which chains fields of multiple datasets.
+paddle.io.IterableDataset: A Dataset which chains fields of multiple datasets.
 Examples:
......
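Note on `ChainDataset`: chaining iterable-style datasets means iterating the first dataset to exhaustion, then the second, and so on. A minimal plain-Python sketch of that behavior, using a hypothetical class `ChainedIterable` rather than Paddle's class:

```python
from itertools import chain
from typing import Any, Iterable, Iterator, Sequence


class ChainedIterable:
    """Iterate several iterable datasets back to back (illustrative only)."""

    def __init__(self, datasets: Sequence[Iterable[Any]]) -> None:
        self.datasets = list(datasets)

    def __iter__(self) -> Iterator[Any]:
        # Samples of the first dataset come first, then the second, etc.
        return chain.from_iterable(self.datasets)


chained = ChainedIterable([range(3), [10, 11]])
print(list(chained))  # [0, 1, 2, 10, 11]
```

Only `__iter__` is defined, which matches the docstring's guidance that iterable-style datasets should not implement `__getitem__` or `__len__`.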
@@ -151,16 +151,16 @@ class RandomSampler(Sampler):
 Args:
 data_source(Dataset): dataset to sample, this could be an
-instance of :code:`paddle.io.Dataset` other Python
-object which implemented :code:`__len__`.
-replacement(bool): If False, sample the whole dataset, If False,
-set :attr:`num_samples` for how many sample to draw. Default False.
-num_samples(int): set sample number to draw if :attr:`replacement`
-is True. Default None.
-generator(Generator): specify a generator to sample the data source. Default None
+instance of :ref:`api_paddle_io_Dataset` or :ref:`api_paddle_io_IterableDataset` or other Python
+object which implemented :code:`__len__` to get indices as the range of :code:`dataset` length. Default None.
+replacement(bool, optional): If False, sample the whole dataset, If True,
+set :attr:`num_samples` for how many samples to draw. Default False.
+num_samples(int, optional): set sample number to draw if :attr:`replacement`
+is True, then it will take samples according to the number you set. Default None, disabled.
+generator(Generator, optional): specify a generator to sample the :code:`data_source`. Default None, disabled.
 Returns:
-Sampler: a Sampler yield sample index randomly
+RandomSampler: a Sampler yield sample index randomly.
 Examples:
@@ -185,7 +185,6 @@ class RandomSampler(Sampler):
 for index in sampler:
 print(index)
-see `paddle.io.Sampler`
 """
 def __init__(
......
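Note on `replacement` and `num_samples`: the corrected wording ("If False, sample the whole dataset, If True, set `num_samples`") describes two sampling modes. The sketch below illustrates them in plain Python. It is not Paddle's `RandomSampler`; the function name `random_indices` and the `seed` parameter are hypothetical, and defaulting `num_samples` to the dataset length is an assumption of this sketch.

```python
import random
from typing import Iterator, Optional


def random_indices(
    n: int,
    replacement: bool = False,
    num_samples: Optional[int] = None,
    seed: Optional[int] = None,  # hypothetical, only to make the sketch repeatable
) -> Iterator[int]:
    """Yield sample indices in random order."""
    rng = random.Random(seed)
    if replacement:
        # With replacement: draw num_samples indices, repeats are possible.
        count = n if num_samples is None else num_samples
        for _ in range(count):
            yield rng.randrange(n)
    else:
        # Without replacement: a random permutation of the whole dataset.
        order = list(range(n))
        rng.shuffle(order)
        yield from order


print(sorted(random_indices(5, seed=0)))  # [0, 1, 2, 3, 4]
print(len(list(random_indices(5, replacement=True, num_samples=8, seed=0))))  # 8
```

Without replacement every index appears exactly once, so `num_samples` only matters when `replacement` is True.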