Create all_reduce_sum operator for collective communication.
This operator sums the tensor data elementwise across the specified group and returns a tensor with the same shape as the input tensor.
Args:
inp: Input tensor.
group: The process group to work on.
The default group is WORLD, which means all available processes.
You can pass a list of process ranks to create a new group to work on, e.g. [1, 3, 5] (a subgroup sketch follows the example below).
device: The specific device to execute this operator on.
The default device is None, which means the device of inp will be used.
Specify "gpu0:1" to execute this operator on a different cuda stream,
where 1 is the stream id and the default stream id is 0.
Returns:
Result tensor: the sum of the input tensors across the specified group, with the same shape as inp.
Examples:
.. code-block::
import megengine as mge
import megengine.distributed as dist
import numpy as np
from warnings import warn

def func(sum_value):
    # get the rank of this process; the ranks should be 0, 1, 2, 3 for a 4 gpu task
    rank = dist.get_rank()
    data = mge.tensor(rank)
    # the result should be n * (n - 1) / 2 for all processes
    result = mge.functional.distributed.all_reduce_sum(data).item()
    assert result == sum_value
def main():
    p_num = mge.device.get_device_count("gpu")
    if p_num < 2:
        warn('This opr only works on group with more than one gpu')
        return
    # launch one process per gpu; each process runs func and checks the reduced sum
    method = dist.launcher(func)
    method(p_num * (p_num - 1) // 2)

if __name__ == '__main__':
    main()
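The sketch below is a minimal, illustrative variant (not part of the reference example above) showing the group argument restricted to a subset of ranks. It assumes a launcher-started job with at least two gpus and that, per the group description above, only the listed ranks invoke the call; the helper names subgroup_func and subgroup_main are hypothetical.
.. code-block::
import megengine as mge
import megengine.distributed as dist
from warnings import warn

def subgroup_func():
    rank = dist.get_rank()
    data = mge.tensor(rank)
    # only ranks 0 and 1 take part in this reduction (hypothetical subgroup)
    if rank in (0, 1):
        result = mge.functional.distributed.all_reduce_sum(data, group=[0, 1]).item()
        # both participating ranks receive 0 + 1 = 1
        assert result == 1

def subgroup_main():
    if mge.device.get_device_count("gpu") < 2:
        warn('This opr only works on group with more than one gpu')
        return
    dist.launcher(subgroup_func)()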
Create all_reduce_min operator for collective communication.
This operator computes the elementwise minimum of the tensor data across the specified group and returns a tensor with the same shape as the input tensor (a vector-input sketch follows the reference example below).
Args:
inp: Input tensor.
group: The process group to work on.
The default group is WORLD, which means all available processes.
You can pass a list of process ranks to create a new group to work on, e.g. [1, 3, 5].
device: The specific device to execute this operator on.
The default device is None, which means the device of inp will be used.
Specify "gpu0:1" to execute this operator on a different cuda stream,
where 1 is the stream id and the default stream id is 0.
Returns:
Result tensor: the elementwise minimum of the input tensors across the specified group, with the same shape as inp.
Examples:
.. code-block::
import megengine as mge
import megengine.distributed as dist
import numpy as np
from warnings import warn

def func(min_value):
    # get the rank of this process; the ranks should be 0, 1, 2, 3 for a 4 gpu task
    rank = dist.get_rank()
    data = mge.Tensor(rank)
    # the result should be 0 for all processes
    result = mge.functional.distributed.all_reduce_min(data).item()
    assert result == min_value
def main():
    p_num = mge.device.get_device_count("gpu")
    if p_num < 2:
        warn('This opr only works on group with more than one gpu')
        return
    # launch one process per gpu; the minimum across ranks 0..n-1 is 0
    method = dist.launcher(func)
    method(0)

if __name__ == '__main__':
    main()
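Because the reduction is elementwise, a vector input yields a vector of per-coordinate minima with the same shape as the input. The sketch below is a minimal illustration of that behaviour, assuming a launcher-started job with at least two gpus; the helper names vector_func and vector_main are hypothetical.
.. code-block::
import numpy as np
import megengine as mge
import megengine.distributed as dist
from warnings import warn

def vector_func(world_size):
    rank = dist.get_rank()
    # each rank contributes a 2-element vector: [rank, world_size - rank]
    data = mge.Tensor(np.array([rank, world_size - rank], dtype="int32"))
    out = mge.functional.distributed.all_reduce_min(data)
    # per-coordinate minima over ranks 0..world_size-1 are [0, 1]
    np.testing.assert_array_equal(out.numpy(), np.array([0, 1], dtype="int32"))

def vector_main():
    p_num = mge.device.get_device_count("gpu")
    if p_num < 2:
        warn('This opr only works on group with more than one gpu')
        return
    dist.launcher(vector_func)(p_num)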