@@ -210,10 +210,9 @@ The `Momentum` optimizer is used as the parameter update tool. The definition is
...
@@ -210,10 +210,9 @@ The `Momentum` optimizer is used as the parameter update tool. The definition is
## Training the Network
## Training the Network
`context.set_auto_parallel_context` is an API for users to set parallel training parameters and must be called before the initialization of `Model`. If no parameters are specified, MindSpore will automatically set parameters to the empirical values based on the parallel mode. For example, in data parallel mode, `parameter_broadcast` is enabled by default. The related parameters are as follows:
`context.set_auto_parallel_context` is an API for users to set parallel training parameters and must be called before the initialization of networks. The related parameters are as follows:
-`parallel_mode`: parallel distributed mode. The default value is `ParallelMode.STAND_ALONE`. The options are `ParallelMode.DATA_PARALLEL` and `ParallelMode.AUTO_PARALLEL`.
-`parallel_mode`: parallel distributed mode. The default value is `ParallelMode.STAND_ALONE`. The options are `ParallelMode.DATA_PARALLEL` and `ParallelMode.AUTO_PARALLEL`.
-`parameter_broadcast`: whether to broadcast initialized parameters. The default value is `True` in `DATA_PARALLEL` and `HYBRID_PARALLEL` mode.
-`mirror_mean`: During backward computation, the framework collects gradients of parameters in data parallel mode across multiple hosts, obtains the global gradient value, and transfers the global gradient value to the optimizer for update. The default value is `False`, which indicates that the `allreduce_sum` operation is applied. The value `True` indicates that the `allreduce_mean` operation is applied.
-`mirror_mean`: During backward computation, the framework collects gradients of parameters in data parallel mode across multiple hosts, obtains the global gradient value, and transfers the global gradient value to the optimizer for update. The default value is `False`, which indicates that the `allreduce_sum` operation is applied. The value `True` indicates that the `allreduce_mean` operation is applied.
-`enable_parallel_optimizer`: a developing feature. Whether to use optimizer model parallel, which improves performance by distributing the parameters to be updated to each worker, and applying Broadcast among workers to share updated parameters. This feature can be used only in data parallel mode and when the number of parameters is larger than the number of devices.
-`enable_parallel_optimizer`: a developing feature. Whether to use optimizer model parallel, which improves performance by distributing the parameters to be updated to each worker, and applying Broadcast among workers to share updated parameters. This feature can be used only in data parallel mode and when the number of parameters is larger than the number of devices.