Unverified commit 11a426ac authored by Shaden Smith, committed by GitHub

Pointing docs to hosted HTML files for core API. (#41)

Parent 246a2844
@@ -215,7 +215,7 @@ pre-defined learning rate schedule:
* **Gradient Averaging**: in distributed data parallel training, `backward`
ensures that gradients are averaged across data parallel processes after
-training on an `effective_batch_size`.
+training on a `train_batch_size`.
* **Loss Scaling**: in FP16/mixed precision training, the DeepSpeed
engine automatically handles scaling the loss to avoid precision loss in the
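To make the engine contract above concrete, here is a minimal training-loop sketch; `model`, `trainset`, `criterion`, and `args` are placeholder names assumed for illustration, not taken from this change:

```python
import deepspeed

# initialize() wraps the model in an engine; engine.backward() averages
# gradients across data-parallel ranks over the configured train_batch_size
# and applies loss scaling automatically when fp16 is enabled.
model_engine, optimizer, trainloader, _ = deepspeed.initialize(
    args=args,                      # args.deepspeed_config names the JSON file
    model=model,
    model_parameters=model.parameters(),
    training_data=trainset)

for inputs, labels in trainloader:
    outputs = model_engine(inputs)    # forward on this rank's micro-batch
    loss = criterion(outputs, labels)
    model_engine.backward(loss)       # all-reduce and average gradients
    model_engine.step()               # optimizer step and LR schedule step
```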
@@ -274,7 +274,7 @@ the `step` value is stored as part of the `client_sd`.
DeepSpeed features can be enabled, disabled, or configured using a config JSON
file that should be specified as `args.deepspeed_config`. A sample config file
is shown below. For a full set of features see [core API
-doc](../../API/core_api/core_api.md).
+doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html).
```json
{
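A small but complete config of this shape might look like the sketch below (each key is a documented DeepSpeed config field; the values are arbitrary examples), written out from Python for convenience:

```python
import json

# A minimal deepspeed_config sketch; args.deepspeed_config would then
# point at the resulting file.
ds_config = {
    "train_batch_size": 256,
    "gradient_accumulation_steps": 1,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 0.0001}
    },
    "fp16": {"enabled": True}
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```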
@@ -363,11 +363,12 @@ deepspeed --include="worker-2:0,1" \
## Further Reading
-| Article | Description |
-| ---------------------------------------------------------------- | -------------------------------------------- |
-| [DeepSpeed Features](./docs/features.md) | DeepSpeed features |
-| [CIFAR-10 Tutorial](./docs/tutorials/CIFAR-10.md) | Getting started with CIFAR-10 and DeepSpeed |
-| [Megatron-LM Tutorial](./docs/tutorials/MegatronGPT2Tutorial.md) | Train GPT2 with DeepSpeed and Megatron-LM |
+| Article | Description |
+| ---------------------------------------------------------------------------------------------- | -------------------------------------------- |
+| [DeepSpeed Features](./docs/features.md) | DeepSpeed features |
+| [CIFAR-10 Tutorial](./docs/tutorials/CIFAR-10.md) | Getting started with CIFAR-10 and DeepSpeed |
+| [Megatron-LM Tutorial](./docs/tutorials/MegatronGPT2Tutorial.md) | Train GPT2 with DeepSpeed and Megatron-LM |
+| [API Documentation](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) | Generated DeepSpeed API documentation |
@@ -124,19 +124,19 @@ The DeepSpeed core API consists of just a handful of methods:
* checkpointing: `load_checkpoint` and `store_checkpoint`
DeepSpeed supports all the features described in this document via the use of these APIs,
-along with a `deepspeed_config` JSON file for enabling and disabling the features. Please
-see [core API doc](../../API/core_api/core_api.md) for more details.
+along with a `deepspeed_config` JSON file for enabling and disabling the features.
+Please see the [core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.
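As a sketch of how `args.deepspeed_config` typically reaches the engine (`deepspeed.add_config_arguments` and `deepspeed.initialize` are the public entry points; the model is a placeholder):

```python
import argparse
import deepspeed
import torch

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=-1,
                    help='set automatically by the deepspeed launcher')
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed_config etc.
args = parser.parse_args()

model = torch.nn.Linear(784, 10)  # placeholder model for illustration

# The engine reads feature flags from the JSON file named by
# args.deepspeed_config and enables or disables each feature accordingly.
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=model, model_parameters=model.parameters())
```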
### Gradient Clipping
DeepSpeed handles gradient clipping under the hood based on the max gradient norm
-specified by the user. See [core API doc](../../API/core_api/core_api.md) for more
-details.
+specified by the user.
+Please see the [core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.
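For example, clipping to a max gradient norm of 1.0 takes a single entry in the config, sketched here as the dict the JSON file would contain (the value is an example):

```python
# Fragment of a deepspeed_config file: clip gradients whose global norm
# exceeds 1.0 before each optimizer step.
ds_config = {
    "gradient_clipping": 1.0
}
```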
### Automatic loss scaling with mixed precision
DeepSpeed internally handles loss scaling for mixed precision training. The parameters
-for loss scaling can be specified in the `deepspeed_config` JSON file. See [core API
-doc](../../API/core_api/core_api.md) for more details.
+for loss scaling can be specified in the `deepspeed_config` JSON file.
+Please see the [core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.
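For illustration, the `fp16` block of the config controls loss scaling, where a `loss_scale` of 0 requests dynamic scaling (field names follow DeepSpeed's config schema; values are examples):

```python
# Fragment of a deepspeed_config file: fp16 with dynamic loss scaling
# (loss_scale == 0) and a 1000-step window before raising the scale.
ds_config = {
    "fp16": {
        "enabled": True,
        "loss_scale": 0,          # 0 selects dynamic loss scaling
        "loss_scale_window": 1000,
        "min_loss_scale": 1
    }
}
```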
## Training Optimizers
@@ -169,12 +169,12 @@ more details see [ZeRO paper](https://arxiv.org/abs/1910.02054).
## Training Agnostic Checkpointing
-**TODO: API documentation**
DeepSpeed can simplify checkpointing for you regardless of whether you are using data
parallel training, model parallel training, mixed-precision training, a mix of these
-three, or using the zero optimizer to enable larger model sizes. See the [getting
-started](../../Onboard/onboard/onboard.md) or [core API
-doc](../../API/core_api/core_api.md) for details.
+three, or using the ZeRO optimizer to enable larger model sizes.
+Please see the [Getting Started](../README.md#getting-started) guide
+and the
+[core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.
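A checkpointing sketch under these assumptions (`save_checkpoint` and `load_checkpoint` as exposed on the DeepSpeed engine; `model_engine` and `step` are placeholders from a training loop):

```python
# Save: DeepSpeed persists model, optimizer, and fp16 state for any parallel
# configuration; client_state carries user values such as the step counter.
client_sd = {'step': step}
model_engine.save_checkpoint('checkpoints', tag=f'step_{step}',
                             client_state=client_sd)

# Load: returns the checkpoint path and the saved client state, so training
# can resume from client_sd['step'].
load_path, client_sd = model_engine.load_checkpoint('checkpoints',
                                                    tag=f'step_{step}')
```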
## Advanced parameter search
DeepSpeed supports multiple Learning Rate Schedules to enable faster convergence for
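For instance, a warmup schedule can be selected entirely from the config (the `scheduler` key and the `WarmupLR` type exist in DeepSpeed; the parameter values are examples):

```python
# Fragment of a deepspeed_config file: warm the learning rate from
# warmup_min_lr to warmup_max_lr over warmup_num_steps; the engine then
# advances the schedule inside model_engine.step().
ds_config = {
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 0.001,
            "warmup_num_steps": 1000
        }
    }
}
```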
@@ -195,9 +195,10 @@ can automatically handle batch creation appropriately.
## Performance Analysis and Debugging
For performance debugging, DeepSpeed can give you a detailed breakdown of the time spent
in different parts of the training by simply enabling it in the `deepspeed_config`
-file. See [core API doc](../../API/core_api/core_api.md).
+file.
+Please see the [core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.
```json
{
"wallclock_breakdwon": true
"wallclock_breakdown": true
}
```