Unverified · Commit 16699d83 authored by Stas Bekman, committed by GitHub

[ds-inference] checkpoint loading => tqdm (#2107)

* [ds-inference] checkpoint loading => tqdm

solves 2 issues:
- less noise: a tqdm progress bar instead of a print per shard
- more informative: tells users how long to wait and how many shards are being loaded

New way:

```
Loading 72 checkpoints:  12%|█▎        | 9/72 [01:12<08:39,  8.25s/it]
```

* write only from one process

* style
Parent aa88137b
```diff
 import copy
 import torch
+import tqdm
 import deepspeed
 import deepspeed.ops.transformer as transformer_inference
 from .replace_policy import HFBertLayerPolicy, HFGPT2LayerPolicy, HFGPTJLayerPolicy, BLOOMLayerPolicy
@@ -765,9 +766,11 @@ def replace_transformer_layer(orig_layer_impl,
                               _replace_policy=policy)
 
     if checkpoint is not None:
+        pbar = tqdm.tqdm(total=len(checkpoint),
+                         desc=f"Loading {len(checkpoint)} checkpoint shards")
         for i in range(len(checkpoint)):
             if not deepspeed.comm.is_initialized() or deepspeed.comm.get_rank() == 0:
-                print(f"loading checkpoint ({i})")
+                pbar.update(1)
             sd = torch.load(checkpoint[i], map_location='cpu')
             load_model_with_checkpoint(replaced_module, sd, mp_replace)
     return replaced_module
```
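For context, here is a minimal, standalone sketch of the same pattern outside the DeepSpeed code base. It assumes a plain PyTorch model and a list of shard paths; `load_shard_into_model` is a hypothetical stand-in for `load_model_with_checkpoint`, and the sketch uses tqdm's `disable` flag rather than guarding `pbar.update(1)` as the diff does, which has the same effect of writing the bar from only one process.

```python
# Standalone sketch: load checkpoint shards with a tqdm bar that is only
# rendered by the main process, so multi-GPU runs do not print interleaved bars.
# `load_shard_into_model` is a hypothetical placeholder for the real per-shard
# loading logic (e.g. DeepSpeed's load_model_with_checkpoint).
import torch
import torch.distributed as dist
import tqdm


def is_main_process() -> bool:
    # Same rank check as in the diff: "main" means torch.distributed is not
    # initialized (single process) or this process has rank 0.
    return not dist.is_initialized() or dist.get_rank() == 0


def load_shard_into_model(model, state_dict):
    # Placeholder: merge one shard's tensors into the model.
    model.load_state_dict(state_dict, strict=False)


def load_sharded_checkpoint(model, shard_paths):
    # Every rank loads every shard; only the main process draws the bar.
    pbar = tqdm.tqdm(total=len(shard_paths),
                     desc=f"Loading {len(shard_paths)} checkpoint shards",
                     disable=not is_main_process())
    for path in shard_paths:
        sd = torch.load(path, map_location="cpu")
        load_shard_into_model(model, sd)
        pbar.update(1)
    pbar.close()
```

In a single-process run without a distributed backend, `is_main_process()` returns True and this behaves like an ordinary tqdm loop.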