Unverified · Commit 7e77cf71 · Authored by Heyang Qin, committed by GitHub

Check device count before running dist tests (#2799)

* Check device count before running dist tests

* fixing format for "Check device count before running dist tests"

* Check device count against max world size

* Check GPU count before launching dist tests

* double-check GPU actually exists

---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Parent 859d7c92
@@ -108,6 +108,10 @@ class DistributedExec(ABC):
         return fixture_kwargs

     def _launch_procs(self, num_procs):
+        if torch.cuda.is_available() and torch.cuda.device_count() < num_procs:
+            pytest.skip(
+                f"Skipping test because not enough GPUs are available: {num_procs} required, {torch.cuda.device_count()} available"
+            )
         mp.set_start_method('forkserver', force=True)
         skip_msg = mp.Queue()  # Allows forked processes to share pytest.skip reason
         processes = []
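The guard added by this commit can be sketched in isolation: compare the number of processes a distributed test wants to spawn against the number of visible GPUs, and produce a skip reason when there are too few. The sketch below factors that comparison into a standalone helper (`should_skip` is a hypothetical name, not part of the DeepSpeed codebase) so the logic can be tested without `torch` or a GPU; in the real code the inputs come from `torch.cuda.is_available()` and `torch.cuda.device_count()`, and the reason string is passed to `pytest.skip`.

```python
def should_skip(num_procs, cuda_available, device_count):
    """Return a skip-reason string when fewer GPUs exist than the test
    needs, or None when the test can run.

    Mirrors the check this commit adds to DistributedExec._launch_procs:
    the test is only skipped when CUDA is available at all (so CPU-only
    test runs are unaffected) but exposes fewer devices than num_procs.
    """
    if cuda_available and device_count < num_procs:
        return (f"Skipping test because not enough GPUs are available: "
                f"{num_procs} required, {device_count} available")
    return None


# A 4-process test on a 2-GPU machine is skipped with a reason...
print(should_skip(4, True, 2))
# ...while a 2-process test on the same machine runs.
print(should_skip(2, True, 2))
```

Note the `cuda_available` condition: on a machine with no CUDA at all, the check stays silent and lets the existing CPU/launch path decide what to do, which matches the commit's "double-check GPU actually exists" follow-up.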