Unverified commit 0c75f4a3, authored by Logan Adams, committed by GitHub

Update nightly workflows to open an issue if CI fails (#3952)

* Update H100 workflow to open an issue if nightly CI fails

* Test running as not CI

* Add all nightly/switch envvar name

* Test with AMD

* Add way to get url, switch path of template

* Add additional checkout step

* Move actions checkout step

* Try absolute path with github workspace

* Create issue without template/path

* Re-enable and add debug logic

* add if failed()

* More debug

* Try without checkout action uses

* Rename file

* Update variables

* Update issue template

* Confirm removing permissions still work

* Revert "Confirm removing permissions still work"

This reverts commit e7c2915a.

* Re-enable permissions

* Remove PR trigger for AMD MI200 tests

* Revert "Remove PR trigger for AMD MI200 tests"

This reverts commit 5c5c5fd6.

* Test update_existing

* Switch to composite action

* Fix line ending encoding issue

* Switch failure to be a variable

* Test with second workflow

* Format fix

* Switch failure to always

* Switch back to previously working way

* Test permission changes

* Revert "Test permission changes"

This reverts commit e051da75.

* Update existing bugs with newest build failure link

* Remove PR triggers that were used for testing.
Parent d300517f
---
name: CI failure report
about: Report a DeepSpeed CI failure
title: "{{ env.GITHUB_WORKFLOW }} CI test failure"
labels: ci-failure
assignees: ''
---
The Nightly CI for {{ env.GITHUB_SERVER_URL }}/{{ env.GITHUB_REPOSITORY }}/actions/runs/{{ env.GITHUB_RUN_ID }} failed.
@@ -8,6 +8,10 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
  issues: write
jobs:
  amd-tests:
    # The type of runner that the job will run on
@@ -65,3 +69,12 @@ jobs:
          cd tests
          pytest $PYTEST_OPTS -n 4 --verbose unit/
          pytest $PYTEST_OPTS -m 'sequential' unit/
      - name: Open GitHub issue if nightly CI fails
        if: failure()
        uses: JasonEtco/create-an-issue@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          filename: .github/ISSUE_TEMPLATE/ci_failure_report.md
          update_existing: true
@@ -8,6 +8,10 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
  issues: write
jobs:
  unit-tests:
    runs-on: [self-hosted, nvidia, h100]
@@ -49,3 +53,12 @@ jobs:
          cd tests
          python -m pytest $PYTEST_OPTS -n 4 unit/ --torch_ver="2.0" --cuda_ver="12"
          python -m pytest $PYTEST_OPTS -m 'sequential' unit/ --torch_ver="2.0" --cuda_ver="12"
      - name: Open GitHub issue if nightly CI fails
        if: failure()
        uses: JasonEtco/create-an-issue@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          filename: .github/ISSUE_TEMPLATE/ci_failure_report.md
          update_existing: true
@@ -8,6 +8,10 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
  issues: write
jobs:
  unit-tests:
    runs-on: [self-hosted, nvidia, cu116, v100]
@@ -47,3 +51,12 @@ jobs:
          unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
          cd tests
          pytest $PYTEST_OPTS --forked -m 'nightly' unit/ --torch_ver="1.13" --cuda_ver="11.6"
      - name: Open GitHub issue if nightly CI fails
        if: failure()
        uses: JasonEtco/create-an-issue@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          filename: .github/ISSUE_TEMPLATE/ci_failure_report.md
          update_existing: true
@@ -8,6 +8,10 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
  issues: write
jobs:
  unit-tests:
    runs-on: [self-hosted, nvidia, cu116, v100]
@@ -48,3 +52,12 @@ jobs:
          cd tests
          pytest $PYTEST_OPTS --forked -n 4 unit/
          pytest $PYTEST_OPTS --forked -m 'sequential' unit/
      - name: Open GitHub issue if nightly CI fails
        if: failure()
        uses: JasonEtco/create-an-issue@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          filename: .github/ISSUE_TEMPLATE/ci_failure_report.md
          update_existing: true
@@ -8,6 +8,10 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
  issues: write
jobs:
  unit-tests:
    runs-on: [self-hosted, nvidia, cu111, p40]
@@ -47,3 +51,12 @@ jobs:
          unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
          cd tests
          pytest $PYTEST_OPTS --forked -n 4 unit/ --torch_ver="1.9" --cuda_ver="11.1"
      - name: Open GitHub issue if nightly CI fails
        if: failure()
        uses: JasonEtco/create-an-issue@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          filename: .github/ISSUE_TEMPLATE/ci_failure_report.md
          update_existing: true
@@ -8,6 +8,10 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
  issues: write
jobs:
  unit-tests:
    runs-on: [self-hosted, nvidia, cu111, v100]
@@ -48,3 +52,12 @@ jobs:
          cd tests
          pytest $PYTEST_OPTS --forked -n 4 unit/ --torch_ver="1.9" --cuda_ver="11"
          pytest $PYTEST_OPTS --forked -m 'sequential' unit/ --torch_ver="1.9" --cuda_ver="11"
      - name: Open GitHub issue if nightly CI fails
        if: failure()
        uses: JasonEtco/create-an-issue@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          filename: .github/ISSUE_TEMPLATE/ci_failure_report.md
          update_existing: true
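
For orientation, the hunks above might fit together in a single workflow roughly as sketched below. This is a minimal illustration, not one of the files changed by this commit: the workflow name, cron schedule, runner label, and test command are hypothetical, and the `github.event_name == 'schedule'` guard is an assumed variation (the commit itself uses a plain `if: failure()`). The checkout step is included on the assumption that the issue template must be present in the workspace for the action to read it.

```yaml
# Hypothetical workflow; name, schedule, and runner are placeholders.
name: example-nightly

on:
  schedule:
    - cron: "0 0 * * *"

# Token scopes as added by the commit: read the repo, write issues.
permissions:
  contents: read
  issues: write

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      # Makes .github/ISSUE_TEMPLATE/ci_failure_report.md available on disk.
      - uses: actions/checkout@v3

      - name: Unit tests
        run: |
          cd tests
          pytest unit/

      - name: Open GitHub issue if nightly CI fails
        # Assumed variation: only report failures from scheduled runs,
        # so that PR or manual runs do not open issues.
        if: ${{ failure() && github.event_name == 'schedule' }}
        uses: JasonEtco/create-an-issue@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          filename: .github/ISSUE_TEMPLATE/ci_failure_report.md
          update_existing: true
```

With `update_existing: true`, a repeat failure should update an open issue with the same title rather than open a new one, which matches the "Update existing bugs with newest build failure link" item in the commit message.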