- 05 Jul, 2021 3 commits
-
-
Committed by liqingping
-
Committed by liqingping
1. Rename NerveXJob to DIJob. 2. Update the group version to diengine.sensetime.com. 3. Rename the project to di-orchestrator.
-
Committed by liqingping
-
- 02 Jul, 2021 3 commits
-
-
Committed by liqingping
-
Committed by liqingping
Feat/ddp learner env. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!33
-
Committed by liqingping
-
- 01 Jul, 2021 4 commits
-
-
Committed by liqingping
-
Committed by liqingping
-
Committed by liqingping
-
Committed by liqingping
Feat/replicas failed. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!32
-
- 29 Jun, 2021 3 commits
-
-
Committed by liqingping
-
Committed by liqingping
feat: add service informer. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!31
-
Committed by liqingping
1. When a request to the /replicas/failed endpoint deletes an aggregator, rebuild both the aggregator and the ddp learner. 2. Change the owner controller of each service to its corresponding pod. 3. Standardize the container name of every module to nervex-container. 4. Standardize the port name of every module to nervex-port.
-
- 28 Jun, 2021 1 commit
-
-
Committed by liqingping
-
- 25 Jun, 2021 1 commit
-
-
Committed by liqingping
1. Add a service informer. 2. Query the service informer to generate URLs for the server's HTTP GET requests.
-
- 24 Jun, 2021 1 commit
-
-
Committed by liqingping
feat: create an aggregator when a learner requests more than 1 GPU. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!29
-
- 23 Jun, 2021 3 commits
-
-
Committed by liqingping
Feat/http api aggregator. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!30
-
Committed by liqingping
-
Committed by liqingping
-
- 22 Jun, 2021 1 commit
-
-
Committed by liqingping
-
- 21 Jun, 2021 3 commits
-
-
Committed by liqingping
-
Committed by liqingping
-
Committed by liqingping
-
- 19 Jun, 2021 1 commit
-
-
Committed by liqingping
docs: add architecture. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!28
-
- 16 Jun, 2021 2 commits
-
-
Committed by liqingping
-
Committed by liqingping
-
- 15 Jun, 2021 1 commit
-
-
Committed by liqingping
-
- 10 Jun, 2021 3 commits
-
-
Committed by liqingping
-
Committed by liqingping
-
Committed by liqingping
-
- 09 Jun, 2021 1 commit
-
-
Committed by liqingping
-
- 08 Jun, 2021 1 commit
-
-
Committed by liqingping
-
- 04 Jun, 2021 3 commits
-
-
Committed by liqingping
fix: fix bug where container status mismatches the pod phase when a pod has more than one container. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!26
-
Committed by liqingping
-
Committed by liqingping
When a pod has more than one container, the pod stays in the Running phase as long as at least one container is Running. So when the main container exits but the sidecar is still Running, the pod remains in the Running phase. This commit fixes that.
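The mismatch described in this commit message can be illustrated with a minimal sketch. It is not the operator's actual fix: `ContainerStatus` and `effectivePhase` are hypothetical names, the struct is a simplified stand-in rather than the Kubernetes API type, and treating `nervex-container` as the main container is an assumption. The idea shown is to derive the job-relevant phase from the main container instead of trusting the pod phase alone.

```go
package main

import "fmt"

// ContainerStatus is a hypothetical, simplified stand-in for the
// per-container status a pod reports; it is NOT the k8s.io/api type.
type ContainerStatus struct {
	Name     string
	State    string // "Waiting", "Running", or "Terminated"
	ExitCode int
}

// effectivePhase (hypothetical helper) returns the phase implied by the
// main container. If the main container has terminated, the pod phase is
// overridden even when a sidecar keeps the pod in Running.
func effectivePhase(podPhase, mainName string, containers []ContainerStatus) string {
	for _, c := range containers {
		if c.Name != mainName {
			continue
		}
		if c.State != "Terminated" {
			return podPhase // main container still alive: keep the pod phase
		}
		if c.ExitCode == 0 {
			return "Succeeded"
		}
		return "Failed"
	}
	return podPhase // main container not found: fall back to the pod phase
}

func main() {
	containers := []ContainerStatus{
		{Name: "nervex-container", State: "Terminated", ExitCode: 0},
		{Name: "sidecar", State: "Running"},
	}
	// The pod still reports Running because the sidecar is alive.
	fmt.Println(effectivePhase("Running", "nervex-container", containers)) // prints "Succeeded"
}
```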
-
- 01 Jun, 2021 2 commits
-
-
Committed by liqingping
fix: fix bug where defer is not executed before return. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!25
-
Committed by liqingping
The defer did not run after the job succeeded because the defer statement was placed after the return, so the defer statement has to be moved ahead of it.
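The behaviour behind this fix can be reproduced in a few lines of Go. The sketch below is illustrative only: `runBroken` and `runFixed` are hypothetical functions, not code from the operator. A `defer` statement only registers its call when execution reaches it, so a `defer` placed after an early `return` never runs on that path.

```go
package main

import "fmt"

// runBroken mirrors the bug: the defer statement sits after the early
// return, so it is never registered when the job has already succeeded.
func runBroken(succeeded bool, log *[]string) {
	if succeeded {
		*log = append(*log, "return")
		return
	}
	defer func() { *log = append(*log, "cleanup") }()
	*log = append(*log, "work")
}

// runFixed registers the defer first, so cleanup runs on every return path.
func runFixed(succeeded bool, log *[]string) {
	defer func() { *log = append(*log, "cleanup") }()
	if succeeded {
		*log = append(*log, "return")
		return
	}
	*log = append(*log, "work")
}

func main() {
	var broken, fixed []string
	runBroken(true, &broken)
	runFixed(true, &fixed)
	fmt.Println("broken:", broken) // prints "broken: [return]"
	fmt.Println("fixed:", fixed)   // prints "fixed: [return cleanup]"
}
```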
-
- 28 May, 2021 3 commits
-
-
Committed by liqingping
Fix/replicas volume not created. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!24
-
Committed by liqingping
fix: fix bug where a terminating coordinator requests the server to create replicas. See merge request platform/CloudNative4AI/cluster-lifecycle/nervex-operator!23
-
Committed by liqingping
-