Commit d0cf3cec authored by leonwanghui

initial version

Signed-off-by: leonwanghui <leon.wanghui@huawei.com>
# go package
Gopkg.lock
FROM debian:jessie
COPY cmd/ms-operator.v1/ms-operator /ms-operator
ENTRYPOINT ["/ms-operator", "-alsologtostderr"]
# Gopkg.toml example
#
# Refer to https://golang.github.io/dep/docs/Gopkg.toml.html
# for detailed Gopkg.toml documentation.
#
# required = ["github.com/user/thing/cmd/thing"]
# ignored = ["github.com/user/project/pkgX", "bitbucket.org/user/project/pkgA/pkgY"]
#
# [[constraint]]
# name = "github.com/user/project"
# version = "1.0.0"
#
# [[constraint]]
# name = "github.com/user/project2"
# branch = "dev"
# source = "github.com/myfork/project2"
#
# [[override]]
# name = "github.com/x/y"
# version = "2.4.0"
#
# [prune]
# non-go = false
# go-tests = true
# unused-packages = true
[[constraint]]
branch = "master"
name = "github.com/kubeflow/common"
[[constraint]]
name = "github.com/onrik/logrus"
version = "0.5.1"
[[constraint]]
name = "github.com/sirupsen/logrus"
version = "1.4.2"
[[constraint]]
name = "k8s.io/api"
version = "0.17.4"
[[constraint]]
name = "k8s.io/apimachinery"
version = "0.17.4"
[[constraint]]
name = "k8s.io/client-go"
version = "4.0.0"
[prune]
go-tests = true
unused-packages = true
[[constraint]]
name = "k8s.io/code-generator"
version = "release-1.7"
version = "4.0.0"
[[constraint]]
name = "cloud.google.com/go"
version = "0.55.0"
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# MindSpore Operator
#### Experimental notice: This project is still experimental and only serves as a proof of concept for running MindSpore on Kubernetes. The current version of ms-operator is based on an early version of [PyTorch Operator](https://github.com/kubeflow/pytorch-operator) and [TF Operator](https://github.com/kubeflow/tf-operator). Right now MindSpore supports running LeNet with the MNIST dataset on a single node; distributed training examples are expected in the near future.
## Introduction of MindSpore and ms-operator
MindSpore is a new open-source deep learning training/inference framework that
can be used for mobile, edge, and cloud scenarios. MindSpore is designed to
provide a friendly development experience and efficient execution for data
scientists and algorithm engineers, with native support for the Ascend AI
processor and software-hardware co-optimization.
This project contains the specification and implementation of the MSJob custom
resource definition. We will walk through creating the ms-operator, as well as
running an MNIST training job on Kubernetes with the MindSpore 0.1.0-alpha image
(x86 CPU build) on a single node. More complete features will be developed in
the coming days.
This project defines the following:
- The ms-operator
- A way to deploy the operator
- MindSpore LeNet MNIST training example
- Future goal: distributed MindSpore training example
### MindSpore Docker Image
The MindSpore Docker image is hosted on [Docker Hub](https://hub.docker.com/repository/docker/mindspore/mindspore);
you can fetch the image directly using the following command:
```
docker pull mindspore/mindspore:0.1.0-alpha
```
### Design
The YAML file used to create our MNIST training job is defined as follows:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: msjob-mnist
spec:
containers:
- image: mindspore/mindspore:0.1.0-alpha
imagePullPolicy: IfNotPresent
name: msjob-mnist
command: ["/bin/bash", "-c", "python /tmp/test/MNIST/lenet.py"]
volumeMounts:
- name: training-result
mountPath: /tmp/result
- name: ms-mnist
mountPath: /tmp/test
restartPolicy: OnFailure
volumes:
- name: training-result
emptyDir: {}
- name: ms-mnist
hostPath:
path: /root/gopath/src/gitee.com/mindspore/ms-operator/examples/
```
### Overview of MindSpore in Kubeflow Ecosystem
![ms-operator in kubeflow](./docs/pics/ms_operator_in_kubeflow.png)
The figure above shows a high-level view of how MindSpore fits into the Kubeflow
ecosystem and its components.
## Prerequisites
- [Helm and Tiller](https://github.com/helm/helm/releases/tag/v2.9.0): v2.9.0
- [go](https://github.com/golang/go/releases/tag/go1.12.1): go1.12.1
- [docker](https://github.com/docker/docker-ce/releases/tag/v18.06.1-ce): 18.06.1-ce
- [Kubernetes](https://github.com/kubernetes/kubernetes/releases/tag/v1.14.0): v1.14.0
## Steps of running the example
First, build the ms-operator image:
```
docker build -t ms-operator .
```
After the build completes, check the image using the `docker images` command:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
ms-operator latest 729960ae415e 28 hours ago 175MB
```
The MindSpore image we downloaded from Docker Hub is the `0.1.0-alpha` version:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
mindspore/mindspore 0.1.0-alpha 1cefbd0f7846 2 days ago 1.69GB
```
MindSpore supports heterogeneous computing across multiple hardware backends
(`CPU`, `GPU`, `Ascend`). The `device_target` of MindSpore is
`Ascend` by default, but we will use the CPU version here.
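For reference, the training script used in this example selects the backend through MindSpore's `context` API. The following is a minimal sketch based on the `lenet.py` example shipped in this repository's `examples/` folder:
```python
# Minimal sketch: select the CPU backend the same way examples/lenet.py does.
from mindspore import context

# device_target can be "Ascend", "GPU", or "CPU"; this example uses CPU.
context.set_context(mode=context.GRAPH_MODE, device_target="CPU")
```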
Install the MSJob CRD and the ms-operator deployment and pod:
```
RBAC=true #set false if you do not have an RBAC cluster
helm install ms-operator-chart/ -n ms-operator --set rbac.install=${RBAC} --wait --replace
```
Use the `helm status ms-operator` command to check the generated resources:
```
LAST DEPLOYED: Tue Mar 24 11:36:51 2020
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1beta1/CustomResourceDefinition
NAME AGE
msjobs.kubeflow.org 1d
==> v1beta1/Deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
ms-operator 1 1 1 1 1d
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
ms-operator-7b5b457d69-dpd2b 1/1 Running 0 1d
```
We will run an MNIST training job to verify that MindSpore runs properly on
Kubernetes:
```
cd examples/ && kubectl apply -f ms-mnist.yaml
```
The job simply imports the MindSpore packages, runs only one epoch on the dataset already included in the `MNIST_Data` folder, and prints the result, which should take little time. After the job completes, you should be able to check the job status and see the result logs. You can find the source training code in the `examples/` folder.
```
kubectl get pod msjob-mnist && kubectl logs msjob-mnist
```
```
NAME READY STATUS RESTARTS AGE
msjob-mnist 0/1 Completed 0 3h53m
============== Starting Training ==============
epoch: 1 step: 1, loss is 2.3005836
epoch: 1 step: 2, loss is 2.2978227
epoch: 1 step: 3, loss is 2.3004227
epoch: 1 step: 4, loss is 2.3054247
epoch: 1 step: 5, loss is 2.3068798
epoch: 1 step: 6, loss is 2.298408
epoch: 1 step: 7, loss is 2.3055573
epoch: 1 step: 8, loss is 2.2998955
epoch: 1 step: 9, loss is 2.3028255
epoch: 1 step: 10, loss is 2.2972553
```
Since MindSpore is at an early stage of open source, the community is still
working on implementing distributed training of LeNet with the MNIST dataset
on Kubernetes; distributed training on other backends (`GPU`, `Ascend`) is
also expected in the near future.
## Future work
[Kubeflow](https://github.com/kubeflow/kubeflow) recently announced its first major
1.0 release, with the graduation of a core set of stable applications
including:
- [Kubeflow's UI](https://www.kubeflow.org/docs/components/central-dash/overview/)
- [Jupyter notebook controller](https://github.com/kubeflow/kubeflow/tree/master/components/notebook-controller) and [web app](https://www.kubeflow.org/docs/notebooks/why-use-jupyter-notebook/)
- [TensorFlow Operator](https://www.kubeflow.org/docs/components/training/tftraining/) (TFJob) and [PyTorch Operator](https://www.kubeflow.org/docs/components/training/pytorch/) for distributed training
- [kfctl](https://www.kubeflow.org/docs/other-guides/kustomize/) for deployment and upgrade
- etc.
The MindSpore community is striving to collaborate with the Kubeflow community,
to make the ms-operator more comprehensive and well-organized, and to keep its
dependencies up to date. All these components make it easy for machine learning
engineers and data scientists to leverage cloud assets (public or on-premise)
for machine learning workloads.
MindSpore also looks forward to enabling users to develop models with Jupyter.
In the future, users will be able to use Kubeflow tools like Fairing (Kubeflow's
Python SDK) to build containers and create Kubernetes resources to train their
MindSpore models.
Once training is complete, users can use [KFServing](https://github.com/kubeflow/kfserving)
to create and deploy a server for inference, thus completing the machine
learning life cycle.
Distributed training is another field MindSpore will be focusing on. There are
two major distributed training strategies nowadays: one based on parameter
servers and the other based on collective communication primitives such as
allreduce. [MPI Operator](https://github.com/kubeflow/mpi-operator) is one of
the core components of Kubeflow which makes it easy to run synchronized,
allreduce-style distributed training on Kubernetes. MPI Operator provides a CRD
for defining a training job on a single CPU/GPU, multiple CPU/GPUs, or multiple
nodes. It also implements a custom controller to manage the CRD, create
dependent resources, and reconcile the desired state. If MindSpore can leverage
MPI Operator together with the high-performance `Ascend` processor, it is
possible that MindSpore will bring distributed training to an even higher level.
### Example YAML file
The YAML file to create a distributed training MSJob is expected to look like this:
```yaml
# WIP example for distributed training
apiVersion: "kubeflow.org/v1"
kind: "MSJob"
metadata:
name: "msjob-mnist"
spec:
backend: "tcp"
masterPort: "23456"
replicaSpecs:
- replicas: 1
replicaType: MASTER
template:
spec:
containers:
- image: mindspore/mindspore:0.1.0-alpha
imagePullPolicy: IfNotPresent
name: msjob-mnist
command: ["/bin/bash", "-c", "python /tmp/test/MNIST/lenet.py"]
volumeMounts:
- name: training-result
mountPath: /tmp/result
- name: ms-mnist-local-file
mountPath: /tmp/test
restartPolicy: OnFailure
volumes:
- name: training-result
emptyDir: {}
- name: entrypoint
configMap:
name: dist-train
defaultMode: 0755
restartPolicy: OnFailure
- replicas: 3
replicaType: WORKER
template:
spec:
containers:
- image: mindspore/mindspore:0.1.0-alpha
imagePullPolicy: IfNotPresent
name: msjob-mnist
command: ["/bin/bash", "-c", "python /tmp/test/MNIST/lenet.py"]
volumeMounts:
- name: training-result
mountPath: /tmp/result
- name: ms-mnist-local-file
hostPath:
path: /root/gopath/src/gitee.com/mindspore/ms-operator/examples
restartPolicy: OnFailure
volumes:
- name: training-result
emptyDir: {}
- name: entrypoint
configMap:
name: dist-train
defaultMode: 0755
restartPolicy: OnFailure
```
The MSJob is currently designed based on TFJob and PyTorchJob,
and is subject to change in future versions.
We define a `backend` protocol that the MS workers will use to communicate when
initializing the worker group. MindSpore supports heterogeneous computing
across multiple hardware backends (`CPU`, `GPU`, `Ascend`);
the `device_target` of MindSpore is `Ascend` by default.
We define `masterPort` as the port that worker groups will use to communicate with the master service.
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package options
import (
"flag"
"time"
)
// ServerOption is the main context object for the controller manager.
type ServerOption struct {
ChaosLevel int
ControllerConfigFile string
PrintVersion bool
GCInterval time.Duration
JsonLogFormat bool
}
// NewServerOption creates a new CMServer with a default config.
func NewServerOption() *ServerOption {
s := ServerOption{}
return &s
}
// AddFlags adds flags for a specific CMServer to the specified FlagSet
func (s *ServerOption) AddFlags(fs *flag.FlagSet) {
// chaos level will be removed once we have a formal tool to inject failures.
fs.IntVar(&s.ChaosLevel, "chaos-level", -1, "DO NOT USE IN PRODUCTION - level of chaos injected into the MSJob created by the operator.")
fs.BoolVar(&s.PrintVersion, "version", false, "Show version and quit")
fs.DurationVar(&s.GCInterval, "gc-interval", 10*time.Minute, "GC interval")
fs.StringVar(&s.ControllerConfigFile, "controller-config-file", "", "Path to file containing the controller config.")
fs.BoolVar(&s.JsonLogFormat, "json-log-format", true, "Set true to use json style log format. Set false to use plaintext style log format")
}
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package app
import (
"fmt"
"io/ioutil"
"os"
"time"
"github.com/ghodss/yaml"
log "github.com/sirupsen/logrus"
"k8s.io/api/core/v1"
apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientset "k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
election "k8s.io/client-go/tools/leaderelection"
"k8s.io/client-go/tools/leaderelection/resourcelock"
"k8s.io/client-go/tools/record"
"gitee.com/mindspore/ms-operator/cmd/ms-operator.v1/app/options"
msv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
jobclient "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
"gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/scheme"
informers "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions"
"gitee.com/mindspore/ms-operator/pkg/controller"
"gitee.com/mindspore/ms-operator/pkg/util"
"gitee.com/mindspore/ms-operator/pkg/util/k8sutil"
"gitee.com/mindspore/ms-operator/version"
)
var (
leaseDuration = 15 * time.Second
renewDuration = 5 * time.Second
retryPeriod = 3 * time.Second
)
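// Run starts the ms-operator: it reads the controller config, builds the
// client sets and the MSJob informer factory, and runs the controller under
// leader election.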
func Run(opt *options.ServerOption) error {
// Check if the -version flag was passed and, if so, print the version and exit.
if opt.PrintVersion {
version.PrintVersionAndExit()
}
namespace := os.Getenv(util.EnvKubeflowNamespace)
if len(namespace) == 0 {
log.Infof("EnvKubeflowNamespace not set, use default namespace")
namespace = metav1.NamespaceDefault
}
// To help debugging, immediately log version
log.Infof("%+v", version.Info())
config, err := k8sutil.GetClusterConfig()
if err != nil {
return err
}
kubeClient, leaderElectionClient, msJobClient, apiExtensionsclient, err := createClients(config)
if err != nil {
return err
}
controllerConfig := readControllerConfig(opt.ControllerConfigFile)
neverStop := make(chan struct{})
defer close(neverStop)
msJobInformerFactory := informers.NewSharedInformerFactory(msJobClient, time.Second*30)
controller, err := controller.New(kubeClient, apiExtensionsclient, msJobClient, *controllerConfig, msJobInformerFactory)
if err != nil {
return err
}
go msJobInformerFactory.Start(neverStop)
run := func(stopCh <-chan struct{}) {
controller.Run(1, stopCh)
}
id, err := os.Hostname()
if err != nil {
return fmt.Errorf("Failed to get hostname: %v", err)
}
// Prepare event clients.
eventBroadcaster := record.NewBroadcaster()
recorder := eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "ms-operator"})
rl := &resourcelock.EndpointsLock{
EndpointsMeta: metav1.ObjectMeta{
Namespace: namespace,
Name: "ms-operator",
},
Client: leaderElectionClient.CoreV1(),
LockConfig: resourcelock.ResourceLockConfig{
Identity: id,
EventRecorder: recorder,
},
}
election.RunOrDie(election.LeaderElectionConfig{
Lock: rl,
LeaseDuration: leaseDuration,
RenewDeadline: renewDuration,
RetryPeriod: retryPeriod,
Callbacks: election.LeaderCallbacks{
OnStartedLeading: run,
OnStoppedLeading: func() {
log.Fatalf("leader election lost")
},
},
})
return nil
}
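// readControllerConfig loads the controller configuration from
// controllerConfigFile, or returns an empty config when no file is provided.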
func readControllerConfig(controllerConfigFile string) *msv1.ControllerConfig {
controllerConfig := &msv1.ControllerConfig{}
if controllerConfigFile != "" {
log.Infof("Loading controller config from %v.", controllerConfigFile)
data, err := ioutil.ReadFile(controllerConfigFile)
if err != nil {
log.Fatalf("Could not read file: %v. Error: %v", controllerConfigFile, err)
return controllerConfig
}
err = yaml.Unmarshal(data, controllerConfig)
if err != nil {
log.Fatalf("Could not parse controller config; Error: %v\n", err)
}
log.Infof("ControllerConfig: %v", util.Pformat(controllerConfig))
} else {
log.Info("No controller_config_file provided; using empty config.")
}
return controllerConfig
}
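// createClients builds the Kubernetes, leader-election, MSJob, and
// API-extensions clientsets from the given rest.Config.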
func createClients(config *rest.Config) (clientset.Interface, clientset.Interface, jobclient.Interface, apiextensionsclient.Interface, error) {
kubeClient, err := clientset.NewForConfig(rest.AddUserAgent(config, "msjob_operator"))
if err != nil {
return nil, nil, nil, nil, err
}
leaderElectionClient, err := clientset.NewForConfig(rest.AddUserAgent(config, "leader-election"))
if err != nil {
return nil, nil, nil, nil, err
}
msJobClient, err := jobclient.NewForConfig(config)
if err != nil {
return nil, nil, nil, nil, err
}
apiExtensionsclient, err := apiextensionsclient.NewForConfig(config)
if err != nil {
return nil, nil, nil, nil, err
}
return kubeClient, leaderElectionClient, msJobClient, apiExtensionsclient, nil
}
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package main
import (
"flag"
"gitee.com/mindspore/ms-operator/cmd/ms-operator.v1/app"
"gitee.com/mindspore/ms-operator/cmd/ms-operator.v1/app/options"
"github.com/onrik/logrus/filename"
log "github.com/sirupsen/logrus"
)
func init() {
// Add filename as one of the fields of the structured log message
filenameHook := filename.NewHook()
filenameHook.Field = "filename"
log.AddHook(filenameHook)
}
func main() {
s := options.NewServerOption()
s.AddFlags(flag.CommandLine)
flag.Parse()
if s.JsonLogFormat {
// Output logs in a json format so that it can be parsed by services like Stackdriver
log.SetFormatter(&log.JSONFormatter{})
}
if err := app.Run(s); err != nil {
log.Fatalf("%v\n", err)
}
}
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Lenet Tutorial
"""
import os
import argparse
import mindspore.dataset as ds
import mindspore.nn as nn
from mindspore import context, Tensor
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
from mindspore.train import Model
import mindspore.ops.operations as P
from mindspore.common.initializer import TruncatedNormal
import mindspore.dataset.transforms.vision.c_transforms as CV
import mindspore.dataset.transforms.c_transforms as C
from mindspore.dataset.transforms.vision import Inter
from mindspore.nn.metrics import Accuracy
from mindspore.common import dtype as mstype
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
def create_dataset(data_path, batch_size=32, repeat_size=1,
num_parallel_workers=1):
"""
create dataset for train or test
"""
# define dataset
mnist_ds = ds.MnistDataset(data_path)
resize_height, resize_width = 32, 32
rescale = 1.0 / 255.0
shift = 0.0
rescale_nml = 1 / 0.3081
shift_nml = -1 * 0.1307 / 0.3081
# define map operations
resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR) # Bilinear mode
rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)
rescale_op = CV.Rescale(rescale, shift)
hwc2chw_op = CV.HWC2CHW()
type_cast_op = C.TypeCast(mstype.int32)
# apply map operations on images
mnist_ds = mnist_ds.map(input_columns="label", operations=type_cast_op, num_parallel_workers=num_parallel_workers)
mnist_ds = mnist_ds.map(input_columns="image", operations=resize_op, num_parallel_workers=num_parallel_workers)
mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_op, num_parallel_workers=num_parallel_workers)
mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_nml_op, num_parallel_workers=num_parallel_workers)
mnist_ds = mnist_ds.map(input_columns="image", operations=hwc2chw_op, num_parallel_workers=num_parallel_workers)
# apply DatasetOps
buffer_size = 10000
mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size) # 10000 as in LeNet train script
mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
mnist_ds = mnist_ds.repeat(repeat_size)
return mnist_ds
def conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
"""
conv layer weight initial
"""
weight = weight_variable()
return nn.Conv2d(in_channels, out_channels,
kernel_size=kernel_size, stride=stride, padding=padding,
weight_init=weight, has_bias=False, pad_mode="valid")
def fc_with_initialize(input_channels, out_channels):
"""
fc layer weight initial
"""
weight = weight_variable()
bias = weight_variable()
return nn.Dense(input_channels, out_channels, weight, bias)
def weight_variable():
"""
weight initial
"""
return TruncatedNormal(0.02)
class LeNet5(nn.Cell):
"""
Lenet network
"""
def __init__(self):
super(LeNet5, self).__init__()
self.batch_size = 32
self.conv1 = conv(1, 6, 5)
self.conv2 = conv(6, 16, 5)
self.fc1 = fc_with_initialize(16 * 5 * 5, 120)
self.fc2 = fc_with_initialize(120, 84)
self.fc3 = fc_with_initialize(84, 10)
self.relu = nn.ReLU()
self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
self.reshape = P.Reshape()
def construct(self, x):
x = self.conv1(x)
x = self.relu(x)
x = self.max_pool2d(x)
x = self.conv2(x)
x = self.relu(x)
x = self.max_pool2d(x)
x = self.reshape(x, (self.batch_size, -1))
x = self.fc1(x)
x = self.relu(x)
x = self.fc2(x)
x = self.relu(x)
x = self.fc3(x)
return x
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='MindSpore LeNet Example')
parser.add_argument('--device_target', type=str, default="CPU", choices=['Ascend', 'GPU', 'CPU'],
help='device where the code will be implemented (default: CPU)')
parser.add_argument('--dataset_sink_mode', type=bool, default=False, help='dataset_sink_mode is False or True')
args = parser.parse_args()
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target,
enable_mem_reuse=False)
lr = 0.01
momentum = 0.9
epoch_size = 1
mnist_path = "/tmp/test/MNIST/MNIST_Data/"
net_loss = SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True, reduction='mean')
repeat_size = epoch_size
network = LeNet5()
net_opt = nn.Momentum(network.trainable_params(), lr, momentum)
config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)
ckpoint_cb = ModelCheckpoint(prefix="checkpoint_lenet", config=config_ck)
model = Model(network, net_loss, net_opt, metrics={"Accuracy": Accuracy()})
print("============== Starting Training ==============")
ds_train = create_dataset(os.path.join(mnist_path, "train"), 32, repeat_size)
model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, LossMonitor()],
dataset_sink_mode=args.dataset_sink_mode) # train
print("============== Starting Testing ==============")
param_dict = load_checkpoint("checkpoint_lenet-1_1875.ckpt")
load_param_into_net(network, param_dict)
ds_eval = create_dataset(os.path.join(mnist_path, "test")) # test
acc = model.eval(ds_eval, dataset_sink_mode=args.dataset_sink_mode)
print("============== Accuracy:{} ==============".format(acc))
# WIP example for distributed training
apiVersion: "kubeflow.org/v1"
kind: "MSJob"
metadata:
name: "msjob-mnist"
spec:
backend: "tcp"
masterPort: "23456"
replicaSpecs:
- replicas: 1
replicaType: MASTER
template:
spec:
containers:
- image: mindspore/mindspore:0.1.0-alpha
imagePullPolicy: IfNotPresent
name: msjob-mnist
command: ["/bin/bash", "-c", "python /tmp/test/MNIST/lenet.py"]
volumeMounts:
- name: training-result
mountPath: /tmp/result
- name: ms-mnist-local-file
mountPath: /tmp/test
restartPolicy: OnFailure
volumes:
- name: training-result
emptyDir: {}
- name: entrypoint
configMap:
name: dist-train
defaultMode: 0755
restartPolicy: OnFailure
- replicas: 3
replicaType: WORKER
template:
spec:
containers:
- image: mindspore/mindspore:0.1.0-alpha
imagePullPolicy: IfNotPresent
name: msjob-mnist
command: ["/bin/bash", "-c", "python /tmp/test/MNIST/lenet.py"]
volumeMounts:
- name: training-result
mountPath: /tmp/result
- name: ms-mnist-local-file
hostPath:
path: /root/gopath/src/gitee.com/mindspore/ms-operator/examples
restartPolicy: OnFailure
volumes:
- name: training-result
emptyDir: {}
- name: entrypoint
configMap:
name: dist-train
defaultMode: 0755
restartPolicy: OnFailure
apiVersion: v1
kind: Pod
metadata:
name: msjob-mnist
spec:
containers:
- image: mindspore/mindspore:0.1.0-alpha
imagePullPolicy: IfNotPresent
name: msjob-mnist
command: ["/bin/bash", "-c", "python /tmp/test/MNIST/lenet.py"]
volumeMounts:
- name: training-result
mountPath: /tmp/result
- name: ms-mnist
mountPath: /tmp/test
restartPolicy: OnFailure
volumes:
- name: training-result
emptyDir: {}
- name: ms-mnist
hostPath:
path: /root/gopath/src/gitee.com/mindspore/ms-operator/examples/
// Copyright YEAR The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#!/bin/bash
# Copyright 2018 The Kubeflow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Update CHANGELOG.md using github_changelog_generator.
#
# The script will compute changes between release tags. So make sure there is
# a release tag corresponding to the release you want to compute the changes
# for.
set -o errexit
set -o nounset
set -o pipefail
GITHUB_TOKEN=${GITHUB_TOKEN:-"NO"}
SCRIPT_ROOT=$(dirname ${BASH_SOURCE})/../..
cd ${SCRIPT_ROOT}
if [ "${GITHUB_TOKEN}" == "NO" ]
then
echo "Environment variable GITHUB_TOKEN is not set."
exit 1
fi
github_changelog_generator -t ${GITHUB_TOKEN} -u kubeflow -p common \
--exclude-labels community/discussion,cmmunity/question,duplicate,question,invalid,wontfix \
--bug-labels kind/bug,problems/bug \
--enhancement-labels improvement/optimization,kind/enhancement,improvement/enhancement,addition/feature,kind/feature \
--enhancement-label "**Features and improvements:**"
cd - > /dev/null
#!/bin/bash
# Copyright 2019 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script is used to auto-generate some useful code for k8s, such as
# listers, informers, deepcopy and defaulter functions, and so on.
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_ROOT=$(dirname ${BASH_SOURCE})/..
CODEGEN_PKG=${CODEGEN_PKG:-$(cd ${SCRIPT_ROOT}; ls -d -1 ./vendor/k8s.io/code-generator 2>/dev/null || echo ../code-generator)}
# generate the code with:
# --output-base because this script should also be able to run inside the vendor dir of
# k8s.io/kubernetes. The output-base is needed for the generators to output into the vendor dir
# instead of the $GOPATH directly. For normal projects this can be dropped.
cd ${SCRIPT_ROOT}
${CODEGEN_PKG}/generate-groups.sh "defaulter,client,informer,lister,deepcopy" \
gitee.com/mindspore/ms-operator/pkg/client gitee.com/mindspore/ms-operator/pkg/apis \
mindspore:v1alpha1 \
--go-header-file ${SCRIPT_ROOT}/hack/boilerplate/boilerplate.go.txt
echo "Generating defaulters for mindspore v1alpha1"
${GOPATH}/bin/defaulter-gen --input-dirs gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1alpha1 \
-O zz_generated.defaults \
--go-header-file ./hack/../hack/boilerplate/boilerplate.go.txt \
--output-package gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1alpha1
#!/bin/bash
# Copyright 2017 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_ROOT=$(dirname "${BASH_SOURCE}")/..
DIFFROOT="${SCRIPT_ROOT}/pkg"
TMP_DIFFROOT="${SCRIPT_ROOT}/_tmp/pkg"
_tmp="${SCRIPT_ROOT}/_tmp"
cleanup() {
rm -rf "${_tmp}"
}
trap "cleanup" EXIT SIGINT
cleanup
mkdir -p "${TMP_DIFFROOT}"
cp -a "${DIFFROOT}"/* "${TMP_DIFFROOT}"
"${SCRIPT_ROOT}/hack/update-codegen.sh"
echo "diffing ${DIFFROOT} against freshly generated codegen"
ret=0
diff -Naupr "${DIFFROOT}" "${TMP_DIFFROOT}" || ret=$?
cp -a "${TMP_DIFFROOT}"/* "${DIFFROOT}"
if [[ $ret -eq 0 ]]
then
echo "${DIFFROOT} up to date."
else
echo "${DIFFROOT} is out of date. Please run hack/update-codegen.sh"
exit 1
fi
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj
apiVersion: v1
appVersion: "1.0"
description: K8s Custom Resource and Operator For MindSpore Jobs
name: ms-operator
version: 0.1.0
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: msjobs.kubeflow.org
spec:
group: kubeflow.org
version: v1
names:
kind: MSJob
singular: msjob
plural: msjobs
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: ms-operator
spec:
replicas: 1
template:
metadata:
labels:
name: ms-operator
spec:
{{- if .Values.rbac.install }}
serviceAccountName: ms-operator
{{- end }}
containers:
- name: ms-operator
image: {{ .Values.image }}
imagePullPolicy: IfNotPresent
command:
- /ms-operator
{{- if .Values.config.configmap }}
- --controller-config-file={{ .Values.config.file }}
{{- else if .Values.cloud }}
- --controller-config-file=/etc/config/controller-config-file.yaml
{{- end }}
- -alsologtostderr
- -v=1
{{- if .Values.config.configmap }}
env:
- name: KUBEFLOW_NAMESPACE
value: {{ .Release.Namespace }}
volumeMounts:
- name: config-volume
mountPath: /etc/config
volumes:
- name: config-volume
configMap:
name: {{ .Values.config.configmap }}
{{- else if .Values.cloud }}
volumeMounts:
- name: config-volume
mountPath: /etc/config
volumes:
- name: config-volume
configMap:
name: ms-operator-config
{{- end }}
{{ if .Values.rbac.install }}
apiVersion: rbac.authorization.k8s.io/{{ required "A valid .Values.rbac.apiVersion entry required!" .Values.rbac.apiVersion }}
kind: ClusterRole
metadata:
name: ms-operator
labels:
app: ms-operator
rules:
- apiGroups:
- kubeflow.org
resources:
- msjobs
verbs:
- "*"
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- "*"
- apiGroups:
- storage.k8s.io
resources:
- storageclasses
verbs:
- "*"
- apiGroups:
- batch
resources:
- jobs
verbs:
- "*"
- apiGroups:
- ""
resources:
- configmaps
- pods
- services
- endpoints
- persistentvolumeclaims
- events
verbs:
- "*"
- apiGroups:
- apps
- extensions
resources:
- deployments
verbs:
- "*"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/{{ required "A valid .Values.rbac.apiVersion entry required!" .Values.rbac.apiVersion }}
metadata:
name: ms-operator
labels:
app: ms-operator
subjects:
- kind: ServiceAccount
name: ms-operator
namespace: {{ .Release.Namespace }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: ms-operator
{{ end }}
{{ if .Values.rbac.install }}
apiVersion: v1
kind: ServiceAccount
metadata:
name: ms-operator
labels:
app: ms-operator
{{ end }}
# Docker image to use.
image: ms-operator:latest
# Which cloud provider Kubernetes is hosted on.
# Supported values are gke or azure.
# Leave blank to use a default, non-cloud specific config.
cloud:
## Whether the dashboard should be installed and the kind of service to use
dashboard:
install: false
serviceType: ClusterIP
config:
configmap:
file: /etc/config/controller-config-file.yaml
## Install Default RBAC roles and bindings
rbac:
install: false
apiVersion: v1
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package helper
import (
"fmt"
msv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
"gitee.com/mindspore/ms-operator/pkg/util"
"k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime/schema"
)
var (
groupVersionKind = schema.GroupVersionKind{
Group: msv1.GroupName,
Version: msv1.GroupVersion,
Kind: msv1.ResourceKind,
}
)
// AsOwner make OwnerReference according to the parameter
func AsOwner(msJob *msv1.MSJob) metav1.OwnerReference {
trueVar := true
// Both api.OwnerReference and metatypes.OwnerReference are combined into that.
return metav1.OwnerReference{
APIVersion: groupVersionKind.GroupVersion().String(),
Kind: groupVersionKind.Kind,
Name: msJob.ObjectMeta.Name,
UID: msJob.ObjectMeta.UID,
Controller: &trueVar,
BlockOwnerDeletion: &trueVar,
}
}
// ConfigureAcceleratorsForMSJobSpec adds any accelerator specific configuration to the pods.
func ConfigureAcceleratorsForMSJobSpec(c *msv1.MSJobSpec, accelerators map[string]msv1.AcceleratorConfig) error {
for _, r := range c.ReplicaSpecs {
if r.Template == nil {
return fmt.Errorf("Replica is missing Template; %v", util.Pformat(r))
}
for i, c := range r.Template.Spec.Containers {
if c.Name == msv1.DefaultMSContainer {
// Identify the accelerators attached to this container.
a := map[string]msv1.AcceleratorConfig{}
lists := []v1.ResourceList{c.Resources.Limits, c.Resources.Requests}
for _, resources := range lists {
for name := range resources {
if _, ok := accelerators[string(name)]; !ok {
continue
}
// Add the expected mounts to the pods.
a[string(name)] = accelerators[string(name)]
}
}
// Add accelerator information to the pod.
for _, config := range a {
for _, v := range config.Volumes {
r.Template.Spec.Volumes = append(r.Template.Spec.Volumes,
v1.Volume{
Name: v.Name,
VolumeSource: v1.VolumeSource{
HostPath: &v1.HostPathVolumeSource{
Path: v.HostPath,
},
},
})
c.VolumeMounts = append(c.VolumeMounts, v1.VolumeMount{
Name: v.Name,
MountPath: v.MountPath,
})
}
for _, envVar := range config.EnvVars {
c.Env = append(c.Env, v1.EnvVar{
Name: envVar.Name,
Value: envVar.Value,
})
}
}
r.Template.Spec.Containers[i] = c
break
}
}
}
return nil
}
// Cleanup cleans up user passed spec, e.g. defaulting, transforming fields.
// TODO: move this to admission controller
func Cleanup(c *msv1.MSJobSpec) {
// TODO(jlewi): Add logic to cleanup user provided spec; e.g. by filling in defaults.
// We should have default container images so user doesn't have to provide these.
}
func CRDName() string {
return fmt.Sprintf("%s.%s", msv1.CRDKindPlural, msv1.CRDGroup)
}
func scalingReason(from, to int) string {
return fmt.Sprintf("Current cluster size: %d, desired cluster size: %d", from, to)
}
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package v1
import (
"github.com/golang/protobuf/proto"
"k8s.io/apimachinery/pkg/runtime"
)
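// addDefaultingFuncs registers the defaulting functions of this package with
// the given scheme.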
func addDefaultingFuncs(scheme *runtime.Scheme) error {
return RegisterDefaults(scheme)
}
// SetDefaults_MSJob sets any unspecified values to defaults
func SetDefaults_MSJob(obj *MSJob) {
c := &obj.Spec
if c.MSImage == "" {
c.MSImage = DefaultMSImage
}
// Check that each replica has a ms container.
for _, r := range c.ReplicaSpecs {
if r.MasterPort == nil {
r.MasterPort = proto.Int32(MasterPort)
}
if string(r.MSReplicaType) == "" {
r.MSReplicaType = MASTER
}
if r.Replicas == nil {
r.Replicas = proto.Int32(Replicas)
}
}
if c.TerminationPolicy == nil {
c.TerminationPolicy = &TerminationPolicySpec{
Master: &MasterSpec{
ReplicaName: "MASTER",
ReplicaRank: 0,
},
}
}
}
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// +k8s:deepcopy-gen=package,register
// +k8s:defaulter-gen=TypeMeta
// Package v1 is the v1 version of the API.
// +groupName=kubeflow.org
package v1
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package v1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/schema"
)
var (
SchemeBuilder = runtime.NewSchemeBuilder(addKnownTypes)
AddToScheme = SchemeBuilder.AddToScheme
)
const (
// GroupName is the group name use in this package.
GroupName = "kubeflow.org"
// ResourceKind is the kind name.
ResourceKind = "MSJob"
// GroupVersion is the version.
GroupVersion = "v1"
)
// SchemeGroupVersion is the group version used to register these objects.
var SchemeGroupVersion = schema.GroupVersion{Group: GroupName, Version: CRDVersion}
func init() {
// We only register manually written functions here. The registration of the
// generated functions takes place in the generated files. The separation
// makes the code compile even when the generated files are missing.
SchemeBuilder.Register(addDefaultingFuncs)
}
// Resource takes an unqualified resource and returns a Group-qualified GroupResource.
func Resource(resource string) schema.GroupResource {
return SchemeGroupVersion.WithResource(resource).GroupResource()
}
// addKnownTypes adds the set of types defined in this package to the supplied scheme.
func addKnownTypes(scheme *runtime.Scheme) error {
scheme.AddKnownTypes(SchemeGroupVersion,
&MSJob{},
&MSJobList{},
)
metav1.AddToGroupVersion(scheme, SchemeGroupVersion)
return nil
}
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package v1
import (
"k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
const (
CRDKind = "msjob"
CRDKindPlural = "msjobs"
CRDGroup = "kubeflow.org"
CRDVersion = "v1"
// Value of the APP label that gets applied to a lot of entities.
AppLabel = "ms-job"
// Defaults for the Spec
MasterPort = 23456
Replicas = 1
)
// +genclient
// +genclient:noStatus
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +resource:path=msjob
// MSJob describes msjob info
type MSJob struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec MSJobSpec `json:"spec"`
Status MSJobStatus `json:"status"`
}
type MSJobSpec struct {
// TODO(jlewi): Can we get rid of this and use some value from Kubernetes or a random id?
RuntimeId string
// ReplicaSpecs specifies the MS replicas to run.
ReplicaSpecs []*MSReplicaSpec `json:"replicaSpecs"`
// MSImage defines the mindspore docker image that should be used for default parameter server
MSImage string `json:"msImage,omitempty"`
// TerminationPolicy specifies the condition that the msjob should be considered finished.
TerminationPolicy *TerminationPolicySpec `json:"terminationPolicy,omitempty"`
// SchedulerName specifies the name of scheduler which should handle the MSJob
SchedulerName string `json:"schedulerName,omitempty"`
}
type TerminationPolicySpec struct {
// Master policy waits for a particular process (which is the master) to exit.
Master *MasterSpec `json:"master,omitempty"`
}
type MasterSpec struct {
ReplicaName string `json:"replicaName"`
ReplicaRank int `json:"replicaRank"`
}
// MSReplicaType determines how a set of MS processes are handled.
type MSReplicaType string
const (
MASTER MSReplicaType = "MASTER"
WORKER MSReplicaType = "WORKER"
)
const (
DefaultMSContainer string = "mindspore"
DefaultMSImage string = "mindspore/mindspore:v0.1.0"
)
// TODO(jlewi): We probably want to add a name field. This would allow us to have more than 1 type of each worker.
// This might be useful if you wanted to have a separate set of workers to do eval.
type MSReplicaSpec struct {
// Replicas is the number of desired replicas.
// This is a pointer to distinguish between explicit zero and unspecified.
// Defaults to 1.
// More info: http://kubernetes.io/docs/user-guide/replication-controller#what-is-a-replication-controller
// +optional
Replicas *int32 `json:"replicas,omitempty" protobuf:"varint,1,opt,name=replicas"`
Template *v1.PodTemplateSpec `json:"template,omitempty" protobuf:"bytes,3,opt,name=template"`
// MasterPort is the port to use for MS services.
MasterPort *int32 `json:"masterPort,omitempty" protobuf:"varint,1,opt,name=masterPort"`
MSReplicaType `json:"replicaType"`
}
type MSJobPhase string
const (
MSJobPhaseNone MSJobPhase = ""
MSJobPhaseCreating MSJobPhase = "Creating"
MSJobPhaseRunning MSJobPhase = "Running"
MSJobPhaseCleanUp MSJobPhase = "CleanUp"
MSJobPhaseFailed MSJobPhase = "Failed"
MSJobPhaseDone MSJobPhase = "Done"
)
type State string
const (
StateUnknown State = "Unknown"
StateRunning State = "Running"
StateSucceeded State = "Succeeded"
StateFailed State = "Failed"
)
type MSJobStatus struct {
// Phase is the MSJob running phase
Phase MSJobPhase `json:"phase"`
Reason string `json:"reason"`
// State indicates the state of the job.
State State `json:"state"`
// ReplicaStatuses specifies the status of each MS replica.
ReplicaStatuses []*MSReplicaStatus `json:"replicaStatuses"`
}
type ReplicaState string
const (
ReplicaStateUnknown ReplicaState = "Unknown"
ReplicaStateRunning ReplicaState = "Running"
ReplicaStateFailed ReplicaState = "Failed"
ReplicaStateSucceeded ReplicaState = "Succeeded"
)
type MSReplicaStatus struct {
MSReplicaType `json:"replica_type"`
// State is the overall state of the replica
State ReplicaState `json:"state"`
// ReplicasStates provides the number of replicas in each state.
ReplicasStates map[ReplicaState]int
}
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +resource:path=msjobs
// MSJobList is a list of MSJobs.
type MSJobList struct {
metav1.TypeMeta `json:",inline"`
// Standard list metadata
// More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata
metav1.ListMeta `json:"metadata,omitempty"`
// Items is a list of MSJobs
Items []MSJob `json:"items"`
}
type ControllerConfig struct {
// Accelerators is a map from the name of the accelerator to the config for that accelerator.
// This should match the value specified as a container limit.
// e.g. alpha.kubernetes.io/nvidia-gpu
Accelerators map[string]AcceleratorConfig
// Path to the file containing the grpc server source
GrpcServerFilePath string
}
// AcceleratorVolume represents a host path that must be mounted into
// each container that needs to use GPUs.
type AcceleratorVolume struct {
Name string
HostPath string
MountPath string
}
type AcceleratorConfig struct {
Volumes []AcceleratorVolume
EnvVars []EnvironmentVariableConfig
}
type EnvironmentVariableConfig struct {
Name string
Value string
}
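// --- Illustrative sketch (not part of the repository): constructing the types above. ---
// A minimal example showing how MSJobSpec, MSReplicaSpec and TerminationPolicySpec fit
// together: one MASTER replica whose pod template carries the required "mindspore"
// container, and a termination policy that names that replica. The package name and
// the int32Ptr helper are assumptions made only for this sketch.
package main

import (
    "fmt"

    msv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
    corev1 "k8s.io/api/core/v1"
)

// int32Ptr is a small helper for the pointer fields on MSReplicaSpec.
func int32Ptr(i int32) *int32 { return &i }

func exampleSpec() msv1.MSJobSpec {
    return msv1.MSJobSpec{
        ReplicaSpecs: []*msv1.MSReplicaSpec{
            {
                Replicas:      int32Ptr(1),
                MasterPort:    int32Ptr(23456),
                MSReplicaType: msv1.MASTER,
                Template: &corev1.PodTemplateSpec{
                    Spec: corev1.PodSpec{
                        Containers: []corev1.Container{
                            // The container must be named DefaultMSContainer to pass validation.
                            {Name: msv1.DefaultMSContainer, Image: msv1.DefaultMSImage},
                        },
                    },
                },
            },
        },
        TerminationPolicy: &msv1.TerminationPolicySpec{
            Master: &msv1.MasterSpec{ReplicaName: string(msv1.MASTER), ReplicaRank: 0},
        },
    }
}

func main() {
    fmt.Printf("%+v\n", exampleSpec())
}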
// +build !ignore_autogenerated
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by deepcopy-gen. DO NOT EDIT.
package v1
import (
corev1 "k8s.io/api/core/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
)
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *AcceleratorConfig) DeepCopyInto(out *AcceleratorConfig) {
*out = *in
if in.Volumes != nil {
in, out := &in.Volumes, &out.Volumes
*out = make([]AcceleratorVolume, len(*in))
copy(*out, *in)
}
if in.EnvVars != nil {
in, out := &in.EnvVars, &out.EnvVars
*out = make([]EnvironmentVariableConfig, len(*in))
copy(*out, *in)
}
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new AcceleratorConfig.
func (in *AcceleratorConfig) DeepCopy() *AcceleratorConfig {
if in == nil {
return nil
}
out := new(AcceleratorConfig)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *AcceleratorVolume) DeepCopyInto(out *AcceleratorVolume) {
*out = *in
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new AcceleratorVolume.
func (in *AcceleratorVolume) DeepCopy() *AcceleratorVolume {
if in == nil {
return nil
}
out := new(AcceleratorVolume)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ControllerConfig) DeepCopyInto(out *ControllerConfig) {
*out = *in
if in.Accelerators != nil {
in, out := &in.Accelerators, &out.Accelerators
*out = make(map[string]AcceleratorConfig, len(*in))
for key, val := range *in {
(*out)[key] = *val.DeepCopy()
}
}
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ControllerConfig.
func (in *ControllerConfig) DeepCopy() *ControllerConfig {
if in == nil {
return nil
}
out := new(ControllerConfig)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *EnvironmentVariableConfig) DeepCopyInto(out *EnvironmentVariableConfig) {
*out = *in
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new EnvironmentVariableConfig.
func (in *EnvironmentVariableConfig) DeepCopy() *EnvironmentVariableConfig {
if in == nil {
return nil
}
out := new(EnvironmentVariableConfig)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *MSJob) DeepCopyInto(out *MSJob) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ObjectMeta.DeepCopyInto(&out.ObjectMeta)
in.Spec.DeepCopyInto(&out.Spec)
in.Status.DeepCopyInto(&out.Status)
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new MSJob.
func (in *MSJob) DeepCopy() *MSJob {
if in == nil {
return nil
}
out := new(MSJob)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *MSJob) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *MSJobList) DeepCopyInto(out *MSJobList) {
*out = *in
out.TypeMeta = in.TypeMeta
in.ListMeta.DeepCopyInto(&out.ListMeta)
if in.Items != nil {
in, out := &in.Items, &out.Items
*out = make([]MSJob, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new MSJobList.
func (in *MSJobList) DeepCopy() *MSJobList {
if in == nil {
return nil
}
out := new(MSJobList)
in.DeepCopyInto(out)
return out
}
// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (in *MSJobList) DeepCopyObject() runtime.Object {
if c := in.DeepCopy(); c != nil {
return c
}
return nil
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *MSJobSpec) DeepCopyInto(out *MSJobSpec) {
*out = *in
if in.ReplicaSpecs != nil {
in, out := &in.ReplicaSpecs, &out.ReplicaSpecs
*out = make([]*MSReplicaSpec, len(*in))
for i := range *in {
if (*in)[i] != nil {
in, out := &(*in)[i], &(*out)[i]
*out = new(MSReplicaSpec)
(*in).DeepCopyInto(*out)
}
}
}
if in.TerminationPolicy != nil {
in, out := &in.TerminationPolicy, &out.TerminationPolicy
*out = new(TerminationPolicySpec)
(*in).DeepCopyInto(*out)
}
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new MSJobSpec.
func (in *MSJobSpec) DeepCopy() *MSJobSpec {
if in == nil {
return nil
}
out := new(MSJobSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *MSJobStatus) DeepCopyInto(out *MSJobStatus) {
*out = *in
if in.ReplicaStatuses != nil {
in, out := &in.ReplicaStatuses, &out.ReplicaStatuses
*out = make([]*MSReplicaStatus, len(*in))
for i := range *in {
if (*in)[i] != nil {
in, out := &(*in)[i], &(*out)[i]
*out = new(MSReplicaStatus)
(*in).DeepCopyInto(*out)
}
}
}
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new MSJobStatus.
func (in *MSJobStatus) DeepCopy() *MSJobStatus {
if in == nil {
return nil
}
out := new(MSJobStatus)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *MSReplicaSpec) DeepCopyInto(out *MSReplicaSpec) {
*out = *in
if in.Replicas != nil {
in, out := &in.Replicas, &out.Replicas
*out = new(int32)
**out = **in
}
if in.Template != nil {
in, out := &in.Template, &out.Template
*out = new(corev1.PodTemplateSpec)
(*in).DeepCopyInto(*out)
}
if in.MasterPort != nil {
in, out := &in.MasterPort, &out.MasterPort
*out = new(int32)
**out = **in
}
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new MSReplicaSpec.
func (in *MSReplicaSpec) DeepCopy() *MSReplicaSpec {
if in == nil {
return nil
}
out := new(MSReplicaSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *MSReplicaStatus) DeepCopyInto(out *MSReplicaStatus) {
*out = *in
if in.ReplicasStates != nil {
in, out := &in.ReplicasStates, &out.ReplicasStates
*out = make(map[ReplicaState]int, len(*in))
for key, val := range *in {
(*out)[key] = val
}
}
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new MSReplicaStatus.
func (in *MSReplicaStatus) DeepCopy() *MSReplicaStatus {
if in == nil {
return nil
}
out := new(MSReplicaStatus)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *MasterSpec) DeepCopyInto(out *MasterSpec) {
*out = *in
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new MasterSpec.
func (in *MasterSpec) DeepCopy() *MasterSpec {
if in == nil {
return nil
}
out := new(MasterSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *TerminationPolicySpec) DeepCopyInto(out *TerminationPolicySpec) {
*out = *in
if in.Master != nil {
in, out := &in.Master, &out.Master
*out = new(MasterSpec)
**out = **in
}
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new TerminationPolicySpec.
func (in *TerminationPolicySpec) DeepCopy() *TerminationPolicySpec {
if in == nil {
return nil
}
out := new(TerminationPolicySpec)
in.DeepCopyInto(out)
return out
}
// +build !ignore_autogenerated
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by defaulter-gen. DO NOT EDIT.
package v1
import (
runtime "k8s.io/apimachinery/pkg/runtime"
)
// RegisterDefaults adds defaulters functions to the given scheme.
// Public to allow building arbitrary schemes.
// All generated defaulters are covering - they call all nested defaulters.
func RegisterDefaults(scheme *runtime.Scheme) error {
scheme.AddTypeDefaultingFunc(&MSJob{}, func(obj interface{}) { SetObjectDefaults_MSJob(obj.(*MSJob)) })
scheme.AddTypeDefaultingFunc(&MSJobList{}, func(obj interface{}) { SetObjectDefaults_MSJobList(obj.(*MSJobList)) })
return nil
}
func SetObjectDefaults_MSJob(in *MSJob) {
SetDefaults_MSJob(in)
}
func SetObjectDefaults_MSJobList(in *MSJobList) {
for i := range in.Items {
a := &in.Items[i]
SetObjectDefaults_MSJob(a)
}
}
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package validation
import (
"errors"
"fmt"
msv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
"gitee.com/mindspore/ms-operator/pkg/util"
)
// ValidateMSJobSpec checks that the MSJobSpec is valid.
func ValidateMSJobSpec(c *msv1.MSJobSpec) error {
if c.TerminationPolicy == nil || c.TerminationPolicy.Master == nil {
return fmt.Errorf("invalid termination policy: %v", c.TerminationPolicy)
}
masterExists := false
// Check that each replica has a MS container and a master.
for _, r := range c.ReplicaSpecs {
found := false
if r.Template == nil {
return fmt.Errorf("Replica is missing Template; %v", util.Pformat(r))
}
if r.MSReplicaType == msv1.MSReplicaType(c.TerminationPolicy.Master.ReplicaName) {
masterExists = true
}
if r.MasterPort == nil {
return errors.New("MSReplicaSpec.MasterPort can't be nil.")
}
// Make sure the replica type is valid.
validReplicaTypes := []msv1.MSReplicaType{msv1.MASTER, msv1.WORKER}
isValidReplicaType := false
for _, t := range validReplicaTypes {
if t == r.MSReplicaType {
isValidReplicaType = true
break
}
}
if !isValidReplicaType {
return fmt.Errorf("tfReplicaSpec.MSReplicaType is %v but must be one of %v", r.MSReplicaType, validReplicaTypes)
}
for _, c := range r.Template.Spec.Containers {
if c.Name == msv1.DefaultMSContainer {
found = true
break
}
}
if !found {
return fmt.Errorf("Replica type %v is missing a container named %s", r.MSReplicaType, msv1.DefaultMSContainer)
}
}
if !masterExists {
return fmt.Errorf("Missing ReplicaSpec for master: %v", c.TerminationPolicy.Master.ReplicaName)
}
return nil
}
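// --- Illustrative sketch (not part of the repository): exercising ValidateMSJobSpec above. ---
// A spec with no termination policy is rejected before any replica checks run. The import
// path of the validation package is assumed from the repository layout.
package main

import (
    "fmt"

    msv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
    "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/validation" // assumed path
)

func main() {
    spec := &msv1.MSJobSpec{} // no TerminationPolicy, no replicas
    if err := validation.ValidateMSJobSpec(spec); err != nil {
        fmt.Println("spec rejected:", err)
    }
}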
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
package versioned
import (
kubeflowv1 "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/typed/mindspore/v1"
discovery "k8s.io/client-go/discovery"
rest "k8s.io/client-go/rest"
flowcontrol "k8s.io/client-go/util/flowcontrol"
)
type Interface interface {
Discovery() discovery.DiscoveryInterface
KubeflowV1() kubeflowv1.KubeflowV1Interface
// Deprecated: please explicitly pick a version if possible.
Kubeflow() kubeflowv1.KubeflowV1Interface
}
// Clientset contains the clients for groups. Each group has exactly one
// version included in a Clientset.
type Clientset struct {
*discovery.DiscoveryClient
kubeflowV1 *kubeflowv1.KubeflowV1Client
}
// KubeflowV1 retrieves the KubeflowV1Client
func (c *Clientset) KubeflowV1() kubeflowv1.KubeflowV1Interface {
return c.kubeflowV1
}
// Deprecated: Kubeflow retrieves the default version of KubeflowClient.
// Please explicitly pick a version.
func (c *Clientset) Kubeflow() kubeflowv1.KubeflowV1Interface {
return c.kubeflowV1
}
// Discovery retrieves the DiscoveryClient
func (c *Clientset) Discovery() discovery.DiscoveryInterface {
if c == nil {
return nil
}
return c.DiscoveryClient
}
// NewForConfig creates a new Clientset for the given config.
func NewForConfig(c *rest.Config) (*Clientset, error) {
configShallowCopy := *c
if configShallowCopy.RateLimiter == nil && configShallowCopy.QPS > 0 {
configShallowCopy.RateLimiter = flowcontrol.NewTokenBucketRateLimiter(configShallowCopy.QPS, configShallowCopy.Burst)
}
var cs Clientset
var err error
cs.kubeflowV1, err = kubeflowv1.NewForConfig(&configShallowCopy)
if err != nil {
return nil, err
}
cs.DiscoveryClient, err = discovery.NewDiscoveryClientForConfig(&configShallowCopy)
if err != nil {
return nil, err
}
return &cs, nil
}
// NewForConfigOrDie creates a new Clientset for the given config and
// panics if there is an error in the config.
func NewForConfigOrDie(c *rest.Config) *Clientset {
var cs Clientset
cs.kubeflowV1 = kubeflowv1.NewForConfigOrDie(c)
cs.DiscoveryClient = discovery.NewDiscoveryClientForConfigOrDie(c)
return &cs
}
// New creates a new Clientset for the given RESTClient.
func New(c rest.Interface) *Clientset {
var cs Clientset
cs.kubeflowV1 = kubeflowv1.New(c)
cs.DiscoveryClient = discovery.NewDiscoveryClient(c)
return &cs
}
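// --- Illustrative sketch (not part of the repository): using the generated clientset above. ---
// Load a kubeconfig, build the Clientset with NewForConfig, and list MSJobs in a namespace.
// The kubeconfig path and the "default" namespace are assumptions for this sketch.
package main

import (
    "fmt"

    versioned "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config") // hypothetical path
    if err != nil {
        panic(err)
    }
    cs, err := versioned.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    jobs, err := cs.KubeflowV1().MSJobs("default").List(metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, j := range jobs.Items {
        fmt.Println(j.Name, j.Status.Phase)
    }
}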
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
// This package has the automatically generated clientset.
package versioned
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
package fake
import (
clientset "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
kubeflowv1 "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/typed/mindspore/v1"
fakekubeflowv1 "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/typed/mindspore/v1/fake"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/watch"
"k8s.io/client-go/discovery"
fakediscovery "k8s.io/client-go/discovery/fake"
"k8s.io/client-go/testing"
)
// NewSimpleClientset returns a clientset that will respond with the provided objects.
// It's backed by a very simple object tracker that processes creates, updates and deletions as-is,
// without applying any validations and/or defaults. It shouldn't be considered a replacement
// for a real clientset and is mostly useful in simple unit tests.
func NewSimpleClientset(objects ...runtime.Object) *Clientset {
o := testing.NewObjectTracker(scheme, codecs.UniversalDecoder())
for _, obj := range objects {
if err := o.Add(obj); err != nil {
panic(err)
}
}
cs := &Clientset{}
cs.discovery = &fakediscovery.FakeDiscovery{Fake: &cs.Fake}
cs.AddReactor("*", "*", testing.ObjectReaction(o))
cs.AddWatchReactor("*", func(action testing.Action) (handled bool, ret watch.Interface, err error) {
gvr := action.GetResource()
ns := action.GetNamespace()
watch, err := o.Watch(gvr, ns)
if err != nil {
return false, nil, err
}
return true, watch, nil
})
return cs
}
// Clientset implements clientset.Interface. Meant to be embedded into a
// struct to get a default implementation. This makes faking out just the method
// you want to test easier.
type Clientset struct {
testing.Fake
discovery *fakediscovery.FakeDiscovery
}
func (c *Clientset) Discovery() discovery.DiscoveryInterface {
return c.discovery
}
var _ clientset.Interface = &Clientset{}
// KubeflowV1 retrieves the KubeflowV1Client
func (c *Clientset) KubeflowV1() kubeflowv1.KubeflowV1Interface {
return &fakekubeflowv1.FakeKubeflowV1{Fake: &c.Fake}
}
// Kubeflow retrieves the KubeflowV1Client
func (c *Clientset) Kubeflow() kubeflowv1.KubeflowV1Interface {
return &fakekubeflowv1.FakeKubeflowV1{Fake: &c.Fake}
}
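// --- Illustrative sketch (not part of the repository): unit-testing against the fake clientset above. ---
// NewSimpleClientset is seeded with an MSJob, then the object is read back through the typed
// client. The test file name, package name and import path of the fake package are assumed
// from the generated layout.
package fake_test

import (
    "testing"

    mindsporev1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
    "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/fake" // assumed path
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func TestFakeMSJobGet(t *testing.T) {
    seed := &mindsporev1.MSJob{
        ObjectMeta: metav1.ObjectMeta{Name: "demo", Namespace: "default"},
    }
    cs := fake.NewSimpleClientset(seed)
    got, err := cs.KubeflowV1().MSJobs("default").Get("demo", metav1.GetOptions{})
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if got.Name != "demo" {
        t.Fatalf("got %q, want %q", got.Name, "demo")
    }
}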
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
// This package has the automatically generated fake clientset.
package fake
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
package fake
import (
kubeflowv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
schema "k8s.io/apimachinery/pkg/runtime/schema"
serializer "k8s.io/apimachinery/pkg/runtime/serializer"
)
var scheme = runtime.NewScheme()
var codecs = serializer.NewCodecFactory(scheme)
var parameterCodec = runtime.NewParameterCodec(scheme)
func init() {
v1.AddToGroupVersion(scheme, schema.GroupVersion{Version: "v1"})
AddToScheme(scheme)
}
// AddToScheme adds all types of this clientset into the given scheme. This allows composition
// of clientsets, like in:
//
// import (
// "k8s.io/client-go/kubernetes"
// clientsetscheme "k8s.io/client-go/kubernetes/scheme"
// aggregatorclientsetscheme "k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset/scheme"
// )
//
// kclientset, _ := kubernetes.NewForConfig(c)
// aggregatorclientsetscheme.AddToScheme(clientsetscheme.Scheme)
//
// After this, RawExtensions in Kubernetes types will serialize kube-aggregator types
// correctly.
func AddToScheme(scheme *runtime.Scheme) {
kubeflowv1.AddToScheme(scheme)
}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
// This package contains the scheme of the automatically generated clientset.
package scheme
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
package scheme
import (
kubeflowv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
schema "k8s.io/apimachinery/pkg/runtime/schema"
serializer "k8s.io/apimachinery/pkg/runtime/serializer"
)
var Scheme = runtime.NewScheme()
var Codecs = serializer.NewCodecFactory(Scheme)
var ParameterCodec = runtime.NewParameterCodec(Scheme)
func init() {
v1.AddToGroupVersion(Scheme, schema.GroupVersion{Version: "v1"})
AddToScheme(Scheme)
}
// AddToScheme adds all types of this clientset into the given scheme. This allows composition
// of clientsets, like in:
//
// import (
// "k8s.io/client-go/kubernetes"
// clientsetscheme "k8s.io/client-go/kubernetes/scheme"
// aggregatorclientsetscheme "k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset/scheme"
// )
//
// kclientset, _ := kubernetes.NewForConfig(c)
// aggregatorclientsetscheme.AddToScheme(clientsetscheme.Scheme)
//
// After this, RawExtensions in Kubernetes types will serialize kube-aggregator types
// correctly.
func AddToScheme(scheme *runtime.Scheme) {
kubeflowv1.AddToScheme(scheme)
}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
// This package has the automatically generated typed clients.
package v1
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
// Package fake has the automatically generated clients.
package fake
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
package fake
import (
v1 "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/typed/mindspore/v1"
rest "k8s.io/client-go/rest"
testing "k8s.io/client-go/testing"
)
type FakeKubeflowV1 struct {
*testing.Fake
}
func (c *FakeKubeflowV1) MSJobs(namespace string) v1.MSJobInterface {
return &FakeMSJobs{c, namespace}
}
// RESTClient returns a RESTClient that is used to communicate
// with the API server by this client implementation.
func (c *FakeKubeflowV1) RESTClient() rest.Interface {
var ret *rest.RESTClient
return ret
}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
package fake
import (
mindsporev1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
labels "k8s.io/apimachinery/pkg/labels"
schema "k8s.io/apimachinery/pkg/runtime/schema"
types "k8s.io/apimachinery/pkg/types"
watch "k8s.io/apimachinery/pkg/watch"
testing "k8s.io/client-go/testing"
)
// FakeMSJobs implements MSJobInterface
type FakeMSJobs struct {
Fake *FakeKubeflowV1
ns string
}
var msjobsResource = schema.GroupVersionResource{Group: "kubeflow.org", Version: "v1", Resource: "msjobs"}
var msjobsKind = schema.GroupVersionKind{Group: "kubeflow.org", Version: "v1", Kind: "MSJob"}
// Get takes name of the mSJob, and returns the corresponding mSJob object, and an error if there is any.
func (c *FakeMSJobs) Get(name string, options v1.GetOptions) (result *mindsporev1.MSJob, err error) {
obj, err := c.Fake.
Invokes(testing.NewGetAction(msjobsResource, c.ns, name), &mindsporev1.MSJob{})
if obj == nil {
return nil, err
}
return obj.(*mindsporev1.MSJob), err
}
// List takes label and field selectors, and returns the list of MSJobs that match those selectors.
func (c *FakeMSJobs) List(opts v1.ListOptions) (result *mindsporev1.MSJobList, err error) {
obj, err := c.Fake.
Invokes(testing.NewListAction(msjobsResource, msjobsKind, c.ns, opts), &mindsporev1.MSJobList{})
if obj == nil {
return nil, err
}
label, _, _ := testing.ExtractFromListOptions(opts)
if label == nil {
label = labels.Everything()
}
list := &mindsporev1.MSJobList{ListMeta: obj.(*mindsporev1.MSJobList).ListMeta}
for _, item := range obj.(*mindsporev1.MSJobList).Items {
if label.Matches(labels.Set(item.Labels)) {
list.Items = append(list.Items, item)
}
}
return list, err
}
// Watch returns a watch.Interface that watches the requested mSJobs.
func (c *FakeMSJobs) Watch(opts v1.ListOptions) (watch.Interface, error) {
return c.Fake.
InvokesWatch(testing.NewWatchAction(msjobsResource, c.ns, opts))
}
// Create takes the representation of a mSJob and creates it. Returns the server's representation of the mSJob, and an error, if there is any.
func (c *FakeMSJobs) Create(mSJob *mindsporev1.MSJob) (result *mindsporev1.MSJob, err error) {
obj, err := c.Fake.
Invokes(testing.NewCreateAction(msjobsResource, c.ns, mSJob), &mindsporev1.MSJob{})
if obj == nil {
return nil, err
}
return obj.(*mindsporev1.MSJob), err
}
// Update takes the representation of a mSJob and updates it. Returns the server's representation of the mSJob, and an error, if there is any.
func (c *FakeMSJobs) Update(mSJob *mindsporev1.MSJob) (result *mindsporev1.MSJob, err error) {
obj, err := c.Fake.
Invokes(testing.NewUpdateAction(msjobsResource, c.ns, mSJob), &mindsporev1.MSJob{})
if obj == nil {
return nil, err
}
return obj.(*mindsporev1.MSJob), err
}
// Delete takes name of the mSJob and deletes it. Returns an error if one occurs.
func (c *FakeMSJobs) Delete(name string, options *v1.DeleteOptions) error {
_, err := c.Fake.
Invokes(testing.NewDeleteAction(msjobsResource, c.ns, name), &mindsporev1.MSJob{})
return err
}
// DeleteCollection deletes a collection of objects.
func (c *FakeMSJobs) DeleteCollection(options *v1.DeleteOptions, listOptions v1.ListOptions) error {
action := testing.NewDeleteCollectionAction(msjobsResource, c.ns, listOptions)
_, err := c.Fake.Invokes(action, &mindsporev1.MSJobList{})
return err
}
// Patch applies the patch and returns the patched mSJob.
func (c *FakeMSJobs) Patch(name string, pt types.PatchType, data []byte, subresources ...string) (result *mindsporev1.MSJob, err error) {
obj, err := c.Fake.
Invokes(testing.NewPatchSubresourceAction(msjobsResource, c.ns, name, data, subresources...), &mindsporev1.MSJob{})
if obj == nil {
return nil, err
}
return obj.(*mindsporev1.MSJob), err
}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
package v1
type MSJobExpansion interface{}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
package v1
import (
v1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
"gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/scheme"
// serializer "k8s.io/apimachinery/pkg/runtime/serializer"
rest "k8s.io/client-go/rest"
)
type KubeflowV1Interface interface {
RESTClient() rest.Interface
MSJobsGetter
}
// KubeflowV1Client is used to interact with features provided by the kubeflow.org group.
type KubeflowV1Client struct {
restClient rest.Interface
}
func (c *KubeflowV1Client) MSJobs(namespace string) MSJobInterface {
return newMSJobs(c, namespace)
}
// NewForConfig creates a new KubeflowV1Client for the given config.
func NewForConfig(c *rest.Config) (*KubeflowV1Client, error) {
config := *c
if err := setConfigDefaults(&config); err != nil {
return nil, err
}
client, err := rest.RESTClientFor(&config)
if err != nil {
return nil, err
}
return &KubeflowV1Client{client}, nil
}
// NewForConfigOrDie creates a new KubeflowV1Client for the given config and
// panics if there is an error in the config.
func NewForConfigOrDie(c *rest.Config) *KubeflowV1Client {
client, err := NewForConfig(c)
if err != nil {
panic(err)
}
return client
}
// New creates a new KubeflowV1Client for the given RESTClient.
func New(c rest.Interface) *KubeflowV1Client {
return &KubeflowV1Client{c}
}
func setConfigDefaults(config *rest.Config) error {
gv := v1.SchemeGroupVersion
config.GroupVersion = &gv
config.APIPath = "/apis"
config.NegotiatedSerializer = scheme.Codecs.WithoutConversion()
if config.UserAgent == "" {
config.UserAgent = rest.DefaultKubernetesUserAgent()
}
return nil
}
// RESTClient returns a RESTClient that is used to communicate
// with the API server by this client implementation.
func (c *KubeflowV1Client) RESTClient() rest.Interface {
if c == nil {
return nil
}
return c.restClient
}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by client-gen. DO NOT EDIT.
package v1
import (
v1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
scheme "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/scheme"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
types "k8s.io/apimachinery/pkg/types"
watch "k8s.io/apimachinery/pkg/watch"
rest "k8s.io/client-go/rest"
)
// MSJobsGetter has a method to return a MSJobInterface.
// A group's client should implement this interface.
type MSJobsGetter interface {
MSJobs(namespace string) MSJobInterface
}
// MSJobInterface has methods to work with MSJob resources.
type MSJobInterface interface {
Create(*v1.MSJob) (*v1.MSJob, error)
Update(*v1.MSJob) (*v1.MSJob, error)
Delete(name string, options *metav1.DeleteOptions) error
DeleteCollection(options *metav1.DeleteOptions, listOptions metav1.ListOptions) error
Get(name string, options metav1.GetOptions) (*v1.MSJob, error)
List(opts metav1.ListOptions) (*v1.MSJobList, error)
Watch(opts metav1.ListOptions) (watch.Interface, error)
Patch(name string, pt types.PatchType, data []byte, subresources ...string) (result *v1.MSJob, err error)
MSJobExpansion
}
// mSJobs implements MSJobInterface
type mSJobs struct {
client rest.Interface
ns string
}
// newMSJobs returns a MSJobs
func newMSJobs(c *KubeflowV1Client, namespace string) *mSJobs {
return &mSJobs{
client: c.RESTClient(),
ns: namespace,
}
}
// Get takes name of the mSJob, and returns the corresponding mSJob object, and an error if there is any.
func (c *mSJobs) Get(name string, options metav1.GetOptions) (result *v1.MSJob, err error) {
result = &v1.MSJob{}
err = c.client.Get().
Namespace(c.ns).
Resource("msjobs").
Name(name).
VersionedParams(&options, scheme.ParameterCodec).
Do().
Into(result)
return
}
// List takes label and field selectors, and returns the list of MSJobs that match those selectors.
func (c *mSJobs) List(opts metav1.ListOptions) (result *v1.MSJobList, err error) {
result = &v1.MSJobList{}
err = c.client.Get().
Namespace(c.ns).
Resource("msjobs").
VersionedParams(&opts, scheme.ParameterCodec).
Do().
Into(result)
return
}
// Watch returns a watch.Interface that watches the requested mSJobs.
func (c *mSJobs) Watch(opts metav1.ListOptions) (watch.Interface, error) {
opts.Watch = true
return c.client.Get().
Namespace(c.ns).
Resource("msjobs").
VersionedParams(&opts, scheme.ParameterCodec).
Watch()
}
// Create takes the representation of a mSJob and creates it. Returns the server's representation of the mSJob, and an error, if there is any.
func (c *mSJobs) Create(mSJob *v1.MSJob) (result *v1.MSJob, err error) {
result = &v1.MSJob{}
err = c.client.Post().
Namespace(c.ns).
Resource("msjobs").
Body(mSJob).
Do().
Into(result)
return
}
// Update takes the representation of a mSJob and updates it. Returns the server's representation of the mSJob, and an error, if there is any.
func (c *mSJobs) Update(mSJob *v1.MSJob) (result *v1.MSJob, err error) {
result = &v1.MSJob{}
err = c.client.Put().
Namespace(c.ns).
Resource("msjobs").
Name(mSJob.Name).
Body(mSJob).
Do().
Into(result)
return
}
// Delete takes name of the mSJob and deletes it. Returns an error if one occurs.
func (c *mSJobs) Delete(name string, options *metav1.DeleteOptions) error {
return c.client.Delete().
Namespace(c.ns).
Resource("msjobs").
Name(name).
Body(options).
Do().
Error()
}
// DeleteCollection deletes a collection of objects.
func (c *mSJobs) DeleteCollection(options *metav1.DeleteOptions, listOptions metav1.ListOptions) error {
return c.client.Delete().
Namespace(c.ns).
Resource("msjobs").
VersionedParams(&listOptions, scheme.ParameterCodec).
Body(options).
Do().
Error()
}
// Patch applies the patch and returns the patched mSJob.
func (c *mSJobs) Patch(name string, pt types.PatchType, data []byte, subresources ...string) (result *v1.MSJob, err error) {
result = &v1.MSJob{}
err = c.client.Patch(pt).
Namespace(c.ns).
Resource("msjobs").
SubResource(subresources...).
Name(name).
Body(data).
Do().
Into(result)
return
}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by informer-gen. DO NOT EDIT.
package externalversions
import (
reflect "reflect"
sync "sync"
time "time"
versioned "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
internalinterfaces "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions/internalinterfaces"
mindspore "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions/mindspore"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
schema "k8s.io/apimachinery/pkg/runtime/schema"
cache "k8s.io/client-go/tools/cache"
)
// SharedInformerOption defines the functional option type for SharedInformerFactory.
type SharedInformerOption func(*sharedInformerFactory) *sharedInformerFactory
type sharedInformerFactory struct {
client versioned.Interface
namespace string
tweakListOptions internalinterfaces.TweakListOptionsFunc
lock sync.Mutex
defaultResync time.Duration
customResync map[reflect.Type]time.Duration
informers map[reflect.Type]cache.SharedIndexInformer
// startedInformers is used for tracking which informers have been started.
// This allows Start() to be called multiple times safely.
startedInformers map[reflect.Type]bool
}
// WithCustomResyncConfig sets a custom resync period for the specified informer types.
func WithCustomResyncConfig(resyncConfig map[v1.Object]time.Duration) SharedInformerOption {
return func(factory *sharedInformerFactory) *sharedInformerFactory {
for k, v := range resyncConfig {
factory.customResync[reflect.TypeOf(k)] = v
}
return factory
}
}
// WithTweakListOptions sets a custom filter on all listers of the configured SharedInformerFactory.
func WithTweakListOptions(tweakListOptions internalinterfaces.TweakListOptionsFunc) SharedInformerOption {
return func(factory *sharedInformerFactory) *sharedInformerFactory {
factory.tweakListOptions = tweakListOptions
return factory
}
}
// WithNamespace limits the SharedInformerFactory to the specified namespace.
func WithNamespace(namespace string) SharedInformerOption {
return func(factory *sharedInformerFactory) *sharedInformerFactory {
factory.namespace = namespace
return factory
}
}
// NewSharedInformerFactory constructs a new instance of sharedInformerFactory for all namespaces.
func NewSharedInformerFactory(client versioned.Interface, defaultResync time.Duration) SharedInformerFactory {
return NewSharedInformerFactoryWithOptions(client, defaultResync)
}
// NewFilteredSharedInformerFactory constructs a new instance of sharedInformerFactory.
// Listers obtained via this SharedInformerFactory will be subject to the same filters
// as specified here.
// Deprecated: Please use NewSharedInformerFactoryWithOptions instead
func NewFilteredSharedInformerFactory(client versioned.Interface, defaultResync time.Duration, namespace string, tweakListOptions internalinterfaces.TweakListOptionsFunc) SharedInformerFactory {
return NewSharedInformerFactoryWithOptions(client, defaultResync, WithNamespace(namespace), WithTweakListOptions(tweakListOptions))
}
// NewSharedInformerFactoryWithOptions constructs a new instance of a SharedInformerFactory with additional options.
func NewSharedInformerFactoryWithOptions(client versioned.Interface, defaultResync time.Duration, options ...SharedInformerOption) SharedInformerFactory {
factory := &sharedInformerFactory{
client: client,
namespace: v1.NamespaceAll,
defaultResync: defaultResync,
informers: make(map[reflect.Type]cache.SharedIndexInformer),
startedInformers: make(map[reflect.Type]bool),
customResync: make(map[reflect.Type]time.Duration),
}
// Apply all options
for _, opt := range options {
factory = opt(factory)
}
return factory
}
// Start initializes all requested informers.
func (f *sharedInformerFactory) Start(stopCh <-chan struct{}) {
f.lock.Lock()
defer f.lock.Unlock()
for informerType, informer := range f.informers {
if !f.startedInformers[informerType] {
go informer.Run(stopCh)
f.startedInformers[informerType] = true
}
}
}
// WaitForCacheSync waits for the caches of all started informers to be synced.
func (f *sharedInformerFactory) WaitForCacheSync(stopCh <-chan struct{}) map[reflect.Type]bool {
informers := func() map[reflect.Type]cache.SharedIndexInformer {
f.lock.Lock()
defer f.lock.Unlock()
informers := map[reflect.Type]cache.SharedIndexInformer{}
for informerType, informer := range f.informers {
if f.startedInformers[informerType] {
informers[informerType] = informer
}
}
return informers
}()
res := map[reflect.Type]bool{}
for informType, informer := range informers {
res[informType] = cache.WaitForCacheSync(stopCh, informer.HasSynced)
}
return res
}
// InformerFor returns the SharedIndexInformer for obj using an internal
// client.
func (f *sharedInformerFactory) InformerFor(obj runtime.Object, newFunc internalinterfaces.NewInformerFunc) cache.SharedIndexInformer {
f.lock.Lock()
defer f.lock.Unlock()
informerType := reflect.TypeOf(obj)
informer, exists := f.informers[informerType]
if exists {
return informer
}
resyncPeriod, exists := f.customResync[informerType]
if !exists {
resyncPeriod = f.defaultResync
}
informer = newFunc(f.client, resyncPeriod)
f.informers[informerType] = informer
return informer
}
// SharedInformerFactory provides shared informers for resources in all known
// API group versions.
type SharedInformerFactory interface {
internalinterfaces.SharedInformerFactory
ForResource(resource schema.GroupVersionResource) (GenericInformer, error)
WaitForCacheSync(stopCh <-chan struct{}) map[reflect.Type]bool
Kubeflow() mindspore.Interface
}
func (f *sharedInformerFactory) Kubeflow() mindspore.Interface {
return mindspore.New(f, f.namespace, f.tweakListOptions)
}
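// --- Illustrative sketch (not part of the repository): using the SharedInformerFactory above. ---
// Build the factory from the versioned clientset, register the MSJob informer, start it,
// wait for the cache to sync, then read jobs back through the lister. The kubeconfig path,
// resync period and namespace are assumptions for this sketch.
package main

import (
    "fmt"
    "time"

    versioned "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
    "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions"
    "k8s.io/apimachinery/pkg/labels"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config") // hypothetical path
    if err != nil {
        panic(err)
    }
    client := versioned.NewForConfigOrDie(cfg)

    factory := externalversions.NewSharedInformerFactory(client, 30*time.Second)
    // Registering the informer before Start ensures the factory actually runs it.
    informer := factory.Kubeflow().V1().MSJobs().Informer()

    stopCh := make(chan struct{})
    defer close(stopCh)
    factory.Start(stopCh)
    factory.WaitForCacheSync(stopCh)

    jobs, err := factory.Kubeflow().V1().MSJobs().Lister().MSJobs("default").List(labels.Everything())
    if err != nil {
        panic(err)
    }
    fmt.Println("cached MSJobs:", len(jobs), "synced:", informer.HasSynced())
}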
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by informer-gen. DO NOT EDIT.
package externalversions
import (
"fmt"
v1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
schema "k8s.io/apimachinery/pkg/runtime/schema"
cache "k8s.io/client-go/tools/cache"
)
// GenericInformer is a type of SharedIndexInformer that locates and delegates to other
// sharedInformers based on type.
type GenericInformer interface {
Informer() cache.SharedIndexInformer
Lister() cache.GenericLister
}
type genericInformer struct {
informer cache.SharedIndexInformer
resource schema.GroupResource
}
// Informer returns the SharedIndexInformer.
func (f *genericInformer) Informer() cache.SharedIndexInformer {
return f.informer
}
// Lister returns the GenericLister.
func (f *genericInformer) Lister() cache.GenericLister {
return cache.NewGenericLister(f.Informer().GetIndexer(), f.resource)
}
// ForResource gives generic access to a shared informer of the matching type
// TODO extend this to unknown resources with a client pool
func (f *sharedInformerFactory) ForResource(resource schema.GroupVersionResource) (GenericInformer, error) {
switch resource {
// Group=kubeflow.org, Version=v1
case v1.SchemeGroupVersion.WithResource("msjobs"):
return &genericInformer{resource: resource.GroupResource(), informer: f.Kubeflow().V1().MSJobs().Informer()}, nil
}
return nil, fmt.Errorf("no informer found for %v", resource)
}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by informer-gen. DO NOT EDIT.
package internalinterfaces
import (
time "time"
versioned "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
cache "k8s.io/client-go/tools/cache"
)
type NewInformerFunc func(versioned.Interface, time.Duration) cache.SharedIndexInformer
// SharedInformerFactory is a small interface that allows adding an informer without an import cycle.
type SharedInformerFactory interface {
Start(stopCh <-chan struct{})
InformerFor(obj runtime.Object, newFunc NewInformerFunc) cache.SharedIndexInformer
}
type TweakListOptionsFunc func(*v1.ListOptions)
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by informer-gen. DO NOT EDIT.
package kubeflow
import (
internalinterfaces "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions/internalinterfaces"
v1 "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions/mindspore/v1"
)
// Interface provides access to each of this group's versions.
type Interface interface {
// V1 provides access to shared informers for resources in V1.
V1() v1.Interface
}
type group struct {
factory internalinterfaces.SharedInformerFactory
namespace string
tweakListOptions internalinterfaces.TweakListOptionsFunc
}
// New returns a new Interface.
func New(f internalinterfaces.SharedInformerFactory, namespace string, tweakListOptions internalinterfaces.TweakListOptionsFunc) Interface {
return &group{factory: f, namespace: namespace, tweakListOptions: tweakListOptions}
}
// V1 returns a new v1.Interface.
func (g *group) V1() v1.Interface {
return v1.New(g.factory, g.namespace, g.tweakListOptions)
}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by informer-gen. DO NOT EDIT.
package v1
import (
internalinterfaces "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions/internalinterfaces"
)
// Interface provides access to all the informers in this group version.
type Interface interface {
// MSJobs returns a MSJobInformer.
MSJobs() MSJobInformer
}
type version struct {
factory internalinterfaces.SharedInformerFactory
namespace string
tweakListOptions internalinterfaces.TweakListOptionsFunc
}
// New returns a new Interface.
func New(f internalinterfaces.SharedInformerFactory, namespace string, tweakListOptions internalinterfaces.TweakListOptionsFunc) Interface {
return &version{factory: f, namespace: namespace, tweakListOptions: tweakListOptions}
}
// MSJobs returns a MSJobInformer.
func (v *version) MSJobs() MSJobInformer {
return &mSJobInformer{factory: v.factory, namespace: v.namespace, tweakListOptions: v.tweakListOptions}
}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by informer-gen. DO NOT EDIT.
package v1
import (
time "time"
mindsporev1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
versioned "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
internalinterfaces "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions/internalinterfaces"
v1 "gitee.com/mindspore/ms-operator/pkg/client/listers/mindspore/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
watch "k8s.io/apimachinery/pkg/watch"
cache "k8s.io/client-go/tools/cache"
)
// MSJobInformer provides access to a shared informer and lister for
// MSJobs.
type MSJobInformer interface {
Informer() cache.SharedIndexInformer
Lister() v1.MSJobLister
}
type mSJobInformer struct {
factory internalinterfaces.SharedInformerFactory
tweakListOptions internalinterfaces.TweakListOptionsFunc
namespace string
}
// NewMSJobInformer constructs a new informer for MSJob type.
// Always prefer using an informer factory to get a shared informer instead of getting an independent
// one. This reduces memory footprint and number of connections to the server.
func NewMSJobInformer(client versioned.Interface, namespace string, resyncPeriod time.Duration, indexers cache.Indexers) cache.SharedIndexInformer {
return NewFilteredMSJobInformer(client, namespace, resyncPeriod, indexers, nil)
}
// NewFilteredMSJobInformer constructs a new informer for MSJob type.
// Always prefer using an informer factory to get a shared informer instead of getting an independent
// one. This reduces memory footprint and number of connections to the server.
func NewFilteredMSJobInformer(client versioned.Interface, namespace string, resyncPeriod time.Duration, indexers cache.Indexers, tweakListOptions internalinterfaces.TweakListOptionsFunc) cache.SharedIndexInformer {
return cache.NewSharedIndexInformer(
&cache.ListWatch{
ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
if tweakListOptions != nil {
tweakListOptions(&options)
}
return client.KubeflowV1().MSJobs(namespace).List(options)
},
WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
if tweakListOptions != nil {
tweakListOptions(&options)
}
return client.KubeflowV1().MSJobs(namespace).Watch(options)
},
},
&mindsporev1.MSJob{},
resyncPeriod,
indexers,
)
}
func (f *mSJobInformer) defaultInformer(client versioned.Interface, resyncPeriod time.Duration) cache.SharedIndexInformer {
return NewFilteredMSJobInformer(client, f.namespace, resyncPeriod, cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc}, f.tweakListOptions)
}
func (f *mSJobInformer) Informer() cache.SharedIndexInformer {
return f.factory.InformerFor(&mindsporev1.MSJob{}, f.defaultInformer)
}
func (f *mSJobInformer) Lister() v1.MSJobLister {
return v1.NewMSJobLister(f.Informer().GetIndexer())
}
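// --- Illustrative sketch (not part of the repository): a filtered MSJob informer. ---
// NewFilteredMSJobInformer above accepts a TweakListOptionsFunc; here the tweak narrows
// list/watch calls to a label selector, which is one way a controller could watch only
// the jobs it owns. The kubeconfig path and the "app=ms-job" selector are assumptions.
package main

import (
    "time"

    versioned "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
    msinformers "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions/mindspore/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config") // hypothetical path
    if err != nil {
        panic(err)
    }
    client := versioned.NewForConfigOrDie(cfg)

    // Only list and watch MSJobs carrying the operator's app label.
    tweak := func(opts *metav1.ListOptions) {
        opts.LabelSelector = "app=ms-job" // illustrative selector
    }
    informer := msinformers.NewFilteredMSJobInformer(
        client, "default", 30*time.Second,
        cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc},
        tweak,
    )

    stopCh := make(chan struct{})
    defer close(stopCh)
    go informer.Run(stopCh)
    cache.WaitForCacheSync(stopCh, informer.HasSynced)
}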
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by lister-gen. DO NOT EDIT.
package v1
// MSJobListerExpansion allows custom methods to be added to
// MSJobLister.
type MSJobListerExpansion interface{}
// MSJobNamespaceListerExpansion allows custom methods to be added to
// MSJobNamespaceLister.
type MSJobNamespaceListerExpansion interface{}
// Copyright 2020 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Code generated by lister-gen. DO NOT EDIT.
package v1
import (
v1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/client-go/tools/cache"
)
// MSJobLister helps list MSJobs.
type MSJobLister interface {
// List lists all MSJobs in the indexer.
List(selector labels.Selector) (ret []*v1.MSJob, err error)
// MSJobs returns an object that can list and get MSJobs.
MSJobs(namespace string) MSJobNamespaceLister
MSJobListerExpansion
}
// mSJobLister implements the MSJobLister interface.
type mSJobLister struct {
indexer cache.Indexer
}
// NewMSJobLister returns a new MSJobLister.
func NewMSJobLister(indexer cache.Indexer) MSJobLister {
return &mSJobLister{indexer: indexer}
}
// List lists all MSJobs in the indexer.
func (s *mSJobLister) List(selector labels.Selector) (ret []*v1.MSJob, err error) {
err = cache.ListAll(s.indexer, selector, func(m interface{}) {
ret = append(ret, m.(*v1.MSJob))
})
return ret, err
}
// MSJobs returns an object that can list and get MSJobs.
func (s *mSJobLister) MSJobs(namespace string) MSJobNamespaceLister {
return mSJobNamespaceLister{indexer: s.indexer, namespace: namespace}
}
// MSJobNamespaceLister helps list and get MSJobs.
type MSJobNamespaceLister interface {
// List lists all MSJobs in the indexer for a given namespace.
List(selector labels.Selector) (ret []*v1.MSJob, err error)
// Get retrieves the MSJob from the indexer for a given namespace and name.
Get(name string) (*v1.MSJob, error)
MSJobNamespaceListerExpansion
}
// mSJobNamespaceLister implements the MSJobNamespaceLister
// interface.
type mSJobNamespaceLister struct {
indexer cache.Indexer
namespace string
}
// List lists all MSJobs in the indexer for a given namespace.
func (s mSJobNamespaceLister) List(selector labels.Selector) (ret []*v1.MSJob, err error) {
err = cache.ListAllByNamespace(s.indexer, s.namespace, selector, func(m interface{}) {
ret = append(ret, m.(*v1.MSJob))
})
return ret, err
}
// Get retrieves the MSJob from the indexer for a given namespace and name.
func (s mSJobNamespaceLister) Get(name string) (*v1.MSJob, error) {
obj, exists, err := s.indexer.GetByKey(s.namespace + "/" + name)
if err != nil {
return nil, err
}
if !exists {
return nil, errors.NewNotFound(v1.Resource("msjob"), name)
}
return obj.(*v1.MSJob), nil
}
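// Example (illustrative sketch, not part of the generated code): reading MSJobs from the
// informer cache through the lister. The namespace, job name and the lister variable are
// placeholders; in practice the lister comes from the shared informer factory.
//
//	all, err := lister.MSJobs("default").List(labels.Everything())
//	job, err := lister.MSJobs("default").Get("example-msjob")
//	if errors.IsNotFound(err) {
//	    // the MSJob is not present in the cache
//	}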
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Package controller provides a Kubernetes controller for the MSJob resource.
package controller
import (
"errors"
"fmt"
"time"
log "github.com/sirupsen/logrus"
"k8s.io/api/core/v1"
apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
"k8s.io/apimachinery/pkg/util/wait"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/kubernetes/scheme"
typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
"k8s.io/client-go/tools/cache"
"k8s.io/client-go/tools/record"
"k8s.io/client-go/util/workqueue"
msv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
msjobclient "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
kubeflowscheme "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/scheme"
informers "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions"
listers "gitee.com/mindspore/ms-operator/pkg/client/listers/mindspore/v1"
"gitee.com/mindspore/ms-operator/pkg/trainer"
)
const (
controllerName = "kubeflow"
)
var (
ErrVersionOutdated = errors.New("requested version is outdated in apiserver")
// IndexerInformer uses a delta queue, therefore for deletes we have to use this
// key function, but it should be just fine for non-delete events.
keyFunc = cache.DeletionHandlingMetaNamespaceKeyFunc
// DefaultJobBackOff is the default backoff period, exported for the e2e test
DefaultJobBackOff = 10 * time.Second
// MaxJobBackOff is the max backoff period, exported for the e2e test
MaxJobBackOff = 360 * time.Second
)
type Controller struct {
KubeClient kubernetes.Interface
APIExtclient apiextensionsclient.Interface
MSJobClient msjobclient.Interface
config msv1.ControllerConfig
jobs map[string]*trainer.TrainingJob
MSJobLister listers.MSJobLister
MSJobSynced cache.InformerSynced
// WorkQueue is a rate limited work queue. This is used to queue work to be
// processed instead of performing it as soon as a change happens. This
// means we can ensure we only process a fixed amount of resources at a
// time, and makes it easy to ensure we are never processing the same item
// simultaneously in two different workers.
WorkQueue workqueue.RateLimitingInterface
// recorder is an event recorder for recording Event resources to the
// Kubernetes API.
recorder record.EventRecorder
syncHandler func(jobKey string) (bool, error)
}
func New(kubeClient kubernetes.Interface, APIExtclient apiextensionsclient.Interface, tfJobClient msjobclient.Interface,
config msv1.ControllerConfig, tfJobInformerFactory informers.SharedInformerFactory) (*Controller, error) {
tfJobInformer := tfJobInformerFactory.Kubeflow().V1().MSJobs()
kubeflowscheme.AddToScheme(scheme.Scheme)
log.Debug("Creating event broadcaster")
eventBroadcaster := record.NewBroadcaster()
eventBroadcaster.StartLogging(log.Infof)
eventBroadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: kubeClient.CoreV1().Events("")})
recorder := eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: controllerName})
controller := &Controller{
KubeClient: kubeClient,
APIExtclient: APIExtclient,
MSJobClient: tfJobClient,
WorkQueue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "MSjobs"),
recorder: recorder,
// TODO(jlewi): What to do about cluster.Cluster?
jobs: make(map[string]*trainer.TrainingJob),
config: config,
}
log.Info("Setting up event handlers")
// Set up an event handler for when MSJob resources change
tfJobInformer.Informer().AddEventHandler(
cache.FilteringResourceEventHandler{
FilterFunc: func(obj interface{}) bool {
switch t := obj.(type) {
case *msv1.MSJob:
log.Debugf("filter tfjob name: %v", t.Name)
return true
default:
return false
}
},
Handler: cache.ResourceEventHandlerFuncs{
AddFunc: controller.enqueueController,
UpdateFunc: func(oldObj, newObj interface{}) {
controller.enqueueController(newObj)
},
DeleteFunc: controller.enqueueController,
},
})
controller.MSJobLister = tfJobInformer.Lister()
controller.MSJobSynced = tfJobInformer.Informer().HasSynced
controller.syncHandler = controller.syncMSJob
return controller, nil
}
// Run will set up the event handlers for types we are interested in, as well
// as syncing informer caches and starting workers. It will block until stopCh
// is closed, at which point it will shutdown the workqueue and wait for
// workers to finish processing their current work items.
func (c *Controller) Run(threadiness int, stopCh <-chan struct{}) error {
defer runtime.HandleCrash()
defer c.WorkQueue.ShutDown()
// Start the informer factories to begin populating the informer caches
log.Info("Starting MSJob controller")
// Wait for the caches to be synced before starting workers
log.Info("Waiting for informer caches to sync")
if ok := cache.WaitForCacheSync(stopCh, c.MSJobSynced); !ok {
return fmt.Errorf("failed to wait for caches to sync")
}
log.Infof("Starting %v workers", threadiness)
// Launch workers to process MSJob resources
for i := 0; i < threadiness; i++ {
go wait.Until(c.runWorker, time.Second, stopCh)
}
log.Info("Started workers")
<-stopCh
log.Info("Shutting down workers")
return nil
}
// runWorker is a long-running function that will continually call the
// processNextWorkItem function in order to read and process a message on the
// workqueue.
func (c *Controller) runWorker() {
for c.processNextWorkItem() {
}
}
// processNextWorkItem will read a single work item off the workqueue and
// attempt to process it, by calling the syncHandler.
func (c *Controller) processNextWorkItem() bool {
key, quit := c.WorkQueue.Get()
if quit {
return false
}
defer c.WorkQueue.Done(key)
forget, err := c.syncHandler(key.(string))
if err == nil {
if forget {
c.WorkQueue.Forget(key)
}
return true
}
utilruntime.HandleError(fmt.Errorf("Error syncing job: %v", err))
c.WorkQueue.AddRateLimited(key)
return true
}
// syncMSJob will sync the job with the given key. This function is not meant to be invoked
// concurrently with the same key.
//
// When a job is completely processed it will return true, indicating that it's OK to forget about this job since
// no more processing will occur for it.
func (c *Controller) syncMSJob(key string) (bool, error) {
startTime := time.Now()
defer func() {
log.Debugf("Finished syncing job %q (%v)", key, time.Since(startTime))
}()
ns, name, err := cache.SplitMetaNamespaceKey(key)
if err != nil {
return false, err
}
if len(ns) == 0 || len(name) == 0 {
return false, fmt.Errorf("invalid job key %q: either namespace or name is missing", key)
}
tfJob, err := c.MSJobLister.MSJobs(ns).Get(name)
if err != nil {
if apierrors.IsNotFound(err) {
log.Debugf("Job has been deleted: %v", key)
return true, nil
}
return false, err
}
// Create a new TrainingJob if there is no TrainingJob stored for it in the jobs map or if the UIDs don't match.
// The UIDs won't match in the event we deleted the job and then recreated the job with the same name.
if cJob, ok := c.jobs[key]; !ok || cJob.UID() != tfJob.UID {
nc, err := trainer.NewJob(c.KubeClient, c.MSJobClient, c.recorder, tfJob, &c.config)
if err != nil {
return false, err
}
c.jobs[key] = nc
}
nc := c.jobs[key]
if err := nc.Reconcile(&c.config); err != nil {
return false, err
}
tfJob, err = c.MSJobClient.KubeflowV1().MSJobs(tfJob.ObjectMeta.Namespace).Get(tfJob.ObjectMeta.Name, metav1.GetOptions{})
if err != nil {
return false, err
}
// TODO(jlewi): This logic will need to change when/if we get rid of phases and move to conditions. In that
// case we should forget about a job when the appropriate condition is reached.
if tfJob.Status.Phase == msv1.MSJobPhaseCleanUp {
return true, nil
}
return false, nil
}
// obj could be an *msv1.MSJob, or a DeletionFinalStateUnknown marker item.
func (c *Controller) enqueueController(obj interface{}) {
key, err := keyFunc(obj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("Couldn't get key for object %+v: %v", obj, err))
return
}
c.WorkQueue.AddRateLimited(key)
}
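// Illustrative wiring sketch (not part of this commit): how a cmd/main package might build the
// clients and informer factory and hand them to New and Run above. The import path of this
// controller package, the resync period and the in-cluster config handling are assumptions made
// for the example.
package main

import (
	"time"

	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"

	msv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
	msjobclient "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
	informers "gitee.com/mindspore/ms-operator/pkg/client/informers/externalversions"
	"gitee.com/mindspore/ms-operator/pkg/controller"
)

func main() {
	cfg, err := rest.InClusterConfig() // out of cluster, clientcmd would be used instead
	if err != nil {
		panic(err)
	}
	kubeClient := kubernetes.NewForConfigOrDie(cfg)
	apiExtClient := apiextensionsclient.NewForConfigOrDie(cfg)
	msJobClient := msjobclient.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(msJobClient, 30*time.Second)
	c, err := controller.New(kubeClient, apiExtClient, msJobClient, msv1.ControllerConfig{}, factory)
	if err != nil {
		panic(err)
	}

	stopCh := make(chan struct{})
	factory.Start(stopCh) // start informers after New has registered its event handlers
	if err := c.Run(2, stopCh); err != nil {
		panic(err)
	}
}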
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package trainer
import (
"fmt"
"strings"
)
// KubernetesLabels represents a set of labels to apply to Kubernetes resources.
type KubernetesLabels map[string]string
// ToSelector converts the labels to a selector matching the labels.
func (l KubernetesLabels) ToSelector() (string, error) {
pieces := make([]string, 0, len(l))
for k, v := range l {
pieces = append(pieces, fmt.Sprintf("%v=%v", k, v))
}
return strings.Join(pieces, ","), nil
}
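// Example (illustrative): rendering a KubernetesLabels set as a selector string, e.g. when
// building the ListOptions used elsewhere in this package. The label values are made up.
//
//	labels := KubernetesLabels{"job_type": "WORKER", "runtime_id": "a1b2"}
//	selector, _ := labels.ToSelector() // e.g. "job_type=WORKER,runtime_id=a1b2" (map iteration order is not guaranteed)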
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package trainer
import (
"encoding/json"
"errors"
"fmt"
"strconv"
"strings"
log "github.com/golang/glog"
"k8s.io/api/core/v1"
k8s_errors "k8s.io/apimachinery/pkg/api/errors"
meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
k8sErrors "k8s.io/apimachinery/pkg/util/errors"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/tools/record"
torchv1alpha1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
"gitee.com/mindspore/ms-operator/pkg/util/k8sutil"
// TODO(jlewi): Rename to apiErrors
"gitee.com/mindspore/ms-operator/pkg/apis/mindspore/helper"
"gitee.com/mindspore/ms-operator/pkg/util"
)
const (
SuccessfulCreateReason = "SuccessfulCreate"
FailedCreateReason = "FailedCreate"
)
// MSReplicaSet is a set of MS processes all acting in the same role (e.g. worker).
type MSReplicaSet struct {
ClientSet kubernetes.Interface
recorder record.EventRecorder
// Job is a pointer to the TrainingJob to which this replica belongs.
Job *TrainingJob
Spec torchv1alpha1.MSReplicaSpec
}
// MSReplicaSetInterface is an interface for managing a set of replicas.
type MSReplicaSetInterface interface {
Create() error
Delete() error
GetStatus() (torchv1alpha1.MSReplicaStatus, error)
}
// MSConfig is a struct representing the distributed training config, modelled on the TensorFlow TF_CONFIG
// convention. It is serialized to JSON and injected as an environment variable that the training processes use to configure themselves.
type MSConfig struct {
Cluster ClusterSpec `json:"cluster"`
Task TaskSpec `json:"task"`
Environment string `json:"environment"`
}
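// For illustration (values made up): the marshalled MSConfig for a WORKER task at index 1 in a
// job with one MASTER and two WORKER replicas, RuntimeId "a1b2" and MasterPort 23456, would look
// roughly like:
//
//	{
//	  "cluster": {"master": ["job-master-a1b2-0:23456"], "worker": ["job-worker-a1b2-0:23456", "job-worker-a1b2-1:23456"]},
//	  "task": {"type": "worker", "index": 1},
//	  "environment": "cloud"
//	}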
func NewMSReplicaSet(clientSet kubernetes.Interface, recorder record.EventRecorder, tfReplicaSpec torchv1alpha1.MSReplicaSpec, job *TrainingJob) (*MSReplicaSet, error) {
if tfReplicaSpec.MSReplicaType == torchv1alpha1.MASTER && *tfReplicaSpec.Replicas != 1 {
return nil, errors.New("The MASTER must have Replicas = 1")
}
if tfReplicaSpec.MasterPort == nil {
return nil, errors.New("tfReplicaSpec.MasterPort can't be nil.")
}
// Make sure the replica type is valid.
validReplicaTypes := []torchv1alpha1.MSReplicaType{torchv1alpha1.MASTER, torchv1alpha1.WORKER}
isValidReplicaType := false
for _, t := range validReplicaTypes {
if t == tfReplicaSpec.MSReplicaType {
isValidReplicaType = true
break
}
}
if !isValidReplicaType {
return nil, fmt.Errorf("tfReplicaSpec.MSReplicaType is %v but must be one of %v", tfReplicaSpec.MSReplicaType, validReplicaTypes)
}
return &MSReplicaSet{
ClientSet: clientSet,
recorder: recorder,
Job: job,
Spec: tfReplicaSpec,
}, nil
}
// Labels returns the labels for this replica set.
func (s *MSReplicaSet) Labels() KubernetesLabels {
return KubernetesLabels(map[string]string{
"kubeflow.org": "",
"job_type": string(s.Spec.MSReplicaType),
// runtime_id is set by Job.setup, which is called after the MSReplicaSet is created.
// this is why labels aren't a member variable.
"runtime_id": s.Job.job.Spec.RuntimeId,
"ms_job_name": s.Job.job.ObjectMeta.Name})
}
func (s *MSReplicaSet) Create(config *torchv1alpha1.ControllerConfig, worldSize int32) error {
// Create services
err := s.SyncServices()
if err != nil {
return err
}
// Create pods
return s.SyncPods(worldSize)
}
// CreateServiceWithIndex will create a new service with the specified index
func (s *MSReplicaSet) CreateServiceWithIndex(index int32) (*v1.Service, error) {
taskLabels := s.Labels()
taskLabels["task_index"] = fmt.Sprintf("%v", index)
// Create the service.
service := &v1.Service{
ObjectMeta: meta_v1.ObjectMeta{
Name: s.genName(index),
Labels: taskLabels,
OwnerReferences: []meta_v1.OwnerReference{
helper.AsOwner(s.Job.job),
},
},
Spec: v1.ServiceSpec{
Selector: taskLabels,
Ports: []v1.ServicePort{
{
Name: "tf-port",
Port: *s.Spec.MasterPort,
},
},
},
}
log.Infof("Creating service: %v", service.ObjectMeta.Name)
return s.ClientSet.CoreV1().Services(s.Job.job.ObjectMeta.Namespace).Create(service)
}
// CreatePodWithIndex will create a new pod with the specified index
func (s *MSReplicaSet) CreatePodWithIndex(index int32, worldSize int32) (*v1.Pod, error) {
taskLabels := s.Labels()
taskLabels["task_index"] = fmt.Sprintf("%v", index)
pod := &v1.Pod{
ObjectMeta: meta_v1.ObjectMeta{
Name: s.genPodName(index),
Labels: taskLabels,
OwnerReferences: []meta_v1.OwnerReference{
helper.AsOwner(s.Job.job),
},
},
Spec: *s.Spec.Template.Spec.DeepCopy(),
}
pod.Spec.SchedulerName = s.Job.SchedulerName()
// Configure the MS distributed environment variables
masterPort := strconv.Itoa(int(*s.Spec.MasterPort))
masterAddr := fmt.Sprintf("%v-%v-%v-%v", fmt.Sprintf("%.40s", s.Job.job.ObjectMeta.Name), "master", s.Job.job.Spec.RuntimeId, 0)
if index == 0 {
masterAddr = "localhost"
}
rank := strconv.Itoa(int(index))
tfConfig := MSConfig{
Cluster: s.Job.ClusterSpec(),
Task: TaskSpec{
Type: strings.ToLower(string(s.Spec.MSReplicaType)),
Index: int(index),
},
// We need to set environment to cloud otherwise it will default to local which isn't what we want.
Environment: "cloud",
}
tfConfigJson, err := json.Marshal(tfConfig)
if err != nil {
log.Errorf("Job: %v serializing tfConfig: %v return error; %v", s.Job.job.ObjectMeta.Name, util.Pformat(tfConfig), err)
return nil, err
}
// TODO(jose5918) Do not need TF_CONFIG but leaving for POC
// Add TF_CONFIG environment variable.
for i := range pod.Spec.Containers {
// We can't take the container from the loop variable because that would be a copy, so our modifications
// wouldn't have any effect.
c := &pod.Spec.Containers[i]
if c.Name != torchv1alpha1.DefaultMSContainer {
continue
}
if len(c.Env) == 0 {
c.Env = make([]v1.EnvVar, 0)
}
c.Env = append(c.Env, v1.EnvVar{
Name: "TF_CONFIG",
Value: string(tfConfigJson),
})
c.Env = append(c.Env, v1.EnvVar{
Name: "MASTER_PORT",
Value: masterPort,
})
c.Env = append(c.Env, v1.EnvVar{
Name: "MASTER_ADDR",
Value: masterAddr,
})
c.Env = append(c.Env, v1.EnvVar{
Name: "WORLD_SIZE",
Value: strconv.Itoa(int(worldSize)),
})
c.Env = append(c.Env, v1.EnvVar{
Name: "RANK",
Value: rank,
})
}
log.Infof("Creating pod: %v", pod.ObjectMeta.Name)
return s.ClientSet.CoreV1().Pods(s.Job.job.ObjectMeta.Namespace).Create(pod)
}
// Delete deletes the replicas
func (s *MSReplicaSet) Delete() error {
selector, err := s.Labels().ToSelector()
if err != nil {
return err
}
failures := false
options := meta_v1.ListOptions{
LabelSelector: selector,
}
log.V(1).Infof("Deleting Jobs namespace=%v selector=%v", s.Job.job.ObjectMeta.Namespace, selector)
err = s.ClientSet.CoreV1().Pods(s.Job.job.ObjectMeta.Namespace).DeleteCollection(&meta_v1.DeleteOptions{}, options)
if err != nil {
log.Errorf("There was a problem deleting the jobs; %v", err)
failures = true
}
// We need to delete the completed pods.
log.Infof("Deleting Pods namespace=%v selector=%v", s.Job.job.ObjectMeta.Namespace, selector)
err = s.ClientSet.CoreV1().Pods(s.Job.job.ObjectMeta.Namespace).DeleteCollection(&meta_v1.DeleteOptions{}, options)
if err != nil {
log.Errorf("There was a problem deleting the pods; %v", err)
failures = true
}
// Services don't support DeleteCollection, so we delete them individually.
// TODO(jlewi): We should check if this has changed with K8s 1.8 or other releases.
for index := int32(0); index < *s.Spec.Replicas; index++ {
log.V(1).Infof("Deleting Service %v:%v", s.Job.job.ObjectMeta.Namespace, s.genName((index)))
err = s.ClientSet.CoreV1().Services(s.Job.job.ObjectMeta.Namespace).Delete(s.genName(index), &meta_v1.DeleteOptions{})
if err != nil {
log.Errorf("Error deleting service %v; %v", s.genName(index), err)
failures = true
}
}
// If the ConfigMap for the default parameter server exists, we delete it
log.Infof("Get ConfigMaps %v:%v", s.Job.job.ObjectMeta.Namespace, s.defaultPSConfigMapName())
_, err = s.ClientSet.CoreV1().ConfigMaps(s.Job.job.ObjectMeta.Namespace).Get(s.defaultPSConfigMapName(), meta_v1.GetOptions{})
if err != nil {
if !k8sutil.IsKubernetesResourceNotFoundError(err) {
log.Errorf("Error deleting ConfigMap %v; %v", s.defaultPSConfigMapName(), err)
failures = true
}
} else {
log.Infof("Delete ConfigMaps %v:%v", s.Job.job.ObjectMeta.Namespace, s.defaultPSConfigMapName())
err = s.ClientSet.CoreV1().ConfigMaps(s.Job.job.ObjectMeta.Namespace).Delete(s.defaultPSConfigMapName(), &meta_v1.DeleteOptions{})
if err != nil {
log.Errorf("There was a problem deleting the ConfigMaps; %v", err)
failures = true
}
}
if failures {
return errors.New("Some of the replicas resources could not be deleted")
}
return nil
}
// replicaStatusFromPodList returns a status from a list of pods for a job.
func replicaStatusFromPodList(l v1.PodList, name string) torchv1alpha1.ReplicaState {
var latest *v1.Pod
// Index into the slice so we don't take the address of the loop variable, which is reused on every iteration.
for idx := range l.Items {
p := &l.Items[idx]
if latest == nil {
latest = p
continue
}
if latest.Status.StartTime.Before(p.Status.StartTime) {
latest = p
}
}
if latest == nil {
return torchv1alpha1.ReplicaStateRunning
}
var tfState v1.ContainerState
for _, i := range latest.Status.ContainerStatuses {
if i.Name != name {
continue
}
// We need to decide whether to use the current state or the previous termination state.
tfState = i.State
// If the container previously terminated we will look at the termination to decide whether it is a retryable
// or permanent error.
if i.LastTerminationState.Terminated != nil {
tfState = i.LastTerminationState
}
}
if tfState.Running != nil || tfState.Waiting != nil {
return torchv1alpha1.ReplicaStateRunning
}
if tfState.Terminated != nil {
if tfState.Terminated.ExitCode == 0 {
return torchv1alpha1.ReplicaStateSucceeded
}
if isRetryableTerminationState(tfState.Terminated) {
// Since it's a retryable error, just return RUNNING.
// We can just let Kubernetes restart the container to retry.
return torchv1alpha1.ReplicaStateRunning
}
return torchv1alpha1.ReplicaStateFailed
}
return torchv1alpha1.ReplicaStateUnknown
}
func (s *MSReplicaSet) GetSingleReplicaStatus(index int32) torchv1alpha1.ReplicaState {
p, err := s.ClientSet.CoreV1().Pods(s.Job.job.ObjectMeta.Namespace).Get(s.genName(index), meta_v1.GetOptions{})
if err != nil {
return torchv1alpha1.ReplicaStateUnknown
}
if v1.PodSucceeded == p.Status.Phase {
return torchv1alpha1.ReplicaStateSucceeded
}
labels := s.Labels()
labels["task_index"] = fmt.Sprintf("%v", index)
selector, err := labels.ToSelector()
if err != nil {
log.Errorf("labels.ToSelector() error; %v", err)
return torchv1alpha1.ReplicaStateFailed
}
// TODO(jlewi): Handle errors. We need to get the pod and look at recent container exits.
l, err := s.ClientSet.CoreV1().Pods(s.Job.job.ObjectMeta.Namespace).List(meta_v1.ListOptions{
// TODO(jlewi): Why isn't the label selector working?
LabelSelector: selector,
})
if err != nil {
// TODO(jlewi): Are there errors that should be treated as retryable errors?
return torchv1alpha1.ReplicaStateFailed
}
status := replicaStatusFromPodList(*l, torchv1alpha1.DefaultMSContainer)
return status
}
// GetStatus returns the status of the replica set.
func (s *MSReplicaSet) GetStatus() (torchv1alpha1.MSReplicaStatus, error) {
status := torchv1alpha1.MSReplicaStatus{
MSReplicaType: s.Spec.MSReplicaType,
State: torchv1alpha1.ReplicaStateUnknown,
ReplicasStates: make(map[torchv1alpha1.ReplicaState]int),
}
increment := func(state torchv1alpha1.ReplicaState) {
v, ok := status.ReplicasStates[state]
if ok {
status.ReplicasStates[state] = v + 1
} else {
status.ReplicasStates[state] = 1
}
}
for index := int32(0); index < *s.Spec.Replicas; index++ {
increment(s.GetSingleReplicaStatus(index))
}
// Determine the overall status for the replica set based on the status of the individual
// replicas.
// If any of the replicas failed mark the set as failed.
if _, ok := status.ReplicasStates[torchv1alpha1.ReplicaStateFailed]; ok {
status.State = torchv1alpha1.ReplicaStateFailed
return status, nil
}
// If any replicas are RUNNING mark it as RUNNING.
if _, ok := status.ReplicasStates[torchv1alpha1.ReplicaStateRunning]; ok {
status.State = torchv1alpha1.ReplicaStateRunning
return status, nil
}
// If all of the replicas succeeded consider it success.
if v, ok := status.ReplicasStates[torchv1alpha1.ReplicaStateSucceeded]; ok && int32(v) == *s.Spec.Replicas {
status.State = torchv1alpha1.ReplicaStateSucceeded
return status, nil
}
return status, nil
}
// SyncPods checks the current pods for this MSReplicaSet and tries to bring them to the desired state.
func (s *MSReplicaSet) SyncPods(worldSize int32) error {
for index := int32(0); index < *s.Spec.Replicas; index++ {
// Label to get all pods of this MSReplicaType + index
labels := s.Labels()
labels["task_index"] = fmt.Sprintf("%v", index)
rank := index
if labels["job_type"] == "WORKER" {
rank = index + 1
}
labels["task_index"] = fmt.Sprintf("%v", rank)
labelSelector, err := labels.ToSelector()
if err != nil {
return err
}
// Filter out the inactive pods
fieldSelector := "status.phase!=" + string(v1.PodFailed)
//",deletionTimestamp!=nil"
options := meta_v1.ListOptions{
LabelSelector: labelSelector,
FieldSelector: fieldSelector,
}
// List the existing pods for this index.
pl, err := s.ClientSet.CoreV1().Pods(s.Job.job.ObjectMeta.Namespace).List(options)
if err != nil {
// TODO: handle this error properly; for now skip this index and let the next sync retry.
log.Errorf("Listing pods for job %v returned error; %v", s.Job.job.ObjectMeta.Name, err)
continue
}
if len(pl.Items) == 0 {
log.Infof("Pod not found, create new one.")
// Create the pod
createdPod, err := s.CreatePodWithIndex(rank, worldSize)
// If the pod already exists do nothing.
if err != nil {
if k8s_errors.IsAlreadyExists(err) {
log.Infof("Pod: %v already exists.", createdPod.ObjectMeta.Name)
continue
}
s.recorder.Eventf(s.Job.job, v1.EventTypeWarning, FailedCreateReason, "Error creating: %v", err)
return k8sErrors.NewAggregate([]error{fmt.Errorf("Creating pod %v returned error.", createdPod.ObjectMeta.Name), err})
}
s.recorder.Eventf(s.Job.job, v1.EventTypeNormal, SuccessfulCreateReason, "Created pod: %v", createdPod.Name)
continue
}
}
return nil
}
// SyncServices checks the current services for this MSReplicaSet and tries to bring them to the desired state.
func (s *MSReplicaSet) SyncServices() error {
for index := int32(0); index < *s.Spec.Replicas; index++ {
_, err := s.ClientSet.CoreV1().Services(s.Job.job.ObjectMeta.Namespace).Get(s.genName(index), meta_v1.GetOptions{})
if err != nil && k8s_errors.IsNotFound(err) {
log.Infof("Service: %v not found, create new one.", s.genName(index))
// Create the service
createdService, err := s.CreateServiceWithIndex(index)
// If the service already exists do nothing.
if err != nil {
if k8s_errors.IsAlreadyExists(err) {
log.Infof("Service: %v already exists.", s.genName(index))
continue
}
s.recorder.Eventf(s.Job.job, v1.EventTypeWarning, FailedCreateReason, "Error creating: %v", err)
return k8sErrors.NewAggregate([]error{fmt.Errorf("Creating Service %v returned error.", createdService.ObjectMeta.Name), err})
}
s.recorder.Eventf(s.Job.job, v1.EventTypeNormal, SuccessfulCreateReason, "Created Service: %v", createdService.Name)
continue
}
if err != nil {
// TODO: handle this error
continue
}
}
return nil
}
func (s *MSReplicaSet) genName(index int32) string {
// Truncate tfjob name to 40 characters
// The whole job name should be compliant with the DNS_LABEL spec, up to a max length of 63 characters
// Thus genName(40 chars)-replicaType(6 chars)-runtimeId(4 chars)-index(4 chars), also leaving some spaces
// See https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/identifiers.md
return fmt.Sprintf("%v-%v-%v-%v", fmt.Sprintf("%.40s", s.Job.job.ObjectMeta.Name), strings.ToLower(string(s.Spec.MSReplicaType)), s.Job.job.Spec.RuntimeId, index)
}
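// Worked example (names made up): for a job named "mnist-train" with RuntimeId "a1b2", the MASTER
// replica at index 0 is named "mnist-train-master-a1b2-0", which stays well within the 63-character
// DNS label limit described above.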
func (s *MSReplicaSet) genPodName(index int32) string {
// Generate a new pod name with random string
return s.genName(index) + "-" + util.RandString(5)
}
func (s *MSReplicaSet) defaultPSConfigMapName() string {
return fmt.Sprintf("cm-ps-%v", s.Job.job.Spec.RuntimeId)
}
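// Worked example (illustrative, values made up): for a job named "mnist-train" with RuntimeId "a1b2",
// MasterPort 23456, one MASTER replica and two WORKER replicas, worldSize is 3, and SyncPods together
// with CreatePodWithIndex above injects, for the worker at index 0 (rank 1):
//
//	MASTER_ADDR = "mnist-train-master-a1b2-0"
//	MASTER_PORT = "23456"
//	WORLD_SIZE  = "3"
//	RANK        = "1"
//
// The master itself (index 0, rank 0) instead gets MASTER_ADDR = "localhost".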
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Package trainer manages MindSpore (MS) training jobs.
package trainer
import (
"fmt"
"reflect"
"strings"
log "github.com/sirupsen/logrus"
"k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/tools/record"
"gitee.com/mindspore/ms-operator/pkg/apis/mindspore/helper"
msv1 "gitee.com/mindspore/ms-operator/pkg/apis/mindspore/v1"
"gitee.com/mindspore/ms-operator/pkg/apis/mindspore/validation"
msclient "gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned"
"gitee.com/mindspore/ms-operator/pkg/client/clientset/versioned/scheme"
"gitee.com/mindspore/ms-operator/pkg/util"
)
// TODO(jlewi): We should switch to a New pattern and make trainingJob private so we can
// ensure correctness on creation.
type TrainingJob struct {
job *msv1.MSJob
KubeCli kubernetes.Interface
recorder record.EventRecorder
Replicas []*MSReplicaSet
msJobClient msclient.Interface
// in memory state of the job.
// status is the source of truth after job struct is materialized. Changes to the status to be persisted
// should be made here.
status msv1.MSJobStatus
memberCounter int
}
// TODO(jose5918): We don't really need the cluster spec for this operator but no harm in leaving it for POC
// ClusterSpec represents a cluster TensorFlow specification.
// https://www.tensorflow.org/deploy/distributed#create_a_tftrainclusterspec_to_describe_the_cluster
// It is a map from job names to network addresses.
type ClusterSpec map[string][]string
type TaskSpec struct {
Type string `json:"type"`
Index int `json:"index"`
}
func initJob(kubeCli kubernetes.Interface, msJobClient msclient.Interface, recorder record.EventRecorder, job *msv1.MSJob) (*TrainingJob, error) {
j := &TrainingJob{
KubeCli: kubeCli,
msJobClient: msJobClient,
recorder: recorder,
Replicas: make([]*MSReplicaSet, 0),
job: job,
status: *job.Status.DeepCopy(),
}
return j, nil
}
func NewJob(kubeCli kubernetes.Interface, msJobClient msclient.Interface, recorder record.EventRecorder, job *msv1.MSJob, config *msv1.ControllerConfig) (*TrainingJob, error) {
j, err := initJob(kubeCli, msJobClient, recorder, job)
if err != nil {
return nil, err
}
return j, nil
}
func (j *TrainingJob) UID() types.UID {
return j.job.ObjectMeta.UID
}
func (j *TrainingJob) ClusterSpec() ClusterSpec {
clusterSpec := make(ClusterSpec)
for _, p := range j.Replicas {
replicaNames := make([]string, 0, *p.Spec.Replicas)
for i := int32(0); i < *p.Spec.Replicas; i++ {
replicaNames = append(replicaNames, fmt.Sprintf("%v:%v", p.genName(i), *p.Spec.MasterPort))
}
clusterSpec[strings.ToLower(string(p.Spec.MSReplicaType))] = replicaNames
}
return clusterSpec
}
// createResources creates all the replicas if requested
func (j *TrainingJob) createResources(config *msv1.ControllerConfig) error {
// TODO(jose5918) Need to figure out where it is best to add worldSize logic
// Get MS worldSize by adding replicas
worldSize := int32(0)
for _, r := range j.Replicas {
worldSize = worldSize + *r.Spec.Replicas
}
for _, r := range j.Replicas {
if err := r.Create(config, worldSize); err != nil {
return err
}
}
return nil
}
// deleteResources deletes the replicas if they were created
func (j *TrainingJob) deleteResources() error {
for _, r := range j.Replicas {
if err := r.Delete(); err != nil {
return err
}
}
return nil
}
func (j *TrainingJob) GetStatus() (msv1.State, []*msv1.MSReplicaStatus, error) {
master := j.job.Spec.TerminationPolicy.Master
masterState := msv1.ReplicaStateUnknown
state := msv1.StateUnknown
replicaStatuses := make([]*msv1.MSReplicaStatus, 0)
// The state for each replica.
// TODO(jlewi): We will need to modify this code if we want to allow multiples of a given type of replica.
replicaSetStates := make(map[msv1.MSReplicaType]msv1.ReplicaState)
for _, r := range j.Replicas {
rStatus, err := r.GetStatus()
if err != nil {
log.Errorf("GetStatus() for %v returned error; %v", r.Spec.MSReplicaType, err)
}
replicaSetStates[r.Spec.MSReplicaType] = rStatus.State
replicaStatuses = append(replicaStatuses, &rStatus)
if string(r.Spec.MSReplicaType) == master.ReplicaName {
masterState = r.GetSingleReplicaStatus(int32(master.ReplicaRank))
}
}
if masterState == msv1.ReplicaStateRunning {
state = msv1.StateRunning
} else if masterState == msv1.ReplicaStateFailed {
state = msv1.StateFailed
} else if masterState == msv1.ReplicaStateSucceeded {
state = msv1.StateSucceeded
}
return state, replicaStatuses, nil
}
// isRetryableTerminationState returns true if a container terminated in a state
// that we consider retryable.
func isRetryableTerminationState(s *v1.ContainerStateTerminated) bool {
// TODO(jlewi): Need to match logic in
// https://cs.corp.google.com/piper///depot/google3/cloud/ml/beta/job/training_job_state_util.cc?l=88
if s.Reason == "OOMKilled" {
// If the user's process causes an OOM and Docker kills the container,
// the termination reason of ContainerState will be specified to
// 'OOMKilled'. In this case, we can't assume this to be a retryable error.
//
// This check should happen before checking the termination log, since
// if the container terminated with an OOM, the termination log may not
// be written.
return false
}
// TODO(jlewi): Should we use the exit code reported in the termination
// log message and not the ExitCode reported by the container.
if s.ExitCode >= 0 && s.ExitCode <= 127 {
// For the exit_code in [0, 127]:
// 0 means success,
// 1 - 127 corresponds to permanent user errors.
// We don't want to retry for both cases.
// More info about exit status can be found in:
// https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html
return false
}
// The remaining cases are exit codes from workers that don't
// fall into [0, 127]. They can be:
// 137, which corresponds to SIGKILL,
// 143, which corresponds to SIGTERM,
// or other values with undefined behavior.
// We treat them as internal errors for now, and all internal errors
// will be retried.
return true
}
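// For example, an ExitCode of 1 (a typical user error) or an "OOMKilled" termination is not retried,
// while an ExitCode of 137 (SIGKILL) or 143 (SIGTERM) is treated as an internal error and the replica
// is simply left in the running state so Kubernetes can restart it.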
func (j *TrainingJob) masterName() string {
return fmt.Sprintf("master-%v-0", j.job.Spec.RuntimeId)
}
// setup sets up the training job.
func (j *TrainingJob) setup(config *msv1.ControllerConfig) {
err := func() error {
// If the job has already started we shouldn't set it up again.
if j.status.Phase != msv1.MSJobPhaseNone {
log.Warningf("Job %v has already been setup.", j.name())
return nil
}
// Set defaults.
scheme.Scheme.Default(j.job)
err := validation.ValidateMSJobSpec(&j.job.Spec)
if err != nil {
return fmt.Errorf("invalid job spec: %v", err)
}
if err := helper.ConfigureAcceleratorsForMSJobSpec(&j.job.Spec, config.Accelerators); err != nil {
return fmt.Errorf("ConfigureAccelerators(...) error; %v", err)
}
if j.job.Spec.RuntimeId == "" {
j.job.Spec.RuntimeId = util.RandString(4)
}
return nil
}()
if err != nil {
j.status.Reason = err.Error()
j.status.Phase = msv1.MSJobPhaseFailed
j.status.State = msv1.StateFailed
} else {
j.status.Phase = msv1.MSJobPhaseCreating
j.status.State = msv1.StateRunning
}
}
// setupReplicas creates the in-memory data structures corresponding to the replicas.
func (j *TrainingJob) setupReplicas() error {
if len(j.Replicas) != len(j.job.Spec.ReplicaSpecs) {
j.Replicas = make([]*MSReplicaSet, 0, len(j.job.Spec.ReplicaSpecs))
for _, t := range j.job.Spec.ReplicaSpecs {
r, err := NewMSReplicaSet(j.KubeCli, j.recorder, *t, j)
if err != nil {
return err
}
j.Replicas = append(j.Replicas, r)
}
}
return nil
}
func (j *TrainingJob) Delete() {
// TODO(jlewi): Delete is what should cause us to delete the Pods.
// we shouldn't delete the pods when the jobs finish because leaving the pods
// allows us to get the logs from the pods after the job finishes.
//
log.Infof("MSJob %v deleted by the user", j.fullname())
// TODO(jlewi): This logic is probably insufficient.
if j.job.Status.Phase != msv1.MSJobPhaseCleanUp {
j.status.Phase = msv1.MSJobPhaseCleanUp
}
// TODO(jlewi): Does it make sense to explicitly delete the resources? Should
// we just rely on K8s garbage collection to delete the resources before
// deleting MSJob?
if cErr := j.deleteResources(); cErr != nil {
log.Errorf("trainingJob.deleteResources() error; %v", cErr)
}
}
// updateCRDStatus updates the job status based on TrainingJob.status.
func (j *TrainingJob) updateCRDStatus() error {
// If the status hasn't changed then there's no reason to update the CRD.
if reflect.DeepEqual(j.job.Status, j.status) {
return nil
}
newJob := j.job
newJob.Status = j.status
// newJob, err := j.msJobClient.mindsporev1().MSJobs(j.job.ObjectMeta.Namespace).Update(newJob)
// if err != nil {
// return err
// }
j.job = newJob
return nil
}
// Reconcile tries to get the job into the desired state.
func (j *TrainingJob) Reconcile(config *msv1.ControllerConfig) error {
if j.job.Status.Phase == msv1.MSJobPhaseNone {
// The job hasn't been setup.
j.setup(config)
if err := j.updateCRDStatus(); err != nil {
log.Warningf("failed to update CRD status: %v", err)
return err
}
}
// setupReplicas initializes data structures inside TrainingJob representing the replicas.
// These are Go structures which aren't persisted in the API server, so we always need to call setupReplicas,
// unlike setup which only needs to be called once during the lifecycle of the job.
if err := j.setupReplicas(); err != nil {
log.Errorf("failed to create replicas: %v", err)
j.status.Reason = fmt.Sprintf("Could not create in memory datastructures; %v", err)
if uErr := j.updateCRDStatus(); uErr != nil {
log.Warningf("Job %v; failed to update status error: %v", j.job.ObjectMeta.Name, uErr)
}
return err
}
// TODO(jlewi): Can we determine from the CRD status whether we should
// Create the resources or not? We need to ensure the resources exist so for
// now we always call Create.
if j.job.Status.Phase == msv1.MSJobPhaseCreating || j.job.Status.Phase == msv1.MSJobPhaseRunning {
// We call Create to make sure all the resources exist and are running.
if cErr := j.createResources(config); cErr != nil {
// TODO(jlewi): Should we eventually give up and mark the job as failed if we can't create the resources?
j.status.Reason = fmt.Sprintf("Could not create job resources; %v", cErr)
if err := j.updateCRDStatus(); err != nil {
log.Warningf("Job %v; failed to update status error: %v", j.job.ObjectMeta.Name, err)
return err
}
log.Errorf("trainingJobCreateReplicas() error; %v", cErr)
return cErr
}
state, replicaStatuses, err := j.GetStatus()
j.status.ReplicaStatuses = replicaStatuses
if err != nil {
log.Errorf("GetStatus() for job %v returned error: %v", j.job.ObjectMeta.Name, err)
return err
}
// TODO(jlewi): We should update the Phase if we detect the job is done.
if state == msv1.StateFailed {
log.Errorf("Master failed Job: %v.", j.job.ObjectMeta.Name)
j.status.Phase = msv1.MSJobPhaseDone
j.status.State = msv1.StateFailed
} else if state == msv1.StateSucceeded {
log.Infof("Master succeeded Job: %v.", j.job.ObjectMeta.Name)
j.status.Phase = msv1.MSJobPhaseDone
j.status.State = msv1.StateSucceeded
} else {
log.Infof("Job %v status=%v", j.job.ObjectMeta.Name, util.Pformat(j.status))
}
}
// TODO(jose5918) Need to figure out where it is best to add worldSize logic
// Get MS worldSize by adding replicas
worldSize := int32(0)
for _, r := range j.Replicas {
worldSize = worldSize + *r.Spec.Replicas
}
// sync pods
for _, rc := range j.Replicas {
err := rc.SyncPods(worldSize)
if err != nil {
log.Errorf("SyncPods error: %v", err)
}
}
// sync services
for _, rc := range j.Replicas {
err := rc.SyncServices()
if err != nil {
log.Errorf("SyncServices error: %v", err)
}
}
// If the phase changed we should update the CRD.
if err := j.updateCRDStatus(); err != nil {
log.Warningf("Job %v, failed to update CRD status error: %v", j.job.ObjectMeta.Name, err)
return err
}
if j.job.Status.Phase == msv1.MSJobPhaseCleanUp {
if cErr := j.deleteResources(); cErr != nil {
log.Errorf("Job %v trainingJob.Delete() error; %v", j.job.ObjectMeta.Name, cErr)
}
// j.status.SetPhase(spec.MSJobPhaseDone)
// Return from run because we want to stop reconciling the object.
return nil
}
// updateCRDStatus will update the status of the CRD from j.status if j.status
// doesn't match j.job.Status. So you can change j.status in order to propagate
// changes to the CRD status.
if err := j.updateCRDStatus(); err != nil {
log.Warningf("Job %v; failed to update CRD status error: %v", j.job.ObjectMeta.Name, err)
return err
}
return nil
}
func (j *TrainingJob) name() string {
return j.job.ObjectMeta.GetName()
}
// fullname returns the namespace and name for the job.
func (j *TrainingJob) fullname() string {
return j.job.ObjectMeta.GetNamespace() + ":" + j.job.ObjectMeta.GetName()
}
func (j *TrainingJob) SchedulerName() string {
return j.job.Spec.SchedulerName
}
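// Taken together, the methods above implement the following lifecycle (a summary of the code, not a
// specification): a new MSJob starts in phase None; setup validates the spec and moves it to Creating
// (or Failed on an invalid spec); Reconcile then creates services and pods while the phase is Creating
// or Running, and when the master replica finishes the phase moves to Done with State Succeeded or
// Failed; once the phase reaches CleanUp the replica resources are deleted and reconciliation stops.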
// Copyright 2018 The Kubeflow Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package retryutil
import (
"fmt"
"time"
)
type RetryError struct {
n int
}
func (e *RetryError) Error() string {
return fmt.Sprintf("still failing after %d retries", e.n)
}
func IsRetryFailure(err error) bool {
_, ok := err.(*RetryError)
return ok
}
type ConditionFunc func() (bool, error)
// Retry retries f every interval until after maxRetries.
// The interval won't be affected by how long f takes.
// For example, if interval is 3s, f takes 1s, another f will be called 2s later.
// However, if f takes longer than interval, it will be delayed.
func Retry(interval time.Duration, maxRetries int, f ConditionFunc) error {
if maxRetries <= 0 {
return fmt.Errorf("maxRetries (%d) should be > 0", maxRetries)
}
tick := time.NewTicker(interval)
defer tick.Stop()
for i := 0; ; i++ {
ok, err := f()
if err != nil {
return err
}
if ok {
return nil
}
if i+1 == maxRetries {
break
}
<-tick.C
}
return &RetryError{maxRetries}
}
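// Illustrative usage sketch (not part of this commit): polling with Retry until a condition becomes
// true. The import path is assumed from the package name, and the interval, retry count and condition
// are example values only.
package main

import (
	"fmt"
	"time"

	"gitee.com/mindspore/ms-operator/pkg/util/retryutil"
)

func main() {
	attempts := 0
	err := retryutil.Retry(2*time.Second, 5, func() (bool, error) {
		attempts++
		// Return true to stop retrying, false to keep retrying, or a non-nil error to abort immediately.
		return attempts >= 3, nil
	})
	switch {
	case retryutil.IsRetryFailure(err):
		fmt.Println("condition never became true within 5 retries")
	case err != nil:
		fmt.Println("aborted:", err)
	default:
		fmt.Println("condition met after", attempts, "attempts")
	}
}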
*.sublime-*
.DS_Store
*.swp
*.swo
tags
language: go
go:
- 1.4.x
- 1.5.x
- 1.6.x
- 1.7.x
- 1.8.x
- 1.9.x
- "1.10.x"
- "1.11.x"
- tip
Copyright (c) 2012, Martin Angers
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.