#### Experimental notice: This project is still experimental and only serves as a proof of concept for running MindSpore on Kubernetes. The current version of ms-operator is based on an early version of [PyTorch Operator](https://github.com/kubeflow/pytorch-operator) and [TF Operator](https://github.com/kubeflow/tf-operator). Right now MindSpore supports running LeNet with the MNIST dataset on a single node; distributed training examples are expected in the near future.
- [MindSpore Operator](#mindspore-operator)
- [Introduction of MindSpore and ms-operator](#introduction-of-mindspore-and-ms-operator)
We will run an MNIST training job to verify that MindSpore runs properly on
Kubernetes:
```
cd examples/ && kubectl apply -f ms-mnist.yaml
```
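For reference, the `MSJob` custom resource applied above might look roughly like the sketch below, which follows the conventions of the early TF/PyTorch operator APIs this project is based on. The field names, image tag, and command here are illustrative assumptions; treat `examples/ms-mnist.yaml` in this repository as the authoritative definition.
```
# Illustrative sketch of an MSJob manifest (field names and values are assumptions;
# see examples/ms-mnist.yaml for the real definition).
apiVersion: kubeflow.org/v1
kind: MSJob
metadata:
  name: msjob-mnist
spec:
  replicaSpecs:
    - replicas: 1                    # single-node training for now
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: msjob-mnist
              image: mindspore/mindspore-cpu:latest       # assumed image name/tag
              command: ["python", "/opt/lenet_mnist.py"]  # assumed entry point
```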
The job simply imports the MindSpore packages (the dataset is already included in the `MNIST_Data` folder), runs a single epoch, and prints the result, which should take only a short time. After the job completes, you should be able to check the job status and see the result logs. You can find the source training code in the `examples/` folder.
```
kubectl get pod msjob-mnist && kubectl logs msjob-mnist
```
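When you are finished, the pod and the custom resource can be inspected or removed with standard `kubectl` commands; note that the `msjob` resource name below is an assumption about how the CRD is registered:
```
# Inspect the MSJob custom resource itself (the "msjob" resource name is assumed)
kubectl get msjob msjob-mnist -o yaml
# Remove the example job once you are done (run from the examples/ folder)
kubectl delete -f ms-mnist.yaml
```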
...
...
still working on implementing distributed training of LeNet with the MNIST
dataset on Kubernetes; distributed training on different backends
(GPU or `Ascend`) is also expected in the near future.
## Future Work
[Kubeflow](https://github.com/kubeflow/kubeflow) recently announced its first
major release, 1.0, with the graduation of a core set of stable applications
...
...
dependent resources, and reconcile the desired states. If MindSpore can leverage
MPI Operator together with the high-performance `Ascend` processor, it is
possible that MindSpore will bring distributed training to an even higher level.
## Appendix: Example yaml file
The yaml file used to create a distributed training MSJob is expected to look like this: