No training results after single-node training on Kubernetes
Created by: Daemon007
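I followed the PaddlePaddle advanced guide to run single-node training on Kubernetes: http://doc.paddlepaddle.org/doc_cn/howto/usage/k8s/k8s_cn.html
I loaded the mypaddle/paddle:quickstart image, which contains the training data, into the local Docker daemon and also pushed it to my private registry as registry.vm-1:5000/mypaddle/paddle:quickstart.
The YAML file is as follows (the only change from the guide is the image name, now registry.vm-1:5000/mypaddle/paddle:quickstart):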
apiVersion: batch/v1
kind: Job
metadata:
  name: quickstart
spec:
  parallelism: 1
  completions: 1
  template:
    metadata:
      name: quickstart
    spec:
      volumes:
      - name: output
        hostPath:
          path: /home/work/paddle_output
      containers:
      - name: pi
        image: registry.vm-1:5000/mypaddle/paddle:quickstart
        command: ["bin/bash", "-c", "/root/paddle/demo/quick_start/train.sh"]
        volumeMounts:
        - name: output
          mountPath: /root/paddle/demo/quick_start/output
      restartPolicy: Never
Checking the training result:
[root@localhost work]# kubectl get pods
NAME                   READY   STATUS             RESTARTS   AGE
private-image-test-1   0/1     CrashLoopBackOff   11         34m
[root@localhost work]# kubectl describe pod quickstart-bgxtf
Name:           quickstart-bgxtf
Namespace:      default
Node:           vm-2/192.168.1.48
Start Time:     Sun, 16 Jul 2017 03:17:51 -0700
Labels:         controller-uid=0606a0ea-6a10-11e7-9839-000c291ffd39
                job-name=quickstart
Status:         Succeeded
IP:             10.0.1.3
Controllers:    Job/quickstart
Containers:
  pi:
    Container ID:       docker://613653ff059d67224d7d1c272150f45bb06ffb175fb59a6a93305b9591994b45
    Image:              registry.vm-1:5000/mypaddle/paddle:quickstart
    Image ID:           docker-pullable://registry.vm-1:5000/mypaddle/paddle@sha256:df130bf3ebb08d819e0899141a374b1f14e94e83142a3c3e9d3618b583add7b3
    Port:
    Command:
      bin/bash
      -c
      /root/paddle/demo/quick_start/train.sh
    State:              Terminated
      Reason:           Completed
      Exit Code:        0
      Started:          Sun, 16 Jul 2017 03:21:45 -0700
      Finished:         Sun, 16 Jul 2017 03:21:51 -0700
    Ready:              False
    Restart Count:      0
    Volume Mounts:
      /root/paddle/demo/quick_start/output from output (rw)
    Environment Variables:      <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  output:
    Type:       HostPath (bare host directory volume)
    Path:       /home/work/paddle_output
QoS Class:      BestEffort
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                  SubObjectPath        Type      Reason             Message
  ---------  --------  -----  ----                  -------------        --------  ------             -------
  28m        28m       1      {default-scheduler }                       Normal    Scheduled          Successfully assigned quickstart-bgxtf to vm-2
  28m        28m       1      {kubelet vm-2}        spec.containers{pi}  Normal    Pulling            pulling image "registry.vm-1:5000/mypaddle/paddle:quickstart"
  28m        25m       2      {kubelet vm-2}                             Warning   MissingClusterDNS  kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
  25m        25m       1      {kubelet vm-2}        spec.containers{pi}  Normal    Pulled             Successfully pulled image "registry.vm-1:5000/mypaddle/paddle:quickstart"
  25m        25m       1      {kubelet vm-2}        spec.containers{pi}  Normal    Created            Created container with docker id 613653ff059d; Security:[seccomp=unconfined]
  25m        25m       1      {kubelet vm-2}        spec.containers{pi}  Normal    Started            Started container with docker id 613653ff059d
The pod is reported as finished, but there is no training result data on the host machine. What could be causing this?