[WIP] Run Paddle MNIST demo on Kubernetes (minikube)
Created by: typhoonzero
- Install the latest kubectl client
- Install minikube on my macbook:
curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.16.0/minikube-darwin-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/
- start the minikube box
minikube start
- write a YAML file, testPaddle.yaml, with a simple pod description for the mnist demo:
apiVersion: v1
kind: Pod
metadata:
  name: testpaddle
spec:
  containers:
  - name: testpaddle
    image: paddledev/paddle:gpu-noavx-demo-latest
    env:
    - name: PYTHONPATH
      value: /root/paddle/demo/mnist
    resources:
      limits:
        cpu: 500m
        memory: 50Mi
        #storage-iops: 30
      requests:
        cpu: 500m
        memory: 50Mi
        #storage-iops: 30
    command:
    - sleep
    - "3600"
    volumeMounts:
    - mountPath: /mnt
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}
- create the pod:
kubectl create -f testPaddle.yaml
- prepare data:
kubectl exec -it testpaddle -- /root/paddle/demo/mnist/data/get_mnist_data.sh /root/paddle/demo/mnist/data
- change some configurations:
kubectl exec -it testpaddle -- sed -i 's/\.\/data\//\/root\/paddle\/demo\/mnist\/data\//g' /root/paddle/demo/mnist/vgg_16_mnist.py
- run trainer:
kubectl exec -it testpaddle -- /usr/local/bin/../opt/paddle/bin/paddle_trainer --config=/root/paddle/demo/mnist/vgg_16_mnist.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=0 --trainer_count=1 --num_passes=100 --save_dir=/mnt/mnist_vgg_model
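The sed expression in the configuration step escapes every slash, which makes it hard to read. Below is a minimal local sketch of the same substitution, written with `|` as the sed delimiter. The sample line is an assumption for illustration, since the real contents of vgg_16_mnist.py are not shown in this issue.

```shell
# Hypothetical sample line; the actual config file contents are not shown above.
sample="data_dir = './data/'"

# Same substitution as the kubectl exec step, with '|' as the sed delimiter
# so the slashes in the paths need no escaping.
rewritten=$(printf '%s\n' "$sample" | sed 's|\./data/|/root/paddle/demo/mnist/data/|g')

printf '%s\n' "$rewritten"
```

For this sample the output is `data_dir = '/root/paddle/demo/mnist/data/'`. Note that the substitution only touches the file handed to sed; a `./data/` path stored in any other file is left as it was.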
The trainer runs until this error is reported:
...
[INFO 2017-02-22 14:58:47,592 layers.py:1985] output size for __pool_4__ is 1*1
I0222 14:58:47.600934 99 Trainer.cpp:170] trainer mode: Normal
I0222 14:58:48.161231 99 PyDataProvider2.cpp:257] loading dataprovider mnist_provider::process
I0222 14:58:48.165740 99 PyDataProvider2.cpp:257] loading dataprovider mnist_provider::process
I0222 14:58:48.166436 99 GradientMachine.cpp:134] Initing parameters..
I0222 14:58:49.464411 99 GradientMachine.cpp:141] Init parameters done.
F0222 14:58:49.476078 106 PythonUtil.h:345] Check failed: (data) != nullptr Current PYTHONPATH: ['/usr/local/opt/paddle/bin', '/root/paddle/demo/mnist', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PILcompat', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.7']
Python Error: <type 'exceptions.IOError'> : [Errno 2] No such file or directory: './data/raw_data/train-images-idx3-ubyte'
Python Callstack:
/usr/local/lib/python2.7/dist-packages/paddle/trainer/PyDataProvider2.py : 132
/root/paddle/demo/mnist/mnist_provider.py : 11
Calling iterator next error
*** Check failure stack trace: ***
@ 0x7fac8010fdaa (unknown)
@ 0x7fac8010fce4 (unknown)
@ 0x7fac8010f6e6 (unknown)
@ 0x7fac80112687 (unknown)
@ 0x58ecac paddle::PyDataProvider2::loadThread()
@ 0x7fac7fc8ca60 (unknown)
@ 0x7fac80f22184 start_thread
@ 0x7fac7f3f437d (unknown)
@ (nil) (unknown)
error: error executing remote command: command terminated with non-zero exit code: Error executing in Docker Container: 134
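The failing path in the log, `./data/raw_data/train-images-idx3-ubyte`, is relative, so it resolves against the working directory of the process that `kubectl exec` starts, which need not be the demo directory. The sketch below reproduces that behaviour locally. The temporary directories and the empty placeholder file are assumptions standing in for `/root/paddle/demo/mnist` and the downloaded data; whether this is the actual cause of the failure is not confirmed in this issue.

```shell
# Temporary directory standing in for /root/paddle/demo/mnist (assumption).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/data/raw_data"
# Empty placeholder for the real MNIST file.
: > "$demo_dir/data/raw_data/train-images-idx3-ubyte"

rel_path=./data/raw_data/train-images-idx3-ubyte

# From an unrelated working directory the relative path does not resolve.
other_dir=$(mktemp -d)
if (cd "$other_dir" && [ -e "$rel_path" ]); then
  from_other=found
else
  from_other=missing
fi

# From the demo directory the same relative path resolves.
if (cd "$demo_dir" && [ -e "$rel_path" ]); then
  from_demo=found
else
  from_demo=missing
fi

echo "from other dir: $from_other"
echo "from demo dir:  $from_demo"

# Clean up the temporary directories.
rm -rf "$demo_dir" "$other_dir"
```

If the working directory does turn out to be the problem, one thing to try is starting the trainer from the demo directory, for example by wrapping the command as `sh -c 'cd /root/paddle/demo/mnist && ...'` inside `kubectl exec`.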