Created by: tizhou86
Added a Paddle on Kubernetes tutorial in English.
Created by: pineking
@tizhou86 I followed https://github.com/tizhou86/Paddle/blob/develop/doc/kubernetes_on_paddle.md#use-kubernetes-for-training and found that the pod fails with an error saying dict.txt is missing:
IOError: [Errno 2] No such file or directory: './data/dict.txt'
$ docker logs k8s_pi.8f635e06_quickstart-mx24w_default_8626edbc-c1d0-11e6-b8dc-002590c0f780_438904e3
I1214 07:41:10.549923 26 Util.cpp:155] commandline: /usr/local/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lr.py --save_dir=./output --trainer_count=4 --log_period=20 --num_passes=15 --use_gpu=false --show_parameter_stats_period=100 --test_all_data_in_one_period=1
I1214 07:41:10.550160 26 Util.cpp:130] Calling runInitFunctions
I1214 07:41:10.550518 26 Util.cpp:143] Call runInitFunctions done.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/paddle/trainer/config_parser.py", line 3406, in parse_config_and_serialize
    config = parse_config(config_file, config_arg_str)
  File "/usr/local/lib/python2.7/dist-packages/paddle/trainer/config_parser.py", line 3382, in parse_config
    execfile(config_file, make_config_environment(config_file, config_args))
  File "trainer_config.lr.py", line 21, in <module>
    with open(dict_file, 'r') as f:
IOError: [Errno 2] No such file or directory: './data/dict.txt'
F1214 07:41:10.609591 26 PythonUtil.cpp:134] Check failed: (ret) != nullptr Current PYTHONPATH: ['/usr/local/opt/paddle/bin', '/root/paddle/demo/quick_start', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PILcompat', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.7']
Python Error: <type 'exceptions.IOError'> : [Errno 2] No such file or directory: './data/dict.txt'
Python Callstack:
/usr/local/lib/python2.7/dist-packages/paddle/trainer/config_parser.py : 3406
/usr/local/lib/python2.7/dist-packages/paddle/trainer/config_parser.py : 3382
trainer_config.lr.py : 21
Call Object failed.
*** Check failure stack trace: ***
    @ 0x7fd3cc271daa (unknown)
    @ 0x7fd3cc271ce4 (unknown)
    @ 0x7fd3cc2716e6 (unknown)
    @ 0x7fd3cc274687 (unknown)
    @ 0x76814a paddle::callPythonFuncRetPyObj()
    @ 0x76832c paddle::callPythonFunc()
    @ 0x684ef3 paddle::TrainerConfigHelper::TrainerConfigHelper()
    @ 0x685534 paddle::TrainerConfigHelper::createFromFlags()
    @ 0x513207 main
    @ 0x7fd3cb47df45 (unknown)
    @ 0x51f2a5 (unknown)
    @ (nil) (unknown)
/usr/local/bin/paddle: line 109: 26 Aborted (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
The files in the data directory of the Docker image are:
root@c020d520a0fd:~/paddle/demo/quick_start/data# ll
total 484252
drwxr-xr-x  1 root root         6 Dec 14 07:58 ./
drwxr-xr-x  1 root root        17 Dec 14 07:27 ../
-rwxr-xr-x  2 root root      1052 Nov 30 09:02 get_data.sh*
drwxr-xr-x 22 root root      4096 Dec  7 10:09 mosesdecoder-master/
-rw-r--r--  2 root root        16 Nov 30 09:02 pred.list
-rw-r--r--  2 root root      1740 Nov 30 09:02 pred.txt
-rw-r--r--  1 root root 495854086 Apr 26  2016 reviews_Electronics_5.json.gz
Created by: pineking
@drinktee Thanks for the reminder, I'll give it a try. @tizhou86 preprocess.sh needs to be run first; please add this step to the doc: https://github.com/tizhou86/Paddle/blob/develop/doc/kubernetes_on_paddle.md
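Until the doc covers this step, a small preflight check inside the container can catch the problem before `paddle_trainer` starts. This is a minimal sketch, not part of Paddle: `missing_files` and `REQUIRED_FILES` are hypothetical names, and the list only contains `data/dict.txt` because that is the one file the traceback confirms `trainer_config.lr.py` opens.

```python
import os
import sys

# Hypothetical list: only data/dict.txt is confirmed by the traceback above.
# Add any other files your trainer config reads.
REQUIRED_FILES = ("data/dict.txt",)


def missing_files(demo_dir, required=REQUIRED_FILES):
    """Return the required paths (relative to demo_dir) that are not regular files."""
    return [rel for rel in required
            if not os.path.isfile(os.path.join(demo_dir, rel))]


if __name__ == "__main__":
    demo_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_files(demo_dir)
    if missing:
        sys.stderr.write("Missing: %s\nRun preprocess.sh before training.\n"
                         % ", ".join(missing))
        sys.exit(1)
    print("All required files present.")
```

For example, save it under any name and run it from `~/paddle/demo/quick_start` with `.` as the argument before launching training; a non-zero exit means preprocessing has not been done.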
Created by: wangkuiyi
@tizhou86 I see that Travis CI did not pass for this PR. It seems some text files do not end with a trailing newline. The specific error is at https://travis-ci.org/PaddlePaddle/Paddle/jobs/183563852#L700
The code style check was recently added by @reyoung. You probably need to install pre-commit on your machine:
pip install pre-commit
as well as clang-format:
brew update && brew install clang-format
According to @reyoung, clang-format apparently needs to be version 4.0.0 or later:
$ clang-format --version
clang-format version 4.0.0 (tags/google/testing/2016-08-03)
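To check the version requirement in a script rather than by eye, the version line can be parsed and compared as a tuple. A minimal sketch: the helper names are hypothetical, the 4.0.0 minimum is taken from the comment above, and it assumes the output contains `clang-format version X.Y[.Z]` as in the example.

```python
import re
import subprocess

MIN_VERSION = (4, 0, 0)  # minimum reported in this thread


def parse_clang_format_version(output):
    """Extract (major, minor, patch) from `clang-format --version` output."""
    match = re.search(r"clang-format version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("unrecognized version string: %r" % output)
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def is_new_enough(output, minimum=MIN_VERSION):
    """True if the reported version is >= minimum (tuple comparison)."""
    return parse_clang_format_version(output) >= minimum


if __name__ == "__main__":
    out = subprocess.check_output(["clang-format", "--version"]).decode()
    print("%s -> %s" % (out.strip(), "OK" if is_new_enough(out) else "too old"))
```

Tuple comparison avoids the string-comparison trap where "10.0.0" would sort before "4.0.0".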
Created by: wangkuiyi
@tizhou86 I asked @reyoung, who configured Paddle's pre-commit checks. The checks currently fail because every text file must end with exactly one trailing newline.
The detailed error message is here: https://travis-ci.org/PaddlePaddle/Paddle/jobs/184222241#L699
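The end-of-file rule can be expressed in a few lines. This is a minimal sketch of the check and its fix, assuming LF line endings; it is not the code of the hook Paddle actually configures, and the function names are hypothetical.

```python
import sys


def ends_with_single_newline(data):
    """True if bytes `data` end in exactly one newline; empty input is accepted."""
    if not data:
        return True
    return data.endswith(b"\n") and not data.endswith(b"\n\n")


def fix_trailing_newlines(data):
    """Strip all trailing newlines, then append exactly one (empty stays empty)."""
    stripped = data.rstrip(b"\n")
    return stripped + b"\n" if stripped else b""


def check_files(paths):
    """Return the paths whose contents violate the single-trailing-newline rule."""
    bad = []
    for path in paths:
        with open(path, "rb") as f:
            if not ends_with_single_newline(f.read()):
                bad.append(path)
    return bad


if __name__ == "__main__":
    offenders = check_files(sys.argv[1:])
    for path in offenders:
        print(path)
    sys.exit(1 if offenders else 0)
```

In practice, `pre-commit run --all-files` runs the project's configured hooks locally and remains the authoritative check before pushing.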
Created by: xiang90
@luotao1 @wangkuiyi
Have you tried https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/aws to set up k8s on AWS? This should make your life a lot easier.
Created by: xiang90
For the local dev doc, you probably want to try https://github.com/kubernetes/minikube.
It is the easiest way to set up a local k8s cluster for testing/demo purposes.
Created by: helinwang
@tizhou86 I tried following these instructions, but it gets stuck at:
[master running]
Attaching IP 52.9.99.195 to instance i-0eec1bfda9aa908d9
Attaching persistent data volume (vol-024eeaf7baa728f7c) to master
2016-12-21T00:34:36.626Z /dev/sdb i-0eec1bfda9aa908d9 attaching vol-024eeaf7baa728f7c
Cluster "aws_kubernetes" set.
User "aws_kubernetes" set.
Context "aws_kubernetes" set.
Switched to context "aws_kubernetes".
User "aws_kubernetes-basic-auth" set.
Wrote config for aws_kubernetes to /Users/helinwang/.kube/config
Creating minion configuration
Creating autoscaling group
 0 minions started; waiting
 0 minions started; waiting
 0 minions started; waiting
 0 minions started; waiting
 0 minions started; waiting
 2 minions started; ready
Waiting for cluster initialization.

  This will continually check to see if the API for kubernetes is reachable.
  This might loop forever if there was some uncaught error during start up
I tried three times with different settings, and it got stuck here every time:
export KUBERNETES_PROVIDER=aws; curl -sS https://get.k8s.io | bash
export KUBE_AWS_ZONE=us-west-1a; export KUBERNETES_PROVIDER=aws; curl -sS https://get.k8s.io | bash
export NUM_NODES=2&&export KUBE_AWS_ZONE=us-west-1a; export KUBERNETES_PROVIDER=aws; curl -sS https://get.k8s.io | bash
Have you run into a similar problem?
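The script's last step polls the Kubernetes API until it becomes reachable and, as its own message warns, has no upper bound. When debugging a hang like this, a bounded version of the same wait loop at least turns "stuck forever" into a clear timeout. A minimal sketch: `wait_until_ready` and the `probe` callable are hypothetical, and this does not diagnose why the API is unreachable.

```python
import time


def wait_until_ready(probe, timeout_s=600.0, interval_s=10.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or timeout_s elapses.

    Returns True if the probe succeeded and False on timeout, instead of
    looping forever. clock and sleep are injectable for testing.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A real probe might run `kubectl get nodes` through `subprocess` and return whether it exited with status 0; that is an assumption about how you would check readiness, not what the get.k8s.io script does internally.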