Distributed training: run.sh: Bad substitution error?

Hi, I am unable to launch the distributed version of the demo/recommendation example.
Created by: bestfitline
Here are my settings in conf.py and run.sh.
conf.py:

ROOT_DIR = "home/xyz/paddle/paddle"
'''
network configuration
'''
#pserver nics
PADDLE_NIC = "eth0"
#pserver port
PADDLE_PORT = 7164
#pserver ports num
PADDLE_PORTS_NUM = 2
#pserver sparse ports num
PADDLE_PORTS_NUM_FOR_SPARSE = 2

#environments setting for all processes in cluster job
LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/lib64"
run.sh:

python paddle.py \
  --job_dispatch_package="${demo/recommendation}" \
  --dot_period=10 \
  --ports_num_for_sparse=2 \
  --log_period=50 \
  --num_passes=10 \
  --trainer_count=4 \
  --saving_period=1 \
  --local=0 \
  --config=./trainer_config.py \
  --save_dir=./output \
  --use_gpu=0
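
A possible cause, sketched under assumptions: in a shell script, `${...}` is parameter expansion, so `${demo/recommendation}` asks the shell to expand a parameter rather than pass the literal path. Assuming run.sh is executed by a POSIX shell such as dash, that form is not a valid expansion and is a plausible source of the "Bad substitution" message; bash would instead read it as a pattern substitution on a variable named `demo` and, with `demo` unset, produce an empty string. The snippet below shows two ways to pass the path that avoid this. `JOB_PACKAGE` is a hypothetical variable name used only for illustration, and the relative path is taken from the post as-is; whether paddle.py expects an absolute path here is not confirmed.

```shell
#!/bin/sh
# JOB_PACKAGE is a hypothetical variable name, not one from the original run.sh.
JOB_PACKAGE="demo/recommendation"

# Option 1: expand a variable that holds the path.
arg_from_var="--job_dispatch_package=${JOB_PACKAGE}"

# Option 2: pass the path as a literal, with no ${...} around it.
arg_literal="--job_dispatch_package=demo/recommendation"

# Form used in the post, kept as a comment so this script still runs
# under a POSIX sh:
#   --job_dispatch_package="${demo/recommendation}"

echo "$arg_from_var"
echo "$arg_literal"
```

Both options print the same argument string, which could then replace the `--job_dispatch_package` line in run.sh.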