Unverified commit d9c384b2, authored by Dong Daxiang, committed by GitHub

Merge pull request #2 from suoych/master

Update template
# ElasticCTR
ElasticCTR is a one-click deployment solution for distributed CTR-estimation training and the accompanying serving pipeline. Users only need to configure the data source and sample format to run the whole training and prediction workflow
......@@ -30,13 +29,17 @@ ElasticCTR采用PaddlePaddle提供的全异步分布式训练方式,在保证
## <span id='head2'>2. Configure the Cluster</span>
Before running this solution, you need an existing k8s cluster with the volcano component installed. Setting up a k8s environment is fairly involved and is beyond the scope of this document. Baidu Cloud's CCE container engine can be used as soon as it is provisioned; for creating a k8s cluster on Baidu Cloud, see [Baidu Cloud k8s creation tutorial and usage guide](cluster_config.md)
Before running this solution, you need an existing k8s cluster with the volcano component installed. Setting up a k8s environment is fairly involved and is beyond the scope of this document. Baidu Cloud's CCE container engine can be used as soon as it is provisioned; for creating a k8s cluster on Baidu Cloud, see [Baidu Cloud k8s creation tutorial and usage guide](cluster_config.md). In addition, Elastic CTR can also be deployed on other clouds; see [Creating a k8s cluster on Huawei Cloud](huawei_k8s.md) and [Creating a k8s cluster on AWS](aws_k8s.md)
## <span id='head3'>3. One-Click Deployment Tutorial</span>
You can complete the deployment with the provided script elastic-control.sh. The script is used as follows:
You can complete the deployment with the provided script elastic-control.sh. Before running the script, make sure your machine has python3 and that mlflow has been installed via pip. The command to install mlflow is:
```bash
python3 -m pip install mlflow -i https://pypi.tuna.tsinghua.edu.cn/simple
```
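To confirm that mlflow installed correctly, a quick check like the following can be used:
```bash
python3 -c "import mlflow; print(mlflow.__version__)"
```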
The script is used as follows:
```bash
bash elastic-control.sh [COMMAND] [OPTIONS]
```
The available commands (COMMAND) are as follows:
......@@ -84,12 +87,14 @@ bash elastic-control.sh -l
```
2. The mlflow dashboard
Note: this method requires that the client machine can run a browser
Note: for the preview to work, make sure port 8111 on your local machine is not occupied (a quick way to check is sketched below)
During training, you can run the following command and then open 127.0.0.1:8111 in a browser to view the training dashboard
During training, you can also track training progress through the mlflow dashboard. Once the following output appears on the screen,
```bash
kubectl port-forward fleet-ctr-demo-trainer-0 8111:8111
mlflow ready!
```
you can open 127.0.0.1:8111 in a local browser to view the training dashboard. If the machine has a public IP and port 8111 is open, the dashboard can be viewed from any machine at ${external_ip}:8111
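If the dashboard does not come up, you can first confirm that local port 8111 is really free, as noted above; a minimal check, assuming `lsof` is available on your machine:
```bash
# prints the process holding port 8111, or confirms the port is free
lsof -i :8111 || echo "port 8111 is free"
```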
The page looks like this:
![elastic.png](https://github.com/suoych/WebChat/raw/master/MacHi%202019-11-25%2014-19-30.png)
![dashboard.png](https://github.com/suoych/WebChat/raw/master/MacHi%202019-11-25%2014-18-32.png)
......@@ -108,6 +113,3 @@ bash elastic-control.sh -c
# Setting up a k8s Cluster on AWS
This document describes how to set up a k8s cluster on AWS
* [1. Overview](#head1)
* [2. Purchase a jump server](#head2)
* [3. Deploy the cluster](#head3)
## <span id='head1'>1. Overview</span>
Setting up a k8s cluster on AWS involves two main steps:
1. Purchase a jump server
First, purchase an EC2 instance to serve as a jump server for controlling the k8s cluster; this instance does not need a high-end configuration
2. Deploy the cluster
Create the cluster from the jump server purchased in the previous step; the cluster configuration can be adjusted as needed
Each step is described in detail below.
## <span id='head2'>2. Purchase a Jump Server</span>
You can purchase the instance of your choice as the jump server from the EC2 console.
The specific steps are as follows:
1. Open the Amazon EC2 console and, from the console dashboard, click the Launch Instance button.
![run_instance.png](https://github.com/suoych/WebChat/raw/master/run_instance.png)
2. Choose a suitable AMI; Amazon Linux 2 AMI is recommended.
![choose_AMI.png](https://github.com/suoych/WebChat/raw/master/choose_AMI.png)
3. Choose an instance type; the default t2.micro is recommended. After selecting it, click Review and Launch
![choose_instance_type.png](https://github.com/suoych/WebChat/raw/master/choose_instance_type.png)
4. On the Review Instance Launch page, click Edit security groups in the Security Groups section; then, on the Configure Security Group page, click Select an existing security group, choose the group named default, and click Review and Launch.
![review_instance.png](https://github.com/suoych/WebChat/raw/master/review_instance.png)
![select_security_group.png](https://github.com/suoych/WebChat/raw/master/select_security_group.png)
5. On the review page, click Launch. In the key pair dialog that appears, choose Create a new key pair, enter a key pair name, and download the key pair. Keep the key pair file safe, because it cannot be downloaded again if lost. After these steps, click Launch Instances to finish purchasing the jump server.
![create_key.png](https://github.com/suoych/WebChat/raw/master/create_key.png)
Note: after downloading the key pair file, change its permissions to 400.
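For example, assuming the downloaded file is named ec2key.pem (the name used in the connection examples below):
```bash
chmod 400 ec2key.pem
```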
## <span id='head3'>3. Deploy the Cluster</span>
Once the instance purchased in the previous step has started, its public IP and DNS name are shown. Connect to the instance to carry out the deployment; the connection uses the key pair file (.pem suffix) downloaded earlier. You can connect with either of the following commands:
```bash
ssh -i ec2key.pem ec2-user@12.23.34.123
```
```bash
ssh -i ec2key.pem ec2-user@ec2-12-23-34-123.us-west-2.compute.amazonaws.com
```
After connecting to the jump server, a series of management tools needs to be installed (a combined sanity check is sketched after the last step), specifically:
1. Install pip
```bash
sudo yum -y install python-pip
```
2. Install or upgrade the AWS CLI
```bash
sudo pip install --upgrade awscli
```
3. Install eksctl
```bash
curl --silent \
--location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" \
| tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
```
4. Install kubectl
```bash
curl -o kubectl https://amazon-eks.s3-us-west-2.amazonaws.com/1.11.5/2018-12-06/bin/linux/amd64/kubectl
chmod +x ./kubectl
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$HOME/bin:$PATH
```
5. Install aws-iam-authenticator
```bash
curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.11.5/2018-12-06/bin/linux/amd64/aws-iam-authenticator
chmod +x aws-iam-authenticator
cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$HOME/bin:$PATH
```
6. Install ksonnet
```bash
export KS_VER=0.13.1
export KS_PKG=ks_${KS_VER}_linux_amd64
wget -O /tmp/${KS_PKG}.tar.gz https://github.com/ksonnet/ksonnet/releases/download/v${KS_VER}/${KS_PKG}.tar.gz
mkdir -p ${HOME}/bin
tar -xvf /tmp/$KS_PKG.tar.gz -C ${HOME}/bin
sudo mv ${HOME}/bin/$KS_PKG/ks /usr/local/bin
```
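Before moving on, it can help to confirm that each tool is on the PATH; a quick sanity check, assuming the installation steps above completed without errors:
```bash
aws --version
eksctl version
kubectl version --client
aws-iam-authenticator version
ks version
```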
Once these components are installed, you can purchase and deploy the cluster with the following command:
```bash
eksctl create cluster paddle-cluster \
--version 1.13 \
--nodes 2 \
--node-type=m5.2xlarge \
--timeout=40m \
--ssh-access \
--ssh-public-key ec2.key \
--region us-west-2 \
--auto-kubeconfig
```
Where:
**--version** is the k8s version; AWS currently supports 1.12, 1.13, and 1.14
**--nodes** is the number of nodes
**--node-type** is the node instance type; choose whichever instance plan you prefer
**--ssh-public-key** can be the key name defined earlier when purchasing the jump server
**--region** is the region in which the nodes are located
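**--auto-kubeconfig** writes the kubeconfig of the new cluster to a separate file rather than merging it into ~/.kube/config. A minimal sketch of pointing kubectl at it, assuming eksctl's default auto-kubeconfig location:
```bash
# assumed default path used by eksctl --auto-kubeconfig; verify it on your machine
export KUBECONFIG=$HOME/.kube/eksctl/clusters/paddle-cluster
kubectl get svc
```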
Deploying the cluster takes quite a while, so please be patient. Once deployment succeeds, you can test the cluster as follows:
1. Run the following command to view node information:
```bash
kubectl get nodes -o wide
```
2. Verify that the cluster is active:
```bash
aws eks --region <region> describe-cluster --name <cluster-name> --query cluster.status
```
You should see output like:
```
"ACTIVE"
```
3. If you have multiple clusters set up from the same jump server, verify the kubectl context:
```bash
kubectl config get-contexts
```
If the context is not set as expected, fix it with the following command:
```bash
aws eks --region <region> update-kubeconfig --name <cluster-name>
```
That completes the AWS k8s cluster setup. You can then set up HDFS on AWS yourself and deploy Elastic CTR 2.0 from the jump server.
export HDFS_ADDRESS="hdfs://192.168.48.87:9000"
export HDFS_UGI="root,i"
export START_DATE_HR=20191205/00
export END_DATE_HR=20191205/00
export START_DATE_HR=20191221/00
export END_DATE_HR=20191221/09
export DATASET_PATH="/train_data"
export SPARSE_DIM="1000001"
......@@ -203,7 +203,7 @@ function generate_cube_yaml()
echo "spec:"
echo " containers:"
echo " - name: cube-transfer"
echo " image: hub.baidubce.com/ctr/cube-transfer:v1"
echo " image: hub.baidubce.com/ctr/cube-transfer:v2"
echo " workingDir: /"
echo " env:"
echo " - name: POD_IP"
......@@ -230,14 +230,16 @@ function generate_fileserver_yaml()
{
check_tools sed
check_files fileserver.yaml.template
if [ $# -ne 2 ]; then
if [ $# -ne 3 ]; then
echo "Invalid argument to function generate_fileserver_yaml"
return -1
else
hdfs_address=$1
hdfs_ugi=$2
dataset_path=$3
sed -e "s#<$ HDFS_ADDRESS $>#$hdfs_address#g" \
-e "s#<$ HDFS_UGI $>#$hdfs_ugi#g" \
-e "s#<$ DATASET_PATH $>#$dataset_path#g" \
fileserver.yaml.template > fileserver.yaml
echo "File server yaml written to fileserver.yaml"
fi
......@@ -329,7 +331,7 @@ function config_resource()
"HDFS_ADDRESS=$HDFS_ADDRESS HDFS_UGI=$HDFS_UGI START_DATE_HR=$START_DATE_HR END_DATE_HR=$END_DATE_HR "\
"SPARSE_DIM=$SPARSE_DIM DATASET_PATH=$DATASET_PATH "
generate_cube_yaml $CUBE || die "config_resource: generate_cube_yaml failed"
generate_fileserver_yaml $HDFS_ADDRESS $HDFS_UGI || die "config_resource: generate_fileserver_yaml failed"
generate_fileserver_yaml $HDFS_ADDRESS $HDFS_UGI $DATASET_PATH || die "config_resource: generate_fileserver_yaml failed"
generate_yaml $PSERVER $TRAINER $CPU $MEM $DATA_PATH $HDFS_ADDRESS $HDFS_UGI $START_DATE_HR $END_DATE_HR $SPARSE_DIM $DATASET_PATH || die "config_resource: generate_yaml failed"
upload_slot_conf $SLOT_CONF || die "config_resource: upload_slot_conf failed"
return 0
......@@ -344,23 +346,32 @@ function log()
else
echo "Trainer Log Has not been generated"
fi
echo "\nFile Server Log:"
kubectl logs file-server | grep __main__ > file-server.log
echo ""
echo "File Server Log:"
file_server_pod=$(kubectl get po | grep file-server | awk {'print $1'})
kubectl logs ${file_server_pod} | grep __main__ > file-server.log
if [ -f file-server.log ]; then
tail -n 20 file-server.log
else
echo "File Server Log Has not been generated"
fi
echo "\nCube Transfer Log:"
echo ""
echo "Cube Transfer Log:"
kubectl logs cube-transfer | grep "all reload ok" > cube-transfer.log
if [ -f cube-transfer.log ]; then
tail -n 20 cube-transfer.log
else
echo "Cube Transfer Log Has not been generated"
fi
echo "\nPadddle Serving Log:"
kubectl logs paddleserving | grep INFO > paddleserving.log
echo ""
echo "Padddle Serving Log:"
serving_pod=$(kubectl get po | grep paddleserving | awk {'print $1'})
kubectl logs ${serving_pod} | grep __INFO__ > paddleserving.log
if [ -f paddleserving.log ]; then
tail -n 20 paddleserving.log
else
echo "PaddleServing Log Has not been generated"
fi
}
datafile_config()
......@@ -370,7 +381,8 @@ datafile_config()
function apply()
{
check_tools kubectl
echo "Waiting for pod..."
check_tools kubectl
install_volcano
kubectl get pod | grep cube | awk {'print $1'} | xargs kubectl delete pod >/dev/null 2>&1
kubectl get pod | grep paddleserving | awk {'print $1'} | xargs kubectl delete pod >/dev/null 2>&1
......@@ -383,6 +395,9 @@ function apply()
kubectl delete jobs.batch.volcano.sh fleet-ctr-demo
fi
kubectl apply -f fleet-ctr.yaml
python3 listen.py &
echo "waiting for mlflow..."
python3 service.py
return
}
......@@ -399,7 +414,7 @@ PSERVER=2
DATA_PATH="/app"
SLOT_CONF="./slot.conf"
VERBOSE=0
DATA_CONF_PATH="data.config"
DATA_CONF_PATH="./data.config"
source $DATA_CONF_PATH
# Parse arguments
......
apiVersion: v1
kind: Pod
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: file-server
  labels:
    app: file-server
spec:
  containers:
  - name: file-server
    image: hub.baidubce.com/ctr/file-server:hdfs7
    ports:
    - containerPort: 8080
    command: ['bash']
    args: ['run.sh']
    env:
    - name: JAVA_HOME
      value: /usr/local/jdk1.8.0_231
    - name: HADOOP_HOME
      value: /usr/local/hadoop-2.8.5
    - name: HDFS_ADDRESS
      value: "<$ HDFS_ADDRESS $>"
    - name: HDFS_UGI
      value: "<$ HDFS_UGI $>"
    - name: PATH
      value: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/jdk1.8.0_231/bin:/usr/local/hadoop-2.8.5/bin:/Python-3.7.0:/node-v12.13.1-linux-x64/bin
  replicas: 1
  template:
    metadata:
      name: file-server
      labels:
        app: file-server
    spec:
      containers:
      - name: file-server
        image: hub.baidubce.com/ctr/file-server:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        command: ['bash']
        args: ['run.sh']
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: PADDLE_CURRENT_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: JAVA_HOME
          value: /usr/local/jdk1.8.0_231
        - name: HADOOP_HOME
          value: /usr/local/hadoop-2.8.5
        - name: HADOOP_HOME
          value: /usr/local/hadoop-2.8.5
        - name: DATASET_PATH
          value: "<$ DATASET_PATH $>"
        - name: HDFS_ADDRESS
          value: "<$ HDFS_ADDRESS $>"
        - name: HDFS_UGI
          value: "<$ HDFS_UGI $>"
        - name: PATH
          value: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/jdk1.8.0_231/bin:/usr/local/hadoop-2.8.5/bin:/Python-3.7.0:/node-v12.13.1-linux-x64/bin
---
kind: Service
apiVersion: v1
......
......@@ -21,11 +21,11 @@ spec:
imagePullSecrets:
- name: default-secret
containers:
- image: hub.baidubce.com/ctr/fleet-ctr:83
- image: hub.baidubce.com/ctr/fleet-ctr:latest
command:
- paddle_k8s
- start_fluid
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
name: preserver
resources:
limits:
......@@ -113,11 +113,11 @@ spec:
imagePullSecrets:
- name: default-secret
containers:
- image: hub.baidubce.com/ctr/fleet-ctr:83
- image: hub.baidubce.com/ctr/fleet-ctr:latest
command:
- paddle_k8s
- start_fluid
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
name: trainer
resources:
limits:
......@@ -194,20 +194,5 @@ spec:
- name: PATH
value: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/jdk1.8.0_231/bin:/usr/local/hadoop-2.8.5/bin:/Python-3.7.0
- name: ENTRY
value: cd workspace && (bash mlflow_run.sh &) && python3 train_with_mlflow.py slot.conf
value: cd workspace && python3 train_with_mlflow.py slot.conf
restartPolicy: OnFailure
---
kind: Service
apiVersion: v1
metadata:
  name: mlflow
spec:
  type: LoadBalancer
  ports:
  - name: mlflow
    port: 8111
    targetPort: 8111
  selector:
    app: mlflow
import time
import os
import socket
def rewrite_yaml(path):
    for root, dirs, files in os.walk(path):
        for name in files:
            if name == "meta.yaml":
                if len(root.split("/mlruns")) != 2:
                    print("Error: the parent directory of your current directory should not contain a path named mlruns")
                    exit(0)
                cmd = "sed -i \"s#/workspace#" + root.split("/mlruns")[0] + "#g\" " + os.path.join(root, name)
                os.system(cmd)

time.sleep(5)
while True:
    r = os.popen("kubectl get pod | grep fleet-ctr-demo-trainer-0 | awk {'print $3'}")
    info = r.readlines()
    if info == []:
        exit(0)
    for line in info:
        line = line.strip()
        if line == "Completed" or line == "Terminating":
            exit(0)
    os.system("kubectl cp fleet-ctr-demo-trainer-0:workspace/mlruns ./mlruns_temp >/dev/null 2>&1")
    if os.path.exists("./mlruns_temp"):
        os.system("rm -rf ./mlruns >/dev/null 2>&1")
        os.system("mv ./mlruns_temp ./mlruns >/dev/null 2>&1")
        rewrite_yaml(os.getcwd() + "/mlruns")
    time.sleep(30)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import os
start_service_flag = True
while True:
    os.system("kubectl cp fleet-ctr-demo-trainer-0:workspace/mlruns ./mlruns")
    if os.path.exists("./mlruns") and start_service_flag:
        os.system("mlflow server --default-artifact-root ./mlruns/0 --host 0.0.0.0 --port 8111 &")
        start_service_flag = False
    time.sleep(30)
apiVersion: v1
kind: Pod
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: paddleserving
  labels:
    app: paddleserving
spec:
  containers:
  - name: paddleserving
    image: hub.baidubce.com/ctr/paddleserving:test_elastic_ctr_1
    #image: hub.baidubce.com/ctr/paddleserving:latest
    workingDir: /serving
    command: ['/bin/bash']
    args: ['run.sh']
    #command: ['sleep']
    #args: ['1000000']
    ports:
    - containerPort: 8010
      name: serving
  replicas: 1
  template:
    metadata:
      name: paddleserving
      labels:
        app: paddleserving
    spec:
      containers:
      - name: paddleserving
        image: hub.baidubce.com/ctr/paddleserving:latest
        imagePullPolicy: Always
        workingDir: /serving
        command: ['/bin/bash']
        args: ['run.sh']
        ports:
        - containerPort: 8010
          name: serving
---
apiVersion: v1
kind: Service
......
# replace suo with other namespace
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: suo
  namespace: suo
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: suo
  namespace: suo
subjects:
- kind: ServiceAccount
  name: default
  namespace: suo
roleRef:
  kind: ClusterRole
  name: suo
  apiGroup: rbac.authorization.k8s.io
import time
import os
import socket
def net_is_used(port, ip='0.0.0.0'):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect((ip, port))
        s.shutdown(2)
        print('Error: %s:%d is used' % (ip, port))
        return True
    except:
        # print('%s:%d is unused' % (ip, port))
        return False

os.system("ps -ef | grep ${USER} | grep mlflow | awk {'print $2'} | xargs kill -9 >/dev/null 2>&1")
os.system("ps -ef | grep ${USER} | grep gunicorn | awk {'print $2'} | xargs kill -9 >/dev/null 2>&1")

while True:
    if os.path.exists("./mlruns") and not net_is_used(8111):
        os.system("mlflow server --default-artifact-root ./mlruns/0 --host 0.0.0.0 --port 8111 >/dev/null 2>&1 &")
        time.sleep(3)
        print("mlflow ready!")
        exit(0)
    time.sleep(30)
# Setting up a k8s Cluster on Huawei Cloud
This document describes how to set up a k8s cluster on Huawei Cloud in order to deploy ElasticCTR.
* [1. Overview](#head1)
* [2. Purchase a cluster](#head2)
* [3. Define the load balancer](#head3)
* [4. Upload images](#head4)
## <span id='head1'>1. Overview</span>
Setting up a k8s cluster on Huawei Cloud involves three main steps:
1. Purchase a cluster
First, log in to the Huawei Cloud CCE console and purchase a cluster with a suitable configuration
2. Define the load balancer
Adapt the Service definitions used by ElasticCTR to Huawei Cloud's load balancer requirements (see section 3 below)
3. Upload images
Huawei Cloud provides an image repository service; uploading the required images to this repository makes subsequent image pulls faster
Each step is described in detail below.
## <span id='head2'>2. Purchase a Cluster</span>
Log in to the Huawei Cloud CCE console to purchase a k8s cluster. The specific steps are as follows:
1. Open the Huawei Cloud CCE console and, from the console dashboard, click Buy Kubernetes Cluster; on the Buy Hybrid Cluster page, set the cluster information and overall configuration under the service selection step.
![huawei_cce_choose_service.png](https://github.com/suoych/WebChat/raw/master/huawei_cce_choose_service.png)
2. After the cluster information is set, click Next: Create Node and configure the nodes under the Create Node step.
![huawei_cce_create_node.png](https://github.com/suoych/WebChat/raw/master/huawei_cce_create_node.png)
![huawei_cce_create_node_configure.png](https://github.com/suoych/WebChat/raw/master/huawei_cce_create_node_configure.png)
![huawei_cce_create_node_configure_1.png](https://github.com/suoych/WebChat/raw/master/huawei_cce_create_node_configure_1.png)
3. After the nodes are configured, select volcano under the advanced add-ons in the Install Add-on step, then confirm the configuration and complete the payment.
![huawei_cce_plugin.png](https://github.com/suoych/WebChat/raw/master/huawei_cce_plugin.png)
## <span id='head3'>3. Define the Load Balancer</span>
Because Huawei Cloud places restrictions on load balancers, it is recommended to consult [Huawei Cloud load balancing](https://support.huaweicloud.com/usermanual-cce/cce_01_0014.html) and modify the metadata of the Service definitions in fileserver.yaml.template and pdserving.yaml, as shown in the figure below
![huawei_cce_load_balancer.png](https://github.com/suoych/WebChat/raw/master/huawei_cce_load_balancer.png)
## <span id='head4'>4. Upload Images</span>
Note: this step is optional; if your budget is sufficient, you can skip it.
Click the image repository, follow the prompts to log in to the repository, and copy the provided command to a node terminal and run it, as shown below:
![huawei_cce_image_repo.png](https://github.com/suoych/WebChat/raw/master/huawei_cce_image_repo.png)
Then, following the prompts, upload the images required by elastic ctr to the repository (an illustrative push command is sketched after the figure below)
![huawei_cce_image_upload.png](https://github.com/suoych/WebChat/raw/master/huawei_cce_image_upload.png)
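The exact commands are shown in the SWR console. As an illustrative sketch only (the registry address, region, and organization below are placeholders to be replaced with the values from your own console), pushing one of the ElasticCTR images might look like:
```bash
# hypothetical example: retag a locally pulled image and push it to your SWR repository
docker tag hub.baidubce.com/ctr/paddleserving:latest swr.<region>.myhuaweicloud.com/<organization>/paddleserving:latest
docker push swr.<region>.myhuaweicloud.com/<organization>/paddleserving:latest
```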
Once this is done, the result looks like this:
![huawei_cce_image_upload_success.png](https://github.com/suoych/WebChat/raw/master/huawei_cce_image_upload_success.png)
At this point, the Huawei Cloud k8s setup is complete. You can set up HDFS on your own and deploy elastic ctr.
......@@ -21,6 +21,7 @@ import google.protobuf.text_format as text_format
import paddle.fluid.proto.framework_pb2 as framework_pb2
import paddle.fluid.core as core
import six
import subprocess as sp
inference_path = sys.argv[2]+ '/inference_only'
feature_names = []
......@@ -37,8 +38,8 @@ sparse_input_ids = [
label = fluid.layers.data(
name='label', shape=[1], dtype='int64')
sparse_feature_dim = int(os.environ('SPARSE_DIM'))
dataset_prefix
sparse_feature_dim = int(os.environ['SPARSE_DIM'])
dataset_prefix = os.environ['DATASET_PATH']
embedding_size = 9
current_date_hr = sys.argv[3]
......