未验证 提交 8e4eebfc 编写于 作者: A anpark 提交者: GitHub

add kdd2020-p3ac (#4238)

* update README

* fix monopoly info

* add kdd2020-p3ac

* add kdd2020-p3ac
上级 a026656e
...@@ -19,3 +19,4 @@ The full list of frontier industrial projects: ...@@ -19,3 +19,4 @@ The full list of frontier industrial projects:
|应用项目|项目简介|开源地址| |应用项目|项目简介|开源地址|
|----|----|----| |----|----|----|
|||| ||||
...@@ -29,7 +29,7 @@ We have conducted extensive experiments with the large-scale urban data of sever ...@@ -29,7 +29,7 @@ We have conducted extensive experiments with the large-scale urban data of sever
1. paddle安装 1. paddle安装
本项目依赖于Paddle Fluid 1.5.1 及以上版本,请参考[安装指南](http://www.paddlepaddle.org/#quick-start)进行安装 本项目依赖于Paddle Fluid 1.6.1 及以上版本,请参考[安装指南](http://www.paddlepaddle.org/#quick-start)进行安装
2. 下载代码 2. 下载代码
......
...@@ -280,7 +280,7 @@ num_in_dimension: ${DEFAULT:num_in_dimension} ...@@ -280,7 +280,7 @@ num_in_dimension: ${DEFAULT:num_in_dimension}
num_out_dimension: ${DEFAULT:num_out_dimension} num_out_dimension: ${DEFAULT:num_out_dimension}
# Directory where the results are saved to # Directory where the results are saved to
eval_dir: ${Train:train_dir}/epoch<s> eval_dir: ${Train:train_dir}/checkpoint_1
# The number of samples in each batch # The number of samples in each batch
batch_size: ${DEFAULT:eval_batch_size} batch_size: ${DEFAULT:eval_batch_size}
...@@ -77,8 +77,7 @@ class HousePrice(BaseNet): ...@@ -77,8 +77,7 @@ class HousePrice(BaseNet):
act=act) act=act)
return _fc return _fc
def pred_format(self, result, **kwargs):
def pred_format(self, result):
""" """
format pred output format pred output
""" """
...@@ -118,7 +117,7 @@ class HousePrice(BaseNet): ...@@ -118,7 +117,7 @@ class HousePrice(BaseNet):
max_house_num = FLAGS.max_house_num max_house_num = FLAGS.max_house_num
max_public_num = FLAGS.max_public_num max_public_num = FLAGS.max_public_num
pred_keys = inputs.keys()
#step1. get house self feature #step1. get house self feature
if FLAGS.with_house_attr: if FLAGS.with_house_attr:
def _get_house_attr(name, attr_vec_size): def _get_house_attr(name, attr_vec_size):
...@@ -136,6 +135,10 @@ class HousePrice(BaseNet): ...@@ -136,6 +135,10 @@ class HousePrice(BaseNet):
else: else:
#no house attr #no house attr
house_vec = fluid.layers.reshape(inputs["house_business"], [-1, self.city_info.business_num]) house_vec = fluid.layers.reshape(inputs["house_business"], [-1, self.city_info.business_num])
pred_keys.remove('house_wuye')
pred_keys.remove('house_kfs')
pred_keys.remove('house_age')
pred_keys.remove('house_lou')
house_self = self.fc_fn(house_vec, 1, act='sigmoid', layer_name='house_self', FLAGS=FLAGS) house_self = self.fc_fn(house_vec, 1, act='sigmoid', layer_name='house_self', FLAGS=FLAGS)
house_self = fluid.layers.reshape(house_self, [-1, 1]) house_self = fluid.layers.reshape(house_self, [-1, 1])
...@@ -192,8 +195,8 @@ class HousePrice(BaseNet): ...@@ -192,8 +195,8 @@ class HousePrice(BaseNet):
net_output = {"debug_output": debug_output, net_output = {"debug_output": debug_output,
"model_output": model_output} "model_output": model_output}
model_output['feeded_var_names'] = inputs.keys() model_output['feeded_var_names'] = pred_keys
model_output['target_vars'] = [label, pred] model_output['fetch_targets'] = [label, pred]
model_output['loss'] = avg_cost model_output['loss'] = avg_cost
#debug_output['pred'] = pred #debug_output['pred'] = pred
......
# P3AC
## 任务说明(Introduction)
TODO
![](docs/framework.png)
## 安装说明(Install Guide)
### 环境准备
1. paddle安装
本项目依赖于Paddle Fluid 1.6.1 及以上版本,请参考[安装指南](http://www.paddlepaddle.org/#quick-start)进行安装
2. 下载代码
克隆数据集代码库到本地, 本代码依赖[Paddle-EPEP框架](https://github.com/PaddlePaddle/epep)
```
git clone https://github.com/PaddlePaddle/epep.git
cd epep
git clone https://github.com/PaddlePaddle/models.git
ln -s models/PaddleST/Research/KDD2020-P3AC/conf/poi_qac_personalized conf/poi_qac_personalized
ln -s models/PaddleST/Research/KDD2020-P3AC/datasets/poi_qac_personalized datasets/poi_qac_personalized
ln -s models/PaddleST/Research/KDD2020-P3AC/nets/poi_qac_personalized nets/poi_qac_personalized
```
3. 环境依赖
python版本依赖python 2.7
### 实验说明
1. 数据准备
TODO
```
#script to download
```
2. 模型训练
```
cp conf/poi_qac_personalized/poi_qac_personalized.local.conf.template conf/poi_qac_personalized/poi_qac_personalized.local.conf
sh run.sh -c conf/poi_qac_personalized/poi_qac_personalized.local.conf -m train [ -g 0 ]
```
3. 模型评估
```
pred_gpu=$1
mode=$2 #query, poi, eval
if [ $# -lt 2 ];then
exit 1
fi
#编辑conf/poi_qac_personalized/poi_qac_personalized.local.conf.template,打开 CUDA_VISIBLE_DEVICES: <pred_gpu>
cp conf/poi_qac_personalized/poi_qac_personalized.local.conf.template conf/poi_qac_personalized/poi_qac_personalized.local.conf
sed -i "s#<pred_gpu>#$pred_gpu#g" conf/poi_qac_personalized/poi_qac_personalized.local.conf
sed -i "s#<mode>#$mode#g" conf/poi_qac_personalized/poi_qac_personalized.local.conf
sh run.sh -c poi_qac_personalized.local -m predict 1>../tmp/$mode-pred$pred_gpu.out 2>../tmp/$mode-pred$pred_gpu.err
```
## 论文下载(Paper Download)
Please feel free to review our paper :)
TODO
## 引用格式(Paper Citation)
TODO
[DEFAULT]
sample_seed: 1234
# The value in `DEFAULT` section will be referenced by other sections.
# For convinence, we will put the variables which changes frequently here and
# let other section refer them
debug_mode: False
#reader: dataset | pyreader | async | datafeed | sync
#data_reader: dataset
dataset_mode: Memory
#data_reader: datafeed
data_reader: pyreader
py_reader_iterable: False
#model_type: lstm_net
model_type: cnn_net
vocab_size: 93896
#emb_dim: 200
emb_dim: 128
time_size: 28
tag_size: 371
fc_dim: 64
emb_lr: 1.0
base_lr: 0.001
margin: 0.35
window_size: 3
pooling_type: max
#activate: sigmoid
activate: None
use_attention: True
use_personal: True
max_seq_len: 128
prefix_word_id: True
#print_period: 200
#TODO personal_resident_drive + neg_only_sample
#query cityid trendency, poi tag/alias
#local-cpu | local-gpu | pserver-cpu | pserver-gpu | nccl2
platform: local-gpu
# Input settings
dataset_name: PoiQacPersonalized
CUDA_VISIBLE_DEVICES: 0,1,2,3
#CUDA_VISIBLE_DEVICES: <pred_gpu>
train_batch_size: 128
#train_batch_size: 2
eval_batch_size: 2
#file_list: ../tmp/data/poi/qac/train_data/part-00000
dataset_dir: ../tmp/data/poi/qac/train_data
#init_train_params: ../tmp/data/poi/qac/tencent_pretrain.words
tag_dict_path: None
qac_dict_path: None
kv_path: None
#qac_dict_path: ./datasets/poi_qac_personalized/qac_term.dict
#tag_dict_path: ./datasets/poi_qac_personalized/poi_tag.dict
#kv_path: ../tmp/data/poi/qac/kv
# Model settings
model_name: PoiQacPersonalized
preprocessing_name: None
#file_pattern: %s-part-*
file_pattern: part-
num_in_dimension: 3
num_out_dimension: 4
# Learning options
num_samples_train: 100
num_samples_eval: 10
max_number_of_steps: 155000
[Convert]
# The name of the dataset to convert
dataset_name: ${DEFAULT:dataset_name}
#dataset_dir: ${DEFAULT:dataset_dir}
dataset_dir: stream
# The output Records file name prefix.
dataset_split_name: train
# The number of Records per shard
num_per_shard: 100000
# The dimensions of net input vectors, it is just used by svm dataset
# which of input are sparse tensors now
num_in_dimension: ${DEFAULT:num_in_dimension}
# The output file name pattern with two placeholders ("%s" and "%d"),
# it must correspond to the glob `file_pattern' in Train and Evaluate
# config sections
#file_pattern: %s-part-%05d
file_pattern: part-
[Train]
#######################
# Dataset Configure #
#######################
# The name of the dataset to load
dataset_name: ${DEFAULT:dataset_name}
# The directory where the dataset files are stored
dataset_dir: ${DEFAULT:dataset_dir}
# dataset_split_name
dataset_split_name: train
batch_shuffle_size: 128
#log_exp or hinge
#loss_func: hinge
loss_func: log_exp
neg_sample_num: 5
reader_batch: True
drop_last_batch: False
# The glob pattern for data path, `file_pattern' must contain only one "%s"
# which is the placeholder for split name (such as 'train', 'validation')
file_pattern: ${DEFAULT:file_pattern}
# The file type text or record
file_type: record
# kv path, used in image_sim
kv_path: ${DEFAULT:kv_path}
# The number of input sample for training
num_samples: ${DEFAULT:num_samples_train}
# The number of parallel readers that read data from the dataset
num_readers: 2
# The number of threads used to create the batches
num_preprocessing_threads: 2
# Number of epochs from dataset source
num_epochs_input: 10
###########################
# Basic Train Configure #
###########################
# Directory where checkpoints and event logs are written to.
train_dir: ../tmp/model/poi/qac/save_model
# The max number of ckpt files to store variables
save_max_to_keep: 40
# The frequency with which the model is saved, in seconds.
save_model_secs: None
# The frequency with which the model is saved, in steps.
save_model_steps: 5000
# The name of the architecture to train
model_name: ${DEFAULT:model_name}
# The dimensions of net input vectors, it is just used by svm dataset
# which of input are sparse tensors now
num_in_dimension: ${DEFAULT:num_in_dimension}
# The dimensions of net output vector, it will be num of classes in image classify task
num_out_dimension: ${DEFAULT:num_out_dimension}
#####################################
# Training Optimization Configure #
#####################################
# The number of samples in each batch
batch_size: ${DEFAULT:train_batch_size}
# The maximum number of training steps
max_number_of_steps: ${DEFAULT:max_number_of_steps}
# The weight decay on the model weights
#weight_decay: 0.00000001
weight_decay: None
# The decay to use for the moving average. If left as None, then moving averages are not used
moving_average_decay: None
# ***************** learning rate options ***************** #
# Specifies how the learning rate is decayed. One of "fixed", "exponential" or "polynomial"
learning_rate_decay_type: fixed
# Learning rate decay factor
learning_rate_decay_factor: 0.1
# Proportion of training steps to perform linear learning rate warmup for
learning_rate_warmup_proportion: 0.1
init_learning_rate: 0
learning_rate_warmup_steps: 10000
# The minimal end learning rate used by a polynomial decay learning rate
end_learning_rate: 0.0001
# Number of epochs after which learning rate decays
num_epochs_per_decay: 10
# A boolean, whether or not it should cycle beyond decay_steps
learning_rate_polynomial_decay_cycle: False
# ******************* optimizer options ******************* #
# The name of the optimizer, one of the following:
# "adadelta", "adagrad", "adam", "ftrl", "momentum", "sgd" or "rmsprop"
#optimizer: weight_decay_adam
optimizer: adam
#optimizer: sgd
# Epsilon term for the optimizer, used for adadelta, adam, rmsprop
opt_epsilon: 1e-8
# conf for adadelta
# The decay rate for adadelta
adadelta_rho: 0.95
# Starting value for the AdaGrad accumulators
adagrad_initial_accumulator_value: 0.1
# conf for adam
# The exponential decay rate for the 1st moment estimates
adam_beta1: 0.9
# The exponential decay rate for the 2nd moment estimates
adam_beta2: 0.997
adam_weight_decay: 0.01
#adam_exclude_from_weight_decay: LayerNorm,layer_norm,bias
# conf for ftrl
# The learning rate power
ftrl_learning_rate_power: -0.1
# Starting value for the FTRL accumulators
ftrl_initial_accumulator_value: 0.1
# The FTRL l1 regularization strength
ftrl_l1: 0.0
# The FTRL l2 regularization strength
ftrl_l2: 0.01
# conf for momentum
# The momentum for the MomentumOptimizer and RMSPropOptimizer
momentum: 0.9
# conf for rmsprop
# Decay term for RMSProp
rmsprop_decay: 0.9
# Number of model clones to deploy
num_gpus: 3
#############################
# Log and Trace Configure #
#############################
# The frequency with which logs are print
log_every_n_steps: 100
# The frequency with which logs are trace.
trace_every_n_steps: 1
[Evaluate]
# process mode: pred, eval or export
#proc_name: eval
proc_name: pred
#data_reader: datafeed
py_reader_iterable: True
#platform: hadoop
platform: local-gpu
qac_dict_path: ./datasets/poi_qac_personalized/qac_term.dict
tag_dict_path: ./datasets/poi_qac_personalized/poi_tag.dict
#kv_path: ../tmp/data/poi/qac/kv
# The directory where the dataset files are stored
#file_list: ../tmp/x.bug
file_list: ../tmp/data/poi/qac/recall_data/<mode>/part-0<pred_gpu>
#file_list: ../tmp/data/poi/qac/ltr_data/<mode>/part-0<pred_gpu>
#dataset_dir: stream_record
# The directory where the model was written to or an absolute path to a checkpoint file
init_pretrain_model: ../tmp/model/poi/qac/save_model_logexp/checkpoint_125000
#init_pretrain_model: ../tmp/model/poi/qac/save_model_personal_logexp/checkpoint_125000
#init_pretrain_model: ../tmp/model/poi/qac/save_model_wordid_logexp/checkpoint_125000
#init_pretrain_model: ../tmp/model/poi/qac/save_model_personal_wordid_logexp/checkpoint_125000
#init_pretrain_model: ../tmp/model/poi/qac/save_model_attention_logexp/checkpoint_125000
#init_pretrain_model: ../tmp/model/poi/qac/save_model_attention_personal_logexp/checkpoint_125000
#init_pretrain_model: ../tmp/model/poi/qac/save_model_attention_wordid_logexp/checkpoint_125000
#init_pretrain_model: ../tmp/model/poi/qac/save_model_attention_personal_wordid_logexp/checkpoint_125000
model_type: cnn_net
fc_dim: 64
use_attention: False
use_personal: False
prefix_word_id: False
#dump_vec: query
#dump_vec: <mode>
dump_vec: eval
# The number of samples in each batch
#batch_size: ${DEFAULT:eval_batch_size}
batch_size: 1
# The file type text or record
#file_type: record
file_type: text
reader_batch: False
# only exectute evaluation once
eval_once: True
#######################
# Dataset Configure #
#######################
# The name of the dataset to load
dataset_name: ${DEFAULT:dataset_name}
# The name of the train/test split
dataset_split_name: validation
# The glob pattern for data path, `file_pattern' must contain only one "%s"
# which is the placeholder for split name (such as 'train', 'validation')
file_pattern: ${DEFAULT:file_pattern}
# The number of input sample for evaluation
num_samples: ${DEFAULT:num_samples_eval}
# The number of parallel readers that read data from the dataset
num_readers: 2
# The number of threads used to create the batches
num_preprocessing_threads: 1
# Number of epochs from dataset source
num_epochs_input: 1
# The name of the architecture to evaluate
model_name: ${DEFAULT:model_name}
# The dimensions of net input vectors, it is just used by svm dataset
# which of input are sparse tensors now
num_in_dimension: ${DEFAULT:num_in_dimension}
# The dimensions of net output vector, it will be num of classes in image classify task
num_out_dimension: ${DEFAULT:num_out_dimension}
# Directory where the results are saved to
eval_dir: ${Train:train_dir}/checkpoint_1
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册