README.md

# FLEN

 以下是本例的简要目录结构及说明： 

```
├── data #样例数据
	├── sample_data
		├── train
			├── sample_train.txt
	├── run.sh
	├── get_slot_data.py
├── __init__.py
├── README.md # 文档
├── model.py #模型文件
├── config.yaml #配置文件
```

## 简介

[《FLEN: Leveraging Field for Scalable CTR Prediction》](https://arxiv.org/pdf/1911.04690.pdf)文章提出了field-wise bi-interaction pooling技术，解决了在大规模应用特征field信息时存在的时间复杂度和空间复杂度高的困境，同时提出了一种缓解梯度耦合问题的方法dicefactor。该模型已应用于美图的大规模推荐系统中，持续稳定地取得业务效果的全面提升。

本项目在avazu数据集上验证模型效果

## 数据下载及预处理

## 环境

PaddlePaddle 1.7.2

python3.7 

PaddleRec

## 单机训练

CPU环境

在config.yaml文件中设置好设备，epochs等。

```
# select runner by name
mode: [single_cpu_train, single_cpu_infer]
# config of each runner.
# runner is a kind of paddle training class, which wraps the train/infer process.
runner:
- name: single_cpu_train
  class: train
  # num of epochs
  epochs: 4
  # device to run training or infer
  device: cpu
  save_checkpoint_interval: 2 # save model interval of epochs
  save_inference_interval: 4 # save inference
  save_checkpoint_path: "increment_model" # save checkpoint path
  save_inference_path: "inference" # save inference path
  save_inference_feed_varnames: [] # feed vars of save inference
  save_inference_fetch_varnames: [] # fetch vars of save inference
  init_model_path: "" # load model path
  print_interval: 10
  phases: [phase1]
```

## 单机预测

CPU环境

在config.yaml文件中设置好epochs、device等参数。

```
- name: single_cpu_infer
  class: infer
  # num of epochs
  epochs: 1
  # device to run training or infer
  device: cpu #选择预测的设备
  init_model_path: "increment_dnn" # load model path
  phases: [phase2]
```

## 运行

```
python -m paddlerec.run -m paddlerec.models.rank.flen
```

## 模型效果

在样例数据上测试模型

训练：

```
0702 13:38:20.903220  7368 parallel_executor.cc:440] The Program will be executed on CPU using ParallelExecutor, 2 cards are used, so 2 programs are executed in parallel.
I0702 13:38:20.925912  7368 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0702 13:38:20.933356  7368 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
batch: 2, AUC: [0.09090909 0.        ], BATCH_AUC: [0.09090909 0.        ]
batch: 4, AUC: [0.31578947 0.29411765], BATCH_AUC: [0.31578947 0.29411765]
batch: 6, AUC: [0.41333333 0.33333333], BATCH_AUC: [0.41333333 0.33333333]
batch: 8, AUC: [0.4453125  0.44166667], BATCH_AUC: [0.4453125  0.44166667]
batch: 10, AUC: [0.39473684 0.38888889], BATCH_AUC: [0.44117647 0.41176471]
batch: 12, AUC: [0.41860465 0.45535714], BATCH_AUC: [0.5078125  0.54545455]
batch: 14, AUC: [0.43413729 0.42746615], BATCH_AUC: [0.56666667 0.56      ]
batch: 16, AUC: [0.46433566 0.47460087], BATCH_AUC: [0.53       0.59247649]
batch: 18, AUC: [0.44009217 0.44642857], BATCH_AUC: [0.46 0.47]
batch: 20, AUC: [0.42705314 0.43781095], BATCH_AUC: [0.45878136 0.4874552 ]
batch: 22, AUC: [0.45176471 0.46011281], BATCH_AUC: [0.48046875 0.45878136]
batch: 24, AUC: [0.48375    0.48910256], BATCH_AUC: [0.56630824 0.59856631]
epoch 0 done, use time: 0.21532440185546875
PaddleRec Finish
```

预测

```
PaddleRec: Runner single_cpu_infer Begin
Executor Mode: infer
processor_register begin
Running SingleInstance.
Running SingleNetwork.
QueueDataset can not support PY3, change to DataLoader
QueueDataset can not support PY3, change to DataLoader
Running SingleInferStartup.
Running SingleInferRunner.
load persistables from increment_model/0
batch: 20, AUC: [0.49121353], BATCH_AUC: [0.66176471]
batch: 40, AUC: [0.51156463], BATCH_AUC: [0.55197133]
Infer phase2 of 0 done, use time: 0.3941819667816162
PaddleRec Finish
```