1. Huang P S, He X, Gao J, et al. Learning deep structured semantic models for web search using clickthrough data[C]//Proceedings of the 22nd ACM international conference on Conference on information & knowledge management. ACM, 2013: 2333-2338.
2.[Microsoft Learning to Rank Datasets](https://www.microsoft.com/en-us/research/project/mslr/)
3.[Gao J, He X, Deng L. Deep Learning for Web Search and Natural Language Processing[J]. Microsoft Research Technical Report, 2015.](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/wsdm2015.v3.pdf)
Deep Structured Semantic Models (DSSM) is simple but powerful DNN based model for matching web search queries and the URL based documents. This example demonstrates how to use PaddlePaddle to implement a generic DSSM model for modeling the semantic similarity between two strings.
DSSM \[[1](##References)]is a classic semantic model proposed by the Institute of Physics. It is used to study the semantic distance between two texts. The general implementation of DSSM is as follows.
In the original paper \[[1](#References)] the DSSM model uses the implicit semantic relation between the user search query and the document as metric. The model structure is as follows
<palign="center">
<palign="center">
<imgsrc="./images/dssm.png"/><br/><br/>
<imgsrc="./images/dssm.png"/><br/><br/>
图 1. DSSM 原始结构
Figure 1. DSSM In the original paper
</p>
</p>
其贯彻的思想是, **用DNN将高维特征向量转化为低纬空间的连续向量(图中红色框部分)** ,
**在上层用cosin similarity来衡量用户搜索词与候选文档间的语义相关性** 。
在最顶层损失函数的设计上,原始模型使用类似Word2Vec中负例采样的方法,
一个Query会抽取正例 $D+$ 和4个负例 $D-$ 整体上算条件概率用对数似然函数作为损失,
这也就是图 1中类似 $P(D_1|Q)$ 的结构,具体细节请参考原论文。
随着后续优化DSSM模型的结构得以简化\[[3](#参考文献)\],演变为:
With the subsequent optimization of the DSSM model to simplify the structure \[[3](#References)],the model becomes:
The blank box in the figure can be replaced by any model, such as fully connected FC, convoluted CNN, RNN, etc. The structure is designed to measure the semantic distance between two elements (such as strings).
该模型结构专门用于衡量两个元素(比如字符串)间的语义距离。
在现实使用中,DSSM模型会作为基础的积木,搭配上不同的损失函数来实现具体的功能,比如
- 在排序学习中,将 图 2 中结构添加 pairwise rank损失,变成一个排序模型
- 在CTR预估中,对点击与否做0,1二元分类,添加交叉熵损失变成一个分类模型
- 在需要对一个子串打分时,可以使用余弦相似度来计算相似度,变成一个回归模型
本例将尝试面向应用提供一个比较通用的解决方案,在模型任务类型上支持
- 分类
In practice,DSSM model serves as a basic building block, with different loss functions to achieve specific functions, such as
- [-1, 1] 值域内的回归
- Pairwise-Rank
在生成低纬语义向量的模型结构上,本模型支持以下三种:
- In ranking system, the pairwise rank loss function.
- In the CTR estimate, instead of the binary classification on the click, use cross-entropy loss for a classification model
- In regression model, the cosine similarity is used to calculate the similarity
- FC, 多层全连接层
## Model Implementation
- CNN,卷积神经网络
At a high level, DSSM model is composed of three components: the left and right DNN, and loss function on top of them. In complex tasks, the structure of the left DNN and the light DNN can be different. In this example, we keep these two DNN structures the same. And we choose any of FC, CNN, and RNN for the DNN architecture.
- RNN,递归神经网络
## 模型实现
In PaddlePaddle, the loss functions are supported for any of classification, regression, and ranking. Among them, the distance between the left and right DNN is calculated by the cosine similarity. In the classification task, the predicted distribution is calculated by softmax.
- How CNN, FC do text information extraction can refer to [text classification](https://github.com/PaddlePaddle/models/blob/develop/text_classification/README.md#模型详解)
其中,在回归和排序两种损失中,左右两边的匹配程度通过余弦相似度(cossim)来计算;
- The contents of the RNN / GRU can be found in [Machine Translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#gated-recurrent-unit-gru)
在分类任务中,类别预测的分布通过softmax计算。
- For Pairwise Rank learning, please refer to [learn to rank](https://github.com/PaddlePaddle/models/blob/develop/ltr/README.md)
在其它教程中,对上述很多内容都有过详细的介绍,例如:
Figure 3 shows the general architecture for both regression and classification models.
- 如何CNN, FC 做文本信息提取可以参考 [text classification](https://github.com/PaddlePaddle/models/blob/develop/text_classification/README.md#模型详解)
Since the input (embedding table) is a list of the IDs of the words corresponding to a sentence, the word vector table outputs the sequence of word vectors.
In the construction of FC, we use `paddle.layer.pooling` for the maximum pooling operation on the word vector sequence. Then we transform the sequence into a fixed dimensional vector.
1. Huang P S, He X, Gao J, et al. Learning deep structured semantic models for web search using clickthrough data[C]//Proceedings of the 22nd ACM international conference on Conference on information & knowledge management. ACM, 2013: 2333-2338.
1. Huang P S, He X, Gao J, et al. Learning deep structured semantic models for web search using clickthrough data[C]//Proceedings of the 22nd ACM international conference on Conference on information & knowledge management. ACM, 2013: 2333-2338.
2.[Microsoft Learning to Rank Datasets](https://www.microsoft.com/en-us/research/project/mslr/)
2.[Microsoft Learning to Rank Datasets](https://www.microsoft.com/en-us/research/project/mslr/)
<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
<divid="markdown"style='display:none'>
# Deep Structured Semantic Models (DSSM)
Deep Structured Semantic Models (DSSM) is simple but powerful DNN based model for matching web search queries and the URL based documents. This example demonstrates how to use PaddlePaddle to implement a generic DSSM model for modeling the semantic similarity between two strings.
## Background Introduction
DSSM \[[1](##References)]is a classic semantic model proposed by the Institute of Physics. It is used to study the semantic distance between two texts. The general implementation of DSSM is as follows.
1. The CTR predictor measures the degree of association between a user search query and a candidate web page.
2. Text relevance, which measures the degree of semantic correlation between two strings.
3. Automatically recommend, measure the degree of association between User and the recommended Item.
## Model Architecture
In the original paper \[[1](#References)] the DSSM model uses the implicit semantic relation between the user search query and the document as metric. The model structure is as follows
<palign="center">
<imgsrc="./images/dssm.png"/><br/><br/>
Figure 1. DSSM In the original paper
</p>
With the subsequent optimization of the DSSM model to simplify the structure \[[3](#References)],the model becomes:
The blank box in the figure can be replaced by any model, such as fully connected FC, convoluted CNN, RNN, etc. The structure is designed to measure the semantic distance between two elements (such as strings).
In practice,DSSM model serves as a basic building block, with different loss functions to achieve specific functions, such as
- In ranking system, the pairwise rank loss function.
- In the CTR estimate, instead of the binary classification on the click, use cross-entropy loss for a classification model
- In regression model, the cosine similarity is used to calculate the similarity
## Model Implementation
At a high level, DSSM model is composed of three components: the left and right DNN, and loss function on top of them. In complex tasks, the structure of the left DNN and the light DNN can be different. In this example, we keep these two DNN structures the same. And we choose any of FC, CNN, and RNN for the DNN architecture.
In PaddlePaddle, the loss functions are supported for any of classification, regression, and ranking. Among them, the distance between the left and right DNN is calculated by the cosine similarity. In the classification task, the predicted distribution is calculated by softmax.
Here we demonstrate:
- How CNN, FC do text information extraction can refer to [text classification](https://github.com/PaddlePaddle/models/blob/develop/text_classification/README.md#模型详解)
- The contents of the RNN / GRU can be found in [Machine Translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#gated-recurrent-unit-gru)
- For Pairwise Rank learning, please refer to [learn to rank](https://github.com/PaddlePaddle/models/blob/develop/ltr/README.md)
Figure 3 shows the general architecture for both regression and classification models.
<palign="center">
<imgsrc="./images/dssm3.jpg"/><br/><br/>
Figure 3. DSSM for REGRESSION or CLASSIFICATION
</p>
The structure of the Pairwise Rank is more complex, as shown in Figure 4.
<palign="center">
<imgsrc="./images/dssm2.jpg"/><br/><br/>
图 4. DSSM for Pairwise Rank
</p>
In below, we describe how to train DSSM model in PaddlePaddle. All the codes are included in `./network_conf.py`.
### Create a word vector table for the text
```python
def create_embedding(self, input, prefix=''):
'''
Create an embedding table whose name has a `prefix`.
'''
logger.info("create embedding table [%s] which dimention is %d" %
(prefix, self.dnn_dims[0]))
emb = paddle.layer.embedding(
input=input,
size=self.dnn_dims[0],
param_attr=ParamAttr(name='%s_emb.w' % prefix))
return emb
```
Since the input (embedding table) is a list of the IDs of the words corresponding to a sentence, the word vector table outputs the sequence of word vectors.
### CNN implementation
```python
def create_cnn(self, emb, prefix=''):
'''
A multi-layer CNN.
@emb: paddle.layer
output of the embedding layer
@prefix: str
prefix of layers' names, used to share parameters between more than one `cnn` parts.
logger.info('create a sequence_conv_pool which context width is 3')
conv_3 = create_conv(3, self.dnn_dims[1], "cnn")
logger.info('create a sequence_conv_pool which context width is 4')
conv_4 = create_conv(4, self.dnn_dims[1], "cnn")
return conv_3, conv_4
```
CNN accepts the word sequence of the embedding table, then process the data by convolution and pooling, and finally outputs a semantic vector.
### RNN implementation
RNN is suitable for learning variable length of the information
```python
def create_rnn(self, emb, prefix=''):
'''
A GRU sentence vector learner.
'''
gru = paddle.layer.gru_memory(input=emb,)
sent_vec = paddle.layer.last_seq(gru)
return sent_vec
```
### FC implementation
```python
def create_fc(self, emb, prefix=''):
'''
A multi-layer fully connected neural networks.
@emb: paddle.layer
output of the embedding layer
@prefix: str
prefix of layers' names, used to share parameters between more than one `fc` parts.
'''
_input_layer = paddle.layer.pooling(
input=emb, pooling_type=paddle.pooling.Max())
fc = paddle.layer.fc(input=_input_layer, size=self.dnn_dims[1])
return fc
```
In the construction of FC, we use `paddle.layer.pooling` for the maximum pooling operation on the word vector sequence. Then we transform the sequence into a fixed dimensional vector.
### Multi-layer DNN implementation
```python
def create_dnn(self, sent_vec, prefix):
# if more than three layers exists, a fc layer will be added.
if len(self.dnn_dims) > 1:
_input_layer = sent_vec
for id, dim in enumerate(self.dnn_dims[1:]):
name = "%s_fc_%d_%d" % (prefix, id, dim)
logger.info("create fc layer [%s] which dimention is %d" %
(name, dim))
fc = paddle.layer.fc(
input=_input_layer,
size=dim,
name=name,
act=paddle.activation.Tanh(),
param_attr=ParamAttr(name='%s.w' % name),
bias_attr=ParamAttr(name='%s.b' % name),
)
_input_layer = fc
return _input_layer
```
### Classification / Regression
The structure of classification and regression is similar. Below function can be used for both tasks.
whether to share network parameters between source and
target
--share_embed SHARE_EMBED
whether to share word embedding between source and
target
--dnn_dims DNN_DIMS dimentions of dnn layers, default is '256,128,64,32',
which means create a 4-layer dnn, demention of each
layer is 256, 128, 64 and 32
-c CLASS_NUM, --class_num CLASS_NUM
number of categories for classification task.
```
Important parameters are
- `data_path` Path for the data to predict
- `prediction_output_path` Prediction output path
## References
1. Huang P S, He X, Gao J, et al. Learning deep structured semantic models for web search using clickthrough data[C]//Proceedings of the 22nd ACM international conference on Conference on information & knowledge management. ACM, 2013: 2333-2338.
2. [Microsoft Learning to Rank Datasets](https://www.microsoft.com/en-us/research/project/mslr/)
3. [Gao J, He X, Deng L. Deep Learning for Web Search and Natural Language Processing[J]. Microsoft Research Technical Report, 2015.](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/wsdm2015.v3.pdf)