PaddlePaddle / PaddleRec
Commit 8180c70c
Authored Jul 31, 2020 by malin10
doc
Parent: d6d9d9a5
Showing 8 changed files with 160 additions and 55 deletions (+160, -55)
core/metrics/__init__.py (+3, -2)
core/metrics/auc.py (+11, -14)
core/metrics/binary_class/__init__.py (+0, -18)
core/metrics/pairwise_pn.py (+4, -1)
core/metrics/precision_recall.py (+9, -11)
core/metrics/recall_k.py (+7, -9)
doc/metrics.md (+124, -0)
doc/model_develop.md (+2, -0)
core/metrics/__init__.py
@@ -14,6 +14,7 @@
 from .recall_k import RecallK
 from .pairwise_pn import PosNegRatio
-from .binary_class import *
+from .precision_recall import PrecisionRecall
+from .auc import AUC
-__all__ = ['RecallK', 'PosNegRatio'] + binary_class.__all__
+__all__ = ['RecallK', 'PosNegRatio', 'AUC', 'PrecisionRecall']
core/metrics/binary_class/auc.py → core/metrics/auc.py
@@ -26,34 +26,31 @@ class AUC(Metric):
     Metric For Fluid Model
     """
-    def __init__(self, **kwargs):
+    def __init__(self,
+                 input,
+                 label,
+                 curve='ROC',
+                 num_thresholds=2**12 - 1,
+                 topk=1,
+                 slide_steps=1):
         """ """
-        if "input" not in kwargs or "label" not in kwargs:
-            raise ValueError("AUC expect input and label as inputs.")
-        predict = kwargs.get("input")
-        label = kwargs.get("label")
-        curve = kwargs.get("curve", 'ROC')
-        num_thresholds = kwargs.get("num_thresholds", 2**12 - 1)
-        topk = kwargs.get("topk", 1)
-        slide_steps = kwargs.get("slide_steps", 1)
-        if not isinstance(predict, Variable):
+        if not isinstance(input, Variable):
             raise ValueError("input must be Variable, but received %s" %
-                             type(predict))
+                             type(input))
         if not isinstance(label, Variable):
             raise ValueError("label must be Variable, but received %s" %
                              type(label))
         auc_out, batch_auc_out, [
             batch_stat_pos, batch_stat_neg, stat_pos, stat_neg
-        ] = fluid.layers.auc(predict,
+        ] = fluid.layers.auc(input,
                              label,
                              curve=curve,
                              num_thresholds=num_thresholds,
                              topk=topk,
                              slide_steps=slide_steps)
-        prob = fluid.layers.slice(predict, axes=[1], starts=[1], ends=[2])
+        prob = fluid.layers.slice(input, axes=[1], starts=[1], ends=[2])
         label_cast = fluid.layers.cast(label, dtype="float32")
         label_cast.stop_gradient = True
         sqrerr, abserr, prob, q, pos, total = \
...
core/metrics/binary_class/__init__.py (deleted, mode 100755 → 0; shown as of d6d9d9a5)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .auc import AUC
from .precision_recall import PrecisionRecall
__all__ = ['PrecisionRecall', 'AUC']
core/metrics/pairwise_pn.py
@@ -28,8 +28,11 @@ class PosNegRatio(Metric):
     Metric For Fluid Model
     """
-    def __init__(self, **kwargs):
+    def __init__(self, pos_score, neg_score):
         """ """
+        kwargs = locals()
+        del kwargs['self']
         helper = LayerHelper("PaddleRec_PosNegRatio", **kwargs)
         if "pos_score" not in kwargs or "neg_score" not in kwargs:
             raise ValueError(
...
core/metrics/binary_class/precision_recall.py → core/metrics/precision_recall.py
@@ -28,19 +28,17 @@ class PrecisionRecall(Metric):
     Metric For Fluid Model
     """
-    def __init__(self, **kwargs):
+    def __init__(self, input, label, class_num):
         """R
         """
-        if "input" not in kwargs or "label" not in kwargs or "class_num" not in kwargs:
-            raise ValueError(
-                "PrecisionRecall expect input, label and class_num as inputs."
-            )
-        predict = kwargs.get("input")
-        label = kwargs.get("label")
-        self.num_cls = kwargs.get("class_num")
+        kwargs = locals()
+        del kwargs['self']
+        self.num_cls = class_num
-        if not isinstance(predict, Variable):
+        if not isinstance(input, Variable):
             raise ValueError("input must be Variable, but received %s" %
-                             type(predict))
+                             type(input))
         if not isinstance(label, Variable):
             raise ValueError("label must be Variable, but received %s" %
                              type(label))
...
@@ -48,7 +46,7 @@ class PrecisionRecall(Metric):
         helper = LayerHelper("PaddleRec_PrecisionRecall", **kwargs)
         label = fluid.layers.cast(label, dtype="int32")
         label.stop_gradient = True
-        max_probs, indices = fluid.layers.nn.topk(predict, k=1)
+        max_probs, indices = fluid.layers.nn.topk(input, k=1)
         indices = fluid.layers.cast(indices, dtype="int32")
         indices.stop_gradient = True
...
core/metrics/recall_k.py
@@ -29,23 +29,21 @@ class RecallK(Metric):
     Metric For Fluid Model
     """
-    def __init__(self, **kwargs):
+    def __init__(self, input, label, k=20):
         """ """
-        if "input" not in kwargs or "label" not in kwargs:
-            raise ValueError("RecallK expect input and label as inputs.")
-        predict = kwargs.get('input')
-        label = kwargs.get('label')
-        self.k = kwargs.get("k", 20)
+        kwargs = locals()
+        del kwargs['self']
+        self.k = k
-        if not isinstance(predict, Variable):
+        if not isinstance(input, Variable):
             raise ValueError("input must be Variable, but received %s" %
-                             type(predict))
+                             type(input))
         if not isinstance(label, Variable):
             raise ValueError("label must be Variable, but received %s" %
                              type(label))
         helper = LayerHelper("PaddleRec_RecallK", **kwargs)
-        batch_accuracy = accuracy(predict, label, self.k)
+        batch_accuracy = accuracy(input, label, self.k)
         global_ins_cnt, _ = helper.create_or_get_global_variable(
             name="ins_cnt", persistable=True, dtype='float32', shape=[1])
         global_pos_cnt, _ = helper.create_or_get_global_variable(
...
doc/metrics.md (new file, mode 0 → 100644)
# How to Add a Metric to a Model
## PaddleRec Metric Usage Example
```
from paddlerec.core.model import ModelBase
from paddlerec.core.metrics import RecallK


class Model(ModelBase):
    def __init__(self, config):
        ModelBase.__init__(self, config)

    def net(self, inputs, is_infer=False):
        ...
        # Top-20 recall metric built from the network output and the labels
        acc = RecallK(input=logits, label=label, k=20)
        # Register the metric under the name shown in the training log
        self._metrics["Train_P@20"] = acc
```
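## The Metric Class
### Member variables
> _global_metric_state_vars (dict):
A dictionary that stores the intermediate state variables needed while the metric is computed. These are normally variables with persistable=True, so they are saved together with the model. For that reason, the state values must be reset to zero manually at the infer stage to keep the prediction results correct.
### Member functions
> clear(self, scope):
Resets every state value in self._global_metric_state_vars to zero within scope. This function is typically called at the start of the **infer** stage to keep the prediction metrics correct.
> calc_global_metrics(self, fleet, scope=None):
Runs an all_reduce over the state values in self._global_metric_state_vars across all training nodes, then calls _calculate() to compute the global metric. If fleet=None, the all_reduce result is the node's own state, so the global metric is computed for a single machine.
> get_result(self):
Returns the variables that need to be fetched during training and printed to the screen periodically. The return type is dict.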
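The aggregation flow described for `calc_global_metrics` can be sketched without Paddle: each worker holds local state values, a sum-style all-reduce combines them, and a `_calculate`-like step turns the totals into a result string. This is a plain-Python sketch under stated assumptions, not PaddleRec code: `all_reduce_sum`, `calculate`, and the worker dictionaries are hypothetical stand-ins, and the state names `ins_cnt` and `pos_cnt` are borrowed from the RecallK diff purely for illustration.

```python
def all_reduce_sum(per_worker_states):
    """Hypothetical stand-in for all_reduce: sum each state across workers."""
    totals = {}
    for states in per_worker_states:
        for name, value in states.items():
            totals[name] = totals.get(name, 0.0) + value
    return totals


def calculate(global_metrics):
    """Mimics a _calculate() step: global totals in, result string out."""
    ins_cnt = global_metrics["ins_cnt"]
    pos_cnt = global_metrics["pos_cnt"]
    acc = pos_cnt / ins_cnt if ins_cnt > 0 else 0.0
    return "InsCnt=%d RecallCnt=%d Acc=%.4f" % (ins_cnt, pos_cnt, acc)


# Two hypothetical workers. With a single worker, the reduced state is
# simply that worker's own state (the fleet=None case in the text).
workers = [
    {"ins_cnt": 100.0, "pos_cnt": 40.0},
    {"ins_cnt": 300.0, "pos_cnt": 60.0},
]
global_state = all_reduce_sum(workers)
summary = calculate(global_state)
print(summary)
```

Because the states persist across runs, resetting them before inference (the role of `clear`) amounts to starting these counters from zero again.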
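## Metrics
### AUC
> AUC(input, label, curve='ROC', num_thresholds=2**12 - 1, topk=1, slide_steps=1)
AUC stands for Area Under the Curve. This layer computes AUC from the forward output and the labels, and is widely used to evaluate binary classification. For the definition, see https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve .
#### Parameters
- **input(Tensor|LoDTensor)**: data type float32 or float64. A 2-D floating-point variable holding the network's predictions, with shape [batch_size, 2].
- **label(Tensor|LoDTensor)**: data type int64 or int32. The dataset labels, with shape [batch_size, 1].
- **curve(str)**: curve type, either ROC or PR. Default: ROC.
- **num_thresholds(int)**: number of thresholds used to discretize the ROC curve. Default: 2**12 - 1 (4095), per the signature above.
- **topk(int)**: take the top-k output values for the calculation.
- **slide_steps(int)**: when computing batch AUC, previous steps can be used in addition to the current one. slide_steps=1 uses only the current step; slide_steps=3 uses the current step and the previous two; slide_steps=0 uses all steps.
#### Return values
Two variables are printed periodically during training for this metric:
- **AUC**: the overall AUC value
- **BATCH_AUC**: the AUC value of the current batch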
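To make the role of `num_thresholds` concrete, here is a minimal plain-Python sketch of a threshold-bucketed ROC AUC. It assumes column 1 of `input` holds the positive-class probability, as the [batch_size, 2] shape suggests. The function `bucketed_auc` and its histogram lists are illustrative names; this is not the `fluid.layers.auc` implementation, and it covers only the ROC curve.

```python
def bucketed_auc(probs, labels, num_thresholds=2**12 - 1):
    """Approximate ROC AUC by histogramming scores into threshold buckets."""
    stat_pos = [0] * (num_thresholds + 1)
    stat_neg = [0] * (num_thresholds + 1)
    for p, y in zip(probs, labels):
        idx = int(p * num_thresholds)  # bucket index in [0, num_thresholds]
        if y == 1:
            stat_pos[idx] += 1
        else:
            stat_neg[idx] += 1

    tot_pos = tot_neg = 0
    area = 0.0
    # Sweep the threshold from high to low, adding one trapezoid per bucket.
    for idx in range(num_thresholds, -1, -1):
        new_pos = tot_pos + stat_pos[idx]
        new_neg = tot_neg + stat_neg[idx]
        area += (new_neg - tot_neg) * (tot_pos + new_pos) / 2.0
        tot_pos, tot_neg = new_pos, new_neg

    if tot_pos == 0 or tot_neg == 0:
        return 0.0  # AUC is undefined without both classes; 0.0 is a choice
    return area / (tot_pos * tot_neg)


# input has shape [batch_size, 2]; column 1 is the positive-class probability.
predictions = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]
labels = [0, 0, 1, 1]
pos_probs = [row[1] for row in predictions]
auc = bucketed_auc(pos_probs, labels)
print(auc)
```

In this toy batch, three of the four positive/negative pairs are ordered correctly, so the sketch yields 0.75. Fewer thresholds put more scores in the same bucket, which coarsens the estimate.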
### PrecisionRecall
> PrecisionRecall(input, label, class_num)
Computes precision, recall, and F1.
#### Parameters
- **input(Tensor|LoDTensor)**: data type float32 or float64. The network's predictions, with shape [batch_size, class_num].
- **label(Tensor|LoDTensor)**: data type int32. The dataset labels, with shape [batch_size, 1].
- **class_num(int)**: number of classes.
#### Return values
- **[TP FP TN FN]**: a variable of shape [class_num, 4] holding the TP, FP, TN, and FN counts for each class, where TP = true positive, FP = false positive, TN = true negative, FN = false negative. To compute per-class precision, recall, and F1, use: `precision = TP / (TP + FP)`; `recall = TP / (TP + FN)`; `F1 = 2 * precision * recall / (precision + recall)`.
- **precision_recall_f1**: a variable of shape [6] holding [macro_avg_precision, macro_avg_recall, macro_avg_f1, micro_avg_precision, micro_avg_recall, micro_avg_f1]. Macro means the precision, recall, and F1 of each class are computed first and then averaged. Micro means the TP, TN, FP, and FN counts are first summed over all classes, and precision, recall, and F1 are then computed from those totals.
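The macro and micro averages can be derived from the [class_num, 4] state as in the following plain-Python sketch. It follows the description in the text (macro averages per-class values; micro pools the counts first). The underlying Paddle operator may differ in details, for example in how macro F1 is formed. The helper names `safe_div`, `f1_score`, and `summarize` are hypothetical.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den > 0 else 0.0


def f1_score(precision, recall):
    return safe_div(2 * precision * recall, precision + recall)


def summarize(states):
    """states: one [TP, FP, TN, FN] row per class. Returns the 6 averages."""
    precisions = [safe_div(tp, tp + fp) for tp, fp, tn, fn in states]
    recalls = [safe_div(tp, tp + fn) for tp, fp, tn, fn in states]
    f1s = [f1_score(p, r) for p, r in zip(precisions, recalls)]

    # Macro: average the per-class metrics.
    n = len(states)
    macro = [sum(precisions) / n, sum(recalls) / n, sum(f1s) / n]

    # Micro: pool the counts over all classes, then compute the metrics once.
    tp_sum = sum(row[0] for row in states)
    fp_sum = sum(row[1] for row in states)
    fn_sum = sum(row[3] for row in states)
    micro_p = safe_div(tp_sum, tp_sum + fp_sum)
    micro_r = safe_div(tp_sum, tp_sum + fn_sum)
    micro = [micro_p, micro_r, f1_score(micro_p, micro_r)]

    return macro + micro


# Made-up counts for two classes, for illustration only.
states = [[8, 2, 85, 5], [3, 1, 90, 6]]
result = summarize(states)
print(result)
```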
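### RecallK
> RecallK(input, label, k=20)
Top-k recall accuracy. A sample counts as a hit if its top k predicted classes include the correct label.
#### Parameters
- **input(Tensor|LoDTensor)**: data type float32 or float64. The network's predictions, with shape [batch_size, class_dim].
- **label(Tensor|LoDTensor)**: data type int64 or int32. The dataset labels, with shape [batch_size, 1].
- **k(int)**: the number of top-scoring predictions taken for each sample when computing recall accuracy.
#### Return values
- **InsCnt**: total number of samples
- **RecallCnt**: number of samples correctly recalled within the top k
- **Acc(Recall@k)**: RecallCnt / InsCnt, i.e. the top-k recall accuracy.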
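The three returned quantities can be reproduced on a toy batch with a few lines of plain Python. This is a sketch of the definition above, not the `accuracy` layer that the metric uses internally; `recall_at_k` and the sample data are hypothetical.

```python
def recall_at_k(scores, labels, k=20):
    """Return (InsCnt, RecallCnt, Acc) for a batch of score rows."""
    ins_cnt = len(labels)
    recall_cnt = 0
    for row, label in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            recall_cnt += 1
    acc = recall_cnt / ins_cnt if ins_cnt else 0.0
    return ins_cnt, recall_cnt, acc


# scores has shape [batch_size, class_dim]; labels holds one class id per row.
scores = [
    [0.10, 0.50, 0.30, 0.10],  # top-2 classes: 1, 2
    [0.70, 0.15, 0.10, 0.05],  # top-2 classes: 0, 1
    [0.25, 0.15, 0.50, 0.10],  # top-2 classes: 2, 0
]
labels = [2, 3, 2]
ins_cnt, recall_cnt, acc = recall_at_k(scores, labels, k=2)
print(ins_cnt, recall_cnt, acc)
```

With k=2, the first and third samples are hits and the second is a miss, so Acc is 2/3.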
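### PairWise_PN
> PosNegRatio(pos_score, neg_score)
Positive-negative order ratio, typically used in models with pairwise input. For example, the input contains both positive and negative samples, and the model learns to maximize the score gap between them.
#### Parameters
- **pos_score(Tensor|LoDTensor)**: scores of the positive samples. Data type float32 or float64. A 2-D floating-point variable with values in [0, 1].
- **neg_score(Tensor|LoDTensor)**: scores of the negative samples. Data type float32 or float64. A 2-D floating-point variable with values in [0, 1].
#### Return values
- **RightCnt**: number of samples with pos_score > neg_score
- **WrongCnt**: number of samples with pos_score <= neg_score
- **PN**: (RightCnt + 1.0) / (WrongCnt + 1.0), the ratio of correctly ordered to wrongly ordered pairs. The +1.0 avoids division by zero.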
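As a quick check of the formula, the sketch below counts correctly and wrongly ordered pairs in plain Python. It follows the definitions in the text, with ties counted as wrong; `pos_neg_ratio` and the scores are illustrative and not part of PaddleRec.

```python
def pos_neg_ratio(pos_scores, neg_scores):
    """Return (RightCnt, WrongCnt, PN) for paired positive/negative scores."""
    right_cnt = 0
    wrong_cnt = 0
    for pos, neg in zip(pos_scores, neg_scores):
        if pos > neg:
            right_cnt += 1
        else:  # pos <= neg, ties included
            wrong_cnt += 1
    # Adding 1.0 to both counts keeps the ratio finite when WrongCnt is 0.
    pn = (right_cnt + 1.0) / (wrong_cnt + 1.0)
    return right_cnt, wrong_cnt, pn


pos = [0.9, 0.6, 0.4, 0.7]
neg = [0.2, 0.6, 0.5, 0.1]
right_cnt, wrong_cnt, pn = pos_neg_ratio(pos, neg)
print(right_cnt, wrong_cnt, pn)
```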
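### Customized_Metric
To define your own metric, follow these steps:
1. Inherit from paddlerec.core.Metric and define your MyMetric class.
2. In the MyMetric constructor, build the metric's network and declare the private variable self._global_metric_state_vars.
3. Define _calculate(global_metrics) to compute the global metric. Its input, global_metrics, holds the global statistics of all intermediate state variables in self._global_metric_state_vars. The final result is returned as a str.
A template for a custom Metric follows. When writing your own Metric class, refer to the comments in the template or to the metrics already implemented under paddlerec.core.metrics (precision_recall, auc, pairwise_pn, recall_k).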
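```
from paddlerec.core.Metric import Metric


class MyMetric(Metric):
    def __init__(self):
        # 1. Build the metric's network
        # ** 1. your code **

        # 2. Set up the dictionary of intermediate state variables
        self._global_metric_state_vars = dict()
        # ** 2. your code **

    def get_result(self):
        # 3. Define the variables to print during training; return them as a dict
        self._metrics = dict()
        # ** 3. your code **
        return self._metrics

    def _calculate(self, global_metrics):
        # 4. Compute the global metric. global_metrics is a dict holding the
        #    global statistics of all intermediate state variables in
        #    self._global_metric_state_vars. Return the result as a str.
        # ** 4. your code **
        return ""  # placeholder until the metric string is built
```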
doc/model_develop.md
@@ -113,6 +113,8 @@ def input_data(self, is_infer=False, **kwargs):
 See the official model examples to learn how to construct net.
+Besides Paddle's own Metrics API, PaddleRec also wraps a set of common evaluation metrics and lets developers define their own Metrics classes. For details, see the [Metrics development guide](metrics.md).
 ## How to run a custom model
 Note the file paths of `model.py`, `config.yaml`, and the data reader `reader.py`. Placing them in the same folder, for example `/home/custom_model`, is recommended. Then change the configuration options in `config.yaml`
...