From 26368f232ba8a72bd6f318878b1d425d457c3034 Mon Sep 17 00:00:00 2001
From: LiuChiachi
Date: Thu, 10 Dec 2020 21:24:13 +0800
Subject: [PATCH] Update paddlenlp.metrics README (#5010)

* Update paddlenlp.metrics README
* Add info for Mcc
* delete DuReader metric
* update task name, add API col
---
 PaddleNLP/paddlenlp/metrics/README.md | 49 +++++++--------------------
 1 file changed, 12 insertions(+), 37 deletions(-)

diff --git a/PaddleNLP/paddlenlp/metrics/README.md b/PaddleNLP/paddlenlp/metrics/README.md
index 2495e82f..ddb0b89a 100644
--- a/PaddleNLP/paddlenlp/metrics/README.md
+++ b/PaddleNLP/paddlenlp/metrics/README.md
@@ -1,39 +1,14 @@
 # paddlenlp.metrics
-## Perplexity
-Perplexity is calculated using cross entropy. It supports both padding data
-and no padding data.
-
-If data is not padded, users should provide `seq_len` for `Metric`
-initialization. If data is padded, your label should contain `seq_mask`,
-which indicates the actual length of samples.
-
-This Perplexity requires that the output of your network is prediction,
-label and sequence length (opitonal). If the Perplexity here doesn't meet
-your needs, you could override the `compute` or `update` method for
-caculating Perplexity.
-
-## BLEU
-BLEU (bilingual evaluation understudy) is an algorithm for evaluating the
-quality of text which has been machine-translated from one natural language
-to another. This metric uses a modified form of precision to compare a
-candidate translation against multiple reference translations.
-
-BLEU could be used as `paddle.metrics.Metric` class, or an ordinary
-class.
-
-When BLEU is used as `paddle.metrics.Metric` class. A function is
-needed that transforms the network output to reference string list, and
-transforms the label to candidate string. By default, a default function
-`_default_trans_func` is provided, which gets target sequence id by
-calculating the maximum probability of each step. In this case, user must
-provide `vocab`. It should be noted that the BLEU here is different from
-the BLEU calculated in prediction, and it is only for observation during
-training and evaluation.
-
-## Rouge
-### rouge-l
-
-## dureader
-## chunk
-## squad
+paddlenlp currently provides the following evaluation metrics:
+
+| Metric | Description | API |
+| ------------------------------------------------------------ | :----------------------------------------------------------- | ------------------------------------------------------------ |
+| Perplexity | Commonly used to measure the quality of language models; also applicable to tasks such as machine translation and text generation. | paddlenlp.metrics.Perplexity |
+| BLEU (bilingual evaluation understudy) | A widely used evaluation metric for machine translation. | paddlenlp.metrics.BLEU |
+| Rouge-L (Recall-Oriented Understudy for Gisting Evaluation) | A metric for evaluating automatic summarization and machine translation. | paddlenlp.metrics.RougeL |
+| AccuracyAndF1 | Accuracy and F1-score; applicable to the MRPC and QQP tasks in GLUE. | paddlenlp.metrics.AccuracyAndF1 |
+| PearsonAndSpearman | Pearson correlation coefficient and Spearman rank correlation coefficient; applicable to the STS-B task in GLUE. | paddlenlp.metrics.PearsonAndSpearman |
+| Mcc (Matthews correlation coefficient) | A metric for measuring the performance of binary classification; applicable to the CoLA task in GLUE. | paddlenlp.metrics.Mcc |
+| ChunkEvaluator | Computes the precision, recall, and F1-score of chunk detection; commonly used in sequence labeling tasks such as named entity recognition (NER). | paddlenlp.metrics.ChunkEvaluator |
+| Squad | Evaluation metrics for SQuAD and DuReader-robust. | paddlenlp.metrics.compute_predictions, paddlenlp.metrics.squad_evaluate |
 
--
GitLab
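
Not part of the patch itself, but for orientation: the classes in the new table follow the `paddle.metric.Metric` style, where a loop calls `update()` per batch and `accumulate()` at the end. Below is a minimal sketch using `ChunkEvaluator`; the label list and the pre-counted chunk statistics are made-up example values, and the exact `update()` signature shown is an assumption rather than something stated in this patch.

```python
# Minimal usage sketch (assumed paddle.metric.Metric-style interface):
# feed per-batch chunk counts to update(), then read aggregated
# precision/recall/F1 from accumulate(). Values below are illustrative only.
from paddlenlp.metrics import ChunkEvaluator

# Hypothetical BIO label scheme for an NER task.
label_list = ["B-PER", "I-PER", "B-ORG", "I-ORG", "O"]
metric = ChunkEvaluator(label_list=label_list)

# Pretend these counts came from one evaluation batch
# (normally derived from model predictions and gold labels).
num_infer_chunks = 10    # chunks the model predicted
num_label_chunks = 9     # chunks in the gold annotation
num_correct_chunks = 8   # predicted chunks matching the gold ones

metric.update(num_infer_chunks, num_label_chunks, num_correct_chunks)
precision, recall, f1 = metric.accumulate()
print(precision, recall, f1)  # roughly 0.80, 0.89, 0.84 for these counts
```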