Update F1 scores & clean evaluation (#2525)

4ccf1884 · Hongyu Li · Yibing Liu · 6b09ec35
Showing 5 additions and 2 deletions

PaddleNLP/Research/MRQA2019-BASELINE/README.md +2 -2

PaddleNLP/Research/MRQA2019-BASELINE/run_evaluation.sh +3 -0

--- a/PaddleNLP/Research/MRQA2019-BASELINE/README.md
+++ b/PaddleNLP/Research/MRQA2019-BASELINE/README.md
@@ -73,14 +73,14 @@ Where `parameters_to_restore` is the model parameters used in the evaluatation (

 |      Model     | HotpotQA | NaturalQ | NewsQA | SearchQA | SQuAD | TriviaQA | Macro-F1 |
 | :------------- | :---------: | :----------: | :---------: | :----------: | :---------: | :----------: |:----------: |
-| baseline + EMA | 81.4/65.5 | 81.6/70.0 | 73.1/57.9 | 85.1/79.1 | 93.3/87.1 | 79.0/73.4 | 82.4 |
+| baseline + EMA | 81.4/65.5 | 81.6/69.9 | 73.1/57.9 | 85.1/79.1 | 93.3/87.1 | 79.0/73.4 | 82.4 |
 | baseline woEMA | 82.4/66.9 | 81.7/69.9 | 73.0/57.8 | 85.1/79.2 | 93.4/87.2 | 79.0/73.4 | 82.4 |

 ##### out-of-domain dev  (F1/EM)

 |      Model     | BioASQ | DROP | DuoRC | RACE | RE | Textbook | Macro-F1 |
 | :------------- | :---------: | :----------: | :---------: | :----------: | :---------: | :----------: |:----------: |
-| baseline + EMA | 70.2/54.7 | 57.3/47.5 | 64.1/52.8 | 51.7/37.2 | 87.9/77.7 | 63.1/53.5 | 65.7 |
+| baseline + EMA | 70.2/54.7 | 57.3/47.5 | 64.1/52.8 | 51.7/37.2 | 87.9/77.7 | 63.1/53.6 | 65.7 |
 | baseline woEMA | 69.9/54.6 | 57.0/47.3 | 64.0/52.8 | 51.8/37.4 | 87.8/77.6 | 63.0/53.4 | 65.6 |

 Note that we turn on exponential moving average (EMA) during training by default (in most cases EMA can improve performance) and save EMA parameters into the final checkpoint files. The predicted answers using EMA parameters are saved into `ema_predictions.json`.   

--- a/PaddleNLP/Research/MRQA2019-BASELINE/run_evaluation.sh
+++ b/PaddleNLP/Research/MRQA2019-BASELINE/run_evaluation.sh
@@ -22,6 +22,9 @@ PATH_prediction=./output/ema_predictions.json

 # evaluation
 for dataset in `ls $PATH_dev/*.raw.json`;do
+    if [ "$dataset" = "./data/dev/mrqa-combined.raw.json" ]; then
+        continue
+    fi
    echo $dataset
    python evaluate-v1.1.py $dataset $PATH_prediction
 done
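
Review note on the `run_evaluation.sh` hunk: the added check compares `$dataset` against the literal string `./data/dev/mrqa-combined.raw.json`, so it only skips the combined file when `$PATH_dev` expands to exactly `./data/dev`. A minimal sketch of a path-independent variant follows, matching on the file's basename and iterating over a glob instead of parsing `ls`. The function name `list_eval_files` is hypothetical and not part of this commit; treat this as one possible shape rather than the authors' intended implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): print every *.raw.json
# file in the given dev directory, except the combined file, one per line.
list_eval_files() {
    local dev_dir="$1"
    local f
    for f in "$dev_dir"/*.raw.json; do
        # If the glob matched nothing, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # Compare the basename only, so the check does not depend on
        # how the directory path is spelled.
        case "${f##*/}" in
            mrqa-combined.raw.json) continue ;;
        esac
        printf '%s\n' "$f"
    done
}
```

Under these assumptions the evaluation loop could read `list_eval_files "$PATH_dev" | while read -r dataset; do echo "$dataset"; python evaluate-v1.1.py "$dataset" "$PATH_prediction"; done`, which keeps the per-dataset behavior of the original loop.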