From a554db771e3c0387575aa7a503cc7bd8b83b57d2 Mon Sep 17 00:00:00 2001
From: chenlong <chenlong@csdn.net>
Date: Tue, 25 Jul 2023 10:55:22 +0800
Subject: [PATCH] add readme

---
 README.md | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index a3f87f0..7c158de 100644
--- a/README.md
+++ b/README.md
@@ -24,9 +24,7 @@
 1. Stanford University's evaluation: AlpacaEval Leaderboard <https://tatsu-lab.github.io/alpaca_eval/>
 2. <https://github.com/the-crypt-keeper/can-ai-code>
 3. <https://github.com/THUDM/CodeGeeX/tree/main/codegeex/benchmark>
-4. https://github.com/the-crypt-keeper/can-ai-code
-5. https://github.com/THUDM/CodeGeeX/tree/main/codegeex/benchmark
-6. https://github.com/openai/human-eval
+4. <https://github.com/openai/human-eval>
 
 
 ## HumanEval-X
@@ -66,10 +64,30 @@ example_test: public test cases that appear in the prompt, used for evaluation.
 4. Five languages are currently supported: java, python, cpp, js, and go.
 
 
+## Run commands
+
+The following example uses chatgpt to generate test data for python:  
+python generate_humaneval_x.py --input_path ../eval_set/humaneval-x  
+                               --language_type python  
+                               --model_name chatgpt  
+                               --output_prefix ../output/humaneval  
+
+Evaluation example:  
+python evaluate_humaneval_x.py --language_type python  
+                               --input_folder ../output  
+                               --tmp_dir ../output/tmp/  
+                               --n_workers 3  
+                               --timeout 500.0  
+                               --problem_folder ../eval_set/humaneval-x/  
+                               --out_dir ../output/  
+                               --k [1, 10, 100]  
+                               --test_groundtruth False  
+                               --example_test False  
+                               --model_name chatgpt  
 
 ## 测试结果
 
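-Limited by model inference speed, the pass@1 metric has been tested so far.
+Limited by model inference speed, only the pass@1 metric has been tested so far.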
 
 |             | python | java   | cpp    | js     | go      |
 |-------------|--------|--------|--------|--------|---------|
@@ -81,6 +99,6 @@ example_test: public test cases that appear in the prompt, used for evaluation.
 
 
 ## TODO
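-1. Test more open-source models, such as Baichuan, llama2, and rwkv.
-2. Measure the models' pass@10 and pass@100 metrics.
-3. Code translation tasks are not yet supported, and the corresponding data still needs to be constructed.
+1. Test more open-source models, such as Baichuan, llama2, and rwkv.  
+2. Measure the models' pass@10 and pass@100 metrics.  
+3. Code translation tasks are not yet supported, and the corresponding data still needs to be constructed.  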
-- 
GitLab