add readme

a554db77 · CSDN-Ada助手 · d1158e1a · a554db77

Showing 25 additions and 7 deletions

README.md +25 -7

--- a/README.md
+++ b/README.md
@@ -24,9 +24,7 @@
 1. Stanford University's evaluation: AlpacaEval Logo Leaderboard <https://tatsu-lab.github.io/alpaca_eval/>
 2. <https://github.com/the-crypt-keeper/can-ai-code>
 3. <https://github.com/THUDM/CodeGeeX/tree/main/codegeex/benchmark>
-4. https://github.com/the-crypt-keeper/can-ai-code
+4. <https://github.com/openai/human-eval>
-5. https://github.com/THUDM/CodeGeeX/tree/main/codegeex/benchmark
-6. https://github.com/openai/human-eval
 ## HumanEval-X
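@@ -66,10 +64,30 @@ example_test: public test cases that appear in the prompt, used for evaluation.
 4. Five languages are currently supported: java, python, cpp, js, and go.
+## Running the commands
+The following example uses chatgpt to generate test data for the python language: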
+python generate_humaneval_x.py --input_path ../eval_set/humaneval-x  
+                               --language_type python  
+                               --model_name chatgpt  
+                               --output_prefix ../output/humaneval  
+Evaluation example:
+python evaluate_humaneval_x.py --language_type  python
+                               --input_folder  ../output
+                               --tmp_dir  ../output/tmp/
+                               --n_workers  3
+                               --timeout 500.0  
+                               --problem_folder ../eval_set/humaneval-x/  
+                               --out_dir ../output/  
+                               --k [1, 10, 100]  
+                               --test_groundtruth False  
+                               --example_test False  
+                               --model_name chatgpt  
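 ## Test results
-Limited by model inference speed, the pass@1 metric has been tested so far.
+Limited by model inference speed, only the pass@1 metric has been tested so far.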
 |             | python | java   | cpp    | js     | go      |
 |-------------|--------|--------|--------|--------|---------|
@@ -81,6 +99,6 @@ example_test: public test cases that appear in the prompt, used for evaluation.
 ## TODO
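-1. Test more open-source models, e.g. Baichuan, llama2, rwkv.
+1. Test more open-source models, e.g. Baichuan, llama2, rwkv.  
-2. Measure the models' pass@10 and pass@100 metrics.
+2. Measure the models' pass@10 and pass@100 metrics.  
-3. Code-translation tasks are not yet supported, and the corresponding data still needs to be constructed.
+3. Code-translation tasks are not yet supported, and the corresponding data still needs to be constructed.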
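
Note on the pass@1, pass@10 and pass@100 metrics named in the README: this diff does not show how `evaluate_humaneval_x.py` computes them, so the following is only a minimal sketch of the unbiased estimator described in the original HumanEval paper, pass@k = 1 - C(n-c, k) / C(n, k), where n is the number of samples generated per problem and c the number that pass. The function names `pass_at_k` and `mean_pass_at_k` and the demo counts are hypothetical and are not taken from this repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n: samples generated for the problem, c: samples that passed the tests.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each item is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for three problems with 10 samples each.
    demo = [(10, 0), (10, 3), (10, 10)]
    for k in (1, 10):
        print(f"pass@{k} = {mean_pass_at_k(demo, k):.3f}")
```

Because the estimator needs at least k samples per problem, reporting pass@10 or pass@100 requires generating at least 10 or 100 completions per problem, which is consistent with the README's remark that inference speed currently limits testing to pass@1.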