add readme.md

d1158e1a · CSDN-Ada助手 · 89305fa0
13 changed files
--- a/.gitignore
+++ b/.gitignore
@@ -3,3 +3,4 @@
 .DS_Store
 __pycache__
 *.pyc
+output
--- a/README.md
+++ b/README.md
@@ -24,3 +24,63 @@
 1. Stanford University's evaluation: AlpacaEval Leaderboard <https://tatsu-lab.github.io/alpaca_eval/>
 2. <https://github.com/the-crypt-keeper/can-ai-code>
 3. <https://github.com/THUDM/CodeGeeX/tree/main/codegeex/benchmark>
+4. https://github.com/the-crypt-keeper/can-ai-code
+5. https://github.com/THUDM/CodeGeeX/tree/main/codegeex/benchmark
+6. https://github.com/openai/human-eval
+
+
+## HumanEval-X
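+Each HumanEval-X sample, in every language, contains a declaration, a docstring, and a solution. These can be combined to support different downstream tasks, including generation, translation, and summarization. We currently focus on two tasks: code generation and code translation. For code generation, the model takes the function declaration and docstring as input and outputs the function implementation. For code translation, the model takes the function declarations in both languages and the source-language implementation as input, and outputs the implementation in the target language. We do not feed the docstring to the model in the translation task, so that it cannot generate the answer directly from the description. For both tasks we use the unbiased pass@k metric used by Codex, where n is the number of samples generated per problem and c is the number of those samples that pass the tests:
+$$\text{pass@}k := \mathbb{E}\left[1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\right], \quad n = 200,\ k \in \{1, 10, 100\}$$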
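As a minimal sketch of the metric above (not the repository's evaluation code): for a single problem with n generated samples of which c pass, the estimate is 1 − C(n−c, k)/C(n, k), and the reported score is the mean over all problems. The function names `pass_at_k` and `mean_pass_at_k` are illustrative assumptions, not identifiers from this project.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: budget (1 <= k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates; results holds (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results is empty")
    return sum(scores) / len(scores)
```

With k = 1 the estimate reduces to c / n, so pass@1 can be measured even with a single sample per problem.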
+
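+Samples are stored in JSON Lines format (one JSON object per line) at codegeex/benchmark/humaneval-x/[LANG]/data/humaneval_[LANG].jsonl.gz. Each sample contains 6 fields:
+
+- `task_id`: the target language and ID of the problem. The language is one of ["Python", "Java", "JavaScript", "CPP", "Go"].
+- `prompt`: the function declaration and docstring, used for code generation.
+- `declaration`: the function declaration only, used for code translation.
+- `canonical_solution`: a hand-written reference solution.
+- `test`: hidden test cases, used for evaluation.
+- `example_test`: public test cases that appear in the prompt, used for evaluation.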
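The field list above can be exercised with a short reader. This is a hedged sketch rather than the project's loader: `read_samples`, `generation_input`, and `language_of` are hypothetical helper names, and the `"Python/0"` shape of `task_id` is an assumption. It accepts both the gzip-compressed files named above and the plain `.jsonl` files this commit places under `eval_set/humaneval-x/`.

```python
import gzip
import json
from typing import Dict, Iterator


def read_samples(path: str) -> Iterator[Dict[str, str]]:
    """Yield one sample dict per non-empty line of a .jsonl or .jsonl.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def generation_input(sample: Dict[str, str]) -> str:
    """Code generation feeds the model the declaration plus docstring."""
    return sample["prompt"]


def language_of(sample: Dict[str, str]) -> str:
    """Assumption: task_id looks like "Python/0" (language, slash, ID)."""
    return sample["task_id"].split("/")[0]
```

For the translation task, the input would instead be assembled from `declaration` and `canonical_solution`, leaving the docstring out as described above.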
+
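+Evaluating the generated code requires compiling and running it in several programming languages. The versions of the language toolchains and packages we use are as follows:
+
+| Dependency | Version |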
+|  ----  | ----  |
+|  Python  | 3.8.12  |
+| JDK | 18.0.2.1 |
+| Node.js | 16.14.0 |
+| js-md5 | 0.7.3 |
+| C++ | 11 |
+| g++ | 7.5.0 |
+| Boost | 1.71.0 |
+| OpenSSL | 3.0.0 |
+| go | 1.18.4 |
+	
+
+
+## Our work
+
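+1. Integrated Tsinghua's HumanEval-X and restructured the code generation task;
+2. Added multi-model configuration: model parameters are configurable, as is the choice between calling a remote API and running local inference;
+3. Improved the logic that extracts the method body from generated code blocks;
+4. Currently supports five languages: java, python, cpp, js, and go.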
+
+
+
+## Test results
+
+Because of limited model inference speed, only the pass@1 metric has been measured so far.
+
+| model       | python | java   | cpp    | js     | go      |
+|-------------|--------|--------|--------|--------|---------|
+| chatgpt     | 64.02% | 15.85% | 26.22% | 47.00% | 31.70%  |
+| bbt-7B      | 0.61%  | 1.83%  | 1.22%  | 1.83%  | 0%      |
+| chatglm2-7B | 7.93%  | 5.45%  | 0.61%  | 6.70%  | 1.83%   |
+
+
+
+
+## TODO
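+1. Test more open-source models, such as Baichuan, llama2, and rwkv.
+2. Measure the models' pass@10 and pass@100 metrics.
+3. Code translation tasks are not yet supported, and the corresponding data still needs to be constructed.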
--- a/eval_set/humaneval-x/cpp/data/humaneval_cpp.jsonl
+++ b/eval_set/humaneval-x/cpp/data/humaneval_cpp.jsonl
--- a/eval_set/humaneval-x/go/data/humaneval_go.jsonl
+++ b/eval_set/humaneval-x/go/data/humaneval_go.jsonl
--- a/eval_set/humaneval-x/java/data/humaneval_java.jsonl
+++ b/eval_set/humaneval-x/java/data/humaneval_java.jsonl
--- a/eval_set/humaneval-x/js/data/humaneval_js.jsonl
+++ b/eval_set/humaneval-x/js/data/humaneval_js.jsonl
--- a/eval_set/humaneval-x/python/data/humaneval_python.jsonl
+++ b/eval_set/humaneval-x/python/data/humaneval_python.jsonl
--- a/eval_set/humaneval-x/rust/data/humaneval_rust.jsonl
+++ b/eval_set/humaneval-x/rust/data/humaneval_rust.jsonl
--- a/llm_set/params/bbt.json
+++ b/llm_set/params/bbt.json
+{
+  "url": "https://codebbt.ssymmetry.com/test3/code_bbt",
+  "user_id": "csdnTest_0001",
+  "stream": 0,
+  "timeout": 200,
+  "max_new_tokens": 1024
+}
\ No newline at end of file
--- a/llm_set/params/chatglm2.json
+++ b/llm_set/params/chatglm2.json
 {
-  "model_path": "llm_set/models/chatglm2-6b",
+  "model_path": "../llm_set/models/chatglm2-6b",
  "device": "cuda:0",
  "quantize": false,
  "max_length": 1024,

--- a/llm_set/params/chatgpt.json
+++ b/llm_set/params/chatgpt.json
@@ -2,5 +2,5 @@
  "url": "http://47.74.27.238/conversation.php",
  "id": 1001059396431415,
  "stream": 0,
-  "timeout": 20
+  "timeout": 200
 }
\ No newline at end of file
--- a/src/inference/chatglm2_inference.py
+++ b/src/inference/chatglm2_inference.py
@@ -60,5 +60,5 @@ class GLMInference(Inference):
                temperature=self.temperature
            )
            output = self.tokenizer.decode([el.item() for el in generation_output[0]])
-            sentence = output.split("ChatCSDN:")[1].strip()
+            sentence = output.strip()
        return sentence
--- a/src/utils.py
+++ b/src/utils.py
@@ -281,6 +281,7 @@ def cleanup_code(code: str, sample, language_type: str = None):

        if method_lines:
            method_body = "\n".join(method_lines[1:])
+            print(method_body)

    elif language_type.lower() == "cpp":
        method_lines = []