Commit 8dd9debf, authored by alex45854361

Update 全球大模型产品库.md

Add "List of large pre-trained models as of May 2023"
Parent fa0ee3f2
@@ -54,6 +54,229 @@
| 面壁智能 | CPM | 曾国洋, CTO of 面壁智能 |
## List of large pre-trained models as of May 2023
|#|Name|Release Date|Domain|Affiliation|# of Parameters|Language|Paper/News|Model|Code|API|
| :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
|1|ChatGPT|2022/11|Text|OpenAI|~100B|Multilingual|https://openai.com/blog/chatgpt/|-|-|https://chat.openai.com/
|2|Galactica|2022/11|Text|Meta|1.3B , 6.7B , 30B , 120B|English|https://arxiv.org/abs/2211.09085|-|-|-
|3|ERNIE-ViLG 2.0|2022/10|Text/Vision|Baidu|24B|Chinese|https://arxiv.org/abs/2210.15257|-|-|-
|4|WeLM|2022/10|Text|Tencent|1.3B , 2.7B , 10B|Chinese|https://arxiv.org/abs/2209.10372|-|-|https://welm.weixin.qq.com/docs/api/
|5|Magneto|2022/10|Text|Microsoft|1B|English|https://arxiv.org/abs/2210.06423|-|-|-
|6|Imagen Video|2022/10|Text/Vision|Google|11.6B|English|https://arxiv.org/abs/2210.02303|-|-|-
|7|Whisper|2022/9|Audio|OpenAI|1.55B|Multilingual|https://cdn.openai.com/papers/whisper.pdf|https://github.com/openai/whisper|https://github.com/openai/whisper|-
|8|Sparrow|2022/9|Text|DeepMind|70B|English|https://arxiv.org/abs/2209.14375|-|-|-
|9|CodeGeeX|2022/9|Code|Tsinghua University/Peng Cheng Laboratory/Zhipu.AI|13B|-|http://keg.cs.tsinghua.edu.cn/codegeex/|-|https://github.com/THUDM/CodeGeeX|-
|10|CPM-Ant|2022/9|Text|OpenBMB|1B , 3B , 7B , 10B|Chinese|https://www.openbmb.org/en/community/blogs/blogpage?id=98afef2ce45f4fe9a4bc15a66d7ccb92|https://github.com/OpenBMB/CPM-Live/tree/master/cpm-live#model-checkpoints|https://github.com/OpenBMB/CPM-Live/tree/master/cpm-live|-
|11|PaLI|2022/9|Text/Vision|Google|3B , 15B , 17B|Multilingual|https://arxiv.org/abs/2209.06794|-|-|-
|12|BEiT-3|2022/8|Text/Vision|Microsoft|1.9B|English|https://arxiv.org/abs/2208.10442|https://huggingface.co/docs/transformers/main/model_doc/beit|https://github.com/microsoft/unilm/tree/master/beit|-
|13|Atlas|2022/8|Text|Meta/ENS, PSL University/University College London/Inria|3B , 11B|English|https://arxiv.org/abs/2208.03299|-|-|-
|14|GLM-130B|2022/8|Text|Tsinghua University/Zhipu.AI|130B|English/Chinese|http://keg.cs.tsinghua.edu.cn/glm-130b/posts/glm-130b/|https://docs.google.com/forms/d/e/1FAIpQLSehr5Dh_i3TwACmFFi8QEgIVNYGmSPwV0GueIcsUev0NEfUug/viewform|https://github.com/THUDM/GLM-130B|-
|15|AlexaTM 20B|2022/8|Text|Amazon|20B|Multilingual|https://arxiv.org/abs/2208.01448|https://github.com/amazon-research/alexa-teacher-models|-|-
|16|FIM|2022/7|Code/Text|OpenAI|1.4B , 2.8B , 6.9B|English|https://arxiv.org/abs/2207.14255|-|-|-
|17|PanGu-Coder|2022/7|Code|Huawei|2.6B|-|https://arxiv.org/abs/2207.11280|-|-|-
|18|ESM-2|2022/7|Protein|Meta/New York University/Stanford University/MIT|3B , 15B|-|https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1|-|-|-
|19|BLOOM(BLOOM & Mt0/BLOOMZ)|2022/7|Text|BigScience|1.3B , 2.5B , 6.3B , 175B|Multilingual|https://bigscience.huggingface.co/blog/bloom|https://huggingface.co/bigscience/bloom/tree/main|https://github.com/huggingface/transformers/tree/main/src/transformers/models/bloom|-
|20|NLLB|2022/7|Text|Meta/UC Berkeley/Johns Hopkins University|dense: 1.3B , 3.3B; MoE:54.5B|Multilingual|https://arxiv.org/abs/2207.04672|https://github.com/facebookresearch/fairseq/blob/nllb/README.md#multilingual-translation-models|https://github.com/facebookresearch/fairseq/blob/nllb/examples/nllb/modeling/README.md|-
|21|Minerva|2022/6|Text|Google|8B , 62B , 540B|English|https://arxiv.org/abs/2206.14858|-|-|-
|22|ProGen2|2022/6|Protein|Salesforce/Johns Hopkins University/Columbia University|2.7B , 6.4B|-|https://arxiv.org/pdf/2206.13517.pdf|https://storage.googleapis.com/sfr-progen-research/checkpoints/progen2-xlarge.tar.gz|https://github.com/salesforce/progen|-
|23|LIMoE|2022/6|Text/Vision|Google|5.6B|English|https://arxiv.org/abs/2206.02770|-|-|-
|24|YaLM|2022/6|Text|Yandex|100B|English/Russian|https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6|https://github.com/yandex/YaLM-100B#downloading-checkpoint|https://github.com/yandex/YaLM-100B|-
|25|Parti|2022/6|Text/Vision|Google|3B , 20B|English|https://arxiv.org/abs/2206.10789|-|-|-
|26|GODEL|2022/6|Text|Microsoft/Columbia University|2.7B|English|https://arxiv.org/abs/2206.11309|https://github.com/Microsoft/GODEL#models|https://github.com/Microsoft/GODEL|-
|27|Unified-IO|2022/6|Text/Vision|AI2/University of Washington|2.8B|English|https://arxiv.org/abs/2206.08916|-|-|-
|28|AlexaTM|2022/6|Text|Amazon|2.68B , 9.9B|Multilingual|https://arxiv.org/abs/2206.07808|-|-|-
|29|SwinV2-MoE|2022/6|Vision|Microsoft|1B , 2B|-|https://arxiv.org/abs/2206.03382|-|https://github.com/microsoft/Swin-Transformer|-
|30|OBERT|2022/6|Text|OPPO|~1B|Chinese|https://blog.51cto.com/u_15273780/5440502|-|-|-
|31|CogVideo|2022/5|Text/Vision|Tsinghua University/BAAI|9.4B|Chinese|https://arxiv.org/abs/2205.15868|https://github.com/THUDM/CogVideo#download|https://github.com/THUDM/CogVideo|-
|32|Imagen|2022/5|Text/Vision|Google|7.6B|English|https://arxiv.org/abs/2205.11487|-|-|-
|33|ERNIE 3.0 Zeus|2022/5|Text|Baidu|~100B|Chinese|https://baijiahao.baidu.com/s?id=1733603775259242015&wfr=spider&for=pc|-|-|https://wenxin.baidu.com/younger/apiDetail?id=20006
|34|RankGen|2022/5|Text|University of Massachusetts Amherst/ Google|1.2B|English|https://arxiv.org/abs/2205.09726|https://huggingface.co/kalpeshk2011|https://github.com/martiansideofthemoon/rankgen|-
|35|Gato|2022/5|Text/Vision|DeepMind|1.2B|English|https://arxiv.org/abs/2205.06175|-|-|-
|36|HunYuan(混元)|2022/5|Text|Tencent|~10B|Chinese|http://ex.chinadaily.com.cn/exchange/partners/82/rss/channel/cn/columns/snl9a7/stories/WS628df605a3101c3ee7ad730e.html|-|-|-
|37|UL2|2022/5|Text|Google|20B|English|https://arxiv.org/abs/2205.05131|https://github.com/google-research/google-research/tree/master/ul2#checkpoints|https://github.com/google-research/t5x|-
|38|CoCa|2022/5|Text/Vision|Google|2.1B|English|https://arxiv.org/abs/2205.01917|-|-|-
|39|OPT|2022/5|Text|Meta|1.3B , 2.7B , 6.7B , 13B , 30B , 66B , 175B|English|https://arxiv.org/abs/2205.01068|https://github.com/facebookresearch/metaseq/tree/main/projects/OPT#pretrained-model-weights|https://github.com/facebookresearch/metaseq|-
|40|Flamingo|2022/4|Text/Vision|DeepMind|80B|English|https://arxiv.org/abs/2204.14198|-|-|-
|41|CogView2|2022/4|Text/Vision|Tsinghua University/ BAAI|6B|English/Chinese|https://arxiv.org/abs/2204.14217|https://model.baai.ac.cn/model-detail/100041|https://github.com/THUDM/CogView2|-
|42|mGPT|2022/4|Text|SberDevices/ HSE University/ AI Research Institute|1.3B , 13B|Multilingual|https://arxiv.org/abs/2204.07580|https://huggingface.co/sberbank-ai/mGPT|https://github.com/ai-forever/mgpt|-
|43|GPT-NeoX|2022/4|Text|EleutherAI|20B|English|https://arxiv.org/abs/2204.06745|https://github.com/EleutherAI/gpt-neox#download-links|https://github.com/EleutherAI/gpt-neox|-
|44|NOOR|2022/4|Text|Technology Innovation Institute|10B|Arabic|https://www.tii.ae/news/technology-innovation-institute-announces-launch-noor-worlds-largest-arabic-nlp-model|-|-|-
|45|METRO-LM|2022/4|Text|Microsoft|5.4B|English|https://arxiv.org/abs/2204.06644|-|-|-
|46|DALL-E 2|2022/4|Text/Vision|OpenAI|6.5B|English|https://arxiv.org/abs/2204.06125|-|-|https://labs.openai.com/waitlist
|47|InCoder|2022/4|Code|Facebook/University of Washington/UC Berkeley/TTIC/CMU|1.3B , 6.7B|-|https://arxiv.org/abs/2204.05999|https://sites.google.com/view/incoder-code-models|https://github.com/dpfried/incoder/blob/main/README.md|-
|48|PaLM|2022/4|Text|Google|8B , 62B , 540B|English|https://arxiv.org/abs/2204.02311|-|-|-
|49|Chinchilla|2022/3|Text|DeepMind|70B|English|https://arxiv.org/abs/2203.15556|-|-|-
|50|Benetnasch(瑶光)|2022/3|Text|Singularity AI|~10B|Chinese|https://vr.sina.com.cn/2022-03-28/doc-imcwiwss8619202.shtml|-|-|https://openapi.singularity-ai.com/index.html#/
|51|CodeGen|2022/3|Code|Salesforce|2.7B , 6.1B , 16.1B|-|https://arxiv.org/abs/2203.13474|https://github.com/salesforce/CodeGen#setup|https://github.com/salesforce/CodeGen|-
|52|EVA-2|2022/3|Text|Tsinghua University/ BAAI/ York University|2.8B|Chinese|https://arxiv.org/abs/2203.09313|https://wudaoai.cn/model/detail/EVA#download|https://github.com/thu-coai/EVA/|-
|53|AlphaCode|2022/3|Code|DeepMind|1.1B , 2.8B , 8.7B , 41.1B|-|https://arxiv.org/abs/2203.07814|-|-|-
|54|InstructGPT|2022/3|Text|OpenAI|1.3B , 6B , 175B|English|https://arxiv.org/abs/2203.02155|-|-|-
|55|DeepNet|2022/3|Text|Microsoft|3.2B|English|https://arxiv.org/abs/2203.00555|-|-|-
|56|PolyCoder|2022/2|Code|CMU|2.7B|-|https://arxiv.org/abs/2202.13169|https://github.com/VHellendoorn/Code-LMs#available-models|https://github.com/VHellendoorn/Code-LMs|-
|57|SEER|2022/2|Vision|Meta/Inria|1.5B , 10B|-|https://arxiv.org/abs/2202.08360|https://github.com/facebookresearch/vissl/tree/main/projects/SEER#pretrained-models-weights|https://github.com/facebookresearch/vissl/tree/main/projects/SEER|-
|58|Cedille|2022/2|Text|Cedille AI|6B|French|https://arxiv.org/abs/2202.03371|https://github.com/coteries/cedille-ai#mesh-transformer|https://github.com/coteries/cedille-ai#why-is-this-repository-empty|https://app.cedille.ai/
|59|Megatron-Turing NLG|2022/1|Text|Microsoft/ NVIDIA|530B|English|https://arxiv.org/abs/2201.11990|-|-|-
|60|LaMDA|2022/1|Text|Google|2B , 8B , 137B|English|https://arxiv.org/abs/2201.08239|-|-|-
|61|CM3|2022/1|Text/Vision|Facebook|2.7B , 13B|English|https://arxiv.org/abs/2201.07520|-|-|-
|62|ERNIE-ViLG|2021/12|Text/Vision|Baidu|10B|Chinese|https://arxiv.org/pdf/2112.15283.pdf|-|-|https://wenxin.baidu.com/younger/apiDetail?id=20008
|63|ERNIE 3.0 Titan |2021/12|Text|Peng Cheng Laboratory/ Baidu|260B|Chinese|https://arxiv.org/abs/2112.12731|-|-|-
|64|XGLM|2021/12|Text|Meta|1.7B , 2.9B , 7.5B|Multilingual|https://arxiv.org/abs/2112.10668|https://github.com/facebookresearch/fairseq/tree/main/examples/xglm#pre-trained-models|https://github.com/facebookresearch/fairseq/tree/main/examples/xglm|-
|65|MOE LM|2021/12|Text|Meta|dense: 1.3B , 2.7B , 6.7B , 13B; MoE:15B , 52B , 207B , 1100B|English|https://arxiv.org/abs/2112.10684|https://github.com/facebookresearch/fairseq/tree/main/examples/moe_lm#pre-trained-models|https://github.com/facebookresearch/fairseq/tree/main/examples/moe_lm|-
|66|GLIDE|2021/12|Text/Vision|OpenAI|5B|English|https://arxiv.org/abs/2112.10741|-|https://github.com/openai/glide-text2im|-
|67|WebGPT|2021/12|Text|OpenAI|13B , 175B|English|https://arxiv.org/abs/2112.09332|-|-|-
|68|LongT5|2021/12|Text|Google|3B|English|https://arxiv.org/abs/2112.07916|https://github.com/google-research/longt5#released-model-checkpoints|https://github.com/google-research/longt5|-
|69|GLaM|2021/12|Text|Google|dense: 1.7B , 8.7B , 137B; MoE:1.9B , 20B , 27B , 53B , 105B , 143B , 1200B|English|https://arxiv.org/abs/2112.06905|-|-|-
|70|Retro |2021/12|Text|DeepMind|1.5B , 7.5B|English|https://arxiv.org/abs/2112.04426|-|-|-
|71|Gopher|2021/12|Text|DeepMind|1.4B , 7.1B , 280B|English|https://storage.googleapis.com/deepmind-media/research/language-research/Training%20Gopher.pdf|-|-|-
|72|CodeParrot|2021/12|Code|Huggingface|1.5B|-|https://huggingface.co/blog/codeparrot|https://huggingface.co/codeparrot/codeparrot|https://github.com/huggingface/transformers/tree/main/examples/research_projects/codeparrot|-
|73|GPT-JT|2022/11|Text|Together|6B|English|https://www.together.xyz/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai|https://huggingface.co/togethercomputer/GPT-JT-6B-v1|https://huggingface.co/togethercomputer/GPT-JT-6B-v1|-
|74|Zhouwenwang(周文王)|2021/11|Text|IDEA|1.3B|Chinese|https://idea.edu.cn/fengshenbang-lm.html|https://huggingface.co/IDEA-CCNL/Zhouwenwang-Unified-1.3B|https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/models/roformer|-
|75|Yuyuan(余元)|2021/11|Text|IDEA|3.5B|Chinese|https://idea.edu.cn/fengshenbang-lm.html|https://huggingface.co/IDEA-CCNL/YuyuanQA-GPT2-3.5B|https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/wenzhong_qa|-
|76|Wenzhong(闻仲)|2021/11|Text|IDEA|3.5B|Chinese|https://idea.edu.cn/fengshenbang-lm.html|https://huggingface.co/IDEA-CCNL/Wenzhong-GPT2-3.5B|https://fengshenbang-doc.readthedocs.io/zh/latest/docs/%E9%97%BB%E4%BB%B2%E7%B3%BB%E5%88%97/Wenzhong-GPT2-3.5B.html|-
|77|ExT5|2021/11|Text|Google|3B , 11B|English|https://arxiv.org/abs/2111.10952|-|-|-
|78|Erlangshen(二郎神)|2021/11|Text|IDEA|1.3B|Chinese|https://idea.edu.cn/fengshenbang-lm.html|https://huggingface.co/IDEA-CCNL/Erlangshen-MegatronBert-1.3B|https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/pretrain_erlangshen|-
|79|Bigan(比干)|2021/11|Text|IDEA|1.1B|Chinese|https://idea.edu.cn/fengshenbang-lm.html|https://huggingface.co/IDEA-CCNL/Bigan-Transformer-XL-denoise-1.1B|https://fengshenbang-doc.readthedocs.io/zh/latest/docs/%E6%AF%94%E5%B9%B2%E7%B3%BB%E5%88%97/Bigan-Transformer-XL-denoise-1.1B.html|-
|80|BASIC|2021/11|Vision|Google/MIT|3B|-|https://arxiv.org/abs/2111.10050|-|-|-
|81|Swin Transformer V2|2021/11|Vision|Microsoft|3B|-|https://arxiv.org/abs/2111.09883|-|https://github.com/microsoft/Swin-Transformer|-
|82|PERKS|2021/11|Text|Kuaishou|~1B|Chinese|https://github.com/KuaiSearchPERKS/PERKS/|-|-|-
|83|M-CTC-T|2021/10|Audio|Facebook/ McGill University/ Mila|1.06B|Multilingual|https://arxiv.org/abs/2111.00161|https://github.com/flashlight/wav2letter/tree/main/recipes/mling_pl#inference|https://github.com/flashlight/wav2letter/tree/main/recipes/mling_pl|-
|84|TI-NLP|2021/10|Text|Tencent|~10B|Chinese|https://www.iheima.com/article-325531.html|-|-|-
|85|T0|2021/10|Text|BigScience|3B , 11B|English|https://arxiv.org/abs/2110.08207|https://github.com/bigscience-workshop/t-zero#released-checkpoints|https://github.com/bigscience-workshop/t-zero|-
|86|MixQG|2021/10|Text|Salesforce|3B|English|https://arxiv.org/abs/2110.08175|https://github.com/salesforce/QGen/tree/main/MixQG#released-model-checkpoints|https://github.com/salesforce/QGen/tree/main/MixQG|-
|87|ShenNonG(神农)|2021/10|Text|Tencent|~1B|Chinese|https://mp.weixin.qq.com/s/CavGiy1Rz0MJVtcxXdSn0A|-|-|-
|88|Mengzi(孟子)|2021/10|Text|Shanghai Jiao Tong University/ Beijing Institute of Technology/ Beijing Jiaotong University/ Peking University/ Langboat Technology|1B|Chinese|https://arxiv.org/abs/2110.06696|-|https://github.com/Langboat/Mengzi|-
|89|Yuan(源) 1.0|2021/10|Text|Inspur|13B , 245B|Chinese|https://arxiv.org/abs/2110.04725|-|-|https://air.inspur.com/home
|90|M6-10T|2021/10|Text/Vision|Alibaba|dense: 1.4B; MoE:10000B|Chinese|https://arxiv.org/abs/2110.03888|-|-|-
|91|Zidong.Taichu(紫东太初)|2021/9|Audio/Text/Vision|Institute of Automation|~1B , ~10B , ~100B|Chinese|http://www.ia.cas.cn/xwzx/kydt/202109/t20210927_6215538.html|https://gitee.com/zidongtaichu/multi-modal-models|https://gitee.com/zidongtaichu/multi-modal-models|-
|92|Z-code M3|2021/9|Text|Microsoft|1.8B , 3B , 5.5B , 10B , 20B|Multilingual|https://arxiv.org/abs/2109.10465|-|-|-
|93|T5-Efficient|2021/9|Text|Google/DeepMind|3B , 11B , 30B|English|https://arxiv.org/abs/2109.10686|https://github.com/google-research/google-research/tree/master/scaling_transformers#download-checkpoints|https://github.com/google-research/text-to-text-transfer-transformer|-
|94|PLATO-XL|2021/9|Text|Baidu|11B|English|https://arxiv.org/abs/2109.09519|https://github.com/PaddlePaddle/Knover/blob/develop/projects/PLATO-XL/README.md#pre-trained-dialogue-generation-model|https://github.com/PaddlePaddle/Knover/|-
|95|ShenZhou(神舟)|2021/9|Text|Tencent|~10B|Chinese|https://mp.weixin.qq.com/s/PODShmOo0tg9cmchNhzvtw|-|-|-
|96|CoAtNet|2021/9|Vision|Google|1.47B , 2.44B|English|https://arxiv.org/abs/2106.04803|-|-|-
|97|HyperCLOVA|2021/9|Text|NAVER/ Search Solutions|1.3B , 6.9B , 13B , 39B , 82B , 204B|Korean|https://arxiv.org/abs/2109.04650|-|-|-
|98|Macaw|2021/9|Text|AI2|3B , 11B|English|https://arxiv.org/abs/2109.02593|https://github.com/allenai/macaw#available-models|https://github.com/allenai/macaw|-
|99|FLAN|2021/9|Text|Google|137B|English|https://arxiv.org/abs/2109.01652|-|https://github.com/google-research/flan|-
|100|ProteinLM|2021/8|Protein|Tsinghua University/ BAAI/ Tencent|3B|-|https://arxiv.org/abs/2108.07435|https://github.com/THUDM/ProteinLM#download-proteinlm|https://github.com/THUDM/ProteinLM|-
|101|Jurassic-1(Grande, Jumbo)|2021/8|Text|AI21 Labs|7.5B , 17B , 178B|English|https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf|-|-|https://studio.ai21.com/playground
|102|baseline-1.5B|2021/8|Text|Cohere|1.5B|English|https://arxiv.org/abs/2108.07790|-|-|-
|103|EVA|2021/8|Text|Tsinghua University/ BAAI|2.8B|Chinese|https://arxiv.org/abs/2108.01547|https://wudaoai.cn/model/detail/EVA#download|https://github.com/thu-coai/EVA/|-
|104|BlenderBot 2|2021/7|Text|Facebook|2.7B|English|https://ai.facebook.com/blog/blender-bot-2-an-open-source-chatbot-that-builds-long-term-memory-and-searches-the-internet/|https://parl.ai/projects/blenderbot2/|https://parl.ai/projects/blenderbot2/|-
|105|ProtTrans|2021/7|Protein|Technical University of Munich/ Med AI Technology/ Google/ NVIDIA/ Seoul National University/ Oak Ridge National Laboratory|3B , 11B|-|https://doi.org/10.1109/TPAMI.2021.3095381|https://github.com/agemagician/ProtTrans#%EF%B8%8F-models-availability|https://github.com/agemagician/ProtTrans|-
|106|Codex|2021/7|Code|OpenAI|2.5B , 12B|-|https://arxiv.org/abs/2107.03374|-|-|http://beta.openai.com/codex-waitlist
|107|ERNIE 3.0|2021/7|Text|Baidu|10B|Chinese|https://arxiv.org/abs/2107.02137|-|-|-
|108|CPM-2|2021/6|Text|Tsinghua University/ BAAI|dense: 11B; MoE:198B|Chinese|https://arxiv.org/abs/2106.10715|https://github.com/OpenBMB/ModelCenter#supported-models|https://github.com/OpenBMB/ModelCenter|-
|109|Motian(摩天)|2021/6|Text|Tencent|~1B|Chinese|https://mp.weixin.qq.com/s/HQL0Hk49UR6kVNtrvcXEGA|-|-|-
|110|V-MOE|2021/6|Vision|Google|14.7B|-|https://arxiv.org/abs/2106.05974|-|-|-
|111|GPT-J|2021/6|Text|EleutherAI|6B|English|https://www.infoq.com/news/2021/07/eleutherai-gpt-j/|https://github.com/kingoflolz/mesh-transformer-jax/#links|https://github.com/kingoflolz/mesh-transformer-jax/|-
|112|ViT|2021/6|Vision|Google|1B , 1.8B|-|https://arxiv.org/abs/2106.04560|-|https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/scaling_laws/train_vit_g.py|-
|113|Wudao(悟道) 2.0|2021/6|Text/Vision|BAAI|1750B|English/Chinese|https://www.sohu.com/na/469857971_473283|-|-|-
|114|ByT5|2021/5|Text|Google|1.2B , 3.7B , 13B|Multilingual|https://arxiv.org/abs/2105.13626|https://github.com/google-research/byt5#released-model-checkpoints|https://github.com/google-research/byt5|-
|115|CogView|2021/5|Text/Vision|Tsinghua University/ Alibaba/ BAAI|4B|Chinese|https://arxiv.org/abs/2105.13290|https://github.com/THUDM/CogView#download|https://github.com/THUDM/CogView|-
|116|QAConv|2021/5|Text|Salesforce/ HKUST|3B|English|https://arxiv.org/abs/2105.06912|https://github.com/salesforce/QAConv#trained-models|https://github.com/salesforce/QAConv|-
|117|XLM-R |2021/5|Text|Facebook|3.5B , 10.7B|Multilingual|https://arxiv.org/abs/2105.00572|https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr#pre-trained-models|https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr|-
|118|PanGu(盘古)-α|2021/4|Text|Peng Cheng Laboratory/ Huawei|2.6B , 13B , 200B|Chinese|https://arxiv.org/abs/2104.12369|https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/PanGu-%CE%B1#%E6%A8%A1%E5%9E%8B%E4%B8%8B%E8%BD%BD|https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/PanGu-%CE%B1|https://pangu-alpha.openi.org.cn/
|119|PLUG|2021/4|Text|Alibaba|27B|Chinese|https://www.infoq.cn/article/efiho75sqsvqlvftruke|https://www.alice-mind.com/portal#/|https://github.com/alibaba/AliceMind/tree/main/PLUG|-
|120|GPT-Neo|2021/3|Text|EleutherAI|1.3B , 2.7B|English|https://venturebeat.com/2021/05/15/gpt-3s-free-alternative-gpt-neo-is-something-to-be-excited-about/|https://github.com/EleutherAI/gpt-neo/#pretrained-models|https://github.com/EleutherAI/gpt-neo/|-
|121|GLM|2021/3|Text|Tsinghua University/ BAAI/ MIT/ Shanghai Qi Zhi Institute|10B|English/Chinese|https://arxiv.org/abs/2103.10360|https://github.com/THUDM/GLM#pretrained-models|https://github.com/THUDM/GLM|-
|122|Chinese-Transformer-XL|2021/3|Text|Tsinghua University|2.9B|Chinese|https://wudaoai.cn/model/detail/Transformer-XL|http://dorc-model-team.ks3-cn-beijing.ksyun.com/ren-zhi/my-model/mp_rank_00_model_states.pt|https://github.com/THUDM/Chinese-Transformer-XL|-
|123|BriVL|2021/3|Text/Vision|Renmin University of China/Institute of Computing Technology|1B|Chinese|https://arxiv.org/abs/2103.06561|https://github.com/BAAI-WuDao/BriVL#%E4%B8%8B%E8%BD%BD%E4%B8%93%E5%8C%BA|https://github.com/BAAI-WuDao/BriVL|-
|124|M6|2021/3|Text/Vision|Alibaba/Tsinghua University|10B , 100B|Chinese|https://arxiv.org/abs/2103.00823|-|-|-
|125|DALL-E|2021/2|Text/Vision|OpenAI|12B|English|https://arxiv.org/abs/2102.12092|-|https://github.com/openai/dall-e|-
|126|Switch Transformers|2021/1|Text|Google|7B , 26B , 395B , 1571B|English|https://arxiv.org/abs/2101.03961|https://github.com/google-research/t5x/blob/main/docs/models.md#mixture-of-experts-moe-checkpoints|https://github.com/google-research/t5x|-
|127|CPM-1|2020/12|Text|Tsinghua University/ BAAI|2.6B|Chinese|https://arxiv.org/abs/2012.00413|https://github.com/OpenBMB/ModelCenter#supported-models|https://github.com/OpenBMB/ModelCenter|-
|128|mT5|2020/10|Text|Google|1.2B , 3.7B , 13B|Multilingual|https://arxiv.org/abs/2010.11934|https://github.com/google-research/multilingual-t5#released-model-checkpoints|https://github.com/google-research/multilingual-t5|-
|129|M2M-100|2020/10|Text|Facebook|1.2B , 12B|Multilingual|https://arxiv.org/abs/2010.11125|https://github.com/facebookresearch/fairseq/tree/main/examples/m2m_100#trained-models|https://github.com/facebookresearch/fairseq/tree/main/examples/m2m_100|-
|130|BlenderBot 3|2022/8|Text|Meta/ Mila/ McGill University|3B , 30B , 175B|English|https://arxiv.org/abs/2208.03188|https://github.com/facebookresearch/ParlAI/tree/main/projects/bb3|https://github.com/facebookresearch/ParlAI/tree/main/projects/bb3|-
|131|PLATO-2|2020/6|Text|Baidu|1.6B|English|https://arxiv.org/abs/2006.16779|https://github.com/PaddlePaddle/Knover/blob/develop/projects/PLATO-2/README.md#pre-trained-dialogue-generation-model|https://github.com/PaddlePaddle/Knover/|-
|132|GShard|2020/6|Text|Google|12.5B , 37B , 50B , 150B , 200B , 600B|Multilingual|https://arxiv.org/abs/2006.16668|-|-|-
|133|iGPT|2020/6|Vision|OpenAI|1.3B , 6.8B|-|https://proceedings.mlr.press/v119/chen20s.html|-|-|-
|134|DeBERTa v2|2020/6|Text|Microsoft|1.5B|English|https://arxiv.org/abs/2006.03654|https://huggingface.co/microsoft/deberta-v2-xxlarge|https://github.com/microsoft/DeBERTa|-
|135|GPT-3|2020/5|Text|OpenAI|1.3B , 2.7B , 6.7B , 13B , 175B|English|https://arxiv.org/abs/2005.14165|-|-|https://openai.com/api/
|136|BlenderBot 1|2020/4|Text|Facebook|2.7B , 9.4B|English|https://arxiv.org/abs/2004.13637|https://parl.ai/projects/recipes/|https://parl.ai/projects/recipes/|-
|137|ProGen|2020/3|Protein|Salesforce/ Stanford University|1.2B|-|https://arxiv.org/abs/2004.03497|-|-|-
|138|Turing NLG|2020/2|Text|Microsoft|17B|English|https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/|-|-|-
|139|Meena|2020/1|Text|Google|2.6B|English|https://arxiv.org/abs/2001.09977|-|-|-
|140|T5|2019/10|Text|Google|3B , 11B|English|https://arxiv.org/abs/1910.10683|https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints|https://github.com/google-research/text-to-text-transfer-transformer|-
|141|Megatron-LM|2019/9|Text|NVIDIA|1.2B , 2.5B , 4.2B , 8.3B|English|https://arxiv.org/abs/1909.08053|-|https://github.com/NVIDIA/Megatron-LM|-
|142|CTRL|2019/9|Text|Salesforce|1.63B|English|https://arxiv.org/abs/1909.05858|https://console.cloud.google.com/storage/browser/sf-ctrl;tab=objects?prefix=&forceOnObjectsSortingFiltering=false|https://github.com/salesforce/ctrl|-
|143|GPT-2|2019/2|Text|OpenAI|1.5B|English|https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf|https://github.com/openai/gpt-2|https://github.com/openai/gpt-2|-
|144|Sparsely-Gated MoE|2017/1|Text|Jagiellonian University/ Google|8.7B , 137B|Multilingual|https://arxiv.org/abs/1701.06538|-|-|-
|145|LLaMA|2023/02|NLP|Meta|7B/13B/33B/65B|en|https://arxiv.org/abs/2302.13971|https://github.com/facebookresearch/llama/blob/main/download.sh|https://github.com/facebookresearch/llama/tree/main/llama|
|146|ChatGLM|2023/03|NLP|THUDM|6B|zh/en|https://chatglm.cn/blog|https://huggingface.co/THUDM/chatglm-6b|https://huggingface.co/THUDM/chatglm-6b|
|147|Open Flamingo|2023/03|CV/NLP|laion.ai|9B|en|https://laion.ai/blog/open-flamingo/|https://huggingface.co/openflamingo/OpenFlamingo-9B|https://github.com/mlfoundations/open_flamingo|https://7164d2142d11.ngrok.app/
|148|Alpaca-7B|2023/03|NLP|Stanford|7B/13B(Not Open)|en/multilingual|https://crfm.stanford.edu/2023/03/13/alpaca.html|https://huggingface.co/tatsu-lab/alpaca-7b-wdiff/tree/main|https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py|
|149|Vicuna|2023/03|NLP|The Vicuna|7B/13B|en/|https://lmsys.org/blog/2023-03-30-vicuna/|https://github.com/lm-sys/FastChat#vicuna-weights|https://github.com/lm-sys/FastChat/blob/main/scripts/train_vicuna_7b.sh|https://chat.lmsys.org/
|150|FastChat-T5|2023/04|NLP|The Vicuna|3B|en|https://github.com/lm-sys/FastChat#FastChat-T5|https://huggingface.co/lmsys/fastchat-t5-3b-v1.0|https://huggingface.co/lmsys/fastchat-t5-3b-v1.0|
|151|Pythia|2023/04|NLP|EleutherAI|70M/160M/410M/1B/1.4B/2.8B/6.9B/12B|en|https://arxiv.org/pdf/2304.01373.pdf|https://github.com/EleutherAI/pythia/tree/main|https://github.com/EleutherAI/pythia/tree/main/models|
|152|StableLM-Tuned-Alpha|2023/04|NLP|Stability AI|3B/7B|en|https://github.com/stability-AI/stableLM|https://huggingface.co/stabilityai|https://huggingface.co/stabilityai|
|153|Stable-vicuna-13b-delta|2023/04|NLP|CarperAI|13B|en/|https://stability.ai/blog/stablevicuna-open-source-rlhf-chatbot|https://huggingface.co/CarperAI/stable-vicuna-13b-delta/tree/main|https://huggingface.co/CarperAI/stable-vicuna-13b-delta|https://huggingface.co/CarperAI/stable-vicuna-13b-delta
|154|Dolly|2023/04|NLP|Databricks|3B/6B/7B/12B|en|https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm|https://huggingface.co/databricks|https://huggingface.co/databricks|
|155|MPT|2023/05|NLP|Mosaic ML|7B|en|https://the-decoder.com/mpt-7b-the-best-open-source-llm-available-for-commercial-use/|https://huggingface.co/mosaicml|https://huggingface.co/mosaicml|
|156|Dromedary|2023/05|NLP|MIT-IBM|65B|en/|https://arxiv.org/pdf/2305.03047.pdf|https://huggingface.co/zhiqings/dromedary-65b-lora-delta-v0|https://github.com/IBM/Dromedary|
|157|Visualglm-6b|2023/05|CV/NLP|THUDM|6B|zh/en|https://github.com/THUDM/VisualGLM-6B|https://huggingface.co/THUDM/visualglm-6b|https://github.com/THUDM/VisualGLM-6B|https://huggingface.co/spaces/THUDM/visualglm-6b
|158|Alpaca.cpp|2023/03|NLP||7B|en|https://github.com/antimatter15/alpaca.cpp|||
|159|Alpaca-LoRA|2023/03|NLP||7B||https://github.com/tloen/alpaca-lora|||
|160|Baize|2023/03|NLP|UCSD/ Sun Yat-sen University|7B/13B/30B||https://arxiv.org/abs/2304.01196|https://huggingface.co/spaces/project-baize/chat-with-baize|https://github.com/project-baize/baize-chatbot|
|161|BELLE|2023/04|NLP|Lianjia Tech (链家)|7B/13B|zh/en|||https://github.com/LianjiaTech/BELLE|
|162|BLOOM-LoRA|2023/03|NLP||7B|Multilingual||https://huggingface.co/LinhDuong/bloom-7b1-lora-codealpaca20k|https://github.com/linhduongtuan/BLOOM-LORA|
|163|Cabrita|2023/03|NLP|22-hours||Portuguese||https://huggingface.co/22h/cabrita-lora-v0-1|https://github.com/22-hours/cabrita|
|164|Camel|2023/03|NLP|camel-ai.org(KAUST)|||https://ghli.org/camel.pdf||https://github.com/camel-ai/camel|
|165|Cerebras-GPT|2023/04|NLP|Cerebras|111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B|Multilingual|https://arxiv.org/abs/2304.03208 (https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models)|https://huggingface.co/cerebras/Cerebras-GPT-13B||
|166|RWKV|2022/04|NLP|BlinkDL|0.1B/0.4B/1.5B/3B/7B/14B||https://arxiv.org/abs/2305.13048||https://github.com/BlinkDL/RWKV-LM|
|167|ChatRWKV|2023/04|NLP|BlinkDL|||||https://github.com/BlinkDL/ChatRWKV|
|168|Chimera|2023/04|NLP|FreedomAI(CUHK)|7B/13B|Latin||https://huggingface.co/FreedomIntelligence|https://github.com/FreedomIntelligence/LLMZoo|
|169|Phoenix|2023/04|NLP|FreedomAI(CUHK)|7B|Multilingual||https://huggingface.co/FreedomIntelligence|https://github.com/FreedomIntelligence/LLMZoo|
|170|HuatuoGPT(CAMEL)|2023/04|NLP|FreedomAI(CUHK)|7B|Multilingual||https://huggingface.co/FreedomIntelligence|https://github.com/FreedomIntelligence/LLMZoo|
|171|Chinese-Vicuna|2023/04|NLP||7B/13B|Chinese||https://huggingface.co/Facico/Chinese-Vicuna-lora-7b-3epoch-belle-and-guanaco|https://github.com/Facico/Chinese-Vicuna|
|172|Claude|2023/03|NLP|anthropic|~6.2B|Multilingual||||https://www.anthropic.com/product
|173|CPM-Bee|2023/05|NLP|OpenBMB|10B|EN/ZH||https://huggingface.co/openbmb/cpm-bee-10b/tree/main|https://github.com/OpenBMB/CPM-Bee|
|174|Dolly 2.0|2023/04|NLP|Databricks|2.8B/6.9B/12B|Multilingual|https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm|https://huggingface.co/databricks/dolly-v2-12b|https://github.com/databrickslabs/dolly#getting-started-with-response-generation|
|175|Flan-Alpaca(Base/large/XL/XXL/GPT4-XL)/Flan-GPT4ALL-XL/Flan-ShareGPT-XL|2023/03|NLP|Deep Cognition and Language Research (DeCLaRe) Lab|0.22B/0.77B/3B/11B||https://huggingface.co/declare-lab|https://huggingface.co/declare-lab|https://github.com/declare-lab/flan-alpaca|
|176|Flan-PaLM|2022/10|NLP|Google Research|540B||https://arxiv.org/pdf/2210.11416.pdf||https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints|
|177|Flan-UL2|2023/03|NLP|Google Research|20B|Multilingual|https://arxiv.org/pdf/2205.05131.pdf|https://huggingface.co/google/flan-ul2||
|178|GALPACA|2022/11|NLP|Georgia Tech Research Institute/ Meta AI|125M/1.3B/6.7B/30B/120B||https://galactica.org/static/paper.pdf|https://huggingface.co/GeorgiaTechResearchInstitute||
|179|GPT 4|2023/03|NLP|OpenAI|~1000B or more|Multilingual|https://arxiv.org/abs/2303.08774|||https://openai.com/research/gpt-4
|180|GPT4All|2023/03|NLP|Nomic AI|||||https://github.com/nomic-ai/gpt4all|https://gpt4all.io/index.html
|181|GPTQ-for-LLaMA|2023/03|NLP||7B/13B/33B/65B||||https://github.com/qwopqwop200/GPTQ-for-LLaMa|
|182|h2oGPT|2023/03|NLP|h2o.ai|12B|||https://huggingface.co/spaces/h2oai/h2ogpt-chatbot|https://github.com/h2oai/h2ogpt|
|183|HuggingChat||NLP|Huggingface|30B|||https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor||
|184|Koala|2023/04|NLP|berkeley AI research (BAIR)|13B||https://bair.berkeley.edu/blog/2023/04/03/koala/|||
|185|Llama-X|2023/03|NLP||7B||||https://github.com/AetherCortex/Llama-X|
|186|Open-Assistant|2023/04|NLP|LAION AI|||https://projects.laion.ai/Open-Assistant/blog/2023/04/15/open-assistant-released|open-assistant.io/chat|https://github.com/LAION-AI/Open-Assistant|open-assistant.io/chat
|187|OpenChatKit||NLP|togethercomputer|7B||https://github.com/togethercomputer/OpenChatKit|||
|188|PALM2|2023/05|NLP|Google|||https://blog.google/technology/ai/google-palm-2-ai-large-language-model/|||https://developers.generativeai.google/products/palm
|189|Palmyra|2023/03|NLP|Writer|3B/5B|||https://huggingface.co/Writer/palmyra-base||https://writer.com/product/api/
|190|RedPajama|2023/04|NLP|Together Computer|3B/7B|EN||https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1|https://github.com/togethercomputer/RedPajama-Data|
|191|StackLLaMA|2023/04|NLP|Huggingface|7B||https://huggingface.co/blog/stackllama|||
|192|StarCoder||NLP|Huggingface||||||
|193|falcon-40b|2023/05|NLP|Technology Innovation Institute|40B|Multilingual||https://huggingface.co/tiiuae/falcon-40b||
|194|文心一言|2023/03|NLP|baidu|260B||https://yiyan.baidu.com|||
|195|通义千问|2023/04|NLP|alibaba|1200B||https://tongyi.aliyun.com/|||
|196|盘古β|2020/04|NLP|huawei||||||
|197|飞书“My AI”|2023/04|NLP|飞书(ByteDance)||||||
|198|言犀-ChatJD|2023/02|NLP|JD|~100B||https://yanxi.jd.com/|||
|199|知海图AI|2023/04|NLP|zhihu & modelbest|~1B|zh||||
|200|360智脑|2023/05|NLP|360|~10B||https://ai.360.cn/|||
|201|伏羲 (Fuxi) pre-trained large model "玉言"|2023/01|NLP|netease|11B||https://fuxi.163.com/|||
|202|天工|2023/04|NLP|昆仑万维 & 奇点智源|~100B||https://tiangong.kunlun.com/|||
|203|天燕大模型AiLMe|2023/04|NLP|APUS|~100B||http://apusai.com/|||
|204|日日新SenseNova/商量|2023/04|NLP|sensetime|180B||https://techday.sensetime.com/list|||
|205|讯飞星火|2023/05|NLP|科大讯飞|~50B||https://xinghuo.xfyun.cn/|||
|206|序列猴子|2023/04|NLP|出门问问|less than 100B||https://openapi.mobvoi.com/largemodel-introduce|||
|207|Mchat(孟子)|2023/03(2021/07)|NLP|澜舟科技|~1B||https://www.langboat.com/portal/mengzi-model|||
|208|DriveGPT (generative large model for autonomous driving)|2023/04|NLP|毫末科技|120B||https://www.haomo.ai/|||
|209|魔力写作||NLP|竹间智能||||||
|210|Glow (AI virtual chat social app built on an in-house large model)||NLP|北京稀宇科技有限公司||||||
|211|曹植|2023/03|NLP|达观数据|50B||http://www.datagrand.com/products/aigc/|||
|212|子曰|2023/05|NLP|网易有道||||||
|213|MathGPT|2023/05|NLP|好未来||||||
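|214|MOSS (conversational large language model)|2023/02|NLP|Fudan University|~20B|||https://huggingface.co/fnlp/moss-moon-003-base|https://github.com/OpenLMLab/MOSS|
|215|Fengshenbang series: Ziya (姜子牙) large model|2023/05|NLP|IDEA|7B/13B|||https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1/discussions/17|https://github.com/IDEA-CCNL/Fengshenbang-LM|
|216|FengWu (风乌) weather forecasting large model|2023/04|NLP|Shanghai AI Laboratory/University of Science and Technology of China/Shanghai Jiao Tong University/Nanjing University of Information Science and Technology/Institute of Atmospheric Physics, Chinese Academy of Sciences/Shanghai Central Meteorological Observatory|||https://arxiv.org/abs/2304.02948|||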
|217|Firefly(流萤)|2023/03|NLP||1.4B/2.6B|ZH||https://huggingface.co/YeungNLP/firefly-2b6|https://github.com/yangjianxin1/Firefly|
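
The rows above do not all have the same shape: some omit the trailing pipe, some leave trailing cells empty, and unavailable links appear both as `-` and as empty cells. The following is a minimal sketch, not part of the original list, of how such rows could be read programmatically. The helper names `parse_row` and `parameter_sizes` are hypothetical, and the code assumes that no cell contains a literal `|` and that sizes are always written with an `M` or `B` suffix.

```python
import re

# Column names copied from the table header above.
COLUMNS = ["#", "Name", "Release Date", "Domain", "Affiliation",
           "# of Parameters", "Language", "Paper/News", "Model", "Code", "API"]


def parse_row(line):
    """Split one pipe-delimited table row into a dict keyed by column name."""
    cells = [c.strip() for c in line.strip().split("|")]
    if cells and cells[0] == "":   # empty piece before the leading pipe
        cells = cells[1:]
    if cells and cells[-1] == "":  # empty piece after an optional trailing pipe
        cells = cells[:-1]
    cells += [""] * (len(COLUMNS) - len(cells))  # pad rows that are too short
    # Treat both "-" and an empty cell as "not available".
    return {name: (None if cell in ("", "-") else cell)
            for name, cell in zip(COLUMNS, cells)}


def parameter_sizes(text):
    """Return every size found in a parameter cell, in billions of parameters."""
    if not text:
        return []
    scale = {"M": 1e-3, "B": 1.0}
    return [float(num) * scale[unit]
            for num, unit in re.findall(r"(\d+(?:\.\d+)?)\s*([MB])", text)]


row = parse_row("|8|Sparrow|2022/9|Text|DeepMind|70B|English|https://arxiv.org/abs/2209.14375|-|-|-")
short = parse_row("|158|Alpaca.cpp|2023/03|NLP||7B|en|https://github.com/antimatter15/alpaca.cpp|||")
print(row["Name"], parameter_sizes(row["# of Parameters"]))
print(short["Affiliation"], short["API"])
```

Padding short rows instead of rejecting them keeps entries that have only a name and an affiliation usable, at the cost of not detecting rows where a cell was dropped from the middle.
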
## Acknowledgments
Thanks to all of the following contributors for their strong support and participation!
......