{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "72047643",
   "metadata": {},
   "source": [
    "# DistilGPT2\n",
    "\n",
    "详细内容请看[GPT2 in PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/gpt/README.md)。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20c299c9",
   "metadata": {},
   "source": [
    "DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the smallest version of Generative Pre-trained Transformer 2 (GPT-2). Like GPT-2, DistilGPT2 can be used to generate text. Users of this model card should also consider information about the design, training, and limitations of GPT-2.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c624b3d1",
   "metadata": {},
   "source": [
    "## Model Details\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92002396",
   "metadata": {},
   "source": [
    "- **Developed by:** Hugging Face\n",
    "- **Model type:** Transformer-based Language Model\n",
    "- **Language:** English\n",
    "- **License:** Apache 2.0\n",
    "- **Model Description:** DistilGPT2 is an English-language model pre-trained with the supervision of the 124 million parameter version of GPT-2. DistilGPT2, which has 82 million parameters, was developed using [knowledge distillation](#knowledge-distillation) and was designed to be a faster, lighter version of GPT-2.\n",
    "- **Resources for more information:** See this repository for more about Distil\\* (a class of compressed models including Distilled-GPT2), [Sanh et al. (2019)](https://arxiv.org/abs/1910.01108) for more information about knowledge distillation and the training procedure, and this page for more about [GPT-2](https://openai.com/blog/better-language-models/).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1a84778",
   "metadata": {},
   "source": [
    "## How to use"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9c6043d",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --upgrade paddlenlp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9f0754d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import paddle\n",
    "from paddlenlp.transformers import AutoModel\n",
    "\n",
    "model = AutoModel.from_pretrained(\"distilgpt2\")\n",
    "input_ids = paddle.randint(100, 200, shape=[1, 20])\n",
    "print(model(input_ids))"
   ]
  },
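  {
   "cell_type": "markdown",
   "id": "3f2a9b10",
   "metadata": {},
   "source": [
    "The `AutoModel` call above returns hidden states rather than generated text. As a minimal sketch of generating text with these weights (assuming the converted `distilgpt2` checkpoint can also be loaded through PaddleNLP's `GPTTokenizer` and `GPTLMHeadModel`; otherwise substitute a registered GPT-2 checkpoint name), top-k sampling of a short continuation could look like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c4d1e27",
   "metadata": {},
   "outputs": [],
   "source": [
    "import paddle\n",
    "from paddlenlp.transformers import GPTLMHeadModel, GPTTokenizer\n",
    "\n",
    "# Assumption: \"distilgpt2\" resolves for the LM-head classes as it does for AutoModel.\n",
    "tokenizer = GPTTokenizer.from_pretrained(\"distilgpt2\")\n",
    "model = GPTLMHeadModel.from_pretrained(\"distilgpt2\")\n",
    "model.eval()\n",
    "\n",
    "# Encode a prompt and add the batch dimension expected by the model.\n",
    "input_ids = paddle.to_tensor([tokenizer(\"Hello, I'm a language model,\")[\"input_ids\"]])\n",
    "\n",
    "# Sample up to 20 new tokens; generate() returns (token ids, scores).\n",
    "output_ids, _ = model.generate(\n",
    "    input_ids=input_ids,\n",
    "    max_length=20,\n",
    "    decode_strategy=\"sampling\",\n",
    "    top_k=5)\n",
    "print(tokenizer.convert_ids_to_string(output_ids[0].tolist()))"
   ]
  },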
  {
   "cell_type": "markdown",
   "id": "03d3d465",
   "metadata": {},
   "source": [
    "## Citation\n",
    "\n",
    "```\n",
    "@inproceedings{sanh2019distilbert,\n",
    "title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},\n",
    "author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},\n",
    "booktitle={NeurIPS EMC^2 Workshop},\n",
    "year={2019}\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7966636a",
   "metadata": {},
   "source": [
    "## Glossary\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "533038ef",
   "metadata": {},
   "source": [
    "-\t<a name=\"knowledge-distillation\">**Knowledge Distillation**</a>: As described in [Sanh et al. (2019)](https://arxiv.org/pdf/1910.01108.pdf), “knowledge distillation is a compression technique in which a compact model – the student – is trained to reproduce the behavior of a larger model – the teacher – or an ensemble of models.” Also see [Bucila et al. (2006)](https://www.cs.cornell.edu/~caruana/compression.kdd06.pdf) and [Hinton et al. (2015)](https://arxiv.org/abs/1503.02531).\n"
   ]
  },
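  {
   "cell_type": "markdown",
   "id": "5e6f7a80",
   "metadata": {},
   "source": [
    "As an illustrative sketch of the soft-target part of this objective (random tensors stand in for the teacher's and student's logits; the temperature scaling follows Hinton et al. (2015)):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d0e1f38",
   "metadata": {},
   "outputs": [],
   "source": [
    "import paddle\n",
    "import paddle.nn.functional as F\n",
    "\n",
    "# Illustrative only: random logits stand in for real teacher/student outputs\n",
    "# over a GPT-2-sized vocabulary (50257 tokens).\n",
    "temperature = 2.0\n",
    "teacher_logits = paddle.randn([1, 20, 50257])\n",
    "student_logits = paddle.randn([1, 20, 50257])\n",
    "\n",
    "# Soften both distributions with the temperature, then push the student toward\n",
    "# the teacher via KL divergence, scaled by T^2 as in Hinton et al. (2015).\n",
    "teacher_probs = F.softmax(teacher_logits / temperature, axis=-1)\n",
    "student_log_probs = F.log_softmax(student_logits / temperature, axis=-1)\n",
    "distill_loss = F.kl_div(student_log_probs, teacher_probs, reduction=\"batchmean\") * temperature ** 2\n",
    "print(distill_loss)"
   ]
  },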
  {
   "cell_type": "markdown",
   "id": "a7ff7cc1",
   "metadata": {},
   "source": [
    "<a href=\"https://huggingface.co/exbert/?model=distilgpt2\">\n",
    "<img width=\"300px\" src=\"https://cdn-media.huggingface.co/exbert/button.png\">\n",
    "</a>\n",
    "\n",
    "> 此模型介绍及权重来源于[https://huggingface.co/distilgpt2](https://huggingface.co/distilgpt2),并转换为飞桨模型格式。\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}