{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "922f44e2",
   "metadata": {},
   "source": [
    "# bert-base-romanian-uncased-v1\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f5259bd",
   "metadata": {},
   "source": [
    "The BERT **base**, **uncased** model for Romanian, trained on a 15GB corpus, version ![v1.0](https://img.shields.io/badge/v1.0-21%20Apr%202020-ff6666)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "408f4468",
   "metadata": {},
   "source": [
    "## How to use"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acd14372",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --upgrade paddlenlp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc5d539c",
   "metadata": {},
   "outputs": [],
   "source": [
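    "import paddle\n",
    "from paddlenlp.transformers import AutoModel\n",
    "\n",
    "# Load the Romanian BERT weights converted to the PaddlePaddle format\n",
    "model = AutoModel.from_pretrained(\"dumitrescustefan/bert-base-romanian-uncased-v1\")\n",
    "\n",
    "# Smoke test with random token IDs: a batch of 1 sequence of 20 tokens\n",
    "input_ids = paddle.randint(100, 200, shape=[1, 20])\n",
    "# Forward pass; prints the model's output tensors\n",
    "print(model(input_ids))"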
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adbbab44",
   "metadata": {},
   "source": [
    "## Citation\n",
    "\n",
    "```\n",
    "@inproceedings{dumitrescu-etal-2020-birth,\n",
    "title = \"The birth of {R}omanian {BERT}\",\n",
    "author = \"Dumitrescu, Stefan  and\n",
    "Avram, Andrei-Marius  and\n",
    "Pyysalo, Sampo\",\n",
    "booktitle = \"Findings of the Association for Computational Linguistics: EMNLP 2020\",\n",
    "month = nov,\n",
    "year = \"2020\",\n",
    "address = \"Online\",\n",
    "publisher = \"Association for Computational Linguistics\",\n",
    "url = \"https://aclanthology.org/2020.findings-emnlp.387\",\n",
    "doi = \"10.18653/v1/2020.findings-emnlp.387\",\n",
    "pages = \"4324--4328\",\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4276651e",
   "metadata": {},
   "source": [
    "#### Acknowledgements\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84a91796",
   "metadata": {},
   "source": [
    "- We'd like to thank [Sampo Pyysalo](https://github.com/spyysalo) from TurkuNLP for helping us out with the compute needed to pretrain the v1.0 BERT models. He's awesome!\n",
    "\n",
    "> The model description and weights are from [https://huggingface.co/dumitrescustefan/bert-base-romanian-uncased-v1](https://huggingface.co/dumitrescustefan/bert-base-romanian-uncased-v1) and have been converted to the PaddlePaddle model format.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}