{ "cells": [ { "cell_type": "markdown", "id": "7aa268f7", "metadata": {}, "source": [ "# German BERT\n", "![bert_image](https://static.tildacdn.com/tild6438-3730-4164-b266-613634323466/german_bert.png)\n", "## Overview\n", "**Language model:** bert-base-cased\n", "**Language:** German\n", "**Training data:** Wiki, OpenLegalData, News (~12 GB)\n", "**Eval data:** CoNLL03 (NER), GermEval14 (NER), GermEval18 (Classification), GNAD (Classification)\n", "**Infrastructure:** 1x TPU v2\n", "**Published:** Jun 14th, 2019\n", "\n", "For more details, see [BERT in PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/bert/README.md)." ] }, { "cell_type": "markdown", "id": "f407e80e", "metadata": {}, "source": [ "**Update April 3rd, 2020**: We updated the vocabulary file on deepset's S3 to conform with the default tokenization of punctuation tokens.\n", "For details, see the related [FARM issue](https://github.com/deepset-ai/FARM/issues/60). If you want to use the old vocab, we have also uploaded a `deepset/bert-base-german-cased-oldvocab` model.\n" ] }, { "cell_type": "markdown", "id": "18d2ad8e", "metadata": {}, "source": [ "## How to use" ] }, { "cell_type": "code", "execution_count": null, "id": "b80052bd", "metadata": {}, "outputs": [], "source": [ "!pip install --upgrade paddlenlp" ] }, { "cell_type": "code", "execution_count": null, "id": "4ea9d4e3", "metadata": {}, "outputs": [], "source": [ "import paddle\n", "from paddlenlp.transformers import AutoModel\n", "\n", "# Load the pretrained German BERT weights by name.\n", "model = AutoModel.from_pretrained(\"bert-base-german-cased\")\n", "# Dummy input: random token IDs in [100, 200), batch size 1, sequence length 20.\n", "input_ids = paddle.randint(100, 200, shape=[1, 20])\n", "print(model(input_ids))" ] }, { "cell_type": "markdown", "id": "9d560e75", "metadata": {}, "source": [ "## Authors\n", "- Branden Chan: `branden.chan [at] deepset.ai`\n", "- Timo Möller: `timo.moeller [at] deepset.ai`\n", "- Malte Pietsch: `malte.pietsch [at] deepset.ai`\n", "- Tanay Soni: `tanay.soni [at] deepset.ai`\n" ] }, { "cell_type": "markdown", "id": 
"a0e43273", "metadata": {}, "source": [ "## About us\n", "![deepset logo](https://raw.githubusercontent.com/deepset-ai/FARM/master/docs/img/deepset_logo.png)\n" ] }, { "cell_type": "markdown", "id": "c1b05e60", "metadata": {}, "source": [ "We bring NLP to the industry via open source!\n", "Our focus: Industry-specific language models & large-scale QA systems.\n" ] }, { "cell_type": "markdown", "id": "5196bee9", "metadata": {}, "source": [ "Some of our work:\n", "- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n", "- [FARM](https://github.com/deepset-ai/FARM)\n", "- [Haystack](https://github.com/deepset-ai/haystack/)\n" ] }, { "cell_type": "markdown", "id": "18fe01d5", "metadata": {}, "source": [ "Get in touch:\n", "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Website](https://deepset.ai)\n", "\n", "> The model introduction and model weights originate from https://huggingface.co/bert-base-german-cased and were converted to PaddlePaddle format for ease of use in PaddleNLP." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.13" } }, "nbformat": 4, "nbformat_minor": 5 }