---
template: hub1
title: BERT for Finetune
summary:
    en_US: Bidirectional Encoder Representation from Transformers (BERT)
    zh_CN: BERT
author: MegEngine Team
tags: [nlp]
github-link: https://github.com/MegEngine/Models/tree/master/official/nlp/bert
---

```python
import megengine.hub as hub

model = hub.load("megengine/models", "wwm_cased_L-24_H-1024_A-16", pretrained=True)
# or any of these variants
# model = hub.load("megengine/models", "wwm_cased_L-24_H-1024_A-16", pretrained=True)
# model = hub.load("megengine/models", "wwm_uncased_L-24_H-1024_A-16", pretrained=True)
# model = hub.load("megengine/models", "cased_L-12_H-768_A-12", pretrained=True)
# model = hub.load("megengine/models", "cased_L-24_H-1024_A-16", pretrained=True)
# model = hub.load("megengine/models", "uncased_L-12_H-768_A-12", pretrained=True)
# model = hub.load("megengine/models", "uncased_L-24_H-1024_A-16", pretrained=True)
# model = hub.load("megengine/models", "chinese_L-12_H-768_A-12", pretrained=True)
# model = hub.load("megengine/models", "multi_cased_L-12_H-768_A-12", pretrained=True)
```

<!-- section: zh_CN -->

In this project, we reimplement Google's open-source BERT model with MegEngine.

We provide the following pre-trained models for users to fine-tune on different downstream tasks.

* `wwm_cased_L-24_H-1024_A-16`
* `wwm_uncased_L-24_H-1024_A-16`
* `cased_L-12_H-768_A-12`
* `cased_L-24_H-1024_A-16`
* `uncased_L-12_H-768_A-12`
* `uncased_L-24_H-1024_A-16`
* `chinese_L-12_H-768_A-12`
* `multi_cased_L-12_H-768_A-12`

The model weights come from Google's pre-trained models and carry the same meaning. Users can easily load a pre-trained BERT model with `megengine.hub` and download the corresponding `vocab.txt` and `bert_config.json`. We also provide a more convenient script in [models](https://github.com/megengine/models/official/nlp/bert) that fetches the matching vocabulary, configuration, and pre-trained model directly by task name.

```python
import os
import urllib.request

import megengine.hub as hub

DATA_URL = 'https://data.megengine.org.cn/models/weights/bert'
CONFIG_NAME = 'bert_config.json'
VOCAB_NAME = 'vocab.txt'
MODEL_NAME = {
    'wwm_cased_L-24_H-1024_A-16': 'wwm_cased_L_24_H_1024_A_16',
    'wwm_uncased_L-24_H-1024_A-16': 'wwm_uncased_L_24_H_1024_A_16',
    'cased_L-12_H-768_A-12': 'cased_L_12_H_768_A_12',
    'cased_L-24_H-1024_A-16': 'cased_L_24_H_1024_A_16',
    'uncased_L-12_H-768_A-12': 'uncased_L_12_H_768_A_12',
    'uncased_L-24_H-1024_A-16': 'uncased_L_24_H_1024_A_16',
    'chinese_L-12_H-768_A-12': 'chinese_L_12_H_768_A_12',
    'multi_cased_L-12_H-768_A-12': 'multi_cased_L_12_H_768_A_12'
}

def download_file(url, filename):
    # Download a remote file to a local path.
    urllib.request.urlretrieve(url, filename)

def create_hub_bert(model_name, pretrained):
    assert model_name in MODEL_NAME, '{} not in the valid models {}'.format(model_name, MODEL_NAME)
    data_dir = './{}'.format(model_name)
    if not os.path.exists(data_dir):
        os.makedirs(data_dir)

    vocab_url = '{}/{}/{}'.format(DATA_URL, model_name, VOCAB_NAME)
    config_url = '{}/{}/{}'.format(DATA_URL, model_name, CONFIG_NAME)

    vocab_file = './{}/{}'.format(model_name, VOCAB_NAME)
    config_file = './{}/{}'.format(model_name, CONFIG_NAME)

    # Fetch the vocabulary and configuration that match the checkpoint.
    download_file(vocab_url, vocab_file)
    download_file(config_url, config_file)

    # BertConfig is defined alongside the BERT model in the models repository
    # (official/nlp/bert); import it from there before using this helper.
    config = BertConfig(config_file)

    # Load the (optionally pre-trained) BertModel through megengine.hub.
    model = hub.load(
        "megengine/models",
        MODEL_NAME[model_name],
        pretrained=pretrained,
    )

    return model, config, vocab_file
```

To make the pre-trained models easier to use, we only keep the `BertModel` part. In practice, the pre-trained `bert` model can be passed into another model's constructor and used as one of its components.

```python
from megengine.functional import cross_entropy_with_softmax
from megengine.module import Dropout, Linear, Module

class BertForSequenceClassification(Module):
    def __init__(self, config, num_labels, bert):
        super().__init__()
        self.bert = bert
        self.num_labels = num_labels
        self.dropout = Dropout(config.hidden_dropout_prob)
        self.classifier = Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None):
        # Use the pooled [CLS] representation for sentence-level classification.
        _, pooled_output = self.bert(
            input_ids, token_type_ids,
            attention_mask, output_all_encoded_layers=False)
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        if labels is not None:
            loss = cross_entropy_with_softmax(
                logits.reshape(-1, self.num_labels),
                labels.reshape(-1))
            return logits, loss
        else:
            return logits, None

bert, config, vocab_file = create_hub_bert('uncased_L-12_H-768_A-12', pretrained=True)
model = BertForSequenceClassification(config, num_labels=2, bert=bert)
```

All pre-trained models expect the data to be pre-processed correctly, with the same requirements as Google's open-source BERT. For details, refer to the original [bert](https://github.com/google-research/bert) repository or to the examples provided in [models](https://github.com/megengine/models/official/nlp/bert).
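
As a rough illustration, the sketch below builds one sentence-pair example in the `[CLS] A [SEP] B [SEP]` layout that BERT expects. It assumes the `FullTokenizer` from Google's BERT `tokenization.py` (an equivalent copy ships with the models repo); the import path, `MAX_LEN`, and the sample sentences are illustrative only, and `vocab_file` is the path returned by `create_hub_bert` above.

```python
import numpy as np
from tokenization import FullTokenizer  # hypothetical import path; see the models repo

MAX_LEN = 128  # illustrative maximum sequence length

# do_lower_case=True matches the uncased checkpoints; use False for cased ones.
tokenizer = FullTokenizer(vocab_file=vocab_file, do_lower_case=True)

tokens_a = tokenizer.tokenize("MegEngine reimplements BERT.")
tokens_b = tokenizer.tokenize("BERT has been reimplemented with MegEngine.")

# Google's convention: [CLS] sentence A [SEP] sentence B [SEP]
tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
input_ids = tokenizer.convert_tokens_to_ids(tokens)
attention_mask = [1] * len(input_ids)

# Zero-pad everything to the fixed sequence length.
padding = MAX_LEN - len(input_ids)
input_ids += [0] * padding
token_type_ids += [0] * padding
attention_mask += [0] * padding

# A batch of size 1, ready to be wrapped into tensors and fed to the model.
input_ids = np.array([input_ids], dtype=np.int32)
token_type_ids = np.array([token_type_ids], dtype=np.int32)
attention_mask = np.array([attention_mask], dtype=np.float32)
```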

### Model Description

We provide simple example code in [models](https://github.com/megengine/models/official/nlp/bert).
This example code fine-tunes the pre-trained `uncased_L-12_H-768_A-12` model on the Microsoft Research Paraphrase Corpus (MRPC) dataset.

Fine-tuning with the original hyper-parameters achieves 84% to 88% accuracy in evaluation.
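
For reference, a minimal fine-tuning loop might look like the sketch below. It assumes a MegEngine release that provides `GradManager` (older versions use a different autodiff/trace API) and a hypothetical `train_loader` that yields already pre-processed numpy batches; the learning rate and epoch count are illustrative, and the official example in the models repo remains the authoritative version.

```python
import megengine as mge
import megengine.optimizer as optim
from megengine.autodiff import GradManager

gm = GradManager().attach(model.parameters())
opt = optim.Adam(model.parameters(), lr=2e-5)  # illustrative learning rate

for epoch in range(3):
    for input_ids, token_type_ids, attention_mask, labels in train_loader:  # hypothetical loader
        with gm:
            # The forward pass returns (logits, loss) when labels are given.
            _, loss = model(
                mge.tensor(input_ids),
                mge.tensor(token_type_ids),
                mge.tensor(attention_mask),
                mge.tensor(labels),
            )
            gm.backward(loss)
        opt.step()
        opt.clear_grad()
```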

### References

 - [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805), Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova;


<!-- section: en_US -->
This repository contains a MegEngine reimplementation of Google's BERT.

We provide the following pre-trained models for users to fine-tune on different downstream tasks.

* `wwm_cased_L-24_H-1024_A-16`
* `wwm_uncased_L-24_H-1024_A-16`
* `cased_L-12_H-768_A-12`
* `cased_L-24_H-1024_A-16`
* `uncased_L-12_H-768_A-12`
* `uncased_L-24_H-1024_A-16`
* `chinese_L-12_H-768_A-12`
* `multi_cased_L-12_H-768_A-12`

The model weights come from Google's pre-trained models and carry the same meaning. Users can easily load a pre-trained BERT model with `megengine.hub` and download the corresponding `vocab.txt` and `bert_config.json`. We also provide a convenient script in [models](https://github.com/megengine/models/official/nlp/bert) that fetches the matching vocabulary, configuration, and pre-trained model directly by task name.

```python
import os
import urllib.request

import megengine.hub as hub

DATA_URL = 'https://data.megengine.org.cn/models/weights/bert'
CONFIG_NAME = 'bert_config.json'
VOCAB_NAME = 'vocab.txt'
MODEL_NAME = {
    'wwm_cased_L-24_H-1024_A-16': 'wwm_cased_L_24_H_1024_A_16',
    'wwm_uncased_L-24_H-1024_A-16': 'wwm_uncased_L_24_H_1024_A_16',
    'cased_L-12_H-768_A-12': 'cased_L_12_H_768_A_12',
    'cased_L-24_H-1024_A-16': 'cased_L_24_H_1024_A_16',
    'uncased_L-12_H-768_A-12': 'uncased_L_12_H_768_A_12',
    'uncased_L-24_H-1024_A-16': 'uncased_L_24_H_1024_A_16',
    'chinese_L-12_H-768_A-12': 'chinese_L_12_H_768_A_12',
    'multi_cased_L-12_H-768_A-12': 'multi_cased_L_12_H_768_A_12'
}

def download_file(url, filename):
    # Download a remote file to a local path.
    urllib.request.urlretrieve(url, filename)

def create_hub_bert(model_name, pretrained):
    assert model_name in MODEL_NAME, '{} not in the valid models {}'.format(model_name, MODEL_NAME)
    data_dir = './{}'.format(model_name)
    if not os.path.exists(data_dir):
        os.makedirs(data_dir)

    vocab_url = '{}/{}/{}'.format(DATA_URL, model_name, VOCAB_NAME)
    config_url = '{}/{}/{}'.format(DATA_URL, model_name, CONFIG_NAME)

    vocab_file = './{}/{}'.format(model_name, VOCAB_NAME)
    config_file = './{}/{}'.format(model_name, CONFIG_NAME)

    # Fetch the vocabulary and configuration that match the checkpoint.
    download_file(vocab_url, vocab_file)
    download_file(config_url, config_file)

    # BertConfig is defined alongside the BERT model in the models repository
    # (official/nlp/bert); import it from there before using this helper.
    config = BertConfig(config_file)

    # Load the (optionally pre-trained) BertModel through megengine.hub.
    model = hub.load(
        "megengine/models",
        MODEL_NAME[model_name],
        pretrained=pretrained,
    )

    return model, config, vocab_file
```

To make the pre-trained models easier to use, we only keep the `BertModel` part of the original BERT. The pre-trained `bert` model can then be passed into another model's constructor and used as one of its components.


```python
from megengine.functional import cross_entropy_with_softmax
from megengine.module import Dropout, Linear, Module

class BertForSequenceClassification(Module):
    def __init__(self, config, num_labels, bert):
        super().__init__()
        self.bert = bert
        self.num_labels = num_labels
        self.dropout = Dropout(config.hidden_dropout_prob)
        self.classifier = Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None):
        # Use the pooled [CLS] representation for sentence-level classification.
        _, pooled_output = self.bert(
            input_ids, token_type_ids,
            attention_mask, output_all_encoded_layers=False)
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        if labels is not None:
            loss = cross_entropy_with_softmax(
                logits.reshape(-1, self.num_labels),
                labels.reshape(-1))
            return logits, loss
        else:
            return logits, None

bert, config, vocab_file = create_hub_bert('uncased_L-12_H-768_A-12', pretrained=True)
model = BertForSequenceClassification(config, num_labels=2, bert=bert)
```

All pre-trained models expect the data to be pre-processed correctly. The requirements are consistent with Google's BERT; for details, please refer to the original [bert](https://github.com/google-research/bert) repository or to our examples in [models](https://github.com/megengine/models/official/nlp/bert).
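
As a rough illustration, the sketch below builds one sentence-pair example in the `[CLS] A [SEP] B [SEP]` layout that BERT expects. It assumes the `FullTokenizer` from Google's BERT `tokenization.py` (an equivalent copy ships with the models repo); the import path, `MAX_LEN`, and the sample sentences are illustrative only, and `vocab_file` is the path returned by `create_hub_bert` above.

```python
import numpy as np
from tokenization import FullTokenizer  # hypothetical import path; see the models repo

MAX_LEN = 128  # illustrative maximum sequence length

# do_lower_case=True matches the uncased checkpoints; use False for cased ones.
tokenizer = FullTokenizer(vocab_file=vocab_file, do_lower_case=True)

tokens_a = tokenizer.tokenize("MegEngine reimplements BERT.")
tokens_b = tokenizer.tokenize("BERT has been reimplemented with MegEngine.")

# Google's convention: [CLS] sentence A [SEP] sentence B [SEP]
tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
input_ids = tokenizer.convert_tokens_to_ids(tokens)
attention_mask = [1] * len(input_ids)

# Zero-pad everything to the fixed sequence length.
padding = MAX_LEN - len(input_ids)
input_ids += [0] * padding
token_type_ids += [0] * padding
attention_mask += [0] * padding

# A batch of size 1, ready to be wrapped into tensors and fed to the model.
input_ids = np.array([input_ids], dtype=np.int32)
token_type_ids = np.array([token_type_ids], dtype=np.int32)
attention_mask = np.array([attention_mask], dtype=np.float32)
```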


### Model Description
We provide example code in [models](https://github.com/megengine/models/official/nlp/bert).
This example code fine-tunes the pre-trained `uncased_L-12_H-768_A-12` model on the Microsoft Research Paraphrase Corpus (MRPC) dataset.

Fine-tuning with the original implementation's hyper-parameters gives evaluation accuracy between 84% and 88%.
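
For reference, a minimal fine-tuning loop might look like the sketch below. It assumes a MegEngine release that provides `GradManager` (older versions use a different autodiff/trace API) and a hypothetical `train_loader` that yields already pre-processed numpy batches; the learning rate and epoch count are illustrative, and the official example in the models repo remains the authoritative version.

```python
import megengine as mge
import megengine.optimizer as optim
from megengine.autodiff import GradManager

gm = GradManager().attach(model.parameters())
opt = optim.Adam(model.parameters(), lr=2e-5)  # illustrative learning rate

for epoch in range(3):
    for input_ids, token_type_ids, attention_mask, labels in train_loader:  # hypothetical loader
        with gm:
            # The forward pass returns (logits, loss) when labels are given.
            _, loss = model(
                mge.tensor(input_ids),
                mge.tensor(token_type_ids),
                mge.tensor(attention_mask),
                mge.tensor(labels),
            )
            gm.backward(loss)
        opt.step()
        opt.clear_grad()
```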


### References
 - [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805), Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova;