# How to Write a PaddleHub Module and Go Live

## I. Preparation

### Basic Model Information

We are going to write a PaddleHub Module with the following basic information:

```yaml
name: senta_test
version: 1.0.0
summary: This is a PaddleHub Module. Just for test.
author: anonymous
author_email:
type: nlp/sentiment_analysis
```

**For the sample code, see [senta\_module\_sample](../../demo/senta_module_sample/senta_test).**

The Module exposes one interface, sentiment\_classify, which takes input text and returns its sentiment (positive or negative). It can be called from Python or from the command line.

```python
import paddlehub as hub

senta_test = hub.Module(name="senta_test")
senta_test.sentiment_classify(texts=["这部电影太差劲了"])
```

```cmd
hub run senta_test --input_text 这部电影太差劲了
```

<br/>

### Strategy

To keep the sample code simple, we use a very simple sentiment strategy: if the input text contains any word from the vocabulary list, the text is judged negative; otherwise it is judged positive.

<br/>

## II. Create Module

### Step 1: Create the necessary directories and files.

Create a senta\_test directory. Then, create module.py, processor.py, and vocab.list in the senta\_test directory, respectively.

| File Name    | Purpose                                                      |
| ------------ | ------------------------------------------------------------ |
| module.py    | The main module; it provides the implementation code of the Module. |
| processor.py | The helper module; it provides a way to load the vocabulary list. |
| vocab.list   | It stores the vocabulary.                                    |

```cmd
➜  tree senta_test
senta_test/
├── vocab.list
├── module.py
└── processor.py
```

### Step 2: Implement the helper module processor.

Implement a load\_vocab interface in processor.py to read the vocabulary list.

```python
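def load_vocab(vocab_path):
    # Read as UTF-8 so non-ASCII words load the same way on every platform.
    with open(vocab_path, encoding="utf-8") as file:
        # Words are separated by any whitespace (spaces or newlines).
        return file.read().split()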
```
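As a rough illustration of the file format this helper expects, the sketch below writes a temporary vocab.list and loads it back. The two vocabulary words are hypothetical and are not taken from the sample Module, and the explicit UTF-8 encoding is an assumption made so the example behaves the same on every platform.

```python
import os
import tempfile


def load_vocab(vocab_path):
    # Assumption: the vocabulary file is UTF-8 encoded.
    with open(vocab_path, encoding="utf-8") as file:
        # split() with no arguments splits on any whitespace,
        # so one word per line or several words per line both work.
        return file.read().split()


with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.list")
    # Hypothetical vocabulary content, one word per line.
    with open(vocab_path, "w", encoding="utf-8") as file:
        file.write("差劲\n糟糕\n")
    vocab = load_vocab(vocab_path)

print(vocab)  # ['差劲', '糟糕']
```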

### Step 3: Write the Module processing code.

The module.py file holds the entry code of the Module. The prediction logic is implemented there.

#### Step 3\_1. Import the necessary modules

```python
import argparse
import os

import paddlehub as hub
from paddlehub.module.module import runnable, moduleinfo, serving

from senta_test.processor import load_vocab
```

**NOTE:** When importing another file of the Module, use its full path, for example, senta\_test.processor.

#### Step 3\_2. Define the SentaTest class.

module.py must contain a class that inherits from hub.Module. This class implements the prediction logic and supplies the Module's basic information through moduleinfo. When hub.Module(name="senta\_test") is used to load the Module, PaddleHub automatically creates a SentaTest object and returns it.

```python
@moduleinfo(
    name="senta_test",
    version="1.0.0",
    summary="This is a PaddleHub Module. Just for test.",
    author="anonymous",
    author_email="",
    type="nlp/sentiment_analysis",
)
class SentaTest:
    ...
```
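To give a feel for how a class decorator can attach this kind of metadata, here is a minimal sketch. `moduleinfo_sketch` and `SentaTestSketch` are hypothetical names used only for illustration; this is not PaddleHub's implementation of moduleinfo, which may do considerably more.

```python
def moduleinfo_sketch(**info):
    """Hypothetical stand-in for moduleinfo: copy keyword arguments onto the class."""

    def wrapper(cls):
        for key, value in info.items():
            setattr(cls, key, value)
        return cls

    return wrapper


@moduleinfo_sketch(
    name="senta_test",
    version="1.0.0",
    type="nlp/sentiment_analysis",
)
class SentaTestSketch:
    pass


# The metadata is now readable as class attributes.
print(SentaTestSketch.name, SentaTestSketch.version)  # senta_test 1.0.0
```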

#### Step 3\_3. Perform necessary initialization.

```python
def __init__(self):
    # add arg parser
    self.parser = argparse.ArgumentParser(
        description="Run the senta_test module.",
        prog='hub run senta_test',
        usage='%(prog)s',
        add_help=True)
    self.parser.add_argument(
        '--input_text', type=str, default=None, help="text to predict")

    # load word dict
    vocab_path = os.path.join(self.directory, "vocab.list")
    self.vocab = load_vocab(vocab_path)
```

**NOTE:** The Module object has a built-in directory attribute by default, so you can get the path of the Module directly.

#### Step 3\_4. Implement the prediction logic.

```python
def sentiment_classify(self, texts):
    results = []
    for text in texts:
        sentiment = "positive"
        for word in self.vocab:
            if word in text:
                sentiment = "negative"
                break
        results.append({"text":text, "sentiment":sentiment})

    return results
```
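The matching rule can also be tried in isolation, without PaddleHub. The sketch below expresses it as a standalone function; `classify` and `sample_vocab` are hypothetical names, and the vocabulary words are made up for illustration rather than taken from the sample vocab.list.

```python
def classify(text, vocab):
    """Return "negative" if any vocabulary word occurs in text, else "positive"."""
    return "negative" if any(word in text for word in vocab) else "positive"


# Hypothetical vocabulary, for illustration only.
sample_vocab = ["差劲", "糟糕"]

print(classify("这部电影太差劲了", sample_vocab))  # negative
print(classify("这部电影很好看", sample_vocab))  # positive
```

Note that the check is a plain substring match, so a vocabulary word is detected even when it appears inside a longer word.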

#### Step 3\_5. Support command-line invocation.

If you want the Module to support command-line invocation, provide an interface decorated with runnable that parses the incoming arguments, makes the prediction, and returns the results.

If you do not want to provide command-line prediction, simply omit this interface. PaddleHub detects that the Module does not support the command line and prints a hint when it is run that way.

```python
@runnable
def run_cmd(self, argvs):
    args = self.parser.parse_args(argvs)
    texts = [args.input_text]
    return self.sentiment_classify(texts)
```
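The parsing step can be checked on its own with plain argparse. The sketch below rebuilds the same parser and feeds it a list of strings. That `argvs` arrives as the list of arguments following the module name is an assumption based on how run\_cmd uses it.

```python
import argparse

# Same parser configuration as in the initialization step.
parser = argparse.ArgumentParser(
    description="Run the senta_test module.",
    prog="hub run senta_test",
    usage="%(prog)s",
    add_help=True)
parser.add_argument(
    "--input_text", type=str, default=None, help="text to predict")

# Assumed shape of argvs for: hub run senta_test --input_text 这部电影太差劲了
argvs = ["--input_text", "这部电影太差劲了"]
args = parser.parse_args(argvs)
texts = [args.input_text]
print(texts)  # ['这部电影太差劲了']

# Without --input_text the default is None, so texts would be [None].
missing = parser.parse_args([]).input_text
print(missing is None)  # True
```

Because the default is None, a Module may want to check for a missing --input_text before calling the prediction interface.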

#### Step 3\_6. Support serving invocation.

If you want the Module to support deployment as a prediction service with PaddleHub Serving, provide an interface decorated with serving that parses the incoming data, makes the prediction, and returns the results.

If you do not want to support PaddleHub Serving, do not add the serving decorator.

```python
@serving
def sentiment_classify(self, texts):
    results = []
    for text in texts:
        sentiment = "positive"
        for word in self.vocab:
            if word in text:
                sentiment = "negative"
                break
        results.append({"text":text, "sentiment":sentiment})

    return results
```
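Once the Module is served, a client would typically send it a JSON request. The sketch below only builds such a request and does not send it. The address, port 8866, the /predict/senta\_test route, and the idea that the JSON keys map to the keyword arguments of the decorated method are all assumptions about a default local PaddleHub Serving setup (for example, one started with `hub serving start -m senta_test`); check them against your PaddleHub version.

```python
import json
import urllib.request

# Assumptions: default local address, port 8866, /predict/<module name> route.
url = "http://127.0.0.1:8866/predict/senta_test"

# Keys are assumed to match the keyword arguments of the @serving method.
payload = {"texts": ["这部电影太差劲了"]}
body = json.dumps(payload).encode("utf-8")

request = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST")

print(request.get_method(), request.full_url)

# To actually send it once a server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read().decode("utf-8")))
```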

### Complete Code

* [module.py](../../../modules/demo/senta_test/module.py)

* [processor.py](../../../modules/demo/senta_test/processor.py)

<br/>

## III. Install and Test the Module

After writing a module, we can test it in the following ways:

### Call Method 1

Install the Module on the local machine, and then load it through hub.Module(name=...).

```shell
➜  hub install senta_test
```

```python
import paddlehub as hub

senta_test = hub.Module(name="senta_test")
senta_test.sentiment_classify(texts=["这部电影太差劲了"])
```

### Call Method 2

Load the Module directly through hub.Module(directory=...).

```python
import paddlehub as hub

senta_test = hub.Module(directory="senta_test/")
senta_test.sentiment_classify(texts=["这部电影太差劲了"])
```

### Call Method 3

Add senta\_test to the PYTHONPATH environment variable, and then load the SentaTest object directly.

```shell
export PYTHONPATH=senta_test:$PYTHONPATH
```

```python
from senta_test.module import SentaTest

# sentiment_classify is an instance method, so create an object first.
senta_test = SentaTest()
senta_test.sentiment_classify(texts=["这部电影太差劲了"])
```

### Call Method 4

Install the Module on the local machine and run it through hub run.

```shell
➜  hub install senta_test
➜  hub run senta_test --input_text "这部电影太差劲了"
```

## IV. Release Module

After developing and testing the Module, if you want to share it with others, you can release it by **uploading the Module to the PaddleHub website**:

https://www.paddlepaddle.org.cn/hub

We will review the Module and give feedback as quickly as possible. Once it passes review and goes online, the Module is displayed on the PaddleHub website, and users can load it like any other official Module.