- PaddleHub provides several open-source pretrained word embedding models. These models differ in training corpus, training method, and embedding dimension. For more information, please refer to: [Summary of embedding models](https://github.com/PaddlePaddle/models/blob/release/2.0-beta/PaddleNLP/docs/embeddings.md)
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart_en.md) | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart_en.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart_en.md)
# Calculate the cosine similarity of two word vectors
embedding.cosine_sim("中国", "美国")
# Calculate the inner product of two word vectors
embedding.dot("中国", "美国")
```
- ### 2、API
```python
def __init__(
    *args,
    **kwargs
)
```
- Construct an embedding module object. No parameters are required by default.
- **Parameters**
- `*args`: Arguments specified by the user.
- `**kwargs`: Keyword arguments specified by the user.
- For more information, please refer to [paddlenlp.embeddings](https://github.com/PaddlePaddle/models/tree/release/2.0-beta/PaddleNLP/paddlenlp/embeddings)
```python
def search(
    words: Union[List[str], str, int],
)
```
- Return the embedding of one or more words. The input can be a `str` (a single word), a `List[str]` (multiple words), or an `int` (a word id). Word ids are defined by the model's vocabulary, which is available through the `vocab` attribute.
- **Parameters**
- `words`: input words or word id.
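As a rough illustration of how the three accepted input types might be handled, the sketch below uses a made-up three-word vocabulary and hand-written vectors. `toy_vocab`, `toy_vectors`, and `toy_search` are hypothetical names used only for illustration; they are not part of the module, which looks words up in the pretrained model's own vocabulary. The fallback to an unknown token for out-of-vocabulary words is an assumption here.

```python
from typing import List, Union

# Hypothetical toy data: a word -> id vocabulary and one vector per id.
toy_vocab = {"[UNK]": 0, "中国": 1, "美国": 2}
toy_vectors = [
    [0.0, 0.0],  # id 0: unknown token
    [1.0, 2.0],  # id 1
    [2.0, 1.0],  # id 2
]


def toy_search(words: Union[List[str], str, int]) -> List[List[float]]:
    """Return one vector per requested word or word id."""
    if isinstance(words, int):
        # An int is treated as a word id and indexes the table directly.
        return [toy_vectors[words]]
    if isinstance(words, str):
        # A single word is handled as a one-element list.
        words = [words]
    # Assumption: out-of-vocabulary words fall back to the unknown token.
    unk_id = toy_vocab["[UNK]"]
    return [toy_vectors[toy_vocab.get(word, unk_id)] for word in words]
```

With this toy table, `toy_search("中国")`, `toy_search(["中国", "美国"])`, and `toy_search(2)` each return a list holding one vector per requested entry.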
```python
def cosine_sim(
    word_a: str,
    word_b: str,
)
```
- Calculate the cosine similarity of two word vectors. `word_a` and `word_b` should be in the vocab; otherwise they are replaced by `unknown_token`.
- **Parameters**
- `word_a`: input word a.
- `word_b`: input word b.
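To make the computation and the unknown-token fallback described above concrete, here is a minimal sketch on hand-written two-dimensional vectors. The names `toy_vectors` and `toy_cosine_sim` and the `"[UNK]"` key are hypothetical; the module's own implementation and its actual unknown token may differ.

```python
import math

# Hypothetical toy vectors; real vectors come from the pretrained model.
toy_vectors = {
    "[UNK]": [1.0, 0.0],
    "中国": [1.0, 2.0],
    "美国": [2.0, 1.0],
}


def toy_cosine_sim(word_a: str, word_b: str) -> float:
    """Cosine similarity, with unknown-token fallback for missing words."""
    vec_a = toy_vectors.get(word_a, toy_vectors["[UNK]"])
    vec_b = toy_vectors.get(word_b, toy_vectors["[UNK]"])
    inner = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    return inner / (norm_a * norm_b)
```

In this sketch a word missing from `toy_vectors` yields the same score as passing `"[UNK]"` directly, which is one reason to check that both words are in the vocab before interpreting a similarity value.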
```python
def dot(
    word_a: str,
    word_b: str,
)
```
- Calculate the inner product of two word vectors. `word_a` and `word_b` should be in the vocab; otherwise they are replaced by `unknown_token`.
- **Parameters**
- `word_a`: input word a.
- `word_b`: input word b.
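Unlike cosine similarity, the inner product is not normalized by vector length, so it grows with the magnitude of the vectors. The sketch below shows this on hand-written vectors; `inner_product` and `cosine` are hypothetical helpers for illustration, not the module's code.

```python
import math


def inner_product(vec_a, vec_b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(vec_a, vec_b))


def cosine(vec_a, vec_b):
    """Inner product divided by the product of the two vector norms."""
    norm_a = math.sqrt(inner_product(vec_a, vec_a))
    norm_b = math.sqrt(inner_product(vec_b, vec_b))
    return inner_product(vec_a, vec_b) / (norm_a * norm_b)


vec_a = [1.0, 2.0]
vec_b = [2.0, 1.0]
scaled_b = [10 * y for y in vec_b]  # same direction, ten times longer

# The inner product scales with vector length; the cosine does not.
plain = inner_product(vec_a, vec_b)
scaled = inner_product(vec_a, scaled_b)
```

Here `scaled` is ten times `plain`, while `cosine(vec_a, vec_b)` and `cosine(vec_a, scaled_b)` agree up to floating-point rounding.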
```python
def get_vocab_path()
```
- Get the path of the local vocab file.
```python
def get_tokenizer(*args, **kwargs)
```
- Get the tokenizer of the current model. It returns an instance of JiebaTokenizer; currently only Chinese embedding models are supported.
- **Parameters**
- `*args`: Arguments specified by the user.
- `**kwargs`: Keyword arguments specified by the user.
- For more information about the arguments, please refer to [paddlenlp.data.tokenizer.JiebaTokenizer](https://github.com/PaddlePaddle/models/blob/release/2.0-beta/PaddleNLP/paddlenlp/data/tokenizer.py)
- For more information about the usage, please refer to [paddlenlp.embeddings](https://github.com/PaddlePaddle/models/tree/release/2.0-beta/PaddleNLP/paddlenlp/embeddings)
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of cosine similarity calculation.