Add README_en.md

51c6d70d · linjieccc · b3f19bfc · 51c6d70d
隐藏空白更改
内联并排

Showing with 178 addition and 0 deletion

modules/text/embedding/fasttext_crawl_target_word-word_dim300_en/README_en.md ...ng/fasttext_crawl_target_word-word_dim300_en/README_en.md +178 -0

未找到文件。
--- a/modules/text/embedding/fasttext_crawl_target_word-word_dim300_en/README_en.md
+++ b/modules/text/embedding/fasttext_crawl_target_word-word_dim300_en/README_en.md
+# fasttext_crawl_target_word-word_dim300_en
+|Module Name|fasttext_crawl_target_word-word_dim300_en|
+| :--- | :---: | 
+|Category|Word Embedding|
+|Network|fasttext|
+|Dataset|crawl|
+|Fine-tuning supported or not|No|
+|Module size|1.19GB|
+|Vocab size|2,000,002|
+|Lastest update date|26 Feb, 2021|
+|Data indicators|-|
+
+## I. Basic Information
+
+- ### Module Introduction
+
+    - PaddleHub provides several open source pretrained word embedding model. These embedding models are distinguished according to the corpus, training methods and word embedding dimensions. For more informations, please refer to: [Summary of embedding models](https://github.com/PaddlePaddle/models/blob/release/2.0-beta/PaddleNLP/docs/embeddings.md)
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0    | [PaddleHub Installation Guide](../../../../docs/docs_ch/get_start/installation_en.rst)
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install fasttext_crawl_target_word-word_dim300_en
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart_en.md) | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart_en.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart_en.md)
+
+## III. Module API Prediction
+
+- ### 1. Prediction Code Example
+
+  - ```
+    import paddlehub as hub
+    embedding = hub.Module(name='fasttext_crawl_target_word-word_dim300_en')
+
+    # Get the embedding of the word
+    embedding.search("中国")
+    # Calculate the cosine similarity of two word vectors
+    embedding.cosine_sim("中国", "美国")
+    # Calculate the inner product of two word vectors
+    embedding.dot("中国", "美国")
+    ```
+
+- ### 2、API
+
+  - ```python
+    def __init__(
+        *args,
+        **kwargs
+    )
+    ```
+
+    - Construct an embedding module object without parameters by default.
+
+    - **Parameters**
+      - `*args`： Arguments specified by the user.
+      - `**kwargs`：Keyword arguments specified by the user.
+
+    - More info[paddlenlp.embeddings](https://github.com/PaddlePaddle/models/tree/release/2.0-beta/PaddleNLP/paddlenlp/embeddings)
+
+
+  - ```python
+    def search(
+        words: Union[List[str], str, int],
+    )
+    ```
+
+    - Return the embedding of one or multiple words. The input data type can be `str`, `List[str]` and `int`, represent word, multiple words and the embedding of specified word id accordingly. Word id is related to the model vocab, vocab can be obtained by the attribute of `vocab`.
+
+    - **参数**
+      - `words`： input words or word id.
+
+
+  - ```python
+    def cosine_sim(
+        word_a: str,
+        word_b: str,
+    )
+    ```
+
+    - Cosine similarity calculation. `word_a` and `word_b` should be in the voacb, or they will be replaced by `unknown_token`. 
+
+    - **参数**
+      - `word_a`： input word a.
+      - `word_b`： input word b.
+
+
+  - ```python
+    def dot(
+        word_a: str,
+        word_b: str,
+    )
+    ```
+
+    - Inner product calculation. `word_a` and `word_b` should be in the voacb, or they will be replaced by `unknown_token`. 
+
+    - **参数**
+      - `word_a`： input word a.
+      - `word_b`： input word b.
+
+
+  - ```python
+    def get_vocab_path()
+    ```
+
+    - Get the path of the local vocab file.
+
+
+  - ```python
+    def get_tokenizer(*args, **kwargs)
+    ```
+
+    - Get the tokenizer of current model, it will return an instance of JiebaTokenizer, only supports the chinese embedding models currently.
+
+    - **参数**
+      - `*args`: Arguments specified by the user.
+      - `**kwargs`: Keyword arguments specified by the user.
+    
+    - For more information about the arguments, please refer to[paddlenlp.data.tokenizer.JiebaTokenizer](https://github.com/PaddlePaddle/models/blob/release/2.0-beta/PaddleNLP/paddlenlp/data/tokenizer.py)
+
+  - For more information about the usage, please refer to[paddlenlp.embeddings](https://github.com/PaddlePaddle/models/tree/release/2.0-beta/PaddleNLP/paddlenlp/embeddings)
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of cosine similarity calculation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m fasttext_crawl_target_word-word_dim300_en
+    ```
+
+  - The servitization API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set `CUDA_VISIBLE_DEVICES` environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+  - ```python
+    import requests
+    import json
+
+    # Specify the word pairs used to calculate the cosine similarity [[word_a, word_b], [word_a, word_b], ... ]]
+    word_pairs = [["中国", "美国"], ["今天", "明天"]]
+    data = {"data": word_pairs}
+    # Send an HTTP request
+    url = "http://127.0.0.1:8866/predict/fasttext_crawl_target_word-word_dim300_en"
+    headers = {"Content-Type": "application/json"}
+
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+    print(r.json())
+    ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+* 1.0.1
+
+  Model optimization
+  - ```shell
+    $ hub install fasttext_crawl_target_word-word_dim300_en==1.0.1
+    ```
\ No newline at end of file