Merge branch 'release/v1.2' into develop

d5a4a7f5 · zhangxuefei · 12555ef0 · 2d1fbd26 · d5a4a7f5 · d5a4a7f5
34 changed files
--- a/README.md
+++ b/README.md
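@@ -9,26 +9,28 @@ PaddleHub is a pre-trained model management and transfer learning
 * Easy access to all pre-trained models in the PaddlePaddle ecosystem, covering mainstream models for image classification, object detection, lexical analysis, semantic models, sentiment analysis, language models, video classification, image generation, and image segmentation.
  * For more details, see the official site: https://www.paddlepaddle.org.cn/hub
 * With the PaddleHub Fine-tune API, transfer learning on **large-scale pre-trained models** takes only a small amount of code. See the following demos: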
-  * [Text classification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.1.0/demo/text-classification)
+  * [Text classification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.2/demo/text-classification)
-  * [Sequence labeling](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.1.0/demo/sequence-labeling)
+  * [Sequence labeling](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.2/demo/sequence-labeling)
-  * [Multi-label classification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.1.0/demo/multi-label-classification)
+  * [Multi-label classification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.2/demo/multi-label-classification)
-  * [Image classification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.1.0/demo/image-classification)
+  * [Image classification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.2/demo/image-classification)
-  * [Retrieval-based question answering](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.1.0/demo/qa_classification)
+  * [Retrieval-based question answering](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.2/demo/qa_classification)
-  * [Regression](https://github.com/PaddlePaddle/PaddleHub/tree/develop/demo/sentence_similarity)
+  * [Regression](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.2/demo/sentence_similarity)
-  * [Sentence semantic similarity](https://github.com/PaddlePaddle/PaddleHub/tree/develop/demo/sentence_similarity)
+  * [Sentence semantic similarity](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.2/demo/sentence_similarity)
-  * [Reading comprehension](https://github.com/PaddlePaddle/PaddleHub/tree/develop/demo/reading-comprehension)
+  * [Reading comprehension](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.2/demo/reading-comprehension)
+* PaddleHub supports hyperparameter optimization (Auto Fine-tune): given a fine-tune task script and a hyperparameter search range, Auto Fine-tune returns a good hyperparameter combination for that task.
+  * [Tutorial: using PaddleHub's autofinetune hyperparameter optimization](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/tutorial/autofinetune.md)
 * PaddleHub adopts a "**model as software**" design philosophy: pre-trained models can run prediction with a single step through the Python API or the command-line tool, which makes the PaddlePaddle model zoo easier to use.
  * [Introduction to the PaddleHub command-line tool](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%B7%A5%E5%85%B7)
 ## Table of Contents
-* [Installation](https://github.com/paddlepaddle/paddlehub#%E5%AE%89%E8%A3%85)
+* [Installation](#%E5%AE%89%E8%A3%85)
-* [Quick Start](https://github.com/paddlepaddle/paddlehub#%E5%BF%AB%E9%80%9F%E4%BD%93%E9%AA%8C)
+* [Quick Start](#%E5%BF%AB%E9%80%9F%E4%BD%93%E9%AA%8C)
-* [Tutorials](https://github.com/paddlepaddle/paddlehub#%E6%95%99%E7%A8%8B)
+* [Tutorials](#%E6%95%99%E7%A8%8B)
-* [FAQ](https://github.com/paddlepaddle/paddlehub#faq)
+* [FAQ](#faq)
-* [User Community Groups](https://github.com/paddlepaddle/paddlehub#%E7%94%A8%E6%88%B7%E4%BA%A4%E6%B5%81%E7%BE%A4)
+* [User Community Groups](#%E7%94%A8%E6%88%B7%E4%BA%A4%E6%B5%81%E7%BE%A4)
-* [Release History](https://github.com/paddlepaddle/paddlehub#%E6%9B%B4%E6%96%B0%E5%8E%86%E5%8F%B2)
+* [Release History](#%E6%9B%B4%E6%96%B0%E5%8E%86%E5%8F%B2)
 ## Installation
@@ -74,7 +76,7 @@ $ hub run ssd_mobilenet_v1_pascal --input_path test_object_detection.jpg
 $ hub run yolov3_coco2017 --input_path test_object_detection.jpg
 $ hub run faster_rcnn_coco2017 --input_path test_object_detection.jpg
 ```
-![SSD detection result](https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.0.0/docs/imgs/object_detection_result.png)
+![SSD detection result](https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.2/docs/imgs/object_detection_result.png)
 Beyond the three model types above, PaddleHub has also released mainstream models for language modeling, semantic modeling, image classification, generation, and video classification. For the full list of released models, visit https://www.paddlepaddle.org.cn/hub
@@ -103,11 +105,9 @@ For how PaddleHub performs transfer learning, see the [wiki tutorial](https://github.com/
 For how to define a custom transfer task in PaddleHub, see the [wiki tutorial](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub:-%E8%87%AA%E5%AE%9A%E4%B9%89Task)
-For how to use PaddleHub's hyperparameter optimization, see the [autofinetune tutorial](https://github.com/PaddlePaddle/PaddleHub/blob/develop/tutorial/autofinetune.md)
+For how to use PaddleHub's hyperparameter optimization, see the [autofinetune tutorial](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/tutorial/autofinetune.md)
-For how to compute text similarity end to end with PaddleHub, see the [word2vce tutorial](https://github.com/PaddlePaddle/PaddleHub/blob/develop/tutorial/sentence_sim.ipynb)
+For how to fine-tune PaddleHub pre-trained models with the ULMFiT strategy, see [PaddleHub Transfer Learning and the ULMFiT Fine-tuning Strategy](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/tutorial/strategy_exp.md)
-For how to fine-tune PaddleHub pre-trained models with the ULMFiT strategy, see [PaddleHub Transfer Learning and the ULMFiT Fine-tuning Strategy](https://github.com/PaddlePaddle/PaddleHub/blob/develop/tutorial/strategy_exp.md)
 ## FAQ
@@ -155,4 +155,4 @@ print(res)
 ## Release History
-For details, see the [release history](https://github.com/PaddlePaddle/PaddleHub/blob/develop/RELEASE.md)
+For details, see the [release history](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/RELEASE.md)
--- a/demo/elmo/README.md
+++ b/demo/elmo/README.md
@@ -41,7 +41,7 @@ reader = hub.reader.LACClassifyReader(
    vocab_path=module.get_vocab_path())
 ```
-For the dataset preparation code, see [chnsenticorp.py](https://github.com/PaddlePaddle/PaddleHub/blob/develop/paddlehub/dataset/chnsenticorp.py)
+For the dataset preparation code, see [chnsenticorp.py](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/paddlehub/dataset/chnsenticorp.py)
 `hub.dataset.ChnSentiCorp()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user's home directory

--- a/demo/image-classification/README.md
+++ b/demo/image-classification/README.md
@@ -2,7 +2,7 @@
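 ## About
-This example shows how to complete a classification task using the PaddleHub Finetune API and the [image classification](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) pre-trained models.
+This example shows how to complete a classification task using the PaddleHub Finetune API and the [image classification](https://github.com/PaddlePaddle/models/tree/release/v1.2/PaddleCV/image_classification) pre-trained models.
 ## Preparation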

--- a/demo/multi-label-classification/README.md
+++ b/demo/multi-label-classification/README.md
@@ -4,7 +4,7 @@
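 ## How to Start Fine-tuning
-After installing PaddlePaddle and PaddleHub, run the script `sh run_classifier.sh` to start fine-tuning BERT on the Toxic dataset. **Because BERT is computationally expensive, running on a GPU with more than 14 GB of memory is recommended.**
+After installing PaddlePaddle and PaddleHub, run the script `sh run_classifier.sh` to start fine-tuning BERT on the Toxic dataset.
 The script parameters are described below: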
@@ -42,7 +42,7 @@ reader = hub.reader.MultiLabelClassifyReader(
    max_seq_len=128)
 ```
-For the dataset preparation code, see [toxic.py](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.0.0/paddlehub/dataset/toxic.py)
+For the dataset preparation code, see [toxic.py](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/paddlehub/dataset/toxic.py)
 `hub.dataset.Toxic()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user's home directory

--- a/demo/qa_classification/README.md
+++ b/demo/qa_classification/README.md
@@ -4,7 +4,7 @@
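 ## How to Start Fine-tuning
-After installing PaddlePaddle and PaddleHub, run the script `sh run_classifier.sh` to start fine-tuning ERNIE on the NLPCC-DBQA dataset. **Because ERNIE is computationally expensive, running on a GPU with more than 14 GB of memory is recommended.**
+After installing PaddlePaddle and PaddleHub, run the script `sh run_classifier.sh` to start fine-tuning ERNIE on the NLPCC-DBQA dataset.
 The script parameters are described below: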
@@ -61,7 +61,7 @@ reader = hub.reader.ClassifyReader(
    max_seq_len=128)
 ```
-For the dataset preparation code, see [nlpcc_dbqa.py](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.0.0/paddlehub/dataset/nlpcc_dbqa.py)
+For the dataset preparation code, see [nlpcc_dbqa.py](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/paddlehub/dataset/nlpcc_dbqa.py)
 `hub.dataset.NLPCC_DBQA()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user's home directory

--- a/demo/reading-comprehension/README.md
+++ b/demo/reading-comprehension/README.md
@@ -4,7 +4,7 @@
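 ## How to Start Fine-tuning
-After installing PaddlePaddle and PaddleHub, run the script `sh run_finetune.sh` to start fine-tuning BERT on the SQuAD dataset. **Because BERT is computationally expensive, running on a GPU with more than 14 GB of memory is recommended.**
+After installing PaddlePaddle and PaddleHub, run the script `sh run_finetune.sh` to start fine-tuning BERT on the SQuAD dataset.
 The script parameters are described below: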
@@ -48,7 +48,7 @@ reader = hub.reader.ReadingComprehensionReader(
    max_query_length=64)
 ```
-For the dataset preparation code, see [squad.py](https://github.com/PaddlePaddle/PaddleHub/blob/develop/paddlehub/dataset/squad.py)
+For the dataset preparation code, see [squad.py](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/paddlehub/dataset/squad.py)
 `hub.dataset.SQUAD()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user's home directory

--- a/demo/regression/README.md
+++ b/demo/regression/README.md
@@ -5,7 +5,7 @@
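 ## How to Start Fine-tuning
-After installing PaddlePaddle and PaddleHub, run the script `sh run_regression.sh` to start fine-tuning BERT on the GLUE-STSB dataset. **Because ERNIE is computationally expensive, running on a GPU with more than 14 GB of memory is recommended.**
+After installing PaddlePaddle and PaddleHub, run the script `sh run_regression.sh` to start fine-tuning BERT on the GLUE-STSB dataset.
 The script parameters are described below: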
@@ -45,7 +45,7 @@ reader = hub.reader.RegressionReader(
    max_seq_len=args.max_seq_len)
 ```
-For the dataset preparation code, see [glue.py](https://github.com/PaddlePaddle/PaddleHub/blob/develop/paddlehub/dataset/glue.py)
+For the dataset preparation code, see [glue.py](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/paddlehub/dataset/glue.py)
 `hub.dataset.GLUE("STS-B")` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user's home directory

--- a/demo/sequence-labeling/README.md
+++ b/demo/sequence-labeling/README.md
@@ -2,7 +2,7 @@
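 ## How to Start Fine-tuning
-After installing PaddlePaddle and PaddleHub, run the script `sh run_sequence_label.sh` to start fine-tuning ERNIE on the MSRA_NER dataset. **Because ERNIE is computationally expensive, running on a GPU with more than 14 GB of memory is recommended.**
+After installing PaddlePaddle and PaddleHub, run the script `sh run_sequence_label.sh` to start fine-tuning ERNIE on the MSRA_NER dataset.
 The script parameters are described below: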
@@ -57,7 +57,7 @@ reader = hub.reader.SequenceLabelReader(
    max_seq_len=128)
 ```
-For the dataset preparation code, see [msra_ner.py](https://github.com/PaddlePaddle/PaddleHub/blob/develop/paddlehub/dataset/msra_ner.py)
+For the dataset preparation code, see [msra_ner.py](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/paddlehub/dataset/msra_ner.py)
 `hub.dataset.MSRA_NER()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user's home directory

--- a/demo/ssd/README.md
+++ b/demo/ssd/README.md
@@ -4,7 +4,7 @@
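 This example shows how to run prediction with the SSD Module.
-SSD is an object detection model that detects the category and location of objects in an image. The SSD model released by PaddleHub is trained on the pascalvoc dataset and supports detection of 20 categories. For training details, see [SSD](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/object_detection)
+SSD is an object detection model that detects the category and location of objects in an image. The SSD model released by PaddleHub is trained on the pascalvoc dataset and supports detection of 20 categories. For training details, see [SSD](https://github.com/PaddlePaddle/models/tree/release/v1.2/PaddleCV/object_detection)
 ## Preparation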

--- a/demo/text-classification/README.md
+++ b/demo/text-classification/README.md
@@ -21,7 +21,7 @@
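 ## How to Start Fine-tuning
-After installing PaddlePaddle and PaddleHub, run the script `sh run_classifier.sh` to start fine-tuning ERNIE on the ChnSentiCorp dataset. **Because ERNIE is computationally expensive, running on a GPU with more than 14 GB of memory is recommended.**
+After installing PaddlePaddle and PaddleHub, run the script `sh run_classifier.sh` to start fine-tuning ERNIE on the ChnSentiCorp dataset.
 The script parameters are described below: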
@@ -86,7 +86,7 @@ reader = hub.reader.ClassifyReader(
 metrics_choices = ["acc"]
 ```
-For the dataset preparation code, see [chnsenticorp.py](https://github.com/PaddlePaddle/PaddleHub/blob/develop/paddlehub/dataset/chnsenticorp.py)
+For the dataset preparation code, see [chnsenticorp.py](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.2/paddlehub/dataset/chnsenticorp.py)
 `hub.dataset.ChnSentiCorp()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user's home directory

--- a/docs/imgs/bayesian_optimization.gif
+++ b/docs/imgs/bayesian_optimization.gif
--- a/docs/imgs/pbt_optimization.gif
+++ b/docs/imgs/pbt_optimization.gif
--- a/docs/imgs/thermodynamics.gif
+++ b/docs/imgs/thermodynamics.gif
--- a/paddlehub/autofinetune/autoft.py
+++ b/paddlehub/autofinetune/autoft.py
@@ -63,7 +63,7 @@ class BaseTuningStrategy(object):
            self._output_dir = "output_" + time_str
        else:
            self._output_dir = output_dir
-        self.writer = SummaryWriter(logdir=self._output_dir + '/tb_paddle')
+        self.writer = SummaryWriter(logdir=self._output_dir + '/visualization')
    @property
    def thread(self):

--- a/paddlehub/commands/autofinetune.py
+++ b/paddlehub/commands/autofinetune.py
@@ -84,12 +84,12 @@ class AutoFineTuneCommand(BaseCommand):
        self.arg_config_group.add_argument(
            "--evaluate_choice",
            type=str,
-            default="fulltrail",
+            default="modelbased",
            help="Choices: fulltrail or modelbased.")
        self.arg_config_group.add_argument(
            "--tuning_strategy",
            type=str,
-            default="HAZero",
+            default="pshe2",
            help="Choices: HAZero or PSHE2.")
        self.arg_config_group.add_argument(
            'opts',
@@ -184,7 +184,7 @@ class AutoFineTuneCommand(BaseCommand):
            run_round_cnt = run_round_cnt + 1
        print("PaddleHub Autofinetune ends.")
-        with open("./log_file.txt", "w") as f:
+        with open(autoft._output_dir + "/log_file.txt", "w") as f:
            best_hparams = evaluator.convert_params(autoft.get_best_hparams())
            print("The final best hyperparameters:")
            f.write("The final best hyperparameters:\n")
@@ -195,8 +195,8 @@ class AutoFineTuneCommand(BaseCommand):
            f.write("\t".join(autoft.hparams_name_list) +
                    "\tsaved_params_dir\n\n")
            print(
-                "The checkpont directory of programs ran with hyperparamemters searched are saved as log_file.txt ."
+                "Information about the searched hyperparameters is saved in %s/log_file.txt."
-            )
+                % autoft._output_dir)
            for solution, modeldir in solutions_modeldirs.items():
                param = evaluator.convert_params(solution)
                param = [str(p) for p in param]

--- a/paddlehub/commands/download.py
+++ b/paddlehub/commands/download.py
@@ -55,16 +55,26 @@ class DownloadCommand(BaseCommand):
        self.args = self.parser.parse_args(argv[1:])
        self.args.type = self.check_type(self.args.type)
+        extra = {"command": "download"}
        if self.args.type in ["Module", "Model"]:
            search_result = default_hub_server.get_resource_url(
-                mod_name, resource_type=self.args.type, version=mod_version)
+                mod_name,
+                resource_type=self.args.type,
+                version=mod_version,
+                extra=extra)
        else:
            search_result = default_hub_server.get_resource_url(
-                mod_name, resource_type="Module", version=mod_version)
+                mod_name,
+                resource_type="Module",
+                version=mod_version,
+                extra=extra)
            self.args.type = "Module"
            if search_result == {}:
                search_result = default_hub_server.get_resource_url(
-                    mod_name, resource_type="Model", version=mod_version)
+                    mod_name,
+                    resource_type="Model",
+                    version=mod_version,
+                    extra=extra)
                self.args.type = "Model"
        url = search_result.get('url', None)
        except_md5_value = search_result.get('md5', None)

--- a/paddlehub/commands/hub.py
+++ b/paddlehub/commands/hub.py
@@ -51,7 +51,6 @@ class HubCommand(BaseCommand):
            help.command.execute(argv)
            exit(1)
            return False
-        srv_utils.hub_stat(['hub'] + argv)
        command = BaseCommand.command_dict[sub_command]
        return command.execute(argv[1:])

--- a/paddlehub/commands/install.py
+++ b/paddlehub/commands/install.py
@@ -47,8 +47,9 @@ class InstallCommand(BaseCommand):
            "==")[1]
        module_name = module_name if "==" not in module_name else module_name.split(
            "==")[0]
+        extra = {"command": "install"}
        result, tips, module_dir = default_module_manager.install_module(
-            module_name=module_name, module_version=module_version)
+            module_name=module_name, module_version=module_version, extra=extra)
        print(tips)
        return True

--- a/paddlehub/commands/run.py
+++ b/paddlehub/commands/run.py
@@ -62,8 +62,9 @@ class RunCommand(BaseCommand):
                module_dir = (module_name, None)
            else:
                print("Install Module %s" % module_name)
+                extra = {"command": "install"}
                result, tips, module_dir = default_module_manager.install_module(
-                    module_name)
+                    module_name, extra=extra)
                print(tips)
                if not result:
                    return None

--- a/paddlehub/commands/search.py
+++ b/paddlehub/commands/search.py
@@ -43,8 +43,9 @@ class SearchCommand(BaseCommand):
            argv = ['.*']
        resource_name = argv[0]
+        extra = {"command": "search"}
        resource_list = default_hub_server.search_resource(
-            resource_name, resource_type="Module")
+            resource_name, resource_type="Module", extra=extra)
        if utils.is_windows():
            placeholders = [20, 8, 8, 20]
        else:

--- a/paddlehub/common/hub_server.py
+++ b/paddlehub/common/hub_server.py
@@ -96,13 +96,17 @@ class HubServer(object):
            return False
        return True
-    def search_resource(self, resource_key, resource_type=None, update=False):
+    def search_resource(self,
+                        resource_key,
+                        resource_type=None,
+                        update=False,
+                        extra=None):
        try:
            payload = {'word': resource_key}
            if resource_type:
                payload['type'] = resource_type
            api_url = srv_utils.uri_path(self.get_server_url(), 'search')
-            r = srv_utils.hub_request(api_url, payload)
+            r = srv_utils.hub_request(api_url, payload, extra=extra)
            if r['status'] == 0 and len(r['data']) > 0:
                return [(item['name'], item['type'], item['version'],
                         item['summary']) for item in r['data']]
@@ -147,7 +151,8 @@ class HubServer(object):
                         resource_name,
                         resource_type=None,
                         version=None,
-                         update=False):
+                         update=False,
+                         extra=None):
        try:
            payload = {'word': resource_name}
            if resource_type:
@@ -155,7 +160,7 @@ class HubServer(object):
            if version:
                payload['version'] = version
            api_url = srv_utils.uri_path(self.get_server_url(), 'search')
-            r = srv_utils.hub_request(api_url, payload)
+            r = srv_utils.hub_request(api_url, payload, extra=extra)
            if r['status'] == 0 and len(r['data']) > 0:
                for item in r['data']:
                    if resource_name.lower() == item['name'].lower():
@@ -200,19 +205,26 @@ class HubServer(object):
        return {}
-    def get_module_url(self, module_name, version=None, update=False):
+    def get_module_url(self,
+                       module_name,
+                       version=None,
+                       update=False,
+                       extra=None):
        return self.get_resource_url(
            resource_name=module_name,
            resource_type="Module",
            version=version,
-            update=update)
+            update=update,
+            extra=extra)
-    def get_model_url(self, module_name, version=None, update=False):
+    def get_model_url(self, module_name, version=None, update=False,
+                      extra=None):
        return self.get_resource_url(
            resource_name=module_name,
            resource_type="Model",
            version=version,
-            update=update)
+            update=update,
+            extra=extra)
    def request(self):
        if not os.path.exists(hub.CACHE_HOME):

--- a/paddlehub/common/lock.py
+++ b/paddlehub/common/lock.py
-import fcntl
 import os
+if os.name == "posix":
+    import fcntl
 class WinLock(object):
@@ -15,23 +16,22 @@ class Lock(object):
    _owner = None
    def __init__(self):
-        self.LOCK_EX = fcntl.LOCK_EX
-        self.LOCK_UN = fcntl.LOCK_UN
-        self.LOCK_TE = ""
        if os.name == "posix":
            self.lock = fcntl
        else:
            self.lock = WinLock()
        _lock = self.lock
+        self.LOCK_EX = self.lock.LOCK_EX
+        self.LOCK_UN = self.lock.LOCK_UN
    def get_lock(self):
        return self.lock
    def flock(self, fp, cmd):
-        if cmd == fcntl.LOCK_UN:
+        if cmd == self.lock.LOCK_UN:
            Lock._owner = None
            self.lock.flock(fp, cmd)
-        elif cmd == fcntl.LOCK_EX:
+        elif cmd == self.lock.LOCK_EX:
            if Lock._owner is None:
                Lock._owner = os.getpid()
                self.lock.flock(fp, cmd)
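Review note (a sketch, not part of the commit): the hunk above imports `fcntl` only on POSIX and reads the lock constants from whichever backend was selected. The pattern can be illustrated in isolation as follows. `FallbackLock` and `pick_lock_backend` are hypothetical names; `FallbackLock` stands in for the repository's `WinLock`, whose body is not shown in this diff, and its no-op behavior is an assumption.

```python
import os
import tempfile

if os.name == "posix":
    import fcntl  # this module only exists on POSIX systems


class FallbackLock(object):
    """Hypothetical backend for platforms without fcntl."""
    LOCK_EX = "exclusive"
    LOCK_UN = "unlock"

    def flock(self, fp, cmd):
        # Assumption: locking is simply skipped on this platform.
        pass


def pick_lock_backend():
    """Return the object that provides flock, LOCK_EX and LOCK_UN."""
    return fcntl if os.name == "posix" else FallbackLock()


backend = pick_lock_backend()
with tempfile.TemporaryFile() as fp:
    backend.flock(fp, backend.LOCK_EX)  # acquire an exclusive lock
    backend.flock(fp, backend.LOCK_UN)  # release it
```

Reading `LOCK_EX` and `LOCK_UN` from the selected backend, as the commit does, means no code path touches `fcntl` by name on a platform where the import was skipped.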

--- a/paddlehub/common/server_config.py
+++ b/paddlehub/common/server_config.py
@@ -14,16 +14,9 @@
 HUB_SERVERS = ["http://paddlepaddle.org.cn/paddlehub"]
-STAT_SERVERS = [
-    "http://paddlepaddle.org.cn/paddlehub/stat",
-    "http://paddlepaddle.org.cn/paddlehub/stat"
-]
 default_server_config = {
    "server_url": HUB_SERVERS,
    "resource_storage_server_url": "https://bj.bcebos.com/paddlehub-data/",
    "debug": False,
    "log_level": "DEBUG"
 }
-default_stat_config = {"server_list": STAT_SERVERS}
--- a/paddlehub/common/srv_utils.py
+++ b/paddlehub/common/srv_utils.py
@@ -17,38 +17,11 @@ import requests
 import time
 import paddle
 import socket
+import json
 from random import randint, seed
 from paddlehub import version
-from paddlehub.common.server_config import default_stat_config
-def get_stat_server():
-    seed(int(time.time()))
-    stat_env = os.environ.get('HUB_SERVER_STAT_SRV')
-    if stat_env:
-        server_list = stat_env.split(';')
-    else:
-        server_list = default_stat_config['server_list']
-    return server_list[randint(0, len(server_list) - 1)]
-def hub_stat(argv):
-    try:
-        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
-        s.connect(('bj.bcebos.com', 80))
-        ip_addr = s.getsockname()[0]
-        params = {
-            'command': ' '.join(argv),
-            'hub_version': version.hub_version,
-            'paddle_version': paddle.__version__,
-            'ip_addr': ip_addr
-        }
-        stat_api = get_stat_server()
-        r = requests.get(stat_api, params=params, timeout=0.5)
-    except:
-        pass
 def uri_path(server_url, api):
@@ -63,8 +36,9 @@ def uri_path(server_url, api):
    return srv
-def hub_request(api, params):
+def hub_request(api, params, extra=None):
    params['hub_version'] = version.hub_version
    params['paddle_version'] = paddle.__version__
+    params["extra"] = json.dumps(extra)
    r = requests.get(api, params)
    return r.json()
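Review note (a sketch, not part of the commit): `hub_request` now serializes `extra` with `json.dumps` before sending it as a query parameter. The snippet below isolates that step to show what a caller that omits `extra` ends up sending. `build_params` is a hypothetical helper, the version fields are left out, and unlike `hub_request` it copies the dictionary instead of mutating it.

```python
import json


def build_params(params, extra=None):
    """Add the JSON-encoded extra field the way hub_request does."""
    params = dict(params)  # copy, so the caller's dict is left untouched
    params["extra"] = json.dumps(extra)
    return params


with_command = build_params({"word": "ernie"}, extra={"command": "install"})
without_extra = build_params({"word": "ernie"})

print(with_command["extra"])   # JSON object as a string
print(without_extra["extra"])  # the four-character string null
```

Because `json.dumps(None)` returns the string `null`, requests made without `extra` still carry an `extra` parameter; whether the server treats that differently from an absent parameter is not visible in this diff.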
--- a/paddlehub/finetune/task/basic_task.py
+++ b/paddlehub/finetune/task/basic_task.py
@@ -21,6 +21,7 @@ import os
 import contextlib
 import time
 import copy
+import logging
 import paddle.fluid as fluid
 from tb_paddle import SummaryWriter
@@ -286,6 +287,11 @@ class BasicTask(object):
                    build_strategy=self.build_strategy)
        self.exe.run(self.env.startup_program)
+        # Remove the root logger's handlers so that messages are not printed twice (paddle-fluid also uses logging)
+        for handler in logging.root.handlers[:]:
+            logging.root.removeHandler(handler)
        self._build_env_end_event()
    @property

--- a/paddlehub/io/parser.py
+++ b/paddlehub/io/parser.py
@@ -72,13 +72,21 @@ class TextFileParser(object):
        pass
    def parse(self, txt_file, use_strip=True):
-        with codecs.open(txt_file, "r", sys_stdin_encoding()) as file:
+        contents = []
-            contents = []
+        try:
-            for line in file:
+            with codecs.open(txt_file, "r", encoding="utf8") as file:
-                if use_strip:
+                for line in file:
-                    line = line.strip()
+                    if use_strip:
-                if line:
+                        line = line.strip()
-                    contents.append(line)
+                    if line:
+                        contents.append(line)
+        except UnicodeDecodeError:
+            with codecs.open(txt_file, "r", encoding="gbk") as file:
+                for line in file:
+                    if use_strip:
+                        line = line.strip()
+                    if line:
+                        contents.append(line)
        return contents
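Review note (a sketch, not part of the commit): the new `parse` tries UTF-8 first and falls back to GBK. In the committed version, lines appended before a mid-file decoding failure could remain in `contents` when the GBK pass appends again. A variant that resets the list per attempt and catches only `UnicodeDecodeError` might look like this; `read_lines` and its `encodings` parameter are hypothetical names.

```python
import codecs
import os
import tempfile


def read_lines(path, encodings=("utf8", "gbk"), use_strip=True):
    """Return non-empty lines using the first encoding that decodes the file."""
    last_error = None
    for enc in encodings:
        contents = []  # start fresh so a failed attempt leaves nothing behind
        try:
            with codecs.open(path, "r", encoding=enc) as f:
                for line in f:
                    if use_strip:
                        line = line.strip()
                    if line:
                        contents.append(line)
            return contents
        except UnicodeDecodeError as err:
            last_error = err
    raise last_error


# Usage: a GBK-encoded file whose bytes are not valid UTF-8.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with codecs.open(path, "w", encoding="gbk") as f:
    f.write(u"hello\n\u4f60\u597d\n")
lines = read_lines(path)
os.remove(path)
```

Ordering the encodings tuple makes the fallback chain explicit and avoids duplicating the read loop.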

--- a/paddlehub/module/manager.py
+++ b/paddlehub/module/manager.py
@@ -75,7 +75,11 @@ class LocalModuleManager(object):
        self.all_modules(update=update)
        return self.modules_dict.get(module_name, None)
-    def install_module(self, module_name, module_version=None, upgrade=False):
+    def install_module(self,
+                       module_name,
+                       module_version=None,
+                       upgrade=False,
+                       extra=None):
        self.all_modules(update=True)
        module_info = self.modules_dict.get(module_name, None)
        if module_info:
@@ -84,13 +88,12 @@ class LocalModuleManager(object):
                module_dir = self.modules_dict[module_name][0]
                module_tag = module_name if not module_version else '%s-%s' % (
                    module_name, module_version)
-                srv_utils.hub_stat(['installed', module_tag])
                tips = "Module %s already installed in %s" % (module_tag,
                                                              module_dir)
                return True, tips, self.modules_dict[module_name]
        search_result = hub.default_hub_server.get_module_url(
-            module_name, version=module_version)
+            module_name, version=module_version, extra=extra)
        name = search_result.get('name', None)
        url = search_result.get('url', None)
        md5_value = search_result.get('md5', None)
@@ -102,7 +105,6 @@ class LocalModuleManager(object):
                tips += " with version %s" % module_version
            module_tag = module_name if not module_version else '%s-%s' % (
                module_name, module_version)
-            srv_utils.hub_stat(['install fail', module_tag])
            return False, tips, None
        result, tips, module_zip_file = default_downloader.download_file(
@@ -127,7 +129,6 @@ class LocalModuleManager(object):
            shutil.move(module_dir, save_path)
            module_dir = save_path
            tips = "Successfully installed %s" % module_name
-            srv_utils.hub_stat(['install', module_name, url])
            if installed_module_version:
                tips += "-%s" % installed_module_version
            return True, tips, (module_dir, installed_module_version)
@@ -143,7 +144,6 @@ class LocalModuleManager(object):
                1]:
            tips = "%s-%s is not installed" % (module_name, module_version)
            return True, tips
-        srv_utils.hub_stat(['uninstall', module_name])
        tips = "Successfully uninstalled %s" % module_name
        if module_version:
            tips += '-%s' % module_version

--- a/paddlehub/module/module.py
+++ b/paddlehub/module/module.py
@@ -150,8 +150,9 @@ class Module(object):
        if version:
            log_msg += "-%s" % version
        logger.info(log_msg)
+        extra = {"command": "install"}
        result, tips, module_dir = default_module_manager.install_module(
-            module_name=name, module_version=version)
+            module_name=name, module_version=version, extra=extra)
        if not result:
            logger.error(tips)
            exit(1)

--- a/paddlehub/version.py
+++ b/paddlehub/version.py
@@ -13,5 +13,5 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """ PaddleHub version string """
-hub_version = "1.1.1"
+hub_version = "1.2.0"
 module_proto_version = "1.0.0"
--- a/tutorial/autofinetune-cv.md
+++ b/tutorial/autofinetune-cv.md
@@ -20,9 +20,6 @@ param_list:
  greater_than : 10
 ```
-**NOTE:** The top-level key of this yaml file must be param_list
 Below is finetunee.py for image classification
 ```python
@@ -106,8 +103,9 @@ def finetune(args):
    eval_avg_score, eval_avg_loss, eval_run_speed = task._calculate_metrics(run_states)
    # Move ckpt/best_model to the defined saved parameters directory
-    if is_path_valid(args.saved_params_dir) and os.path.exists(config.checkpoint_dir+"/best_model/"):
+    best_model_dir = os.path.join(config.checkpoint_dir, "best_model")
-        shutil.copytree(config.checkpoint_dir+"/best_model/", args.saved_params_dir)
+    if is_path_valid(args.saved_params_dir) and os.path.exists(best_model_dir):
+        shutil.copytree(best_model_dir, args.saved_params_dir)
        shutil.rmtree(config.checkpoint_dir)
    print("AutoFinetuneEval"+"\t"+str(float(eval_avg_score["acc"])))
@@ -117,13 +115,3 @@ if __name__ == "__main__":
    args = parser.parse_args()
    finetune(args)
 ```
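-**Note**: The above shows how to write finetunee.py.
-> finetunee.py must accept the hyperparameters being optimized as options, and the option names must match the hyperparameter names in the yaml file.
-> finetunee.py must have a saved_params_dir option.
-> When the PaddleHub Auto Fine-tune evaluation strategy is ModelBased, finetunee.py must have a model_path option.
-> When the PaddleHub Auto Fine-tune tuning strategy is hazero, at least two hyperparameters to optimize must be provided.
-> finetunee.py must print the model's evaluation result on the dev split of the dataset. The line must start with "AutoFinetuneEval", separated from the result by "\t", for example print("AutoFinetuneEval"+"\t"+str(float(eval_avg_score["acc"]))).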
--- a/tutorial/autofinetune-nlp.md
+++ b/tutorial/autofinetune-nlp.md
@@ -29,8 +29,6 @@ param_list:
  greater_than : 0.0
 ```
-**NOTE:** The top-level key of this yaml file must be param_list
 Below is finetunee.py for Chinese sentiment classification
 ```python
@@ -135,19 +133,10 @@ if __name__ == '__main__':
    eval_avg_score, eval_avg_loss, eval_run_speed = cls_task._calculate_metrics(run_states)
    # Move ckpt/best_model to the defined saved parameters directory
-    if is_path_valid(args.saved_params_dir) and os.path.exists(config.checkpoint_dir+"/best_model/"):
+    best_model_dir = os.path.join(config.checkpoint_dir, "best_model")
-        shutil.copytree(config.checkpoint_dir+"/best_model/", args.saved_params_dir)
+    if is_path_valid(args.saved_params_dir) and os.path.exists(best_model_dir):
+        shutil.copytree(best_model_dir, args.saved_params_dir)
        shutil.rmtree(config.checkpoint_dir)
    print("AutoFinetuneEval"+"\t"+str(float(eval_avg_score["acc"])))
 ```
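-**Note**: The above shows how to write finetunee.py.
-> finetunee.py must accept the hyperparameters being optimized as options, and the option names must match the hyperparameter names in the yaml file.
-> finetunee.py must have a saved_params_dir option.
-> When the PaddleHub Auto Fine-tune evaluation strategy is ModelBased, finetunee.py must have a model_path option.
-> When the PaddleHub Auto Fine-tune tuning strategy is hazero, at least two hyperparameters to optimize must be provided.
-> finetunee.py must print the model's evaluation result on the dev split of the dataset. The line must start with "AutoFinetuneEval", separated from the result by "\t", for example print("AutoFinetuneEval"+"\t"+str(float(eval_avg_score["acc"]))).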
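Review note (a sketch, not part of the commit): the notes removed above describe the contract a finetunee.py script has to meet. A minimal skeleton of just the option handling and the result line could look like the following. The `--learning_rate` option is a hypothetical searched hyperparameter, and spelling the required options as `--saved_params_dir` and `--model_path` is an assumption based on the `args.saved_params_dir` usage in the tutorial code.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="hypothetical finetunee.py skeleton")
    # A searched hyperparameter; its name must match the key in the yaml file.
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    # Options the notes say Auto Fine-tune expects the script to accept.
    parser.add_argument("--saved_params_dir", type=str, default="")
    parser.add_argument("--model_path", type=str, default="")
    return parser


def report(score):
    """Build the result line: the AutoFinetuneEval tag, a tab, then the score."""
    return "AutoFinetuneEval" + "\t" + str(float(score))


args = build_parser().parse_args(["--learning_rate", "5e-5"])
line = report(0.93)
print(line)
```

Keeping the result line in one helper makes it harder to break the tab-separated format that the search process parses.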
--- a/tutorial/autofinetune.md
+++ b/tutorial/autofinetune.md
@@ -6,10 +6,19 @@
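 PaddleHub Auto Fine-tune provides two hyperparameter optimization strategies:
-* HAZero: The core idea is to handle dependencies between variables and scaling by adjusting the covariance matrix of a normal distribution. The algorithm has three basic steps: sample new solutions; compute the objective function values; update the parameters of the normal distribution. The parameters are adjusted so that the probability of producing better solutions gradually increases
+* HAZero: The core idea is to handle dependencies between variables and scaling by adjusting the covariance matrix of a normal distribution. The algorithm has three basic steps: sample new solutions; compute the objective function values; update the parameters of the normal distribution. The parameters are adjusted so that the probability of producing better solutions gradually increases. The optimization process is shown in the figure below: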
+<p align="center">
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.2/docs/imgs/bayesian_optimization.gif" hspace='10'/> <br />
+</p>
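+*Image source: https://www.kaggle.com/clair14/tutorial-bayesian-optimization*
 * PSHE2: Uses particle swarm optimization, where the optimal hyperparameter combination is the solution being sought. Finding it comes down to deciding how to update the hyperparameter combination so that the algorithm converges to the optimum faster and more reliably. PSHE2 chooses the next update direction from each hyperparameter's own historical best, with some random perturbation.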
+<p align="center">
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.2/docs/imgs/thermodynamics.gif" hspace='10'/> <br />
+</p>
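 PaddleHub Auto Fine-tune provides two hyperparameter evaluation strategies:
 * FullTrail: given a set of hyperparameters, fine-tune a new model from scratch with them, then evaluate that model on the dev split of the dataset
@@ -55,14 +64,14 @@ finetunee.py accepts the hyperparameters found by PaddleHub and runs one optimization pass,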
 ```python
 print("AutoFinetuneEval"+"\t" + str(eval_acc))
 ```
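 * The printed evaluation score should lie in `(-∞, 1]`; a higher value means a better result.
 ### Examples
-[PaddleHub Auto Fine-tune hyperparameter optimization: NLP sentiment classification task]()
+[PaddleHub Auto Fine-tune hyperparameter optimization: NLP sentiment classification task](./autofinetune-nlp.md)
-[PaddleHub Auto Fine-tune hyperparameter optimization: CV image classification task]()
+[PaddleHub Auto Fine-tune hyperparameter optimization: CV image classification task](./autofinetune-cv.md)
 ## 3. How to Launch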
@@ -87,25 +96,57 @@ $ hub autofinetune finetunee.py --param_file=hparam.yaml --cuda=['1','2'] --pops
 > `--output_dir`: 设置程序运行输出结果存放目录，可选，不指定该选项参数时，在当前运行路径下生成存放程序运行输出信息的文件夹
-> `--evaluate_choice`: 设置自动优化超参的评价效果方式，可选fulltrail和modelbased, 默认为fulltrail
+> `--evaluate_choice`: 设置自动优化超参的评价效果方式，可选fulltrail和modelbased, 默认为modelbased
+> `--tuning_strategy`: 设置自动优化超参策略，可选hazero和pshe2，默认为pshe2
+`NOTE`
+* 进行超参搜索时，一共会进行n轮(--round指定)，每轮产生m组超参(--popsize指定)进行搜索。每一轮的超参会根据上一轮的优化结果决定，当指定GPU数量不足以同时跑一轮时，Auto Fine-tune功能自动实现排队，为了提高GPU利用率，建议卡数为刚好可以被popsize整除。如popsize=6，cuda=['0','1','2','3']，则每搜索一轮，Auto Fine-tune自动起四个进程训练，所以第5/6组超参组合需要排队一次，在搜索第5/6两组超参时，会存在两张卡出现空闲等待的情况，如果设置为3张可用的卡，则可以避免这种情况的出现。
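+## 4. Directory structure
-> `--tuning_strategy`: The hyperparameter optimization strategy. Options are hazero and pshe2; the default is hazero
+During an automatic hyperparameter search, PaddleHub generates the following directory layout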
+```
+./output_dir/
+    ├── log_file.txt
+    ├── visualization
+    ├── round0
+    ├── round1
+    ├── ...
+    └── roundn
+        ├── log-0.info
+        ├── log-1.info
+        ├── ...
+        ├── log-m.info
+        ├── model-0
+        ├── model-1
+        ├── ...
+        └── model-m
+```
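+Here output_dir is the root directory specified when launching the autofinetune command. Under it:
+* log_file.txt records all hyperparameters searched in each round, as well as the best hyperparameters found over the whole process
-**NOTE:** Auto Fine-tune automatically queues GPU usage based on popsize and cuda. For example, with popsize=5 and cuda=['0','1','2','3'], Auto Fine-tune starts four training processes per round, so the 5th hyperparameter combination has to wait in the queue once. To improve GPU utilization and hyperparameter optimization efficiency, it is recommended in this case to use three available cards, cuda=['0','1','2'].
+* visualization holds the log files for the visualization process
+* round0 ~ roundn hold the data of each round. Each round directory also contains the following files:
-## 4. Visualization
+  * log-0.info ~ log-m.info record the log of each search direction
+  * model-0 ~ model-m record the parameters of the corresponding search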
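To inspect this layout programmatically, a small sketch using only the standard library might look like the following. `list_rounds` is a hypothetical helper written for this example; it relies only on the directory and file names shown in the tree above.

```python
from pathlib import Path


def list_rounds(output_dir):
    """Map each round directory name to its sorted log file names."""
    root = Path(output_dir)
    if not root.is_dir():
        return {}
    rounds = {}
    for round_dir in root.glob("round*"):
        if round_dir.is_dir():
            # Collect log-0.info ... log-m.info; model-* directories are skipped.
            rounds[round_dir.name] = sorted(
                p.name for p in round_dir.glob("log-*.info"))
    return rounds
```

Calling `list_rounds("./output_dir")` after a search returns one entry per round, which makes it easy to find the log of a failed run.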
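+## 5. Visualization
 During hyperparameter optimization, the Auto Finetune API automatically logs key training metrics. After starting the program, run the following command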
 ```shell
-$ tensorboard --logdir $OUTPUT/tb_paddle --host ${HOST_IP} --port ${PORT_NUM}
+$ tensorboard --logdir ${OUTPUT}/visualization --host ${HOST_IP} --port ${PORT_NUM}
 ```
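-where ${HOST_IP} is the IP address of the local machine and ${PORT_NUM} is an available port. For example, if the local IP address is 192.168.0.1 and the port is 8040,
+where ${OUTPUT} is the AutoDL root directory, ${HOST_IP} is the IP address of the local machine, and ${PORT_NUM} is an available port. For example, if the local IP address is 192.168.0.1 and the port is 8040,
-open 192.168.0.1:8040 in a browser to see how the hyperparameters and metrics change during the search
+open 192.168.0.1:8040 in a browser to see how the hyperparameters and metrics change during the search
-## 5. Miscellaneous
+## 6. Miscellaneous
 1. If the output contains the following message while using Auto Fine-tune: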
@@ -113,10 +154,13 @@ $ tensorboard --logdir $OUTPUT/tb_paddle --host ${HOST_IP} --port ${PORT_NUM}
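 First, use the terminal output to determine which round the message came from (for example, round 3), then check the log file log.info under ${OUTPUT}/round3/ for the specific cause of the error.
-2. The PaddleHub AutoFinetune command line supports passing options of finetunee.py that do not need to be searched through the hub autofinetune launch command. For example, the max_seq_len option in the example above can be passed as follows.
+2. The PaddleHub AutoFinetune command line supports passing options of finetunee.py that do not need to be searched through the hub autofinetune launch command. For example, the max_seq_len option in the
+[PaddleHub Auto Fine-tune hyperparameter optimization: NLP sentiment classification task](./autofinetune-nlp.md) example can be passed as follows.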
 ```shell
 $ OUTPUT=result/
 $ hub autofinetune finetunee.py --param_file=hparam.yaml --cuda=['1','2'] --popsize=5 --round=10
 --output_dir=${OUTPUT} --evaluate_choice=fulltrail --tuning_strategy=pshe2 max_seq_len 128
 ```
+3. While using PaddleHub Auto Fine-tune, make sure the GPU cards in use are reserved for PaddleHub and are not used by any other task.
--- a/tutorial/sentence_sim.ipynb
+++ b/tutorial/sentence_sim.ipynb
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
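-    "# Computing text semantic similarity with Word2Vec\n",
-    "\n",
-    "This example shows how to use PaddleHub to compute text similarity “end to end”\n",
-    "\n",
-    "## 1. Prepare the text data\n",
-    "\n",
-    "For example:\n",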
-    "```\n",
-    "驾驶违章一次扣12分用两个驾驶证处理可以吗    一次性扣12分的违章,能用不满十二分的驾驶证扣分吗\n",
-    "水果放冰箱里储存好吗    中国银行纪念币网上怎么预约\n",
-    "电脑反应很慢怎么办    反应速度慢,电脑总是卡是怎么回事\n",
-    "```\n",
-    "\n",
-    "## 2. Word segmentation\n",
-    "Use the PaddleHub Module LAC to segment the text data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# coding:utf-8\n",
-    "#  Copyright (c) 2019  PaddlePaddle Authors. All Rights Reserved.\n",
-    "#\n",
-    "# Licensed under the Apache License, Version 2.0 (the \"License\"\n",
-    "# you may not use this file except in compliance with the License.\n",
-    "# You may obtain a copy of the License at\n",
-    "#\n",
-    "#     http://www.apache.org/licenses/LICENSE-2.0\n",
-    "#\n",
-    "# Unless required by applicable law or agreed to in writing, software\n",
-    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
-    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
-    "# See the License for the specific language governing permissions and\n",
-    "# limitations under the License.\n",
-    "\"\"\"similarity between two sentences\"\"\"\n",
-    "\n",
-    "import numpy as np\n",
-    "import scipy\n",
-    "from scipy.spatial import distance\n",
-    "\n",
-    "from paddlehub.reader.tokenization import load_vocab\n",
-    "import paddle.fluid as fluid\n",
-    "import paddlehub as hub\n",
-    "\n",
-    "raw_data = [\n",
-    "    [\"驾驶违章一次扣12分用两个驾驶证处理可以吗\", \"一次性扣12分的违章,能用不满十二分的驾驶证扣分吗\"],\n",
-    "    [\"水果放冰箱里储存好吗\", \"中国银行纪念币网上怎么预约\"],\n",
-    "    [\"电脑反应很慢怎么办\", \"反应速度慢,电脑总是卡是怎么回事\"]\n",
-    "]\n",
-    "\n",
-    "lac = hub.Module(name=\"lac\")\n",
-    "\n",
-    "processed_data = []\n",
-    "for text_pair in raw_data:\n",
-    "    inputs = {\"text\" : text_pair}\n",
-    "    results = lac.lexical_analysis(data=inputs, use_gpu=True, batch_size=2)\n",
-    "    data = []\n",
-    "    for result in results:\n",
-    "        data.append(\" \".join(result[\"word\"]))\n",
-    "    processed_data.append(data)\n",
-    "\n",
-    "processed_data"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
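-    "## 3. Compute text semantic similarity\n",
-    "\n",
-    "Replace the words in the segmented text with their word ids, then feed them into the word2vec module to compute the semantic similarity of the two texts"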
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def convert_tokens_to_ids(vocab, text):\n",
-    "    wids = []\n",
-    "    tokens = text.split(\" \")\n",
-    "    for token in tokens:\n",
-    "        wid = vocab.get(token, None)\n",
-    "        if not wid:\n",
-    "            wid = vocab[\"unknown\"]\n",
-    "        wids.append(wid)\n",
-    "    return wids"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "module = hub.Module(name=\"word2vec_skipgram\")\n",
-    "inputs, outputs, program = module.context(trainable=False)\n",
-    "vocab = load_vocab(module.get_vocab_path())\n",
-    "\n",
-    "word_ids = inputs[\"word_ids\"]\n",
-    "embedding = outputs[\"word_embs\"]\n",
-    "\n",
-    "place = fluid.CPUPlace()\n",
-    "exe = fluid.Executor(place)\n",
-    "feeder = fluid.DataFeeder(feed_list=[word_ids], place=place)\n",
-    "\n",
-    "for item in processed_data:\n",
-    "    text_a = convert_tokens_to_ids(vocab, item[0])\n",
-    "    text_b = convert_tokens_to_ids(vocab, item[1])\n",
-    "\n",
-    "    vecs_a, = exe.run(\n",
-    "        program,\n",
-    "        feed=feeder.feed([[text_a]]),\n",
-    "        fetch_list=[embedding.name],\n",
-    "        return_numpy=False)\n",
-    "    vecs_a = np.array(vecs_a)\n",
-    "    vecs_b, = exe.run(\n",
-    "        program,\n",
-    "        feed=feeder.feed([[text_b]]),\n",
-    "        fetch_list=[embedding.name],\n",
-    "        return_numpy=False)\n",
-    "    vecs_b = np.array(vecs_b)\n",
-    "\n",
-    "    sent_emb_a = np.sum(vecs_a, axis=0)\n",
-    "    sent_emb_b = np.sum(vecs_b, axis=0)\n",
-    "    cos_sim = 1 - distance.cosine(sent_emb_a, sent_emb_b)\n",
-    "\n",
-    "    print(\"text_a: %s; text_b: %s; cosine_similarity: %.5f\" %\n",
-    "          (item[0], item[1], cos_sim))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.8"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
--- a/tutorial/sentence_sim.md
+++ b/tutorial/sentence_sim.md
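+# Computing text semantic similarity with Word2Vec
+This example shows how to use PaddleHub to compute text similarity “end to end”
+## 1. Prepare the text data
+For example: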
+```
+驾驶违章一次扣12分用两个驾驶证处理可以吗    一次性扣12分的违章,能用不满十二分的驾驶证扣分吗
+水果放冰箱里储存好吗    中国银行纪念币网上怎么预约
+电脑反应很慢怎么办    反应速度慢,电脑总是卡是怎么回事
+```
+## 2. Word segmentation
+Use the PaddleHub Module LAC to segment the text data
+```python
+# coding:utf-8
+#  Copyright (c) 2019  PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""similarity between two sentences"""
+import numpy as np
+import scipy
+from scipy.spatial import distance
+from paddlehub.reader.tokenization import load_vocab
+import paddle.fluid as fluid
+import paddlehub as hub
+raw_data = [
+    ["驾驶违章一次扣12分用两个驾驶证处理可以吗", "一次性扣12分的违章,能用不满十二分的驾驶证扣分吗"],
+    ["水果放冰箱里储存好吗", "中国银行纪念币网上怎么预约"],
+    ["电脑反应很慢怎么办", "反应速度慢,电脑总是卡是怎么回事"]
+]
+lac = hub.Module(name="lac")
+processed_data = []
+for text_pair in raw_data:
+    inputs = {"text" : text_pair}
+    results = lac.lexical_analysis(data=inputs, use_gpu=True, batch_size=2)
+    data = []
+    for result in results:
+        data.append(" ".join(result["word"]))
+    processed_data.append(data)
+```
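+## 3. Compute text semantic similarity
+Replace the words in the segmented text with their word ids, then feed them into the word2vec module to compute the semantic similarity of the two texts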
+```python
+def convert_tokens_to_ids(vocab, text):
+    wids = []
+    tokens = text.split(" ")
+    for token in tokens:
+        wid = vocab.get(token, None)
+        if wid is None:  # a valid id of 0 must not be treated as a missing token
+            wid = vocab["unknown"]
+        wids.append(wid)
+    return wids
+module = hub.Module(name="word2vec_skipgram")
+inputs, outputs, program = module.context(trainable=False)
+vocab = load_vocab(module.get_vocab_path())
+word_ids = inputs["word_ids"]
+embedding = outputs["word_embs"]
+place = fluid.CPUPlace()
+exe = fluid.Executor(place)
+feeder = fluid.DataFeeder(feed_list=[word_ids], place=place)
+for item in processed_data:
+    text_a = convert_tokens_to_ids(vocab, item[0])
+    text_b = convert_tokens_to_ids(vocab, item[1])
+    vecs_a, = exe.run(
+        program,
+        feed=feeder.feed([[text_a]]),
+        fetch_list=[embedding.name],
+        return_numpy=False)
+    vecs_a = np.array(vecs_a)
+    vecs_b, = exe.run(
+        program,
+        feed=feeder.feed([[text_b]]),
+        fetch_list=[embedding.name],
+        return_numpy=False)
+    vecs_b = np.array(vecs_b)
+    sent_emb_a = np.sum(vecs_a, axis=0)
+    sent_emb_b = np.sum(vecs_b, axis=0)
+    cos_sim = 1 - distance.cosine(sent_emb_a, sent_emb_b)
+    print("text_a: %s; text_b: %s; cosine_similarity: %.5f" %
+          (item[0], item[1], cos_sim))
+```
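The last step of the loop above (sum the word vectors of each sentence, then take the cosine similarity of the two sums) can be checked on toy data without PaddleHub. The sketch below uses only the standard library; `sentence_embedding`, `cosine_similarity`, and the two-dimensional toy vectors are assumptions made for illustration, and the result corresponds to `1 - distance.cosine(a, b)` in the tutorial code.

```python
import math


def sentence_embedding(word_vectors):
    """Sum per-word vectors component-wise into one sentence vector."""
    return [sum(component) for component in zip(*word_vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (both must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy "word embeddings" for two short sentences.
sent_a = sentence_embedding([[1.0, 0.0], [0.0, 1.0]])   # [1.0, 1.0]
sent_b = sentence_embedding([[2.0, 0.0], [0.0, 2.0]])   # [2.0, 2.0]
print(cosine_similarity(sent_a, sent_b))
```

Because cosine similarity ignores vector length, the two toy sentences score as near-identical even though one sum is twice as long as the other.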