Merge branch 'release/v1.6' into develop

59f6945f · wuzewu · 5db06480 · ff9859d1 · 59f6945f · 59f6945f
14 changed files
--- a/README.md
+++ b/README.md
@@ -8,18 +8,18 @@
 ![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
 ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)

-PaddleHub是飞桨生态的预训练模型应用工具，开发者可以便捷地使用高质量的预训练模型结合Fine-tune API快速完成模型迁移到部署的全流程工作。PaddleHub提供的预训练模型涵盖了图像分类、目标检测、词法分析、语义模型、情感分析、视频分类、图像生成、图像分割、文本审核、关键点检测等主流模型。更多详情可查看官网：https://www.paddlepaddle.org.cn/hu
+PaddleHub是飞桨生态的预训练模型应用工具，开发者可以便捷地使用高质量的预训练模型结合Fine-tune API快速完成模型迁移到部署的全流程工作。PaddleHub提供的预训练模型涵盖了图像分类、目标检测、词法分析、语义模型、情感分析、视频分类、图像生成、图像分割、文本审核、关键点检测等主流模型。更多详情可查看官网：https://www.paddlepaddle.org.cn/hub


 PaddleHub以预训练模型应用为核心具备以下特点：  

 * **[模型即软件](#模型即软件)**，通过Python API或命令行实现模型调用，可快速体验或集成飞桨特色预训练模型。

-* **[易用的迁移学习](#迁移学习)**，通过Fine-tune API，内置多种优化策略，只需少量代码即可完成预训练模型的Fine-tuning。
+* **[易用的迁移学习](#易用的迁移学习)**，通过Fine-tune API，内置多种优化策略，只需少量代码即可完成预训练模型的Fine-tuning。

-* **[一键模型转服务](#服务化部署paddlehub-serving)**，简单一行命令即可搭建属于自己的深度学习模型API服务完成部署。
+* **[一键模型转服务](#一键模型转服务)**，简单一行命令即可搭建属于自己的深度学习模型API服务完成部署。

-* **[自动超参优化](#超参优化autodl-finetuner)**，内置AutoDL Finetuner能力，一键启动自动化超参搜索。
+* **[自动超参优化](#自动超参优化)**，内置AutoDL Finetuner能力，一键启动自动化超参搜索。


 <p align="center">
@@ -66,7 +66,7 @@ PaddleHub采用模型即软件的设计理念，所有的预训练模型与Pytho

 安装PaddleHub后，执行命令[hub run](./docs/tutorial/cmdintro.md)，即可快速体验无需代码、一键预测的功能：

-* 使用[目标检测](http://www.paddlepaddle.org.cn/hub?filter=category&value=ObjectDetection)模型pyramidbox_lite_mobile_mask对图片进行口罩检测
+* 使用[目标检测](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=ObjectDetection)模型pyramidbox_lite_mobile_mask对图片进行口罩检测
 ```shell
 $ wget https://paddlehub.bj.bcebos.com/resources/test_mask_detection.jpg
 $ hub run pyramidbox_lite_mobile_mask --input_path test_mask_detection.jpg
@@ -75,19 +75,22 @@ $ hub run pyramidbox_lite_mobile_mask --input_path test_mask_detection.jpg
 <img src="./docs/imgs/test_mask_detection_result.jpg" align="middle">
 </p>

-* 使用[词法分析](http://www.paddlepaddle.org.cn/hub?filter=category&value=LexicalAnalysis)模型LAC进行分词
+* 使用[词法分析](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=LexicalAnalysis)模型LAC进行分词
 ```shell
-$ hub run lac --input_text "今天是个好日子"
-[{'word': ['今天', '是', '个', '好日子'], 'tag': ['TIME', 'v', 'q', 'n']}]
+$ hub run lac --input_text "现在，慕尼黑再保险公司不仅是此类行动的倡议者，更是将其大量气候数据整合进保险产品中，并与公众共享大量天气信息，参与到新能源领域的保障中。"
+[{
+    'word': ['现在', '，', '慕尼黑再保险公司', '不仅', '是', '此类', '行动', '的', '倡议者', '，', '更是', '将', '其', '大量', '气候', '数据', '整合', '进', '保险', '产品', '中', '，', '并', '与', '公众', '共享', '大量', '天气', '信息', '，', '参与', '到', '新能源', '领域', '的', '保障', '中', '。'],
+    'tag':  ['TIME', 'w', 'ORG', 'c', 'v', 'r', 'n', 'u', 'n', 'w', 'd', 'p', 'r', 'a', 'n', 'n', 'v', 'v', 'n', 'n', 'f', 'w', 'c', 'p', 'n', 'v', 'a', 'n', 'n', 'w', 'v', 'v', 'n', 'n', 'u', 'vn', 'f', 'w']
+}]
 ```

-* 使用[情感分析](http://www.paddlepaddle.org.cn/hub?filter=category&value=SentimentAnalysis)模型Senta对句子进行情感预测
+* 使用[情感分析](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=SentimentAnalysis)模型Senta对句子进行情感预测
 ```shell
 $ hub run senta_bilstm --input_text "今天天气真好"
 [{'text': '今天天气真好', 'sentiment_label': 1, 'sentiment_key': 'positive', 'positive_probs': 0.9798, 'negative_probs': 0.0202}]
 ```

-* 使用[目标检测](http://www.paddlepaddle.org.cn/hub?filter=category&value=ObjectDetection)模型Ultra-Light-Fast-Generic-Face-Detector-1MB对图片进行人脸识别
+* 使用[目标检测](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=ObjectDetection)模型Ultra-Light-Fast-Generic-Face-Detector-1MB对图片进行人脸识别
 ```shell
 $ wget https://paddlehub.bj.bcebos.com/resources/test_image.jpg
 $ hub run ultra_light_fast_generic_face_detector_1mb_640 --input_path test_image.jpg
@@ -110,11 +113,11 @@ $ hub run deeplabv3p_xception65_humanseg --input_path test_image.jpg
 </p>  

 <p align='center'>
- &#8194;&#8194;&#8194&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;ace2p分割结果展示&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;
- humanseg分割结果展示&#8194;&#8194;&#8194;
+ &#8194;&#8194;&#8194&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;ACE2P人体部件分割&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;
+ HumanSeg人像分割&#8194;&#8194;&#8194;
 </p>

-PaddleHub还提供图像分类、语义模型、视频分类、图像生成、图像分割、文本审核、关键点检测等主流模型，更多模型介绍，请前往 [https://www.paddlepaddle.org.cn/hub](https://www.paddlepaddle.org.cn/hub) 查看
+PaddleHub还提供图像分类、语义模型、视频分类、图像生成、图像分割、文本审核、关键点检测等主流模型，更多模型介绍，请前往[预训练模型介绍](./docs/pretrained_models.md)或者PaddleHub官网[https://www.paddlepaddle.org.cn/hub](https://www.paddlepaddle.org.cn/hub) 查看

 ### 易用的迁移学习

@@ -189,6 +192,5 @@ $ hub uninstall ernie

 ## 更新历史

-PaddleHub v1.6.0已发布！
-
-详情参考[更新历史](./RELEASE.md)
+PaddleHub v1.6 已发布！
+更多升级详情参考[更新历史](./RELEASE.md)
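
Note on the README hunk: the `hub run lac` output shown above is a list with one dict per input text, each holding parallel `word` and `tag` lists. A minimal sketch of pairing the two lists, using the shorter sample output from the removed lines. The helper name `pair_words_and_tags` is hypothetical and not part of PaddleHub.

```python
def pair_words_and_tags(lac_results):
    """Turn LAC-style results into one list of (word, tag) tuples per text."""
    paired = []
    for result in lac_results:
        words, tags = result["word"], result["tag"]
        if len(words) != len(tags):
            raise ValueError("word and tag lists must have the same length")
        paired.append(list(zip(words, tags)))
    return paired


# Sample copied from the shorter LAC output in the README diff.
sample = [{"word": ["今天", "是", "个", "好日子"], "tag": ["TIME", "v", "q", "n"]}]
pairs = pair_words_and_tags(sample)
for word, tag in pairs[0]:
    print("%s\t%s" % (word, tag))
```

Checking the lengths first surfaces a malformed result early instead of silently dropping trailing words, which `zip` alone would do.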
--- a/demo/text_classification/finetuned_model_to_module/module.py
+++ b/demo/text_classification/finetuned_model_to_module/module.py
@@ -94,6 +94,7 @@ class ERNIETinyFinetuned(hub.Module):
            config=config,
            metrics_choices=metrics_choices)

+    @serving
    def predict(self, data, return_result=False, accelerate_mode=True):
        """
        Get prediction results
@@ -102,7 +103,14 @@ class ERNIETinyFinetuned(hub.Module):
            data=data,
            return_result=return_result,
            accelerate_mode=accelerate_mode)
-        return run_states
+        results = [run_state.run_results for run_state in run_states]
+        prediction = []
+        for batch_result in results:
+            # get predict index
+            batch_result = np.argmax(batch_result, axis=2)[0]
+            batch_result = batch_result.tolist()
+            prediction += batch_result
+        return prediction


 if __name__ == "__main__":
@@ -113,12 +121,6 @@ if __name__ == "__main__":
    data = [["这个宾馆比较陈旧了，特价的房间也很一般。总体来说一般"], ["交通方便；环境很好；服务态度很好 房间较小"],
            ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]

-    index = 0
-    run_states = ernie_tiny.predict(data=data)
-    results = [run_state.run_results for run_state in run_states]
-    for batch_result in results:
-        # get predict index
-        batch_result = np.argmax(batch_result, axis=2)[0]
-        for result in batch_result:
-            print("%s\tpredict=%s" % (data[index][0], result))
-            index += 1
+    predictions = ernie_tiny.predict(data=data)
+    for index, text in enumerate(data):
+        print("%s\tpredict=%s" % (text[0], predictions[index]))
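
Note on the `predict` change: the new body collapses the per-batch outputs into one flat list of label indices, so callers no longer handle `run_states` themselves. A dependency-free sketch of the same flattening, assuming each batch result is shaped `[1][batch_size][num_classes]` (which is what `np.argmax(batch_result, axis=2)[0]` implies). Plain lists stand in for NumPy arrays, and `flatten_predictions` is a hypothetical name.

```python
def argmax(scores):
    """Index of the largest score (first one wins on ties, like np.argmax)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def flatten_predictions(batch_results):
    """Collapse per-batch class scores into one flat list of label indices."""
    prediction = []
    for batch_result in batch_results:
        # Taking index 0 first and then the argmax over the class axis gives
        # the same values as np.argmax(batch_result, axis=2)[0].
        prediction += [argmax(scores) for scores in batch_result[0]]
    return prediction


# Two made-up batches (2 samples, then 1 sample) with 2 classes each.
batch_results = [
    [[[0.9, 0.1], [0.2, 0.8]]],
    [[[0.6, 0.4]]],
]
predictions = flatten_predictions(batch_results)
print(predictions)
```

Returning plain Python integers also keeps the result JSON-serializable, which is presumably why the diff calls `tolist()` before the method is exposed through `@serving`.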
--- a/docs/pretrained_models.md
+++ b/docs/pretrained_models.md
--- a/docs/reference/config.md
+++ b/docs/reference/config.md
@@ -8,8 +8,8 @@
 hub.RunConfig(
    log_interval=10,
    eval_interval=100,
-    use_pyreader=False,
-    use_data_parallel=False,
+    use_pyreader=True,
+    use_data_parallel=True,
    save_ckpt_interval=None,
    use_cuda=False,
    checkpoint_dir=None,
@@ -22,8 +22,8 @@ hub.RunConfig(

 * `log_interval`: 打印训练日志的周期，默认为10。
 * `eval_interval`: 进行评估的周期，默认为100。
-* `use_pyreader`: 是否使用pyreader，默认False。
-* `use_data_parallel`: 是否使用并行计算，默认False。打开该功能依赖nccl库。
+* `use_pyreader`: 是否使用pyreader，默认True。
+* `use_data_parallel`: 是否使用并行计算，默认True。打开该功能依赖nccl库。
 * `save_ckpt_interval`: 保存checkpoint的周期，默认为None。
 * `use_cuda`: 是否使用GPU训练和评估，默认为False。
 * `checkpoint_dir`: checkpoint的保存目录，默认为None，此时会在工作目录下根据时间戳生成一个临时目录。
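
Note on the default change: `use_pyreader` and `use_data_parallel` now default to True, and the documentation above states that data parallelism depends on the nccl library. A script that needs the previous behavior could pass both flags explicitly. The sketch below only collects keyword arguments named in this document. The helper `legacy_run_config_kwargs` is hypothetical, and the `hub.RunConfig` call is left in a comment so the snippet runs without PaddleHub installed.

```python
def legacy_run_config_kwargs(**overrides):
    """Keyword arguments reproducing the pre-change RunConfig defaults."""
    kwargs = {
        "log_interval": 10,
        "eval_interval": 100,
        "use_pyreader": False,       # new default is True
        "use_data_parallel": False,  # new default is True; True relies on nccl
        "use_cuda": False,
    }
    kwargs.update(overrides)
    return kwargs


kwargs = legacy_run_config_kwargs(use_cuda=True)
print(kwargs)
# With PaddleHub installed, these would be passed as:
#   import paddlehub as hub
#   config = hub.RunConfig(**kwargs)
```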

--- a/docs/reference/task/base_task.md
+++ b/docs/reference/task/base_task.md
@@ -169,15 +169,6 @@ import paddlehub as hub
 task.predict()
 ```

-## Func `predict`
-根据config配置进行predict
-
-**示例**
-```python
-import paddlehub as hub
-...
-task.predict()
-```

 ## Property `is_train_phase`
 判断是否处于训练阶段

--- a/docs/tutorial/define_task_example.md
+++ b/docs/tutorial/define_task_example.md
+# 如何修改Task中的模型网络
+
+在实际应用中，用户常常需要更换迁移网络结构，以调整模型在数据集上的性能。在[如何自定义Task](./how_to_define_task.md)的基础上，本教程以序列标注任务为例，展示如何修改Task中的默认网络。
+SequenceLabelTask提供了两种网络选择，一种是FC网络，一种是FC+CRF网络。
+
+如果想在此基础上添加LSTM网络，组成序列标注任务常用的BiLSTM+CRF网络结构，则需要定义一个Task，继承自SequenceLabelTask，并改写其中的_build_net()方法。
+
+下方代码示例实现了一个BiLSTM+CRF网络：
+
+```python
+# 示例依赖的import（路径为基于PaddleHub 1.6代码结构的假设）
+import paddle
+import paddle.fluid as fluid
+from paddlehub.common.utils import version_compare
+from paddlehub.finetune.task import SequenceLabelTask
+
+
+class SequenceLabelTask_BiLSTMCRF(SequenceLabelTask):
+    def _build_net(self):
+        """
+        自定义序列标注迁移网络结构BiLSTM+CRF
+        """
+        self.seq_len = fluid.layers.data(
+            name="seq_len", shape=[1], dtype='int64', lod_level=0)
+
+        if version_compare(paddle.__version__, "1.6"):
+            self.seq_len_used = fluid.layers.squeeze(self.seq_len, axes=[1])
+        else:
+            self.seq_len_used = self.seq_len
+
+        if self.add_crf:
+            # 迁移网络为BiLSTM+CRF
+
+            # 去padding
+            unpad_feature = fluid.layers.sequence_unpad(
+                self.feature, length=self.seq_len_used)
+
+            # bilstm层
+            hid_dim = 128
+            fc0 = fluid.layers.fc(input=unpad_feature, size=hid_dim * 4)
+            rfc0 = fluid.layers.fc(input=unpad_feature, size=hid_dim * 4)
+            lstm_h, c = fluid.layers.dynamic_lstm(
+                input=fc0, size=hid_dim * 4, is_reverse=False)
+            rlstm_h, c = fluid.layers.dynamic_lstm(
+                input=rfc0, size=hid_dim * 4, is_reverse=True)
+            # 拼接lstm
+            lstm_concat = fluid.layers.concat(input=[lstm_h, rlstm_h], axis=1)
+
+            self.emission = fluid.layers.fc(
+                size=self.num_classes,
+                input=lstm_concat,
+                param_attr=fluid.ParamAttr(
+                    initializer=fluid.initializer.Uniform(low=-0.1, high=0.1),
+                    regularizer=fluid.regularizer.L2DecayRegularizer(
+                        regularization_coeff=1e-4)))
+            size = self.emission.shape[1]
+            fluid.layers.create_parameter(
+                shape=[size + 2, size], dtype=self.emission.dtype, name='crfw')
+            # CRF层
+            self.ret_infers = fluid.layers.crf_decoding(
+                input=self.emission, param_attr=fluid.ParamAttr(name='crfw'))
+            ret_infers = fluid.layers.assign(self.ret_infers)
+            # 返回预测值，list类型
+            return [ret_infers]
+        else:
+            # 迁移网络为FC
+            self.logits = fluid.layers.fc(
+                input=self.feature,
+                size=self.num_classes,
+                num_flatten_dims=2,
+                param_attr=fluid.ParamAttr(
+                    name="cls_seq_label_out_w",
+                    initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
+                bias_attr=fluid.ParamAttr(
+                    name="cls_seq_label_out_b",
+                    initializer=fluid.initializer.Constant(0.)))
+
+            self.ret_infers = fluid.layers.reshape(
+                x=fluid.layers.argmax(self.logits, axis=2), shape=[-1, 1])
+
+            logits = self.logits
+            logits = fluid.layers.flatten(logits, axis=2)
+            logits = fluid.layers.softmax(logits)
+            self.num_labels = logits.shape[1]
+            # 返回预测值，list类型
+            return [logits]
+```
+
+以上代码通过继承PaddleHub已经内置的Task，改写其中_build_net方法即可实现自定义迁移网络结构。
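
Note on the tutorial: the approach depends on the base task calling `_build_net` itself, so a subclass only has to replace that one hook. A toy illustration of the pattern with no PaddlePaddle dependency. `ToyTask`, `ToyBiLSTMCRFTask`, and the string "layers" are stand-ins invented for this sketch, not PaddleHub classes.

```python
class ToyTask:
    """Stand-in for a built-in task: the base class drives the build."""

    def __init__(self, add_crf=False):
        self.add_crf = add_crf

    def _build_net(self):
        # Default transfer head: FC, optionally followed by CRF.
        return ["fc", "crf"] if self.add_crf else ["fc"]

    def build(self):
        # Callers use build(); subclasses customise only _build_net().
        return ["pretrained_feature"] + self._build_net()


class ToyBiLSTMCRFTask(ToyTask):
    """Overrides the hook to put a BiLSTM in front of the CRF layer."""

    def _build_net(self):
        if self.add_crf:
            return ["bilstm", "fc", "crf"]
        return super()._build_net()


layers = ToyBiLSTMCRFTask(add_crf=True).build()
print(layers)
```

Falling back to `super()._build_net()` when `add_crf` is off keeps the default FC branch in one place, whereas the tutorial code above copies that branch into the subclass.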
--- a/docs/tutorial/finetuned_model_to_module.md
+++ b/docs/tutorial/finetuned_model_to_module.md
@@ -148,7 +148,9 @@ def _initialize(self,

 初始化过程即为Fine-tune时创建Task的过程。

-**NOTE:** 执行类的初始化不能使用默认的__init__接口，而是应该重载实现_initialize接口。对象默认内置了directory属性，可以直接获取到Module所在路径
+**NOTE:**
+1. 执行类的初始化不能使用默认的__init__接口，而是应该重载实现_initialize接口。对象默认内置了directory属性，可以直接获取到Module所在路径。
+2. 使用Fine-tune保存的模型预测时，无需加载数据集Dataset，即Reader中的dataset参数可为None。

 #### step 3_4. 完善预测逻辑
 ```python
@@ -160,7 +162,14 @@ def predict(self, data, return_result=False, accelerate_mode=True):
        data=data,
        return_result=return_result,
        accelerate_mode=accelerate_mode)
-    return run_states
+    results = [run_state.run_results for run_state in run_states]
+    prediction = []
+    for batch_result in results:
+        # get predict index
+        batch_result = np.argmax(batch_result, axis=2)[0]
+        batch_result = batch_result.tolist()
+        prediction += batch_result
+    return prediction
 ```

 #### step 3_5. 支持serving调用
@@ -179,7 +188,14 @@ def predict(self, data, return_result=False, accelerate_mode=True):
        data=data,
        return_result=return_result,
        accelerate_mode=accelerate_mode)
-    return run_states
+    results = [run_state.run_results for run_state in run_states]
+    prediction = []
+    for batch_result in results:
+        # get predict index
+        batch_result = np.argmax(batch_result, axis=2)[0]
+        batch_result = batch_result.tolist()
+        prediction += batch_result
+    return prediction
 ```

 ### 完整代码
@@ -214,15 +230,9 @@ ernie_tiny = hub.Module(name="ernie_tiny_finetuned")
 data = [["这个宾馆比较陈旧了，特价的房间也很一般。总体来说一般"], ["交通方便；环境很好；服务态度很好 房间较小"],
        ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]

-index = 0
-run_states = ernie_tiny.predict(data=data)
-results = [run_state.run_results for run_state in run_states]
-for batch_result in results:
-    # get predict index
-    batch_result = np.argmax(batch_result, axis=2)[0]
-    for result in batch_result:
-        print("%s\tpredict=%s" % (data[index][0], result))
-        index += 1
+predictions = ernie_tiny.predict(data=data)
+for index, text in enumerate(data):
+    print("%s\tpredict=%s" % (text[0], predictions[index]))
 ```

 ### 调用方法2
@@ -238,15 +248,9 @@ ernie_tiny_finetuned = hub.Module(directory="finetuned_model_to_module/")
 data = [["这个宾馆比较陈旧了，特价的房间也很一般。总体来说一般"], ["交通方便；环境很好；服务态度很好 房间较小"],
        ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]

-index = 0
-run_states = ernie_tiny.predict(data=data)
-results = [run_state.run_results for run_state in run_states]
-for batch_result in results:
-    # get predict index
-    batch_result = np.argmax(batch_result, axis=2)[0]
-    for result in batch_result:
-        print("%s\tpredict=%s" % (data[index][0], result))
-        index += 1
+predictions = ernie_tiny.predict(data=data)
+for index, text in enumerate(data):
+    print("%s\tpredict=%s" % (text[0], predictions[index]))
 ```

 ### 调用方法3
@@ -263,13 +267,42 @@ import numpy as np
 data = [["这个宾馆比较陈旧了，特价的房间也很一般。总体来说一般"], ["交通方便；环境很好；服务态度很好 房间较小"],
        ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]

-run_states = ERNIETinyFinetuned.predict(data=data)
-index = 0
-results = [run_state.run_results for run_state in run_states]
-for batch_result in results:
-    # get predict index
-    batch_result = np.argmax(batch_result, axis=2)[0]
-    for result in batch_result:
-        print("%s\tpredict=%s" % (data[index][0], result))
-        index += 1
+predictions = ERNIETinyFinetuned.predict(data=data)
+for index, text in enumerate(data):
+    print("%s\tpredict=%s" % (text[0], predictions[index]))
 ```
+
+
+### PaddleHub Serving调用方法
+
+**第一步:启动预测服务**
+
+```shell
+hub serving start -m ernie_tiny_finetuned
+```
+
+**第二步:发送请求，获取预测结果**
+
+通过如下脚本即可发送请求：
+```python
+# coding: utf8
+import requests
+import json
+
+
+# 待预测文本
+texts = [["这个宾馆比较陈旧了，特价的房间也很一般。总体来说一般"], ["交通方便；环境很好；服务态度很好 房间较小"],
+        ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]
+# key为'data', 对应着预测接口predict的参数data
+data = {'data': texts}
+
+# 指定模型为ernie_tiny_finetuned并发送post请求，且请求的headers为application/json方式
+url = "http://127.0.0.1:8866/predict/ernie_tiny_finetuned"
+headers = {"Content-Type": "application/json"}
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+# 打印预测结果
+print(json.dumps(r.json(), indent=4, ensure_ascii=False))
+```
+
+关于PaddleHub Serving的更多信息，参见[Hub Serving教程](../../docs/tutorial/serving.md)以及[Demo](../../demo/serving)
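
Note on the request script: its comments state that the JSON key `data` corresponds to the `data` parameter of `predict`. A minimal sketch of that mapping as a JSON round trip followed by keyword unpacking. `fake_predict` and `handle_request` are hypothetical stand-ins; how PaddleHub Serving dispatches requests internally is not shown in this diff and is assumed here.

```python
import json


def fake_predict(data, return_result=False, accelerate_mode=True):
    """Stand-in for the module's predict(); returns one dummy label per text."""
    return [0 for _ in data]


def handle_request(body, predict_fn):
    """Decode a JSON body and pass its keys as keyword arguments."""
    kwargs = json.loads(body)
    return predict_fn(**kwargs)


texts = [["交通方便；环境很好；服务态度很好 房间较小"], ["今天天气真好"]]
body = json.dumps({"data": texts})
result = handle_request(body, fake_predict)
print(result)
```

Under this assumption, a key that does not match any parameter of `predict` would raise a `TypeError`, so the payload keys need to match the method signature.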
--- a/docs/tutorial/how_to_load_data.md
+++ b/docs/tutorial/how_to_load_data.md
@@ -22,6 +22,7 @@

 如果您有两个输入文本text_a、text_b，则第一列为第一个输入文本text_a, 第二列应为第二个输入文本text_b，第三列文本类别label。列与列之间以Tab键分隔。数据集第一行为`text_a    text_b    label`（中间以Tab键分隔）。

+
 ```text
 text_a    label
 15.4寸笔记本的键盘确实爽，基本跟台式机差不多了，蛮喜欢数字小键盘，输数字特方便，样子也很美观，做工也相当不错    1
@@ -36,6 +37,7 @@ text_a    label
 * 数据集文件编码格式建议为utf8格式。
 * 如果相应的数据集文件没有上述的列说明，如train.tsv文件没有第一行的`text_a    label`，则train_file_with_header=False。
 * 如果您还有预测数据（没有文本类别），可以将预测数据存放在predict.tsv文件，文件格式和train.tsv类似。去掉label一列即可。
+* 分类任务中，数据集的label必须从0开始计数


 ```python
@@ -117,6 +119,7 @@ dog
 * 训练/验证/测试集的数据列表文件中的图片路径需要相对于dataset_dir的相对路径，例如图片的实际位置为`/test/data/dog/dog1.jpg`。base_path为`/test/data`，则文件中填写的路径应该为`dog/dog1.jpg`。
 * 如果您还有预测数据（没有文本类别），可以将预测数据存放在predict_list.txt文件，文件格式和train_list.txt类似。去掉label一列即可
 * 如果您的数据集类别较少，可以不用定义label_list.txt，可以选择定义label_list=["数据集所有类别"]。
+* 分类任务中，数据集的label必须从0开始计数
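
Note on the new requirement that classification labels start from 0: a dataset whose labels start at 1, or are strings, can be remapped before the list files are written. A minimal sketch; the helper `remap_labels` and the sample values are made up for illustration and are not part of PaddleHub.

```python
def remap_labels(labels):
    """Map arbitrary labels to contiguous integer ids starting at 0."""
    label_list = sorted(set(labels))
    label_to_id = {label: index for index, label in enumerate(label_list)}
    return [label_to_id[label] for label in labels], label_list


raw_labels = [1, 2, 2, 3, 1]  # made-up labels that start at 1
ids, label_list = remap_labels(raw_labels)
print(ids)
print(label_list)
```

The returned `label_list` preserves the original label for each id, so predictions can be translated back after inference.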

 ```python
 from paddlehub.dataset.base_cv_dataset import BaseCVDataset

--- a/paddlehub/__init__.py
+++ b/paddlehub/__init__.py
@@ -38,6 +38,7 @@ from .common.logger import logger
 from .common.paddle_helper import connect_program
 from .common.hub_server import HubServer
 from .common.hub_server import server_check
+from .common.downloader import download, ResourceNotFoundError, ServerConnectionError

 from .module.module import Module
 from .module.base_processor import BaseProcessor

--- a/paddlehub/common/downloader.py
+++ b/paddlehub/common/downloader.py
-#coding:utf-8
+# coding:utf-8
 # Copyright (c) 2019  PaddlePaddle Authors. All Rights Reserved.
 #
-# Licensed under the Apache License, Version 2.0 (the "License"
+# Licensed under the Apache License, Version 2.0 (the 'License'
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
+# distributed under the License is distributed on an 'AS IS' BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
@@ -28,6 +28,8 @@ import tarfile

 from paddlehub.common import utils
 from paddlehub.common.logger import logger
+from paddlehub.common import tmp_dir
+import paddlehub as hub

 __all__ = ['Downloader', 'progress']
 FLUSH_INTERVAL = 0.1
@@ -38,10 +40,10 @@ lasttime = time.time()
 def progress(str, end=False):
    global lasttime
    if end:
-        str += "\n"
+        str += '\n'
        lasttime = 0
    if time.time() - lasttime >= FLUSH_INTERVAL:
-        sys.stdout.write("\r%s" % str)
+        sys.stdout.write('\r%s' % str)
        lasttime = time.time()
        sys.stdout.flush()

@@ -67,7 +69,7 @@ class Downloader(object):
            if retry_times < retry_limit:
                retry_times += 1
            else:
-                tips = "Cannot download {0} within retry limit {1}".format(
+                tips = 'Cannot download {0} within retry limit {1}'.format(
                    url, retry_limit)
                return False, tips, None
            r = requests.get(url, stream=True)
@@ -82,19 +84,19 @@ class Downloader(object):
                    total_length = int(total_length)
                    starttime = time.time()
                    if print_progress:
-                        print("Downloading %s" % save_name)
+                        print('Downloading %s' % save_name)
                    for data in r.iter_content(chunk_size=4096):
                        dl += len(data)
                        f.write(data)
                        if print_progress:
                            done = int(50 * dl / total_length)
                            progress(
-                                "[%-50s] %.2f%%" %
+                                '[%-50s] %.2f%%' %
                                ('=' * done, float(dl / total_length * 100)))
                if print_progress:
-                    progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
+                    progress('[%-50s] %.2f%%' % ('=' * 50, 100), end=True)

-        tips = "File %s download completed!" % (file_name)
+        tips = 'File %s download completed!' % (file_name)
        return True, tips, file_name

    def uncompress(self,
@@ -104,24 +106,25 @@ class Downloader(object):
                   print_progress=False):
        dirname = os.path.dirname(file) if dirname is None else dirname
        if print_progress:
-            print("Uncompress %s" % file)
-        with tarfile.open(file, "r:gz") as tar:
+            print('Uncompress %s' % file)
+
+        with tarfile.open(file, 'r:*') as tar:
            file_names = tar.getnames()
            size = len(file_names) - 1
            module_dir = os.path.join(dirname, file_names[0])
            for index, file_name in enumerate(file_names):
                if print_progress:
                    done = int(50 * float(index) / size)
-                    progress("[%-50s] %.2f%%" % ('=' * done,
+                    progress('[%-50s] %.2f%%' % ('=' * done,
                                                 float(index / size * 100)))
                tar.extract(file_name, dirname)

            if print_progress:
-                progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
+                progress('[%-50s] %.2f%%' % ('=' * 50, 100), end=True)
        if delete_file:
            os.remove(file)

-        return True, "File %s uncompress completed!" % file, module_dir
+        return True, 'File %s uncompress completed!' % file, module_dir

    def download_file_and_uncompress(self,
                                     url,
@@ -147,8 +150,62 @@ class Downloader(object):
        if save_name:
            save_name = os.path.join(save_path, save_name)
            shutil.move(file, save_name)
-            return result, "%s\n%s" % (tips_1, tips_2), save_name
-        return result, "%s\n%s" % (tips_1, tips_2), file
+            return result, '%s\n%s' % (tips_1, tips_2), save_name
+        return result, '%s\n%s' % (tips_1, tips_2), file


 default_downloader = Downloader()
+
+
+class ResourceNotFoundError(Exception):
+    def __init__(self, name, version=None):
+        self.name = name
+        self.version = version
+
+    def __str__(self):
+        if self.version:
+            tips = 'No resource named {}-{} was found'.format(
+                self.name, self.version)
+        else:
+            tips = 'No resource named {} was found'.format(self.name)
+        return tips
+
+
+class ServerConnectionError(Exception):
+    def __str__(self):
+        tips = 'Can\'t connect to Hub Server:{}'.format(
+            hub.HubServer().server_url[0])
+        return tips
+
+
+def download(name,
+             save_path,
+             version=None,
+             decompress=True,
+             resource_type='Model',
+             extra=None):
+    file = os.path.join(save_path, name)
+    file = os.path.realpath(file)
+    if os.path.exists(file):
+        return
+
+    if not hub.HubServer()._server_check():
+        raise ServerConnectionError
+
+    search_result = hub.HubServer().get_resource_url(
+        name, resource_type=resource_type, version=version, extra=extra)
+
+    if not search_result:
+        raise ResourceNotFoundError(name, version)
+
+    url = search_result['url']
+
+    with tmp_dir() as _dir:
+        if not os.path.exists(save_path):
+            os.makedirs(save_path)
+        _, _, savefile = default_downloader.download_file(
+            url=url, save_path=_dir, print_progress=True)
+        if tarfile.is_tarfile(savefile) and decompress:
+            _, _, savefile = default_downloader.uncompress(
+                file=savefile, print_progress=True)
+        shutil.move(savefile, file)
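
Note on the `uncompress` change from `'r:gz'` to `'r:*'`: with `'r:*'`, the standard `tarfile` module detects the compression by itself, so the same code path handles gzip, bzip2, and uncompressed archives. A self-contained sketch using only the standard library. The archive and member names are made up.

```python
import io
import os
import tarfile
import tempfile


def make_archive(path, mode):
    """Write a one-file tar archive using the given write mode."""
    payload = b"hello"
    info = tarfile.TarInfo(name="module/readme.txt")
    info.size = len(payload)
    with tarfile.open(path, mode) as tar:
        tar.addfile(info, io.BytesIO(payload))


def list_members(path):
    """Open with 'r:*' so tarfile detects the compression on its own."""
    with tarfile.open(path, "r:*") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    names = {}
    for suffix, mode in ((".tar.gz", "w:gz"), (".tar.bz2", "w:bz2"), (".tar", "w")):
        archive = os.path.join(tmp, "demo" + suffix)
        make_archive(archive, mode)
        names[suffix] = list_members(archive)
print(names)
```

The `tarfile.open(module_package, "r:gz")` call that appears as a context line in `manager.py` is not changed by this diff, so local package installation still expects gzip archives.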
--- a/paddlehub/dataset/food101.py
+++ b/paddlehub/dataset/food101.py
@@ -25,11 +25,11 @@ from paddlehub.dataset.base_cv_dataset import BaseCVDataset

 class Food101Dataset(BaseCVDataset):
    def __init__(self):
-        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101",
-                                    "images")
-        base_path = self._download_dataset(
+        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101")
+        dataset_path = self._download_dataset(
            dataset_path=dataset_path,
            url="https://bj.bcebos.com/paddlehub-dataset/Food101.tar.gz")
+        base_path = os.path.join(dataset_path, "images")
        super(Food101Dataset, self).__init__(
            base_path=base_path,
            train_list_file="train_list.txt",

--- a/paddlehub/module/manager.py
+++ b/paddlehub/module/manager.py
@@ -96,8 +96,10 @@ class LocalModuleManager(object):
        for sub_dir_name in os.listdir(self.local_modules_dir):
            sub_dir_path = os.path.join(self.local_modules_dir, sub_dir_name)
            if os.path.isdir(sub_dir_path):
-                if "-" in sub_dir_path:
-                    new_sub_dir_path = sub_dir_path.replace("-", "_")
+                if "-" in sub_dir_name:
+                    sub_dir_name = sub_dir_name.replace("-", "_")
+                    new_sub_dir_path = os.path.join(self.local_modules_dir,
+                                                    sub_dir_name)
                    shutil.move(sub_dir_path, new_sub_dir_path)
                    sub_dir_path = new_sub_dir_path
                valid, info = self.check_module_valid(sub_dir_path)
@@ -180,11 +182,13 @@ class LocalModuleManager(object):
                with tarfile.open(module_package, "r:gz") as tar:
                    file_names = tar.getnames()
                    size = len(file_names) - 1
-                    module_dir = os.path.join(_dir, file_names[0])
+                    module_name = file_names[0]
+                    module_dir = os.path.join(_dir, module_name)
                    for index, file_name in enumerate(file_names):
                        tar.extract(file_name, _dir)
-                    if "-" in module_dir:
-                        new_module_dir = module_dir.replace("-", "_")
+                    if "-" in module_name:
+                        module_name = module_name.replace("-", "_")
+                        new_module_dir = os.path.join(_dir, module_name)
                        shutil.move(module_dir, new_module_dir)
                        module_dir = new_module_dir
                    module_name = hub.Module(directory=module_dir).name
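
Note on the `manager.py` fix: the old code replaced `-` anywhere in the full path, so a hyphen in a parent directory was rewritten too and the move target pointed into a directory that may not exist. The new code renames only the last path component. A small sketch of the difference, using `posixpath` so the result is the same on every platform. The example path is hypothetical.

```python
import posixpath


def rename_whole_path(path):
    """Old behaviour: replace '-' anywhere in the full path."""
    return path.replace("-", "_")


def rename_last_component(path):
    """New behaviour: replace '-' only in the final directory name."""
    parent, name = posixpath.split(path)
    return posixpath.join(parent, name.replace("-", "_"))


# Hypothetical module directory whose parent also contains a hyphen.
path = "/home/my-user/.paddlehub/modules/ernie-tiny"
old_target = rename_whole_path(path)
new_target = rename_last_component(path)
print(old_target)
print(new_target)
```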

--- a/paddlehub/reader/cv_reader.py
+++ b/paddlehub/reader/cv_reader.py
@@ -165,7 +165,7 @@ class ImageClassificationReader(BaseReader):
                for image_path, label in data:
                    image = preprocess(image_path)
                    images.append(image.astype('float32'))
-                    labels.append([int(label)])
+                    labels.append([np.int64(label)])

                    if len(images) == batch_size:
                        if return_list:

--- a/paddlehub/version.py
+++ b/paddlehub/version.py
@@ -13,5 +13,5 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """ PaddleHub version string """
-hub_version = "1.6.0"
+hub_version = "1.6.2"
 module_proto_version = "1.0.0"