Commit a6fb82c2, authored by wuzewu

add API docs

Parent commit: 8b1f07ac
----
# RunConfig
----
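In PaddleHub, RunConfig represents the run configuration used when fine-tuning a [Task](https://github.com/PaddlePaddle/PaddleHub/tree/develop/docs/API/Task.md). It covers settings such as the number of epochs to run, the batch size, and whether to train on a GPU.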
## `class paddlehub.finetune.config.RunConfig(log_interval=10, eval_interval=100, save_ckpt_interval=None, use_cuda=False, checkpoint_dir=None, num_epoch=10, batch_size=None, enable_memory_optim=True, strategy=None)`
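> ### Parameters
> * log_interval: Interval at which training logs are printed. Defaults to 10
>
> * eval_interval: Interval at which evaluation runs. Defaults to 100
>
> * save_ckpt_interval: Interval at which checkpoints are saved. Defaults to None
>
> * use_cuda: Whether to use a GPU for training and evaluation. Defaults to False
>
> * checkpoint_dir: Directory where checkpoints are saved. Defaults to None, in which case a temporary directory named after the current timestamp is created under the working directory
>
> * num_epoch: Number of epochs to run. Defaults to 10
>
> * batch_size: Batch size. Defaults to None
>
> * enable_memory_optim: Whether to enable memory optimization. Defaults to True
>
> * strategy: Fine-tuning strategy. Defaults to None, in which case DefaultFinetuneStrategy is used
>
> ### Returns
> RunConfig
>
> ### Example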
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig(
>     use_cuda=True,
>     num_epoch=10,
>     batch_size=32)
> ```
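
The three interval parameters can be read as step counters. The sketch below is only an illustration: it assumes each action fires whenever the global step is a multiple of its interval, and that a `None` interval means no step-based trigger. `due_actions` is a hypothetical helper, not part of PaddleHub, and the real trigger logic may differ.

```python
def due_actions(step, log_interval=10, eval_interval=100, save_ckpt_interval=None):
    """Return the actions due at a 1-based training step (assumed modulo rule)."""
    actions = []
    if log_interval and step % log_interval == 0:
        actions.append("log")
    if eval_interval and step % eval_interval == 0:
        actions.append("eval")
    if save_ckpt_interval and step % save_ckpt_interval == 0:
        actions.append("save_ckpt")
    return actions


if __name__ == "__main__":
    # Show which actions would be due at a few sample steps.
    for step in (10, 50, 100):
        print(step, due_actions(step, save_ckpt_interval=50))
```

Under these assumptions, a smaller `log_interval` gives denser logs at no extra evaluation cost, while `eval_interval` controls how often the more expensive evaluation pass runs.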
## `log_interval`
Reads the `log_interval` property set on the RunConfig
> ### Example
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig()
> log_interval = config.log_interval
> ```
## `eval_interval`
Reads the `eval_interval` property set on the RunConfig
> ### Example
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig()
> eval_interval = config.eval_interval
> ```
## `save_ckpt_interval`
Reads the `save_ckpt_interval` property set on the RunConfig
> ### Example
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig()
> save_ckpt_interval = config.save_ckpt_interval
> ```
## `use_cuda`
Reads the `use_cuda` property set on the RunConfig
> ### Example
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig()
> use_cuda = config.use_cuda
> ```
## `checkpoint_dir`
Reads the `checkpoint_dir` property set on the RunConfig
> ### Example
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig()
> checkpoint_dir = config.checkpoint_dir
> ```
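
When `checkpoint_dir` is `None`, a directory name is derived from a timestamp. The following is a minimal sketch of that idea. The `ckpt_` prefix and the helper name `default_checkpoint_dir` are assumptions made for illustration; the naming scheme PaddleHub actually uses is not specified in this document.

```python
import os
import time


def default_checkpoint_dir(checkpoint_dir=None, now=None):
    """Return checkpoint_dir, or a timestamped path under the working directory."""
    if checkpoint_dir is not None:
        return checkpoint_dir
    # Hypothetical naming scheme: "ckpt_" followed by the Unix timestamp in seconds.
    timestamp = int(time.time() if now is None else now)
    return os.path.join(os.getcwd(), "ckpt_%d" % timestamp)
```

Passing an explicit `checkpoint_dir` is what makes a later run able to find the same directory again, which a timestamped default cannot guarantee.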
## `num_epoch`
Reads the `num_epoch` property set on the RunConfig
> ### Example
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig()
> num_epoch = config.num_epoch
> ```
## `batch_size`
Reads the `batch_size` property set on the RunConfig
> ### Example
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig()
> batch_size = config.batch_size
> ```
## `strategy`
Reads the `strategy` property set on the RunConfig
> ### Example
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig()
> strategy = config.strategy
> ```
## `enable_memory_optim`
Reads the `enable_memory_optim` property set on the RunConfig
> ### Example
>
> ```python
> import paddlehub as hub
>
> config = hub.RunConfig()
> enable_memory_optim = config.enable_memory_optim
> ```
----
# Strategy
----
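In PaddleHub, a Strategy defines how a [Task](https://github.com/PaddlePaddle/PaddleHub/tree/develop/docs/API/Task.md) is fine-tuned: which learning rate is applied to the pretrained parameters, which type of optimizer is used, which type of regularization is applied, and so on.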
## `class paddlehub.finetune.strategy.AdamWeightDecayStrategy(learning_rate=1e-4, lr_scheduler="linear_warmup_decay", warmup_proportion=0.0, weight_decay=0.01, optimizer_name=None)`
A learning rate decay strategy based on the Adam optimizer
> ### Parameters
> * learning_rate: Global learning rate. Defaults to 1e-4
>
> * lr_scheduler: Learning rate scheduling method. Defaults to "linear_warmup_decay"
>
> * warmup_proportion: Proportion of training used for warmup. Defaults to 0.0
>
> * weight_decay: Weight decay rate. Defaults to 0.01
>
> * optimizer_name: Optimizer name. Defaults to None, in which case Adam is used
>
> ### Returns
> AdamWeightDecayStrategy
>
> ### Example
>
> ```python
> ...
> strategy = hub.AdamWeightDecayStrategy()
>
> config = hub.RunConfig(
>     use_cuda=True,
>     num_epoch=10,
>     batch_size=32,
>     checkpoint_dir="hub_finetune_ckpt",
>     strategy=strategy)
> ```
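
To make `lr_scheduler="linear_warmup_decay"` and `warmup_proportion` more concrete, here is a small sketch of one common form of such a schedule. It assumes the rate ramps linearly from 0 to `base_lr` over the first `warmup_proportion` of the steps and then falls linearly to 0. The function and its `total_steps` argument are illustrative only; PaddleHub's own scheduler may use a different formula.

```python
def linear_warmup_decay(step, total_steps, base_lr=1e-4, warmup_proportion=0.1):
    """Learning rate at a 0-based step under an assumed warmup-then-decay schedule."""
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: fall linearly from base_lr (at warmup_steps) to 0 (at total_steps).
    decay_steps = max(total_steps - warmup_steps, 1)
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / decay_steps


if __name__ == "__main__":
    for step in (0, 5, 10, 55, 100):
        print(step, linear_warmup_decay(step, total_steps=100))
```

With the default `warmup_proportion=0.0` the warmup branch is never taken, so in this sketch the schedule reduces to a plain linear decay.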
## `class paddlehub.finetune.strategy.DefaultFinetuneStrategy(learning_rate=1e-4, optimizer_name=None, regularization_coeff=1e-3)`
The default fine-tuning strategy. It adds L2 regularization on the pretrained parameters as a penalty term
> ### Parameters
> * learning_rate: Global learning rate. Defaults to 1e-4
>
> * optimizer_name: Optimizer name. Defaults to None, in which case Adam is used
>
> * regularization_coeff: The regularization coefficient λ. Defaults to 1e-3
>
> ### Returns
> DefaultFinetuneStrategy
>
> ### Example
>
> ```python
> ...
> strategy = hub.DefaultFinetuneStrategy()
>
> config = hub.RunConfig(
>     use_cuda=True,
>     num_epoch=10,
>     batch_size=32,
>     checkpoint_dir="hub_finetune_ckpt",
>     strategy=strategy)
> ```
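
The role of `regularization_coeff` can be shown with plain arithmetic. The sketch below assumes the penalty takes the common form 0.5 · λ · Σ w² and is simply added to the task loss. The function name and the flat list of parameter values are hypothetical simplifications, and the exact penalty PaddleHub applies may differ (some libraries omit the 0.5 factor).

```python
def l2_penalized_loss(task_loss, pretrained_params, regularization_coeff=1e-3):
    """Add an L2 penalty over the pretrained parameter values to a task loss."""
    # Assumed form: 0.5 * lambda * sum(w ** 2).
    penalty = 0.5 * regularization_coeff * sum(w * w for w in pretrained_params)
    return task_loss + penalty


if __name__ == "__main__":
    print(l2_penalized_loss(1.0, [1.0, 2.0], regularization_coeff=0.1))
```

A larger coefficient pulls the fine-tuned weights more strongly toward small values, which trades some fit on the new data for less drift in the pretrained parameters.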
----
# Task
----
In PaddleHub, a Task represents a fine-tuning task. It holds the programs needed to run the task, along with task-related metrics (such as accuracy and F1 score), the loss, and so on.
## `class paddlehub.finetune.Task(task_type, graph_var_dict, main_program, startup_program)`
> ### Parameters
> * task_type: Task type, used during fine-tuning to decide how the task is executed
>
> * graph_var_dict: Variable mapping table that provides the task's metrics
>
> * main_program: The Program that stores the model's computation graph
>
> * startup_program: The Program that stores the ops that initialize the model parameters
>
> ### Returns
> Task
>
> ### Example
>
> ```python
> import paddlehub as hub
> import paddle.fluid as fluid
>
> # Create a Module from the model name
> resnet = hub.Module(name="resnet_v2_50_imagenet")
> input_dict, output_dict, program = resnet.context(trainable=True)
> with fluid.program_guard(program):
>     label = fluid.layers.data(name="label", dtype="int64", shape=[1])
>     feature_map = output_dict["feature_map"]
>     task = hub.create_img_cls_task(
>         feature=feature_map, label=label, num_classes=2)
> ```
## `variable(var_name)`
Gets a variable associated with the Task
> ### Parameters
> * var_name: Variable name
>
> ### Example
>
> ```python
> import paddlehub as hub
> ...
> task = hub.create_img_cls_task(
>     feature=feature_map, label=label, num_classes=2)
> task.variable("loss")
> ```
## `main_program()`
Gets the main_program of the Task
> ### Example
>
> ```python
> import paddlehub as hub
> ...
> task = hub.create_img_cls_task(
>     feature=feature_map, label=label, num_classes=2)
> main_program = task.main_program()
> ```
## `startup_program()`
Gets the startup_program of the Task
> ### Example
>
> ```python
> import paddlehub as hub
> ...
> task = hub.create_img_cls_task(
>     feature=feature_map, label=label, num_classes=2)
> startup_program = task.startup_program()
> ```
## `inference_program()`
Gets the inference_program of the Task
> ### Example
>
> ```python
> import paddlehub as hub
> ...
> task = hub.create_img_cls_task(
>     feature=feature_map, label=label, num_classes=2)
> inference_program = task.inference_program()
> ```
## `metric_variable_names()`
Gets the names of all variables associated with the Task, including the loss and the metrics
> ### Example
>
> ```python
> import paddlehub as hub
> ...
> task = hub.create_img_cls_task(
>     feature=feature_map, label=label, num_classes=2)
> metric_variable_names = task.metric_variable_names()
> ```
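
The relationship between `graph_var_dict`, `variable(var_name)`, and `metric_variable_names()` can be pictured as a name-to-variable lookup. The class below is a toy stand-in written for this explanation, not PaddleHub code. It models only the mapping, uses plain strings in place of real Variables, and its error handling for unknown names is an assumption.

```python
class ToyTask:
    """A toy stand-in for Task that models only the variable mapping."""

    def __init__(self, task_type, graph_var_dict):
        self.task_type = task_type
        self.graph_var_dict = graph_var_dict

    def variable(self, var_name):
        # Look a variable up by name; fail loudly for unknown names (assumed behavior).
        if var_name not in self.graph_var_dict:
            raise KeyError("Unknown variable: %s" % var_name)
        return self.graph_var_dict[var_name]

    def metric_variable_names(self):
        # Names of all tracked variables, such as the loss and the metrics.
        return list(self.graph_var_dict)


if __name__ == "__main__":
    toy = ToyTask("img_cls", {"loss": "loss_var", "accuracy": "acc_var"})
    print(toy.variable("loss"), toy.metric_variable_names())
```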
----
# create_img_cls_task
----
## `method paddlehub.finetune.task.create_img_cls_task(feature, label, num_classes, hidden_units=None)`
Creates an [image classification](https://github.com/PaddlePaddle/PaddleHub/tree/develop/demo/image-classification) task for fine-tuning by adding one or more fully connected layers on top of the input feature
> ### Parameters
> * feature: Input feature
>
> * label: Label Variable
>
> * num_classes: Number of neurons in the last fully connected layer
>
> * hidden_units: Hidden layer configuration. Expected to be a Python list in which each element gives the number of neurons in one hidden layer
>
> ### Returns
> paddlehub.finetune.task.Task
>
> ### Example
>
> ```python
> import paddlehub as hub
> import paddle.fluid as fluid
>
> module = hub.Module(name="resnet_v2_50_imagenet")
> inputs, outputs, program = module.context(trainable=True)
>
> with fluid.program_guard(program):
>     label = fluid.layers.data(name="label", shape=[1], dtype='int64')
>     feature_map = outputs['feature_map']
>
>     cls_task = hub.create_img_cls_task(
>         feature=feature_map, label=label, num_classes=2, hidden_units=[20, 10])
> ```
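
The effect of `hidden_units` on the classification head can be traced with a few lines of plain Python. The helper below is hypothetical and only computes layer sizes; it assumes one fully connected layer per list entry followed by a final layer of `num_classes` neurons, as the parameter descriptions suggest. The feature size of 2048 is an illustrative value.

```python
def fc_layer_shapes(feature_dim, num_classes, hidden_units=None):
    """List the (input_size, output_size) of each fully connected layer in the head."""
    shapes = []
    in_dim = feature_dim
    # One hidden layer per entry in hidden_units, in order.
    for units in hidden_units or []:
        shapes.append((in_dim, units))
        in_dim = units
    # The final layer always maps to num_classes outputs.
    shapes.append((in_dim, num_classes))
    return shapes


if __name__ == "__main__":
    print(fc_layer_shapes(2048, 2, hidden_units=[20, 10]))
```

With `hidden_units=None` the head in this sketch is a single layer from the feature directly to the class scores.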
----
# create_seq_label_task
----
## `method paddlehub.finetune.task.create_seq_label_task(feature, labels, seq_len, num_classes)`
Creates a [sequence labeling](https://github.com/PaddlePaddle/PaddleHub/tree/develop/demo/sequence-labeling) task for fine-tuning by adding a fully connected layer on top of the input feature
> ### Parameters
> * feature: Input feature
>
> * labels: Label Variable
>
> * seq_len: Sequence length Variable
>
> * num_classes: Number of neurons in the fully connected layer
>
> ### Returns
> paddlehub.finetune.task.Task
>
> ### Example
>
> ```python
> import paddlehub as hub
> import paddle.fluid as fluid
>
> max_seq_len = 20
> module = hub.Module(name="ernie")
> inputs, outputs, program = module.context(
>     trainable=True, max_seq_len=max_seq_len)
>
> with fluid.program_guard(program):
>     label = fluid.layers.data(name="label", shape=[max_seq_len, 1], dtype='int64')
>     seq_len = fluid.layers.data(name="seq_len", shape=[1], dtype='int64')
>     sequence_output = outputs["sequence_output"]
>
>     # `dataset` is assumed to be a sequence labeling dataset created earlier
>     seq_label_task = hub.create_seq_label_task(
>         feature=sequence_output,
>         labels=label,
>         seq_len=seq_len,
>         num_classes=dataset.num_labels)
> ```
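
Because every label sequence in the example is padded to `max_seq_len`, a separate `seq_len` input is needed to tell real tokens from padding. The sketch below shows that idea on plain lists. `trim_padding` is a hypothetical helper, and how PaddleHub itself consumes `seq_len` is not described in this document.

```python
def trim_padding(batch_labels, seq_lens):
    """Keep only the first seq_len labels of each padded sequence."""
    return [labels[:length] for labels, length in zip(batch_labels, seq_lens)]


if __name__ == "__main__":
    padded = [[1, 2, 0, 0], [3, 4, 5, 0]]
    print(trim_padding(padded, seq_lens=[2, 3]))
```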
----
# create_text_cls_task
----
## `method paddlehub.finetune.task.create_text_cls_task(feature, label, num_classes, hidden_units=None)`
Creates a [text classification](https://github.com/PaddlePaddle/PaddleHub/tree/develop/demo/text-classification) task for fine-tuning by adding one or more fully connected layers on top of the input feature
> ### Parameters
> * feature: Input feature
>
> * label: Label Variable
>
> * num_classes: Number of neurons in the last fully connected layer
>
> * hidden_units: Hidden layer configuration. Expected to be a Python list in which each element gives the number of neurons in one hidden layer
>
> ### Returns
> paddlehub.finetune.task.Task
>
> ### Example
>
> ```python
> import paddlehub as hub
> import paddle.fluid as fluid
>
> max_seq_len = 20
> module = hub.Module(name="ernie")
> inputs, outputs, program = module.context(
>     trainable=True, max_seq_len=max_seq_len)
>
> with fluid.program_guard(program):
>     label = fluid.layers.data(name="label", shape=[1], dtype='int64')
>     pooled_output = outputs["pooled_output"]
>
>     cls_task = hub.create_text_cls_task(
>         feature=pooled_output, label=label, num_classes=2, hidden_units=[20, 10])
> ```
----
# finetune
----
## `method paddlehub.finetune.task.finetune(task, data_reader, feed_list, config=None)`
Fine-tunes a Task. During fine-tuning, the interface periodically saves checkpoints (the model and run data). If a run is interrupted, pointing RunConfig at the checkpoint directory of the previous run lets training resume from the state at that run's last evaluation
> ### Parameters
> * task: The Task to run
>
> * data_reader: The reader that supplies the data
>
> * feed_list: The reader's feed list
>
> * config: Run configuration. Defaults to None
>
> ### Example
>
> ```python
> import paddlehub as hub
> import paddle.fluid as fluid
>
> resnet_module = hub.Module(name="resnet_v2_50_imagenet")
> input_dict, output_dict, program = resnet_module.context(trainable=True)
> dataset = hub.dataset.Flowers()
> data_reader = hub.reader.ImageClassificationReader(
>     image_width=resnet_module.get_excepted_image_width(),
>     image_height=resnet_module.get_excepted_image_height(),
>     dataset=dataset)
> with fluid.program_guard(program):
>     label = fluid.layers.data(name="label", dtype="int64", shape=[1])
>     # Look up the module's input and output variables by name
>     img = input_dict["image"]
>     feature_map = output_dict["feature_map"]
>
>     feed_list = [img.name, label.name]
>
>     task = hub.create_img_cls_task(
>         feature=feature_map, label=label, num_classes=dataset.num_labels)
>     hub.finetune(
>         task, feed_list=feed_list, data_reader=data_reader)
> ```
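
The checkpoint-and-resume behavior described above can be sketched without any framework. The code below is a simplified illustration: it stores only a step counter in a JSON file, whereas real checkpoints also hold model parameters. The file name `state.json` and the helpers `save_checkpoint`, `load_checkpoint`, and `run` are assumptions, not PaddleHub APIs.

```python
import json
import os
import tempfile


def save_checkpoint(checkpoint_dir, state):
    """Write the run state (here only the current step) into checkpoint_dir."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    with open(os.path.join(checkpoint_dir, "state.json"), "w") as f:
        json.dump(state, f)


def load_checkpoint(checkpoint_dir):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    path = os.path.join(checkpoint_dir, "state.json")
    if not os.path.exists(path):
        return {"step": 0}
    with open(path) as f:
        return json.load(f)


def run(checkpoint_dir, total_steps, save_interval, stop_at=None):
    """Continue from the last saved step; stop_at simulates an interruption."""
    state = load_checkpoint(checkpoint_dir)
    for step in range(state["step"] + 1, total_steps + 1):
        if stop_at is not None and step > stop_at:
            break  # simulated interruption
        # One training step would run here.
        if step % save_interval == 0:
            save_checkpoint(checkpoint_dir, {"step": step})
    return load_checkpoint(checkpoint_dir)["step"]


if __name__ == "__main__":
    ckpt_dir = tempfile.mkdtemp()
    print(run(ckpt_dir, total_steps=100, save_interval=10, stop_at=35))
    print(run(ckpt_dir, total_steps=100, save_interval=10))
```

In this sketch the work done between the last save and the interruption is lost, which is why reusing the same `checkpoint_dir` and choosing a sensible save interval both matter.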
----
# finetune_and_eval
----
## `method paddlehub.finetune.task.finetune_and_eval(task, data_reader, feed_list, config=None)`
Fine-tunes a Task and periodically evaluates its performance. During fine-tuning, the interface periodically saves checkpoints (the model and run data). If a run is interrupted, pointing RunConfig at the checkpoint directory of the previous run lets training resume from the state at that run's last evaluation
> ### Parameters
> * task: The Task to run
>
> * data_reader: The reader that supplies the data
>
> * feed_list: The reader's feed list
>
> * config: Run configuration. Defaults to None
>
> ### Example
>
> ```python
> import paddlehub as hub
> import paddle.fluid as fluid
>
> resnet_module = hub.Module(name="resnet_v2_50_imagenet")
> input_dict, output_dict, program = resnet_module.context(trainable=True)
> dataset = hub.dataset.Flowers()
> data_reader = hub.reader.ImageClassificationReader(
>     image_width=resnet_module.get_excepted_image_width(),
>     image_height=resnet_module.get_excepted_image_height(),
>     dataset=dataset)
> with fluid.program_guard(program):
>     label = fluid.layers.data(name="label", dtype="int64", shape=[1])
>     # Look up the module's input and output variables by name
>     img = input_dict["image"]
>     feature_map = output_dict["feature_map"]
>
>     feed_list = [img.name, label.name]
>
>     task = hub.create_img_cls_task(
>         feature=feature_map, label=label, num_classes=dataset.num_labels)
>     hub.finetune_and_eval(
>         task, feed_list=feed_list, data_reader=data_reader)
> ```