diff --git a/docs/API/RunConfig.md b/docs/API/RunConfig.md
new file mode 100644
index 0000000000000000000000000000000000000000..b043ef3749bed341ed4062865281dc648d7261eb
--- /dev/null
+++ b/docs/API/RunConfig.md
@@ -0,0 +1,156 @@
+----
+# RunConfig
+----
+In PaddleHub, RunConfig holds the runtime configuration used when finetuning a [Task](https://github.com/PaddlePaddle/PaddleHub/tree/develop/docs/API/Task.md), including the number of epochs to run, the batch size, and whether to train on the GPU.
+
+## `class paddlehub.finetune.config.RunConfig(log_interval=10, eval_interval=100, save_ckpt_interval=None, use_cuda=False, checkpoint_dir=None, num_epoch=10, batch_size=None, enable_memory_optim=True, strategy=None)`
+
+> ### Parameters
+> * log_interval: interval at which training logs are printed. Defaults to 10
+>
+> * eval_interval: interval at which evaluation is run. Defaults to 100
+>
+> * save_ckpt_interval: interval at which checkpoints are saved. Defaults to None
+>
+> * use_cuda: whether to use the GPU for training and evaluation. Defaults to False
+>
+> * checkpoint_dir: directory in which checkpoints are saved. Defaults to None, in which case a temporary directory named after the current timestamp is created under the working directory
+>
+> * num_epoch: number of epochs to run. Defaults to 10
+>
+> * batch_size: batch size. Defaults to None
+>
+> * enable_memory_optim: whether to enable memory optimization. Defaults to True
+>
+> * strategy: finetune strategy. Defaults to None, in which case DefaultFinetuneStrategy is used
+>
+> ### Returns
+> RunConfig
+>
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig(
+>     use_cuda=True,
+>     num_epoch=10,
+>     batch_size=32)
+> ```
+
+## `log_interval`
+
+Returns the log_interval value set in the RunConfig
+
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig()
+> log_interval = config.log_interval
+> ```
+
+## `eval_interval`
+
+Returns the eval_interval value set in the RunConfig
+
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig()
+> eval_interval = config.eval_interval
+> ```
+
+## `save_ckpt_interval`
+
+Returns the save_ckpt_interval value set in the RunConfig
+
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig()
+> save_ckpt_interval = config.save_ckpt_interval
+> ```
+
+## `use_cuda`
+
+Returns the use_cuda value set in the RunConfig
+
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig()
+> use_cuda = config.use_cuda
+> ```
+
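The interval parameters of RunConfig interact with num_epoch, batch_size and the dataset size. The sketch below is plain Python and does not call PaddleHub. It assumes that intervals are counted in global training steps starting from 1, that each step consumes one batch (a final partial batch still counts as a step), that an action fires whenever `step % interval == 0`, and that `save_ckpt_interval=None` disables periodic checkpointing. `planned_actions` is a hypothetical helper for illustration, not a PaddleHub API.

```python
def planned_actions(num_examples, batch_size, num_epoch,
                    log_interval=10, eval_interval=100,
                    save_ckpt_interval=None):
    """Return (total_steps, actions): the global steps at which logging,
    evaluation and checkpointing would fire under the assumptions above.

    Hypothetical helper; this is not PaddleHub's actual training loop.
    """
    # Ceiling division: a final partial batch still counts as one step
    steps_per_epoch = -(-num_examples // batch_size)
    total_steps = steps_per_epoch * num_epoch

    actions = {"log": [], "eval": [], "save": []}
    for step in range(1, total_steps + 1):
        if step % log_interval == 0:
            actions["log"].append(step)
        if step % eval_interval == 0:
            actions["eval"].append(step)
        # None means "no periodic checkpoints" in this sketch
        if save_ckpt_interval is not None and step % save_ckpt_interval == 0:
            actions["save"].append(step)
    return total_steps, actions


total, acts = planned_actions(num_examples=1000, batch_size=32, num_epoch=10)
print(total, len(acts["log"]), acts["eval"], acts["save"])
# -> 320 32 [100, 200, 300] []
```

Under these assumptions, a 1000-example dataset at `batch_size=32` yields 32 steps per epoch, so with the default `eval_interval=100` the first evaluation would happen during the fourth epoch.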
+## `checkpoint_dir`
+
+Returns the checkpoint_dir value set in the RunConfig
+
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig()
+> checkpoint_dir = config.checkpoint_dir
+> ```
+
+## `num_epoch`
+
+Returns the num_epoch value set in the RunConfig
+
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig()
+> num_epoch = config.num_epoch
+> ```
+
+## `batch_size`
+
+Returns the batch_size value set in the RunConfig
+
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig()
+> batch_size = config.batch_size
+> ```
+
+## `strategy`
+
+Returns the strategy set in the RunConfig
+
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig()
+> strategy = config.strategy
+> ```
+
+## `enable_memory_optim`
+
+Returns the enable_memory_optim value set in the RunConfig
+
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+>
+> config = hub.RunConfig()
+> enable_memory_optim = config.enable_memory_optim
+> ```
diff --git a/docs/API/Strategy.md b/docs/API/Strategy.md
new file mode 100644
index 0000000000000000000000000000000000000000..2408dff684ef0112dd119eed01de15dd71ebd5ab
--- /dev/null
+++ b/docs/API/Strategy.md
@@ -0,0 +1,62 @@
+----
+# Strategy
+----
+In PaddleHub, a Strategy specifies how a [Task](https://github.com/PaddlePaddle/PaddleHub/tree/develop/docs/API/Task.md) is finetuned: which learning rate to apply to the pre-trained parameters, which type of optimizer to use, which type of regularization to apply, and so on.
+
+## `class paddlehub.finetune.strategy.AdamWeightDecayStrategy(learning_rate=1e-4, lr_scheduler="linear_warmup_decay", warmup_proportion=0.0, weight_decay=0.01, optimizer_name=None)`
+
+Finetune strategy based on the Adam optimizer, combining learning rate scheduling with weight decay
+> ### Parameters
+> * learning_rate: global learning rate. Defaults to 1e-4
+>
+> * lr_scheduler: learning rate scheduling method. Defaults to "linear_warmup_decay"
+>
+> * warmup_proportion: proportion of the training steps used for learning rate warmup. Defaults to 0.0
+>
+> * weight_decay: weight decay coefficient. Defaults to 0.01
+>
+> * optimizer_name: optimizer name. Defaults to None, in which case Adam is used
+>
+> ### Returns
+> AdamWeightDecayStrategy
+>
+> ### Example
+>
+> ```python
+> ...
+> strategy = hub.AdamWeightDecayStrategy()
+>
+> config = hub.RunConfig(
+>     use_cuda=True,
+>     num_epoch=10,
+>     batch_size=32,
+>     checkpoint_dir="hub_finetune_ckpt",
+>     strategy=strategy)
+> ```
+
+## `class paddlehub.finetune.strategy.DefaultFinetuneStrategy(learning_rate=1e-4, optimizer_name=None, regularization_coeff=1e-3)`
+
+The default finetune strategy. It adds an L2 regularization penalty on the pre-trained parameters
+> ### Parameters
+> * learning_rate: global learning rate. Defaults to 1e-4
+>
+> * optimizer_name: optimizer name. Defaults to None, in which case Adam is used
+>
+> * regularization_coeff: the regularization coefficient λ. Defaults to 1e-3
+>
+> ### Returns
+> DefaultFinetuneStrategy
+>
+> ### Example
+>
+> ```python
+> ...
+> strategy = hub.DefaultFinetuneStrategy()
+>
+> config = hub.RunConfig(
+>     use_cuda=True,
+>     num_epoch=10,
+>     batch_size=32,
+>     checkpoint_dir="hub_finetune_ckpt",
+>     strategy=strategy)
+> ```
diff --git a/docs/API/Task.md b/docs/API/Task.md
new file mode 100644
index 0000000000000000000000000000000000000000..7df33df97c3cca66ee457fdb3617471b6c8b9f17
--- /dev/null
+++ b/docs/API/Task.md
@@ -0,0 +1,95 @@
+----
+# Task
+----
+In PaddleHub, a Task represents a finetune task. It contains the programs needed to run the task, together with task-related metrics (such as accuracy and F1 score), the loss, and so on.
+
+## `class paddlehub.finetune.Task(task_type, graph_var_dict, main_program, startup_program)`
+> ### Parameters
+> * task_type: task type, used during finetuning to determine how the task is run
+>
+> * graph_var_dict: variable mapping table that provides the task's metrics
+>
+> * main_program: Program that holds the model's computation graph
+>
+> * startup_program: Program that holds the ops that initialize the model parameters
+>
+> ### Returns
+> Task
+>
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> import paddle.fluid as fluid
+> # Create a Module by model name
+> resnet = hub.Module(name="resnet_v2_50_imagenet")
+> input_dict, output_dict, program = resnet.context(trainable=True)
+> with fluid.program_guard(program):
+>     label = fluid.layers.data(name="label", dtype="int64", shape=[1])
+>     feature_map = output_dict["feature_map"]
+>     task = hub.create_img_cls_task(
+>         feature=feature_map, label=label, num_classes=2)
+> ```
+
+## `variable(var_name)`
+Returns the Task variable with the given name
+> ### Parameters
+> * var_name: variable name
+>
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> ...
+> task = hub.create_img_cls_task(
+>     feature=feature_map, label=label, num_classes=2)
+> task.variable("loss")
+> ```
+
+## `main_program()`
+Returns the main_program of the Task
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> ...
+> task = hub.create_img_cls_task(
+>     feature=feature_map, label=label, num_classes=2)
+> main_program = task.main_program()
+> ```
+
+## `startup_program()`
+Returns the startup_program of the Task
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> ...
+> task = hub.create_img_cls_task(
+>     feature=feature_map, label=label, num_classes=2)
+> startup_program = task.startup_program()
+> ```
+
+## `inference_program()`
+Returns the inference_program of the Task
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> ...
+> task = hub.create_img_cls_task(
+>     feature=feature_map, label=label, num_classes=2)
+> inference_program = task.inference_program()
+> ```
+
+## `metric_variable_names()`
+Returns the names of all task-related variables, including the loss and the metrics
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> ...
+> task = hub.create_img_cls_task(
+>     feature=feature_map, label=label, num_classes=2)
+> metric_variable_names = task.metric_variable_names()
+> ```
diff --git a/docs/API/create_img_cls_task.md b/docs/API/create_img_cls_task.md
new file mode 100644
index 0000000000000000000000000000000000000000..fc2d870a9a12e961ff468eea83ebafa4f56e0810
--- /dev/null
+++ b/docs/API/create_img_cls_task.md
@@ -0,0 +1,35 @@
+----
+# create_img_cls_task
+----
+
+## `method paddlehub.finetune.task.create_img_cls_task(feature, label, num_classes, hidden_units=None)`
+
+Creates an [image classification](https://github.com/PaddlePaddle/PaddleHub/tree/develop/demo/image-classification) task for finetuning by adding one or more fully connected layers on top of the input feature
+> ### Parameters
+> * feature: the input feature
+>
+> * label: label Variable
+>
+> * num_classes: number of neurons in the last fully connected layer (the number of classes)
+>
+> * hidden_units: hidden layer configuration, expected to be a Python list in which each element gives the number of neurons in one hidden layer
+>
+> ### Returns
+> paddlehub.finetune.task.Task
+>
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> import paddle.fluid as fluid
+>
+> module = hub.Module(name="resnet_v2_50_imagenet")
+> inputs, outputs, program = module.context(trainable=True)
+>
+> with fluid.program_guard(program):
+>     label = fluid.layers.data(name="label", shape=[1], dtype='int64')
+>     feature_map = outputs['feature_map']
+>
+>     cls_task = hub.create_img_cls_task(
+>         feature=feature_map, label=label, num_classes=2, hidden_units=[20, 10])
+> ```
diff --git a/docs/API/create_seq_label_task.md b/docs/API/create_seq_label_task.md
new file mode 100644
index 0000000000000000000000000000000000000000..e083f13414386ad614a0aab7a54afafe2e83f878
--- /dev/null
+++ b/docs/API/create_seq_label_task.md
@@ -0,0 +1,42 @@
+----
+# create_seq_label_task
+----
+
+## `method paddlehub.finetune.task.create_seq_label_task(feature, labels, seq_len, num_classes)`
+
+Creates a [sequence labeling](https://github.com/PaddlePaddle/PaddleHub/tree/develop/demo/sequence-labeling) task for finetuning by adding a fully connected layer on top of the input feature
+> ### Parameters
+> * feature: the input feature
+>
+> * labels: label Variable
+>
+> * seq_len: sequence length Variable
+>
+> * num_classes: number of neurons in the fully connected layer (the number of label classes)
+>
+> ### Returns
+> paddlehub.finetune.task.Task
+>
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> import paddle.fluid as fluid
+>
+> max_seq_len = 20
+> dataset = hub.dataset.MSRA_NER()
+> module = hub.Module(name="ernie")
+> inputs, outputs, program = module.context(
+>     trainable=True, max_seq_len=max_seq_len)
+>
+> with fluid.program_guard(program):
+>     label = fluid.layers.data(name="label", shape=[max_seq_len, 1], dtype='int64')
+>     seq_len = fluid.layers.data(name="seq_len", shape=[1], dtype='int64')
+>     sequence_output = outputs["sequence_output"]
+>
+>     seq_label_task = hub.create_seq_label_task(
+>         feature=sequence_output,
+>         labels=label,
+>         seq_len=seq_len,
+>         num_classes=dataset.num_labels)
+> ```
diff --git a/docs/API/create_text_cls_task.md b/docs/API/create_text_cls_task.md
new file mode 100644
index 0000000000000000000000000000000000000000..5d251d895670a7a8e4a331fe24a2011ebfd96265
--- /dev/null
+++ b/docs/API/create_text_cls_task.md
@@ -0,0 +1,37 @@
+----
+# create_text_cls_task
+----
+
+## `method paddlehub.finetune.task.create_text_cls_task(feature, label, num_classes, hidden_units=None)`
+
+Creates a [text classification](https://github.com/PaddlePaddle/PaddleHub/tree/develop/demo/text-classification) task for finetuning by adding one or more fully connected layers on top of the input feature
+> ### Parameters
+> * feature: the input feature
+>
+> * label: label Variable
+>
+> * num_classes: number of neurons in the last fully connected layer (the number of classes)
+>
+> * hidden_units: hidden layer configuration, expected to be a Python list in which each element gives the number of neurons in one hidden layer
+>
+> ### Returns
+> paddlehub.finetune.task.Task
+>
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> import paddle.fluid as fluid
+>
+> max_seq_len = 20
+> module = hub.Module(name="ernie")
+> inputs, outputs, program = module.context(
+>     trainable=True, max_seq_len=max_seq_len)
+>
+> with fluid.program_guard(program):
+>     label = fluid.layers.data(name="label", shape=[1], dtype='int64')
+>     pooled_output = outputs["pooled_output"]
+>
+>     cls_task = hub.create_text_cls_task(
+>         feature=pooled_output, label=label, num_classes=2, hidden_units=[20, 10])
+> ```
diff --git a/docs/API/finetune.md b/docs/API/finetune.md
new file mode 100644
index 0000000000000000000000000000000000000000..3305d5abd456a89e17e8dfb5dc8ecf336c815ab6
--- /dev/null
+++ b/docs/API/finetune.md
@@ -0,0 +1,41 @@
+----
+# finetune
+----
+
+## `method paddlehub.finetune.task.finetune(task, data_reader, feed_list, config=None)`
+
+Finetunes a Task. During finetuning, the interface periodically saves checkpoints (the model and the run state). If a run is interrupted, setting the previous run's checkpoint directory in the RunConfig resumes training from the state of that run's last evaluation
+> ### Parameters
+> * task: the Task to run
+>
+> * data_reader: reader that provides the data
+>
+> * feed_list: feed list of the reader (names of the variables the reader feeds)
+>
+> * config: run configuration (RunConfig)
+>
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> import paddle.fluid as fluid
+>
+> resnet_module = hub.Module(name="resnet_v2_50_imagenet")
+> input_dict, output_dict, program = resnet_module.context(trainable=True)
+> dataset = hub.dataset.Flowers()
+> data_reader = hub.reader.ImageClassificationReader(
+>     image_width=resnet_module.get_excepted_image_width(),
+>     image_height=resnet_module.get_excepted_image_height(),
+>     dataset=dataset)
+> with fluid.program_guard(program):
+>     label = fluid.layers.data(name="label", dtype="int64", shape=[1])
+>     img = input_dict[0]
+>     feature_map = output_dict[0]
+>
+>     feed_list = [img.name, label.name]
+>
+>     task = hub.create_img_cls_task(
+>         feature=feature_map, label=label, num_classes=dataset.num_labels)
+>     hub.finetune(
+>         task, feed_list=feed_list, data_reader=data_reader)
+> ```
diff --git a/docs/API/finetune_and_eval.md b/docs/API/finetune_and_eval.md
new file mode 100644
index 0000000000000000000000000000000000000000..db27d3463f619cfd40070f03480b26f9e27db082
--- /dev/null
+++ b/docs/API/finetune_and_eval.md
@@ -0,0 +1,41 @@
+----
+# finetune_and_eval
+----
+
+## `method paddlehub.finetune.task.finetune_and_eval(task, data_reader, feed_list, config=None)`
+
+Finetunes a Task and periodically evaluates its performance. During finetuning, the interface periodically saves checkpoints (the model and the run state). If a run is interrupted, setting the previous run's checkpoint directory in the RunConfig resumes training from the state of that run's last evaluation
+> ### Parameters
+> * task: the Task to run
+>
+> * data_reader: reader that provides the data
+>
+> * feed_list: feed list of the reader (names of the variables the reader feeds)
+>
+> * config: run configuration (RunConfig)
+>
+> ### Example
+>
+> ```python
+> import paddlehub as hub
+> import paddle.fluid as fluid
+>
+> resnet_module = hub.Module(name="resnet_v2_50_imagenet")
+> input_dict, output_dict, program = resnet_module.context(trainable=True)
+> dataset = hub.dataset.Flowers()
+> data_reader = hub.reader.ImageClassificationReader(
+>     image_width=resnet_module.get_excepted_image_width(),
+>     image_height=resnet_module.get_excepted_image_height(),
+>     dataset=dataset)
+> with fluid.program_guard(program):
+>     label = fluid.layers.data(name="label", dtype="int64", shape=[1])
+>     img = input_dict[0]
+>     feature_map = output_dict[0]
+>
+>     feed_list = [img.name, label.name]
+>
+>     task = hub.create_img_cls_task(
+>         feature=feature_map, label=label, num_classes=dataset.num_labels)
+>     hub.finetune_and_eval(
+>         task, feed_list=feed_list, data_reader=data_reader)
+> ```
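Both `finetune` and `finetune_and_eval` are described as resuming from the last evaluation of an interrupted run when the RunConfig points at the previous checkpoint directory. The library-independent sketch below illustrates that resume pattern only; it is not PaddleHub's checkpoint format. The file name `progress.json`, its JSON layout and the helper `run_with_resume` are hypothetical, and the training and evaluation work are placeholders.

```python
import json
import os
import tempfile


def run_with_resume(checkpoint_dir, total_steps, eval_interval, stop_at=None):
    """Simulate a loop that evaluates every eval_interval steps and records
    the last evaluated step, so that a restarted run can resume after it.

    Hypothetical sketch: file name and format are illustrative only.
    Returns the list of steps executed by this invocation.
    """
    state_path = os.path.join(checkpoint_dir, "progress.json")
    start_step = 1
    if os.path.exists(state_path):
        # Resume right after the last evaluation recorded by a previous run
        with open(state_path) as f:
            start_step = json.load(f)["last_eval_step"] + 1

    executed = []
    for step in range(start_step, total_steps + 1):
        if stop_at is not None and step > stop_at:
            break  # simulate an interruption of the run
        executed.append(step)  # one training step would run here
        if step % eval_interval == 0:
            # An evaluation would run here; then the progress is persisted
            with open(state_path, "w") as f:
                json.dump({"last_eval_step": step}, f)
    return executed


with tempfile.TemporaryDirectory() as ckpt_dir:
    first = run_with_resume(ckpt_dir, total_steps=10, eval_interval=4, stop_at=6)
    second = run_with_resume(ckpt_dir, total_steps=10, eval_interval=4)
    print(first)   # -> [1, 2, 3, 4, 5, 6]
    print(second)  # -> [5, 6, 7, 8, 9, 10]
```

Because progress is recorded only at evaluations, the steps run after the last evaluation (steps 5 and 6 here) are repeated after the restart; that is the trade-off implied by resuming "from the last evaluation" rather than from the last step.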