Commit ac6f99e2 authored by mindspore-ci-bot, committed by Gitee

!746 Fix the mindinsight notebook tutorial

Merge pull request !746 from ougongchang/fix_notebook
......@@ -74,51 +74,55 @@
"metadata": {},
"outputs": [],
"source": [
"import urllib.request \n",
"from urllib.parse import urlparse\n",
"import gzip \n",
"import os\n",
"import gzip\n",
"import urllib.request\n",
"from urllib.parse import urlparse\n",
"\n",
"\n",
"def unzip_file(gzip_path):\n",
" \"\"\"unzip dataset file\n",
" \"\"\"\n",
" Unzip a given gzip file.\n",
"\n",
" Args:\n",
" gzip_path: dataset file path\n",
" gzip_path (str): The gzip file path\n",
" \"\"\"\n",
" open_file = open(gzip_path.replace('.gz',''), 'wb')\n",
" open_file = open(gzip_path.replace('.gz', ''), 'wb')\n",
" gz_file = gzip.GzipFile(gzip_path)\n",
" open_file.write(gz_file.read())\n",
" gz_file.close()\n",
" \n",
"\n",
"\n",
"def download_dataset():\n",
" \"\"\"Download the dataset from http://yann.lecun.com/exdb/mnist/.\"\"\"\n",
" print(\"******Downloading the MNIST dataset******\")\n",
" train_path = \"./MNIST_Data/train/\" \n",
" train_path = \"./MNIST_Data/train/\"\n",
" test_path = \"./MNIST_Data/test/\"\n",
" train_path_check = os.path.exists(train_path)\n",
" test_path_check = os.path.exists(test_path)\n",
" if train_path_check == False and test_path_check == False:\n",
" if not train_path_check and not test_path_check:\n",
" os.makedirs(train_path)\n",
" os.makedirs(test_path)\n",
" train_url = {\"http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\", \"http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\"}\n",
" test_url = {\"http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\", \"http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\"}\n",
" \n",
" train_url = {\"http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\",\n",
" \"http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\"}\n",
" test_url = {\"http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\",\n",
" \"http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\"}\n",
"\n",
" for url in train_url:\n",
" url_parse = urlparse(url)\n",
" \"\"\"split the file name from url\"\"\"\n",
" file_name = os.path.join(train_path,url_parse.path.split('/')[-1])\n",
" if not os.path.exists(file_name.replace('.gz', '')):\n",
" file = urllib.request.urlretrieve(url, file_name)\n",
" unzipfile(file_name)\n",
" os.remove(file_name)\n",
" \n",
" # split the file name from url\n",
" file_name = os.path.join(train_path, url_parse.path.split('/')[-1])\n",
" if not os.path.exists(file_name.replace('.gz', '')) and not os.path.exists(file_name):\n",
" urllib.request.urlretrieve(url, file_name)\n",
" unzip_file(file_name)\n",
"\n",
" for url in test_url:\n",
" url_parse = urlparse(url)\n",
" \"\"\"split the file name from url\"\"\"\n",
" file_name = os.path.join(test_path,url_parse.path.split('/')[-1])\n",
" if not os.path.exists(file_name.replace('.gz', '')):\n",
" file = urllib.request.urlretrieve(url, file_name)\n",
" unzipfile(file_name)\n",
" os.remove(file_name)\n",
" # split the file name from url\n",
" file_name = os.path.join(test_path, url_parse.path.split('/')[-1])\n",
" if not os.path.exists(file_name.replace('.gz', '')) and not os.path.exists(file_name):\n",
" urllib.request.urlretrieve(url, file_name)\n",
" unzip_file(file_name)\n",
"\n",
"download_dataset()"
]
......@@ -127,9 +131,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 数据集使用\n",
"\n",
"设置正确的数据存放路径,可将数据集读取出来,并对整体数据集做预处理,让数据更能发挥模型性能。MindInsight可视化的数据图,便是显示的数据集预处理时的变化方式和顺序。"
"#### 数据增强\n",
"对数据集进行数据增强操作,可以提升模型精度。\n"
]
},
{
......@@ -148,32 +151,39 @@
"def create_dataset(data_path, batch_size=32, repeat_size=1,\n",
" num_parallel_workers=1):\n",
" \"\"\"\n",
" create dataset for train or test\n",
" Create dataset for train or test.\n",
"\n",
" Args:\n",
" data_path (str): The absolute path of the dataset\n",
" batch_size (int): The number of data records in each group\n",
" repeat_size (int): The number of replicated data records\n",
" num_parallel_workers (int): The number of parallel workers\n",
" \"\"\"\n",
" \"\"\"define dataset\"\"\"\n",
" # define dataset\n",
" mnist_ds = ds.MnistDataset(data_path)\n",
"\n",
" # define some parameters needed for data enhancement and rough justification\n",
" resize_height, resize_width = 32, 32\n",
" rescale = 1.0 / 255.0\n",
" shift = 0.0\n",
" rescale_nml = 1 / 0.3081\n",
" shift_nml = -1 * 0.1307 / 0.3081\n",
"\n",
" \"\"\"define map operations\"\"\"\n",
" type_cast_op = C.TypeCast(mstype.int32)\n",
" resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR) # Bilinear mode\n",
" # according to the parameters, generate the corresponding data enhancement method\n",
" resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)\n",
" rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)\n",
" rescale_op = CV.Rescale(rescale, shift)\n",
" hwc2chw_op = CV.HWC2CHW()\n",
" type_cast_op = C.TypeCast(mstype.int32)\n",
"\n",
" \"\"\"apply map operations on images\"\"\"\n",
" # using map method to apply operations to a dataset\n",
" mnist_ds = mnist_ds.map(input_columns=\"label\", operations=type_cast_op, num_parallel_workers=num_parallel_workers)\n",
" mnist_ds = mnist_ds.map(input_columns=\"image\", operations=resize_op, num_parallel_workers=num_parallel_workers)\n",
" mnist_ds = mnist_ds.map(input_columns=\"image\", operations=rescale_op, num_parallel_workers=num_parallel_workers)\n",
" mnist_ds = mnist_ds.map(input_columns=\"image\", operations=rescale_nml_op, num_parallel_workers=num_parallel_workers)\n",
" mnist_ds = mnist_ds.map(input_columns=\"image\", operations=hwc2chw_op, num_parallel_workers=num_parallel_workers)\n",
"\n",
" \"\"\"apply DatasetOps\"\"\"\n",
" \n",
" # process the generated dataset\n",
" buffer_size = 10000\n",
" mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size) # 10000 as in LeNet train script\n",
" mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)\n",
......@@ -272,15 +282,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 主程序运行\n",
"#### 执行训练\n",
"\n",
"1. 首先在主函数之前调用所需要的模块,并在主函数之前使用相应接口。\n",
"1. 导入所需的代码包,并示例化训练网络。\n",
"2. 通过MindSpore提供的 `SummaryCollector` 接口,实现收集计算图和数据图。在实例化 `SummaryCollector` 时,在 `collect_specified_data` 参数中,通过设置 `collect_graph` 指定收集计算图,设置 `collect_dataset_graph` 指定收集数据图。\n",
"\n",
"2. 本次体验主要完成计算图与数据图的可视化,定义变量`specified={'collect_graph': True,'collect_dataset_graph': True}`,在`specified`字典中,键名`collect_graph`值设置为`True`,表示记录计算图;键名`collect_dataset_graph`值设置为`True`,表示记录数据图。\n",
"\n",
"3. 定义完`specified`变量后,传参到`summary_collector`中,最后将`summary_collector`传参到`model`中。\n",
"\n",
"至此,模型中就有了计算图与数据图的可视化功能。"
"更多 `SummaryCollector` 的用法,请点击[API文档](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.train.html?highlight=summarycollector#mindspore.train.callback.SummaryCollector)查看。\n",
"\n"
]
},
{
......@@ -293,9 +301,7 @@
"from mindspore import context\n",
"from mindspore.train import Model\n",
"from mindspore.nn.metrics import Accuracy\n",
"from mindspore.train.callback import SummaryCollector\n",
"from mindspore.train.serialization import load_checkpoint, load_param_into_net\n",
"from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor\n",
"from mindspore.train.callback import LossMonitor, SummaryCollector\n",
"\n",
"if __name__ == \"__main__\":\n",
" device_target = \"CPU\"\n",
......@@ -308,18 +314,15 @@
" net_loss = nn.SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True, reduction=\"mean\")\n",
" net_opt = nn.Momentum(network.trainable_params(), learning_rate=0.01, momentum=0.9)\n",
" time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())\n",
" config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)\n",
" ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_lenet\", config=config_ck)\n",
" model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
" specified={'collect_graph': True,'collect_dataset_graph': True}\n",
"\n",
" specified={'collect_graph': True, 'collect_dataset_graph': True}\n",
" summary_collector = SummaryCollector(summary_dir='./summary_dir', collect_specified_data=specified, collect_freq=1, keep_default_action=False)\n",
" \n",
" print(\"============== Starting Training ==============\")\n",
" model.train(epoch=2, train_dataset=ds_train, callbacks=[time_cb, ckpoint_cb, LossMonitor(), summary_collector], dataset_sink_mode=False)\n",
" model.train(epoch=2, train_dataset=ds_train, callbacks=[LossMonitor(), summary_collector], dataset_sink_mode=False)\n",
"\n",
" print(\"============== Starting Testing ==============\")\n",
" param_dict = load_checkpoint(\"checkpoint_lenet-3_1875.ckpt\")\n",
" load_param_into_net(network, param_dict)\n",
" ds_eval = create_dataset(\"./MNIST_Data/test/\")\n",
" acc = model.eval(ds_eval, dataset_sink_mode=False)\n",
" print(\"============== {} ==============\".format(acc))"
......@@ -333,6 +336,8 @@
"- 启动MindInsigh服务命令:`mindinsigh start --summary-base-dir=/path/ --port=8080`;\n",
"- 执行完服务命令后,访问给出的地址,查看MindInsigh可视化结果。\n",
"\n",
"> 其中 /path/ 为 `SummaryCollector` 中参数 `summary_dir` 所指定的目录。\n",
"\n",
"![title](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/mindinsight/images/mindinsight_map.png)"
]
},
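{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer to start the service from inside the notebook rather than from a terminal, a minimal sketch is shown below (it assumes MindInsight is installed in this environment; the directory mirrors the `summary_dir` used above):\n",
"\n",
"```python\n",
"import os\n",
"\n",
"# Launch the MindInsight service against the summary directory written above\n",
"os.system(\"mindinsight start --summary-base-dir=./summary_dir --port=8080\")\n",
"```\n"
]
},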
......@@ -354,45 +359,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 数据图信息\n",
"### 数据图展示\n",
"\n",
"数据图所展示的顺序与数据集使用处代码顺序对应\n",
"数据图展示了数据增强中对数据进行操作的流程。\n",
"\n",
"1. 首先是从加载数据集`mnist_ds = ds.MnistDataset(data_path)`开始,对应数据图中`MnistDataset`。\n",
"1. 首先是从加载数据集 `mnist_ds = ds.MnistDataset(data_path)` 开始,对应数据图中 `MnistDataset`。\n",
"\n",
"2. 在以下所示代码中,是数据预处理的一些方法,顺序与数据图中所示顺序对应。\n",
"2. 下面代码为上面的 `create_dataset` 函数中作数据预处理与数据增强的相关操作。可以从数据图中清晰地看到数据处理的流程。通过查看数据图,可以帮助分析是否存在不恰当的数据处理流程。\n",
"\n",
"```\n",
"type_cast_op = C.TypeCast(mstype.int32)\n",
"resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)\n",
"rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)\n",
"rescale_op = CV.Rescale(rescale, shift)\n",
"hwc2chw_op = CV.HWC2CHW()\n",
"mnist_ds = mnist_ds.map(input_columns=\"label\", operations=type_cast_op, num_parallel_workers=num_parallel_workers)\n",
"mnist_ds = mnist_ds.map(input_columns=\"image\", operations=resize_op, num_parallel_workers=num_parallel_workers)\n",
"mnist_ds = mnist_ds.map(input_columns=\"image\", operations=rescale_op, num_parallel_workers=num_parallel_workers)\n",
"mnist_ds = mnist_ds.map(input_columns=\"image\", operations=rescale_nml_op, num_parallel_workers=num_parallel_workers)\n",
"mnist_ds = mnist_ds.map(input_columns=\"image\", operations=hwc2chw_op, num_parallel_workers=num_parallel_workers)\n",
"```\n",
"\n",
"- `TypeCast`:在数据集`create_data`函数中,使用:`TypeCase(mstype.int32)`,将数据类型转换成我们所设置的类型。\n",
"- `Resize`:在数据集`create_data`函数中,使用:`Resize(resize_height,resize_width = 32,32)`,可以将数据的高和宽做调整。\n",
"- `Rescale`:在数据集`create_data`函数中,使用:`rescale = 1.0 / 255.0`;`Rescale(rescale,shift)`,可以重新数据格式。\n",
"- `HWC2CHW`:在数据集`create_data`函数中,使用:`HWC2CHW()`,此方法可以将数据所带信息与通道结合,一并加载。\n",
"\n",
"\n",
"3. 前面的几个步骤是数据集的预处理顺序,后面几个步骤是模型加载数据集时要定义的参数,顺序与数据图中对应。\n",
"\n",
"```\n",
"buffer_size = 10000\n",
"mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size) # 10000 as in LeNet train script\n",
"mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)\n",
"mnist_ds = mnist_ds.repeat(repeat_size)\n",
"```\n",
" \n",
"- `Shuffle`:在数据集`create_data`函数中,使用:`buffer_size = 10000`,后面数值可以支持自行设置,表示一次缓存数据的数量。\n",
"- `Batch`:在数据集`create_data`函数中,使用:`batch_size = 32`。支持自行设置,表示将整体数据集划分成小批量数据集,每一个小批次作为一个整体进行训练。\n",
"- `Repeat`:在数据集`create_data`函数中,使用:`repeat_size = 1`,支持自行设定,表示的是一次运行中要训练的次数。"
"```\n"
]
},
{
......@@ -408,7 +393,7 @@
"source": [
"### 关闭MindInsight\n",
"\n",
"- 查看完成后,在命令行中可执行此命令`mindinsight stop --port=8080`,关闭MindInsight。"
"- 查看完成后,在命令行中可执行此命令 `mindinsight stop --port=8080`,关闭MindInsight。"
]
}
],
......
......@@ -6,16 +6,16 @@
"source": [
"# 标量、直方图、图像和张量可视化\n",
"\n",
"MindInsight可以将神经网络训练过程中的损失值标量、直方图、图像信息和张量信息记录到日志文件中,通过可视化界面解析以供用户查看。\n",
"可以通过MindSpore提供的接口将训练过程中的标量、图像和张量记录到summary日志文件中,并通过MindInsight提供的可视化界面进行查看。\n",
"\n",
"接下来是本次流程的体验过程。\n",
"\n",
"## 整体流程\n",
"\n",
"1. 准备环节。下载CIFAR-10二进制格式数据集,配置运行信息。\n",
"2. 数据处理。\n",
"3. 初始化AlexNet网络,使用`ImageSummary`记录图像数据和`TensorSummary`记录张量数据。\n",
"4. 训练网络,使用`SummaryCollector`记录损失值标量、权重梯度等参数。同时启动MindInsight服务,实时查看损失值、参数直方图、输入图像和张量的变化。\n",
"1. 下载CIFAR-10二进制格式数据集。\n",
"2. 对数据进行预处理。\n",
"3. 定义AlexNet网络,在网络中使用summary算子记录数据。\n",
"4. 训练网络,使用 `SummaryCollector` 记录损失值标量、权重梯度等参数。同时启动MindInsight服务,实时查看损失值、参数直方图、输入图像和张量的变化。\n",
"5. 完成训练后,查看MindInsight看板中记录到的损失值标量、直方图、图像信息、张量信息。\n",
"6. 分别单独记录损失值标量、直方图、图像信息和张量信息并查看可视化结果,查看损失值标量对比信息。\n",
"7. 相关注意事项,关闭MindInsight服务。"
......@@ -33,10 +33,7 @@
"\n",
"CIFAR-10二进制格式数据集包含10个类别的60000个32x32彩色图像。每个类别6000个图像,包含50000张训练图像和10000张测试图像。数据集分为5个训练批次和1个测试批次,每个批次具有10000张图像。测试批次包含每个类别中1000个随机选择的图像,训练批次按随机顺序包含剩余图像(某个训练批次包含的一类图像可能比另一类更多)。其中,每个训练批次精确地包含对应每个类别的5000张图像。\n",
"\n",
"执行下面一段代码下载CIFAR-10二进制格式数据集到当前工作目录,该段代码分为两部分:\n",
"\n",
"1. 判断当前工作目录是否存在CIFAR-10二进制格式数据集目录,不存在则创建目录,存在则跳至[**数据处理**](#数据处理)。\n",
"2. 判断CIFAT-10数据集目录是否存在CIFAR-10二进制格式数据集,不存在则下载CIFAR-10二进制格式数据集,存在则跳至[**数据处理**](#数据处理)。"
"执行下面一段代码下载CIFAR-10二进制格式数据集到当前工作目录,如果已经下载过数据集,则不重复下载。"
]
},
{
......@@ -143,26 +140,6 @@
"- `data_batch_5.bin`文件为第5批次训练数据集文件。\n"
]
},
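{
"cell_type": "markdown",
"metadata": {},
"source": [
"The layout of the binary files can be inspected directly. A minimal sketch follows (not part of the tutorial; the file path is an assumption): in the CIFAR-10 binary format, each record is 1 label byte followed by 3072 image bytes (3x32x32, channel-major).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"record_bytes = 1 + 3 * 32 * 32  # label byte + CHW image bytes\n",
"raw = np.fromfile(\"./cifar-10-batches-bin/data_batch_1.bin\", dtype=np.uint8)\n",
"records = raw.reshape(-1, record_bytes)\n",
"labels = records[:, 0]                          # class ids 0-9\n",
"images = records[:, 1:].reshape(-1, 3, 32, 32)  # CHW layout\n",
"print(labels.shape, images.shape)\n",
"```\n"
]
},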
{
"cell_type": "markdown",
"metadata": {},
......@@ -284,43 +261,54 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 网络初始化\n",
"## 使用Summary算子记录数据\n",
"\n",
"在进行训练之前,需定义神经网络模型,本流程采用AlexNet网络,以下一段代码中定义AlexNet网络结构。\n",
"\n",
"在AlexNet网络中使用`Summary`算子记录输入图像和张量数据。\n",
"MindSpore提供了两种方法进行记录数据,分别为:\n",
"- 通过Summary算子记录数据\n",
"- 通过 `SummaryCollector` 这个callback进行记录\n",
"\n",
"- 使用`ImageSummary`记录输入图像数据。\n",
"下面展示在AlexNet网络中使用Summary算子记录输入图像和张量数据。\n",
"\n",
" 1. 在`__init__`方法中初始化`ImageSummary`。\n",
"- 使用 `ImageSummary` 记录输入图像数据。\n",
"\n",
" 1. 在 `__init__` 方法中初始化 `ImageSummary`。\n",
" \n",
" ```python\n",
" # Init ImageSummary\n",
" self.sm_image = P.ImageSummary()\n",
" self.image_summary = P.ImageSummary()\n",
" ```\n",
" \n",
" 2. 在`construct`方法中使用`ImageSummary`算子记录输入图像。其中\"Image\"为MindInsight展示的记录到的图像信息面板标题。\n",
" 2. 在 `construct` 方法中使用 `ImageSummary` 算子记录输入图像。其中 \"Image\" 为该数据的名称,MindInsight在展示时,会将该名称展示出来以方便识别是哪个数据。\n",
" \n",
" ```python\n",
" # Record image by Summary operator\n",
" self.sm_image(\"Image\", x)\n",
" self.image_summary(\"Image\", x)\n",
" ```\n",
" \n",
"- 使用`TensorSummary`记录张量数据。\n",
"- 使用 `TensorSummary` 记录张量数据。\n",
"\n",
" 1. 在`__init__`方法中初始化`TensorSummary`。\n",
" 1. 在 `__init__` 方法中初始化 `TensorSummary`。\n",
" \n",
" ```python\n",
" # Init TensorSummary\n",
" self.sm_tensor = P.TensorSummary()\n",
" self.tensor_summary = P.TensorSummary()\n",
" ```\n",
" \n",
" 2. 在`construct`方法中使用`TensorSummary`算子记录张量数据。其中\"Tensor\"为MindInsight展示的记录到的张量信息面板标题。\n",
" 2. 在`construct`方法中使用`TensorSummary`算子记录张量数据。其中\"Tensor\"为该数据的名称。\n",
" \n",
" ```python\n",
" # Record tensor by Summary operator\n",
" self.sm_tensor(\"Tensor\", x)\n",
" ```"
" self.tensor_summary(\"Tensor\", x)\n",
" ```\n",
"\n",
"当前支持的Summary算子:\n",
"\n",
"- [ScalarSummary](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.ops.operations.html?highlight=scalarsummary#mindspore.ops.operations.ScalarSummary): 记录标量数据\n",
"- [TensorSummary](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.ops.operations.html?highlight=tensorsummary#mindspore.ops.operations.TensorSummary): 记录张量数据\n",
"- [ImageSummary](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.ops.operations.html?highlight=imagesummary#mindspore.ops.operations.ImageSummary): 记录图片数据\n",
"- [HistogramSummary](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.ops.operations.html?highlight=histogramsummar#mindspore.ops.operations.HistogramSummary): 将张量数据转为直方图数据记录"
]
},
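{
"cell_type": "markdown",
"metadata": {},
"source": [
"For completeness, here is a minimal sketch of the two remaining operators; the toy network `DemoNet` and the recorded values are hypothetical, not part of the tutorial:\n",
"\n",
"```python\n",
"import mindspore.nn as nn\n",
"from mindspore.ops import operations as P\n",
"\n",
"class DemoNet(nn.Cell):\n",
"    def __init__(self):\n",
"        super(DemoNet, self).__init__()\n",
"        self.fc = nn.Dense(32, 10)\n",
"        self.reduce_mean = P.ReduceMean()\n",
"        # Init ScalarSummary and HistogramSummary\n",
"        self.scalar_summary = P.ScalarSummary()\n",
"        self.histogram_summary = P.HistogramSummary()\n",
"\n",
"    def construct(self, x):\n",
"        x = self.fc(x)\n",
"        # Record the mean activation as a scalar and the full activation as a histogram\n",
"        self.scalar_summary(\"fc_mean\", self.reduce_mean(x))\n",
"        self.histogram_summary(\"fc_output\", x)\n",
"        return x\n",
"```\n"
]
},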
{
......@@ -366,16 +354,16 @@
" self.fc2 = fc_with_initialize(4096, 4096)\n",
" self.fc3 = fc_with_initialize(4096, num_classes)\n",
" # Init TensorSummary\n",
" self.sm_tensor = P.TensorSummary()\n",
" self.tensor_summary = P.TensorSummary()\n",
" # Init ImageSummary\n",
" self.sm_image = P.ImageSummary()\n",
" self.image_summary = P.ImageSummary()\n",
"\n",
" def construct(self, x):\n",
" # Record image by Summary operator\n",
" self.sm_image(\"Image\", x)\n",
" self.image_summary(\"Image\", x)\n",
" x = self.conv1(x)\n",
" # Record tensor by Summary operator\n",
" self.sm_tensor(\"Tensor\", x)\n",
" self.tensor_summary(\"Tensor\", x)\n",
" x = self.relu(x)\n",
" x = self.max_pool2d(x)\n",
" x = self.conv2(x)\n",
......@@ -401,36 +389,35 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 记录标量、直方图、图像\n",
"## 使用 `SummaryCollector` 记录数据\n",
"\n",
"本次体验中使用`SummaryCollector`来记录标量、直方图信息。\n",
"下面展示使用`SummaryCollector`来记录标量、直方图信息。\n",
"\n",
"在MindSpore中通过`Callback`机制,提供支持快速简易地收集损失值、参数权重、梯度等信息的`Callback`, 叫做`SummaryCollector`(详细的用法可以参考API文档中`mindspore.train.callback.SummaryCollector`)。`SummaryCollector`使用方法如下: \n",
"在MindSpore中通过`Callback`机制,提供支持快速简易地收集损失值、参数权重、梯度等信息的`Callback`, 叫做`SummaryCollector`(详细的用法可以参考API文档中[mindspore.train.callback.SummaryCollector](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.train.html?highlight=summarycollector#mindspore.train.callback.SummaryCollector))。`SummaryCollector`使用方法如下: \n",
"\n",
"1. 为了记录损失值标量、直方图信息,在下面一段代码中需要在`specified`参数中指定需要记录的信息。\n",
"`SummaryCollector` 提供 `collect_specified_data` 参数,允许自定义想要收集的数据。\n",
"\n",
" ```python\n",
" specified={\"collect_metric\": True, \"histogram_regular\": \"^conv1.*|^conv2.*\"}\n",
" ```\n",
" - 其中:\n",
" - `\"collect_metric\"`为记录损失值标量信息。\n",
" - `\"histogram_regular\"`为记录`conv1`层和`conv2`层直方图信息。\n",
"下面的代码展示通过 `SummaryCollector` 收集损失值以及卷积层的参数值,参数值在MindInsight中以直方图展示。\n",
"\n",
"2. 实例化`SummaryCollector`,并将其应用到`model.train`或者`model.eval`中。\n",
"\n",
" ```python\n",
" summary_collector = SummaryCollector(summary_dir=\"./summary_dir/summary_01\", \n",
"\n",
"\n",
"```python\n",
"specified={\"collect_metric\": True, \"histogram_regular\": \"^conv1.*|^conv2.*\"}\n",
"summary_collector = SummaryCollector(summary_dir=\"./summary_dir/summary_01\", \n",
" collect_specified_data=specified, \n",
" collect_freq=1, \n",
" keep_default_action=False, \n",
" collect_tensor_freq=200)\n",
" ```\n",
" - 其中:\n",
" - `summary_dir`:指定日志保存的路径。\n",
" - `collect_specified_data`:指定需要记录的信息。\n",
" - `collect_freq`:指定使用`SummaryCollector`记录数据的频率。\n",
" - `keep_default_action`:指定是否除记录除指定信息外的其他数据信息。\n",
" - `collect_tensor_freq`:指定记录张量信息的频率。\n",
"```\n",
"\n",
"- `summary_dir`:指定日志保存的路径。\n",
"- `collect_specified_data`:指定需要记录的信息。\n",
"- `collect_freq`:指定使用`SummaryCollector`记录数据的频率。\n",
"- `keep_default_action`:指定是否除记录除指定信息外的其他数据信息。\n",
"- `collect_tensor_freq`:指定记录张量信息的频率。\n",
"- `\"collect_metric\"`为记录损失值标量信息。\n",
"- `\"histogram_regular\"`为记录`conv1`层和`conv2`层直方图信息。\n",
"\n",
"  程序运行过程中将在本地`8080`端口自动启动MindInsight服务并自动遍历读取当前notebook目录下`summary_dir`子目录下所有日志文件、解析进行可视化展示。"
]
......@@ -455,7 +442,11 @@
"from mindspore.nn.metrics import Accuracy\n",
"from mindspore.train.callback import SummaryCollector\n",
"from mindspore.train.serialization import load_checkpoint, load_param_into_net\n",
"from mindspore import Tensor"
"from mindspore import Tensor\n",
"from mindspore import context\n",
"\n",
"device_target = \"GPU\"\n",
"context.set_context(mode=context.GRAPH_MODE, device_target=device_target)"
]
},
{
......@@ -501,9 +492,7 @@
" lr_each_step = np.array(lr_each_step).astype(np.float32)\n",
" learning_rate = lr_each_step[current_step:]\n",
"\n",
" return learning_rate\n",
"\n",
"lr = Tensor(get_lr(0, 0.002, 10, ds_train.get_dataset_size()))"
" return learning_rate\n"
]
},
{
......@@ -553,21 +542,26 @@
}
],
"source": [
"summary_base_dir = \"./summary_dir\"\n",
"\n",
"network = AlexNet(num_classes=10)\n",
"net_loss = nn.SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True, reduction=\"mean\")\n",
"lr = Tensor(get_lr(0, 0.002, 10, ds_train.get_dataset_size()))\n",
"net_opt = nn.Momentum(network.trainable_params(), learning_rate=lr, momentum=0.9)\n",
"time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())\n",
"config_ck = CheckpointConfig(save_checkpoint_steps=1562, keep_checkpoint_max=10)\n",
"ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_alexnet\", config=config_ck)\n",
"model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
"\n",
"summary_base_dir = \"./summary_dir\"\n",
"os.system(f\"mindinsight start --summary-base-dir {summary_base_dir} --port=8080\")\n",
"\n",
"# Init a SummaryCollector callback instance, and use it in model.train or model.eval\n",
"specified = {\"collect_metric\": True, \"histogram_regular\": \"^conv1.*|^conv2.*\"}\n",
"summary_collector = SummaryCollector(summary_dir=\"./summary_dir/summary_01\", collect_specified_data=specified, collect_freq=1, keep_default_action=False, collect_tensor_freq=200)\n",
"\n",
"print(\"============== Starting Training ==============\")\n",
"# Note: dataset_sink_mode should be set to False, else you should modify collect freq in SummaryCollector\n",
"model.train(epoch=10, train_dataset=ds_train, callbacks=[time_cb, ckpoint_cb, LossMonitor(), summary_collector], dataset_sink_mode=True)\n",
"\n",
"print(\"============== Starting Testing ==============\")\n",
"param_dict = load_checkpoint(\"checkpoint_alexnet-10_1562.ckpt\")\n",
"load_param_into_net(network, param_dict)\n",
......@@ -591,11 +585,11 @@
"\n",
"### 标量可视化\n",
"\n",
"标量可视化用于展示训练过程中标量的变化趋势情况,点击打开标量信息展示面板,该面板记录了迭代计算过程中的损失值标量信息,如下图展示了loss值标量趋势图。\n",
"标量可视化用于展示训练过程中标量的变化趋势,点击打开标量信息展示面板,该面板记录了迭代计算过程中的损失值标量信息,如下图展示了损失值标量趋势图。\n",
"\n",
"![](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/mindinsight/images/scalar_panel.png)\n",
"\n",
"上图展示了神经网络在训练过程中loss值的变化过程。横坐标是训练步骤,纵坐标是loss值。\n",
"上图展示了神经网络在训练过程中损失值的变化过程。横坐标是训练步骤,纵坐标是损失值。\n",
"\n",
"图中右上角有几个按钮功能,从左到右功能分别是全屏展示,切换Y轴比例,开启/关闭框选,分步回退和还原图形。\n",
"\n",
......@@ -789,11 +783,14 @@
"config_ck = CheckpointConfig(save_checkpoint_steps=1562, keep_checkpoint_max=10)\n",
"ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_alexnet\", config=config_ck)\n",
"model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
"\n",
"# Init a SummaryCollector callback instance, and use it in model.train or model.eval\n",
"specified = {\"collect_metric\": True}\n",
"summary_collector = SummaryCollector(summary_dir=\"./summary_dir/summary_loss_only\", collect_specified_data=specified, collect_freq=1, keep_default_action=False)\n",
"\n",
"print(\"============== Starting Training ==============\")\n",
"model.train(epoch=10, train_dataset=ds_train, callbacks=[time_cb, ckpoint_cb, LossMonitor(), summary_collector], dataset_sink_mode=True)\n",
"\n",
"print(\"============== Starting Testing ==============\")\n",
"param_dict = load_checkpoint(\"checkpoint_alexnet_1-10_1562.ckpt\")\n",
"load_param_into_net(network, param_dict)\n",
......@@ -882,11 +879,14 @@
"config_ck = CheckpointConfig(save_checkpoint_steps=1562, keep_checkpoint_max=10)\n",
"ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_alexnet\", config=config_ck)\n",
"model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
"\n",
"# Init a SummaryCollector callback instance, and use it in model.train or model.eval\n",
"specified = {\"histogram_regular\": \"^conv1.*\"}\n",
"summary_collector = SummaryCollector(summary_dir=\"./summary_dir/summary_histogram_only\", collect_specified_data=specified, collect_freq=1, keep_default_action=False)\n",
"\n",
"print(\"============== Starting Training ==============\")\n",
"model.train(epoch=1, train_dataset=ds_train, callbacks=[time_cb, ckpoint_cb, LossMonitor(), summary_collector], dataset_sink_mode=True)\n",
"\n",
"print(\"============== Starting Testing ==============\")\n",
"param_dict = load_checkpoint(\"checkpoint_alexnet_2-1_1562.ckpt\")\n",
"load_param_into_net(network, param_dict)\n",
......@@ -989,12 +989,12 @@
" self.fc2 = fc_with_initialize(4096, 4096)\n",
" self.fc3 = fc_with_initialize(4096, num_classes)\n",
" # Init TensorSummary\n",
" self.sm_tensor = P.TensorSummary()\n",
" self.tensor_summary = P.TensorSummary()\n",
"\n",
" def construct(self, x):\n",
" x = self.conv1(x)\n",
" # Record tensor by Summary operator\n",
" self.sm_tensor(\"Tensor\", x)\n",
" self.tensor_summary(\"Tensor\", x)\n",
" x = self.relu(x)\n",
" x = self.max_pool2d(x)\n",
" x = self.conv2(x)\n",
......@@ -1023,10 +1023,13 @@
"config_ck = CheckpointConfig(save_checkpoint_steps=1562, keep_checkpoint_max=10)\n",
"ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_alexnet\", config=config_ck)\n",
"model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
"\n",
"# Init a SummaryCollector callback instance, and use it in model.train or model.eval\n",
"summary_collector = SummaryCollector(summary_dir=\"./summary_dir/summary_tensor_only\", collect_specified_data=None, collect_freq=1, keep_default_action=False, collect_tensor_freq=50)\n",
"\n",
"print(\"============== Starting Training ==============\")\n",
"model.train(epoch=1, train_dataset=ds_train, callbacks=[time_cb, ckpoint_cb, LossMonitor(), summary_collector], dataset_sink_mode=True)\n",
"\n",
"print(\"============== Starting Testing ==============\")\n",
"param_dict = load_checkpoint(\"checkpoint_alexnet_3-1_1562.ckpt\")\n",
"load_param_into_net(network, param_dict)\n",
......@@ -1122,11 +1125,11 @@
" self.fc2 = fc_with_initialize(4096, 4096)\n",
" self.fc3 = fc_with_initialize(4096, num_classes)\n",
" # Init ImageSummary\n",
" self.sm_image = P.ImageSummary()\n",
" self.image_summary = P.ImageSummary()\n",
"\n",
" def construct(self, x):\n",
" # Record image by Summary operator\n",
" self.sm_image(\"Image\", x)\n",
" self.image_summary(\"Image\", x)\n",
" x = self.conv1(x)\n",
" x = self.relu(x)\n",
" x = self.max_pool2d(x)\n",
......@@ -1156,10 +1159,13 @@
"config_ck = CheckpointConfig(save_checkpoint_steps=1562, keep_checkpoint_max=10)\n",
"ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_alexnet\", config=config_ck)\n",
"model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
"\n",
"# Init a SummaryCollector callback instance, and use it in model.train or model.eval\n",
"summary_collector = SummaryCollector(summary_dir=\"./summary_dir/summary_image_only\", collect_specified_data=None, collect_freq=1, keep_default_action=False)\n",
"\n",
"print(\"============== Starting Training ==============\")\n",
"model.train(epoch=1, train_dataset=ds_train, callbacks=[time_cb, ckpoint_cb, LossMonitor(), summary_collector], dataset_sink_mode=True)\n",
"\n",
"print(\"============== Starting Testing ==============\")\n",
"param_dict = load_checkpoint(\"checkpoint_alexnet_4-1_1562.ckpt\")\n",
"load_param_into_net(network, param_dict)\n",
......@@ -1182,7 +1188,7 @@
"source": [
"### 对比看板\n",
"\n",
"对比看板可视用于多次训练之间的标量数据对比。\n",
"对比看板用于多次训练之间的数据对比。\n",
"\n",
"点击MindInsight看板中的**对比看板**,打开对比看板,可以得到多次(不同)训练搜集到的标量数据对比信息。\n",
"\n",
......@@ -1218,12 +1224,6 @@
"metadata": {},
"source": [
"## 注意事项和规格\n",
"\n",
"- 当前支持的Summary算子:\n",
" - [ScalarSummary](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.ops.operations.html?highlight=scalarsummary#mindspore.ops.operations.ScalarSummary): 记录标量数据\n",
" - [TensorSummary](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.ops.operations.html?highlight=tensorsummary#mindspore.ops.operations.TensorSummary): 记录张量数据\n",
" - [ImageSummary](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.ops.operations.html?highlight=imagesummary#mindspore.ops.operations.ImageSummary): 记录图片数据\n",
" - [HistogramSummary](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.ops.operations.html?highlight=histogramsummar#mindspore.ops.operations.HistogramSummary): 将张量数据转为直方图数据记录\n",
"- 在训练中使用Summary算子收集数据时,`HistogramSummary`算子会影响性能,所以请尽量少地使用。\n",
"- 不能同时使用多个 `SummaryRecord` 实例 (`SummaryCollector` 中使用了 `SummaryRecord`)。\n",
"- 为了控制列出summary文件目录的用时,MindInsight最多支持发现999个summary文件目录。\n",
......@@ -1255,7 +1255,7 @@
"source": [
"## 总结\n",
"\n",
"本次体验流程为完整的MindSpore深度学习及MindInsight可视化展示的过程,包括了下载数据集及预处理过程,构建网络、损失函数和优化器过程,生成模型并进行训练、验证的过程,以及启动MindInsight服务进行训练过程可视化展示。读者可以基于本次体验流程构建自己的网络模型进行训练,并使用`SummaryCollector`、`ImageSummary`和`TensorSummary`记录关心的数据,然后在MindInsight服务看板中进行可视化展示,根据MindInsight服务中展示的结果调整相应的参数以提高训练精度。\n",
"本次体验流程为完整的MindSpore深度学习及MindInsight可视化展示的过程,包括了下载数据集及预处理过程,构建网络、损失函数和优化器过程,生成模型并进行训练、验证的过程,以及启动MindInsight服务进行训练过程可视化展示。读者可以基于本次体验流程构建自己的网络模型进行训练,并使用`SummaryCollector`以及Summary算子记录关心的数据,然后在MindInsight服务看板中进行可视化展示,根据MindInsight服务中展示的结果调整相应的参数以提高训练精度。\n",
"\n",
"以上便完成了标量、直方图、图像和张量可视化的体验,我们通过本次体验全面了解了MindSpore执行训练的过程和MindInsight在标量、直方图、图像和张量可视化的应用,理解了如何使用`SummaryColletor`记录训练过程中的标量、直方图、图像和张量数据。"
]
......
......@@ -12,20 +12,10 @@
"metadata": {},
"source": [
"## 概述\n",
"在AI训练的过程中,面对陌生的神经网络训练,经常需要事先优化神经网络训练中的参数,毕竟在训练一个十分复杂的神经网络时,有时候需要花费少则几天多则几周甚至更多的时间,为了更好的管理、调试和优化神经网络的训练过程,我们需要一个工具来对训练过程中的计算图、各种指标随着时间的变化趋势以及训练中使用到的图像信息进行分析和记录工作,而MindSpore就提供了一个对用户十分易用友好的可视化工具MindInsight,赋能给用户进行数据溯源和模型溯源的可视化分析,能明显提升用户对网络搭建过程和数据增强过程的纠错调优能力。而本次体验会从MindInsight的数据记录,可视化效果,如何方便用户在模型调优,数据调优上做一次整体流程的体验。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"下面按照MindSpore的训练数据模型的正常步骤进行,当使用`SummaryCollector`进行数据保存操作时,会增加相应的说明,本次体验的整体流程如下:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"在调参的场景下,需要多次调整模型超参并进行多次训练,在这个过程,往往需要手动记录每次训练使用参数以及训练结果。为此,MindSpore提供了自动记录模型参数,训练信息,以及训练结果评估指标的功能,并通过MindInsight进行可视化展示。本次体验会从MindInsight的数据记录,可视化效果,如何方便用户在模型调优,数据调优上做一次整体流程的体验。\n",
"\n",
"下面按照MindSpore的训练数据模型的正常步骤进行,当使用`SummaryCollector`进行数据保存操作时,会增加相应的说明,本次体验的整体流程如下:\n",
"\n",
"1. 数据集的准备,这里使用的是MNIST数据集。\n",
"\n",
"2. 构建一个网络,这里使用LeNet网络。\n",
......@@ -36,13 +26,9 @@
"\n",
"5. 模型溯源的使用。调整模型参数多次训练并存储数据,并使用MindInsight的模型溯源功能对不同优化参数下训练产生的模型作对比,了解MindSpore中的各类优化对训练过程的影响及如何调优训练过程。\n",
"\n",
"6. 数据溯源的使用。调整数据参数多次训练并存储数据,并使用MindInsight的数据溯源功能对不同数据集下训练产生的模型进行对比分析,了解如何调优。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"6. 数据溯源的使用。调整数据参数多次训练并存储数据,并使用MindInsight的数据溯源功能对不同数据集下训练产生的模型进行对比分析,了解如何调优。\n",
"\n",
"\n",
"本次体验将使用快速入门案例作为基础用例,将MindInsight的模型溯源和数据溯源的数据记录功能加入到案例中,快速入门案例的源码请参考:<https://gitee.com/mindspore/docs/blob/master/tutorials/tutorial_code/lenet.py>。"
]
},
......@@ -57,13 +43,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 数据集准备"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### 准备\n",
"#### 方法一:\n",
"从以下网址下载,并将数据包解压缩后放至Jupyter的工作目录下:<br/>训练数据集:{\"<http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz>\", \"<http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz>\"}\n",
"<br/>测试数据集:{\"<http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz>\", \"<http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz>\"}<br/>我们用下面代码查询jupyter的工作目录。"
......@@ -121,7 +102,7 @@
" test_path = \"./MNIST_Data/test/\"\n",
" train_path_check = os.path.exists(train_path)\n",
" test_path_check = os.path.exists(test_path)\n",
" if train_path_check == False and test_path_check == False:\n",
" if not train_path_check and not test_path_check:\n",
" os.makedirs(train_path)\n",
" os.makedirs(test_path)\n",
" train_url = {\"http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\", \"http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\"}\n",
......@@ -168,18 +149,14 @@
"source": [
"数据集处理对于训练非常重要,好的数据集可以有效提高训练精度和效率。在加载数据集前,我们通常会对数据集进行一些处理。\n",
"<br/>我们定义一个函数`create_dataset`来创建数据集。在这个函数中,我们定义好需要进行的数据增强和处理操作:\n",
"\n",
"1. 定义数据集。\n",
"2. 定义进行数据增强和处理所需要的一些参数。\n",
"3. 根据参数,生成对应的数据增强操作。\n",
"4. 使用`map`映射函数,将数据操作应用到数据集。\n",
"5. 对生成的数据集进行处理。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 具体的数据集操作可以在MindInsight的数据溯源中进行可视化分析。另外提取图像需要将`normalize`算子的数据处理(`CV.Rescale`)操作取消,否则取出来的图像为全黑图像。"
"5. 对生成的数据集进行处理。\n",
"\n",
"具体的数据集操作可以在MindInsight的数据溯源中进行可视化分析。"
]
},
{
......@@ -242,21 +219,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 构建LeNet5网络"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 使用ImageSummary记录图像数据"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"在构建LeNet5网络的`__init__`中,初始化`ImageSummary`算子,同时在`construct`中将`ImageSummary`放在第一步,其第一个参数`image`为抽取出来的图片的自定义命名,第二个参数`x`是图像数据。此方法与`SummaryCollector`抽取图像的方法不冲突,可以同时使用。"
"## 定义LeNet5网络"
]
},
{
......@@ -299,11 +262,8 @@
" self.relu = nn.ReLU()\n",
" self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)\n",
" self.flatten = nn.Flatten()\n",
" # Init ImageSummary\n",
" self.sm_image = P.ImageSummary()\n",
"\n",
" def construct(self, x):\n",
" self.sm_image(\"image\",x)\n",
" x = self.conv1(x)\n",
" x = self.relu(x)\n",
" x = self.max_pool2d(x)\n",
......@@ -323,88 +283,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 训练网络和测试网络构建"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 使用SummaryCollector记录训练数据"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`summary_callback`,即是`SummaryCollector`,在`model.train`的回调函数中使用,可以记录训练数据溯源和模型溯源信息。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, SummaryCollector, Callback\n",
"from mindspore.train import Model\n",
"import os\n",
"\n",
"def train_net(model, epoch_size, mnist_path, repeat_size, ckpoint_cb, summary_collector):\n",
" \"\"\"Define the training method.\"\"\"\n",
" print(\"============== Starting Training ==============\")\n",
" # load training dataset\n",
" ds_train = create_dataset(os.path.join(mnist_path, \"train\"), 32, repeat_size)\n",
" model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, LossMonitor(), summary_collector], dataset_sink_mode=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 使用SummaryCollector记录测试数据"
"## 记录数据及启动训练"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`summary_callback`,即是`SummaryCollector`,在`model.eval`的回调函数中使用,可以记录训练精度信息和测试样本数量信息。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mindspore.train.serialization import load_checkpoint, load_param_into_net\n",
"MindSpore 提供 `SummaryCollector` 进行记录训练过程中的信息。通过 `SummaryCollector` 的 `collect_specified_data` 参数,可以自定义记录指定数据。\n",
"\n",
"def test_net(network, model, mnist_path, summary_collector):\n",
" \"\"\"Define the evaluation method.\"\"\"\n",
" print(\"============== Starting Testing ==============\")\n",
" # load the saved model for evaluation\n",
" param_dict = load_checkpoint(\"checkpoint_lenet-3_1875.ckpt\")\n",
" # load parameter to the network\n",
" load_param_into_net(network, param_dict)\n",
" # load testing dataset\n",
" ds_eval = create_dataset(os.path.join(mnist_path, \"test\"))\n",
" acc = model.eval(ds_eval, callbacks=[summary_collector], dataset_sink_mode=True)\n",
" print(\"============== Accuracy:{} ==============\".format(acc))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 主程序运行入口"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"初始化`SummaryCollector`,使用`collect_specified_data`控制需要记录的数据,我们这里只需要记录模型溯源和数据溯源,所以将`collect_train_lineage`和`collect_eval_lineage`参数设置成`True`,其他的参数使用`keep_default_action`设置成`False`,SummaryCollector能够记录哪些数据,请参考官网:<https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.train.html?highlight=collector#mindspore.train.callback.SummaryCollector>。"
"在本次体验中,我们将记录训练数据与数据集预处理的操作,我们将 `collect_specified_data` 中的 `collect_train_lineage`, `collect_eval_lineage`, `collect_dataset_graph` 设置成 `True`。SummaryCollector的更多用法,请参考[API文档](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.train.html?highlight=collector#mindspore.train.callback.SummaryCollector)。\n"
]
},
{
......@@ -422,7 +310,7 @@
" context.set_context(mode=context.GRAPH_MODE, device_target = \"GPU\")\n",
" lr = 0.01\n",
" momentum = 0.9 \n",
" epoch_size = 3\n",
" epoch_size = 10\n",
" mnist_path = \"./MNIST_Data\"\n",
" \n",
" net_loss = SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True, reduction='mean')\n",
......@@ -436,10 +324,22 @@
" ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_lenet\", config=config_ck)\n",
" model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
" \n",
" collect_specified_data = {\"collect_eval_lineage\":True,\"collect_train_lineage\":True}\n",
" collect_specified_data = {\"collect_eval_lineage\": True, \"collect_train_lineage\": True, \"collect_dataset_graph\": True}\n",
" summary_collector = SummaryCollector(summary_dir=\"./summary_base/quick_start_summary01\", collect_specified_data=collect_specified_data, keep_default_action=False) \n",
" train_net(model, epoch_size, mnist_path, repeat_size, ckpoint_cb, summary_collector)\n",
" test_net(network, model, mnist_path, summary_collector)"
"\n",
" # Start to train\n",
" ds_train = create_dataset(os.path.join(mnist_path, \"train\"), 32, repeat_size)\n",
" model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, summary_collector], dataset_sink_mode=True)\n",
"\n",
" print(\"============== Starting Testing ==============\")\n",
" # load the saved model for evaluation\n",
" param_dict = load_checkpoint(\"checkpoint_lenet-10_1875.ckpt\")\n",
" # load parameter to the network\n",
" load_param_into_net(network, param_dict)\n",
" # load testing dataset\n",
" ds_eval = create_dataset(os.path.join(mnist_path, \"test\"))\n",
" acc = model.eval(ds_eval, callbacks=[summary_collector], dataset_sink_mode=True)\n",
" print(\"============== Accuracy:{} ==============\".format(acc))"
]
},
{
......@@ -453,74 +353,36 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"这里主要展示如何启用及关闭MindInsight,更多的命令集信息,请参考MindSpore官方网站:<https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/visualization_tutorials.html>。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 启动MindInsight服务\n",
"这里主要展示如何启用及关闭MindInsight,更多的命令集信息,请参考MindSpore官方网站:<https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/visualization_tutorials.html>。\n",
"\n",
" 在安装过MindInsight的环境中启动MindInsight服务:\n",
" - `--summary-base-dir`:MindInsight指定启动工作路径的命令;`./summary_base`表示SummaryRecord保存文件夹的目录。\n",
" - `--port`:MindInsight指定启动的端口,数值可以任意为1~65535的范围内。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.system(\"mindinsight start --summary-base-dir=./summary_base --port=8080\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"查询是否启动成功,在网址输入:`127.0.0.1:8080`,如果看到如下界面说明启动成功。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![image](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/mindinsight/images/summary_list.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 关闭MindInsight服务\n",
"启动MindInsight服务命令:\n",
"\n",
" 在安装过MindInsight的环境中输入命令:`mindinsight stop --port=8080`\n",
" - `mindinsight stop`:MindInsight关闭服务命令。\n",
" - `--port=8080`:即MindInsight服务开启在`8080`端口,所以这里写成`--port=8080`。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 模型溯源"
"mindinsight start --summary-base-dir=./summary_base --port=8080\n",
"\n",
"\n",
"- `--summary-base-dir`:MindInsight指定启动工作路径的命令;`./summary_base` 为 `SummaryCollector` 的 `summary_dir` 参数所指定的目录。\n",
"- `--port`:MindInsight指定启动的端口,数值可以任意为1~65535的范围内。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 连接到模型溯源地址"
"停止MindInsight服务命令:`mindinsight stop --port=8080`\n",
"\n",
"- `mindinsight stop`:MindInsight关闭服务命令。\n",
"- `--port=8080`:即MindInsight服务开启在`8080`端口,所以这里写成`--port=8080`。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"浏览器中输入:`127.0.0.1:8080`,点击模型溯源,如下模型溯源界面:"
"## 模型溯源\n",
"\n",
"### 连接到模型溯源地址\n",
"\n",
"浏览器中输入:`http://127.0.0.1:8080`,点击模型溯源,如下模型溯源界面:"
]
},
{
......@@ -534,46 +396,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"我们可以勾选展示列,由于训练过程涉及的参数很多,在调整训练参数时,一般只会调整少量参数,所以对大部分相同参数可以去掉勾选,不显示出来,使得用户更方便的观察不同参数对模型训练的影响,上图中的不同参数的竖直线段代表的各个参数,数根连接各个参数的折线图代表不同的模型训练过程,其中各参数从左到右如下:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 训练日志路径:表示存储记录数据的文件夹路径,即`summary_dir`。\n",
"- `Accuracy`:模型的精度值。\n",
"- `loss`:模型的loss值。\n",
"- 网络:表示神经网络名称。\n",
"- 优化器:表示训练过程中采用的优化器。\n",
"- 训练样本数量:训练样本数量。\n",
"- 测试样本数量:测试样本数量。\n",
"- 学习率:learning_rate的值。\n",
"- `epoch`:训练整个数据集的次数。\n",
"- `steps`:训练迭代数。\n",
"- device数目:启用的训练卡数目。\n",
"- 模型大小:生成的模型文件`.ckpt`的大小。\n",
"- 损失函数:表示训练过程中采用的损失函数。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"根据上述记录的信息,我们可以调整模型训练过程中的参数,训练生成模型,然后选择要对比的训练,进行比对观察分析。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 观察分析记录下来的溯源参数"
"如图所示,在页面的左上角中,我们可以选择要查看的训练信息,并通过这个功能挑选出我们要关注的训练信息,而不是要查看所有的训练信息,避免影响对不同训练作业进行对比分析。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 观察分析记录下来的溯源参数\n",
"\n",
"下图选择了数条不同参数下训练生成的模型进行对比:"
]
},
......@@ -588,29 +419,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"在这几次训练的参数中,优化器,epoch和学习率都不一致,可以看到不同的训练生成的模型精度`Accuracy`和loss值是不一致的,当然最好是调整单个参数来观察对模型生成的影响,避免多重因素干扰,难以分辨哪个参数是正影响,哪个参数是负影响。这需要我们调整不同的参数,多训练几遍生成模型,分析各参数对训练产生的影响,这对前期学习AI训练时很有帮助。在以后应对复杂训练时,可以节省不少时间。\n",
"> 在多次训练时,需要将`summary_dir`的指定为不同的文件夹,否则训练记录的数据会生成在同一个文件夹下,而在同一文件夹下MindInsight只会读取最新生成的文件。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 数据溯源"
"在这几次训练的参数中,优化器,epoch和学习率都不一致,可以看到不同的训练生成的模型准确率`Accuracy`和损失值是不一致的,当然最好是调整单个参数来观察对模型生成的影响,避免多重因素干扰,难以分辨哪个参数是正影响,哪个参数是负影响。\n",
"\n",
"在多次训练时,需要为 `SummaryCollector` 的 `summary_dir` 参数的指定不同的文件夹,否则训练记录的数据会生成在同一个文件夹下,会导致MindInsight展示的数据为非预期。"
]
},
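{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of such a run loop follows; the hyperparameter values and directory naming are hypothetical:\n",
"\n",
"```python\n",
"from mindspore.train.callback import SummaryCollector\n",
"\n",
"# One summary_dir per training run, so MindInsight lists each run separately\n",
"for run_id, lr in enumerate([0.01, 0.02, 0.05]):\n",
"    summary_collector = SummaryCollector(\n",
"        summary_dir=\"./summary_base/quick_start_summary{:02d}\".format(run_id),\n",
"        collect_specified_data={\"collect_train_lineage\": True, \"collect_eval_lineage\": True},\n",
"        keep_default_action=False)\n",
"    # ...build the model with learning rate lr, then:\n",
"    # model.train(epoch_size, ds_train, callbacks=[summary_collector], dataset_sink_mode=True)\n",
"```\n"
]
},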
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 连接到数据溯源地址"
"## 数据溯源\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"浏览器中输入:`127.0.0.1:8080`连接上MindInsight的服务,点击模型溯源,如下图数据溯源界面:"
"点击模型溯源,如下图数据溯源界面:"
]
},
{
......@@ -624,60 +449,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"数据溯源的根本是重现数据集从左到右进行数据增强的整个过程,方便自己发现增强过程中是否有遗漏的步骤或者不合理的操作,方便自己查找错误,也方便自己找到最优的数据增强方式,毕竟一个好的数据集对模型训练是有事半功倍的效果的。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 训练日志路径:表示存储记录数据的文件夹路径,即`summary_dir`。\n",
"- `MnistDataset`:表示数据集信息,包含数据集路径。\n",
"- `Map_TypeCast`:定义数据集的类型。\n",
"- `Map_Resize`:图像缩放后的尺寸。\n",
"- `Map_Rescale`:图像的缩放比例。\n",
"- `Map_HWC2CHW`:数据集的张量由:高×宽×通道-->通道×高×宽。\n",
"- `Shuffle`:数据集混洗的缓存空间。\n",
"- `Batch`:每组训练样本数量。\n",
"- `Repeat`:数据图片复制次数,用于增强数据的数量。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 观察分析数据溯源参数"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"可以从上图看到数据增强过程由原数据集MnistDataset开始,按照先后顺序经过了下面的操作:label的数据类型转换(`Map_Typecast`),图像的高宽缩放(`Map_Resize`),图像的比例缩放(`Map_Rescale`),图像数据的张量变换(`Map_HWC2CHW`),图像混洗(`Shuffle`),图像成组(`Batch`),图像数量增强(`Repeat`)然后输出训练需要的数据。显然这样的可视化的数据溯源功能,在你检查数据增强操作是否有误的时候,比起一行行的去检查代码效率多了。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 最后关闭MindInsight服务"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.system(\"mindinsight stop --port=8080\")"
"数据溯源的是记录了每次训练中对数据集进行操作的流程,通过分析数据溯源,查看数据预处理过程中是否有遗漏的步骤或者不合理的操作。\n",
"\n",
"如图所示,图中几次训练的数据增强过程由原数据集MnistDataset开始,按照先后顺序经过了下面的操作:\n",
"\n",
"- label的数据类型转换(`Map_Typecast`)\n",
"- 图像的高宽缩放(`Map_Resize`)\n",
"- 图像的比例缩放(`Map_Rescale`)\n",
"- 图像数据的张量变换(`Map_HWC2CHW`)\n",
"- 图像混洗(`Shuffle`)\n",
"- 图像成组(`Batch`)\n",
"- 图像数量增强(`Repeat`)\n",
"\n",
"在数据溯源中,可以对不同训练所使用的数据预处理操作进行对比,快速发现数据预处理中存在的问题。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"以上就是这次对MindInsight的使用方法,模型溯源和数据溯源的全部过程。"
"以上就是本次对MindInsight的模型溯源和数据溯源的体验全过程。"
]
}
],
......