{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 加载数据集\n", "\n", "## 概述\n", "\n", "MindSpore可以帮助你加载常见的数据集、特定数据格式的数据集或自定义的数据集。加载数据集时,需先导入所需要依赖的库`mindspore.dataset`。\n", "\n", "接下来,以加载数常用数据集(CIFAR-10数据集)、特定格式数据集以及自定义数据集为例来体验MindSpore加载数据集操作。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 整体流程\n", "\n", "1. 准备环节。下载本次体验流程所需的数据集。\n", "2. 加载常用数据集并输出结果,以CIFAR-10二进制数据集为例。\n", "3. 加载特定格式数据集并输出结果,以MindRecord格式数据集为例。\n", "4. 加载自定义数据集并输出结果。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 准备环节\n", "\n", "### 导入`mindspore.dataset`模块" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import mindspore.dataset as ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 下载所需数据集\n", "\n", "1. 在当前`notebook`工作目录创建`./datasets/cifar-10`目录,用于存放数据集。\n", "2. 在当前`notebook`工作目录创建`./datasets/mindrecord`目录,用于后续存放转换后的MindRecord格式数据集文件。\n", "3. 下载[CIFAR-10二进制格式数据集](https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz),并将数据集文件解压到`./datasets/cifar-10/cifar-10-batches-bin`目录下。\n", "4. 下载数据集[CIFAR-10 Python文件格式数据集](http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz),并将数据集文件解压到`./datasets/cifar-10/cifar-10-batches-py`目录下。\n", "\n", " 此时当前`notebook`工作目录下`datasets`目录结构为:\n", "\n", " ```shell\n", " $ tree datasets\n", " datasets\n", " ├── cifar-10\n", " │   ├── cifar-10-batches-bin\n", " │   │   ├── batches.meta.txt\n", " │   │   ├── data_batch_1.bin\n", " │   │   ├── data_batch_2.bin\n", " │   │   ├── data_batch_3.bin\n", " │   │   ├── data_batch_4.bin\n", " │   │   ├── data_batch_5.bin\n", " │   │   ├── readme.html\n", " │   │   └── test_batch.bin\n", " │   └── cifar-10-batches-py\n", " │   ├── batches.meta\n", " │   ├── data_batch_1\n", " │   ├── data_batch_2\n", " │   ├── data_batch_3\n", " │   ├── data_batch_4\n", " │   ├── data_batch_5\n", " │   ├── readme.html\n", " │   └── test_batch\n", " └── mindrecord\n", " ```\n", "\n", " 其中:\n", " - `cifar-10-batches-bin`目录为CIFAR-10二进制格式数据集目录。\n", " - `cifar-10-batches-py`目录为CIFAR-10 Python文件格式数据集目录。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 加载常见的数据集\n", "\n", "MindSpore可以加载常见的标准数据集。支持的数据集如下表:\n", "\n", "| 数据集: | 简要说明 |\n", "| :---------: | :-------------:|\n", "| ImageNet | ImageNet是根据WordNet层次结构组织的图像数据库,其中层次结构的每个节点都由成百上千个图像表示。 |\n", "| MNIST | 是一个手写数字图像的大型数据库,通常用于训练各种图像处理系统。 |\n", "| CIFAR-10 | 常用于训练图像的采集机器学习和计算机视觉算法。CIFAR-10数据集包含10种不同类别的60,000张32x32彩色图像。 |\n", "| CIFAR-100 | 该数据集类似于CIFAR-10,不同之处在于它有100个类别,每个类别包含600张图像:500张训练图像和100张测试图像。|\n", "| PASCAL-VOC | 数据内容多样,可用于训练计算机视觉模型(分类、定位、检测、分割、动作识别等)。|\n", "| CelebA | CelebA人脸数据集包含上万个名人身份的人脸图片,每张图片有40个特征标记,常用于人脸相关的训练任务。 |\n", "\n", "加载常见数据集的详细步骤如下,以创建`CIFAR-10`对象为例,用于加载支持的数据集。\n", "\n", "1. 使用二进制格式的数据集(CIFAR-10 binary version),配置数据集目录,定义需要加载的数据集实例。" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "DATA_DIR = \"./datasets/cifar-10/cifar-10-batches-bin\"\n", "cifar10_dataset = ds.Cifar10Dataset(DATA_DIR)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. 创建迭代器,通过迭代器读取数据。此处读取前2个图像及其标签。" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The data of image 1 is below:\n", "[[[179 147 140]\n", " [173 148 138]\n", " [131 108 98]\n", " ...\n", " [129 90 77]\n", " [167 140 124]\n", " [188 172 154]]\n", "\n", " [[177 156 131]\n", " [182 167 142]\n", " [120 108 85]\n", " ...\n", " [156 142 130]\n", " [199 171 159]\n", " [174 126 106]]\n", "\n", " [[145 129 103]\n", " [128 107 81]\n", " [166 144 118]\n", " ...\n", " [145 129 115]\n", " [138 94 72]\n", " [179 108 84]]\n", "\n", " ...\n", "\n", " [[123 135 91]\n", " [134 146 101]\n", " [113 123 86]\n", " ...\n", " [117 106 79]\n", " [ 87 81 67]\n", " [ 80 80 56]]\n", "\n", " [[148 159 114]\n", " [135 146 103]\n", " [125 135 97]\n", " ...\n", " [150 137 93]\n", " [123 116 88]\n", " [124 120 93]]\n", "\n", " [[150 162 102]\n", " [160 171 115]\n", " [132 141 97]\n", " ...\n", " [139 126 79]\n", " [113 100 84]\n", " [ 98 83 72]]]\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAOcAAAD3CAYAAADmIkO7AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAZEklEQVR4nO2dW2yc6V3Gn3eOnvOMPWN7bCexE8fZxMlumu0hadm2KqUUUQRtpQoQQqpEVXGDRC9A3HC6QCCBQIgLEBdc0AtECypdoAeWtkv3mD002Rw2J8dnO7bH9njGcz58XMQrrOp9PgRtN+9Wz++m6fvPO/PNN98z7+Z93v//bzzPgxDCPQKP+gKEEHYkTiEcReIUwlEkTiEcReIUwlEkTiEcReJ8RBhjbhhjPvyor0O4i8T5iPA8b9bzvO886ut4C2NMxBjzZWPMgjHG0w/Ho0fiFId5DsCvAHjwqC9ESJyPjIMV6qPGmN83xnzJGPNFY0zVGHPNGDNjjPkdY8ymMWbZGPOxQ/M+a4x58+Dv3jfGfP77Xve3jDHrxpg1Y8yvHayC0wexqDHmT40xS8aYDWPMXxtjYgDgeV7b87y/8DzvOQC9t/VmCCsSpxv8HIC/B5AD8D0A38DD72YcwB8C+JtDf3cTwCcApAF8FsCfG2MuAIAx5uMAvgDgowCmAXzo+97nTwDMADh/EB8H8Ls/ig8kfnAkTjf4rud53/A8rwvgSwAKAP7Y87wOgH8AMGmMyQKA53n/5nnenPeQZwF8E8BTB6/zGQB/53neDc/z6gD+4K03MMYYAJ8D8Jue5+14nlcF8EcAfvFt+ozi/0joUV+AAABsHPpzA0DJ87zeof8PAEkAZWPMzwD4PTxcAQMA4gCuHfydMQCvHnqt5UN/Lhz83dce6hQAYAAEf0ifQfyQkTjfQRhjogD+CcCvAvgXz/M6xpiv4KHIAGAdwMShKUcO/bmEh0Kf9Txv9W24XPEDov+sfWcRARAFsAWge7CKfuxQ/B8BfNYYc9oYE8ehf096ntcH8Ld4+G/UYQAwxowbY376rb9zsGE08NZ7GWMGzKFlVry9SJzvIA7+nfgbeCjCXQC/DOCrh+JfA/CXAL4N4B6AFw9CrYP//e2D8ZeMMRUAzwA4degtbuPh6jqOh5tSDQDHfkQfR/wvGCVb//hijDkN4DqA6MFmk3gHoZXzxwxjzCcPTvvk8NA6eVrCfGcicf748Xk8/DfpHB4eJvj1R3s54v+L/rNWCEfRyimEo/j6nP/6V39Gl9W26dN5N2/dso6nB/lvQTYSobF6o0Nju60mjd24edseMNx3v/jeMzQWRJvGKrt7PLZXp7F6xX6LO42GdRwAMqkajY3GkjQWSmRprByx3+PBbIHOGQjw76xV5dfY68Vo7F0f/4R1/M7NK3TO2hy3bXeX79DY6ZkZGkN/gIaWVu3vN3PhCTqnMHmUxn7y05+z2lVaOYVwFIlTCEeROIVwFIlTCEeROIVwFIlTCEfxtVL6Mb6dH0skaKyGqnX8qZ/9/sT8/yEei9LY17/4LRrrdXjSxKmzs9bxWqlC53T3uV3S6Nk/FwBkBjM01mpxK6jft7/mqZkj1nEA2Jov0dj63BaNjZ7h1xgdsNsbjRL/zLHcGI3V9/ZpLBDh9ldu0P4cGMPvYaPKTydGk3kai01M0Fizz9et8aL9c08e4d/Z+huv0Bg+bR/WyimEo0icQjiKxCmEo0icQjiKxCmEo0icQjiKr5WysLpCY7ubmzT2+MmT1vFkj5/0/+Yzr9LY1//rNRr7yIV301i+YLcHSkvL1nEA2OpwCyAU47mv7TC3dNo+d3n05Ih1PBjmWT+5kRSNZQL8GqN5ng2SSaWt4+km/85eu36PxkyA/+6/9+JZGouE7J87TGcAsRjPMhqbOEFj4Sy3llKpURq7d/WKdfy7T79A55xO82whhlZOIRxF4hTCUSROIRxF4hTCUSROIRzFd7e21+HhyeJjNPbyc9es4y9cfp3OmV/dpbF0yL6TCAAeeH2eWCpnHb/0U++jc7IxvkuXzNpfDwC8ID98HQ7xHc9I0r7z2u7wz7Ux9yaNlebs9ZsAoN7jO9ER8n7lfT5nZDTLXy/O6wvt1HzqC3lx63gyN0znmMg6ja084K7CR951icYWbvCd6P49+25/fYc/w3f3eQIBQyunEI4icQrhKBKnEI4icQrhKBKnEI4icQrhKL5WSnGEl+K/+tINHrt90zo+MckPbBcHuU1x5uw5GpuZtR8cB4BE1l7nKBji9Y+uvzZHY63bd2ksn+XWwf4er+sTHxq0jg8VjtM5wwluYx25dIHGtnYWaGxz3v597jTX6JxcbojGPJ+f/UaH1wMKhuxH3EfHinTOrWvcWpqe5PcqNsCfg3u3uSU1f93+jAzwRwCNPq9bxdDKKYSjSJxCOIrEKYSjSJxCOIrEKYSjSJxCOIqvlbK9ycv+v/IqzzAZnbRnFgzl+dZ1KsFtlqlpbpesLi/R2O4Vuw2QG+KvN/fmPI0N8EQRlPo9Gmv6ZM4MFu33xKvwbs3Xb3FrJjl4isaKx6do7PRZe6ZO/wley2i7skNj6yu8o3Snye9Hh3QPL4zzVgejxyZpbPYcvx8bW7xGVjfK7Z7cmWPW8U6dZ9uEu7y1CUMrpxCOInEK4SgSpxCOInEK4SgSpxCOInEK4Si+VsrcnUUaGynygkuZnL08fqfNt6eLMzwr5dWXX6Sx7QVu94zn7ZbOTpu3YxjM8A7b9fVtGssWeJfnzDRvCdDt27ffo31uO42keOuHRoVfY2WR2yKXl+wW0vkPfZjOmZ7ln2tsgneUvvHmFRpr9uzXmBsdp3Mefx8v2Gaa/H5EB/Zo7PT7+PfZatmvsVrjFtEIKTbnh1ZOIRxF4hTCUSROIRxF4hTCUSROIRxF4hTCUXytlPIWL0o0PGIvTAUAraC9v0YunaVzokHeuzgd4vZGNM4LP0XI1najx7e8x07w18tdnKCxWJJbBzWPW0hDKbsdEfLpU5PI8A7Vt32ydLo+3aYHPfv7BUr8Xl2+8S0amzrGu0ZPZHjhuHrJnumyFeD9UCaO8u9s+RbPjql0eebP3AN7kToASAXsvW+OTvGsn063SWMMrZxCOIrEKYSjSJxCOIrEKYSjSJxCOIrvbm2sz3cZExG+u2rC9kO+tTLf+Vu89YDGZqftNVsA4PxTF2lsecVepn9t+z6d88R7eM2ZW0s+RYS6vAZSa5XX/Fkv2Tsv58d5h+2BJO+UHQ7Zkw4AIJ3P0li3Zt9N/M+v/jOdU6ryrtFbp/h3Fkzxa8wcse/Wno3zR7U+wHdd37h9hcbuLi7QWNe0aCw9RJ7vfa6XzT1+rxhaOYVwFIlTCEeROIVwFIlTCEeROIVwFIlTCEfxtVI8fr4aL16+QmNNMnGIdJoGgOMnec2WZ7/HDyHvtjwae/yc/VD5xAg/SL+8VaWxnU0ei6Z4l+dz73k3jc0t2VsCLO7wpIOAx2sIpeq7NIYGbxmRmbIf6o8ucNtme8GnHcPr/DqCGf6aHyjYawU1G9ymeObpb9DYKy9fo7FQNE1j+TH+rLaa9hpC9SZ/Fls+956hlVMIR5E4hXAUiVMIR5E4hXAUiVMIR5E4hXAUXytly/DsgUqIl/aPhOxbyv0In3N/lde+KVe5rZAeidDYBUza5+zzj92s8WybWJdvld9fuUdjuw1uwVx87yXreMun5szNV75LY8Ynm2Isxi2M6n37/Z+dfZzOiRd4h3D0ujT0+hyv6xMI2+2N55+/Tue88B88ZgL2TtkAkMrx72V/gXepHi3an5FMjK91Zp8/OwytnEI4isQphKNInEI4isQphKNInEI4isQphKP4WimNPt9qLh7hWRiJYNI6XqvxLIZ4yqflgk/Gx4kTvLR/eXPN/noN+/UBwPo6z6bo9+xtJgDgxMkZGtva4bbC9cvPWccvXuIWxnvef4HGrnz7eRrb2eCFsKJB+6Owt8MLU03keRGyqemzNJYgmScAsN+z2xQLPlZV3ecpjsf4Nc6v8e86kea2X6W9YR1f3+Bd1s+cOkpjDK2cQjiKxCmEo0icQjiKxCmEo0icQjiKxCmEo/haKZ/8ed43ZHOLZz8MDtp7SXR7vMiR6fPfif0uzywIxHx6aOyUrePBPt8mz6S5bbNR5p85HOJFt548z22RN77zHev4s1/7Op3zxAd4f5jx49M0VtrhVlAsEbeOdzZ5tlC1wq2xvuHfdTLKK8ddfuOqdTyb59lHJ05O0tjONs9oOnOGz2s2uY24vWvvmVOvNeicqQ5/PhhaOYVwFIlTCEeROIVwFIlTCEeROIVwFIlTCEfxtVJCYW4dpJJ86rHj9j4ThtfOwuYq35bvNXlWR7/PryMRt2ckZONFOifq8d+rwhGeAbNZXqax/Rb/bDOXPmgdX7zN+8N873VeICtk7JYIAFxdsGfpAMDZ82es49PHH6Nz0lVeBGuvzC2MRmmbxqZG8tbxOnjGR23PniUCAGNjWRorjvN7tV/nBbniW/Z5Xps/O8vL6zTG0MophKNInEI4isQphKNInEI4isQphKP47tZ2+vywbrfJd3Iru/a6M90wPwzd7vAd2WaDv1e7zg8bd6PHrOOtQIrO6ff561W2ec2ZdIzv7gUD9oPSALDXH7SOHzvH6wRt3+f1dEorfJe0uc8P/G/v2A96jyf5jmY8MUxjwSBv5TFZ4Pcq0itbx+/u8PpHQ4P8+8zmeL2oSmOPX0eCWwtJkjixt8G7b+dyWRpjaOUUwlEkTiEcReIUwlEkTiEcReIUwlEkTiEcxddKabd8rJQ2txwCnn1ru9PgnZVTcb4d3u7yg9L9NrdZdjZWSMR+MB8Ajs6epjEscXsg2OBb/c/++1dp7NiTn7KOT80coXMGjvIWA+0+b2vxiTO8ZURh2F73KR7nj8jWHn8Gjg7xJIHK/Vs0dvO2vUt1PMevwwtnaazR4tfYD3JLZyjJXzOesVtjIzl7V24AQJvbLAytnEI4isQphKNInEI4isQphKNInEI4isQphKP41xAK8W354Qm+nb9XtWveC/KS+q0G3/Ju+bRPSMb59nUmb68VNDV5nM6ZX1qksUC9TGPY4fV0vDa3iWYmTtrn+LQzuLXAbZtTs+/n1wGe+dML2t+vCn7vUwVul7T4W2GrxdtrDBTslk5yxCe7pMazSxpN3vohmuDWmE+yEyID9musevwZgE+WDkMrpxCOInEK4SgSpxCOInEK4SgSpxCOInEK4Si+Vorn8SySjRLv/AvYs1nyw9yaCXg8Aybo8xvSqPI9+5ZnL9K0WuLtEa49+yKNhXu8UFdqiG/Zf/wzv0RjiYLdCnrh8kt0TjbPbaxmh9sU9xZ4NkitbS9eVhzmnb77PkXZdtZ5i4THTtvtIwCIt+0Fzyo+WT+9Fn9OaxWetdTs8U7f8XSWxsKhjHW8UufZU+k0t9MYWjmFcBSJUwhHkTiFcBSJUwhHkTiFcBSJUwhH8bVStnb51rDX56fsp6bsp/bXH9h7qABAIpalsdgAt2A2d3n/ktigfV42ym0PpHnxr9ffnKexT138BRpLJrgd8ZWvfNk6fmf+Lp0zlOH9S7ZK/H7UfIqyDcTtxa7OzHLbo1wu01gmwa2Dco/Hql379QfDfB1p7fPP7PV4Ya14gj8HPo8cjGfP4MmmeIZU3+M2HEMrpxCOInEK4SgSpxCOInEK4SgSpxCO4rtbmxzku4KFQV4/JkG2ujauztE58Sg/RH1q9iiNpQv8I+zsNe3XsXWTztno8QP9kVFexyaZ5Dt/psYP9ZfW7Dvit27xneGJIt9RDvl02E7l+PdZHLHvsHd7/OB4x+OxsWMnaGy3zg/FN2Hf1Sz4dNjOHuW74V1etgrdKH92DPh2bYkkfSSTvBs2jI9DQNDKKYSjSJxCOIrEKYSjSJxCOIrEKYSjSJxCOIqvlRII8S37RosfKF6Zt9foGStO0jm1/TKNtVq8NUEgyre8V7sl63h+eJjOeWKEWwB7S/wz97fXaOzqA94uIJ+zWxhel9svxuNfW6Fgr28DAD1wmwWe/Xd6Z7dMp1Tr3Epp+LRcQJ/Pq3XtNkXCcJsiA25TrK1y26YT4ckbE+P2WkYAEInYP1upxD9XIsnrHDG0cgrhKBKnEI4icQrhKBKnEI4icQrhKBKnEI7ia6U0anzrPRrhU+MpewbBUMZuGwAAhvnW9bWrvI1Ay6cD8WjRbpkkBvm1JyI88yRa59vhd+Zu8NfM807aP3H+vHW8uc3rN/UNbyNQ73ELJhT128632wPxJLeqeuCvt73Ls3sGh/n9T5O6Pq02X0e2fdpoLyxVaGxkcoTGqlX+YMXidgumXuNWW7XKbRaGVk4hHEXiFMJRJE4hHEXiFMJRJE4hHEXiFMJRfK2U3ZK9QBYARH225XstuwXz3Lcv0zmjI3kaa9R4VkoowjMSNkv2rJRQm1d9yg7xYlGtOs+0MHH+mks7yzR2sjdjHf/gpafonCv379OY1+PX2OqX+byQ3Z7hxgwwNsILr/UNt1I2NngGzweeumAdX1nklsj113kxtNknJ2nMxLm9sb3F7ap80G7RZTM8c2avUqYxhlZOIRxF4hTCUSROIRxF4hTCUSROIRxF4hTCUXytlNde5T1FRlfHaOyJ2WPW8USK2x77tTKN5Qs8YyU3OEpjlao9s6NKxgFgbPAIjUXr/DOvbS/R2N25FRrr1J+xjh+b5NfhxbnBsXj3No2FY30aG07araB0MstfL8Dto2qTZ4pMjE/SWLdpXy92N3mRtME8L0SXH+QFz7phbpdkszyD6sGaPWNlY3Odzhk/wq1ChlZOIRxF4hTCUSROIRxF4hTCUSROIRzFd7d2dLRIYxurNRqrT9p3s45N853VgOGH2ytl/l4P1hZobPyI/f1aMb7bOb/Mu28HVnmNmLs3+LxULk1jQxP23cT1Bj8c3vf4LumZs3xHudriO54myO4Jf6+VZb5DPTjCO1HXa3wn9+qr9t3mkOE7shMTvBbQKy/y3etQnK9NJ87w1wyG7M/jufN8Ts2ndQVDK6cQjiJxCuEoEqcQjiJxCuEoEqcQjiJxCuEovlbK9DQ/fJ2J2+vzAEA6bi/hHzR8W75U4h2Ik3FemyWY5bZIJGKP9Zu8o3GzxA9Dh/s+3aYj3DpoN3lbC1anKZziv5t7FX5wf5tfPpJZ3moCffs11qJ1OmW/UaaxqQw/6D3i03376spd63inwe99p8nrJnldbn/FwzwRo13h1kd5035PukNcTsanizZDK6cQjiJxCuEoEqcQjiJxCuEoEqcQjiJxCuEovlbK9q6PXZLlmRYPth5YxxMJbjdsrXMPoBHjFsxjZ8ZpzITs2+hLG2U652jxJL+OPX6N2zneCXl/n2/LL92315157PwEnTNzgnfK3ihv0ljap9XEyvKudbzV5C05xsZ4BszeLrc3bl2/QmPHp+yvGY3wdeTyi4s0ls5maezocZ5F0u/yLKkEsQpLGzzrJxL3sbEIWjmFcBSJUwhHkTiFcBSJUwhHkTiFcBSJUwhH8bVSEOLZFH4zw1H7af96m2cW3LnjY9tEuQUTCfEO29m8PRMgk7N3JgaA0naZx9Z55sx+l9slYz6tFZbX7DZArcGLmg207Fv5ABDo8lhjj7dj8Dz7dxMM8WJc0SjP+Niv+nQj9/nOEil7G4SJozzLZX6RX0elwu/jis/3GYvxBzydtNsiG2vcdlpc4QXb8AX7sFZOIRxF4hTCUSROIRxF4hTCUSROIRxF4hTCUXytlOMnecZHu8mtg0TMblUsL5bpnHiOF31q1fh2+Oo6t2A6xp45UxzO0jkrmz4ZDmmeWeAFuYURyfHbXByw36tEhs95/vJlGosHeRfwZJb3GylM2a2KdotbKc06t0uiPgXPjszyHjzXr71pHV9Y4rbH+AmeHZNv8WwhnyQpLM2v0tj+nv3Z73e5RVTZ49fB0MophKNInEI4isQphKNInEI4isQphKNInEI4iq+VUtqyF30C/It1zd1fsI6Xd3lWRDrPt6ELM9weCHg8I6HeqVjHx48/Sec0G9we2NngPUoCAX4rYwPcJrp77751PLjuYwEYXqiL9akBgFCY9+votOwZFbEEt19qe/zeez0fC6a1w19z3/6MNNu8Z0s3zJ/T4kSBxjpdfo83ScEzABjJ2+2v3RKfMzTIvxeGVk4hHEXiFMJRJE4hHEXiFMJRJE4hHMV3t7Y4ysvVV6v8MHoyad/hW1nih4nPneNtECIhvss7d9u+2wkAU8ftLQ0ebPOWBe0g34EcKPBOyLeu8+vIjPD6N8dP2lsrdGplOqdV5bWY6g17ewcA6LT4TnSkYf+dDid9ujV7vMbUyy9dpbFgkO8aT588Zh3P5VJ0zvwcr8+zu12lsUSY38ezM9M0liSXEonz58OE+OF8hlZOIRxF4hTCUSROIRxF4hTCUSROIRxF4hTCUXytlECQd3IuFvnW9sqS/WBzs85fr13nh5AH4vxQfKTHD2Y/uGPvNPygx+sORRP8vcanuLV04d3vorFChtceWrxutz7iXR+74RSv7bS0xq2g+wv2juMAEB2wf5/7PV4rKhriyQ9jRW4fhXwsjGIxax2vVMt0znCGd1mvVfn96Ef4dWTG+bpVITZXvRymc/Z27XP80MophKNInEI4isQphKNInEI4isQphKNInEI4ivF8MguEEI8OrZxCOIrEKYSjSJxCOIrEKYSjSJxCOIrEKYSj/DcMRJ1bAu63tAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The label of image 1 is : 6\n", "The data of image 2 is below:\n", "[[[ 91 93 133]\n", " [ 94 97 127]\n", " [ 75 86 127]\n", " ...\n", " [ 86 89 117]\n", " [ 84 86 113]\n", " [ 80 80 110]]\n", "\n", " [[ 96 104 130]\n", " [ 98 106 124]\n", " [ 83 99 124]\n", " ...\n", " [102 102 111]\n", " [ 99 101 110]\n", " [ 75 88 106]]\n", "\n", " [[ 76 92 126]\n", " [ 91 101 126]\n", " [ 89 104 132]\n", " ...\n", " [100 104 114]\n", " [102 106 115]\n", " [ 88 95 116]]\n", "\n", " ...\n", "\n", " [[127 113 132]\n", " [139 125 138]\n", " [147 131 142]\n", " ...\n", " [159 127 111]\n", " [133 127 137]\n", " [133 124 139]]\n", "\n", " [[132 120 135]\n", " [140 129 136]\n", " [142 130 138]\n", " ...\n", " [166 133 115]\n", " [139 130 136]\n", " [141 133 142]]\n", "\n", " [[118 115 143]\n", " [126 121 143]\n", " [115 111 134]\n", " ...\n", " [148 130 146]\n", " [139 130 156]\n", " [129 121 146]]]\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAOcAAAD3CAYAAADmIkO7AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAYZklEQVR4nO2dW4xk11WG/11V51SdunVVV19mumemZzzj8TU2ASVKMEpAmBAkIuAVQUikSBEvIPEAygtBICEhIaEgkAJI8BAEEhaEBKQQxRBHAQWFBAePHdvxTObqvkx3dVfX5dT11OZh2tIQ7X+LIcG9Pfk/yfJ4r9l1dp1z/t7utfZay1hrIYQIj9xxL0AI4UbiFCJQJE4hAkXiFCJQJE4hAkXiFCJQJM5jwhjzkjHmR497HSJcJM5jwlr7mLX2ueNexxsYY95ljPmCMWbfGLNrjHnGGHPyuNf1/YzEKd6gCeBPAZwFsAGgB+AvjnNB3+9InMeEMeaaMeZpY8xvHe1Sf2mM6RljLhljLhpjPmaMuW2MuWmMed9d8z5sjHn56O9+2xjz0e/43F83xmwZYzaNMR8xxlhjzIUjW9EY8/vGmBvGmB1jzCeNMQkAWGs/Z619xlrbtdamAP4IwFNv5j0R/xOJMww+AOBTuLN7PQ/g87jzbNYB/DaAP7nr794G8NMA6gA+DOAPjDE/CADGmPcD+DUATwO4AOC933Gd3wNwEcAPHNnXAfwmWdN7ALz03X0t8d1gdLb2eDDGXAPwEQA/AuApa+1PHI1/AMBfA1iw1mbGmBqALoCmtbbj+Jy/B/BFa+0njDF/DmDHWvuxI9sFAK8BeBDAFQB9AE9Ya68c2d8N4K+stee+4zOfAPAcgJ+x1n75e/zVxf+SwnEvQAAAdu768xDAnrU2u+u/AaAKoGOM+SkAH8edHTAHoAzg0tHfWQPwtbs+6+Zdf14++rtfN8a8MWYA5O9eyJGgPwfgVyXM40XifAthjCkC+FsAHwTwGWvt9GjnfENtWwBO3TXl9F1/3sMdoT9mrX2dfP4GgGcB/I619lPf4+WLe0S/c761iAEUAewCmB3tou+7y/43AD5sjHnEGFPGXb9PWmvnAP4Md35HXQEAY8y6MeYn3/gzgH8B8MfW2k++Kd9GeJE430JYa3sAfgV3RHgA4OcBfPYu++cA/CGALwK4DOArR6bx0b9/42j8340xXdzZJR86sn0EwAMAPm6M6b/xz//vNxI+5BC6jzHGPALgRQBFa+3suNcj7g3tnPcZxpifM8bExpgm7oRO/kHCfGsicd5/fBR3fie9AiAD8MvHuxzxf0X/WytEoGjnFCJQvHHOX/jg79JtNV+Y0HntgXs8V8jcBgCLRf5z4rBHPhDAIOW2pFx1jseRexwA6gk1AYb/X8Z8PKW2U/kDarvVdc8bdvfpnOWsR2025Q7W+olT1JZfWHGO77Vv0zm34wVqm5fXqG25yN+D3tT9XlmTd44DQOx5LpjzeYXcnNqGGbfN5+7rzTP+q/1swr/zpz/9CeMa184pRKBInEIEisQpRKBInEIEisQpRKBInEIEijeUElW4+zeX47YSkfxC4vQYAwCyMQ+J9LM9ajMxNWGh4Q6ZVBMe9gD3vCPq71DbxYiv8XyVu/qv3H7FOW4KfI31InfZ2xJ/LmceP01thbp7/Bv//A065/qsRW3dFo9JmYQ/tFW4QxhDMg4A8ym/H+W4TG2zOb9X6YjbGnm3LfMc6Nnu3fsJSu2cQgSKxClEoEicQgSKxClEoEicQgSKxClEoHhDKXt97v5dSLiu48jtKi8Ynsmy002pLW/5tRaqNWoziJzjw0N+rVlvi9oejjrU9mCNZ57k+jyEdLZaco63zvCwR1Lj33nS5Vkp1RM89JHPu9fxtjPLdM6pTofavtV5kdqm8TlqKzdWneM7Ex5+uTn0vDuR+x0AgLEnbpZlY2qbk4/M5/hzrhU9MT+Cdk4hAkXiFCJQJE4hAkXiFCJQJE4hAsXrrY2L3JvVanIvWI84z0bpiC/E41WLcvxAcZLweZ3DtnN8uvVtOue05bV78p7aNzt9/t2qEb/NJ066a/ecefRxOicq8+88n/E1zpIGtcWRux5Q2eNh72y5D+0DQO7yJrXVquSUPYC07D4wb22FzhnXuCe0l3lqCOX53rTW5PM6Q/ezzpHaQgBwsnrv+6B2TiECReIUIlAkTiECReIUIlAkTiECReIUIlC8oZTlBj9gHcW8pktu7D40nBl+uUKe2yKPy3tCrgUA6a6zuzoezbbpnLMNHqYYpfxaA9JWAQAmGT+YvXHefcC9XOOtDmbg7RhKFU8/iSY/xJ7LuedFhTN0TnmRt7UYH/wbtaXdLrWtNtztH3rg9wMl3vrh5ZGnVYPh71W1wsMz/Zn73c88e1064yEphnZOIQJF4hQiUCROIQJF4hQiUCROIQJF4hQiULyhFOupibLXH1Jb3rpdzTMyDgALJR7CsFMewmjf5mGR2tSdlXKCRwAQ57nL2+cOn3rK92887K6LAwCrD204xyd93lEaEQ8PzKY8bGNHnjYUU/f6o5i/IskJHmaJm/wm3/jPq9RWtO5nnSwX6RzjadVQSng2S87TLRuW3ysTu8MsOcP1MlUoRYj7B4lTiECROIUIFIlTiECROIUIFIlTiEDxhlJQ4O7rEnjooDx3t3GYeDJZpjMemrm9eYvaRru8WNejdbc7PCnxjIPDMV9He59ngzTWebjkXR/6RWoze+72D7ee/xqds/jwWWrL53mYpXvjVWpbWD3hHB8NeBjLeAp1VU+7Pw8A0q9fpraDG+4QUjHfoHPyrVPUVvOEUmyO702VPP/e8cjd8mLsyZAqxe52Fz60cwoRKBKnEIEicQoRKBKnEIEicQoRKBKnEIHiDaUslMvUNpnyrtf5qVvz8xE/mT8ccttkzPuQPLTIC1q9veW21Q0P6bzW4Z2hp55eGGvneJGp/3rpJrV9/tlvOMefrPGsiHdveLoug68x9WTOVAru6/U3edZPN+VZHWvv+WFqa77IP3Prm+5wTzTi78Di9JDa5mV3LxoA2DU8vLHX4yG1RtH9fmcF/i7O5v6opQvtnEIEisQpRKBInEIEisQpRKBInEIEit+FZLnnL8576q9kbq9gOuUe2Szlh8qTjHsn85Z/Zjxxr2PsqQ/jq/VSSvjtmh7sUds/fuZZanuud945fmqFe/4GmzvU1nzgEWpLzrnrFQFAuv2Cc/yFywM655lXXqa2D6FJbQXSvRoADofuZ10acC96M+P1lmZj3lIkzfPD+W3uHMactAdJEp4okvY99ZsI2jmFCBSJU4hAkTiFCBSJU4hAkTiFCBSJU4hA8YZSmnVef2VxgXca7vTcYZEbhwd0zmTKQyntvU2+jgo/gJ/m3aGUWo23Clhf5welb9ziB7annpYRP3ThJLX1r7rbJ5z0rLG+4Q6/AEDx/JPUVqktUZt9xX3Qu3+Z12i65Al/femz/0ptTyzxxIP6Yss5Ph3zUETW4wffo4SHYEyRt09IEn7/pzP3e5XleHhxnuPvKUM7pxCBInEKESgSpxCBInEKESgSpxCBInEKESjeUMrKEncnz+c8dDCcuLNZTtYadM5+kZ/of23I3dAzPg3lxN0tezzh9WHGntpImcedP5zw8MAHfuyd1HZm2R2eqVf4o4lXzlFbIeHhEhzw0Icpu8NmJ5r8Br+/wTNxLrZ4y4uk4KmBRFplxJ73o+RpQbGzv0ttnUXemXupyt/93sgdgjlMeUZTzhNmoXPueYYQ4k1B4hQiUCROIQJF4hQiUCROIQJF4hQiULyhlK12h9rKZe6+Xmq6iypF4OGGW5e4mz/vOdG/7U7qAADszt3u98H2Pp1TrPBuzestXixqeZVn6eSqvOx/c8n9CEzKMy1efo1n6Vz9Ku8CXpnwe/zYujuEsdbia/+ln30btfW7vOjW3tXr1La44L7HacrDWLOUZ55kxQa1jeEOtQHAzPJ9K59zr8Vb826uAl9C3DdInEIEisQpRKBInEIEisQpRKBInEIEijeUMp/yAkjNKi/+VSUn8AdjnqkQ13kII5fjHZkHGf/58uquO87y+ArP3HjwPC+e1d3lBb5aCzxrYueFV6itPyDhjYxnzvzdV69Q26tdnk3R4FERDB90P7OnnuT9RKJ8l9o6PW7L84gaGg13H5WtnQ6d0xt4rnXhNLUVEk/PFtJnBwCWim7ZdD3ZU8UC/zyGdk4hAkXiFCJQJE4hAkXiFCJQJE4hAkXiFCJQvKGUxgJ3Nbc7vB15So7nF0jxJgBYPfsAtV1/5RK12X6H2po1d4bDwxfX6JykyH9eRUu8lfqep3jWQZuHPpokm2We43GP5ZOnqO1FTzbIgBSmAoDRyO3qt5mn4NmI2w73OtQ2GfACX/22O2OoXOLPpdfmGVIJeNpSzrM3FWJ+r1jfk5zhmSfWk+XC0M4pRKBInEIEisQpRKBInEIEisQpRKB4vbU3dtrUNp1xj9vKovtQvM34nLhQprZatUFtnUPeEmBKDli/vsu/13w0oraFMq8htNPh81Zb3Ou9mpA6Rz1+UPqxDe6tneR4LSPj6URdmW45x/de36Fzys0GtY1ynnYM/Gw+pqn7wH8y56/qdJGv46DIPfNRju9NZV5eCO2e2yubFPl3HrGX0YN2TiECReIUIlAkTiECReIUIlAkTiECReIUIlC8oZT+gHfqTar8YHAtYgffed2hm9d5iX5fjZiZ5evYTt3u6/Mzfqi8ZvjnDTxlYAaZx1Ve4qGUatUd+jhs83pF+TlvJ/GOB9aprX2Nh5Bm2wfO8bjEExL4sXfgm1v8mb33oRVqy43crRX2dvjnxcu8TtBg+SFqiwq8f8Joxr8da2LeqvNwYDThYUSGdk4hAkXiFCJQJE4hAkXiFCJQJE4hAkXiFCJQvKGUaoWHPhqLfGpCMi26Q15jZa/H2w+UEu6irlieKbLbddc5+tJLvGv002+/QG2tNd7G4bDH6wR12zwbBA+7MxmWLvK2EJWL76K26spZatv6Ok8Huf2C+9nUWot0zmDMn2c64yGpy9s8LHIuccer4jp/zlmeX8t6wl/5hKeeDEh2DAAUyfXyMW/JwS0c7ZxCBIrEKUSgSJxCBIrEKUSgSJxCBIrEKUSgeEMpxRI3V2N+oj8zbs3vtN2ZDwBQX+TZFLWT7uJTAJBe4QWohjO3q3805xkC+Sp3eq+d52sc7GzydWzzImSvXHrNOb7+OM+maJR4CIDcegBA2dNeo7rsbjUxM/w539q6RW0PrvPQx6ttHvowJGy2lrizVQBgFPP3NMt5wiyee1XyhFnqpJDX7T7P4pp7QksM7ZxCBIrEKUSgSJxCBIrEKUSgSJxCBIrEKUSgeEMpdU9337TLOwZvpu5CUp0Bz2JoGp4+ME15t+aFEp9XIr1NhhkPKfiyB4rgYYUn3vEktbVv8JDDYM9drGva42Gnwfa3qK1/+UVqO7j6ErUNB+4Mns513sF8bnlH6ccff4zaDm/wvjLPT9wFzw6aPKOmudSiNluoU1t3wNeRFLk0otgdSslmXBOTiSc9hqCdU4hAkTiFCBSJU4hAkTiFCBSJU4hA8Xprszn3Th5M+CFfRG4v3ukab4Nw/Uv/RG2NAfd2turci9equL2yr+3xGkIH+7y+TWHGvZOnzvAOyhunT/DrbbsP7vc89XlKVe6BjBu8u3Ih4595cN19AH9g+XNurfNEgLVHLlJbmnS47bb74Hu6zNtClDxRhdKIr39G2oYAQNHwZ52O3PWF4ojLKc7xazG0cwoRKBKnEIEicQoRKBKnEIEicQoRKBKnEIHiDaV0x7y7b63Cj4jXyIHom19+ls6Ju9+mtmKJu7WNp96+se4WxIs13t5hPOfhhkKR15WJPHVs5hkv7X/itLvLc63DWzj0tnnrh4Hl6xiP+TrKdXcIprnMQ0SVUzxE5IkqwOQ97TVWNpzjQ89znnoO4CcVHlrqtHktqdNVXgOpD/cB95HnbPt8eu/7oHZOIQJF4hQiUCROIQJF4hQiUCROIQJF4hQiULyhlDjjNVFaRR5KufofzzvHd1/+Cp3zwJq7HQAAlPJ8mQW4wyUAMEzd4Yiyx89vDHe9Dw54W4jDiN+rcsKzccZTd42efIH/3Fxc4J/X87S8yE861GZL7uvVWst0TrnqrvcDAJ4EHszmPFNkSjI+cgX+nfMJv1cZCacBQFTgocK9CX+eBTLPeDJ4hlO1YxDivkHiFCJQJE4hAkXiFCJQJE4hAkXiFCJQvKGU9pVL1Dbe5gWL2tfcWRPlEg9TWOMpf5/3FGKKPF2eSYLJfsqzM9aWeBZGc/kktUVVjzt/yt3yo0N36GPiKUy1sHaW2oqLPCSVT/j9z5GCVpHn/k77PHNmOOFF1Oqe8Fd17m5PMc7zez8e8bYKacrvY+zper3b5utPyLxyiYd7ZkPe1oKhnVOIQJE4hQgUiVOIQJE4hQgUiVOIQJE4hQgUbyhlf2+P2mbci45p5naV+9qr1FJebKm54sl+mPIPzaYkeyDiLu9ylRefKpR5X5ZckVd3Oty+QW2j1B0GiBLefXvs6ZLc3OA9RVDkRasGt646xyfdbTrnYI93HN+6yTN4shLPdJlH7jBFu8sLr41GPFSVTfm9KhpP+MsTyupm7tBe6gn5DQe8Bw9DO6cQgSJxChEoEqcQgSJxChEoEqcQgeL11tbL3KtZ9dRtieMzzvF9Ty2g3e7r/PM6fWprlXgto1zefWh7MuaeuHLE68qMBu5D2QDQvd6mtv4m91yOSI2bqMLXYW9xz1+9y7/bmXf8OLXNim5P9MyTkLC/yw+Hb27zw+jG8mdda7jX36lyL/Q0x73Q8HTzHnm6h+c9ntdSyW2zlj+zWVHtGIS4b5A4hQgUiVOIQJE4hQgUiVOIQJE4hQgUbygl8RwQn894HR6Tc3/s8jp3h49r/HD7ZnuT2kZd7s4vGfeh54mnS3LF0wk555m3u8nXaHv8XnUH7gP//ZQfKp9MeQ2e7EX3AXYAuHmNh4IWl9whqVrEQyIHhzzE1U35GssxD2Ekkx3n+IblCQlt8LDHIQkRAcA44rY44uvPFdzXG4958kaS54kM9Dr3PEMI8aYgcQoRKBKnEIEicQoRKBKnEIEicQoRKN5QSqOxSG1RgbuN2Wl/m/FQRNXT6iC36s5yAYDhgGeD3Lj+knP8VJGvvejJRsgmvHx/vcXv1aGnzhFIjZu4wEMHkx6vmROTMBYA7F1x3w8AmO645+3n+c/v6Zy3r256um+Pxzw8E5P9onBwi87JDblteYE/l73Wo9Q2KK9SWwHu7z0e+bJcvFJzop1TiECROIUIFIlTiECROIUIFIlTiECROIUIFK9/t1ypUNt4xMMR87n7RH8p4e71VpO7vOeGhzBmdoXaJiQqkm27O28DwMEuz7SIcu4u1ABQXeBZNdOMh1J6Q9LugLS0AID6Mu9ePfd0ebaeglbzmft6gy7PqEk8GTyLdV54bWeLP8/FonvezFOoyxR4iC6X8mJiFcNDUteyJ6htXD3nHF+otegc63mHGdo5hQgUiVOIQJE4hQgUiVOIQJE4hQgUiVOIQPGGUkaeDsrVKnfnl2LWrZm73qvNBrXNPIWTMk/I4dzp887xcZNnHNxKeaGu3e0BtVV6fB2NIg8hpak7zDLmCR9YXeH3PtvnWTrdQ96OvJ+6n/XJFl97XOKvTxTz7J5T60vUVpy77/GI3CcAKJc8xbPyPKRTXnD30gEAA14o7ZsDd4ixUFuncyaed5ihnVOIQJE4hQgUiVOIQJE4hQgUiVOIQJE4hQgUbyhl7ingNPf0p0iq7gyNyZhnOHQOeZgi7znRXy5yN3q14naVLzSW6ZzhkNvGc09bcU9IZ2S4G3246g6L7Ny6TOfYNs/QWLKe7AfSOwbg/W2qDd7SPVfkoYix5SGM2hIPpezduOYcT4ee4moFbqvW+frznhDXepOHq/b77vu/OefhHpPjemFo5xQiUCROIQJF4hQiUCROIQJF4hQiULze2qjIPUymwL2TU+v2auY8pf2zGfd05WJ+YN7jNKY1i3LGs/Yp9/xFEff8ZaRuEgBMLO+gXF+/4ByvPvJOOqd75RK1RQVPa4IG9052B27PfLLq6Qwdc49s7Dm5P146RW2W1DLKJ7zT98BzAP/k6dPUZuaeOlgRf7GKsft6cca/c86TCEDn3PMMIcSbgsQpRKBInEIEisQpRKBInEIEisQpRKB4QyndKXe9e5o8Y5C6D7j7ar0Mh/xQfCnmLupSzA+jR+RAdN7wn0mjET/MHceen2WGu8r7qafFQ9vd4mFpmbeZsItPUtvl/hq1zc0WtZmS+z4eDvn32vYcwF/xHEafgx8qn43d1xvmeN2n2+0OtbUNf4eXKvx97A/5s+4vbBAL70be7/PWDwztnEIEisQpRKBInEIEisQpRKBInEIEisQpRKAYaz11/4UQx4Z2TiECReIUIlAkTiECReIUIlAkTiECReIUIlD+G3JDoBlkE8HyAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The label of image 2 is : 5\n" ] } ], "source": [ "from matplotlib import pyplot as plt\n", "\n", "count = 0\n", "for data in cifar10_dataset.create_dict_iterator():\n", "# In CIFAR-10 dataset, each dictionary of data has keys \"image\" and \"label\".\n", " image = data[\"image\"]\n", " print(f\"The data of image {count+1} is below:\")\n", " print(image)\n", " plt.figure(count)\n", " plt.imshow(image)\n", " plt.title(f\"image{count+1}\")\n", " plt.axis('off')\n", " plt.show()\n", " print(f\"\\nThe label of image {count+1} is :\", data[\"label\"])\n", " count += 1\n", " if count == 2:\n", " break\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 加载特定数据格式的数据集\n", "\n", "\n", "### MindSpore数据格式\n", "\n", "MindSpore天然支持读取MindSpore数据格式——`MindRecord`存储的数据集,在性能和特性上有更好的支持。 \n", "\n", "> 阅读[将数据集转换为MindSpore数据格式](https://www.mindspore.cn/tutorial/zh-CN/r0.7/use/data_preparation/converting_datasets.html),了解如何将数据集转换为MindSpore数据格式。\n", "\n", "可以通过`MindDataset`对象对数据集进行读取。详细方法如下所示:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. 将CIFAR-10数据集转换为`MindRecord`数据格式。此处使用的数据集为CIFAR-10 Python文件格式数据集(`cifar-10-batches-py`)。" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MSRStatus.SUCCESS" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from mindspore.mindrecord import Cifar10ToMR\n", "\n", "\n", "CIFAR10_DIR = \"./datasets/cifar-10/cifar-10-batches-py\"\n", "MINDRECORD_FILE = \"./datasets/mindrecord/cifar10.mindrecord\"\n", "cifar10_transformer = Cifar10ToMR(CIFAR10_DIR, MINDRECORD_FILE)\n", "cifar10_transformer.transform(['label'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. 使用`MindDataset`类创建数据集`data_set`,用于读取数据。其中`dataset_file`为指定MindRecord的文件或文件列表。" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "\n", "data_set = ds.MindDataset(dataset_file=\"./datasets/mindrecord/cifar10.mindrecord\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. 创建字典迭代器,通过迭代器读取数据记录。此处读取前5个数据的标签数据。" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n", "6\n", "0\n", "6\n", "9\n" ] } ], "source": [ "num_iter = 0\n", "for data in data_set.create_dict_iterator():\n", " print(data[\"label\"])\n", " num_iter += 1\n", " if num_iter == 5:\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 加载自定义数据集\n", "\n", "现实场景中,数据集的种类多种多样,对于自定义数据集或者目前不支持直接加载的数据集,有两种方法可以处理。\n", "一种方法是将数据集转成MindRecord格式(请参考[将数据集转换为MindSpore数据格式](https://www.mindspore.cn/tutorial/zh-CN/r0.7/use/data_preparation/converting_datasets.html)章节),另一种方法是通过`GeneratorDataset`对象加载,以下将展示如何使用`GeneratorDataset`。\n", "\n", "1. 定义一个可迭代的对象,用于生成数据集。以下展示了两种示例,一种是含有`yield`返回值的自定义函数,另一种是含有`__getitem__`的自定义类。两种示例都将产生一个含有从0到9数字的数据集。\n", " \n", "> 自定义的可迭代对象,每次返回`numpy array`的元组,作为一行数据。 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "    以下一段代码创建含有`yield`返回值的自定义函数`generator_func`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import numpy as np # Import numpy lib.\n", "\n", "\n", "def generator_func(num):\n", " for i in range(num):\n", " yield (np.array([i]),) # Notice, tuple of only one element needs following a comma at the end.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "    创建含有`__getitem__`的自定义类:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import numpy as np # Import numpy lib.\n", "\n", "\n", "class Generator():\n", "\n", " def __init__(self, num):\n", " self.num = num\n", "\n", " def __getitem__(self, item):\n", " return (np.array([item]),) # Notice, tuple of only one element needs following a comma at the end.\n", "\n", " def __len__(self):\n", " return self.num\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. 使用`GeneratorDataset`创建数据集,并通过给数据创建迭代器的方式,获取相应的数据。\n", "\n", " - 将`generator_func`传入`GeneratorDataset`创建数据集`dataset1`,并设定`column`名为“data” 。\n", " - 将定义的`Generator`对象传入`GeneratorDataset`创建数据集`dataset2`,并设定`column`名为“data” 。\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "    以下一段代码分别对`dataset1`和`dataset2`创建返回值为序列类型的迭代器,并打印输出数据。" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dataset1:\n", "[array([0], dtype=int32)]\n", "[array([1], dtype=int32)]\n", "[array([2], dtype=int32)]\n", "[array([3], dtype=int32)]\n", "[array([4], dtype=int32)]\n", "[array([5], dtype=int32)]\n", "[array([6], dtype=int32)]\n", "[array([7], dtype=int32)]\n", "[array([8], dtype=int32)]\n", "[array([9], dtype=int32)]\n", "dataset2:\n", "[array([0], dtype=int64)]\n", "[array([1], dtype=int64)]\n", "[array([2], dtype=int64)]\n", "[array([3], dtype=int64)]\n", "[array([4], dtype=int64)]\n", "[array([5], dtype=int64)]\n", "[array([6], dtype=int64)]\n", "[array([7], dtype=int64)]\n", "[array([8], dtype=int64)]\n", "[array([9], dtype=int64)]\n" ] } ], "source": [ "dataset1 = ds.GeneratorDataset(source=generator_func(10), column_names=[\"data\"], shuffle=False)\n", "dataset2 = ds.GeneratorDataset(source=Generator(10), column_names=[\"data\"], shuffle=False)\n", "\n", "print(\"dataset1:\") \n", "for data in dataset1.create_tuple_iterator(): # each data is a sequence\n", " print(data)\n", "\n", "print(\"dataset2:\")\n", "for data in dataset2.create_tuple_iterator(): # each data is a sequence\n", " print(data)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "    以下一段代码分别对`dataset1`和`dataset2`创建迭代器,并打印输出数据果。" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dataset1:\n", "[0]\n", "[1]\n", "[2]\n", "[3]\n", "[4]\n", "[5]\n", "[6]\n", "[7]\n", "[8]\n", "[9]\n", "dataset2:\n", "[0]\n", "[1]\n", "[2]\n", "[3]\n", "[4]\n", "[5]\n", "[6]\n", "[7]\n", "[8]\n", "[9]\n" ] } ], "source": [ "dataset1 = ds.GeneratorDataset(source=generator_func(10), column_names=[\"data\"], shuffle=False)\n", "dataset2 = ds.GeneratorDataset(source=Generator(10), column_names=[\"data\"], shuffle=False)\n", "\n", "\n", "print(\"dataset1:\")\n", "for data in dataset1.create_dict_iterator(): # each data is a dictionary\n", " print(data[\"data\"])\n", "\n", "print(\"dataset2:\")\n", "for data in dataset2.create_dict_iterator(): # each data is a dictionary\n", " print(data[\"data\"])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 总结\n", "\n", "以上便完成了MindSpore加载数据集的体验,我们通过本次体验全面了解了MindSpore加载数据集的几种方式和支持的数据集类型、如何创建自定义数据集,以及输出展示加载后的数据集结果。" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 4 }