fix: update mindrecord example docs

bfa0f98c · jonyguo · c984c48f · bfa0f98c

Showing 1 changed file with 48 additions and 9 deletions

example/convert_to_mindrecord/README.md +48 -9

--- a/example/convert_to_mindrecord/README.md
+++ b/example/convert_to_mindrecord/README.md
-# MindRecord generating guidelines
+# Guidelines for Efficiently Generating MindRecord

 <!-- TOC -->

- [MindRecord generating guidelines](#mindrecord-generating-guidelines)
+- [What does the example do](#what-does-the-example-do)
+- [Example test for ImageNet](#example-test-for-imagenet)
+- [How to use the example for other dataset](#how-to-use-the-example-for-other-dataset)
    - [Create work space](#create-work-space)
    - [Implement data generator](#implement-data-generator)
    - [Run data generator](#run-data-generator)

+
 <!-- /TOC -->

-## Create work space
+## What does the example do
+
+This example provides an efficient way to generate MindRecord. Users only need to define the parallel granularity for reading the training data and the function that reads the data for a single task. The example then converts the user's training data into MindRecord in parallel.
+
+1.  run_template.sh: the entry script. Users need to modify its parameters according to their own training data.
+2.  writer.py: the main script, called by run_template.sh. It reads the user's training data in parallel and generates MindRecord.
+3.  template/mr_api.py: the template in which users define the parallel granularity for reading their training data and the reading function for a single task.
+
+## Example test for ImageNet
+
+1. Download and prepare the ImageNet dataset as required.
+
+    > [ImageNet dataset download address](http://image-net.org/download)
+
+    Store the downloaded ImageNet dataset in a folder. The folder contains all images and a mapping file that records labels of the images.
+
+    In the mapping file, there are three columns, which are separated by spaces. They indicate image classes, label IDs, and label names. The following is an example of the mapping file:
+    ```
+    n02119789 1 pen
+    n02100735 2 notebook
+    n02110185 3 mouse
+    n02096294 4 orange
+    ```
+2. Edit run_imagenet.sh and modify the parameters.
+3. Run the bash script  
+    ```bash  
+    bash run_imagenet.sh
+    ```  
+4. Performance result
+
+    |  Training Data |  General API | Current Example |  Env  |
+    | ---- | ---- | ---- | ---- |
+    |ImageNet(140G)|  2h40m |  50m  |  CPU: Intel Xeon Gold 6130 x 64, Memory: 256G, Storage: HDD |
+
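For reference, the three-column mapping file described in step 1 could be parsed along the following lines. This is only a minimal sketch: the helper names `parse_mapping_line` and `load_mapping` are hypothetical and are not part of the example scripts, and it assumes every non-empty line has exactly the three space-separated columns shown above.

```python
def parse_mapping_line(line):
    """Split one mapping-file line into (image_class, label_id, label_name).

    Hypothetical helper: assumes three whitespace-separated columns.
    """
    image_class, label_id, label_name = line.split(maxsplit=2)
    return image_class, int(label_id), label_name.strip()


def load_mapping(lines):
    """Build {image_class: (label_id, label_name)} from an iterable of lines."""
    mapping = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        image_class, label_id, label_name = parse_mapping_line(line)
        mapping[image_class] = (label_id, label_name)
    return mapping


if __name__ == "__main__":
    sample = ["n02119789 1 pen\n", "n02110185 3 mouse\n"]
    print(load_mapping(sample))
```

In practice the lines would come from opening the mapping file, for example `load_mapping(open(path))`, where `path` is wherever the mapping file was stored.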
+## How to use the example for other dataset
+### Create work space

 Assume the dataset name is 'xyz'
 * Create work space from template
@@ -18,7 +56,7 @@ Assume the dataset name is 'xyz'
    cp -r template xyz
    ```

-## Implement data generator 
+### Implement data generator

 Edit dictionary data generator  
 * Edit file 
@@ -27,20 +65,21 @@ Edit dictionary data generator
    vi xyz/mr_api.py
    ```

- Two API, 'mindrecord_task_number' and 'mindrecord_dict_data', must be implemented
+Two APIs, 'mindrecord_task_number' and 'mindrecord_dict_data', must be implemented:
 - 'mindrecord_task_number()' returns the number of tasks. Return 1 if data rows are generated serially. Return N if the generator can be split into N tasks that run in parallel.
 - 'mindrecord_dict_data(task_id)' yields dictionary data row by row. 'task_id' ranges from 0 to N-1, where N is the return value of mindrecord_task_number().

-
 Tips for parallel runs
- For imagenet, one directory can be a task.
+- For ImageNet, one directory can be a task.
 - For TFRecord with multiple files, each file can be a task.
 - For TFRecord with only one file, the data can still be split into N tasks. Task_id=K means that a data row is picked only if (count % N == K).
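The modulo rule for a single file can be sketched as follows. The helper name `rows_for_task` is hypothetical, and a plain Python iterable stands in for the records of the file. Every task scans the whole input but keeps only the rows whose running count matches its task_id.

```python
def rows_for_task(rows, task_id, num_tasks):
    """Yield only the rows assigned to task_id, using count % num_tasks."""
    for count, row in enumerate(rows):
        if count % num_tasks == task_id:
            yield row


if __name__ == "__main__":
    records = list(range(10))  # stand-in for rows read from a single file
    for task_id in range(3):
        print(task_id, list(rows_for_task(records, task_id, 3)))
```

With this rule each row is picked by exactly one task, so the N tasks together cover the file without duplicates. The trade-off is that every task still has to read through the entire file.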

+### Run data generator

-## Run data generator 
 * Run the Python script
    ```shell
    cd ${your_mindspore_home}/example/convert_to_mindrecord
-    python writer.py --mindrecord_script imagenet [...]
+    python writer.py --mindrecord_script xyz [...]
    ```
+    > You can put this command in a script such as **run_xyz.sh** for easy execution.
+