# Guideline to Convert enwiki Training Data to MindRecord for BERT Pre-Training
<!-- TOC -->
- [What does the example do](#what-does-the-example-do)
- [How to use the example to process enwiki](#how-to-use-the-example-to-process-enwiki)
    - [Download enwiki training data](#download-enwiki-training-data)
    - [Process the enwiki](#process-the-enwiki)
    - [Generate MindRecord](#generate-mindrecord)
    - [Create MindDataset By MindRecord](#create-minddataset-by-mindrecord)
<!-- /TOC -->
## What does the example do
This example converts [enwiki](https://dumps.wikimedia.org/enwiki) training data into MindRecord files, which are then used to train the BERT network.
1. run.sh: entry script that generates the MindRecord files.
2. run_read.py: entry script that creates a MindDataset from the MindRecord files.
    - create_dataset.py: uses MindDataset to read the MindRecord files and build the dataset.
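
The reading step can be pictured with a minimal sketch. This is not the content of `create_dataset.py`; it is an illustration under stated assumptions. The helper names `find_mindrecord_files` and `read_first_sample` are hypothetical, and the sketch assumes that the generated data files contain `.mindrecord` in their names and sit next to `.db` index files, which are skipped. The only MindSpore call used is `mindspore.dataset.MindDataset`, imported lazily so the file-discovery helper works without MindSpore installed.

```python
from pathlib import Path


def find_mindrecord_files(directory):
    """Return MindRecord data files in `directory`, skipping `.db` index files.

    Hypothetical helper: the naming pattern is an assumption, not taken
    from the example scripts.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        str(p)
        for p in root.iterdir()
        if p.is_file() and ".mindrecord" in p.name and not p.name.endswith(".db")
    )


def read_first_sample(files, columns=None):
    """Open the files with MindDataset and return the first sample as a dict."""
    # Imported here so the helper above stays usable without MindSpore.
    import mindspore.dataset as ds

    # columns=None reads every column stored in the MindRecord files.
    dataset = ds.MindDataset(files, columns_list=columns, shuffle=False)
    for sample in dataset.create_dict_iterator():
        return sample
    return None
```

A call such as `read_first_sample(find_mindrecord_files("output"))` would then return one record, where `output` is a placeholder for whichever directory the generation step wrote to.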
## How to use the example to process enwiki
The workflow has four steps: download the enwiki data, process it, convert it to MindRecord, and read the MindRecord files with MindDataset.