To set the dataset format and parameters, create a schema configuration file in JSON format; see the [tfrecord](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#tfrecord) format for reference.

For the general task, the schema file contains the columns ["input_ids", "input_mask", "segment_ids"].

For the task distill and eval phases, the schema file contains the columns ["input_ids", "input_mask", "segment_ids", "label_ids"].

`numRows` is the only option that the user can set freely; all other values must be set according to the dataset.

For example, when the dataset is cn-wiki-128, the schema file for the general distill phase is as follows:

```
{
    "datasetType": "TF",
    "numRows": 7680,
    "columns": {
        "input_ids": {
            "type": "int64",
            "rank": 1,
            "shape": [256]
        },
        "input_mask": {
            "type": "int64",
            "rank": 1,
            "shape": [256]
        },
        "segment_ids": {
            "type": "int64",
            "rank": 1,
            "shape": [256]
        }
    }
}
```
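
The column requirements described above can be checked before a run starts. The following is a minimal sketch, not part of the repository's scripts: the phase labels (`general_distill`, `task_distill`, `eval`) and the helper names `missing_columns` and `load_schema` are hypothetical, chosen only for illustration. The column lists are the ones stated in this section.

```python
import json

# Required columns per phase, as listed in this section.
# The phase keys themselves are hypothetical labels for this sketch.
REQUIRED_COLUMNS = {
    "general_distill": ["input_ids", "input_mask", "segment_ids"],
    "task_distill": ["input_ids", "input_mask", "segment_ids", "label_ids"],
    "eval": ["input_ids", "input_mask", "segment_ids", "label_ids"],
}


def load_schema(path):
    """Read a schema.json file and return it as a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def missing_columns(schema, phase):
    """Return the required column names that the schema does not define."""
    columns = schema.get("columns", {})
    return [name for name in REQUIRED_COLUMNS[phase] if name not in columns]


# Example schema mirroring the general distill example in this section.
example_schema = {
    "datasetType": "TF",
    "numRows": 7680,
    "columns": {
        "input_ids": {"type": "int64", "rank": 1, "shape": [256]},
        "input_mask": {"type": "int64", "rank": 1, "shape": [256]},
        "segment_ids": {"type": "int64", "rank": 1, "shape": [256]},
    },
}

general_missing = missing_columns(example_schema, "general_distill")
task_missing = missing_columns(example_schema, "task_distill")
print("general distill, missing columns:", general_missing)
print("task distill, missing columns:", task_missing)
```

A schema written for the general distill phase lacks `label_ids`, so reusing it unchanged for the task distill or eval phase would be reported as incomplete by this check.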
# [Script Description](#contents)
## [Script and Sample Code](#contents)
...
...
@@ -117,7 +149,7 @@ options:
--save_checkpoint_step steps for saving checkpoint files: N, default is 1000
--load_teacher_ckpt_path path to load teacher checkpoint files: PATH, default is ""
--data_dir path to dataset directory: PATH, default is ""
--schema_dir path to schema.json file: PATH, default is ""