diff --git a/model_zoo/official/nlp/bert/README.md b/model_zoo/official/nlp/bert/README.md
index bf37e269ab7f0714183fd9095c1c6ce5c3f368c3..3d1a6fbdc74c3b934bdda6c2f447aca5c1a41a73 100644
--- a/model_zoo/official/nlp/bert/README.md
+++ b/model_zoo/official/nlp/bert/README.md
@@ -73,6 +73,60 @@ For distributed training, a hccl configuration file with JSON format needs to be
 Please follow the instructions in the link below:
 https:gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
 
+To set the dataset format and parameters, a schema configuration file in JSON format needs to be created; please refer to the [tfrecord](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#tfrecord) format.
+```
+For pretraining, the schema file contains ["input_ids", "input_mask", "segment_ids", "next_sentence_labels", "masked_lm_positions", "masked_lm_ids", "masked_lm_weights"].
+
+For the NER or classification task, the schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"].
+
+For the SQuAD task, the training schema file contains ["start_positions", "end_positions", "input_ids", "input_mask", "segment_ids"], and the evaluation schema file contains ["input_ids", "input_mask", "segment_ids"].
+
+`numRows` is the only option that can be set by the user; the other values must be set according to the dataset.
+
+For example, if the dataset is cn-wiki-128, the schema file for pretraining is as follows:
+{
+    "datasetType": "TF",
+    "numRows": 7680,
+    "columns": {
+        "input_ids": {
+            "type": "int64",
+            "rank": 1,
+            "shape": [256]
+        },
+        "input_mask": {
+            "type": "int64",
+            "rank": 1,
+            "shape": [256]
+        },
+        "segment_ids": {
+            "type": "int64",
+            "rank": 1,
+            "shape": [256]
+        },
+        "next_sentence_labels": {
+            "type": "int64",
+            "rank": 1,
+            "shape": [1]
+        },
+        "masked_lm_positions": {
+            "type": "int64",
+            "rank": 1,
+            "shape": [32]
+        },
+        "masked_lm_ids": {
+            "type": "int64",
+            "rank": 1,
+            "shape": [32]
+        },
+        "masked_lm_weights": {
+            "type": "float32",
+            "rank": 1,
+            "shape": [32]
+        }
+    }
+}
+```
+
 # [Script Description](#contents)
 
 ## [Script and Sample Code](#contents)
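As a quick sanity check, the schema above can be passed directly to MindSpore's TFRecord loader, the same `de.TFRecordDataset` call that `src/dataset.py` uses later in this patch. A minimal sketch, assuming MindSpore is installed and using placeholder file paths:

```python
# Minimal sketch (not part of the patch): load one pretraining TFRecord shard
# with the schema file shown above. Both paths are placeholders.
import mindspore.dataset as de

data_file = "/path/to/cn-wiki-128/part-000.tfrecord"  # placeholder path
schema_file = "/path/to/pretrain_schema.json"         # the JSON shown above

ds = de.TFRecordDataset([data_file], schema_file,
                        columns_list=["input_ids", "input_mask", "segment_ids",
                                      "next_sentence_labels", "masked_lm_positions",
                                      "masked_lm_ids", "masked_lm_weights"])
# For a full pass over a single shard, this should match the schema's numRows.
print("rows:", ds.get_dataset_size())
```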
@@ -87,11 +141,12 @@ https:gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
   ├─hyper_parameter_config.ini          # hyper paramter for distributed pretraining
   ├─run_distribute_pretrain.py          # script for distributed pretraining
   ├─README.md
-  ├─run_classifier.sh                   # shell script for standalone classifier task
-  ├─run_ner.sh                          # shell script for standalone NER task
-  ├─run_squad.sh                        # shell script for standalone SQUAD task
+  ├─run_classifier.sh                   # shell script for standalone classifier task on Ascend or GPU
+  ├─run_ner.sh                          # shell script for standalone NER task on Ascend or GPU
+  ├─run_squad.sh                        # shell script for standalone SQuAD task on Ascend or GPU
   ├─run_standalone_pretrain_ascend.sh   # shell script for standalone pretrain on ascend
   ├─run_distributed_pretrain_ascend.sh  # shell script for distributed pretrain on ascend
+  ├─run_distributed_pretrain_gpu.sh     # shell script for distributed pretrain on gpu
   └─run_standaloned_pretrain_gpu.sh     # shell script for distributed pretrain on gpu
  ├─src
    ├─__init__.py
@@ -122,7 +177,7 @@
 usage: run_pretrain.py  [--distribute DISTRIBUTE] [--epoch_size N] [----device_num N] [--device_id N]
                         [--enable_save_ckpt ENABLE_SAVE_CKPT] [--device_target DEVICE_TARGET]
                         [--enable_lossscale ENABLE_LOSSSCALE] [--do_shuffle DO_SHUFFLE]
-                        [--enable_data_sink ENABLE_DATA_SINK] [--data_sink_steps N]
+                        [--enable_data_sink ENABLE_DATA_SINK] [--data_sink_steps N] [--save_checkpoint_path SAVE_CHECKPOINT_PATH]
                         [--load_checkpoint_path LOAD_CHECKPOINT_PATH]
                         [--save_checkpoint_steps N] [--save_checkpoint_num N]
@@ -361,55 +416,59 @@ The result will be as follows:
 ## [Model Description](#contents)
 ## [Performance](#contents)
 ### Pretraining Performance
-| Parameters | BERT | BERT |
+| Parameters | Ascend | GPU |
 | -------------------------- | ---------------------------------------------------------- | ------------------------- |
-| Model Version | base | base |
+| Model Version | BERT_base | BERT_base |
 | Resource | Ascend 910, cpu:2.60GHz 56cores, memory:314G | NV SMX2 V100-32G |
 | uploaded Date | 08/22/2020 | 05/06/2020 |
 | MindSpore Version | 0.6.0 | 0.3.0 |
-| Dataset | cn-wiki-128 | ImageNet |
+| Dataset | cn-wiki-128(4000w) | ImageNet |
 | Training Parameters | src/config.py | src/config.py |
 | Optimizer | Lamb | Momentum |
 | Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |
 | outputs | probability | |
-| Loss | | 1.913 |
-| Speed | 116.5 ms/step | 1.913 |
-| Total time | | |
+| Epoch | 40 | |
+| Batch_size | 256*8 | 130(8P) |
+| Loss | 1.7 | 1.913 |
+| Speed | 340ms/step | 1.913 |
+| Total time | 73h | |
 | Params (M) | 110M | |
 | Checkpoint for Fine tuning | 1.2G(.ckpt file) | |
 
-| Parameters | BERT | BERT |
+| Parameters | Ascend | GPU |
 | -------------------------- | ---------------------------------------------------------- | ------------------------- |
-| Model Version | NEZHA | NEZHA |
+| Model Version | BERT_NEZHA | BERT_NEZHA |
 | Resource | Ascend 910, cpu:2.60GHz 56cores, memory:314G | NV SMX2 V100-32G |
 | uploaded Date | 08/20/2020 | 05/06/2020 |
 | MindSpore Version | 0.6.0 | 0.3.0 |
-| Dataset | cn-wiki-128 | ImageNet |
+| Dataset | cn-wiki-128(4000w) | ImageNet |
 | Training Parameters | src/config.py | src/config.py |
 | Optimizer | Lamb | Momentum |
 | Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |
 | outputs | probability | |
-| Loss | | 1.913 |
-| Speed | | 1.913 |
-| Total time | | |
+| Epoch | 40 | |
+| Batch_size | 96*8 | 130(8P) |
+| Loss | 1.7 | 1.913 |
+| Speed | 360ms/step | 1.913 |
+| Total time | 200h | |
 | Params (M) | 340M | |
 | Checkpoint for Fine tuning | 3.2G(.ckpt file) | |
 
 #### Inference Performance
 
-| Parameters | | | |
-| -------------------------- | ----------------------------- | ------------------------- | -------------------- |
-| Model Version | V1 | | |
-| Resource | Ascend 910 | NV SMX2 V100-32G | Ascend 310 |
-| uploaded Date | 08/22/2020 | 05/22/2020 | |
-| MindSpore Version | 0.6.0 | 0.2.0 | 0.2.0 |
-| Dataset | cola, 1.2W | ImageNet, 1.2W | ImageNet, 1.2W |
-| batch_size | 32(1P) | 130(8P) | |
-| Accuracy | 0.588986 | ACC1[72.07%] ACC5[90.90%] | |
-| Speed | 59.25ms/step | | |
-| Total time | | | |
-| Model for inference | 1.2G(.ckpt file) | | |
+| Parameters | Ascend | GPU |
+| -------------------------- | ----------------------------- | ------------------------- |
+| Model Version | | |
+| Resource | Ascend 910 | NV SMX2 V100-32G |
+| uploaded Date | 08/22/2020 | 05/22/2020 |
+| MindSpore Version | 0.6.0 | 0.2.0 |
+| Dataset | cola, 1.2W | ImageNet, 1.2W |
+| batch_size | 32(1P) | 130(8P) |
+| Accuracy | 0.588986 | ACC1[72.07%] ACC5[90.90%] |
+| Speed | 59.25ms/step | |
+| Total time | 15min | |
+| Model for inference | 1.2G(.ckpt file) | |
 
 # [Description of Random Situation](#contents)
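For orientation, the options above translate into a launch command along these lines. This is an illustrative sketch only: every value is a placeholder, and the `--data_dir` and `--schema_dir` arguments are assumed to point at the TFRecord shards and the schema JSON described earlier in this README.

```python
# Illustrative single-device launch of run_pretrain.py; all values are
# placeholders, including the newly documented --save_checkpoint_path flag.
import subprocess

cmd = [
    "python", "run_pretrain.py",
    "--distribute", "false",
    "--device_target", "Ascend",
    "--device_id", "0",
    "--epoch_size", "40",
    "--enable_save_ckpt", "true",
    "--enable_data_sink", "true",
    "--data_sink_steps", "100",
    "--save_checkpoint_path", "./checkpoint",  # flag documented by this change
    "--save_checkpoint_steps", "10000",
    "--save_checkpoint_num", "1",
    "--data_dir", "/path/to/cn-wiki-128",      # assumed dataset location
    "--schema_dir", "/path/to/schema.json",    # schema file from the section above
]
subprocess.run(cmd, check=True)
```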
diff --git a/model_zoo/official/nlp/bert/scripts/ascend_distributed_launcher/run_distribute_pretrain.py b/model_zoo/official/nlp/bert/scripts/ascend_distributed_launcher/run_distribute_pretrain.py
index 6489f7c1e320e7681de7ba09bc949e483600c71c..794aaf7234b3ee90332fb39568eb198808b5e94f 100644
--- a/model_zoo/official/nlp/bert/scripts/ascend_distributed_launcher/run_distribute_pretrain.py
+++ b/model_zoo/official/nlp/bert/scripts/ascend_distributed_launcher/run_distribute_pretrain.py
@@ -122,7 +122,7 @@ def distribute_pretrain():
     print("core_nums:", cmdopt)
     print("epoch_size:", str(cfg['epoch_size']))
     print("data_dir:", data_dir)
-    print("log_file_dir: " + cur_dir + "/LOG" + str(device_id) + "/log.txt")
+    print("log_file_dir: " + cur_dir + "/LOG" + str(device_id) + "/pretraining_log.txt")
 
     os.chdir(cur_dir + "/LOG" + str(device_id))
     cmd = 'taskset -c ' + cmdopt + ' nohup python ' + run_script + " "
diff --git a/model_zoo/official/nlp/bert/src/dataset.py b/model_zoo/official/nlp/bert/src/dataset.py
index 8193ef83fa6370c8dc4c60c701b71a7ba0045f2d..cf4ee0741842dc263a65c4874f380b9b63f1b9ad 100644
--- a/model_zoo/official/nlp/bert/src/dataset.py
+++ b/model_zoo/official/nlp/bert/src/dataset.py
@@ -112,9 +112,6 @@ def create_squad_dataset(batch_size=1, repeat_count=1, data_file_path=None, sche
     else:
         ds = de.TFRecordDataset([data_file_path], schema_file_path if schema_file_path != "" else None,
                                 columns_list=["input_ids", "input_mask", "segment_ids", "unique_ids"])
-        ds = ds.map(input_columns="input_ids", operations=type_cast_op)
-        ds = ds.map(input_columns="input_mask", operations=type_cast_op)
-        ds = ds.map(input_columns="segment_ids", operations=type_cast_op)
     ds = ds.map(input_columns="segment_ids", operations=type_cast_op)
     ds = ds.map(input_columns="input_mask", operations=type_cast_op)
     ds = ds.map(input_columns="input_ids", operations=type_cast_op)
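The hunk above removes three duplicated `TypeCast` maps in the SQuAD evaluation branch, which cast each column twice. For reference, here is a minimal sketch of the de-duplicated evaluation pipeline, assuming (as in `src/dataset.py`) that `type_cast_op` is an int32 `TypeCast`:

```python
# Sketch of the de-duplicated SQuAD evaluation pipeline; assumes type_cast_op
# is the int32 TypeCast defined earlier in src/dataset.py.
import mindspore.common.dtype as mstype
import mindspore.dataset as de
import mindspore.dataset.transforms.c_transforms as C

type_cast_op = C.TypeCast(mstype.int32)

def create_squad_eval_dataset(data_file_path, schema_file_path="", batch_size=1):
    ds = de.TFRecordDataset([data_file_path],
                            schema_file_path if schema_file_path != "" else None,
                            columns_list=["input_ids", "input_mask",
                                          "segment_ids", "unique_ids"])
    # One cast per column is enough; the removed lines repeated these maps.
    for column in ("segment_ids", "input_mask", "input_ids"):
        ds = ds.map(input_columns=column, operations=type_cast_op)
    return ds.batch(batch_size, drop_remainder=True)
```

A single cast per column is sufficient; the duplicated maps did redundant work without changing the output types.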
diff --git a/model_zoo/official/nlp/tinybert/README.md b/model_zoo/official/nlp/tinybert/README.md
index 6b202716fdf8f75e21a27af8d6d9d15414ce2470..0620c354db94b4ef544029c4b7edca9b52ed4a68 100644
--- a/model_zoo/official/nlp/tinybert/README.md
+++ b/model_zoo/official/nlp/tinybert/README.md
@@ -65,6 +65,38 @@ For distributed training on Ascend, a hccl configuration file with JSON format n
 Please follow the instructions in the link below:
 https:gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
 
+To set the dataset format and parameters, a schema configuration file in JSON format needs to be created; please refer to the [tfrecord](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#tfrecord) format.
+```
+For the general distill task, the schema file contains ["input_ids", "input_mask", "segment_ids"].
+
+For the task distill and eval phases, the schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"].
+
+`numRows` is the only option that can be set by the user; the other values must be set according to the dataset.
+
+For example, if the dataset is cn-wiki-128, the schema file for the general distill phase is as follows:
+{
+    "datasetType": "TF",
+    "numRows": 7680,
+    "columns": {
+        "input_ids": {
+            "type": "int64",
+            "rank": 1,
+            "shape": [256]
+        },
+        "input_mask": {
+            "type": "int64",
+            "rank": 1,
+            "shape": [256]
+        },
+        "segment_ids": {
+            "type": "int64",
+            "rank": 1,
+            "shape": [256]
+        }
+    }
+}
+```
+
 # [Script Description](#contents)
 
 ## [Script and Sample Code](#contents)
@@ -117,7 +149,7 @@
     --save_checkpoint_step      steps for saving checkpoint files: N, default is 1000
     --load_teacher_ckpt_path    path to load teacher checkpoint files: PATH, default is ""
     --data_dir                  path to dataset directory: PATH, default is ""
-    --schema_dir                path to schema.json file, PATH, default is "" 
+    --schema_dir                path to schema.json file, PATH, default is ""
 ```
 
 ### Task Distill
@@ -132,7 +164,7 @@ usage: run_general_task.py [--device_target DEVICE_TARGET] [--do_train DO_TRAIN
                            [--load_td1_ckpt_path LOAD_TD1_CKPT_PATH]
                            [--train_data_dir TRAIN_DATA_DIR]
                            [--eval_data_dir EVAL_DATA_DIR]
-                           [--task_name TASK_NAME] [--schema_dir SCHEMA_DIR] 
+                           [--task_name TASK_NAME] [--schema_dir SCHEMA_DIR]
 options:
     --device_target          device where the code will be implemented: "Ascend" | "GPU", default is "Ascend"
@@ -302,9 +334,9 @@ The best acc is 0.891176
 ## [Model Description](#contents)
 ## [Performance](#contents)
 ### training Performance
-| Parameters | TinyBERT | TinyBERT |
+| Parameters | Ascend | GPU |
 | -------------------------- | ---------------------------------------------------------- | ------------------------- |
-| Model Version | | |
+| Model Version | TinyBERT | TinyBERT |
 | Resource | Ascend 910, cpu:2.60GHz 56cores, memory:314G | NV SMX2 V100-32G, cpu:2.10GHz 64cores, memory:251G |
 | uploaded Date | 08/20/2020 | 08/24/2020 |
 | MindSpore Version | 0.6.0 | 0.7.0 |
@@ -321,7 +353,7 @@
 #### Inference Performance
 
-| Parameters | | |
+| Parameters | Ascend | GPU |
 | -------------------------- | ----------------------------- | ------------------------- |
 | Model Version | | |
 | Resource | Ascend 910 | NV SMX2 V100-32G |
@@ -344,4 +376,4 @@ In run_general_distill.py, we set the random seed to make sure distribute traini
 
 # [ModelZoo Homepage](#contents)
 
-Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
\ No newline at end of file
+Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
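Since `numRows` is the only user-settable field, the schema file above can also be generated rather than written by hand. A small sketch; the output file name and the row count are placeholders to adjust to the actual dataset:

```python
# Sketch: emit the general distill schema shown above. numRows and the output
# file name are placeholders.
import json

SEQ_LEN = 256  # must match the sequence length the TFRecords were built with

schema = {
    "datasetType": "TF",
    "numRows": 7680,  # the only user-settable field; set it to your row count
    "columns": {
        name: {"type": "int64", "rank": 1, "shape": [SEQ_LEN]}
        for name in ("input_ids", "input_mask", "segment_ids")
    },
}

with open("general_distill_schema.json", "w") as f:
    json.dump(schema, f, indent=4)
```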