Commit c0ece9cc authored by Chen Chen, committed by A. Unique TensorFlower

Move data pre-processing related files (classifier_data_lib.py, create_finetuning_data.py, create_pretraining_data.py, squad_lib.py, squad_lib_sp.py) to data folder.

PiperOrigin-RevId: 296254023
Parent 73d8226d
@@ -114,7 +114,7 @@ officially supported by Google Cloud TPU team yet until TF 2.1 released.
 ### Pre-training

 There is no change to generate pre-training data. Please use the script
-[`create_pretraining_data.py`](create_pretraining_data.py)
+[`../data/create_pretraining_data.py`](../data/create_pretraining_data.py)
 which is essentially branched from [BERT research repo](https://github.com/google-research/bert)
 to get processed pre-training data and it adapts to TF2 symbols and python3
 compatibility.
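For reference, a minimal invocation of the relocated script might look like the following sketch. The flag names follow the upstream BERT `create_pretraining_data.py` that this script is branched from; the input and output paths are placeholders, not values from this commit:

```shell
# Generate masked-LM pre-training examples from plain text (paths are placeholders).
python ../data/create_pretraining_data.py \
  --input_file=./sample_text.txt \
  --output_file=${OUTPUT_DIR}/tf_examples.tfrecord \
  --vocab_file=${BERT_BASE_DIR}/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --dupe_factor=5
```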
@@ -123,10 +123,10 @@ compatibility.
 ### Fine-tuning

 To prepare the fine-tuning data for final model training, use the
-[`create_finetuning_data.py`](./create_finetuning_data.py) script. Resulting
-datasets in `tf_record` format and training meta data should be later passed to
-training or evaluation scripts. The task-specific arguments are described in
-following sections:
+[`../data/create_finetuning_data.py`](../data/create_finetuning_data.py) script.
+Resulting datasets in `tf_record` format and training meta data should be later
+passed to training or evaluation scripts. The task-specific arguments are
+described in following sections:

 * GLUE
@@ -141,7 +141,7 @@ export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1
 export TASK_NAME=MNLI
 export OUTPUT_DIR=gs://some_bucket/datasets

-python create_finetuning_data.py \
+python ../data/create_finetuning_data.py \
  --input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
  --vocab_file=${BERT_BASE_DIR}/vocab.txt \
  --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
@@ -171,7 +171,7 @@ export SQUAD_VERSION=v1.1
 export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
 export OUTPUT_DIR=gs://some_bucket/datasets

-python create_finetuning_data.py \
+python ../data/create_finetuning_data.py \
  --squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
  --vocab_file=${BERT_BASE_DIR}/vocab.txt \
  --train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
@@ -13,6 +13,7 @@
 # limitations under the License.
 # ==============================================================================
 """Run BERT on SQuAD 1.1 and SQuAD 2.0 in TF 2.x."""
+from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
@@ -33,13 +34,14 @@ from official.nlp.bert import common_flags
 from official.nlp.bert import configs as bert_configs
 from official.nlp.bert import input_pipeline
 from official.nlp.bert import model_saving_utils
-from official.nlp.bert import squad_lib as squad_lib_wp
-from official.nlp.bert import squad_lib_sp
 from official.nlp.bert import tokenization
+# word-piece tokenizer based squad_lib
+from official.nlp.data import squad_lib as squad_lib_wp
+# sentence-piece tokenizer based squad_lib
+from official.nlp.data import squad_lib_sp
 from official.utils.misc import distribution_utils
 from official.utils.misc import keras_utils

 flags.DEFINE_enum(
     'mode', 'train_and_predict',
     ['train_and_predict', 'train', 'predict', 'export_only'],
@@ -24,13 +24,12 @@ import json

 from absl import app
 from absl import flags
 import tensorflow as tf

-from official.nlp.bert import classifier_data_lib
+from official.nlp.bert import tokenization
+from official.nlp.data import classifier_data_lib
 # word-piece tokenizer based squad_lib
-from official.nlp.bert import squad_lib as squad_lib_wp
+from official.nlp.data import squad_lib as squad_lib_wp
 # sentence-piece tokenizer based squad_lib
-from official.nlp.bert import squad_lib_sp
-from official.nlp.bert import tokenization
+from official.nlp.data import squad_lib_sp

 FLAGS = flags.FLAGS
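After this change, downstream code imports these modules from `official.nlp.data` rather than `official.nlp.bert`, as the hunks above show. A quick sanity check of the new locations, assuming a checkout of the models repository is on `PYTHONPATH`:

```shell
# Each relocated module should now resolve from official.nlp.data.
python -c "from official.nlp.data import classifier_data_lib, squad_lib, squad_lib_sp"
```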