Commit c0ece9cc authored by Chen Chen, committed by A. Unique TensorFlower

Move data pre-processing related files (classifier_data_lib.py, create_finetuning_data.py, create_pretraining_data.py, squad_lib.py, squad_lib_sp.py) to data folder.

PiperOrigin-RevId: 296254023
Parent 73d8226d
@@ -114,7 +114,7 @@ officially supported by Google Cloud TPU team yet until TF 2.1 released.
 ### Pre-training

 There is no change to generate pre-training data. Please use the script
-[`create_pretraining_data.py`](create_pretraining_data.py)
+[`../data/create_pretraining_data.py`](../data/create_pretraining_data.py)
 which is essentially branched from [BERT research repo](https://github.com/google-research/bert)
 to get processed pre-training data and it adapts to TF2 symbols and python3
 compatibility.
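For reference, a minimal invocation of the relocated script might look like the following sketch. The flag names follow the upstream BERT `create_pretraining_data.py` that this script is branched from; the input and output paths are placeholders, not values from this commit:

```shell
# Generate masked-LM pre-training examples from plain text (paths are placeholders).
python ../data/create_pretraining_data.py \
  --input_file=./sample_text.txt \
  --output_file=${OUTPUT_DIR}/tf_examples.tfrecord \
  --vocab_file=${BERT_BASE_DIR}/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --dupe_factor=5
```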
@@ -123,10 +123,10 @@ compatibility.
 ### Fine-tuning

 To prepare the fine-tuning data for final model training, use the
-[`create_finetuning_data.py`](./create_finetuning_data.py) script. Resulting
-datasets in `tf_record` format and training meta data should be later passed to
-training or evaluation scripts. The task-specific arguments are described in
-following sections:
+[`../data/create_finetuning_data.py`](../data/create_finetuning_data.py) script.
+Resulting datasets in `tf_record` format and training meta data should be later
+passed to training or evaluation scripts. The task-specific arguments are
+described in following sections:

 * GLUE
@@ -141,7 +141,7 @@ export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1
 export TASK_NAME=MNLI
 export OUTPUT_DIR=gs://some_bucket/datasets

-python create_finetuning_data.py \
+python ../data/create_finetuning_data.py \
  --input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
  --vocab_file=${BERT_BASE_DIR}/vocab.txt \
  --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
@@ -171,7 +171,7 @@ export SQUAD_VERSION=v1.1
 export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
 export OUTPUT_DIR=gs://some_bucket/datasets

-python create_finetuning_data.py \
+python ../data/create_finetuning_data.py \
  --squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
  --vocab_file=${BERT_BASE_DIR}/vocab.txt \
  --train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
@@ -13,6 +13,7 @@
 # limitations under the License.
 # ==============================================================================
 """Run BERT on SQuAD 1.1 and SQuAD 2.0 in TF 2.x."""
+from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
@@ -33,13 +34,14 @@ from official.nlp.bert import common_flags
 from official.nlp.bert import configs as bert_configs
 from official.nlp.bert import input_pipeline
 from official.nlp.bert import model_saving_utils
-from official.nlp.bert import squad_lib as squad_lib_wp
-from official.nlp.bert import squad_lib_sp
 from official.nlp.bert import tokenization
+# word-piece tokenizer based squad_lib
+from official.nlp.data import squad_lib as squad_lib_wp
+# sentence-piece tokenizer based squad_lib
+from official.nlp.data import squad_lib_sp
 from official.utils.misc import distribution_utils
 from official.utils.misc import keras_utils

 flags.DEFINE_enum(
     'mode', 'train_and_predict',
     ['train_and_predict', 'train', 'predict', 'export_only'],
@@ -24,13 +24,12 @@ import json

 from absl import app
 from absl import flags
 import tensorflow as tf

-from official.nlp.bert import classifier_data_lib
+from official.nlp.bert import tokenization
+from official.nlp.data import classifier_data_lib
 # word-piece tokenizer based squad_lib
-from official.nlp.bert import squad_lib as squad_lib_wp
+from official.nlp.data import squad_lib as squad_lib_wp
 # sentence-piece tokenizer based squad_lib
-from official.nlp.bert import squad_lib_sp
-from official.nlp.bert import tokenization
+from official.nlp.data import squad_lib_sp

 FLAGS = flags.FLAGS
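After this change, downstream code imports these modules from `official.nlp.data` rather than `official.nlp.bert`, as the hunks above show. A quick sanity check of the new locations, assuming a checkout of the models repository is on `PYTHONPATH`:

```shell
# Each relocated module should now resolve from official.nlp.data.
python -c "from official.nlp.data import classifier_data_lib, squad_lib, squad_lib_sp"
```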