diff --git a/configs/recognition/csn/README.md b/configs/recognition/csn/README.md
index 13d789312cc4f4425a844794a08e53815c815ac1..dad32ec99298ed9633764220f819bd7fb24e6be3 100644
--- a/configs/recognition/csn/README.md
+++ b/configs/recognition/csn/README.md
@@ -34,7 +34,7 @@ Notes:
 1. The **gpus** indicates the number of gpu (32G V100) we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8x4 gpus as default.
 According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU,
 e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu.
-2. The **inference_time** is got by this [benchmark script](/tools/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
+2. The **inference_time** is got by this [benchmark script](/tools/analysis/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
 not including the IO time and pre-processing time. For each setting, we use 1 gpu and set batch size (videos per gpu) to 1 to calculate the inference time.
 3. The values in columns named after "reference" are the results got by training on the original repo, using the same model settings.
 
diff --git a/configs/recognition/i3d/README.md b/configs/recognition/i3d/README.md
index cbf6cc116a04e2bb6f28cee04ca0b905188aa17d..999271ab3de0a2c794c11a43ec3105e7034eff2f 100644
--- a/configs/recognition/i3d/README.md
+++ b/configs/recognition/i3d/README.md
@@ -40,7 +40,7 @@ Notes:
 1. The **gpus** indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default.
 According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU,
 e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu.
-2. The **inference_time** is got by this [benchmark script](/tools/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
+2. The **inference_time** is got by this [benchmark script](/tools/analysis/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
 not including the IO time and pre-processing time. For each setting, we use 1 gpu and set batch size (videos per gpu) to 1 to calculate the inference time.
 
 For more details on data preparation, you can refer to Kinetics400 in [Data Preparation](/docs/data_preparation.md).
diff --git a/configs/recognition/r2plus1d/README.md b/configs/recognition/r2plus1d/README.md
index 2729d87c20ec3bc6cc9108e955fc44e8bf037fa6..8857b9a8e6adfbec500e812aff9cad994fba0961 100644
--- a/configs/recognition/r2plus1d/README.md
+++ b/configs/recognition/r2plus1d/README.md
@@ -26,7 +26,7 @@ Notes:
 1. The **gpus** indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default.
 According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU,
 e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu.
-2. The **inference_time** is got by this [benchmark script](/tools/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
+2. The **inference_time** is got by this [benchmark script](/tools/analysis/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
 not including the IO time and pre-processing time. For each setting, we use 1 gpu and set batch size (videos per gpu) to 1 to calculate the inference time.
 
 For more details on data preparation, you can refer to Kinetics400 in [Data Preparation](/docs/data_preparation.md).
diff --git a/configs/recognition/slowfast/README.md b/configs/recognition/slowfast/README.md
index 728f416f74ce8cfa9ed4a569f087e79a5bf7226f..254f0cd1d547ca575c20bb8d864992349b60ec55 100644
--- a/configs/recognition/slowfast/README.md
+++ b/configs/recognition/slowfast/README.md
@@ -27,7 +27,7 @@ Notes:
 1. The **gpus** indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default.
 According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU,
 e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu.
-2. The **inference_time** is got by this [benchmark script](/tools/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
+2. The **inference_time** is got by this [benchmark script](/tools/analysis/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
 not including the IO time and pre-processing time. For each setting, we use 1 gpu and set batch size (videos per gpu) to 1 to calculate the inference time.
 
 For more details on data preparation, you can refer to Kinetics400 in [Data Preparation](/docs/data_preparation.md).
diff --git a/configs/recognition/slowonly/README.md b/configs/recognition/slowonly/README.md
index 6a2a8c61948eedc758aa909804b99a2a32623e42..b9b9eae2bb3b63c0705bbbc3d22031e7b654f208 100644
--- a/configs/recognition/slowonly/README.md
+++ b/configs/recognition/slowonly/README.md
@@ -39,7 +39,7 @@ Notes:
 1. The **gpus** indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default.
 According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU,
 e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu.
-2. The **inference_time** is got by this [benchmark script](/tools/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
+2. The **inference_time** is got by this [benchmark script](/tools/analysis/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
 not including the IO time and pre-processing time. For each setting, we use 1 gpu and set batch size (videos per gpu) to 1 to calculate the inference time.
 
 For more details on data preparation, you can refer to Kinetics400 in [Data Preparation](/docs/data_preparation.md).
diff --git a/configs/recognition/tin/README.md b/configs/recognition/tin/README.md
index 3af0374ec09e01c4fd6791865741d77093478a7c..bbe26f50b89f79f8721de93c44c1af339e5fb119 100644
--- a/configs/recognition/tin/README.md
+++ b/configs/recognition/tin/README.md
@@ -33,7 +33,7 @@ The [AverageMeter issue](https://github.com/deepcs233/TIN/issues/4) will lead to
 2. The **gpus** indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default.
 According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU,
 e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu.
-3. The **inference_time** is got by this [benchmark script](/tools/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
+3. The **inference_time** is got by this [benchmark script](/tools/analysis/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
 not including the IO time and pre-processing time. For each setting, we use 1 gpu and set batch size (videos per gpu) to 1 to calculate the inference time.
 4. The values in columns named after "reference" are the results got by training on the original repo, using the same model settings.
 
diff --git a/configs/recognition/tsm/README.md b/configs/recognition/tsm/README.md
index 7b341d012670ca7df325f46593d7d91ac9da74e6..097361c9c35d7f8c1b48798011b4db63ec34ad39 100644
--- a/configs/recognition/tsm/README.md
+++ b/configs/recognition/tsm/README.md
@@ -51,7 +51,7 @@ Notes:
 1. The **gpus** indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default.
 According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU,
 e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu.
-2. The **inference_time** is got by this [benchmark script](/tools/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
+2. The **inference_time** is got by this [benchmark script](/tools/analysis/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
 not including the IO time and pre-processing time. For each setting, we use 1 gpu and set batch size (videos per gpu) to 1 to calculate the inference time.
 3. The values in columns named after "reference" are the results got by training on the original repo, using the same model settings.
 
diff --git a/configs/recognition/tsn/README.md b/configs/recognition/tsn/README.md
index ebe24eb81e7ff6d9ca7da996febfb2b3c4c0643f..ae9e0bd82e538c979f4c3e9376dcf25974df725b 100644
--- a/configs/recognition/tsn/README.md
+++ b/configs/recognition/tsn/README.md
@@ -103,7 +103,7 @@ Notes:
 1. The **gpus** indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default.
 According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU,
 e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu.
-2. The **inference_time** is got by this [benchmark script](/tools/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
+2. The **inference_time** is got by this [benchmark script](/tools/analysis/benchmark.py), where we use the sampling frames strategy of the test setting and only care about the model inference time,
 not including the IO time and pre-processing time. For each setting, we use 1 gpu and set batch size (videos per gpu) to 1 to calculate the inference time.
 3. The values in columns named after "reference" are the results got by training on the original repo, using the same model settings.
 
diff --git a/docs/getting_started.md b/docs/getting_started.md
index ebcd37728ead91f5acf0f4c16e12121e47f1a0a4..0c6c65e111e00c84306108317c0dd9f71a6c7478 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -468,7 +468,7 @@ You can plot loss/top-k acc curves given a training log file. Run `pip install s
 ![acc_curve_image](imgs/acc_curve.png)
 
 ```shell
-python tools/analyze_logs.py plot_curve ${JSON_LOGS} [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]
+python tools/analysis/analyze_logs.py plot_curve ${JSON_LOGS} [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]
 ```
 
 Examples:
@@ -476,31 +476,31 @@ Examples:
 - Plot the classification loss of some run.
 
 ```shell
-python tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
+python tools/analysis/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
 ```
 
 - Plot the top-1 acc and top-5 acc of some run, and save the figure to a pdf.
 
 ```shell
-python tools/analyze_logs.py plot_curve log.json --keys top1_acc top5_acc --out results.pdf
+python tools/analysis/analyze_logs.py plot_curve log.json --keys top1_acc top5_acc --out results.pdf
 ```
 
 - Compare the top-1 acc of two runs in the same figure.
 
 ```shell
-python tools/analyze_logs.py plot_curve log1.json log2.json --keys top1_acc --legend run1 run2
+python tools/analysis/analyze_logs.py plot_curve log1.json log2.json --keys top1_acc --legend run1 run2
 ```
 
 You can also compute the average training speed.
 
 ```shell
-python tools/analyze_logs.py cal_train_time ${JSON_LOGS} [--include-outliers]
+python tools/analysis/analyze_logs.py cal_train_time ${JSON_LOGS} [--include-outliers]
 ```
 
 - Compute the average training speed for a config file
 
 ```shell
-python tools/analyze_logs.py cal_train_time work_dirs/some_exp/20200422_153324.log.json
+python tools/analysis/analyze_logs.py cal_train_time work_dirs/some_exp/20200422_153324.log.json
 ```
 
 The output is expected to be like the following.
@@ -519,7 +519,7 @@ average iter time: 0.9330 s/iter
 We provide a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model.
 
 ```shell
-python tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
+python tools/analysis/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
 ```
 
 We will get the result like this
diff --git a/mmaction/models/recognizers/recognizer2d.py b/mmaction/models/recognizers/recognizer2d.py
index e39dfe13fd340ee56c4c32df8ec410e1a8b67ab5..7c2dfd8a1eceacda8f1a440549713554524a74b3 100644
--- a/mmaction/models/recognizers/recognizer2d.py
+++ b/mmaction/models/recognizers/recognizer2d.py
@@ -35,7 +35,7 @@ class Recognizer2D(BaseRecognizer):
     def forward_dummy(self, imgs):
         """Used for computing network FLOPs.
 
-        See ``mmaction/tools/get_flops.py``.
+        See ``tools/analysis/get_flops.py``.
 
         Args:
             imgs (torch.Tensor): Input images.
diff --git a/mmaction/models/recognizers/recognizer3d.py b/mmaction/models/recognizers/recognizer3d.py
index f35dd5cf90ef06e819c57483d6c1ce2af059028e..bf8a18c7de50490bd73d3a98ca5b4afa15483ce6 100644
--- a/mmaction/models/recognizers/recognizer3d.py
+++ b/mmaction/models/recognizers/recognizer3d.py
@@ -31,7 +31,7 @@ class Recognizer3D(BaseRecognizer):
     def forward_dummy(self, imgs):
         """Used for computing network FLOPs.
 
-        See ``mmaction/tools/get_flops.py``.
+        See ``tools/analysis/get_flops.py``.
 
         Args:
             imgs (torch.Tensor): Input images.
diff --git a/tools/analyze_logs.py b/tools/analysis/analyze_logs.py
similarity index 100%
rename from tools/analyze_logs.py
rename to tools/analysis/analyze_logs.py
diff --git a/tools/bench_processing.py b/tools/analysis/bench_processing.py
similarity index 96%
rename from tools/bench_processing.py
rename to tools/analysis/bench_processing.py
index 0ff7824181459d21ffd6aacd85318c2281262669..1460dca9da2988f27448f37c8f06638a1406003a 100644
--- a/tools/bench_processing.py
+++ b/tools/analysis/bench_processing.py
@@ -1,7 +1,7 @@
 """This file is for benchmark dataloading process. The command line to run
 this file is:
 
-$ python -m cProfile -o program.prof tools/bench_processing.py
+$ python -m cProfile -o program.prof tools/analysis/bench_processing.py
 configs/task/method/[config filename]
 
 It use cProfile to record cpu running time and output to program.prof
diff --git a/tools/benchmark.py b/tools/analysis/benchmark.py
similarity index 100%
rename from tools/benchmark.py
rename to tools/analysis/benchmark.py
diff --git a/tools/get_flops.py b/tools/analysis/get_flops.py
similarity index 100%
rename from tools/get_flops.py
rename to tools/analysis/get_flops.py
diff --git a/tools/report_accuracy.py b/tools/analysis/report_accuracy.py
similarity index 100%
rename from tools/report_accuracy.py
rename to tools/analysis/report_accuracy.py
diff --git a/tools/anet_feature_prepare.py b/tools/data/activitynet/activitynet_feature_extraction.py
similarity index 100%
rename from tools/anet_feature_prepare.py
rename to tools/data/activitynet/activitynet_feature_extraction.py
diff --git a/tools/tsn_feature_extract.py b/tools/data/activitynet/tsn_feature_extraction.py
similarity index 100%
rename from tools/tsn_feature_extract.py
rename to tools/data/activitynet/tsn_feature_extraction.py
diff --git a/tools/gen_flow.py b/tools/flow_extraction.py
similarity index 100%
rename from tools/gen_flow.py
rename to tools/flow_extraction.py
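
Net effect of the patch: the analysis utilities (`analyze_logs.py`, `bench_processing.py`, `benchmark.py`, `get_flops.py`, `report_accuracy.py`) move from `tools/` to `tools/analysis/`, the ActivityNet feature scripts move under `tools/data/activitynet/`, and `gen_flow.py` becomes `tools/flow_extraction.py`. As a quick sketch of the new call sites, restricted to the command forms already documented in the hunks above (the `${...}` placeholders are the docs' own; substitute your config and log paths):

```shell
# Analysis utilities now need the tools/analysis/ prefix:
python tools/analysis/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
python tools/analysis/analyze_logs.py plot_curve ${JSON_LOGS} [--keys ${KEYS}] [--out ${OUT_FILE}]
python tools/analysis/analyze_logs.py cal_train_time ${JSON_LOGS} [--include-outliers]

# Dataloading benchmark under cProfile, per the updated bench_processing.py docstring:
python -m cProfile -o program.prof tools/analysis/bench_processing.py configs/task/method/[config filename]
```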