Commit a793d8bf authored by yuanxiao

init

Parent
# Multimodal Short Video Data Set and Baseline Classification Model
> If you have data, access to data, or a better model, feel free to open an issue or pull request, or contact me at wangzichaochaochao@gmail.com
This resource contains a multimodal short video dataset with 500,000+ samples (865 GB) and a TensorFlow 2.0 multimodal short video classification model, aiming to build a multimodal classification framework.
Multimodal short video data = short video description text + short video cover image + short video
![](example_data/example_data_file.png)
[click to view example data](example_data)
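In code, one such triple can be sketched as a simple container. This is only an illustration: the `MultimodalSample` type and the `cover_image_path`/`video_path` field names are hypothetical, not part of this repository; `mp4_txt_brief` and `video_label` follow the dataset's own field names.

```python
from dataclasses import dataclass


@dataclass
class MultimodalSample:
    """One multimodal short-video sample: description text + cover image + video."""
    mp4_txt_brief: str      # short video description text
    cover_image_path: str   # path to the downloaded cover image (.jpeg)
    video_path: str         # path to the downloaded short video (.mp4)
    video_label: str        # one of the 31 category labels


sample = MultimodalSample(
    mp4_txt_brief=" Woman in swimsuit and cover up walking at the beach",
    cover_image_path="MP4_download/Beach/80328682/80328682.jpeg",
    video_path="MP4_download/Beach/80328682/80328682.mp4",
    video_label="Beach",
)
print(sample.video_label)  # Beach
```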
---
## 1. Multimodal dataset information
The current multimodal short video dataset contains 500,000+ multimodal samples covering 31 categories and occupying 865 GB in total. Download and unzip the [multimodal_data_info.rar](aggregate_download_data_to_a_json_file/multimodal_data_info.rar) file to obtain the download URLs for all the data. You can download them directly using [data_download_tools](data_download_tools), or use your own download tool.
### Multimodal data (31 types)
Video category Chinese-English mapping dictionary:
```python
video_type_dict = {'360VR': 'VR', '4k': '4K', 'Technology': '科技', 'Sport': '运动', 'Timelapse': '延时',
'Aerial': '航拍', 'Animals': '动物', 'Sea': '大海', 'Beach': '海滩', 'space': '太空',
'stars': '星空', 'City': '城市', 'Business': '商业', 'Underwater': '水下摄影',
'Wedding': '婚礼', 'Archival': '档案', 'Backgrounds': '背景', 'Alpha Channel': '透明通道',
'Intro': '开场', 'Celebration': '庆典', 'Clouds': '云彩', 'Corporate': '企业',
'Explosion': '爆炸', 'Film': '电影镜头', 'Green Screen': '绿幕', 'Military': '军事',
'Nature': '自然', 'News': '新闻', 'R3d': 'R3d', 'Romantic': '浪漫', 'Abstract': '抽象'}
```
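As a quick sanity check (a standalone sketch, not a repository file), the dictionary above can be loaded and queried; its English keys are the `video_label` values used throughout the dataset:

```python
video_type_dict = {'360VR': 'VR', '4k': '4K', 'Technology': '科技', 'Sport': '运动', 'Timelapse': '延时',
                   'Aerial': '航拍', 'Animals': '动物', 'Sea': '大海', 'Beach': '海滩', 'space': '太空',
                   'stars': '星空', 'City': '城市', 'Business': '商业', 'Underwater': '水下摄影',
                   'Wedding': '婚礼', 'Archival': '档案', 'Backgrounds': '背景', 'Alpha Channel': '透明通道',
                   'Intro': '开场', 'Celebration': '庆典', 'Clouds': '云彩', 'Corporate': '企业',
                   'Explosion': '爆炸', 'Film': '电影镜头', 'Green Screen': '绿幕', 'Military': '军事',
                   'Nature': '自然', 'News': '新闻', 'R3d': 'R3d', 'Romantic': '浪漫', 'Abstract': '抽象'}

# The English keys double as the category labels of the dataset
labels = sorted(video_type_dict)
print(len(labels))                  # 31 categories
print(video_type_dict['Military'])  # 军事
```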
Apart from the 360VR videos, each of the other categories has approximately 20,000 samples. You can check the contents of all multimodal files at any time using the [download_file_info.ipynb](data_download_tools/xinpianchang/download_file_info.ipynb) tool in [data_download_tools](data_download_tools), as shown below:
Check the disk space occupied by the data.
![](data_download_tools/xinpianchang/download_mp4_info.png)
Check a category's video cover images and the corresponding video description information.
![](data_download_tools/xinpianchang/check_image.png)
### Multimodal data statistics
The multimodal_data_info.json file records 562,342 multimodal samples; each line contains the fields ```['mp4_id', 'video_label', 'mp4_time', 'mp4_download_url', 'mp4_background_image_url', 'mp4_txt_brief']```.
The content of multimodal_data_info.json looks as follows:
```python
{"mp4_id": "80328682", "mp4_download_url": "https://p5-v1.xpccdn.com/080328682_main_xl.mp4",
"mp4_time": "0:16", "mp4_background_image_url": "https://p5-i1.xpccdn.com/080328682_iconl.jpeg",
"mp4_txt_brief": " Woman in swimsuit and cover up walking at the beach", "video_label": "Beach"}
{"mp4_id": "63660083", "mp4_download_url": "https://p5-v1.xpccdn.com/063660083_main_xl.mp4",
"mp4_time": "0:29", "mp4_background_image_url": "https://p5-i1.xpccdn.com/063660083_iconl.jpeg",
"mp4_txt_brief": " 4K Happy female friends chatting & drinking on city rooftop in the summer", "video_label": "City"}
```
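Since the file is in JSON Lines format (one JSON object per line), it can be read with the standard `json` module. A minimal sketch using the two records above:

```python
import json

# Two example lines from multimodal_data_info.json
lines = [
    '{"mp4_id": "80328682", "mp4_download_url": "https://p5-v1.xpccdn.com/080328682_main_xl.mp4", '
    '"mp4_time": "0:16", "mp4_background_image_url": "https://p5-i1.xpccdn.com/080328682_iconl.jpeg", '
    '"mp4_txt_brief": " Woman in swimsuit and cover up walking at the beach", "video_label": "Beach"}',
    '{"mp4_id": "63660083", "mp4_download_url": "https://p5-v1.xpccdn.com/063660083_main_xl.mp4", '
    '"mp4_time": "0:29", "mp4_background_image_url": "https://p5-i1.xpccdn.com/063660083_iconl.jpeg", '
    '"mp4_txt_brief": " 4K Happy female friends chatting & drinking on city rooftop in the summer", "video_label": "City"}',
]

records = [json.loads(line) for line in lines]
print([r["video_label"] for r in records])  # ['Beach', 'City']

# With the real file:
# with open("multimodal_data_info.json", encoding="utf-8") as f:
#     records = [json.loads(line) for line in f]
```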
You can use the [data_analysis.ipynb](aggregate_download_data_to_a_json_file/data_analysis.ipynb) tool in [aggregate_download_data_to_a_json_file](aggregate_download_data_to_a_json_file) to compute statistics over the multimodal data file. The results are shown below.
![](aggregate_download_data_to_a_json_file/json_file_data_analysis.png)
---
## 2. Baseline Classification Model
> See my blog post [短视频分类技术](https://yuanxiaosc.github.io/categories/TF/%E5%95%86%E4%B8%9A%E5%BA%94%E7%94%A8%E6%A1%88%E4%BE%8B/) for more on short video classification.
Model structure diagram
![](baseline_model/multimodal_baseline_model.png)
Model structure test
![](baseline_model/model_structure_test.png)
[Click on baseline_model to learn more](baseline_model)
### Require
+ python 3+, e.g. python==3.6
+ tensorflow version 2, e.g. tensorflow==2.0.0-beta1
+ tensorflow-datasets
### Train Model
```shell
python train_multimodal_baseline_model.py
```
---
## 4. Build your own model
[Click on data_interface_for_model to learn more](data_interface_for_model)
The [data_interface_for_model](data_interface_for_model) data interface makes it easy to feed data to your model. It contains three types of data interfaces: the tensors required by TensorFlow, the NumPy arrays required by PyTorch, and native Python types.
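The three interface styles can be illustrated as follows. This is a sketch with made-up values; the actual interfaces live in data_interface_for_model, and the `python_batch` contents here are invented for demonstration:

```python
import numpy as np

# Native Python type: a list of (text token ids, label id) pairs (made-up values)
python_batch = [([12, 7, 403], 8), ([95, 2, 17], 3)]

# NumPy arrays: what a PyTorch DataLoader typically consumes
numpy_labels = np.array([label for _, label in python_batch], dtype=np.int64)
print(numpy_labels.shape)  # (2,)

# TensorFlow tensors (requires tensorflow installed):
# import tensorflow as tf
# tf_labels = tf.constant(numpy_labels)
```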
---
## 5. Copyright Statement
Currently all multimodal video data comes from the Internet, and its copyright belongs to the original authors. If this data (from https://xinpianchang.com) is used for profit, please contact service@xinpianchang.com to purchase the data copyright.
import os
import sys
import pathlib
import ast
import json

import pandas as pd


def clean_specified_type_file(data_root=None, specified_type_list=["*/*/*.mp4", "*/*/*.jpeg", "*/*/*.txt"]):
    """
    :param data_root: root directory containing the files to delete
    :param specified_type_list: glob patterns, relative to data_root, of the files to delete
    """
    if data_root is None:
        data_root = os.getcwd()
    data_root = pathlib.Path(data_root)
    garbage_file_list = list()
    # Collect the full paths of the files to delete
    for t in specified_type_list:
        paths_list = sorted(str(item) for item in data_root.glob(t))
        garbage_file_list.extend(paths_list)
    # Remove the files (full paths, so this also works outside the current working directory)
    for path in garbage_file_list:
        os.remove(path)


def get_description_information(txt_path):
    """description_information includes: {'mp4_id': '', 'mp4_download_url': '', 'mp4_time': '',
    'mp4_background_image_url': '', 'mp4_txt_brief': ''}"""
    # ast.literal_eval is a safe replacement for eval when parsing the stored dict literal
    with open(txt_path, encoding='utf-8') as f:
        description_information_dict = ast.literal_eval(f.read())
    return description_information_dict
def standardization_of_file_names(data_root="MP4_download"):
    """
    Rename each set of data to a uniform format:
    multimodal_data_id
        multimodal_data_id.jpeg
        multimodal_data_id.mp4
        multimodal_data_id.txt
    """
    # Get all multimodal data type names
    data_root = pathlib.Path(data_root)
    label_names_list = sorted(item.name for item in data_root.glob('*/') if item.is_dir())
    print(f"data_root contains {len(label_names_list)} video types")
    print(f"data_root contains video types {label_names_list}")
    # Process each type of multimodal data in turn
    for label_name in label_names_list:
        # Get all folders under this type of multimodal data
        label_mode = label_name + "/*"
        multimodal_data_dir = list(data_root.glob(label_mode))
        multimodal_data_dir = [str(path) for path in multimodal_data_dir]
        # Standardize the file names of each piece of multimodal data
        for multimodal_data_path in multimodal_data_dir:
            multimodal_data_id = os.path.basename(multimodal_data_path)
            for item_file in os.listdir(multimodal_data_path):
                item_file = os.path.join(multimodal_data_path, item_file)
                if item_file.endswith('.txt'):
                    os.rename(item_file, os.path.join(multimodal_data_path, multimodal_data_id + ".txt"))
                elif item_file.endswith('.jpeg'):
                    os.rename(item_file, os.path.join(multimodal_data_path, multimodal_data_id + ".jpeg"))
                elif item_file.endswith('.mp4'):
                    os.rename(item_file, os.path.join(multimodal_data_path, multimodal_data_id + ".mp4"))
                elif item_file.endswith('.ipynb_checkpoints'):
                    pass
                else:
                    raise ValueError("Unexpected file found, please check: " + item_file)
def count_file_number(data_root="MP4_download"):
    """
    Count the number of files per category.
    :return {'Military': 18560, 'Business': 19200, 'Archival': 10176, 'Romantic': 19162,...}
    all number: 56xx42
    """
    video_label_number = len(os.listdir(data_root))
    print("video_label_number:\t", video_label_number)
    multimodal_data_number_dict = dict()
    all_number = 0
    for video_label in os.listdir(data_root):
        video_label_dir = os.path.join(data_root, video_label)
        # print("video_label_dir:\t", video_label_dir)
        multimodal_data_number = len(os.listdir(video_label_dir))
        # print("multimodal_data_number:\t", multimodal_data_number)
        multimodal_data_number_dict[video_label] = multimodal_data_number
        all_number += multimodal_data_number
    print(multimodal_data_number_dict)
    print("all number:\t", all_number)
    return multimodal_data_number_dict
def statistics_all_multimodal_data_information_to_json_file(data_root="MP4_download",
                                                            store_multimodal_info_json_file_path="multimodal_data_info.json"):
    """
    Aggregate all *.txt files under data_root into one *.json file.
    """
    data_root = pathlib.Path(data_root)
    all_txt_data_paths = [str(path) for path in
                          list(data_root.glob('*/*/*.txt'))]  # [MP4_download/360VR/89422838/89422838.txt,...]
    with open(store_multimodal_info_json_file_path, "w", encoding='utf-8') as json_write_f:
        for text_data_path in all_txt_data_paths:
            video_label_path = os.path.dirname(os.path.dirname(text_data_path))  # /MP4_download/360VR/
            video_label = os.path.basename(video_label_path)  # 360VR
            description_information_dict = get_description_information(text_data_path)
            description_information_dict["video_label"] = video_label
            line_json = json.dumps(description_information_dict, ensure_ascii=False)
            json_write_f.write(line_json + "\n")


def read_multimodal_data_information_json_file(json_file_path="multimodal_data_info.json"):
    """
    :param json_file_path:
    :return: multimodal_data_information_list
        [{'mp4_id': '97930081', 'mp4_download_url': ...'video_label': 'Military'},
         {'mp4_id': '64413672', 'mp4_download_url': ... 'video_label': 'Military'}]
    """
    def check_data(line_dict):
        for item in ['mp4_id', 'video_label', 'mp4_time', 'mp4_download_url', 'mp4_background_image_url',
                     'mp4_txt_brief']:
            if item not in line_dict:
                return False
        return True

    multimodal_data_information_list = list()
    # Iterating over the file yields one JSON record per line; no bare try/except needed
    with open(json_file_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            line_dict = json.loads(line)
            if check_data(line_dict):
                multimodal_data_information_list.append(line_dict)
            else:
                print("incomplete data:")
                print(line_dict)
    return multimodal_data_information_list
def multimodal_data_json_file_to_datafram(json_file_path="multimodal_data_info.json"):
    """
    Convert the json file to a pandas.DataFrame.
    """
    if not os.path.exists(json_file_path):
        print("Run statistics_all_multimodal_data_information_to_json_file(data_root, json_file_path) first.")
        raise ValueError("Json file not found!")
    multimodal_data_information_list = read_multimodal_data_information_json_file(json_file_path)
    multimodal_data_information_dict = {'mp4_id': [], 'video_label': [], 'mp4_time': [],
                                        'mp4_download_url': [], 'mp4_background_image_url': [], 'mp4_txt_brief': []}
    for data in multimodal_data_information_list:
        for key in multimodal_data_information_dict:
            multimodal_data_information_dict[key].append(data[key])
    multimodal_data_information_datafram = pd.DataFrame(multimodal_data_information_dict)
    return multimodal_data_information_datafram
def aggravate_data_utils_main(data_root, json_file_path="./multimodal_data_info.json"):
    """
    aggregate_download_data_to_a_json_file
    :param data_root: root directory of the downloaded files
    :param json_file_path: path of the json file to produce
    """
    # Standardize file names
    standardization_of_file_names(data_root)
    # Produce the json file
    statistics_all_multimodal_data_information_to_json_file(data_root, json_file_path)
    # Analyze the json file
    multimodal_data_information_datafram = multimodal_data_json_file_to_datafram(json_file_path)
    print(multimodal_data_information_datafram.describe())


if __name__ == "__main__":
    data_root = "/home/b418a/disk1/jupyter_workspace/yuanxiao/douyin/xinpianchang/MP4_download"
    json_file_path = "./multimodal_data_info.json"
    if len(sys.argv) == 3:
        data_root = sys.argv[1]
        json_file_path = sys.argv[2]
    elif len(sys.argv) == 2:
        data_root = sys.argv[1]
    aggravate_data_utils_main(data_root, json_file_path)
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import json"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"multimodal_data_info_file_path ='multimodal_data_info.json'"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def read_multimodal_data_information_json_file(json_file_path=\"multimodal_data_info.json\"):\n",
" \"\"\"\n",
" :param json_file_path:\n",
" :return: multimodal_data_information_list\n",
" [{'mp4_id': '97930081', 'mp4_download_url': ...'video_label': 'Military'},\n",
" {'mp4_id': '64413672', 'mp4_download_url': ... 'video_label': 'Military'}]\n",
" \"\"\"\n",
" def check_data(line_dict):\n",
" for item in ['mp4_id', 'video_label', 'mp4_time', 'mp4_download_url', 'mp4_background_image_url', 'mp4_txt_brief']:\n",
" if item not in line_dict:\n",
" return False\n",
" return True\n",
" \n",
" multimodal_data_information_list = list()\n",
" with open(json_file_path, 'r', encoding='utf-8') as f:\n",
" try:\n",
" while True:\n",
" line = f.readline()\n",
" if line:\n",
" line_dict = json.loads(line)\n",
" if check_data(line_dict):\n",
" multimodal_data_information_list.append(line_dict)\n",
" else:\n",
" print(\"incomplete data:\")\n",
" print(line_dict)\n",
" else:\n",
" break\n",
" except:\n",
" f.close()\n",
" return multimodal_data_information_list"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"multimodal_data_information_list = read_multimodal_data_information_json_file(multimodal_data_info_file_path)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"562342"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(multimodal_data_information_list)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'mp4_id': '75265848',\n",
" 'mp4_download_url': 'https://p5-v1.xpccdn.com/075265848_main_xl.mp4',\n",
" 'mp4_time': '0:13',\n",
" 'mp4_background_image_url': 'https://p5-i1.xpccdn.com/075265848_iconl.jpeg',\n",
" 'mp4_txt_brief': ' Old antique German military rifle',\n",
" 'video_label': 'Military'},\n",
" {'mp4_id': '44566064',\n",
" 'mp4_download_url': 'https://p5-v1.xpccdn.com/044566064_main_xl.mp4',\n",
" 'mp4_time': '0:09',\n",
" 'mp4_background_image_url': 'https://p5-i1.xpccdn.com/044566064_iconl.jpeg',\n",
" 'mp4_txt_brief': ' quadcopter aerial drone',\n",
" 'video_label': 'Military'},\n",
" {'mp4_id': '62447549',\n",
" 'mp4_download_url': 'https://p5-v1.xpccdn.com/062447549_main_xl.mp4',\n",
" 'mp4_time': '0:06',\n",
" 'mp4_background_image_url': 'https://p5-i1.xpccdn.com/062447549_iconl.jpeg',\n",
" 'mp4_txt_brief': ' Firearm dis-assembly for cleaning and safety check of handheld gun',\n",
" 'video_label': 'Military'}]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"multimodal_data_information_list[:3]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def multimodal_data_json_file_to_datafram(json_file_path=\"multimodal_data_info.json\"):\n",
" \"\"\"\n",
" :param json_file_path: \n",
" :return: pandas.datafram\n",
" \"\"\"\n",
" multimodal_data_information_list = read_multimodal_data_information_json_file(json_file_path)\n",
" \n",
" multimodal_data_information_dict = {'mp4_id':[], 'video_label':[], 'mp4_time':[], \n",
" 'mp4_download_url':[], 'mp4_background_image_url':[], 'mp4_txt_brief':[]}\n",
" \n",
" for data in multimodal_data_information_list:\n",
" multimodal_data_information_dict['mp4_id'].append(data['mp4_id'])\n",
" multimodal_data_information_dict['video_label'].append(data['video_label'])\n",
" multimodal_data_information_dict['mp4_time'].append(data['mp4_time'])\n",
" multimodal_data_information_dict['mp4_download_url'].append(data['mp4_download_url'])\n",
" multimodal_data_information_dict['mp4_background_image_url'].append(data['mp4_background_image_url'])\n",
" multimodal_data_information_dict['mp4_txt_brief'].append(data['mp4_txt_brief'])\n",
" \n",
" multimodal_data_information_datafram = pd.DataFrame(multimodal_data_information_dict)\n",
" \n",
" return multimodal_data_information_datafram"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"multimodal_data_information_datafram = multimodal_data_json_file_to_datafram(json_file_path=\"multimodal_data_info.json\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mp4_id</th>\n",
" <th>video_label</th>\n",
" <th>mp4_time</th>\n",
" <th>mp4_download_url</th>\n",
" <th>mp4_background_image_url</th>\n",
" <th>mp4_txt_brief</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>75265848</td>\n",
" <td>Military</td>\n",
" <td>0:13</td>\n",
" <td>https://p5-v1.xpccdn.com/075265848_main_xl.mp4</td>\n",
" <td>https://p5-i1.xpccdn.com/075265848_iconl.jpeg</td>\n",
" <td>Old antique German military rifle</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>44566064</td>\n",
" <td>Military</td>\n",
" <td>0:09</td>\n",
" <td>https://p5-v1.xpccdn.com/044566064_main_xl.mp4</td>\n",
" <td>https://p5-i1.xpccdn.com/044566064_iconl.jpeg</td>\n",
" <td>quadcopter aerial drone</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>62447549</td>\n",
" <td>Military</td>\n",
" <td>0:06</td>\n",
" <td>https://p5-v1.xpccdn.com/062447549_main_xl.mp4</td>\n",
" <td>https://p5-i1.xpccdn.com/062447549_iconl.jpeg</td>\n",
" <td>Firearm dis-assembly for cleaning and safety ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>42966432</td>\n",
" <td>Military</td>\n",
" <td>0:08</td>\n",
" <td>https://p5-v1.xpccdn.com/042966432_main_xl.mp4</td>\n",
" <td>https://p5-i1.xpccdn.com/042966432_iconl.jpeg</td>\n",
" <td>Kalashnikov deadly weapon</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>103424272</td>\n",
" <td>Military</td>\n",
" <td>0:13</td>\n",
" <td>https://p5-v1.xpccdn.com/103424272_main_xl.mp4</td>\n",
" <td>https://p5-i1.xpccdn.com/103424272_iconl.jpeg</td>\n",
" <td>Rows of ammunition in front of an animated Le...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mp4_id video_label mp4_time \\\n",
"0 75265848 Military 0:13 \n",
"1 44566064 Military 0:09 \n",
"2 62447549 Military 0:06 \n",
"3 42966432 Military 0:08 \n",
"4 103424272 Military 0:13 \n",
"\n",
" mp4_download_url \\\n",
"0 https://p5-v1.xpccdn.com/075265848_main_xl.mp4 \n",
"1 https://p5-v1.xpccdn.com/044566064_main_xl.mp4 \n",
"2 https://p5-v1.xpccdn.com/062447549_main_xl.mp4 \n",
"3 https://p5-v1.xpccdn.com/042966432_main_xl.mp4 \n",
"4 https://p5-v1.xpccdn.com/103424272_main_xl.mp4 \n",
"\n",
" mp4_background_image_url \\\n",
"0 https://p5-i1.xpccdn.com/075265848_iconl.jpeg \n",
"1 https://p5-i1.xpccdn.com/044566064_iconl.jpeg \n",
"2 https://p5-i1.xpccdn.com/062447549_iconl.jpeg \n",
"3 https://p5-i1.xpccdn.com/042966432_iconl.jpeg \n",
"4 https://p5-i1.xpccdn.com/103424272_iconl.jpeg \n",
"\n",
" mp4_txt_brief \n",
"0 Old antique German military rifle \n",
"1 quadcopter aerial drone \n",
"2 Firearm dis-assembly for cleaning and safety ... \n",
"3 Kalashnikov deadly weapon \n",
"4 Rows of ammunition in front of an animated Le... "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"multimodal_data_information_datafram.head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mp4_id</th>\n",
" <th>video_label</th>\n",
" <th>mp4_time</th>\n",
" <th>mp4_download_url</th>\n",
" <th>mp4_background_image_url</th>\n",
" <th>mp4_txt_brief</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>562342</td>\n",
" <td>562342</td>\n",
" <td>562342</td>\n",
" <td>562342</td>\n",
" <td>562342</td>\n",
" <td>562342</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>499607</td>\n",
" <td>31</td>\n",
" <td>184</td>\n",
" <td>499607</td>\n",
" <td>499607</td>\n",
" <td>343020</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>88460884</td>\n",
" <td>Alpha Channel</td>\n",
" <td>0:10</td>\n",
" <td>https://p5-v1.xpccdn.com/023726153_main_xl.mp4</td>\n",
" <td>https://p5-i1.xpccdn.com/088460884_iconl.jpeg</td>\n",
" <td>Intro Background Texture Render Animation Col...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>9</td>\n",
" <td>19200</td>\n",
" <td>49660</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>10974</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mp4_id video_label mp4_time \\\n",
"count 562342 562342 562342 \n",
"unique 499607 31 184 \n",
"top 88460884 Alpha Channel 0:10 \n",
"freq 9 19200 49660 \n",
"\n",
" mp4_download_url \\\n",
"count 562342 \n",
"unique 499607 \n",
"top https://p5-v1.xpccdn.com/023726153_main_xl.mp4 \n",
"freq 9 \n",
"\n",
" mp4_background_image_url \\\n",
"count 562342 \n",
"unique 499607 \n",
"top https://p5-i1.xpccdn.com/088460884_iconl.jpeg \n",
"freq 9 \n",
"\n",
" mp4_txt_brief \n",
"count 562342 \n",
"unique 343020 \n",
"top Intro Background Texture Render Animation Col... \n",
"freq 10974 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"multimodal_data_information_datafram.describe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
import os
import shutil
video_type_dict = {'360VR': 'VR', '4k': '4K', 'Technology': '科技', 'Sport': '运动', 'Timelapse': '延时',
'Aerial': '航拍', 'Animals': '动物', 'Sea': '大海', 'Beach': '海滩', 'space': '太空',
'stars': '星空', 'City': '城市', 'Business': '商业', 'Underwater': '水下摄影',
'Wedding': '婚礼', 'Archival': '档案', 'Backgrounds': '背景', 'Alpha Channel': '透明通道',
'Intro': '开场', 'Celebration': '庆典', 'Clouds': '云彩', 'Corporate': '企业',
'Explosion': '爆炸', 'Film': '电影镜头', 'Green Screen': '绿幕', 'Military': '军事',
'Nature': '自然', 'News': '新闻', 'R3d': 'R3d', 'Romantic': '浪漫', 'Abstract': '抽象'}
def make_fake_data(true_data_root, fake_data_root="./MP4_download", fake_video_number=1):
    """
    Copy part of the original data for research, so the original data is not damaged.
    """
    if not os.path.exists(fake_data_root):
        os.mkdir(fake_data_root)
    video_type_list = list(video_type_dict.keys())
    for multimodal_data_type in video_type_list[:fake_video_number]:
        true_multimodal_a_type_data_dir = os.path.join(true_data_root, multimodal_data_type)
        fake_multimodal_a_type_data_dir = os.path.join(fake_data_root, multimodal_data_type)
        shutil.copytree(true_multimodal_a_type_data_dir, fake_multimodal_a_type_data_dir)


if __name__ == "__main__":
    true_data_root = "/home/b418a/disk1/jupyter_workspace/yuanxiao/douyin/xinpianchang/MP4_download"
    fake_data_root = "/home/b418a/disk1/pycharm_room/yuanxiao/my_lenovo_P50s/Multimodal-short-video-dataset-and-baseline-model/MP4_download"
    fake_video_number = 1
    make_fake_data(true_data_root, fake_data_root, fake_video_number)
import tensorflow as tf


def create_text_baseline_model(txt_maxlen, vocab_size, embedding_dim=100, lstm_units=64, output_dim=50):
    text_model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(txt_maxlen,)),  # shape must be a tuple, not a bare int
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_units, return_sequences=True)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(output_dim, activation='relu')
    ], name='text_baseline_model')
    return text_model
def create_image_baseline_model(image_height, image_width, image_channels=3, output_dim=50):
    image_model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(image_height, image_width, image_channels)),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(output_dim)
    ], name='image_baseline_model')
    return image_model
def create_video_baseline_model(max_video_frame_number, video_height, video_width, video_channels=3, output_dim=50):
    """
    :param input_shape: [video sequence length, video_height, video_width, video_channels]
    :return: 3D convolutional model
    """
    def get_con3d_block(filters=64, kernel_size=(3, 3, 3),
                        strides=(1, 1, 1), padding='same'):
        return tf.keras.layers.Conv3D(filters=filters, kernel_size=kernel_size, strides=strides,
                                      padding=padding, data_format='channels_last',
                                      dilation_rate=(1, 1, 1), activation='relu',
                                      use_bias=True, kernel_initializer='glorot_uniform',
                                      bias_initializer='zeros', kernel_regularizer=None,
                                      bias_regularizer=None, activity_regularizer=None,
                                      kernel_constraint=None, bias_constraint=None)

    def get_maxpooling3d_block(pool_size=(1, 2, 2), strides=(1, 2, 2), padding='same'):
        return tf.keras.layers.MaxPooling3D(pool_size=pool_size, strides=strides,
                                            padding=padding, data_format='channels_last')

    model = tf.keras.models.Sequential(name="video_baseline_model")
    # Input
    model.add(tf.keras.layers.Input(shape=(max_video_frame_number, video_height, video_width, video_channels)))
    # Conv3D + MaxPooling3D
    model.add(get_con3d_block(filters=32, kernel_size=(3, 3, 3)))
    model.add(get_maxpooling3d_block(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(get_con3d_block(filters=32, kernel_size=(3, 3, 3)))
    model.add(get_maxpooling3d_block(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(get_con3d_block(filters=64, kernel_size=(3, 3, 3)))
    model.add(get_maxpooling3d_block(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(get_con3d_block(filters=64, kernel_size=(3, 3, 3)))
    model.add(get_maxpooling3d_block(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(get_con3d_block(filters=128, kernel_size=(3, 3, 3)))
    model.add(get_maxpooling3d_block(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(get_con3d_block(filters=128, kernel_size=(3, 3, 3)))
    model.add(get_maxpooling3d_block(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(get_con3d_block(filters=256, kernel_size=(3, 3, 3)))
    model.add(get_maxpooling3d_block(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(get_con3d_block(filters=256, kernel_size=(3, 3, 3)))
    model.add(get_maxpooling3d_block(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    # Flatten
    model.add(tf.keras.layers.Flatten())
    # FC layers group
    model.add(tf.keras.layers.Dense(256, activation='relu', name='fc6'))
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Dense(128, activation='relu', name='fc7'))
    model.add(tf.keras.layers.Dropout(0.2))
    model.add(tf.keras.layers.Dense(output_dim))
    return model
def create_multimodal_baseline_model(label_number=31, txt_maxlen=20, text_vocab_size=15799, text_embedding_dim=100,
                                     text_lstm_units=64, text_output_dim=50,
                                     image_height=270, image_width=480, image_channels=3, image_output_dim=50,
                                     max_video_frame_number=100, video_height=360, video_width=640, video_channels=3,
                                     video_output_dim=50):
    """
    Multimodal Baseline Model
    Text model parameters:
        [vocab_size, txt_maxlen, text_embedding_dim, text_lstm_units, text_output_dim]
    Image model parameters:
        [image_height, image_width, image_channels, image_output_dim]
    Video model parameters:
        [max_video_frame_number, video_height, video_width, video_channels, video_output_dim]
    label_number
    """
    text_input = tf.keras.layers.Input(shape=(txt_maxlen,), name='text')  # shape must be a tuple
    image_input = tf.keras.layers.Input(shape=(image_height, image_width, image_channels), name='image')
    video_input = tf.keras.layers.Input(shape=(max_video_frame_number, video_height, video_width, video_channels),
                                        name='video')
    text_model = create_text_baseline_model(txt_maxlen, text_vocab_size, text_embedding_dim, text_lstm_units,
                                            text_output_dim)
    image_model = create_image_baseline_model(image_height, image_width, image_channels, image_output_dim)
    video_model = create_video_baseline_model(max_video_frame_number, video_height, video_width, video_channels,
                                              video_output_dim)
    text_feature = text_model(text_input)
    image_feature = image_model(image_input)
    video_feature = video_model(video_input)
    multimodal_feature = tf.keras.layers.concatenate([text_feature, image_feature, video_feature], axis=-1)
    x = tf.keras.layers.Dense(100)(multimodal_feature)
    label_predict = tf.keras.layers.Dense(label_number, activation='softmax', name='label_predict')(x)
    multimodal_baseline_model = tf.keras.Model(inputs=[text_input, image_input, video_input], outputs=[label_predict])
    return multimodal_baseline_model
import tensorflow as tf
from mutimodal_baseline_model import create_text_baseline_model, create_image_baseline_model, \
    create_video_baseline_model, create_multimodal_baseline_model

if __name__ == '__main__':
    shuffle_data = True
    BATCH_SIZE = 5
    REPEAT_DATASET = None
    vocab_size = 15798 + 1  # 1 for unknown
    txt_maxlen = 20
    image_height = 270
    image_width = 480
    image_channels = 3
    max_video_frame_number = 100
    video_height = 360
    video_width = 640
    video_channels = 3
    label_number = 31

    batch_txt_data = tf.random.uniform((BATCH_SIZE, txt_maxlen), 0, vocab_size, dtype=tf.int32)
    print("batch_txt_data.shape", batch_txt_data.shape)
    text_model = create_text_baseline_model(txt_maxlen, vocab_size, embedding_dim=100, lstm_units=64, output_dim=50)
    tf.keras.utils.plot_model(text_model, show_shapes=True, to_file='text_model_baseline_model.png')
    batch_txt_feature = text_model(batch_txt_data)
    print("batch_txt_feature.shape", batch_txt_feature.shape)
    print("")

    batch_image_data = tf.random.normal(shape=(BATCH_SIZE, image_height, image_width, image_channels))
    print("batch_image_data.shape", batch_image_data.shape)
    image_model = create_image_baseline_model(image_height, image_width, image_channels, output_dim=50)
    tf.keras.utils.plot_model(image_model, show_shapes=True, to_file='image_model_baseline_model.png')
    batch_image_feature = image_model(batch_image_data)
    print("batch_image_feature.shape", batch_image_feature.shape)
    print("")

    batch_video_data = tf.random.normal(
        shape=(BATCH_SIZE, max_video_frame_number, video_height, video_width, video_channels))
    print("batch_video_data.shape", batch_video_data.shape)
    video_model = create_video_baseline_model(max_video_frame_number, video_height, video_width, video_channels,
                                              output_dim=50)
    tf.keras.utils.plot_model(video_model, show_shapes=True, to_file='video_model_baseline_model.png')
    batch_video_feature = video_model(batch_video_data)
    print("batch_video_feature.shape", batch_video_feature.shape)
    print("")

    batch_txt_data = tf.random.uniform((BATCH_SIZE, txt_maxlen), 0, vocab_size, dtype=tf.int32)
    batch_image_data = tf.random.normal(shape=(BATCH_SIZE, image_height, image_width, image_channels))
    batch_video_data = tf.random.normal(
        shape=(BATCH_SIZE, max_video_frame_number, video_height, video_width, video_channels))
    multimodal_model = create_multimodal_baseline_model(label_number=label_number, txt_maxlen=txt_maxlen,
                                                        text_vocab_size=vocab_size, text_embedding_dim=100,
                                                        text_lstm_units=64, text_output_dim=50,
                                                        image_height=image_height, image_width=image_width,
                                                        image_channels=image_channels, image_output_dim=50,
                                                        max_video_frame_number=max_video_frame_number,
                                                        video_height=video_height, video_width=video_width,
                                                        video_channels=video_channels, video_output_dim=50)
    tf.keras.utils.plot_model(multimodal_model, show_shapes=True, to_file='multimodal_baseline_model.png')
    multimodal_model_out = multimodal_model([batch_txt_data, batch_image_data, batch_video_data])
    print("multimodal_model_out.shape", multimodal_model_out.shape)
## How to use the download tools
### Requirements
+ Python 3+, e.g. python==3.6
+ scrapy
+ beautifulsoup4
### Running the web crawler
```shell
cd xinpianchang
python start_MP4_meta_info.py
```
### Detailed Configuration
[MP4_meta_info.py](data_download_tools/xinpianchang/xinpianchang/spiders/MP4_meta_info.py)
Lines 7~13: all video types are crawled by default.
```python
video_type_dict = {'360VR': 'VR', '4k': '4K', 'Technology': '科技', 'Sport': '运动', 'Timelapse': '延时',
'Aerial': '航拍', 'Animals': '动物', 'Sea': '大海', 'Beach': '海滩', 'space': '太空',
'stars': '星空', 'City': '城市', 'Business': '商业', 'Underwater': '水下摄影',
'Wedding': '婚礼', 'Archival': '档案', 'Backgrounds': '背景', 'Alpha Channel': '透明通道',
'Intro': '开场', 'Celebration': '庆典', 'Clouds': '云彩', 'Corporate': '企业',
'Explosion': '爆炸', 'Film': '电影镜头', 'Green Screen': '绿幕', 'Military': '军事',
'Nature': '自然', 'News': '新闻', 'R3d': 'R3d', 'Romantic': '浪漫', 'Abstract': '抽象'}
```
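If you only need a few categories, one option (a hypothetical edit, not part of the shipped spider) is to filter this dictionary before the spider iterates over it:

```python
# Hypothetical subset selection: keep only the categories you want to crawl,
# then use the filtered dict in place of video_type_dict in MP4_meta_info.py.
video_type_dict = {'360VR': 'VR', '4k': '4K', 'Technology': '科技', 'Aerial': '航拍'}
wanted = ['4k', 'Aerial']
video_type_subset = {k: v for k, v in video_type_dict.items() if k in wanted}
print(video_type_subset)  # {'4k': '4K', 'Aerial': '航拍'}
```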
Lines 30~35: the smaller `DOWNLOAD_DELAY` is, the faster data is captured (at higher risk of being blocked by the site).
```python
custom_settings = {
'DOWNLOAD_DELAY': 3.5,
'DOWNLOAD_TIMEOUT': 180,
'RANDOMIZE_DOWNLOAD_DELAY': True,
'JOBDIR': "reamin/MP4_meta_info_001"
}
```
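According to Scrapy's documentation, with `RANDOMIZE_DOWNLOAD_DELAY` enabled each wait is drawn uniformly from 0.5x to 1.5x `DOWNLOAD_DELAY`, so the settings above average about 3.5 s per request. A small sketch of that behavior (illustrative only, not Scrapy's internals):

```python
import random

def effective_delay(download_delay=3.5, randomize=True):
    """Approximate Scrapy's per-request wait: uniform in
    [0.5 * DOWNLOAD_DELAY, 1.5 * DOWNLOAD_DELAY] when randomized."""
    if randomize:
        return random.uniform(0.5 * download_delay, 1.5 * download_delay)
    return download_delay

delays = [effective_delay(3.5) for _ in range(1000)]
print(min(delays) >= 1.75 and max(delays) <= 5.25)  # True
```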
import pathlib
import os
import random
import matplotlib.pyplot as plt
def get_video_type(dir_name="MP4_download"):
"""
:param dir_name:
:return: video_type: , example: {
'360VR': 'VR', '4k': '4K', 'Technology': '科技', 'Sport': '运动', 'Timelapse': '延时',
'Aerial': '航拍', 'Animals': '动物', 'Sea': '大海', 'Beach': '海滩', 'space': '太空',
'stars': '星空', 'City': '城市', 'Business': '商业', 'Underwater': '水下摄影',
'Wedding': '婚礼', 'Archival': '档案', 'Backgrounds': '背景', 'Alpha Channel': '透明通道',
'Intro': '开场', 'Celebration': '庆典', 'Clouds': '云彩', 'Corporate': '企业',
'Explosion': '爆炸', 'Film': '电影镜头', 'Green Screen': '绿幕', 'Military': '军事',
'Nature': '自然', 'News': '新闻', 'R3d': 'R3d', 'Romantic': '浪漫', 'Abstract': '抽象'}
"""
    dir_name = pathlib.Path(dir_name)
    # path.name is portable across OSes, unlike splitting on "/"
    video_type = [path.name for path in dir_name.glob('*') if path.is_dir()]
print("Existing Video Types Numbers:\t", len(video_type))
print("Existing Video Types :\t", video_type)
print("")
return video_type
def get_description_information(txt_path):
    """description_information include: {'mp4_id': '', 'mp4_download_url': '', 'mp4_time': '',
    'mp4_background_image_url': '', 'mp4_txt_brief': ''}"""
    import ast  # literal_eval only parses Python literals, safer than eval
    with open(txt_path, encoding="utf-8") as rf:
        description_information_dict = ast.literal_eval(rf.read())
    return description_information_dict
def show_image_and_description_information(image_path, description_information_dict):
lena = plt.imread(image_path)
plt.imshow(lena)
plt.title(description_information_dict["mp4_background_image_url"])
plt.xlabel(description_information_dict["mp4_txt_brief"])
plt.ylabel(description_information_dict["mp4_id"])
plt.xticks([])
plt.yticks([])
# plt.axis('off')
plt.show()
def check_download_file(dir_name="MP4_download", video_type=None, shuffle_data=False,
print_file_path=False, show_txt=False, show_image=False, check_number=None, ):
"""
Check one type of video file downloaded from https://www.xinpianchang.com/
:param dir_name: Root directory for storing data
:param video_type: Check video type, example: {
'360VR': 'VR', '4k': '4K', 'Technology': '科技', 'Sport': '运动', 'Timelapse': '延时',
'Aerial': '航拍', 'Animals': '动物', 'Sea': '大海', 'Beach': '海滩', 'space': '太空',
'stars': '星空', 'City': '城市', 'Business': '商业', 'Underwater': '水下摄影',
'Wedding': '婚礼', 'Archival': '档案', 'Backgrounds': '背景', 'Alpha Channel': '透明通道',
'Intro': '开场', 'Celebration': '庆典', 'Clouds': '云彩', 'Corporate': '企业',
'Explosion': '爆炸', 'Film': '电影镜头', 'Green Screen': '绿幕', 'Military': '军事',
'Nature': '自然', 'News': '新闻', 'R3d': 'R3d', 'Romantic': '浪漫', 'Abstract': '抽象'}
:param shuffle_data: Scrambling data, sampling check
:param print_file_path: Print out all file paths
:param show_txt: Print video meta information
:param show_image: Show video cover image
:param check_number: Number of files to check, None stands for all
:return: Number of various documents (all_item_number, txt_number, image_number, video_number)
"""
    dir_name = pathlib.Path(dir_name)
    if video_type is None:
        raise ValueError("video_type must be specified, e.g. '4k'")
    path_mode = video_type + "/*"
all_item_paths = list(dir_name.glob(path_mode))
all_item_paths = [str(path) for path in all_item_paths]
if shuffle_data:
random.shuffle(all_item_paths)
all_item_number = len(all_item_paths)
txt_number = 0
image_number = 0
video_number = 0
    for idx, item in enumerate(all_item_paths):
        # Reset per-item paths so a missing file is not masked by the previous item's files
        txt_path, image_path, mp4_path = "", "", ""
        item_id = item.split("/")[-1]
        item_type = item.split("/")[1]
for item_file in os.listdir(item):
if item_file.endswith('.txt'):
txt_path = os.path.join(item, item_file)
elif item_file.endswith('.jpeg'):
image_path = os.path.join(item, item_file)
elif item_file.endswith('.mp4'):
mp4_path = os.path.join(item, item_file)
else:
raise ValueError("An abnormal document appeared! check!")
if os.path.exists(txt_path):
description_information_dict = get_description_information(txt_path)
else:
description_information_dict = {'mp4_id': '', 'mp4_download_url': '', 'mp4_time': '',
'mp4_background_image_url': '', 'mp4_txt_brief': ''}
        if os.path.exists(txt_path):
            if print_file_path:
                print(f"exists {txt_path}")
            txt_number += 1
            if show_txt:
                print(open(txt_path, encoding="utf-8").read())
                print("item_type:\t", item_type)
        else:
            if print_file_path:
                print(f"Not exists {txt_path}")
        if os.path.exists(image_path):
            if print_file_path:
                print(f"exists {image_path}")
            image_number += 1
            if show_image:
                show_image_and_description_information(image_path, description_information_dict)
        else:
            if print_file_path:
                print(f"Not exists {image_path}")
if os.path.exists(mp4_path):
if print_file_path:
print(f"exists {mp4_path}")
video_number += 1
else:
if print_file_path:
print(f"Not exists {mp4_path}")
if print_file_path:
print("")
if check_number is not None:
if idx == check_number - 1:
break
count_item_number_list = [all_item_number, txt_number, image_number, video_number]
if len(set(count_item_number_list)) == 1:
print("All documents are complete!")
else:
print("Document missing!")
print("all_item_number:\t", all_item_number)
print("txt_number:\t", txt_number)
print("image_number:\t", image_number)
print("video_number:\t", video_number)
return all_item_number, txt_number, image_number, video_number
def check_all_downloaded_files(dir_name="MP4_download"):
"""Check all files downloaded from https://www.xinpianchang.com/"""
for mp4_type in get_video_type(dir_name=dir_name):
print(f"video_type\t:{mp4_type}")
check_download_file(dir_name=dir_name, video_type=mp4_type, shuffle_data=False, print_file_path=False, show_txt=False, show_image=False, check_number=None)
print(" ")
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.io/en/latest/deploy.html
[settings]
default = xinpianchang.settings
[deploy]
#url = http://localhost:6800/
project = xinpianchang
from scrapy.cmdline import execute
execute(['scrapy', 'crawl', 'MP4'])
from scrapy.cmdline import execute
execute(['scrapy', 'crawl', 'MP4_meta_info'])
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html
import scrapy
class XinpianchangItem(scrapy.Item):
# define the fields for your item here like:
# name = scrapy.Field()
pass
# -*- coding: utf-8 -*-
# Define here the models for your spider middleware
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html
from scrapy import signals
class XinpianchangSpiderMiddleware(object):
# Not all methods need to be defined. If a method is not defined,
# scrapy acts as if the spider middleware does not modify the
# passed objects.
@classmethod
def from_crawler(cls, crawler):
# This method is used by Scrapy to create your spiders.
s = cls()
crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
return s
def process_spider_input(self, response, spider):
# Called for each response that goes through the spider
# middleware and into the spider.
# Should return None or raise an exception.
return None
def process_spider_output(self, response, result, spider):
# Called with the results returned from the Spider, after
# it has processed the response.
# Must return an iterable of Request, dict or Item objects.
for i in result:
yield i
def process_spider_exception(self, response, exception, spider):
# Called when a spider or process_spider_input() method
# (from other spider middleware) raises an exception.
# Should return either None or an iterable of Response, dict
# or Item objects.
pass
def process_start_requests(self, start_requests, spider):
# Called with the start requests of the spider, and works
# similarly to the process_spider_output() method, except
# that it doesn’t have a response associated.
# Must return only requests (not items).
for r in start_requests:
yield r
def spider_opened(self, spider):
spider.logger.info('Spider opened: %s' % spider.name)
class XinpianchangDownloaderMiddleware(object):
# Not all methods need to be defined. If a method is not defined,
# scrapy acts as if the downloader middleware does not modify the
# passed objects.
@classmethod
def from_crawler(cls, crawler):
# This method is used by Scrapy to create your spiders.
s = cls()
crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
return s
def process_request(self, request, spider):
# Called for each request that goes through the downloader
# middleware.
# Must either:
# - return None: continue processing this request
# - or return a Response object
# - or return a Request object
# - or raise IgnoreRequest: process_exception() methods of
# installed downloader middleware will be called
return None
def process_response(self, request, response, spider):
# Called with the response returned from the downloader.
# Must either;
# - return a Response object
# - return a Request object
# - or raise IgnoreRequest
return response
def process_exception(self, request, exception, spider):
# Called when a download handler or a process_request()
# (from other downloader middleware) raises an exception.
# Must either:
# - return None: continue processing this exception
# - return a Response object: stops process_exception() chain
# - return a Request object: stops process_exception() chain
pass
def spider_opened(self, spider):
spider.logger.info('Spider opened: %s' % spider.name)
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
class XinpianchangPipeline(object):
def process_item(self, item, spider):
return item
# -*- coding: utf-8 -*-
# Scrapy settings for xinpianchang project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://doc.scrapy.org/en/latest/topics/settings.html
# https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'xinpianchang'
SPIDER_MODULES = ['xinpianchang.spiders']
NEWSPIDER_MODULE = 'xinpianchang.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'xinpianchang (+http://www.yourdomain.com)'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32
# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16
# Disable cookies (enabled by default)
COOKIES_ENABLED = False
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#}
# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'xinpianchang.middlewares.XinpianchangSpiderMiddleware': 543,
#}
# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
#'xinpianchang.middlewares.XinpianchangDownloaderMiddleware': 543,
'xinpianchang.user_agent.RotateUserAgentMiddleware': 400,
}
# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#}
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
# 'xinpianchang.pipelines.XinpianchangPipeline': 300,
#}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider, Rule
import json
import os
def read_raw_mp4_info_json_file_2_list(mp4_info_file="mp4_info.json"):
    MP4_info_list = list()
    with open(mp4_info_file, 'r', encoding='utf-8') as rf:
        for line in rf:
            line_dict = json.loads(line)
            MP4_info_list.append(line_dict["mp4_download_url"])
    return MP4_info_list
class Mp4Spider(CrawlSpider):
name = 'MP4'
allowed_domains = ['xinpianchang.com']
start_urls = ['https://www.xinpianchang.com/']
custom_settings = {
'DOWNLOAD_DELAY': 2,
'RANDOMIZE_DOWNLOAD_DELAY': True,
}
def start_requests(self):
self.MP4_base_dir = "MP4_download"
task_mp4_type_list = ["4k_url", ]
for mp4_type in task_mp4_type_list:
mp4_type_store_dir = os.path.join(self.MP4_base_dir, mp4_type)
if not os.path.exists(mp4_type_store_dir):
os.makedirs(mp4_type_store_dir)
MP4_info_list = read_raw_mp4_info_json_file_2_list(os.path.join("task_file", mp4_type + ".json"))
for url in MP4_info_list:
yield scrapy.Request(url=url, callback=self.parse_video, meta={'mp4_type': mp4_type},
headers={'Referer': 'https://www.xinpianchang.com/'})
def parse_video(self, response):
meta = response.meta
url = response.url
mp4_type = response.meta["mp4_type"]
file_name = url.split("/")[-1]
mp4_type_store_dir = os.path.join(self.MP4_base_dir, mp4_type)
video_local_path = os.path.join(mp4_type_store_dir, file_name)
with open(video_local_path, "wb") as f:
f.write(response.body)
yield meta
# -*- coding: utf-8 -*-
import scrapy
import random
import os
from bs4 import BeautifulSoup
video_type_dict = {'360VR': 'VR', '4k': '4K', 'Technology': '科技', 'Sport': '运动', 'Timelapse': '延时',
'Aerial': '航拍', 'Animals': '动物', 'Sea': '大海', 'Beach': '海滩', 'space': '太空',
'stars': '星空', 'City': '城市', 'Business': '商业', 'Underwater': '水下摄影',
'Wedding': '婚礼', 'Archival': '档案', 'Backgrounds': '背景', 'Alpha Channel': '透明通道',
'Intro': '开场', 'Celebration': '庆典', 'Clouds': '云彩', 'Corporate': '企业',
'Explosion': '爆炸', 'Film': '电影镜头', 'Green Screen': '绿幕', 'Military': '军事',
'Nature': '自然', 'News': '新闻', 'R3d': 'R3d', 'Romantic': '浪漫', 'Abstract': '抽象'}
def get_page_start_end_by_mp4_type(mp4_type):
# Check https://resource.xinpianchang.com/video/list for the latest information
# The update time is 2019/07/19
if mp4_type in ["360VR"]:
return 1, 18
elif mp4_type in ["Archival"]:
return 1, 170
elif mp4_type in ["R3d"]:
return 1, 264
else:
return 1, 301
class Mp4Spider(scrapy.Spider):
name = 'MP4_meta_info'
start_urls = ['https://www.xinpianchang.com/']
custom_settings = {
'DOWNLOAD_DELAY': 3.5,
'DOWNLOAD_TIMEOUT': 180,
'RANDOMIZE_DOWNLOAD_DELAY': True,
'JOBDIR': "reamin/MP4_meta_info_001"
}
def start_requests(self):
self.MP4_base_dir = "MP4_download"
accessed_url_file = "reamin/accessed_url.txt"
if not os.path.exists(self.MP4_base_dir):
os.mkdir(self.MP4_base_dir)
video_type_list = list(video_type_dict.keys())
random.shuffle(video_type_list)
for mp4_type in video_type_list:
if mp4_type not in ["Explosion", ]:
mp4_type_store_dir = os.path.join(self.MP4_base_dir, mp4_type)
if not os.path.exists(mp4_type_store_dir):
os.makedirs(mp4_type_store_dir)
page_number_start, page_number_end = get_page_start_end_by_mp4_type(mp4_type)
for page_number in range(page_number_start, page_number_end):
mp4_list_page_url = f"https://resource.xinpianchang.com/video/list?cate={mp4_type}&page={page_number}"
yield scrapy.Request(url=mp4_list_page_url, callback=self.parse_video_meta_info,
meta={'mp4_type_store_dir': mp4_type_store_dir},
headers={'Referer': 'https://www.xinpianchang.com/'})
def parse_video_meta_info(self, response):
mp4_type_store_dir = response.meta["mp4_type_store_dir"]
bs = BeautifulSoup(response.body, "html.parser")
for index, item in enumerate(
bs.find_all("li", {"class": {"single-video J_sigle_video", "single-video J_sigle_video detail-more"}})):
mp4_id = item["id"]
mp4_download_url = item['data-preview']
mp4_time = item.find_all("div", class_="single-video-duration")[0].string
mp4_background_image_url = item.find_all("div", class_="thumb-img")[0]["style"][len("background-image:url("):-1]
mp4_txt_brief = item.find_all("p", class_="single-brief J_single_brief")[0].string
mp4_meta_info_dict = {"mp4_id": mp4_id, "mp4_download_url": mp4_download_url, "mp4_time": mp4_time,
"mp4_background_image_url": mp4_background_image_url,
"mp4_txt_brief": mp4_txt_brief}
mp4_meta_info_dir = os.path.join(mp4_type_store_dir, str(mp4_id))
if not os.path.exists(mp4_meta_info_dir):
os.makedirs(mp4_meta_info_dir)
with open(os.path.join(mp4_meta_info_dir, str(mp4_id) + ".txt"), "w", encoding="utf-8") as mp4_meta_wf:
mp4_meta_wf.write(str(mp4_meta_info_dict))
yield scrapy.Request(url=mp4_download_url, callback=self.parse_video,
meta={'mp4_meta_info_dir': mp4_meta_info_dir})
yield scrapy.Request(url=mp4_background_image_url, callback=self.parse_background_image,
meta={'mp4_meta_info_dir': mp4_meta_info_dir})
def parse_video(self, response):
mp4_meta_info_dir = response.meta["mp4_meta_info_dir"]
url = response.url
file_name = url.split("/")[-1]
video_local_path = os.path.join(mp4_meta_info_dir, file_name)
with open(video_local_path, "wb") as f:
f.write(response.body)
def parse_background_image(self, response):
mp4_meta_info_dir = response.meta["mp4_meta_info_dir"]
url = response.url
file_name = url.split("/")[-1]
image_local_path = os.path.join(mp4_meta_info_dir, file_name)
with open(image_local_path, "wb") as f:
f.write(response.body)
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
# coding:utf-8
import random
from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware
class RotateUserAgentMiddleware(UserAgentMiddleware):
'''
for more user agent strings,you can find it in http://www.useragentstring.com/pages/useragentstring.php
'''
user_agent_list = [
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
"(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
"Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
"(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
"(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
"(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
"(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
"(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
"(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
"(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
"(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
"(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
"(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
"(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
"(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
"(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
"(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
"(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
"(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
"(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"
]
def process_request(self, request, spider):
        '''Pick a random User-Agent from the list and set it as the default request header.'''
ua = random.choice(self.user_agent_list)
if ua:
request.headers.setdefault('User-Agent', ua)
request.headers.setdefault('accept-language','zh-CN,zh;q=0.8')
import os
import random
import pathlib
import cv2
import numpy as np
from tensorflow import keras
import tensorflow_datasets as tfds
video_type_dict = {'360VR': 'VR', '4k': '4K', 'Technology': '科技', 'Sport': '运动', 'Timelapse': '延时',
'Aerial': '航拍', 'Animals': '动物', 'Sea': '大海', 'Beach': '海滩', 'space': '太空',
'stars': '星空', 'City': '城市', 'Business': '商业', 'Underwater': '水下摄影',
'Wedding': '婚礼', 'Archival': '档案', 'Backgrounds': '背景', 'Alpha Channel': '透明通道',
'Intro': '开场', 'Celebration': '庆典', 'Clouds': '云彩', 'Corporate': '企业',
'Explosion': '爆炸', 'Film': '电影镜头', 'Green Screen': '绿幕', 'Military': '军事',
'Nature': '自然', 'News': '新闻', 'R3d': 'R3d', 'Romantic': '浪漫', 'Abstract': '抽象'}
video_type_list = ['360VR', '4k', 'Abstract', 'Aerial', 'Alpha Channel', 'Animals', 'Archival', 'Backgrounds', 'Beach',
'Business', 'Celebration', 'City', 'Clouds', 'Corporate', 'Explosion', 'Film', 'Green Screen',
'Intro', 'Military', 'Nature', 'News', 'R3d', 'Romantic', 'Sea', 'Sport', 'Technology', 'Timelapse',
'Underwater', 'Wedding', 'space', 'stars']
video_label_to_id = {'360VR': 0, '4k': 1, 'Abstract': 2, 'Aerial': 3, 'Alpha Channel': 4, 'Animals': 5, 'Archival': 6,
'Backgrounds': 7, 'Beach': 8, 'Business': 9, 'Celebration': 10, 'City': 11, 'Clouds': 12,
'Corporate': 13, 'Explosion': 14, 'Film': 15, 'Green Screen': 16, 'Intro': 17, 'Military': 18,
'Nature': 19, 'News': 20, 'R3d': 21, 'Romantic': 22, 'Sea': 23, 'Sport': 24, 'Technology': 25,
'Timelapse': 26, 'Underwater': 27, 'Wedding': 28, 'space': 29, 'stars': 30}
def standardization_of_file_names(data_root="MP4_download"):
"""
Uniform naming format for each set of data as follows:
multimodal_data_id
        multimodal_data_id.jpeg
multimodal_data_id.mp4
multimodal_data_id.txt
"""
# Get all multimodal data type names
data_root = pathlib.Path(data_root)
label_names_list = sorted(item.name for item in data_root.glob('*/') if item.is_dir())
print(f"data_root contain video type numbers {len(label_names_list)}")
print(f"data_root contain video type {label_names_list}")
# Processing multimodal data sequentially
for label_name in label_names_list:
# Get all folders under a certain type of multimodal data
label_mode = label_name + "/*"
multimodal_data_dir = list(data_root.glob(label_mode))
multimodal_data_dir = [str(path) for path in multimodal_data_dir]
# File name for standardized multimodal data
for multimodal_data_path in multimodal_data_dir:
multimodal_data_id = os.path.basename(multimodal_data_path)
for item_file in os.listdir(multimodal_data_path):
item_file = os.path.join(multimodal_data_path, item_file)
if item_file.endswith('.txt'):
os.rename(item_file, os.path.join(multimodal_data_path, multimodal_data_id + ".txt"))
elif item_file.endswith('.jpeg'):
os.rename(item_file, os.path.join(multimodal_data_path, multimodal_data_id + ".jpeg"))
elif item_file.endswith('.mp4'):
os.rename(item_file, os.path.join(multimodal_data_path, multimodal_data_id + ".mp4"))
elif item_file.endswith('.ipynb_checkpoints'):
pass
else:
raise ValueError("An abnormal document appeared! check!")
def get_filtered_all_multimodal_data_item_file_dir_list(data_root="MP4_download"):
"""
:param data_root: Original file root path
:return: filtered_all_multimodal_data_item_file_dir_list
['data_root/360VR/89422838', 'data_root/360VR/107178375', 'data_root/360VR/67370207']
"""
def delete_incomplete_data(multimodal_data_item_file_dir):
multimodal_data_id = os.path.basename(multimodal_data_item_file_dir)
txt_file_path = os.path.join(multimodal_data_item_file_dir, multimodal_data_id + ".txt")
jpeg_file_path = os.path.join(multimodal_data_item_file_dir, multimodal_data_id + ".jpeg")
mp4_file_path = os.path.join(multimodal_data_item_file_dir, multimodal_data_id + ".mp4")
for file_path in [mp4_file_path, jpeg_file_path, txt_file_path]:
if not os.path.exists(file_path):
return False
return True
# Get all multimodal data type names
data_root = pathlib.Path(data_root)
label_names_list = sorted(item.name for item in data_root.glob('*/') if item.is_dir())
all_multimodal_data_item_file_dir_list = list()
for label_name in label_names_list:
# Get all folders under a certain type of multimodal data
label_mode = label_name + "/*"
multimodal_data_dir = list(data_root.glob(label_mode))
multimodal_data_dir = [str(path) for path in multimodal_data_dir]
all_multimodal_data_item_file_dir_list.extend(multimodal_data_dir)
print("all_multimodal_data_item_file_dir_list length", len(all_multimodal_data_item_file_dir_list))
filtered_all_multimodal_data_item_file_dir_list = list(
filter(delete_incomplete_data, all_multimodal_data_item_file_dir_list))
print("filtered_all_multimodal_data_item_file_dir_list length",
len(filtered_all_multimodal_data_item_file_dir_list))
return filtered_all_multimodal_data_item_file_dir_list
def get_description_information(txt_path):
    """description_information include: {'mp4_id': '', 'mp4_download_url': '', 'mp4_time': '',
    'mp4_background_image_url': '', 'mp4_txt_brief': ''}"""
    import ast  # literal_eval only parses Python literals, safer than eval
    with open(txt_path, encoding="utf-8") as rf:
        description_information_dict = ast.literal_eval(rf.read())
    return description_information_dict
def get_text_list_from_raw_txt_file(data_root="MP4_download"):
"""
Getting mp4_txt_brief text data from the original file
:param data_root: Original file root path
:return: text_list
"""
data_root = pathlib.Path(data_root)
all_txt_data_paths = [str(path) for path in
list(data_root.glob('*/*/*.txt'))] # [MP4_download/360VR/89422838/89422838.txt,...]
text_list = []
    import ast  # literal_eval only parses Python literals, safer than eval
    for text_data_path in all_txt_data_paths:
        description_information_dict = ast.literal_eval(open(text_data_path, encoding="utf-8").read())
        txt_brief = description_information_dict['mp4_txt_brief']
        text_list.append(txt_brief)
return text_list
def tfds_text_encoder_and_word_set(text_list):
"""
TensorFlow dataset encoder
:param text_list:
:return:
"""
tokenizer = tfds.features.text.Tokenizer()
vocabulary_set = set()
for text in text_list:
some_tokens = tokenizer.tokenize(text)
vocabulary_set.update(some_tokens)
vocab_size = len(vocabulary_set)
print("vocab_size", vocab_size)
text_encoder = tfds.features.text.TokenTextEncoder(vocabulary_set)
example_text = 'I am the blogger of Wangjiang Artificial Think Tank.' \
' Welcome to https://yuanxiaosc.github.io./'
encoded_example = text_encoder.encode(example_text)
print("example_text:\t", example_text)
print("encoded_example:\t", encoded_example)
return text_encoder, vocabulary_set
def multimodal_data_path_generator(data_root="MP4_download", shuffle_data=False):
"""
Multimodal Data Path Generator
:param data_root: Original file root path
:param shuffle_data: Disrupt data order
:return:data_path_generator
Usage method:
for mp4_file_path, jpeg_file_path, txt_file_path, label in multimodal_data_path_generator(data_root,
shuffle_data):
print("")
print("mp4_file_path", mp4_file_path)
print("jpeg_file_path", jpeg_file_path)
print("txt_file_path", txt_file_path)
print("label", label)
"""
multimodal_data_item_file_dir_list = get_filtered_all_multimodal_data_item_file_dir_list(data_root)
if shuffle_data:
random.shuffle(multimodal_data_item_file_dir_list)
for item_file_dir in multimodal_data_item_file_dir_list: # data_root/Business/849
multimodal_data_id = os.path.basename(item_file_dir) # 849
label = os.path.basename(os.path.dirname(item_file_dir)) # Business
txt_file_path = os.path.join(item_file_dir, multimodal_data_id + ".txt") # data_root/Business/849/849.txt
jpeg_file_path = os.path.join(item_file_dir, multimodal_data_id + ".jpeg") # data_root/Business/849/849.jpeg
mp4_file_path = os.path.join(item_file_dir, multimodal_data_id + ".mp4") # data_root/Business/849/849.mp4
# yield data_root/Business/849/849.mp4, data_root/Business/849/849.jpeg, data_root/Business/849/849.txt, Business
yield mp4_file_path, jpeg_file_path, txt_file_path, label
def get_multimodal_data_path_list(data_root="MP4_download", shuffle_data=False):
"""
Getting a multimodal data path list
:param data_root: Original file root path
:param shuffle_data: Disrupt data order
:return:
"""
multimodal_data_path_list = [(mp4_file_path, jpeg_file_path, txt_file_path) for
mp4_file_path, jpeg_file_path, txt_file_path, label in
multimodal_data_path_generator(data_root, shuffle_data)]
return multimodal_data_path_list
def multimodal_encode_data_generator(data_root="MP4_download", shuffle_data=False, txt_maxlen=25,
max_video_frame_number=None, video_width=640, video_height=360):
"""
Multimodal Encode Data Generator
:param data_root: Original file root path
:param shuffle_data: Disrupt data order
:param max_video_frame_number: None -> keep all video number, int -> max need video frame number
:return: multimodal_encode_data_generator
Usage method:
for encode_video, image_file_path, encode_txt, encode_label in encode_multimodal_data(fake_data_root,
shuffle_data,
max_video_frame_number):
print("")
print("encode_video.shape", encode_video.shape)
print("image_file_path", image_file_path)
print("encode_txt", encode_txt)
print("encode_label", encode_label)
"""
text_list = get_text_list_from_raw_txt_file(data_root)
text_encoder, vocabulary_set = tfds_text_encoder_and_word_set(text_list)
    def process_video(video_file_path, max_video_frame_number=None, video_width=640, video_height=360):
        video_capture = cv2.VideoCapture(video_file_path)
        success, frame = video_capture.read()
        frame_list = []
        frame_number = 0
        while success:
            if frame is None:
                break
            if isinstance(max_video_frame_number, int) and frame_number == max_video_frame_number:
                break
            resize_image_np = cv2.resize(frame, dsize=(video_width, video_height))
            resize_image_np_expanded = np.expand_dims(resize_image_np, axis=0)
            frame_list.append(resize_image_np_expanded)
            frame_number += 1
            success, frame = video_capture.read()
        video_capture.release()
        if not frame_list:
            # Unreadable or empty video: return an empty frame array instead of
            # crashing in np.concatenate (such clips are dropped by the frame-count filter later)
            return np.empty((0, video_height, video_width, 3), dtype=np.uint8)
        encode_video = np.concatenate(frame_list, axis=0)
        return encode_video
    def process_image_data(label):
        # Despite the name, this maps the label string to its integer class id
        encode_label = video_label_to_id[label]
        return encode_label
def process_txt_data(txt_file_path, txt_maxlen=25):
description_information_dict = get_description_information(txt_file_path)
encode_txt = text_encoder.encode(description_information_dict['mp4_txt_brief'])
encode_txt = keras.preprocessing.sequence.pad_sequences(
[encode_txt], maxlen=txt_maxlen, dtype='int32', padding='post', truncating='post', value=0.0)
return encode_txt[0]
for mp4_file_path, jpeg_file_path, txt_file_path, label in multimodal_data_path_generator(data_root, shuffle_data):
encode_video = process_video(mp4_file_path, max_video_frame_number, video_width, video_height)
image_file_path = jpeg_file_path
encode_label = process_image_data(label)
encode_txt = process_txt_data(txt_file_path, txt_maxlen)
yield encode_video, image_file_path, encode_txt, encode_label
if __name__ == "__main__":
data_root = "/home/b418a/disk1/jupyter_workspace/yuanxiao/douyin/xinpianchang/MP4_download"
fake_data_root = "/home/b418a/disk1/pycharm_room/yuanxiao/my_lenovo_P50s/Multimodal-short-video-dataset-and-baseline-model/MP4_download"
    standardized_file_name = False  # Only needs to be executed once: standardizes the file names of the originally downloaded data
shuffle_data = True
txt_maxlen = 25
max_video_frame_number = 100
video_height = 360
video_width = 640
if standardized_file_name:
standardization_of_file_names(data_root)
for mp4_file_path, jpeg_file_path, txt_file_path, label in multimodal_data_path_generator(fake_data_root,
shuffle_data):
print("")
print("mp4_file_path", mp4_file_path)
print("jpeg_file_path", jpeg_file_path)
print("txt_file_path", txt_file_path)
print("label", label)
for encode_video, image_file_path, encode_txt, encode_label in multimodal_encode_data_generator(fake_data_root,
shuffle_data,
txt_maxlen,
max_video_frame_number,
video_width,
video_height):
print("")
print("encode_video.shape", encode_video.shape)
print("image_file_path", image_file_path)
print("encode_txt.shape", encode_txt.shape)
print("encode_txt", encode_txt)
print("encode_label", encode_label)
from tensorflow_dataset_interface import multimodel_numpy_data_interface
if __name__=="__main__":
data_root = "/home/b418a/disk1/jupyter_workspace/yuanxiao/douyin/xinpianchang/MP4_download"
fake_data_root = "/home/b418a/disk1/pycharm_room/yuanxiao/my_lenovo_P50s/Multimodal-short-video-dataset-and-baseline-model/MP4_download"
shuffle_data = True
BATCH_SIZE = 5
REPEAT_DATASET = None
txt_maxlen = 20
image_height = 270
image_width = 480
max_video_frame_number = 100
video_height = 360
video_width = 640
numpy_generator = multimodel_numpy_data_interface(fake_data_root, shuffle_data, BATCH_SIZE, REPEAT_DATASET,
txt_maxlen, image_height, image_width,
max_video_frame_number, video_height, video_width)
for encode_video, encode_image, encoded_text, encode_label in numpy_generator:
print("")
print("encode_video", encode_video.shape, encode_video.dtype)
print("encode_image", encode_image.shape, encode_image.dtype)
print("encoded_text", encoded_text.shape, encoded_text.dtype)
print("encode_label", encode_label.shape, encode_label.dtype)
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
import numpy as np
import pathlib
from dataset_public_interface import multimodal_encode_data_generator
def multimodal_tensorflow_dataset(data_root, shuffle_data=False, BATCH_SIZE=100, REPEAT_DATASET=None,
txt_maxlen=20, image_height=270, image_width=480,
max_video_frame_number=100, video_height=360, video_width=640):
"""
multimodal tensorflow dataset
:param data_root: Original file root path
Usage method:
multimodal_dataset = multimodal_tensorflow_dataset(fake_data_root, shuffle_data, BATCH_SIZE, REPEAT_DATASET,
txt_maxlen, image_height, image_width,
max_video_frame_number, video_height, video_width)
i = 0
for encode_video, image, encoded_text, encode_label in multimodal_dataset:
print(f"{i}")
print(encode_video.shape, encode_video.dtype)
print(image.shape, image.dtype)
print(encoded_text.shape, encoded_text.dtype)
print(encode_label.shape, encode_label.dtype)
i += 1
"""
def filter_video_data(encode_video, image_file_path, encoded_text, encode_label):
"""
Filtered video is not equal to the specified(max_video_frame_number) number of frames
"""
video_frame_number = tf.shape(encode_video)[0]
return tf.math.equal(video_frame_number, max_video_frame_number)
def parser_multimodal_data(encode_video, image_file_path, encoded_text, encode_label):
def parser_image_data(jpeg_file_path):
"""
Read the picture data and specify the value in the [-1,1] range
"""
image = tf.io.read_file(jpeg_file_path)
image = tf.image.decode_jpeg(image)
image = tf.image.resize(image, [image_height, image_width])
image = tf.cast(image, dtype=tf.float32)
image = (image / 127.5) - 1.0
return image
image = parser_image_data(image_file_path)
return encode_video, image, encoded_text, encode_label
multimodal_dataset = tf.data.Dataset.from_generator(
lambda: multimodal_encode_data_generator(data_root, shuffle_data, txt_maxlen,
max_video_frame_number, video_width, video_height),
output_shapes=(tf.TensorShape([None, video_height, video_width, 3]),
tf.TensorShape(None), tf.TensorShape(txt_maxlen), tf.TensorShape(())),
output_types=(tf.float32, tf.string, tf.int32, tf.int32))
multimodal_dataset = multimodal_dataset.map(parser_multimodal_data,
num_parallel_calls=tf.data.experimental.AUTOTUNE)
multimodal_dataset = multimodal_dataset.filter(filter_video_data)
multimodal_dataset = multimodal_dataset.repeat(REPEAT_DATASET)
multimodal_dataset = multimodal_dataset.batch(BATCH_SIZE)
multimodal_dataset = multimodal_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
return multimodal_dataset
def multimodel_numpy_data_interface(data_root, shuffle_data=False, BATCH_SIZE=100, REPEAT_DATASET=None,
txt_maxlen=20, image_height=270, image_width=480,
max_video_frame_number=100, video_height=360, video_width=640):
multimodal_dataset = multimodal_tensorflow_dataset(data_root, shuffle_data, BATCH_SIZE, REPEAT_DATASET,
txt_maxlen, image_height, image_width,
max_video_frame_number, video_height, video_width)
for encode_video, encode_image, encoded_text, encode_label in multimodal_dataset:
yield encode_video.numpy(), encode_image.numpy(), encoded_text.numpy(), encode_label.numpy()
if __name__ == "__main__":
data_root = "/home/b418a/disk1/jupyter_workspace/yuanxiao/douyin/xinpianchang/MP4_download"
fake_data_root = "/home/b418a/disk1/pycharm_room/yuanxiao/my_lenovo_P50s/Multimodal-short-video-dataset-and-baseline-model/MP4_download"
shuffle_data = True
BATCH_SIZE = 16
REPEAT_DATASET = None
txt_maxlen = 20
image_height = 270
image_width = 480
max_video_frame_number = 100
video_height = 360
video_width = 640
multimodal_dataset = multimodal_tensorflow_dataset(fake_data_root, shuffle_data, BATCH_SIZE, REPEAT_DATASET,
txt_maxlen, image_height, image_width,
max_video_frame_number, video_height, video_width)
i = 0
for encode_video, image, encoded_text, encode_label in multimodal_dataset:
print(f"{i}")
print(encode_video.shape, encode_video.dtype)
print(image.shape, image.dtype)
print(encoded_text.shape, encoded_text.dtype)
print(encode_label.shape, encode_label.dtype)
i += 1
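`filter_video_data` keeps only clips that decoded to exactly `max_video_frame_number` frames, so shorter (or unreadable) videos never reach batching. A pure-Python sketch of that predicate applied over a stream of frame counts (the `keep_full_length` helper is illustrative, not part of the repo):

```python
def keep_full_length(frame_counts, max_video_frame_number=100):
    """Sketch of the filter_video_data predicate over a stream of clips:
    only clips whose decoded frame count equals max_video_frame_number survive."""
    return [n for n in frame_counts if n == max_video_frame_number]

print(keep_full_length([100, 37, 100, 99]))  # → [100, 100]
```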
{'mp4_id': '99945958', 'mp4_download_url': 'https://p5-v1.xpccdn.com/099945958_main_xl.mp4', 'mp4_time': '0:15', 'mp4_background_image_url': 'https://p5-i1.xpccdn.com/099945958_iconl.jpeg', 'mp4_txt_brief': ' Hong Kong Circa 2017, City B-Roll'}
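Each `<id>.txt` file stores a Python-dict literal like the sample above. One safe way to parse it is `ast.literal_eval`, which evaluates literals only and executes no code; this is a sketch and may differ from the repo's own `get_description_information`:

```python
import ast

# The sample record shown above, as it appears in a downloaded .txt file
sample = ("{'mp4_id': '99945958', 'mp4_download_url': "
          "'https://p5-v1.xpccdn.com/099945958_main_xl.mp4', 'mp4_time': '0:15', "
          "'mp4_background_image_url': 'https://p5-i1.xpccdn.com/099945958_iconl.jpeg', "
          "'mp4_txt_brief': ' Hong Kong Circa 2017, City B-Roll'}")

info = ast.literal_eval(sample)  # parses literals only, no code execution
print(info['mp4_id'], info['mp4_txt_brief'].strip())
```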
import tensorflow as tf
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "data_interface_for_model")))
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "baseline_model")))
from baseline_model.mutimodal_baseline_model import create_multimodal_baseline_model
from data_interface_for_model.tensorflow_dataset_interface import multimodal_tensorflow_dataset
def train_multimodal_model_main(data_root, train_dataset_numbers, EPOCHS, LEARN_RATE,
checkpoint_path, shuffle_data, BATCH_SIZE, REPEAT_DATASET,
vocab_size, txt_maxlen,
image_height, image_width, image_channels,
max_video_frame_number, video_height, video_width, video_channels,
label_number):
"""
Training Multimodal Baseline Model
Control training and data parameters:
[data_root, train_dataset_numbers, EPOCHS, LEARN_RATE,
checkpoint_path, shuffle_data, BATCH_SIZE, REPEAT_DATASET,]
Text model parameters:
[ vocab_size, txt_maxlen,]
Image model parameters:
[image_height, image_width, image_channels,]
Video model parameters:
[max_video_frame_number, video_height, video_width, video_channels,]
label_number
"""
multimodal_dataset = multimodal_tensorflow_dataset(data_root, shuffle_data, BATCH_SIZE, REPEAT_DATASET,
txt_maxlen, image_height, image_width,
max_video_frame_number, video_height, video_width)
# Create callback
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path, save_weights_only=True,
verbose=1, save_freq='epoch')
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=checkpoint_path)
callback_list = [checkpoint_callback, tensorboard_callback]
STEPS_PER_EPOCH = train_dataset_numbers // BATCH_SIZE
multimodal_model = create_multimodal_baseline_model(label_number=label_number, txt_maxlen=txt_maxlen,
text_vocab_size=vocab_size, text_embedding_dim=100,
text_lstm_units=64, text_output_dim=50,
image_height=image_height, image_width=image_width,
image_channels=image_channels, image_output_dim=50,
max_video_frame_number=max_video_frame_number,
video_height=video_height, video_width=video_width,
video_channels=video_channels, video_output_dim=50)
    # encode_label is an integer class id (not one-hot), so use the sparse loss/metric
    multimodal_model.compile(optimizer=tf.keras.optimizers.Adam(LEARN_RATE),
                             loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                             metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
multimodal_model.fit(multimodal_dataset, epochs=EPOCHS,
steps_per_epoch=STEPS_PER_EPOCH, callbacks=callback_list)
if __name__ == "__main__":
data_root = "/home/b418a/disk1/jupyter_workspace/yuanxiao/douyin/xinpianchang/MP4_download"
train_dataset_numbers = 500000
EPOCHS = 200
LEARN_RATE = 0.001
checkpoint_path = "./keras_checkpoints/train"
shuffle_data = True
BATCH_SIZE = 64
REPEAT_DATASET = None
vocab_size = 15798 + 1 # 1 for unknown
txt_maxlen = 20
image_height = 270
image_width = 480
image_channels = 3
max_video_frame_number = 100
video_height = 360
video_width = 640
video_channels = 3
label_number = 31
train_multimodal_model_main(data_root, train_dataset_numbers, EPOCHS, LEARN_RATE, checkpoint_path,
shuffle_data, BATCH_SIZE, REPEAT_DATASET, vocab_size, txt_maxlen,
image_height, image_width, image_channels, max_video_frame_number,
video_height, video_width, video_channels, label_number)