Commit ed437dbe authored by Bubbliiiing

create code
# ignore map, miou, datasets
map_out/
miou_out/
VOCdevkit/
datasets/
Medical_Datasets/
lfw/
logs/
model_data/
.temp_map_out/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
This diff is collapsed.
## YOLOV5: A PyTorch implementation of the You Only Look Once object detection model (edition v6.1 in Ultralytics)
---
## Contents
1. [Top News](#top-news)
2. [Related repositories](#related-repositories)
3. [Performance](#performance)
4. [Environment](#environment)
5. [Downloads](#downloads)
6. [How to train](#how-to-train)
7. [How to predict](#how-to-predict)
8. [How to evaluate](#how-to-evaluate)
9. [Reference](#reference)
## Top News
**`2022-05`**: **Repository created. Supports training models of different sizes (the n, s, m, l, and x versions of YOLOv5), step and cosine learning-rate schedules, a choice of Adam or SGD optimizers, learning rates that adapt to batch_size, image cropping, multi-GPU training, per-class object counting, heatmap visualization, and EMA.**
## Related repositories
| Model | Repository |
| :----- | :----- |
| YoloV3 | https://github.com/bubbliiiing/yolo3-pytorch |
| Efficientnet-Yolo3 | https://github.com/bubbliiiing/efficientnet-yolo3-pytorch |
| YoloV4 | https://github.com/bubbliiiing/yolov4-pytorch |
| YoloV4-tiny | https://github.com/bubbliiiing/yolov4-tiny-pytorch |
| Mobilenet-Yolov4 | https://github.com/bubbliiiing/mobilenet-yolov4-pytorch |
| YoloV5-V5.0 | https://github.com/bubbliiiing/yolov5-pytorch |
| YoloV5-V6.1 | https://github.com/bubbliiiing/yolov5-v6.1-pytorch |
| YoloX | https://github.com/bubbliiiing/yolox-pytorch |
## Performance
| Training dataset | Weight file | Test dataset | Input image size | mAP 0.5:0.95 | mAP 0.5 |
| :-----: | :-----: | :------: | :------: | :------: | :-----: |
| COCO-Train2017 | [yolov5_n_v6.1.pth](https://github.com/bubbliiiing/yolov5-v6.1-pytorch/releases/download/v1.0/yolov5_n_v6.1.pth) | COCO-Val2017 | 640x640 | 27.6 | 45.0
| COCO-Train2017 | [yolov5_s_v6.1.pth](https://github.com/bubbliiiing/yolov5-v6.1-pytorch/releases/download/v1.0/yolov5_s_v6.1.pth) | COCO-Val2017 | 640x640 | 37.0 | 56.2
| COCO-Train2017 | [yolov5_m_v6.1.pth](https://github.com/bubbliiiing/yolov5-v6.1-pytorch/releases/download/v1.0/yolov5_m_v6.1.pth) | COCO-Val2017 | 640x640 | 44.7 | 63.4
| COCO-Train2017 | [yolov5_l_v6.1.pth](https://github.com/bubbliiiing/yolov5-v6.1-pytorch/releases/download/v1.0/yolov5_l_v6.1.pth) | COCO-Val2017 | 640x640 | 48.4 | 66.6
| COCO-Train2017 | [yolov5_x_v6.1.pth](https://github.com/bubbliiiing/yolov5-v6.1-pytorch/releases/download/v1.0/yolov5_x_v6.1.pth) | COCO-Val2017 | 640x640 | 50.1 | 68.3
## Environment
torch==1.2.0
To use AMP mixed precision, torch 1.7.1 or later is recommended.
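As a minimal sketch of what AMP enables (assuming torch >= 1.7.1 and a CUDA device; `model`, `compute_loss`, `optimizer`, and `dataloader` are placeholder names for illustration, not objects from this repo):
```python
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()
for images, targets in dataloader:
    optimizer.zero_grad()
    with autocast():                 # run the forward pass in float16 where safe
        loss = compute_loss(model(images), targets)
    scaler.scale(loss).backward()    # scale the loss to avoid float16 underflow
    scaler.step(optimizer)
    scaler.update()
```
This is the same pattern this repo's fit_one_epoch uses when fp16 is enabled.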
## Downloads
The weights required for training can be downloaded from Baidu Netdisk.
Link: https://pan.baidu.com/s/1oNl_9Bp6jjYFbGLnELbcjQ
Extraction code: 2dr9
The VOC dataset can be downloaded below; it already contains the training, test, and validation sets (the validation set is identical to the test set), so no further splitting is needed:
Link: https://pan.baidu.com/s/19Mw2u_df_nBzsC2lg20fQA
Extraction code: j5ge
## How to train
### a. Training on the VOC07+12 dataset
1. Dataset preparation
**This repo trains on VOC-format data. Download the VOC07+12 dataset before training and unzip it in the root directory.**
2. Dataset processing
Set annotation_mode=2 in voc_annotation.py, then run voc_annotation.py to generate 2007_train.txt and 2007_val.txt in the root directory.
3. Start network training
The default parameters of train.py are set up for the VOC dataset, so simply run train.py to start training.
4. Predicting with the training results
Prediction requires two files, yolo.py and predict.py. First modify model_path and classes_path in yolo.py; these two parameters must be changed.
**model_path points to the trained weight file in the logs folder.
classes_path points to the txt file of the detection classes.**
After making these changes, run predict.py for detection; enter an image path to detect it.
### b. Training on your own dataset
1. Dataset preparation
**This repo trains on VOC-format data, so prepare your own dataset in that format before training.**
Before training, put the annotation files into Annotations under VOCdevkit/VOC2007.
Before training, put the image files into JPEGImages under VOCdevkit/VOC2007.
2. Dataset processing
After arranging the dataset, use voc_annotation.py to generate the 2007_train.txt and 2007_val.txt used for training (an example of the generated line format follows these steps).
Modify the parameters in voc_annotation.py. For a first training run you can change only classes_path, which points to the txt file of the detection classes.
When training on your own dataset, create a cls_classes.txt listing the classes you want to distinguish.
The content of the model_data/cls_classes.txt file is:
```python
cat
dog
...
```
Set classes_path in voc_annotation.py to point to cls_classes.txt, then run voc_annotation.py.
3. Start network training
**There are many training parameters, all in train.py; read the comments carefully after downloading the repo. The most important one is still classes_path in train.py.**
**classes_path points to the txt file of the detection classes, the same txt as the one used by voc_annotation.py! It must be changed when training on your own dataset!**
After modifying classes_path, run train.py to start training; after several epochs the weights are written to the logs folder.
4. Predicting with the training results
Prediction requires two files, yolo.py and predict.py. Modify model_path and classes_path in yolo.py.
**model_path points to the trained weight file in the logs folder.
classes_path points to the txt file of the detection classes.**
After making these changes, run predict.py for detection; enter an image path to detect it.
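Each line of the generated 2007_train.txt and 2007_val.txt holds an image path followed by one `x_min,y_min,x_max,y_max,class_id` group per object, which is how the training and evaluation code parses annotation lines (the path and numbers below are illustrative):
```python
VOCdevkit/VOC2007/JPEGImages/000001.jpg 48,240,195,371,11 8,12,352,498,14
```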
## How to predict
### a. Using pretrained weights
1. After downloading and unzipping the repo, download the weights from Baidu Netdisk, put them into model_data, run predict.py, and enter
```python
img/street.jpg
```
2. Settings inside predict.py enable FPS testing and video detection.
### b. Using your own trained weights
1. Train following the training steps.
2. In yolo.py, modify model_path and classes_path in the section below so that they match the trained files; **model_path corresponds to the weight file under the logs folder, and classes_path corresponds to the classes that model_path was trained on.**
```python
_defaults = {
#--------------------------------------------------------------------------#
#   To predict with your own trained model you must modify model_path and classes_path!
#   model_path points to the weight file under the logs folder; classes_path points to the txt under model_data
#
#   After training, several weight files exist under the logs folder; pick one with a low validation loss.
#   A low validation loss does not guarantee a high mAP; it only means those weights generalize well on the validation set.
#   If a shape mismatch occurs, also check the model_path and classes_path parameters used during training
#--------------------------------------------------------------------------#
"model_path" : 'model_data/yolov5_s_v6.1.pth',
"classes_path" : 'model_data/coco_classes.txt',
#---------------------------------------------------------------------#
#   anchors_path is the txt file of the anchors and is generally not modified.
#   anchors_mask helps the code find the corresponding anchors and is generally not modified.
#---------------------------------------------------------------------#
"anchors_path" : 'model_data/yolo_anchors.txt',
"anchors_mask" : [[6, 7, 8], [3, 4, 5], [0, 1, 2]],
#---------------------------------------------------------------------#
#   Input image size; must be a multiple of 32.
#---------------------------------------------------------------------#
"input_shape" : [640, 640],
#------------------------------------------------------#
#   The YoloV5 version to use: s, m, l, x
#------------------------------------------------------#
"phi" : 's',
#---------------------------------------------------------------------#
#   Only prediction boxes whose score exceeds this confidence are kept
#---------------------------------------------------------------------#
"confidence" : 0.5,
#---------------------------------------------------------------------#
#   The nms_iou value used for non-maximum suppression
#---------------------------------------------------------------------#
"nms_iou" : 0.3,
#---------------------------------------------------------------------#
#   Controls whether letterbox_image is used for distortion-free resizing of the input image;
#   repeated tests found that disabling letterbox_image and resizing directly works better
#---------------------------------------------------------------------#
"letterbox_image" : True,
#-------------------------------#
#   Whether to use CUDA
#   Set this to False if there is no GPU
#-------------------------------#
"cuda" : True,
}
```
3. Run predict.py and enter
```python
img/street.jpg
```
4. Settings inside predict.py enable FPS testing and video detection.
## How to evaluate
### a. Evaluating the VOC07+12 test set
1. This repo evaluates on VOC-format data. VOC07+12 already comes with a test split, so there is no need to use voc_annotation.py to generate the txt files under the ImageSets folder.
2. Modify model_path and classes_path in yolo.py. **model_path points to the trained weight file in the logs folder. classes_path points to the txt file of the detection classes.**
3. Run get_map.py to obtain the evaluation results, which are saved in the map_out folder.
### b. Evaluating your own dataset
1. This repo evaluates on VOC-format data.
2. If you already ran voc_annotation.py before training, the code automatically splits the dataset into training, validation, and test sets. To change the proportion of the test set, modify trainval_percent in voc_annotation.py. trainval_percent sets the ratio of (training set + validation set) to the test set; by default (training set + validation set) : test set = 9:1. train_percent sets the ratio of the training set to the validation set within (training set + validation set); by default training set : validation set = 9:1. (A worked example follows this list.)
3. After splitting the test set with voc_annotation.py, go to get_map.py and modify classes_path; it points to the txt file of the detection classes, the same txt as the one used for training. It must be changed when evaluating your own dataset.
4. Modify model_path and classes_path in yolo.py. **model_path points to the trained weight file in the logs folder. classes_path points to the txt file of the detection classes.**
5. Run get_map.py to obtain the evaluation results, which are saved in the map_out folder.
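For example, with 1000 annotated images and the default ratios, (training set + validation set) : test set = 9:1 leaves 900 trainval images and 100 test images, and training set : validation set = 9:1 within trainval leaves 810 training images and 90 validation images.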
## Reference
https://github.com/qqwweee/keras-yolo3/
https://github.com/Cartucho/mAP
https://github.com/Ma-Dan/keras-yolo4
https://github.com/ultralytics/yolov5
Stores the annotation files
Stores the training index files
Stores the image files
import os
import xml.etree.ElementTree as ET
from PIL import Image
from tqdm import tqdm
from utils.utils import get_classes
from utils.utils_map import get_coco_map, get_map
from yolo import YOLO
if __name__ == "__main__":
'''
Unlike AP, Recall and Precision are not area-based concepts, so the network's Recall and Precision values differ as the threshold (Confidence) changes.
By default, the Recall and Precision computed by this code correspond to a threshold (Confidence) of 0.5.

Because of how mAP is computed, the network must produce nearly all of its prediction boxes so that Recall and Precision can be evaluated under different threshold conditions.
Therefore, the txt files under map_out/detection-results/ produced by this code generally contain more boxes than a direct predict run; the goal is to list every possible prediction box.
'''
#------------------------------------------------------------------------------------------------------------------#
#   map_mode specifies what this file computes when run
#   map_mode 0 runs the whole map pipeline: obtaining predictions, obtaining ground truth, and computing VOC_map.
#   map_mode 1 only obtains the prediction results.
#   map_mode 2 only obtains the ground-truth boxes.
#   map_mode 3 only computes VOC_map.
#   map_mode 4 uses the COCO toolbox to compute the 0.50:0.95 map of the current dataset. Predictions and ground truth must be obtained first, and pycocotools must be installed.
#-------------------------------------------------------------------------------------------------------------------#
map_mode = 0
#--------------------------------------------------------------------------------------#
#   classes_path here specifies the classes for which VOC_map is measured
#   Normally it should match the classes_path used for training and prediction
#--------------------------------------------------------------------------------------#
classes_path = 'model_data/voc_classes.txt'
#--------------------------------------------------------------------------------------#
#   MINOVERLAP specifies the desired mAP0.x; look up the meaning of mAP0.x if unfamiliar.
#   For example, to compute mAP0.75, set MINOVERLAP = 0.75.
#
#   A prediction box is considered a positive sample when its overlap with a ground-truth box exceeds MINOVERLAP; otherwise it is a negative sample.
#   The larger MINOVERLAP is, the more accurately a prediction box must be placed to count as a positive sample, and the lower the resulting mAP.
#--------------------------------------------------------------------------------------#
MINOVERLAP = 0.5
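# Illustration: with MINOVERLAP = 0.5, a prediction whose IoU with its matched ground-truth
# box is 0.6 counts as a true positive; with MINOVERLAP = 0.75 that same box would count
# as a false positive.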
#--------------------------------------------------------------------------------------#
#   Because of how mAP is computed, the network must produce nearly all of its prediction boxes,
#   so confidence should be set as small as possible to obtain every possible prediction box.
#
#   This value is generally not tuned. Since computing mAP needs nearly all prediction boxes,
#   the confidence here must not be changed casually.
#   To obtain Recall and Precision at different thresholds, modify score_threhold below.
#--------------------------------------------------------------------------------------#
confidence = 0.001
#--------------------------------------------------------------------------------------#
#   The non-maximum-suppression value used at prediction time; larger means less strict NMS.
#
#   This value is generally not tuned.
#--------------------------------------------------------------------------------------#
nms_iou = 0.5
#---------------------------------------------------------------------------------------------------------------#
#   Unlike AP, Recall and Precision are not area-based concepts, so their values differ at different thresholds.
#
#   By default, the Recall and Precision computed here correspond to a threshold of 0.5 (defined here as score_threhold).
#   Because computing mAP needs nearly all prediction boxes, the confidence defined above must not be changed casually.
#   A dedicated score_threhold is defined here to represent the threshold, so that the Recall and Precision at that threshold can be found while computing mAP.
#---------------------------------------------------------------------------------------------------------------#
score_threhold = 0.5
#-------------------------------------------------------#
#   map_vis specifies whether to enable visualization of the VOC_map computation
#-------------------------------------------------------#
map_vis = False
#-------------------------------------------------------#
#   Points to the folder containing the VOC dataset
#   Defaults to the VOC dataset in the root directory
#-------------------------------------------------------#
VOCdevkit_path = 'VOCdevkit'
#-------------------------------------------------------#
#   The output folder for results; defaults to map_out
#-------------------------------------------------------#
map_out_path = 'map_out'
image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Main/test.txt")).read().strip().split()
if not os.path.exists(map_out_path):
os.makedirs(map_out_path)
if not os.path.exists(os.path.join(map_out_path, 'ground-truth')):
os.makedirs(os.path.join(map_out_path, 'ground-truth'))
if not os.path.exists(os.path.join(map_out_path, 'detection-results')):
os.makedirs(os.path.join(map_out_path, 'detection-results'))
if not os.path.exists(os.path.join(map_out_path, 'images-optional')):
os.makedirs(os.path.join(map_out_path, 'images-optional'))
class_names, _ = get_classes(classes_path)
if map_mode == 0 or map_mode == 1:
print("Load model.")
yolo = YOLO(confidence = confidence, nms_iou = nms_iou)
print("Load model done.")
print("Get predict result.")
for image_id in tqdm(image_ids):
image_path = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg")
image = Image.open(image_path)
if map_vis:
image.save(os.path.join(map_out_path, "images-optional/" + image_id + ".jpg"))
yolo.get_map_txt(image_id, image, class_names, map_out_path)
print("Get predict result done.")
if map_mode == 0 or map_mode == 2:
print("Get ground truth result.")
for image_id in tqdm(image_ids):
with open(os.path.join(map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f:
root = ET.parse(os.path.join(VOCdevkit_path, "VOC2007/Annotations/"+image_id+".xml")).getroot()
for obj in root.findall('object'):
difficult_flag = False
if obj.find('difficult')!=None:
difficult = obj.find('difficult').text
if int(difficult)==1:
difficult_flag = True
obj_name = obj.find('name').text
if obj_name not in class_names:
continue
bndbox = obj.find('bndbox')
left = bndbox.find('xmin').text
top = bndbox.find('ymin').text
right = bndbox.find('xmax').text
bottom = bndbox.find('ymax').text
if difficult_flag:
new_f.write("%s %s %s %s %s difficult\n" % (obj_name, left, top, right, bottom))
else:
new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
print("Get ground truth result done.")
if map_mode == 0 or map_mode == 3:
print("Get map.")
get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path)
print("Get map done.")
if map_mode == 4:
print("Get map.")
get_coco_map(class_names = class_names, path = map_out_path)
print("Get map done.")
#-------------------------------------------------------------------------------------------------------#
#   Although k-means clusters the boxes in the dataset, many datasets have boxes of similar size,
#   so the nine clustered boxes differ little from one another; such anchors can actually hurt training.
#   Different feature layers suit anchors of different sizes: the smaller the feature map, the larger the anchors it suits.
#   The original network's anchors are already distributed across large, medium, and small scales,
#   so skipping clustering still gives very good results.
#-------------------------------------------------------------------------------------------------------#
import glob
import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm
def cas_ratio(box,cluster):
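# Compare box and cluster widths/heights in both directions and keep the worst-case
# ratio per cluster: 1.0 means an identical shape, larger values mean a worse match.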
ratios_of_box_cluster = box / cluster
ratios_of_cluster_box = cluster / box
ratios = np.concatenate([ratios_of_box_cluster, ratios_of_cluster_box], axis = -1)
return np.max(ratios, -1)
def avg_ratio(box,cluster):
return np.mean([np.min(cas_ratio(box[i],cluster)) for i in range(box.shape[0])])
def kmeans(box,k):
#-------------------------------------------------------------#
#   The total number of boxes
#-------------------------------------------------------------#
row = box.shape[0]
#-------------------------------------------------------------#
#   The distance from each box to each cluster
#-------------------------------------------------------------#
distance = np.empty((row,k))
#-------------------------------------------------------------#
#   The final cluster assignment
#-------------------------------------------------------------#
last_clu = np.zeros((row,))
np.random.seed()
#-------------------------------------------------------------#
#   Randomly choose k boxes as the initial cluster centers
#-------------------------------------------------------------#
cluster = box[np.random.choice(row,k,replace = False)]
iter = 0
while True:
#-------------------------------------------------------------#
#   Compute the width/height ratios between the current boxes and the anchors
#-------------------------------------------------------------#
for i in range(row):
distance[i] = cas_ratio(box[i],cluster)
#-------------------------------------------------------------#
#   Assign each box to its nearest cluster
#-------------------------------------------------------------#
near = np.argmin(distance,axis=1)
if (last_clu == near).all():
break
#-------------------------------------------------------------#
#   Take the median point of each cluster
#-------------------------------------------------------------#
for j in range(k):
cluster[j] = np.median(
box[near == j],axis=0)
last_clu = near
if iter % 5 == 0:
print('iter: {:d}. avg_ratio:{:.2f}'.format(iter, avg_ratio(box,cluster)))
iter += 1
return cluster, near
def load_data(path):
data = []
#-------------------------------------------------------------#
#   Look for boxes in every xml file
#-------------------------------------------------------------#
for xml_file in tqdm(glob.glob('{}/*xml'.format(path))):
tree = ET.parse(xml_file)
height = int(tree.findtext('./size/height'))
width = int(tree.findtext('./size/width'))
if height<=0 or width<=0:
continue
#-------------------------------------------------------------#
#   Get the width and height of every object
#-------------------------------------------------------------#
for obj in tree.iter('object'):
xmin = int(float(obj.findtext('bndbox/xmin'))) / width
ymin = int(float(obj.findtext('bndbox/ymin'))) / height
xmax = int(float(obj.findtext('bndbox/xmax'))) / width
ymax = int(float(obj.findtext('bndbox/ymax'))) / height
xmin = np.float64(xmin)
ymin = np.float64(ymin)
xmax = np.float64(xmax)
ymax = np.float64(ymax)
# Get the width and height
data.append([xmax-xmin,ymax-ymin])
return np.array(data)
if __name__ == '__main__':
np.random.seed(0)
#-------------------------------------------------------------#
#   Running this program parses the xml files under './VOCdevkit/VOC2007/Annotations'
#   and generates yolo_anchors.txt
#-------------------------------------------------------------#
input_shape = [640, 640]
anchors_num = 9
#-------------------------------------------------------------#
#   Load the dataset; VOC xml files can be used
#-------------------------------------------------------------#
path = 'VOCdevkit/VOC2007/Annotations'
#-------------------------------------------------------------#
#   Load all the xml files
#   Boxes are stored as width,height converted to ratios
#-------------------------------------------------------------#
print('Load xmls.')
data = load_data(path)
print('Load xmls done.')
#-------------------------------------------------------------#
#   Run the k-means clustering algorithm
#-------------------------------------------------------------#
print('K-means boxes.')
cluster, near = kmeans(data, anchors_num)
print('K-means boxes done.')
data = data * np.array([input_shape[1], input_shape[0]])
cluster = cluster * np.array([input_shape[1], input_shape[0]])
#-------------------------------------------------------------#
#   Plot the clusters
#-------------------------------------------------------------#
for j in range(anchors_num):
plt.scatter(data[near == j][:,0], data[near == j][:,1])
plt.scatter(cluster[j][0], cluster[j][1], marker='x', c='black')
plt.savefig("kmeans_for_anchors.jpg")
plt.show()
print('Save kmeans_for_anchors.jpg in root dir.')
cluster = cluster[np.argsort(cluster[:, 0] * cluster[:, 1])]
print('avg_ratio:{:.2f}'.format(avg_ratio(data, cluster)))
print(cluster)
f = open("yolo_anchors.txt", 'w')
row = np.shape(cluster)[0]
for i in range(row):
if i == 0:
x_y = "%d,%d" % (cluster[i][0], cluster[i][1])
else:
x_y = ", %d,%d" % (cluster[i][0], cluster[i][1])
f.write(x_y)
f.close()
The trained weights will be saved here
person
bicycle
car
motorbike
aeroplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
sofa
pottedplant
bed
diningtable
toilet
tvmonitor
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
import torch
import torch.nn as nn
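# Compute 'same' padding for a given kernel size when no explicit padding is given.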
def autopad(k, p=None):
if p is None:
p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
return p
class SiLU(nn.Module):
@staticmethod
def forward(x):
return x * torch.sigmoid(x)
class Conv(nn.Module):
def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=SiLU()): # ch_in, ch_out, kernel, stride, padding, groups
super(Conv, self).__init__()
self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
self.bn = nn.BatchNorm2d(c2)
self.act = nn.LeakyReLU(0.1, inplace=True) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
def forward(self, x):
return self.act(self.bn(self.conv(x)))
def fuseforward(self, x):
return self.act(self.conv(x))
class RCSPDark_Block(nn.Module):
def __init__(self, c1, c2, c3, n=4, e=0.5, ids=[0]):
super(RCSPDark_Block, self).__init__()
c_ = int(c1 * e)
self.ids = ids
self.cv1 = Conv(c1, c_, 1, 1)
self.cv2 = Conv(c1, c_, 1, 1)
self.cv3 = nn.ModuleList(
[Conv(c_ if i ==0 else c2, c2, 3, 1) for i in range(n)]
)
self.cv4 = Conv(c_ * 2 + c2 * (len(ids) - 2), c3, 1, 1)
def forward(self, x):
x_1 = self.cv1(x)
x_2 = self.cv2(x)
x_all = [x_1, x_2]
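# Every cv3 output is appended to x_all; ids selects (by negative index) which
# intermediate outputs get concatenated and fused by the final 1x1 conv cv4.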
for i in range(len(self.cv3)):
x_2 = self.cv3[i](x_2)
x_all.append(x_2)
out = self.cv4(torch.cat([x_all[id] for id in self.ids], 1))
return out
class MP(nn.Module):
def __init__(self, k=2):
super(MP, self).__init__()
self.m = nn.MaxPool2d(kernel_size=k, stride=k)
def forward(self, x):
return self.m(x)
class RCSPDark_Transition(nn.Module):
def __init__(self, c1, c2):
super(RCSPDark_Transition, self).__init__()
self.cv1 = Conv(c1, c2, 1, 1)
self.cv2 = Conv(c1, c2, 1, 1)
self.cv3 = Conv(c2, c2, 3, 2)
self.mp = MP()
def forward(self, x):
x_1 = self.mp(x)
x_1 = self.cv1(x_1)
x_2 = self.cv2(x)
x_2 = self.cv3(x_2)
return torch.cat([x_2, x_1], 1)
class CSPDarknet(nn.Module):
def __init__(self, base_channels, pretrained=False):
super().__init__()
#-----------------------------------------------#
#   The input image is 640, 640, 3
#   base_channels is 32 here, so the stem outputs base_channels * 2 = 64 channels
#-----------------------------------------------#
self.stem = nn.Sequential(
Conv(3, base_channels, 3, 1),
Conv(base_channels, base_channels * 2, 3, 2),
Conv(base_channels * 2, base_channels * 2, 3, 1),
)
self.dark2 = nn.Sequential(
Conv(base_channels * 2, base_channels * 4, 3, 2),
RCSPDark_Block(base_channels * 4, base_channels * 2, base_channels * 8, ids=[-1, -3, -5, -6]),
)
self.dark3 = nn.Sequential(
RCSPDark_Transition(base_channels * 8, base_channels * 4),
RCSPDark_Block(base_channels * 8, base_channels * 4, base_channels * 16, ids=[-1, -3, -5, -6]),
)
self.dark4 = nn.Sequential(
RCSPDark_Transition(base_channels * 16, base_channels * 8),
RCSPDark_Block(base_channels * 16, base_channels * 8, base_channels * 32, ids=[-1, -3, -5, -6]),
)
self.dark5 = nn.Sequential(
RCSPDark_Transition(base_channels * 32, base_channels * 16),
RCSPDark_Block(base_channels * 32, base_channels * 8, base_channels * 32, e=1/4, ids=[-1, -3, -5, -6]),
)
if pretrained:
phi = 'l'
url = {
"l" : 'https://github.com/bubbliiiing/yolov7-pytorch/releases/download/v1.0/cspdarknet_backbone_l.pth',
}[phi]
checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", model_dir="./model_data")
self.load_state_dict(checkpoint, strict=False)
print("Load weights from ", url.split('/')[-1])
def forward(self, x):
x = self.stem[0](x)
x = self.stem[1](x)
x = self.stem[2](x)
x = self.dark2(x)
#-----------------------------------------------#
#   dark3 outputs an 80, 80, 512 feature map; it is an effective feature layer
#-----------------------------------------------#
x = self.dark3(x)
feat1 = x
#-----------------------------------------------#
#   dark4 outputs a 40, 40, 1024 feature map; it is an effective feature layer
#-----------------------------------------------#
x = self.dark4(x)
feat2 = x
#-----------------------------------------------#
#   dark5 outputs a 20, 20, 1024 feature map; it is an effective feature layer
#-----------------------------------------------#
x = self.dark5(x)
feat3 = x
return feat1, feat2, feat3
#
import torch
import torch.nn as nn
import numpy as np
from nets.CSPdarknet import CSPDarknet, Conv, MP, RCSPDark_Block, RCSPDark_Transition, autopad, SiLU
class SPPCSPC(nn.Module):
# CSP https://github.com/WongKinYiu/CrossStagePartialNetworks
def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
super(SPPCSPC, self).__init__()
c_ = int(2 * c2 * e) # hidden channels
self.cv1 = Conv(c1, c_, 1, 1)
self.cv2 = Conv(c1, c_, 1, 1)
self.cv3 = Conv(c_, c_, 3, 1)
self.cv4 = Conv(c_, c_, 1, 1)
self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
self.cv5 = Conv(4 * c_, c_, 1, 1)
self.cv6 = Conv(c_, c_, 3, 1)
self.cv7 = Conv(2 * c_, c2, 1, 1)
def forward(self, x):
x1 = self.cv4(self.cv3(self.cv1(x)))
y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
y2 = self.cv2(x)
return self.cv7(torch.cat((y1, y2), dim=1))
class RepConv(nn.Module):
# Represented convolution
# https://arxiv.org/abs/2101.03697
def __init__(self, c1, c2, k=3, s=1, p=None, g=1, act=SiLU(), deploy=False):
super(RepConv, self).__init__()
self.deploy = deploy
self.groups = g
self.in_channels = c1
self.out_channels = c2
assert k == 3
assert autopad(k, p) == 1
padding_11 = autopad(k, p) - k // 2
self.act = nn.LeakyReLU(0.1, inplace=True) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
if deploy:
self.rbr_reparam = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=True)
else:
self.rbr_identity = (nn.BatchNorm2d(num_features=c1) if c2 == c1 and s == 1 else None)
self.rbr_dense = nn.Sequential(
nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False),
nn.BatchNorm2d(num_features=c2),
)
self.rbr_1x1 = nn.Sequential(
nn.Conv2d( c1, c2, 1, s, padding_11, groups=g, bias=False),
nn.BatchNorm2d(num_features=c2),
)
def forward(self, inputs):
if hasattr(self, "rbr_reparam"):
return self.act(self.rbr_reparam(inputs))
if self.rbr_identity is None:
id_out = 0
else:
id_out = self.rbr_identity(inputs)
return self.act(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)
def get_equivalent_kernel_bias(self):
kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
return (
kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid,
bias3x3 + bias1x1 + biasid,
)
def _pad_1x1_to_3x3_tensor(self, kernel1x1):
if kernel1x1 is None:
return 0
else:
return nn.functional.pad(kernel1x1, [1, 1, 1, 1])
def _fuse_bn_tensor(self, branch):
if branch is None:
return 0, 0
if isinstance(branch, nn.Sequential):
kernel = branch[0].weight
running_mean = branch[1].running_mean
running_var = branch[1].running_var
gamma = branch[1].weight
beta = branch[1].bias
eps = branch[1].eps
else:
assert isinstance(branch, nn.BatchNorm2d)
if not hasattr(self, "id_tensor"):
input_dim = self.in_channels // self.groups
kernel_value = np.zeros(
(self.in_channels, input_dim, 3, 3), dtype=np.float32
)
for i in range(self.in_channels):
kernel_value[i, i % input_dim, 1, 1] = 1
self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
kernel = self.id_tensor
running_mean = branch.running_mean
running_var = branch.running_var
gamma = branch.weight
beta = branch.bias
eps = branch.eps
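# Fold BN into the conv: w' = w * gamma / sqrt(var + eps), b' = beta - mean * gamma / sqrt(var + eps).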
std = (running_var + eps).sqrt()
t = (gamma / std).reshape(-1, 1, 1, 1)
return kernel * t, beta - running_mean * gamma / std
def repvgg_convert(self):
kernel, bias = self.get_equivalent_kernel_bias()
return (
kernel.detach().cpu().numpy(),
bias.detach().cpu().numpy(),
)
def fuse_conv_bn(self, conv, bn):
std = (bn.running_var + bn.eps).sqrt()
bias = bn.bias - bn.running_mean * bn.weight / std
t = (bn.weight / std).reshape(-1, 1, 1, 1)
weights = conv.weight * t
bn = nn.Identity()
conv = nn.Conv2d(in_channels = conv.in_channels,
out_channels = conv.out_channels,
kernel_size = conv.kernel_size,
stride=conv.stride,
padding = conv.padding,
dilation = conv.dilation,
groups = conv.groups,
bias = True,
padding_mode = conv.padding_mode)
conv.weight = torch.nn.Parameter(weights)
conv.bias = torch.nn.Parameter(bias)
return conv
def fuse_repvgg_block(self):
if self.deploy:
return
print(f"RepConv.fuse_repvgg_block")
self.rbr_dense = self.fuse_conv_bn(self.rbr_dense[0], self.rbr_dense[1])
self.rbr_1x1 = self.fuse_conv_bn(self.rbr_1x1[0], self.rbr_1x1[1])
rbr_1x1_bias = self.rbr_1x1.bias
weight_1x1_expanded = torch.nn.functional.pad(self.rbr_1x1.weight, [1, 1, 1, 1])
# Fuse self.rbr_identity
if (isinstance(self.rbr_identity, nn.BatchNorm2d) or isinstance(self.rbr_identity, nn.modules.batchnorm.SyncBatchNorm)):
# print(f"fuse: rbr_identity == BatchNorm2d or SyncBatchNorm")
identity_conv_1x1 = nn.Conv2d(
in_channels=self.in_channels,
out_channels=self.out_channels,
kernel_size=1,
stride=1,
padding=0,
groups=self.groups,
bias=False)
identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.to(self.rbr_1x1.weight.data.device)
identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.squeeze().squeeze()
# print(f" identity_conv_1x1.weight = {identity_conv_1x1.weight.shape}")
identity_conv_1x1.weight.data.fill_(0.0)
identity_conv_1x1.weight.data.fill_diagonal_(1.0)
identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.unsqueeze(2).unsqueeze(3)
# print(f" identity_conv_1x1.weight = {identity_conv_1x1.weight.shape}")
identity_conv_1x1 = self.fuse_conv_bn(identity_conv_1x1, self.rbr_identity)
bias_identity_expanded = identity_conv_1x1.bias
weight_identity_expanded = torch.nn.functional.pad(identity_conv_1x1.weight, [1, 1, 1, 1])
else:
# print(f"fuse: rbr_identity != BatchNorm2d, rbr_identity = {self.rbr_identity}")
bias_identity_expanded = torch.nn.Parameter( torch.zeros_like(rbr_1x1_bias) )
weight_identity_expanded = torch.nn.Parameter( torch.zeros_like(weight_1x1_expanded) )
#print(f"self.rbr_1x1.weight = {self.rbr_1x1.weight.shape}, ")
#print(f"weight_1x1_expanded = {weight_1x1_expanded.shape}, ")
#print(f"self.rbr_dense.weight = {self.rbr_dense.weight.shape}, ")
self.rbr_dense.weight = torch.nn.Parameter(self.rbr_dense.weight + weight_1x1_expanded + weight_identity_expanded)
self.rbr_dense.bias = torch.nn.Parameter(self.rbr_dense.bias + rbr_1x1_bias + bias_identity_expanded)
self.rbr_reparam = self.rbr_dense
self.deploy = True
if self.rbr_identity is not None:
del self.rbr_identity
self.rbr_identity = None
if self.rbr_1x1 is not None:
del self.rbr_1x1
self.rbr_1x1 = None
if self.rbr_dense is not None:
del self.rbr_dense
self.rbr_dense = None
def fuse_conv_and_bn(conv, bn):
# Fuse convolution and batchnorm layers https://tehnokv.com/posts/fusing-batchnorm-and-conv/
fusedconv = nn.Conv2d(conv.in_channels,
conv.out_channels,
kernel_size=conv.kernel_size,
stride=conv.stride,
padding=conv.padding,
groups=conv.groups,
bias=True).requires_grad_(False).to(conv.weight.device)
# prepare filters
w_conv = conv.weight.clone().view(conv.out_channels, -1)
w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))
fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))
# prepare spatial bias
b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) if conv.bias is None else conv.bias
b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))
fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)
return fusedconv
#---------------------------------------------------#
# yolo_body
#---------------------------------------------------#
class YoloBody(nn.Module):
def __init__(self, anchors_mask, num_classes, pretrained=False):
super(YoloBody, self).__init__()
base_channels = 32
#-----------------------------------------------#
#   The input image is 640, 640, 3
#   base_channels is 32
#-----------------------------------------------#
#---------------------------------------------------#
#   Build the CSPdarknet backbone
#   It returns three effective feature layers with shapes:
#   80, 80, 512
#   40, 40, 1024
#   20, 20, 1024
#---------------------------------------------------#
# self.backbone = CSPDarknet(model, base_channels, base_depth)
self.backbone = CSPDarknet(base_channels)
self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
self.sppcspc = SPPCSPC(base_channels * 32, base_channels * 16)
self.conv_for_P5 = Conv(base_channels * 16, base_channels * 8)
self.conv_for_feat2 = Conv(base_channels * 32, base_channels * 8)
self.conv3_for_upsample1 = RCSPDark_Block(base_channels * 16, base_channels * 4, base_channels * 8, ids=[-1, -2, -3, -4, -5, -6])
self.conv_for_P4 = Conv(base_channels * 8, base_channels * 4)
self.conv_for_feat1 = Conv(base_channels * 16, base_channels * 4)
self.conv3_for_upsample2 = RCSPDark_Block(base_channels * 8, base_channels * 2, base_channels * 4, ids=[-1, -2, -3, -4, -5, -6])
self.down_sample1 = RCSPDark_Transition(base_channels * 4, base_channels * 4)
self.conv3_for_downsample1 = RCSPDark_Block(base_channels * 16, base_channels * 4, base_channels * 8, ids=[-1, -2, -3, -4, -5, -6])
self.down_sample2 = RCSPDark_Transition(base_channels * 8, base_channels * 8)
self.conv3_for_downsample2 = RCSPDark_Block(base_channels * 32, base_channels * 8, base_channels * 16, ids=[-1, -2, -3, -4, -5, -6])
self.rep_conv_1 = RepConv(base_channels * 4, base_channels * 8, 3, 1)
self.rep_conv_2 = RepConv(base_channels * 8, base_channels * 16, 3, 1)
self.rep_conv_3 = RepConv(base_channels * 16, base_channels * 32, 3, 1)
self.yolo_head_P3 = nn.Conv2d(base_channels * 8, len(anchors_mask[2]) * (5 + num_classes), 1)
self.yolo_head_P4 = nn.Conv2d(base_channels * 16, len(anchors_mask[1]) * (5 + num_classes), 1)
self.yolo_head_P5 = nn.Conv2d(base_channels * 32, len(anchors_mask[0]) * (5 + num_classes), 1)
def fuse(self): # fuse model Conv2d() + BatchNorm2d() layers
print('Fusing layers... ')
for m in self.modules():
if isinstance(m, RepConv):
#print(f" fuse_repvgg_block")
m.fuse_repvgg_block()
elif type(m) is Conv and hasattr(m, 'bn'):
m.conv = fuse_conv_and_bn(m.conv, m.bn) # update conv
delattr(m, 'bn') # remove batchnorm
m.forward = m.fuseforward # update forward
return self
def forward(self, x):
# backbone
feat1, feat2, feat3 = self.backbone.forward(x)
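# PANet-style neck: a top-down path upsamples deep features and fuses them with
# shallow ones, then a bottom-up path downsamples and fuses them back.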
P5 = self.sppcspc(feat3)
P5_conv = self.conv_for_P5(P5)
P5_upsample = self.upsample(P5_conv)
P4 = torch.cat([self.conv_for_feat2(feat2), P5_upsample], 1)
P4 = self.conv3_for_upsample1(P4)
P4_conv = self.conv_for_P4(P4)
P4_upsample = self.upsample(P4_conv)
P3 = torch.cat([self.conv_for_feat1(feat1), P4_upsample], 1)
P3 = self.conv3_for_upsample2(P3)
P3_downsample = self.down_sample1(P3)
P4 = torch.cat([P3_downsample, P4], 1)
P4 = self.conv3_for_downsample1(P4)
P4_downsample = self.down_sample2(P4)
P5 = torch.cat([P4_downsample, P5], 1)
P5 = self.conv3_for_downsample2(P5)
P3 = self.rep_conv_1(P3)
P4 = self.rep_conv_2(P4)
P5 = self.rep_conv_3(P5)
#---------------------------------------------------#
#   Third feature layer
#   y3 = (batch_size, 3 * (5 + num_classes), 80, 80)
#---------------------------------------------------#
out2 = self.yolo_head_P3(P3)
#---------------------------------------------------#
#   Second feature layer
#   y2 = (batch_size, 3 * (5 + num_classes), 40, 40)
#---------------------------------------------------#
out1 = self.yolo_head_P4(P4)
#---------------------------------------------------#
#   First feature layer
#   y1 = (batch_size, 3 * (5 + num_classes), 20, 20)
#---------------------------------------------------#
out0 = self.yolo_head_P5(P5)
return out0, out1, out2
This diff is collapsed.
#-----------------------------------------------------------------------#
#   predict.py integrates single-image prediction, camera detection,
#   FPS testing, and directory traversal detection into one py file;
#   switch between them by setting mode.
#-----------------------------------------------------------------------#
import time
import cv2
import numpy as np
from PIL import Image
from yolo import YOLO
if __name__ == "__main__":
yolo = YOLO()
#----------------------------------------------------------------------------------------------------------#
#   mode specifies the test mode:
#   'predict'       single-image prediction. To modify the prediction flow, e.g. saving images or cropping objects, read the detailed comments below first
#   'video'         video detection; a camera or a video file can be used, see the comments below.
#   'fps'           FPS testing, using street.jpg in img, see the comments below.
#   'dir_predict'   traverse a folder, detect, and save. By default it traverses the img folder and saves to the img_out folder, see the comments below.
#   'heatmap'       heatmap visualization of the prediction results, see the comments below.
#   'export_onnx'   export the model to onnx; requires pytorch 1.7.1 or later.
#----------------------------------------------------------------------------------------------------------#
mode = "predict"
#-------------------------------------------------------------------------#
#   crop    whether to crop out the objects after single-image prediction
#   count   whether to count the detected objects
#   crop and count only take effect when mode='predict'
#-------------------------------------------------------------------------#
crop = False
count = False
#----------------------------------------------------------------------------------------------------------#
#   video_path        specifies the video path; video_path=0 means using the camera
#                     To detect a video, set e.g. video_path = "xxx.mp4" to read xxx.mp4 from the root directory.
#   video_save_path   where the video is saved; video_save_path="" means the video is not saved
#                     To save the video, set e.g. video_save_path = "yyy.mp4" to save it as yyy.mp4 in the root directory.
#   video_fps         the fps of the saved video
#
#   video_path, video_save_path and video_fps only take effect when mode='video'
#   When saving a video, exit with ctrl+c or run to the last frame to complete the full saving process.
#----------------------------------------------------------------------------------------------------------#
video_path = 0
video_save_path = ""
video_fps = 25.0
#----------------------------------------------------------------------------------------------------------#
#   test_interval     the number of detections run when measuring fps; in theory, a larger test_interval gives a more accurate fps.
#   fps_image_path    the image used for the fps test
#
#   test_interval and fps_image_path only take effect when mode='fps'
#----------------------------------------------------------------------------------------------------------#
test_interval = 100
fps_image_path = "img/street.jpg"
#-------------------------------------------------------------------------#
#   dir_origin_path   the folder of images to detect
#   dir_save_path     where the detected images are saved
#
#   dir_origin_path and dir_save_path only take effect when mode='dir_predict'
#-------------------------------------------------------------------------#
dir_origin_path = "img/"
dir_save_path = "img_out/"
#-------------------------------------------------------------------------#
#   heatmap_save_path   where the heatmap is saved; defaults to model_data
#
#   heatmap_save_path only takes effect when mode='heatmap'
#-------------------------------------------------------------------------#
heatmap_save_path = "model_data/heatmap_vision.png"
#-------------------------------------------------------------------------#
#   simplify          use Simplify onnx
#   onnx_save_path    where the onnx file is saved
#-------------------------------------------------------------------------#
simplify = True
onnx_save_path = "model_data/models.onnx"
if mode == "predict":
'''
1. To save the detected image, use r_image.save("img.jpg"); modify this directly in predict.py.
2. To get the coordinates of the prediction boxes, enter the yolo.detect_image function and read top, left, bottom, right in the drawing section.
3. To crop objects out using the prediction boxes, enter the yolo.detect_image function and use the obtained top, left, bottom, right
to slice the original image as an array.
4. To write extra text on the predicted image, such as the count of a specific detected class, enter the yolo.detect_image function and check predicted_class in the drawing section;
for example, if predicted_class == 'car': tells whether the current object is a car; then keep a count. Use draw.text to write the text.
'''
while True:
img = input('Input image filename:')
try:
image = Image.open(img)
except:
print('Open Error! Try again!')
continue
else:
r_image = yolo.detect_image(image, crop = crop, count=count)
r_image.show()
elif mode == "video":
capture = cv2.VideoCapture(video_path)
if video_save_path!="":
fourcc = cv2.VideoWriter_fourcc(*'XVID')
size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter(video_save_path, fourcc, video_fps, size)
ref, frame = capture.read()
if not ref:
raise ValueError("Failed to read the camera (video) correctly; check that the camera is installed correctly (and that the video path is filled in correctly).")
fps = 0.0
while(True):
t1 = time.time()
# Read one frame
ref, frame = capture.read()
if not ref:
break
# Convert format: BGR to RGB
frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)
# Convert to a PIL Image
frame = Image.fromarray(np.uint8(frame))
# Run detection
frame = np.array(yolo.detect_image(frame))
# RGB to BGR to match the OpenCV display format
frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR)
fps = ( fps + (1./(time.time()-t1)) ) / 2
print("fps= %.2f"%(fps))
frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cv2.imshow("video",frame)
c= cv2.waitKey(1) & 0xff
if video_save_path!="":
out.write(frame)
if c==27:
capture.release()
break
print("Video Detection Done!")
capture.release()
if video_save_path!="":
print("Save processed video to the path :" + video_save_path)
out.release()
cv2.destroyAllWindows()
elif mode == "fps":
img = Image.open(fps_image_path)
tact_time = yolo.get_FPS(img, test_interval)
print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1')
elif mode == "dir_predict":
import os
from tqdm import tqdm
img_names = os.listdir(dir_origin_path)
for img_name in tqdm(img_names):
if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')):
image_path = os.path.join(dir_origin_path, img_name)
image = Image.open(image_path)
r_image = yolo.detect_image(image)
if not os.path.exists(dir_save_path):
os.makedirs(dir_save_path)
r_image.save(os.path.join(dir_save_path, img_name.replace(".jpg", ".png")), quality=95, subsampling=0)
elif mode == "heatmap":
while True:
img = input('Input image filename:')
try:
image = Image.open(img)
except:
print('Open Error! Try again!')
continue
else:
yolo.detect_heatmap(image, heatmap_save_path)
elif mode == "export_onnx":
yolo.convert_to_onnx(simplify, onnx_save_path)
else:
raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps', 'heatmap', 'export_onnx', 'dir_predict'.")
scipy==1.2.1
numpy==1.17.0
matplotlib==3.1.2
opencv_python==4.1.2.30
torch==1.2.0
torchvision==0.4.0
tqdm==4.60.0
Pillow==8.2.0
h5py==2.10.0
#--------------------------------------------#
#   This code is used to inspect the network structure
#--------------------------------------------#
import torch
from thop import clever_format, profile
from torchsummary import summary
from nets.yolo import YoloBody
if __name__ == "__main__":
input_shape = [640, 640]
anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
num_classes = 80
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
m = YoloBody(anchors_mask, num_classes).to(device)
summary(m, (3, input_shape[0], input_shape[1]))
dummy_input = torch.randn(1, 3, input_shape[0], input_shape[1]).to(device)
flops, params = profile(m.to(device), (dummy_input, ), verbose=False)
#--------------------------------------------------------#
#   flops * 2 because profile does not count a convolution
#   as two operations.
#   Some papers count a convolution as both multiply and add operations; multiply by 2 in that case.
#   Some papers only count multiplications and ignore additions; do not multiply by 2 in that case.
#   This code multiplies by 2, following YOLOX.
#--------------------------------------------------------#
flops = flops * 2
flops, params = clever_format([flops, params], "%.3f")
print('Total GFLOPS: %s' % (flops))
print('Total params: %s' % (params))
This diff is collapsed.
#
import datetime
import os
import torch
import matplotlib
matplotlib.use('Agg')
import scipy.signal
from matplotlib import pyplot as plt
from torch.utils.tensorboard import SummaryWriter
import shutil
import numpy as np
from PIL import Image
from tqdm import tqdm
from .utils import cvtColor, preprocess_input, resize_image
from .utils_bbox import DecodeBox
from .utils_map import get_coco_map, get_map
class LossHistory():
def __init__(self, log_dir, model, input_shape):
self.log_dir = log_dir
self.losses = []
self.val_loss = []
os.makedirs(self.log_dir)
self.writer = SummaryWriter(self.log_dir)
try:
dummy_input = torch.randn(2, 3, input_shape[0], input_shape[1])
self.writer.add_graph(model, dummy_input)
except:
pass
def append_loss(self, epoch, loss, val_loss):
if not os.path.exists(self.log_dir):
os.makedirs(self.log_dir)
self.losses.append(loss)
self.val_loss.append(val_loss)
with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f:
f.write(str(loss))
f.write("\n")
with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f:
f.write(str(val_loss))
f.write("\n")
self.writer.add_scalar('loss', loss, epoch)
self.writer.add_scalar('val_loss', val_loss, epoch)
self.loss_plot()
def loss_plot(self):
iters = range(len(self.losses))
plt.figure()
plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss')
plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss')
try:
if len(self.losses) < 25:
num = 5
else:
num = 15
plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss')
plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss')
except:
pass
plt.grid(True)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(loc="upper right")
plt.savefig(os.path.join(self.log_dir, "epoch_loss.png"))
plt.cla()
plt.close("all")
class EvalCallback():
def __init__(self, net, input_shape, anchors, anchors_mask, class_names, num_classes, val_lines, log_dir, cuda, \
map_out_path=".temp_map_out", max_boxes=100, confidence=0.05, nms_iou=0.5, letterbox_image=True, MINOVERLAP=0.5, eval_flag=True, period=1):
super(EvalCallback, self).__init__()
self.net = net
self.input_shape = input_shape
self.anchors = anchors
self.anchors_mask = anchors_mask
self.class_names = class_names
self.num_classes = num_classes
self.val_lines = val_lines
self.log_dir = log_dir
self.cuda = cuda
self.map_out_path = map_out_path
self.max_boxes = max_boxes
self.confidence = confidence
self.nms_iou = nms_iou
self.letterbox_image = letterbox_image
self.MINOVERLAP = MINOVERLAP
self.eval_flag = eval_flag
self.period = period
self.bbox_util = DecodeBox(self.anchors, self.num_classes, (self.input_shape[0], self.input_shape[1]), self.anchors_mask)
self.maps = [0]
self.epoches = [0]
if self.eval_flag:
with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f:
f.write(str(0))
f.write("\n")
def get_map_txt(self, image_id, image, class_names, map_out_path):
f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"), "w", encoding='utf-8')
image_shape = np.array(np.shape(image)[0:2])
#---------------------------------------------------------#
#   Convert the image to an RGB image here to prevent errors with grayscale images during prediction.
#   The code only supports prediction on RGB images; all other image types are converted to RGB.
#---------------------------------------------------------#
image = cvtColor(image)
#---------------------------------------------------------#
#   Add gray bars to the image for distortion-free resizing
#   Alternatively, resize directly for recognition
#---------------------------------------------------------#
image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
#---------------------------------------------------------#
#   Add the batch_size dimension
#---------------------------------------------------------#
image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
with torch.no_grad():
images = torch.from_numpy(image_data)
if self.cuda:
images = images.cuda()
#---------------------------------------------------------#
#   Feed the image into the network for prediction!
#---------------------------------------------------------#
outputs = self.net(images)
outputs = self.bbox_util.decode_box(outputs)
#---------------------------------------------------------#
#   Stack the prediction boxes, then perform non-maximum suppression
#---------------------------------------------------------#
results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape,
image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou)
if results[0] is None:
return
top_label = np.array(results[0][:, 6], dtype = 'int32')
top_conf = results[0][:, 4] * results[0][:, 5]
top_boxes = results[0][:, :4]
top_100 = np.argsort(top_conf)[::-1][:self.max_boxes]
top_boxes = top_boxes[top_100]
top_conf = top_conf[top_100]
top_label = top_label[top_100]
for i, c in list(enumerate(top_label)):
predicted_class = self.class_names[int(c)]
box = top_boxes[i]
score = str(top_conf[i])
top, left, bottom, right = box
if predicted_class not in class_names:
continue
f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))
f.close()
return
def on_epoch_end(self, epoch, model_eval):
if epoch % self.period == 0 and self.eval_flag:
self.net = model_eval
if not os.path.exists(self.map_out_path):
os.makedirs(self.map_out_path)
if not os.path.exists(os.path.join(self.map_out_path, "ground-truth")):
os.makedirs(os.path.join(self.map_out_path, "ground-truth"))
if not os.path.exists(os.path.join(self.map_out_path, "detection-results")):
os.makedirs(os.path.join(self.map_out_path, "detection-results"))
print("Get map.")
for annotation_line in tqdm(self.val_lines):
line = annotation_line.split()
image_id = os.path.basename(line[0]).split('.')[0]
#------------------------------#
#   Read the image and convert it to RGB
#------------------------------#
image = Image.open(line[0])
#------------------------------#
#   Parse the ground-truth boxes
#------------------------------#
gt_boxes = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])
#------------------------------#
#   Obtain the prediction txt
#------------------------------#
self.get_map_txt(image_id, image, self.class_names, self.map_out_path)
#------------------------------#
#   Obtain the ground-truth txt
#------------------------------#
with open(os.path.join(self.map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f:
for box in gt_boxes:
left, top, right, bottom, obj = box
obj_name = self.class_names[obj]
new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
print("Calculate Map.")
try:
temp_map = get_coco_map(class_names = self.class_names, path = self.map_out_path)[1]
except:
temp_map = get_map(self.MINOVERLAP, False, path = self.map_out_path)
self.maps.append(temp_map)
self.epoches.append(epoch)
with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f:
f.write(str(temp_map))
f.write("\n")
plt.figure()
plt.plot(self.epoches, self.maps, 'red', linewidth = 2, label='train map')
plt.grid(True)
plt.xlabel('Epoch')
plt.ylabel('Map %s'%str(self.MINOVERLAP))
plt.title('A Map Curve')
plt.legend(loc="upper right")
plt.savefig(os.path.join(self.log_dir, "epoch_map.png"))
plt.cla()
plt.close("all")
print("Get map done.")
shutil.rmtree(self.map_out_path)
This diff is collapsed.
import numpy as np
from PIL import Image
#---------------------------------------------------------#
#   Convert the image to an RGB image to prevent errors with grayscale images during prediction.
#   The code only supports prediction on RGB images; all other image types are converted to RGB.
#---------------------------------------------------------#
def cvtColor(image):
if len(np.shape(image)) == 3 and np.shape(image)[2] == 3:
return image
else:
image = image.convert('RGB')
return image
#---------------------------------------------------#
#   Resize the input image
#---------------------------------------------------#
def resize_image(image, size, letterbox_image):
iw, ih = image.size
w, h = size
if letterbox_image:
scale = min(w/iw, h/ih)
nw = int(iw*scale)
nh = int(ih*scale)
image = image.resize((nw,nh), Image.BICUBIC)
new_image = Image.new('RGB', size, (128,128,128))
new_image.paste(image, ((w-nw)//2, (h-nh)//2))
else:
new_image = image.resize((w, h), Image.BICUBIC)
return new_image
#---------------------------------------------------#
#   Get the classes
#---------------------------------------------------#
def get_classes(classes_path):
with open(classes_path, encoding='utf-8') as f:
class_names = f.readlines()
class_names = [c.strip() for c in class_names]
return class_names, len(class_names)
#---------------------------------------------------#
#   Get the anchors
#---------------------------------------------------#
def get_anchors(anchors_path):
'''loads the anchors from a file'''
with open(anchors_path, encoding='utf-8') as f:
anchors = f.readline()
anchors = [float(x) for x in anchors.split(',')]
anchors = np.array(anchors).reshape(-1, 2)
return anchors, len(anchors)
#---------------------------------------------------#
#   Get the learning rate
#---------------------------------------------------#
def get_lr(optimizer):
for param_group in optimizer.param_groups:
return param_group['lr']
def preprocess_input(image):
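# Scale pixel values from [0, 255] to [0, 1].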
image /= 255.0
return image
def show_config(**kwargs):
print('Configurations:')
print('-' * 70)
print('|%25s | %40s|' % ('keys', 'values'))
print('-' * 70)
for key, value in kwargs.items():
print('|%25s | %40s|' % (str(key), str(value)))
print('-' * 70)
def download_weights(model_dir="./model_data"):
import os
from torch.hub import load_state_dict_from_url
phi = "l"
download_urls = {
"l" : 'https://github.com/bubbliiiing/yolov7-pytorch/releases/download/v1.0/cspdarknet_backbone_l.pth',
}
url = download_urls[phi]
if not os.path.exists(model_dir):
os.makedirs(model_dir)
load_state_dict_from_url(url, model_dir)
This diff is collapsed.
import os
import torch
from tqdm import tqdm
from utils.utils import get_lr
def fit_one_epoch(model_train, model, ema, yolo_loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, cuda, fp16, scaler, save_period, save_dir, local_rank=0):
loss = 0
val_loss = 0
if local_rank == 0:
print('Start Train')
pbar = tqdm(total=epoch_step,desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3)
model_train.train()
for iteration, batch in enumerate(gen):
if iteration >= epoch_step:
break
images, targets, y_trues = batch[0], batch[1], batch[2]
with torch.no_grad():
if cuda:
images = images.cuda(local_rank)
targets = [ann.cuda(local_rank) for ann in targets]
y_trues = [ann.cuda(local_rank) for ann in y_trues]
#----------------------#
#   Zero the gradients
#----------------------#
optimizer.zero_grad()
if not fp16:
#----------------------#
#   Forward pass
#----------------------#
outputs = model_train(images)
loss_value_all = 0
#----------------------#
#   Compute the loss
#----------------------#
for l in range(len(outputs)):
loss_item = yolo_loss(l, outputs[l], targets, y_trues[l])
loss_value_all += loss_item
loss_value = loss_value_all
#----------------------#
#   Backward pass
#----------------------#
loss_value.backward()
optimizer.step()
else:
from torch.cuda.amp import autocast
with autocast():
#----------------------#
#   Forward pass
#----------------------#
outputs = model_train(images)
loss_value_all = 0
#----------------------#
#   Compute the loss
#----------------------#
for l in range(len(outputs)):
loss_item = yolo_loss(l, outputs[l], targets, y_trues[l])
loss_value_all += loss_item
loss_value = loss_value_all
#----------------------#
#   Backward pass
#----------------------#
scaler.scale(loss_value).backward()
scaler.step(optimizer)
scaler.update()
if ema:
ema.update(model_train)
loss += loss_value.item()
if local_rank == 0:
pbar.set_postfix(**{'loss' : loss / (iteration + 1),
'lr' : get_lr(optimizer)})
pbar.update(1)
if local_rank == 0:
pbar.close()
print('Finish Train')
print('Start Validation')
pbar = tqdm(total=epoch_step_val, desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3)
if ema:
model_train_eval = ema.ema
else:
model_train_eval = model_train.eval()
for iteration, batch in enumerate(gen_val):
if iteration >= epoch_step_val:
break
images, targets, y_trues = batch[0], batch[1], batch[2]
with torch.no_grad():
if cuda:
images = images.cuda(local_rank)
targets = [ann.cuda(local_rank) for ann in targets]
y_trues = [ann.cuda(local_rank) for ann in y_trues]
#----------------------#
#   Zero the gradients
#----------------------#
optimizer.zero_grad()
#----------------------#
#   Forward pass
#----------------------#
outputs = model_train_eval(images)
loss_value_all = 0
#----------------------#
#   Compute the loss
#----------------------#
for l in range(len(outputs)):
loss_item = yolo_loss(l, outputs[l], targets, y_trues[l])
loss_value_all += loss_item
loss_value = loss_value_all
val_loss += loss_value.item()
if local_rank == 0:
pbar.set_postfix(**{'val_loss': val_loss / (iteration + 1)})
pbar.update(1)
if local_rank == 0:
pbar.close()
print('Finish Validation')
loss_history.append_loss(epoch + 1, loss / epoch_step, val_loss / epoch_step_val)
eval_callback.on_epoch_end(epoch + 1, model_train_eval)
print('Epoch:'+ str(epoch + 1) + '/' + str(Epoch))
print('Total Loss: %.3f || Val Loss: %.3f ' % (loss / epoch_step, val_loss / epoch_step_val))
#-----------------------------------------------#
#   Save the weights
#-----------------------------------------------#
if ema:
save_state_dict = ema.ema.state_dict()
else:
save_state_dict = model.state_dict()
if (epoch + 1) % save_period == 0 or epoch + 1 == Epoch:
torch.save(save_state_dict, os.path.join(save_dir, "ep%03d-loss%.3f-val_loss%.3f.pth" % (epoch + 1, loss / epoch_step, val_loss / epoch_step_val)))
if len(loss_history.val_loss) <= 1 or (val_loss / epoch_step_val) <= min(loss_history.val_loss):
print('Save best model to best_epoch_weights.pth')
torch.save(save_state_dict, os.path.join(save_dir, "best_epoch_weights.pth"))
torch.save(save_state_dict, os.path.join(save_dir, "last_epoch_weights.pth"))
This diff is collapsed.
#-------------------------------------------------------#
#   Processes the COCO dataset: generates the txt files used for training from the json files
#-------------------------------------------------------#
import json
import os
from collections import defaultdict
#-------------------------------------------------------#
#   Paths to the COCO training and validation set images
#-------------------------------------------------------#
train_datasets_path = "coco_dataset/train2017"
val_datasets_path = "coco_dataset/val2017"
#-------------------------------------------------------#
#   Paths to the COCO training and validation set annotations
#-------------------------------------------------------#
train_annotation_path = "coco_dataset/annotations/instances_train2017.json"
val_annotation_path = "coco_dataset/annotations/instances_val2017.json"
#-------------------------------------------------------#
#   Paths of the generated txt files
#-------------------------------------------------------#
train_output_path = "coco_train.txt"
val_output_path = "coco_val.txt"
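#-------------------------------------------------------#
#   Format of each line in the generated txt files:
#   image_path box1 box2 ... , each box written as
#   x_min,y_min,x_max,y_max,class_id
#   e.g. (illustrative values only):
#   coco_dataset/train2017/000000000009.jpg 1,187,612,473,45
#-------------------------------------------------------#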
if __name__ == "__main__":
name_box_id = defaultdict(list)
id_name = dict()
f = open(train_annotation_path, encoding='utf-8')
data = json.load(f)
f.close()
annotations = data['annotations']
for ant in annotations:
image_id = ant['image_id']
name = os.path.join(train_datasets_path, '%012d.jpg' % image_id)
cat = ant['category_id']
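#-------------------------------------------------------#
#   COCO category ids are sparse over 1-90; the chain
#   below compacts them onto contiguous class indices
#   0-79 by subtracting the number of unused ids that
#   precede each range.
#-------------------------------------------------------#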
if cat >= 1 and cat <= 11:
cat = cat - 1
elif cat >= 13 and cat <= 25:
cat = cat - 2
elif cat >= 27 and cat <= 28:
cat = cat - 3
elif cat >= 31 and cat <= 44:
cat = cat - 5
elif cat >= 46 and cat <= 65:
cat = cat - 6
elif cat == 67:
cat = cat - 7
elif cat == 70:
cat = cat - 9
elif cat >= 72 and cat <= 82:
cat = cat - 10
elif cat >= 84 and cat <= 90:
cat = cat - 11
name_box_id[name].append([ant['bbox'], cat])
f = open(train_output_path, 'w')
for key in name_box_id.keys():
f.write(key)
box_infos = name_box_id[key]
for info in box_infos:
x_min = int(info[0][0])
y_min = int(info[0][1])
x_max = x_min + int(info[0][2])
y_max = y_min + int(info[0][3])
box_info = " %d,%d,%d,%d,%d" % (
x_min, y_min, x_max, y_max, int(info[1]))
f.write(box_info)
f.write('\n')
f.close()
name_box_id = defaultdict(list)
id_name = dict()
f = open(val_annotation_path, encoding='utf-8')
data = json.load(f)
f.close()
annotations = data['annotations']
for ant in annotations:
image_id = ant['image_id']
name = os.path.join(val_datasets_path, '%012d.jpg' % image_id)
cat = ant['category_id']
if cat >= 1 and cat <= 11:
cat = cat - 1
elif cat >= 13 and cat <= 25:
cat = cat - 2
elif cat >= 27 and cat <= 28:
cat = cat - 3
elif cat >= 31 and cat <= 44:
cat = cat - 5
elif cat >= 46 and cat <= 65:
cat = cat - 6
elif cat == 67:
cat = cat - 7
elif cat == 70:
cat = cat - 9
elif cat >= 72 and cat <= 82:
cat = cat - 10
elif cat >= 84 and cat <= 90:
cat = cat - 11
name_box_id[name].append([ant['bbox'], cat])
f = open(val_output_path, 'w')
for key in name_box_id.keys():
f.write(key)
box_infos = name_box_id[key]
for info in box_infos:
x_min = int(info[0][0])
y_min = int(info[0][1])
x_max = x_min + int(info[0][2])
y_max = y_min + int(info[0][3])
box_info = " %d,%d,%d,%d,%d" % (
x_min, y_min, x_max, y_max, int(info[1]))
f.write(box_info)
f.write('\n')
f.close()
import json
import os
import numpy as np
import torch
from PIL import Image
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from tqdm import tqdm
from utils.utils import cvtColor, preprocess_input, resize_image
from yolo import YOLO
#---------------------------------------------------------------------------#
#   map_mode specifies what this script computes when it is run
#   map_mode = 0: the whole mAP pipeline, i.e. obtain the predictions and compute mAP.
#   map_mode = 1: only obtain the prediction results.
#   map_mode = 2: only compute mAP.
#---------------------------------------------------------------------------#
map_mode = 0
#-------------------------------------------------------#
#   Paths to the validation set annotations and images
#-------------------------------------------------------#
cocoGt_path = 'coco_dataset/annotations/instances_val2017.json'
dataset_img_path = 'coco_dataset/val2017'
#-------------------------------------------------------#
#   Folder the results are written to; map_out/coco_eval by default
#-------------------------------------------------------#
temp_save_path = 'map_out/coco_eval'
class mAP_YOLO(YOLO):
#---------------------------------------------------#
#   Detect an image
#---------------------------------------------------#
def detect_image(self, image_id, image, results, clsid2catid):
#---------------------------------------------------#
#   Compute the height and width of the input image
#---------------------------------------------------#
image_shape = np.array(np.shape(image)[0:2])
#---------------------------------------------------------#
#   Convert the image to RGB here to prevent grayscale
#   images from raising errors during prediction.
#   The code only supports prediction on RGB images,
#   so all other image types are converted to RGB.
#---------------------------------------------------------#
image = cvtColor(image)
#---------------------------------------------------------#
#   Add gray bars to the image for a distortion-free
#   resize; a plain resize also works for detection.
#---------------------------------------------------------#
image_data = resize_image(image, (self.input_shape[1],self.input_shape[0]), self.letterbox_image)
#---------------------------------------------------------#
#   Add the batch_size dimension
#---------------------------------------------------------#
image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
with torch.no_grad():
images = torch.from_numpy(image_data)
if self.cuda:
images = images.cuda()
#---------------------------------------------------------#
#   Feed the image into the network for prediction!
#---------------------------------------------------------#
outputs = self.net(images)
outputs = self.bbox_util.decode_box(outputs)
#---------------------------------------------------------#
#   Stack the prediction boxes, then perform
#   non-maximum suppression
#---------------------------------------------------------#
outputs = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape,
image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou)
if outputs[0] is None:
return results
top_label = np.array(outputs[0][:, 6], dtype = 'int32')
top_conf = outputs[0][:, 4] * outputs[0][:, 5]
top_boxes = outputs[0][:, :4]
for i, c in enumerate(top_label):
result = {}
top, left, bottom, right = top_boxes[i]
result["image_id"] = int(image_id)
result["category_id"] = clsid2catid[c]
result["bbox"] = [float(left),float(top),float(right-left),float(bottom-top)]
result["score"] = float(top_conf[i])
results.append(result)
return results
if __name__ == "__main__":
if not os.path.exists(temp_save_path):
os.makedirs(temp_save_path)
cocoGt = COCO(cocoGt_path)
ids = list(cocoGt.imgToAnns.keys())
clsid2catid = cocoGt.getCatIds()
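#-------------------------------------------------------#
#   getCatIds() returns the sorted COCO category ids, so
#   a network class index i maps back to the original
#   category id via clsid2catid[i] when detections are
#   written out in detect_image above.
#-------------------------------------------------------#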
if map_mode == 0 or map_mode == 1:
yolo = mAP_YOLO(confidence = 0.001, nms_iou = 0.65)
with open(os.path.join(temp_save_path, 'eval_results.json'),"w") as f:
results = []
for image_id in tqdm(ids):
image_path = os.path.join(dataset_img_path, cocoGt.loadImgs(image_id)[0]['file_name'])
image = Image.open(image_path)
results = yolo.detect_image(image_id, image, results, clsid2catid)
json.dump(results, f)
if map_mode == 0 or map_mode == 2:
cocoDt = cocoGt.loadRes(os.path.join(temp_save_path, 'eval_results.json'))
cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
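#-------------------------------------------------------#
#   After summarize(), cocoEval.stats holds the twelve
#   standard COCO metrics (pycocotools behavior). A short
#   sketch of reading the two headline numbers reported
#   in the README table:
#-------------------------------------------------------#
map_50_95, map_50 = cocoEval.stats[0], cocoEval.stats[1]
print("mAP 0.5:0.95 = %.3f || mAP 0.5 = %.3f" % (map_50_95, map_50))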
print("Get map done.")
import os
import random
import xml.etree.ElementTree as ET
import numpy as np
from utils.utils import get_classes
#--------------------------------------------------------------------------------------------------------------------------------#
#   annotation_mode specifies what this script computes when it is run
#   annotation_mode = 0: the whole pipeline, i.e. the txt files in VOCdevkit/VOC2007/ImageSets plus the 2007_train.txt and 2007_val.txt used for training
#   annotation_mode = 1: only the txt files in VOCdevkit/VOC2007/ImageSets
#   annotation_mode = 2: only the 2007_train.txt and 2007_val.txt used for training
#--------------------------------------------------------------------------------------------------------------------------------#
annotation_mode = 0
#-------------------------------------------------------------------#
#   Must be modified: used to generate the object information
#   in 2007_train.txt and 2007_val.txt.
#   Keep it identical to the classes_path used for training and prediction.
#   If the generated 2007_train.txt contains no object information,
#   the classes have not been set correctly.
#   Only takes effect when annotation_mode is 0 or 2.
#-------------------------------------------------------------------#
classes_path = 'model_data/voc_classes.txt'
#--------------------------------------------------------------------------------------------------------------------------------#
#   trainval_percent sets the ratio of (training + validation) set to test set; by default (training + validation) : test = 9 : 1
#   train_percent sets the ratio of training set to validation set within (training + validation); by default training : validation = 9 : 1
#   Only takes effect when annotation_mode is 0 or 1; see the worked example below
#--------------------------------------------------------------------------------------------------------------------------------#
trainval_percent = 0.9
train_percent = 0.9
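#--------------------------------------------------------------------------------------------------------------------------------#
#   Worked example with the defaults above: out of 100 annotated images,
#   trainval = 100 * 0.9 = 90 (so test = 10), train = 90 * 0.9 = 81 and val = 9,
#   i.e. train : val : test = 81 : 9 : 10.
#--------------------------------------------------------------------------------------------------------------------------------#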
#-------------------------------------------------------#
#   Points to the folder containing the VOC dataset;
#   defaults to the VOCdevkit folder in the root directory
#-------------------------------------------------------#
VOCdevkit_path = 'VOCdevkit'
VOCdevkit_sets = [('2007', 'train'), ('2007', 'val')]
classes, _ = get_classes(classes_path)
#-------------------------------------------------------#
#   Count the number of images and of objects per class
#-------------------------------------------------------#
photo_nums = np.zeros(len(VOCdevkit_sets))
nums = np.zeros(len(classes))
def convert_annotation(year, image_id, list_file):
in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8')
tree = ET.parse(in_file)
in_file.close()
root = tree.getroot()
for obj in root.iter('object'):
difficult = 0
if obj.find('difficult') is not None:
difficult = obj.find('difficult').text
cls = obj.find('name').text
if cls not in classes or int(difficult)==1:
continue
cls_id = classes.index(cls)
xmlbox = obj.find('bndbox')
b = (int(float(xmlbox.find('xmin').text)), int(float(xmlbox.find('ymin').text)), int(float(xmlbox.find('xmax').text)), int(float(xmlbox.find('ymax').text)))
list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id))
nums[cls_id] += 1
if __name__ == "__main__":
random.seed(0)
if " " in os.path.abspath(VOCdevkit_path):
raise ValueError("The dataset folder path and the image file names must not contain spaces, otherwise training will not work correctly; please rename them.")
if annotation_mode == 0 or annotation_mode == 1:
print("Generate txt in ImageSets.")
xmlfilepath = os.path.join(VOCdevkit_path, 'VOC2007/Annotations')
saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Main')
temp_xml = os.listdir(xmlfilepath)
total_xml = []
for xml in temp_xml:
if xml.endswith(".xml"):
total_xml.append(xml)
num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)
print("train and val size",tv)
print("train size",tr)
ftrainval = open(os.path.join(saveBasePath,'trainval.txt'), 'w')
ftest = open(os.path.join(saveBasePath,'test.txt'), 'w')
ftrain = open(os.path.join(saveBasePath,'train.txt'), 'w')
fval = open(os.path.join(saveBasePath,'val.txt'), 'w')
for i in indices:
name = total_xml[i][:-4] + '\n'
if i in trainval:
ftrainval.write(name)
if i in train:
ftrain.write(name)
else:
fval.write(name)
else:
ftest.write(name)
ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
print("Generate txt in ImageSets done.")
if annotation_mode == 0 or annotation_mode == 2:
print("Generate 2007_train.txt and 2007_val.txt for train.")
type_index = 0
for year, image_set in VOCdevkit_sets:
image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, image_set)), encoding='utf-8').read().strip().split()
list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8')
for image_id in image_ids:
list_file.write('%s/VOC%s/JPEGImages/%s.jpg'%(os.path.abspath(VOCdevkit_path), year, image_id))
convert_annotation(year, image_id, list_file)
list_file.write('\n')
photo_nums[type_index] = len(image_ids)
type_index += 1
list_file.close()
print("Generate 2007_train.txt and 2007_val.txt for train done.")
def printTable(List1, List2):
for i in range(len(List1[0])):
print("|", end=' ')
for j in range(len(List1)):
print(List1[j][i].rjust(int(List2[j])), end=' ')
print("|", end=' ')
print()
str_nums = [str(int(x)) for x in nums]
tableData = [
classes, str_nums
]
colWidths = [0] * len(tableData)
for i in range(len(tableData)):
for j in range(len(tableData[i])):
if len(tableData[i][j]) > colWidths[i]:
colWidths[i] = len(tableData[i][j])
printTable(tableData, colWidths)
if photo_nums[0] <= 500:
print("训练集数量小于500,属于较小的数据量,请注意设置较大的训练世代(Epoch)以满足足够的梯度下降次数(Step)。")
if np.sum(nums) == 0:
print("No objects were found in the dataset; make sure classes_path matches your own dataset and that the label names are correct, otherwise training will have no effect!")
print("No objects were found in the dataset; make sure classes_path matches your own dataset and that the label names are correct, otherwise training will have no effect!")
print("No objects were found in the dataset; make sure classes_path matches your own dataset and that the label names are correct, otherwise training will have no effect!")
print("(Said three times because it matters.)")