Unverified Commit 7b52bf69 authored by J Jason, committed by GitHub

Merge pull request #283 from FlyingQianMM/develop_draw

add multichannel RemoteSensing
......@@ -6,6 +6,7 @@ API Reference
transforms/index.rst
datasets.md
analysis.md
models/index.rst
slim.md
visualize.md
......
......@@ -3,8 +3,7 @@
## paddlex.seg.DeepLabv3p
```python
paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride=16, aspp_with_sep_conv=True, decoder_use_sep_conv=True, encoder_with_aspp=True, enable_decoder=True, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, pooling_crop_size=None)
paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride=16, aspp_with_sep_conv=True, decoder_use_sep_conv=True, encoder_with_aspp=True, enable_decoder=True, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, pooling_crop_size=None, input_channel=3)
```
> Builds a DeepLabv3p segmenter.
......@@ -23,6 +22,7 @@ paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride
> > - **class_weight** (list/str): Per-class weights for the cross-entropy loss. When `class_weight` is a list, its length should equal `num_classes`. When `class_weight` is a str, `weight.lower()` should be 'dynamic', in which case the weights are computed automatically from the per-class pixel ratios in each round, each class being weighted as: class ratio * num_classes. When `class_weight` takes its default value None, every class has weight 1, i.e. the ordinary cross-entropy loss.
> > - **ignore_index** (int): Value to ignore in the label; pixels whose label equals `ignore_index` do not contribute to the loss computation. Default 255.
> > - **pooling_crop_size** (int): When the backbone is `MobileNetV3_large_x1_0_ssld`, this must be set to the model input size used during training, in [W, H] format. For example, if the model input size is [512, 512], `pooling_crop_size` should be set to [512, 512]. It is used when taking the image-level average in the encoder module: if None, a plain mean is computed; if set to the model input size, the mean is computed with the `avg_pool` operator. Default None.
> > - **input_channel** (int): Number of input image channels. Default 3.
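For example, a model for multi-channel imagery can be built by raising `input_channel` (a minimal sketch; the 10-channel, 5-class setting mirrors the remote sensing case study in this PR):
```python
import paddlex as pdx

# DeepLabv3p reading 10-channel input instead of the default 3
model = pdx.seg.DeepLabv3p(
    num_classes=5, backbone='MobileNetV2_x1.0', input_channel=10)
```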
### train
......@@ -114,7 +114,7 @@ batch_predict(self, img_file_list, transforms=None):
## paddlex.seg.UNet
```python
paddlex.seg.UNet(num_classes=2, upsample_mode='bilinear', use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255)
paddlex.seg.UNet(num_classes=2, upsample_mode='bilinear', use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, input_channel=3)
```
> Builds a UNet segmenter.
......@@ -127,6 +127,7 @@ paddlex.seg.UNet(num_classes=2, upsample_mode='bilinear', use_bce_loss=False, us
> > - **use_dice_loss** (bool): Whether to use dice loss as the network's loss function. It can only be used for two-class segmentation and may be combined with bce loss. When both use_bce_loss and use_dice_loss are False, the cross-entropy loss is used. Default False.
> > - **class_weight** (list/str): Per-class weights for the cross-entropy loss. When `class_weight` is a list, its length should equal `num_classes`. When `class_weight` is a str, `weight.lower()` should be 'dynamic', in which case the weights are computed automatically from the per-class pixel ratios in each round, each class being weighted as: class ratio * num_classes. When `class_weight` takes its default value None, every class has weight 1, i.e. the ordinary cross-entropy loss.
> > - **ignore_index** (int): Value to ignore in the label; pixels whose label equals `ignore_index` do not contribute to the loss computation. Default 255.
> > - **input_channel** (int): Number of input image channels. Default 3.
> - train: identical to the [DeepLabv3p train API](#train)
> - evaluate: identical to the [DeepLabv3p evaluate API](#evaluate)
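A construction sketch matching the training script in this PR, which forwards the dataset's channel count to `input_channel`:
```python
import paddlex as pdx

# UNet for 5 classes on 10-channel remote sensing imagery
model = pdx.seg.UNet(num_classes=5, input_channel=10)
```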
......@@ -136,7 +137,7 @@ paddlex.seg.UNet(num_classes=2, upsample_mode='bilinear', use_bce_loss=False, us
## paddlex.seg.HRNet
```python
paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255)
paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, input_channel=3)
```
> Builds an HRNet segmenter.
......@@ -149,6 +150,7 @@ paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=Fal
> > - **use_dice_loss** (bool): Whether to use dice loss as the network's loss function. It can only be used for two-class segmentation and may be combined with bce loss. When both use_bce_loss and use_dice_loss are False, the cross-entropy loss is used. Default False.
> > - **class_weight** (list|str): Per-class weights for the cross-entropy loss. When `class_weight` is a list, its length should equal `num_classes`. When `class_weight` is a str, `weight.lower()` should be 'dynamic', in which case the weights are computed automatically from the per-class pixel ratios in each round, each class being weighted as: class ratio * num_classes. When `class_weight` takes its default value None, every class has weight 1, i.e. the ordinary cross-entropy loss.
> > - **ignore_index** (int): Value to ignore in the label; pixels whose label equals `ignore_index` do not contribute to the loss computation. Default 255.
> > - **input_channel** (int): Number of input image channels. Default 3.
> - train: identical to the [DeepLabv3p train API](#train)
> - evaluate: identical to the [DeepLabv3p evaluate API](#evaluate)
......@@ -158,7 +160,7 @@ paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=Fal
## paddlex.seg.FastSCNN
```python
paddlex.seg.FastSCNN(num_classes=2, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, multi_loss_weight=[1.0])
paddlex.seg.FastSCNN(num_classes=2, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255, multi_loss_weight=[1.0], input_channel=3)
```
> Builds a FastSCNN segmenter.
......@@ -171,6 +173,7 @@ paddlex.seg.FastSCNN(num_classes=2, use_bce_loss=False, use_dice_loss=False, cla
> > - **class_weight** (list/str): Per-class weights for the cross-entropy loss. When `class_weight` is a list, its length should equal `num_classes`. When `class_weight` is a str, `weight.lower()` should be 'dynamic', in which case the weights are computed automatically from the per-class pixel ratios in each round, each class being weighted as: class ratio * num_classes. When `class_weight` takes its default value None, every class has weight 1, i.e. the ordinary cross-entropy loss.
> > - **ignore_index** (int): Value to ignore in the label; pixels whose label equals `ignore_index` do not contribute to the loss computation. Default 255.
> > - **multi_loss_weight** (list): Loss weights for the network's branches. By default the loss is computed on a single branch, i.e. the default is [1.0]. Losses on two or three branches are also supported, with the weights ordered as [fusion_branch_weight, higher_branch_weight, lower_branch_weight]: fusion_branch_weight is the loss weight on the branch that fuses the spatial-detail and global-context branches, higher_branch_weight is the loss weight on the spatial-detail branch, and lower_branch_weight is the loss weight on the global-context branch. If higher_branch_weight and lower_branch_weight are not set, the losses on those two branches are not computed.
> > - **input_channel** (int): Number of input image channels. Default 3.
> - train: identical to the [DeepLabv3p train API](#train)
> - evaluate: identical to the [DeepLabv3p evaluate API](#evaluate)
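A sketch combining the branch weights with the new channel argument (the weight values here are illustrative, not recommendations):
```python
import paddlex as pdx

# Supervise the fusion, spatial-detail, and global-context branches
model = pdx.seg.FastSCNN(
    num_classes=5, multi_loss_weight=[1.0, 0.4, 0.4], input_channel=10)
```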
......
......@@ -78,16 +78,19 @@ paddlex.seg.transforms.ResizeStepScaling(min_scale_factor=0.75, max_scale_factor
## Normalize
```python
paddlex.seg.transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
paddlex.seg.transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], min_val=[0, 0, 0], max_val=[255.0, 255.0, 255.0])
```
Normalizes an image.
1. Pixel values are rescaled to the interval [0.0, 1.0].
2. The mean is subtracted and the result is divided by the standard deviation.
1. Subtract min_val from the pixel values.
2. Divide by (max_val - min_val) to rescale to the interval [0.0, 1.0].
3. Subtract the mean and divide by the standard deviation.
### Parameters
* **mean** (list): Mean of the image dataset. Default [0.5, 0.5, 0.5].
* **std** (list): Standard deviation of the image dataset. Default [0.5, 0.5, 0.5].
* **min_val** (list): Minimum values of the image dataset. Default [0, 0, 0].
* **max_val** (list): Maximum values of the image dataset. Default [255.0, 255.0, 255.0].
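For example, for 16-bit imagery the rescaling range can be widened from the default 255 to the uint16 value range (a minimal sketch; the statistics are placeholders):
```python
from paddlex.seg import transforms

# Rescale by the uint16 range instead of the default 255 before standardizing
normalize = transforms.Normalize(
    mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5],
    min_val=[0, 0, 0], max_val=[65535.0, 65535.0, 65535.0])
```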
## Padding
```python
......@@ -167,6 +170,16 @@ paddlex.seg.transforms.RandomDistort(brightness_range=0.5, brightness_prob=0.5,
* **hue_range** (int): Range of the hue factor. Default 18.
* **hue_prob** (float): Probability of randomly adjusting the hue. Default 0.5.
## Clip
```python
paddlex.seg.transforms.Clip(min_val=[0, 0, 0], max_val=[255.0, 255.0, 255.0])
```
Clips pixel values that fall outside a given range.
### Parameters
* **min_val** (list): Lower bound of the clip; values below min_val are set to min_val. Default [0, 0, 0].
* **max_val** (list): Upper bound of the clip; values above max_val are set to max_val. Default [255.0, 255.0, 255.0].
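Clip is typically placed immediately before Normalize so that outliers are truncated first and the same bounds are then reused for rescaling; a three-channel sketch with placeholder bounds:
```python
from paddlex.seg import transforms

train_transforms = transforms.Compose([
    transforms.Clip(min_val=[7172, 6561, 5777], max_val=[50000, 50000, 50000]),
    transforms.Normalize(
        mean=[0.15, 0.15, 0.16], std=[0.09, 0.10, 0.10],
        min_val=[7172, 6561, 5777], max_val=[50000, 50000, 50000]),
])
```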
<!--
## ComposedSegTransforms
```python
......
......@@ -27,7 +27,7 @@ pdx.det.visualize('./xiaoduxiong_epoch_12/xiaoduxiong.jpeg', result, save_dir='.
## paddlex.seg.visualize
> **Visualize semantic segmentation predictions**
```
paddlex.seg.visualize(image, result, weight=0.6, save_dir='./')
paddlex.seg.visualize(image, result, weight=0.6, save_dir='./', color=None)
```
Overlays the mask predicted by a semantic segmentation model on the original image.
......@@ -36,6 +36,7 @@ paddlex.seg.visualize(image, result, weight=0.6, save_dir='./')
> * **result** (dict): Model prediction result.
> * **weight** (float): Blending weight between the visualized mask and the original image; weight is the weight of the original image. Default 0.6.
> * **save_dir** (str): Directory in which to save the visualization. If None, nothing is saved and the function returns the visualization as an np.ndarray; if set to a directory path, the result is saved there. Default './'.
> * **color** (list): List of BGR color values for each class, three values per class. For example, with two classes it can be set to [255, 255, 255, 0, 0, 0]. Default None, in which case an automatically generated color list is used.
### Usage example
> Click to download the [model](https://bj.bcebos.com/paddlex/models/cityscape_deeplab.tar.gz) and [test image](https://bj.bcebos.com/paddlex/datasets/city.png) used in the example below
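> The truncated example presumably proceeds along these lines (a sketch; the extracted file and directory names are assumed):
```python
import paddlex as pdx

model = pdx.load_model('cityscape_deeplab')
result = model.predict('city.png')
# Override the default palette for the first two classes with BGR values
pdx.seg.visualize(
    'city.png', result, weight=0.6, save_dir='./',
    color=[255, 255, 255, 0, 0, 0])
```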
......
......@@ -12,3 +12,4 @@ PaddleX curates mature model architectures from the PaddlePaddle vision development kits as proven in industrial practice,
solutions.md
meter_reader.md
human_segmentation.md
multi-channel_remote_sensing/README.md
# Multi-channel remote sensing image segmentation
Remote sensing image segmentation is an important application of image segmentation, widely used in land surveying, environmental monitoring, urban construction, and other fields. Its targets are diverse: ground objects such as snow, crops, roads, buildings, and water sources, as well as aerial targets such as clouds.
This case study implements multi-channel remote sensing image segmentation with PaddleX, covering data analysis, model training, and model prediction, and aims to help users apply deep learning to multi-channel remote sensing segmentation problems.
## Prerequisites
* PaddlePaddle >= 1.8.4
* Python >= 3.5
* PaddleX >= 1.1.0
For installation issues, see [PaddleX installation](../../install.md).
**gdal must also be installed.** Installing gdal with pip may fail; installing with conda is recommended:
```
conda install gdal
```
Download the PaddleX source:
```
git clone https://github.com/PaddlePaddle/PaddleX
```
All scripts for this case are located in `PaddleX/examples/multi-channel_remote_sensing/`; change into that directory:
```
cd PaddleX/examples/multi-channel_remote_sensing/
```
## Data preparation
Remote sensing images come in many formats, and different sensors can produce different data formats. PaddleX currently reads images in the following 4 formats:
- `tif`
- `png`
- `img`
- `npy`
Annotation images must be single-channel png images whose pixel values are the class ids, starting at 0 and increasing consecutively. For example, 0, 1, 2, 3 indicate 4 classes. The value 255 marks pixels that do not take part in training and evaluation; at most 256 classes are supported.
This case uses the [L8 SPARCS public dataset](https://www.usgs.gov/land-resources/nli/landsat/spatial-procedures-automated-removal-cloud-and-shadow-sparcs-validation) for cloud and snow segmentation. The dataset contains 80 satellite images covering 10 bands. The original annotations contain 7 classes: `cloud`, `cloud shadow`, `shadow over water`, `snow/ice`, `water`, `land`, and `flooded`. Since `flooded` and `shadow over water` account for only `1.8%` and `0.24%` of the pixels respectively, we merge them: `flooded` into `land` and `shadow over water` into `shadow`, leaving 5 classes after merging.
Pixel value / class / color mapping:
|Pixel value|Class|Color|
|---|---|---|
|0|cloud|white|
|1|shadow|black|
|2|snow/ice|cyan|
|3|water|blue|
|4|land|grey|
![](../../../examples/multi-channel_remote_sensing/docs/images/dataset.png)
Run the following commands to download and unzip the dataset with the merged classes:
```shell script
mkdir dataset && cd dataset
wget https://paddleseg.bj.bcebos.com/dataset/remote_sensing_seg.zip
unzip remote_sensing_seg.zip
cd ..
```
The `data` directory holds the remote sensing images, `data_vis` holds color-composite preview images, and `mask` holds the annotation images.
## Data analysis
Remote sensing images often consist of many bands, and the distributions of different bands can differ drastically; visible-light and thermal-infrared bands, for example, are distributed very differently. To better understand the data distribution and thereby improve training, the data should be analyzed first.
Follow the [data analysis](./analysis.md) document to run statistics on the training set, determine the pixel-value clipping range, and compute the mean and standard deviation after clipping.
## Model training
This case trains the `UNet` semantic segmentation model for cloud and snow segmentation; run the steps below to train it. The model's best `miou` is `78.38%`.
* Set the GPU id
```shell script
export CUDA_VISIBLE_DEVICES=0
```
* Run the following script to start training
```shell script
python train.py --data_dir dataset/remote_sensing_seg \
--train_file_list dataset/remote_sensing_seg/train.txt \
--eval_file_list dataset/remote_sensing_seg/val.txt \
--label_list dataset/remote_sensing_seg/labels.txt \
--save_dir saved_model/remote_sensing_unet \
--num_classes 5 \
--channel 10 \
--lr 0.01 \
--clip_min_value 7172 6561 5777 5103 4291 4000 4000 4232 6934 7199 \
--clip_max_value 50000 50000 50000 50000 50000 40000 30000 18000 40000 36000 \
--mean 0.15163569 0.15142828 0.15574491 0.1716084 0.2799778 0.27652043 0.28195933 0.07853807 0.56333154 0.5477584 \
--std 0.09301891 0.09818967 0.09831126 0.1057784 0.10842132 0.11062996 0.12791838 0.02637859 0.0675052 0.06168227 \
--num_epochs 500 \
--train_batch_size 3
```
Alternatively, skip the training step and download a pretrained model to run prediction directly:
```
wget https://bj.bcebos.com/paddlex/examples/multi-channel_remote_sensing/models/l8sparcs_remote_model.tar.gz
tar -xvf l8sparcs_remote_model.tar.gz
```
## Model prediction
Run the following script to predict on a remote sensing image and visualize the prediction; the corresponding annotation file is visualized as well so the results can be compared.
```shell script
export CUDA_VISIBLE_DEVICES=0
python predict.py
```
The visualization looks like this:
![](../../../examples/multi-channel_remote_sensing/docs/images/prediction.jpg)
Pixel value / class / color mapping:
|Pixel value|Class|Color|
|---|---|---|
|0|cloud|white|
|1|shadow|black|
|2|snow/ice|cyan|
|3|water|blue|
|4|land|grey|
# Multi-channel remote sensing image segmentation
Remote sensing image segmentation is an important application of image segmentation, widely used in land surveying, environmental monitoring, urban construction, and other fields. Its targets are diverse: ground objects such as snow, crops, roads, buildings, and water sources, as well as aerial targets such as clouds.
This case study implements multi-channel remote sensing image segmentation with PaddleX, covering data analysis, model training, and model prediction, and aims to help users apply deep learning to multi-channel remote sensing segmentation problems.
## Contents
* [Prerequisites](#1)
* [Data preparation](#2)
* [Data analysis](#3)
* [Model training](#4)
* [Model prediction](#5)
## <h2 id="1">Prerequisites</h2>
* PaddlePaddle >= 1.8.4
* Python >= 3.5
* PaddleX >= 1.1.0
For installation issues, see [PaddleX installation](../../docs/install.md).
**gdal must also be installed.** Installing gdal with pip may fail; installing with conda is recommended:
```
conda install gdal
```
Download the PaddleX source:
```
git clone https://github.com/PaddlePaddle/PaddleX
```
All scripts for this case are located in `PaddleX/examples/multi-channel_remote_sensing/`; change into that directory:
```
cd PaddleX/examples/multi-channel_remote_sensing/
```
## <h2 id="2">数据准备</h2>
遥感影像的格式多种多样,不同传感器产生的数据格式也可能不同。PaddleX现已兼容以下4种格式图片读取:
- `tif`
- `png`
- `img`
- `npy`
Annotation images must be single-channel png images whose pixel values are the class ids, starting at 0 and increasing consecutively. For example, 0, 1, 2, 3 indicate 4 classes. The value 255 marks pixels that do not take part in training and evaluation; at most 256 classes are supported.
This case uses the [L8 SPARCS public dataset](https://www.usgs.gov/land-resources/nli/landsat/spatial-procedures-automated-removal-cloud-and-shadow-sparcs-validation) for cloud and snow segmentation. The dataset contains 80 satellite images covering 10 bands. The original annotations contain 7 classes: `cloud`, `cloud shadow`, `shadow over water`, `snow/ice`, `water`, `land`, and `flooded`. Since `flooded` and `shadow over water` account for only `1.8%` and `0.24%` of the pixels respectively, we merge them: `flooded` into `land` and `shadow over water` into `shadow`, leaving 5 classes after merging.
Pixel value / class / color mapping:
|Pixel value|Class|Color|
|---|---|---|
|0|cloud|white|
|1|shadow|black|
|2|snow/ice|cyan|
|3|water|blue|
|4|land|grey|
<p align="center">
<img src="./docs/images/dataset.png" align="middle"
</p>
<p align='center'>
L8 SPARCS数据集示例
</p>
Run the following commands to download and unzip the dataset with the merged classes:
```shell script
mkdir dataset && cd dataset
wget https://paddleseg.bj.bcebos.com/dataset/remote_sensing_seg.zip
unzip remote_sensing_seg.zip
cd ..
```
The `data` directory holds the remote sensing images, `data_vis` holds color-composite preview images, and `mask` holds the annotation images.
## <h2 id="3">Data analysis</h2>
Remote sensing images often consist of many bands, and the distributions of different bands can differ drastically; visible-light and thermal-infrared bands, for example, are distributed very differently. To better understand the data distribution and thereby improve training, the data should be analyzed first.
Follow the [data analysis](./docs/analysis.md) document to run statistics on the training set, determine the pixel-value clipping range, and compute the mean and standard deviation after clipping.
## <h2 id="4">Model training</h2>
This case trains the `UNet` semantic segmentation model for cloud and snow segmentation; run the steps below to train it. The model's best `miou` is `78.38%`.
* Set the GPU id
```shell script
export CUDA_VISIBLE_DEVICES=0
```
* Run the following script to start training
```shell script
python train.py --data_dir dataset/remote_sensing_seg \
--train_file_list dataset/remote_sensing_seg/train.txt \
--eval_file_list dataset/remote_sensing_seg/val.txt \
--label_list dataset/remote_sensing_seg/labels.txt \
--save_dir saved_model/remote_sensing_unet \
--num_classes 5 \
--channel 10 \
--lr 0.01 \
--clip_min_value 7172 6561 5777 5103 4291 4000 4000 4232 6934 7199 \
--clip_max_value 50000 50000 50000 50000 50000 40000 30000 18000 40000 36000 \
--mean 0.15163569 0.15142828 0.15574491 0.1716084 0.2799778 0.27652043 0.28195933 0.07853807 0.56333154 0.5477584 \
--std 0.09301891 0.09818967 0.09831126 0.1057784 0.10842132 0.11062996 0.12791838 0.02637859 0.0675052 0.06168227 \
--num_epochs 500 \
--train_batch_size 3
```
Alternatively, skip the training step and download a pretrained model to run prediction directly:
```
wget https://bj.bcebos.com/paddlex/examples/multi-channel_remote_sensing/models/l8sparcs_remote_model.tar.gz
tar -xvf l8sparcs_remote_model.tar.gz
```
## <h2 id="2">模型预测</h2>
运行以下脚本,对遥感图像进行预测并可视化预测结果,相应地也将对应的标注文件进行可视化,以比较预测效果。
```shell script
export CUDA_VISIBLE_DEVICES=0
python predict.py
```
The visualization looks like this:
<img src="./docs/images/prediction.jpg" alt="prediction" align=center />
Pixel value / class / color mapping:
|Pixel value|Class|Color|
|---|---|---|
|0|cloud|white|
|1|shadow|black|
|2|snow/ice|cyan|
|3|water|blue|
|4|land|grey|
# Data analysis
Remote sensing images often consist of many bands, and the distributions of different bands can differ drastically; visible-light and thermal-infrared bands, for example, are distributed very differently. To better understand the data distribution and thereby improve training, the data should be analyzed first.
## Contents
* [1. Statistical analysis](#1)
* [2. Determine the pixel-value clipping range](#2)
* [3. Compute the mean and standard deviation after clipping](#3)
## <h2 id="1">1 Statistical analysis</h2>
Run the following script to analyze the training set. The results are printed to the screen and also saved to `train_information.pkl`:
```
python tools/analysis.py
```
The analysis covers:
* Number of images
For example, the training set contains 64 images:
```
64 samples in file dataset/remote_sensing_seg/train.txt
```
* Maximum and minimum image sizes
For example, the largest and smallest (height, width) in the training set are both (1000, 1000):
```
Minimal image height: 1000 Minimal image width: 1000.
Maximal image height: 1000 Maximal image width: 1000.
```
* Number of image channels
For example, the images have 10 channels:
```
Image channel is 10.
```
* Per-channel minimum and maximum values
The minimums and maximums are printed as lists, ordered by channel index. For example:
```
Minimal image value: [7.172e+03 6.561e+03 5.777e+03 5.103e+03 4.291e+03 1.000e+00 1.000e+00 4.232e+03 6.934e+03 7.199e+03]
Maximal image value: [65535. 65535. 65535. 65535. 65535. 65535. 65535. 56534. 65535. 63215.]
```
* Per-channel pixel value distribution
For each channel, the count of every pixel value is computed and rendered as a bar chart in images whose names end with 'distribute.png'. **Note that the y-axis is logarithmic for readability.** Inspect these images to decide whether the head and tail of the distribution should be clipped.
```
Image pixel distribution of each channel is saved with 'distribute.png' in the dataset/remote_sensing_seg
```
* Per-channel normalized mean and standard deviation
The normalization factor for each channel is the difference between its maximum and minimum value. The means and standard deviations are printed as lists, ordered by channel index. For example:
```
Image mean value: [0.23417574 0.22283101 0.2119595 0.2119887 0.27910388 0.21294892 0.17294037 0.10158925 0.43623915 0.41019192]
Image standard deviation: [0.06831269 0.07243951 0.07284761 0.07875261 0.08120818 0.0609302 0.05110716 0.00696064 0.03849307 0.03205579]
```
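That is, for each channel c (this mirrors how the analysis tool divides the raw statistics by the channel's value range):
```
normalized_mean[c] = raw_mean[c] / (max_image_value[c] - min_image_value[c])
normalized_std[c]  = raw_std[c]  / (max_image_value[c] - min_image_value[c])
```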
* Class counts and ratios in the annotations
The pixel count of each class and its share of all pixels in the dataset are printed in the format (label_id, the number of label_id, the ratio of label_id). For example:
```
Label pixel information is shown in a format of (label_id, the number of label_id, the ratio of label_id):
(0, 13302870, 0.20785734374999995)
(1, 4577005, 0.07151570312500002)
(2, 3955012, 0.0617970625)
(3, 2814243, 0.04397254687499999)
(4, 39350870, 0.6148573437500001)
```
## <h2 id="2">2 确定像素值截断范围</h2>
遥感影像数据分布范围广,往往存在一些异常值,这会影响算法对实际数据分布的拟合效果。为更好地对数据进行归一化,可以抑制遥感影像中少量的异常值。根据`图像各通道的像素值分布`来确定像素值的截断范围,并在后续图像预处理过程中对超出范围的像素值通过截断进行校正,从而去除异常值带来的干扰。**注意:该步骤是否执行根据数据集实际分布来决定。**
例如各通道的像素值分布可视化效果如下:
<img src="./images/image_pixel_distribution.png" width = "600" height = "600" alt="像素值分布图" align=center />
对于上述分布,我们选取的截断范围是(按照通道从小到大排列):
```
Clipping minimums: clip_min_value = [7172, 6561, 5777, 5103, 4291, 4000, 4000, 4232, 6934, 7199]
Clipping maximums: clip_max_value = [50000, 50000, 50000, 50000, 50000, 40000, 30000, 18000, 40000, 36000]
```
## <h2 id="3">3 确定像素值截断范围</h2>
为避免数据截断范围选取不当带来的影响,应该统计异常值像素占比,确保受影响的像素比例不要过高。接着对截断后的数据计算归一化后的均值和方差,**用于后续模型训练时的图像预处理参数设置**
执行以下脚本:
```
python tools/cal_clipped_mean_std.py
```
The clipped-pixel ratios are:
```
Channel 0, the ratio of pixels to be clipped = 0.00054778125
Channel 1, the ratio of pixels to be clipped = 0.0011129375
Channel 2, the ratio of pixels to be clipped = 0.000843703125
Channel 3, the ratio of pixels to be clipped = 0.00127125
Channel 4, the ratio of pixels to be clipped = 0.001330140625
Channel 5, the ratio of pixels to be clipped = 8.1375e-05
Channel 6, the ratio of pixels to be clipped = 0.0007348125
Channel 7, the ratio of pixels to be clipped = 6.5625e-07
Channel 8, the ratio of pixels to be clipped = 0.000185921875
Channel 9, the ratio of pixels to be clipped = 0.000139671875
```
As the numbers show, no channel has more than 0.2% of its pixels clipped.
The normalization statistics of the clipped data are:
```
Image mean value: [0.15163569 0.15142828 0.15574491 0.1716084 0.2799778 0.27652043 0.28195933 0.07853807 0.56333154 0.5477584 ]
Image standard deviation: [0.09301891 0.09818967 0.09831126 0.1057784 0.10842132 0.11062996 0.12791838 0.02637859 0.0675052 0.06168227]
(normalized by (clip_max_value - clip_min_value), arranged in 0-10 channel order)
```
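# Prediction script (referenced above as predict.py): predicts on one remote sensing
# image and visualizes both the prediction and the ground-truth mask.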
import numpy as np
from PIL import Image
import paddlex as pdx
model_dir = "saved_model/remote_sensing_unet/best_model/"
img_file = "dataset/remote_sensing_seg/data/LC80150242014146LGN00_23_data.tif"
label_file = "dataset/remote_sensing_seg/mask/LC80150242014146LGN00_23_mask.png"
color = [255, 255, 255, 0, 0, 0, 255, 255, 0, 255, 0, 0, 150, 150, 150]
# Predict and visualize the prediction
model = pdx.load_model(model_dir)
pred = model.predict(img_file)
pdx.seg.visualize(
img_file, pred, weight=0., save_dir='./output/pred', color=color)
# Visualize the annotation file
label = np.asarray(Image.open(label_file))
pred = {'label_map': label}
pdx.seg.visualize(
img_file, pred, weight=0., save_dir='./output/gt', color=color)
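# Statistics script (referenced above as tools/analysis.py): runs the per-channel
# analysis over the training set and saves the results to a pickle file.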
import paddlex as pdx
train_analysis = pdx.datasets.analysis.Seg(
data_dir='dataset/remote_sensing_seg',
file_list='dataset/remote_sensing_seg/train.txt',
label_list='dataset/remote_sensing_seg/labels.txt')
train_analysis.analysis()
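# Clipped-statistics script (referenced above as tools/cal_clipped_mean_std.py):
# computes the normalized mean/std after clipping, reusing the saved analysis results.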
import paddlex as pdx
clip_min_value = [7172, 6561, 5777, 5103, 4291, 4000, 4000, 4232, 6934, 7199]
clip_max_value = [
50000, 50000, 50000, 50000, 50000, 40000, 30000, 18000, 40000, 36000
]
data_info_file = 'dataset/remote_sensing_seg/train_information.pkl'
train_analysis = pdx.datasets.analysis.Seg(
data_dir='dataset/remote_sensing_seg',
file_list='dataset/remote_sensing_seg/train.txt',
label_list='dataset/remote_sensing_seg/labels.txt')
train_analysis.cal_clipped_mean_std(clip_min_value, clip_max_value,
data_info_file)
# coding: utf8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path as osp
import argparse
from paddlex.seg import transforms
import paddlex as pdx
def parse_args():
parser = argparse.ArgumentParser(description='RemoteSensing training')
parser.add_argument(
'--data_dir',
dest='data_dir',
help='dataset directory',
default=None,
type=str)
parser.add_argument(
'--train_file_list',
dest='train_file_list',
help='train file_list',
default=None,
type=str)
parser.add_argument(
'--eval_file_list',
dest='eval_file_list',
help='eval file_list',
default=None,
type=str)
parser.add_argument(
'--label_list',
dest='label_list',
help='label_list file',
default=None,
type=str)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='model save directory',
default=None,
type=str)
parser.add_argument(
'--num_classes',
dest='num_classes',
help='Number of classes',
default=None,
type=int)
parser.add_argument(
'--channel',
dest='channel',
help='number of data channel',
default=3,
type=int)
parser.add_argument(
'--clip_min_value',
dest='clip_min_value',
help='Min values for clipping data',
nargs='+',
default=None,
type=int)
parser.add_argument(
'--clip_max_value',
dest='clip_max_value',
help='Max values for clipping data',
nargs='+',
default=None,
type=int)
parser.add_argument(
'--mean',
dest='mean',
help='Data means',
nargs='+',
default=None,
type=float)
parser.add_argument(
'--std',
dest='std',
help='Data standard deviation',
nargs='+',
default=None,
type=float)
parser.add_argument(
'--num_epochs',
dest='num_epochs',
        help='number of training epochs',
default=100,
type=int)
parser.add_argument(
'--train_batch_size',
dest='train_batch_size',
help='training batch size',
default=4,
type=int)
parser.add_argument(
'--lr', dest='lr', help='learning rate', default=0.01, type=float)
return parser.parse_args()
args = parse_args()
data_dir = args.data_dir
train_list = args.train_file_list
val_list = args.eval_file_list
label_list = args.label_list
save_dir = args.save_dir
num_classes = args.num_classes
channel = args.channel
clip_min_value = args.clip_min_value
clip_max_value = args.clip_max_value
mean = args.mean
std = args.std
num_epochs = args.num_epochs
train_batch_size = args.train_batch_size
lr = args.lr
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.RandomVerticalFlip(0.5),
transforms.RandomHorizontalFlip(0.5),
transforms.ResizeStepScaling(0.5, 2.0, 0.25),
transforms.RandomPaddingCrop(im_padding_value=[1000] * channel),
transforms.Clip(
min_val=clip_min_value, max_val=clip_max_value),
transforms.Normalize(
min_val=clip_min_value, max_val=clip_max_value, mean=mean, std=std),
])
eval_transforms = transforms.Compose([
transforms.Clip(
min_val=clip_min_value, max_val=clip_max_value),
transforms.Normalize(
min_val=clip_min_value, max_val=clip_max_value, mean=mean, std=std),
])
train_dataset = pdx.datasets.SegDataset(
data_dir=data_dir,
file_list=train_list,
label_list=label_list,
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir=data_dir,
file_list=val_list,
label_list=label_list,
transforms=eval_transforms)
model = pdx.seg.UNet(num_classes=num_classes, input_channel=channel)
model.train(
num_epochs=num_epochs,
train_dataset=train_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=5,
log_interval_steps=10,
save_dir=save_dir,
learning_rate=lr,
use_vdl=True)
......@@ -32,6 +32,7 @@ from . import slim
from . import convertor
from . import tools
from . import deploy
from . import remotesensing
try:
import pycocotools
......
......@@ -20,3 +20,4 @@ from .easydata_cls import EasyDataCls
from .easydata_det import EasyDataDet
from .easydata_seg import EasyDataSeg
from .dataset import generate_minibatch
from .analysis import Seg
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
import numpy as np
import os.path as osp
import cv2
from PIL import Image
import pickle
import threading
import multiprocessing as mp
import paddlex.utils.logging as logging
from paddlex.utils import path_normalization
from paddlex.cv.transforms.seg_transforms import Compose
from .dataset import get_encoding
class Seg:
def __init__(self, data_dir, file_list, label_list):
self.data_dir = data_dir
self.file_list_path = file_list
self.file_list = list()
self.labels = list()
with open(label_list, encoding=get_encoding(label_list)) as f:
for line in f:
item = line.strip()
self.labels.append(item)
with open(file_list, encoding=get_encoding(file_list)) as f:
for line in f:
items = line.strip().split()
if len(items) > 2:
raise Exception(
"A space is defined as the separator, but it exists in image or label name {}."
.format(line))
items[0] = path_normalization(items[0])
items[1] = path_normalization(items[1])
full_path_im = osp.join(data_dir, items[0])
full_path_label = osp.join(data_dir, items[1])
                if not osp.exists(full_path_im):
                    raise IOError('The image file {} does not exist!'.format(
                        full_path_im))
                if not osp.exists(full_path_label):
                    raise IOError('The label file {} does not exist!'.format(
                        full_path_label))
self.file_list.append([full_path_im, full_path_label])
self.num_samples = len(self.file_list)
def _get_shape(self):
max_height = max(self.im_height_list)
max_width = max(self.im_width_list)
min_height = min(self.im_height_list)
min_width = min(self.im_width_list)
shape_info = {
'max_height': max_height,
'max_width': max_width,
'min_height': min_height,
'min_width': min_width,
}
return shape_info
def _get_label_pixel_info(self):
pixel_num = np.dot(self.im_height_list, self.im_width_list)
label_pixel_info = dict()
for label_value, label_value_num in zip(self.label_value_list,
self.label_value_num_list):
for v, n in zip(label_value, label_value_num):
if v not in label_pixel_info.keys():
label_pixel_info[v] = [n, float(n) / float(pixel_num)]
else:
label_pixel_info[v][0] += n
label_pixel_info[v][1] += float(n) / float(pixel_num)
return label_pixel_info
def _get_image_pixel_info(self):
channel = max([len(im_value) for im_value in self.im_value_list])
im_pixel_info = [dict() for c in range(channel)]
for im_value, im_value_num in zip(self.im_value_list,
self.im_value_num_list):
for c in range(channel):
for v, n in zip(im_value[c], im_value_num[c]):
if v not in im_pixel_info[c].keys():
im_pixel_info[c][v] = n
else:
im_pixel_info[c][v] += n
return im_pixel_info
def _get_mean_std(self):
im_mean = np.asarray(self.im_mean_list)
im_mean = im_mean.sum(axis=0)
im_mean = im_mean / len(self.file_list)
im_mean /= self.max_im_value - self.min_im_value
im_std = np.asarray(self.im_std_list)
im_std = im_std.sum(axis=0)
im_std = im_std / len(self.file_list)
im_std /= self.max_im_value - self.min_im_value
return (im_mean, im_std)
def _get_image_info(self, start, end):
for id in range(start, end):
full_path_im, full_path_label = self.file_list[id]
image, label = Compose.decode_image(full_path_im, full_path_label)
height, width, channel = image.shape
self.im_height_list[id] = height
self.im_width_list[id] = width
self.im_channel_list[id] = channel
self.im_mean_list[
id] = [image[:, :, c].mean() for c in range(channel)]
self.im_std_list[
id] = [image[:, :, c].std() for c in range(channel)]
for c in range(channel):
unique, counts = np.unique(image[:, :, c], return_counts=True)
self.im_value_list[id].extend([unique])
self.im_value_num_list[id].extend([counts])
unique, counts = np.unique(label, return_counts=True)
self.label_value_list[id] = unique
self.label_value_num_list[id] = counts
def _get_clipped_mean_std(self, start, end, clip_min_value,
clip_max_value):
for id in range(start, end):
full_path_im, full_path_label = self.file_list[id]
image, label = Compose.decode_image(full_path_im, full_path_label)
for c in range(self.channel_num):
np.clip(
image[:, :, c],
clip_min_value[c],
clip_max_value[c],
out=image[:, :, c])
image[:, :, c] -= clip_min_value[c]
image[:, :, c] /= clip_max_value[c] - clip_min_value[c]
self.clipped_im_mean_list[id] = [
image[:, :, c].mean() for c in range(self.channel_num)
]
self.clipped_im_std_list[
id] = [image[:, :, c].std() for c in range(self.channel_num)]
def analysis(self):
self.im_mean_list = [[] for i in range(len(self.file_list))]
self.im_std_list = [[] for i in range(len(self.file_list))]
self.im_value_list = [[] for i in range(len(self.file_list))]
self.im_value_num_list = [[] for i in range(len(self.file_list))]
self.im_height_list = np.zeros(len(self.file_list), dtype='int32')
self.im_width_list = np.zeros(len(self.file_list), dtype='int32')
self.im_channel_list = np.zeros(len(self.file_list), dtype='int32')
self.label_value_list = [[] for i in range(len(self.file_list))]
self.label_value_num_list = [[] for i in range(len(self.file_list))]
num_workers = mp.cpu_count() // 2 if mp.cpu_count() // 2 < 8 else 8
threads = []
one_worker_file = len(self.file_list) // num_workers
for i in range(num_workers):
start = one_worker_file * i
end = one_worker_file * (
i + 1) if i < num_workers - 1 else len(self.file_list)
t = threading.Thread(
target=self._get_image_info, args=(start, end))
threads.append(t)
for t in threads:
t.start()
for t in threads:
t.join()
unique, counts = np.unique(self.im_channel_list, return_counts=True)
if len(unique) > 1:
raise Exception("There are {} kinds of image channels: {}.".format(
len(unique), unique[:]))
self.channel_num = unique[0]
shape_info = self._get_shape()
self.max_height = shape_info['max_height']
self.max_width = shape_info['max_width']
self.min_height = shape_info['min_height']
self.min_width = shape_info['min_width']
self.label_pixel_info = self._get_label_pixel_info()
self.im_pixel_info = self._get_image_pixel_info()
mode = osp.split(self.file_list_path)[-1].split('.')[0]
import matplotlib.pyplot as plt
for c in range(self.channel_num):
plt.figure()
plt.bar(self.im_pixel_info[c].keys(),
self.im_pixel_info[c].values(),
width=1,
log=True)
plt.xlabel('image pixel value')
plt.ylabel('number')
plt.title('channel={}'.format(c))
plt.savefig(
osp.join(self.data_dir,
'{}_channel{}_distribute.png'.format(mode, c)),
dpi=100)
plt.close()
max_im_value = list()
min_im_value = list()
for c in range(self.channel_num):
max_im_value.append(max(self.im_pixel_info[c].keys()))
min_im_value.append(min(self.im_pixel_info[c].keys()))
self.max_im_value = np.asarray(max_im_value)
self.min_im_value = np.asarray(min_im_value)
im_mean, im_std = self._get_mean_std()
info = {
'channel_num': self.channel_num,
'image_pixel': self.im_pixel_info,
'label_pixel': self.label_pixel_info,
'file_num': len(self.file_list),
'max_height': self.max_height,
'max_width': self.max_width,
'min_height': self.min_height,
'min_width': self.min_width,
'max_image_value': self.max_im_value,
'min_image_value': self.min_im_value
}
        saved_pkl_file = osp.join(self.data_dir,
                                  '{}_information.pkl'.format(mode))
        with open(saved_pkl_file, 'wb') as f:
            pickle.dump(info, f)
logging.info(
"############## The analysis results are as follows ##############\n"
)
logging.info("{} samples in file {}\n".format(
len(self.file_list), self.file_list_path))
logging.info("Minimal image height: {} Minimal image width: {}.\n".
format(self.min_height, self.min_width))
logging.info("Maximal image height: {} Maximal image width: {}.\n".
format(self.max_height, self.max_width))
logging.info("Image channel is {}.\n".format(self.channel_num))
logging.info(
"Minimal image value: {} Maximal image value: {} (arranged in 0-{} channel order) \n".
format(self.min_im_value, self.max_im_value, self.channel_num))
logging.info(
"Image pixel distribution of each channel is saved with 'distribute.png' in the {}"
.format(self.data_dir))
logging.info(
"Image mean value: {} Image standard deviation: {} (normalized by the (max_im_value - min_im_value), arranged in 0-{} channel order).\n".
format(im_mean, im_std, self.channel_num))
logging.info(
"Label pixel information is shown in a format of (label_id, the number of label_id, the ratio of label_id):"
)
for v, (n, r) in self.label_pixel_info.items():
logging.info("({}, {}, {})".format(v, n, r))
logging.info("Dataset information is saved in {}".format(
saved_pkl_file))
def cal_clipped_mean_std(self, clip_min_value, clip_max_value,
data_info_file):
with open(data_info_file, 'rb') as f:
im_info = pickle.load(f)
channel_num = im_info['channel_num']
min_im_value = im_info['min_image_value']
max_im_value = im_info['max_image_value']
im_pixel_info = im_info['image_pixel']
if len(clip_min_value) != channel_num or len(
clip_max_value) != channel_num:
raise Exception(
"The length of clip_min_value or clip_max_value should be equal to the number of image channel {}."
                .format(channel_num))
for c in range(channel_num):
if clip_min_value[c] < min_im_value[c] or clip_min_value[
c] > max_im_value[c]:
raise Exception(
"Clip_min_value of the channel {} is not in [{}, {}]".
format(c, min_im_value[c], max_im_value[c]))
if clip_max_value[c] < min_im_value[c] or clip_max_value[
c] > max_im_value[c]:
raise Exception(
"Clip_max_value of the channel {} is not in [{}, {}]".
                    format(c, min_im_value[c], max_im_value[c]))
self.clipped_im_mean_list = [[] for i in range(len(self.file_list))]
self.clipped_im_std_list = [[] for i in range(len(self.file_list))]
num_workers = mp.cpu_count() // 2 if mp.cpu_count() // 2 < 8 else 8
threads = []
one_worker_file = len(self.file_list) // num_workers
self.channel_num = channel_num
for i in range(num_workers):
start = one_worker_file * i
end = one_worker_file * (
i + 1) if i < num_workers - 1 else len(self.file_list)
t = threading.Thread(
target=self._get_clipped_mean_std,
args=(start, end, clip_min_value, clip_max_value))
threads.append(t)
for t in threads:
t.start()
for t in threads:
t.join()
im_mean = np.asarray(self.clipped_im_mean_list)
im_mean = im_mean.sum(axis=0)
im_mean = im_mean / len(self.file_list)
im_std = np.asarray(self.clipped_im_std_list)
im_std = im_std.sum(axis=0)
im_std = im_std / len(self.file_list)
for c in range(channel_num):
clip_pixel_num = 0
pixel_num = sum(im_pixel_info[c].values())
for v, n in im_pixel_info[c].items():
if v < clip_min_value[c] or v > clip_max_value[c]:
clip_pixel_num += n
logging.info("Channel {}, the ratio of pixels to be clipped = {}".
format(c, clip_pixel_num / pixel_num))
logging.info(
"Image mean value: {} Image standard deviation: {} (normalized by (clip_max_value - clip_min_value), arranged in 0-{} channel order).\n".
format(im_mean, im_std, self.channel_num))
......@@ -67,6 +67,10 @@ class ImageNet(Dataset):
with open(file_list, encoding=get_encoding(file_list)) as f:
for line in f:
items = line.strip().split()
                if len(items) > 2:
raise Exception(
"A space is defined as the separator, but it exists in image or label name {}."
.format(line))
items[0] = path_normalization(items[0])
if not is_pic(items[0]):
continue
......
......@@ -20,7 +20,6 @@ import paddlex.utils.logging as logging
from paddlex.utils import path_normalization
from .dataset import Dataset
from .dataset import get_encoding
from .dataset import is_pic
class SegDataset(Dataset):
......@@ -65,10 +64,12 @@ class SegDataset(Dataset):
with open(file_list, encoding=get_encoding(file_list)) as f:
for line in f:
items = line.strip().split()
if len(items) > 2:
raise Exception(
"A space is defined as the separator, but it exists in image or label name {}."
.format(line))
items[0] = path_normalization(items[0])
items[1] = path_normalization(items[1])
if not is_pic(items[0]):
continue
full_path_im = osp.join(data_dir, items[0])
full_path_label = osp.join(data_dir, items[1])
if not osp.exists(full_path_im):
......
......@@ -91,6 +91,10 @@ class VOCDetection(Dataset):
line = fr.readline()
if not line:
break
if len(line.strip().split()) > 2:
raise Exception(
"A space is defined as the separator, but it exists in image or label name {}."
.format(line))
img_file, xml_file = [osp.join(data_dir, x) \
for x in line.strip().split()[:2]]
img_file = path_normalization(img_file)
......
......@@ -54,6 +54,8 @@ class DeepLabv3p(BaseAPI):
        pooling_crop_size (list): When the backbone is MobileNetV3_large_x1_0_ssld, this must be set to the model input size used during training, in [W, H] format.
            It is used when taking the image-level average in the encoder module: if None, a plain mean is computed; if set to the model input size, the mean is computed with the 'pool' operator.
            Default None.
        input_channel (int): Number of input image channels. Default 3.
Raises:
        ValueError: use_bce_loss or use_dice_loss is True while num_classes > 2.
        ValueError: backbone is not one of ['Xception65', 'Xception41', 'MobileNetV2_x0.25',
......@@ -75,7 +77,8 @@ class DeepLabv3p(BaseAPI):
use_dice_loss=False,
class_weight=None,
ignore_index=255,
pooling_crop_size=None):
pooling_crop_size=None,
input_channel=3):
self.init_params = locals()
super(DeepLabv3p, self).__init__('segmenter')
        # dice loss and bce loss are only applicable to two-class segmentation
......@@ -149,6 +152,7 @@ class DeepLabv3p(BaseAPI):
if self.output_is_logits:
self.conv_filters = self.num_classes
self.backbone_lr_mult_list = [0.15, 0.35, 0.65, 0.85, 1]
self.input_channel = input_channel
def _get_backbone(self, backbone):
def mobilenetv2(backbone):
......@@ -236,7 +240,8 @@ class DeepLabv3p(BaseAPI):
add_image_level_feature=self.add_image_level_feature,
use_sum_merge=self.use_sum_merge,
conv_filters=self.conv_filters,
output_is_logits=self.output_is_logits)
output_is_logits=self.output_is_logits,
input_channel=self.input_channel)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
......
......@@ -36,6 +36,8 @@ class FastSCNN(DeepLabv3p):
            Losses on two or three branches are also supported, with the weights ordered as [fusion_branch_weight, higher_branch_weight, lower_branch_weight]:
            fusion_branch_weight is the loss weight on the branch that fuses the spatial-detail and global-context branches, higher_branch_weight is the loss weight on the spatial-detail branch,
            and lower_branch_weight is the loss weight on the global-context branch. If higher_branch_weight and lower_branch_weight are not set, the losses on those two branches are not computed.
        input_channel (int): Number of input image channels. Default 3.
Raises:
        ValueError: use_bce_loss or use_dice_loss is True while num_classes > 2.
......@@ -52,7 +54,8 @@ class FastSCNN(DeepLabv3p):
use_dice_loss=False,
class_weight=None,
ignore_index=255,
multi_loss_weight=[1.0]):
multi_loss_weight=[1.0],
input_channel=3):
self.init_params = locals()
super(DeepLabv3p, self).__init__('segmenter')
        # dice loss and bce loss are only applicable to two-class segmentation
......@@ -93,6 +96,7 @@ class FastSCNN(DeepLabv3p):
self.ignore_index = ignore_index
self.labels = None
self.fixed_input_shape = None
self.input_channel = input_channel
def build_net(self, mode='train'):
model = paddlex.cv.nets.segmentation.FastSCNN(
......@@ -103,7 +107,8 @@ class FastSCNN(DeepLabv3p):
class_weight=self.class_weight,
ignore_index=self.ignore_index,
multi_loss_weight=self.multi_loss_weight,
fixed_input_shape=self.fixed_input_shape)
fixed_input_shape=self.fixed_input_shape,
input_channel=self.input_channel)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
......
......@@ -34,6 +34,7 @@ class HRNet(DeepLabv3p):
            compute the weights automatically from the per-class pixel ratios in each round, weighting each class as: class ratio * num_classes. When class_weight takes its default value None, every class has weight 1,
            i.e. the ordinary cross-entropy loss.
        ignore_index (int): Value ignored in the label; pixels whose label equals ignore_index do not contribute to the loss computation. Default 255.
        input_channel (int): Number of input image channels. Default 3.
Raises:
        ValueError: use_bce_loss or use_dice_loss is True while num_classes > 2.
......@@ -48,7 +49,8 @@ class HRNet(DeepLabv3p):
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
ignore_index=255,
input_channel=3):
self.init_params = locals()
super(DeepLabv3p, self).__init__('segmenter')
        # dice loss and bce loss are only applicable to two-class segmentation
......@@ -79,6 +81,7 @@ class HRNet(DeepLabv3p):
self.ignore_index = ignore_index
self.labels = None
self.fixed_input_shape = None
self.input_channel = input_channel
def build_net(self, mode='train'):
model = paddlex.cv.nets.segmentation.HRNet(
......@@ -89,7 +92,8 @@ class HRNet(DeepLabv3p):
use_dice_loss=self.use_dice_loss,
class_weight=self.class_weight,
ignore_index=self.ignore_index,
fixed_input_shape=self.fixed_input_shape)
fixed_input_shape=self.fixed_input_shape,
input_channel=self.input_channel)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
......
......@@ -36,7 +36,7 @@ class PPYOLO(BaseAPI):
Args:
        num_classes (int): Number of classes. Default 80.
        backbone (str): PPYOLO's backbone network, one of ['ResNet50_vd']. Default 'ResNet50_vd'.
        backbone (str): PPYOLO's backbone network, one of ['ResNet50_vd_ssld']. Default 'ResNet50_vd_ssld'.
        with_dcn_v2 (bool): Whether the backbone uses the DCNv2 structure. Default True.
        anchors (list|tuple): Widths and heights of anchor boxes; None means use the default values
[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
......
......@@ -33,6 +33,7 @@ class UNet(DeepLabv3p):
            compute the weights automatically from the per-class pixel ratios in each round, weighting each class as: class ratio * num_classes. When class_weight takes its default value None, every class has weight 1,
            i.e. the ordinary cross-entropy loss.
        ignore_index (int): Value ignored in the label; pixels whose label equals ignore_index do not contribute to the loss computation. Default 255.
        input_channel (int): Number of input image channels. Default 3.
Raises:
        ValueError: use_bce_loss or use_dice_loss is True while num_classes > 2.
......@@ -47,7 +48,8 @@ class UNet(DeepLabv3p):
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
ignore_index=255,
input_channel=3):
self.init_params = locals()
super(DeepLabv3p, self).__init__('segmenter')
        # dice loss and bce loss are only applicable to two-class segmentation
......@@ -78,6 +80,7 @@ class UNet(DeepLabv3p):
self.ignore_index = ignore_index
self.labels = None
self.fixed_input_shape = None
self.input_channel = input_channel
def build_net(self, mode='train'):
model = paddlex.cv.nets.segmentation.UNet(
......@@ -88,7 +91,8 @@ class UNet(DeepLabv3p):
use_dice_loss=self.use_dice_loss,
class_weight=self.class_weight,
ignore_index=self.ignore_index,
fixed_input_shape=self.fixed_input_shape)
fixed_input_shape=self.fixed_input_shape,
input_channel=self.input_channel)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
......
......@@ -20,6 +20,7 @@ import numpy as np
import time
import paddlex.utils.logging as logging
from .detection_eval import fixed_linspace, backup_linspace, loadRes
from paddlex.cv.datasets.dataset import is_pic
def visualize_detection(image, result, threshold=0.5, save_dir='./'):
......@@ -44,7 +45,11 @@ def visualize_detection(image, result, threshold=0.5, save_dir='./'):
return image
def visualize_segmentation(image, result, weight=0.6, save_dir='./'):
def visualize_segmentation(image,
result,
weight=0.6,
save_dir='./',
color=None):
"""
    Convert a segmentation result to a color image and save the blended image.
Args:
......@@ -52,10 +57,14 @@ def visualize_segmentation(image, result, weight=0.6, save_dir='./'):
        result: the predict result of the image
        weight: the weight of the original image in the blend; the result's weight is (1 - weight)
        save_dir: the directory for saving the visual image
        color: a list of BGR color values, three per label.
"""
label_map = result['label_map']
color_map = get_color_map_list(256)
    if color is not None:
        # Regroup the flat [B, G, R, B, G, R, ...] list into per-label triples
        color_map[:len(color) // 3] = [
            color[i:i + 3] for i in range(0, len(color), 3)
        ]
color_map = np.array(color_map).astype("uint8")
# Use OpenCV LUT for color mapping
c1 = cv2.LUT(label_map, color_map[:, 0])
c2 = cv2.LUT(label_map, color_map[:, 1])
......@@ -65,11 +74,26 @@ def visualize_segmentation(image, result, weight=0.6, save_dir='./'):
if isinstance(image, np.ndarray):
im = image
image_name = str(int(time.time() * 1000)) + '.jpg'
        if image.shape[2] != 3:
            logging.info(
                "The image is not a 3-channel array, so the predicted label map is shown as a pseudo-color image."
            )
            weight = 0.
else:
image_name = os.path.split(image)[-1]
im = cv2.imread(image)
        if not is_pic(image):
            logging.info(
                "The image cannot be opened by OpenCV, so the predicted label map is shown as a pseudo-color image."
            )
image_name = image_name.split('.')[0] + '.jpg'
weight = 0.
else:
im = cv2.imread(image)
vis_result = cv2.addWeighted(im, weight, pseudo_img, 1 - weight, 0)
if abs(weight) < 1e-5:
vis_result = pseudo_img
else:
vis_result = cv2.addWeighted(im, weight, pseudo_img, 1 - weight, 0)
if save_dir is not None:
if not os.path.exists(save_dir):
......
......@@ -72,6 +72,7 @@ class DeepLabv3p(object):
def __init__(self,
num_classes,
backbone,
input_channel=3,
mode='train',
output_stride=16,
aspp_with_sep_conv=True,
......@@ -115,6 +116,7 @@ class DeepLabv3p(object):
format(type(class_weight)))
self.num_classes = num_classes
self.input_channel = input_channel
self.backbone = backbone
self.mode = mode
self.use_bce_loss = use_bce_loss
......@@ -402,13 +404,16 @@ class DeepLabv3p(object):
if self.fixed_input_shape is not None:
input_shape = [
None, 3, self.fixed_input_shape[1], self.fixed_input_shape[0]
None, self.input_channel, self.fixed_input_shape[1],
self.fixed_input_shape[0]
]
inputs['image'] = fluid.data(
dtype='float32', shape=input_shape, name='image')
else:
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
dtype='float32',
shape=[None, self.input_channel, None, None],
name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
......
......@@ -33,6 +33,7 @@ from .model_utils.loss import bce_loss
class FastSCNN(object):
def __init__(self,
num_classes,
input_channel=3,
mode='train',
use_bce_loss=False,
use_dice_loss=False,
......@@ -62,6 +63,7 @@ class FastSCNN(object):
format(type(class_weight)))
self.num_classes = num_classes
self.input_channel = input_channel
self.mode = mode
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
......@@ -137,13 +139,16 @@ class FastSCNN(object):
inputs = OrderedDict()
if self.fixed_input_shape is not None:
input_shape = [
None, 3, self.fixed_input_shape[1], self.fixed_input_shape[0]
None, self.input_channel, self.fixed_input_shape[1],
self.fixed_input_shape[0]
]
inputs['image'] = fluid.data(
dtype='float32', shape=input_shape, name='image')
else:
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
dtype='float32',
shape=[None, self.input_channel, None, None],
name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
......
......@@ -32,6 +32,7 @@ import paddlex
class HRNet(object):
def __init__(self,
num_classes,
input_channel=3,
mode='train',
width=18,
use_bce_loss=False,
......@@ -61,6 +62,7 @@ class HRNet(object):
format(type(class_weight)))
self.num_classes = num_classes
self.input_channel = input_channel
self.mode = mode
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
......@@ -136,13 +138,16 @@ class HRNet(object):
if self.fixed_input_shape is not None:
input_shape = [
None, 3, self.fixed_input_shape[1], self.fixed_input_shape[0]
None, self.input_channel, self.fixed_input_shape[1],
self.fixed_input_shape[0]
]
inputs['image'] = fluid.data(
dtype='float32', shape=input_shape, name='image')
else:
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
dtype='float32',
shape=[None, self.input_channel, None, None],
name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
......
......@@ -64,6 +64,7 @@ class UNet(object):
def __init__(self,
num_classes,
input_channel=3,
mode='train',
upsample_mode='bilinear',
use_bce_loss=False,
......@@ -92,6 +93,7 @@ class UNet(object):
'Expect class_weight is a list or string but receive {}'.
format(type(class_weight)))
self.num_classes = num_classes
self.input_channel = input_channel
self.mode = mode
self.upsample_mode = upsample_mode
self.use_bce_loss = use_bce_loss
......@@ -232,13 +234,16 @@ class UNet(object):
if self.fixed_input_shape is not None:
input_shape = [
None, 3, self.fixed_input_shape[1], self.fixed_input_shape[0]
None, self.input_channel, self.fixed_input_shape[1],
self.fixed_input_shape[0]
]
inputs['image'] = fluid.data(
dtype='float32', shape=input_shape, name='image')
else:
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
dtype='float32',
shape=[None, self.input_channel, None, None],
name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
......
......@@ -18,8 +18,12 @@ import numpy as np
from PIL import Image, ImageEnhance
def normalize(im, mean, std):
im = im / 255.0
def normalize(im, mean, std, min_value, max_value):
# Rescaling (min-max normalization)
range_value = [max_value[i] - min_value[i] for i in range(len(max_value))]
im = (im - min_value) / range_value
# Standardization (Z-score Normalization)
im -= mean
im /= std
return im
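# A quick check of the two steps above (illustrative values, not library defaults):
# with min_value=[0.0], max_value=[255.0], mean=[0.5], std=[0.5], a pixel of 255
# rescales to (255 - 0) / 255 = 1.0 and standardizes to (1.0 - 0.5) / 0.5 = 1.0,
# while a pixel of 0 ends up at (0.0 - 0.5) / 0.5 = -1.0.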
......
......@@ -20,7 +20,11 @@ import os.path as osp
import numpy as np
from PIL import Image
import cv2
import imghdr
import six
import sys
from collections import OrderedDict
import paddlex.utils.logging as logging
......@@ -60,6 +64,63 @@ class Compose(SegTransform):
"Elements in transforms should be defined in 'paddlex.seg.transforms' or class of imgaug.augmenters.Augmenter, see docs here: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/"
)
@staticmethod
def read_img(img_path):
img_format = imghdr.what(img_path)
name, ext = osp.splitext(img_path)
if img_format == 'tiff' or ext == '.img':
            try:
                import gdal
            except:
                # Log the installation hint before re-raising the import error;
                # the original raise after six.reraise was unreachable.
                logging.error(
                    "Please refer to https://github.com/PaddlePaddle/PaddleX/tree/develop/examples/multi-channel_remote_sensing/README.md to install gdal"
                )
                six.reraise(*sys.exc_info())
            dataset = gdal.Open(img_path)
            if dataset is None:
                raise Exception('Cannot open {}'.format(img_path))
im_data = dataset.ReadAsArray()
return im_data.transpose((1, 2, 0))
elif img_format == 'png':
return np.asarray(Image.open(img_path))
elif ext == '.npy':
return np.load(img_path)
else:
raise Exception('Image format {} is not supported!'.format(ext))
@staticmethod
def decode_image(im, label):
if isinstance(im, np.ndarray):
if len(im.shape) != 3:
raise Exception(
"im should be 3-dimensions, but now is {}-dimensions".
format(len(im.shape)))
else:
try:
im = Compose.read_img(im)
except:
                raise ValueError('Cannot read the image file {}!'.format(im))
im = im.astype('float32')
if label is not None:
if isinstance(label, np.ndarray):
if len(label.shape) != 2:
raise Exception(
"label should be 2-dimensions, but now is {}-dimensions".
format(len(label.shape)))
else:
try:
label = np.asarray(Image.open(label))
                except:
                    raise ValueError(
                        'Cannot read the label file {}!'.format(label))
im_height, im_width, _ = im.shape
label_height, label_width = label.shape
if im_height != label_height or im_width != label_width:
raise Exception(
"The height or width of the image is not same as the label")
return (im, label)
def __call__(self, im, im_info=None, label=None):
"""
Args:
......@@ -73,24 +134,12 @@ class Compose(SegTransform):
            tuple: A tuple of the fields the network requires; the fields are determined by the last preprocessing op in transforms.
"""
if isinstance(im, np.ndarray):
if len(im.shape) != 3:
raise Exception(
"im should be 3-dimensions, but now is {}-dimensions".
format(len(im.shape)))
else:
try:
im = cv2.imread(im)
except:
raise ValueError('Can\'t read The image file {}!'.format(im))
im = im.astype('float32')
if im_info is None:
im_info = [('origin_shape', im.shape[0:2])]
im, label = self.decode_image(im, label)
if self.to_rgb:
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
if im_info is None:
im_info = [('origin_shape', im.shape[0:2])]
if label is not None:
if not isinstance(label, np.ndarray):
label = np.asarray(Image.open(label))
origin_label = label.copy()
for op in self.transforms:
if isinstance(op, SegTransform):
......@@ -550,22 +599,35 @@ class ResizeStepScaling(SegTransform):
class Normalize(SegTransform):
"""对图像进行标准化。
1.尺度缩放到 [0,1]。
2.对图像进行减均值除以标准差操作。
1.像素值减去min_val
2.像素值除以(max_val-min_val)
3.对图像进行减均值除以标准差操作。
Args:
mean (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5]。
std (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5]。
min_val (list): 图像数据集的最小值。默认值[0, 0, 0]。
max_val (list): 图像数据集的最大值。默认值[255.0, 255.0, 255.0]。
Raises:
ValueError: mean或std不是list对象。std包含0。
"""
def __init__(self, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]):
def __init__(self,
mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5],
min_val=[0, 0, 0],
max_val=[255.0, 255.0, 255.0]):
self.min_val = min_val
self.max_val = max_val
self.mean = mean
self.std = std
if not (isinstance(self.mean, list) and isinstance(self.std, list)):
raise ValueError("{}: input type is invalid.".format(self))
if not (isinstance(self.min_val, list) and isinstance(self.max_val,
list)):
raise ValueError("{}: input type is invalid.".format(self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
......@@ -588,7 +650,8 @@ class Normalize(SegTransform):
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
im = normalize(im, mean, std)
im = normalize(im, mean, std, self.min_val, self.max_val)
im = im.astype('float32')
if label is None:
return (im, im_info)
......@@ -752,23 +815,26 @@ class RandomPaddingCrop(SegTransform):
pad_height = max(crop_height - img_height, 0)
pad_width = max(crop_width - img_width, 0)
if (pad_height > 0 or pad_width > 0):
im = cv2.copyMakeBorder(
im,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.im_padding_value)
                img_channel = im.shape[2]
                orig_im = im.copy()
                im = np.zeros(
                    (img_height + pad_height, img_width + pad_width,
                     img_channel),
                    dtype=orig_im.dtype)
                for i in range(img_channel):
                    im[:, :, i] = np.pad(
                        orig_im[:, :, i],
                        pad_width=((0, pad_height), (0, pad_width)),
                        mode='constant',
                        constant_values=(self.im_padding_value[i],
                                         self.im_padding_value[i]))
if label is not None:
label = cv2.copyMakeBorder(
label,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.label_padding_value)
label = np.pad(label,
pad_width=((0, pad_height), (0, pad_width)),
mode='constant',
constant_values=(self.label_padding_value,
self.label_padding_value))
img_height = im.shape[0]
img_width = im.shape[1]
......@@ -1065,6 +1131,33 @@ class RandomDistort(SegTransform):
return (im, im_info, label)
class Clip(SegTransform):
"""
对图像上超出一定范围的数据进行截断。
Args:
min_val (list): 裁剪的下限,小于min_val的数值均设为min_val. 默认值0.
max_val (list): 裁剪的上限,大于max_val的数值均设为max_val. 默认值255.0.
"""
def __init__(self, min_val=[0, 0, 0], max_val=[255.0, 255.0, 255.0]):
self.min_val = min_val
self.max_val = max_val
if not (isinstance(self.min_val, list) and isinstance(self.max_val,
list)):
raise ValueError("{}: input type is invalid.".format(self))
def __call__(self, im, im_info=None, label=None):
for k in range(im.shape[2]):
np.clip(
im[:, :, k], self.min_val[k], self.max_val[k], out=im[:, :, k])
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ArrangeSegmenter(SegTransform):
"""获取训练/验证/预测所需的信息。
......